Establishing Multimodal Telepresence Sessions using the Session Initiation Protocol (SIP) and Advanced Haptic Codecs

H. Hawkeye King∗, Blake Hannaford∗

Julius Kammerl†, Eckehard Steinbach†

Department of Electrical Engineering, University of Washington, Seattle, WA, USA

Institute for Media Technology, Technische Universität München, Munich, Germany

Abstract

In telepresence and telemanipulation systems, multimodal data is exchanged over a network allowing humans to experience and to operate in remote or inaccessible environments. To operate over the global Internet and connect to multiple telepresence systems, a flexible framework for initiating, handling and terminating Internet-based telerobotic sessions becomes necessary. In this work, we explore the use of standard Internet session and transport protocols in the context of telerobotic applications. The Session Initiation Protocol (SIP) is widely used to handle multimedia teleconference sessions with audio, video or text, and provides many services advantageous for establishing connections between heterogeneous haptic interfaces and telerobotic systems. We apply the session paradigm to the creation and negotiation of haptic telepresence sessions and propose to extend this framework to work with the haptic modality. The notion of a “haptic codec” is introduced for transforming haptic data into a common format, applying data reduction or compression techniques, and implementing teleoperation control architectures. The use of the Real-Time Transport Protocol (RTP) is explored for transport of teleoperation data. Finally, a prototype and demonstrator system is presented for evaluation of the proposed framework.

Keywords: Telerobotics, telepresence, teleoperation, SIP, haptic codec.

Index Terms: H.4.3 [Information Systems Applications]: Communications Applications – Computer conferencing, teleconferencing, and videoconferencing; H.5.2 [Information Interfaces and Presentation]: User Interfaces – Haptic I/O.

1 INTRODUCTION

Telerobotics has found uses in specialized applications such as handling of radioactive materials, space robotics and telesurgery.
∗e-mail: {hawkeye1,blake}@u.washington.edu
†e-mail: {kammerl, eckehard.steinbach}@tum.de

To provide telepresence immersion into a distant environment, multimodal sensory data, such as video, audio, tactile and force feedback information, are transmitted over a communication channel and displayed to the human operator. Meanwhile, the Internet has brought high-speed communication networks into every corner of our lives, spurring the development of telerobotic systems for Internet operation. Many telerobotics research groups are rushing to explore this exciting avenue, but the communication model typically used is still that of pre-Internet point-to-point networks. It is common for each teleoperation system to be
programmed with the IP address of its remote partners, a practice with limited flexibility for locating users at multiple addresses [1–3]. Furthermore, there are no established protocols for locating remote robots or standard formats for exchanging telerobotic control data. A common framework is therefore needed to allow a variety of haptic and teleoperation systems to work together flexibly.

The challenges of establishing communication sessions among computer systems are fundamental to computer networking and have been extensively addressed. In multimedia teleconferencing, a wide range of network protocols exist for real-time audio and visual media exchange and are used by NetMeeting, AIM, Google Talk and others. These protocols solve complex engineering tasks such as resource location, media and codec negotiation, data transport, and security. Clearly, the teleconferencing protocols are well suited for managing teleoperation sessions, and can ease the development and use of Internet telerobotics.

This paper describes the adoption of standard Internet telephony and teleconferencing session and media streaming protocols for telepresence robotics, and a proof-of-concept teleoperation system is presented. The proposed framework uses the widely adopted Session Initiation Protocol (SIP) [4] for multimodal telepresence sessions including streaming audio, video and haptic modalities. SIP uses the Session Description Protocol (SDP) [5] during an initial call handshake to negotiate network parameters and select optimal codecs for streaming media. These features are not unique to SIP, but are also common to other Internet session protocols like IAX2 and H.323. In this context, we further introduce the notion of a “haptic codec”, akin to audio and video codecs, for integrating state-of-the-art compression techniques for haptic signals.
The haptics modality has unique requirements, and therefore haptic codecs will likely implement various data reduction and teleoperation control techniques instead of the compression or error correction more common to audio and video codecs. Haptic codecs will simplify standardization and sharing of new developments in teleoperative control, and allow teleoperation capabilities to be added to existing multimedia applications.

There are many similarities between streaming teleoperation control data and other multimedia data. The Real-Time Transport Protocol (RTP) is commonly used in multimedia streaming applications. It is based on the User Datagram Protocol (UDP) and has many features that are useful for networked haptics, such as synchronizing several media streams and tracking QoS statistics. In Section 2.2, the advantages and disadvantages of using RTP for transport of haptic media [6, 7] are weighed.

Other groups have explored standardization of robotics communication. Previous work by Tachi et al. at the University of Tokyo focused on auto-description of telerobotic configurations and dynamic teleoperation data types [8]. While the

Figure 1: Haptics/Teleoperation protocol suite uses SIP for session management, SDP for codec and parameter negotiation, and RTP for transport of encoded media.

R-Cubed markup language used a standard data interface based on XML, there was no notion of session description and no effort was made to use other Internet standards for telepresence. Current robotics communication standards under development are geared towards mobile robots like UAVs and UGVs, but do not consider telemanipulation [9]. Many software platforms for robotics include networked communication [10–12]. However, these are typically not designed with telemanipulation in mind, and are unsuited for the high packet rates of stream-based communication, sophisticated Internet addressing and presence, or low-latency, closed-loop haptic control over a global network. Recent work centered at the University of Washington has explored the standardization of teleoperative control data [1, 13]. However, this research focuses solely on the data interface between systems, and has no notion of a session or the flexibility to automatically adopt different data encodings. In a departure from prior research, our teleoperation protocol framework takes advantage of synergies between the teleconferencing and telerobotics domains, and examines how Internet communication standards and paradigms can boost flexibility and interoperability in telepresence robotics.

The remainder of this paper is organized as follows. In Section 2, we give an overview of the proposed communication framework and protocol suite. Section 3 introduces the concept of “haptic codecs”, which provide a standardized interface for implementing haptic compression algorithms and haptic control architectures. An overview of our demonstrator system is given in Section 4. Section 5 concludes the paper.

2 Communication Architecture

This section describes in detail how Internet session protocols for teleconferencing can be used for telepresence robotics. Figure 1 gives an overview of the proposed protocol suite.
2.1 Session Negotiation

Several competing IP-based application-layer protocols are commonly used for making audio/video/chat teleconference calls. Among the most popular are the Session Initiation Protocol (SIP), XMPP, H.323, IAX2 and Skype’s peer-to-peer protocols. These protocols use a client-server model for addressing and contacting users; the exception is Skype, which uses a peer-to-peer model instead. In the current work SIP is used, since it is one of the most mature session protocols with many freely available open-source libraries. SIP provides a comprehensive session architecture of standardized entities such as registrar servers, redirect servers and proxy servers. These can provide important functionality

Figure 2: SIP master to slave call with both users addressed on the same server.

to flexible multimodal telepresence sessions, such as name-address mapping, client localization, session forwarding and redirection, user management, capability negotiation, and security. SIP allows user addressing in a format similar to email addresses (e.g. [email protected]), and can also provide “presence” information, tracking users’ status as online or offline. The SIP specification allows reliable or unreliable transport protocols to be used for session management signals. Media transport is handled separately from session management signaling, and therefore any media transport protocol (e.g. TCP, UDP, RTP) can be used with SIP.

Figure 2 gives an overview of the session initiation, parameter negotiation, transport and session termination processes. To start a teleconference session, the local user sends a SIP invitation to the remote user. Embedded in the SIP invitation is an SDP description of the media types, codecs, and codec parameters (e.g. bit-rate and RTP/UDP port) supported by the inviting client. The remote user then acknowledges the invitation and answers the call, sending a SIP answer message. In the SIP answer is an SDP message choosing the media codecs and parameters supported by both parties, selected by comparing local capabilities to those indicated in the offer. For the haptic modality, codec and parameter negotiation may select the applied control architecture and parameters, workspace and device degrees of freedom, end-effector capabilities, and any other haptics-specific requirements. The master and slave capabilities, and the codec in use, will determine which parameters need to be negotiated. At this point all communication parameters have been decided. The multimedia session will then begin using the selected codecs, transport protocols and ports.

For complete control of a telemanipulation system, additional commands and information may need to accompany the haptic data stream.
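At its core, the SDP offer/answer comparison described above amounts to intersecting the codec lists of both parties per media type. The sketch below illustrates that selection logic; the media types, codec names (including the haptic entries) and the dictionary format are illustrative assumptions for this paper's scenario, not the on-the-wire SDP syntax.

```python
# Sketch of SDP-style offer/answer codec selection. Each side lists its
# supported codecs per media type ("haptic" is the proposed new type).
# Names and data structures here are illustrative, not a real SIP stack.

def select_codecs(offer, local_caps):
    """For each media type in the offer, pick the first codec supported
    by both parties; media types with no common codec are dropped."""
    answer = {}
    for media, offered in offer.items():
        common = [c for c in offered if c in local_caps.get(media, [])]
        if common:
            answer[media] = common[0]  # first mutually supported codec
    return answer

offer = {
    "audio":  ["GSM", "PCMU"],
    "video":  ["h264"],
    "haptic": ["HapticCoder", "DeadbandCoder"],  # hypothetical haptic codecs
}
local_caps = {
    "audio":  ["PCMU", "GSM"],
    "haptic": ["HapticCoder"],
}

print(select_codecs(offer, local_caps))
# {'audio': 'GSM', 'haptic': 'HapticCoder'} -- no common video codec
```

The answering side applies exactly this kind of intersection before sending its SDP answer; a real implementation would additionally negotiate codec parameters (bit-rate, port, degrees of freedom) per selected codec.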
Specifically, important data like tool-change commands, emergency stops, or user information should be transmitted with a reliable mechanism. The SIP architecture provides the SUBSCRIBE and NOTIFY

Figure 4: Haptic codec transcodes raw haptic device data to and from formats for telerobotic control. The codec is controlled by the session management application, which also manages the transport streams.

Presented to Lehrstuhl für Medientechnik, TUM, June 2009 (H. Hawkeye King and Julius Kammerl).

Figure 3: SIP INVITE along with SDP media-type and codec offer.

interface [14], which allows SIP clients to request notification from, and exchange data with, remote nodes when events occur.

Figure 3 shows an example of a SIP INVITE message containing an SDP descriptor. In this packet three media types are specified: video, audio and the new type haptic, which we have added. For each media type one or more codecs may be offered. In this packet the GSM codec is offered to support the audio media type, and the H.264 codec is offered for video. One codec, “Haptic Coder”, is offered under the haptic media type. The addition of the haptic media type is fully compliant with the SDP specification.

2.2 Real-time Transport of Haptic Data

The User Datagram Protocol (UDP) is often used for remote haptic information because of its low latency and low overhead. The Real-Time Transport Protocol (RTP) is an Internet standard protocol for streaming media in real time that is based on UDP and provides many additional, useful features for teleoperation control over the Internet. Remote haptic interaction is highly sensitive to delay, since latency in the feedback loop can cause instability and degrade task performance [15, 16]. To minimize time delay, RTP uses the lightweight, yet unreliable, UDP protocol as the underlying transport mechanism. Haptics, like many streaming media formats, is typically robust to occasional packet loss, and thus RTP can detect and discard lost or out-of-order packets. It also includes built-in mechanisms for jitter compensation, enabling the haptic application to trade off additional latency for more consistent timing. RTP has a companion protocol, the RTP Control Protocol (RTCP), which operates in parallel, collecting out-of-band statistics such as round-trip delay and packet loss. With this information, RTP can perform synchronization between multiple media streams. This would allow haptics and video, for example, to be presented to the user in a time-coordinated manner if desired.
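The discard behavior is worth illustrating, since for haptics a stale sample is usually worse than a dropped one. The sketch below assumes packets tagged with RTP-style 16-bit sequence numbers and applies a deliberately simplified policy (drop anything that does not advance the sequence); a real RTP stack does more, such as reordering within a jitter buffer.

```python
# Minimal sketch of late-packet handling for a haptic stream: packets
# carry a 16-bit sequence number, and samples that arrive out of order
# are discarded rather than re-ordered. Simplified, illustrative policy.

def filter_in_order(packets):
    """Keep only packets whose sequence number advances (mod 2**16)."""
    kept, last = [], None
    for seq, payload in packets:
        # A forward step of 1..2**15-1 (with wraparound) counts as "newer".
        if last is None or ((seq - last) % 2**16) < 2**15 and seq != last:
            kept.append((seq, payload))
            last = seq
    return kept

stream = [(1, "a"), (2, "b"), (4, "d"), (3, "c"), (5, "e")]
print(filter_in_order(stream))
# [(1, 'a'), (2, 'b'), (4, 'd'), (5, 'e')] -- packet 3 arrived late, dropped
```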
Multicast is also supported by RTP, which would be beneficial for cooperative telerobotics and haptics. While RTP has these many convenient features for telerobotics data transport, there is a drawback in terms of packet header overhead. This is a critical factor, since haptic applications often use a 1 kHz control loop between master

and slave that, for peak performance, should be maintained across the network channel. Thus, packet headers can consume significant bandwidth, since they are sent at high packet rates. The minimum header size for an RTP packet is 12 bytes, which is significant since data payloads tend to be small, possibly in the range of 68 bytes [13]. However, some of the data, like the timestamp and sequence number (6 bytes), is redundant with typical teleoperation packets and can be removed from the payload to reduce the packet overhead penalty. Also, recent results on haptic data reduction indicate that the required packet rate in teleoperation sessions can be reduced by more than 90% without impairing the transparency or immersiveness of the system [17]. Additional studies are necessary to determine what impact is incurred by the additional RTP packet overhead.
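The header-overhead argument can be made concrete with a back-of-envelope calculation, using the 12-byte minimum RTP header, standard UDP/IPv4 header sizes, and the payload figure cited above; the totals are illustrative, not measurements from the demonstrator.

```python
# Back-of-envelope bandwidth cost of RTP headers at haptic packet rates.

PACKET_RATE_HZ = 1000    # typical 1 kHz haptic control loop
RTP_HEADER = 12          # minimum RTP header, bytes
UDP_IP_HEADERS = 8 + 20  # UDP + IPv4 headers, bytes
PAYLOAD = 68             # example teleoperation payload, bytes [13]

rtp_overhead_bps = RTP_HEADER * PACKET_RATE_HZ * 8
total_bps = (RTP_HEADER + UDP_IP_HEADERS + PAYLOAD) * PACKET_RATE_HZ * 8

print(f"RTP header cost: {rtp_overhead_bps / 1000:.0f} kbit/s")
print(f"Total stream:    {total_bps / 1000:.0f} kbit/s")
# RTP adds 96 kbit/s on top of the 768 kbit/s the UDP/IP stream
# would cost without it, i.e. roughly 11% of the total at 1 kHz.
```

The same arithmetic shows why the 90%+ packet-rate reduction reported for deadband coding [17] matters: header cost scales with packet rate, not payload size.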

3 Codecs for Haptic Signals

In this section, we introduce the notion of a “haptic codec” that can be plugged into our proposed communication framework. Virtually all multimedia applications use codecs to support a range of media types and encodings, so using haptic codecs facilitates easy integration of the haptic modality with existing software. A codec is essentially a method for transcoding data between a raw format and an encoded format that offers a better rate-distortion trade-off and/or better error-resiliency properties. For haptics, the raw input format to the host computer is the device position and orientation, while the raw output is force and torque. The encoded format could take many different forms depending on the teleoperation requirements.

Bit-rate reduction schemes and data transforms for stability and time-delay compensation are often used in networked teleoperation, and are likely candidates for implementation as haptic codecs. An example of perceptual haptic data reduction is the “deadband” coding approach [17]. By removing haptic information below human perception thresholds, haptic data can be encoded with far fewer samples than the corresponding unencoded representation while keeping the introduced distortion imperceptible. This kind of psychophysics-based haptic data compression has demonstrated data reduction rates of up to 95% without impairing the immersiveness of telerobotic systems. Bilateral networked control architectures such as the scattering transformation [15] and the passivity observer/passivity controller [18] can ensure the stability of a teleoperator system in the presence of network time delay or stiff controller gains. This is a new application of the codec paradigm, since these considerations are specific to telerobotics and haptics.

Figure 5: Overview of the demonstrator and prototyping system for SIP-based teleoperation. 5(a) shows an image of the teleoperator. The system employs a SensAble PHANToM Omni haptic device equipped with a Logitech audio/video Webcam and a writing instrument end effector. 5(b) shows a screenshot of the operator display with GUI and visual feedback.

The integration of a codec abstraction layer for the haptic transport stream allows easy selection of the global control architecture based on the system requirements, available control codecs, and system parameters. It also frees the controls engineer from the details of networking and session management. Deadband coding and networked control architectures are just two examples of useful codecs that could be implemented and used with SIP. Media codecs use standard formats (e.g. MPEG-2) so that data can be shared and interpreted correctly across computing platforms. Haptic codecs transform raw, possibly device-specific input and output into common data representations, allowing interoperability of all devices using the codec.

4 Demonstrator System

To demonstrate the proposed session framework, a proof-of-concept system was implemented. For a video showing the SIP-controlled demonstrator system, please point your browser to http://www.lmt.ei.tum.de/movies/HapticSIPSessions.m4v.

The demonstrator hardware comprises two Linux PCs equipped with PHANToM Omni haptic devices (SensAble Inc., Woburn, MA, USA). The master system tracks the operator’s hand movements and provides haptic feedback. The master host computer also displays audio and video feedback from the slave robot. The slave Omni, shown in Figure 5(a), is equipped with a Logitech Webcam with integrated microphone attached to the final link. It also has a writing pen attached as an end effector.

Software for the prototype is based on the Open Phone Abstraction Library (OPAL), which is “Free Software” released under the GNU GPL. It is cross-platform compatible with the Linux, Mac OS X and Windows operating systems and provides a full-featured SIP protocol stack, including the SDP codec negotiation functionality and a low-latency RTP protocol implementation. We extended OPAL’s media handling routines to support a haptics media type alongside the audio and video media types. A haptic device interface wrapper was developed using the OpenHaptics API, and the newly invented “haptic raw” media format was defined and linked to the device with OPAL’s device interface API. This is a novel data format specification for raw haptic data frames. Additional encoded formats for haptic data, which might be developed in the future, are easily added using OPAL. The library has built-in support for loading binary codec plugins, and the codecs themselves supply string descriptors specifying additional supported formats. The string descriptors are used by OPAL to match raw formats to codecs, and in SDP for auto-negotiation of optimal codecs. New classes were added to provide the necessary data sources and sinks, while the mostly unmodified OPAL library managed the session, connected the raw format to the codec, and established media streams using the codec.

OpenPhone is a SIP-capable GUI client bundled with OPAL. For the demonstrator system, OpenPhone was compiled against the modified OPAL libraries. With no modification of the OpenPhone source code, the application was able to perform multimodal telepresence sessions with bilateral teleoperative control. Once a session is established, media streams are set up automatically by OPAL with no human intervention. A screenshot of the application is shown in Figure 5(b).

A simple haptic codec for this system implements bilateral position-position control of the haptic devices. The unsophisticated proof-of-concept codec simply reads position from the haptic device API and uses a basic teleoperation packet to send uncompressed data to the remote host. PD control is implemented in a lower-level device interface. Many further refinements of this example codec are possible.

In the demonstration scenario, the master-side host computer calls the slave-side telerobot to initiate a SIP call. When the call is accepted, the SDP stack selects codecs for audio, video and haptic media transmission.
After codec negotiation, the OPAL library creates the three RTP multimedia streams, one for each modality, and handles the data flow between the device, codec and network (see Figure 4). The demonstrator successfully established haptic, audio, and video media streams, transmitting haptic data at close to 1 kHz.
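The position-position exchange performed by the proof-of-concept codec can be sketched as a PD law applied on each side: every device is servoed toward the position received from its remote partner. The gains and the numbers below are illustrative assumptions, not the demonstrator's actual parameters.

```python
# Sketch of the bilateral position-position scheme: each side commands
# a force driving its device toward the remote position, with damping
# on local velocity. Gains are illustrative, not the demonstrator's.

def pd_force(x_local, x_remote, v_local, kp=80.0, kd=1.5):
    """PD force command (N) pulling the local device toward the
    remote pose, damped by the local velocity."""
    return kp * (x_remote - x_local) - kd * v_local

# Master at 0.10 m, slave lagging at 0.08 m and moving at 0.5 m/s:
f = pd_force(x_local=0.08, x_remote=0.10, v_local=0.5)
print(f"{f:.2f} N")  # 80*0.02 - 1.5*0.5 = 0.85 N
```

In the demonstrator this control law runs in a lower-level device interface; the codec itself only packages and exchanges the raw position samples at the 1 kHz loop rate.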

5 Conclusions and Future Work

In this paper, we presented a novel framework leveraging Internet session and presence protocols for telepresence robotics. The framework is based on the widely adopted Session Initiation Protocol (SIP), which is designed for Internet telephony and teleconferencing. Entirely within the SIP standards, a method for establishing real-time haptic transport streams was described and demonstrated. Furthermore, we proposed the use of haptic codecs that can be integrated into existing teleconferencing software. These codecs can implement a multitude of data transforms, haptic compression algorithms, or control architectures. Haptic codecs can provide standardized data interfaces for easily interconnecting novel robots. Finally, we described a demonstrator system for prototyping and evaluating the SIP-based communication architecture and haptic codecs. We predict that robust session protocols and common haptic codecs will enable interoperability among a heterogeneous population of next-generation telerobots in a wide range of applications.

An important aspect that was not addressed is exactly what parameters will be negotiated before and during a session. To allow interoperability among unique teleoperator systems with widely varying designs and capabilities, it might be necessary to perform an initial negotiation determining what data will be shared and how that data should be applied to a robot. We are excited to work on this interesting problem. In addition, other teleoperation-relevant information, such as tool-change commands or device failure notifications, should be communicated reliably between master and slave. SIP provides the SUBSCRIBE/NOTIFY architecture, which allows users to request notifications from connected systems. Other protocol suites use XML-RPC or other event-based (instead of stream-based) mechanisms for this type of data.
Future implementations of this system will incorporate additional messaging mechanisms for information outside the haptics stream. Finally, the properties of RTP should be studied in the context of high-packet-rate, low-latency networked teleoperation. It remains to be seen whether the additional overhead of RTP packet headers and the time to process the RTP information will affect teleoperator performance.

6 ACKNOWLEDGMENTS

This work was supported by the German Research Foundation (DFG) within the Collaborative Research Centre SFB 453 on “High-Fidelity Telepresence and Teleaction”. We would like to thank Dr. Angelika Peer (Institute of Automatic Control Engineering (LSR), Technische Universität München) for her support and helpful suggestions. Furthermore, we would like to thank Xiaojun Ma for his contributions to developing our demonstrator system.

References

[1] H. H. King, K. Tadano, R. Donlin, D. Friedman, M. J. Lum, V. Asch, C. Wang, K. Kawashima, and B. Hannaford, “A Preliminary Protocol for Interoperable Telesurgery,” in Proceedings of the 17th International Conference on Advanced Robotics, 2009.
[2] J. Arata, H. Takahashi, P. Pitakwatchara, S. Warisawa, K. Tanoue, K. Konishi, S. Ieiri, S. Shimizu, N. Nakashima, K. Okamura et al., “A remote surgery experiment between Japan and Thailand over Internet using a low latency CODEC system,” in 2007 IEEE International Conference on Robotics and Automation, 2007, pp. 953–959.

[3] A. Peer, S. Hirche, C. Weber, I. Krause, M. Buss, S. Miossec, P. Evrard, O. Stasse, E. Neo, A. Kheddar et al., “Intercontinental Cooperative Telemanipulation between Germany and Japan,” in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2008.
[4] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, and E. Schooler, “SIP: Session Initiation Protocol,” RFC 3261 (Proposed Standard), Internet Engineering Task Force, Jun. 2002, updated by RFCs 3265, 3853, 4320, 4916, 5393, 5621. [Online]. Available: http://www.ietf.org/rfc/rfc3261.txt
[5] M. Handley and V. Jacobson, “SDP: Session Description Protocol,” RFC 2327 (Proposed Standard), Internet Engineering Task Force, Apr. 1998, obsoleted by RFC 4566, updated by RFC 3266. [Online]. Available: http://www.ietf.org/rfc/rfc2327.txt
[6] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, “RTP: A Transport Protocol for Real-Time Applications,” RFC 3550 (Standard), Internet Engineering Task Force, Jul. 2003, updated by RFC 5506. [Online]. Available: http://www.ietf.org/rfc/rfc3550.txt
[7] H. Schulzrinne and S. Casner, “RTP Profile for Audio and Video Conferences with Minimal Control,” RFC 3551 (Standard), Internet Engineering Task Force, Jul. 2003. [Online]. Available: http://www.ietf.org/rfc/rfc3551.txt
[8] S. Tachi, “Real-time remote robotics: toward networked telexistence,” IEEE Computer Graphics and Applications, vol. 18, no. 6, pp. 6–9, 1998.
[9] Joint Architecture for Unmanned Systems (JAUS) Reference Architecture, Version 3.0, http://www.jauswg.org/.
[10] H. Utz, S. Sablatnög, S. Enderle, and G. Kraetzschmar, “Miro: middleware for mobile robot applications,” IEEE Transactions on Robotics and Automation, vol. 18, no. 4, August 2002.
[11] Microsoft Robotics Studio. http://msdn.microsoft.com/robotics.
[12] B. Gerkey, R. Vaughan, and A.
Howard, “The player/stage project: Tools for multi-robot and distributed sensor systems,” in Proceedings of the 11th International Conference on Advanced Robotics, 2003, pp. 317–323.
[13] H. H. King, B. Hannaford, K.-W. Kwok, G.-Z. Yang, P. Griffiths, A. Okamura, I. Farkhatdinov, J.-H. Ryu, G. Sankaranarayanan, V. Arikatla, K. Tadano, K. Kawashima, A. Peer, T. Schauß, M. Buss, L. Miller, D. Glozman, J. Rosen, and T. Low, “Plugfest 2009: Global interoperability in telerobotics and telemedicine,” in 2010 IEEE International Conference on Robotics and Automation, 2010.
[14] A. B. Roach, “Session Initiation Protocol (SIP)-Specific Event Notification,” RFC 3265 (Proposed Standard), Internet Engineering Task Force, Jun. 2002, updated by RFC 5367. [Online]. Available: http://www.ietf.org/rfc/rfc3265.txt
[15] R. Anderson and M. Spong, “Bilateral control of teleoperators with time delay,” IEEE Transactions on Automatic Control, vol. 34, no. 5, pp. 494–501, May 1989.
[16] M. J. Lum, J. Rosen, H. King, D. C. Friedman, T. Lendvay, A. S. Wright, M. N. Sinanan, and B. Hannaford, “Teleoperation in surgical robotics: network latency effects on surgical performance,” in 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), Sept. 2009.
[17] P. Hinterseer, S. Hirche, S. Chaudhuri, E. Steinbach, and M. Buss, “Perception-based data reduction and transmission of haptic data in telepresence and teleaction systems,” IEEE Transactions on Signal Processing, vol. 56, no. 2, pp. 588–597, Feb. 2008.
[18] B. Hannaford and J. Ryu, “Time domain passivity control of haptic interfaces,” U.S. Patent 7,027,965, 11 April 2006.
