A Mobile Middleware Component Providing Voice Over IP Services to Mobile Users

Michael Wallbaum (1), Dominique Carrega (2), Michael Krautgartner (3) and Hendrik Decker (3)

(1) Aachen University of Technology, Department of Computer Science IV, 52056 Aachen, Germany, [email protected]
(2) Tesci SA, Tour Aurore, 18 Place des Reflets, 92975 Paris La Defense Cedex, France, [email protected]
(3) Siemens AG, ZT SE 2, 81730 Munich, Germany, [email protected]

Abstract. The ACTS project MOVE is currently designing and developing a middleware architecture called the Voice-Enabled Mobile Application Support Environment (VE-MASE). The VE-MASE enhances the middleware architecture developed in the ACTS project OnTheMove by providing support for interactive real-time multimedia applications and integrated voice and data services. In preparation for future 3rd generation mobile networks, the aim is to enable a completely new class of interactive multimedia services targeted at, but not limited to, mobile devices equipped with the VE-MASE. This paper discusses the VE-MASE component called the Audio Gateway, which is responsible for providing Internet Telephony services to mobile users.

1 Introduction

In the near future, wherever they are, people will want to use multimedia information services, whether global or local, on both fixed and mobile terminals, with comparable availability of services and quality of communication. The upcoming Universal Mobile Telecommunication System (UMTS) [SdS+] will lead to a convergence of the wireless network infrastructure and will provide mobile users with universal access to the entire communications space. A European market analysis forecasts up to 27 million users of UMTS mobile multimedia services by the year 2005 [Swai97].

Today's usage of wide-area mobile networks is mainly limited to digital voice connections, fax services and short message services. Although undoubtedly useful, these services do not provide adequate support for multimedia applications because they do not synchronise the different media, especially voice and data, well. In particular, no common interface between service providers and the different mobile and fixed network providers exists to support standard voice-enabled applications. Mobile applications based on the Internet Protocol (IP) include voice-based services only in a very simplified manner, namely by downloading small files containing speech data. In contrast, IP-based speech services on fixed networks, such as Internet Telephony, are rapidly evolving and will be a part of the future telecommunication world. Also, multimedia electronic mail, hypermedia information browsers and video conferencing systems are on their way to everyday use. There already exists a large gap between services available on fixed networks and those available to mobile users.

This gap is addressed by the MOVE [Move] project. The main project objectives are:

- To design a middleware architecture, called the Voice-Enabled Mobile Application Support Environment (VE-MASE), that supports strong integration of voice and data over UMTS networks for interactive mobile multimedia services and applications.
- To evaluate emerging mobile multimedia protocols integrating voice and data communication, as well as architectural approaches well suited for interactive wireless multimedia communication.
- To offer a Voice/Data Application Programming Interface (V/D-API) to content and service providers for rapid and flexible deployment and operation of voice/data services, supporting on-line user contextual assistance.
- To specify and prototype a demonstration service that demonstrates the benefits of the V/D-API.
- To define a demonstration on advanced Personal Digital Assistants (PDAs) or notebooks to demonstrate the V/D-API and the VE-MASE architecture, and to show the value of the approach for service providers.

By achieving these objectives, MOVE will accelerate the use of voice-enabled mobile multimedia information services and assist the development of new applications and network-embedded mobility support.

This paper introduces the VE-MASE component which provides mobile users with the ability for real-time audio conferencing, namely the Audio Gateway. Section 2 introduces the MOVE project's approach and describes the architecture and design of the VE-MASE and its key components. Section 3 describes the design and functionality of the Audio Gateway. Specifically, its architecture and interworking with other VE-MASE components is introduced and issues related to signalling are discussed. An outlook on how the Audio Gateway will be integrated and deployed in a demonstrator testbed is given in Section 4.

2 The VE-MASE Architecture

The proposed VE-MASE architecture enhances the Mobile Application Support Environment (MASE) that was developed in the course of the OnTheMove project [OTM]. The MASE provides for seamless integration of different bearers, carriers and terminal types, with a focus on "static" multimedia services such as the delivery of textual information and images. It was assumed that the mobile user only generates and sends a limited amount of data.

Fig. 1. Overview Of The MOVE Network Architecture: mobile terminals attached via wireless cellular networks (GSM, UMTS, wireless LAN) reach the VE-MASE Mobility Gateway, which connects the ISDN/PSTN/GSM side with a data network (e.g., the Internet) and a VE-MASE-equipped service provider.

The VE-MASE refines the existing MASE components and adds new components to create a true multimedia infrastructure enabling bi-directional, real-time services. Figure 1 illustrates the physical distribution of the VE-MASE. The VE-MASE functionality is accessed through the Voice/Data Application Programming Interface (V/D-API). Applications making use of the V/D-API may run partly on Gateway Servers located in the fixed network, partly on Information Servers located either within the mobile network or outside, and partly on the mobile devices themselves. Typically, the application parts running on the Information or Gateway Server are service provider parts that relieve the mobile devices of complex operations. The VE-MASE will adapt to the specific transport capabilities of the underlying networks and supply a set of mobility and voice related management functions, as depicted in Figure 2:

- The Audio Gateway on the Mobility Gateway provides for real-time audio conferencing between peers in a wireless access network and peers located in a fixed network environment.
- The Collaboration Manager on the Mobility Gateway enables the members of a conference to perform collaborative web browsing.
- The Call Manager is part of the distributed VE-MASE architecture and is responsible for the set-up and termination of a voice/data conference between a mobile client and another client with voice/data capability. The call set-up is based on an existing IP connection; either a call is placed from the client terminal, or call set-up information is delivered to the proxy gateway, where the connection is set up to a Voice over IP based terminal, e.g., a call centre. Depending on real-time changes of QoS parameters, either the whole voice/data conference or only the audio session may be terminated.

Fig. 2. The VE-MASE Architecture.

- The Scheduler ensures that real-time streams (e.g. audio) are not delayed by non-real-time data (e.g. data created by collaborative web browsing). Incoming packets are classified according to their service class (see the sketch at the end of this section). Quality of Service (QoS) parameters are measured for each stream and signalled to the System Adaptability Manager.
- The MOVE extension of the System Adaptability Manager (SAM) collects events and measurements from the Audio Gateway, the Scheduler, and in principle also from the Collaboration Manager and the HTTP proxy. The QoS parameters are analysed in real time for real-time audio and non-real-time data, and the result (e.g., that a QoS class cannot be guaranteed) is delivered to the Call Manager. The function of the SAM extension is to perform QoS trading for the complete transmission medium (i.e., voice and data) by co-ordinating per-stream QoS trading.
- The Profile Manager will give the service provider access to user-specific context information (e.g., to a history of HTML downloads preceding a particular information access).

The following sections discuss the Audio Gateway and how it interworks with the other VE-MASE components.
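Before turning to the Audio Gateway, the Scheduler's classification step can be made concrete with a minimal sketch. It separates real-time from best-effort traffic in a strict-priority manner; the class, queue and port names below are invented for illustration and do not stem from the MOVE implementation.

```python
from collections import deque

# Hypothetical classification rule: RTP audio arrives on a UDP port agreed at
# call set-up; everything else (e.g. collaborative web browsing) is best effort.
REALTIME_UDP_PORTS = {5004}

class StrictPriorityScheduler:
    """Real-time packets are always served before non-real-time data."""

    def __init__(self):
        self.realtime = deque()
        self.best_effort = deque()

    def classify(self, packet):
        # 'packet' is a dict such as {"proto": "UDP", "dst_port": 5004, "payload": b"..."}
        if packet["proto"] == "UDP" and packet["dst_port"] in REALTIME_UDP_PORTS:
            self.realtime.append(packet)
        else:
            self.best_effort.append(packet)

    def next_packet(self):
        if self.realtime:
            return self.realtime.popleft()
        if self.best_effort:
            return self.best_effort.popleft()
        return None
```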

Fig. 3. A Simple Audio Gateway Scenario: a stationary terminal in the Internet communicates with a mobile terminal via the Mobility Gateway.

3 Audio Gateway

The Audio Gateway is a central component of the VE-MASE since it provides for real-time audio conferencing between peers in a low-bandwidth wireless access network and peers located in the fixed network environment. Connected by one or more wireless access networks such as GSM, DECT or wireless LAN, mobile clients receive a possibly adapted Voice over IP (VoIP) stream from the associated Audio Gateway, which is located at the border between the wireless and the fixed network environment. Likewise, the Audio Gateway forwards audio data from the mobile to the stationary client. The bi-directional and real-time properties of the gateway, which primarily differentiate the VE-MASE from its predecessor MASE, provide the basis for a number of new interactive mobile services and applications, such as the call-centre demonstration service that will be prototyped and investigated by the MOVE project.

The Audio Gateway acts as a mediator between the participants in the fixed and wireless networks. Instead of just routing the media streams from one side of the gateway to the other by network layer protocols, the gateway intercepts the data streams on the application level. There it can manipulate the streams by means of intelligent operations such as transcoding, mixing, congestion control, etc. In principle, this scheme makes the Audio Gateway the endpoint of communication for the transport level entities of both parties engaged in a conference. Figure 3 shows a simple conference scenario with a stationary and a mobile terminal.

To achieve wide acceptance within the user community and to allow for simple implementation by reuse of existing code, the Audio Gateway makes use of standard protocols such as the Real-Time Transport Protocol (RTP) [Sch96a]. Thereby it can interact with standard VoIP clients, i.e. no assumptions are made concerning the client software used by the peer in the fixed network, except that it must use RTP to transport frames of encoded speech data over an IP network.

The Audio Gateway is closely modelled on the Video Gateway that was implemented as part of the OnTheMove project [MSP97] [Kre+98]. Like the Video Gateway, it is located at the border to the access network and performs operations such as forwarding and transcoding of media streams on the application level. But the Audio Gateway differs from the Video Gateway in three important aspects:

- In order to support interactive voice services such as those provided by the PSTN, the Audio Gateway enables bi-directional, full-duplex transport of audio data.
- The Audio Gateway needs to consider the very strict delay and jitter requirements of interactive audio conferencing.
- The Audio Gateway is tightly integrated into the VE-MASE architecture, thus allowing integrated, adaptive transport and scheduling of voice and data packets.

Nevertheless, the conceptual similarities of the Audio and Video Gateways could ease a potential integration of these components in the future. The following section describes the Audio Gateway's architecture.
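Before detailing the architecture, the application-level interception described above can be illustrated with a minimal relay loop. The sketch below forwards RTP packets carried in UDP datagrams from one party to the other and exposes a hook where the gateway could manipulate the stream; the addresses and the processing hook are invented for the example, and RTCP handling is omitted.

```python
import socket

def relay_rtp(listen_addr, forward_addr, process=lambda packet: packet):
    """Receive RTP packets (one per UDP datagram) from one party, optionally
    manipulate them at the application level and forward them to the other
    party. RTCP, session state and timing recovery are omitted."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(listen_addr)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        packet, _sender = rx.recvfrom(2048)
        tx.sendto(process(packet), forward_addr)

# Example: forward the mobile client's uplink stream unchanged to the fixed host.
# relay_rtp(("0.0.0.0", 5004), ("fixed-host.example.org", 5004))
```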

3.1 The Gateway Architecture

As the Audio Gateway acts as the endpoint of communication on the transport level for all parties taking part in the conference, its main task is to receive an audio stream from a sender and forward it to a receiver. The transport address of the actual receiver of a stream is conveyed to the gateway at session initiation time. The Audio Gateway follows a layered architecture as shown in Figure 4. According to the prevailing quality of service (QoS) parameters, a channel selector determines which of the incoming media streams can be delivered to the output. If necessary, the next layer transcodes the data streams of the selected channels into a different format. The mixer merges the different channels to form a single audio stream, which is passed to the application framing entity. This entity subdivides the bit stream into application frames that limit the amount of loss occurring during transmission. Finally, the network adaptation layer is responsible for adapting the bit stream to the form most suitable for transmission over the given network.

Fig. 4. The Layered Gateway Architecture: per-channel FIFO buffers feed the Channel Selector, followed by the Transcoder, the Mixer, Application Framing and Network Adaptation; a QoS module supplies the parameters that drive the selection.

The discussion of the gateway architecture focuses on the downlink (the direction to the mobile client), where channel selection and mixing are beneficial. For the uplink it is assumed that a mobile client will only generate one media stream, which will simply be forwarded to the client in the fixed network after potential transcoding.
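A rough way to picture the downlink processing chain is as a composition of the layers of Figure 4. The function names in the following sketch are placeholders invented for illustration; they merely mirror the order of operations described above.

```python
def downlink_pipeline(input_streams, qos, select, transcode, mix, frame, adapt):
    """Illustrative composition of the downlink layers of Figure 4.
    Each argument after 'qos' is a callable standing in for one layer;
    none of these names come from the MOVE code base."""
    channels = select(input_streams, qos)              # Channel Selector
    channels = [transcode(c, qos) for c in channels]   # Transcoder (if required)
    stream = mix(channels)                             # Mixer: one outgoing audio stream
    frames = frame(stream)                             # Application Framing: smallest loss units
    return adapt(frames, qos)                          # Network Adaptation: FEC, redundancy, ...
```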

Media And Channel Selection. The Media and Channel Selector is used in conjunction with the Video Gateway and plays a vital role in scenarios involving changing network conditions, e.g. caused by roaming, and in scenarios where the mobile user cannot influence the sender's output. The Media and Channel Selector obtains the current network configuration from the QoS Module, which comprises a characterisation of the input streams and the QoS characteristics of the currently employed wireless access network. The former includes the number of audio (and possibly video) channels and a description of each incoming bit stream, stating, e.g., the bit rate and frames per second. According to the network conditions, the selector chooses channel and media from the input streams and, if possible, configures the available transcoders.
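As a minimal, hypothetical selection policy, the sketch below admits channels in priority order until the bit-rate budget reported by the QoS Module is exhausted. The stream descriptor fields are invented for illustration.

```python
def select_channels(streams, available_kbit_s):
    """Greedy selection: admit streams in priority order until the wireless
    link's bit-rate budget (taken from the QoS Module) is exhausted."""
    chosen, used = [], 0.0
    for s in sorted(streams, key=lambda s: s["priority"]):
        if used + s["bit_rate_kbit_s"] <= available_kbit_s:
            chosen.append(s)
            used += s["bit_rate_kbit_s"]
    return chosen

# Example: only the 5.3 kbit/s audio channel fits a 6 kbit/s budget.
streams = [
    {"id": "audio-1", "priority": 0, "bit_rate_kbit_s": 5.3},
    {"id": "video-1", "priority": 1, "bit_rate_kbit_s": 64.0},
]
print(select_channels(streams, available_kbit_s=6.0))
```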

Transcoding And Codecs. The employed audio codecs dictate most of the QoS requirements. For example, a simple H.323 [ITU1] compliant conferencing system produces a stream of G.711 coded frames with a bit rate of 64 kbit/s. The Audio Gateway can transcode this audio stream, e.g. to G.723.1 [Cox97]. The latter is a newer format which allows the required bandwidth to be reduced to a bit rate below 5.6 kbit/s. A potential drawback of this scheme is the additional delay introduced by transcoding, which in fact requires decoding and re-encoding each frame. Although transcoding should be avoided when it is possible to inform the sender about the desired target encoding, it is a legitimate technique to enable mobile users to receive media streams that do not require their interaction. Examples include lectures and conferences broadcast on the Internet or the reception of radio programmes.
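The transcoding step itself amounts to decoding each incoming frame and re-encoding it with the target codec, which is also where the additional delay originates. The codec objects in the following sketch are placeholders; no real G.711 or G.723.1 implementation is invoked.

```python
class Transcoder:
    """Per-frame transcoding: decode with the source codec and re-encode with
    the target codec. 'decoder' and 'encoder' are placeholder objects, e.g. a
    G.711 decoder and a G.723.1 encoder; no real codec library is used here."""

    def __init__(self, decoder, encoder):
        self.decoder = decoder
        self.encoder = encoder

    def transcode(self, frame):
        pcm = self.decoder.decode(frame)   # decoding adds processing delay ...
        return self.encoder.encode(pcm)    # ... and re-encoding adds algorithmic delay
```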

Mixing. So far it was assumed that a conference consists of two participants, one in the fixed network and one in the wireless access network. But it is also conceivable to have more than two parties involved. If the media streams are not distributed by a multicast transport mechanism but by unicast, then the receivers will have one input stream from each sender. To meet the constraints of the narrowband wireless link, an application level relay called a mixer [Sch96a] is included in the Audio Gateway. The mixer resynchronises incoming audio frames to reconstruct the constant spacing between them and combines the reconstructed audio streams into a single stream. This scheme saves valuable bandwidth since the mixer's single output stream contains the audio information of all input streams; for example, mixing three G.723.1 streams into one G.723.1 stream reduces the amount of required bandwidth from 15.9 kbit/s to 5.3 kbit/s. A drawback of mixers is that receivers cannot control the playback of individual senders, i.e. they cannot mute or adjust the volume of a specific sender's data contained in the mixed stream.
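Conceptually, the mixer decodes the resynchronised frames of all senders, sums the resulting samples and re-encodes the sum as a single outgoing stream. The sketch below shows only the sample-summing step and assumes already-decoded 16-bit linear PCM frames of equal length; it is an illustration, not the MOVE mixer.

```python
import array

def mix_pcm_frames(frames):
    """Mix already-decoded, equally long 16-bit linear PCM frames (one per
    sender) into a single frame by summing and clipping the samples."""
    channels = [array.array("h", f) for f in frames]
    length = min(len(c) for c in channels)
    mixed = array.array("h", (0 for _ in range(length)))
    for i in range(length):
        total = sum(c[i] for c in channels)
        mixed[i] = max(-32768, min(32767, total))  # clip to the 16-bit sample range
    return mixed.tobytes()
```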

Application Level Framing. Application level framing [CT90] is closely related to the coding scheme. If knowledge about the syntax of the coder output is available, the bit stream may be segmented into frames at the application layer. The newly created segments form the smallest loss unit that can be encountered at the application layer. This is the basis for graceful degradation in case of transmission errors or losses. Because of the semantic knowledge of the bit stream, the error characteristics of the wireless network can be taken into account. It is not envisaged to change the packetisation of the frames as they arrive from the stationary sender: increasing the number of frames per packet would lead to more delay and would also increase the probability of packet losses, while decreasing the number of frames per packet causes a greater packetisation overhead, though this effect can be partially avoided when IP/UDP/RTP header compression [CaJa99] is employed.
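The trade-off can be quantified with a small back-of-the-envelope calculation. Assuming 20-byte G.723.1 frames (5.3 kbit/s, one frame per 30 ms of speech) and an uncompressed 40-byte IPv4/UDP/RTP header, the sketch below relates the number of frames per packet to header overhead and packetisation delay.

```python
HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP, without header compression
FRAME_BYTES = 20             # one G.723.1 frame at 5.3 kbit/s (30 ms of speech)
FRAME_MS = 30

def packet_stats(frames_per_packet):
    payload = frames_per_packet * FRAME_BYTES
    overhead = HEADER_BYTES / (HEADER_BYTES + payload)
    packetisation_delay_ms = frames_per_packet * FRAME_MS
    return overhead, packetisation_delay_ms

for n in (1, 2, 4):
    overhead, delay = packet_stats(n)
    print(f"{n} frame(s)/packet: {overhead:.0%} header overhead, {delay} ms packetisation delay")
```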

Network Adaptation. At this level, full knowledge of the semantic structure of the bit/frame stream should be available and may be exploited to enhance the loss or error properties. Possible methods include the protection of vital information, commonly known as unequal error protection, by retransmission or forward error correction (FEC) codes. A form of forward error correction that plays a role in VoIP is the redundant encoding of audio data [Per+97]. Whilst FEC schemes are often overly complex and constantly require additional bandwidth, ARQ (automatic repeat request) schemes add additional delay. The choice of application-controlled ARQ schemes nonetheless has its benefits: the application can decide if and when a retransmission is necessary, and if only non-vital information is corrupted, it can decide to use other methods to conceal the error. By separating the error and loss protection from the network/transport layer and moving it to the application layer, content-based corruption handling becomes possible. This is in strong contrast to current transport protocols, where all packets are handled in the same way. The application only defines the guidelines and policies, while the application layer handles the rest, relieving the application and providing predictable behaviour.
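One simple instance of the redundant encoding mentioned above piggybacks a copy of the previous frame onto each packet, so that a single lost packet can be reconstructed from its successor at the cost of a constant bandwidth increase. The sketch below illustrates only this pairing and recovery logic, not the actual RFC 2198 wire format.

```python
def add_redundancy(frames):
    """Pair each frame with a copy of its predecessor as redundant data.
    Returns (primary, redundant) tuples; the first frame has no redundancy.
    This shows the principle behind RFC 2198, not its packet format."""
    packets, previous = [], None
    for frame in frames:
        packets.append((frame, previous))
        previous = frame
    return packets

def recover(packets):
    """Reconstruct the frame sequence from received packets (None = lost),
    filling a single loss from the redundant copy in the following packet."""
    frames = [p[0] if p is not None else None for p in packets]
    for i, f in enumerate(frames):
        if f is None and i + 1 < len(packets) and packets[i + 1] is not None:
            frames[i] = packets[i + 1][1]
    return frames

sent = add_redundancy([b"f0", b"f1", b"f2", b"f3"])
received = [sent[0], None, sent[2], sent[3]]   # packet 1 lost in transit
print(recover(received))                       # frame f1 recovered from packet 2
```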

3.2 The Gateway Client

The generic Audio Gateway architecture provides for a variety of possible realisations and configurations. Most importantly, the split between the wireless and the wireline network on the levels of transport and signalling makes it possible to employ different client software on both sides. This is especially useful, since the two sides have different requirements:

- On the mobile client side, efficient use of bandwidth is of utmost importance. Furthermore, an Audio Gateway client should have minimal storage and CPU resource requirements to meet the restrictions of today's mobile terminals such as notebooks and PDAs.
- On the stationary client side, it is desirable to employ legacy applications. The usage of legacy applications enables communication with hosts which are not integrated into the VE-MASE architecture, thus raising the users' and providers' acceptance of the VE-MASE middleware and its applications.

Since RTP/RTCP is a widely acknowledged standard for the efficient real-time transport of audio and video data, differences are mainly reflected on the level of signalling. This issue is discussed in the following section.

3.3 Signalling Issues

A goal of the MOVE project is to enable fixed network users to conduct VoIP sessions with mobile VE-MASE users by employing their regular client. The ITU standard H.323 [ITU1] for audio and video conferencing is widespread among commercial client applications. It uses RTP/RTCP as a transport protocol, but includes a variety of other protocols for conference set-up, control and termination. The following discussion of the relation of H.323 to the Audio Gateway is limited to the signalling required for Internet Telephony (H.245, Q.931) and does not include the conferencing protocol T.120 for data exchange.

Deploying an enhanced H.323 gatekeeper [ITU2] as part of the Audio Gateway is one possible way to perform call control. With this scheme, standard Internet Telephony software can also be used on the mobile client, while an interception of the audio data at the Mobility Gateway is still required in order for the Audio Gateway to perform its tasks. The gatekeeper, which is an optional component in an H.323 system, provides call control services to the H.323 endpoints. This entails address translation, admission control, bandwidth control and zone management. Other optional functions are call authorisation, bandwidth management, call management and call control signalling. In the call control signalling mode of interest here, the gatekeeper may choose to complete the call signalling with the endpoints and may process the call signalling (Q.931) itself. The gatekeeper's ability to route H.323 calls can also be used to re-route a call to another endpoint. Extending the call signalling capabilities beyond the standard is important if transcoding of the audio data by the Audio Gateway is necessary, e.g. due to different voice codecs at the two endpoints; in that case an H.245 capability exchange has to be conducted. Using gatekeeper-routed call signalling, the H.245 Control Channel can be routed between the endpoints through the gatekeeper.

The alternative to end-to-end H.323 signalling is to employ lightweight signalling on the wireless link. This option is currently being implemented in the MOVE project. In this scenario, the Mobility Gateway acts as the termination point of the H.323 signalling for the H.323 compliant VoIP client in the fixed network. The VoIP client in the wireless network employs a lightweight signalling protocol for call control (e.g., audio conferencing with minimal control [Sch96b]). This scheme provides for small mobile devices with restricted storage and CPU capacities, since the Audio Gateway client does not have to implement a full H.323 stack. Furthermore, it makes more efficient use of the access network's limited bandwidth than H.323, since the signalling overhead is kept to a minimum [Sch98]. For integrated conference control, i.e. target localisation, capability exchange, etc., the lightweight SIP protocol [HSSR99] can be deployed. It can signal the request for a combined data and voice session from the stationary or mobile client to the Mobility Gateway, including the IP address of the mobile user to be called.
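To make the split concrete, the following sketch shows a purely hypothetical lightweight call-set-up request as a mobile client might send it to the Mobility Gateway, which would then complete the full H.323 or SIP signalling towards the fixed-network endpoint on the client's behalf. The field names, addresses and JSON encoding are invented for illustration and do not correspond to a specified MOVE message format.

```python
import json

# Hypothetical lightweight call-set-up request from the mobile client to the
# Mobility Gateway; the gateway completes the full H.323/SIP signalling with
# the fixed-network endpoint. All field names are invented for illustration.
call_setup = {
    "type": "call-setup",
    "caller": "mobile-user-42",
    "callee": "sip:agent@callcentre.example.org",
    "codecs": ["G.723.1", "G.711"],   # capabilities, in order of preference
    "rtp_port": 5004,                 # where the mobile client expects audio
}

message = json.dumps(call_setup).encode()
# A request of a few dozen bytes replaces the multi-message H.225/H.245
# exchanges that would otherwise cross the low-bandwidth wireless link.
print(len(message), "bytes")
```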

3.4 Interworking With Other VE-MASE Components

The VE-MASE components located on the Mobility Gateway provide several functions that are needed by the Audio Gateway. The most important component is the UMTS Adaptation Layer (UAL) developed in OnTheMove, which offers a uniform API to the transport services provided by different access networks and bearer services. The API allows querying of complex QoS parameters and status information indicating the currently prevailing conditions of the installed access networks. The UAL is needed to allow simplified transport access and to obtain the QoS parameters required for the calculation of, e.g., filtering functions and transcoders.

Conversely, the Audio Gateway provides other VE-MASE components, such as the System Adaptability Manager, with vital information. The SAM monitors QoS parameters dynamically and delivers an overall voice/data QoS level to the Call Manager according to the QoS trading policy (usually to always strive for an optimal QoS and to ensure the smoothness of quality degradation). More precisely, the SAM collects events and measurements from the Audio Gateway, the Scheduler, the Collaboration Manager, the HTTP Proxy and its multimedia conversion component. The QoS parameters are analysed in real time for real-time audio and non-real-time data. The result (e.g., that a QoS class cannot be guaranteed) is delivered to the Call Manager, which also receives an adapted QoS class that actually can be guaranteed. In turn, the Call Manager instructs the Audio Gateway and the Collaboration Manager accordingly (e.g., to use a lower or higher bandwidth codec, or to shut down or resume web browsing).
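The resulting control loop between the Audio Gateway, the SAM and the Call Manager can be pictured as in the sketch below; the report fields, thresholds and reactions are invented for illustration and do not reflect the actual MOVE interfaces.

```python
from dataclasses import dataclass

@dataclass
class QoSReport:
    """Hypothetical per-stream measurement reported to the SAM by the Audio
    Gateway or the Scheduler."""
    stream: str           # e.g. "audio-downlink"
    loss_percent: float
    delay_ms: float

def sam_decide(reports, max_loss=5.0, max_delay_ms=250.0):
    """SAM-style analysis: decide whether the requested QoS class can still be
    guaranteed (thresholds are illustrative only)."""
    degraded = [r for r in reports
                if r.loss_percent > max_loss or r.delay_ms > max_delay_ms]
    return "qos-ok" if not degraded else "qos-not-guaranteed"

def call_manager_react(verdict):
    """Call-Manager-style reaction, here simply returned as instructions for
    the Audio Gateway and the Collaboration Manager."""
    if verdict == "qos-not-guaranteed":
        return ["switch audio to a lower-bandwidth codec", "suspend web browsing"]
    return ["keep current configuration"]

reports = [QoSReport("audio-downlink", loss_percent=8.0, delay_ms=180.0)]
print(call_manager_react(sam_decide(reports)))
```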

4 Outlook

The Audio Gateway is currently being implemented and will be integrated into the MOVE demonstrator, which will serve to validate the project's approach. The demonstrator scenario involves a mobile customer browsing through a Web site offering a location-aware service for mobile users, and a "call-centre" agent providing vocal and multimedia assistance to the customer with the help of a customer support application. The service designed for demonstration purposes consists of a hotel-search service for mobile customers. With the help of this demonstrator, a qualitative evaluation of the Audio Gateway will be possible in order to determine whether speech quality, delay, etc. are acceptable to the users. A quantitative evaluation using different access networks (e.g. DECT, wireless LAN and GSM) will also be performed in order to determine actual values for the above quality of service parameters.

References

[CaJa99] S. Casner, V. Jacobson. Compressing IP/UDP/RTP Headers for Low-Speed Serial Links. IETF Request For Comments 2508 (February 1999).
[Cox97] R.V. Cox. Three Speech Coders from the ITU Cover a Range of Applications. IEEE Communications Magazine (November 1997).
[CT90] D. D. Clark, D. L. Tennenhouse. Architectural Considerations for a New Generation of Protocols. SIGCOMM Symposium on Communications Architectures and Protocols (Philadelphia, Pennsylvania), pp. 200-208, IEEE (September 1990). Also in Computer Communications Review, Vol. 20(4) (September 1990).
[HSSR99] M. Handley, H. Schulzrinne, E. Schooler, J. Rosenberg. SIP: Session Initiation Protocol. IETF Request For Comments 2543 (March 1999).
[ITU1] ITU-T Rec. H.323. Packet-Based Multimedia Communication Systems (February 1998).
[ITU2] ITU-T H-Series Recommendations. H.323 Version 2: Packet-Based Multimedia Communications Systems (1998).
[Kre+98] B. Kreller, A. Park, J. Meggers, G. Forsgren, E. Kovacs, M. Rosinus. UMTS: A Middleware Architecture and Mobile API Approach. IEEE Personal Communications Magazine (April 1998).
[Move] ACTS MOVE Homepage. http://move.rwth-aachen.de.
[MSP97] J. Meggers, T. Strang, A. Park. A Video Gateway to Support Video Streaming to Mobile Clients. ACTS Mobile Communication Summit, Aalborg (October 1997).
[OTM] ACTS OnTheMove Homepage. http://www.sics.se/~onthemove.
[Per+97] C. Perkins et al. RTP Payload for Redundant Audio Data. IETF Request For Comments 2198 (September 1997).
[SdS+] J. Schwarz da Silva et al. Evolution Towards UMTS. ACTS Infowin, http://www.infowin.org/ACTS/IENM/CONCERTATION/MOBILITY/umts0.htm.
[Sch96a] H. Schulzrinne et al. RTP: A Transport Protocol for Real-Time Applications. IETF Request For Comments 1889 (January 1996).
[Sch96b] H. Schulzrinne. RTP Profile for Audio and Video Conferences with Minimal Control. IETF Request For Comments 1890 (January 1996).
[Sch98] H. Schulzrinne, J. Rosenberg. A Comparison of SIP and H.323 for Internet Telephony. Network and Operating System Support for Digital Audio and Video (NOSSDAV), Cambridge, England (July 1998).
[Swai97] R.S. Swain. Evolving the UMTS Vision. Report of the Mobility, Personal and Wireless Communications Domain of the European Communities ACTS Programme (December 1997).