Session Initiation Protocol
Tuomas Nurmela
University of Helsinki
Seminar on Transport of Multimedia Streams in Wireless Internet
[email protected]
Abstract—Session Initiation Protocol, SIP, provides control-plane signaling for IP networks. SIP enables initiating, modifying and terminating sessions for a user, while remaining neutral to physical media capabilities and using other protocols to negotiate these. SIP assumes that the transport layer is inherently unreliable and therefore provides its own reliability mechanisms at the application layer. For target device discovery SIP requires the use of application layer routing. Besides these, the protocol is extensible and has already been extended to support the IETF presence framework and instant messaging. However, in order to perform in its core area, IP telephony call signaling, the protocol requires further work with regard to PSTN-IP telephony integration, especially in the area of emergency calls. 3GPP has decided to use SIP for signaling, and work is ongoing to meet 3GPP network and IP multimedia system requirements.
Keywords— SIP, SIPPING, state of standardization of SIP, SDP, session application layer routing, emergency calls
I. INTRODUCTION
New multimedia applications drive the need for new functionality in IP networks. This is coupled with continuing pressure to enable IP-based Internet telephony, so that network providers can avoid replacing aging telephony networks with new dedicated hardware. All this coincides with large amounts of Internet fiber overcapacity for which demand is hard to find. The IETF has been trying to adjust to the new conditions. As an answer to these challenges, the IETF has been developing a new multimedia architecture. The architecture aims to be flexible enough to support various application needs and to be deployable, enabling incremental transfer to production by standardizing interoperability mechanisms. The multimedia architecture capabilities in Internet and wireless networks are closely linked to IETF efforts to develop a flexible Quality of Service architecture [6] as well as to ongoing research on multicast protocols [7]. However, intra-company networks, or networks close to the core with high bandwidth overcapacity, allow limited deployment without requiring either. Multimedia handling, with its soft or hard real-time requirements depending on the target use, is
complicated to do on packet-switched networks, since Quality of Service cannot inherently be guaranteed. However, packet switching is essential in order to maintain a scalable, fault-tolerant system without resorting to expensive special-purpose equipment. The new architecture is an evolutionary step extending the current TCP/IP protocol family. It could be said that it is somewhat of a compromise, forcing middleware and systems engineers to choose correct combinations across the whole stack instead of simply choosing one of the alternative transport layer protocols, as with the basic Internet TCP/IP architecture. The Session Initiation Protocol, SIP [19], is one of the protocols used in the IETF multimedia architecture. The architecture includes a number of other protocols, such as the Real-time Transport Protocol (RTP) [1] for transporting real-time data and providing QoS feedback, the Real-Time Streaming Protocol (RTSP) [2] for controlling delivery of streaming media, the Media Gateway Control Protocol (MGCP) [3] or the joint ITU-T and IETF developed Megaco, also called H.248 [4] [5], used for controlling gateways to the Public Switched Telephone Network (PSTN), and the Session Description Protocol (SDP) [10] for describing multimedia sessions. SIP is a text-based application-layer control protocol that is mainly used to establish, modify, and terminate multimedia sessions, e.g. Internet telephony calls. However, SIP is not limited to particular devices, supporting also pagers, laptops etc., nor to a specific call type, supporting one-to-one calls as well as multiparty conferences. While applications of SIP involve mainly human-to-human communication, the SIP design clearly addresses the needs of a generic user, enabling anything with an address to participate.
The establishing phase supports locating the user, negotiating whether the party wishes to accept the call, and determining the supported and required features of the communicating parties and communication media. SIP does not define the actual session attributes, treating them as opaque payload data in order to remain independent of the communication media capabilities. Modification of session includes changing parameters of
the session, inviting additional participants to conference calls and invoking available data-plane services. Currently SIP is intended to address multiple needs: these include special needs of IP telephony, such as supporting caller availability status changes and emergency call connectivity, as well as supporting the IETF presence framework and new applications such as instant messaging. Originally this was not the case: the primary focus was audio and video conferencing over the Internet, prior to carrier-grade signaling needs [55]. To manage SIP development in a controlled manner, the IETF currently has two main working groups (WGs): the SIP WG concentrates on the basic functionality of SIP and its extensions, ensuring that the suitability of the protocol is considered in the areas where it will be applied, while the Session Initiation Proposal Investigation (SIPPING) WG concentrates on evaluating and prioritizing SIP special needs and multimedia requirements, documenting SIP extension requirements and forwarding these to the SIP WG for standardization. SIP is currently at version 2, defined in 2002 in [19], three years after the initial RFC [15]. With over 100 documents (RFCs, Internet-Drafts and working papers), the question of how to manage the development still lingers. In addition to the basic SIP development, work is done in the IP Telephony WG to integrate SIP and PSTN signaling, in the Geographic Location Privacy (GEOPRIV) WG to extend the user location-based service to cover geographical location, and in the Authentication, Authorization and Accounting (AAA) WG to support and finalize SIP security issues. Besides the IETF activities, the 3rd Generation Partnership Project, 3GPP, has adopted SIP as a mandatory protocol for handling signaling in IP multimedia services provided to 3G devices. This assures SIP deployment to millions of phones.
Two active forums are associated to SIP: the SIP Forum [8] promotes general awareness by providing information about SIP whereas SIP Center [9] promotes commercialization and interoperability of vendor solutions by offering technical resources, testing environments and interoperability tests. The paper is structured as follows: Section II describes SIP protocol basic functionality including participants, messages, application-layer routing, application-layer transport mechanisms, basic flow phases and SDP. Section III describes SIP layered design approach and protocol properties such as security, quality of service and performance as well as discussing the scope of SIP usage and additional requirements especially in the context of Emergency Call support requirements. Section IV provides a short summary on related work
including a description of APIs, key SIP extensions, key differences to related protocols such as ITU H.323 and Cisco SCCP, and short summaries of the work done to integrate SIP with the PSTN and 3GPP IP multimedia systems. Section VI draws conclusions regarding SIP. As a clarification of terminology, the paper uses call, session and conference (multiparty session) interchangeably, mostly depending on the context at a given time. Unless otherwise noted, [19] is used as the source.
II. BASICS OF SESSION INITIATION PROTOCOL
This section describes the typical participants in a SIP infrastructure. This is followed by an introduction to SIP messaging, routing of requests during session establishment and SIP transport mechanisms that e.g. provide reliability to UDP-based messaging. The section concludes with the basics of SDP and an example of a SIP flow.
A. SIP components
SIP has four logical entity types (user agents, registrars, redirect servers, proxies) and an abstract service known as the location service. SIP doesn’t define how logical entities are implemented or deployed: a SIP element can include multiple entity types. Basic network services such as DHCP (for bootstrapping) and DNS (for resolving a name to an IP address, port and transport protocol) [21] are also required. Each entity that actively participates is said to have a core, an identity. The abstract location service, on the other hand, is used by SIP but not defined by it. Figure 1 provides a possible configuration of a SIP capable network.
[Figure 1 diagram: UA (A) and UA (B), each with proxy, redirect and registrar servers and DNS/DHCP/location service, connected by a WAN link]
Figure 1: SIP entities and basic network services
User agents (UA) have two roles: a Client (UAC) that issues requests and receives responses, and a Server (UAS) that receives requests directed to it and issues responses by accepting, rejecting or redirecting the request. A SIP user can be represented by multiple SIP addresses, each of which can point to multiple devices. A device can be accessible through multiple SIP addresses. The SIP address is similar to an email address and is assumed to remain stable in relation to how it is defined: it can be given by a network provider (e.g. [email protected]), be related to one’s work role (e.g. [email protected]) or to an affiliated organization (e.g. [email protected]). The address changes when e.g. the user changes network provider, moves to another job or changes organization, not necessarily when the user switches location. For e.g. a temporary change of location, the user can have multiple SIP addresses and redirect calls to the current location. As a SIP address can concurrently relate to multiple devices, a SIP request has to be able to fork. This is something that no other signaling protocol currently does. User agents have to be implemented so that they can manage multiple responses to a single request, although under normal one-to-one circumstances they receive only a single response to a request.
Registrars are responsible for maintaining user agent access information. User agents inform the registrar of changes with a specific request containing the SIP address and the contact addresses, i.e. the IP addresses of the devices bound to the SIP address. A registrar accepts requests that are targeted to SIP addresses within its managed domain and passes them on to the Location Service, which maintains this information.
Proxy servers are intermediaries used mainly for routing requests to another entity that must be closer to the final target than the proxy itself. Proxies also allow policy enforcement and rerouting of requests. One way of classifying proxies is by their location on the path from the UA to the target UA. The proxy closest to the UAC is the outbound proxy, while the proxy closest to the target UAS is the inbound proxy. All proxies in between these two are intermediate proxies. Another way of classifying proxies is by statefulness. Stateless proxies simply forward requests and responses without actively generating new types of request and response messages. Stateful proxies, on the other hand, act as UASs: they answer a UAC request with the best of the responses received from the possible UASs, that is, the one closest to the UAC’s requirements.
To find multiple answers, the stateful proxy can fork the original UA request to two or more destinations. The forking can be done in unicast or in multicast to e.g. provide better support for automatic call distribution (ACD) systems. The stateful proxy groups the “best” responses (i.e. responses that allow the UAC to continue the session establishing process) in a response context, from which it chooses the final response based on its response precedence rule-set. The proxy can cancel all non-suitable responses, i.e. errors or responses that were not selected because of a better response, in order to keep state management in the SIP network to a minimum. Stateful proxies can be further divided into call stateful proxies, which maintain the state of the entire call from its establishment to its termination, and transaction stateful proxies, which maintain the state of at least a single request of the UAC. As such, all call stateful proxies are transaction stateful, but the reverse doesn’t apply. The concept of transaction in SIP is explained later in Section III.C.
Redirect servers manage redirecting contacts to UAs that are outside the registrar’s domain. Redirect servers can be used to redirect callers to another SIP address, so that a SIP user does not need to know all SIP addresses of the target user. Redirections are done with a specific status-code in the reply, as in HTTP/1.1 [63]. In addition to user availability, it must be remembered that proxies are the centralized component in the SIP architecture. One server is likely to handle a small to medium deployment, but multiple proxies are likely to be needed in large domains. To alleviate possible problems, the redirect servers can provide another SIP address for the UAS in order to direct the UAC to use e.g. another proxy path.
The Location Service is a database that contains bindings from a SIP address to a list of contact IP addresses. The location service is used by proxy and redirect servers to locate the UAS and by the registrar to update UA location information.
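As a rough illustration of the bindings described above, the following Python sketch models a location service as an in-memory mapping from a SIP address to contact addresses with expiry times. The class name `LocationService`, its methods and the example addresses are hypothetical and chosen only for illustration; treating a zero lifetime as a removal is likewise an assumption of this sketch, and a real deployment would typically use a database and the registrar's own policies.

```python
import time
from typing import Dict, List, Optional


class LocationService:
    """Toy in-memory store of SIP address -> contact bindings (illustrative only)."""

    def __init__(self) -> None:
        # SIP address -> {contact address: absolute expiry time in seconds}
        self._bindings: Dict[str, Dict[str, float]] = {}

    def register(self, address: str, contact: str, expires: int,
                 now: Optional[float] = None) -> None:
        """Called by a registrar after it has accepted a REGISTER-request."""
        now = time.time() if now is None else now
        contacts = self._bindings.setdefault(address, {})
        if expires <= 0:
            # Assumption of this sketch: a zero lifetime removes the binding.
            contacts.pop(contact, None)
        else:
            contacts[contact] = now + expires

    def lookup(self, address: str, now: Optional[float] = None) -> List[str]:
        """Called by a proxy or redirect server; returns all unexpired contacts."""
        now = time.time() if now is None else now
        contacts = self._bindings.get(address, {})
        return sorted(c for c, expiry in contacts.items() if expiry > now)


if __name__ == "__main__":
    service = LocationService()
    service.register("sip:alice@example.com", "sip:alice@192.0.2.10:5060", 3600, now=0)
    service.register("sip:alice@example.com", "sip:alice@192.0.2.20:5060", 60, now=0)
    # Two live contacts: a forking proxy could send the request to both.
    print(service.lookup("sip:alice@example.com", now=30))
```

Passing `now` explicitly keeps the sketch deterministic; when a lookup returns more than one contact, a stateful proxy has the set of targets it may fork the request to.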
The location service also maintains user level availability and preferences as well as contact address-specific capabilities. Contact address-specific state of the device (e.g. whether it is muted, not connected etc.) is not maintained in the location service.
B. SIP Messages
The form of SIP request and response messages resembles HTTP/1.1, consisting of a request or status line, header fields and an optional entity body. The basic SIP 2.0 defines six request methods:
REGISTER-request is used by the UA to provide location information. The method is passed periodically to a Registrar that updates the Location Service.
INVITE-request is used to establish a session. Because the invitation can lead to a long pause before e.g. the target party answers the phone, the method is linked to a separate additional reliability mechanism, provided by the ACK-request.
ACK-request is used by the caller to confirm a reliable INVITE-request exchange to the UAS, somewhat analogous to the TCP three-way handshake. The use of the method is independent of the transport protocol used.
OPTIONS-request enables negotiating session options without establishing a session. This covers both caller preferences and device preferences. As an example of caller preferences, if one is in the shower and a phone with video capability rings, one may want to turn the video transfer off despite the phone capabilities; likewise, choosing a language can be useful for text-based communication or when calling a work role-based SIP address. Device preferences include e.g. which authentication protocol should be used and what algorithm is used for payload encoding, compression etc.
CANCEL-request is used to terminate pending requests, e.g. due to request forking. The use of the method doesn’t affect an ongoing session.
BYE-request is used to terminate a session. The request is valid if the requestor has already established the session or is negotiating its establishment.
Extensions to SIP describe a SUBSCRIBE-request, used to indicate interest in knowing when the party is available, and a NOTIFY-request for informing of status changes [16]. Implementations supporting the additional messages can automatically inform a person when the called party becomes available by sending a NOTIFY-request after the state has been modified by a REGISTER-request. In order to work seamlessly with the PSTN system, a separate PSTN/Internet Interworking (PINT) Server logical entity is defined to communicate the methods between the PSTN and VoIP networks.
Additional extensions describe the UPDATE-request [23], which is used to modify the session either during the INVITE-request exchange prior to the final ACK or after the INVITE-request has been resolved. However, since the UPDATE-request is not allowed to affect dialogue state (see below), specific rules apply to how it must be used.
Header fields manage device caller ids, content type, loop prevention, packet-order handling, party identifiers and SIP routing. Header fields can often be expressed in a compact form and don’t require a specific order (excluding the internal order of stackable header fields like those used in routing). Certain header fields can contain parameters that identify extensions to SIP. While standardization of parameters provides support for interoperability between vendors, it also provides means for proprietary functionality. For now, the most commonly used parameter is the tag parameter, which is contained in the From- and To-header fields as a random local session identifier and which, together with the globally unique identifier in the Call-ID-header field, identifies a peer-to-peer relationship called a dialogue. Since a flow state cannot be established with UDP, dialogue identifiers together with a separate message ordering mechanism are used to help in message sequencing and proper routing, as intermediate SIP elements can distinguish SIP dialogue state [19, pp.69]. Section II.C contains further information on how application-layer routing is performed with header fields. Section II.D provides information on the header field used for message ordering.
Response messages are divided into provisional and final responses based on the status-code. Provisional responses (status-code 1xx) are used to indicate that the request was received and is being processed. Provisional responses enable the requesting party to e.g. know that the VoIP phone is ringing on the other end of the line. There are no reliability mechanisms in the basic SIP for provisional responses. Final responses (status-code 2xx-6xx) indicate a resolution of the UAC request. Final responses are divided into successes (2xx), redirects (3xx) and different types of failures (4xx-6xx) that e.g. suggest trying again (possibly later) or indicate global inability to provide service.
An optional entity body enables carrying (control-plane) data. This can be used to create additional functionality by defining another protocol that conforms to the SIP request-response model. As such, the entity body enables extending the application of SIP beyond parameter-based extensions to SIP itself. The entity body can use Multipurpose Internet Mail Extensions (MIME) [65] encoding to carry this data.
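The message structure described in this subsection can be sketched in Python as follows. This is a simplified illustration rather than a conforming parser: it ignores compact header forms, repeated header fields and URI parameters that contain semicolons. The function names and the example message (addresses, tags and Call-ID) are made up for this sketch.

```python
from typing import Dict, Optional, Tuple

# Hypothetical provisional response; all identifiers are invented for the example.
EXAMPLE_RESPONSE = (
    "SIP/2.0 180 Ringing\r\n"
    "Via: SIP/2.0/UDP client.example.com;branch=z9hG4bK74bf9\r\n"
    "From: Alice <sip:alice@example.com>;tag=9fxced76sl\r\n"
    "To: Bob <sip:bob@example.org>;tag=8321234356\r\n"
    "Call-ID: 3848276298220188511@client.example.com\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_sip_message(raw: str) -> Tuple[str, Dict[str, str], str]:
    """Split a message into start line, header fields and entity body."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Simplification: a repeated header field overwrites the earlier one.
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


def classify_status(code: int) -> str:
    """Map a status-code to the response classes named in the text."""
    if 100 <= code <= 199:
        return "provisional"
    if 200 <= code <= 299:
        return "success"
    if 300 <= code <= 399:
        return "redirect"
    if 400 <= code <= 699:
        return "failure"
    raise ValueError(f"not a SIP status-code: {code}")


def tag_of(header_value: str) -> Optional[str]:
    """Return the tag parameter of a From- or To-header field value, if any."""
    for part in header_value.split(";")[1:]:
        key, _, value = part.strip().partition("=")
        if key.lower() == "tag":
            return value
    return None


def dialogue_id(headers: Dict[str, str]) -> Tuple[str, Optional[str], Optional[str]]:
    """Call-ID plus the two tags identify a dialogue."""
    return headers["call-id"], tag_of(headers["from"]), tag_of(headers["to"])


if __name__ == "__main__":
    start_line, headers, _ = parse_sip_message(EXAMPLE_RESPONSE)
    status = int(start_line.split()[1])
    print(status, classify_status(status), dialogue_id(headers))
```

Keeping the dialogue identifier as a plain tuple makes it usable as a dictionary key, which is roughly what an element that tracks dialogue state would need.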
The MIME message type has to be indicated through separate header fields in the INVITE-request or with separate OPTIONS-requests. Likewise, SIP can tunnel itself in the entity body. In this case it can e.g. use encryption with Secure MIME [66] to ensure privacy, yet some of the header fields used for routing still need to be in clear text. The main use for the entity body is carrying SDP, discussed in Section II.E.
C. SIP application-layer routing and SIP mobility
SIP application layer routing includes basic routing, loop prevention and mobility support. The last issue is dealt with from both the Mobile IP and the SIP perspective.
SIP application layer routing is required mainly in the call establishment phase. As the application layer routing forms an overlay network, SIP entities have no knowledge of the actual network layer topology or even of the load on adjacent links. As such, the path to a device that is very close in the network layer can become burdened with extra hops. The application layer routing is independent of the network layer protocol. SIP is not tied to IP addressing in any way, supporting both IPv4 and IPv6. When the parties have located each other through call establishment, the contact addresses (IP addresses) are known and no application layer routing is required.
The SIP response route is created along the request path. Each SIP element adds a Via-header field, forming a sequential list of the hops the request has passed. The response message is routed back based on these header fields, with each SIP element removing the Via-header field it inserted before forwarding the response to the next hop. Like many peer-to-peer application layer routing algorithms (e.g. Chord [49], CAN [50]), SIP doesn’t try to route all traffic through the overlay. The UAs only use it for session establishment, or more specifically for service discovery, as direct contact addresses are shared during session invitation. Contact addresses can be cached by UAs and stateful proxies based on the expiration information. In case the UA (original client, a redirect server or stateful proxy) wants to force a specific request path, it can define a list of Route-header fields, called a route set, that explicitly indicates the target and intermediate systems. Proxies can request to remain in the path by using the Record-Route-header field.
Symmetric response routing [30] is a critical extension to the routing. It allows the UA addresses to be NATted. According to the basic SIP, the UAs are expected to use public IP addresses, which are recorded with ports in the SIP message.
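The Via-based response routing described above can be sketched as a stack of hop addresses. This is a minimal illustration that models a message as a dictionary holding only a list of Via entries; the function names and host names are hypothetical, and real Via-header fields also carry the transport and parameters.

```python
from typing import Dict, List, Optional, Tuple

Message = Dict[str, List[str]]


def forward_request(request: Message, own_address: str) -> Message:
    """Return a copy of the request with this element's Via entry on top."""
    forwarded = dict(request)
    forwarded["Via"] = [own_address] + list(request.get("Via", []))
    return forwarded


def forward_response(response: Message, own_address: str) -> Tuple[Message, Optional[str]]:
    """Remove this element's own Via entry and return the next hop, if any."""
    vias = list(response.get("Via", []))
    if not vias or vias[0] != own_address:
        raise ValueError("topmost Via entry does not belong to this element")
    remaining = vias[1:]
    forwarded = dict(response)
    forwarded["Via"] = remaining
    next_hop = remaining[0] if remaining else None
    return forwarded, next_hop


if __name__ == "__main__":
    # The UAC inserts its own Via entry; each proxy on the path adds one on top.
    request: Message = {"Via": ["uac.example.com"]}
    request = forward_request(request, "proxy1.example.com")
    request = forward_request(request, "proxy2.example.net")

    # The UAS copies the Via list into the response and sends it to the top entry.
    response: Message = {"Via": list(request["Via"])}
    response, hop = forward_response(response, "proxy2.example.net")
    print(hop)
    response, hop = forward_response(response, "proxy1.example.com")
    print(hop)
```

Because every element only inspects and removes the topmost entry, the response retraces the request path without any element needing routing knowledge beyond what is recorded in the message itself.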
NATted private IP addresses are not a problem, because the receiver marks the (outside) IP address in the received parameter of the Via-header field if it differs from the (inside) IP address that the sender marked in the sent-by parameter. However, NAT port translation is a problem, since the port marked by the sender is typically translated (PATted) by NAT boxes. To work around this problem, UACs add an additional rport parameter to the Via-header field in the request that is sent to the outbound proxy. The outbound proxy places the (outside) PAT port number in the rport parameter. When it receives the response, the proxy can use the received and rport parameters to send the response to the NAT binding. The UAC must be able to receive on the same port it sent the request from. Additionally, in UDP
cases, an additional fix may be needed to keep the NAT binding alive, since UDP creates no flow and the binding is maintained only for a minute or so. Alternatively, the UAC should retransmit the INVITE-request at intervals of e.g. 20 s.
Loop prevention [19, pp.173] in the application layer is an optional feature in SIP. This is done with a hop count kept in the Max-Forwards-header field, which is decremented by one by each entity on the path from the UAC to the target. The default value for Max-Forwards is 70, which is estimated to cover large SIP deployments.
Mobility support is limited in SIP, as contact address negotiation is not meant to be ongoing or to be done during a call. To look further at mobility in the SIP context, we divide the needs into mobility managed by Mobile IP and session layer SIP mobility. Mobile IPv4 [67] [68] uses a home agent to represent the user to the network. The home agent tunnels IP packets sent to the mobile node to a foreign agent that is located in the visited network. The foreign agent forwards these to the care-of address allocated for the mobile node. Packets from the mobile node are routed through the foreign agent directly towards the target host, creating what is called triangle routing. Triangle routing adds additional complexity to SIP application layer routing. Since the UAS device (e.g. laptop) has been registered to the home agent and the home agent manages the Mobile IP connection, the location service directs all incoming calls to the home agent, which in turn tunnels the connection. The SIP UAC will always see the home agent as the UAS address. Likewise, the SIP UAS would direct INVITE-request replies to the home agent proxy. Mobile IPv4 might be required if e.g. the visited network firewalls only permit tunneling to a foreign agent address in the network or the visited network has no SIP entities at all, similar to UA C in Figure 2.
[Figure 2 diagram: home agent (C) and mobile node UA (C), reached via a foreign agent through a MobileIP tunnel, together with UA (A) and UA (B) and their proxy, redirect, registrar, DNS/DHCP and location service elements, connected by WAN links]
Figure 2: SIP entities and MobileIP enabled SIP host
SIP mobility [40] [37] [38] can be divided into roaming mobility, personal mobility, session mobility and service mobility. Some of these mobility scenarios can be further divided into pre-call mobility, where mobility happens before the session is established, and mid-call mobility, where the session has been established and the user has to be able to maintain it while moving. Mid-call mobility is currently an open issue under investigation. Pre-call mobility in each of the cases is done with SIP redirection. All approaches require that the visited network is SIP aware.
Roaming mobility is the situation where a user is not in the home network. To enable pre-call mobility, the UA, after address resolution through DHCP, registers with the visited network registrar. After this, it registers with the home network registrar through the visited network outbound proxy and the home network proxy. To avoid this double registration, the visited domain registrar could do the registering on behalf of the UA, or the administrative domains of the registrars could be combined. The first option requires further specification of SIP.
Personal mobility refers to the user’s ability to redirect calls to any user device. This basically refers to how SIP deals with addressing, as discussed in II.A.
Session mobility is about e.g. changing from a VoIP phone to a SIP capable mobile phone because one is leaving the office. This can be done in a preplanned manner or occur due to an outage, e.g. because the battery of the phone died, which would require automatically initiated recovery, which SIP doesn’t support. There are at least three ways in which session mobility can be worked into SIP. In the first, the original sender issues a new INVITE-request to the same address, which is transferred to the new device’s contact addresses and negotiated normally. This requires sending party intervention, and of course the applications must be able to send to multiple destinations.
Another approach is third party call control (3pcc), in which the receiver sends an INVITE-request to the new device indicating the other party’s parameters. A third approach could be a REFER-request to the other UA, indicating the new target with which a session should be negotiated. These require full use of the current device until the connection is torn down.
Service mobility is about keeping one’s personal services (e.g. calendar, buddy list etc.). Service mobility is related to extending SIP to cover signaling between different types of services and maintaining these. REGISTER and NOTIFY messages are one part of the solution. Service signaling is still under investigation.
Session re-establishing by quick re-negotiation, created with a proactive make-before-break mechanism, is not supported in SIP or SDP. Disconnection could happen for a number of reasons, including temporary local network failure and temporary device (e.g. cell phone) failure. SIP and SDP probably see this as an implementation issue, since the protocols avoid making assumptions regarding UA capabilities.
D. SIP application-layer transport mechanisms
SIP is typically transported on top of UDP to avoid the TCP handshake delay, although TCP support is also mandatory. The default port for UDP and TCP is 5060, with TLS [69] [70] encrypted TCP on 5061. SCTP support is still in draft state [33], though it is not currently being developed further. It should be noted that, due to application layer routing, the choice of transport-layer protocol is not an end-to-end decision but is made per application layer hop. Even though the original UA uses UDP, proxies may use another transport protocol. Basic SIP has multiple application-layer transport mechanisms to overcome UDP problems. These include message reliability, congestion management and message ordering. SIP has no mechanisms for fast session re-establishing to support recovery from connectivity failure. Additionally, SIP supports message-level multihoming; however, this requires taking into account connection reuse and symmetric response routing issues and is as such out of scope here.
Reliability is handled in two different ways, depending on whether the request is an INVITE-request or not. For an INVITE-request, since it can take a while before the phone is answered, the entities send provisional responses to notify the UAC that the call is being processed. In addition to this, the final response is separately ACKed by the requestor. The UAC has a retransmission timer that is initially the estimated RTT, defaulting to 500 ms. This grows exponentially. Besides the UAC retransmission timer, the overall INVITE-request resolution has a 3 minute timer, after which proxies automatically time out the request. For requests other than INVITE on top of UDP, every request is retried if no answer is received within the UAC retransmission timer, which likewise starts from the estimated RTT and grows exponentially with each retransmission; the request is abandoned after 64 times the estimated RTT.
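To make the exponential growth of the retransmission timer concrete, the following Python sketch lists the points in time at which a UAC would retransmit an unanswered INVITE-request over UDP. The function name and parameters are hypothetical. Giving up after 64 times the initial timer is an assumption taken from the client transaction timers in [19] rather than from the description above, and the sketch ignores provisional responses, which would stop the retransmissions.

```python
from typing import List


def invite_retransmit_times(t1: float = 0.5, give_up_factor: int = 64) -> List[float]:
    """Seconds after the first transmission at which the request is resent.

    t1 is the initial timer (the estimated RTT, 500 ms by default).
    give_up_factor * t1 is the assumed point at which the UAC stops trying.
    """
    deadline = give_up_factor * t1
    times: List[float] = []
    elapsed = 0.0
    interval = t1
    while elapsed + interval < deadline:
        elapsed += interval
        times.append(elapsed)
        interval *= 2  # exponential growth of the retransmission timer
    return times


if __name__ == "__main__":
    print(invite_retransmit_times())
```

With the default values this yields retransmissions at 0.5, 1.5, 3.5, 7.5, 15.5 and 31.5 seconds, i.e. six retransmissions before the assumed 32 second cut-off.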
While the reliability mechanism seems to be necessary, it should be noted that if used in mobility
supporting middleware like e.g. Wireless CORBA [51], the middleware may itself be equipped with an adaptation layer that in a similar manner adds reliable transport properties when communicating over UDP.
Provisional response reliability [20] addresses the fact that delivery of provisional responses over UDP is not guaranteed in basic SIP. Since this could lead to interoperability issues when integrating SIP with PSTN signalling, a reliability mechanism has been created that simply mirrors the ACK approach by defining a PRACK-method message that is used to acknowledge provisional responses. In this case the UAS must wait for acknowledgements to all the provisional responses it has sent before sending any final response with a status-code indicating success to the INVITE.
While these mechanisms provide reliability for end-to-end communication, they don’t really help if any of the proxies in the path loses connectivity to the next hop proxy in the path. SIP’s approach to this problem is DNS-based: when multiple proxy addresses are given, the requestor can use another proxy in case of proxy failure, although statefulness may be required in this case [21].
Congestion management is done by exponential backoff of the retransmission timer that provides reliability, as well as by packet size limiting. Packet size limiting states that if the packet size is (path MTU – 200 bytes) or more, or if it is 1300 bytes or more when the path MTU is unknown (an implicit Ethernet assumption), the packet should be sent using a reliable transport [19, pp.141]. While these congestion management approaches are a good start, it has to be remembered that the transport is still UDP, with no mechanisms for congestion avoidance or adaptation. Paths between proxies can be unnecessarily stressed due to inappropriate UA behaviour. Assured congestion safety [31] is a planned extension that is meant to counter this effect.
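The packet size limiting rule stated above can be written down directly. The following Python sketch uses a hypothetical function name and represents an unknown path MTU as `None`; the thresholds are the ones given in the text.

```python
from typing import Optional


def needs_reliable_transport(message_size: int, path_mtu: Optional[int]) -> bool:
    """Decide whether a request of message_size bytes should avoid UDP.

    path_mtu is the known path MTU in bytes, or None when it is unknown.
    """
    if path_mtu is None:
        # Unknown path MTU: fall back to the fixed 1300 byte limit.
        return message_size >= 1300
    # Known path MTU: keep a 200 byte safety margin below it.
    return message_size >= path_mtu - 200


if __name__ == "__main__":
    print(needs_reliable_transport(900, None))    # small request, MTU unknown
    print(needs_reliable_transport(1400, 1500))   # within 200 bytes of the MTU
```

The 200 byte margin presumably leaves room for the response, which may be larger than the request, and for header fields that proxies add along the path.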
The extension defines a congestion-safe request as one that is either sent over a reliable transport or sent using the extension, in which case requests are paced and managed by proxies that are able to reject UDP packets that would require fragmentation. SIP request UDP pacing requires waiting for a response before resending the request. This way all the entities in the path (not just the UA) wait for the timer to time out before resending requests. Proxy UDP packet rejection is used in a situation where the proxy is given a large message to be forwarded over UDP that would require fragmenting the message into multiple UDP packets. The need for fragmentation is determined from the path MTU that the proxy knows or estimates (using its local network MTU). In this case the proxy rejects the packet using a basic SIP error code and two header fields extending SIP: Proxy-Max-Size, which indicates the maximum UDP packet size, and Proxy-Seen-Size, which expresses the size the packet had when received from the UA. A congestion-safe response is one that is either no larger in size than the request or that is a response to a request that was congestion-safe. As such, the mechanism is based on request control.
Message ordering of requests is done through the CSeq-header field, which provides a locally unique sequence value containing an integer and the request method used. The integer is incremented for each new request within a dialogue, while retransmissions of a request carry the same value, so that a receiver can tell new requests from retransmissions.
Whether a session is still alive is obvious to the dialogue members (UAs), but the intermediate SIP entities have no knowledge of the session state in case a UDP packet is lost. This can lead to stateful proxies ending up maintaining dialogue state indefinitely. Session timers [32] have been proposed to correct this. The proposal suggests that UACs supporting the mechanism express this in the INVITE-request. The SIP elements in the path can insert a Session-Expires-header field that contains the desired interval for a refresh message. Each of the stateful proxies can evaluate whether this interval is suitable and, if not, reject the INVITE message with an error status while indicating with another header field what its refresh requirements are. After possibly multiple iterations, the INVITE arrives at the target UAS, which finalizes the Session-Expires interval. After the caller UAC receives this, it sends an INVITE- or UPDATE-request within the timer period to refresh the proxies in the path.
E. Session Description Protocol (SDP)
The Session Description Protocol (SDP) [10] is developed by the Multiparty Multimedia Session Control WG (MMUSIC WG). The primary focus originally was the announcement of multimedia conferences, but SIP, MGCP and RTSP have presented new needs that the protocol has been adapted to. This effort has required stretching the syntax and semantics, which in the long run is not a viable solution. The MMUSIC WG is currently actively drafting [14] a new version of the protocol, called SDPng (SDP next generation) that would be able to express the wider variety of needs. SDP is required to provide information about multimedia capabilities so that the parties involved can decide whether or not the session will be established. This information includes most importantly media streams, which define the content (e.g. audio, video,
application, data, control) of the stream, in a manner similar to MIME content types. In addition, each stream’s payload type (i.e. media format), destination address and port number are provided. The encryption mechanism can also be described. Basic SDP assumes that each stream is independent and has a dedicated connection. In addition to the media stream description, issues such as start, stop and repeat times for e.g. an Internet radio program can be indicated. Contrary to SIP, SDP header fields need to be in a specific order.

Basic SDP only describes the parameters used. A simple offer/answer model [11] was required to describe how the actual negotiation of SDP parameters between parties happens. The model describes how the initial offer and answer are generated, how the media stream description is updated and how the UAC and the UAS iterate from the initial offer to the final acceptance of SDP parameters. All this is covered in terms of both unicast and multicast messages.

Media stream grouping [12] enables describing how multiple media streams relate to each other, forming a media flow, contrary to basic SDP, which handles every media stream independently. This can be used e.g. to express lip synchronization requirements with a video. For wireless networks the extension is especially useful, since multiple codecs can exist in multiple ports while all of them are used for the speech. Without this extension, SDP could not describe this relation.

Mapping of media streams to resource reservation flows [13] extends media stream grouping by enabling the group to make a joint resource reservation. Other applications that require dedicated media streams can make their reservations alongside grouped reservations.

F. An example of a basic SIP flow

[Figure 3 participants: UA vo1.hq.buffalo.com (199.121.1.203), proxy server hq.buffalo.com (199.121.1.204), DNS, proxy server south.metro.com (203.168.11.203), location service, UA mp1.south.metro.com (203.168.11.207). The figure numbers the messages exchanged between them from 1 to 19.]
Figure 3: SIP flow between Alice and Bob

Figure 3 illustrates the messages of a SIP session, based on the example in the SIP RFC [19, pp. 213-219]. SDP session attribute negotiation and the Via-header field parameters involved in SIP transaction identification (the branch parameter) and in sender IP addresses and ports (the received and sent-by parameters) were left out. Transactions are described in Section II.A.

First off, as Alice picks up her VoIP phone and dials the SIP address [email protected], the UA sends an INVITE-request (1) to the outbound proxy for delivery to Bob, recording its IP address in the Via-header field, a globally unique SIP session identifier in the Call-ID-header field and direct contact information in the Contact-header field. To assist proxies, the UA adds a tag identifier, which allows Bob’s UA to eventually finalize the dialogue by adding another tag:

INVITE sip:[email protected] SIP/2.0
Via: SIP/2.0/UDP vo1.hq.buffalo.com
To: Bob
From: Alice ; tag=18271
Call-ID: [email protected]
CSeq: 12921 INVITE
Contact:

The outbound proxy, on receiving this, replies with a provisional response indicating that the session is being processed (2):

SIP/2.0 100 Trying
To: Bob
From: Alice ; tag=18271
CSeq: 12921 INVITE
Call-ID: [email protected]

In addition, the outbound proxy makes DNS queries (3) for the metro.com SIP service and receives the IP address, protocol and port of Bob’s inbound proxy server in the replies (4), as there are no stateless intermediary proxies. The reply indicates that e.g. UDP is preferred, but TCP is also available. IP address 203.168.11.203 prefers UDP and offers the service on port 5060. Alice’s outbound proxy forwards the request (5), adding only its own Via-header field:

INVITE sip:[email protected] SIP/2.0
Via: SIP/2.0/UDP hq.buffalo.com
Via: SIP/2.0/UDP vo1.hq.buffalo.com
To: Bob
From: Alice ; tag=18271
Call-ID: 3223842@vo1.hq.buffalo.com
CSeq: 12921 INVITE
Contact:
Bob’s inbound proxy receives this and sends a provisional response (6), similar to (2) but with the Via-header field of Alice’s outbound proxy included. The outbound proxy decides not to send this to Alice’s UAC since it has already sent such a response to it. The inbound proxy’s
first priority is locating Bob’s contact address. It therefore queries the server providing the location service for Bob’s contact address (7). The location server responds (8) with the IP address (either the only address or the preferred address that Bob defined through a registrar), protocol and port of mp1.south.metro.com. Bob’s inbound proxy adds its Via-header field, rewrites the SIP URI of the INVITE and forwards the request (9):

INVITE sip:[email protected] SIP/2.0
Via: SIP/2.0/UDP south.metro.com
Via: SIP/2.0/UDP hq.buffalo.com
Via: SIP/2.0/UDP vo1.hq.buffalo.com
To: Bob
From: Alice ; tag=18271
Call-ID: [email protected]
CSeq: 12921 INVITE
Contact:
As the request arrives at Bob’s VoIP phone, it starts ringing. The phone sends a provisional response. This now includes a locally unique tag in the To-header field, which establishes an early dialogue before the session itself is established, and a direct contact address for Bob (10):

SIP/2.0 180 Ringing
Via: SIP/2.0/UDP south.metro.com
Via: SIP/2.0/UDP hq.buffalo.com
Via: SIP/2.0/UDP vo1.hq.buffalo.com
To: Bob ; tag=129991
From: Alice ; tag=18271
Call-ID: [email protected]
CSeq: 12921 INVITE
Contact:
The response is sent (11) by the inbound proxy to Alice’s outbound proxy with one Via-header field (that of Bob’s proxy) removed, and onward (12) to Alice’s UA with one more Via-header field (that of Alice’s proxy server) removed. While the provisional response is being routed on the application layer, Bob picks up the phone. Bob’s phone sends a final response (13) to Bob’s proxy as a notification of success:

SIP/2.0 200 Success
Via: SIP/2.0/UDP south.metro.com
Via: SIP/2.0/UDP hq.buffalo.com
Via: SIP/2.0/UDP vo1.hq.buffalo.com
To: Bob ; tag=129991
From: Alice ; tag=18271
Call-ID: [email protected]
CSeq: 12921 INVITE
Contact:
The response is forwarded (14) by Bob’s inbound proxy to Alice’s outbound proxy with one Via-header field (that of Bob’s proxy) removed, and onward (15) to Alice’s UA with one more Via-header field (that of Alice’s proxy server) removed.
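The way Via-header fields accumulate on the request path and are stripped on the response path can be sketched as follows. This is a simplified illustration: the function names are hypothetical, and each Via entry is treated as a plain string without the branch, received and sent-by parameters that were left out of the example.

```python
from typing import List


def forward_request(via_stack: List[str], proxy_host: str) -> List[str]:
    """Return the Via list a proxy sends onward: its own entry is put on top."""
    return ["SIP/2.0/UDP " + proxy_host] + list(via_stack)


def forward_response(via_stack: List[str], proxy_host: str) -> List[str]:
    """Return the Via list after the proxy has removed its own topmost entry."""
    own_entry = "SIP/2.0/UDP " + proxy_host
    if not via_stack or via_stack[0] != own_entry:
        raise ValueError("topmost Via does not belong to this proxy")
    return list(via_stack[1:])


# Request path of the example: Alice's UA, her outbound proxy, Bob's inbound proxy.
at_ua = ["SIP/2.0/UDP vo1.hq.buffalo.com"]
after_outbound = forward_request(at_ua, "hq.buffalo.com")
after_inbound = forward_request(after_outbound, "south.metro.com")

# Response path: each proxy removes its own entry before passing the response on.
back_at_outbound = forward_response(after_inbound, "south.metro.com")
back_at_ua = forward_response(back_at_outbound, "hq.buffalo.com")
```

After the two removals only the entry of Alice’s UA remains, which is how the response finds its way back along the path the request took.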
To provide reliability, Alice’s phone acknowledges the final response. This is sent directly to Bob’s UA (16):

ACK sip:[email protected] SIP/2.0
Via: SIP/2.0/UDP vo1.hq.buffalo.com
To: Bob ; tag=129991
From: Alice ; tag=18271
Call-ID: [email protected]
CSeq: 12922 ACK
After this, the media session begins (17). Assuming no modifications are made to the session or to the SDP session attributes, Alice finally terminates the session with the following message (18):

BYE sip:[email protected] SIP/2.0
Via: SIP/2.0/UDP vo1.hq.buffalo.com
To: Bob ; tag=129991
From: Alice ; tag=18271
Call-ID: [email protected]
CSeq: 12923 BYE
Contact:
Modifications made during the session would affect the termination message only in terms of CSeq, whose uniqueness is determined by both the integer and the method name; as such, the integer could even keep the same value. Had Bob terminated the session, the integer could be anything, since the sequence number would depend on the sequencing of Bob’s UA. Also in that case the From-header field would contain Bob’s SIP address with the same tag as before and the To-header field would contain Alice’s SIP address with the same tag as before. To finalize the session termination, Bob’s terminal provides a response (19) to Alice’s BYE request:

SIP/2.0 200 Success
To: Bob ; tag=129991
From: Alice ; tag=18271
Call-ID: [email protected]
CSeq: 12923 BYE
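How a dialogue is recognized from the Call-ID and the two tags, and how CSeq combines an integer with a method name, can be made concrete with a small parsing sketch. It is deliberately simplified: the function names, the Call-ID value and the example.invalid addresses are hypothetical placeholders, and the parser ignores details such as header folding and parameters inside URIs.

```python
def parse_headers(message: str) -> dict:
    """Map lower-cased header names to values, keeping the first occurrence."""
    headers = {}
    for line in message.strip().splitlines()[1:]:  # skip the start line
        if not line.strip():
            break
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), value.strip())
    return headers


def tag_of(value: str):
    """Return the tag parameter of a From or To value, or None if absent."""
    for param in value.split(";")[1:]:
        key, _, val = param.strip().partition("=")
        if key == "tag":
            return val
    return None


def dialog_id(headers: dict) -> tuple:
    """Identify the dialogue by Call-ID, From tag and To tag."""
    return (headers["call-id"], tag_of(headers["from"]), tag_of(headers["to"]))


def cseq(headers: dict) -> tuple:
    """Split CSeq into its integer and method parts."""
    number, method = headers["cseq"].split()
    return int(number), method


# Hypothetical messages modelled on steps (18) and (19).
BYE = """BYE sip:bob@mp1.example.invalid SIP/2.0
To: Bob <sip:bob@example.invalid> ; tag=129991
From: Alice <sip:alice@example.invalid> ; tag=18271
Call-ID: call-1@host.example.invalid
CSeq: 12923 BYE
"""

OK = """SIP/2.0 200 Success
To: Bob <sip:bob@example.invalid> ; tag=129991
From: Alice <sip:alice@example.invalid> ; tag=18271
Call-ID: call-1@host.example.invalid
CSeq: 12923 BYE
"""
```

The BYE and its response yield the same dialogue identifier and the same CSeq pair, which is what lets the UA match the response to its request. For a request sent by Bob the two tags would appear in swapped positions, so a fuller implementation would compare them as local and remote tags.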
The effects of user mobility would depend on how mobility is implemented. If Bob were on the move with a Mobile IP-based terminal, the tunnel from the Home Agent to the Foreign Agent and from there to Bob’s terminal would be responsible for session maintenance, and basic routing would carry the messages from Bob’s terminal to the inbound proxy in Bob’s home network. SIP would be totally oblivious to this, as it is done in the network layer. Alice’s UAC would assume the Home Agent address is Bob’s terminal. On the other hand, if Bob were e.g. visiting a daughter company of metro.com and had the possibility to register there, he would have used the Registrar to change the metro.com location service to redirect all calls for his SIP address to a new SIP address, e.g. [email protected]. In this case Alice’s SIP INVITE
would have proceeded as in Figure 3 up until the location server response (8), which would indicate the new address. The proxy would generate the redirect response, also indicating how long the stateful proxies and the UA can cache the information for the purpose of establishing new sessions before renewing it (in this case 2 hours, or 7200 seconds):

SIP/2.0 302 Moved Temporarily
Via: SIP/2.0/UDP hq.buffalo.com
Via: SIP/2.0/UDP vo1.hq.buffalo.com
To: [email protected];
From: [email protected]; tag=18271
Call-ID: [email protected]
CSeq: 12921 INVITE
Contact: ; expires=7200
Alice’s UA would then initiate a new INVITE-request to the new SIP address given in the Contact-header field and follow the steps in Figure 3.

III. SIP DESIGN AND PROTOCOL PROPERTIES
This section describes the layered model that guides SIP design and that underlies the features presented in the previous section. In addition, protocol properties such as security, quality of service, performance and limitations to usage are discussed. Finally, the section describes usage considerations in the specific context of emergency calls during disasters.

A. SIP design by layers
To avoid restricting usage to specific types of session initiation, modification and termination, SIP is divided into five separate logical layers, which makes it possible to generalize about the behavior of a specific layer. This helps remind people working on the protocol of the basic ideology behind it. The layers are, from the lowest to the highest [19, pp.18]:

Syntax and encoding layer is specified using an augmented Backus-Naur Form grammar (BNF) [71] for SIP messages, which are based on the UTF-8 character set. This raises three areas that need to be addressed: protocol performance, security and parser implementation. Performance and security issues regarding the plain-text format are covered later in this section. Parser implementations require a lot of effort, since a parser cannot be generated automatically from the Augmented BNF definition. Either the parser is implemented manually, or the grammar is translated separately into another grammar that is supplemented with additional code in order to generate the parser. With each specification change the parser must be modified.

Transport layer defines how a client sends requests and receives responses and how a server receives requests and sends responses over the network. All SIP entities contain this layer, which is mostly concerned with reliability and transport properties, as discussed in I.D.

Transaction layer provides the concept of a transaction, which is “a request sent by a client transaction (using the transport layer) to a server transaction, along with all responses to that request sent from the server transaction back to the client” [19, pp.19]. More concretely, all provisional responses are included in a transaction, the ACK that acknowledges a successful final response to an INVITE-request is its own transaction, and retransmissions should not be visible to the next higher layer [19, pp.121-123]. The transaction layer is not used by stateless proxies. In some cases the output of this layer will not be a SIP message but a transaction timeout. In these cases the output is modified to look like a SIP-specific error message, depending on the type of error in the layer [19, pp.42]. Transactions are identified uniquely by the branch parameter in the Via-header field. The current SIP specification indicates a magic cookie sequence (z9hG4bK) that has to be used as the beginning of the parameter value, in order to separate the implementation from the prior proposed standard.

Transaction user (TU) is any SIP entity other than a stateless proxy. Basically the TU is the owner of a specific transaction: it creates the transaction with the required parameters and can cancel it if it sees fit.

Core is the identity of a SIP entity. Whereas the transaction user owns a specific transaction, the core establishes or participates in dialogues.

B. Security
Since SIP is very much about user presence communicated through a device (a user agent), the nature of SIP makes security particularly important: privacy and trust issues become paramount. Of course the basic AAA issues are already taken into account in the link layer in mobile networks, independent of SIP. SIP provides security services based on HTTP and S/MIME, including authentication (both user to user and proxy to user), message integrity protection and confidentiality. In addition, privacy-supporting UA behavior has been defined, as well as a logical privacy service for intermediaries that helps UAs. This is further enhanced by the security agreement mechanism.
Authentication is done on a request-by-request basis and is based on HTTP basic and digest authentication [64]: challenge-response mechanisms like CHAP [43] can be used, whereby the target of the request initiates the authentication with a challenge, to which the requestor responds. This ensures one-way authentication (authenticating the target is another issue) and provides protection against replay attacks (by use of a nonce). If the request has been forked by a stateful proxy, intermediate and inbound proxies may require multiple UAC authentications. To alleviate the strain on the UA, proxies that fork requests are required to aggregate the authentication challenges to the minimum and send these to the UA.

Integrity of messages can be guaranteed by using PGP-based digital signatures. The other possibility is to use S/MIME and tunneling of SIP messages.

Confidentiality is not supported with any new mechanisms. Due to this, TLS/SSL would typically be used if TCP is the transport. UDP, on the other hand, would require the use of IPSec or link-layer encryption. Another possibility is to use S/MIME to encrypt SIP header field information and use SIP tunneling to guarantee confidentiality and integrity. However, some header fields (e.g. the ones required for the dialogue) still have to be in plaintext to enable application layer routing [19, pp.207].

Denial-of-Service (DoS), i.e. attacks that bombard SIP servers with requests in order to prevent legitimate users from using them, is a complicated issue for SIP, as the protocol offers multiple ways to mount such an attack. There are no straightforward countermeasures beyond the other security services and sound network design. Possible exploits include e.g. Via-header field misuse, Record-Route-header field misuse and REGISTER-request misuse [19, pp.236].
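Since several of these exploits depend on authentication not being required, it may help to see the challenge-response exchange described earlier in this subsection in concrete form. The sketch below follows the basic HTTP digest calculation (MD5, without the optional quality-of-protection extensions); the function names and all values in the usage note are hypothetical.

```python
import hashlib
import hmac


def md5_hex(text: str) -> str:
    """Return the hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    """Compute the requestor's answer to a digest challenge."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # who is asking
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being asked
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # bound to this challenge


def verify(received: str, username: str, realm: str, password: str,
           method: str, uri: str, nonce: str) -> bool:
    """Recompute the response on the challenger's side and compare."""
    expected = digest_response(username, realm, password, method, uri, nonce)
    return hmac.compare_digest(received, expected)
```

The challenger issues a fresh nonce with each challenge, for example `"abc123"`, and the requestor returns `digest_response("alice", "example.invalid", "secret", "INVITE", "sip:bob@example.invalid", "abc123")`. Because the nonce is part of the hash, a response captured for one challenge does not verify against a later one, which is the replay protection mentioned above. The password itself never travels in the message.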
Via-header fields can be used with spoofed addresses to harness multiple UAs and proxies to generate an amplified denial-of-service attack if authentication is not required. This is analogous to a smurf DoS attack [72]. The Record-Route-header fields expose the proxies to a DoS attack, since the mobile terminal user can become aware of all the proxies in the path to the service route. In 3GPP networks this basically means that the network operator proxies required by the architecture become exposed, and a terminal user can use the information to launch an attack with separate equipment and cause unavailability of the service. If authentication is not required, the REGISTER-request offers multiple possibilities: a contact address of the intruder can be added to enable joining multiparty conferences, contact addresses can be deleted to effectively prevent the user from being invited to anything, and multiple contact addresses can be registered for a given SIP address to be used as an amplifier in a DoS attack.

Privacy Service [24] is provided by the intermediate entities, typically proxies. The service is responsible for supplying privacy functions that are unavailable to the UAs, such as withholding the identity and personal information of a SIP user. The UAs activate the privacy service by attaching a Privacy-header field to requests and responses. For legacy clients, application-level messaging without a Privacy-header field is supported. The intermediate entity evaluates the request and conforms to it if it is allowed to pass anonymous requests. The receiver must accept anonymous sessions.

The Security Mechanism Agreement [25] procedure enables the UAs to securely agree on arbitrary security mechanisms with the next-hop entity. The basis for this mechanism is the reality that, as SIP deployment will consist of millions of phones, one really cannot make assumptions about what security mechanisms are available to the phones and the network.

C. Quality of Service
Architectures based on Quality of Service resource reservation pose a problem for SIP: in which order should the reservation of resources, with e.g. RSVP [73], and the initiation of the session be done? If resources are reserved before the actual session initiation, the target might be unreachable and the capacity will have been reserved for no purpose. On the other hand, if the session is initiated before resource reservation, failure of the latter may lead to disconnection or to an inability to conform to the required QoS parameters. To alleviate the problem, preconditions [18] were introduced to both SDP and SIP. Preconditions separate current and desired state, enabling all parties to express their state and requirements for each multimedia stream. Resource reservations are unidirectional. The SDP precondition is sent in the INVITE-request, which is not processed further before target and caller both agree on resource reservation. As such, resource reservation becomes interleaved with the session initiation.

D. Performance
Additional protocol features can improve SIP performance. However, some issues may require use of external specialized equipment. This section discusses
connection reuse, compression of messages and load balancing.

Connection reuse [34] in SIP is aimed at reliable transports. While responses to a request are returned to the correct port (e.g. throughout the INVITE dialogue described in the I.E example flow), requests from the target UA are unlikely to use the same connection. In order to reduce latency, a connection reuse mechanism has been drafted; it is especially useful with TLS over TCP, which requires additional round trips to set up the encryption just to send e.g. a BYE-request. The draft defines an additional parameter, alias, for the Via-header field, which is used to signal that the UAC allows the existing connection to be reused for requests from the target UA. The ephemeral port of the connection, i.e. the high port that needs to be reused, must be found by the target UA itself. It will not be provided by e.g. DNS manipulation or by the server of the connection. To avoid connection hijacking by an intruder, additional mechanisms are used to authenticate the connection.

Compression [28] of SIP messages is currently at the draft stage. The compression will be based on basic Signalling Compression, as defined by the Signaling Compression (SIGCOMP) WG. Use of compression is expressed as a comp=sigcomp parameter in either the SIP URI of the request or the Via-header field of the response. The other option would have been to use DNS records, but considering that there are three transports (if SCTP is included), two of which (TCP and SCTP) can have TLS used on top of them, a single server would require multiple entries to express compression support for all the possible choices. If SIP is tunneled, e.g. when S/MIME is used to encrypt the original SIP message, payload compression is available in basic SIP and can be negotiated with the entities in the path [19, pp.169].

Load balancing in SIP is based on DNS records.
Since one outbound proxy probably doesn’t ask for the target proxy address that many times, caching the response in a local resolver instead, sessions from one corporation to another probably travel the same route. This problem is analogous to that of forwarding HTTP proxies. Currently corporations are moving to a portal infrastructure architecture. The inbound proxies could in principle be placed behind external load-balancing middleboxes to guarantee more transparent failover and load-based request distribution. SIP URI-based hash persistence could be used, analogous to the persistence used in HTTP forwarding proxy farms. However, recovery from a SIP proxy failure by another SIP proxy would depend on dialogue state. Also, keepalive checking of SIP proxies using UDP is somewhat difficult, since there is no feedback channel with UDP. Likewise, Via-header fields and Record-Route-header fields with private IP addresses complicate the matter, since load balancers currently do not interpret SIP header fields at all.

E. Limitations to usage
SIP and SDP are extremely extensible protocols. Still, there are areas where they are not appropriate. Some of these areas are discussed below [39].

SIP is not an application-layer data transfer protocol. Even though it may look like HTTP, it is a control-plane signaling protocol that is used to carry session attributes but not actual application data.

SIP is not a routing protocol. It does not transfer routing information or in any way interleave its functionality with routing protocols.

While SIP is used for session setup, it does not provide signaling for resource reservation, nor would there be any point in extending SIP to provide such features, given all the extensions that are already available.

SIP does not offer conference control services such as member ejection, feedback, virtual microphone passing, chair control, voting or polling. SIP also doesn’t have any protocol features that would make assumptions about how conferences should be managed. SIP can be used to initiate a session that uses some other conference control protocol. This is an important difference compared to the likes of the H.323 protocol.

SIP does not have mechanisms to control user device state (e.g. to mute SIP devices). An implementation of such device control may or may not use the location service to discover the contact addresses of the devices to be managed.

SIP request methods, responses and header fields each have a basic primitive task that they try to accomplish and that needs to be obvious to the recipient, rather than something that needs to be evaluated. Therefore requests and responses should not be extended with header fields or parameters that break this primitive rule, and header fields should not be extended with parameters that break it.

F. Emergency Calls During Disasters
Emergency calls form a problem in a packet-switched network, due to best-effort-only service, the probability of congestion when demand rises, lack of admission control and lack of central control. SIP further complicates this compared to traditional PSTN signaling, since if SIP were to carry
resource reservation information, it would do this in the call setup phase, which uses the SIP application layer route to the target. This can naturally be different from the actual route between the two communicating parties. Also, basic SIP call setup itself cannot be prioritized over other traffic. While under normal circumstances signaling of emergency calls can be done over a packet network, disaster events require additional support.

Support for existing priority schemes is critical. While many national PSTN networks have their own, the US Government Emergency Telecommunications Service (GETS) and Multilevel Precedence and Pre-emption (MLPP), used by the US military, serve as examples here. GETS [57] uses multiple levels of preference with emergency calls as the highest. This allows for multiple paths to the destination, call setup priority over normal calls, call queue priority over lower-preference calls and the ability to override network management restrictions in order to guarantee connectivity in the PSTN. GETS is activated from normal phones by dialing specific numbers. MLPP [29] is used in non-public military networks to secure call resources during war and national emergencies. It allows selection of a call precedence that is used to decide which calls are pre-empted (disconnected) when resources are limited. The precedence information is communicated during call setup with the signaling protocol. MLPP and its variants are typically pre-configured in special equipment.

The whole problem of emergency calls was initially approached by creating a specific user identity [41], sos, and then supporting the 911 and 112 codes in the SIP URI string [42]. After this narrow approach, the requirements definition was moved to the recently created Internet Emergency Preparedness WG (ieprep WG), which looks at enabling PSTN-to-IP-telephony emergency calls in a wider scope, among other things.
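The pre-emption rule described for MLPP above, in which the precedence of a call decides which call is disconnected when resources run out, can be sketched as a toy admission function. The function name, the fixed capacity model and the tie-breaking behavior are assumptions made for illustration; the five level names are the commonly cited MLPP precedence levels and do not come from this text.

```python
# Precedence levels from lowest to highest (assumed here, not defined in the text).
LEVELS = ["routine", "priority", "immediate", "flash", "flash-override"]


def admit(active: dict, capacity: int, call_id: str, level: str):
    """Try to admit a call into `active` (call id -> level), modified in place.

    Returns the id of the pre-empted call, or None if there was free capacity.
    Raises RuntimeError if the call cannot be admitted.
    """
    rank = LEVELS.index(level)
    if len(active) < capacity:
        active[call_id] = level
        return None
    # Resources are exhausted: find the active call with the lowest precedence.
    victim = min(active, key=lambda cid: LEVELS.index(active[cid]))
    if LEVELS.index(active[victim]) >= rank:
        raise RuntimeError("no lower-precedence call to pre-empt")
    del active[victim]          # the pre-empted call is disconnected
    active[call_id] = level
    return victim
```

With a capacity of two calls and a routine call and a priority call already active, a new flash call pre-empts the routine call, whereas a further routine call is refused. A real deployment would carry the precedence in the signaling during call setup, as noted above, instead of passing it as a function argument.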
Requirements for a Resource Priority Mechanism [29] describe requirements for four SIP-related resources that can be constrained. These are the SIP proxies, the IP network, the possible VoIP gateway and the sender and/or receiver, depending on the environment. The requirements aim at enabling seamless usage of the current PSTN system by mapping the priority information to codes specific to a PSTN emergency call system (ECS), each in its own name space to avoid creating a global prioritization scheme. In addition, it is central that a SIP signal can safely arrive at the PSTN gateway. This approach has to support PSTN-to-PSTN signaling where SIP is used in an intermediate VoIP network, and vice versa, and it has to work with any SIP method. In addition to these, several security issues, IP transparency and the discovery of available ECSs become critical.

Geographical location support [61] is a critical additional requirement for emergency services in a wireless network. The work is currently ongoing; the draft only requires support for SIP and allows emergency information to be passed despite a failure to authenticate the emergency center. Cases where user authentication fails, or where both the center and the user fail authentication, are still open, among other things.

IV. RELATED WORK
While control protocols are an old idea dating back to FTP, the extension ideas around SIP make it hard to say what exactly it should be compared to. However, in order to be a viable choice for signalling, the basic premise is that it can perform this function well. This section covers APIs for developing with SIP, the most visible SIP extensions and alternatives to SIP. In addition, basic PSTN signalling requirements and the approach in 3GPP IP multimedia systems are commented on.

A. SIP software development
Currently, there are multiple APIs for developers, mainly due to differing requirements. To further help application development, there are a few deployed SIP test servers [52] that can be used.

Parlay [54] provides a similar API for both SIP and PSTN call handling. While suitable for those with a background in PSTN applications engineering, it doesn’t give access to all the features provided by the extensibility of SIP.

Call Processing Language (CPL) [35] is a dedicated language suitable for handling SIP transactions. CPL abstracts the specifics of the signaling protocol, which also provides support for H.323 (see below, section III.C). CPL is also intended for advanced UAs.

Cgi-scripts [17] are also a good approach, especially for quick development and freedom in choosing the programming language. Compared to HTTP, the difference lies in the dependence on proxies and dialogue state. Therefore the cgi application can be in the path to the final UAS, instead of being the actual stateless target for the service.

oSIP [53] is a GNU C library issued by the Free Software Foundation. It provides an API for SIP and SDP message parsing as well as for managing SIP transactions. The library is based on the newest SIP RFC 3261 [19].
The oSIP page also contains links to SIP applications that can be used to speed up development and testing.

The Java Community Process (JCP) has standardized a low-level Java API for SIP in JSR-32 [58]. A reference implementation is available. Besides this, JSR 116 [59] defines a higher-level Servlets API for server use, supporting sessions, data storage and retrieval and other matters similar to HTTP servlets. However, both of these are based on the older SIP RFC 2543 [15].
B. SIP extensions

SIP Event Notification [22] is a basic notification framework that allows for asynchronous messaging. It is based on the NOTIFY and SUBSCRIBE messages. In order to represent the user, the framework defines a subscriber, which subscribes to certain kinds of target state changes, and a state agent, which is responsible for sending the notifications. The framework can be fully exploited by extending it with Event Packages, which describe the states and the content exchanged in the SUBSCRIBE and NOTIFY message payloads. This enables defining e.g. buddy lists and message waiting services. The Event Package approach lacks hierarchy, since Event Packages are in themselves independent of each other. Event Template Packages allow defining state and information exchange that affects all Event Packages.

SIP Instant Messaging [27], developed by the SIP for Instant Messaging and Presence Leveraging Extensions (SIMPLE) WG, is one of the possible approaches to Instant Messaging standardization over the Internet. SIMPLE conforms to the IETF Presence Framework [26]. SIMPLE defines an Event Package whereby Presence Agents (PAs), which are extended SIP UAs, subscribe to and receive notifications as well as notify the user of presence state changes. A user can have multiple PAs, as users can have multiple UAs in SIP. Each PA has a unique SIP address. The presence information is targeted explicitly at one or many of them. In addition there are Presence User Agents (PUAs), which are independent of the SIP UA. These make it possible to produce and manipulate presence information without having other SIP properties. As with PAs, a user can have multiple PUAs. However, PUAs cannot be sent presence information. As it is, if you e.g. want to send Christmas greetings as instant messages to a bunch of friends, the PUA entity allows for that quick-and-dirty SIP client, since you most probably don’t want it to receive anything.

While all this is neat, it is perhaps good to recall that SIP was first and foremost a control-plane signaling protocol. Instant Messaging basically uses it on the data plane to send the actual messages. For text messages this might be fine, but with multimedia messaging and, perhaps in the future, video or sound clips, one wonders just how far SIP will be stretched. However, SIP for instant messaging is but one of the proposals for an IETF instant messaging standard.

H.323 [55] [56] is the ITU alternative to SIP that has over ten years of development history behind it, if one considers it to have evolved from H.324, which was developed to improve H.320. Contrary to typical remarks, H.323 is not a legacy protocol that would need to be replaced; the newest release, H.323v4, was approved in 2000. In addition to its long development period, H.323 has been the signaling protocol deployed with non-proprietary VoIP solutions. There are many fundamental differences between H.323 and SIP: H.323 is binary encoded, uses ASN.1 syntax in message definition, includes features to handle intermediate network entity failure and has inbuilt conference call management messages. Also, devices that are H.323-based use a reliable transport protocol, TCP, although UDP is supported as well. Compared to the IETF SIP development effort, H.323 also has a more hierarchical and layered specification structure, which helps in understanding how individual specifications relate to the whole.

Skinny Client Control Protocol (SCCP) [60], also called Skinny, is a half-proprietary VoIP terminal control protocol defined by Cisco Systems, Inc. It is used to control Cisco 7960 and 7960 voice over IP phones. Cisco mainly promotes Skinny as a lightweight implementation of H.323. Operational usage may result in typical proprietary protocol problems (e.g. 3rd-party firewalls can’t NAT correctly or otherwise function incorrectly, and debugging support is dependent on vendor support engineers). In addition to VoIP phones and media gateways, companies like Symbol Technologies and SocketIP have implemented this protocol in softswitches that enable interaction between devices that use different signaling protocols.
C. SIP alternatives
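The encoding difference noted for H.323 can be made concrete from the SIP side. Because SIP messages are plain text with an HTTP-like layout, the request line and headers can be recovered with ordinary string handling, whereas an ASN.1-defined binary H.323 message needs a schema-aware decoder. The Python sketch below is only an illustration under stated assumptions: the helper name `parse_sip_request`, the sample message and all addresses in it are made up for this example, and the parser deliberately ignores header folding, compact header forms and repeated headers.

```python
def parse_sip_request(raw):
    """Split a text-encoded SIP request into its parts.

    Hypothetical helper for illustration only; not a complete
    RFC 3261 parser.
    """
    # An empty line separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # Request line: METHOD SP Request-URI SP SIP-Version
    method, request_uri, version = lines[0].split(" ", 2)

    # Each remaining line is "Name: value"; header names are
    # case-insensitive, so they are stored lowercased.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return method, request_uri, version, headers, body


# Made-up sample request; the addresses are placeholders.
SAMPLE = (
    "INVITE sip:bob@example.org SIP/2.0\r\n"
    "Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK776asdhds\r\n"
    "From: Alice <sip:alice@example.com>;tag=1928301774\r\n"
    "To: Bob <sip:bob@example.org>\r\n"
    "Call-ID: a84b4c76e66710\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

if __name__ == "__main__":
    method, uri, version, headers, body = parse_sip_request(SAMPLE)
    print(method, uri, version)
    print(headers["call-id"])
```

The point of the sketch is the contrast in tooling, not a claim about which encoding is preferable: text encoding eases debugging and extension, while binary encoding is more compact on the wire.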
D. PSTN integration
SIP for Telephones (SIP-T) [62], created by the IP Telephony WG, provides an architecture for SIP usage over PSTN connections. It describes four different scenarios: SIP gateways bridging SIP UA messages over the PSTN to another SIP UA, PSTN signaling protocols carried over a SIP-based VoIP network to another PSTN phone, and interworking from SIP to PSTN and from PSTN to SIP. SIP and PSTN signaling protocols such as ISUP are not totally compatible. Possibilities to enable interworking include carrying ISUP messages as MIME data in the SIP payload or adding messages that can be used to carry ISUP messages without affecting SIP dialogue state, similarly to the UPDATE request.
Third Party Call Control (3pcc) [37], currently under investigation for SIP, allows a controlling party to manage the session. Besides possible session mobility use, 3pcc is usable for conferencing, call center call transfers and similar typical PSTN services. With SIP, there is also a multitude of new applications for 3pcc, including a click-to-dial approach whereby the user simply clicks a link on a web page, which initiates the call for the user to the given SIP address. However, as with mobility, managing this becomes non-trivial when taking into account the resource reservations that could already have been established for the call.
E. 3GPP SIP usage and modifications
In order to use SIP in the 3GPP network architecture [48], and more specifically in the 3GPP IP multimedia system (IMS) [46], SIP has to conform to the system requirements. Besides the IMS, 3GPP has a packet-switched network domain. This domain, however, lacks support for real-time data, error correction and header compression. The 3GPP requirements [36] and dependencies [44] are under active update. 3GPP SIP defines three SIP proxies called Call/Session Control Functions (CSCF). Each has a distinct role with specific responsibilities in the IMS, taking into account the fact that the user is often located in a non-home domain. The responsibilities vary in the areas of registration, session management, charging and resource utilization [45]. Interoperation at the protocol message level for each SIP entity and IMS component has been defined in a step-by-step example that also explains the interleaving with 3GPP network issues [45]. To supplement this, the conformance requirements for supporting SIP in IP multimedia systems have been further refined [47].
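The division of labour between the three CSCFs can be sketched as a small data model. The role names used below (P-CSCF, I-CSCF, S-CSCF) come from 3GPP TS 23.228 [46] and are not spelled out in the text above; the domain names, the function `register_path` and the one-line duty descriptions are assumptions made for this illustration. The model is a deliberate simplification that leaves out, among other things, the subscriber database lookup and all charging and resource functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cscf:
    role: str    # "P-CSCF", "I-CSCF" or "S-CSCF" (names from TS 23.228)
    domain: str  # network in which this proxy is located
    duty: str    # simplified one-line description, for illustration


def register_path(access_domain, home_domain):
    """Return the proxies a REGISTER request traverses, in order.

    Hypothetical helper: the terminal always talks to the P-CSCF of
    the network it is attached to, which may differ from its home
    network when the user is roaming.
    """
    return [
        Cscf("P-CSCF", access_domain,
             "first SIP contact point for the terminal"),
        Cscf("I-CSCF", home_domain,
             "entry point to the home network, selects an S-CSCF"),
        Cscf("S-CSCF", home_domain,
             "registrar and session control for the user"),
    ]


if __name__ == "__main__":
    # Placeholder domains for a roaming user.
    for hop in register_path("visited.example", "home.example"):
        print(f"{hop.role:7} @ {hop.domain}: {hop.duty}")
```

Keeping the first hop in the access network while the other two stay in the home network is what lets the home operator retain control of registration and session handling even when the user is attached elsewhere.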
V.
CONCLUSIONS
While SIP may be the IETF protocol for doing signaling over the Internet, one has to wonder about SIP's design priorities and goals. First off, due to e.g. the unresolved issue of emergency calls, PSTN integration will remain an open issue for some time. Secondly, the symmetric response routing extension obviously signals (no pun intended) a lack of appreciation for real-world networks that thankfully do include NAT and private networks. Thirdly, SDP, a protocol intended for announcing multimedia conferences, was adopted to do the multimedia parameter negotiation. Except of course it lacked the negotiation part.
While the IETF has still got some work to do, the 3GPP, with its intention to put SIP into tens of millions of phones, can expect to likewise encounter challenges on the road: even though adoption will be gradual, a piloting phase with pure VoIP solutions prior to 3GPP deployment would enable getting production experience with the protocol. However, with H.323 on the market with its maturity, we might be forced to miss out on this opportunity on many deployment occasions. From a protocol adoption point of view, the Megaco/H.248 approach of joint development might have at least reduced the competitive spirit in the H.323 and SIP camps.
Whatever the goals for SIP are, whatever the future for SIP deployment holds, everything in SIP screams of evolutionary design. If one looks at the design-by-layers approach, it does enable handling SIP concepts such as entity identity, transactions and dialogue in its own cute little way. However, it really doesn't help one bit when thinking about extending SIP. And to be honest, that's all that has been thought about ever since the protocol gained proposed standard status. The question remains whether the next proposed standard of SIP will just have some of the new features integrated into it or whether it will actually make changes to the protocol core features. There is still time for a revision prior to 3GPP deployment. Today it can be done. Tomorrow, on the other hand, is another story completely. Then again, maybe the whole signaling issue really is a small thing in the big picture, and as long as the emergency call issue is resolved, it really doesn't matter that much.
REFERENCES
[1] H. Schulzrinne et al., “RTP: A Transport Protocol for Real-Time Applications”, RFC 3550, July 2003
[2] H. Schulzrinne et al., “Real Time Streaming Protocol (RTSP)”, RFC 2326, April 1998
[3] R. Arango, “Media Gateway Control Protocol (MGCP) Version 1.0”, RFC 2705, October 1999
[4] F. Cuervo, “Megaco Protocol Version 1.0”, RFC 3015, November 2000
[5] Tom Taylor, “Megaco/H.248: A new standard for Media Gateway Control”, IEEE Communications, October 2000
[6] G. Huston, “Next Steps for the IP Quality of Service Architecture”, RFC 2990, November 2000
[7] IEEE Network, Multicasting: an enabling technology, Special Issue January/February 2003
[8] SIP Forum, www.sipforum.org
[9] SIP Center, www.sipcenter.com
[10] M. Handley, V. Jacobson, “SDP: Session Description Protocol”, RFC 2327, April 1998
[11] J. Rosenberg, H. Schulzrinne, “An Offer/Answer Model with the Session Description Protocol (SDP)”, RFC 3264, June 2002
[12] G. Camarillo et al., “Grouping of Media Lines in the Session Description Protocol (SDP)”, RFC 3388, December 2002
[13] G. Camarillo, A. Monrad, “Mapping of Media Streams to Resource Reservation Flows”, RFC 3524, April 2003
[14] M. Handley, “SDP: Session Description Protocol”, Internet-Draft, September 4, 2003 draft-ietf-mmusic-sdp-new-14.txt
[15] M. Handley et al., “SIP: Session Initiation Protocol”, RFC 2543, June 1999
[16] S. Petrack et al., “The PINT Service Protocol: Extensions to SIP and SDP for IP Access to Telephone Call Services”, RFC 2848, June 2000
[17] J. Lennox et al., “Common Gateway Interface for SIP”, RFC 3050, January 2001
[18] G. Camarillo, Ed., W. Marshall, Ed., J. Rosenberg, “Integration of Resource Management and SIP”, RFC 3312, October 2002
[19] J. Rosenberg et al., “SIP: Session Initiation Protocol”, RFC 3261, June 2002
[20] J. Rosenberg, H. Schulzrinne, “Reliability of Provisional Responses in the Session Initiation Protocol (SIP)”, RFC 3262, June 2002
[21] J. Rosenberg, H. Schulzrinne, “SIP – locating servers”, RFC 3263, June 2002
[22] A. B. Roach, “Session Initiation Protocol (SIP)-Specific Event Notification”, RFC 3265, June 2002
[23] J. Rosenberg, “The session initiation protocol (SIP) UPDATE method”, RFC 3311, September 2002
[24] J. Peterson, “Privacy Mechanism for SIP”, RFC 3323, November 2002
[25] J. Arkko et al., “Security Mechanism Agreement for the Session Initiation Protocol”, RFC 3329, January 2003
[26] M. Day et al., “A Model for Presence and Instant Messaging”, RFC 2778, February 2000
[27] B. Campbell et al., “Session Initiation Protocol (SIP) Extension for Instant Messaging”, RFC 3428, December 2002
[28] G. Camarillo, “Compressing the Session Initiation Protocol (SIP)”, RFC 3486, February 2003
[29] H. Schulzrinne, “Requirements for Resource Priority Mechanisms for the Session Initiation Protocol (SIP)”, RFC 3487, February 2003
[30] J. Rosenberg et al., “An Extension to the Session Initiation Protocol (SIP) for Symmetric Response Routing”, RFC 3581, August 2003
[31] D. Willis, B. Campbell, “Session Initiation Protocol Extension to Assure Congestion Safety”, Internet-Draft, February 12, 2003 draft-ietf-sip-congestsafe-01.txt
[32] S. Donovan et al., “Session Timers in the Session Initiation Protocol (SIP)”, Internet-Draft, July 1, 2003 draft-ietf-sip-session-timer-11.txt
[33] S. Donovan, J. Rosenberg, “The Stream Control Transmission Protocol as a transport for the Session Initiation Protocol”, Internet-Draft, expired, 2002 draft-ietf-sip-sctp-03.txt
[34] R. Mahy, “Connection Reuse in the Session Initiation Protocol (SIP)”, Internet-Draft, Aug 2003 draft-ietf-sip-connect-reuse-00.txt
[35] J. Lennox et al., “CPL: A language for user control of internet telephony services”, Internet-Draft, August 2003 draft-ietf-iptel-cpl-08.txt
[36] M. Garcia-Martin, “3rd Generation Partnership Project (3GPP) Release 5 requirements for Session Initiation Protocol (SIP)”, Internet-Draft, draft-ietf-sipping-3gpp-r5-requirements-00.txt
[37] J. Rosenberg et al., “Best Current Practices for Third Party Call Control in the Session Initiation Protocol”, Internet-Draft, June 30, 2003 draft-ietf-sipping-3pcc-04.txt
[38] G. Camarillo, P. Kyzivat, “Interactions of Preconditions with Session Mobility in the Session Initiation Protocol (SIP)”, Internet-Draft, August 28, 2003 draft-camarillo-sip-rfc3312-update-00.txt
[39] Henning Schulzrinne et al., “Session Initiation Protocol: Internet-Centric Signalling”, IEEE Communications Magazine, October 2000
[40] Henning Schulzrinne, Elin Wedlund, “Application-Layer Mobility using SIP”, ACM Mobile Computing and Communications Review, Volume 4, Number 3, July 2000
[41] Schulzrinne, “Universal Emergency Address for SIP-based Internet Telephony”, expired Internet-Draft, February 2002 draft-schulzrinne-sipping-sos-01.txt
[42] Schulzrinne, “Requirements for Session Initiation Protocol (SIP)-based Emergency Calls”, expired Internet-Draft, February 21, 2003 draft-schulzrinne-sipping-emergency-req-00.txt
[43] W. Simpson, “PPP Challenge Handshake Authentication Protocol (CHAP)”, RFC 1994, August 1996
[44] Stephen Hayes, “3GPP IETF Dependencies and Priorities”, www.3gpp.org/TB/Other/IETF.htm
[45] 3GPP TS 24.228, “Signalling flows for the IP multimedia call control based on SIP and SDP” (Release 5), v.5.5.0, 2003
[46] 3GPP TS 23.228, “IP Multimedia Subsystem (IMS)” (Release 5), v.6.2.0, 2003
[47] 3GPP TS 24.229, “IP Multimedia Call Control Protocol based on SIP and SDP”, Stage 3 (Release 5), v.5.5.0, 2003
[48] 3GPP TS 23.002, “Network architecture” (Release 5), v.6.1.0, 2003
[49] Ion Stoica et al., “Chord: A Scalable Peer-to-Peer lookup service for internet applications”, SIGCOMM01, 2001
[50] Sylvia Ratnasamy et al., “A Scalable Content-Addressable Network”, SIGCOMM01, 2001
[51] OMG, Wireless CORBA adopted specification, 2003 www.omg.org/technology/documents/formal/telecom_wireless.htm
[52] List of Public SIP Servers web page www.cs.columbia.edu/sip/servers.html
[53] GNU oSIP library, www.fsf.org/software/osip/
[54] Parlay, http://www.parlay.org/
[55] Hong Liu and Petros Mouchtaris, “Voice over IP Signalling: H.323 and beyond”, IEEE Communications, October 2000
[56] Packetizer Inc, “H.323 versus SIP: a comparison”, August 11, 2003 www.packetizer.com/H_323 versus SIP A Comparison.htm
[57] OMNCS, “GETS planning guide”, July 2003, gets.ncs.gov
[58] JSR-32 JAIN(TM) SIP API Specification, Final Release, 5.8.2003 www.jcp.org/en/jsr/detail?id=32
[59] JSR-116 SIP Servlet API, Final Release, 27.1.2003 www.jcp.org/en/jsr/detail?id=116
[60] Wikipedia encyclopedia www.wikipedia.org/wiki/Skinny_Client_Control_Protocol
[61] Jorge Cuellar et al., “Geopriv requirements”, Internet-Draft, Mar 2003 draft-ietf-geopriv-reqs-03
[62] A. Vemuri, J. Peterson, “SIP-T: Context and architecture”, RFC 3372, September 2002
[63] R. Fielding et al., “Hypertext Transfer Protocol -- HTTP/1.1”, RFC 2616, June 1999
[64] J. Franks et al., “HTTP authentication: Basic and Digest Access Authentication”, RFC 2617, June 1999
[65] N. Freed, N. Borenstein, “Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types”, RFC 2046, November 1996
[66] B. Ramsdell, “S/MIME Version 3 Message Specification”, RFC 2633, June 1999
[67] Charles Perkins (ed), “IP Mobility Support for IPv4”, RFC 3344, August 2002
[68] Charles Perkins, “Mobile IP”, Landmark 10 IEEE articles, 50th Anniversary Commemorative Issue, IEEE Communications Magazine, May 2002
[69] T. Dierks, C. Allen, “The TLS Protocol version 1.0”, RFC 2246, January 1999
[70] S. Blake-Wilson et al., “Transport Layer Security (TLS) Extensions”, RFC 3546, June 2003
[71] D. Crocker (ed), “Augmented BNF for Syntax Specification: ABNF”, RFC 2234, November 1997
[72] CERT, “Advisory on Smurf Attacks”, CA-1998-01, 1998 www.cert.org/advisories/CA-1998-01.html
[73] Lixia Zhang et al., “RSVP: A new Resource ReSerVation Protocol”, Landmark 10 IEEE articles, 50th Anniversary Commemorative Issue, IEEE Communications Magazine, May 2002