Abstract. Streaming media from the Internet is a successful application for end-users. With the upcoming success of mo- bile devices and home networking ...
Multimedia Systems 8: 397–405 (2002) Digital Object Identifier (DOI) 10.1007/s00530-002-0063-2
Multimedia Systems © Springer-Verlag 2002
A proxy architecture for collaborative media streaming Verena Kahmann, Lars Wolf Institute of Operating Systems and Computer Networks, Technical University Braunschweig, Muehlenpfordtstr. 23, 38106 Braunschweig, Germany
Abstract. Streaming media from the Internet is a successful application for end-users. With the upcoming success of mobile devices and home networking environments, cooperation among users will become more important in the future. To achieve such cooperation, explicit middleware standards have been defined. On the other hand, Internet conferencing applications do not handle collaborative streaming sessions with individual control for each user. We propose a new concept for cooperation exemplary for collaborative media streaming using IETF multimedia session control protocols together with a proxy architecture. This concept enables both synchronization among clients and flexible control to individual users. Key words: IETF Protocols – Multimedia session control – Collaborative applications
1 Introduction Media streaming from Internet media servers has recently become an important application. Other common multimedia applications are video conferences where several users are able to share a common channel. With conferencing, collaborative applications have gained interest. Today, whiteboard applications exist, allowing users to share common spaces as well as to work in their own private space. Streaming applications, however, are not used as flexibly at times. It is possible to stream media to a conference, but mostly, all users have the same view of the content. For live media, this is exactly what is desired, and the solution to subscribe to a common multicast group is well known. However, for recorded files, streaming gives the possibility of controlling the content (e.g. to jump to a different position), which is desirable, for example, if users wish to view a movie scene once again. In conferences, existing applications only allow one user at a time to execute such a control functionality. Thus, the remote control must be negotiated and passed among users applying a specific “floor control” protocol. Since user demands may differ a lot, we state that streaming control should be possible for each user individually to enable new scenarios of collaboration.
Another collaboration scenario we have examined further is the idea of users meeting spontaneously, where one user joins the streaming application of another. Here, the joining user’s movie should be synchronized, but individual control must be possible, because users need to keep their streaming application beyond the end of the meeting. It is eligible to support existing streaming applications with the ability to provide for such collaborative sessions using the middleware layer of multimedia systems. In the IETF protocol stack, such a middleware layer is not defined explicitly, but several higher-layer multimedia protocols can be used to serve as middleware. Therefore, we define a proxy architecture where users can collaboratively watch streamed movies, as well as flexibly control their own streaming session if they wish to. The architecture builds on IETF protocols, because a requirement was to be able to work with existing streaming servers.
1.1 Collaborative media streaming In contrast to downloading a media file as a whole, streaming offers real-time delivery of live streams as well as of recorded files. Additionally, the streams of a movie can be controlled as an aggregate or as single streams. The Real-Time Streaming Protocol (RTSP) [10] of the IETF multimedia control architecture provides the means for controlling the streaming session. Its functionality is comparable to a video-cassette recorder. The IETF has defined the Session Initiation Protocol (SIP) [4] for call or conference setup. Other common architectures are H.323 or Mbone conferencing tools. Streaming media to a conference may be achieved by subscribing to a common multicast group, e.g. by specifying a multicast session address in the invitation. However, this does not enable flexible control, and we have already mentioned that in entertainment scenarios individual control is highly desirable. We denote each scenario, where users synchronize at certain times with other users of the application, but generally keep streaming control individually, by the term collaborative media streaming. In this case, combining media streaming with conferencing is not trivial, because both synchronization and flexibility should be applied where demanded.
398
Combining RTSP and SIP, it is possible to invite users to an existing streaming session and pass the URL of the stored movie to the users, i.e. to perform a push of the stream to a new user. In the other direction, i.e. to actively get or pull the stream, it is also possible to join such a streaming conference retrieving the URL of the stored movie using the SIP conferencing service. The push to or the pull from another user is what we generally call an application extension. The application of the client then can do its own streaming session setup and control the stream with RTSP signaling (PAUSE, PLAY etc.). Overall, this results in several streams of the same content to different users. Hence, a synchronization between these media streams is needed to protect the collaborative nature of the scenario. However, such a synchronization is not covered by the RTSP standard. 1.2 Architectural concepts We have defined a proxy architecture to enable both synchronization and flexibility on collaborative streaming sessions. In particular, we consider synchronization at the time a user joins the collaborative streaming session. By using a proxy, existing media servers in the Internet need not be changed. In this architecture, we define the concept of association to bind two or more clients of a streaming session to a relation, which is more loosely coupled than a multicast group. For example, each user may jump freely in the movie, thus opening sub-associations with other users. Synchronization in our architecture is achieved by handling associations and calculating timing information for any client joining an association. We introduce a proposal to map session invitation and streaming control requests to association requests. We do this using the examples of the IETF multimedia control protocols like SIP and RTSP. The proxy submits the calculated timing information to the media server by means of standard RTSP requests, to jump to the correct position in the media stream. For synchronization of streams where conferencing is not needed, we propose to use a service discovery protocol for application extension. As we use IETF application layer protocols we are not dependent on special hardware or specialized middleware standards like HAVi, but such standards can inter-operate with our concept if they have the Internet protocol stack implemented. An important point in our proxy concept is to keep signaling and data paths separate. The advantage is that streaming media is independent of the channel used for the invitation. The breakdown of the connection between users does not necessarily include the breakdown of the streaming session. Therefore the concept is suitable for networks with partial disconnectivity. Another advantage is that participants can choose their own streams, e.g. for audio streams with different languages or different levels of quality of service. However, it is possible to define an option allowing the proxy to participate in the data path, in order to take advantage of optimizations like caching. In order to obtain applicability in many network environments, we have decided to use a service discovery protocol to detect proxy and streaming services. We have used the Service Location Protocol (SLP) of the IETF to keep with the IETF protocol context and because the protocol is applicable in ad hoc networking environments, as well as in wider scopes. As
V. Kahmann, L. Wolf
already mentioned, a service discovery protocol can be used for push or pull services where conferencing is not applicable. Thus, we use SLP for the implementation of the pull service. Our concept may be used in any IP networking environment. Clients and servers of higher-layer IETF protocols only need TCP or UDP sockets. Provided that media streams are divided into several tracks according to QoS requirements or their content (e.g. different languages), the concept allows clients to choose individually tailored streams, because RTSP may control single tracks of a stream if needed. Thus, small clients in environments with scarce bandwidth resources can be supported. In the rest of this paper, we first introduce the IETF protocols basic to the architecture (Sect. 2) before the architecture of a streaming proxy serving as a session management entity is presented (Sect. 3). The building blocks and the services concept of our architecture are described in the following subsections. Then we propose our concept of association in Sect. 4, before considering session setup and mapping of control functionality to association requests in Sect. 4.1. We discuss some of the implementation aspects in Sect. 5. Among these are synchronization issues and the restrictions given in our prototype design. An overview of related work is given in Sect. 6, before we finally conclude the paper. 2 IETF protocols The multimedia session control protocols of the IETF have found wide acceptance today because of their lightweight implementation and flexibility. All signaling protocols (RTSP, SIP and SLP) can be delivered over TCP or UDP. If delivered via UDP, an appropriate retransmission mechanism with exponential back-off must be implemented. RTP is implemented over UDP in today’s Internet, though this is not necessary by protocol design. 2.1 RTSP RTSP [10] is quite a settled protocol for streaming media. It only provides signaling for streaming, and normally does not care about the transmission of data. It is intended for unicast as well as for multicast streaming. However, using multicast, it is not possible to control streaming with commands like fast forward or jump to a specific position. RTSP uses the client-server paradigm, but in contrast to HTTP, servers may also send requests to clients. This is, however, restricted to specific commands. The protocol cycle is similar for each streaming session, and can be divided into the parts of session setup, control and teardown. All requests must be acknowledged with an appropriate answer code, for example 200 for OK. Setting up a streaming session is done by requesting a description document with a DESCRIBE command. Description documents mostly follow the Session Description Protocol (SDP) standard, and give information about the media session like IP address, media streams, RTP profiles etc. If the information in the description is suitable, a RTSP session is set up with the SETUP request issued by the client. After successful session setup, the client may issue control requests like PLAY or PAUSE commands. A PLAY request
A proxy architecture for collaborative media streaming
may have a range header indicating the position in the movie from which the streaming should be started. Jumping to a different position in the movie is thus done by a PAUSE followed by a PLAY with the specific position. Eventually, the end of a session is requested by a TEARDOWN. Both server and client can end a session in this way. RTSP servers may allow other options or parameters which can be retrieved by the OPTION or GET PARAMETER request, respectively. In the standard, the possibility for using a proxy has been defined. RTSP proxies act as virtual servers for the clients connecting to them, and as virtual clients for the servers they talk to. Proxies can manage sessions for their clients, a property which has been used for caching purposes mainly. In our architecture, we only consider unicast streaming because of the inapplicability of control commands when using multicast. In our original design, we do not consider the case of caching, and the concept does not require caching, because the proxy need not appear in the data path. If the proxy is allowed to participate in the data path, our architecture can also benefit from caching. 2.2 SIP SIP [4] has emerged in recent years as an alternative to H.323 for IP Telephony. The protocol is used for the setup and management of multimedia calls or conferences on the Internet. Similar to the RTSP case, data transmission is handled by RTP. In contrast to H.323, SIP is considered to be a more lightweight protocol. It also allows a more flexible setup of proxy components, because such components react similar to normal clients. The INVITE method is used to call a user into a conference. A media description following the SDP standard is included in the invitation. Responses to the INVITE method are used to indicate the processing of the call. They may be intermediate, e.g. for “ringing”, or final, if the callee has accepted or rejected the call. A special ACK method is used to indicate that a call has actually been set up. With this handshake protocol, the media used can be negotiated. SIP proxies or redirect servers may locate a client or a user by addressing a location service which may consist of registrars using data bases. Proxies then forward the request to the address given in the data base, whereas redirect servers send this address back to the caller. Proxies may fork a request if there are several possibilities given to reach the callee, or use subsequent requests to the addresses given. Additional SIP methods for call control have been defined which will not be considered here further. SIP is interesting for collaborative media streaming because users may want to talk about the movie they watch together. SIP normally establishes such a communication channel. This channel among session participants could also be used for synchronization improvements if necessary. However, even SIP tends to be quite complicated, above all if conferencing is applied. Conference servers or other methods must be used if users want to join a conference. We have already mentioned applications where some of the features may not be required if the collaboration is constrained on media streaming only. SIP just offers too much functionality to
399
issue a quick invitation while not using anything of the protocol any more. Therefore, we have favoured the use of a service discovery protocol for such scenarios. In our prototype design we have kept SIP for the Push Service to show the applicability of our architecture for scenarios where conversation with other users is desired while streaming. 2.3 RTP The Real-Time Transport Protocol (RTP) [9] is the most important and well known protocol for multimedia data transmission, introducing concepts like sequence numbers and timestamps. Since the protocol is closely tied to applications, these are effectively supported in playing out multimedia streams. The companion RTP control protocol (RTCP) is used to periodically transfer conference state between senders and receivers. It is extendable to transfer arbitrary information. Thus, it is applicable to help in synchronization of sessions. Other projects have used RTCP to enable soft-state protocol processing. In our architecture, the proxy is normally not involved in the data path. Therefore, we will not consider specific RTP issues like evaluating timestamps or using RTCP to transfer conference state except where already available. 2.4 SLP SLP [3] does not belong to the multimedia control protocol family. It is an application-layer protocol for service discovery, and has been designed for scalability on enterprise level. However, it can be used in other network environments, even in ad hoc networking. In SLP, agents represent network elements. The user agent (UA) stands for the client, the service agent (SA) represents the service provider, and service advertisements can be stored in a directory agent (DA). The DA is not necessarily needed, and may be omitted where it is hard to be configured. In that case, which may be appropriate for ad-hoc networks, the UA multicasts or broadcasts a Service Request (SrvRqst) with specified characteristics of the required service. Any listening SA compares the request s to the services it offers, and if these match, a Service Reply (SrvRply) containing the Service Access Point (SAP) is sent using unicast. The SAP together with the attributes of a service are contained in a service: URL scheme. The service scheme is a URL scheme starting with the string “service:” followed by the service type name. For each service type, a service template defines the corresponding service attributes and the service scheme associated with it. A service can have attributes. Services are registered together with their attributes at the DAs. A request for a service can also contain a subset of the attributes, allowing for search for services matching certain criteria. Services that are registered at a DA have a certain lifetime, so if the SA cannot de-register for a certain reason, the service does not remain stale. The so-called mesh-enhanced SLP (mSLP) implementation considers interworking of several DAs, therefore even in ad-hoc networks DAs may be applied, because they can be replicated that way.
400
V. Kahmann, L. Wolf
In our architecture, we use SLP for service discovery of streaming and proxy servers as well as for discovery of movies another client streams at times. The latter is needed for an easy modeling of the pull service.
3 Streaming Proxy The possible scenarios we consider for our architecture include getting a movie from another user (Pull Service) or showing a movie to another user (Push Service). In both cases, the other user must be located and in the case of getting a movie, even the application running on the other user’s screen must be located. Moreover, the second user demands to get the stream synchronized with the first client. Therefore, a relation between users must be established. We thus propose to use a proxy acting as a mediator between clients and media servers. Thus, existing media servers need not be rewritten with the ability for application extension. Since the proxy only influences the signaling path, it may be co-located with a media server or gateway, and even with a client. We have used the fact that several IETF protocols have defined proxy or gateway components already, thus we can re-use existing components as far as possible.
3.1 Building blocks The design of a Streaming Proxy in our architecture can be separated into the parts of protocol processing and management. The protocol processing part consists of the RTSP or SIP proxies and the SLP SA/DA, whereas the management implements the association concept as shown in Sect. 4. It contains a registry and a synchronization block. The building blocks needed to deploy the application extension functionality are shown in Fig. 1. Note the separation of signaling and data path, since the proxy is only involved in the signaling path. The proxy functionality for the signaling protocols SIP and RTSP acts mostly as virtual client/server in each case. Additionally, we add a SLP directory agent to register services. Besides mere processing of protocol requests a database is Media Transport (Server)
needed to save client and services data. To enable collaborative streaming, the corresponding requests are passed from the SIP or RTSP proxy entities to the synchronization block (see Sect. 4.1 for an example). This block processes the requests according to the functionality for implementing the association concept as shown in Sect. 4, and sends them to a media server handling the actual media transport as done by existing RTSP servers, if necessary. Additionally, it could also give feedback to the clients for smoothing out buffer differences (confer Sect. 5.1). Proxies for SIP and RTSP are defined in the corresponding IETF documents [4,10]. SLP directory agents have been initially defined in Guttman et al. [3], which is currently under revision by the corresponding working group.
3.2 Services in the proxy architecture As already mentioned, besides a mere streaming service we offer push and pull services to each client. By using a push service, the client can invite a person to a collaborative streaming session. The push service can best be modeled by using an protocol allowing to issue invitations. We decided to use SIP for the push service to examine the usefulness for conferencing. For scenarios without conferencing, the push service can be modeled with service advertisements, but we have not examined this further. In contrast, the pull service is used to look up a movie another user streams and to join that collaborative streaming session. This can be modeled using SIP conferencing. However, conference servers are not easily available. We therefore have decided to implement a different version of the pull service without conferencing, using service discovery only. Since we do not need more state for opening the streaming client application than the URL, the pull service can be modeled as a search for streaming services which are streamed by a certain user. The client should therefore be registered with the streaming service at the proxy to allow searching for the client’s identifier as a specific attribute. To support the use of push and pull services, we register each movie a certain user chooses for streaming at the proxy in a specific structure defined by the association concept introduced below. This association is considered a service used by the Push and Pull Services for establishing a relation between Service Registration
Proxy Proxy Management Management
Streaming Service
Association Service SIP SIPProxy Proxy/ / Conference ConferenceServer Server SLP SA / DA
RTSP RTSP Proxy Proxy
SIP / SLP signaling RTSP signaling DATA transport
Fig. 1. Streaming proxy building blocks
Synchronization
Registry
Establish Association
Get / Show Movie
Push (SIP)
RTSP Proxy
Establish Association
URL
Pull (SLP) URL
Fig. 2. Service components
URL
RTSP Client
A proxy architecture for collaborative media streaming
401
Association:
RTSP requests
SIP requests
Association Mapper
ApplicationHandle Association methods
StartTime
StartPosition
ClientList
Proxy Association Manager Time information
ClientList:
RTSP Proxy
C1
C2
…
Cn RTSP requests Client
RTSP Client
Service Registration SIP Proxy
SLP SA/DA
SLP Requests
SLP UA
SIP requests
SIP User Agent
Client UI Control of Movie
SubStartTime SubStartPosition
SubClientList
Sub-Association Fig. 3. Association structure
users, and by the Streaming Service to retrieve information about the state of related users’ streaming sessions. We show the components of the services in Fig. 2. In our prototype, the Push Service consists of SIP clients of the involved users and the SIP proxy. The Pull Service consists of the SLP UA at the client and the SLP SA or DA at which the association service registers. The streaming service contains RTSP client, proxy and possibly streaming servers. The latter, however, will not communicate directly with the association service. The streaming service is registered at the association service, together with timing information. This timing information is used by the synchronization block. Push and pull services both require us to establish a relation between users, which means that the new client must be registered with the association service. The streaming service can then use the association service for retrieving timing information to synchronize related clients.
Call / Conference Control
Fig. 4. Architecture of association management
Also, elements that should be controlled by a single person can be protected for a certain period on behalf of the association. The data flow is separate from the association. Therefore, every client can take the desired streams by means of RTSP session setup independently from other clients, which is reasonable because clients or users apply their own capabilities or preferences, respectively. Nonetheless it is possible to request the same set of streams another client is currently receiving, e.g. by exporting that information to the lookup service and retrieving it on application lookup. We show the architecture, namely the interaction of SIP and RTSP proxy and SLP SA/DA building blocks with the association manager in Fig. 4. The Association Mapper maps SIP, SLP and RTSP protocol requests to the corresponding association methods which are given to the Association Manager. This building block manages the association list structure and returns timing information to the proxy building blocks. In the figure, the arrow between SLP SA/DA and Association Mapper to show processing of SLP requests has been omitted for the sake of clarity. 4.1 Association methods
4 Association concept We have introduced the concept of association in Kahmann and Wolf [7]. An association is, in the easiest case, a structure containing the location of the media, which is normally a streaming URL, and a list of client addresses. If synchronization of the clients should be obtained explicitly, the point of time and the position in the movie where streaming has been started must be stored. This is illustrated in Fig. 3. Clients can apply control commands like PAUSE or jumping to other points in the stream on their own, thus opening their own subassociation represented by a recursive structure with starting time, starting position and another client list in the figure. On leaving the sub-association, clients may easily return to the original timeline or to the position they left the main branch. In the latter case, this position has to be saved in the association structure. Other objects like access control lists can be easily added to the association, thus allowing the first client to grant or reduce permissions for looking up information on the application.
To obtain a defined interface and to enable re-usability of existing functionalities of IETF protocol proxies, we have defined a set of association methods. Each RTSP, SIP or SLP protocol request is mapped to an association method by the Association Mapper. These methods are used by the Association Manager to manipulate the association structure itself. Thus, they define an interface between the control functions applied by the client(s) and the logical relation between each of them. In our case, an association will be initialized when the first client starts a streaming session with an RTSP SETUP request. This request will be mapped to buildAssociation(). The buildAssociation() only initializes a new association if the client is not already added to an association with the corresponding URL. This is to handle following RTSP SETUP requests from associated clients, as well as RTSP SETUP requests to the same URL where a client is not associated to other clients. Additionally, the Association will be registered as a service at the SLP DA to provide for discovery of that service.
402
The second client will be invited to the session in the case of a push service, or do a self-invitation or service lookup in the case of a pull. This SIP INVITE or SLP SrvRqst will cause a bindAssociation() method. We have invented this method to enable a relation between RTSP and SIP or SLP requests, e.g. to tie the following standard RTSP methods to the suitable association. The client requesting the push or pull is added to the client list in the association and a pointer to a sub-association may be added where appropriate. We register the respective client in the attribute list of the Association service to enable searching for that client as an attribute. RTSP PLAY requests are mapped on the joinAssociation() method. For the first client opening a streaming session, the starting time and starting position must be added to the Association structure. The proxy saves the current time so as to calculate the time that has passed for later join requests. Since clients that want to join the association cannot use the range parameter on their own, they issue a standard RTSP PLAY request. The proxy then interprets RTSP PLAY requests from associated clients that they want to start streaming from the synchronized position. Thus, the actual streaming position has to be calculated from starting time and position or by using the RTSP GET PARAMETER request for the time that has already passed, to achieve a rough synchronization. This timing information will be returned to the RTSP proxy to forward the RTSP PLAY request with the range parameter set to that position to the media server. The media server then starts streaming to the joining client from the indicated position. Recall that the range parameter can also be set in RTSP PLAY requests originating from a client to indicate the position in the movie from which streaming is to be started. Normally, this is done after issuing a RTSP PAUSE request, as explained in the next paragraph. Thus, the range parameter means that the client is the only participant of a (sub)association. It is mapped to the required position in the stream. The Association Manager then processes this joinAssociation (StreamPosition) in such a way that the current time and the required stream position is added to the Association structure. We have already mentioned that a jump to a different position in a movie means that the RTSP clients sends a PAUSE followed by a RTSP PLAY with a specific position. The PAUSE request is then modeled by a pauseAssociation method. We have used this separate mapping to associations, because not every PAUSE needs to be a jump to a different position, and it may be difficult for a RTSP proxy to realize when a PAUSE is the introduction to a jump. The pauseAssociation checks in any case if the client is the only participant in a (sub-)association. If so, it does not need to open a new subassociation. Otherwise, a sub-association is opened by adding a pointer to a new association structure. The client opening the sub-association is added to the client list in the sub-association structure. A following PLAY request is handled then as described above. The end of a streaming session is indicated by a TEARDOWN. Both RTSP server and client can send this request, but these requests can be handled equally. RTSP TEARDOWN is mapped on leaveAssociation() which means that the client is deleted from each association it has participated in. If pointers to sub-associations with other clients participat-
V. Kahmann, L. Wolf
ing exist, they can be deleted, because these clients must all have own pointers to this sub-association. If the last client has issued a leave, the (sub-)association may be deleted. Proxy / Association Mapper
C1 RTSP SETUP
Association Manager
buildAssociation() C1
RTSP PLAY join Association()
C1(t1,P1)
SIP INVITE C2 bindAssociation(C2) t
Add actual time and position in stream to association
C1(t1,P1) C2
SIP INVITE
RTSP SETUP RTSP PLAY t
join Association()
C2
C1(t1,P1) C2(P2)
Calculate necessary starting position P2 for C2
TimeInformation(C2, P2) RTSP PLAY P2 t
t
t
Server
Fig. 5. Association setup for a push example
An example of a push service with the interaction of the proxy building blocks is shown in Fig. 5. In this example, the session setup of a first client C1 and the invitation with the following session setup for the second client C2 are illustrated. The corresponding association methods and their effect on the association structure are also shown. In the figure, the timeline for RTSP and SIP proxy building blocks is coupled with the timeline for the Association Mapper. All RTSP requests are forwarded to the Server, however, for the sake of clarity, only the request containing the range parameter is shown. For the same reason, most responses are omitted also. The pull service uses SLP for lookup. We use the fact that each user’s stream that is free to be extended to another user is registered as an association service at the proxy. In that way, the service can be found with a SLP SrvRqst. The corresponding URL to be used for the RTSP client application is delivered as result in the SrvRply. This protocol processing is shown in Fig. 6. The association methods we have described map basic RTSP, SLP and SIP functionality. Hence, applying any conProxy / Association Mapper
C1
Association Manager
RTSP SETUP buildAssociation() C1
RTSP PLAY join Association() t
C1(t1,P1)
Add actual time and position in stream to association
C1(t1,P1) C2
Add C2 to association, give information to synchronization block
C1(t1,P1) C2(P2)
Calculate necessary starting position P2 for C2
SLP SrvRqst bindAssociation(C2)
SLP SrvRply (URL) RTSP SETUP RTSP PLAY
join Association()
t C2
TimeInformation(C2, P2) RTSP PLAY P2 t
t
t
Server
Fig. 6. Association setup for pull example
A proxy architecture for collaborative media streaming
trol functionality on the movie opens a sub-association, so this client will stream the content independently. To explicitly follow other clients into their sub-association or to get back to the original timeline, special commands have to be applied by the client application. Since RTSP or SIP do not handle that level of cooperation, we will have to implement additional protocol functionality for the exchange of such messages. Such functionality may contain leaving the subassociation, following a certain client into the association, or opening a sub-association together with a group of clients. Additionally, helper functions like getting the list of clients in the association or in a certain sub-association are implemented. 5 Implementation aspects In this section, we present basic design and implementation issues. Since it is essential to use standard components, we implement a listener entity which catches all relevant IETF protocol requests and sends them to the Association Mapper defined in Sect. 4. SIP proxy servers and SLP DAs may send results to the recipients in parallel. The RTSP proxy must be modified in the sense that it must wait before sending RTSP PLAY requests to the streaming server until the time information is returned from the Association Manager. Association methods are sent to the Association Manager. In Sect. 4.1, we have shown that this Manager manipulates the Association structure. The fields of the Association structure are saved in entries in database tables. The registration table contains association and sub-association identifiers, together with client identifiers. For each association method, the Association Manager must retrieve the according information from the registration, add or delete entries, and save the new information. For each new sub-association, the StartTime and StartPosition field must be saved in the synchronization table. If a sub-association is deleted, the entries must also be deleted. Thus, the main task of the Association Manager in a real implementation of our proxy architecture is to subsequently manipulate the database tables according to the Association methods. Our concept abstracts from underlying network technology, so both multicast and unicast can serve as network technologies for implementations. Thus, if only synchronization is needed without control functionality, this could be done using standard IP multicast. Applied to our concept, this means that the association will only be a handle used for lookup without synchronization issues. If the need for individual control is still given, streaming to different multicast groups on demand can be a solution. The approach we follow means that modifications in the control functionality of current streaming clients and additional functionality of RTSP proxies (like the interaction with the association manager) will have to be implemented. However, synchronization with another streaming client can be achieved without changing SIP or RTSP protocol methods. 5.1 Synchronization issues We have kept signaling and data path separate for the reasons of flexible control. Another advantage is that data paths can
403
be separate if a distinct QoS should be applied, or users are mobile with changing routes to them. Yet, this separation can be a disadvantage in some cases if synchronization is critical. If timing information is calculated by the association manager and passed to the media server, there is a delay between the synchronization point and the point when the new media streams to the second client are sent out. However, we apply the same mapping to each PLAY request, and if timing information can be calculated fast enough and transmission delay between proxy and server will not change, e.g. if the proxy is co-located with the server, the time between joining an association and processing the PLAY request in the server will be the same for each client. In conferencing applications, clients may require strict synchronization because they want to talk about what they see. Since our concept is independent of data transport, the (sub-)associations may be mapped to multicast groups. Joining an association corresponds to joining a multicast group then. Opening a new (sub-)association means to open a new multicast group. This must be signaled to the media server, therefore changes in the signaling protocol as well as in the media server need to be made. Another precondition is that multicast routing is available. A drawback of this approach may be the waste of multicast addresses for large numbers of sub-associations. Another synchronization issue is caused by different playout buffers in client applications which may not be solved by applying multicast. The playback in one client may start earlier or later dependent on buffer size. However, if these buffer sizes can be determined, the difference between them can be calculated and put into the timing information sent to the media server. Moreover, such information could be sent to clients to enable them to apply buffer-level control mechanisms. Proposals for implementing a protocol for synchronization between clients in that way can be found in Rothermel and Helbig [8]. Transcoding of the media streams for only a set of clients may cause additional delay potentially leading to poor synchronization. If transcoding is necessary, additional features have to be added, for example an artificial delay for other streams. However, to estimate which delay is introduced by transcoding may be difficult to do.
5.2 Restrictions In our prototype implementation, we have only considered basic functionality of the presented IETF protocols necessary for collaborative media streaming. For example, we did not implement queueing of subsequent RTSP PLAY requests, because most RTSP clients do not implement this functionality. Similarly, we did not consider time ranges in the range parameter. Up to now, we have not considered malicious users and security hazards. However, IETF protocol requests can be authenticated if necessary. Secure key distribution may remain a problem in some networking environments, above all ad hoc networks. The approach as proposed is implemented as a hard-state architecture, however expiry times for associations can be added. In the case that the proxy may participate in the data
404
path, RTCP messages could be used for refreshing state. Otherwise, specific protocol messages may be exchanged. The case of collaboratively streaming media with both synchronization and flexible control has not been considered in the RTSP standard. To be able to re-use existing protocol stacks, we interpret the RTSP PLAY request of clients within an association as the demand to synchronize with the association. It may be discussed how far such collaborative streaming sessions could be of interest to add a multi-party RTSP to the existing standard. The advantage of a multi-party RTSP may be that protocol processing is more straight-forward, because protocol requests or headers that are particularly suitable for association management can be introduced. For instance, the range parameter “now”, which is at times restricted to live media, can then be applied to collaborative streaming of recorded files. This simplifies the case of clients joining the conference after beginning of the streaming session. However it is questionable if spontaneous collaborative streaming sessions without conferencing could benefit from this approach also. 6 Related work Up to now, several middleware approaches have been defined which allow the extension of applications. For example, in the home network environment, HAVi [5] defines a middleware layer which allows the extension of applications to different places. The drawback of these approaches is mostly the restrictedness on a special networking environment. If clients in different environments want to co-operate, additional mapping functionality must be deployed in both stacks. With our approach, every system having the IP suite implemented can run our application or be equipped with the proxy functionality. Thus, interoperability is easily provided. Recently, powerful middleware frameworks like CORBA have defined additional specifications to support streaming applications [12]. The combination of such streams with conferencing in the sense that each user may have individual control of the media streams has not been addressed there explicitly. Such a system could be implemented using methods from other CORBA specifications that enable cooperative applications [13,14]. However, CORBA media servers are not widely deployed in today’s Internet, and we felt a mapping of RTSP commands to CORBA object manipulations in the proxy could add processing overhead. Since the processing of client requests must be handled similar as in our concept, an evaluation of this could be a future work issue. In Internet environments, concepts of group communications have been implemented using multicast. Streaming multimedia content using multicast has been proposed, for example, by Jonas et al. [6] or Sen et al. [11]. Sharing of streaming applications has not found much consideration yet. One approach is shown in Fortino and Nigro [2], with a proposal for cooperative playback sessions. In that approach, the main concept is the conference control mechanism between clients and the propagation of control messages to other clients. Media streams are transported by using multicast. However, there are some drawbacks implementing multicast for application extension and data transport. In today’s Internet, multicast routing is typically not available, so it is necessary to provide for tunneling or the implementation of
V. Kahmann, L. Wolf
reflectors splitting multicast into unicast connections. Another disadvantage from the user point of view is that the user cannot use control functions like jumping to a different point in the movie. Our concept has the advantage of granting control to each user so that the clients need not necessarily follow but can apply all control commands themselves. For learning systems in particular, this may be valuable, because users may want to repeat their lessons on their own without asking others. To overcome the restrictions of IP multicast, proposals have been made to use application-level multicast. An overlay network as shown as in Chu et al. [1] could also be used to copy media streams at a client and send them to joining users. However, it is difficult to handle individual user demands in this case, because the client copying the media to the joining client would receive a large amount of data it does not need. A conceptually similar idea to transfer conferencing state to users joining a conference at later times has been proposed in Vogel et al. [15]. However, the approach has been designed for whiteboard applications. In contrast to such fully distributed applications, where it is possible to save a local copy of the content, this may not be done with streaming media, mostly due to copyright restrictions. 7 Conclusion and future work In this paper, we have presented an architecture for collaboration among streaming clients without specifying an explicit middleware layer. Instead, a proxy architecture has been introduced to enable middleware support for streaming applications. This architecture is applicable in several scenarios, including conferencing and spontaneous collaborative streaming sessions. Push and pull scenarios are both supported by the architecture. We have divided the proxy architecture into several services which interoperate to provide the desired functionality. The basic streaming service registers with the association service to provide application lookup for pull service users, and in turn retrieves timing information from the association service for users synchronizing to related users. Push and pull services use the association service to establish relations between users. These relations are covered by the association concept we propose in the paper. Protocol requests of push and pull service as well as control functionality of the streaming service like jumping to other points in the movie, are mapped to methods of this association concept. These methods include establishment of association, joining of clients and opening sub-associations. Other functionality streaming clients do not handle at times can also be appended, for example following a user into subassociations or leave a sub-association. Future work will include a full implementation by re-using existing RTSP, SIP or SLP components, as far as possible. We aim at running and evaluating tests with users to experience which functionality they need in collaborative streaming sessions. Possible optimizations of the concept can also be seen with respect to efficient data transport by taking advantage of caching. Since this implies to incorporate the proxy into the data path, we will test the effectiveness of such an approach in several scenarios.
A proxy architecture for collaborative media streaming
Another interesting aspect is to examine the use of our architecture in networks with partial disconnectivity further. The scenario of users meeting spontaneously and synchronizing multimedia streams could become an important future application. The application of our association concept as a loose grouping concept has just been designed for such environments, and applying additional concepts like distributed databases or soft-state protocol processing can help to prove effectiveness in such ad-hoc or nomadic computing environments. References 1. Chu Y, Rao S, Seshan S, Zhang H (2001) Enabling conferencing applications on the internet using an overlay multicast architecture. In: Proc. ACM SIGCOMM, San Diego. ACM Press, New York, pp 55–67 2. Fortino G, Nigro L (2000) A cooperative playback system for on-demand multimedia sessions over Internet. In: Proc. IEEE Conference on Multimedia and Expo, New York. IEEE Publishing, Piscataway, NJ, pp. 41–45 3. Guttman E, Perkins C, Veizades J, Day M (1999) Service Location Protocol, Version 2. RFC 2608, The Internet Engineering Task Force 4. Handley M, Schulzrinne H, Schooler E, Rosenberg J (1999) SIP: Session Initiation Protocol. RFC 2543, The Internet Engineering Task Force 5. HAVi Specification 1.0 (2000) The HAVi Organization; http://www.havi.org/ 6. Jonas K, Modeker J, Kretschmer M (1998) Get a KISS – Communication infrastructure for streaming services in a heterogeneous environment. In: Proc. ACM Multimedia Conference, Bristol, UK. ACM Press, New York, pp 401–410
405 7. Kahmann V, Wolf L (2001) Collaborative media streaming in an in-home network. In: Proc. of International Workshop on Smart Appliances and Wearable Computing (IWSAWC), Phoenix/Mesa. IEEE Computer Society Press, Los Alamitos, CA; pp. 181–186 8. Rothermel K, Helbig T (1995) An adaptive stream synchronization protocol. In: Proc. 5th International Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV), Durham, NH. T.D.C. Little, R. Gusella (eds.), Lecture Notes in Computer Science, Vol. 1018, Springer, Berlin Heidelberg, pp 189–202 9. Schulzrinne H, Casner S, Frederick R, Jacobson V (1996) RTP: A Transport Protocol for Real-Time Applications. RFC 1889, The Internet Engineering Task Force 10. Schulzrinne H, Rao A, Lanphier R (1998) Real Time Streaming Protocol (RTSP). RFC 2326, The Internet Engineering Task Force 11. Sen S, Towsley D, Zhang Z-L, Dey J (1999) Optimal multicast smoothing of streaming video over an internetwork. In: Proc. IEEE INFOCOM, New York. IEEE Communications Society, New York, pp. 455–463 12. The Object Management Group, Inc. (1998) Audio/visual stream specification v1.0. OMG Document formal/00-01-03 13. The Object Management Group, Inc. (2000) Relationship service specification v1.0. OMG Document formal/00-06-24 14. The Object Management Group, Inc. (2000) Task and Session CBOs Specification V1.0. OMG Document formal/00-05-03 15. Vogel J, Mauve M, Geyer W, Hilt V, Kuhm¨unch C (2000) A generic late join service for distributed interactive media. In: Proc. 8thACM Multimedia, LosAngeles.ACM Press, NewYork, pp. 259–267