An Architecture for Group Communications-Support Networked Multimedia Applications

John Zissopoulos, John Soldatos, Evaghelos Vayias, George Branis, Nikolas Mitrou

National Technical University of Athens
Department of Electrical and Computer Engineering
Telecommunications Laboratory
Phone: +30 1 772 15 13, FAX: +30 1 772 25 34
This paper presents an architecture for a multimedia application platform that supports real-time group communication activities (such as Videophone, Teletraining and Computer Supported Collaborative Work) between remotely located members of groups by delivering multimedia communication capabilities directly to their personal computers ("desktops"). The paper focuses on a component-based design approach that makes it possible to integrate various Application Components (Audio, Video, Text chat, File transfer, Slide show, Shared Whiteboard or text editor, etc.) into the platform, depending on the activity supported, so that the platform can be customised for the specific activity. Apart from the basic architecture, alternative connection schemes that depend on the capabilities of the network are surveyed. The advantages and disadvantages of the described architecture are also discussed and, finally, an implementation aiming to support Teletraining activities is presented.
1. Introduction

Recent advances in complementary technologies such as computing, telecommunications and signal processing have resulted in the emergence of a new type of applications (from the computing point of view) or services (from the telecommunications point of view), called networked multimedia applications. This family of applications covers a wide range of cases with diverse characteristics, and several categorisations of networked multimedia applications have been proposed based upon various criteria [1]. For example, distinctions are drawn between:
• people-to-people applications, which focus on enabling communication between people by using multiple media, and people-to-information-sources applications, where a person accesses and retrieves multimedia information stored in an information source.
• synchronous or real-time applications, where continuous, time-dependent media like audio and video are used together with time-independent media, and asynchronous applications, which mainly support multimedia messaging and work flow and generally constitute what is called groupware.
• applications whose service is delivered to the user in specially equipped dedicated rooms and applications that may be delivered to the end-user’s desktop, possibly enhanced by some special hardware devices.

Although asynchronous applications have matured into successful commercial products and can also be supported by various Internet services [2], applications supporting synchronous interaction have not yet seen such wide deployment. Commercial products such as PictureTel’s, Intel Proshare, Nortel’s, etc. support person-to-person interactions, but they do not seem suitable for larger group interactions where additional service components apart from audio and video are needed [3].
Experimental synchronous group interaction applications, such as ISABEL [4] and JVTOS [5], have been developed in the scope of European Research Projects, but most of them need specific network configuration and specially equipped workstations aiming mostly at supporting large distributed events between audiences [6], [7].
This paper focuses on people-to-people, synchronous, desktop applications, and more specifically on distributed multimedia applications that enable and improve synchronous interaction between remotely located members of groups by delivering multimedia communication capabilities to their personal computers (“desktops”). The term “synchronous group communication activities” covers various types of applications, such as:
• Person-to-person applications, e.g. a videophone application.
• Person-to-group applications, where multimedia information is mainly transmitted from a single source to many recipients, with limited feedback in the reverse direction. A typical example is a Teletraining activity where a “tutor” gives a lecture to remote “students”.
• Group conferencing applications, where fully bi-directional conversational communication takes place between two or more people. It may consist simply of audio-video communication between members of the group, or it may extend to workspace sharing in the form of common windows or editing and drawing tools. In the latter case, we refer to a real-time multimedia Computer Supported Collaborative Work (CSCW) system.

The purpose of this paper is to describe the architecture of an application platform able to support all the group communication activities discussed above. The platform is based on the client-server model and built on an architectural design that makes it possible to integrate various Application Components (Audio, Video, Text chat, File transfer, Slide show, Shared Whiteboard or text editor, etc.) into the platform, depending on the activity supported, so that the platform can be customised for the specific activity. The overall architecture of the platform is presented first, and then the various modules of the individual applications are described. Such an application requires multicast capabilities. Connection schemes that may implement these multicast capabilities, depending on whether the network provides internal multicast mechanisms, are surveyed. A discussion of the advantages and drawbacks of the described architecture is then given. Finally, the implementation of this system for the Win32 platform over a TCP/IP based LAN is presented, together with some observations from the use of the system in this environment. This implementation aims mainly at supporting Teletraining activities, although more general communication and cooperation purposes can also be served.
2. Overall Architecture

The platform presented in this paper supports group communication activities between computer users who need to cooperate within several groups even though they are spatially dispersed. The activities supported are synchronous, which means that users take part in them concurrently and use continuous, time-dependent media like audio and video to communicate, together with other time-independent information types such as data files, images, etc. This application platform focuses on interconnecting individuals using their personal workstations, so that they can participate in activities like conferences, training sessions and meetings without having to leave their place of work. This kind of interaction is usually needed in a corporate environment to enable persons from different corporate departments to cooperate with minimum transportation costs. Therefore, this application does not aim at supporting large distributed events, where the main issue is the interconnection of audiences [4].

This distinction is particularly important, because a different type of interaction control has to be offered by the application in the two cases. Namely, applications targeting audience interconnection, such as ISABEL, need to offer a more formal and centralised control of the interaction between the participating audiences, which gives them a common view of the application on the workstations’ screens, thus homogenising the various desktops ([4] and [6]). When the aim is the interconnection of several individuals within a group, less formal interaction control is needed, as the participants may easily coordinate the interaction between them using the application’s service components such as voice transmission or chat. Moreover, each participant wishes to have his or her own personal arrangement of the user interface, depending on the features of his or her workstation. This is the approach adopted in this application. Nevertheless, service components like slide show or telepointer are coordinated at the application level by exchange of messages through the server, as a user has to first take hold of a “token” in order to direct the telepointer or the slide show.

A similar application that offers packet videoconferencing between individuals over the Internet is IVS, presented in [8]. The application presented here, however, offers additional service components beyond audio and video transmission and also provides inherent group meeting facilities, based on a server module.

As shown in Figure 1, which presents the overall architecture of the application platform, the architecture is based on a Client-Server approach. The Server application is executed on a machine with a well-known network address and serves as a group meeting point providing directory services for the various groups. All users have to connect to the Server in order to become aware of the groups supported and of the group participants, and also to join a group and thus participate in the communication activity. The existence of a server also facilitates the on-line monitoring of the entire system by an operator as well as the logging of the monitoring information for traffic analysis or accounting purposes.

The Client application is executed on the workstations of the users and enables the exchange of the diverse media information (audio, video, text, graphics) between the members of a group. The service components needed to support the type of activities mentioned above are implemented on the Client side and include:
• Service components that enable conferencing and bring to each participant’s desktop the picture and voice of the other remote participants. Such components include Audio and Video transmission.
Chat (short text messages) is also a special conferencing service that can be used e.g. for feedback from users that cannot transmit (but only receive) audio and video, or to supplement the audio communication if the latter is problematic.
• Presentation-enabling service components such as Slide show, Telepointer, etc.
• Workspace sharing components such as Shared Whiteboard, Workgroup editor, file transfer, image transfer, etc.

The Client handles the encoding and decoding of multimedia information related to these components and manages any devices associated with them (such as audio or video boards). It also takes care of presenting the received media information to the user through a suitable user interface, as well as of transporting the transmitted media information between itself and the Server and from there on to the other participants’ Client applications. The Client also enables the user to access the Server’s directory services in order to connect and join a group, as well as to control the flow of information regarding the various multimedia services supported. The latter is achieved by an exchange of messages between the Client and the Server which takes place concurrently with the media information exchange.

The operations that the Server performs in the context of this group communication application include the following:
• Maintain a user and group registration Database. The information about users includes user-name and password, the services that the user is permitted to use, the sessions that he or she may join, etc. The Database also lists the various groups that currently exist on the platform and that a user may join.
• Receive and process messages sent by the Clients. These messages may concern: a) a request for an operation that should be performed by the server, e.g. connection request, request to get a list of groups or to join a group, etc. b) control of the flow of information regarding the various multimedia services supported. In case (a), the Server performs the requested operation and sends the result back to the Client. In case (b), the Server routes the message to the other Clients of the same group.
• Receive multimedia information from Clients, which is either routed to the appropriate recipient or multicast to all the Clients of the group via the group’s point-to-multipoint connection, depending on whether the transmitter wants the information to be unicast to a specific recipient or multicast to all the participants of the group.
• Provide support for the Workspace Sharing services (such as the Whiteboard, the workgroup editor, etc.) by providing storage and retrieval/update functionality for shared workspace objects.
• Control the flow of information in several conferencing services (such as the Slide Show and the Telepointer) which require that a user first take hold of a “token” in order to direct the activity, and provide support for the inter-stream synchronisation of continuous media services (video, audio) by means of global time-stamps.
• Perform on-line monitoring of the ongoing activities as well as logging of the monitoring information. The logged data can be analysed afterwards, so as to generate statistical results concerning accounting, traffic characteristics, and network and server utilisation.
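The token-based control of services such as the Slide Show and the Telepointer can be illustrated with a minimal Python sketch. It assumes one token per (group, service) pair and a first-come, first-served grant policy; the class name TokenManager and its methods are hypothetical and are not taken from the implementation described in this paper.

```python
class TokenManager:
    """Hypothetical Server-side helper: at most one token holder per (group, service)."""

    def __init__(self):
        # Maps (group, service) -> client_id of the current token holder.
        self._holders = {}

    def request(self, group, service, client_id):
        """Grant the token if it is free; return True if client_id holds it afterwards."""
        holder = self._holders.setdefault((group, service), client_id)
        return holder == client_id

    def release(self, group, service, client_id):
        """Free the token; only the current holder is allowed to release it."""
        key = (group, service)
        if self._holders.get(key) == client_id:
            del self._holders[key]
            return True
        return False

    def holder(self, group, service):
        """Return the current holder's client_id, or None if the token is free."""
        return self._holders.get((group, service))
```

In such a design the Server would notify the other Clients of the group whenever request or release succeeds, so that every participant knows who currently directs the activity.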
Figure 1 - Overall architecture of the application platform (diagram labels: Server; Group A; Group B; User; Group point-to-multipoint Connection; Message & Information Exchange)
3. Description of the Application Components

The design architecture follows an Object-Oriented and component-based approach in order to facilitate the integration of new service components alongside the ones already implemented. The component-based approach also allows the platform to be customised for the support of diverse communication activities, by selecting and incorporating an appropriate subset of service components. The Object-Oriented model results in a precise and concrete definition of the interactions between objects, hence minimising the dependency of the implementation on internal object details. Thus, for example, the adoption of a new compression technique, the implementation of a new synchronisation scheme, or even a change in the underlying transport service (e.g. from TCP or UDP to a native ATM Adaptation Layer) does not affect the entire implementation in a major way.

Figure 2 illustrates the application components of the platform and the interactions between them. The Clients and the Server interact by exchanging Data Units of a well-defined structure according to an Application Layer Protocol. The exchanged Data Units can be divided into two categories:
1. Control Messages: fixed-length packets used to control various operations. Messages are sent by a Client in order to request an operation from the Server (Connect, Join a group, Add a group, etc.) or to control information flow by notifying other Clients.
2. Information Packets: packets that transport information produced by the various service components. These packets have a fixed header part and a data part. The fixed header part has some standard fields that are independent of the service producing the packet, as well as a part with fields interpreted differently by each service component.

The Server’s application components and their functionality are briefly described below:
• Connection Objects: these objects handle the connection between Client and Server. User-specific information (name, group joined, services enabled, etc.) is stored in this object. Each Connection Object actually consists of three connections (socket objects), one for non-continuous media, one for video and one for audio, which possess internal buffer structures to store the incoming and outgoing Data Units. The socket objects were separated to facilitate synchronisation as well as to permit assigning a different QoS to each socket, if this is supported by the transport layer.
• Processing Thread: the thread scans the Buffers of the Connection Objects and processes the incoming messages and packets. It also routes the information packets to the Connection Objects of the appropriate receivers.
• Monitor & Log Object: this object collects activity-log data for the various operations that take place on the Server. As the Server is a central point of interconnection as well as of information exchange, every aspect of the ongoing activities may be monitored and logged through this object.

As can be seen in Figure 2, the Client architecture follows a layered approach. A brief description of the Client’s application components follows:
• User Interface Objects: these objects interpret user actions and notify the appropriate service components. They also present the various media to the user according to data received from the service components.
• Service Component Objects: each of these objects handles a specific media type and its associated devices, if any. These objects hide the details of media manipulation from the other objects, thus facilitating the extension of the application by the addition of other service components. The service components also perform encoding/decoding of the information and prepare Data Units to be transmitted.
• Socket Objects: the socket objects constitute the interface of the application to the transport service and handle all the communication details. They interact with the other objects, namely the service components, via buffers (a BufferIn for reception and a BufferOut for transmission) where Data Units are placed.
• Synchronisation and Control Object: this object controls the data flow among the three distinct layers of the Client (User Interface, Service Components, Sockets) and synchronises the presentation of information to the user.

A typical scenario of operation, with a Client connecting to the Server and joining a group, is presented next, together with a more comprehensive description of the functionality of the various application components that take part and of the messages and packets that are exchanged. The Client first sends the message Connect_Request, supplying the user’s name and password. The Server authenticates the user according to information stored in its user database and replies with a Connect_Ok message, assigning an ID (one byte long) to be used as the identifier of the Client in the header of the exchanged Data Units instead of its network address. After the authentication of the user, the Server retrieves information concerning the user from its database, and a Connection Object is constructed on the Server side to hold this information and to handle the connection between Client and Server.
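As a rough illustration of the Information Packet structure described above, the following Python sketch packs and unpacks a Data Unit. The field layout (1-byte sender ID, 1-byte service code, 2-byte payload length, 4-byte time-stamp, 4-byte service-specific field) and the names pack_packet and unpack_packet are assumptions made for this example. The paper itself only specifies a fixed header with standard and service-specific fields, a data part, and a one-byte Client ID.

```python
import struct

# Hypothetical header layout, network byte order, no padding:
#   sender_id (1 byte) | service (1 byte) | payload length (2 bytes)
#   | time-stamp (4 bytes) | service-specific field (4 bytes)
HEADER = struct.Struct("!BBHI4s")


def pack_packet(sender_id, service, timestamp, service_field, payload):
    """Build an Information Packet: fixed header followed by the data part."""
    header = HEADER.pack(sender_id, service, len(payload), timestamp, service_field)
    return header + payload


def unpack_packet(data):
    """Split a received packet back into its header fields and data part."""
    sender_id, service, length, timestamp, service_field = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    return sender_id, service, timestamp, service_field, payload
```

Carrying the one-byte ID assigned in Connect_Ok instead of a network address keeps the fixed header small, which matters most for the many small audio packets.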
Figure 2 - Application components of the platform and interactions between them (diagram labels: CLIENT-1 ... CLIENT-N, each with User Interface, Sync & Control Object, Video Service Component, Audio Service Component, ..., Service Component N, VideoSocket, AudioSocket, DataSocket; SERVER with Client-1 ... Client-N Connection Objects, each with VideoSocket, AudioSocket, DataSocket, Point-to-Multipoint Connection Object for Group, Message & Information Processing Thread, Monitoring & Log Object)

Apart from the point-to-point Connection Objects between Client and Server, there are also point-to-multipoint Connection Objects in the Server, one for each group. These objects connect the Server to every Client participating in the group and are used to multicast control messages and information packets efficiently to all Clients of a group. In case the underlying transport service cannot support point-to-multipoint connections, the Message and Information Processing Thread at the Server has to implement multicasting at the Application Layer by copying messages and packets to the socket buffers of all Clients.
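The application-layer fallback can be sketched in a few lines of Python. The sketch assumes that each Connection Object exposes a BufferOut queue and that the sender does not receive a copy of its own Data Unit; the class ConnectionObject and the function multicast are simplified, hypothetical stand-ins for the components described in the text.

```python
from collections import deque


class ConnectionObject:
    """Simplified stand-in for a Server-side Connection Object."""

    def __init__(self, client_id, group):
        self.client_id = client_id
        self.group = group
        self.buffer_out = deque()  # Data Units waiting to be sent to this Client


def multicast(connections, sender_id, data_unit):
    """Copy data_unit to the BufferOut of every other Client in the sender's group.

    connections maps client_id -> ConnectionObject.
    Returns the number of copies made.
    """
    group = connections[sender_id].group
    copies = 0
    for conn in connections.values():
        if conn.group == group and conn.client_id != sender_id:
            conn.buffer_out.append(data_unit)
            copies += 1
    return copies
```

For a group with M members, each multicast Data Unit is copied M - 1 times, which is the cost the Server pays when the network offers no point-to-multipoint connections.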
Following the Client’s connection to the Server and the user’s authentication, the user is able to participate in the collaboration activity by exchanging information with the other users of his or her group. The architectural components presented in Figure 2 interact with each other in order to enable the information exchange. The interactions between these components are described below:
♦ The User Interface objects of the Client, driven by the user’s actions, notify the appropriate Service Component objects that the user wishes to perform a specific action, for example to join a group, to initiate a slide show or to transmit video. The parameters of the user’s action are passed to the Service Component, which in turn generates the respective control messages (e.g. Join_Group, Get_Slide_Token) and/or the suitable information packets containing data from the media source that the specific component manipulates (e.g. a video camera, a file, a graphic image, etc.).
♦ The Service Components place the generated Data Units in the BufferOut of the appropriate Socket object. From there the Data Units are transmitted to the Server under the supervision of the Synchronisation & Control Object. On the Server side, the Data Units are stored in the BufferIn of the appropriate Socket Objects. A Processing Thread scans the Buffers of the Socket Objects in a Round Robin scheme and processes the stored Data Units. Processing may consist either of an appropriate operation on the Server’s Data Structures and Database, followed by a reply message to the Client that requested the operation, or of the routing of the Data Unit to the BufferOut of the Client that should receive it. In case the Data Unit must be multicast to the entire group, the point-to-multipoint Connection Object is used. The Data Unit is then transmitted to the Client.

An alternative to the Round Robin scheme is to use a separate processing thread for each Client. The degree to which such a scheme would increase performance depends heavily on the Operating System design. Since this scheme would also make the implementation more complex, it was decided not to adopt it in a first phase.
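The Round Robin scan performed by the Processing Thread could look roughly like the Python sketch below. It assumes that at most one Data Unit is taken from each Client's BufferIn per pass, so that a busy Client cannot starve the others; the paper does not state how many Data Units are processed per visit, and the function names round_robin_pass and drain are hypothetical.

```python
from collections import deque


def round_robin_pass(buffers_in, handle):
    """Visit every Client's BufferIn once, processing at most one Data Unit each.

    buffers_in maps client_id -> deque of pending Data Units.
    handle(client_id, data_unit) performs the actual processing or routing.
    Returns the number of Data Units processed in this pass.
    """
    processed = 0
    for client_id, buf in buffers_in.items():
        if buf:
            handle(client_id, buf.popleft())
            processed += 1
    return processed


def drain(buffers_in, handle):
    """Repeat passes until all buffers are empty (a real thread would block instead)."""
    while round_robin_pass(buffers_in, handle):
        pass
```

A single scanning thread keeps the Server's data structures free of locking concerns, which is consistent with the decision above to postpone the one-thread-per-Client alternative.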
4. Alternative Connection Schemes

Although a basic connection scheme is presented in the previous section, it relies heavily on the provision of an internal multicast mechanism by the transport network and on the Server being able to deliver the entire traffic generated by the users. The first condition is not met in most existing networks, and the second one, although feasible, may cause serious performance degradation, mainly expressed as an increase in the end-to-end delay. Therefore, some additional connection schemes are discussed in this section, each one possessing some advantages and disadvantages over the basic scheme with respect to the two aforementioned conditions. In each case, there is a tradeoff between reducing end-to-end delay and limiting connection complexity.

Consider a system based on the proposed architecture with N Clients, X groups and M_i Clients joined in the i-th group (i = 1,...,X). Regardless of the scheme used, N point-to-point connections, one from every Client to the Server, are inevitable, as these connections are required for the exchange of control messages with the Server. The cases described below are based on the assumption that every Client needs to have full interconnection (N-to-N) capability.
• In case the Server is a bottleneck which increases end-to-end delay: N point-to-point connections, one from each Client to the Server, and N point-to-multipoint connections, one from each Client to the other participants of its group, giving each Client the option of multicasting packets to its entire group without having to transmit them first to the Server.

When no multicast option is supported by the network layer, the application layer takes over the responsibility of multicasting packets. The following schemes may be applied as alternatives:
• In case the Server does not affect end-to-end delay: N point-to-point connections, one from each Client to the Server. In this case multicast is provided by the Server, which routes a multicast packet in the application layer by sending it to every Client that has joined the sender's group. This solution results in (M_i - 1) copies of each original packet that is multicast within the i-th group.
• In case the Server is a bottleneck which increases end-to-end delay: N point-to-point connections, one from each Client to the Server, and M_i(M_i - 1)/2 point-to-point connections between the participants of the i-th group. Hence the total number of direct connections between participants needed to apply this scheme to all X groups amounts to:

$$\sum_{i=1}^{X} \frac{M_i (M_i - 1)}{2}$$
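The connection counts of the two application-layer schemes can be compared with a short Python sketch. The function names are hypothetical; the formulas are the ones stated above (N Client-to-Server connections, M_i - 1 packet copies per multicast, and M_i(M_i - 1)/2 direct connections within the i-th group).

```python
def server_relay_connections(n_clients):
    """Server relays everything: one point-to-point connection per Client."""
    return n_clients


def server_relay_copies(group_size):
    """Copies the Server makes of one packet multicast within a group of this size."""
    return group_size - 1


def full_mesh_connections(n_clients, group_sizes):
    """N Client-to-Server connections plus a full mesh inside every group."""
    mesh = sum(m * (m - 1) // 2 for m in group_sizes)
    return n_clients + mesh
```

For example, with N = 7 Clients split into groups of 4 and 3, the relay scheme needs 7 connections, while the full mesh needs 7 + 6 + 3 = 16. The mesh term grows quadratically with group size, which is the connection complexity traded for lower end-to-end delay.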
5. A Discussion of the proposed Client-Server Architecture

The Client-Server architecture has become a ubiquitous architecture for a variety of distributed applications. Its selection for an interactive networked multimedia system is based largely on the potential benefits which result from such a choice. The major feature of the Client-Server model, as used in this application with the connection scheme presented in section 4, is the fact that the Server is a central point of control and information exchange. Taking this feature into account, the following points illustrate the strength and the efficiency of the Client-Server model:
• Connection complexity is transferred mainly to the Server, which handles the group multicast connections, whereas Clients only handle simple point-to-point connections.
• Accounting procedures, as well as the collection and analysis of statistics related to the network load produced by the system, are greatly facilitated.
• Central control and administration of users and groups is provided by the Server, as the Server application maintains an overall view of the system.

Nevertheless, as every transmitted packet passes through the Server, drawbacks arise which mainly pertain to performance. The most crucial of these shortcomings are:
• The end-to-end delay achieved is not optimal when compared to direct transmission from one user to another, as a result of the additional buffering that is performed by the Server.
• The Server can become overwhelmed with data, which may lead to bottlenecks that make end-to-end delay totally unacceptable.

Therefore, in order to exploit the advantages and offset the drawbacks of this architecture, the machine on which the Server application runs should meet strict network bandwidth and CPU power requirements.
6. An Implementation

An implementation of the platform presented in the previous sections was undertaken in order to validate the described architecture as well as to gain experience in the design and implementation of group conferencing systems based on synchronous continuous media communication. The main field of application of the implemented system was Teletraining, although more general communication and collaboration activities can be enabled and supported by the application service components that have been implemented.

The implementation is oriented towards the Microsoft Windows 95 and Windows NT platforms (Win32), using ordinary Windows PC-based Workstations both as clients and as server. These PCs are connected to a TCP/IP based Local Area Network, featuring Ethernet and switched Ethernet segments. The Client Workstations are equipped with sound adapters, microphones and speakers for speech capturing and playback, as well as with digital cameras in order to capture the participants’ image. Software or hardware compressors/decompressors for audio and video can optionally be installed in the workstations. The Client application offers the user the option to select his/her preferred codec from the list of all available (both software and hardware) codecs in the workstation, but in this case all participants in the group should use the same codecs.

The exchange of packets between clients takes place through simple point-to-point connections that are established from every client to the server. The multicast capabilities required are provided by the Server at the application layer, as already stated in the description of the corresponding connection scheme. Moreover, direct point-to-point connections between clients can also be established in the context of the current implementation, so as to facilitate the timely delivery of continuous media without the server’s support (e.g. to use the application as a videophone). The services currently available include:

Video/Audio reception/transmission
The user may choose either to multicast audio and video to all the participants of the group or to send audio and video to a specific recipient. In the first case, audio mixing of all the group members is performed at each recipient’s workstation. The user of the workstation can adjust the volume level for each group member from whom he/she receives audio. In the second case, direct connections are established between the two users and the application is used as a videophone application. The user may choose whichever compression scheme is available on his/her workstation, with the limitation that participants of the same group should use the same compression scheme. The agreement on a specific compression scheme for audio and video should be reached informally through direct contact between the users, without any specific application message exchange.

Chat/Whisper
Enables the exchange of text messages between participants of a group. The Whisper service in particular allows a user to send a private message to a specific recipient.
File Transfer
Allows the exchange of files between participants of a group. Files can be multicast to the entire group or sent only to a specific recipient.

Clipboard Transfer
Allows the exchange of bitmaps (or other forms of data such as charts, spreadsheets, etc.), which should first be copied to the Microsoft Windows clipboard. Like file transfer, the clipboard content can be multicast to the entire group or sent only to a specific recipient.

Slide Show
Allows one specific user (called Master) to present a series of slides (of his/her choice) to every other user within his/her group. In order to become “master”, a user should first gain possession of a “token”. This is done by sending the appropriate message to the server, which in turn informs the other users that somebody has become holder of the token.

Telepointer
Allows one specific user (called Master) to give a presentation based on an image multicast to the entire group (via the Clipboard Transfer or Slide Show service) using his/her mouse pointer. Information about the movement of the master’s mouse pointer is sent to the other users, and an image of the pointer replicates the master’s movement on each recipient’s copy of the image.

Shared Whiteboard
A typical shared window on which all participants of a group may simultaneously create simple drawings and small text regions.

This implementation addresses most of the issues stated in this paper and is strongly based on the ideas and solutions discussed in the context of the proposed architecture. Both the development process and the operation of the system supported the validity of the ideas and considerations presented in this paper.

References

[1] Fluckiger, F. (1995), Understanding Networked Multimedia, Prentice-Hall, NJ
[2] Roberts, B. (1996), “Groupware Strategies”, BYTE Magazine, vol. 21, no. 7
[3] Taylor, K., Tolly, K. (1995), “Desktop Videoconferencing: Not Ready for Prime Time”, Data Communications International, vol. 24, no. 5
[4] Quemada, J., De Miguel, T., Azcorra, A., Pavon, S., Salvachua, J., Petit, M., Larrabeiti, D., Robles, T., Huecas, G., “ISABEL: A CSCW Application for the Distribution of Events”, ………..?
[5] Dermler, G., Froitzheim, K. (1992), “JVTOS - A Reference Model for a new Multimedia Service”, 4th IFIP Conference on High Speed Networking, Proceedings, Liege, Belgium
[6] De Miguel, T., Pavon, S., Salvachua, J., Quemada, J., Chas, P.L., Fernandez-Amigo, J., Acuna, C., Rodriguez, L., Lagarto, V., Bastos, J. (1994), “ISABEL - Experimental Distributed Cooperative Work Application over Broadband Networks”, Lecture Notes in Computer Science, vol. 868, pp. 353-362
[7] Azcorra, A., De Miguel, T., Petit, M., Rodriguez, L., Acuna, C., Chas, P.L., Lagarto, V., Bastos, J. (1995), “Multicast IP support for distributed conferencing over ATM”, Networld+Interop 95, Mar. 1995, pp.9
[8] Turletti, T. (1994), “The INRIA Videoconferencing System (IVS)”, ConneXions, vol. VIII, no. 10, October 1994