In Proceedings of the ACM Conference on Organizational Computing Systems, Atlanta, Georgia, Nov 5-8, 1991
Hierarchical Conferencing Architectures for Inter-Group Multimedia Collaboration
Harrick M. Vin, P. Venkat Rangan, and Srinivas Ramanathan¹
Multimedia Laboratory, Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA 92093-0114
Abstract
Advances in computer and communication technologies have stimulated the integration of digital video and audio with computing, leading to the development of various computer-assisted collaborations. In this paper, we propose a multi-level conferencing paradigm (called super conferences) for supporting collaborative interactions between geographically separated groups of users, with each group belonging to possibly a different organization. Hierarchical communication architectures are naturally suited for carrying out media transmission in super conferences. We study the performance of hierarchical communication architectures, and present algorithms for bounding end-to-end delays of real-time media traffic in them. We derive some interesting limits on the number of participants in a group and the number of groups within a super conference, so as not to violate bandwidth and delay requirements of multimedia. At the Multimedia Laboratory at the University of California, San Diego, we have implemented a conferencing system on an environment of Sun SPARCstations and PC-ATs equipped with digital video and audio processing hardware. As an interesting application of the conferencing system, we have developed a tele-presenter by which users can remotely attend lectures in progress. We present our initial experiences with using the system.
1 Introduction
Fundamental characteristics of voice and video traffic, such as sensitivity to delay, high bandwidth requirements, and the ability to tolerate high error rates, stand in marked contrast to the requirements of data. These differences have traditionally resulted in separate networks for voice, video, and data communication. Recent advances in communication technology have made available large bandwidth at modest cost, and advances in computer technology have led to the development of high performance workstations with digital audio and video capabilities [7]. The advent of such multimedia workstations has given rise to many computer supported collaborative applications. An important class of collaborative applications is multimedia conferencing between geographically separated groups of users, each group possibly belonging to a different organization. Software paradigms and communication architectures for supporting inter-group multimedia conferencing on computer networks are the focus of this paper.
Many ongoing research projects are investigating mechanisms for person-to-person conferences and shared text-oriented workspaces. Lantz [9], and Sarin and Greif [13] have studied conferencing architectures for text and graphics, but they are not flexible enough for audio and video. Angebranndt, et al. [4] provide a client-server architecture for integrating audio into a graphics workstation environment, but the emphasis is on lower-level audio resource management rather than rich conferencing capabilities. Forsdick, et al. at BBN [8], and Aguilar, et al. at SRI [2] propose architectures for person-to-person voice conferencing. Ziegler, et al. [15, 17, 18] evaluate the performance of a distributed voice conferencing system on broadcast and point-to-point networks. Ahuja, et al. at AT&T Bell Laboratories [3], Ludwig, et al. at Bellcore [11], Casner, et al. at ISI [6], and Swinehart, et al. at Xerox PARC [14, 16] have proposed architectures for person-to-person video conferencing. However, conferencing paradigms for inter-group collaborations have not received much attention.
In this paper, we propose a multi-level conferencing paradigm (called super conferences) for supporting collaborative interactions among groups of users. Hierarchical architectures are naturally suited for carrying out media communication in super conferences. We study the performance of hierarchical communication architectures, and present algorithms for bounding end-to-end delays of real-time media traffic in them. We derive some interesting limits on the number of participants in a group and the number of groups within a super conference, so as not to violate bandwidth and delay requirements of multimedia. We have implemented a multimedia conferencing system on an environment of Sun SPARCstations and PC-ATs equipped with digital video and audio processing hardware. As an interesting application of the conferencing system, we have developed a tele-presenter by which users can remotely attend lectures in progress. We present our initial experiences with the system.
The rest of the paper is organized as follows: In Section 2, we introduce the multi-level conferencing paradigm. Hierarchical communication architectures and a model for evaluating their performance are described in Sections 3 and 4, respectively. The implementation of the conferencing system and its experimental evaluation are presented in Section 5. Finally, Section 6 concludes the paper.
¹ Electronic mail addresses - Harrick M. Vin: [email protected], P. Venkat Rangan: [email protected], Srinivas Ramanathan: [email protected].
2 Paradigms for Multimedia Conferencing
We define a conference as a basic paradigm by which interactions such as (1) communication among multiple participants, and (2) one or more participants accessing one or more servers, are carried out. Participants in these conferences can be either individual users (called simple participants), or other conferences (called group participants). Depending on the types of its participants, a conference can be classified into one of two categories:
- Simple Conferences contain only simple participants.
- Super Conferences contain at least one group participant, representing some other conference, among their participants.
If a conference C1 is a participant in a conference C, C1 is termed a sub-conference (or a group) participating in the super conference C. Note that a sub-conference can itself be recursively a super conference. It should be observed that, even though super conferences can always be flattened into simple conferences, the super conference paradigm serves as a better (more natural and efficient) abstraction to model inter-group and inter-organization collaborations. As an illustration, consider a meeting C between two groups of managers M1 and M2, belonging to two different organizations, to decide policies for technical cooperation between their organizations. The nature of the collaboration requires that members of each group discuss a policy among themselves before communicating it to the other group. This collaboration can be modeled by creating two conferences C1 and C2, among the members of groups M1 and M2, respectively, and then making both C1 and C2 participants of C (thus making C a super conference). Capturing this scenario using only simple conferences introduces special requirements. For instance, a new participant joining C1 has to determine all the conferences that C1 is participating in (i.e., C in this case), and then join each of them.
In comparison, using a super conference abstraction for C has the advantage that participation in C1 (or C2) automatically implies participation in C , thereby capturing the notion of group participation in conferences. In addition, the super conference abstraction provides a natural mechanism for separating intra-group and inter-group collaborations. The functionality of a conferencing system that supports such an abstraction can be divided into two parts: Connection Management and Media Communication. Connection management refers to the intelligence required to establish and control the progress of conferences. The connection manager handles control functions such as naming and addressing of participants, negotiations and resource allocation for setting up media connections, enforcement of access rights, etc. A model for connection management that captures the notion of super conference is described in [12]. Once a conference is established, media communication can take place among its participants. Architectures for media communication in super conferences are outlined in the next section.
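
The distinction between simple and super conferences can be illustrated with a small Python sketch. It is not the connection manager of [12]; the class name `Conference` and its methods are hypothetical and chosen only for illustration. A participant is either a user name (a simple participant) or another `Conference` (a group participant).

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class Conference:
    """Hypothetical model: participants are user names or other conferences."""
    name: str
    participants: List[Union[str, "Conference"]] = field(default_factory=list)

    def is_super(self) -> bool:
        # A super conference contains at least one group participant.
        return any(isinstance(p, Conference) for p in self.participants)

    def join(self, participant: Union[str, "Conference"]) -> None:
        self.participants.append(participant)

    def flatten(self) -> Set[str]:
        # Recursively collect every simple participant reachable from here.
        users: Set[str] = set()
        for p in self.participants:
            if isinstance(p, Conference):
                users |= p.flatten()
            else:
                users.add(p)
        return users


c1 = Conference("C1", ["alice", "bob"])    # members of group M1
c2 = Conference("C2", ["carol", "dave"])   # members of group M2
c = Conference("C", [c1, c2])              # super conference

c1.join("erin")  # joining C1 is enough; no separate join of C is needed
```

After `erin` joins `c1`, `c.flatten()` already includes `erin`, which mirrors the statement that participation in C1 automatically implies participation in C.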
3 Communication Architectures for Super Conferences
In a multi-party conference, each participant must receive media information from all the other participants. Packets of the same medium from the participants are combined to form a composite packet, and then played back. This process is called mixing (also referred to as bridging [1]). The technique used for combining packets from different sources depends on the media. In the case of audio, mixing multiple streams involves digitally summing the audio samples and then attenuating the result. Mixing in the video domain may require some image processing: in the simplest case, it may require reducing the individual video images to a fraction of the frame size, and combining the fractions to form a composite frame. For example, in a conference consisting of four participants, video frames from each of the participants may be reduced to the size of a quadrant, and the four quadrants combined to form a composite image. The architecture for communication among participants of a conference can be centralized at one end of the spectrum, or fully distributed at the other end of the spectrum. The centralized architecture requires that each participant in a conference transmit media information to a central mixer. The mixer receives packets from all the participants, creates a composite packet by mixing the received packets, and then transmits it to all the participants. Each participant, on receiving the composite media packet, may have to perform some media dependent processing of the composite packet (such as removing its own contribution in the case of audio) before scheduling it for playback. At the other end of the spectrum is the distributed architecture, which requires that each participant in a conference transmit media information to each of the other participants. Mixing is performed by each participant independently.
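
A minimal Python sketch of audio mixing at a central mixer, and of a participant removing its own contribution, is given below. It assumes linear (not companded) integer samples and attenuation by simple averaging; the function names `mix` and `remove_own` are hypothetical.

```python
from typing import List, Sequence


def mix(streams: Sequence[Sequence[int]]) -> List[float]:
    """Sum the streams sample by sample, then attenuate by averaging."""
    n = len(streams)
    return [sum(samples) / n for samples in zip(*streams)]


def remove_own(mixed: Sequence[float], own: Sequence[int], n: int) -> List[float]:
    """Undo the attenuation, subtract the participant's own samples,
    and renormalize over the remaining n - 1 participants."""
    return [(m * n - o) / (n - 1) for m, o in zip(mixed, own)]


a = [100, -200, 300]
b = [10, 20, 30]
c = [-40, 50, -60]

composite = mix([a, b, c])                # what the mixer sends to everyone
heard_by_a = remove_own(composite, a, 3)  # what participant a plays back
```

Here `heard_by_a` equals the average of `b` and `c` alone, so each participant hears everyone except itself even though the mixer transmits a single common packet.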
Whereas the centralized architecture is simple to implement but inflexible (i.e., does not provide features such as autonomous control of each media stream by participants), the distributed architecture is flexible but incurs duplication of mixing computation and bandwidth usage. Providing flexibility in centralized architectures also results in duplication of mixing computation. Neither architecture scales well (with either the number of participants or the geographical separation between participants) if the network, the network interface, or the processing power at the mixer is the bottleneck. By clustering together participants, and using a hierarchical mixing architecture (see Figure 1), we can bound the bandwidth and processing requirements at the mixers. In a mixing hierarchy, participants constitute the leaf nodes, and the mixers constitute non-leaf nodes. Each mixer receives media packets from its children, mixes them, and sends the composite packet to its parent. The mixer that is at the root of the hierarchy forwards the final mixed packet to each of the leaf nodes. The bandwidth required for packet reception at each mixer is proportional to the number of its children, whereas the bandwidth for packet transmission is that of sending to just one parent. (Even though the root mixer has to send a mixed packet to each of the participants, since the mixed packet is common to all the participants, the root mixer needs to make only one packet transmission by using multicasting.)
[Diagram: agents (leaf nodes) in Group-1 and Group-2 send to their group mixers at height 1 (group sizes 3 and 2); the mixer at height 2 (group size 2) multicasts to all participants.]
Figure 1: A hierarchical architecture for mixing
A generalization of the hierarchical architecture yields a graph-structured mixing architecture. In a non-hierarchical graph, there may be multiple paths between a participant and a mixer. Hence, a mixer may receive multiple mixed packets containing the same participant's packet. To eliminate the duplication, the participant's packet may have to be transmitted in addition to the mixed packet, leading to wastage of bandwidth. Since graph-structured architectures do not afford any special advantages over hierarchical ones, they are not very interesting for mixing. A special case of a hierarchical architecture is a directed ring, which can be thought of as a mixing tree in which each node has exactly one child. Such a configuration is appropriate for token ring based networks, and is analyzed by Ziegler, et al. [18].
In super conferences, which may contain a large number of geographically separated participants, associating a mixer (or a set of mixers) with each sub-conference gives rise to a mixing hierarchy. Each mixer associated with a sub-conference can either multicast the composite packet to all of its participants (intra-group communication) or to its parent (inter-group communication), thus providing a mechanism to separate intra-group and inter-group collaborations. Hence, hierarchical architectures are naturally suited for super conferences. We now analyze the limits imposed by delay and bandwidth considerations on the height of the hierarchy and the number of children of each mixer, which leads to upper bounds on the number of participants and groups in super conferences.
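
The following Python sketch illustrates hierarchical mixing on a tree shaped like the one in Figure 1 (three agents under one group mixer, two under another, and a root mixer above both). The classes `Agent` and `Mixer` and the function `mix_up` are hypothetical, and the final attenuation step is omitted so that the sums are easy to follow.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Agent:
    samples: List[int]          # one media packet from a participant


@dataclass
class Mixer:
    children: List[Union["Mixer", Agent]] = field(default_factory=list)


def mix_up(node: Union[Mixer, Agent]) -> List[int]:
    """Return the sample-wise sum of all agents below this node.
    Each mixer only combines what its own children send it."""
    if isinstance(node, Agent):
        return node.samples
    partial = [mix_up(child) for child in node.children]
    return [sum(column) for column in zip(*partial)]


group1 = Mixer([Agent([1, 2]), Agent([3, 4]), Agent([5, 6])])
group2 = Mixer([Agent([10, 20]), Agent([30, 40])])
root = Mixer([group1, group2])

final = mix_up(root)  # the root would multicast this once to all five agents
```

The root receives only two streams (one per group) rather than five, which is the sense in which the hierarchy bounds the reception bandwidth and mixing work at any single mixer.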
4 Analysis of Hierarchical Communication Architectures
In order to derive the limits on the performance of hierarchical mixing, we develop a model that relates delay and bandwidth requirements to group size, height of a mixing tree, and network and system parameters. Table 1 shows the various parameters of interest and their notation.
Symbol       Definition
$B_{net}$    network bandwidth (bits/sec)
$R_{pkt}$    media packet generation rate at a participant (packets/sec)
$T_{pkt}$    media packet generation period $= 1/R_{pkt}$ (sec/packet)
$s_{pkt}$    size of each media packet (bits/packet)
$t_{net}$    transmission delay of a packet (sec)
$t_{prp}$    propagation delay of a packet $= s_{pkt}/B_{net}$ (sec)
$t_{mix}$    time to mix two media packets (sec)
$t_{dif}$    maximum difference in generation times of packets forming a composite packet (sec)
$D_{pkt}$    packetization delay for media packets (sec)
$D_{ply}$    processing delay of composite packet before playback (sec)
$D_{com}$    total communication delay of a packet (sec)
$D_{end}$    total end-to-end delay of a packet (sec)
$N_{max}$    maximum number of participants in a conference, based on bandwidth limitation
$N_{sup}$    number of participants in a super conference
$N_{grp}$    number of participants in a group
$H$          height of a mixing tree
Table 1: Definitions of symbols used in the paper
In this analysis, we make the following assumptions:
- All the participants of a super conference, together with their respective mixers, share bandwidth on a local area network (e.g., Ethernet). Due to the broadcast nature of such networks, we assume the availability of multicast communication.
- Transmission latencies of media packets on the network are bounded.
- All the participants generate media packets at the same rate (namely, $R_{pkt}$).
- The clocks of the participants and mixers are not globally synchronized.
We will first derive limits on the number of participants in a group $N_{grp}$, and then on the number of groups within a super conference (which, given $N_{grp}$, will be determined by limits on the height of the mixing hierarchy).
4.1 Limits on Participants in a Group
Given that $R_{pkt}$ is the rate of packet generation, $s_{pkt}$ is the size of each packet, and $B_{net}$ is the network bandwidth, the limit on the maximum number of media sources $N_{max}$ that can simultaneously transmit packets onto the network is given by
$$N_{max} \cdot R_{pkt} \cdot s_{pkt} \le B_{net} \;\Longrightarrow\; N_{max} \le \frac{B_{net}}{R_{pkt} \cdot s_{pkt}} \tag{1}$$
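
As a quick numerical illustration of Equation 1, the Python sketch below evaluates the bound for the audio parameters reported in Section 5 (10 Mbits/sec Ethernet, 16 packets/sec, 512 samples of 8 bits per packet). The function name `max_sources` is hypothetical, and the packet size counts payload bits only, which is an assumption of this sketch.

```python
import math


def max_sources(b_net_bps: float, r_pkt: float, s_pkt_bits: float) -> int:
    """Equation 1: largest N with N * R_pkt * s_pkt <= B_net."""
    return math.floor(b_net_bps / (r_pkt * s_pkt_bits))


# Payload only: 512 samples x 8 bits = 4096 bits per packet (headers ignored).
n_max = max_sources(b_net_bps=10e6, r_pkt=16, s_pkt_bits=512 * 8)
```

With payload bits only, `n_max` evaluates to 152. Section 5 reports a value of 150; the small difference may come from per-packet overhead that this sketch does not model.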
Note that $N_{max}$ provides an upper bound on the number of participants that can be supported in a super conference. If the participants of a super conference span multiple networks (as opposed to sharing bandwidth on the same network, as assumed in our analysis), the value of $N_{max}$ can be increased even further. The bound on the maximum number of participants ($N_{grp}$) that can be associated with a single mixer (i.e., the maximum number of participants in a group) arises from the limitation in the rate of packet reception at the network-host interface, and the packet processing overhead [5]. The following analysis derives a bound on the value of $N_{grp}$: During each packet generation period, $N_{grp}$ packets are sequentially received on the network by the mixer. Since the propagation delay² for each packet is $t_{prp}$, the total propagation delay is given by $N_{grp} \cdot t_{prp}$. Assuming that mixing is performed by averaging of media data (as in the case of audio), the maximum time for mixing $N_{grp}$ media packets is $(N_{grp} - 1) \cdot t_{mix}$ (packet losses due to insufficient bandwidth, insufficient rate of packet reception at the mixer, and other errors may reduce the overhead of mixing). In the presence of multicasting, the mixer has to perform only one transmission to send the mixed media packet to all the participants. Since all the above functions must be performed during each packet generation period, and if we assume that mixing of media packets is initiated after receiving all the packets from the network, we obtain:
$$N_{grp} \cdot t_{prp} + (N_{grp} - 1) \cdot t_{mix} + t_{prp} \le T_{pkt} \;\Longrightarrow\; N_{grp} \le \frac{T_{pkt} + (t_{mix} - t_{prp})}{t_{mix} + t_{prp}} \tag{2}$$
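
Equation 2 can be checked numerically with the short Python sketch below. The function names are hypothetical. The timing values are the measurements quoted in Section 5 ($t_{prp}$ = 1.64 ms, $t_{mix}$ = 1.35 ms); the packet period is computed here as 512 samples at 8000 samples/sec, i.e. 64 ms, which is an assumption of this sketch.

```python
import math


def max_group_size(t_pkt: float, t_prp: float, t_mix: float) -> int:
    """Equation 2: largest N_grp that fits in one packet period."""
    return math.floor((t_pkt + t_mix - t_prp) / (t_mix + t_prp))


def fits(n: int, t_pkt: float, t_prp: float, t_mix: float) -> bool:
    """Left-hand inequality of Equation 2: receive n packets, mix them
    with n - 1 pairwise mixes, and multicast one composite packet."""
    return n * t_prp + (n - 1) * t_mix + t_prp <= t_pkt


T_PKT_MS = 512 / 8000 * 1000   # 64 ms (assumed, see lead-in)
T_PRP_MS = 1.64
T_MIX_MS = 1.35

n_grp = max_group_size(T_PKT_MS, T_PRP_MS, T_MIX_MS)
```

With these inputs the bound evaluates to 21, consistent with the value reported in Section 5. Using a period of 66.67 ms instead would give 22, so the result is sensitive to the exact period assumed.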
Note that the analysis for Equation 2 assumes that each participant continuously transmits media packets to the mixer. In many applications such as speech tele-conferencing (as opposed to a tele-orchestra, in which each participant is a performer who is continuously transmitting music), silence detection can be used to transmit only those packets that contain speech levels above a threshold. For instance, if $f_{act}$ represents the fraction of participants who may speak (i.e., be active) simultaneously, then Equation (2) changes to
$$N_{grp} \cdot f_{act} \cdot t_{prp} + (N_{grp} \cdot f_{act} - 1) \cdot t_{mix} + t_{prp} \le T_{pkt} \;\Longrightarrow\; N_{grp} \le \frac{T_{pkt} + (t_{mix} - t_{prp})}{f_{act} \cdot (t_{mix} + t_{prp})} \tag{3}$$
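
The effect of silence detection in Equation 3 can be seen by sweeping $f_{act}$ in a short Python sketch. The function name is hypothetical; the timing values are those quoted in Section 5, with the packet period taken as 64 ms (512 samples at 8000 samples/sec), an assumption of this sketch. The chosen values of $f_{act}$ are purely illustrative.

```python
import math


def max_group_size_with_silence(t_pkt: float, t_prp: float,
                                t_mix: float, f_act: float) -> int:
    """Equation 3: group size bound when only a fraction f_act of the
    participants transmit in any packet period."""
    if not 0 < f_act <= 1:
        raise ValueError("f_act must lie in (0, 1]")
    return math.floor((t_pkt + t_mix - t_prp) / (f_act * (t_mix + t_prp)))


# Times in milliseconds; f_act values are illustrative, not measured.
bounds = {f: max_group_size_with_silence(64.0, 1.64, 1.35, f)
          for f in (1.0, 0.5, 0.25)}
```

Under these assumptions the bound grows from 21 at $f_{act} = 1$ to 42 at $f_{act} = 0.5$ and 85 at $f_{act} = 0.25$, i.e. inversely with the fraction of active speakers.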
Since $f_{act} \le 1$, Equation (3) yields a larger bound on the maximum number of participants in a group. For typical network and media parameters, the bound obtained by Equations 2 and 3 is much smaller than the one obtained from the bandwidth limitations (Equation 1). More precisely, the packet reception rate at the network-host interface and the packet processing overhead limit the number of participants in a group to a much smaller value than the limit imposed by the bandwidth considerations³. A hierarchical architecture supports a larger number of participants in a conference by increasing the height of the mixing tree while limiting the size of each group. Specifically, for a hierarchy of height $H$, if the number of children of each mixer in the hierarchy is bounded by $N_{grp}$, the maximum number of participants that can be supported in a super conference is bounded by $(N_{grp})^{H}$.
Note that such a mixing hierarchy creates a pipelined system, in which the end-to-end delay increases with the height of the hierarchy. The interactive and real-time nature of collaborations requires that the end-to-end delays of media packets be bounded. We will now analyze the effects of delay constraints on the height $H$ of a mixing hierarchy.
² In practice, collisions on the network and other processing overhead increase the average time for a successful transmission over a network. The propagation delay $t_{prp}$ provides the lower bound on the transmission overhead.
³ In the case of centralized and distributed communication architectures, a mixer receives packets from all the participants in the conference. Hence, the bound on the value of $N_{grp}$ derived in Equations 2 and 3 also provides the bound on the maximum number of participants that can be supported in a conference.
4.2 Limits on the Height of the Hierarchy
The end-to-end delay of a media packet is composed of the following:
1. Packetization delay, $D_{pkt}$, which includes the time to collect all the samples constituting a packet, and the processing delay to perform operations such as silence detection. If packets are generated at intervals of $T_{pkt}$, the packetization delay $D_{pkt} \ge T_{pkt}$.
2. Total communication delay, $D_{com}$, which includes network transmission times as well as queueing and protocol processing overheads from the time a packet leaves a participant to the time a mixed packet containing it returns to the participant.
3. Playback overhead, $D_{ply}$, which is the time taken by a participant to process a received mixed media packet before playing it back. This processing depends on the media; for instance, in the case of audio, the participant has to remove its own contribution from the mixed packet by subtraction and subsequent normalization.
Thus, the total end-to-end delay will be
$$D_{end} = D_{pkt} + D_{com} + D_{ply} \tag{4}$$
Note that only the communication delay $D_{com}$ depends on the height of the mixing hierarchy. Each mixer in the hierarchy, when it receives the first packet (from one of its children) that goes to form a new mixed packet, has to delay the completion of the mixing operation and transmission of the composite packet until it receives all the other packets (from the remaining children) that constitute the mixed packet. However, if the network is unreliable, some of the packets that go to form a mixed packet may not arrive at the mixer. Hence, an important question is: how long should a mixer wait for packets from sources before deciding to transmit a partially mixed packet? A simple solution is for the mixer to transmit a partially mixed packet when a media packet that goes to form a subsequent mixed packet is received from one of the sources. However, this causes waiting delays of the order of packet duration $T_{pkt}$ at each mixer.
Hence, for a hierarchy of height $H$, the communication delay can be given by⁴:
$$D_{com} = H \cdot T_{pkt} + (H + 1) \cdot t^{max}_{net}$$
Note that $T_{pkt}$ cannot be chosen to be very small (typical values for voice packets on Ethernet are 20 to 150 ms), mainly to keep the packet transmission overhead low. Hence, even in small mixing hierarchies, the communication delay will turn out to be unacceptably large for supporting interactive and real-time multimedia applications. We now present an algorithm that removes the proportional dependence of $D_{com}$ on $T_{pkt}$. In this algorithm, each mixer maintains information about the expected generation times of packets at its children. When a participant $P_s$ joins the conference, it transmits a probe packet $n_s$ up the hierarchy to enable each intermediate mixer to compute the earliest and latest generation times for the packet. If the packet reaches a mixer at height $h$ at time $\tau(n_s)$ (according to the mixer's clock), the mixer computes its earliest and latest generation times as follows:
$$t^{e}_{gen}(n_s) = \tau(n_s) - h \cdot t^{max}_{net}$$
$$t^{l}_{gen}(n_s) = \tau(n_s) - h \cdot t^{min}_{net}$$
Since packets from source $P_s$ are generated regularly at an interval of $T_{pkt}$ starting from the generation time of $n_s$, the mixer can estimate the earliest and the latest generation times of packet $(n_s + k)$ as follows:
$$t^{e}_{gen}(n_s + k) = t^{e}_{gen}(n_s) + k \cdot T_{pkt}$$
$$t^{l}_{gen}(n_s + k) = t^{l}_{gen}(n_s) + k \cdot T_{pkt}$$
⁴ $H \cdot t^{max}_{net}$ for the transmission delay from a leaf to the root up the hierarchy, and an additional $t^{max}_{net}$ for the multicast of the final mixed packet from the root back to the leaf.
Consider the process of forming the $k$th mixed packet at the mixer. If the mixer forms a composite packet by mixing packets received from participants $P_1, P_2, \ldots, P_m$, then the minimum earliest and maximum latest generation times of packets constituting the $k$th mixed packet are given by:
$$t^{min}_{gen}(k) = \min_{s \in \{1, 2, \ldots, m\}} t^{e}_{gen}(n_s + k)$$
$$t^{max}_{gen}(k) = \max_{s \in \{1, 2, \ldots, m\}} t^{l}_{gen}(n_s + k)$$
Hence, the earliest and latest arrival times of packets constituting the $k$th mixed packet can be pre-computed as follows:
$$\tau^{e}_{min}(k) = h \cdot t^{min}_{net} + t^{min}_{gen}(k)$$
$$\tau^{l}_{max}(k) = h \cdot t^{max}_{net} + t^{max}_{gen}(k)$$
Thus, all packets constituting the $k$th mixed packet must arrive at a mixer at height $h$ within the interval $[\tau^{e}_{min}(k), \tau^{l}_{max}(k)]$. At the root mixer, $h = H$ (height of the entire tree), and $\tau^{l}_{max}(k) = H \cdot t^{max}_{net} + t^{max}_{gen}(k)$. Hence, the maximum aggregate communication delay suffered by a media packet from a leaf to the root is given by $\tau^{l}_{max}(k) - t^{min}_{gen}(k)$. Adding $t^{max}_{net}$ for the transmission of the final mixed packet from the root to all the leaf nodes, we obtain the maximum aggregate communication delay for a media packet as:
$$D_{com} = \left( (t^{max}_{gen} + H \cdot t^{max}_{net}) - t^{min}_{gen} \right) + t^{max}_{net}$$
If the maximum difference between the generation times of the packets being mixed together is bounded by $t_{dif}$, we get $t^{max}_{gen} - t^{min}_{gen} \le t_{dif}$. Hence, we obtain
$$D_{com} \le t_{dif} + (H + 1) \cdot t^{max}_{net}$$
The above computation of $D_{com}$ has ignored the overhead of mixing, which incurs a delay of $(N_{grp} - 1) \cdot t_{mix}$ at each mixer, increasing the total communication delay to:
$$D_{com} = t_{dif} + H \cdot \left( t^{max}_{net} + (N_{grp} - 1) \cdot t_{mix} \right) + t^{max}_{net}$$
Substituting this in Equation (4), we obtain that the total end-to-end delay of media packets in the mixing network is given by
$$D_{end} = D_{pkt} + \left( t_{dif} + H \cdot \left( t^{max}_{net} + (N_{grp} - 1) \cdot t_{mix} \right) + t^{max}_{net} \right) + D_{ply} \tag{5}$$
The maximum tolerable end-to-end delay, $D_{end}$, is usually determined by the nature of media and the application.
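
The probe-based bookkeeping described above can be sketched in Python as follows. This is an illustrative sketch rather than the implemented system: the class `MixerSchedule`, its method names, and the numeric delays in the example are all hypothetical. Times are in seconds on the mixer's own clock.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class MixerSchedule:
    height: int        # h: number of hops between the leaves and this mixer
    t_net_min: float   # minimum one-hop network delay
    t_net_max: float   # maximum one-hop network delay
    t_pkt: float       # packet generation period

    def __post_init__(self) -> None:
        # source id -> (earliest, latest) generation time of its probe packet
        self.gen: Dict[str, Tuple[float, float]] = {}

    def on_probe(self, source: str, arrival: float) -> None:
        """Record generation-time bounds for a probe that arrived at `arrival`."""
        earliest = arrival - self.height * self.t_net_max
        latest = arrival - self.height * self.t_net_min
        self.gen[source] = (earliest, latest)

    def arrival_window(self, k: int) -> Tuple[float, float]:
        """Interval in which all packets of the k-th mixed packet must arrive."""
        t_gen_min = min(e for e, _ in self.gen.values()) + k * self.t_pkt
        t_gen_max = max(l for _, l in self.gen.values()) + k * self.t_pkt
        return (self.height * self.t_net_min + t_gen_min,
                self.height * self.t_net_max + t_gen_max)


# Illustrative values only: 1-3 ms network delay, 64 ms packet period.
sched = MixerSchedule(height=1, t_net_min=0.001, t_net_max=0.003, t_pkt=0.064)
sched.on_probe("P1", arrival=10.000)
sched.on_probe("P2", arrival=10.020)
window = sched.arrival_window(2)
```

For these inputs the window for the mixed packet with $k = 2$ is approximately [10.126, 10.150]. A mixer that has not received every constituent packet by the end of the window can transmit the partially mixed packet without waiting a full packet period.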
Solving Equation (5) for the height of the mixing hierarchy, we obtain
$$H_{max} \le \frac{D_{end} - (D_{pkt} + D_{ply} + t_{dif} + t^{max}_{net})}{t^{max}_{net} + (N_{grp} - 1) \cdot t_{mix}} \tag{6}$$
Given that the maximum number of participants in a group ($N_{grp}$) is restricted to the value derived in Equation 2, Equation 6 derives the maximum permissible height of a mixing hierarchy that can satisfy the end-to-end delay constraints. Note that a hierarchy of height $H$ contains at most $(N_{grp})^{H}$ participants and $\frac{N_{grp}^{H} - 1}{N_{grp} - 1}$ mixers. Hence, given the bandwidth constraints of the network (and hence, the value of $N_{max}$ derived from Equation 1), we have to determine $H \le H_{max}$ such that
$$N_{grp}^{H} + \frac{N_{grp}^{H} - 1}{N_{grp} - 1} \le N_{max}$$
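
Equations 5 and 6 and the final bandwidth constraint can be evaluated with the Python sketch below. The function names are hypothetical. The inputs are the measurements of Table 2 and Section 5 (in ms); the value of $t^{max}_{net}$ is not reported, so it is set equal to the measured $t_{prp}$ = 1.64 ms, which is an assumption of this sketch.

```python
import math


def end_to_end_delay(h, n_grp, d_pkt, d_ply, t_dif, t_net_max, t_mix):
    """Equation 5: end-to-end delay for a mixing hierarchy of height h."""
    return d_pkt + (t_dif + h * (t_net_max + (n_grp - 1) * t_mix) + t_net_max) + d_ply


def max_height(d_end, n_grp, d_pkt, d_ply, t_dif, t_net_max, t_mix) -> int:
    """Equation 6: largest height that keeps the delay within d_end."""
    fixed = d_pkt + d_ply + t_dif + t_net_max
    per_level = t_net_max + (n_grp - 1) * t_mix
    return math.floor((d_end - fixed) / per_level)


def sources_in_full_tree(n_grp: int, h: int) -> int:
    """Participants plus mixers in a full hierarchy of height h."""
    participants = n_grp ** h
    mixers = (participants - 1) // (n_grp - 1)
    return participants + mixers


# Times in ms; t_net_max = 1.64 is assumed (see lead-in).
params = dict(n_grp=21, d_pkt=68.3, d_ply=5.438, t_dif=66.67,
              t_net_max=1.64, t_mix=1.35)
h_max = max_height(d_end=200.0, **params)
delay_at_h_max = end_to_end_delay(h_max, **params)
```

Under these assumptions `h_max` is 2 and the corresponding delay is about 199.3 ms. A full tree of height 2 with 21 children per mixer would contain 441 participants and 22 mixers, i.e. 463 sources, so in that case the bandwidth constraint rather than the delay constraint determines how much of the tree can actually be populated.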
5 Implementation and Experience
At the Multimedia Laboratory at UCSD, we have implemented a conferencing system on a network of multimedia stations, each consisting of a Sun SPARCstation, a PC-AT, a video camera, and a TV monitor (see Figure 2). The SPARCstations and PC-ATs are connected via Ethernets. The PC-ATs are equipped with digital video processing hardware produced by UVC Corporation [10]. The video hardware can digitize and compress motion video at real-time rates with a resolution of 480x200 pixels and 12 bits of color information per pixel. The SPARCstations are equipped with audio hardware that can digitize voice at 8 KBytes/sec.
[Diagram: two multimedia stations, each comprising a workstation, a PC-AT, a camera, and a video monitor, connected by Ethernets through a gateway.]
Figure 2: Hardware Configuration
Symbol       Operation                                                           Time (in ms)
$t_{mix}$    Mixing two packets                                                  1.35
$t_{dif}$    Maximum difference in generation time of packets mixed together    66.67
$D_{pkt}$    Packetization delay                                                 68.3
$D_{ply}$    Processing delay before playback                                    5.438
Table 2: Timing measurements of the mixing parameters
5.1 Experimental Performance Evaluation
Using a network of a large number of SPARCstations that was available to us, we carried out several experiments to evaluate the performance limits of audio communication and mixing in multimedia conferences. Even though the system supports video conferencing, since the number of PC-ATs with video hardware is four in our current setup, we had to restrict the experiments to only audio. SPARCstations encode audio signals into 8 bit μ-law⁵ samples at a rate of 8000 samples/sec. Audio samples are packetized and transmitted on the Ethernet. In order to strike a balance between network transmission overhead (which favors large packet sizes), and the packetization delay (which favors small packet sizes), the audio packet size was chosen to be 512 samples, yielding $T_{pkt}$ = 66.67 ms, and $R_{pkt}$ = 16 packets/sec. The overhead for a successful transmission of a packet, $t_{prp}$, was measured to be 1.64 ms. The timing measurements for various operations of the audio conferencing system are given in Table 2. Given that the network bandwidth is 10 Mbits/sec (Ethernet), using Equation 1, we get $N_{max}$ = 150. Similarly, using Equation 2, we obtain $N_{grp}$ = 21. From Equation 6, for an application that permits a maximum allowable end-to-end delay of 200 ms, we obtain $H_{max}$ = 2. In order to validate the performance model, we experimentally measured the maximum number of participants that can be supported in a group. Figure 3 shows the variation of the fraction of packets reaching a mixer with increase in the size of its group. When that fraction goes below 98%, there is a rapid deterioration of voice quality, and the mixing hierarchy breaks down. The breakdown point yields a maximum group size of 20 in the presence of multicasting, and 12 in its absence, both of which closely match the values estimated using our model. We also measured the effect of gateways, and the effect of load on conferencing.
Figure 4 shows the reduction in the maximum group size when the mixer and its group of participants are separated by gateways. The first gateway causes a significant reduction of the maximum group size from 12 to 9, whereas the second gateway causes only a slight reduction. This is because the reduction is mainly due to the reduced effective bandwidth due to gateways, and not due to increased transmission delay.
⁵ μ-law is a CCITT standard for encoding audio.
[Plot: % of packets received at the mixer versus group size (0 to 30); curves: with multicasting, without multicasting.]
Figure 3: Performance of audio conferencing with increase in group size
[Plot: % of packets received at the mixer versus group size (0 to 20); curves: two gateways, one gateway, single Ethernet.]
Figure 4: Effect of gateways on audio conferencing
Figures 5(a), 5(b) and 5(c) depict the effect of load on the conferencing system. Figure 5(a) shows the performance of a conferencing system when a mixer serves multiple groups simultaneously. Figures 5(b) and 5(c) illustrate the effect of running compute intensive jobs at the mixer and the participants. It may be observed that increased load at the mixer significantly reduces the fraction of packets reaching the mixer. As can be expected, the effect due to load at a participant is less dramatic.
5.2 Related Collaborative Applications
Using our basic conferencing paradigm, we have built several multimedia collaborative applications that are in daily use in our laboratory. Two of the more useful ones are as follows:
[Plots: (a) % of packets received at the mixer versus number of groups (0 to 10), for group sizes 12 and 10; (b) % of packets reaching the mixer versus number of compilation jobs (0 to 3), with jobs running at the participant and at the mixer; (c) % of packets sent that return to the participant versus number of compilation jobs (0 to 3), with jobs run at the participant and at the mixer.]
Figure 5: Effect of load on audio conferencing
Tele-presenter:
The tele-presenter enables users at their workstations to remotely attend lectures in progress (see Figure 6). Lectures are broadcast as conferences, which users can join or leave at any time. Each participant in a tele-presentation receives snapshots of video (the snapshots can be arranged to be captured whenever the speaker puts up a new slide for presentation), as well as continuous audio originating from the lecture.
Figure 6: A tele-presentation in progress
On-line conference recording system:
Users can record the proceedings of one or more conferences with or without themselves participating in those conferences.
6 Conclusion
We have proposed a multi-level conferencing paradigm (called super conferences) for supporting inter-group collaborations. Hierarchical architectures are naturally suited for carrying out media communication in super conferences. We have presented algorithms for bounding end-to-end delays of real-time media traffic in hierarchical communication architectures. Constraints on bandwidth and delay requirements of real-time traffic yield some interesting limits on the number of participants in a group and the number of groups that can be supported within a super conference. We have implemented a multimedia conferencing system on an environment of Sun SPARCstations and PC-ATs equipped with digital video and audio processing hardware. Our experimental evaluations have shown that the maximum group size is 12 without multicasting, and 20 with multicasting, which corroborates the estimates obtained from our model. The conferencing system has also served as a basis for developing collaborative applications such as the tele-presenter and the on-line conference recorder.
References
[1] E.J. Addeo, A.D. Gelman, and A.B. Dayao. Personal Multi-Media Multi-Point Communication Services For Broadband Networks. In Proceedings of the IEEE Globecom'88 Conference, pages 53-57, Nov 1988.
[2] L. Aguilar, J.J. Garcia-Luna-Aceves, D. Moran, E.J. Craighill, and R. Brungardt. Architecture for A MultiMedia Tele-Conferencing System. Proceedings of the SIGCOMM'86 Symposium on Communications Architectures and Protocols, Stowe, VT, pages 126-136, August 5-7, 1986.
[3] S. R. Ahuja, J. Ensor, and D. Horn. The Rapport Multimedia Conferencing System. In Proceedings of COIS'88 Conference on Office Information Systems, Palo Alto, CA, pages 1-8, March 23-25, 1988.
[4] S. Angebranndt, R. L. Hyde, D. H. Luong, N. Siravara, and C. Schmandt. Integrating Audio and Telephony in a Distributed Workstation Environment. In Proceedings of Summer 1991 USENIX Conference, Nashville, TN, pages 419-436, June 10-14, 1991.
[5] D. R. Boggs, J. C. Mogul, and C. A. Kent. Measured Capacity of an Ethernet: Myths and Reality. In Proceedings of SIGCOMM'88, pages 222-234, August 1988.
[6] S. Casner, K. Seo, W. Edmond, and C. Topolcic. N-Way Conferencing with Packet Video. Proceedings of the Third International Workshop on Packet Video, Morristown, NJ, March 22-23, 1990.
[7] P. Cochrane and M. Brain. Future Optical Fiber Transmission Tech. and Networks. IEEE Communications Magazine, pages 45-60, November 1988.
[8] H.C. Forsdick. Explorations in Real-Time Multi-Media Conferencing. Proceedings of the 2nd International Symposium on Computer Message Systems, IFIP, pages 331-347, September 1985.
[9] K.A. Lantz. An Experiment in Integrated Multimedia Conferencing. In Proceedings of CSCW'86, pages 267-275, December 1986.
[10] M. Leonard. Compression Chip Handles Real-Time Video and Audio. Electronic Design, 38(23):43-48, December 1990.
[11] L.F. Ludwig and D.F. Dunn. Laboratory for Emulation and Study of Integrated and Coordinated Media Communication. Proceedings of SIGCOMM'88 Symposium on Communications Architectures and Protocols, Austin, TX, pages 283-291, August 3-5, 1988.
[12] P. Venkat Rangan and H. M. Vin. Multimedia Conferencing as A Universal Paradigm for Collaboration. In Proceedings of the Eurographics Workshop on Multimedia Systems, Applications, and Interaction, April 1991.
[13] S. Sarin and I. Greif. Computer-Based Real-Time Conferences. IEEE Computer, 18(10):33-45, October 1985.
[14] H. M. Vin, P. T. Zellweger, D. C. Swinehart, and P. Venkat Rangan. Multimedia Conferencing in the Etherphone Environment. To appear in IEEE Computer - Special Issue on Multimedia Information Systems, October 1991.
[15] G. Weiss and C. Ziegler. A Comparative Analysis of Implementation Mechanisms for Packet Voice Conferencing. In Proceedings of INFOCOM'90, pages 1062-1070, June 1990.
[16] P. T. Zellweger, D. B. Terry, and D.C. Swinehart. An Overview of the Etherphone System and Its Applications. Proceedings of the 2nd IEEE Conference on Computer Workstations, pages 160-168, March 1988.
[17] C. Ziegler and G. Weiss. Mechanisms for Integrated Voice and Data Conferencing. In Proceedings of the SIGCOMM'90 Symposium, pages 101-107, 1990.
[18] C. Ziegler, G. Weiss, and E. Friedman. Implementation Mechanisms for Packet Switched Voice Conferencing. IEEE Journal on Selected Areas in Communications, 7(5):698-706, June 1989.