A QoS Architecture for Collaborative Virtual Environments
Chris Greenhalgh, Steve Benford and Gail Reynard
School of Computer Science and Information Technology, The University of Nottingham, Nottingham, UK
{cmg, sdb, gtr}@cs.nott.ac.uk
ABSTRACT
We present a QoS architecture for collaborative virtual environments (CVEs), focusing on the management of streamed video within shared virtual worlds. Users express QoS requirements by negotiating levels of mutual awareness using our previously defined spatial model of interaction. The architecture uses these awareness values as part of dynamic QoS management. A key aspect of the architecture is that it maintains a balance between the needs of a group of users as a whole (e.g., which streams are admitted onto a shared network) versus those of individual users within the group (e.g., which streams are subscribed to by a local host). We walk through a demonstration scenario, a virtual shopping mall, to show the architecture at work.

INTRODUCTION
Collaborative virtual environments (CVEs) are distributed multi-user virtual environments that support cooperative work and play [1,3,5,6,8]. They are a challenging class of application for multimedia networking for several reasons. First, they may support tens or hundreds of participants, many of whom may be active contributors of information at any moment in time. Second, participants may take part in highly dynamic forms of communication involving exploration of a social space, chance encounters and dynamic group membership. Third, they integrate multiple media including 3D graphics, text and streamed audio and video. We present a QoS architecture for CVEs that aims to meet these challenges. There are two key aspects to this architecture.
Dynamic QoS negotiation – participants express their communication requirements by negotiating so-called levels of mutual awareness. These may be controlled implicitly, for example by moving within virtual space, or explicitly, for example by attending to or projecting information at specific objects in the world. Our architecture uses changing levels of mutual awareness to drive session control and QoS management. We first proposed this approach of awareness-driven video quality of service in [10]; this paper extends our previous work by developing the underlying QoS architecture required to support this approach.
Balancing group and individual needs – our architecture maintains a balance between the needs of individuals and those of the group to which they belong. The needs of a group involve making optimal use of common resources, for example, deciding which streams of information to admit onto a local shared network. Group needs have to be balanced against the needs of the individual, including their specific interests and the resources that are available to them, for example, choosing whether to subscribe to streams once they have been admitted (also limited by the processing capabilities of their local host). We propose that this idea of balancing group needs against individual needs is a key problem for QoS management in CVEs and, indeed, for collaborative applications in general.
Section 2 of our paper proposes a QoS architecture that can meet these requirements. Section 3 demonstrates this architecture by walking through a virtual shopping mall scenario. Finally, section 4 describes how this work relates to other research into QoS and CVEs.
PROPOSED QoS ARCHITECTURE
We now introduce our QoS architecture for CVEs. Two points are worth noting from the outset. First, although the principles and the architecture might be applied to all of the media that define a CVE, much of our explanation focuses on the management of video streams that are associated with objects in a virtual world and that are embedded within it as dynamic texture maps. Second, we build on our previously developed spatial model of interaction. This is a flexible framework for negotiating mutual awareness that moves beyond approaches such as level of detail and importance of presence [9] with regard to its support for collaborative interaction in virtual environments [1, 4, 5]. As it is referred to several times in the following text, a brief review of the spatial model is necessary at this point.
A brief review of the spatial model
The spatial model of interaction is a computational framework that allows a CVE to estimate the relevance of different elements of a virtual world to each user. This is expressed as the user’s “awareness” of these elements. The spatial model uses three concepts to negotiate awareness:
• nimbus: controlled by the originator of information to express social behaviours such as interrupting, shouting and whispering, as well as security restrictions and directionality.
• focus: controlled by the (potential) recipient of information to express allocation of attention, preference, and interest.
• third party objects: the context in which the communication occurs, which may include boundaries (e.g., walls and windows) and modifiers (e.g., a speaker’s podium).
Focus and nimbus are defined as spatial fields whose values may vary with distance from the user. Focus, nimbus, and the effects of third party objects can be defined independently for each potential medium of interaction. Considering two participants A and B, A’s awareness of B in medium M is a function of A’s focus on B in M and B’s nimbus on A in M, modified by any relevant third party objects. Although the role of focus is relatively intuitive – supporting users in selecting information of interest – the role of nimbus requires more consideration. Essentially, nimbus allows a user to project their information at others to a greater or lesser extent, thereby engaging in activities such as whispering, shouting and interrupting. The demonstration section (and figure 4) presents an example of how focus and nimbus have been realised in the virtual shopping mall scenario.

High-level overview of our approach
Our architecture takes two primary inputs. The first is a matrix of awareness values among a set of users and objects in the virtual world (or among some subgroup if the world is partitioned into regions). These are interpreted as declarations of users’ preferences for resources within the world. They change dynamically as users move about or adjust the parameters of the spatial model (e.g., changing the shapes of focus and nimbus). The second input is a description of all of the potentially available streams of information that are associated with users and objects, for example, information about streams, layers and multicast groups that are associated with video sources in the world. Our architecture inspects the awareness values in order to determine an optimal allocation of the available streams among the users. In doing this, it tries to maintain a balance between meeting individual preferences and achieving an optimal group allocation of streams. The overall approach is shown in figure 1; figure 2 provides greater detail.

[Figure 1: relationship between individual and collective choice. Collective choice: individual preferences combine into group preferences, which are used to allocate shared resources. Individual choice: users express individual preferences via awareness, which are used to select among shared resources.]

Collective choice – is concerned with optimising group usage of resources. Referring to the top of figure 2, awareness information representing individual preferences across all potential streams is collated for all current users of a given environment, and is used to determine combined group preferences (see later for how this may be done). These guide any decisions that will affect the group as a whole, especially the process of dynamic group admission control that determines which streams from among all of those available should be made available to the group. This may be subject to a dynamic source quota that limits the total number of available streams (and layers).
Individual choice – is concerned with how QoS is managed for each individual user. Referring to the bottom of figure 2, the user’s individual preferences are used to select among the currently available streams of information, i.e., those that were admitted in the collective phase. This may be subject to an individual budget that limits the number of streams and layers to be processed by a local host.
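Before turning to implementation, the awareness calculation described above can be made concrete with a small sketch. The paper leaves the combining function unspecified, so the choice below of taking the minimum of the two field values, the linear field shapes, and all identifiers are illustrative assumptions only:

```python
# Illustrative sketch of the spatial model awareness calculation.
# ASSUMPTIONS: the combining function (min) and the radial field shape are
# ours; the paper states only that A's awareness of B in a medium is a
# function of A's focus on B and B's nimbus on A, modified by third parties.

from dataclasses import dataclass
import math

@dataclass
class Entity:
    x: float
    y: float
    focus_radius: float    # extent of this entity's focus field
    nimbus_radius: float   # extent of this entity's nimbus field

def radial_field(distance: float, radius: float) -> float:
    """A field value in [0, 1] that falls off linearly with distance."""
    return max(0.0, 1.0 - distance / radius)

def awareness(a: Entity, b: Entity, modifiers=()) -> float:
    """A's awareness of B in one medium: a function of A's focus on B and
    B's nimbus on A, modified by any third party objects (walls, podiums)."""
    d = math.hypot(a.x - b.x, a.y - b.y)
    value = min(radial_field(d, a.focus_radius),   # A's focus on B
                radial_field(d, b.nimbus_radius))  # B's nimbus on A
    for modify in modifiers:                       # third party objects
        value = modify(value, a, b)
    return value
```

Evaluating this for every (user, source) pair in a region yields the matrix of awareness values that forms the architecture's first input.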
Implementation Issues
At this point, it may help the reader to see this overall approach in relation to a particular implementation. In our work we have made extensive use of network-supported layered multicast to distribute streamed media, especially streamed video that is associated with objects in the world. This allows us to make a clear division between individual choice, which is expressed through individual receivers joining and leaving multicast groups, and collective choice, which is expressed through dynamic control over multicast sources (e.g., changing admitted bandwidth or use of server resources).
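As a minimal illustration of the receiver side of this division, the standard IP multicast membership operations look like the following (a generic socket-level sketch, not MASSIVE-2's actual stream-handler code):

```python
import socket
import struct

def make_receiver(group: str, port: int) -> socket.socket:
    """Create a UDP socket and join a multicast group: an individual 'join'."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    mreq = struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock

def leave(sock: socket.socket, group: str) -> None:
    """Drop membership: the receiver stops subscribing to this group."""
    mreq = struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, mreq)
```

In the scheme described later, each video layer maps to one such group, so joining or leaving a group changes the received frame rate without involving the sender.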
In constrained network contexts (e.g., shared LAN based applications) this decoupling of sources and receivers is straightforward; the addition of a new receiver to an available multicast group has no additional effect on the network or any other shared resources. Thus, collective choice determines which streams and layers are admitted to the shared network whereas individual choice determines to which of these streams and layers a particular user subscribes. However, in more complicated network topologies the addition of new receivers will in itself affect the flow of traffic over the network (e.g., for IP multicast, the local router will only forward multicast traffic onto the local network if a host has requested it via IGMP). In this case, admission control and receiver control must be considered in a more integrated fashion. We now consider the issues of collective and individual choice in more detail, describing how they are realised in our current implementation.

[Figure 2: a Group-oriented QoS Management Architecture. Collective choice (allocating shared resources according to combined preference): individual preferences over all potential streams are combined into group preferences, which drive dynamic group admission control over the multicast stream sources (audio, video, data, ...), subject to a dynamic source quota; this yields the currently available (admitted) streams. Individual choice (choosing from among allocated shared resources according to individual preference): each user makes a final selection from the currently available streams.]

Collective choice
A basic requirement of our approach is to be able to map from a set of individual preferences to combined group preferences. We argue that there can be no universally applicable solution to this problem, because it will be profoundly influenced by the social, task, and organisational context in which the system is used. For example:
• Are all users created equal, or are some users more “equal” than others?
• Is fairness important? If so, to what extent? Is total group utility more important than any individual’s experiences?
• How does the group define “utility” (or whatever other measure is used in allocating shared resources)? Does the group agree a common definition, or do members have competing ideals?
• How are costs shared among users? Do additional receivers for an existing group pay only the marginal cost (this may be zero), or do they pay a share of the total cost for the stream?
However, we also argue that we do not need to find a single perfect answer to these questions (which would take a set of individual preferences and give
“the answer”). Rather, because CVEs are dynamic and interactive systems, we need to give each user sufficient control, feedback, and understanding of the process so that they can incrementally refine their own preferences in order to guide the group allocation process. For example, suppose a user is interested in a particular video source, but is not currently receiving an adequate quality of service. They should be able to progressively increase their “focus” on that source – explicitly increasing their declared interest in it – and the group allocation system should be sensitive to this. Appropriate feedback (with coping strategies) should be provided if the user’s requirements cannot be met. Our architecture handles this process of collective choice through two components: a CVE monitor and a group QoS manager. From the environment’s shared state the CVE monitor discovers the identities of all potential observers (users) and the session information for all potential media sources. Part of this shared state includes information about spatial model awareness, foci, nimbi and third party objects; this allows the CVE monitor to reproduce the awareness calculations of each user, and so determine their individual preferences. The CVE monitor passes awareness information and stream details to the group QoS manager. This combines all of the users’ requirements to form a composite set of requirements. As noted above, there is no single solution to this merging of requirements. Our current implementation gives preference to: first, lower layers over higher layers (provided they are required by a user); second, the combined awarenesses of all users interested in the source; and third, the number of users interested in the source.
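That ordering can be expressed compactly as a staged sort over candidate (source, layer) pairs followed by a greedy fill of the group quota. The sketch below is our reading of the stated policy; the data structures, per-layer frame costs and quota bookkeeping are illustrative assumptions rather than the actual MASSIVE-2 code:

```python
# Sketch: merging individual preferences into a group admission decision.
# ASSUMPTIONS: the input/output shapes and the greedy quota fill are ours;
# only the three-part priority ordering comes from the paper.

def admit_layers(requests, layer_cost, group_quota):
    """requests: iterable of (user, source, layer, awareness) tuples, one per
    layer that a user currently wants. layer_cost[(source, layer)] is the
    frames/second that admitting the layer adds. Returns admitted pairs."""
    combined = {}   # (source, layer) -> [summed awareness, user count]
    for user, source, layer, aw in requests:
        entry = combined.setdefault((source, layer), [0.0, 0])
        entry[0] += aw
        entry[1] += 1

    # Priority per the stated policy: lower layers first (provided some user
    # requires them), then combined awareness of the interested users, then
    # the number of interested users.
    ranked = sorted(combined.items(),
                    key=lambda kv: (kv[0][1], -kv[1][0], -kv[1][1]))

    admitted, used = set(), 0.0
    for (source, layer), _ in ranked:
        cost = layer_cost[(source, layer)]
        if used + cost <= group_quota:   # greedy fill of the group quota
            admitted.add((source, layer))
            used += cost
    return admitted
```

In the running system this merge would be recomputed every couple of seconds as awareness values change, as described next.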
The group QoS manager has its own group quota which it must allocate (this can be changed explicitly and dynamically via a dialog box in our implementation); it allocates the available resources according to the collective preferences determined above. It performs this reallocation every couple of seconds in the current system. Finally, the group QoS manager notifies each of the video sources when layers are admitted or rejected. This is done via a central video coordinator in our current prototype. Clearly, admission and rejection are not final decisions, and all video sources remain active, waiting for any change in the number of layers being admitted, and adjusting their sending accordingly.

Individual choice
We now look in more detail at the generation of individual preferences, and the way in which these can be integrated into a general host QoS management system. Figure 3 shows the host QoS architecture that we have adopted, as it relates to the handling of streamed video in CVEs. On the right of the figure is the user’s CVE application, which allows them to explore a shared 3D virtual environment including live video images. On the left are generic (application-independent) host QoS facilities, derived from [11]. This figure assumes that the host is being used by a single user. As is typical in CVEs, the application determines the identities and requirements of all of the video streams available within a virtual world from the replicated virtual world state. The CVE application itself applies the spatial model of awareness to determine the user’s moment-by-moment awareness of each video stream in the current virtual environment.
The interaction between the CVE application and the host QoS facilities is as follows. The CVE application informs the host QoS manager of the possible streams that it knows about (from session descriptions embedded within the virtual world state). The CVE application also continually updates the QoS manager with the user’s current awareness of each source, as determined by the spatial model of interaction. The QoS manager performs the actual selection of sources (and layers within those sources). This is based on the application preferences (spatial model awareness), together with other constraints and factors, e.g., user-specified resource quotas for particular applications, available host resources, and the user’s apparent activity and interests across any other concurrently executing applications as determined by the top-level awareness model. The QoS manager continually updates the CVE application with the set of streams (sources and layers) which have been chosen. The CVE application passes on these selections to embedded stream handlers, which handle joining of relevant multicast groups, and stream decoding.
In the case of video, the final decoded frames are passed onto the graphical renderer, which uses them to update textures on objects within the 3D scene, to give embedded video views within the virtual environment. (Audio streams are mixed and played out using the host’s audio facilities.)
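The interaction just described is essentially a small loop between the application and the QoS manager. One possible shape for that loop is sketched below; all class and method names are invented for illustration, since the paper does not give MASSIVE-2's actual interfaces:

```python
# Sketch of the CVE application / host QoS manager interaction.
# ASSUMPTIONS: every identifier here is hypothetical.

class HostQoSManager:
    def __init__(self, host_budget_fps: float):
        self.host_budget_fps = host_budget_fps
        self.streams = {}      # source id -> session description (a dict)
        self.awareness = {}    # source id -> latest awareness value

    def register_stream(self, source_id, session_description):
        """Called when the application finds a stream in the world state."""
        self.streams[source_id] = session_description

    def update_awareness(self, source_id, value):
        """Continually called as the spatial model recomputes awareness."""
        self.awareness[source_id] = value

    def select(self):
        """Pick sources by awareness, within the host frame-rate budget."""
        chosen, used = [], 0.0
        for sid in sorted(self.awareness, key=self.awareness.get,
                          reverse=True):
            cost = self.streams.get(sid, {}).get("fps", 0.0)
            if used + cost <= self.host_budget_fps:
                chosen.append(sid)
                used += cost
        return chosen   # the application hands these to its stream handlers
```

The application would then pass the chosen set to its stream handlers, which perform multicast joins of the kind shown earlier and decode the resulting streams.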
[Figure 3: Host QoS Architecture Integrating Spatial Model Awareness. On the CVE application side: the spatial awareness model; the renderer providing the 3D view with in-scene QoS feedback and video textures; application input for interaction/navigation; a stream manager and stream handler(s) with communication facilities; and the world state (e.g., scene graph) maintained via a shared data service, with state transfer and updates, video data streams and video session information arriving over the network. On the host QoS infrastructure side: a user agent and top-level awareness model with user awareness customisation and general QoS feedback/control; a host agent with the user budget; a QoS monitor; host and network quoting; host OS control; and the QoS manager, which negotiates costs and optimises using overall and per-application priority information and stream details/stream control.]
The key points of this host QoS architecture are as follows.
• An awareness model suited to the application (such as the spatial model of interaction in a CVE) is used to determine priorities between streams with regard to each application.
• An application-independent awareness or task model (the user agent and “top-level” awareness model) with an associated customisation interface guides resource allocation between applications. For example, graphical dialogs could be used to specify quotas for particular media or applications, or the system might more generally monitor the user’s apparent activities in different applications (e.g., whether they are iconised, or have the keyboard focus).
• A host QoS manager takes this inter- and intra-application prioritisation information and uses it to actually determine resource allocations based on host, network and budget constraints.
In this way we hope to combine dynamic user requirements with QoS guarantees, to create a system in which appropriate performance and cost targets can be met while being sensitive to the user’s changing requirements and interests.
Having described our awareness-driven group-oriented QoS architecture, the next section illustrates these issues and the architecture using our current prototype.

DEMONSTRATION
The demonstration of our QoS architecture consists of software extensions to our own MASSIVE-2 CVE platform [4] and an example virtual world, a virtual shopping mall. We begin by describing the basic capabilities of our current prototype. We then describe the virtual shopping mall scenario. Finally, we present the results of a walkthrough of this scenario using the current prototype in order to illustrate the resource allocation part of our QoS architecture in operation.
Prototype capabilities
We have extended MASSIVE-2’s video support [10] in two main respects:
• The details of all available video layers and the spatial model awareness of each are passed to a QoS allocator for each individual user. This determines which layers of which streams should be subscribed to, subject to a total (configurable) host resource budget for video handling within the application.
• A group resource allocator monitors the requirements of all users in the world, and forms collective preferences that it uses to perform dynamic video source admission control, against a finite (variable) group budget.
Our current implementation uses a simple layered multicast distribution scheme for video. The video uses intra-frame compression (JPEG) only. Each video source is allocated a number of multicast groups, and sends a different subset of its possible video frames to each group. In this demonstration the layering is:
• layer 1 only gives a 1Hz video frame-rate.
• layers 1 and 2 together give a 5Hz frame-rate.
• layers 1, 2 and 3 together give a 10Hz frame-rate.
Each receiver independently chooses which layers of the admitted sources it should join. This in turn determines the frame-rate at which the user will observe each source (provided that the graphical rendering frame-rate is at least 10Hz).
This simple encoding and distribution scheme means that the additional CPU utilisation due to handling video is almost directly proportional to the number of frames per second of video being received. Therefore our current QoS allocator for a local host is given a CPU budget expressed as a number of frames per second of video which can be handled across all streams. This frame rate budget can be changed interactively (via a 2D dialog box) to control the maximum resource utilisation of this particular application (assuming that several applications may be running locally). The optimisation implemented in the QoS manager uses awareness for two distinct purposes:
• it uses fine-grained variations in awareness to prioritise video streams, e.g., to reflect variations in distance and/or orientation with regard to the observer.
• it uses gross distinctions in levels of awareness to infer particular kinds of communication requirements, and allocates resources differently for each situation.
To make this clearer, figure 4 shows the functions which have been used for focus and nimbus (the observer’s and the source’s respective controls over awareness).

[Figure 4: functions used for focus and nimbus. Focus (observer): a face-to-face and select zone, a group and conversation zone, and a peripheral zone extend outwards from the user. Nimbus (source): zones of normal availability, reduced availability and no availability surround the source.]
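To illustrate the layering just described, here is one way a source could split a nominal 10Hz capture across three multicast groups so that the cumulative rates come out at 1, 5 and 10 frames per second. The frame-selection rule and all names are our own illustrative assumptions; the paper states only that each source sends a different subset of its frames to each group:

```python
# Sketch: assigning frames of a 10Hz capture to three layers so that
# layer 1 alone = 1Hz, layers 1+2 = 5Hz and layers 1+2+3 = 10Hz (cumulative).
# ASSUMPTION: this particular frame-numbering rule is ours.

def layer_for_frame(frame_index: int) -> int:
    if frame_index % 10 == 0:   # 1 of every 10 frames -> the 1Hz base layer
        return 1
    if frame_index % 2 == 0:    # 4 more of every 10 frames -> +4Hz
        return 2
    return 3                    # the remaining 5 of every 10 frames -> +5Hz

# Each layer maps to its own multicast group, so a receiver subscribed to
# group {1} sees 1Hz, {1, 2} sees 5Hz and {1, 2, 3} sees the full 10Hz.
assert sum(layer_for_frame(i) == 1 for i in range(10)) == 1
assert sum(layer_for_frame(i) <= 2 for i in range(10)) == 5
assert sum(layer_for_frame(i) <= 3 for i in range(10)) == 10
```

Because only whole layers can be joined, receivable rates are quantised to 1, 5 or 10 frames per second per source, a point that becomes visible in the walkthrough results later.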
With regard to focus, as a participant moves about a virtual world they “carry” their focus with them as a spatial field which conforms approximately to their visual field of view. Focus is typically configured so that the user will be more aware of objects that are close to them and that are close to the centre of their field of view (rather than at the periphery). In addition we have chosen to pick out three distinct sub-zones within the participant’s focus:
• a face-to-face or near-field zone, in which the object is assumed to be so close as to be dominating the user’s attention and should therefore be allocated the best possible QoS (this also has a narrow extension in the exact centre of the field of view which allows a user to “concentrate” on a slightly more distant object by placing it centrally in their view);
• a group or conversation zone, which is intended to correspond to normal social and conversation groups, in which the user wants to be aware of everyone, though perhaps giving emphasis to one or two in particular, such as the current speaker(s);
• a more distant peripheral zone, in which interesting things may occur, but which are not considered to be immediately important to the user.
These sub-zones identify particular kinds of human communication requirements. Furthermore, more subtle variations within each allow the QoS manager to prioritise resource allocations in a way which is responsive to user position and movement (e.g., the user might shift their orientation to make one particular video stream more central, causing it to be preferentially allocated resources).
On the other hand, nimbus (which is associated with sources) is used in these examples to suppress video streams when the potential observer is behind them (and therefore could not see them anyway, unless people had transparent heads). In addition, nimbus can be used to indicate requirements for privacy (a small, local nimbus) or to distinguish different levels of “push”, e.g., whispering, normal speech or shouting.
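A sketch of how these focus sub-zones might be computed from relative position follows, purely as an illustration: the actual field shapes in figure 4 are not given numerically, so every threshold, angle and name below is invented.

```python
# Sketch: classifying a source into the observer's focus sub-zones.
# ASSUMPTIONS: all distances and angles are invented; the paper defines the
# zones qualitatively (see figure 4), not numerically.

def focus_zone(distance: float, bearing_deg: float) -> str:
    """bearing_deg is the angle between the observer's gaze direction and
    the direction of the source (0 = dead centre of the field of view)."""
    if bearing_deg > 90:                      # behind / outside field of view
        return "none"
    if distance < 2 or (bearing_deg < 5 and distance < 8):
        # very close, or placed in the narrow central "select" extension
        return "face-to-face/select"
    if distance < 6 and bearing_deg < 45:
        return "group/conversation"
    if distance < 20:
        return "peripheral"
    return "none"
```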
Virtual Shopping Mall Scenario
The example application we have chosen to illustrate both our QoS management architecture and the use of video in CVEs is a social virtual shopping mall similar to that of Interspace [3]. It is anticipated that users might access such an environment from their homes, for example using their television and a set-top box. The use of CVE technologies supports various social aspects of shopping found in the physical world including: shopping as a group; chance encounters and meetings within the environment; direct interaction with sales staff for pre- and post-sales support; and a sense of society and common activity. The mall comprises: open social spaces for group interaction and to facilitate chance encounters; entry/exit portals which conceal and smooth over the arrival and departure of participants; avenues or arteries which link social zones; and shops, distinct retail spaces associated with particular business entities. Part of the mall is shown in figure 5. Overlaid on the figure is the sample scenario that we will walk through with the system later in this section. Although this journey has been constructed for this demonstration, our experience suggests that it is typical of the kinds of social activity that occur within CVEs.
Various views of the environment (screen-shots) are shown in figure 6. Figure 6 (a) shows the user’s view on leaving the entry portal at location A and first stepping into the social zone. Figure 6 (b) shows their view when talking face-to-face with a friend at B. Figure 6 (c) shows their view when taking advice from the group of shoppers at location C. Finally, figure 6 (d) shows their view while moving from the group at C towards the shop which is in the centre of the avenue in front of them.
In this example, each user has a textured video face. The QoS allocator determines which layers of which streams to join using the following algorithm:
1) take face-to-face and select zone video sources in order and allocate 5Hz if possible, else 1Hz if possible;
2) take group and conversation zone video sources in order and allocate 1Hz if possible;
3) upgrade face-to-face/select zone video sources (if any) to 10Hz if possible;
4) upgrade group zone video sources, in order, to 5Hz if possible;
5) take peripheral zone video sources in order and allocate 1Hz if possible.
This algorithm aims to achieve a balance between supporting group discussions, one-to-one dialogues and general awareness of others who are nearby, all of which are relevant to the shopping mall scenario.
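The five steps translate directly into a staged greedy allocation. The following sketch is our rendering of them, assuming the frames-per-second host budget described earlier; the input shape and bookkeeping are illustrative, but the pass ordering is exactly the paper's steps 1-5:

```python
# Sketch of the five-step per-host video layer allocator. Rates are the
# cumulative frame rates of the layering scheme (1, 5 or 10 Hz).
# ASSUMPTION: the 'sources' input and budget bookkeeping are ours.

def allocate(sources, budget_fps):
    """sources: list of (source_id, zone) in priority ('in order') order.
    Returns {source_id: allocated rate in frames/second} within budget."""
    rate = {}
    spent = 0.0

    def try_set(sid, new_rate):
        nonlocal spent
        extra = new_rate - rate.get(sid, 0)   # upgrades pay only the delta
        if extra <= budget_fps - spent:
            rate[sid] = new_rate
            spent += extra
            return True
        return False

    ff = [s for s, z in sources if z == "face-to-face/select"]
    grp = [s for s, z in sources if z == "group/conversation"]
    per = [s for s, z in sources if z == "peripheral"]

    for s in ff:                      # 1) 5Hz if possible, else 1Hz
        try_set(s, 5) or try_set(s, 1)
    for s in grp:                     # 2) 1Hz if possible
        try_set(s, 1)
    for s in ff:                      # 3) upgrade to 10Hz if possible
        try_set(s, 10)
    for s in grp:                     # 4) upgrade to 5Hz if possible
        try_set(s, 5)
    for s in per:                     # 5) 1Hz if possible
        try_set(s, 1)
    return rate
```

For instance, with a 10 frames/second budget, one face-to-face source and three group sources, the face-to-face source gets 5Hz and the group sources 1Hz each (8 fps total); the remaining 2 fps cannot fund any upgrade, since the next upgrade in either zone would cost 5 or 4 fps, illustrating the layer quantisation noted in the walkthrough results below.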
[Figure 5: Virtual Shopping Mall Scenario. A plan of part of the mall showing social zones for browsing and for chance and planned meetings, shops, entry/exit portals, other (numbered) virtual shoppers, and “arteries” giving access to shops and linking the zones. The plot: A. Would-be shopper arrives at entry portal. B. On entering the social zone sees a friend and joins them for a chat. C. Noticing an interesting discussion, moves over to join in briefly. D. Receives directions to a particular shop and talks to an assistant there. E. Returns home via the portal.]
[Figure 6: various views from the example scenario. (a) On entering the social zone. (b) Talking face-to-face with a friend. (c) Taking advice from the group. (d) On the way to the shop.]
System Walkthrough Results
Using the current system we have walked through this application with static agents to represent the other participants. These have the same nimbus and video characteristics as would real participants (except that they do not move about or get bored). During this walkthrough we have considered the resources allocated to the individual user walking through the scenario, and the total network activity that represents the shared resources allocated to the group. The former is shown in figure 7. The highest line (“Desired”) shows the number of video frames per second that the user would receive if awareness directly determined the choice of layers. For example, between times 60 and 100 the user is talking to the group of four others (figure 6 (c)), and their nominal requirements are up to 25 frames/second (one – in the centre of the field of view – at 10Hz, and the other three at 5Hz).
The next line (“With Host Budget”) shows the effect of the host QoS manager imposing a budget limit of 10 frames per second. This line approaches the “Desired” line, but is correctly limited to no more than 10 Hz. At times, a lower level is all that can be achieved due to the quantisation of layers (e.g. layer 3 on its own is 5 frames of video per second). The host budget limit only has a significant effect between times 60 and 100 as noted above.
[Figure 7: Individual User Activity. Received video frames per second for the scenario user against elapsed time (0-180 seconds), showing the “Desired”, “With Host Budget (10)” and “With Group Budget (20)” lines together with layer join events.]

The lowest line (“With Group Budget”) shows the intersection between the host’s budgeted request and the currently admitted video layers (as determined by the group QoS manager). In this example, the group budget has been set to only 20 video frames per second. In places, the user gets most or all of their desired layers. This occurs either when they are interested in layer 1 only (which is given priority in the group allocation process used – e.g., time 25), or when they are talking to the group at C (since this group has sufficient influence with the group allocation process to get some additional layers – e.g., time 100). On the other hand, the user gets much less than they wanted when they are the only individual interested in a particular source, for example at time 50 when they are talking to their friend, or at time 160 when they are talking to the shop assistant.
Also shown are layer join events, i.e., times at which the user joins a different video layer. For example, at time 25 the user enters the social zone (figure 6 (a)), and suddenly requests layer 1 for the seven other users in view. Throughout the scenario the user joins (and leaves) 20 layers, at an average rate of one layer every 8 seconds.
Figure 8 summarises the overall group allocation during the walkthrough. Not shown on the graph is the total available video traffic, which is 160 frames per second throughout (16 video faces, including the main user).
The upper line (“Desired (all users)”) shows the total number of multicast video frames per second, as requested by all users (each user is applying their own local budget, as above). Since layered multicast communication is used, multiple users requesting the same source and layer have no impact on this figure. The user’s movement through the space causes this to vary from 38 to 65 frames per second. The high points are around times 50 and 160, when the user is face-to-face with another relatively isolated user (their friend at time 50, and the shop assistant at time 160): the user requests 10Hz from the other, who in turn requests 10Hz from the user. The low region, during the group interaction, is due to the members of the group all tending to focus on the new user, rather than on other (different) members of the group.
[Figure 8: Group (Admission) activity. Admitted network video frames per second across all sources against elapsed time (0-180 seconds), showing the “Desired (all users)” and “Admitted” lines together with layer admission events.]

The lower line (“Admitted”) shows the number of multicast video frames per second admitted to the network from all video sources in the world. It falls off at the end due to layer frame rate quantisation, e.g., when the moving user wants layers from a currently inactive source they receive at least layer 1, and this leaves insufficient budget for layer 2 of another source, causing an overall drop. The group QoS manager ensures that the admitted frame rate at any time does not exceed 20 frames/second. Also shown are layer admission events, when the group QoS manager admits a new layer of a source. For example, at time 155 the moving user requests video from the shop assistant, and the group QoS manager admits the shop assistant’s layer 1 video in response. There are a total of 29 admission events during the scenario, an average of one every 5.5 seconds.
RELATED WORK
In this section we briefly consider related research with regard to other CVEs that support streamed video, and work on QoS negotiation, QoS specification and layered multicast distribution.
CVEs with streamed video – several recent CVEs support the integration of live video streams within 3D virtual worlds, displaying the video as dynamic texture maps. These include CU-SeeMe VR [6] and Freewalk [8], which are 3D video conferencing applications: in both systems users are embodied and mobile within a graphical virtual world as simple blocks with (typically) head-and-shoulders video texture mapped onto the front of the block. Previous work with our own MASSIVE-2 system [10] has also included integration of live video streams for faces and views into the physical world. This paper describes a prototype in which resources are allocated to individual video streams dynamically, to reflect as closely as possible the subset which are currently in view. The CU-SeeMe VR system does not attempt to perform this kind of dynamic management; however, Freewalk does perform more active selection of video streams according to physical distance (which also determines selected resolution). Similarly, the (single-user) system of [9] determines the priorities for video objects within a scene based on a combination of distance and mutual orientation with respect to the observer; the frame-rates of lower-priority video streams are reduced to achieve a satisfactory rendering frame-rate and network usage. We suggest that the spatial model of interaction’s framework for calculating awareness adds further flexibility and dynamic control to these approaches, especially the inclusion of nimbus to specify the projection of information from a source and third party objects to define the effects of context such as boundaries.
QoS negotiation – we have already described the characteristic features of CVEs; perhaps the most significant with regard to QoS is the highly dynamic nature of user requirements, and hence of sessions and of QoS more generally in these applications. As Campbell et al. note [2], QoS renegotiation was essentially absent from early work on QoS. While it is now recognised as being integral, we believe that CVEs are much more demanding in this respect than other more established QoS-critical applications (e.g., media on demand). More generally, our starting point in looking at QoS (like Vogel et al. [12]) is the user and their interests and activities. Indeed, the central focus of our work is on dynamic specification of user QoS requirements; we rely on existing QoS architecture work to populate the other layers of a complete architecture.
QoS specification – our approach to QoS specification is rather less direct than is typical for current applications. For example, [11] (on which we have based the host QoS management component of our approach) employs explicit dialogs with the user to establish and adjust QoS and cost settings. We use direct mechanisms of this form to set overall budgets (e.g., the host and group frame-rate budgets are set using dialog boxes). However, within these top-level constraints there are many streams, and user requirements are highly dynamic; it would be nonsensical to require the user to directly configure and reconfigure QoS for each source in the virtual world as they moved about.
Layered multicast distribution – our approach to layered multicast distribution can be related to Receiver-driven Layered Multicast (RLM) [7], although with some distinctions. In a similar way to RLM, we use layered multicast to support heterogeneity between receivers. Individual choice is also expressed ultimately through receiver-initiated joins. However, our emphasis on multiple related streams (i.e., multiple media streams within a shared virtual world) means that receivers differ not only in network and end-system characteristics, but also in the details of each user’s momentary interests. We have also introduced an explicit element of admission control, in the form of the group QoS manager. In this way we provide direct support for cooperative decision-making, which is not present in the RLM scheme; our applications are likely to be unstable and relatively inefficient without some collaboration between receivers. In future, the group QoS manager’s functionality could be effectively distributed between the sources and receivers, to return to a fully distributed realisation.

SUMMARY
We have introduced a QoS architecture for CVEs that incorporates two key features: dynamic QoS negotiation and achieving a balance between optimising group and individual resource allocations. Our architecture is based on the use of the spatial model of interaction to provide a flexible framework for computing levels of mutual awareness in a CVE. These levels of awareness are used by a group QoS manager to select information and make it available to the group (e.g., admitting video streams and layers to a shared local network). They are also used by local QoS managers on each host to decide how individual users subscribe to this selected information (e.g., subscribing to different video streams and layers) once it has been admitted. We presented a demonstration application of our architecture, a virtual shopping mall. A walkthrough of this application showed how the QoS architecture allocated resources to a group of users and to an individual within this group. We propose that our architecture embodies a viable approach to managing rich and dynamic social interaction within CVEs. More generally, we anticipate that this approach might be used to support other kinds of cooperative application that will also involve balancing the QoS requirements of individuals against the common needs of a group.
REFERENCES
1. Benford, S.D., Bowers, J., Fahlén, L.E., Greenhalgh, C.M., Mariani, J., and Rodden, T.R. (1995), “Networked Virtual Reality and Cooperative Work”, Presence: Teleoperators and Virtual Environments, Vol. 4, No. 4, Fall 1995, pp. 364-386, MIT Press.
2. Campbell, A., Coulson, G., and Hutchison, D. (1994), “A Quality of Service Architecture”, ACM SIGCOMM Computer Communication Review, 24 (2), April 1994, pp. 6-27.
3. Suzuki, G. (1995), “Interspace: Toward Networked Reality of Cyberspace”, Proc. Imagina, 1-3 February 1995, Monte Carlo, ISBN 2-86938-113-1.
4. Greenhalgh, C. and Benford, S. (1998), “Supporting Rich and Dynamic Communication in Large-Scale Collaborative Virtual Environments”, Presence: Teleoperators and Virtual Environments, MIT Press, 1998 (in press).
5. Greenhalgh, C. and Benford, S. (1995), “MASSIVE: A Virtual Reality System for Teleconferencing”, ACM Transactions on Computer-Human Interaction (TOCHI), Vol. 2, No. 3, pp. 239-261, ACM Press, September 1995.
6. Han, J. and Smith, B. (1996), “CU-SeeMe VR Immersive Desktop Teleconferencing”, Proc. ACM Multimedia ’96, November 18-22, 1996, Boston, USA, pp. 199-208, ACM Press.
7. McCanne, S., Jacobson, V., and Vetterli, M. (1996), “Receiver-driven Layered Multicast”, ACM SIGCOMM Computer Communication Review, 26 (4), October 1996, ISSN 0146-4833.
8. Nakanishi, H., Yoshida, C., Nishimura, T., and Ishida, T. (1996), “FreeWalk: Supporting Casual Meetings in a Network”, Proc. CSCW ’96, pp. 308-314, ACM Press.
9. Oh, S., Sugano, H., Fujikawa, K., Matsuura, T., Shimojo, S., Arikawa, M., and Miyahara, H. (1997), “A Dynamic QoS Adaptation Mechanism for Networked Virtual Reality”, Proc. Fifth IFIP International Workshop on Quality of Service, pp. 397-400, Columbia University, New York, USA, May 1997.
10. Reynard, G., Benford, S., and Greenhalgh, C. (1998), “Awareness Driven Video Quality of Service in Collaborative Virtual Environments”, Proc. ACM Conference on Human Factors in Computing Systems, CHI ’98, Los Angeles, March 1998, ACM Press (in press).
11. Tassel, J., Briscoe, B., and Smith, A. (1997), “An End to End Price-Based QoS Control Component Using Reflective Java”, Fourth COST 237 Workshop, Lisboa, Portugal, December 15-19, 1997.
12. Vogel, A., Kerherve, B., von Bochmann, G., and Gecsei, J. (1995), “Distributed Multimedia and QOS: a Survey”, IEEE Multimedia, Summer 1995, pp. 10-19.