Patterns of Network and User Activity in an Inhabited Television Event
Chris Greenhalgh, Steve Benford, Mike Craven
School of Computer Science and IT, University of Nottingham, Nottingham, NG7 2RD, UK
+44 115 951 4221

{cmg, sdb, mpc}@cs.nott.ac.uk

ABSTRACT
Inhabited Television takes traditional broadcast television and combines it with multiuser virtual reality, to give new possibilities for interaction and participation in and around shows or channels. “Out Of This World” was an experimental inhabited TV show, staged in Manchester, in September 1998, using the MASSIVE-2 system. During this event we captured comprehensive records of network traffic, and additional logs of user activity (in particular movement and speaking). In this paper we present the results of our analyses of network and user activity in these shows. We contrast our results with those obtained from previous analyses of teleconferencing-style scenarios. We find that the inhabited television scenario results in much higher levels of user activity, and significant bursts of coordinated activity. We show how these characteristics must be taken into account when designing a system and infrastructure for applications of this kind. In particular, it is clear that any notion of strict turn-taking (and associated assumptions about resource sharing) is completely unfounded in this domain. We also show that the concept of “levels of participation” is a powerful tool for understanding and managing the bandwidth requirements of an inhabited television event.

Keywords
CVE, VR, network analysis, user behaviour, inhabited television.

1. INTRODUCTION
Inhabited TV combines collaborative virtual environments (CVEs) and broadcast TV to create a new medium for entertainment and social communication. This paper analyses the network utilization and the patterns of user activity (primarily moving and speaking) that arose during an experimental inhabited television event called “Out Of This World” (OOTW). This work follows on from, and is contrasted with, our previous analyses of network and user activity in CVEs used for teleconferencing ([5], [8]). As we will show, inhabited television gives rise to significantly different patterns of user activity. From this analysis we derive proposals for specific technical mechanisms that could enhance the capability of CVEs to support applications such as inhabited television.

The defining feature of inhabited television ([1], [7]) is that an on-line audience can socially participate in a TV show that is staged within a shared virtual world. The producer defines a framework, but it is the audience interaction and participation that brings it to life. A broadcast stream is mixed from the action within the virtual world and transmitted to a conventional viewing audience, either as a live event or sometime later as edited highlights. Inhabited TV seeks to leverage on-line communities to provide new forms of content for digital television, while, at the same time, providing a new focus and impetus for those on-line communities. Inhabited TV (unlike interactive TV) also extends the traditional television viewing experience to include social interaction between viewers or participants, new forms of control over narrative structure (e.g., navigation within a virtual world), and greater interaction with content (e.g., direct manipulation of props).

The approach that we take in this paper (as in [5] and [8]) is based on the quantitative analysis of network and system records. We reserve more qualitative analysis for human experts (e.g. [2], [3]), although [9] is more ambitious in its application of a quantitative approach. The quantitative models of user and network activity that we derive allow CVE (and network) designers and administrators to understand the loads under which such systems are likely to operate. For example, they can be used to make informed decisions about the network resources required to support a particular system and user community (commonly referred to as provisioning). These issues will be increasingly significant as inhabited television moves into the commercial arena. This type of analysis is also relevant to work on the simulation of crowds in CVEs and other applications.

This paper is structured as follows. Section 2 introduces the Out Of This World event on which this paper is based. Section 3 then presents an analysis of the patterns of network activity observed in one of the “showings” of OOTW. Section 4 draws on data from all four performances of OOTW to characterize user activity (moving and speaking) in this event, and contrasts this with the results from previous studies of other applications. Section 5 proposes a number of technical and organizational mechanisms which could be used to enhance the support which CVEs provide for applications such as inhabited television. Finally, section 6 concludes and identifies areas for future work.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior permission and/or a fee.

2. OUT OF THIS WORLD
This section describes pertinent aspects of the Out Of This World (OOTW) event, to give the reader a context within which to situate the analysis which follows.

VRST 99 London UK Copyright ACM 1999 1-58113-141-0/99/12…$5.00


OOTW was a public experiment with inhabited TV that was staged in front of a live theatre audience. The event was staged as part of ISEA: Revolution, a programme of exhibitions and cultural events that ran alongside the 9th International Symposium on Electronic Art (ISEA’98) that was held in Manchester in the UK in September 1998. There were four public performances of OOTW in the Green Room theatre over the weekend of the 5th and 6th of September. Following three previous inhabited television experiments ([1]) the top-level goals of OOTW were to: involve members of the public in a fast-moving TV show within a collaborative virtual environment; and to produce a coherent broadcast from the action within the CVE. Further details can be found in [7].

OOTW was a game-show, with a cliched outer space theme. The game world was populated by 11 main participants, divided into three “levels of participation” (inhabited television roles); these were:

• a show host, who appeared in the world as a live video texture,

• two team leaders, who were professional performers, who used immersive VR interfaces including HMDs and magnetic trackers, and

• two teams of inhabitants (four in each team), drawn from members of the public entering the event, who used a desktop interface with a joystick for navigation.

Figure 1 shows part of the virtual world: the robot team (four inhabitants, plus team leader) has just lost the final race – the finishing line is visible in the background.

Figure 1: Out Of This World – the robot team in defeat

The world also contained four (invisible) virtual cameras, each with its own human operator (these comprised a fourth level of participation). The images from these cameras were sent to a video mixing desk, where an experienced TV director mixed a single broadcast output which was projected into the adjoining theatre space in front of the main viewing audience.

All of the participants (except for the cameras) had real-time network audio communication, using microphone-headsets. The two teams (the aliens and the robots) had to race across a doomed space station in order to reach the one remaining escape craft. On their way they had to compete in a series of interactive games and collaborative tasks in order to score points. The final game was a race in which these points were converted into a head-start for the leading team. The winning team, with the best loser, were then able to escape to freedom.

OOTW was realized using the MASSIVE-2 system ([6]). For the show we created a dedicated world management system. This management interface allowed us to partially control the activities of all of the participants, including the inhabitants. In particular, it allowed us to apply constraints to their movements. This allowed us at various times to:

• root inhabitants to the spot, e.g. during interview and debriefing phases of the show (as shown in figure 1);

• allow them free motion within a single game arena (the virtual world was divided into five distinct areas or arenas); or

• move them rapidly from one arena to the next, along virtual “travellators”.

In this way we sought to increase the pace of the event, and ensure that participants would be in the right place at the right time (which can be very hard to achieve in a CVE).

As we will show in the analysis, both levels of participation and the management system had a significant influence on characteristic network and user activity.

3. NETWORK TRAFFIC
The first part of this analysis is concerned with the network traffic observed during the event. In particular, we analyse the complete network traffic generated during the second show (which was captured using the tcpdump utility – a freely available tool that captures all network packet headers to a file). This record of network traffic was supplemented by automated records of significant events in the world (e.g. management operations), and video recordings of the shows (both the final broadcast output, and an additional view from a virtual monitoring camera). We begin by exploring the total network traffic from that show, identifying the main types of traffic and the form of the performance’s bandwidth requirements. We then compare the bandwidth requirements of different participants according to their level of participation (i.e. host, captain, inhabitant, and camera).

3.1 Network Traffic Components
The tcpdump tool captures complete network packet headers. By examining the source and destination addresses of packets and their protocol identifier we are able to classify the traffic according to its purpose. The traffic trace for this analysis begins at 20:03:08 on the 5th September 1998, just before the world and all users are restarted for the show. The actual broadcast from the world runs from time 2394 to time 5024.

The main traffic types and their contributions are summarized in the table below and discussed in the remainder of this section. Note that multicasting is a networking mechanism that allows the same packet to be sent efficiently to many different destinations, whereas a unicast packet can only be received by a single designated recipient. Most MASSIVE-2 communication is multicast, with most participants’ machines receiving essentially the same information.
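As a rough illustration of the classification step described in section 3.1, the sketch below assigns each captured packet header to a traffic type using its protocol and addresses, and converts a byte total into an average bit rate. The multicast group addresses, the world machine address and the node manager port are hypothetical placeholders (the text does not give the values used in OOTW), and the rules themselves are an assumption rather than the authors' analysis code.

```python
import ipaddress
from collections import Counter

# Hypothetical addresses and port, for illustration only.
AUDIO_GROUPS = {"224.2.0.1"}
VIDEO_GROUPS = {"224.2.0.2"}
CVE_GROUPS = {"224.2.0.3"}
WORLD_HOST = "10.0.0.1"
NODE_MANAGER_PORT = 5000


def classify(proto, src, dst, dst_port):
    """Map one packet header to a traffic type (assumed rules)."""
    if proto == "tcp":
        return "TCP"
    if ipaddress.ip_address(dst).is_multicast:
        if dst in AUDIO_GROUPS:
            return "Audio (Multicast)"
        if dst in VIDEO_GROUPS:
            return "Video (Multicast)"
        if dst in CVE_GROUPS:
            return "CVE data (Multicast)"
        return "Other Multicast"
    if dst_port == NODE_MANAGER_PORT:
        return "NodeMgr (Unicast)"
    # Assumption: unicast to or from the world machine counts as world traffic.
    if WORLD_HOST in (src, dst):
        return "CVE from world (Unicast)"
    return "CVE from rest (Unicast)"


def bytes_by_type(packets):
    """Sum packet lengths per traffic type.

    Each packet is a tuple (proto, src, dst, dst_port, length_in_bytes).
    """
    totals = Counter()
    for proto, src, dst, dst_port, length in packets:
        totals[classify(proto, src, dst, dst_port)] += length
    return totals


def average_rate_bits_per_s(byte_count, duration_s):
    """Average bandwidth in bit/s for a byte total over a trace duration."""
    return byte_count * 8 / duration_s
```

For instance, dividing the audio byte count in Table 1 (523160128) by a trace length of about 5340 seconds gives roughly 783760 bit/s; that duration is inferred here from the table's own figures and is not stated in the text.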

Type                        Bytes        %       Rate, bit/s
Audio (Multicast)           523160128    91.0    783760.5
Video (Multicast)           22709952     4.0     34804.5
CVE data (Multicast)        18219904     3.2     27295.7
TCP                         6759358      1.2     10126.4
CVE from world (Unicast)    2340236      0.4     3506.0
CVE from rest (Unicast)     925035       0.2     1385.8
NodeMgr (Unicast)           471784       0.1     706.8
Other Multicast             3060         0.0     4.7
Total                       574598837    100.0   860822.2

Table 1: main network traffic components

Figure 2 shows how the multicast components of this traffic vary over the course of the show (showing one-minute average bandwidths). Figure 3 shows the corresponding non-TCP unicast traffic. The TCP traffic is a constant background level throughout, and was part of the event monitoring process; this would not be present in normal use and is not considered further in this paper.

Figure 2: multicast traffic for show 2 (plot “OOTW show 2, multicast traffic breakdown”: bandwidth in bits/second against time in seconds; series: Total Multicast, Audio, Video, Application, Other)

Figure 3: non-TCP unicast traffic for show 2 (plot “OOTW show 2, unicast UDP traffic breakdown”: bandwidth in bits/second against time in seconds; series: (Application multicast), To and from World, Other machines, Node Manager)

Considering the multicast traffic components in turn:

• Audio traffic accounts for the vast majority of the network traffic (91%). This accords with our observations from other trials using peer-to-peer audio in the absence of video. After time 1000 (before the start of the show) the audio traffic level is very stable at about 800 Kbits/second throughout the show. This background audio level corresponds to real-time audio from the 11 users (host, two captains, eight inhabitants). This level is constant because the normal silence detection mechanism was disabled. This was done on site to avoid audible artefacts in the broadcast. Consequently, every user sends audio traffic continuously. If silence detection had been used the average audio bandwidth would have fallen from 784 Kbits/second to 319 Kbits/second (80.5% of the resulting total). As discussed in section 4.2, the peak bandwidth would not have changed. There are small bursts of additional audio traffic, for example around time 3200 and 4500. These localised bursts of audio are sound effects generated during the games.

• There is a single video stream for the show host. This moves between two steady levels of about 21 Kbits/second and 49 Kbits/second. The lower bandwidth corresponds to times when the host is not in front of the camera (e.g. before the event starts around time 2400), while the higher bandwidth is when the host is present, i.e. for most of the game. This variation is because of the JPEG compression for each frame.

• The application (CVE) traffic accounts for changes in the virtual world, such as: users moving, the appearance and disappearance of speech balloons (which represent audio activity), and the activity of objects within the worlds (e.g. the “space frogs” from the first game, which hop away from approaching inhabitants). This traffic has a fairly consistent background level of about 20 Kbits/second, rising to a maximum of 56 Kbits/second (one minute average) around time 4500 (the race).

The unicast traffic accounts for 0.7% of the total traffic, and is concentrated at the start of the period, before time 600. Unicast traffic in MASSIVE-2 has three roles: state transfers, ensuring multicast reliability, and point-to-point communication (e.g. the world manager imposing constraints on individual users and objects):

• State transfers appear to be concentrated in the first 600 seconds (as we would expect). This period accounts for 60% of the total UDP unicast traffic.

• There are some examples of unicast traffic related to particular stages of the show, between the world machine and other machines, for example around times 1200, 1900, 2900.

• There is also some reliability-related traffic (which reports and replaces missing multicast packets, to ensure that every participant has a complete view of the virtual world). This appears to be of the order of 10% of the unicast traffic, or equivalently less than 1% of the corresponding multicast traffic. It is quite peaky in character, though it bears no clear relation to variations in the multicast traffic.

• Node manager traffic, primarily timing checks, accounts for around 12% of the total unicast traffic. It runs consistently at around 700 bits/second.

3.2 Bandwidth and Levels of Participation
In the previous section we explored the network traffic from the second of the four showings of OOTW, and described how it breaks down into the system’s constituent activities. In this section we relate the multicast network traffic (98.2% of the total) to the different levels of participation used in OOTW which were introduced in section 2.

Firstly, there is a simple relationship between level of participation and media, which has a direct impact on the network. Specifically:

• Only the host has a video stream, which consumes around 49 Kbits/second when they are on-screen (around 20 Kbits/second when they are not). So the host’s bandwidth requirements are naturally 49 Kbits/second greater than an equivalent non-video participant.

• The host, captains and inhabitants all have real-time audio which contributes 72.8 Kbits/second of audio network data. As already noted, silence detection was disabled to improve the perceived quality and reliability of the audio channel, so all potential speakers send audio data all of the time.

The situation for virtual world (application) data is more complicated. It is not possible to distinguish the traffic from the Host (it appears within the multicast traffic from the world as a whole). It is, however, possible to identify the CVE traffic for each team leader and inhabitant. We find that:

• The team captains generate around 7500 bits/second on average, over four times the average for an inhabitant. This is as we might expect, since each captain’s embodiment comprises four components (head, body, left hand, right hand) and, being tracked, they appear to move continuously as far as the system is concerned.

• The combined host and world traffic is only a little more than a single inhabitant’s traffic.

• The inhabitants have an average bandwidth (in this show) of around 1800 bits/second. There is considerable variation between inhabitants, from 1230 bits/second to 2379 bits/second, a range of 1149 bits/second. This traffic comprises movement updates and the appearance and disappearance of their speech balloon.

• The cameras (four used for the broadcast plus one additional observation machine for diagnostics and logging) have an average bandwidth of only 340 bits/second (maximum for one camera of 470 bits/second). Even the highest of these is about a quarter of that for an average inhabitant; the average is less than a fifth. This figure is lower because the network update rate of the cameras was reduced significantly compared to inhabitants. Consequently, the bandwidth requirement approaches that required (in MASSIVE-2) to just signal that the process is still alive.

From the above we can predict an average bandwidth requirement for a single participant at each level of participation, considering all media:

Level        Bits/s    Components
Host         125000    audio, video, application
Captain      80300     audio, movement (4-part, tracked)
Inhabitant   74600     audio, movement (1-part, joystick)
Camera       340       keep-alive, degraded movement

Table 2: total average bandwidth for a single participant at each level of participation

4. USER ACTIVITY
We now move from direct consideration of network traffic to consider user activity (in particular movement and speaking) during the shows. In particular, we:

• analyse the overall rates of moving and speaking as seen by the network, and relate these to previous studies; and

• explore the effects of activity constraints on user movement.

In this analysis we will focus for the most part on the 32 inhabitants (members of the public) who took part in the four shows – the host and performers had rather different roles and constraints on their activities. Also, in a full-scale inhabited television event we expect the inhabitants to greatly outnumber the performers.

4.1 Movement
We begin this user activity analysis by considering the movement of each inhabitant’s embodiment within the virtual world. From a network perspective this is interesting because movement update messages need only be sent when an embodiment is moving; no traffic is needed while it is stationary. In OOTW an embodiment can move because:

• the user that it represents is causing it to move, e.g. an inhabitant moving their joystick or a team captain moving a tracker;

• the management system may be imposing (or changing) movement constraints on the embodiment, which are causing it to move; or

• there may be noise in the system which causes the application to see apparent movement (this applies to the team captains’ magnetic trackers).

Overall movement rates

For all inhabitants (but ignoring the host and performers) in all shows we find that the mean level of apparent movement is 55.5% (SD 13.2%), with a range of 38% to 85%. So, on average, each participant spends 55.5% of the show apparently moving as viewed by the system (i.e. generating update events). Figure 4 shows the cumulative distribution of movement rates for all participants in all shows.
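One plausible way to turn a log of movement update events into a "% time moving" figure is sketched below. It is only an illustration: the one-second bin size and the rule that a participant counts as moving in any bin containing at least one update are assumptions, not the procedure stated in the text, and the function names are hypothetical.

```python
import math
from statistics import mean


def fraction_moving(update_times, start, end, bin_s=1.0):
    """Fraction of fixed-size time bins in [start, end) holding an update event."""
    n_bins = math.ceil((end - start) / bin_s)
    # A bin counts as "moving" if at least one update timestamp falls in it.
    busy = {int((t - start) // bin_s) for t in update_times if start <= t < end}
    return len(busy) / n_bins


def mean_rate_percent(per_participant_times, start, end):
    """Mean percentage of time moving across participants."""
    return 100 * mean(fraction_moving(t, start, end) for t in per_participant_times)
```

With per-participant rates computed this way, the mean, standard deviation and cumulative distribution follow from standard summary statistics.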




Figure 4: cumulative distributions of movement rates by level of participation (plot “Cumulative distribution of % time moving, all shows, all participants”: % of participant-shows against % time moving; series: Inhabitants, Captains, Host)

Note that the host has a consistently low movement rate, averaging 12.9%, because the host is moved and positioned solely under the control of the event management system. The team captains have consistently maximal movement rates, all 100%, because there is sufficient noise in the magnetic tracking system (as used) that the team captains always appear to be moving, as viewed by the system.

Movement and the management interface

In order to further understand the level of inhabitant movement observed we must differentiate between the two reasons which can cause an inhabitant to move:

• the user is moving the joystick; or

• the event management system is moving them through the movement constraint mechanism.

For the inhabitants, if we ignore movement caused only by constraints, we find that the mean rate of movement is 44.5% (SD 15.1%) with a range of 24.2% to 82.2%. So, the imposition of constraints has produced on average 10% additional apparent movement and corresponding network traffic. This influence ranges from 13.8% additional movement in the case of the least active inhabitant to only 2.8% in the case of the most active inhabitant.

This compensated level of movement should now reflect the actions of the users themselves. This is a much higher rate than we have found in previous trials such as the BT/JISC-funded Inhabiting the Web (ITW) trials with MASSIVE-1 (19.6%) ([5]), EC COVEN project trials with dVS (15.6%) or COVEN project trials with DIVE (26.3%). In comparing this to the much lower levels from previous trials we identify the following differences:

• The inhabitants in OOTW had access to no other applications or input devices, and so had no alternative activity within the computer system other than engaging in the show. They did this through moving and speaking. The previous trials had the potential for additional parallel activities (e.g. in other applications on the user’s machine) in addition to activity within the virtual world.

• The most important distinction is the kind of task being undertaken within the virtual environment. The previous three sets of trials were all essentially tele-conferencing applications, with some additional themed content. In OOTW, however, movement is a critical part of all of the games. For example, in the first game arena, the inhabitants must chase “space frogs” towards their team leader, while in the final race they drag a “jet car”, carrying their leader, along a winding track.

• The interface for inhabitants in OOTW is a single joystick (and headset for audio). The previous trials used primarily mouse-based interfaces, with some use of the keyboard. The use of the joystick may make movement easier for longer periods.

Further research would be needed to characterize the importance of these possible factors.

Correlation of movement

In previous studies we have also examined the extent to which a group of participants coordinate their movements, i.e. moving independently or simultaneously. Our previous observations have been that there is slightly more concerted movement than would be expected by chance, although the overall distribution is still close to random independent movement.

Figure 5 shows the distribution of the number of inhabitants moving simultaneously for all shows, after compensating for the effects of management control (i.e. ignoring movements which are due solely to the application of constraints).

Figure 5: correlation of inhabitant movement, excluding constrained motion. (plot “Correlation of compensated movements, all shows, inhabitants only”: duration in seconds against number of participants moving simultaneously; series: Actual, Uncorrelated)

We note that there is (relatively) a huge amount of time when all eight inhabitants are moving simultaneously (about 4.5 minutes per show). If the effects of constrained movement are included then this rises to around 10 minutes per show. This distribution is very different from the distributions observed in previous teleconferencing type trials. The much greater correlation of movement is almost certainly primarily due to the structure and character of the games. For example, the first game involves all of the inhabitants running about after space frogs, and the race involves all of the inhabitants pulling the jet car together to the finishing line. Clearly, the system and network must be provisioned in anticipation of significant periods with every participant moving. This bears out the reservations which we have expressed in previous analyses concerning the likely task-dependency of measures such as this ([5]).

The constraint-driven fully coordinated movements are primarily the transitions between arenas on the travellators, when all of the participants are moving at the same time. All of the inhabitants are also moved about at the end of each game, when they are gathered together at the exit of the arena (as in figure 1).

4.2 Speaking

Having considered participant movement in the previous section we now consider audio participation rates. Each participant (host, inhabitant and team captain) had an open microphone which they could use to talk to the other participants. In the case of the inhabitants and the team captains this was a microphone/headphones combination which allowed hands-free real-time audio interaction.

Silence detection

The audio communication was provided by a general-purpose packetised audio server, one per participant, which used multicast peer-to-peer audio data distribution. In the previous trials which we have analysed a silence detection algorithm has been used by the audio software so that audio packets are only transmitted when sounds are being received by the microphone. This creates a direct link between speaking (or at least the production of noise) and the generation of network audio traffic. It was our intention to use the same thing in OOTW. However, when sound-checking the system in rehearsal it was decided that the silence suppression mechanism should be disabled. This was done because the soundscape was already quite chaotic (eleven potential speakers plus sound effects) and the silence suppression was making things more difficult (e.g. losing the beginnings of quiet phrases or cutting off altogether in very quiet utterances).

Consequently, from the perspective of the network, it became as if every participant was speaking all of the time. This contributes to the fact that audio traffic dominates the total network bandwidth (e.g. 91% of the traffic in show 2 - see section 3.1 and table 1). In the circumstances it is fortunate that we did not assume a lower level of peak audio activity in provisioning our network and systems.

Whilst the network’s view of audio traffic was rather degenerate, we would also like to explore what would have happened if silence suppression had been retained. This allows us to apply our experiences in OOTW to other settings in which silence suppression is used. It also allows us, to a first approximation, to reason about how much participants actually spoke, and to explore whether this was influenced by the various distinctive elements of OOTW.

For this post-event analysis we have approximated the operation of the system’s normal silence suppression algorithm by applying it to the 56 bytes of audio sample data from each audio packet which are preserved in the network traffic log (captured by tcpdump). This is 17.5% of the total audio data on which the algorithm would normally operate, and so it is likely to underestimate the amount of speaking that would have been detected had silence suppression been used in the event. Having said this, it should give quite a good approximation for our purposes.

Overall audio rates

For all inhabitants in all shows we find that the mean level of apparent audio participation is 29.3%, standard deviation 11.9% and range 9.6% to 62.7%. Figure 6 shows the cumulative distributions of audio rates for all levels of participation in all shows.

Figure 6: cumulative distribution of audio rates by level of participation (plot “Cumulative distribution of % time speaking, all shows, all participants”: % of participant-shows for each series; series: Inhabitants, Captains, Host)

The host and performers had consistently high apparent audio rates, average 79.1% and 67.3%, respectively. We believe that only part of this was due to their actual speech; the rest is likely to be picked up from the audience and the house PA (the performers and host were in or next to the theatre space).

In previous trials we have observed the following audio activity rates: ITW (26.4%) ([8]), COVEN dVS (5.2%), and COVEN DIVE (8.1%). The inhabitants’ mean audio rate of 29.3% is directly comparable with that for the ITW, but much higher than that found in the other trials. We believe that the audio rate for ITW was artificially inflated by the use of a less effective silence detection mechanism, and by the use of open speakers (leading to feedback into the system) at one site ([8]). We conjecture that the higher speaking rates for inhabitants in OOTW may be due to the higher pace and more structured roles and interaction in OOTW compared to the previous trials in which we have been involved. These have tended to be rather pedestrian and disorganized, with no clear patterns of responsibility and less focused involvement.

Correlation of audio

As with movement we also wish to consider the extent to which a group of participants coordinate their speech, i.e. speaking independently or simultaneously. Our previous observations have been that there is slightly less concerted speaking than chance, although the overall distribution is still close to a model of independent activity (extremely close in the case of the COVEN DIVE analysis). Figure 7 shows the distribution of the number of inhabitants “speaking” simultaneously for all shows. Again we restrict our consideration to inhabitants because of the different characteristics of the other participants, especially the host, which would affect the analysis.
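The "uncorrelated" reference used when examining simultaneous activity can be approximated by treating each participant as independently active with their own overall rate. The sketch below computes that distribution; the text does not spell out how its uncorrelated curve was derived, so this independence model, the function names, and the use of a single mean rate for every inhabitant in the usage note are all assumptions.

```python
def simultaneous_distribution(rates):
    """P(exactly k participants active) for independent participants.

    `rates` holds each participant's probability of being active at a
    random instant (e.g. 0.293 for 29.3% of the time).
    """
    dist = [1.0]  # with no participants, zero are active with certainty
    for p in rates:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1 - p)   # this participant is inactive
            nxt[k + 1] += q * p     # this participant is active
        dist = nxt
    return dist


def expected_seconds(rates, total_seconds):
    """Expected time spent with exactly k participants active, for each k."""
    return [q * total_seconds for q in simultaneous_distribution(rates)]
```

As a usage example, eight inhabitants each speaking 29.3% of the time, over four shows assumed to last as long as show 2's broadcast (2630 seconds each, from time 2394 to time 5024), give an expected time with all eight speaking of well under one second under this model.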

39

Correlation of speaking, all shows, inhabitants only

5.1 Levels of Participation

3000 Actual Uncorrelated

We observed in section 3.2 that level of participation has a profound effect on the bandwidth requirements of a given participant. This observation motivates two main proposals, below.

2500

Duration (seconds)

2000



• The concept of levels of participation is a potentially powerful tool to use when managing and provisioning the network for an inhabited television event. For example, given a quota of participants at each level it is possible to predict the total bandwidth requirements. Alternatively, given a total available bandwidth it is possible to allocate varying fractions of it to each level of participation, to adjust the form and balance of participation in an inhabited television event.

• It should (in principle) be possible to add many more camera-like disembodied observers. With further optimisations it should be possible to add camera-like observers to the world without adding anything directly to the multicast traffic. This would allow us to support very large online audiences. However, these observers will still add extra load to the state transfer and multicast reliability mechanisms employed. In the case of MASSIVE-2 additional unicast UDP would have been generated by those users for state transfers and when multicast packets were lost. Even in much more sophisticated schemes, such as Scalable Reliable Multicast ([4]), additional receivers will affect traffic and network behaviour. This must be taken into consideration when determining how many observing processes might be supported by a particular system.
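The two uses of levels of participation described above (predicting total bandwidth from a quota, and deriving a quota from a bandwidth budget) can be sketched as follows. This is a minimal illustration only: the level names and per-participant rates are hypothetical placeholders, not measurements from the event, and the functions are not part of MASSIVE-2.

```python
from typing import Dict

# Hypothetical per-participant bandwidth for each level of participation (kbit/s).
# These are placeholder values, not figures measured during the event.
RATE_KBPS: Dict[str, float] = {"performer": 64.0, "inhabitant": 32.0, "observer": 1.0}


def total_bandwidth(quota: Dict[str, int], rate_kbps: Dict[str, float]) -> float:
    """Predict total bandwidth from a quota of participants at each level."""
    return sum(count * rate_kbps[level] for level, count in quota.items())


def quota_from_budget(total_kbps: float,
                      shares: Dict[str, float],
                      rate_kbps: Dict[str, float]) -> Dict[str, int]:
    """Given a bandwidth budget and the fraction of it assigned to each level,
    return how many participants each level can accommodate."""
    return {
        level: int((total_kbps * share) // rate_kbps[level])
        for level, share in shares.items()
    }


if __name__ == "__main__":
    print(total_bandwidth({"performer": 2, "inhabitant": 8, "observer": 100}, RATE_KBPS))
    print(quota_from_budget(1024.0,
                            {"performer": 0.5, "inhabitant": 0.25, "observer": 0.25},
                            RATE_KBPS))
```

Changing the shares passed to `quota_from_budget` is one way to explore how the balance of participation shifts for a fixed budget.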

[Figure 7 plot, “Correlation of speaking, all shows, inhabitants only”: duration (seconds) against number of participants speaking simultaneously; series: Actual, Uncorrelated.]

Figure 7: correlation of audio for all inhabitants.

The deviation from uncorrelated activity is much less than for movement, but still considerable. For example, one might be inclined to discount the likelihood of all inhabitants speaking simultaneously based on the random model (a total of half a second over all shows). In fact, there are a total of more than two minutes recorded with all eight inhabitants speaking simultaneously. These characteristics are clearly very different from some of the traditional assumptions of turn-taking in online audio conferences:

“For example, in an audio conference the data [audio] traffic is inherently self-limiting because only one or two people will speak at a time…” [10, section 6.1]
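One way to obtain an "uncorrelated" baseline of the kind compared against here is to treat each inhabitant as speaking independently for some fraction of the time and compute the resulting distribution of the number of simultaneous speakers. The sketch below does this under that independence assumption. The speaking fractions and total duration are hypothetical inputs, and the paper does not state that its random model was computed in exactly this way.

```python
from typing import List


def simultaneous_speaker_distribution(p_speaking: List[float]) -> List[float]:
    """Return P(exactly k participants speaking) for k = 0..n, assuming each
    participant i speaks independently with probability p_speaking[i]."""
    dist = [1.0]  # with no participants, zero speakers with certainty
    for p in p_speaking:
        nxt = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            nxt[k] += prob * (1.0 - p)  # this participant is silent
            nxt[k + 1] += prob * p      # this participant is speaking
        dist = nxt
    return dist


def expected_durations(p_speaking: List[float], total_seconds: float) -> List[float]:
    """Expected time (seconds) spent with exactly k simultaneous speakers."""
    return [total_seconds * q for q in simultaneous_speaker_distribution(p_speaking)]


if __name__ == "__main__":
    # Hypothetical input: eight inhabitants, each speaking 25% of one hour.
    for k, seconds in enumerate(expected_durations([0.25] * 8, 3600.0)):
        print(f"{k} speaking: {seconds:9.3f} s")
```

Comparing such expected durations with the measured ones, level by level, shows where observed behaviour departs from independence.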

5.2 Highly Correlated Activity

We have conjectured in previous analyses that significant bursts of correlated audio activity could arise in worlds with common elements of interest and we believe that this is borne out by this analysis. Event elements such as team dynamics (e.g. cheering/heckling), response to game events (e.g. scoring, approaching the finish) and coordinated phases (e.g. being released to start a game) might all tend to produce coordinated vocal activity.

We believe that the analysis of Out Of This World – when compared to previous studies – demonstrates that patterns of audio activity and movement are profoundly dependent on the applications and interfaces used. In particular, a highly structured event such as this one is liable to produce highly correlated bursts of intense activity. This means that any notion of design or provisioning based solely on average levels of activity must be treated with extreme caution. For systems which anticipate high levels of concurrent activity we propose the following approaches to enhancing scalability and consistency of performance:

It is not possible in this analysis to rule out the possibility of external interference. For example, loud noises in the environment might be picked up by all the inhabitants’ microphones. A much more controlled study and isolated environments would be needed to assess this possibility. However, we note that such external influences could be significant in real situations as well, for example, if an on-line virtual event is linked to a television broadcast then that would serve as a possible global source of coordinated interference.



In any case, we believe that the possibility of coordinated audio activity must be considered very carefully if the quality of the experience is to be assured.

5. RESPONSE

In this section we will respond – at a technical level – to the two most distinctive features of our analysis: the significance of level of participation in measured activity; and the high observed incidence of simultaneous activity. These are considered in turn.


• Simple peer-to-peer audio, even multicast, with silence suppression, becomes increasingly demanding where rates of participation and correlation of activity are high. The alternatives are to adopt a relatively rigid form of floor control (which may be suitable in some applications) or to introduce server audio components which can mix multiple audio streams “in the network”. Such an approach could still use multicasting to distribute the final composite audio channel, but the variability in network and processing requirements could be localised to these (potentially distributed) server components. There will also be necessary sacrifices in using such an approach, such as loss of per-speaker separation and increased latency. An example of such an approach is the crowd aggregate provided in MASSIVE-2 ([7]) which mixes the audio of its members and re-broadcasts a composite audio stream to the rest of the virtual environment.

• Many CVE systems have per-user flow control limits, but no overall (per world, per region, etc.) flow control limits. For example, in MASSIVE-2 each user sends rate-limited updates when they move, but this rate limit is independent of the other activity in the world. To cope with higher levels of participation, and especially with variable levels of participation or activity, these relatively elastic communication activities could adapt to the current overall level of activity within the system. For example, if everyone is moving at the same time then the individual rate of movement updates is reduced accordingly. Consequently the overall world activity remains bounded, even in the face of unpredictable and highly correlated bursts of user activity. This is analogous to (though potentially more complicated than) the rate control mechanism used for RTCP (RTP Control Protocol, [10]) packets, which effectively shares out a collective bandwidth allocation between all current session members.

We believe that the combination of these two approaches – distributed abstraction services and adaptive clients – will allow the construction of more scalable CVEs which provide graceful response under variable patterns of load (e.g. variable user activity).
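The group rate-adaptation idea can be illustrated with a minimal sketch in which a collective update budget for a world is shared between the users who are currently active, subject to each user's own cap. The function name, the parameters and the assumption that every client can observe the number of active senders are hypothetical; this is not how MASSIVE-2 or RTCP is implemented, only an illustration of sharing out a collective allocation.

```python
def adapted_update_rate(active_senders: int,
                        world_budget_hz: float,
                        per_user_max_hz: float) -> float:
    """Return the movement update rate (updates per second) one user should use.

    active_senders  -- number of users currently sending updates (assumed observable)
    world_budget_hz -- hypothetical collective limit on updates per second in the world
    per_user_max_hz -- the existing per-user rate limit
    """
    if active_senders <= 0:
        return per_user_max_hz
    # Share the collective budget equally, but never exceed the per-user cap.
    return min(per_user_max_hz, world_budget_hz / active_senders)


if __name__ == "__main__":
    for n in (1, 4, 8, 16):
        rate = adapted_update_rate(n, world_budget_hz=40.0, per_user_max_hz=10.0)
        print(f"{n:2d} active: {rate:5.2f} Hz each, {n * rate:5.1f} Hz total")
```

With these placeholder values the total stays at or below the collective budget however many users move at once, which is the bounded behaviour the proposal aims for.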

7. ACKNOWLEDGMENTS

We acknowledge the support of the European Community for the eRENA project under the ESPRIT IV Intelligent Information Interfaces (I3) programme, the EPSRC for the Multimedia Networking for Inhabited Television project under the Multimedia Networking Applications programme, and BT Laboratories for the Network Architectures for Inhabited Television project.

8. REFERENCES

[1] Steve Benford, Chris Greenhalgh, Chris Brown, Graham Walker, Tim Reagan, Paul Rea, Jason Morphett and John Wyver. Experiments in Inhabited TV. CHI 98 Late Breaking Results (Conference Summary), April 1998, ACM Press, ISBN 1-58113-028-7, pp. 289-290.

[2] Bowers, J., Pycock, J., and O’Brien, J. Talk and Embodiment in Collaborative Virtual Environments. Proc. CHI 96, April 1996, ACM Press, ISBN 0-201-94687-4, pp. 58-65.

[3] Bowers, J., O’Brien, J., and Pycock, J. Practically Accomplishing Immersion: Cooperation in and for Virtual Environments. Proc. CSCW 96, November 1996, ACM Press, ISBN 0-89791-765-0, pp. 380-389.

[4] S. Floyd, V. Jacobson, and S. McCanne. A Reliable Multicast Framework for Light-Weight Sessions and Application Level Framing. Proc. ACM SIGCOMM 95, August 1995, pp. 342-356.

[5] Chris Greenhalgh. Analysing movement and world transitions in virtual reality tele-conferencing. Proc. ECSCW 97, John A. Hughes, Wolfgang Prinz, Tom Rodden and Kjeld Schmidt (eds.), 1997, Kluwer Academic Publishers, ISBN 0-7923-4638-6, pp. 313-328.

[6] Greenhalgh, C. M., and Benford, S. D. Supporting Rich And Dynamic Communication In Large Scale Collaborative Virtual Environments. Presence: Teleoperators and Virtual Environments, Vol. 8, No. 1, February 1999, MIT Press, pp. 14-35.

[7] Chris Greenhalgh, Steve Benford, Ian Taylor, John Bowers, Graham Walker, and John Wyver. Creating a Live Broadcast from a Virtual Environment. Proc. ACM SIGGRAPH 99, August 1999, ISBN 0-201-48560-5, pp. 375-384.

[8] C. Greenhalgh, A. Bullock, J. Tromp, and S. Benford. Evaluating the network and usability characteristics of virtual reality tele-conferencing, in Telepresence, P.J. Sheppard and G.R. Walker (eds.), Dordrecht: Kluwer Academic Publishers, ISBN 0-412-84700-0, pp. 170-207.

[9] Nakanishi, H., Yoshida, C., Nishimura, T. and Ishida, T. FreeWalk: A three-dimensional meeting place for communities, in Community Computing – collaboration over global information networks, Ishida, T. (ed.), Wiley, 1998, ISBN: 0-471-979651, pp. 55-89.

[10] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: A Transport Protocol for Real-Time Applications, IETF RFC 1889, January 1996.

6. CONCLUSIONS AND FUTURE WORK

In this paper we have presented an analysis of Out Of This World, an experimental inhabited television event. In particular we have analysed the patterns of user and network activity that arose during this event. This provides a contrasting application of CVEs, compared to previous studies of tele-conferencing applications. As anticipated, we have observed very different patterns of user activity in this application. Specifically:

• The structuring of the event into levels of participation has a major influence on the activities and network requirements of participants in each level.

• The highly structured, relatively highly-paced, and coordinated nature of the event produces highly correlated bursts of user activity.

Both of these observations should be addressed in future CVEs in support of applications such as inhabited television.

There are two broad areas for future work, following on from this analysis. First, we have identified four specific technical proposals: the use of levels of participation to facilitate the management and provisioning of inhabited television and other CVE events; the introduction of viewing processes with very low additional network requirements (and their control and coordination); support for scalable distributed media, e.g. distributed audio mixing services; and group rate-adaptation for scalable CVEs. We intend to explore these within the development context of our next-generation CVE system, MASSIVE-3.

Second, the number of analyses of this form is still very limited. We now have initial studies of tele-conferencing, and of inhabited television. Presumably other applications, and other groups of users, will behave very differently.
