Scalable Transmission of Avatar Video Streams in Virtual Environments

Peter Quax†  Tom Jehaes†  Chris Flerackers‡  Wim Lamotte†

†Expertise Center For Digital Media, Limburgs Universitair Centrum, Universitaire Campus, B-3590 Diepenbeek, Belgium
‡Androme NV, Wetenschapspark 4, B-3590 Diepenbeek, Belgium
E-mail: †{peter.quax,tom.jehaes,wim.lamotte}@luc.ac.be, ‡[email protected]

Abstract

In this paper we present our efforts and ideas in designing a framework for networked virtual environment applications that incorporates real-time video communication between avatars. The system is primarily designed around scalability for large-scale networked virtual environments. To achieve this, our approach is to optimize client responsibilities and make maximal use of direct client-client communication streams. Relying on a number of servers to transcode a high-quality video stream from each client into an arbitrary number of output streams is a solution that scales very poorly with a growing number of users. The main feature of importance for this paper is the ability to efficiently adapt bandwidth usage at the client side through multiple multicast groups. Server tasks are reduced to the bare minimum, thereby ensuring a highly scalable end result that depends only on the processing power of the connected clients. We do show, however, that by introducing a simple server setup, the system can be adapted for deployment on today's xDSL-based access networks.

1

Introduction and related work

Networked Virtual Environments (NVEs) have been the subject of a large number of research projects, starting primarily with military simulations [6] and evolving into dedicated research projects at universities and institutions [1, 2]. In recent years, the entertainment industry has caught up with this technology and used it to create virtual communities and, more importantly, multiplayer games. With the widespread introduction of high-bandwidth internet connections at home (using either xDSL or cable technology), these applications have become a major source of revenue for a number of firms.

With the advent of increasingly advanced encoding and decoding techniques (such as MPEG-1, -2 and -4), video streaming has become a reality for use on websites and videophones alike. Commercial NVEs, and games in particular, however, do not yet make full use of the advances in this field. While games nowadays do support audio communication (mostly as an add-on), video is limited to messenger- or videoconferencing-like applications.

Recently, there has been some research on combining NVE and video technology. [3] uses two frames from a 360 degree video capture to display the textures on a simple model in the virtual world. [10] proposes video avatars as a way to enhance visual communication by using a 3D representation. [7] describes a system that focuses on stereo representation of the video avatar. [9] discusses the issues of head model reconstruction and reduction of bandwidth using bounding boxes on the video streams.

Introducing video communication in NVE applications is difficult due to a number of factors. In a traditional server-based NVE system, the systems responsible for video data distribution need to be able to accept streams coming from a potentially high number of users and distribute them among an even higher number of clients. This leaves little or no headroom for any processing to be done at the server side, such as transcoding or rescaling of the streams. There is also the problem of bandwidth limitation at the client side, especially in highly populated virtual environments. As video data consumes a lot of bandwidth, it is not practical to have each client receive a half-megabit per second video stream from every other client in its view, regardless of the distance between them. A solution for scalable video transmission is incorporated into the MPEG-4 standard with FGS (Fine Granularity Scalability); however, there is little use of this technology in NVE applications. There is also the added complication of dependency on a single codec (e.g. MPEG-4), instead of providing a number of different codecs, depending on the desired quality/bandwidth parameters.
Our system overcomes these problems by allowing clients to send video data to a number of multicast groups, each associated with a pre-determined video quality. How this is achieved is described in detail in the following sections.

Figure 1. Video Avatars.

Figure 2. General Architecture.
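The core idea of one multicast group per quality tier, with each packet tagged by the quality it carries, can be sketched as follows. This is a minimal illustration; the group addresses, port numbers, and one-byte header layout are invented for the example and are not taken from the paper.

```python
import struct

# Hypothetical mapping: each quality tier of a region gets its own
# multicast group. Addresses and ports are made up for illustration.
QUALITY_TIERS = ("high", "medium", "low", "minimal")

def groups_for_region(region_id, base_port=5000):
    """Derive one multicast (address, port) pair per quality tier."""
    return {
        tier: ("239.1.%d.%d" % (region_id, i), base_port + i)
        for i, tier in enumerate(QUALITY_TIERS)
    }

# Packets within a region are tagged with a quality parameter: here a
# single byte prefixed to the video payload.
def tag_packet(quality_index, payload):
    return struct.pack("!B", quality_index) + payload

def untag_packet(packet):
    (quality_index,) = struct.unpack("!B", packet[:1])
    return quality_index, packet[1:]
```

With such a mapping, a receiver that wants a lower quality for some region simply joins a different group of that region, without any signalling to the sender.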

2

General World Setup

The basis of our system was primarily built for scalability testing using a high number of simultaneously connected clients [5]. This was later extended to include a simple video-based communication setup [8]. Video avatars in our setup are implemented as 3D user representations that display a live video picture instead of modelling facial features using geometry (see Fig 1). The world setup is briefly described below to clarify the concepts that are used in later sections.

The entire world is divided into a number of regions, each of which has a unique multicast address associated with it. Each client only sends data to the multicast group of the region it is located in at any given time. The data that is sent is made up of high-level actions that comprise both low-level information, such as positions and orientations, and high-level information, such as animation data. Each client has an Area-Of-Interest (AOI) associated with it, determined by e.g. its line of sight. Depending on the regions that are in a client's AOI, multicast groups are joined and left as the client moves around the world. This way, we achieve a highly scalable system in which clients can easily adapt the incoming data flow depending on available bandwidth or processing power. Distribution of action data in this scheme is automatic and requires no server intervention (F4 in Fig 2). It is important to note that the upstream bandwidth use of any given client is never influenced by the number of active clients in its AOI. At a higher level, we also achieve easy adaptation for use on less powerful (mainly handheld) devices thanks to the high-level action based mechanism.

Master servers are responsible for authentication and administration and are used to dynamically forward clients to a specific game server (F1 in Fig 2). There is a communication infrastructure between the master and game servers to exchange information such as network load and availability of system resources (F2 in Fig 2). The role of the game servers in this system is to track the multicast addresses that are in use at any given time. If a client moves to a specific region in the world, it is informed by the server of that region's multicast address (F3 in Fig 2). Tests have shown that a server in this setup may be used to provide information for 1000+ simultaneous users. It is also feasible to distribute this server setup among a number of machines, because the only synchronization that has to take place is a list of regions and associated multicast addresses.
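The join/leave behaviour as a client's AOI changes reduces to set arithmetic over region subscriptions. A minimal sketch, with helper names of our own choosing:

```python
def aoi_update(current_groups, new_aoi, region_to_group):
    """Compute which multicast groups to join and leave when the
    Area-Of-Interest changes to a new set of regions."""
    wanted = {region_to_group[r] for r in new_aoi}
    to_join = wanted - current_groups   # newly visible regions
    to_leave = current_groups - wanted  # regions that dropped out of view
    return to_join, to_leave

# Example: regions 1-3 were visible; the client moved and now sees 2-4.
region_to_group = {r: "239.0.0.%d" % r for r in range(1, 6)}
current = {"239.0.0.1", "239.0.0.2", "239.0.0.3"}
join, leave = aoi_update(current, {2, 3, 4}, region_to_group)
# join == {"239.0.0.4"}, leave == {"239.0.0.1"}
```

Because membership changes are expressed purely as multicast joins and leaves, no server is involved in the data path itself.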

3

Client Responsibilities

To enable scalable transmission of video streams, we envisage a system that is by and large analogous to the setup described above. The virtual world is again divided into a number of distinct regions, each with one or more multicast addresses associated with it. These addresses are distinct from the addresses used to transmit game data, which enables any client to perform a simple form of QoS: distinguishing essential from non-essential information (such as video streams).

Scalability in our setup follows from the definition of a number of multicast groups that are associated with a single region in the world. Each of these groups has a pre-defined quality setting associated with it. Examples of these qualities are given in Table 1. Each client is responsible for sending its video data to each multicast group with the quality parameters as defined for that group. If one or more of the required quality streams are unavailable (due to, for example, lower quality of the input streams), a lower-quality stream is sent to that specific multicast group. All packets sent within a specific multicast region are tagged with a quality parameter that defines the quality of the video stream contained within the packets. The lowest-quality setting will most likely consist of a single still frame that is retransmitted every few seconds.

Quality    Resolution   FPS   Bitrate (bit/s)
High       CIF          20    110000
Medium     QCIF         15    80000
Low        SQCIF        10    30000
Minimal    SQCIF        5     15000

Table 1. Sample Video Quality Parameters.

A client that is moving around in the virtual world is responsible for determining its own 'Video Area of Interest' (VAOI). This VAOI contains, at any given time, the regions of the world whose video data is to be received and displayed. It can continuously be updated to add or remove regions. A client can also change the video stream quality of a region already in its VAOI, depending, for example, on the distance between the user and that region. Switching of received quality is done simply by joining the multicast group that contains the lower-quality video streams of the specified region in the world (see Fig 3).

Figure 3. Video Area Of Interest Selection.

Determination of the VAOI can be done according to a number of factors. A client that has a large downstream capacity may choose a VAOI like Fig 3.A, while one with a slightly lower capacity may prefer Fig 3.B. When fast switching of multicast groups is possible, this may be optimized even further, as in Fig 3.C or Fig 3.D.

Note that this system is fundamentally different from scalable video codecs but does not in any way hinder their possible application for any given quality setting. When considering, for example, MPEG-4 FGS or H.263 scalable streams, selection of the desired output quality is performed by the receiving client by discarding the scaling information, but the total required bandwidth in either uplink or downlink direction is not affected by that client's selection (when used in a pure peer-to-peer way). In our system, incoming bandwidth is continuously changing as new regions are entered or left and as the number of (video) avatars in the subscribed regions changes. To throttle the bandwidth in the downstream direction, it suffices to adjust the size of the VAOI or to switch to lower-quality groups for some or all regions. The upstream bandwidth of any given client is never influenced by the number of other users that have the client in their VAOI.

Another way of exploiting H.263 or MPEG-4 scalable codecs is to transmit the basic information to one multicast group and send the scalability information to another. By subscribing to either the basic group alone or combining it with the group that contains the scalability information, a reduction of bandwidth is achieved. The setup in theory allows for any codec to be implemented, thereby ensuring that an optimal one can be used for any given desired output quality and/or bitstream size.
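Using the bitrates of Table 1, a client can pick a quality group per region so that the aggregate downstream stays within its budget. One possible policy, purely illustrative and not the paper's own algorithm, is to walk regions nearest first, give each the best tier the remaining budget allows, and drop regions for which not even the minimal tier fits:

```python
# Bitrates (bit/s) from Table 1, ordered best to worst.
TIERS = [("High", 110000), ("Medium", 80000), ("Low", 30000), ("Minimal", 15000)]

def select_vaoi(regions_by_distance, budget_bps):
    """regions_by_distance: region ids sorted nearest first.
    Assign the best tier that still fits the remaining budget;
    regions that cannot even get 'Minimal' are left out of the VAOI."""
    selection = {}
    remaining = budget_bps
    for region in regions_by_distance:
        for name, rate in TIERS:
            if rate <= remaining:
                selection[region] = name
                remaining -= rate
                break
    return selection

# A 256 kbit/s downstream budget over three regions:
print(select_vaoi(["near", "mid", "far"], 256000))
# {'near': 'High', 'mid': 'High', 'far': 'Low'}
```

Throttling then amounts to re-running the selection with a smaller budget and joining or leaving the corresponding quality groups.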
In practice, it has proven useful to use a codec framework such as FFMPEG or Microsoft's DirectShow, as this speeds up the required processing to encode streams in a number of different qualities.

Possible problems with this approach are largely due to the available processing power at the client side. This comprises the time necessary to encode the input video streams into a number of output streams with different quality settings. There is no need for any further processing to be done at the client side on incoming video streams (such as with MPEG-4 FGS) other than decoding, as the desired output quality is already contained within the bitstream as encoded by the sending party.
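Since a sender transmits every tier of Table 1 regardless of how many viewers subscribe, its video upstream is simply the fixed sum of the tier bitrates. A back-of-the-envelope check (ignoring packet overhead):

```python
# Table 1 bitrates in bit/s.
tier_bitrates = {"High": 110000, "Medium": 80000, "Low": 30000, "Minimal": 15000}

total_upstream = sum(tier_bitrates.values())
print(total_upstream)           # 235000 bit/s, i.e. 235 kbit/s
# This fits within a typical 384 kbit/s xDSL uplink, leaving roughly
# 149 kbit/s of headroom for game data and protocol overhead.
print(384000 - total_upstream)  # 149000
```

The constant upstream cost is what makes the sender side insensitive to the number of receivers.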

4

Server Responsibilities

The servers in the proposed architecture need not perform many additional tasks compared to the non-video setup. The required extra functionality can therefore be integrated into the same servers as in the non-video based setup. When a client connects to a game server, it is assigned a multicast group address for sending its data, depending on its starting position in the virtual world. When changing position or expanding/extending the VAOI, clients request the multicast addresses of the regions concerned from one of the available game servers.

Broadcast video in NVE applications presents a new opportunity for TV stations and information providers in general to distribute their programs. These are often high-quality streams that consume large amounts of bandwidth at the client side. Servers that distribute this information to specific multicast groups in our system are situated server-side (mostly at ISP level). Subscription to these broadcast quality streams is, at the client side, either done automatically or manually, depending on the downstream bandwidth available.

Scalability can be demonstrated in the same way as with the non-video based setup, using a setup that simulates avatar movement and video stream distribution. Given that the standard system scales very well with 1000+ users, it is unlikely that the minor additional tasks will have a major negative impact on server scalability.
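The game servers' bookkeeping thus reduces to a region-to-multicast-address table that can be replicated across machines. A minimal sketch, with class and method names of our own invention:

```python
class RegionDirectory:
    """Tracks the multicast address of every region. Because this map is
    the only shared state, it can be synchronized across several game
    servers, which is what makes the setup easy to distribute."""

    def __init__(self):
        self._table = {}

    def register(self, region_id, address):
        self._table[region_id] = address

    def lookup(self, region_id):
        # What a client is told when it moves into a region (F3 in Fig 2)
        # or expands its VAOI.
        return self._table[region_id]

    def snapshot(self):
        # The full region list exchanged when a new server joins the pool.
        return dict(self._table)

directory = RegionDirectory()
directory.register("spawn", "239.2.0.1")
print(directory.lookup("spawn"))  # 239.2.0.1
```

Keeping the servers out of the video data path is what lets a single directory of this kind serve 1000+ clients.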

5

In-field Application Issues

Allowing individual clients to multicast large amounts of data is a policy that is seldom adopted by ISPs at this time, mainly due to the possible explosive growth of bandwidth usage. As a solution for this case, we propose the introduction of video servers at DSLAM level (for xDSL networks) that perform unicast-to-multicast conversion. Each client that wishes to send data to a multicast group unicasts the data to the dedicated video server. F1 in Fig 4 shows the video stream on the client's private point-to-point connection, while F2 denotes the stream on the shared network at DSLAM level. The server's responsibility in turn is to multicast this data to the desired multicast group (F3 in Fig 4). As this server can be located in the xDSL infrastructure itself, it is very likely that this kind of multicasting will be allowed at ISP level. Multicasting at DSLAM level (mostly ATM) is currently employed for broadcast quality video stream distribution for digital interactive television. With IP-based DSLAMs, support for other (third party) applications is very likely to be enabled.

Figure 4. Video Servers in the Access Network.

6

Test Results

Our implementation currently runs on a PC based setup, employing the JRTP [4] library. The system currently runs on a Gigabit LAN environment for evaluation purposes. Compression and decompression are done through FFMPEG's avcodec library. Table 2 presents the timing and network load for four different quality settings using a 1.7 GHz (1) and a 2.3 GHz (2) system. Encoding four higher quality streams is possible but leaves little room for decoding of video avatar data. This can, however, easily be resolved by adjusting the bitrate parameters in the codecs or choosing a lower resolution for the maximum quality stream. As shown in Table 2, it is possible to fit 4 streams into a typical 384 kbit/s upstream connection, leaving some room for the transfer of game data, which is minimal in comparison to video data anyway. Connections that have higher uplink capacities do not suffer from these problems. The resolution is the same for each pair of streams due to codec performance limitations. Tests have shown that the P4 1.7 GHz system is capable of encoding 4 quality streams and decoding streams from 20 clients in real time, using varying qualities. As shown in Table 2, faster machines are able to decode either more streams or higher quality streams.

Res.    FPS   MET 1 (ms)   MET 2 (ms)   kbps   MDT 1 (ms)   MDT 2 (ms)
CIF     25    4.2          3.15         110    3.25         2.29
CIF     15    5.38         3.5          50     3.31         2.21
QCIF    25    2.92         2.12         90     0.92         0.73
QCIF    15    1.46         1.37         25     1.07         0.67
Total         13.96        10.14        275

Table 2. Video Timings and Measurements.

7

Future Work

The framework is currently being extended with server-based streaming of different representations of avatars, such as detailed geometry, simplified geometry and image based representations.

8

Acknowledgements

Part of this research is funded by IWT project number 020339 and the Flemish Government.

References

[1] D. B. Anderson et al. Diamond Park and SPLINE: A social virtual reality system with 3D animation, spoken interaction, and runtime modifiability. TR 2, MERL, 1996.
[2] T. A. Funkhouser. RING: A client-server system for multi-user virtual environments. In Symposium on Interactive 3D Graphics, pages 85–92, 209, 1995.
[3] J. Insley et al. Using video to create avatars in virtual reality. In Visual Proceedings of the 1997 SIGGRAPH Conference, page 128, Los Angeles, CA, 1997.
[4] J. Liesenborgs, W. Lamotte, and F. V. Reeth. Voice over IP with JVOIPLIB and JRTPLIB. In 26th Annual IEEE Conference on Local Computer Networks, 2001.
[5] J. Liesenborgs, P. Quax, et al. Designing a virtual environment for large audiences. In Proceedings of the 16th International Conference on Information Networking, pages 3A-2.1–3A-2.10, 2002.
[6] M. R. Macedonia et al. NPSNET: A network software architecture for large-scale virtual environments. Presence, 3(4):265–287, 1994.
[7] T. Ogi, T. Yamada, K. Tamagawa, and M. Hirose. Video avatar communication in a networked virtual environment. In Proceedings of INET 2000, 2000.
[8] P. Quax, T. Jehaes, P. Jorissen, and W. Lamotte. A multi-user framework supporting video-based avatars. In Second Workshop on Network and System Support for Games, 2003.
[9] V. Rajan et al. A realistic video avatar system for networked virtual environments. In Proceedings of IPT 2002, Orlando, FL, 2002.
[10] S. Yura, T. Usaka, and K. Sakamura. Video avatar: Embedded video for collaborative virtual environment. In Proceedings of the IEEE International Conference on Multimedia Computing and Systems, volume 2, 1999.
