International Conference on Virtual Systems and Multimedia 1996 in Gifu, Sept 18 - 20, 1996


QoS Management for Live Videos in Networked Virtual Spaces

Masatoshi Arikawa†, Akira Amano†, Kaori Maeda†, Reiji Aibara††, Shinji Shimojo†††, Yasuaki Nakamura†, Kaduo Hiraki†, Kouji Nishimura††, Mutsuhiro Terauchi†
(† Hiroshima City University, †† Hiroshima University, ††† Osaka University)
http://www.h2o.hiroshima-cu.ac.jp/

Abstract

Incorporating live videos into 3D CG (Computer Graphics) spaces provides richer virtual spaces than spaces composed only of 3D CG components. This paper discusses the advantages and problems of some typical applications that use live videos through cameras and remotely controllable facilities. LoD (Level of Detail) functions use the distance from a user's viewpoint to an object in a virtual scene to decide the quality of the object's graphics representation. LoD is important for managing the QoS (Quality of Service) of live videos in virtual spaces. A generalized LoD is presented for the QoS management of live videos in networked virtual spaces.

1 Introduction

VRML (Virtual Reality Modeling Language) is becoming popular on the Internet. VRML version 2.0 provides functions for changing scenes dynamically, and its advanced applications are multi-user spaces, in which users can be aware of one another. Real-time data make 3D CG (Computer Graphics) spaces much richer worlds. However, current multi-user spaces exchange only the current location and direction of a user's viewpoint, because the network bandwidth of the current Internet is not broad enough.

We are experimenting with incorporating live videos into 3D CG spaces using high-speed computer networks and computer graphics workstations. The capability of current computer technology is not sufficient to deal with live videos in 3D CG spaces, so we must find good ways of using this limited technology effectively. The quality of service (QoS) for live video in 3D CG spaces is determined by parameters of the video such as resolution, frame rate and delay. The QoS of live videos in virtual spaces should be managed by the rules of LoD (Level of Detail), which is a function of the distance from a user's viewpoint to a component object. For example, if an object is near the user, it should be put in the scene with its highest-quality detail; if the object is far from the user, it should be put in the scene with a rough representation, or left out of the scene entirely.
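To make the rule concrete, a minimal sketch of such a distance-based LoD function follows (in Python); the level names and distance thresholds are hypothetical illustrations, not values prescribed in this paper.

    # A minimal sketch of a distance-based LoD rule. The level names and
    # distance thresholds are hypothetical; any monotone mapping from
    # distance to quality realizes the same idea.
    def lod_level(distance):
        """Map the distance from the user's viewpoint to a detail level."""
        if distance < 5.0:        # near: highest-quality representation
            return "high"
        elif distance < 20.0:     # mid-range: rough representation
            return "low"
        else:                     # far: leave the object out of the scene
            return "culled"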

2 Incorporating Live Videos into 3D CG Spaces

This section shows the advantages and problems of incorporating live videos into 3D CG spaces using three typical examples.


2.1 Live Videos in 3D CG Spaces

Assume that there are many televisions in a 3D CG space. Due to limitations of CPU capability and network bandwidth, it is difficult to realize fine resolution and high frame rates for all of these televisions. LoD functions are important for allowing us to walk smoothly through a complex 3D CG space. The LoD functions use the distance from a user's viewpoint to an object in a 3D CG scene to decide the quality of the object's graphic representation. For instance, when a user moves closer to a television placed in a 3D CG space, the QoS of the video on that television becomes higher. With LoD, the QoS of all video data is controlled solely by the distance from the user's viewpoint to each object in the 3D CG space. Figure 1 shows examples of LoD for pictures and videos. In addition to the resolution of video data, frame rates are also controlled by the rules of LoD.

We now explain the basic mechanisms of LoD control of live videos in 3D CG spaces. Figure 2 (a) shows a general main loop, including “video stream capturing” and “texture mapping.” “Event Input” means operations from a user with a mouse, keyboard, etc., in order to move, stop, change direction, and so on. The module “Changing 3D Model” updates the scene, for example changing the position of a camera, moving an object over time, and deciding the LoD of an object. The module “Rendering” creates an image of the 3D space to be shown on the screen. The total time of these three modules determines the frame rate of the 3D CG scene as we walk through it: if the frame rate is high, we can walk through smoothly.

In general, the cost of software texture mapping is high. If the frame rate of texture updating on a component object is high, the frame rate of the main loop becomes low. If a user is moving fast, the frame rate of the main loop should be kept high by using low-resolution video streams (Figure 2 (a)). If a user is moving slowly toward a television, or stops to watch it, a high frame rate of the main loop is not needed, but high resolution and a high frame rate for the television's video stream are (Figure 2 (b)). With current general-purpose CPUs, it is difficult to incorporate even one fine video stream into a 3D CG scene. There is special hardware that maps video streams directly into texture mapping memory, providing a high-resolution, high-frame-rate video stream as the texture of an object; however, it has some restrictions compared to software video texture mapping (Figure 2 (c)).
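The loop structure of Figure 2 can be summarized in the following Python-style skeleton. Function names such as poll_events and capture_video_frame are hypothetical placeholders, and the thresholds in video_qos merely illustrate the LoD rule; the actual values depend on the machine and the network.

    # Illustrative skeleton of the main loop in Figure 2 (a)/(b).
    # All helper names are hypothetical placeholders, not a real API.
    def main_loop(scene, user):
        while True:
            events = poll_events()                   # "Event Input": mouse, keyboard
            scene.update(events)                     # "Changing 3D Model": viewpoint,
                                                     # animation, LoD decisions
            for tv in scene.video_objects:
                d = distance(user.viewpoint, tv.position)
                res, fps = video_qos(d, user.speed)  # LoD decides resolution and rate
                if tv.due_for_update(fps):
                    frame = capture_video_frame(tv.stream, res)
                    tv.texture = frame               # software texture mapping
            render(scene)                            # "Rendering": draw one image

    def video_qos(distance, user_speed):
        # Fast walk-through (Figure 2 (a)): keep the main loop fast by using
        # low-resolution, low-rate video. Stopped or slow (Figure 2 (b)):
        # spend the time on high-resolution, high-rate video instead.
        if user_speed > 1.0 or distance > 10.0:      # hypothetical thresholds
            return (160, 120), 2
        return (320, 240), 15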

Figure 1. Examples of LoD (Level of Detail) for Pictures and Videos


Figure 2. Main Loop of Real-Time 3D CG Rendering and QoS Management: (a) Fast Walk-Through; (b) Stop or Slow Walk-Through; (c) Hardware Video Texture Mapping


2.2 Live Videos Taken from Controllable Cameras

If we can use a remotely controllable camera, a “window” connected to the real world can be created in a 3D CG space, through which we can see the real world (Figure 3). When we stand on the left side of the window in the 3D CG space, we watch the right side of the scene in the real world; when we move closer to the window, we see a wider scene. Thus, the window in the 3D CG space behaves like a window in the real world. To realize such a window, the viewing vector from the user to the window must be kept at the same angle as that of the controllable camera in the real world. We also need to skew the image taken from the real world to fit the correct geometry when mapping the video stream. There are delays in camera control, so we need synchronization mechanisms: the viewing vector, zoom factor, focus, image size, shutter speed and time should be synchronized with the user's viewpoint at the moment the image is captured. By using the viewing vector information, we can constrain the user's viewing vector in the virtual space. Since the QoS of a live video should be decided by the angle as well as the distance from a user's viewpoint to an object, an extended LoD is needed to control the live video.
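As a rough illustration of the synchronization, the following sketch keeps a remote camera's pan angle equal to the user's viewing angle toward the window. It covers only the 2D pan case, and camera.set_pan is a hypothetical control API.

    import math

    # Sketch of viewing-vector synchronization for the "window" of Figure 3.
    # Positions are (x, z) pairs in the 3D CG floor plane; set_pan is a
    # hypothetical placeholder for the real camera-control interface.
    def sync_camera(user_pos, window_pos, camera):
        dx = window_pos[0] - user_pos[0]
        dz = window_pos[1] - user_pos[1]
        angle = math.degrees(math.atan2(dx, dz))  # user's viewing angle toward window
        camera.set_pan(angle)                     # real camera mirrors that angle
        return angle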

2.3 Live Videos with Remote Controllable Facilities

In Section 2.2, a camera was controlled from a remote site. This section introduces another remotely controllable facility, a turn table, to realize a virtual space (Figure 4). A 3D object in the real world, such as a sculpture, is mapped into a 3D CG space as one of its components, and a user can observe the real-world object from an arbitrary angle in the virtual space. In the example of Figure 4, if a user walks counterclockwise around the object mapped from the real world, the plane onto which the live video is mapped is rotated to match the viewing vector of the user. Meanwhile, the remotely controllable turn table on which the object is placed turns clockwise in the real world by the same angle. Thus, remotely controllable facilities make possible a fusion of 3D CG spaces and parts of the real world.
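The counter-rotation rule can be stated compactly, as in the sketch below; set_rotation and set_angle are hypothetical scene and device interfaces, and angles are assumed to be measured counterclockwise in degrees.

    # Sketch of the turn-table rule of Figure 4: when the user orbits the
    # object by theta degrees counterclockwise in the 3D CG space, the video
    # plane follows the user's viewing vector while the real turn table
    # rotates clockwise by the same angle.
    def on_user_orbit(theta, video_plane, turn_table):
        video_plane.set_rotation(theta)   # plane keeps facing the user
        turn_table.set_angle(-theta)      # real object turns the opposite way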

3. QoS Management of Live Videos in 3D CG Spaces

LoD was designed for managing the quality of static data such as geometric and picture data. When we treat live videos as components of virtual spaces, LoD must be extended. The QoS of a live video on a component object in a virtual space is defined by many factors, such as resolution, frame rate, delay and consistency. One example of the consistency of a virtual space is that the viewing vector from a user to a video texture plane in the 3D CG space should be the same as the viewing vector of the corresponding camera in the real world; in other words, the location of the user in the 3D CG space must be synchronized with the current video frame on the plane. This synchronization imposes a delay on the user.

The QoS for a component object in a virtual space is called “Partial QoS (PQoS),” while the QoS for a virtual space as a whole is called “Global QoS (GQoS).” Users specify their intended GQoS, such as the smoothness of walking through, resolutions for component objects, and importance levels for classes of component objects. The PQoS of one component in a virtual space is decided by factors such as the GQoS, the available computation capability and network speed, and the user's viewpoint, including his/her distance, direction and moving speed.
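One possible way to combine these factors, purely as an illustration, is sketched below. The weighting scheme and the resolution/frame-rate tiers are assumptions for the sketch, not decision rules given in this paper.

    import math

    def partial_qos(importance, cpu, network, distance, angle, speed):
        """Sketch: derive a (resolution, frame rate) PQoS for one component.
        importance comes from the user's GQoS spec for the object's class;
        cpu and network are normalized capabilities in [0, 1]; distance,
        angle and speed describe the user's viewpoint. All weights are
        hypothetical assumptions."""
        # Viewpoint factor: nearer, more frontal, slower-moving => higher PQoS.
        view = max(0.0, math.cos(angle)) / ((1.0 + distance) * (1.0 + speed))
        budget = min(cpu, network)            # the scarcest resource bounds QoS
        score = importance * view * budget
        if score > 0.5:
            return (640, 480), 30
        elif score > 0.1:
            return (320, 240), 10
        else:
            return (160, 120), 2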


Figure 3. Live Video with a Remote Controllable Camera: (a) 3D CG Space; (b) The Real World; (c) A Virtual Space

Figure 4. Live Video with Remote Controllable Facilities: (a) 3D CG Space; (b) The Real World; (c) A Virtual Space


4. Fusion of the Real World and 3D CG Spaces as Virtual Spaces

4.1 Mapping the Real World into 3D CG Spaces

By using two camera image streams together with their viewing vectors and time information, we can obtain real-time 3D information about the real world with stereo matching methods. With a single camera on a one-dimensional slider, we can obtain the same information. In such a system, time synchronization and the viewing vectors are important for precise measurement. When mapping real-world images into 3D CG spaces, only one user can use a given camera in the real world, since the viewing vector depends on the user. One solution to this arbitration problem is to limit users' access to these resources; another is to generate images from arbitrary viewpoints out of multiple camera images.
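For reference, a calibrated stereo pair yields depth by the standard pinhole relation Z = fB/d. The short sketch below illustrates only this relation, not any particular matching method.

    def depth_from_disparity(focal_px, baseline_m, disparity_px):
        """Standard pinhole stereo relation Z = f * B / d: focal length in
        pixels, baseline in meters, disparity in pixels. Illustrates how two
        synchronized camera streams yield real-time depth."""
        if disparity_px <= 0:
            raise ValueError("point at infinity or bad match")
        return focal_px * baseline_m / disparity_px

    # e.g. f = 800 px, baseline = 0.1 m, disparity = 8 px  ->  depth = 10 m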

4.2 Mapping 3D CG Spaces into the Real World

Viewing a computer display is one way to map 3D CG spaces into the real world, but there are many other ways, such as displaying with a large video projector; in that case, the virtual spaces can be seen by anyone at any time. When we use a large projector screen for displaying virtual spaces, we must take the users' viewing vectors into account, because every face on the screen can appear to be looking at each of the users. Therefore, we must measure the users' viewpoints in front of the screen and generate appropriate images.
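One standard way to generate such images, given a tracked eye position, is to build an asymmetric (off-axis) view frustum from the physical screen corners. The sketch below follows this well-known construction; it is an assumption about technique, since no particular method is prescribed here.

    import numpy as np

    # Sketch: off-axis view frustum for a viewer tracked in front of a screen.
    # pa, pb, pc: screen lower-left, lower-right, upper-left corners (world
    # coordinates, as numpy arrays); pe: tracked eye position.
    def off_axis_frustum(pa, pb, pc, pe, near, far):
        vr = (pb - pa) / np.linalg.norm(pb - pa)   # screen right axis
        vu = (pc - pa) / np.linalg.norm(pc - pa)   # screen up axis
        vn = np.cross(vr, vu)                      # screen normal (toward viewer)
        vn /= np.linalg.norm(vn)
        va, vb, vc = pa - pe, pb - pe, pc - pe     # eye-to-corner vectors
        d = -np.dot(vn, va)                        # eye-to-screen distance
        left   = np.dot(vr, va) * near / d
        right  = np.dot(vr, vb) * near / d
        bottom = np.dot(vu, va) * near / d
        top    = np.dot(vu, vc) * near / d
        return left, right, bottom, top, near, far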

4.3 Mapping the Real World into the Real World

Mapping the real world into the real world is a method of merging two distant rooms into a single room. It is important to be able to add information to the real-world image as 3D CG components; for example, we can add captions or 3D CG models to scenes of the real world. To realize this, we must recognize object names and 3D data from the real-world image using computer vision techniques.
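As a minimal illustration of anchoring a caption, the sketch below projects a recognized object's 3D position into the camera image with a pinhole model; the recognition step itself, and the intrinsic matrix K, are assumed to come from a vision module.

    import numpy as np

    def project_point(K, point_cam):
        """Pinhole projection of a 3D point (camera coordinates) into pixels.
        K is the 3x3 camera intrinsic matrix, assumed known from calibration."""
        uvw = K @ point_cam
        return uvw[0] / uvw[2], uvw[1] / uvw[2]   # pixel coordinates (u, v)

    # e.g. draw the caption "Object X" at project_point(K, x_cam) in the frame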

5. Concluding Remarks

Virtual space applications with live videos promise to be very attractive. In order to deal with live videos in networked virtual spaces, we introduced Partial QoS (PQoS) as an extension of LoD. It is not simple to manage the PQoS of all component objects in a scene: we need to find a trade-off between a user's specification (a kind of Global QoS) and the current computation and network capability. We will construct a principle of PQoS management for live videos in virtual spaces.

Acknowledgments

We received support from the grant “Specified Research” at Hiroshima City University. NTT (Nippon Telegraph and Telephone Corporation) provided us with high-speed ATM networks for “Joint Utilization Tests of Multimedia Communications.”

