Authoring and User Interaction for the Production of Wave Field Synthesis Content in an Augmented Reality System

Frank Melchior, Tobias Laubach
Fraunhofer IDMT
[email protected]

Diemer de Vries
Delft University of Technology
[email protected]

Abstract

Wave field synthesis (WFS) enables the accurate reproduction of a sound field for a large listening area, with the correct characteristics at each listener position: every listener is provided with an exact perspective on the synthesized wave field. WFS technology is therefore ideally suited to be combined with augmented reality systems, in which every user perceives his own visual perspective of a given scene. This paper presents a concept for authoring and user interaction for the production of wave field synthesis content in an augmented reality system, and describes the implementation of a prototype WFS-AR system based on ARToolKit.

1. Introduction

Historically, the design of user interfaces for composing auditory scenes has been strongly tied to the development of the required hardware; the design of an analog mixing desk, for example, is directly motivated by the layout of the required circuits. The design of analog consoles is often copied in the user interface layouts of digital mixing desks, even though it is no longer dictated by the processing hardware. This eases the transition for users of analog consoles, but it results in an interface that is not designed from the user's point of view. Another important evolution in the audio world is the increasing integration of mixing and editing through the development of digital audio workstations. As a result of the separation of signal processing and user interfaces in the age of digital audio processing, it is no longer necessary to design a mixing console or workstation in the conventional way. The design can start from a user-centered view and can use modern user interface techniques like those enabled by augmented reality.

Another important point arises from the use of wave field synthesis reproduction. Here it is no longer useful to store separate audio tracks for every reproduction loudspeaker; instead, audio data must be stored in an object-oriented format [1]. The goal of this work is the design and implementation of a complete authoring system for auditory scenes, especially for WFS reproduction.

The user interaction should be based on augmented reality user interfaces, paradigms and metaphors. This is useful for systems which enable three-dimensional audio reproduction in an object-oriented way: the user interface can use rich visualization and interaction techniques to enable interaction with complex spatial auditory scenes, and is no longer limited by conventional mixing consoles or interaction devices [2][3].

2. Background and related work

2.1. Wave Field Synthesis

In the late eighties, Berkhout proposed a fundamentally new concept for sound reproduction [4][5]. In contrast to all existing methods, the new concept, called Wave Field Synthesis, is a volume solution: systems based on wave theory generate an accurate representation of the original wave field in the entire listening space. The Kirchhoff-Helmholtz and Rayleigh representation theorems form the theoretical foundation of the concept. For practical applications, the ideal setup of loudspeaker planes around the listening area can be reduced to a linear array. The basics of WFS with the corresponding equations and references can be found in [6]. Equation (1) gives the Kirchhoff-Helmholtz integral, which is the basis of wave field synthesis theory:

P_A = \frac{1}{4\pi} \oint_S \left[ P \, \frac{1 + jkr}{r^2} \, \cos\varphi \, e^{-jkr} + j\omega\rho_0 V_n \, \frac{e^{-jkr}}{r} \right] dS    (1)

The corresponding geometry is given in Figure 1. With the help of (1), the pressure at a point A can be calculated if the wave field of an external source distribution is known on the surface S of a source-free volume containing A. The first term of the integrand of (1) represents a dipole source distribution driven with the strength of the pressure P at the surface, while the second term represents a monopole source distribution driven with the normal component of the particle velocity

V_n at the surface. The original source distribution is called the primary source distribution; the monopole and dipole sources are called the secondary source distribution. Since A can be anywhere within the volume enclosed by S, the wave field within that volume is completely determined by (1), whereas the integral is identically zero outside the enclosure. The Kirchhoff integral can be simplified at the cost of a fixed surface geometry and a non-zero wave field outside the closed surface. The wave field inside the closed surface is correctly described by these solutions, known as the Rayleigh I and II integrals; their derivation can be found in [6], among others, and the Rayleigh I integral is restated below for reference. Based on these integrals, the wave field of any primary source distribution can be synthesized by a planar array of secondary sources. Analogously to the 3D situation, the wave field of a primary source distribution can be represented by a distribution of secondary sources along a straight line. The 2D analogue of a secondary point source behaves as a line source does in 3D, i.e. instead of a 1/r law, a 1/√r law is followed. Therefore, [6][7] have proposed a different approach, resulting in an expression for a so-called 2½D operator. The operator can be generalized for primary sources behind and in front of the secondary sources, which makes it possible to synthesize a sound field in the listening plane using a line distribution of secondary sources. Spatial sampling and the other effects that occur in practical implementations of WFS with a line distribution of secondary sources have been examined in several implementations and experiments; these have shown how to handle such limitations and that superb audio reproduction systems can be realized using the principles of WFS [6].
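Since the Rayleigh I integral is referenced above, a common statement of it is sketched here in the notation of (1); the exact form depends on the sign and normalization conventions used (see [6]). It gives the pressure in a source-free half space synthesized by a planar distribution of monopoles driven with the normal particle velocity:

P_A = \frac{j\omega\rho_0}{2\pi} \int_S V_n \, \frac{e^{-jk\,\Delta r}}{\Delta r} \, dS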

Figure 1. Geometry for the Kirchhoff-Helmholtz integral.
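To make the 2½D reasoning concrete, the following sketch computes per-loudspeaker delays and gains for a primary point source behind a linear secondary source array. It is a deliberately simplified time-domain approximation, assuming only the delay given by the source-to-speaker distance, a 1/√r amplitude law, and a cos φ directivity factor as discussed above; the function and variable names are illustrative, and the sketch is not the exact 2½D operator of [6][7].

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def wfs_point_source_weights(src_pos, speaker_positions):
    """Simplified 2.5D-style driving weights for a point source.

    src_pos: (x, y) position of the primary source behind the array.
    speaker_positions: (N, 2) array of secondary source positions.
    Returns per-speaker delays in seconds and linear gains. This is an
    illustrative approximation, not the full operator of [6][7].
    """
    speakers = np.asarray(speaker_positions, dtype=float)
    diff = speakers - np.asarray(src_pos, dtype=float)
    r = np.linalg.norm(diff, axis=1)             # source-to-speaker distances
    delays = r / SPEED_OF_SOUND                  # wavefront arrival delay per speaker
    # cos(phi): angle between the source->speaker direction and the array
    # normal. The array is assumed to lie along the x-axis, normal (0, 1).
    cos_phi = np.clip(diff[:, 1] / np.maximum(r, 1e-9), 0.0, 1.0)
    gains = cos_phi / np.sqrt(np.maximum(r, 1e-9))  # 1/sqrt(r) amplitude law
    return delays, gains

# Example: 16 speakers spaced 10 cm apart, source 1 m behind the array center.
speakers = np.stack([np.linspace(-0.75, 0.75, 16), np.zeros(16)], axis=1)
delays, gains = wfs_point_source_weights((0.0, -1.0), speakers)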

2.2. Augmented Reality

Augmented Reality (AR) superimposes computer-generated graphics onto the user's view of the real world. In contrast to Virtual Reality (VR), AR allows virtual and real objects to coexist within the same space. This is made possible by video see-through and optical see-through head-mounted displays (HMDs), the traditional output devices, which are still mainly used for augmented reality applications. Furthermore, a system

to track the position and orientation of the user is required, as well as interaction mechanisms. One popular basic framework is ARToolKit [8], which enables the tracking of optical markers through image processing and has been used to explore AR-based interaction techniques [9][10].

3. User interaction in the augmented reality system

3.1. Taxonomy of the interaction tasks

The traditional audio production workflow can be divided into recording, editing, mixing and mastering. The AR authoring system for wave field synthesis content production is designed for the mixing, editing and mastering tasks. For these steps of the production, the sound designer carries out the following main-level tasks for each sound source: editing, source attributes, spatial layout, filtering, dynamic processing, effects, and room simulation. An AR authoring system has to implement all these tasks to enable the complete authoring process. This can be done in the general form of tools which are applied to an audio source. A source represents an audio stream delivered by the hard disk or an input of the system; in the case of hard disk playback, an editing tool is required. Tools and sources can be arranged in groups and layers, and the tools can also be applied to the groups and layers (see the object-model sketch at the end of this subsection). The basic concept has already been presented in [1], but it is useful to apply it in a more general way, enabling completely free grouping, layering, and application of the tools to all other types of objects.

The defined main-level tasks consist of different subtasks, which can be arranged in the appropriate main tool. After the user has selected a source or object, he can edit an already applied tool or apply a tool from the library. The user interface then has to provide an adequate environment for the editing. For the editing of trajectories of the spatial layout, a detailed taxonomy has been defined in [2], including an idea for visualizing parameters over time with the help of a trajectory. Another approach is to transform the parameters of a source, or rather its timbre, into the visual appearance of the source. This can be useful in the case of filtering: for example, a sound with much energy in the high frequencies could be visualized as a very edgy, sharp shape. Examples of this approach can be found in [11]. For the room simulation, relevant parameters like room size or reverberation time could be presented as a room model with an adequate visual appearance. If a perceptual approach to the editing is used, or a version of a room should be built which cannot be displayed as a conventional room,

the impulse response could be used in a visualization of the source. The dynamic processing could be visualized as a level-dependent pulsation of the source, which the user could manipulate with constraining graphical tools (see the pulsation sketch below).
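The following sketch illustrates one way the object model implied above could be structured: tools apply uniformly to sources, groups, and layers, so grouping and layering can nest freely. All class, field, and method names are illustrative assumptions rather than the system's actual API.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Tool:
    """One authoring tool (e.g. filtering, dynamics, room simulation)."""
    name: str
    parameters: dict = field(default_factory=dict)

@dataclass
class SceneObject:
    """Anything a tool can be applied to: sources, groups, and layers."""
    name: str
    tools: List[Tool] = field(default_factory=list)

    def apply(self, tool: Tool) -> None:
        self.tools.append(tool)

@dataclass
class Source(SceneObject):
    """An audio stream delivered from hard disk or a live system input."""
    stream: str = ""

@dataclass
class Group(SceneObject):
    """A free grouping of sources, other groups, or layers; a layer
    could be modeled the same way."""
    members: List[SceneObject] = field(default_factory=list)

# Tools apply to groups exactly as to single sources.
drums = Group("drums", members=[Source("kick"), Source("snare")])
drums.apply(Tool("filtering", {"highpass_hz": 40}))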
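The level-dependent pulsation could be driven by a simple mapping from short-term signal level to a visual scale factor. The sketch below shows one plausible mapping using an attack/release envelope follower; the smoothing constants and scale range are arbitrary choices, not values from the system.

import numpy as np

def pulsation_scale(block, prev_level, attack=0.5, release=0.05,
                    min_scale=1.0, max_scale=1.5):
    """Map the RMS level of one audio block to a visual scale factor.

    A fast-attack / slow-release envelope follower keeps the pulsation
    readable; the smoothed level (clipped to [0, 1]) is mapped linearly
    onto the scale range of the rendered source geometry.
    Returns (scale_factor, new_level) so the caller can carry the state.
    """
    rms = float(np.sqrt(np.mean(np.square(block))))
    coeff = attack if rms > prev_level else release
    level = prev_level + coeff * (rms - prev_level)
    clipped = min(max(level, 0.0), 1.0)
    return min_scale + (max_scale - min_scale) * clipped, level

# Per rendered frame: scale, state = pulsation_scale(audio_block, state)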

3.2. Definition of the required interaction system

As in AR/VR systems in general, the interaction with the system can be divided into three main categories: object interaction, viewpoint manipulation, and application control. Object interaction can be further divided into selection, positioning, and orientation. When an object (sound source) is selected, the previously described tools can be applied. In the given system, object interaction, and especially object selection, plays an important role for efficient user interaction; a sketch of a typical selection test is given below. The same holds for zooming and navigating inside the scene. The complete view of the scene can be arranged as an exocentric setup, which is equivalent to the world-in-miniature concept [12], or as an egocentric setup. In the egocentric setup the sources are visualized at their original positions inside and outside the listening area. To edit the tools assigned to a given source, this source can be selected and the tools made visible. Alternatively, the tools could be made visible by use of the zooming interface paradigm [13]. The system requires interaction devices which can be carried around by the user to do the editing in the egocentric setup. These devices have to be tracked in space to enable, for example, the gestural input of trajectories. On the other hand, it is important to see the available tools and select them without cluttering the user's view with too much graphical overhead. The best solution would be a two-handed device, which enables the user to carry the tools around with one hand and to do the scene interaction with the other. For some operations, like waveform editing, a very exact position of the tools is also required.
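Because object selection is central to efficient interaction here, a typical implementation casts a ray from the tracked pen or the viewing direction and tests it against the source geometry. The ray-sphere test below is one standard way to do this; the sphere radius and all names are illustrative assumptions.

import numpy as np

def pick_source(ray_origin, ray_dir, source_positions, radius=0.1):
    """Return the index of the nearest source hit by the ray, or None.

    Each source is approximated by a sphere of the given radius (meters)
    around its visualized position.
    """
    o = np.asarray(ray_origin, dtype=float)
    d = np.asarray(ray_dir, dtype=float)
    d = d / np.linalg.norm(d)
    best, best_t = None, np.inf
    for i, c in enumerate(np.asarray(source_positions, dtype=float)):
        oc = o - c
        b = np.dot(d, oc)
        disc = b * b - (np.dot(oc, oc) - radius * radius)
        if disc < 0.0:
            continue  # ray misses this sphere
        t = -b - np.sqrt(disc)  # nearest intersection along the ray
        if 0.0 <= t < best_t:
            best, best_t = i, t
    return best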

Figure 2. Exocentric (left) and egocentric (right) setups for sound source manipulation.

One possible solution is the personal interaction panel [14], developed in the Studierstube framework [15]. The version realized in [16] meets all the requirements described above: a pen allows very accurate positioning for editing tasks, and by tracking both the panel and the pen, the tools can be carried around on the panel while an interaction device for gestural input is available.

4. Realization of the prototype system

4.1. Interaction mechanisms of the prototype system

Because the prototypical setup should simply validate the concept, only the direct modification of source positions was implemented. Two basic interaction scenarios were implemented in the prototype. Figure 2 (left) shows the exocentric setup. In this scenario, a small model of the reproduction room is augmented on a tabletop: the user sees the speaker setup augmented on a sheet of ten markers and can interact with the sound sources using a paddle or a grab gesture. The advantage of this setup is the possibility to interact with sources both outside and inside the listening area; it also enables interaction with all sources from one single position in the reproduction room. Figure 2 (right) shows the egocentric setup, where the sources are augmented at their original positions in the reproduction room. Using a bigger version of the paddle, the user can modify the position of focused sources; sources outside the reproduction room cannot be positioned in this setup. The perspective of the user through the head-mounted display is shown in Figure 3, and a sketch of the underlying pose mapping is given after Figure 3. A similar egocentric setup has been realized in [3]. Because our setup uses WFS for sound reproduction, the user can move around inside the listening area while the audio sources remain at stable spatial positions inside and outside the listening room.

Figure 3. User view of the egocentric setup for sound source manipulation.
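In the exocentric setup, moving a source amounts to reading the tracked paddle pose relative to the marker sheet and mapping it into listening-room coordinates. The sketch below assumes the tracker (e.g. ARToolKit) already delivers 4x4 homogeneous camera-space pose matrices for the paddle and the table, and that the scale between the tabletop model and the real room is known; both are assumptions for illustration.

import numpy as np

def paddle_to_room_position(table_pose, paddle_pose, model_scale):
    """Map a tracked paddle pose into listening-room coordinates.

    table_pose, paddle_pose: 4x4 homogeneous camera-space transforms,
    as delivered by an optical tracker such as ARToolKit.
    model_scale: meters in the real room per meter on the table model.
    """
    # Express the paddle pose in the table's coordinate frame.
    table_to_paddle = np.linalg.inv(table_pose) @ paddle_pose
    pos_on_table = table_to_paddle[:3, 3]
    # Scale the tabletop model up to room coordinates; keep the table
    # plane (x, y) and drop the height of the paddle above the sheet.
    return pos_on_table[:2] * model_scale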

4.2. Integration in the WFS authoring system

In [1], an authoring system for WFS content production has been described. It enables the complete production of content for auditory scenes for reproduction in the integrated WFS system. The system is equipped with a desktop graphical user interface (Spamix) for the automation and editing of the auditory scenes.

Figure 4. The integration of the AR system into the spatial audio workstation. The diagram shows the input devices (camera, mouse & keyboard, 1394 controller), the data-processing stage (AR4Spamix, Spamix, processing and rendering modules), and the outputs (graphics card, HMD, loudspeaker panels).

Figure 4 shows the integration of the AR system into the spatial audio workstation (SAW). The AR system acts as a second front end, communicating with the SAW via a dedicated Ethernet protocol. This makes it possible for two users to work simultaneously on the same auditory scene, one with the help of the SAW and the other utilizing the AR environment; a sketch of the general update pattern follows below.
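The paper does not specify the wire format of the dedicated Ethernet protocol, so the snippet below only illustrates the general pattern of the AR front end pushing source position updates to the SAW over the network. The transport (UDP), message layout, port, and field names are all invented for illustration.

import socket
import struct

SAW_HOST, SAW_PORT = "192.168.0.10", 9000  # assumed address of the SAW

def send_source_position(sock, source_id, x, y):
    """Push one source position update to the SAW.

    Illustrative packet layout: unsigned 16-bit source id followed by
    two 32-bit floats (x, y in meters), big-endian. The real protocol
    of the system is not documented in the paper.
    """
    sock.sendto(struct.pack("!Hff", source_id, x, y), (SAW_HOST, SAW_PORT))

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_source_position(sock, 3, 1.25, -0.5)  # move source 3 to (1.25, -0.5)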

5. Future directions and conclusions

In this paper we presented a concept and a prototype system for authoring auditory scenes for wave field synthesis reproduction using augmented reality interaction devices and mechanisms. This system allows the user to position sound sources inside and outside the listening area by controlling a dedicated set of parameters of a complete desktop WFS authoring system. After the successful realization of the simple prototype system described above, a system will be realized with all tools described in Section 3. Besides the optimization of the AR technology in terms of tracking and stereoscopic image reproduction using advanced APIs like Studierstube, a detailed interaction concept has to be defined and evaluated for all the tools described. Research in the field of audio-visual perception for the combination of WFS with stereoscopic image reproduction, especially for the use of HMDs, has to be carried out; so far only the combination with projection-based systems has been investigated [17][18][19]. Combining WFS and AR technology enables highly interactive and immersive audiovisual systems. Promising applications utilizing this new creative potential will reside in the field of interactive sound design.

6. References

[1] F. Melchior, T. Röder, S. Brix, et al., "Authoring System for Wave Field Synthesis Content Production," 115th AES Convention, New York, 2003.
[2] J. Sheridan, G. Sood, T. Jacob, H. Gardner, and S. Barrass, "Soundstudio4D - A VR Interface for Gestural Composition of Spatial Soundscapes," Proceedings of ICAD'04, Sydney, Australia, July 2004.
[3] D. Dobler, M. Haller, P. Stampfl, "ASR - Augmented Sound Reality," ACM SIGGRAPH Conference Abstracts and Applications, San Antonio, Texas, p. 148, 2002.
[4] A. J. Berkhout, "A Holographic Approach to Acoustic Control," J. Audio Eng. Soc., vol. 36, pp. 977-995, 1988.
[5] A. J. Berkhout, D. de Vries, and P. Vogel, "Acoustic Control by Wave Field Synthesis," J. Acoust. Soc. Am., vol. 93, pp. 2764-2778, 1993.
[6] E. Verheijen, "Sound Reproduction by Wave Field Synthesis," PhD thesis, TU Delft, 1997.
[7] E. Start, "Direct Sound Enhancement by Wave Field Synthesis," PhD thesis, TU Delft, 1997.
[8] H. Kato, M. Billinghurst, et al., "ARToolKit," Technical report, Hiroshima City University, 1999.
[9] H. Kato, M. Billinghurst, I. Poupyrev, "Virtual Object Manipulation on a Table-Top AR Environment," Proceedings of ISAR 2000, IEEE Computer Society, 2000.
[10] V. Buchmann, S. Violich, M. Billinghurst, "FingARtips - Gesture Based Direct Manipulation in Augmented Reality," Proceedings of Graphite 2004, ACM Press, pp. 212-221, 2004.
[11] G. Schatter, E. Züger, Ch. Nitschke, et al., "Intuitive grafische Nutzerschnittstellen für die elektronische Klangerzeugung mit Genetischen Algorithmen und Fuzzy-Sets" [Intuitive graphical user interfaces for electronic sound generation with genetic algorithms and fuzzy sets], Tonmeistertagung 2005, Leipzig, Germany.
[12] R. Stoakley, M. J. Conway, R. Pausch, "Virtual Reality on a WIM: Interactive Worlds in Miniature," Proceedings of CHI'95.
[13] J. Raskin, "The Humane Interface: New Directions for Designing Interactive Systems," Addison-Wesley, 2000.
[14] Z. Szalavári, M. Gervautz, "The Personal Interaction Panel - A Two-Handed Interface for Augmented Reality," Proceedings of EUROGRAPHICS'97, Budapest, Hungary, pp. 335-346, 1997.
[15] D. Schmalstieg, A. Fuhrmann, G. Hesina, et al., "The Studierstube Augmented Reality Project," Presence: Teleoperators and Virtual Environments, vol. 11, no. 1, MIT Press.
[16] G. Reitmayr, D. Schmalstieg, "Mobile Collaborative Augmented Reality," Proceedings of ISAR 2001, New York, USA, 2001.
[17] F. Melchior, D. de Vries, S. Brix, "Zur Kombination von Wellenfeldsynthese mit monoskopischer und stereoskopischer Bildwiedergabe" [On the combination of wave field synthesis with monoscopic and stereoscopic image reproduction], DAGA'05, Germany, 2005.
[18] W. de Bruijn, "Application of Wave Field Synthesis in Videoconferencing," PhD thesis, TU Delft, 2004.
[19] F. Melchior, S. Brix, T. Sporer, et al., "Wave Field Synthesis in Combination with 2D Video Projection," 24th AES Conference, Banff, 2003.
