A Camera-based 3D Performance Environment

Rolf Gehlhaar DLit
School of Art and Design, Coventry University
[email protected]

Luis Miguel Girão
Artshare Investigação, Tecnologia e Arte, Lda.
artshare[at]sapo.pt
ABSTRACT
Here we present Multiverse, a camera-based 3D performance environment for one person which allows the performer to trigger a large number of cyclical parameter values that are used to control digital frequency modulation and granular synthesis engines, as well as sound effects such as comb filtering, reverberation and spatial distribution.
Keywords
Non-tactile music control, computer vision, automated performance process.

1. INTRODUCTION
During recent summer workshops in digital art for children held at the Academia Digital in Aveiro, Portugal in 2006 [1], my colleague Luis Girão and I wanted to experiment with teaching children how to use a web camera as a sensor to trigger images, movies and sounds. This led us to develop a simple 2D Jitter interface that the children could use and for which they could produce images and sounds during the workshops. The ultimate aim was that each participant in the workshop would create and present a short audiovisual performance. The results were very encouraging: the children understood the application and understood what kinds of visual and sound materials would work well. We considered the experiment a success; I, in particular, thought it could be the seed for the development of a new performance environment for myself.
2. BACKGROUND
In 1986, shortly after developing my SOUND=SPACE installation, I began to use its ultrasonic ranging system as a solo performing environment [2][3][4]. Instead of using the ranging system to survey a large empty space horizontally, I placed the individual ranging units on the floor, looking upwards, in a figure-8 formation. By standing in the middle of this figure I could, with my hands, arms and legs, trigger up to 6 different sounds / parameters of sound / transformations at the same time, more or less as in the sketch below (Figure 1). I performed with this "instrument" for about 18 years, both in a solo and an ensemble context, continuously developing the software as newer, faster, more flexible hardware became available. This system provided a very flexible and expressive environment, but by 2005 I felt I had exhausted both its sound and gestural capabilities. I did not want to continue with the same system. I wanted to develop something new: a more integrated technical solution, one that would work in a 3D space, whose functionality could be seen as well as heard, and one that did not involve only the triggering and manipulation of samples. Basically, this new performing environment would have to satisfy several conditions: it would have to be more easily transportable than the SOUND=SPACE system, its sound generation should be primarily digital synthesis, and both the player and the audience should have some sort of visual referent, i.e. should be able to see and understand, to some extent, the significance of what the player is doing (in contrast to many computer music 'performances' in which a number of motionless people sit on stage behind their laptops, completely engrossed in minute control movements that no one can see).
Figure 1. SOUND=SPACE performance set up
3. THE INSTRUMENT
Consequently, I decided to develop an interface consisting of two web cameras set up at a right angle to each other, surveying the spatial volume inhabited by the upper torso of the performer, the volume accessible to the hands (Figure 2).
The output of the web cameras is converted into a 3D matrix of 900 cells. The performer is located within this matrix. Any cell of the matrix can be programmed to function as a trigger for any desirable function. Whenever the performer’s hand appears within a given cell of this matrix, its function is activated. Furthermore, information from the matrix is used to generate a moving image projected above or next to the performer. This image is a re-representation of the performer as well as a visualisation of those cells of the matrix that have control functions (not all of the possible 900 cells are used as triggers).
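By way of illustration, the sketch below shows one way this cell-to-function mapping could be expressed; it is a plain-Java sketch with invented names (TriggerMatrix, assign, cellActivated), not the Max/MSP/Jitter patch actually used, and only illustrates the principle that any cell may be programmed with a function while unprogrammed cells do nothing.

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the cell-to-function mapping described above.
// The actual system is implemented as a Max/MSP/Jitter patch; all names are illustrative.
public class TriggerMatrix {
    static final int GRID = 13;  // cells per axis (see section 4.2)
    private final Map<Integer, Runnable> triggers = new HashMap<Integer, Runnable>();

    // Encode a 3D cell position as a single index.
    static int cellIndex(int x, int y, int z) {
        return (x * GRID + y) * GRID + z;
    }

    // Programme a cell to act as a trigger for some function,
    // e.g. "start timer 12" or "reset timer 3" (see section 3).
    void assign(int x, int y, int z, Runnable action) {
        triggers.put(cellIndex(x, y, z), action);
    }

    // Called whenever the image analysis reports the performer's hand inside a cell.
    void cellActivated(int x, int y, int z) {
        Runnable action = triggers.get(cellIndex(x, y, z));
        if (action != null) {
            action.run();  // cells without an assigned function are ignored
        }
    }
}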
The player controls these two 'engines' indirectly, not via gestures that trigger sounds but via gestures that start, stop or reset timers. The ticks of these timers (a total of 47 of them) are converted into specific values via interpolation tables; these values are then sent as parameters to the two synthesis engines mentioned. Thus, MULTIVERSE is more a complex of autonomous processes over which the player has some control than a 'traditional' instrument. It is an expressive performing 'beast'; the performer is only its guide. At the end of each performance the exact parameter values generated by the timers are stored in a preset. The values stored there furnish the initial values of the next performance. Furthermore, every time I play, I endeavour to choose a different piece from my catalogue.
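The path from timer tick to synthesis parameter can be sketched roughly as follows. This is a simplified, hypothetical reconstruction in plain Java: the real interpolation tables live inside the Max/MSP patch, and the breakpoint values below are invented purely for illustration.

// Hypothetical sketch of one of the 47 timer-driven parameter generators.
public class CyclicParameter {
    private final double[] ticks;   // breakpoint positions (tick counts)
    private final double[] values;  // parameter values at those breakpoints
    private int tick = 0;

    CyclicParameter(double[] ticks, double[] values) {
        this.ticks = ticks;
        this.values = values;
    }

    // Advance the timer by one tick and return the linearly interpolated parameter value.
    double nextValue() {
        double t = tick % ticks[ticks.length - 1];  // cycle through the table
        tick++;
        for (int i = 1; i < ticks.length; i++) {
            if (t <= ticks[i]) {
                double a = (t - ticks[i - 1]) / (ticks[i] - ticks[i - 1]);
                return values[i - 1] + a * (values[i] - values[i - 1]);
            }
        }
        return values[values.length - 1];
    }

    public static void main(String[] args) {
        // e.g. an FM index that rises and falls between 0 and 8 over 400 ticks (invented figures)
        CyclicParameter fmIndex = new CyclicParameter(
                new double[] {0, 100, 250, 400}, new double[] {0, 8, 2, 0});
        for (int i = 0; i < 5; i++) System.out.println(fmIndex.nextValue());
    }
}

In this reading, a gesture never sets a parameter directly; it only starts, stops or resets the timer that drives such a generator, which is what makes the instrument behave as a complex of semi-autonomous processes.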
4. TECHNICAL IMPLEMENTATION – development of its 3D positioning interface

4.1 Overview
This is a camera-based system composed of two computers in a network. One of them runs a Max/MSP/Jitter patch that acquires images from two video cameras and outputs audio. The other runs a program written in Processing that produces video output from the information arriving over the network.
Figure 2. MULTIVERSE performance set up.

After some research and experimentation I decided that a combination of granular synthesis (~munger from http://www.music.columbia.edu/PeRColate/) and FM synthesis (the "FM surfer" patch by John Bishop) would be the sources of my new instrument's sound. Once this decision was made, only the source of sound required for granular synthesis had to be found. During the past 20 years of performing with SOUND=SPACE I had frequently created short samples that were little snippets taken from my own compositions or improvisations. These were, like all the hundreds of other samples, used simply as raw sound material to be chopped, transposed, filtered, reverberated, distorted, etc. This had worked quite well. Thus, I decided that the sound sources of the granulation engine would be recordings of my own composed instrumental and electronic works and recordings of performances by my two current performing ensembles, META4¹ and unoduotrio².
The basic principle for the development of this interface is a matrix-splitting technique similar to the one implemented in Cyclops [5]. Although not used in this specific patch, the Jitter object developed by Eric Singer divides the matrix of an incoming live-feed image into a variable number of portions. The average variation of values in these sub-matrices is analyzed in several different ways, producing useful data. Our algorithm looks for a threshold value. This technique was also used in a program that I developed during my collaboration with the visual artist João Raposo on his work Light Drawings, in which it allowed the triggering of sound events [6][7]. A similar system was used, as described above, in a series of workshops on interactive media.
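A minimal sketch of this matrix-splitting analysis is given below, in plain Java rather than Jitter and with invented names. It reads the threshold as a frame-difference test, which is one plausible interpretation of the description above: the image is divided into 13 x 13 cells, the average absolute change against the previous frame is computed per cell, and a cell is reported as 'true' when that average exceeds the threshold.

// Hypothetical sketch of the matrix-splitting analysis; in the actual system
// this is done inside the Max/MSP/Jitter patch.
public class CellAnalyzer {
    static final int GRID = 13;          // 13 columns x 13 rows
    static final double THRESHOLD = 20;  // average per-pixel change (0-255), illustrative value

    // frame and previous are greyscale images, width*height values in 0..255
    static boolean[][] analyze(int[] frame, int[] previous, int width, int height) {
        boolean[][] active = new boolean[GRID][GRID];
        int cellW = width / GRID, cellH = height / GRID;
        for (int col = 0; col < GRID; col++) {
            for (int row = 0; row < GRID; row++) {
                long sum = 0;
                for (int y = row * cellH; y < (row + 1) * cellH; y++) {
                    for (int x = col * cellW; x < (col + 1) * cellW; x++) {
                        sum += Math.abs(frame[y * width + x] - previous[y * width + x]);
                    }
                }
                double average = (double) sum / (cellW * cellH);
                active[col][row] = average > THRESHOLD;  // boolean "true" for this cell
            }
        }
        return active;
    }
}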
4.2 Concept
In this 3D positioning system, a Cartesian coordinate system is built from the live feed of the two video cameras. Each camera input is analyzed using the technique described above: the incoming matrix is divided into 13 columns by 13 rows, and whenever the determined threshold is exceeded in a cell, a boolean 'true' is produced for that specific cell. The results of the analysis of both cameras are then combined so as to match a true value in a cell of the first camera with the corresponding one of the second camera (Figure 3). The form and position of the player are thus transcribed from this video analysis of the real space to this virtual Cartesian system.
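The combination step can be sketched as follows, again in plain Java and under an assumed axis assignment (front camera contributing x and the shared vertical coordinate, side camera at 90º contributing z): a 3D cell counts as occupied when both cameras report a true cell in the same row.

// Hypothetical sketch of combining the two 13x13 analyses into a 3D matrix.
// The axis assignment is an assumption made for illustration.
public class VolumeBuilder {
    static final int GRID = 13;

    static boolean[][][] combine(boolean[][] front, boolean[][] side) {
        boolean[][][] volume = new boolean[GRID][GRID][GRID];
        for (int row = 0; row < GRID; row++) {      // shared vertical coordinate
            for (int x = 0; x < GRID; x++) {
                if (!front[x][row]) continue;
                for (int z = 0; z < GRID; z++) {
                    // cell (x, row, z) is occupied when both views agree on this row
                    volume[x][row][z] = side[z][row];
                }
            }
        }
        return volume;
    }
}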
¹ META4, founded in London, UK in 2000, is a live electronic ensemble whose members are Rolf Gehlhaar, Nikola Kodjiabasha, Hagop Gehlhaar-Matossian and Vahakn Gehlhaar-Matossian (www.courtauld.ac.uk/east_wing/archive/2001/eastwing/artists/META4.html).
² unoduotrio, founded in Aveiro, Portugal in 2003, is a live electronic/instrumental ensemble whose members are Rolf Gehlhaar, Paulo M. Rodrigues and Luis M. Girão (http://casadasartes.blogspot.com/2006/11/cyberlieder-performers-manipulao.html).
The cameras are positioned in space so as to capture the torso, head and arms of the player. Their image middle axes form a 90º angle and intersect in space at the head coordinates of the player (Figure 5). The camera positioning described above results from constraints related to lens distortion. The cameras can be set up in different positions, but this implies that some points cannot be reached easily. Nonetheless, the large number of possible triggers allows a good degree of set-up flexibility.
Figure 3. Virtual 3D space.

4.3 Set Up
This system is light-based, which means that a significant difference between the foreground (the player) and the background is a sine qua non for it to work. Ideally the background should be uniformly single-coloured; in some concert circumstances it was not, yet the performance of the instrument was not significantly affected. The system can be set up in environments that result in either a dark or a bright background (Figure 4).
Figure 5. Set Up Scheme.
4.4 Player Fold-Back
There is a remarkable difference between the real position of the player and the one represented by this system. It should not be forgotten that the represented space is the product of two 2D images, each governed by the laws of perspective for a single focal point. This means, for instance, that the distance of the player's hand from the camera is reflected in its image size. To give the player a better understanding of his or her position in this virtual space, a screen showing the image of the two cameras and the corresponding map of triggers is used as visual fold-back (Figure 6).

Figure 6. MAX Patch Screen.
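As a rough illustration, under a simple pinhole-camera model (an idealisation that ignores the lens distortion mentioned earlier) the apparent image size s of a hand of real size S held at distance d from a camera with focal length f is approximately

s ≈ f · S / d

so the same gesture covers more image cells, and hence potentially more triggers, the closer the hand is to the camera; the on-screen fold-back helps the player compensate for this.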
Figure 4. Famalicão, Portugal.

4.5 Audience Feed-Back
A 3D representation of this interface was developed in Processing [8] and runs on the other machine. The data resulting from the image-analysis process is sent via UDP packets to this other networked computer [9][10]. An invisible cube composed of 2197 small cubes forms the infrastructure of the interface's virtual space. Every active volume in space is represented by a corresponding grey cube; where an active volume corresponds to a trigger in use, the corresponding cube is displayed in black. This visualization is video-projected onto a stage screen above the player's set. It gives the audience some visual feedback, allowing them to become more involved with the performing process (Figure 7).
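The data path between the two machines can be sketched as below, in plain Java using java.net.DatagramSocket. The actual packet format exchanged between the Max/MSP patch and the Processing sketch is not specified in this paper, so the one-byte-per-coordinate encoding, the port number and the class names are assumptions; the receiver simply maintains the 13 x 13 x 13 grid of active volumes from which the Processing sketch would draw its grey and black cubes.

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

// Hypothetical sketch of the UDP link between the analysis machine and the
// visualisation machine; one (x, y, z, isTrigger) record is sent per active cell.
public class CellLink {
    static final int GRID = 13, PORT = 9000;  // port number is illustrative

    // Sender side (would run alongside the Max/MSP/Jitter patch).
    static void sendCell(DatagramSocket socket, InetAddress host,
                         int x, int y, int z, boolean isTrigger) throws Exception {
        byte[] data = {(byte) x, (byte) y, (byte) z, (byte) (isTrigger ? 1 : 0)};
        socket.send(new DatagramPacket(data, data.length, host, PORT));
    }

    // Receiver side (would feed the Processing visualisation).
    public static void main(String[] args) throws Exception {
        boolean[][][] active = new boolean[GRID][GRID][GRID];
        try (DatagramSocket socket = new DatagramSocket(PORT)) {
            byte[] buffer = new byte[4];
            while (true) {
                DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
                socket.receive(packet);
                int x = buffer[0], y = buffer[1], z = buffer[2];
                active[x][y][z] = true;
                // the Processing sketch would draw a grey cube for this cell,
                // or a black one when buffer[3] == 1 (a trigger in use)
            }
        }
    }
}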
Figure 7. unoduotrio performance.
5. REFERENCES
[1] http://www.pad.ua.pt/documentos/doc_21.doc
[2] Sparacino, F., Davenport, G., and Pentland, A., "Media in Performance: Interactive Spaces for Dance, Theater, and Museum Exhibits", IBM Systems Journal, vol. 39, nos. 3-4, pp. 479-510, 2000.
[3] Gehlhaar, R., "Sound=Space: an Interactive Musical Environment", Contemporary Music Review, vol. 6, no. 1, pp. 59-72, 1991.
[4] http://www.gehlhaar.org
[5] http://www.ericsinger.com/cyclopsmax.html
[6] http://www.fullking.com/dluz/
[7] Rodrigues, P. M., Vairinhos, M., Girão, L. M., et al., "Integrating Interactive Multimedia in Theatrical Music: The Case of Bach2Cage", Proceedings of Artech 2005, 1st Workshop on Digital Art, V. N. Cerveira, Portugal, 2005.
[8] http://www.processing.org
[9] http://www.synthesisters.com/hypermail/maxmsp/Jan06/38573.html
[10] http://www.synthesisters.com/hypermail/maxmsp/Jan06/38573.html