Gestural control of sound synthesis and processing algorithms

Daniel Arfib & Loïc Kessous

CNRS-LMA, 31 chemin Joseph Aiguier, 13402 Marseille Cedex 20, France
[email protected]

Abstract. Computer programs such as MUSIC V or CSOUND have led to a huge number of sound examples, in both the synthesis and the processing domains. Translating such algorithms into real-time programs such as Max-MSP allows these digitally created sounds to be used effectively in performance, including interpretation, expressivity, and even improvisation and creativity. This particular bias of our project (from sound to gesture) raises new questions, such as the choice of strategies for gesture control and feedback, and the mapping of peripheral data to synthesis and processing data. A learning process is required for these new controls, and the issue of virtuosity versus simplicity is an everyday challenge.

1 Introduction

We describe here the first phase of a project named "le geste créatif en Informatique musicale" (creative gesture in computer music), in which we link sound synthesis and processing programs to gesture devices. Starting from a database of non-real-time algorithms written in Music V or Csound, we choose some of these "frozen music" algorithms and translate them into real-time programs, where additional anchors allow gestural control by musical or gestural devices. This approach has already been used for the gestualisation of an entire piece of music, le Souffle du Doux, mostly linking it to a radio-drum device and a keyboard [1]. We are now choosing other data sources to give a broad view of digital music programming styles, and trying other peripherals, both obvious and non-obvious. This experiment raises different questions, some practical (one needs to be very pragmatic to go on stage) and some psychological: what matters most, the sound or the gesture? What is more effective, an easy gesture or an expert one? Most of these questions concern the mapping system that links the gesture to the sound algorithms, and the choice of the strategies that are used.



2 Real-time and non-real-time programs

Traditionally, computer music has made a sharp distinction between non-real-time programs, where a sound sequence is calculated without any interaction between the performer and the system, and real-time systems that are directly linked to a performance device. This difference was bound to a philosophical division (composing and performing are two different arts) but also, and mostly, to a technical one: the machines that allowed real-time interaction (such as synthesizers) were not the ones that computed musical scores. Nowadays this division is slowly dissolving, as we will now show for some of the Macintosh programs that we use.

2.1 The Music V and Csound area

Music V and Csound are two programs that define a language in which composers can describe not only events but also the construction of the sound itself. They are driven by alphanumerical lines. Music V was the first to appear and to be thoroughly documented [12], and it has given rise to many examples and computer pieces. This is the language we have used for years in our laboratory at CNRS; more recently, however, it is no longer maintained. Csound is an equivalent program that comes in several flavours, and impressive work has been done to provide the community with an extremely well-documented and maintained stable version [6]. These programs rely on a block structure where "unit generators" are linked together to constitute an "instrument" that is activated by events and follows driving curves.

Example of a Csound program:

        instr 1
;-------------------------------
; PARAMETER LIST
; p4 : AMPLITUDE
; p5 : PITCH
; p6 : OSCILLATOR FUNCTION
; p7 : WAVESHAPING FUNCTION
; p8 : ENVELOPE (DISTORTION INDEX) FUNCTION
; p9 : ENVELOPE (POST-CORRECTION) FUNCTION
;-------------------------------
; INITIALIZATION BLOCK
ifr     = cpspch(p5)                       ; PITCH TO FREQ
ioffset = .5                               ; OFFSET
;-------------------------------
kenv  oscil1i 0, 1, p3, p8                 ; ENVELOPE (DISTORTION INDEX)
kamp  oscil1i 0, 1, p3, p9                 ; POST-CORRECTION ENVELOPE
ain1  oscili  ioffset, ifr, p6             ; FIRST OSCILLATOR
awsh1 tablei  kenv*ain1, p7, 1, ioffset    ; WAVESHAPING OF OSCILLATOR
asig  = p4*kamp*awsh1
      out asig                             ; OUTPUT
        endin

These programs were not meant to run in real time; rather, they were meant to remove the limitations intrinsic to real time. The notion of gesture is directly linked to the triggering of these events and to the shape of the driving curves. Although a real-time version of Csound exists that can be driven by MIDI, these programs are really dedicated to composition more than to performance. Some front ends allow a kind of graphical input; however, from our point of view, it is easier to start with a program that has been dedicated from the beginning to real-time issues.

2.2 The Max-MSP area

Max-MSP is a Macintosh program that uses a graphical interface to describe objects which are very similar to the Music V "unit generators". It is a real-time program and is intended to be a performance tool. It can be linked directly to MIDI devices and has drivers for peripherals such as graphic tablets, as well as for some joysticks and driving wheels.

Fig. 1. A Max-MSP patch uses the same kind of unit generators as the Csound program, but with a graphical interface, and it runs in real time.

One may wonder whether there is a link between the real-time and non-real-time approaches. A previous study [1] consisted in finding a methodology to translate Music V programs into Max-MSP patches. While the construction of the instrument itself is quite straightforward, the constitution of the driving curves is not entirely evident. Where Music V and Csound use dozens of curve generators, Max-MSP mostly loads precalculated tables or uses tables described graphically with breakpoints. Also, the note activation, that is, the ensemble of parameters that define the structure of an entire note, must use a list structure and be dispatched, and possibly mapped to synthesis parameters, inside the instrument. Though this operation is not at all trivial and is still done by hand, we have shown that it is possible to translate a Music V or Csound instrument into a Max-MSP program, and this is for us an essential key to gestural control.
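To make the list-dispatch idea concrete, here is a minimal sketch, written in Python rather than Max-MSP, of how a note statement arriving as a flat p-field list might be dispatched to named synthesis parameters inside an instrument. The field order follows the Csound example of section 2.1 (p1, the instrument number, is omitted); the cpspch re-implementation and the dictionary layout are illustrative assumptions, not the actual patch.

# Minimal sketch: dispatching a Csound-style note statement (a flat
# parameter list p2..p9) to named synthesis parameters, as a Max-MSP
# patch would do with a list object feeding the instrument.

def cpspch(pch: float) -> float:
    """Octave.pitch-class notation (e.g. 8.09 = A4) to frequency in Hz."""
    octave, semitone = divmod(round(pch * 100), 100)
    return 440.0 * 2.0 ** ((octave - 8) + (semitone - 9) / 12.0)

def dispatch_note(plist):
    """Map a flat p-field list onto the instrument's named parameters."""
    p2, p3, p4, p5, p6, p7, p8, p9 = plist
    return {
        "start": p2,                # note start time (s)
        "duration": p3,             # note duration (s)
        "amplitude": p4,
        "frequency": cpspch(p5),    # pitch converted to Hz
        "osc_table": p6,            # oscillator function table
        "waveshape_table": p7,      # waveshaping function table
        "index_env_table": p8,      # distortion-index envelope
        "post_corr_table": p9,      # post-correction envelope
    }

# Example note statement: start 0 s, 2 s long, A4, function tables 1..4
print(dispatch_note([0.0, 2.0, 0.8, 8.09, 1, 2, 3, 4]))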



3 Gesture to sound and sound to gesture

The traditional approach starts from peripherals that people buy or build and asks: what can I do with this device, and what can be bound to it, be it a sound, an image or a program? Our project starts from the opposite question: when you have a synthetic sound whose algorithm you know [16], what kind of gesture can you superimpose on it to make it work? A good way to deal with this question is to draw a dual path which symbolizes the gesture-to-sound and sound-to-gesture connection [14, 17].

Fig. 2. The link between physical gestures and sonic production can be seen as a dual process of action and feedback, where an intermediate level represents a perceptual space of listening parameters or, conversely, a psychoacoustic space of action factors.

In Fig. 2, starting from the right (sonic production), we want to influence the synthesis parameters by way of physical gestures, which is what gives expressivity or allows improvisation. The key is to build an intermediate space that can be linked to both worlds: starting from the sound, it represents the perceptual facts, as recovered from perceptual criteria and placed in psychological spaces through data reduction methods. A strategy is then defined by linking this intermediate space to both the gestural and the synthesis parameters.
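As a purely illustrative sketch of this dual path, the following Python fragment chains two mappings: device data onto an assumed perceptual layer, and that layer onto the synthesis parameters of a waveshaping instrument. The chosen dimensions (loudness, brightness) and the scaling laws are assumptions made for the example, not the perceptual spaces discussed in the cited literature.

# Illustrative sketch of the dual mapping of Fig. 2: device data are first
# projected onto an intermediate perceptual layer, which is then mapped to
# synthesis parameters. Dimensions and scaling laws are assumptions.

def gesture_to_perceptual(wheel: float, pedal: float) -> dict:
    """Physical layer -> perceptual layer (inputs normalised to 0..1)."""
    return {
        "brightness": wheel,          # wheel angle heard as brightness
        "loudness": pedal ** 2,       # pedal with a perceptual-ish curve
    }

def perceptual_to_synthesis(p: dict) -> dict:
    """Perceptual layer -> synthesis layer of a waveshaping instrument."""
    return {
        "distortion_index": 0.1 + 0.9 * p["brightness"],   # richer spectrum
        "filter_cutoff_hz": 200.0 * (10.0 ** (1.2 * p["brightness"])),
        "amplitude": p["loudness"],
    }

# One frame of control data flowing through the intermediate layer
synth_params = perceptual_to_synthesis(gesture_to_perceptual(0.5, 0.7))
print(synth_params)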

4 Decision and modulation gestures

Here we deal with a simplified taxonomy of gestures that starts from sound "allures" (their temporal behaviour); a more complex one can be found in [22]. Sounds have the particularity of being truly bound to time: their perception relies on the arrow of time, and they cannot be seen as "objects" disconnected from this dimension. For example, in percussive sounds the first 50 ms immediately trigger the perception of their percussive nature and are a particular clue for recognising the timbre of the instrument.

Decision gestures are so named because they determine a particular set of values at a given time. Striking a MIDI keyboard is a gesture of this kind. On the radio-drum (Fig. 3), the crossing of a certain plane triggers the production of an event, where the data are the X and Y crossing positions, and the velocity is calculated from the time taken to cross the distance between two planes. Decision gestures influence the sound by an abrupt change of parameters or by the initialization of a process. A trivial example is the triggering of an event, but even this note start-up is not so simple to define: for example, the crossing of a plane gives two coordinates which must then be mapped onto a number of synthesis parameters that is typically greater than two.

Fig. 3. A radio-drum consists of two sticks containing radio emitters and a plate on which antennas are placed. This device allows gestural capture of position, and it also produces decision coordinates by detecting the crossing of a virtual horizontal plane. In this figure the left hand makes a decision gesture while the right hand modulates with a circular gesture.
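A decision gesture of the radio-drum kind can be sketched as follows: stick positions are sampled continuously, and the crossing of a virtual horizontal plane fires an event whose velocity is derived from the time taken to travel between two planes. The plane heights, sampling period and velocity scaling in this Python sketch are illustrative assumptions, not the actual device behaviour.

# Sketch of a decision gesture in the spirit of the radio-drum (Fig. 3).
UPPER_PLANE = 0.10   # metres above the antenna plate
LOWER_PLANE = 0.05   # crossing this plane fires the event

def track_strokes(samples, dt=0.005):
    """samples: iterable of (x, y, z) stick positions at a fixed rate."""
    events, t_upper, prev_z = [], None, None
    for i, (x, y, z) in enumerate(samples):
        t = i * dt
        if prev_z is not None:
            if prev_z >= UPPER_PLANE > z:          # entered the strike zone
                t_upper = t
            if prev_z >= LOWER_PLANE > z and t_upper is not None:
                travel = t - t_upper               # time between the two planes
                velocity = min(127, int(0.05 / max(travel, 1e-3) * 127))
                events.append({"x": x, "y": y, "velocity": velocity})
                t_upper = None
        prev_z = z
    return events

# A single downward stroke, sampled every 5 ms
stroke = [(0.3, 0.4, 0.20 - 0.01 * i) for i in range(20)]
print(track_strokes(stroke))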

Conversely, sounds also have a steady state which lasts for some time, and during which it is essential to keep them alive. Any natural sound has an evolution which is more than a plain reproduction of a steady-state spectrum. For example, the vibrato of a voice is a way of producing a lasting sound without being dull. As Mathews states and demonstrates [13], perfectly freezing a scanned synthesis process immediately produces a synthetic feeling, whereas its natural evolution produces a feeling of life.

Fig. 4. A conventional racing wheel and its associated pedals are a perfect source for easy modulation gestures.

Modulation gestures are gestures that can drive the sound over a long period of time. They usually rely on the detection of movement in space, and can use 3D positioning, rotation or inertial devices, or more simply linear and rotating potentiometers. Changing one device for another involves two steps. The first step is the programming of the device itself, to obtain a set of values describing its physical parameters; for example, a USB game driving wheel gives the values of three axes (the wheel and two pedals) plus a certain number of small switches. The second step is the mapping of these variables to the patch.
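The first of these two steps can be sketched with a generic joystick API; the fragment below polls a USB driving wheel and normalises its three axes to the 0..1 range before any mapping. pygame's joystick module is used only as an example, and the axis layout (axis 0 for the wheel, axes 1 and 2 for the pedals) is an assumption that depends on the particular device and driver.

# Polling a USB driving wheel and normalising its axes before mapping.
# Assumes a wheel is plugged in and visible to pygame as joystick 0.
import pygame

pygame.init()
pygame.joystick.init()
wheel = pygame.joystick.Joystick(0)
wheel.init()

def read_controls():
    """Return normalised wheel/pedal values plus the button states."""
    pygame.event.pump()                       # refresh the device state
    norm = lambda v: (v + 1.0) / 2.0          # map -1..1 to 0..1
    return {
        "wheel": norm(wheel.get_axis(0)),
        "pedal_a": norm(wheel.get_axis(1)),
        "pedal_b": norm(wheel.get_axis(2)),
        "buttons": [wheel.get_button(i) for i in range(wheel.get_numbuttons())],
    }

print(read_controls())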

5 Some strategies

We will now show an example of modulation gestures linked to a virtual instrument which combines waveshaping [2] and filtering (formant) algorithms.

5.1 A simple patch

Fig. 5. A Max-MSP patch using three controls: here the pitch is linked to a keyboard, while the distortion index and the center frequency of the filter are linked to the acceleration pedal and the driving wheel.

In this case, the waveshaping instrument is the sum of three distortion units, each slightly detuned in frequency to produce a choir effect. The distortion shape is particular in the sense that it is a periodic function, and the sound it produces is reminiscent of the FM synthesis described by Chowning [8]. The filtering uses a resonant filter whose controls are the central frequency and the bandwidth. This instrument is really dedicated to modulation gestures, because the sound itself gives the impression of clay that one wants to mould.
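A rough numerical sketch of this kind of instrument, written with numpy rather than Max-MSP, sums three slightly detuned sine oscillators and passes each through a periodic waveshaping function (here sin(k·x), which yields the FM-like spectra mentioned above). The detuning amounts, the shaping constant and the envelope are illustrative assumptions, not the values of the actual patch.

# Three detuned oscillators, each through a periodic waveshaper (choir effect).
import numpy as np

SR = 44100
def waveshape_choir(freq=220.0, dur=2.0, index=0.8, k=2 * np.pi):
    t = np.arange(int(SR * dur)) / SR
    env = np.minimum(t / 0.05, 1.0) * np.exp(-1.5 * t)   # simple attack/decay envelope
    detunes = [0.998, 1.0, 1.002]                        # a few cents of spread
    out = np.zeros_like(t)
    for d in detunes:
        osc = 0.5 * np.sin(2 * np.pi * freq * d * t)     # oscillator, 0.5 amplitude
        out += np.sin(k * index * env * osc)             # periodic waveshaping
    return out / len(detunes)

sig = waveshape_choir()
print(sig.shape, float(np.max(np.abs(sig))))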

5.2 The voicer, a musical instrument

This musical instrument allows the articulation of French vowels, defined by three formants, as well as control of the pitch that allows intonation and nuance. A graphic tablet and a joystick are used as controllers.

Fig. 6. The voicer Max-MSP patch, where pitch control is linked to a graphic tablet and the articulation of vowels to a joystick via an interpolation plane.

The Max-MSP patch uses an all-pole filter structure applied to a sound source. Pitch variation is linked to the coordinates of the pen tip on a graphic tablet in a way that is not linear: the angle in a polar representation gives the pitch, and vibrato is possible in some zones of the circle. Lateral buttons trigger an octave jump up or down. The coordinates of the joystick axes are linked to an "interpolation plane" which converts them into synthesis parameters: on such a plane, each reference object corresponds to a set of synthesis parameters, and the position of a point on the plane interpolates between the parameter values of these objects following "attraction curves" [19]. The two coordinates of the joystick position thus allow control of n parameters (in our case the 6 that define the cascaded filter), making navigation in the interpolation space possible.
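The interpolation plane can be sketched as follows: each reference object holds a complete set of the six filter parameters, and the 2D joystick position blends them. Inverse-distance weighting is used here as a simple stand-in for the attraction curves of [19], and the vowel presets are rough illustrative values, not the ones used in the voicer.

# Interpolation plane: blending preset parameter sets from a 2D position.
import math

# (x, y) position of each vowel preset on the plane, and its 6 filter
# parameters: three formant frequencies and three bandwidths (Hz).
PRESETS = {
    "a": ((0.2, 0.2), [700, 1220, 2600, 80, 90, 120]),
    "i": ((0.8, 0.2), [290, 2250, 3000, 60, 90, 100]),
    "u": ((0.5, 0.8), [330, 800, 2400, 60, 80, 100]),
}

def interpolate(x, y, power=2.0):
    """Blend the preset parameter sets according to the joystick position."""
    weights, total = {}, 0.0
    for name, ((px, py), _) in PRESETS.items():
        d = math.hypot(x - px, y - py)
        w = 1.0 / (d ** power + 1e-6)          # closer presets attract more
        weights[name], total = w, total + w
    params = [0.0] * 6
    for name, (_, values) in PRESETS.items():
        for i, v in enumerate(values):
            params[i] += weights[name] / total * v
    return params

print(interpolate(0.4, 0.3))   # a position between "a" and "i"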



Fig. 7. The voicer instrument is played with both hands simultaneously.

One plays this musical instrument, named the voicer, with the two hands, one controlling the graphic tablet pen and the other the joystick. It is very expressive because of its intuitive control of the pitch, with glissando, vibrato and subtle variations, and because of the articulation of the vowels, which are arranged in a practical way on the XY plane [20]. Temporal envelopes linked to the pressure of the pen tip allow the playing of legato or detached notes. One can follow scores or improvise; the instrument has been used in concert for a musical piece named "Spectral shuttle", where it dialogues with a saxophone processed in real time.



6 Future: mapping, feedback, learning and virtuosity

Mapping, feedback, learning and virtuosity are important considerations in our realization: connecting a device to sound creation involves reflection on all these domains. We therefore describe here the framework behind our research. Visual feedback is an important step in the driving of digital audio effects [3], and it has been shown that the visual interface is a decisive help for the creation of an artistic transformation of sounds; this holds for synthesis too. Mapping is an essential area for proper control. As stated in section 3, building a "psychoacoustic" level requires the evaluation of perceptual criteria, through a combination of curve extraction and data reduction. Sound analysis has already provided methods to project data into a 3D or nD perceptual space [7, 9, 11, 18, 21], and linking the physical parameters coming from the device to these perceptual parameters is a mapping which corresponds to some form of pedagogy: musical intentions must be transcribed into perceptual data.
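As a toy illustration of such a data-reduction step, the fragment below projects a handful of analysed sounds, each described by a few acoustic descriptors, onto a two-dimensional space with principal component analysis. The descriptors and their values are invented for the example; the cited studies rely on more elaborate analyses such as multidimensional scaling of similarity judgements.

# Toy data reduction: projecting sound descriptors onto a 2D space.
import numpy as np

# rows = sounds, columns = (attack time, spectral centroid, spectral flux)
descriptors = np.array([
    [0.01, 2500.0, 0.30],
    [0.20,  900.0, 0.05],
    [0.05, 1800.0, 0.20],
    [0.15, 1200.0, 0.10],
])

# Standardise, then project onto the two main axes of variation.
X = (descriptors - descriptors.mean(axis=0)) / descriptors.std(axis=0)
_, _, vt = np.linalg.svd(X, full_matrices=False)
perceptual_space = X @ vt[:2].T        # 2D coordinates for each sound

print(perceptual_space.round(2))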

Fig. 8. A gesture used on a photosonic instrument. The light is displaced by the left hand while the filter is controlled, horizontally and vertically, by the right hand.

The existence of such mappings has a very strong influence on the learning curve of the instrument: there must be some coherence between the gesture itself and the sonic result. But, as stated by some authors [10], simplicity is not the ultimate goal: virtuosity is common in performances on difficult instruments, and it seems that the effort required to learn an instrument is also a key to playing it well. For example, the gestures used to control a photosonic instrument [5] are very precise (the rings on the photosonic disk are 3 mm wide), yet this instrument is very playable, and its complete transcription into a real-time digital simulation is part of our project.

7 Conclusion

This first phase of our project has already shown that it is possible to use the large knowledge base of non-real-time music programs such as Music V and Csound to build real-time systems combining Max-MSP applications and gestural devices. The different strategies allow some classification of gestures, and they also emphasize the role of mapping in the sound-to-gesture and gesture-to-sound relationship. Future work will concentrate on a more precise knowledge of the learning process and of the virtuosity these links permit.

8 Acknowledgments

This research is sponsored by the National Center for Scientific Research (CNRS) and the Conseil Général des Bouches du Rhône (France) under the project name "le geste créatif en Informatique musicale".

References

1. Daniel Arfib & Loïc Kessous: "From Music V to creative gesture in computer music", Proceedings of the VIIth SBC Conference, Curitiba, June 2000, available in CD format (a new version has been accepted for Electronic Musicologic Review).
2. Daniel Arfib: "Digital synthesis of complex spectra by means of multiplication of nonlinear distorted sine waves", Journal of the Audio Engineering Society 27: 757-768.
3. Daniel Arfib: "Visual representations for digital audio effects and their control", Proceedings of DAFx99, Trondheim, December 1999, pp. 63-66.
4. Daniel Arfib & Jacques Dudon: "A digital version of the photosonic instrument", Proceedings of ICMC99, Beijing, November 1999, pp. 288-290.
5. Daniel Arfib & Jacques Dudon: "Photosonic disk: interactions between graphic research and gestural controls", in M. Wanderley & M. Battier (eds.): Trends in Gestural Control of Music (CD-ROM), Ircam, 2000.
6. Richard Boulanger: The Csound Book: Perspectives in Software Synthesis, Sound Design, Signal Processing and Programming, MIT Press, 2000.
7. Gérard Charbonneau: "Timbre and the Perceptual Effects of Three Types of Data Reduction", Computer Music Journal 5(2): 10-19, 1981.
8. John Chowning: "The Synthesis of Complex Audio Spectra by Means of Frequency Modulation", Journal of the Audio Engineering Society 21: 526-534. Reprinted in C. Roads and J. Strawn (eds.): Foundations of Computer Music, MIT Press, Cambridge, Massachusetts, 1985.
9. John Grey: "An Exploration of Musical Timbre", PhD dissertation, Department of Psychology, Stanford University, 1975; and John Grey: "Multidimensional Perceptual Scaling of Musical Timbres", Journal of the Acoustical Society of America 61(5): 1270-1277, May 1977.
10. Andy Hunt, Marcelo M. Wanderley & Ross Kirk: "Towards a Model for Instrumental Mapping in Expert Musical Interaction", Proceedings of ICMC 2000, Berlin.
11. Stephen McAdams, Suzanne Winsberg, Sophie Donnadieu, Geert De Soete & Jochen Krimphoff: "Perceptual scaling of synthesized musical timbres: common dimensions, specificities, and latent subject classes", Psychological Research 58: 177-192, 1995 (available online).
12. Max Mathews: The Technology of Computer Music, MIT Press, Cambridge, MA, 1969.
13. William Verplank, Max V. Mathews & Robert Shaw: "Scanned Synthesis", Proceedings of ICMC 2000, Berlin.
14. Eric Metois: "Musical Gestures and Audio Effects Processing", DAFx98 conference, Barcelona (available online).
15. Joe Paradiso: "American innovation in electronic musical instruments" (available online).
16. Jean-Claude Risset & David Wessel: "Exploration of Timbre by Analysis and Synthesis", in D. Deutsch (ed.): The Psychology of Music, Academic Press, Orlando, 1982.
17. Sylviane Sapir: "Interactive digital audio environments: gesture as a musical parameter", Proceedings of DAFx00 (available online).
18. David L. Wessel: "Timbre Space as a Musical Control Structure", Rapport Ircam 12/78, 1978 (available online).
19. Adrien Lefevre: Vect pour Max Macintosh, vectFat 1.1 (available online).
20. A. Slawson: "Vowel quality and musical timbre as function of spectrum envelope and fundamental frequency", Journal of the Acoustical Society of America 43: 87-101, 1968.
21. Thierry Rochebois: Thèse (PhD thesis, available online).
22. Claude Cadoz & Marcelo M. Wanderley: "Gesture-Music", in M. Wanderley and M. Battier (eds.): Trends in Gestural Control of Music, Ircam - Centre Pompidou, 2000.
