Two-dimensional gesture mapping for interactive real-time electronic music instruments

Thomas Grill
[email protected]

University for Music and Performing Arts, Vienna
University of Applied Arts, Vienna
Austrian Research Institute for Artificial Intelligence (OFAI)

Topolò, 2006

„You can play on a shoestring if you are sincere“ (John Coltrane)

Wishes

• Interaction with „natural“ objects (also non-sounding)
• Adequate reaction to complex gestures
• Superimposed „source-bonding“ (Smalley)
➡ Flexible sensing (sound, touch, etc.)
➡ Flexible sound synthesis

Acoustic instruments

✓ Intuitive gesture-to-sound transformation
✓ Efficient and expressive
✗ Sound limited by material and construction

Software-based electronic instruments

✗ Abstract, technology-biased user interfaces
✗ Coarse gesture input, limited expressiveness
✓ Unlimited spectrum of sounds and structures

Challenges

• Gestures must be mapped to sound synthesis
• Effective human input „devices“ have many interrelated degrees of freedom

? How can we make such a mapping flexible and general?
? How can this process be made comprehensible for a player?

Proposition

• Gestural space may not be dense in all dimensions
➡ Given a proper representation, we can try to use dimensionality reduction
➡ We can do the same for the sound repertory
★ By representing gesture and sound repertories in two dimensions, one can easily control and visualize the mapping process (on screen).

Some wishful thinking...

acoustic/sensoric gestures
→ feature extraction
→ real-time trajectory within a two-dimensional map representation of the gesture repertory
→ translation/transformation from gesture to synthesis repertory
→ real-time trajectory within a two-dimensional map representation of the synthesis repertory
→ sound synthesis
→ sound output

How?

• Find a sensible feature set for gesture analysis
• Find a sensible feature set for sound representation
• Use dimensionality reduction to represent gesture and sound repertories as 2D maps
• Use 2D transformation strategies to define the actual gesture-to-sound mapping

Gesture analysis

• Gestures as analog (audio) signals, e.g. picked up by contact microphones
• Feature set describing the spectral content of the signal

Timbre analysis

• Perceptual model of human hearing (frequency scale [mel, bark], loudness [sone])
• Model of the spectral envelope (e.g. using the DCT)
➡ MFCCs (mel frequency cepstral coefficients); a sketch follows below
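As an illustration, a minimal per-frame MFCC computation could look like the following NumPy sketch (filterbank size, coefficient count, and all names are assumptions, not the instrument's actual implementation):

```python
import numpy as np

def mfcc_frame(frame, sr, n_mels=24, n_coeffs=10):
    """Hypothetical minimal MFCC sketch for one audio frame:
    magnitude spectrum -> mel filterbank -> log energies -> DCT-II."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)

    # Triangular mel filters spaced evenly on the mel scale.
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = inv_mel(np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2))

    energies = np.empty(n_mels)
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        up = np.clip((freqs - lo) / (mid - lo), 0.0, 1.0)
        down = np.clip((hi - freqs) / (hi - mid), 0.0, 1.0)
        energies[i] = np.sum(spectrum * np.minimum(up, down))

    # The low-order DCT coefficients model the smooth spectral envelope.
    log_e = np.log(energies + 1e-10)
    n = np.arange(n_mels)
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), n + 0.5) / n_mels)
    return basis @ log_e
```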

Dimensionality reduction

• Sequences of MFCC vectors (typically 8-10 coefficients each) represent gestures and sounds.
• Frame length ≈ 1 ms
➡ Many frames make up the repertories
➡ Parts of those may be similar
• Use self-organizing feature maps (SOMs) to spread out the variety of features on a 2D map (a minimal training sketch follows below).
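A naive SOM training loop might look as follows; the grid size matches the 90x90 maps shown next, but the learning-rate and radius schedules are assumptions, and this unoptimized version is slow for large repertories:

```python
import numpy as np

def train_som(data, width=90, height=90, epochs=20, seed=0):
    """Minimal SOM sketch: data is an (n_samples, n_features) array of
    MFCC vectors; returns a (height, width, n_features) prototype grid
    in which neighboring cells hold similar feature vectors."""
    rng = np.random.default_rng(seed)
    grid = rng.standard_normal((height, width, data.shape[1]))
    ys, xs = np.mgrid[0:height, 0:width]

    n_steps = epochs * len(data)
    for step in range(n_steps):
        v = data[rng.integers(len(data))]
        # Best-matching unit: the grid cell closest to the sample.
        d = np.sum((grid - v) ** 2, axis=2)
        by, bx = np.unravel_index(np.argmin(d), d.shape)
        # Learning rate and neighborhood radius shrink over time.
        t = step / n_steps
        lr = 0.5 * (1.0 - t)
        sigma = 1.0 + 0.5 * max(width, height) * (1.0 - t)
        h = np.exp(-((ys - by) ** 2 + (xs - bx) ** 2) / (2 * sigma ** 2))
        grid += lr * h[..., None] * (v - grid)
    return grid
```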

[Figure: „Voice“ repertory, 90x90 feature SOM]

Anticipated architecture

gestures on input device (recorded by sensors)
→ feature extraction
→ find matching feature set in 2D map of gesture repertory
→ gesture transformation based on 2D coordinates
→ look up synthesis parameters in 2D map of sound repertory
→ sound synthesis
→ volume adjustment
→ sound

2D transformation

• Define Gaussian transformation regions to translate from the gesture map to the sound map
• Regions can be translated, scaled, and rotated
• A point can be transformed by multiple regions (see the sketch below)

[Figure: transformation regions A, B, C on the gesture and sound maps]
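A sketch of how such a blended region transformation could be computed; the region parametrization (center, sigma, scale, angle) and all names are assumptions derived from the description above:

```python
import numpy as np

def transform_point(p, regions):
    """Map a 2D point from the gesture map to the sound map by blending
    per-region similarity transforms, weighted by a Gaussian of the
    point's distance to each region center on the gesture map."""
    total_w, out = 0.0, np.zeros(2)
    for r in regions:
        w = np.exp(-np.sum((p - r["src"]) ** 2) / (2 * r["sigma"] ** 2))
        c, s = np.cos(r["angle"]), np.sin(r["angle"])
        rot = np.array([[c, -s], [s, c]])
        # Rotate/scale relative to the source center, then re-anchor
        # at the region's target center on the sound map.
        q = r["dst"] + r["scale"] * (rot @ (p - r["src"]))
        total_w += w
        out += w * q
    return out / total_w if total_w > 0.0 else p

# Two example regions with made-up values, akin to regions A and B:
regions = [
    dict(src=np.array([0.2, 0.3]), dst=np.array([0.7, 0.6]),
         sigma=0.15, scale=1.0, angle=0.0),
    dict(src=np.array([0.8, 0.5]), dst=np.array([0.1, 0.2]),
         sigma=0.20, scale=0.5, angle=np.pi / 4),
]
print(transform_point(np.array([0.25, 0.35]), regions))
```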

Sound repertory

• The trajectory in gesture space has been transformed into a trajectory in sound space.
• Along this trajectory, sounds are looked up in the repertory map in real time.

Sound repertory

• Sounds in the repertory can be represented by
➡ Sample grains (granular synthesis)
➡ Synthesis parameters

Sound repertory

• Sounds (defined by synthesis parameters) in the repertory have been pre-analyzed.
➡ Sound features (e.g. MFCCs) define the 2D position of the underlying synthesis parameters (see the lookup sketch below).
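The building-blocks overview further down mentions a quad-tree for this lookup; as a stand-in, the sketch below uses a scipy KD-tree over the pre-analyzed 2D positions for the same nearest-neighbor purpose (all data here is random placeholder material):

```python
import numpy as np
from scipy.spatial import cKDTree

# Pre-analysis (placeholder data): each sound's 2D map position,
# derived from its features, indexes its synthesis parameter set.
positions = np.random.rand(1000, 2)   # 2D map coordinates
parameters = np.random.rand(1000, 8)  # one parameter set per sound
tree = cKDTree(positions)

def lookup(point):
    """Nearest-neighbor lookup along the sound-map trajectory."""
    _, idx = tree.query(point)
    return parameters[idx]
```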

Spectral modeling synthesis

• Deterministic components (sinusoids)
• Stochastic components (colored noise)

Developed by Xavier Serra (PhD thesis, 1989)
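A toy one-frame resynthesis in the SMS spirit (sinusoids plus spectrally shaped noise); parameter shapes and names are assumptions, and this is far simpler than Serra's actual analysis/synthesis framework:

```python
import numpy as np

def sms_frame(freqs, amps, noise_env, n=512, sr=44100):
    """Deterministic part: a bank of sinusoidal oscillators (freqs in
    Hz, linear amplitudes). Stochastic part: white noise shaped in the
    frequency domain by a coarse envelope (one gain per band)."""
    t = np.arange(n) / sr
    det = sum(a * np.sin(2 * np.pi * f * t) for f, a in zip(freqs, amps))

    # Color the noise by interpolating the envelope over the rfft bins.
    spec = np.fft.rfft(np.random.randn(n))
    env = np.interp(np.linspace(0, 1, len(spec)),
                    np.linspace(0, 1, len(noise_env)), noise_env)
    stoch = np.fft.irfft(spec * env, n)
    return det + stoch

# Example: three partials over a gently fading noise floor.
out = sms_frame([220.0, 440.0, 660.0], [0.5, 0.25, 0.125],
                noise_env=np.linspace(0.02, 0.0, 16))
```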

Instrument interface

Technical building blocks

audio in
→ volume trace and MFCCs
→ MFCC lookup in SOM
→ 2D transformation
→ lookup of synthesis parameters in quad-tree
→ sinusoids + residuum composition
→ SMS synthesis
→ audio out

Graphical interface:
• color-coded picture of the gesture SOM
• visualization and control of the Gaussian transformation regions
• color-coded picture of the sound SOM

Made with pd + GEM
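Wiring the earlier sketches together, a hypothetical per-frame processing loop could read as below (it reuses mfcc_frame, transform_point, and lookup from the previous examples; the actual system is built in pd + GEM, not Python):

```python
import numpy as np

def bmu_position(v, grid):
    """2D map position of the best-matching SOM unit for vector v,
    normalized to [0, 1] x [0, 1]."""
    d = np.sum((grid - v) ** 2, axis=2)
    y, x = np.unravel_index(np.argmin(d), d.shape)
    h, w = d.shape
    return np.array([x / (w - 1), y / (h - 1)])

def process_frame(frame, sr, gesture_som, regions):
    volume = np.sqrt(np.mean(frame ** 2))          # volume trace (RMS)
    v = mfcc_frame(frame, sr)                      # feature extraction
    p_gesture = bmu_position(v, gesture_som)       # lookup in gesture SOM
    p_sound = transform_point(p_gesture, regions)  # 2D transformation
    params = lookup(p_sound)                       # synthesis parameters
    return params, volume                          # drive SMS synthesis
```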

Basic instrument setup

Conclusion

• The proposed instrument system is powerful and flexible, featuring a simple and intuitive performance interface.
• Like a traditional acoustic instrument, the system demands practice as well as thorough preparation of the essential gesture and sound repertories.

↯ The actual feasibility has yet to be verified.

Future

• Better analysis and synthesis.
• Much faster preprocessing of the repertories (e.g. updating in real time).
• Make temporal features and transformation strategies accessible.
• Intensified research into high-level analysis, representation, and synthesis of musical structures.

References

• Thesis and poster: http://grrrr.org/pub
• Sergi Jordà: New musical interfaces (PhD thesis)
• Denis Smalley: Spectromorphology
• Proceedings of the NIME conferences: http://www.nime.org
• Other sources: see the thesis bibliography