SKETCHING SOUND: GESTURAL INTERACTION IN EXPRESSIVE MUSIC PROGRAMMING
A DISSERTATION SUBMITTED TO THE DEPARTMENT OF MUSIC AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
Spencer Salazar May 2017
© 2017 by Spencer Salazar. All Rights Reserved. Re-distributed by Stanford University under license with the author.
This work is licensed under a Creative Commons Attribution 3.0 United States License. http://creativecommons.org/licenses/by/3.0/us/
This dissertation is online at: http://purl.stanford.edu/mf249vj6694
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Ge Wang, Primary Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Julius Smith, III
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Perry Cook
Approved for the Stanford University Committee on Graduate Studies. Patricia J. Gumport, Vice Provost for Graduate Education
This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.
Abstract

New developments in technology lead to new types of interactions in computer music performance and composition. In the realm of mobile touchscreen devices such as phones and tablet computers, a variety of research efforts and software applications have explored these possibilities. These include musical experiences that utilize multitouch interaction, the physical properties of the devices themselves, the orientation and location sensing of the devices, and their persistent connection to the network. However, these interactions have been largely ignored in the space of music programming on the device itself. We have developed two prototype systems to explore concepts related to the employment of these interactions and technologies in programming music on mobile touchscreen devices. The first of these, miniAudicle for iPad, is an environment for programming ChucK code on an iPad. In addition to a text editor and synthesis engine, miniAudicle for iPad incorporates features to ease the often-laborious process of typing on a touchscreen. The application includes additional functionality to better leverage the intrinsic capabilities of the device, including multitouch interaction and location and orientation sensing. The second prototype developed is a sound design and music composition system utilizing touch and handwritten stylus input. In this system, called “Auraglyph,” users draw a variety of audio synthesis structures, such as oscillators and filters, in an open canvas. Once created, these structures may be further parameterized by touch or in some cases with additional hand-drawn figures. Auraglyph also displays a variety of aspects of its program state to the user, conferring a deeper understanding of how each piece of a program affects the overall result. In addition to a novel interaction framework, standard synthesis functions, and system feedback display, Auraglyph is characterized by a
unique visual design intended to draw users into its perspective of music computing. These systems and the principles they embody have been evaluated through user studies, through the author’s experiences composing with them, and through the compositional experiences of other musicians. These evaluations comprise written and oral user feedback, quantitative analyses, and a number of performances of music utilizing these systems. Results from initial studies were used to re-examine these systems and rework them for later studies that were conducted. Together, these data have illuminated the advantages and drawbacks of the systems designed in this research and the principles underlying them. Ultimately, we believe this research shows that the critical parameters for developing sophisticated software for new interaction technologies are consideration of the technology’s inherent affordances and mindful attention to design. To this end, we have proposed a set of principles for designing these systems stemming from this research and previous research in this field. Upon attaining sufficient functionality and reliability, miniAudicle for iPad will be released via https://ccrma.stanford.edu/~spencer/mini-ipad and Auraglyph will be released via its website, https://auragly.ph/.
Acknowledgments I would like to thank my parents Janet Gray and Mario Salazar for their unending support, and my grandfather José Salazar. I would like to thank my brothers Jesse and Tyler, my sister Dominique, and my nephew Miró, in addition to Tesla Monson, Mark Cerqueira, and Jad Kanbar, all of whose love and friendship has been a crucial support and counterbalance to my academic endeavors. I would like to express my immense gratitude to my advisor and friend Ge Wang, my mentor Perry Cook, and my dissertation committee members Julius Smith, Chris Chafe, and James Landay, for their insights into the research discussed herein and their invaluable guidance, as well as to Dan Trueman, who has been a vital mentor throughout my academic career. I would like to especially thank Ajay Kapur, who has been an incredible friend and mentor throughout the development of this dissertation, and Raakhi Kapur. I would also give a huge thanks to all of my Stanford friends over the years, including Rob Hamilton, Chryssie Nanou, Romain Michon, John Granzow, Hongchan Choi, Tim O’Brien, Kurt Werner, Turner Kirk, Nick Bryan, Jieun Oh, Jorge Herrera, Alex Chechile, Madeline Huberth, Woody Herman, Pablo Castellanos Macin, Dave Kerr, Victoria Grace, Diana Siwiak, Zhengshan Shi, Myles Borins, Hana Shin, Chet Gnegy, Gina Collecchia, Cooper Newby, Colin Sullivan, Jen Hsu, Lauchlan Casey, Derek Tingle, Luke Dahl, Ed Berdahl, Wendy Ju, Johan Ismael, Roshan Vid, and Hunter McCurry. My friends and students in the CalArts community have provided invaluable feedback into this research and personal support during its undertaking, including Ashwin Vaswani, Daniel Hynds, Mike Leisz, Juan-Pablo Yépez, Parag Mital, Andrew Piepenbrink, Jake Turpin, Daniel Chavez Crook, April Gerloff, Sarah Reid, Jason Tibi, Cassidy Swanson, Nathan Shaw, Martín Vélez, Clayton Burton, Ivy Liu, Kyle McCarthy, Evan Schaaf, Michael Zasadzinski, Adrian Baghdasarian,
Christopher Delgado, Max Keene, John Klingbeil, Jack Rodgers, Holland Sangster, and Adrian Turner. Rebecca Fiebrink and Blair Bohannon provided insight and resources for assessing creative technology that were invaluable in preparing the evaluation section of this dissertation.
Contents

Abstract
Acknowledgments
1 Introduction
    1.1 Motivation
    1.2 Roadmap
    1.3 Contributions
2 Background
    2.1 Systems for Interactive Music Programming and Live Coding
    2.2 Mobile and Touch Computing
    2.3 Handwriting Input for Computing and Music
    2.4 Summary
3 Coding and Musical Gesture
    3.1 Motivation and Principles
    3.2 Design
        3.2.1 Editor
        3.2.2 Player
        3.2.3 ChucK Mobile API
        3.2.4 OSC and MIDI
    3.3 Implementation
    3.4 Summary
4 Sketching and Musical Gesture
    4.1 Motivation
    4.2 Principles
    4.3 Design
        4.3.1 Node Mode
        4.3.2 Nodes
        4.3.3 Document Management
        4.3.4 User-Training the Handwriting Recognition System
        4.3.5 Visual Aesthetic and Dynamics
    4.4 Implementation
        4.4.1 Handwriting Recognition
    4.5 Summary
5 Evaluation
    5.1 miniAudicle for iPad
    5.2 Auraglyph
        5.2.1 Auraglyph Study 1: Composing for Robots
        5.2.2 Auraglyph Study 2: Composing with Mobile Technology Workshop
        5.2.3 Personal Evaluation
    5.3 Interpretation
6 Conclusion
    6.1 A Framework for Mobile Interaction
    6.2 Future Work
    6.3 Final Remarks
A Node Types in Auraglyph
    A.1 Audio Nodes
    A.2 Control Nodes
B User Study Documentation
    B.1 miniAudicle for iPad Entry Survey
    B.2 miniAudicle for iPad Exit Survey
    B.3 Auraglyph survey
Bibliography
List of Tables

5.1 Average rankings by users of each factor in the modified Creativity Support Index.
5.2 Top analytics events during the miniAudicle for iPad user study.
5.3 Average rankings of each factor in the modified Creativity Support Index by Auraglyph users in the “Composing for Robots” class.
5.4 Top analytics events during the “Composing for Robots” user study.
5.5 Audio node use during the “Composing for Robots” user study.
5.6 Average rankings of each factor in the modified Creativity Support Index by Auraglyph users in the “Composing with Mobile Technology” workshop.
5.7 Top analytics events during the “Composing with Mobile Technology” workshop.
5.8 Audio node use during the “Composing with Mobile Technology” workshop.
List of Figures

1.1 A live-coding performance.
1.2 Smule’s Ocarina.
1.3 Smule’s Magic Piano.
1.4 Biophilia, an iOS and Android application accompanying the Björk album of the same name.
1.5 Radiohead’s Polyfauna mobile application.
1.6 Comparison of mouse interaction and touch interaction by number of layers between the user and the digital system they are interacting with.
2.1 The Audicle.
2.2 miniAudicle.
2.3 The Field creative programming environment.
2.4 Max.
2.5 An example graphical score in Pure Data.
2.6 A patch in Reaktor.
2.7 Derivative TouchDesigner.
2.8 A peak meter in Kronos (from [24]).
2.9 Smule Ocarina instrument (left) and global musical network (right).
2.10 The Princeton Laptop Orchestra.
2.11 The Stanford Mobile Phone Orchestra.
2.12 MoMu conceptual diagram.
2.13 The reacTable (left) and an example fiducial marker (right).
2.14 Reactable Mobile.
2.15 The Buchla Thunder (left) and SLABS (right).
2.16 Optical disks used by the Variophone (left) and hand-drawn musical scale used with the Vibroexponator (right) (from [65]).
2.17 An example score for the ANS Synthesizer (from [69]).
2.18 Daphne Oram operating the Oramics machine.
2.19 Adjusting a hand-drawn graphical structure in Ivan Sutherland’s Sketchpad.
2.20 A waveform designed in UPIC (left) and a UPIC score (right) (from [79]).
2.21 Editing synthesis parameters with the Fairlight CMI light pen.
3.1 Editor mode.
3.2 Player mode.
3.3 An individual tab in player mode.
4.1 Idea sketching, at the confluence of imagining, seeing (or perhaps hearing?), and drawing (from [102]).
4.2 A full Auraglyph program, with annotations.
4.3 The editor for a SquareWave node.
4.4 Modifying the freq parameter of a unit generator with handwritten numeric input.
4.5 Selecting the input port for a new connection.
4.6 Breaking a connection between two nodes.
4.7 Base node types: unit generator (left) and control-rate processor (right).
4.8 Menu for selecting an object sub-type.
4.9 Editor for a Waveform node.
4.10 Editor for a Sequencer node.
4.11 Adding rows and columns to a Sequencer node.
4.12 Saving a document.
4.13 Loading a document.
4.14 Window opening animation in Auraglyph.
5.1 Demonstration instruments designed in miniAudicle for iPad.
5.2 Auraglyph in performance at the Auraglyph x Robots concert, and the poster for the event.
5.3 A rehearsal of HedonismBot.
5.4 A student developing an idea in Auraglyph.
5.5 Screenshot of an Auraglyph program developed in the Composing with Mobile Technology workshop (courtesy April Gerloff).
5.6 Sketches of prototype notation for Auraglyph performance.
5.7 Demonstrating a program that manipulates sound with the iPad’s orientation sensors.
5.8 An Auraglyph quartet.
5.9 A still image from “Auraglyph | Demo_1: Basic FM + Filters,” the first public presentation of Auraglyph.
5.10 The ending of PULSE, with multiple cascades of modulated feedback delays.
5.11 The author performing with Auraglyph at CalArts.
5.12 Notes, or a “score,” for the author’s performance at TOO LATE IV.
6.1 Topology of mobile interaction.
6.2 Linear vs. exponential motion.
I dream of instruments obedient to my thought and which with their contribution of a whole new world of unsuspected sounds, will lend themselves to the exigencies of my inner rhythm.
Edgard Varèse
Chapter 1
Introduction

There are no theoretical limitations to the performance of the computer as a source of musical sounds, in contrast to the performance of ordinary instruments. At present, the range of computer music is limited principally by cost and by our knowledge of psychoacoustics. These limits are rapidly receding.
M. V. Mathews, The Digital Computer as a Musical Instrument (1963)

With this resounding pronouncement Max Mathews kickstarted the era of digitally synthesized computer music. From that point forward it has been understood that computers can produce any sound that humans are capable of hearing, given sufficient computing resources. For some time these resources have been widely available on consumer desktop and laptop computers, and, in recent years, on phones and a panoply of other mobile devices. Given an array of computing devices that can produce any sound, the objective of the computer musician then becomes determining which sounds are worth creating. From classical subtractive and additive synthesis methods to modern physical modeling techniques, the history of computer music can be viewed as a succession of constraints placed on this infinite space of possibilities. In exchange, these constraints afford computer musicians higher-level abstractions for thinking about and creating music.

To this end, much attention has been devoted to the development of programming frameworks for audio synthesis, processing, and analysis. Early music programming languages introduced the unit generator design pattern; later developments have allowed graphical programming of musical and
audio processing primitives; more recent systems have offered tools for parallelization and strong timing of musical events. Each such framework establishes semantics for thinking about digital music; these semantics may limit what is easy to do in any given language, but provide the necessary support for higher-level reasoning about design and composition. Also of critical importance to the computer musician is the role of interaction. Initially digital music creation involved limited interactivity, as composers would develop an entire music software program and then wait for some time, often overnight, for the computer to generate an audible result. This is an abrupt contrast to pre-electronic music production, by way of acoustic musical instruments; sound can be issued immediately, but dexterity, physical exertion, and extensive training are required for a performer to control the sound in any structured way. Over the years, available computing power has increased, costs have decreased, and physical size has diminished, allowing a diversity of approaches for interactive computer music. Some of these apply human-computer interaction research to the musical domain, employing general-purpose computing interfaces like mice, keyboards, pen tablets, and gaming controllers to design and enact musical events. Related to this, the developing live-coding movement has positioned the act of programming itself as a musical performance (Figure 1.1). Other efforts seek to update acoustic instruments with digital technology, complementing these with sensors and actuators mediated by a computer. Many interactive computer music systems apply both approaches. Crucially, this research and design of interactivity has been necessary to create structures and meanings for computer music. Interactive software is our means to explore the infinite variety of sounds made available to us by the computer.
Figure 1.1: A live-coding performance.
Most recently, touchscreen interfaces and mobile devices have profoundly impacted the landscape of mainstream human computer interaction in the early 21st century. In the past decade, scores of mobile touchscreen devices have been cemented in the popular consciousness – mobile phones, tablet computers, smart watches, and desktop computer screens, to name a few. For many individuals, mobile touch-based devices are increasingly the primary interface for computing. It is conceivable this trend will only accelerate as computing technology becomes even more naturally integrated into daily life. New human-computer interaction paradigms have accompanied these hardware developments, addressing the complex shift from classical keyboard-and-mouse computing to multitouch interaction.
Figure 1.2: Smule’s Ocarina.
Figure 1.3: Smule’s Magic Piano.
These new technologies and interaction models have seen a number of interesting uses in musical interaction, including the software applications of Smule, like Ocarina and Magic Piano (Figure 1.2), and artist-focused apps like Björk’s Biophilia (Figure 1.4) and Radiohead’s Polyfauna (Figure 1.5). However, touchscreen-based mobile devices have found meager application in programming of music software, despite recent innovations in interactive musical programming on desktop/mouse-and-keyboard systems. This overlooked use of touchscreen and mobile interaction presents unique and compelling opportunities for musical programming. Touchscreen interaction offers a natural medium for expressive music programming, composition, and design by providing intrinsic gestural control and interactive feedback, two fundamental processes of musical practice. This dissertation explores these ideas through the design, implementation, and evaluation of two concepts for music programming and performance on mobile, touchscreen interfaces. Through this exploration, we
have developed a novel framework for sound design and music composition using exclusively touch and handwriting input.
Figure 1.4: Biophilia, an iOS and Android application accompanying the Björk album of the same name.
Figure 1.5: Radiohead’s Polyfauna mobile application.
1.1 Motivation

This work is motivated by the desire to better understand the distinguishing capabilities and limitations of touchscreen technology, and, using these as guiding principles, to empower musical programming, composition, and design on such devices. Complex software developed for a given interaction model — such as keyboard-and-mouse — may not successfully cross over to a different interaction model — such as a touchscreen device. As these technologies have become widespread, it is not sufficient to sit back and watch inappropriate, preexisting interaction models be forced into the mobile touchscreen metaphor. Rather, it is incumbent upon the research community to explore how best to leverage the unique assets, and work within the drawbacks, of this new paradigm, which is evidently here to stay. Similar trends might be seen in the shift in computer music from mainframes and dedicated synthesizers to the personal computer in the 1980s, and then to the laptop, now ubiquitous in modern computer music performance practice. As these computing paradigms gave way from one to another, the software tools and interaction metaphors adjusted to better take advantage of the dominant paradigm. Therefore, our overriding design philosophy in researching these systems is not to transplant a desktop software development environment to a tablet, but to consider what interactions
the tablet might best provide for us. We believe the critical properties of touchscreen interaction in musical creation are its support for greater gestural control, direct manipulation of on-screen structures, and potential for tight feedback between action and effect.

Gesture and physicality are of primary importance in music creation; paradoxically, the power of computer music to obviate gesture is perhaps its most glaring flaw. Keyboard-and-mouse computing does little to remedy this condition. From a musical perspective, keys are a row of on-off switches with a limited degree of gestural interaction, and the conventional use of the mouse is a disembodied pointer drifting in virtual space. As described in Section 2.2, a rich history of laptop instrument design has sought to embrace these disadvantages as assets, with fascinating results. Yet gesture and physicality are the essence of interaction with mobile touchscreen devices. Touchscreens enable complex input with one or more fingers of one or both hands. They allow their users to directly touch and interact with the virtual world they portray. A mobile device can be picked up, waved, dropped, flung, flipped, gesticulated, attached, put in a pocket, spoken, sung, or blown into, and/or carried short or long distances. Moreover, these and other activities can be detected by the sensors of the device and interpreted by software. So it is natural to ask how these capabilities might be applied to musical creativity.

Direct manipulation allows users of creative software to interact directly with objects on-screen, narrowing the divide between physical reality and digital abstraction. While this can be used to augment simple physical metaphors, like the knobs and sliders popular in musical applications, it can also facilitate richer interactions beyond what is possible in the physical realm. This contrasts with conventional desktop computer interaction, in which the mouse cursor distances the user from the activities that occur in the underlying virtual space. Figure 1.6 visually demonstrates these comparative distances. This naturalistic form of interaction better leverages its users’ preexisting intuitions about the world. The resulting intuitiveness not only makes it easier for novices to learn a new set of interactions, but makes simple actions require less thinking for experienced users, allowing more sophisticated interactions to be built on this intuitive foundation.

Touch and direct manipulation further enable tight feedback between action and effect. As an object is manipulated on-screen, it can quickly react visually to the changes being made to it. In a system of interacting components, the result of changing one component can be further reflected in
the overall system.

Figure 1.6: Comparison of mouse interaction and touch interaction by number of layers between the user and the digital system they are interacting with.

This dynamic, immediate feedback into the state of a virtual system allows users to quickly see the connections between inputs and outputs, facilitating understanding of the overall system. This is especially advantageous in musical systems, in which a “try-and-see” approach to design and composition is typical and a musician might explore numerous small variations on a single idea. Touch interaction is not a prerequisite for this dynamic feedback, but the shorter distance between the user and the virtual structures on screen augments its utility.

This dissertation also examines the role of stylus-based interaction in mobile touchscreen devices. Stylus interaction can allow for user input of higher precision than typically possible with touch input, such as drawing symbols, waveforms, and other shapes. By placing an implement between the user and the virtual world, stylus input distances the user somewhat from the activity on screen. However, writing on a flat virtual surface is still fairly immediate, and touch input remains available for interactions where a higher level of directness is desirable in exchange for a comparably lower level of precision.

Of course, mobile touch interaction is not without its weaknesses. The most obvious of these in conventional systems is the lack of tactile feedback. When someone types a key or clicks a mouse, their finger experiences a brief physical sensation indicating that the key or click was registered.
Keyboard users with reasonable training can feel if their finger is fully covering a single key or erroneously covering multiple keys, and fix the error before ultimately pressing the key. Contemporary touchscreens are generally unable to provide such physical cues as to the success of an impending or transpiring gesture. Rather, touch systems must rely on visual or aural feedback for these cues. This lack of tactile feedback complicates interaction with systems with discrete inputs. For instance, most commonly available touchscreen systems cannot simulate the clicking detents of a rotary knob, or the click of a keyboard. Creative musical systems using touch interaction must acknowledge these limitations in their design. Together, these characteristics of touchscreens are a compelling framework for working with music in diverse and artistically meaningful ways. The research herein examines how these advantages and weaknesses might be used to create and understand music computationally.
1.2 Roadmap

In this dissertation we explore new models for programming musical systems. First, we discuss previous research in areas related to the present work. These subjects include systems for interactive music programming and live coding, mobile and touch computing in music, and the applications of handwriting input to music and computing.

Following this, we examine the place of textual programming in touchscreen interaction, in part by developing a programming and performance environment for the ChucK language on the Apple iPad, called miniAudicle for iPad. Textual programming on touchscreen devices has been largely overlooked for music software development; it is easy to dismiss due to the imprecision of keyboard-based text input in such an interface. Nonetheless we believe that exploring the augmentation of text programming with touch interaction is worth considering as an initial approach to music software development.

Next, we introduce a new and drastically different programming paradigm in which touch manipulation is augmented with stylus-based handwriting input. In combining stylus and touch interaction, the system we have developed provides a number of advantages over existing touchscreen
paradigms for music. Handwritten input by way of a stylus, complemented by modern digital handwriting recognition techniques, replaces the traditional role of keyboard-based text/numeric entry with handwritten letters and numerals. In this way, handwritten gestures can both set alphanumeric parameters and write out higher level constructs, such as programming code or musical notation. A stylus allows for modal entry of generic shapes and glyphs, e.g. canonical oscillator patterns (sine wave, sawtooth wave, square wave, etc.) or other abstract symbols. Furthermore, the stylus provides precise graphical free-form input for data such as filter transfer functions, envelopes, parameter control curves, and musical notes. In this system, multitouch finger input continues to provide functionality that has become expected of touch-based software, such as direct movement of on-screen objects, interaction with conventional controls (sliders, buttons, etc.), and other manipulations. Herein we discuss the design, development, and evaluation of Auraglyph, a system created according to these concepts. We then present an evaluation of these two systems, miniAudicle for iPad and Auraglyph, and the conceptual foundations underlying them. This evaluation uses two approaches. The first is a set of user studies in which a number of individuals used miniAudicle for iPad or Auraglyph to create music. The second is a discussion of the author’s personal experiences with using the software described, including technical exercises and compositional explorations. Lastly, this dissertation concludes by offering a framework for designing creative mobile software. This framework has been developed based on the experiences and results described herein as well as existing research in this area. It comprises direct manipulation, dynamic graphics, physicality, network, and identity. We intend to develop and evaluate further research of this nature with this framework in mind, and hope that other researchers and music technologists will find it useful in their endeavors.
1.3 Contributions

This dissertation offers the following contributions.
• Two drastically different implementations of gestural interactive systems for music programming and composition. These are miniAudicle for iPad, a software application for programming music on tablet devices using the ChucK programming language, and Auraglyph, a software application for sound design and music composition using a new language, based on handwritten gesture interaction.

• A series of evaluations of these ideas and their implementations, judging their merit for music creation and computer music education. These evaluations have shown that gestural-based music programming systems for mobile touchscreen devices, when executed effectively, provide distinct advantages over existing desktop-based systems in these areas.

• A conceptual framework for gestural interactive musical software on touchscreen devices, consisting of direct manipulation, dynamic graphics, physicality, network, and identity.
Chapter 2
Background

2.1 Systems for Interactive Music Programming and Live Coding

Much of this work draws from desktop computer-based environments for music programming, including miniAudicle [2] and the Audicle [3], two programming environments for the ChucK music programming language [4]. ChucK is a text-based music programming language that includes fundamental, language-level primitives for precise timing, concurrency, and synchronization of musical and sonic events. These features specifically equip the language to deal with both high-level musical considerations, such as duration, rhythm, and counterpoint, and low-level sound design concerns, like modulation and delay timings. ChucK also leverages a diverse palette of built-in sounds, including unit generators designed specifically for the language and a variety of modules from the Synthesis ToolKit [5].

The Audicle provides a number of facilities for real-time graphical introspection and debugging of ChucK code (Figure 2.1). One of the primary design goals of the Audicle was to provide sophisticated “visualizations of system state” to assist the programmer in reasoning about their programs. It achieved this by spreading multiple perspectives of running code across several virtual “faces.” These included a face to visualize the syntax tree of compiled ChucK code, the links between concurrently running programs, and the timing of each running program. The CoAudicle extended musical live-coding to interactive network-enhanced performance between multiple performers [6].
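The flavor of ChucK’s timing and concurrency primitives can be conveyed by a short sketch (an illustrative example, not code drawn from the systems described in this dissertation): unit generators are connected with the => operator, time is advanced explicitly by chucking durations to now, and concurrent processes (“shreds”) are launched with spork.

    // connect a sine oscillator unit generator to the audio output
    SinOsc s => dac;
    0.5 => s.gain;

    fun void melody()
    {
        while( true )
        {
            // choose a random MIDI pitch and convert it to a frequency in Hz
            Std.mtof( Math.random2( 60, 72 ) ) => s.freq;
            250::ms => now;   // advance time by exactly 250 milliseconds
        }
    }

    spork ~ melody();         // run the melody concurrently as its own shred
    1::minute => now;         // keep the parent shred alive for one minute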
Figure 2.1: The Audicle.

miniAudicle (Figure 2.2) slimmed down the Audicle in terms of design and functionality to provide a general-purpose, cross-platform, and light-weight live coding environment for ChucK programming.

Impromptu [7] and SuperCollider [8] are both audio programming languages whose standard
Figure 2.2: miniAudicle.
coding environments were built around interactive coding. Each of these systems uses an interactive code editing interface, enabling live coding of music in a performative setting [9]. More recent developments in live-coding systems have led to ixi lang, which complements SuperCollider with a domain-specific language designed for live-coding [10], Overtone, which uses the Clojure programming language to control SuperCollider’s core synthesis engine [11], and Sonic Pi, a Ruby-descended language and environment for musical live coding [12].

The Field programming language [13], designed for creating audiovisual works, features a number of enhancements to textual programming conventions (Figure 2.3). Notably, graphical user interface elements such as sliders, buttons, and graphs can be inserted and modified directly in code. Code in Field can further be arranged in a graphical window, called the “canvas,” and be subjected to a variety of interactive graphical manipulations.
Figure 2.3: The Field creative programming environment.

The object graph-based programming languages Pure Data (Pd) [14] and Max [15] (Figure 2.4) have tremendously influenced the space of visual computer music design and composition and represent the evolving status quo of graphical music programming. Programming in Pd or Max consists of creating objects and other nodes in a graph structure, in the computer science sense of the term, that is displayed on screen; connecting these nodes in various arrangements forms computational logic. These connections proceed at either control rate or audio rate depending on the identity of
the object and the specification of its input and output terminals—audio-rate objects are allowed to have control-rate terminals, and audio-rate terminals may in some cases accept control-rate input. In delineating the underlying design choices made in the development of Max, Puckette candidly describes its advantages and limitations [16], and, to the extent that they are comparable, those of its counterpart Pure Data. Puckette expresses an ideal in which each constituent node of a patch displays its entire state in text format. For instance, cycle˜ 100 expresses a sine wave at 100 Hz, or lores˜ 400 0.9 expresses a resonant lowpass filter with a cutoff frequency of 400 and filter Q of 0.9. These parameters are explicitly displayed in the graphical canvas, rather than hidden behind separate editor windows. In a sense, one can easily scan a Max document window and understand quickly the nature of the entire program, or print the program and be able to accurately recreate most of its parts.
Figure 2.4: Max. This clarity of design is absent in other aspects of Pd and Max, however. In both languages, the identity and order of an object’s arguments must generally be known beforehand to set them in a newly created object. The identity and function of each object’s inlets and outlets are also not
displayed. Furthermore, the exact function of an inlet can vary depending on the content of its input, requiring the user to simply memorize this information or look it up in the documentation. In one common example, the soundfiler in Pd accepts multiple “flags” as the first item in a list of inputs; the value of this flag determines how the rest of the inputs are interpreted (for instance, “read” will read the specified sound file to a specified buffer or “write” to write one). This is a convenient way to multiplex sophisticated functionality into a single object, but can also obscure program logic as the nature of any particular object input is not clear based on visual inspection. More recent versions of Max have improved on these issues by assertively displaying contextual documentation. The design of arrays and tables in Pd or buffers in Max also presents roadblocks to easily understanding the greater functionality of a whole program. In both languages, the array/table or buffer must be declared and named in an isolated object, adding it to a global namespace of audio buffers. It can then be read or modified by other objects by referring to it by name. By separating the definition of these buffers from the control flow of their use and managing them in a global namespace, high-level program logic is allowed to be spread out among disparate points in the program.
Figure 2.5: An example graphical score in Pure Data. One of the primary motivations of developing Pure Data was to introduce a mechanism for openended user-defined musical scores [17]. As implemented, this consists of a mechanism for defining
hierarchies of structured data and a graphical representation of these structures (Figure 2.5). Pd enables these data structures to be manipulated programmatically or by using a graphical editor. Several efforts have extended Pd and Max’s programming model to the synthesis of real-time graphics. The Graphics Environment for Multimedia (GEM), originally created for Max but later integrated into Pd, is an OpenGL-based graphics rendering system [18, 19]. Graphics commands are assembled using Pd objects corresponding to geometry definition, properties of geometries, and pixel-based operations, including image and video processing. Jitter1 is a Max extension with tools for handling real-time video, two- and three-dimensional rendered graphics, and video effects. Reaktor2 is a graphical patching system for developing sound processing programs (Figure 2.6). Despite a superficial resemblance to Max and Pure Data, Reaktor does not position itself as a programming language per-se. Instead, it is presented as a digital incarnation of analog modular synthesis with virtual modules and patch cables on-screen. Reaktor’s design makes a fundamental distinction between the interface and implementation of software developed using it. This separation allows end-users of a Reaktor program to interact with the software’s high-level interface while avoiding low-level details. A similar concept is embodied in Max’s Presentation mode, which hides non-interactive objects and patches, leaving only the interactive parts of the program. Kyma is a graphical audio programming and sound design system also based on interconnecting a modular set of sound processing nodes [20, 21]. Kyma was designed to sustain both sound design and musical composition within the same programming system, forgoing the orchestra/score model common at the time. In addition to a dataflow perspective of sound, Kyma provides additional timeline, waveform, and mixing console views of sound creation. The SynthEdit program for the NeXT Computer allowed users to graphically build unit generator topologies by selecting individual components from a palette and connecting them using the mouse cursor [22]. Unit generators could be parameterized through an inspector window. Developed from the NeXT computer’s basic drawing program, SynthEdit also allowed users to freely draw on the panel containing the audio components [23]. 1 2
1 https://cycling74.com/max7
2 https://www.native-instruments.com/en/products/komplete/synths/reaktor-6/
Figure 2.6: A patch in Reaktor. Derivative TouchDesigner3 is a graphical dataflow programming language for computer graphics and sound (Figure 2.7). A TouchDesigner program consists of a graph of objects that process sound, video, or geometry data in various ways. Its design and user feedback is somewhat antithetical to that of Pure Data and Max, in that the properties of each node can only be viewed or edited through a dialog that is normally hidden, but each node clearly displays the result of its processing in its graphical presentation. Rather than displaying the definition of any given node, TouchDesigner shows that node’s effect. As a result, it is straightforward to reason about how each part of a TouchDesigner program is affecting the final output. More recently, the Kronos programming language extends functional programming models to a visual space in the context of real-time music synthesis [24]. In Kronos, a programmer directly edits the syntax tree representation of a program; inputs to functions are routed from literal values, other function outputs, or function objects themselves. These functions are capable of operating both on individual data items or on full audio or graphical signals, allowing for a rich synergy between computation, synthesis, and visualization (Figure 2.8). 3
https://www.derivative.ca/
Figure 2.7: Derivative TouchDesigner.
Figure 2.8: A peak meter in Kronos (from [24]). There exist numerous graphical programming environments specific to domains other than music and audio synthesis. National Instruments’ LabVIEW4 is a graphical programming system for scientific, engineering, and industrial applications such as data acquisition, control systems design, and automation. MathWorks’ Simulink5 allows its programmers to simulate complex interacting industrial and scientific systems by creating connections between the individual components of a 4 5
4 https://www.ni.com/labview/
5 https://www.mathworks.com/products/simulink.html
block diagram. Apple’s Quartz Composer6 and vvvv7 are both dataflow-oriented graphical programming languages for creating fixed or interactive multimedia compositions; vvvv additionally includes functionality for integrating textual programming, physical computing, computer vision, and projection mapping.
2.2 Mobile and Touch Computing

Two significant contemporary trends in mainstream computing—the increasing prevalence of mobile computing devices, and the proliferation of touch as a primary computer interface—have vastly expanded the potential of music technology practitioners and researchers. A number of software applications from mobile application developer Smule have begun to delineate the space of possibilities for mobile music computing [25]. The iPhone application Ocarina is both a musical instrument, designed uniquely around the iPhone’s interaction capabilities, and a musical/social experience, in which performers tune in to each others’ musical renditions around the world [26, 27] (Figure 2.9). The design of Ocarina explicitly leveraged the user input and output affordances of its host device, and avoided design choices which might have been obvious, or have been suitable for other devices, but which were a poor fit for a mobile phone platform. Rather than “taking a larger instrument (e.g., a guitar or a piano) and ‘shrinking’ the experience into a small mobile device, [Wang] started with a small and simple instrument, the ocarina, [...] and fleshed it out onto the form factor of the iPhone” [26]. In fact, Ocarina used nearly every input and output on the iPhone, including the microphone, touchscreen, the primary speaker, geolocation, WiFi/cellular networking, and the orientation sensors. To use the instrument component of Ocarina, a user would blow into the microphone on the bottom of the device. This input signal, effectively noise, was analyzed by an envelope follower, and used as a continuous control of amplitude. Pitch was controlled using four on-screen “tone-holes,” or buttons; as with an acoustic ocarina, each combination of pressed buttons produced a different pitch. To enable a degree of
6 https://developer.apple.com/library/content/documentation/GraphicsImaging/Conceptual/QuartzComposerUserGuide/qc_intro/qc_intro.html
7 https://vvvv.org/
virtuosity, tilting the device introduces vibrato. The resulting audio output is a synthesized flute-like sound, and performance of the instrument both sonically and gesturally resembles playing an acoustic ocarina.
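The interaction mapping described above can be approximated in a brief ChucK sketch (a hypothetical illustration only; Ocarina is a native iPhone application, and the constants and mappings below are assumptions rather than its actual parameters):

    // crude breath control: follow the envelope of the microphone input
    adc => FullRect rect => OnePole env => blackhole;
    0.999 => env.pole;

    SinOsc vibrato => blackhole;        // low-frequency oscillator for vibrato
    5 => vibrato.freq;

    SinOsc voice => Gain amp => dac;    // flute-like tone (a plain sine for brevity)
    0 => amp.gain;
    440.0 => float basePitch;           // would be selected by the on-screen "tone holes"

    while( true )
    {
        Math.min( 1.0, env.last() * 8 ) => amp.gain;    // breath strength controls loudness
        basePitch + vibrato.last() * 5 => voice.freq;   // device tilt would scale vibrato depth
        5::ms => now;
    }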
Figure 2.9: Smule Ocarina instrument (left) and global musical network (right). In addition to a self-contained instrument, Ocarina provides a concise and effective networking and social model. As users play the Ocarina, their performances were recorded and streamed to Smule’s servers, along with a limited amount of metadata including location and a self-chosen username. In a separate mode of the application, users could then listen to these uploaded performances from around the world. The individual musical notes of each performance emanated from the approximate location of its performer, while a gentle glow highlighted the locations of the user base as a whole. This social experience is mediated entirely by music; in contrast to conventional social networks, there is no ability to communicate by text or non-musical means. Critically, the world performances were effectively recordings that were replayed, rather than a live stream–thus evading difficulties related to network latency in live musical interaction. In this way Ocarina created a unique social/music experience. Leaf Trombone: World Stage took this model a step further, by congregating groups of users into a live American Idol-like panel of judges to critique and rate performances on a mobile phone instrument based on a trombone [28, 29]. As judges listened to a Leaf Trombone performance, they provided commentary in the form of text comments and emoji as the performer watched his or her rendition be pilloried or praised. A final rating of the complete performance would be provided by each judge at the performance’s conclusion.
Magic Fiddle brought these ideas to the iPad, whose larger size accommodates a different set of interactions, such as finer grained pitch control and touch-based amplitude control [30]. Magic Fiddle also imbued a distinctive personality in its software design as the application “spoke” to the user as a sentient being with its own motivations and emotions. According to Wang et al., Magic Fiddle could express a variety of feelings and mental states, seeming variously “intelligent, friendly, and warm,” or “preoccupied, boastful, lonely, or pleased”; the instrument also “probably wants to be your friend.” This sense of personality, part of a larger tutorial mechanism in which the instrument effectively acted as its own teacher, served to draw the user in to the musical experience deeper than they might with an inanimate object.
Figure 2.10: The Princeton Laptop Orchestra. The conceptual background of these endeavors stems from research in the Princeton Laptop Orchestra [31, 32] (Figure 2.10), the Stanford Laptop Orchestra [33], and the Small Musically Expressive Laptop Toolkit [34]. In motivating the creation of the Princeton Laptop Orchestra, the ensemble’s co-founder Dan Trueman considered a number of issues in digital instrument design, including the “sonic presence” of an instrument and the extent to which a performer is engaged with their instrument (so-called “performative attention”) [35]. These notions extend from previous research in instrument design by Trueman and collaborators Curtis Bahn and Perry Cook [36, 37, 38], such as BoSSA, a violin-like instrument outfitted with a sensor-augmented bow and a spherical speaker array. The central pillar of these efforts is a focus on instrument creation as an intricate weaving of digital technology, sound design, gestural interaction design, and physical craft. Cook explicitly argues the importance of mutual interdependence between a physical musical interface
and the underlying digital synthesis it controls [39], in contrast to the anything-goes bricolage made possible by an increasing variety of interchangeable MIDI controllers and software synthesizers. Each of the efforts described above examines the explicit affordances of a technological device as a musical instrument in its own right, rather than as a generic unit of audio computing. They further consider the dynamics of its essential parts and integrate these into the aesthetics of the instrument. This approach ensures a maximally efficient interface between gesture and sound, allowing for both initial accessibility and expressive depth. These principles have deeply influenced the design of the new, touch-based programming tools discussed herein. The Mobile Phone Orchestra (MoPhO) has applied these techniques in developing a set of compositions and principles for mobile phone-based performance practice [40] (Figure 2.11). One MoPhO composition leveraged the phone’s spatial sensors to detect the direction in which it was being pointed, and as a performer blew into the phone’s microphone, synthetic wind chimes would play from loudspeakers in that direction. Another piece treated a set of phones as ping-pong paddles, using inertial sensing to detect its performers’ swinging motions and bounce a virtual ball around the performance space while sonifying its movement and activity. MoPhO further calls for additional practices to leverage the distinct capabilities of mobile technology, such as “social and geographical elements of performance” and audience participation. Many of MoPhO’s compositions were built using the MoMu toolkit [41] (Figure 2.12), a programming framework for real-time interactive mobile audio applications. MoMu unifies a variety of interactive technologies prevalent in mobile art and music applications into a single common interface. Among these are 3-D graphics, touch interaction, inertial/spatial sensing, networking, fast Fourier transform, and of course real-time audio. In this sense, it explicitly delineates which technologies its authors have found valuable in realizing interactive mobile music software. A number of efforts have signaled a desire to streamline the design of musical interactions based on touchscreen and mobile devices. Many of the tools in this space have leveraged the gestural and interactive aspects of mobile and touchscreen technology while leaving with conventional computers the parts that work comparatively well on that platform. TouchOSC8 is a flexible control application that sends Open Sound Control [42] or MIDI messages to an arbitrary client in response to user 8
https://hexler.net/software/touchosc
Figure 2.11: The Stanford Mobile Phone Orchestra.
Figure 2.12: MoMu conceptual diagram. interaction with a variety of on-screen sliders, rotary knobs, buttons, device orientation, and other controls. TouchOSC comes with a selection of default interface layouts in addition to providing tools to customize new arrangements of controls. In this sense, TouchOSC exports to other software programs the task of mapping the controls to sound (or computer-generated visuals, or any other
desired output). This facilitates rapid development of a mobile audio system built using any tool capable of working with OSC or MIDI. However, this typically also binds the system to a desktop or laptop computer and a wireless network for exchanging data with this computer; while it is possible for the OSC/MIDI client to run locally on the mobile device in tandem with TouchOSC, in the author’s experience this is less common. This impairs scenarios where the mobile device might be freely moved to different locations, and as noted by Tarakajian [43] forces the performer to consider low-level networking details like IP addresses and ports. Generally these details are not what musicians want to primarily concern themselves with in the midst of setting up a performance, and in the author’s experience wireless networking is a common point of failure in computer music performance. TouchOSC is also limited to interactions that are compatible with its fixed set of control objects; interactions dependent on providing rich feedback to the user about the system being controlled are not possible with TouchOSC. Roberts’ Control [44, 45] automatically generates interface items like sliders and buttons on a touchscreen, based on specifications provided by connected desktop software such as Max, SuperCollider, or any other software compatible with OSC. Control allows a desktop-based computer program to completely describe its interface to a mobile phone client which then presents the interface as specified. Aside from laying out stock control widgets, Control permits interaction between the widgets as well as creation of new widgets by adding JavaScript code. Control obviates manual network configuration by automatically discovering devices running Control on the same network. Mira, an iPad application, dynamically replicates the interface of desktop-based Max/MSP programs, joining conventional music software development with touch interaction [43]. Pure Data [14] has seen use across a range of portable devices, from early mass-market portable devices like the iPaq personal digital assistant and the iPod [46, 47] to more recent devices running iOS and Android mobile operating systems [48]. These efforts have typically utilized Pd as an audio backend to a purpose-built performance interface, without the ability to directly program on the device itself. MobMuPlat is a software application that generalizes this concept, allowing user-generated Pd patches to be loaded alongside a definition for a custom user interface system [49]. The Pure Data script can be modified normally through its standard desktop editor, and the custom interface can be edited with a desktop application accompanying the mobile application.
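To make this division of labor concrete, the sketch below shows the receiving end of such a setup in ChucK, with incoming fader values mapped to an oscillator’s frequency. The port number and OSC address are assumptions modeled on a typical controller layout, not fixed requirements of any of the tools above.

    OscIn oin;                           // listen for incoming OSC messages
    OscMsg msg;
    8000 => oin.port;                    // assumed port; must match the controller's configuration
    oin.addAddress( "/1/fader1, f" );    // assumed address of a single on-screen fader

    SinOsc s => dac;

    while( true )
    {
        oin => now;                      // wait until a message arrives
        while( oin.recv( msg ) )
        {
            msg.getFloat( 0 ) => float fader;   // fader value, typically 0.0 to 1.0
            200 + fader * 800 => s.freq;        // map the fader to an arbitrary frequency range
        }
    }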
TouchOSC, Control, Mira, MobMuPlat, and similar tools utilize existing programming frameworks for desktop computers to build musical systems for mobile and touchscreen-based devices that don't natively support these programming frameworks. This is an effective strategy for creating mobile music experiences in a way that leverages an artist or engineer's existing skill set with a desktop-based programming system. On the other hand, these systems avoid the challenging question of how one might develop sophisticated musical software on the device itself, without being attached to a desktop computer. Perhaps this is to their benefit; it is not immediately evident that an existing programming system like Max or Reaktor would translate well to a mobile touch environment.
The JazzMutant Lemur (https://www.jazzmutant.com/lemur_overview.php), in the words of Roberts [44], effectively "invented the market" of "multitouch control surfaces geared towards use in the arts." The Lemur, now discontinued from production, was a dedicated hardware device that provided a variety of on-screen controls that emitted MIDI or OSC over an ethernet connection when activated. The interface could be arranged using a companion editor that ran on a desktop computer; available controls included the conventional knobs, buttons, and sliders, as well as a two-dimensional "Ring Area" and a physics-based "Multiball" control [50]. The Lemur was notable for having been used widely in popular music by such diverse artists as Daft Punk, Damian Taylor (performing with Björk), Richie Hawtin, Einstürzende Neubauten, Orbital, and Jonathan Harvey (https://www.jazzmutant.com/artists_lemurized.php). The Lemur concept has since been resurrected as a software application for mobile phones and tablets (https://liine.net/en/products/lemur/) that connects its diverse set of controls to other creative software via OSC and MIDI.
The reacTable [51] introduced a large-format, multi-user table that leveraged touch input in addition to interacting with physical objects placed on its surface (Figure 2.13, left). The reacTable operated by using a camera to track perturbations of projected infrared light on the table's surface, corresponding to human touch input. This camera could also detect the presence of physical tokens on the screen tagged with fiducial markers on the surface facing the screen (Figure 2.13, right). A reacTable user can create musical output by placing objects corresponding to various audio generators, effects, and controls on the table. The reacTable software automatically links compatible
Figure 2.13: The reacTable (left) and an example fiducial marker (right).
objects, such as a loop player and a delay effect, when they are moved or placed in proximity to each other. Up to two parameters can be controlled directly for each object, by turning the physical token in either direction or by swiping a circular slider surrounding the token. Additional parameters are exposed through a popup dialog that can be activated by an on-screen button; for instance, filters can be swept in frequency and adjusted in Q by dragging a point around in two-dimensional space. This arrangement accommodates a variety of physically apt gestures; lifting a token from the screen removes it entirely from the sound processing chain, and swiping the connection between two objects will turn the volume of that connection up or down. The reacTable easily handles multiple performers, and in fact its size and the quantity of individual controls seem to encourage such uses. Furthermore, the reacTable's distinctive visual design and use of light and translucent objects on its surface create an alluring visual aesthetic. At the same time, the dependence on freely moving tangible objects abandons some of the advantages of a pure-software approach, such as easily saving and recalling patches or encapsulating functionality into modules. The reacTable website documents a save and load feature (http://reactable.com/live/manual/), but it is not clear if or how this interacts with physical objects that are placed on the screen. Jordà indicates that modular subpatch creation was achieved in a prototype of the reacTable, but this feature does not appear to have made it into a current version. Jordà has stated that an explicit design goal of the reacTable was to "provide all the necessary information about the instrument and the player state" [51]. In addition to displaying the two directly controllable parameters for each object, the reacTable provides a first-order visualization of each
object’s effect on the audio signal path by displaying its output waveform in real-time. Objects that generate notes or other control information display pulses corresponding to magnitude and quantity.
Figure 2.14: Reactable Mobile.
The reacTable concept and software were later released as an application for mobile phones and tablets (http://reactable.com/mobile/) called "Reactable Mobile" (Figure 2.14). Reactable Mobile replaced physical tokens with on-screen objects that could be dragged around the screen and connected, mirroring much of the functionality of the original reacTable. ROTOR (http://reactable.com/rotor/) developed these ideas further on commercially available tablet computers, in particular offering a set of tangible controllers that could be placed on-screen to interact with objects within the software.
Davidson and Han explored a variety of gestural approaches to musical interaction on large-scale, pressure-sensitive multitouch surfaces [52]. These included pressure-sensitive interface controls as well as more sophisticated manual input for deforming objects or measuring strain applied to a physical modeling synthesizer.
There is a rich history of electronic musical control interfaces that are touch-sensitive but do not have a screen.
Figure 2.15: The Buchla Thunder (left) and SLABS (right).
The Buchla Thunder MIDI controller (Figure 2.15, left) included a number of pressure- and location-sensing touch-plates in a strikingly unconventional arrangement [53]. The Thunder extended the touch-plate interface common in Buchla's modular synthesis instruments [54] to the field of standalone digital musical interfaces. David Wessel's SLABS (Figure 2.15, right), developed with Rimas Avizienis, extended the touch-sensitive pad musical interface further, providing 24 pads that sensed two-dimensional position as well as vertical force; each pad was sampled at high frequency and applied to sound synthesis at audio rate [55].
Early development of touch input in the context of computer music is associated with the research of William Buxton, who with Sasaki et al. developed a capacitive touch-sensing tablet and used it as a simple frequency-modulation synthesizer, as a percussion trigger, and as a generalized control surface for synthesis parameters [56]. Interestingly, these researchers seem to regard their FM synthesizer merely as a test program, ignoring the possibility that this could be a new kind of musical instrument. In early research on acoustic touch- and force-sensitive displays, Herot and Weinzapfel hint at the desire to directly engage with virtual objects not only at a visual level but to "convey more natural perceptions" of these objects, specifically weight [57].
UrMus is a framework for prototyping and developing musical interactions on mobile phones [58]. UrMus includes a touch-based editing interface for creating mappings from the phone's sensor inputs to audio and visual outputs. In this sense, it is one of the only systems at this time that attempts to engage mobile music programming entirely on the mobile device itself. UrMus also embeds a Lua-based scripting environment for developing more intricate mobile music software interactions.
In the domain of general-purpose computing, TouchDevelop is a touch-based text programming environment designed for use on mobile phones, in which programming constructs are selected from a context-aware list, reducing the dependence on keyboard-based text input [59]. Codea (https://codea.io/) is a Lua-based software development system for iPad in which text editing is supplemented with touch gestures for parameter editing and mapping.
2.3 Handwriting Input for Computing and Music
Beginning in the early 20th century, the desire to construct sound by hand through technological means can be seen in the work and experimentation of a variety of artists, musicians, and technologists. Early experiments in manually inscribing sound forms into gramophone records were evidently conducted by the artist László Moholy-Nagy and the composers Paul Hindemith and George Antheil, independently of one another, but the results of these experiments were never released or formalized into musical works [60]. Anticipating these explorations, Moholy-Nagy proposed the creation of a so-called "groove-script" for emplacing synthetic sounds into the grooves of a record, the possibilities of which would represent "a fundamental innovation in sound production (of new, hitherto unknown sounds and tonal relations)" [61, 62]. Later, following the inception of optical encoding for cinematic film sound, Moholy-Nagy called for "a true opto-acoustic synthesis in the sound film" and for the use of "sound units [...] traced directly on the sound track" [60, 63]. Oskar Fischinger and Rudolph Pfenninger, having extensively studied the visual forms of optically-encoded sounds, both released works incorporating meticulously drawn graphics that were photographed and then printed to the sound track of motion picture film; the latter artist referred to his practice as tönende Handschrift (Sounding Handwriting) [60].
Almost simultaneously with Fischinger and Pfenninger's efforts, research in converting handmade forms to sound recorded on optical film was being conducted by a number of Russian musicians and inventors, including Arseny Avraamov, Evgeny Sholpo, Georgy Rimsky-Korsakov, and Boris Yankovsky [64]. These efforts led to Sholpo's development of the Variophone, which modulated waveforms drawn on rotating discs to enable control of pitch, timbre, and volume (Figure
2.16, left) [65]. Yankovsky’s Vibroexponator utilized a rostrum camera and other optomechanical mechanisms to compose tones of varying pitch from a library of hand-drawn spectra and amplitude envelopes (Figure 2.16, right) [65].
Figure 2.16: Optical disks used by the Variophone (left) and hand-drawn musical scale used with the Vibroexponator (right) (from [65]).
Animator Norman McLaren composed soundtracks for his video works by directly drawing on the sound track portion of the film surface [66, 67] or by photographing a set of cards with sounds hand-drawn on them [68]; the latter technique was employed in the soundtrack for his Academy Award-winning short film Neighbours (1952) [67].
The ANS Synthesizer was an optoelectrical additive synthesizer that generated waveforms from pure tones, representing a range of frequencies, machine-inscribed on a rotating glass disc [69]. A musician would compose with these tones by drawing freehand on a plate covered in mastic that was aligned with the disc, with the horizontal dimension representing time and the vertical dimension representing frequency (Figure 2.17). These strokes would cause gaps in the mastic's coverage of the plate, and a photoelectric system would synthesize the pure tone waveforms at frequencies corresponding to these gaps' positions.
Figure 2.17: An example score for the ANS Synthesizer (from [69]).
The Oramics machine, created by composer and sound engineer Daphne Oram, comprised ten loops of optical film which were drawn onto by the machine's operator (Figure 2.18) [70]. Each strip was responsible for controlling a separate synthesis parameter, including waveform shape, duration, vibrato, timbre, intensity, amplitude, and frequency. Oram suggested that, using the Oramics system, a composer could learn "an alphabet of symbols with which he will be able to indicate all
the parameters needed to build up the sound he requires." In light of the system's method of use and its characteristic musical results, Hutton has suggested that "Oram was more motivated to explore the technological process relating to the sound" rather than purely by music composition [70].
Figure 2.18: Daphne Oram operating the Oramics machine.
Early research in direct, graphical manipulation of digital data via pen or stylus, augmented by
computer processing, can be seen in Sutherland’s Sketchpad [71]. In Sketchpad, a light pen is used to control the generation of geometric structures on a computer screen. Pen input is used to define the visual appearance of these structures, such as the number of sides of a polygon or the center point of a circle. Imprecise pen strokes of the user are converted into straight lines and continuous arcs; rather than encoding the direct representation of the user’s gestures, Sketchpad translates user input into an assemblage of predefined geometric symbols (Figure 2.19). These symbols are parameterized in shape, position, size, and other properties to match user input in a structured fashion. Pen input is also used to describe geometric constraints applied to these structures by the software. As further input modifies a parameter of a structure, dependent constraints can be automatically updated as needed. For example, if a given structure is moved in the viewing plane, other objects anchored to it will move in tandem. In this way, both the visual appearance of the system and an underlying set of semantics are encoded using pen input, forming a framework for a kind of visual programming.
Figure 2.19: Adjusting a hand-drawn graphical structure in Ivan Sutherland's Sketchpad.
Similar historical research in pen input for programming is evident in GRAIL and the RAND Tablet [72]. The RAND Tablet consisted of a CRT screen, a flat tablet surface mounted in front of it, and a pen whose position could be detected along the tablet. The RAND Tablet found a variety of uses including recognition of Roman and Chinese characters [73] and map annotation. The GRAIL
language was a programming system for the RAND Tablet which enabled the user to create flowchart processes through the "automatic recognition of appropriate symbols, [allowing] the man to print or draw appropriate symbols freehand" [74]. It is not clear what specific applications these flowchart operations were intended for, but a later system called BIOMOD applied these concepts to biological modeling [75].
The UPIC, designed by Iannis Xenakis, was a digital music system comprising a computer, a digitizing tablet, and two screens, and was intended to support compositional activity at both the micro and macro levels [76]. Using the tablet, a composer could freely draw any desired cyclical waveforms and amplitude envelopes; these were visualized on-screen and made available for synthesis. Using a similar procedure of drawing strokes on screen, the composer could then arrange these waveforms in time, with the vertical position corresponding to frequency and the horizontal corresponding to time (Figure 2.20). Separate displays were used for drawn graphics and alphanumeric text rendering, and a printer was also available to print the graphics. Xenakis used the UPIC to develop his work Mycenae-Alpha [77]. The ideas underlying the UPIC were later extended in Iannix, a software application for conventional desktop computers that features graphic score notation and playback [78].
Figure 2.20: A waveform designed in UPIC (left) and a UPIC score (right) (from [79]).
The Fairlight CMI digital music workstation came equipped with a light pen that could be used to draw on-screen waveforms to be synthesized, adjust partial amplitudes of an additive synthesizer, or modify other control parameters of the synthesis system [80] (Figure 2.21). Later versions of the CMI relocated the pen input to a screen-less tablet next to the computer's keyboard, reportedly to
address issues related to user arm fatigue from holding up the pen to the screen (http://www.musictech.net/2014/05/studio-icons-fairlight-cmi/). New England Digital's Synclavier was a similar dedicated computer music workstation in use at this time, integrating hardware, software, a musical keyboard, and a control panel to enable real-time music composition, performance, and education [81].
Figure 2.21: Editing synthesis parameters with the Fairlight CMI light pen.
A number of research efforts have explored machine recognition of handwritten Western music notation. Fujinaga explored image-based recognition of music notation using a nearest-neighbor classifier and genetic algorithm [82, 83, 84]. Buxton et al. developed a number of score editing tools for the Structured Sound Synthesis Project, including a basic two-dimensional pen gesture recognizer for input of note pitch and duration [85]. The Music Notepad by Forsberg et al. developed this approach further, allowing for pen input of a rich variety of note types and rests, as well as pen gesture-based editing operations like deletion, copying, and pasting [86]. Miyao and Maruyama combined both optical and pen gesture approaches to improve the overall accuracy of notation recognition [87].
Landay's research in interface design led to the development of SILK, an interactive design tool in which interface items sketched by a user are recognized and dynamically made interactive [88]. Individual interface "screens" constructed in this way can then be linked by drawing arrows from an interactive widget to the screen it activates. This approach greatly reduces the inefficiency in conceptualizing, prototyping, and validating a user interface by combining these processes into one.
PaperComposer is a software interface for musicians to design a custom, paper-based interface [89]. As one sketches over the printed interface with a digital pen, these interactions are digitized and linked back to the digital representation.
Commercially available graphics input tablets, such as those manufactured by Wacom, have found broad use in computer music composition and performance. Graphics tablets provide a wealth of continuous sensor data applicable to musical control, including three-dimensional position relative to the tablet surface, two-dimensional tilt angle of the pen, rotation of the pen, and pen pressure [90]. Mark Trayle developed a variety of techniques for utilizing stylus input in performance [91]. In some cases, the x and y positions of the pen, its pressure, and its angle of tilt would each be mapped to synthesis parameters, creating a direct interface to sound design and an instrument aiming for "maximum expression." Often Trayle would directly trace graphic scores produced by his frequent collaborator Wadada Leo Smith. Other modes of graphics tablet use employed by Trayle would divide the tablet panel into multiple zones, with some zones mapped to direct control of a synthesis algorithm and other zones controlling ongoing synthesis processes.
Wright et al. developed a variety of tablet-based musical instruments, including a virtual tambura, a multi-purpose string instrument, and a tool for navigating timbres in an additive synthesizer [92]. Wessel and Wright discussed additional possibilities using graphics tablets, such as two-handed interaction involving a pen and puck mouse used on the tablet surface, direct mapping of tablet input to synthesis parameters, dividing the tablet surface into multiple regions with individual sonic mappings, treating the tablet surface as a musical timeline, and the entry of control signals or direct audio data that are buffered and reused by the underlying music system [93]. The use of graphics tablets by other digital musicians has extended to the drawing of a dynamic, network-synchronized graphic score [94], control of overtone density in a beating Risset drone [32], bowing of a virtual string instrument [95], and parameterizing the playback of a stored audio sample through both conventional and granular means [96].
2.4 Summary
We have presented a variety of previous research and thought related to interactive music programming, music computing for mobile touchscreen technologies, and handwriting-based musical expressivity. Several common threads can be found in these collective efforts. One is a tension between high-level and low-level control: while many systems have sought to control higher levels of musical abstraction through gestural input, others have held on to direct mappings of synthesis parameters. Another is secondary feedback about the system's state; many of the systems discussed in this chapter provide additional visual information to represent the internal processes responsible for the resulting sound. At the core of much of this work there seems to be a desire to find new sources of creativity latent in emerging technology.
Chapter 3
Coding and Musical Gesture
In this chapter we examine an often overlooked possibility of mobile touchscreen computing, that of text-based music software programming. To this end, we have designed and implemented an iPad application for real-time coding and performance in the ChucK music programming language [4]. This application shares much of its design philosophy, source code, and visual style with the miniAudicle editor for desktop ChucK development [2], and thus we call it miniAudicle for iPad.
3.1 Motivation and Principles
The motivation behind miniAudicle for iPad is to provide a satisfactory method for creating and editing ChucK code and to fully leverage the interaction possibilities of mobile touchscreen devices. The overriding design philosophy was not to transplant a desktop software development environment to a tablet, but to consider what interactions the tablet might best provide.
Firstly, we note that it is unreasonable to completely discard the text-input metaphor, that of typing code into an editor; the fundamental unit of ChucK programming is text. For these reasons we have sought to create the best code editing interface we could for a touchscreen device. Typing extensive text documents on touchscreens is widely considered undesirable. However, using a variety of popular techniques like syntax highlighting, auto-completion, and extended keyboards, we can optimize this experience. With these techniques, the number of keystrokes required to enter code is significantly reduced, as is the number of input errors produced in doing so. Additional interaction
techniques can improve the text editing experience beyond what is available on the desktop. For example, one might tap a unit generator typename in the code window to bring up a list of alternative unit generators of the same category (e.g. oscillators, filters, reverbs). Tapping a numeric literal could bring up a slider to set the value, where a one-finger swipe adjusts the value and a two-finger pinch changes the granularity of those adjustments.
Secondly, we believe that live-coding performance is a fundamental aspect of computer music programming, and contend that the mobile touchscreen paradigm is uniquely equipped to support this style of computing. Live-coding often involves the control and processing of many scraps of code, with multiple programs interacting at multiple levels of intricacy. Direct manipulation, the quintessential feature of a multitouch screen, might allow large groups of "units" — individual ChucK scripts — to be efficiently and rapidly controlled in real time. This is the basis of miniAudicle for iPad's Player mode, in which a user assembles and interacts with any number of ChucK programs simultaneously.
Lastly, we are interested in the physicality of the tablet form factor itself. The iPad's hardware design presents a number of interesting possibilities for musical programming. For instance, it is relatively easy to generate audio feedback by directing sound with one's hand from the iPad's speaker to its microphone. A ChucK program could easily tune this feedback to musical ends, while the user maintains manual control over the presence and character of the feedback. The iPad contains a number of environmental sensors, including an accelerometer, gyroscope, and compass. ChucK programs that incorporate these inputs might use them to create a highly gestural musical interaction, using the tablet as both an audio processor and a physical controller.
3.2 Design
Our approach to these goals is to provide two complementary modes: Editor mode and Player mode. Editor mode aims to provide the best code editor possible given the limitations of typing text on a touchscreen. Player mode allows users to play and modify scripts concurrently using ChucK's intrinsic on-the-fly programming capabilities. It aims to enable multitouch live-coding and performance techniques that would be difficult or impossible on traditional desktop computers. We believe the
combination of these systems makes miniAudicle for iPad a compelling mobile system for music programming and live coding.
Interaction in miniAudicle for iPad is divided between these primary modes, described individually below. Several interface elements are common to both modes. The first of these is a script browser, which allows creating, managing, and selecting individual ChucK programs to load into either mode. Views of ChucK's console output (such as error messages and internal diagnostics) and a list of the ChucK shreds (processes) running in the system are available from the main application toolbar. A settings menu can also be found here, and allows for turning on audio input, adjusting the audio buffer size, enabling adaptive buffering, and turning on background audio, which will cause ChucK programs to continue running in the background when other programs are run on the iPad. This toolbar also contains a switch to toggle between Editor and Player modes.
3.2.1 Editor
Editor mode is the primary interface for writing and testing ChucK code. This mode is centered around a touch-based text editing view, in which a single ChucK source document is presented at a time (Figure 3.1). The document to edit can be changed via the script browser. Once a document is loaded, the text view provides a number of features common to programming text editors, such as syntax-based text coloring and integrated error reporting. Additionally, the on-screen keyboard has been supplemented with ChucK-specific keys for characters and combinations thereof that appear frequently in ChucK programs. These additional keys include the chuck operator (=>) and its variants, mathematical operators, a variety of brace characters, additional syntax characters, and the now/dac keywords. In cases where a specific key has variants, pressing and holding the key will reveal a menu for selecting the desired variant; for instance, the => key can be pressed and held to access keys for its variants.

... => filter.Q;

Motion mo;
MotionMsg momsg;

if(!mo.open(Motion.ACCEL))
{
    cherr ...
}

...
        ... => s.freq;
        Std.scalef(momsg.y, -1, 1, s.freq(), s.freq()*10) => filter.freq;
    }
}

Listing 3.1: An example of the ChucK Mobile API.
This interface has been designed with portability in mind, so it need not be tied to sensors specific to the iPad. open() returns false upon failure, which allows programs to determine whether the desired sensor is available on the device they are running on. The motion and location sensors made available by the ChucK Mobile API are often present on a number of other consumer tablet and phone devices in addition to the iPad. If miniAudicle for iPad were to be developed for a different tablet OS or for mobile phones, ChucK programs written using the ChucK Mobile API would conceivably run and use any available sensors without modification. Using the ChucK Mobile API, ChucK programmers can readily experiment with motion-based
interfaces to sound synthesis. The ability to quickly mock up and evaluate these interactions directly on the device is a distinguishing characteristic of developing code on the device itself.
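As a concrete illustration, the sketch below maps accelerometer tilt onto the pitch of a sine oscillator and the cutoff of a resonant low-pass filter. The oscillator/filter setup, the specific frequency ranges, and the Hid-style event loop (mo => now, mo.recv()) are illustrative assumptions rather than part of the API description above; the Motion, MotionMsg, Motion.ACCEL, and Std.scalef usages follow Listing 3.1.

// illustrative unit generator chain (assumed for this sketch)
SinOsc s => LPF filter => dac;
10 => filter.Q;
440 => s.freq;

// accelerometer input via the ChucK Mobile API
Motion mo;
MotionMsg momsg;

// open() returns false if the accelerometer is unavailable on this device
if(!mo.open(Motion.ACCEL))
{
    cherr <= "unable to open accelerometer" <= IO.newline();
    me.exit();
}

while(true)
{
    // wait for new sensor data (assumed Hid-style event semantics)
    mo => now;
    while(mo.recv(momsg))
    {
        // map device tilt to pitch and filter cutoff (illustrative ranges)
        Std.scalef(momsg.x, -1, 1, 220, 880) => s.freq;
        Std.scalef(momsg.y, -1, 1, s.freq(), s.freq()*10) => filter.freq;
    }
}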
3.2.4 OSC and MIDI
The version of ChucK embedded into miniAudicle for iPad includes ChucK's standard support for Open Sound Control and MIDI input and output. Therefore, programs running in miniAudicle for iPad can be used as the source of musical control data sent over either protocol, or as the recipient of data generated by other programs. For instance, one might use TouchOSC as a controller for a ChucK-based synthesis patch running in miniAudicle for iPad by directing TouchOSC's output to localhost, and ensuring the correct port is used in both programs. Another example is a generative composition, programmed in ChucK, that sends note data to the virtual MIDI port of a synthesizer running on the iPad. To support these usages, the "Background Audio" option needs to be enabled in miniAudicle for iPad's settings menu.
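A minimal sketch of the TouchOSC scenario, using ChucK's standard OscIn class: the port number, the OSC address (a fader from one of TouchOSC's stock layouts), and the frequency mapping are illustrative and must be adjusted to match the layout and network settings actually configured in TouchOSC.

SinOsc s => dac;

// receive OSC from TouchOSC running on the same device (localhost)
OscIn oin;
OscMsg msg;
9000 => oin.port;                  // must match TouchOSC's outgoing port
oin.addAddress("/1/fader1, f");    // a fader address from a stock TouchOSC layout

while(true)
{
    // wait for incoming OSC messages
    oin => now;
    while(oin.recv(msg))
    {
        // map the fader's 0-1 range to an audible frequency range
        Std.scalef(msg.getFloat(0), 0, 1, 110, 880) => s.freq;
    }
}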
3.3 Implementation
miniAudicle for iPad uses standard iOS libraries including UIKit, the standard user interface framework. Leveraging a preexisting user interface library has enabled rapid development and prototyping, ease of long-term software maintenance, and design consistency with iOS and other iOS apps. The main text editing interface is a subclass of the standard UITextView that has been supplemented with several enhancements. This subclass, mATextView, has been modified to draw line numbers in a sidebar to the left of the text itself. Additionally, mATextView highlights error-causing lines in red. A given mATextView is managed by an mAEditorViewController, a common convention in UIKit and other software applications that follow the model-view-controller (MVC) pattern. The mAEditorViewController is responsible for other aspects of the editing interface, including syntax-based coloring of text and presenting inline error messages.
miniAudicle for iPad integrates directly with the core ChucK implementation in a similar fashion to the desktop edition of miniAudicle. A C++ class simply named miniAudicle encapsulates the rather intricate procedures for managing and interacting with an instance of the ChucK compiler and
virtual machine. An instance of the miniAudicle class handles configuration, starting, and stopping of the VM, and can optionally feed audio input to and pull audio output from the VM. It also manages loading application-specific plugins (such as those used by the ChucK Mobile API described in Section 3.2.3), adding, replacing, and removing ChucK scripts in the VM, and retrieving VM status and error information.
Bespoke modifications to the core ChucK virtual machine were also made to support certain features that were felt to be necessary in miniAudicle for iPad. For instance, ChucK's logging system, debug print (the <<< >>> operator), and cherr and chout output objects were modified to redirect to a custom destination instead of the default C stdout/stderr. This is used to direct ChucK printing and logging to the Console Monitor. Code changes were added to allow for statically compiling "chugins" (ChucK plugins) [97] directly into the application binary rather than dynamically loading these at run time. Several minor code fixes were also needed to allow ChucK's existing MIDI engine to be used on iOS.
3.4 Summary
miniAudicle for iPad considers how one might code in the ChucK music programming language on a medium-sized mobile touchscreen device. While unable to abandon the textual paradigm completely, it provides some functionality to overcome the challenges created by an on-screen virtual keyboard. The application includes a separate Player mode to better take advantage of the multitouch aspect of the iPad and enable a natural live coding environment. Furthermore, the ChucK Mobile application programming interface is offered for the development of ChucK programs utilizing the location and orientation sensing functions of tablet devices. Overall, these capabilities provide a new and interesting platform for further musical programming explorations.
However, in developing and experimenting with miniAudicle for iPad, the author held a lingering feeling that a touch-based programming system centered around text is fundamentally flawed. It was necessary to adapt the prevailing text programming paradigm to this newer environment, at the very least, to serve as a baseline for further research. But to truly engage the interactions made possible by mobile touchscreen technology, a new system, holistically designed
for the medium, would need to be created. One such system, Auraglyph, will be discussed in the next chapter.
Chapter 4
Sketching and Musical Gesture
While touchscreen interaction and built-in sensors can benefit preexisting text-based music programming systems, it became clear fairly quickly that typing text code with a virtual keyboard was a poor experience. Yet it warrants further study to explore the application of touch interaction to the programming of musical software. Is there a touchscreen-based programming system that, in some instances, one might want to use over a desktop-based system? What might such a system look like and how would it work?
To begin to answer these questions, we propose a new model for touchscreen interaction with musical systems: the combined use of stylus-based handwriting input and direct touch manipulation. A system using this model might provide a number of advantages over existing touchscreen paradigms for music. Stylus input replaces the traditional role of keyboard-based text/numeric entry with handwritten letters and numerals. It additionally allows for modal entry of generic shapes and glyphs, for example, canonical oscillator patterns (sine wave, sawtooth wave, square wave, etc.) or other abstract symbols. The stylus also provides graphical free-form input for data such as filter transfer functions, envelopes, and parameter automation curves. The use of a stylus allows more precise input and thus more precise musical control. Multitouch finger input continues to provide functionality that has become expected of touch-based software, such as direct movement of on-screen objects, interaction with conventional controls (sliders, buttons, etc.), and other manipulations.
Herein we discuss the design, prototyping, and evaluation of a system designed under these
principles, which we have named "Auraglyph." (The name Auraglyph is derived from aural, "of or relating to the ear or to the sense of hearing," and glyph, "a symbol [...] that conveys information nonverbally" (Merriam-Webster's Collegiate Dictionary, 11th ed., Springfield, MA: Merriam-Webster Inc., 2003). The author concedes that he has made the classic mistake of introducing a neologism that mixes Latin and Greek roots, but the name had proven otherwise too fitting to change after the error was realized.)
4.1 Motivation
The initial insight leading to this work was that numeric and text input on a touchscreen might be more effectively handled by recognizing hand-drawn numerals and letters, rather than via an on-screen keyboard. Recognition of handwritten numerals and text is a classic problem of machine learning research, with many historical and contemporary innovations [98]. Crucially, a number of off-the-shelf implementations of handwriting recognition techniques exist, such as LipiTk [99] and the $1 Recognizer [100], allowing for rapid prototyping of applications using the technology without extensive research, development, refinement, and testing of machine learning algorithms.
We soon realized that we could use handwriting recognition to analyze a substantial number of handwritten figures and objects beyond just letters and numbers. A user might then draw entire object graphs and audio topologies to be realized as an audio system or musical composition by the underlying software application in real time. Handwriting recognition might also be used for modal input of standard figures in computer music, such as amplitude envelopes, filter frequency responses, staff notation, and audio mix automation.
Manual sketching has been identified by many as an integral process in a number of creative endeavors. Buxton has cited sketching as the "archetypal activity of design," necessary to quickly and inexpensively explore a variety of concepts and ideas and share these among collaborators [101]. Verplank's writings position sketching in a simultaneous feedback cycle of imagining an outcome and seeing it manifest (Figure 4.1), likening the process to "the experience of any craftsman in direct engagement with his materials: imagining, shaping, seeing all at the same time" [102]. The art critic John Berger expounded on the exploratory nature of quick drawings and sketches, stating that "each mark you make on the paper is a stepping stone from which you proceed to the next, until you have crossed your subject as though it were a river, have put it behind you" [103].
Figure 4.1: Idea sketching, at the confluence of imagining, seeing (or perhaps hearing?), and drawing (from [102]).
These processes are similarly applicable to the design of interactive musical software. As evidenced by the research of Garcia et al. in computer-based support tools for compositional sketching [104, 89], handwritten planning is a vital component of many compositional processes, whether the resulting musical product is electronic, acoustic, or both. More generally, writing and drawing with a pen or pencil on a flat surface is a foundational expressive interaction for an incredible number of individuals; this activity is continuously inculcated from early childhood around the world. Sketching, as a natural interaction for expressing ideas in the real world, might be apt for realizing them in a virtual world. Auraglyph seeks to apply this idea to the context of computer music and audio design. By shortening the distance between abstract musical thought and its audible realization, computer-mediated handwriting input might arm composers to more effectively express their musical objectives.
Another distinct advantage of this interaction framework is the ability to evaluate and run handwritten constructs in real time. As in Landay's SILK, sketches in Auraglyph are reified into entities specific to the system (e.g., a drawn object might be converted to a sine wave generator, or a timer).
These entities can then present controls and interfaces for direct touch manipulation or stylus gestures, customized to that object type or mode. This level of real-time creation and manipulation affords a composer or programmer performative control similar to live-coding.
Lastly, the availability of direct touch control enables a powerful two-handed interaction (here, two-handed interaction refers to interactions involving both hands of a human user, where each hand is intended to serve a distinct role in the overall activity). Rather than simply using the stylus as an "extra finger," this framework allows certain interactions to be designed for pen use and others for hand use. Two-handed or "bimanual" interactions arise in a variety of commonplace and specialist activities, such as driving a manual-transmission vehicle, technical drawing, artistic practices, and in particular the performance of many musical instruments. Two-handed interaction can be a powerful technique to create interfaces that are accessible to new users, provide additional functionality that can be discovered over time, and are amenable to progressive degrees of user proficiency. As a user gains more experience with such an interface, he or she might gradually grow more capable with the tool overall. Sellen, Guiard, and Buxton have argued that "two hands can be a more efficient and effective approach to interaction" with regard to both time taken and cognitive effort, though careful design is needed to ensure a two-handed interaction does not turn out worse than a single-handed one [105].
Auraglyph was originally developed on tablet hardware that does not support separate pen input; the user instead employs a conductive pen-shaped object that appears to the system as another finger. The difference between pen and touch input is primarily conceptual, existing in the design intent of the software and the mind of the user. Newer tablet products such as the Microsoft Surface Pro or Apple iPad Pro support separate input paths for pen and touch, allowing these to be explicitly differentiated in a software implementation. This technology might further augment the principles of two-handed interaction present in Auraglyph; however, at this time such developments are left to future research.
4.2 Principles
With these ideas in mind, several principles underlie and support the design of Auraglyph.
Stylus input is used for the original input of structures. These structures are then converted from raw pen strokes into objects in the system, of a specified class and carrying adjustable characteristics. The class is determined by the form of the raw input and the current mode the software is in. For instance, drawing a circle in the base mode of the app creates an audio generator node. Drawing an arbitrary figure in the waveform editor creates an oscillator with that waveform. In some cases the raw pen input is left as such, as in the case of "freedraw" mode or the waveform editor. In these cases the system representation of the structure differs little or not at all from the drawn form, and no further conversion is necessary or appropriate.
Touch input is used to further adjust and parameterize these structures. Base-level nodes can be freely moved around the canvas. Parameters of oscillators and filters can be adjusted using a slider-like mechanism.
Input modes are used to differentiate and separate stylus and touch input that could easily be confused by the system or that might lead to excessive input error. For instance, a separate freedrawing input mode is used to allow users to draw freeform figures without them being analyzed by the handwriting recognizer. A select mode allows users to select multiple structures for batch processing. The provision of multiple input modes can be seen as a compromise between some ideal of design purity and usability. The use of different modes implies the user must perform at least one extra action and additional mental processing before carrying out any particular desired interaction, inhibiting the user's flow between different activities within the app. In exchange for this additional effort, the user is given a greater breadth of possible activities.
Real-time feedback is used to constantly inform the user of the results of their decisions and modifications to the application. Fast and constant feedback on programming decisions is necessary in the context of creative coding. Often code and systems will be developed without a particular goal in mind; or, an aesthetic idea, once executed, will prove to be ultimately undesirable. As exemplified by the Audicle and reacTable systems, a programming environment that provides real-time feedback on how a program is being executed will better arm a creative coder to explore a breadth of sonic opportunities.
Figure 4.2: A full Auraglyph program, with annotations.
4.3 Design
A user interacts with Auraglyph using a stylus and touch. The basic environment of Auraglyph is an open, scrollable canvas, extending infinitely in two dimensions, in which the user freely draws. Pen strokes are reified into interactive nodes (such as unit generators or control-rate processors), which can then be linked by drawing connections between them or parameterized with touch gestures. The nodes the user has drawn, the parameters they have been assigned, and the connections drawn between them are called an Auraglyph program (Figure 4.2), and determine the overall sound that is produced.
The system's interpretation of the user's gestures in this basic environment depends on the current input mode, which the user can select from a set of buttons on the bottom left of the screen. The currently supported input modes are node and freedraw. The node mode, described in the next section, allows for creating audio and control processing nodes, making connections between them, and adjusting their parameters. The freedraw mode allows users to directly draw onto the canvas. Strokes in freedraw mode are left as is for the user to annotate or decorate their program.
A few basic gestures apply to the canvas regardless of the input mode. A two-finger drag gesture will scroll the canvas in the direction of the drag. A two-finger pinch will zoom in or out.
4.3.1 Node Mode
In node mode, after a user completes a pen stroke (a single contour between touching the pen to the screen and lifting it off the screen), it is matched against the set of base node types via a handwriting recognition algorithm (discussed in Section 4.4.1). These node types include an audio-rate processor (unit generator) and a control-rate processor (see Section 4.3.2). If the stroke matches an available node's glyph, the user's stroke is replaced by a node of that type. Unmatched strokes slowly fade away, indicating a failure to match.
Figure 4.3: The editor for a SquareWave node. The editor can be "pinned" open by pressing the button in the upper-left corner; unless pinned, an editor window will automatically close when the user touches or draws anywhere outside the editor window.
Tapping and holding a node will open a list of parameters for that node (Figure 4.3). If a parameter is numeric, the user can press its value and drag up or down to adjust it by hand. Typical parameters are adjusted on an exponential scale (e.g. frequency or gain), but a node can specify that some parameters are to follow a linear scale. A tap on the parameter name opens a control into which writing a number will set the value. This value can then be accepted or discarded, or the user can cancel setting the parameter entirely (Figure 4.4). Tapping outside the editor popup will cause it to be closed; however, by toggling the "pin" button in the upper-left corner of an editor window, that editor will be pinned open until the user un-pins it.
A node may have multiple inputs and/or outputs. These appear visually as small circles, or ports, on the perimeter of the object.
Figure 4.4: Modifying the freq parameter of a unit generator with handwritten numeric input.
Drawing a stroke from an input port to an output port, or vice versa, forms a connection between those two objects. For example, connecting a SawWave node's output to the freq input port of a SineWave node creates a simple FM (frequency modulation) program, with the sine as the carrier wave and the sawtooth as the modulator. Most objects have only one output source, but a single object may offer several input destinations (e.g. frequency or amplitude of an oscillator node, cutoff or resonance of a filter). When dragging a new connection to a destination node, a pop-up menu appears from the node to display the options a user has for the input destination (Figure 4.5).
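One common formulation of such an arrangement, assuming the modulating signal is simply added to the carrier's base frequency (the symbols $f_c$, $f_m$, and $A$ for carrier frequency, modulator frequency, and modulation depth are illustrative and not part of Auraglyph's interface):
\[
y(t) = \sin\!\left( 2\pi \int_0^t \big( f_c + A\,\mathrm{saw}(2\pi f_m \tau) \big)\, d\tau \right)
\]
so the carrier's instantaneous frequency sweeps around $f_c$ at the rate and shape of the sawtooth modulator.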
Figure 4.5: Selecting the input port for a new connection.
Many, though not all, input ports on a given node are mirrored in the node's parameter editor, and vice versa. This allows and even encourages extensive modulation of any aspect of a node's operation, including both conventional and atypical modulation structures. It is fairly easy to create
any classical frequency modulation topology, for instance. Less common modulation schemes, such as extreme modulation of delay line lengths or the integration of ring modulation into unusual places, are also available for experimentation. One exception is the ADSR node, which does not have input ports for its attack, decay, sustain, or release parameters; these must be edited manually through the node's parameter editor. Such exceptions are mainly to reduce visual clutter on the node itself and in the port selection pop-up menu. A full list of node types, including their respective parameters and input ports, is provided in Appendix A.
Audio-rate connections display the audio currently being transmitted through them, and control-rate connections display a "ping" animation whenever a new value is pushed through them. Audio waveforms are normalized by their effective amplitude and then rescaled logarithmically. This ensures that a diverse range of audio amplitudes can be shown without any one of them overwhelming the visual field. Control-rate pings are also scaled logarithmically according to the value of the control signal. These displays allow visual debugging of the current program and also provide insight into the effects of individual parts of the program.
Nodes and freehand drawings can be moved around on the canvas by touching and dragging them, a familiar gesture in the touchscreen software ecosystem. While dragging an object, moving the pen over a delete icon in the corner of the screen will remove that object, along with destroying any connections between it and other objects. Connections can be removed by grabbing them with a touch and then dragging them until they "break" (Figure 4.6). The entire canvas may be scrolled through using a two-finger touch, allowing for programs that extend well beyond the space of the tablet's screen.
Auraglyph fully supports multitouch interaction. Pinning multiple editor windows open allows the user to adjust as many parameters at once as they have free fingers. Multiple connections can be made or broken at the same time, allowing for synchronized changes in a program's audio flow.
4.3.2 Nodes
Two base types of nodes can be drawn to the main canvas: audio-rate processors (unit generators; represented by a circle) and control-rate processors (represented by a square) (Figure 4.7).
Figure 4.6: Breaking a connection between two nodes.
Unit generators currently available include basic oscillators (sine, sawtooth, square, and triangle waves, plus hand-drawn wavetables), an ADSR envelope, filters (resonant low-pass, high-pass, band-pass), white noise, arithmetic operators, feedback delay, a compressor, and audio input and output nodes. Control-rate processors include timers, discrete time-step sequencers, a pitch-to-frequency converter, spatial orientation input, and basic math operators. After creating a base object, a scrollable menu opens to allow the user to select the desired node sub-type (Figure 4.8).
Figure 4.7: Base node types: unit generator (left) and control-rate processor (right).
Some nodes require more advanced means of modifying values than the standard parameter editor. The Waveform oscillator brings up a large input space for precisely drawing the desired waveform. The Sequencer object brings up a standard discrete step sequencer editor that can be expanded in the number of steps or the number of outputs per step. A full list of node types is provided in Appendix A. Three distinctive nodes are discussed below.
Figure 4.8: Menu for selecting an object sub-type. Here, we see a menu for a unit generator object, showing ADSR, low-pass filter, high-pass filter, and band-pass filter sub-types. Scrolling the menu will reveal additional sub-types.
Waveform
The Waveform node synthesizes a user-defined cyclic waveform. Editing the Waveform node will open an editor window that allows the user to modify this waveform using the stylus (Figure 4.9). As the user modifies the waveform, its output is dynamically updated. The icon shown on the node is also adjusted to match whatever user waveform it is currently generating. Waveform nodes can be driven by other oscillators, or even other Waveform nodes, to create intricate hierarchies of custom waveform modulation. A Waveform node can also be used as a low-frequency oscillator to effect long-term change of a modulated parameter of some other node. Internally, Waveform nodes use a 1024-point wavetable and linearly interpolate values lying between two points in the table.
Figure 4.9: Editor for a Waveform node.
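A minimal formulation of this table lookup, assuming a wavetable $w[0], \ldots, w[1023]$ and a normalized oscillator phase $\phi \in [0, 1)$ (the symbols here are illustrative; only the 1024-point table size and the use of linear interpolation are stated above):
\[
i = \lfloor 1024\,\phi \rfloor, \qquad f = 1024\,\phi - i, \qquad
y = (1 - f)\,w[i] + f\,w[(i + 1) \bmod 1024]
\]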
Sequencer
The Sequencer node (Figure 4.10) allows the user to create sequences of arbitrary numbers of discrete steps, where each step corresponds to a single numeric value between 0 and 1. Using Add and Multiply nodes, these values can be mapped to a desired range and used to control arbitrary parameters of other nodes. As in many modular synthesis systems, the sequencer is not intended solely for note or percussion sequencing; rather, it can be used to effect any discrete, time-varying parameter change.
Figure 4.10: Editor for a Sequencer node.
By dragging a tab in the corner of the sequencer, the user can add more steps or make additional rows (Figure 4.11); each row is given an independent output.
Figure 4.11: Adding rows and columns to a Sequencer node.
A shaded bar indicates the current step in the Sequencer; as the Sequencer advances, the bar's position is also updated. The step position is automatically advanced according to the Sequencer's built-in BPM parameter, but can also be
driven by an external control input from a Timer node or another Sequencer. Each step corresponds to a value between 0 and 1. The specific value can be adjusted by touching a step and dragging it up or down. Quickly tapping a step will toggle it to 0 or 1, depending on its current state.
Orientation
The Orientation node outputs the host device's orientation in terms of Euler angles (pitch, roll, and yaw, corresponding to rotation about the device's x, y, and z axes, respectively). Orientation data is sampled at 60 Hz, and represents a composite of measurements from the device's accelerometer, gyroscope, and magnetometer. Add and Multiply nodes can be used to scale these measurements to the desired range for musical control. In this way, the Orientation node allows an Auraglyph programmer to easily integrate motion sensing into their program and utilize the gestural possibilities of mobile devices for musical expression and performance.
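As a worked example of this Add/Multiply scaling, and assuming the roll angle is reported in radians (the specific target range here is illustrative): to map a roll angle $\theta \in [-\pi, \pi]$ onto a filter cutoff between 200 Hz and 2000 Hz, the Orientation output can be routed through a Multiply node set to $1800 / 2\pi \approx 286.5$ and then an Add node set to $1100$, since $f = 286.5\,\theta + 1100$ yields approximately 200 Hz at $\theta = -\pi$ and 2000 Hz at $\theta = \pi$.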
4.3.3 Document Management
Auraglyph provides basic document management functionality, consisting of saving the current document, loading a previously saved document, or creating a new blank document. Upon saving a document, if it has not already been assigned a name, a popup will ask the user to give it one. The "name" is a freeform sketch that is drawn within the popup box (Figure 4.12), rather than a textual descriptor.
Figure 4.12: Saving a document.
Requesting to load a document brings up a scrollable list of all of the previously saved documents, displayed by name (Figure 4.13). Loading a document will clear all of the current nodes and replace them with those of the previously saved program. Creating a new program will simply clear all of the current nodes and add a single Output node to start with.
Figure 4.13: Loading a document.
4.3.4 User-Training the Handwriting Recognition System
The machine learning techniques Auraglyph uses to recognize written user input as node shapes or numerals require a training set from which the system learns the distinguishing features of each shape (discussed in Section 4.4.1). This training set consists of a small number of example drawings of each type of shape that is meant to be recognized. The system has been initially seeded with a small training set drawn by the author. Thus, to a degree, it is tailored to how the author draws each type of shape. It would be preferable in the future for the base training set to represent a broader range of individuals, to better capture the variety of ways one might draw each shape.
To combat the limited nature of the initial training set, Auraglyph's "Trainer" mode provides a simple interface to draw new examples and refine the recognition system. It is highly recommended that new Auraglyph users train the system with a few examples of each shape to ensure that the
nuances of their particular writing style are factored into the recognition system. But as the base training set is expanded to include more individuals, the need for end users to update the examples themselves should diminish.
4.3.5 Visual Aesthetic and Dynamics

A core aspect of Auraglyph's design is its unique and cohesive visual aesthetic. This decision is partly motivated by Wang's "Principles of Visual Design for Computer Music" [106] (principle 10, "Aesthetic: have one"). To some extent this approach is also an effort to improve usability. By distinguishing itself stylistically from iOS's default aesthetic and other contemporary visual design languages, Auraglyph suggests to the user that they are interacting with a system that is fundamentally different from what they may be used to, preparing them for further unfamiliarities. The uncommon design disarms users' assumptions about how the software ought to operate, and in doing so sidesteps the need to abide by preexisting software conventions.

Auraglyph's visual design aims to evoke memories of computer systems of ages past; its bright orange vector lines and blocky letters are inspired by personal computers of the 1980s. This is intended to add charm and personality to Auraglyph (number 9 in Wang's principles of visual design), such that if parts of the software are difficult to learn or operate, at least its user has something interesting to look at. This historically inspired visual style also harks back to a time when stylus-based computing was relatively common. Ivan Sutherland's pioneering Sketchpad software, developed in 1963, used a light pen extensively for user input, and light pens remained a common computer input device through the 1980s, as with the Fairlight CMI music workstation or Iannis Xenakis' UPIC. Auraglyph's visual allusions to the past invite users to speculate about an alternative technological history in which pen input had survived and perhaps even become essential, in contrast to the real-world ascendancy of the mouse and keyboard.

Auraglyph goes to considerable lengths to ensure that its graphical motion and animation feel natural, smooth, and dynamic. The visual elements of Auraglyph do not simply appear or disappear; they fade in or fade out, expand and contract, or use more elaborate animations. Control-rate pings expand immediately and then slowly contract along an exponential slew. Starting from a single point, editor windows and other popup windows expand horizontally and then vertically before
reaching their full size, in an animation resembling the GUIs of computer systems long passed into obsolescence (Figure 4.14). These visual flourishes are intended to imbue the software with personality and character, perhaps allowing its user to feel more comfortable with the more complicated aspects of the software.
Figure 4.14: Window opening animation in Auraglyph.

Color plays a crucial role in Auraglyph as well. The particular shade of orange used for most of the foreground graphics was selected due to its similarity to the monochromatic displays used in historical computer systems. The background color is not actually black, but a dark shade of navy, also selected out of nostalgia for computer screens of the 1980s. Deviations from this monochromatic color scheme are few; occasionally a green color is used to indicate constructive actions (selecting an input port for a node connection), while red indicates prohibited or destructive actions (breaking a node connection).

This visual identity is also a result of practical considerations. Auraglyph leans almost entirely on graphics rendered using the mobile variant of OpenGL (for reasons discussed in Section 4.4 below). OpenGL is markedly harder and more time-consuming to develop in than iOS's native user interface toolkits. A spare graphical style has therefore been a necessity to facilitate rapid prototyping and iteration on interaction concepts without expending effort on graphics that may ultimately be discarded or heavily reworked.

Many elements of Auraglyph's design serve both functional and aesthetic purposes. In aggregate, the waveforms displayed on the connections between audio nodes provide a high-level representation of the program's current state. Individual waveforms indicate how each node in the program responds to its inputs, and can also reveal potential errors if a node's output waveform looks wrong or is absent. An Auraglyph programmer can easily follow a sound's path through the
program they've created, examining the effect of each node in real time. This information is passive, not demanding attention, but available to the programmer if they need it. The pulses visualized between control nodes serve similar functional purposes. From an aesthetic perspective, the waveforms themselves often possess a sort of natural beauty, especially as basic oscillators are modulated, filtered, enveloped, and processed by any number of other means. As these waveforms and control pulses dance across the screen, an Auraglyph program's visual display manifests a sort of living, breathing system that is, at times, as interesting as the sonic result.
4.4 Implementation

Auraglyph is implemented in C++ and Objective-C using standard iOS development frameworks and tools. Auraglyph uses a bespoke audio engine based on the unit generator model. As nodes are connected in the interface layer, a directed graph is formed in the audio subsystem. Starting from the master output, destination nodes pull sample frames (in constant buffer-size increments) from each of their source nodes. Each node maintains a timestamp of the last frame it generated, to avoid generating the same samples twice or jumping past the current global sample-time in case two or more destination nodes pull from it.3 A destination node then maps samples from each of its sources to the corresponding input or parameter. Audio-rate destination nodes apply audio-rate source samples at audio rate, allowing their parameters to be modulated every sample, which is necessary to enable several important synthesis methods such as frequency modulation (FM) synthesis.

Control-rate processing occurs in a push-based system. As control-rate nodes produce new output values, they are pushed to each of their destination nodes. Control-rate destination nodes, upon receiving input, may then generate new output to push to further nodes. Audio-rate destination nodes apply incoming control-rate signals at audio buffer boundaries, linking the control rate to the audio buffer size. Auraglyph's buffer size is currently fixed to 256 samples, or approximately 6 milliseconds at the fixed 44.1 kHz sample rate.
3 The timestamp is updated before pulling input samples from source nodes, which short-circuits feedback loops in the synthesis graph by passing the previous buffer of audio. Therefore, feedback loops in Auraglyph introduce an implicit delay at the point of feedback with duration equal to one buffer of audio (currently fixed to 256 samples, or approximately 6 milliseconds).
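The pull-based rendering scheme and its handling of shared and feedback connections can be illustrated with a brief sketch. This is not Auraglyph's actual source; the class and member names here are assumptions for illustration only.

// Minimal sketch of the pull-based rendering scheme described above.
// Names are illustrative, not taken from Auraglyph's sources.
#include <vector>
#include <cstdint>

class UGen
{
public:
    static constexpr int BUFFER_SIZE = 256; // fixed buffer size, as in the text

    // Pull one buffer of output for the global sample-time 'now'.
    const float *pullBuffer(int64_t now)
    {
        // If this buffer was already rendered (e.g. a second destination is
        // pulling from us), return the cached output instead of re-rendering.
        if(now <= m_lastRendered)
            return m_output.data();

        // Advance the timestamp *before* pulling inputs; a feedback
        // connection back to this node will then receive the previous
        // buffer, introducing a one-buffer implicit delay.
        m_lastRendered = now;

        // Pull from each source, then render this node's own output.
        for(UGen *src : m_sources)
            src->pullBuffer(now);
        render();

        return m_output.data();
    }

protected:
    virtual void render() { } // subclass-specific synthesis goes here

    std::vector<UGen *> m_sources;
    std::vector<float> m_output = std::vector<float>(BUFFER_SIZE, 0.0f);
    int64_t m_lastRendered = -1;
};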
Auraglyph's graphics are rendered using OpenGL ES 2.0 (https://www.khronos.org/opengles/2_X/). OpenGL graphics were chosen over higher-level GUI systems for a number of reasons. One of these was to enable a unique visual identity, as discussed in Section 4.3.5. Most importantly, the flexibility of OpenGL allows experimentation with and creation of novel user interaction models; in contrast, existing GUI frameworks were believed to be too constraining. Many of the crucial visual effects employed by Auraglyph would be difficult to implement without a 3-D rendering system. Furthermore, OpenGL is a cross-platform framework, so building Auraglyph with it may facilitate porting the software to platforms beyond iOS, such as Android, Microsoft Surface, desktop platforms, or even virtual reality systems.

Auraglyph includes a number of different graphics rendering modes, implemented in a collection of OpenGL Shading Language (GLSL) programs. A shader simply titled "Shader" is a basic 3D graphics renderer. TexFont is used to display text characters that have been prerendered to a graphics texture, serving as the workhorse of Auraglyph's text rendering system. The Clip shader is a basic renderer that also supports defining a 2D clipping mask; contents outside of this mask will not be drawn. This facilitates scrolling content, where some extent of a visual structure may need to be drawn, but its rendered form should not overflow the boundaries of its visual container. The Waveform shader supports fast rendering of audio waveforms. This shader receives its x, y, and z coordinates in separate vertex buffers, so that one of these (typically y, i.e. height) can be fed directly to the shader from a buffer of audio samples without needing to repackage or copy the data.

Each available node type in Auraglyph is implemented as a C++ class in the underlying source code. The C++ class for a given node type provides code to render both its visual appearance and its audio or control output. This enables a tight integration between the node's internal processing and its visual appearance, at the cost of having no clear programmatic interface between the two. Input/output nodes also have separate functions to render their appearance on the interface layer. New nodes are added to Auraglyph by subclassing one of the appropriate basic node types (implemented as the AGAudioNode and AGControlNode C++ classes) and overriding the virtual functions used for graphical rendering and audio/control output.
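As a concrete illustration of this subclassing scheme, the following sketch defines a hypothetical audio node. Only the class name AGAudioNode comes from the text above; the virtual function names, signatures, and base-class contents shown here are assumptions, not Auraglyph's actual interface.

// Illustrative sketch of adding a new audio node type by subclassing
// AGAudioNode. The hooks shown below are hypothetical.
#include <cmath>

class AGAudioNode
{
public:
    virtual ~AGAudioNode() { }
    // hypothetical hooks for visual and audio output
    virtual void render() = 0;                                // draw the node
    virtual void renderAudio(float *output, int nFrames) = 0; // generate samples
};

// A hypothetical new node type: a simple sine oscillator.
class AGAudioExampleSineNode : public AGAudioNode
{
public:
    void render() override
    {
        // OpenGL drawing of the node's icon and ports would go here.
    }

    void renderAudio(float *output, int nFrames) override
    {
        const float twoPi = 6.28318530718f;
        for(int i = 0; i < nFrames; i++)
        {
            output[i] = std::sin(m_phase * twoPi);
            m_phase += m_freq / 44100.0f; // fixed sample rate, per the text
            if(m_phase >= 1.0f) m_phase -= 1.0f;
        }
    }

private:
    float m_freq = 440.0f;
    float m_phase = 0.0f;
};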
4.4.1 Handwriting Recognition

It has been an explicit goal of this research to leverage existing tools and frameworks for handwriting recognition, rather than developing new software for this purpose. Auraglyph's handwriting recognition is implemented using LipiTk [99], a comprehensive open-source project for handwriting recognition research. LipiTk is not natively designed to function with iPad applications, but we extended it to do so with straightforward code changes and additions.

LipiTk's default configuration uses dynamic time warping (DTW) [107] and nearest-neighbor classification (k-NN) to match pen strokes to a preexisting training set of possible figures. The result of this procedure is one or more "most likely" matches, along with a confidence rating for each match. We have found the speed and accuracy of LipiTk in this configuration to be satisfactory for real-time usage, though a slight but noticeable delay exists between finishing a stroke and the successful recognition of that stroke.

Before they can be used to classify figures of unknown types, the recognition algorithms incorporated into LipiTk must be primed with a set of "training examples" for each possible figure to be matched. This training set is typically created before the software is released by test users, who draw multiple renditions of each figure into a specialized training program. This training program serializes the salient features of each figure into a database, which is distributed with the application itself. In our experience, LipiTk's recognition accuracy is highly linked to the quality, size, and diversity of the training set. For instance, a version of our handwriting database trained solely by right-handed users suffered reduced accuracy when used by a left-handed user. A comprehensive training set would need to encompass strokes from a range of individuals of varying handedness and writing style. Interestingly, though, LipiTk's algorithms are able to adapt dynamically to new training examples. An advanced system might gradually adjust to a particular user's handwriting eccentricities over time, forming an organically personalized software interaction. Auraglyph takes advantage of this feature to a small degree, allowing a user to add new training strokes via a separate training interface.
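The matching step can be illustrated with a generic dynamic time warping distance between two point sequences, paired with a nearest-neighbor vote. This is a self-contained sketch of the general technique, not LipiTk's implementation or API.

// Generic sketch of dynamic time warping (DTW) between two pen strokes,
// illustrating the matching principle described above; not LipiTk code.
#include <vector>
#include <cmath>
#include <algorithm>
#include <limits>

struct Point { float x, y; };

static float dist(const Point &a, const Point &b)
{
    return std::hypot(a.x - b.x, a.y - b.y);
}

// DTW distance: cost of the best monotonic alignment between strokes a and b.
float dtwDistance(const std::vector<Point> &a, const std::vector<Point> &b)
{
    const float INF = std::numeric_limits<float>::infinity();
    size_t n = a.size(), m = b.size();
    std::vector<std::vector<float>> D(n + 1, std::vector<float>(m + 1, INF));
    D[0][0] = 0.0f;

    for(size_t i = 1; i <= n; i++)
        for(size_t j = 1; j <= m; j++)
            D[i][j] = dist(a[i-1], b[j-1]) +
                      std::min({ D[i-1][j], D[i][j-1], D[i-1][j-1] });

    return D[n][m];
}

// A k-NN classifier would compute dtwDistance() from an input stroke to every
// training example and report the most common label among the k nearest, with
// the distances providing a rough confidence measure.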
4.5 Summary

Auraglyph proposes a new model of interacting with sound synthesis structures, via handwritten stylus gestures and touch. Sketching is an intrinsic part of nearly all creative activities, and as such presents a powerful metaphor for a music programming system. Auraglyph thus employs a two-handed interaction in which synthesis and control structures are created with stylus input and then parameterized via touch. Auraglyph includes a rich set of nodes for effecting audio processing and control that can be flexibly interconnected. Many of these leverage the distinct affordances of the device itself, including a drawn waveform editor, a multitouch sequencer, and an orientation-sensing control. Auraglyph also allows users to draw freehand sketches that are left as-is to decorate and annotate a patch. Clean, monochrome visual forms and a distinct animation style give Auraglyph a unique and functional design aesthetic. Together, these characteristics leverage the distinguishing features of mobile touchscreen technology for the purpose of expressive music programming.
Chapter 5
Evaluation

To assess the musical suitability of the software applications developed during this research and the merit of the concepts underlying them, a formal evaluation was conducted along a number of fronts. Both miniAudicle for iPad and Auraglyph were subjected to user studies incorporating quantitative analysis via anonymous survey and qualitative appraisal through a number of concert performances. During the studies, automated logging of user activity was carried out via Google Analytics, to provide a data-driven aggregate picture of how individuals used these software applications. The author also spent some time composing music in Auraglyph for live performance, yielding an additional dimension of insight into the concept's musical potential.
5.1 miniAudicle for iPad

The evaluation of miniAudicle for iPad was carried out in the form of a user study conducted with music technology students at the graduate and advanced undergraduate levels. These students were comfortable with ChucK, with music programming in general, and with various forms of music production and composition. The goal of the study was to assess the app's design goals and draw conclusions as to the validity of the principles underlying them. It attempted to do this in the context of use by music technology students already familiar with the standard tools of the field and having diverse backgrounds in music composition and/or performance. The user study asked participants to use miniAudicle for iPad for a
few hours over a period of a week. They were given the goal of developing a musical instrument, composition, or other creative work. Participants who did not already own an iPad were given one, and each device was loaded with a specialized user study version of miniAudicle. This test version was tagged in the project's code repository, which allowed the software to be continually developed while also ensuring that all users within the study received the same version of the app, regardless of when they started the study. Only the full-sized 9.7" variant of iPad was used for this study, to control for issues related to the size of the screen and physical device. It was not logistically possible to control the exact model of iPad used for every individual, but each test device was of the 3rd generation model or better, having a minimum 1 GHz CPU clock speed and a so-called "Retina" high-pixel-density display.

Before starting to use miniAudicle for iPad, participants filled out an entry survey to form a baseline of their musical and technical background. They were directed to a website hosting an introductory tutorial video, but were not required to watch the video. The site also included documentation and sample code for the ChucK Mobile API (described in Section 3.2.3). They were then left on their own for a week or so, during which time they would theoretically use the app over a period of days to develop a musical work. After a week, participants were asked to complete an exit survey, which included a series of quantitative questions in which they could indicate their agreement on a scale of 1-5, in addition to prompts for short responses to qualitative questions. These surveys were linked through a pseudonym assigned to each participant at the start of the study, but otherwise the survey process was anonymous. (Both surveys are included in Appendix B.) After completing the survey, the participants were invited to demonstrate their results as a group with the author. All code and related resources created within miniAudicle for iPad during the user study were downloaded to the author's computer for further analysis. Additionally, anonymous software-based usage tracking, using Google Analytics, was active throughout the study, to allow for later analysis of aggregate user behavior patterns within the app.

The quantitative component of the exit survey used a form of the Creativity Support Index developed by Cherry and Latulipe [108], with modifications inspired by Wanderley [109] and other alterations. The Creativity Support Index (CSI) assesses interactive software for creative applications according to the general characteristics of collaboration, enjoyment, exploration, expressiveness,
immersion, and results being worth the effort. These factors are assessed by the user under study with two questions per category; using two similar but distinct questions for each factor is suggested to improve the statistical power of the survey. Then, the importance of each factor is indicated by the user with a series of paired-factor comparisons. These questions ask the user to indicate which of two of the above criteria they value more when performing the tasks indicated, for each pair of criteria. As a whole, the paired-factor comparisons establish a general hierarchy of the importance of each factor. The final CSI is calculated as a single value, a linear combination of the user's rating for each factor weighted by its deduced importance.

For this study, the collaboration factor was omitted from the survey, given the lack of collaborative aspects in the version of miniAudicle for iPad available at the time of the user study. An additional factor, learnability, was added, inspired by Wanderley [109], who has argued that "it is essential to take into account the time needed to learn how to control a performance with a given controller." Further, the weighting of criteria by paired-factor comparisons was omitted to simplify the exit survey, which we considered acceptable, especially since we find the individual criteria rankings to be more interesting than an aggregate score.

Overall, 6 participants completed the user study. 33% of the participants stated that they used the app for approximately 4 hours, and the remaining 67% stated that they used the app for 5 hours or longer. The average user rankings of each criterion are summarized in Table 5.1.

Factor                  Average ranking (out of 5)
Enjoyment               4.3
Exploration             4.25
Expressiveness          4.3
Immersion               4.1
Results worth effort    4.6
Learnability            4.6

Table 5.1: Average rankings by users of each factor in the modified Creativity Support Index.
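For reference, the full CSI described above weights each factor's rating by how often that factor was preferred in the paired-factor comparisons. The sketch below illustrates one way such a weighted score can be computed; the weights and the normalization here are hypothetical (this study omitted the paired comparisons entirely), and only the average ratings are taken from Table 5.1.

// Illustrative sketch of a weighted CSI-style score: each factor's rating
// is weighted by how often the factor was preferred in paired comparisons,
// then normalized to 0-100. Weights below are hypothetical.
#include <cstdio>

int main()
{
    const int NUM_FACTORS = 6;
    // per-factor ratings on a 1-5 scale (values from Table 5.1)
    float rating[NUM_FACTORS] = { 4.3f, 4.25f, 4.3f, 4.1f, 4.6f, 4.6f };
    // times each factor was preferred across the 15 paired comparisons
    // (hypothetical; each factor can be chosen 0-5 times)
    float weight[NUM_FACTORS] = { 3, 2, 2, 1, 4, 3 };

    float weighted = 0, maxWeighted = 0;
    for(int i = 0; i < NUM_FACTORS; i++)
    {
        weighted    += rating[i] * weight[i];
        maxWeighted += 5.0f      * weight[i]; // 5 is the maximum rating
    }

    std::printf("weighted score (0-100): %.1f\n", 100.0f * weighted / maxWeighted);
    return 0;
}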
The user study also solicited qualitative feedback from participants, with questions on the overall experience of using miniAudicle for iPad and the creative possibilities that were enabled or closed off by the app. In their responses to these questions, participants expressed that they "enjoyed the performative aspects" of the application and appreciated being able to access the mobile device's orientation sensors within ChucK code. Responses also indicated an appreciation of the "laid-back experience of not typing on a keyboard" and the ability to do ChucK programming while out and about rather than sitting at a desk with a laptop or desktop computer; in one case, using miniAudicle for iPad was described as "fun" in contrast to full-time professional software development.

Negative reactions focused on the lack of a traditional keyboard and the insufficiency of the code-writing tools supplementing the on-screen keyboard. One participant suggested that larger projects would be difficult or impossible to develop and manage, and several indicated that they saw the app more as a tool for tweaking and performing code that had been written elsewhere. Another participant plainly stated that they "didn't like coding in the iPad."
Figure 5.1: Demonstration instruments designed in miniAudicle for iPad.

Among the creations presented at the end of the user study were a drone activated by voice or breath input fed into multiple delays, a feedback network driven by oscillators and controlled by motion sensors, an accelerometer-based shake controller for the Synthesis ToolKit's "Shakers" physical model [5], a motion-controlled "cello," and an audio analysis system that drove desktop-based visuals (Figure 5.1). A number of these leveraged the ChucK Mobile API to achieve these features, and in some cases leaned on the physical properties of the device itself for their interaction.
17 distinct user-generated code files were retrieved from the test devices, encompassing 11,948 total bytes of text and 467 non-whitespace lines of code, or about 27 non-whitespace lines of code per file. Most of these followed the while(true) structure that is conventional to many types of ChucK programs, in which initial setup code precedes an infinite loop that processes input and manipulates the audio output accordingly. An example can be seen in Listing 5.1. Loops were found in every program; functions were used occasionally, and classes sparingly. One program was structured using a chubgraph [97] to compose multiple unit generators and control code into a single aggregate unit generator. Some of the programs were evidently modified from the ChucK examples included with miniAudicle for iPad.
TriOsc c => NRev rev => Chorus chorus => dac;
SqrOsc s => rev;

0.6 => rev.mix;
0.35 => c.gain => s.gain;

400 => float sinfreq;

Motion mo;
MotionMsg momsg;
mo.open(Motion.ATTITUDE);

while(true)
{
    mo => now;

    while(mo.recv(momsg))
    {
        // >;
        momsg.x * momsg.x * 10000 => c.freq => sinfreq;
        sinfreq * 1.5001 => s.freq;
    }
}
Listing 5.1: A ChucK program typical of those produced during the user study.
Analytics

In advance of the user study, miniAudicle for iPad was outfitted with Google Analytics, a tracking tool for software applications. Within the app, usage data is fed to the Google Analytics software
library, which compiles and uploads this data to the central Google Analytics service. As the developer of the application, we were then able to see aggregate metrics for behavior and usage within miniAudicle for iPad. Filenames, ChucK code, and personally identifying information were not included in this usage data, but a given user is tracked across multiple sessions of application use.

From a high level, user behavior can be categorized by "screen" or by "event." For instance, miniAudicle for iPad tracks two screens, for "Play" mode and for "Edit" mode, and has many events, such as "AddShred" or "EditScript" (corresponding to an edit to a script). Screens are tracked by number of views as well as by time spent in that view; events are tracked by quantity. Overall, these analytics provided concrete data about how participants used miniAudicle for iPad, complementing the survey data and musical demonstrations.

During the user study phase of miniAudicle for iPad, 6 unique individuals used miniAudicle for iPad over 47 total sessions. The average session duration was approximately 15 minutes. 19 total scripts were created in these sessions. According to the analytics data, users spent an average of about 7.2 minutes in Editor mode each time they activated that mode, and an average of about 1.7 minutes in Player mode (these will not add up to the average session duration, since each change in mode counts as a new "screen"). Event tracking is summarized in Table 5.2.

Event             Number of events    Percentage of total events
AddShred          577                 25%
RemoveShred       528                 23%
EditScript        348                 15%
ReplaceShred      183                 8%
EnterEditMode     166                 7%
ConsoleOpen       95                  4%
ShredsOpen        62                  3%
AppLaunch         60                  3%
ExampleScripts    43                  2%
MyScripts         40                  2%

Table 5.2: Top analytics events during the miniAudicle for iPad user study.
Users added a total of 577 shreds, comprising 25% of the total events; adding shreds was the most frequent event. Following it were removing shreds (23% of total events) and editing scripts (15% of total events). (Individual script edits were not tracked; rather, an uninterrupted stream of edits punctuated by some other event was considered a single editing event.) Of the total shreds added, approximately 37% were added from Player mode, and 35% of the shreds removed were removed in Player mode. 11% of the shreds replaced were done so in Player mode.

Interpretation

We can start to form a number of interpretations of the results of this user study. It is encouraging that text editing on the on-screen keyboard was not completely unusable for participants in the user study. Additional enhancements to the editing component of the app might further ease the task of editing code on a touchscreen. On the other hand, the results of the user study also confirm that text editing on an iPad is not ideal for users. It's possible that no degree of small editing improvements will fully make text programming comfortable on a touchscreen.

Perhaps most contrary to miniAudicle for iPad's design goals, Player mode did not seem to be especially engaging. Users on average spent 1 minute and 40 seconds in Player mode per session in which they used it. This opposes the hypothesis that tangible interaction is the key element to working with ChucK code on a mobile touchscreen device. Rather, it seems that participants in the user study fundamentally viewed working with ChucK code in miniAudicle for iPad as a text-based process. In this sense, Player mode, as presented in the user study, was not sufficient to leverage touch interaction into a uniquely engaging programming experience.

However, users did seem to appreciate using miniAudicle for iPad as a versatile platform for prototyping software utilizing the sensors and physical aspects of the iPad. This speaks to the uniqueness of the iPad in terms of the sensors that are available and its form factor, in contrast with desktop computers. Future improvements to this system might put these capabilities in the foreground of the app and perhaps enhance them further.
5.2 Auraglyph

Evaluation of Auraglyph for its musical and artistic capabilities was carried out in two forms. The first was a pair of user studies conducted with students at California Institute of the Arts (CalArts). An initial user study was held with students of the CalArts course "Composing for Robots," which consisted of a specific task for the participants to complete using Auraglyph, followed by a survey of their experience using the software. Following the survey, this evaluation was extended to have the students develop concert works for Auraglyph in groups of three or four. A second user study was conducted in a two-week intensive workshop at CalArts, entitled "Composing with Mobile Technology." This study used a similar approach to the previous study, but in a compressed span of time. In between the two user studies was a period of continuing development of Auraglyph, based on user feedback from the first study and the author's own evaluation of the software. Therefore the two studies did not examine exactly the same software; rather, the second study evaluated a form of Auraglyph that was intended to be an improvement over the initial study version. The primary goal of these efforts was not to evaluate a specific edition of the Auraglyph software, but to assess its underlying concepts and the ideas described in Chapter 4.

The other form of Auraglyph's evaluation centered on the author's attempts to use the software in an artistically significant manner. Music software can make the creative process more accessible or provide a breadth of aesthetic opportunities, but we also believe that sophisticated music applications can offer a depth of creative experience. Acoustic instrumentalists spend years honing their craft, and, while music technology has removed many of the limitations of the pre-digital age, there is still a need for tools that engender the continuing development of skill and proficiency. This additional form of evaluation is intended to gauge Auraglyph in that context.
5.2.1 Auraglyph Study 1: Composing for Robots

In autumn 2016, students in the author's CalArts course "Composing for Robots" participated in a user study for Auraglyph. The study was structured in two main parts, the first consisting of individual use with quantitative analysis and the second involving groups of the students using Auraglyph
to compose concert works. The first part resembled the miniAudicle for iPad study discussed in Section 5.1. Participants in the study were given iPads with Auraglyph installed or, in some cases, Auraglyph was installed on iPads they owned themselves. They were each given the task of individually creating a "musical statement" using Auraglyph. After one week of using Auraglyph independently, they were asked to demonstrate their results, describe their experiences, provide feedback as to the design of the software, and complete a short survey.

The initial musical results, after one week of use, were somewhat basic in sound design and form. Yet the students maintained a high degree of enthusiasm about the potential offered by Auraglyph. At the one-week point, participants stated that they enjoyed the sketching metaphor of the application and the ability to capture their sonic stream of consciousness, and overall felt the basic functions of the software were intuitive. The chief criticism of the software at this stage was a slow learning curve and a lack of resources explaining the core functions of the application. Reliability was also noted as a concern, as certain user actions would consistently crash the application or put it into an erroneous state. Other criticism centered on the absence of several critical ancillary features, like the ability to save an existing Auraglyph program, to create a new one with a clean slate, or to load a previously saved program (these features were not present in the version of Auraglyph tested). The inability to undo mistaken actions or to copy and paste structures was also cited as a significant limitation. Participants expressed a desire for additional node types, such as more advanced math and signal processing functionality. Additional feedback included a desire for the niceties of typical digital audio workstations, such as node solo and mute functions.

As with the miniAudicle for iPad study, the survey asked participants to evaluate the software according to a modified form of the Creativity Support Index. Seven total students participated in the study, and five of these completed the survey (the survey text is included in Appendix B). One of these seven students had previously participated in the miniAudicle for iPad user study described in Section 5.1. The average rankings for each modified CSI criterion are shown in Table 5.3, alongside the rankings from the previous miniAudicle for iPad study. To the surprise of the author, these fell significantly short of the comparable rankings for miniAudicle for iPad.
                        Average ranking (out of 5)
Factor                  Auraglyph    miniAudicle for iPad
Enjoyment               3.6          4.3
Exploration             2.7          4.25
Expressiveness          2.9          4.3
Immersion               2.4          4.1
Results worth effort    3            4.6
Learnability            4.6          4.6

Table 5.3: Average rankings of each factor in the modified Creativity Support Index by Auraglyph users in the "Composing for Robots" class. Rankings for the miniAudicle for iPad user study are included for comparison.

Respondents stated that "[w]hen it worked, it was fun to push up against the limitations of the software" and that it was "[a] fun way to build systems." On the negative side, feedback included that "a lot of times it felt like it was a fight against the interface," "I just had some frustrations with crashes," that "[t]he UI is a little confusing," and that it was "hard to figure something creative out on my own" beyond basic oscillators. Respondents also expressed a desire for additional educational resources, such as video tutorials or example programs.

Some early conclusions can be drawn from this feedback and the accompanying modified Creativity Support Index scores. One is that reliability is incredibly important for creative software to achieve its goals. Another is that documentation resources are critical for helping users achieve the full potential offered by a creative application. Notably, negative feedback from the surveys primarily focused on implementation issues such as crashing or unreliability; these issues ultimately have straightforward engineering solutions. Auraglyph's comparatively poor performance in this initial study was not viewed as an indictment of its design or of its underlying metaphor.

Analytics software was running throughout the two weeks between introducing Auraglyph to the students and distributing the survey. During this period, seven individuals used Auraglyph over 48 sessions. The average session duration was 21 minutes. Twelve sessions (25% of the total in this period) lasted longer than 30 minutes, and these extended sessions averaged 45 minutes in length. Table 5.4 summarizes the most frequent user activities that were tracked during the study.
Event                   Number of events    Percentage of total events
EditNodeParamSlider     2,397               22.68%
MoveNode                1,992               18.85%
OpenNodeEditor          1,391               13.16%
DrawNodeUnrecognized    1,354               12.81%
ConnectNodes            1,124               10.63%
DrawNodeCircle          457                 4.32%
CreateNodeAudio         397                 3.76%
DeleteNode              290                 2.74%
SaveProgram             173                 1.64%
DrawNodeSquare          157                 1.49%

Table 5.4: Top analytics events during the "Composing for Robots" user study.

The most common events were adjusting parameters of nodes using sliders and moving nodes. Drawing a figure that was intended to be a node but was not recognized (DrawNodeUnrecognized, 12.81%) was nearly three times as common as drawing a recognized node (DrawNodeCircle, 4.32%, and DrawNodeSquare, 1.49%), indicating significant problems in accurately detecting these figures. The least common event was drawing the numeral 7, which was successfully achieved only once during the study.

Table 5.5 shows a breakdown of audio nodes created by users during the study. SineWave and, curiously, Multiply were the most frequently used nodes during the study (Multiply can be used either as a simple gain control or to multiply its inputs, as in a ring modulator). Feedback and Waveform (a hand-drawn wave table generator) were also common. ADSR was used somewhat infrequently compared to other nodes, suggesting that participants were not especially concerned with applying envelopes to their sounds.

Composing for Robots Concert

Following this initial study, the students of "Composing for Robots" continued individual use of Auraglyph and then transitioned into developing group compositions for the course's upcoming final concert. Hemispherical speakers and subwoofers were loaned by the Stanford Laptop Orchestra [33] to supplement the built-in iPad speakers for the concert.
Audio node     Nodes created    Percentage of total audio nodes
SineWave       59               14.86%
Multiply       58               14.61%
Feedback       41               10.33%
Waveform       37               9.32%
SquareWave     28               7.05%
BandPass       24               6.05%
Input          22               5.54%
Output         22               5.54%
SawWave        21               5.29%
ADSR           20               5.04%
LowPass        20               5.04%
Add            19               4.79%
TriWave        16               4.03%
HiPass         6                1.51%
Compressor     4                1.01%

Table 5.5: Audio node use during the "Composing for Robots" user study.

Given additional time to work with Auraglyph, the students appeared to become more comfortable with its various features, which manifested in improved musical output. The Auraglyph x Robots concert occurred in late autumn at CalArts and comprised several musical works developed in "Composing for Robots." These included two group compositions that utilized Auraglyph and a number of pieces for the mechatronic instruments of the CalArts Machine Orchestra [110]. Figure 5.2 shows several of the works in performance and the poster advertising the event.

One of the pieces used Auraglyph to build a spatialized beat across three hemispherical speakers arrayed in the performance space. The performers improvised with sequencers and other controls to progress the piece as it built up and waned over time. In the other Auraglyph-based piece, titled HedonismBot, the group developed three feedback-based musical instruments in Auraglyph and used these in tandem with the titular mechatronic instrument. The instruments were based on processing and replaying the iPad's microphone input, and each possessed a unique sound design, covering distinct pitch registers and timbral spaces. The piece's three performers held their instruments in various orientations around the hemispherical speakers to which the iPads were connected, creating acoustical feedback loops.
Figure 5.2: Auraglyph in performance at the Auraglyph x Robots concert, and the poster for the event. Photos: Ashwin Vaswani. Poster: Ivy Liu.

At set points within the composition, each performer would walk up to HedonismBot, in the middle of the stage, and present their iPad, which triggered the bot to actuate a pattern of motorized clicks and whirs (Figure 5.3). Feeding into the Auraglyph instruments' input processing, these sound patterns became increasingly elaborate as the piece progressed. Parts of the feedback loops were controlled with input from the iPad's orientation sensors. As the performers deliberately adjusted their tablets and moved back and forth between their associated speaker and the robot, the overall impression was that of a musical ritual. The piece not only created a unique musical texture but engaged its musical objects and the performance space in ways uncommon in electronic music. It would not have made sense if, for instance, laptops had been used instead of iPads, or if the distinct audio sources had been mixed into a common sound system.
5.2.2 Auraglyph Study 2: Composing with Mobile Technology Workshop

In winter of 2017, the author held a two-week workshop at CalArts entitled "Composing with Mobile Technology." The workshop was focused on developing music and compositions using miniAudicle for iPad, Auraglyph, and other selected applications for mobile phones and tablets. This workshop started a month after the Composing for Robots class, and the feedback and data received from the
Figure 5.3: A rehearsal of HedonismBot. previous exercise suggested improvements to the Auraglyph software that were completed before the workshop. These additional features included support for multitouch interaction (including the ability to adjust multiple parameters of multiple nodes simultaneously), support for saving and loading programs and easily creating new ones, a noise generation node, and a variety of fixes addressing bugs and crashes. These were expected to improve the overall process of working with Auraglyph and ultimately lead to a better music making experience. miniAudicle for iPad and Auraglyph were presented on separate days, and students were asked to develop short musical statements in each during class time. For the final class assignment, students were asked to develop and perform a full composition. Students in the workshop gravitated heavily towards composing with Auraglyph, with some students stating that that was the reason they took the class. This in itself constitutes a substantial, if informal, finding of the study, in that, when presented with both miniAudicle for iPad and Auraglyph without experience in either, participants strongly preferred working with Auraglyph. As a result the workshop provided a wealth of data for evaluating Auraglyph’s performance with a number of users.
In total, 14 students participated in the workshop. One of these students had previously participated in the initial Auraglyph user study, and none had been involved in the miniAudicle for iPad study. Most were students in CalArts' Music Technology program, and thus had a high degree of familiarity with many existing digital music creation tools.

After the author demonstrated how to use Auraglyph, installed it on the students' iPads, and provided class time for developing music with the app, the classroom developed over the next hour into a diverse soundscape of drones, beats, blips, and other noises. One student immediately set out to create a kick drum using an enveloped triangle wave and noise generator. Another student built a cascade of sequenced delays; by varying the sequence, he was able to explore a variety of sonic spaces (Figure 5.4). Students explored the freedraw feature of Auraglyph, using it less as an annotation mechanism and more as a way to customize and embellish their programs (Figure 5.5). One student began prototyping ideas for notating instructions to Auraglyph performers (Figure 5.6). A number of experiments centered on using the iPad's orientation sensing capabilities for manipulating sound (Figure 5.7). In class, many of the students seemed completely absorbed in the process of making music with Auraglyph, with heads buried in tablets, fully engaged in this world of sound creation.
Figure 5.4: A student developing an idea in Auraglyph.
Figure 5.5: Screenshot of an Auraglyph program developed in the Composing with Mobile Technology workshop (courtesy April Gerloff).

Outside of class, several students reported that they had stayed up until the early hours of the morning using Auraglyph, despite the workshop not mandating coursework outside of class time.1 It was surprising to see that students were freely exploring the sonic possibilities of Auraglyph without necessarily having a concrete long-term goal. At times, it seemed the effect of composing with Auraglyph was hypnotic or even therapeutic, especially given the external events and tumultuous American political climate in late 2016 and early 2017.

Ultimately, six performances were presented at the end of the workshop, including solo and group performers. In addition to the final works, a number of interesting demonstrations and musical statements were also developed by participants in the workshop.
1 While Google Analytics was active in Auraglyph throughout this study, limitations are imposed on the data aggregated and reported that prevent us from directly confirming this. As noted later in this section, the most we can say with regards to long-term Auraglyph sessions is that there were 24 sessions that were 30 minutes or longer, and these averaged 45 minutes in length.
Figure 5.6: Sketches of prototype notation for Auraglyph performance.

One of the final performances involved a quartet of performers, each using Auraglyph on a separate iPad, distributed to two hemispherical speakers (Figure 5.8). The quartet performed a rhythm-driven piece, despite there being no intrinsic capability for synchronizing instances of Auraglyph across multiple devices. This didn't appear to be a critical limitation, as the fluid rhythmic interplay of the result lent itself to the overall dynamic nature of the music. A solo composer developed a work that explored a number of different sonic spaces, fluidly shifting through a multilayered texture of droning oscillators, a cacophonous swell of noise, and an industrial-style percussion beat. (Notably, at this time, Auraglyph did not support playing back sound files; the percussion in this piece was entirely synthesized by manipulating oscillators and noise.) He intended to develop the piece further for presentation at a future concert. Another student developed a piece for Auraglyph, electric guitar, and several other mobile devices running additional music software, building a dense soundscape over which he improvised with each tool.
Figure 5.7: Demonstrating a program that manipulates sound with the iPad’s orientation sensors.
Figure 5.8: An Auraglyph quartet.

During and after the workshop, the author spoke directly with a number of the students to informally solicit their reactions to and criticisms of working with Auraglyph. One student appreciated that the "app promotes destructive creativity" with the ability to "throw oscs into oscs into more
oscs," fostering "a mindset where you're not going to make this harmless thing." Another student stated that they had no previous experience with musical apps for the iPad, but they liked that Auraglyph was a "real playable thing" and a "nice way to visualize synthesis." One stated that it "felt right on the iPad," while also identifying several drawbacks; these included limited tools for managing the size and complexity of programs and the inability to assemble synthesis structures into reusable modules. A number of students expressed fear of the "Save" button, worried they might accidentally hit the nearby "New" button instead and lose their work. One of the workshop participants stated that he could "do that for hours," adding that he "really learned a lot about sound synthesis," and compared it favorably to Native Instruments' Reaktor. One, after presenting a completed program to the author, stated that he wanted to hang it up on the wall as a work of art. Several students purchased new iPads during the workshop or soon afterwards, inspired in part by the musical creativity they had felt working with Auraglyph and miniAudicle for iPad during the workshop.

Students with a background in analog modular synthesis seemed to take well to Auraglyph, though this was not exclusively the case. Several students had very limited exposure to music technology beyond digital audio workstations and music notation software, but this did not seem to inhibit them from expressing a range of creative ideas in Auraglyph. Students who had previously struggled to fully realize their ideas using ChucK in the author's prior Introduction to Programming classes produced some of the best compositions in the workshop with Auraglyph.
                        Average ranking in study (out of 5)
Factor                  Second Auraglyph    First Auraglyph    miniAudicle for iPad
Enjoyment               4.8                 3.6                4.3
Exploration             4.2                 2.7                4.25
Expressiveness          4.3                 2.9                4.3
Immersion               3.9                 2.4                4.1
Results worth effort    4.8                 3                  4.6
Learnability            4.6                 4.6                4.6

Table 5.6: Average rankings of each factor in the modified Creativity Support Index by Auraglyph users in the "Composing with Mobile Technology" workshop ("Second Auraglyph"). As a point of comparison, the average ranking for each factor is also shown for the "Composing for Robots" study ("First Auraglyph") and the miniAudicle for iPad study.
11 participants in the workshop completed a survey provided after the first week of the workshop. This survey was a copy of that used in the Composing for Robots study and is included in Appendix B. The average user rankings of each criterion for the modified Creativity Support Index are summarized in Table 5.6, along with the rankings from the previous studies. Compared to the Composing for Robots study, each factor improved markedly except Learnability, which was already fairly high. Immersion ranked much lower than every other factor in this study; this might suggest either lingering reliability issues or perhaps a limit to the engagement of Auraglyph's metaphor, as implemented. As with the previous studies, it is likely that the overall positive skew of the results was due in part to the participants' personal relationships with the author, even though the anonymity of the results was stressed when distributing the survey.

The freeform responses to the survey also proved illuminating. Users enjoyed using Auraglyph for sound design and "just for creating new and innovative sounds," the modular aspect of its design, the visual design, and seeing "the different waves going all around." One responder stated "[i]t was a lot easier for me to get started with than programming languages like Pd, SuperCollider, etc," while another commented that "I just make music. I just want to keep going. It sounds good. I'm not thinking about EQing or any annoying DAW shit." Criticism of Auraglyph in the survey focused on the absence of traditional musical timing mechanisms, the inability to encode conventional musical ideas and notation, and the anticipated difficulty of playing an extended live set. Reliability issues were also a concern, especially surrounding saving and loading programs.

As before, analytics were active for the duration of the workshop. Google Analytics tracked 13 individual users across 89 sessions of using the app during the two-week duration of the workshop (some workshop participants shared iPads, resulting in their being tracked as a single user). The average session duration was 20 minutes; 24 sessions lasted 30 minutes or longer, and these longer sessions averaged approximately 45 minutes. The ten most common user activities are broken down in Table 5.7. The most common event type was editing a parameter with the touch slider. The second most common event type was drawing a shape that was not recognized as a node, pointing to fairly egregious reliability issues with the handwriting recognition implementation and a place for significant future improvement. Audio nodes were the most commonly drawn node type.
Event                      Number of events    Percentage of total events
EditNodeParamSlider        6,651               25.26%
DrawNodeUnrecognized       4,584               17.41%
MoveNode                   4,483               17.03%
OpenNodeEditor             3,594               13.65%
ConnectNodes               2,381               9.04%
DrawNodeCircle             1,079               4.10%
CreateNodeAudio            882                 3.35%
DeleteNode                 482                 1.83%
DrawNumeralUnrecognized    293                 1.11%
EditNodeParamDrawOpen      272                 1.03%

Table 5.7: Top analytics events during the "Composing with Mobile Technology" workshop.

The most frequently drawn numerals were 0 and 1, both drawn 92 times; however, attempts to draw a numeral that could not be recognized were more frequent. 188 individual free-drawn figures were drawn, encompassing 0.71% of user actions during the study. The least common tracked user action was drawing the numeral 7.

Table 5.8 provides a summary of the audio nodes created during the workshop. 882 total audio nodes were created, with SineWave being the most common by far. Feedback and SquareWave were also fairly common. At least one of each kind of node was utilized by a participant during the workshop, including the Composite node, an unfinished feature that was unintentionally left in the software build used during the workshop.
5.2.3 Personal Evaluation

The development of the Auraglyph software was interleaved with periods of extended composition using the latest build of the software. These composition sessions were initiated at appropriate milestones in the software's development. The sessions were used to reflect upon the current state of the software and to guide further development, in terms of high-level design, individual features, and the triage of critical bugs. In this sense, Auraglyph was constantly being evaluated for its musical potential by the author in an iterative loop of development and use of the software.
Audio node     Nodes created    Percentage of total audio nodes
SineWave       208              23.58%
Feedback       116              13.15%
SquareWave     92               10.43%
ADSR           73               8.28%
TriWave        66               7.48%
SawWave        62               7.03%
Waveform       49               5.56%
Multiply       43               4.88%
LowPass        41               4.65%
Output         34               3.85%
Noise          29               3.29%
Add            21               2.38%
Input          15               1.70%
BandPass       11               1.25%
Compressor     10               1.13%
HiPass         10               1.13%
Composite      2                0.23%

Table 5.8: Audio node use during the "Composing with Mobile Technology" workshop.

This feedback loop of constant iterative development was carried over from the author's work with Smule [25], in which a number of sonic application experiments were developed in a matter of months; in one extreme case, an entire Smule application was conceived, developed, and submitted to the iPhone App Store within a single day. These development cycles were informally broken into shorter sprints of heavy development followed by use and evaluation of the current state of the application. Creative software of an experimental nature requires this iterative development process to quickly evaluate a variety of design concepts of unknown utility, guiding them into usefulness or discarding them entirely. Such processes are also integral in the now-common situation where a single individual is the builder, composer, and performer of a musical software and/or hardware tool.

A number of milestones in the development of Auraglyph triggered intense periods of composing with the new feature that had just been implemented.
Figure 5.9: A still image from "Auraglyph | Demo_1: Basic FM + Filters," the first public presentation of Auraglyph.

For instance, the completion of scrub controls in the node parameter editor, which allowed parameters to be adjusted easily and directly with touch, inspired the first public presentation of Auraglyph, a demonstration video2 (Figure 5.9) uploaded to YouTube that garnered several thousand views and attention from the online software synthesis community.3,4 The addition of proper multitouch interaction, such as the ability to scrub multiple node parameters at the same time or to connect multiple nodes simultaneously, suggested further possibilities, including multi-user interactions. The implementation of a number of nodes seemed to multiply the sonic potential of Auraglyph. After adding the Feedback node, the author spent the rest of the evening exploring the newly available possibilities for echo, spatialization, and modulation effects with varying parameterization and inter-node connections. Adding Noise triggered similar experimentation with the musical potential of random noise.

In autumn 2016, the author was invited to perform at a student-initiated concert, TOO LATE IV, and decided to use Auraglyph to compose and perform 20 minutes of original electronic music for
2 Salazar, Spencer, "Auraglyph | Demo_1: Basic FM + Filters," https://www.youtube.com/watch?v=Tdj5e82nPHQ
3 Synthtopia, "New iPad Synth, Auraglyph, Lets You Draw Your Modular Synth Patches," http://www.synthtopia.com/content/2016/09/03/new-ipad-synth-auraglyph-lets-you-draw-your-modular-synth-patches/
4 Discchord, "Auraglyph | Demo_1: Basic FM + Filters," http://discchord.com/appnews/2016/9/2/auraglyph-demo_1-basic-fm-filters.html/
the event. This would be the first instance of the author actually producing a musical composition using Auraglyph; previously, only short musical demonstrations or test programs had been created. This opportunity was seen as an initial experiment to see if the current version of Auraglyph could be used to create an interesting, satisfying, and/or listenable musical experience for the author and an audience. It was not completely certain at the time whether that would be possible, given the current state of the software, its creative affordances, and concerns about reliability. Given these concerns, the experiment was viewed as another step in the iterative development process, whose results and conclusions would be used to guide continued refinement and improvement of Auraglyph.

This compositional process transpired over a period of two weeks, resulting in two compositions of roughly ten minutes in length. The original intention was to use two iPads, each with Auraglyph installed, and alternate between the two, creating a richly textural, continuous musical experience. miniAudicle for iPad was also considered for adding small sonic elements to the mix. The end result reduced this setup to a single 12.9" iPad Pro preloaded with two Auraglyph programs that would be reconfigured and augmented over the course of the performance. Each patch roughly corresponded to one "song."

The first song, DRONE, was a fairly simple piece progressing through layers of amplitude and frequency modulation at high and low rates, applied to oscillators tuned in various registers. The underlying program mostly utilized early functionality of Auraglyph, specifically the use of basic oscillators and filters, mixing them or connecting them into a variety of modulation architectures. The song progressed slowly over time and involved few sudden changes.

The second song, PULSE, comprised three sections. It began with a low bass note of fixed pitch that sounded every two seconds or so, feeding into a cascade of feedback delays to create a steady reverberant pulse. The delay lengths were then modulated with low-frequency oscillators, introducing noisy whirs and other sonic artifacts. Over this, a slow melody, loosely in tune with the bass pulse, was then constructed. The second section began by dropping the delay modulations and the melody; the bass drone was switched to a sequenced bass line and the tempo was doubled. Over this, melodic activity similar to that from before was reintroduced, but this time sounding on every offbeat eighth note (in a conceptual sense, as Auraglyph at the time had no way of representing standard Western note durations).
from before was restored, in conjunction with the addition of many more feedback delays. These were cascaded, delay-modulated, and ring-modulated to create a chaotic soundscape of noises, obscuring but not eradicating the piece’s starting point. Figure 5.10 shows an Auraglyph program in the midst of this process.

Figure 5.10: The ending of PULSE, with multiple cascades of modulated feedback delays.

The process of developing these compositions uncovered software bugs that needed to be fixed before the musical works could be safely performed in front of an audience without fear of a malfunction or crash. One such bug rendered audio nodes inoperative if all of their connections were removed. Another caused the Feedback node to crash if its delay parameter was modulated below 0; this corresponded to a nonsensical delay of negative duration, but crashing was and is considered a poor response to out-of-range input. A cluster of bugs concerned the breakdown and cleanup of one program’s internal data structures before another was loaded. The unreliability of the numeric handwriting recognition necessitated additional training examples of input numerals. Fortunately, all of these issues were resolved prior to the performance.

The author’s set at TOO LATE IV was the first public performance of music created with Auraglyph (Figure 5.11). It consisted mostly of music created with the two prearranged Auraglyph
programs, interspersed with a ChucK program running on a laptop and controlled by the laptop’s trackpad. The author concluded his set by intentionally connecting nodes in a way that caused the audio to glitch and break down into silence. The performance was warmly received, described afterwards by one audience member as “fire drones,” while another was amazed that “an iPad could make sounds like that.” One individual present at the show, who identified herself as a visual artist, suggested projecting the iPad’s visuals during the performance because of how visually interesting they were in relation to the audio. As has become standard at performances involving Auraglyph, a number of individuals asked when they could get copies of the software for their own use. Overall, the author was satisfied with the work that had been constructed over the previous weeks and performed at the concert.
Figure 5.11: The author performing with Auraglyph at CalArts. Photo: Daniel Chavez Crook.

Crucially, distinct advantages of the Auraglyph paradigm made themselves apparent through this exercise. As a performer and composer, the author found it empowering to work in an environment where low-level sound design, high-level composition, and gestural control were completely integrated into a cohesive framework. It was also liberating to work directly with MIDI notes, frequencies, delay times, and amplitudes. Abstracted from the particular rules of Western music theory,
these raw values allowed the author to reason analytically about the relationships within a piece of music in a way that had previously eluded him in conventional music making. Pulled apart from white keys and black keys, detached from staffs and clefs, an inner musical order within the author was empowered to manifest itself.
Figure 5.12: Notes, or a “score,” for the author’s performance at TOO LATE IV.

Working with Auraglyph was also interesting in the context of Trueman’s concept of performative attention [35], or, more generally, of how much work must be done to achieve a desired musical result. Performing these pieces combined parameterizing and reconfiguring a pre-made patch with extending that patch with new nodes and connections, or with creating a new patch from scratch. This required documenting the procedures needed to progress each piece as planned, and carefully executing those instructions. The “score” for the author’s performance can be seen in Figure 5.12. Rather than simply running prearranged buffers of code or triggering loops, significant parts of the program had to be built live during the performance, in a process similar to live coding.
Performing with Auraglyph required a mentally active process of recreating a program based on a model designed and rehearsed prior to the performance. This left room for plenty of mistakes to be made; the instructions in all-capital letters in the score in Figure 5.12 were largely warnings of pitfalls in this process that would lead to undesirable musical results. In an extreme case, “MIND THE FILTER FREQ” warns to ensure that a filter frequency is not modulated below zero, which, due to a software bug, would have caused it to short-circuit the entire audio path.5

5 At the other end of the scale, it is not evident from this exercise that “virtuosity,” as Trueman discusses it, is possible. It is difficult to view virtuosity as simply the absence of mistakes. Rather, virtuosity ought to offer some gratification beyond a merely enjoyable or satisfactory performance. At this point it is not possible to conclude whether virtuosity in this sense is possible in Auraglyph, or what that would even look or sound like.

Multitouch interaction was especially helpful in the process of performing with Auraglyph. At times, progressing to the next part of a piece required making or breaking two connections at the same time. The gesture to make or break a connection is only finalized when the finger or pen that began it is lifted from the screen, so multiple fingers can prepare these gestures and then activate them simultaneously by lifting the hand(s) from the screen.

Composing with Auraglyph also revealed some of the software’s distinct shortcomings and potential enhancements. The most significant of these was the difficulty of changing between two or more discrete states, for instance when dynamically alternating between notes in performing the melody of PULSE. In the melodic sections of PULSE, changes of individual notes were effected by directly writing in the MIDI note numbers corresponding to the desired pitch. This process was cumbersome and prone to error under the pressure of a concert scenario and with the uneven performance of Auraglyph’s numeric handwriting recognition, but, lacking some sort of on-screen musical keyboard or similar interface, there was no alternative. This difficulty mostly presented itself in the context of performing live with Auraglyph; in contrast, when developing a patch offline it is not usually a problem if it takes an extra half second to enter the desired value for a parameter.

Several issues with the Sequencer node were uncovered during this experiment as well. The most apparent of these was the inability to lock sequencer steps to values at a desired quantization, for instance to a specific discrete MIDI pitch. Rather, the sequencer was designed so that each step could emit a value in a fixed range from zero to one. In the interest of versatility, the resolution within that range was proportional to the resolution of the touchscreen itself, and not quantized to
any particularly musically relevant scale. When mapping these values to MIDI note numbers, it was easy to get fractional pitches that would be undesirably dissonant. This was problematic in the context of PULSE, in which melodic notes were adjusted over time with a Sequencer node, and fractional pitches had the potential to clash with previous notes in the melody and with the bass pulse. The sequencer was further limited in this exercise by the inability to set note durations with it; instead, notes in the sequencer lasted through the full duration of each beat, only triggering the envelope to release if the next beat was completely off.

It further occurred to the author that perhaps he was not using Auraglyph’s sequencer correctly, despite being its sole designer and developer. Instead of programming a melody on a single sequencer row, which in PULSE led to fractional pitches, perhaps a piano roll-like system would have worked better, in which each row of the sequencer corresponds to a unique pitch and the value of a given step in a row is mapped to velocity or ignored.
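One plausible remedy for the fractional-pitch problem described above is to quantize each normalized sequencer step to the nearest pitch of a chosen scale before converting it to a frequency. The ChucK sketch below illustrates the general technique; it is a minimal, hypothetical example (the scale, range, and quantize function are illustrative choices, not Auraglyph’s implementation).

// quantize a normalized sequencer value (0..1) to a discrete, in-scale MIDI pitch
fun int quantize(float v, int scale[], int basePitch, int octaves)
{
    // map 0..1 onto the available scale steps, rounding to the nearest one
    Math.round(v * (scale.size() * octaves - 1)) $ int => int step;
    // convert the step index back into an octave offset plus a scale degree
    return basePitch + (step / scale.size()) * 12 + scale[step % scale.size()];
}

// pentatonic scale degrees, starting two octaves below middle C
[0, 2, 4, 7, 9] @=> int scale[];

// example: a raw step value of 0.37 becomes a discrete pitch and its frequency
<<< quantize(0.37, scale, 36, 2), Std.mtof(quantize(0.37, scale, 36, 2)) >>>;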
5.3 Interpretation

A number of interpretations can be made from the evaluations of miniAudicle for iPad and Auraglyph described herein. Informally, when given the option between the later revision of Auraglyph and miniAudicle for iPad, most people seem to naturally gravitate towards Auraglyph if they haven’t used either.

It is also clear that software reliability is crucial to achieving the greater goals of a creative software application. Auraglyph fared quite poorly in its first user study, in no small part due to its smattering of software bugs and crashes. The bugs that remained in Auraglyph through the second user study also caused vexation for participants in that study.

Related to this, these studies indicate that a number of basic practical features can greatly improve the extent to which creative experimentation and exploration is possible in a software application. Based on these results, it is reasonable to conclude that save, load, and new document creation functions greatly enhanced the music-making experience for the second group of Auraglyph users. Participants in the second user study expressed anxiety about accidentally deleting their patches by hitting “New” when they meant “Save;” this anxiety forced those users to be more protective of their existing work, to the detriment of further developing alternative ideas.
Users also indicated an interest in having features like undo, save as (or save a copy), copy, and paste, which were not present in Auraglyph during the user studies. These basic features provide a sense of forgiveness, enabling users to explore a broad set of creative ideas without fear of losing previous work or getting stuck in a direction that turned out to be a dead end. Other creative software development environments, such as the Audicle [3] and Field [13], have recognized this and include built-in forms of version control for managing and recalling changes to program code over time.

The physicality of using Auraglyph and miniAudicle for iPad also seemed to be an important factor for many participants in the user studies. The movement of the iPad in space played a critical role in HedonismBot, both in the performers’ “playing” of the feedback between the device and the speaker and in carrying the devices to the titular robot. In the case of both miniAudicle for iPad and Auraglyph, participants in the user studies created instruments that utilized the iPad’s orientation sensors to control sound synthesis. Feedback from users hinted at different mental approaches to a tablet compared to a conventional computer, ranging from a “laid-back experience” to a “toy” to simply “fun.”

Most importantly, we believe these studies offer firm evidence of the power of visual and interaction design in creative software. Strong design can make people want to use software for its own sake, allowing them to enjoy the process of understanding and applying complex ideas while exploring a creative space. At times while developing an idea in Auraglyph, the author would simply gaze at the waveforms flying around, fascinated by their elemental beauty, independent of the audible result. It was fun just to sit back and watch and listen to what had been created, even if it wasn’t musically meaningful. Students in the “Composing with Mobile Technology” workshop were captivated by the software for extended periods of time, reportedly until the early morning in a few instances.

Auraglyph makes a number of distinctive visual design decisions—the simplistic line graphics, the windows that fold and unfold in a unique animation, the smooth pulses that travel between control nodes—that provide arguable functional benefit but are intended to provide small bits of ancillary satisfaction to the user as they develop their program. Significant effort was spent implementing these and other design elements (for instance, several weeks of development were dedicated to font rendering alone). That effort could have been directed towards more node types or more advanced
synthesis models, but we feel it was well spent as it was; the visual character and the way the software feels are as crucial as its functionality.

Visual design is not limited to purely cosmetic tweaks. The waveform display serves both as a visual decoration and as a tool for seeing how each node affects the audio signal, and the control signal pulses, while rendered in a visually unique way, serve as an important indicator of an Auraglyph program’s state. These features of Auraglyph were widely cited as positive aspects of working with the software.

Perhaps the most critical component of the design is the interactions themselves: how a series of touches or stylus strokes maps to a resulting state of the software. Participants in the user studies for Auraglyph appreciated the sketch-oriented, stream-of-consciousness approach to designing musical programs (so long as the handwriting recognition actually worked). Auraglyph was compared favorably in some aspects to similar software such as Pure Data, Reaktor, or digital audio workstations; despite having far less functionality than these systems, Auraglyph succeeded at presenting the functions it did have in an approachable and inviting way. When users speak of the diverse sonic possibilities and so-called “destructive creativity” that Auraglyph enables, they don’t mean that Auraglyph implements some fundamentally different way of achieving audio synthesis or music composition. On the contrary, its fairly basic set of operators consists chiefly of standard waveforms, basic filters, an ADSR envelope, delay, and white noise. Rather, the way that these basic elements are presented and how users are guided to interact with them—the design of the software—suggests these broad sonic capabilities and interactions.
Chapter 6
Conclusion

The best way to predict the future is to invent it.
Alan Kay
This dissertation has described the context, design, implementation, and evaluation of two systems for developing music software using mobile touch devices. The general approach to developing these systems was to consider which design features are actually appropriate for the touchscreen-based mobile technology medium. We believe this strategy applies as well to mobile touch technology as it does to whichever category of interaction next becomes widespread, be it virtual reality, augmented reality, or something yet unknown. The future of creative technology is not predestined; as its necessary inventors we are free to break down its conventions and build them anew.

The development of these systems has followed a general design framework for creating interactive mobile touch software. Some components of this framework were considered explicitly in the development of miniAudicle for iPad and Auraglyph whereas others were implicit; some have been carried over from the author’s and his colleagues’ previous work and some have only been realized in the implementation of the original research described in this dissertation. In the remaining pages, we will attempt to distill this framework into a form applicable to future work stemming from these research efforts and, we hope, to the work of others.
6.1 A Framework for Mobile Interaction

In our understanding, mobile interaction is a synthesis of touchscreen interaction and its immediate consequent, direct manipulation, together with dynamic graphics, physicality, networked communication, and identity (Figure 6.1).
Figure 6.1: Topology of mobile interaction. The diagram connects touch interaction, direct manipulation, dynamic graphics, physicality (physical shape and sensors), network, and identity.

Touch interaction, as we use the term here, is defined as a system in which a user interacts with a software application through the same screen that presents the application’s interface, using various touch-based gestures. These include simple tapping, holding a finger, and swiping a pressed finger in a direction or along some more complex path. These gestures can be further compounded by sequencing them in time (e.g., a double-tap) or by coordinating gestures from multiple fingers. Gestures are naturally continuous, but may also be discretized if appropriate to the action that results (for instance, swiping to turn a page or double-tapping to zoom in to a zoomable interface).

From a technical perspective, touch interaction introduces unique challenges. Touch positions must be managed and analyzed over time, and differences between two distinct gestures must generally be assessed according to heuristics that require tuning. For instance, analyzing a “swipe”
gesture requires ascertaining how far an individual touch must travel to be considered a swipe and how much perpendicular finger movement to tolerate within the swipe. Software implementing gestures must be able to distinguish between multiple similar gestures and correctly distribute gestures that could have more than one possible target.

As a whole, touch gestures readily lend themselves to basic desktop interaction metaphors, such as clicking buttons or hyperlinks. However, a powerful class of metaphor enabled by touch interaction is direct manipulation, in which an on-screen object is designed to represent some real-world object and may be manipulated and interacted with as such. Early touchscreen phone applications took such metaphors to garish extremes, including Koi Pond, which created a virtual pond within its user’s phone, with fish they could poke, and Smule Sonic Lighter [25], an on-screen lighter that was flicked to light, “burned” the edges of the screen if tilted, and could even be extinguished by blowing into the phone’s microphone.

Touch interaction and direct manipulation both depend on dynamic graphics. Dynamic graphics can be thought of as a considered pairing of computer graphics per se and animation, guided by principles of physical systems. Interactive objects can have momentum, resistance, or attractions and repulsions, and they can grow and shrink or fade in and out, both in response to user input and of their own accord. The exact properties are not as important as the overall effect of liveness and the suggestion that one might be interacting with a real object rather than a computer representation. While this can be achieved with full physics engines and constraint solvers, it is often sufficient to animate the motion of objects with simple exponential curves, an effective simulation of a restoring force such as a spring or rubber band (Figure 6.2). Pure linear motion, on the other hand, is stilted and robotic, easily exposed as the movement of a machine.

Figure 6.2: Linear (top) vs. exponential (bottom) motion. The latter resembles the motion of an object being acted upon by a restoring force, giving it a more life-like quality when animated.
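To make the contrast in Figure 6.2 concrete, exponential motion can be produced by repeatedly moving a value a fixed fraction of its remaining distance to a target, whereas linear motion adds a constant increment each frame. The ChucK sketch below applies the same idea to a control-rate parameter sweep; it is an illustrative example of the principle, not code from Auraglyph, and the constants are arbitrary. The same loop structure applies equally to animating an on-screen position.

// exponential approach: move a fraction of the remaining distance each tick
SinOsc osc => dac;
0.2 => osc.gain;

220.0 => float current;   // starting value
880.0 => float target;    // where we want to end up
0.1 => float rate;        // fraction of remaining distance covered per tick

while (current < target - 1.0)
{
    (target - current) * rate +=> current;  // large steps far away, small steps near
    current => osc.freq;
    10::ms => now;
}
target => osc.freq;
1::second => now;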
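The swipe heuristics described at the beginning of this section likewise reduce to a pair of tunable thresholds. The function below, written in ChucK purely for consistency with the other sketches in this document (the thresholds and the function itself are hypothetical, not taken from miniAudicle for iPad or Auraglyph), classifies a completed touch by its total horizontal travel and vertical drift.

// classify a completed touch as a horizontal swipe using two tuned thresholds
fun int isHorizontalSwipe(float dx, float dy, float minDistance, float maxDrift)
{
    // must travel far enough along x without wandering too far along y
    return Math.fabs(dx) >= minDistance && Math.fabs(dy) <= maxDrift;
}

// example: 120 points of horizontal travel with 15 points of vertical drift
<<< isHorizontalSwipe(120.0, 15.0, 80.0, 40.0) >>>;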
Direct manipulation also supports physicality, the third component of our mobile interaction framework. Physicality is most influenced, however, by the physical shape of the device itself. In fundamentally different ways from conventional computers, mobile devices can be easily picked up, gestured with, thrown, turned upside-down, stored in a pocket, and/or transported. They often do experience many of these interactions on a daily basis; their very utility depends on the ability to do so. The supplementary sensor interfaces of a device further contribute to its physicality. Most mobile devices are outfitted with an array of inertial and spatial sensors, allowing software to determine its location on Earth, its orientation, and its direction. These capabilities further integrate the mobile device into its user’s environment and physical presence.

Network is integral to our mobile interaction framework, though it is not directly related to the others. Mobile devices, and especially mobile phones, are communication tools first and foremost. In addition to their primary functionality as a telephone, mobile phones have developed an assortment of additional channels for text, voice, and video communication. Cloud computing trends have begun to treat mobile devices as nodes in the network lattice, constantly synchronizing their data to and from a central network store. In this sense, it is easy to think of mobile technology as being fundamentally inseparable from the network. From this perspective, the software applications described in this dissertation are flawed by not fundamentally engaging in networked interactions. One can easily imagine being able to synchronize control and timing information and even audio data between multiple Auraglyph performers, either through the local network or over the global internet. The power of network in mobile interaction is exemplified by Ocarina, whose networking component in fact completes the experience by drawing its users into a shared musical world. The fluid sharing of musical performances leverages the technology’s greatest advantage over a traditional instrument: the power to instantly communicate (musically) with the entire world.

Identity in a sense unites all of the parts of this framework and is also the most ephemeral of them. Identity asks creators of creative mobile software to consider their user’s relationship to the device itself. A mobile phone is a multi-purpose device ultimately linked to communication and relationships. A designer of mobile phone software is forced to consider that, at any moment, their
software might be forcibly interrupted by a FaceTime call from the user’s significant other, boss, parents, or child. A tablet is liable to be used lying on the couch, reading in the park, or watching Netflix in bed. These functions of the technology are not separable from the hardware itself, and the user’s relationship with the device will affect their perception of the software that runs on it. Mobile software design must take these factors into consideration.
6.2 Future Work

The completion of this research has prompted two primary goals for the near future. The first of these is to further develop an extended musical work using Auraglyph, continuing the author’s compositional endeavors described in Chapter 5. We firmly maintain that the most important appraisal of a new music technology is to actually make music with it. Development of the author’s performance practice, and hopefully that of others, will offer a continuing assessment of the ideas contained herein.

To this end, the second short-term goal for this research is to prepare miniAudicle for iPad and Auraglyph for release to the general public. In terms of reliability and required feature set, there is a considerable difference between software that is utilized and tested in a controlled environment and software that is distributed to many unknown users across the globe. While this gap has been closed somewhat, some essential pieces remain to be completed; these include additional node types, undo/copy/paste operations, and platform integration features like Inter-App Audio and Ableton Link, which have become expected of non-trivial mobile audio software. Support features like a website, example files, documentation, and demonstration and tutorial videos must be created to soften the learning curve of both applications and to market them effectively.

A variety of additional sound design and compositional interactions would benefit Auraglyph. We would like to enhance its mathematical processing capabilities with the option to write a mathematical equation and have it applied to audio or control signals. The ability to directly write conventional Western notation, freeform graphical scores, and structures for long-term musical form would supplement Auraglyph’s compositional toolset. Adding tools to work with natural input—for instance, audio analysis and video input and analysis—would also expand the
compositional possibilities afforded to users. Finishing the Composite node will enable users to build modules out of networks of nodes and will facilitate code reuse. The addition of nodes for rendering custom graphics will also complement the expressive capabilities and visual distinctiveness of Auraglyph.

There are interesting questions related to networking for both Auraglyph and miniAudicle for iPad. Network synchronization of tempo was explicitly asked for by Auraglyph users; it’s clear that local group performances could benefit from sharing timing information as well as general control data (for instance, pitches, keys, or frequencies) and raw audio waveforms. Distributing these collaborative audio programming interactions over a wide area network presents interesting challenges in terms of both design and implementation. Latency is always a concern when sharing real-time control or audio data over a wide area network; telematic performance practice has explored compositional approaches to managing latency, but these solutions are not always suitable for a given genre of music. Another possibility is to synchronize just the program structures over the network, with a single global program definition shared between programmers that generates parallel control and audio information on each system. Asynchronous models for collaborative programming are also worth exploring, in which a program can be built by multiple users over a longer period of time. Also needing research attention is how collaborative music programmers might find each other in a wide area networked framework: whether participants would need to coordinate with each other in advance, whether there would be some sort of matchmaking system, or whether some other method of discovering users with complementary musical aesthetics would be needed.

Auraglyph’s handwriting recognition system is fairly inflexible, being based on a software library designed for machine learning research rather than for iOS. It would be ideal to switch to a simpler handwriting recognition system, such as the $1 Stroke Recognizer [100]. A simpler system would assist the creation of new gesture sets, like musical notes, mathematical symbols, or the Roman alphabet, enabling further research into the integration of handwritten input into coding systems.

The original proposal for this dissertation included a concept for a supplementary programming language called Glyph; time constraints and design difficulties inhibited progress on this idea. This
language would mix textual and visual paradigms in a system of hand-drawn text tokens interconnected either by proximity or by drawn graphical structures. These ideas merit further exploration to improve Auraglyph’s handling of common programming tasks like loops and functional programming.

Further exploring the role of physical objects and writing tools in Auraglyph’s sketching paradigm suggests interesting avenues for research. Physical knobs, discrete tokens such as those used by the reacTable, or other musical controls placed on-screen present interesting possibilities to complement stylus and touch interaction. The development of other implements, such as an eraser or a paint brush, could facilitate additional classes of gestures and expand the physical/virtual metaphor underlying Auraglyph.

miniAudicle for iPad can also benefit from further research and development efforts. Supplementing the text editing features to move further away from dependence on a virtual keyboard could lead to greater support for expressive ChucK coding on the iPad. Such features might include machine learning-based text input prediction, or palettes of standard code templates for while loops, input handling, or control-rate audio modulation (a sketch of one such template appears at the end of this section). Research on additional programming capabilities like user interface design and three-dimensional graphics has already begun, as has development of functionality to share ChucK code over the network with other miniAudicle users and with the general web audience [111].

We are actively interested in how the concepts and methods described herein might be applied to new interaction media, such as virtual reality or augmented reality. Programmers might use their hands to connect programming nodes in a virtual space that can be managed at a high level of abstraction and then zoomed in on to examine low-level details. A virtual environment might also support networked, collaborative programming with other users around the world. An augmented reality system derived from these ideas might allow programmers to build programming structures that interact with their environment, co-located in space and interacting with elements of the real world.
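As a rough illustration of the kind of code template such a palette might insert (this is a hypothetical sketch, not an existing miniAudicle for iPad feature), a control-rate modulation loop in ChucK could look like the following, ready to be edited by the user:

// hypothetical palette template: control-rate modulation of a ugen parameter
SinOsc osc => dac;
0.3 => osc.gain;
now => time start;

while (true)
{
    // elapsed time in seconds drives a slow sine LFO on the oscillator frequency
    (now - start) / second => float t;
    440.0 + 40.0 * Math.sin(2.0 * pi * 0.5 * t) => osc.freq;
    10::ms => now;
}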
6.3 Final Remarks

The desire to utilize interactions unique to mobile touchscreen technology for music programming has led to the development of two new software environments for music computing. miniAudicle for iPad brings ChucK to a tablet form factor, enabling new kinds of interactions based on the familiar textual programming paradigm. Contrasting with this initial effort, we propose a new model for music computing combining touch and handwritten input; a software application called Auraglyph has been developed to embody one conception of this model. A series of studies of the use of these systems has provided validation for these ideas, while also revealing additional related concerns such as reliability and forgiveness. When these software applications have reached the level of functionality and reliability expected of publicly available products, miniAudicle for iPad will be released via https://ccrma.stanford.edu/~spencer/mini-ipad and Auraglyph will be released via its website, https://auragly.ph/.
Appendix A
Node Types in Auraglyph

A.1 Audio Nodes
SineWave
Standard sinusoidal oscillator.
Parameters: freq : Oscillator frequency; gain : Output gain.
Ports: freq : Oscillator frequency; gain : Output gain.

SawWave
Standard sawtooth wave oscillator.
Parameters: freq : Oscillator frequency; gain : Output gain.
Ports: freq : Oscillator frequency; gain : Output gain.

SquareWave
Standard square wave oscillator.
Parameters: freq : Oscillator frequency; width : Pulse width of wave as fraction of full wavelength; gain : Output gain.
Ports: freq : Oscillator frequency; width : Pulse width of wave as fraction of full wavelength; gain : Output gain.

TriWave
Standard triangle wave oscillator.
Parameters: freq : Oscillator frequency; gain : Output gain.
Ports: freq : Oscillator frequency; gain : Output gain.

Waveform
User-defined waveform oscillator.
Parameters: gain : Output gain; freq : Oscillator frequency; dur : Oscillator duration or wavelength (seconds).
Ports: freq : Oscillator frequency; gain : Output gain.

Noise
White noise generator.
Parameters: gain : Output gain.
Ports: gain : Output gain.

ADSR
Attack-decay-sustain-release (ADSR) envelope.
Parameters: gain : Output gain; attack : Attack duration (seconds); decay : Decay duration (seconds); sustain : Sustain level (linear amplitude); release : Release duration (seconds).
Ports: input : Input to apply envelope to; gain : Output gain; trigger : Envelope trigger (triggered for any value above 0).

Feedback
Delay processor with built-in feedback.
Parameters: gain : Output gain; delay : Delay length (seconds); feedback : Feedback gain.
Ports: input : Input signal; delay : Delay length (seconds); feedback : Feedback gain; gain : Output gain.

LowPass
Resonant low-pass filter (second-order Butterworth).
Parameters: gain : Output gain; freq : Filter cutoff frequency; Q : Filter Q (bandwidth).
Ports: input : Filter input; gain : Output gain; freq : Filter cutoff frequency; Q : Filter Q (bandwidth).

HiPass
Resonant high-pass filter (second-order Butterworth).
Parameters: gain : Output gain; freq : Filter cutoff frequency; Q : Filter Q (bandwidth).
Ports: input : Filter input; gain : Output gain; freq : Filter cutoff frequency; Q : Filter Q (bandwidth).

BandPass
Band-pass filter (second-order Butterworth).
Parameters: gain : Output gain; freq : Filter cutoff frequency; Q : Filter Q (bandwidth).
Ports: input : Filter input; gain : Output gain; freq : Filter cutoff frequency; Q : Filter Q (bandwidth).

Compressor
Dynamic range compressor node.
Parameters: threshold : Compressor threshold (dB); ratio : Compressor ratio; gain : Output gain.
Ports: input : Input signal.

Add
Adds a single value or, if there are multiple inputs, sums all inputs.
Parameters: add : Input(s) to add; gain : Output gain.
Ports: add : Quantity to add, if only one input.

Multiply
Multiplies a single input by a constant value, or multiplies inputs together if there is more than one.
Parameters: multiply : Input(s) to multiply together; gain : Output gain.
Ports: multiply : Quantity to multiply by, if only one input.

Input
Routes audio from an input device, such as a microphone.
Parameters: gain : Output gain.

Output
Routes audio to the final destination device, such as a speaker or headphones.
Parameters: gain : Output gain.
Ports: left : Left output channel; right : Right output channel.

Composite
Composite node containing a user-defined subprogram.
Parameters: gain : Output gain.
Ports: input : Input signal.
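For readers more at home in textual music programming, a minimal chain of these audio nodes can be loosely compared to a unit generator patch in ChucK. The sketch below is only an analogy for orientation (Auraglyph programs are drawn rather than typed, and this is not an export format or an official mapping): a SineWave node feeding a LowPass node feeding the Output node corresponds roughly to the following.

// rough ChucK analogue of an Auraglyph chain: SineWave -> LowPass -> Output
SinOsc osc => LPF lp => dac;
220.0 => osc.freq;   // SineWave freq parameter
0.5 => osc.gain;     // SineWave gain parameter
800.0 => lp.freq;    // LowPass cutoff frequency
2.0 => lp.Q;         // LowPass Q (bandwidth)

while (true) { 100::ms => now; }  // keep the patch running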
A.2 Control Nodes
Timer
Emits pulses at the specified interval.
Parameters: interval : Timer fire interval (seconds).
Ports: interval : Timer fire interval (seconds).

Array
Stores an array of values.
Ports: iterate : Advance array by one item and push that item.

Sequencer
Discrete step sequencer for control values.
Parameters: bpm : BPM of sequencer.
Ports: advance : Triggers step to advance by one; bpm : BPM of sequencer.

midi2freq
Converts MIDI pitch input to frequency value.
Ports: midi : MIDI note input control value.

Add
Adds constant value to input.
Parameters: add : Quantity to add to input.
Ports: add : Input control value.

Multiply
Multiplies input by constant value.
Parameters: mult : Quantity to multiply input by.
Ports: mult : Input control value.

Orientation
Outputs Euler angles (rotation about X, Y, and Z axes) corresponding to device orientation.
Parameters: rate : Rate at which to read sensors.
Ports: read : Triggers sensor reading and output.
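For reference, the midi2freq conversion described above corresponds to the standard equal-tempered MIDI-to-frequency mapping (the same mapping provided by Std.mtof in ChucK; assuming the standard mapping here, as the table does not specify the tuning):

f = 440 * 2^((m - 69) / 12)

For example, MIDI note 69 maps to 440 Hz, and MIDI note 60 (middle C) maps to approximately 261.63 Hz.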
Appendix B
User Study Documentation

Included here are the surveys used to solicit quantitative and qualitative user feedback during the evaluation phase of this research. Questions related to the modified Creative Support Index (CSI) were displayed in a random order so as not to reveal that the questions were paired. The category of each modified CSI question (enjoyment, exploration, expressiveness, immersion, worth effort, learnability) is displayed here for reference but was not shown to participants in the user studies.
B.1 miniAudicle for iPad Entry Survey

Please answer the following questions with a number from 1-5.
1 = highly disagree, 2 = disagree, 3 = neutral/unsure, 4 = agree, 5 = highly agree

1. I am a skilled musician or composer.
2. I am a skilled computer programmer.
3. I am skilled with the ChucK music programming language.
4. I use the following coding editors/environments a few times per week or more (select all that apply)
(a) Xcode
(b) Processing
(c) Sublime Text
(d) Eclipse
(e) Visual Studio
(f) vi/vim
(g) emacs
(h) miniAudicle
(i) Reaktor
(j) Max/MSP
(k) PureData
(l) SuperCollider

5. I use the following audio editors/digital audio workstations (DAWs) a few times per week or more (select all that apply)
(a) Ableton Live
(b) Logic Pro
(c) ProTools
(d) Cubase
(e) Digital Performer
(f) Audacity
(g) Reaper
(h) Ardour
B.2 miniAudicle for iPad Exit Survey

1. How much time did you spend using miniAudicle for iPad?
(a) Less than 1 hour
(b) 2 hours
(c) 3 hours
(d) 4 hours
(e) 5 or more hours

Please answer the following questions with a number from 1-5. Remember, this survey is anonymous, so your honesty is appreciated.
1 = highly disagree, 2 = disagree, 3 = neutral/unsure, 4 = agree, 5 = highly agree

3. I would be happy to use miniAudicle for iPad on a regular basis. (enjoyment)
4. I enjoyed using miniAudicle for iPad. (enjoyment)
5. It was easy for me to explore many different ideas, options, designs, or outcomes using miniAudicle for iPad. (exploration)
6. miniAudicle for iPad was helpful in allowing me to track different ideas, outcomes, or possibilities. (exploration)
7. I was able to be creative while making music with miniAudicle for iPad. (expressiveness)
8. miniAudicle for iPad allowed me to be very expressive. (expressiveness)
9. My attention was fully tuned to coding/music-making, and I forgot that I was using miniAudicle for iPad. (immersion)
10. I became so absorbed in coding/music-making that I forgot that I was using miniAudicle for iPad. (immersion)
11. I was satisfied with the work I made using miniAudicle for iPad. (worth effort)
12. What I made in miniAudicle for iPad was worth the effort I put in to make it. (worth effort)
13. I felt like I learned how to use miniAudicle for iPad better as I used it more. (learnability)
14. Using miniAudicle for iPad became easier over time. (learnability)

Please answer the following questions in writing. Remember, this survey is anonymous, so your honesty is appreciated.

19. What (if anything) did you *like* about working with code in miniAudicle for iPad compared to working with ChucK code on a desktop or laptop?
20. What (if anything) did you *dislike* about working with code in miniAudicle for iPad compared to working with ChucK code on a desktop or laptop?
21. What (if any) musical possibilities do you feel miniAudicle for iPad can enable, compared to desktop music coding?
22. What (if any) musical possibilities do you feel are difficult or impossible with miniAudicle for iPad, compared to desktop music coding?
23. Any additional feedback, thoughts, or concerns regarding miniAudicle for iPad?
B.3 Auraglyph Survey

Please answer the following questions with a number from 1-5.
1 = Not comfortable at all, 5 = very comfortable

1. Please rate your overall comfort level with music programming tools like Max/MSP, Reaktor, ChucK, SuperCollider, Pd, etc.
2. Please rate your overall comfort level performing with a musical instrument or voice.
3. Please rate your overall comfort level with composing music, producing songs, and/or songwriting.
4. Please rate your overall comfort level with digital audio workstation (DAW) software, notation software, or similar tools.
5. How much time did you spend using Auraglyph?
(a) Less than 1 hour
(b) 2 hours
(c) 3 hours
(d) 4 or more hours

Please answer the following questions with a number from 1-5. Remember, this survey is anonymous, so your honesty is appreciated.
1 = highly disagree, 2 = disagree, 3 = neutral/unsure, 4 = agree, 5 = highly agree

6. I would be happy to use Auraglyph on a regular basis. (enjoyment)
7. I enjoyed using Auraglyph. (enjoyment)
8. It was easy for me to explore many different ideas, options, designs, or outcomes using Auraglyph. (exploration)
9. Auraglyph was helpful in allowing me to track different ideas, outcomes, or possibilities. (exploration)
10. I was able to be creative while making music with Auraglyph. (expressiveness)
11. Auraglyph allowed me to be very expressive. (expressiveness)
12. My attention was fully tuned to music-making, and I forgot that I was using Auraglyph. (immersion)
13. I became so absorbed in coding/music-making that I forgot that I was using Auraglyph. (immersion)
14. I was satisfied with the work I made using Auraglyph. (worth effort)
15. What I made in Auraglyph was worth the effort I put in to make it. (worth effort)
16. I felt like I learned how to use Auraglyph better as I used it more. (learnability)
17. Using Auraglyph became easier over time. (learnability)

Please answer the following questions in writing. Remember, this survey is anonymous, so your honesty is appreciated.

18. What (if anything) did you *like* about working with Auraglyph, compared to other musical tools you’ve used?
19. What (if anything) did you *dislike* about working with Auraglyph, compared to other musical tools you’ve used?
20. What (if any) musical possibilities do you feel Auraglyph can enable, compared to other musical tools you’ve used?
21. What (if any) musical possibilities do you feel are difficult or impossible with Auraglyph, compared to other musical tools you’ve used?
22. Any additional feedback, thoughts, or concerns regarding Auraglyph?
Bibliography

[1] M. V. Mathews, “The digital computer as a musical instrument,” Science, vol. 142, no. 3592, pp. 553–557, 1963. [2] S. Salazar, G. Wang, and P. Cook, “miniAudicle and ChucK Shell: New interfaces for ChucK development and performance,” in Proceedings of the International Computer Music Conference, 2006, pp. 63–66. [3] G. Wang and P. R. Cook, “The Audicle: A context-sensitive, on-the-fly audio programming environ/mentality,” in Proceedings of the International Computer Music Conference, 2004, pp. 256–263. [4] G. Wang, P. R. Cook, and S. Salazar, “ChucK: A strongly timed computer music language,” Computer Music Journal, vol. 39, no. 4, pp. 10–29, 2015. [5] P. R. Cook and G. Scavone, “The Synthesis ToolKit (STK),” in Proceedings of the International Computer Music Conference, 1999, pp. 164–166. [6] G. Wang, A. Misra, P. Davidson, and P. R. Cook, “CoAudicle: A collaborative audio programming space,” in Proceedings of the International Computer Music Conference, 2005. [7] A. Sorensen, “Impromptu: An interactive programming environment for composition and performance,” in Proceedings of the Australasian Computer Music Conference, 2005. [8] J. McCartney, “Rethinking the computer music language: SuperCollider,” Computer Music Journal, vol. 26, no. 4, pp. 61–68, 2002.
[9] N. Collins, A. McLean, J. Rohrhuber, and A. Ward, “Live coding in laptop performance,” Organised Sound, vol. 8, no. 3, pp. 321–330, 2003. [10] T. Magnusson, “ixi lang: a SuperCollider parasite for live coding,” in Proceedings of International Computer Music Conference, 2011, pp. 503–506. [11] S. Aaron and A. F. Blackwell, “From Sonic Pi to Overtone: Creative musical experiences with domain-specific and functional languages,” in Proceedings of the First ACM SIGPLAN Workshop on Functional Art, Music, Modeling and Design. ACM, 2013, pp. 35–46. [12] ——, “From Sonic Pi to Overtone: Creative musical experiences with domain-specific and functional languages,” in Proceedings of the first ACM SIGPLAN workshop on Functional art, music, modeling & design. ACM, 2013, pp. 35–46.
[13] M. Downie and P. Kaiser, “Welcome to Field: a development environment for making digital art,” accessed August 4, 2015. [Online]. Available: http://openendedgroup.com/field/ [14] M. Puckette, “Pure Data: Another integrated computer music environment,” in Proceedings of the Second Intercollege Computer Music Concerts, 1996, pp. 37–41. [15] D. Zicarelli, “An extensible real-time signal processing environment for MAX,” in Proceedings of the International Computer Music Conference, 1998. [16] M. Puckette, “Max at seventeen,” Computer Music Journal, vol. 26, no. 4, pp. 31–43, 2002. [17] ——, “Using Pd as a score language,” in Proceedings of the International Computer Music Conference, 2002. [18] M. Danks, “The graphics environment for Max,” in Proceedings of the 1996 International Computer Music Conference, 1996, pp. 67–70. [19] ——, “Real-time image and video processing in GEM,” in Proceedings of the International Computer Music Conference, 1997. [20] C. Scaletti, “The Kyma/Platypus computer music workstation,” Computer Music Journal, vol. 13, no. 2, pp. 23–38, 1989.
[21] ——, “Computer music languages, Kyma, and the future,” Computer Music Journal, vol. 26, no. 4, pp. 69–82, 2002. [22] M. Minnick, “A graphical editor for building unit generator patches,” in Proceedings of the International Computer Music Conference, 1990. [23] J. Smith, Personal communication, Mar. 2017. [24] V. Norilo, “Visualization of signals and algorithms in Kronos,” in Proceedings of the International Conference on Digital Audio Effects, 2012. [25] G. Wang, G. Essl, J. Smith, S. Salazar, P. Cook, R. Hamilton, R. Fiebrink, J. Berger, D. Zhu, M. Ljungstrom et al., “Smule= sonic media: An intersection of the mobile, musical, and social,” in Proceedings of the International Computer Music Conference, 2009, pp. 16–21. [26] G. Wang, “Designing Smule’s Ocarina: The iPhone’s magic flute,” in Proceedings of the International Conference on New Interfaces for Musical Expression, 2009, pp. 303–307. [27] ——, “Ocarina: Designing the iPhone’s magic flute,” Computer Music Journal, vol. 38, no. 2, 2014. [28] G. Wang, J. Oh, S. Salazar, and R. Hamilton, “World Stage: A crowdsourcing paradigm for social/mobile music,” in Proceedings of the International Computer Music Conference, 2011. [29] G. Wang, S. Salazar, J. Oh, and R. Hamilton, “World Stage: Crowdsourcing paradigm for expressive social mobile music,” Journal of New Music Research, vol. 44, no. 2, pp. 112– 128, 2015. [30] G. Wang, J. Oh, and T. Lieber, “Designing for the iPad: Magic Fiddle,” in Proceedings of the International Conference on New Interfaces for Musical Expression, 2011, pp. 197–202. [31] D. Trueman, P. Cook, S. Smallwood, and G. Wang, “PLOrk: the Princeton Laptop Orchestra, year 1,” in Proceedings of the International Computer Music Conference, 2006, pp. 443–450. [32] S. Smallwood, D. Trueman, P. R. Cook, and G. Wang, “Composing for Laptop Orchestra,” Computer Music Journal, vol. 32, no. 1, pp. 9–25, 2008.
[33] G. Wang, N. Bryan, J. Oh, and R. Hamilton, “Stanford Laptop Orchestra (SLOrk),” in Proceedings of the International Computer Music Conference, 2009. [34] R. Fiebrink, G. Wang, and P. R. Cook, “Don’t forget the laptop: Using native input capabilities for expressive musical control,” in Proceedings of the International Conference on New Interfaces for Musical Expression, 2007, pp. 164–167. [35] D. Trueman, “Why a laptop orchestra?” Organised Sound, vol. 12, no. 2, pp. 171–179, 2007. [36] C. Bahn and D. Trueman, “Interface: electronic chamber ensemble,” in Proceedings of the International Conference on New Interfaces for Musical Expression, 2001. [37] D. Trueman and P. Cook, “BoSSa: The deconstructed violin reconstructed,” Journal of New Music Research, vol. 29, no. 2, pp. 121–130, 2000. [38] D. Trueman, C. Bahn, and P. Cook, “Alternative voices for electronic sound,” The Journal of the Acoustical Society of America, vol. 108, no. 5, pp. 2538–2538, 2000. [39] P. R. Cook, “Remutualizing the musical instrument: Co-design of synthesis algorithms and controllers,” Journal of New Music Research, vol. 33, no. 3, pp. 315–320, 2004. [40] J. Oh, J. Herrera, N. J. Bryan, L. Dahl, and G. Wang, “Evolving the Mobile Phone Orchestra,” in Proceedings of the International Conference on New Interfaces for Musical Expression, 2010, pp. 82–87. [41] N. J. Bryan, J. Herrera, J. Oh, and G. Wang, “MoMu: A mobile music toolkit,” in Proceedings of the International Conference on New Interfaces for Musical Expression, 2010. [42] M. Wright and A. Freed, “Open Sound Control: A new protocol for communicating with sound synthesizers,” in Proceedings of the International Computer Music Conference, 1997, pp. 101–104. [43] S. Tarakajian, D. Zicarelli, and J. K. Clayton, “Mira: Liveness in iPad controllers for Max/MSP,” in Proceedings of the International Conference on New Interfaces for Musical Expression, 2013.
[44] C. Roberts, “Control: Software for end-user interface programming and interactive performance,” in Proceedings of the International Computer Music Conference, 2011. [45] C. Roberts, G. Wakefield, and M. Wright, “Mobile controls on-the-fly: An abstraction for distributed NIMEs,” in Proceedings of the International Conference on New Interfaces for Musical Expression, 2012. [46] G. Geiger, “PDa: Real time signal processing and sound generation on handheld devices,” in Proceedings of the International Computer Music Conference, 2003, pp. 283–286. [47] ——, “Using the touch screen as a controller for portable computer music instruments,” in Proceedings of the International Conference on New Interfaces for Musical Expression. IRCAM—Centre Pompidou, 2006, pp. 61–64. [48] P. Brinkmann, P. Kirn, R. Lawler, C. McCormick, M. Roth, and H.-C. Steiner, “Embedding Pure Data with libpd,” in Proceedings of the Pure Data Convention, 2011. [49] D. Iglesia, “The mobility is the message: The development and uses of MobMuPlat,” in Proceedings of the 5th International Pure Data Convention, 2016. [50] J. Silva, “Jazzmutant Lemur: Touch-sensitive MIDI controller,” Sound on Sound, no. 3, 2007. [51] S. Jordà, G. Geiger, M. Alonso, and M. Kaltenbrunner, “The reacTable: Exploring the synergy between live music performance and tabletop tangible interfaces,” in Proceedings of the 1st International Conference on Tangible and Embedded Interaction. ACM, 2007, pp. 139–146. [52] P. L. Davidson and J. Y. Han, “Synthesis and control on large scale multi-touch sensing displays,” in Proceedings of the International Conference on New Interfaces for Musical Expression, 2006, pp. 216–219. [53] J. Chadabe, Electric Sound: The Past and Promise of Electronic Music. Prentice Hall, 1997.
[54] D. W. Bernstein, “The San Francisco Tape Music Center: Emerging art forms and the American counterculture, 1961-1966,” in The San Francisco Tape Music Center: 1960s Counterculture and the Avant-Garde, D. W. Bernstein, Ed. Berkeley, CA: University of California Press, 2008, pp. 5–41. [55] D. Wessel, R. Avizienis, A. Freed, and M. Wright, “A force sensitive multi-touch array supporting multiple 2-d musical control structures,” in Proceedings of the International Conference on New Interfaces for Musical Expression, 2007, pp. 41–45. [56] L. Sasaki, G. Fedorkow, W. Buxton, C. Retterath, and K. C. Smith, “A touch-sensitive input device,” in Proceedings of the International Computer Music Conference, 1981. [57] C. F. Herot and G. Weinzapfel, “One-point touch input of vector information for computer displays,” in Proceedings of the 5th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), vol. 12, no. 3. ACM, 1978, pp. 210–216. [58] G. Essl and A. Müller, “Designing mobile musical instruments and environments with urMus,” in Proceedings of the International Conference on New Interfaces for Musical Expression, 2010, pp. 76–81. [59] N. Tillmann, M. Moskal, J. de Halleux, and M. Fahndrich, “TouchDevelop: Programming cloud-connected mobile devices via touchscreen,” in Proceedings of the 10th SIGPLAN Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software (ONWARD ’11). ACM, 2011, pp. 49–60. [60] T. Y. Levin, “‘Tones from out of Nowhere’: Rudolph Pfenninger and the archaeology of synthetic sound,” Grey Room, no. 12, pp. 32–79, 2003. [61] L. Moholy-Nagy, “Production–reproduction,” in Moholy-Nagy, K. Passuth, Ed. London: Thames and Hudson, 1985, pp. 289–290. [62] ——, “New form in music. Potentialities of the phonograph,” in Moholy-Nagy, K. Passuth, Ed. London: Thames and Hudson, 1985, pp. 291–292.
[63] ——, “Problems of the modern film,” in Moholy-Nagy, K. Passuth, Ed. London: Thames and Hudson, 1985, pp. 311–315. [64] N. Izvolov, “The history of drawn sound in Soviet Russia,” Animation Journal, vol. 7, no. 2, pp. 54–59, 1998, trans. James Mann. [65] A. Smirnov, Sound in Z: Experiments in Sound and Electronic Music in Early 20th Century Russia. London: Koenig Books, 2013. [66] N. McLaren, Pen Point Percussion. National Film Board of Canada, 2006. [67] W. E. Jordan, “Norman McLaren: His career and techniques,” The Quarterly of Film Radio and Television, vol. 8, no. 1, pp. 1–14, 1953. [68] N. McLaren and W. Jordan, “Notes on animated sound,” The Quarterly of Film Radio and Television, vol. 7, no. 3, pp. 223–229, 1953. [69] S. Kreichi, “The ANS synthesizer: Composing on a photoelectronic instrument,” Leonardo, vol. 28, no. 1, pp. 59–62, 1995. [70] J. Hutton, “Daphne Oram: innovator, writer and composer,” Organised Sound, vol. 8, no. 1, pp. 49–56, 2003. [71] I. E. Sutherland, “Sketchpad: A man-machine graphical communication system,” in Proceedings of the Spring Joint Computer Conference. ACM, May 1963, pp. 329–346.
[72] M. R. Davis and T. Ellis, “The RAND tablet: A man-machine graphical communication device,” in Proceedings of the Fall Joint Computer Conference. ACM, 1964, pp. 325–331. [73] W. H. Ware, P. Chalk, R. Warnes, L. Clutterbuck, A. K. Winn, and S. N. Kirby, RAND and the information evolution: A history in essays and vignettes. RAND Corporation, 2008. [74] T. O. Ellis, J. F. Heafner, and W. Sibley, “The GRAIL language and operations,” RAND Corporation, Tech. Rep. RM-6001-ARPA, 1969.
[75] G. F. Groner, R. Clark, R. Berman, and E. C. DeLand, “BIOMOD: An interactive computer graphics system for modeling,” in Proceedings of the Fall Joint Computer Conference. ACM, 1971, pp. 369–378. [76] H. Lohner, “The UPIC system: A user’s report,” Computer Music Journal, vol. 10, no. 4, pp. 42–49, 1986. [77] R. Squibbs, “Images of sound in Xenakis’s Mycenae-Alpha,” Proceedings of the Third Journées d’Informatique Musicale JIM, vol. 96, pp. 208–219, 1996. [78] T. Coduys and G. Ferry, “Iannix: Aesthetical/symbolic visualisations for hypermedia composition,” in Proceedings of the Sound and Music Computing Conference, 2004, pp. 18–23. [79] G. Marino, M.-H. Serra, and J.-M. Raczinski, “The UPIC system: Origins and innovations,” Perspectives of New Music, vol. 31, no. 1, pp. 258–269, 1993. [80] P. Manning, Electronic and Computer Music. Oxford University Press, 2013.
[81] J. Appleton, “A complex tool for performance, teaching, and composition,” Music Educators Journal, vol. 69, no. 5, pp. 67–67, 1983. [82] I. Fujinaga, “Adaptive optical music recognition,” Ph.D. dissertation, McGill University, Montreal, Canada, 1996. [83] ——, “Exemplar-based learning in adaptive optical music recognition system,” in Proceedings of the International Computer Music Conference, 1996, pp. 55–56. [84] I. Fujinaga, B. Pennycook, and B. Alphonce, “Computer recognition of musical notation,” in Proceedings of the First International Conference on Music Perception and Cognition, 1989, pp. 87–90. [85] W. Buxton, R. Sniderman, W. Reeves, S. Patel, and R. Baecker, “The evolution of the SSSP score editing tools,” Computer Music Journal, vol. 3, no. 4, pp. 14–60, 1979.
[86] A. Forsberg, M. Dieterich, and R. Zeleznik, “The music notepad,” in Proceedings of the 11th Annual ACM Symposium on User Interface Software and Technology. ACM, 1998, pp. 203–210. [87] H. Miyao and M. Maruyama, “An online handwritten music symbol recognition system,” International Journal of Document Analysis and Recognition (IJDAR), vol. 9, no. 1, pp. 49–58, 2007. [88] J. A. Landay, “Interactive sketching for the early stages of user interface design,” Ph.D. dissertation, Carnegie Mellon University, 1996. [89] J. Garcia, T. Tsandilas, C. Agon, and W. Mackay, “PaperComposer: Creating interactive paper interfaces for music composition,” in Proceedings of the 26th Conference on l’Interaction Homme-Machine. ACM, 2014, pp. 1–8. [90] G. Paine, “Interfacing for dynamic morphology in computer music performance,” in Proceedings of the 2007 International Conference on Music Communication Science, 2007, pp. 115–118. [91] S. Cazan, Personal communication, Feb. 2017. [92] M. Wright, A. Freed, and D. Wessel, “New musical control structures from standard gestural controllers,” in Proceedings of the International Computer Music Conference, 1997, pp. 387–390.
[96] D. Van Nort, M. Wanderley, and P. Depalle, “Mapping control structures for sound synthesis: Functional and topological perspectives,” Computer Music Journal, vol. 38, no. 3, 2014. [97] S. Salazar and G. Wang, “Chugens, chubgraphs, chugins: 3 tiers for extending ChucK,” in Proceedings of the International Computer Music Conference, 2012. [98] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Backpropagation applied to handwritten zip code recognition,” Neural Computation, vol. 1, no. 4, pp. 541–551, 1989. [99] S. Madhvanath, D. Vijayasenan, and T. M. Kadiresan, “LipiTk: A generic toolkit for online handwriting recognition,” in ACM SIGGRAPH 2007 courses. ACM, 2007, p. 13. [100] J. O. Wobbrock, A. D. Wilson, and Y. Li, “Gestures without libraries, toolkits or training: A $1 recognizer for user interface prototypes,” in Proceedings of the 20th annual ACM symposium on user interface software and technology.
ACM, 2007, pp. 159–168.
[101] B. Buxton, Sketching User Experiences: Getting the Design Right and the Right Design. Morgan Kaufmann, 2007. [102] B. Verplank, “Interaction design sketchbook,” Unpublished paper for Stanford course Music 250a, 2003. [103] J. Berger, “Drawing is discovery,” in Selected Essays of John Berger, G. Dyer, Ed. Vintage, 2008. [104] J. Garcia, T. Tsandilas, C. Agon, W. Mackay et al., “InkSplorer: Exploring musical ideas on paper and computer,” in Proceedings of the International Conference on New Interfaces for Musical Expression, 2011. [105] B. Buxton, Human Input to Computer Systems: Theories, Techniques and Technology. Unpublished, 2011, available https://www.billbuxton.com/inputManuscript.html. [106] G. Wang, “Principles of visual design for computer music,” in Proceedings of the International Computer Music Conference, 2014.
[107] R. Niels, L. Vuurpijl et al., “Using dynamic time warping for intuitive handwriting recognition,” in Advances in Graphonomics, Proceedings of the 12th Conference of the International Graphonomics Society, 2005, pp. 217–221. [108] E. Cherry and C. Latulipe, “Quantifying the creativity support of digital tools through the creativity support index,” ACM Transactions on Computer-Human Interaction, vol. 21, no. 4, p. 21, 2014. [109] M. M. Wanderley and N. Orio, “Evaluation of input devices for musical expression: Borrowing tools from HCI,” Computer Music Journal, vol. 26, no. 3, pp. 62–76, 2002. [110] A. Kapur, M. Darling, D. Diakopoulos, J. W. Murphy, J. Hochenbaum, O. Vallis, and C. Bahn, “The Machine Orchestra: An ensemble of human laptop performers and robotic musical instruments,” Computer Music Journal, vol. 35, no. 4, pp. 49–63, 2011. [111] S. Salazar and M. Cerqueira, “ChuckPad: Social coding for computer music,” in Proceedings of the International Conference on New Interfaces for Musical Expression, 2017.