A FRAMEWORK FOR PERSONALIZATION OF INTERACTIVE SOUND SYNTHESIS

David Gerhard and Daryl H. Hepting
Department of Computer Science, University of Regina, Regina, SK, CANADA S4S 0A2
{dhh,gerhard}@cs.uregina.ca

ABSTRACT

Instruments like the piano or guitar have a long tradition in many cultures, such that non-musicians who encounter them understand that the piano keys can be pressed and the guitar strings can be plucked. Users of computer-based sound synthesis tools must instead use parameter names and interface feedback to develop a model of the available sound space of the instrument. Not all users attribute the same weight to the parameters chosen by the tool designer: given the opportunity, users may group parameters together or prefer entirely different conceptual models. Currently, the user is forced to adhere to the cognitive model of the available musical space developed by the instrument designer. If the user is instead allowed to develop a personalized view of the available sound space, the interaction will be more natural and will fit better with the user's cognitive model.

1. INTRODUCTION

Computer-based music synthesis tools such as Max/MSP and PD have provided substantial capabilities to a diverse population. Like programming languages, however, these tools require some mastery to use effectively. On the one hand, this is akin to the development of virtuoso performance skills on any instrument. On the other hand, the computer may present to many an insurmountable barrier to creating music.

For any instrument, it is natural to expect that an experienced player will be able to elicit a wider range of sounds than a novice player, up to the limitations of the instrument. An experienced player may even develop non-traditional techniques, like plucking the strings of a piano by reaching into the sound chamber. Although plucking the strings of a piano is never explicitly precluded, it is the purpose to which the technique is applied that shapes the value judgment.

In order to democratize the composition of music, this paper considers the establishment of a personalized basis for interaction with these composition tools. It is generally true that someone with experience and training will hear more nuances in sounds. If an interface can embody the user's current mental model [1] of the available sound space, the interface can scale gracefully from novice to virtuoso performance support, reflecting an important trait of easy-to-learn interfaces [2].

To establish a basis for perceptual interaction, the user must be able to locate a sound in his or her perceptual space, name the dimensions of that space, and use those dimensions to control the sound synthesis. Perception and comprehension of sound [3, 4] require more sequential processing than visual stimuli, so the work of locating the sounds must be done by pairwise comparison, without any a priori distinctions. The results would then be processed by multidimensional scaling [5]. Naming the dimensions would likewise require pairwise comparisons of samples at opposite ends of each dimension.

David Wessel [6] describes two constraints on musical interfaces: a low entry fee with no ceiling on virtuosity. Interactive musical interfaces should be easy to learn and should expand with the user's development. We present a third consideration: the direction of growth of the instrument should not be fixed beforehand. With most traditional musical instruments, the form and playing methods are, for the most part, set by the initial design of the instrument, and the player must spend much effort to learn how to make the instrument do what is desired. Further, the instrument is only ever intended to be played in this particular way, so the exploration of alternative methods of interaction is not available.

Interface paradigms exist that are defined by experienced musicians working in computer music, and they presuppose deep semantic knowledge of the domain (relating to the meaning of sounds) and good syntactic knowledge (relating to the rules for manipulating the sounds). If a novice has neither semantic nor syntactic knowledge, then the "entry fee" is inadvertently increased and may present a sizable barrier to overcome. Similarly, the cognitive model and parameter mapping of an interface may seem natural to the designer, but the underlying (and perhaps unconscious) semantic and syntactic knowledge may not be available to a user with different experience than the designer.

Programs such as Reason attempt to provide the user with a familiar interface metaphor for the available synthesis techniques. An example of the Reason interface is shown in Figure 1. Individuals familiar with the mixer-in-a-rack paradigm will find this interface useful. The advantage of this system is that it can expand the paradigm without compromising familiarity.

The Reason interface can do many things that a physical mixer in a rack cannot; however, those unfamiliar with the paradigm gain no advantage and may indeed be intimidated by the initial perception of complexity.

Figure 1. User interface for Reason.

What we propose, then, is to put the user in direct relationship with the sounds the instrument can make, so that she can judge them, formulate constructs about how the sounds are related to one another, and build an interface which makes use of those constructs. Different users may hear different relationships between sounds. Rather than trying to guess or to define these beforehand, the user is free to explore her own (perhaps unconscious) cognitive model. The discovery of these cognitive models using traditional mediated interactions may be susceptible to phenomena like verbal overshadowing [7], where the perceptual goal may be obscured by language.

Norman [8] described two gulfs in which a user may languish: the gulf of execution, between the user's goal and the commands needed to achieve that goal, and the gulf of evaluation, between the current system state and the user's interpretation of that state. In a satisfying interface, these gulfs are minimized. Without support, composers may be unsure how to generate a particular sound or how to modify an existing sound to achieve their goal.

The rest of this paper is organized as follows. Section 2 describes the exploration of parameter spaces. Section 3 proposes the use of this framework to explore PD patches. Section 4 details the framework as a whole. Section 5 discusses the implications of this approach and presents some preliminary results.

2. PARAMETER SPACE NAVIGATION

Parameter space navigation is important so that composers and patch makers alike can understand the range of what is possible with an instrument or patch. When faced with a new patch or a new synthesis engine, the user must first experiment before she is capable of producing the desired results. For example, one of the standard synthesis patches is the [moog~] patch, implementing the synthesis technique named for the synthesizer pioneer Robert Moog. The patch itself has 3 inlets, implying to the novice user that there are 3 parameters relevant to the operation of this synthesis engine. Upon further exploration (including reading the help file) we find that the designers of the tool provided 5 parameters with which to explore the space of available sounds (Figure 2). This still may not expose the full dimensionality of the synthesis engine: the parameters made available by the designers define what we can do with the tool.

Figure 2. An interface to the [moog~] patch with 5 parameters.

Knowing the dimensionality of a parameter space is crucial to mapping from a personalized perceptual parameter space to an implementation-based parameter space. Consider a view of a multidimensional parameter space formed by projecting down to 2 dimensions. Changing the viewing order of the parameters changes the distances between points. Changing views is like switching the order of levels in a decision tree: the same information is presented, but the relationships between points in the space are not preserved. It is therefore critical to identify the perceptual similarity of points in the parameter space, so that similar points can generate similar sounds.
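To make this concrete, here is a minimal sketch (in Python, assuming only NumPy; the three parameter settings are invented for illustration) showing that two different 2-D views of the same points preserve different pairwise distances:

    import numpy as np

    # Three settings in a hypothetical 3-D parameter space
    # (cutoff, resonance, gain); values invented for illustration.
    points = np.array([
        [0.1, 0.9, 0.5],
        [0.1, 0.1, 0.5],
        [0.9, 0.9, 0.5],
    ])

    def pairwise_distances(p):
        """Euclidean distance between every pair of rows."""
        diff = p[:, None, :] - p[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))

    # Project by keeping two parameters, analogous to choosing which
    # parameters a 2-D view displays.
    view_a = points[:, [0, 1]]  # cutoff vs. resonance
    view_b = points[:, [1, 2]]  # resonance vs. gain

    print(pairwise_distances(view_a))  # points 0 and 2 are far apart
    print(pairwise_distances(view_b))  # points 0 and 2 coincide

Neighbours in one view need not be neighbours in another, which is why perceptual similarity, rather than any single projection, should organize the space.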

Design Galleries [9] defined a means to explore a parameter space given an a priori specification of important characteristics and a means to evaluate them. With that specification, the system would return a sampling of distinct outputs. The problem with that approach is the requirement to specify before exploring: if the parameterization is done in such a way that the generated alternatives differ very little, it will be hard to find the desired one.

As discussed in Section 1, differences in semantic and syntactic knowledge make the learning curve for a musical interface different for each user. Shneiderman [10] distinguished semantic from syntactic knowledge, thereby distinguishing the four user types presented in Table 1.

1. Low-Syntactic, High-Semantic.
2. High-Syntactic, High-Semantic.
3. Low-Syntactic, Low-Semantic.
4. High-Syntactic, Low-Semantic.

Table 1. The four kinds of users of this framework.

Figure 3. Screenshot with visual representation of PD output. The user can play the sound file associated with each cell, one at a time.

In the context of this framework, these four kinds of user correspond to:

1. an experienced composer with little or no computer or DSP knowledge;
2. an experienced composer with computer experience;
3. an inexperienced musician with little or no computer or DSP knowledge; and
4. an inexperienced musician with computer experience.

The cogito interface is intended to separate users from the requirements of the syntax; however, once they become familiar with the paradigm, they may be quite willing to manipulate things directly. Ideally, users will be able to exercise both local and global control over their search.

3. EXPLORATION OF PD PATCHES

To facilitate automatic generation of alternative sounds produced by PD, it is important to be able to invoke the program in batch or command-line mode, without direct user interaction. To permit the command-line invocation needed for communication with cogito, the basic message-passing functionality of PD was used to control a patch that had far more than the usual number of input widgets. PD was run through a simple Perl script that constructs the proper message-passing syntax from more traditional command-line parameters. Also unlike the standard mode of interaction with PD applications, a termination condition was provided to the patch, and the application was run non-interactively by passing the -nogui flag to PD. A separate process was required to generate the visual realizations, using GNUPlot, as seen in Figure 3. With these implementation details complete, it is possible to write supplemental interface patches containing the necessary messages, so that any existing patch can be easily manipulated and explored within this approach.
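The batch control described above was implemented as a Perl script; the following Python sketch illustrates the same idea under stated assumptions: the wrapper patch explore.pd, its params receiver, and the output-file handling are hypothetical, and we assume PD's -send startup flag is available for delivering the initial message (only the -nogui flag is taken from the implementation above).

    import subprocess

    def render_sample(params, patch="explore.pd", outfile="sample.wav"):
        """Render one sound by running PD non-interactively.

        explore.pd is a hypothetical wrapper patch with 'outfile' and
        'params' receivers; it writes the file and then quits on its
        own, standing in for the termination condition described above.
        """
        # The '; receiver atoms' form mirrors the message-box syntax
        # that the Perl script constructed from command-line parameters.
        message = "; outfile " + outfile + "; params " + " ".join(
            str(p) for p in params)
        subprocess.run(["pd", "-nogui", "-send", message, patch],
                       check=True, timeout=30)
        return outfile

    # Realize one point in the 5-parameter space of Figure 2.
    render_sample([0.2, 0.5, 0.9, 0.1, 0.7])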

4. PERSONALIZATION FRAMEWORK

When a user begins to work with a synthesis engine mediated by this framework, pairs of representative sounds are presented to the user, who makes judgments about their similarity. Based on these judgments, a set of perceptual features is chosen which relates to the perceptual differences identified in the pairwise comparisons. These features are then presented to the user for identification: pairs of sounds are played which vary along a single identified dimension, and the user is asked to describe, in a way meaningful to them, the difference between these sounds. In this way, the system can identify and name the features which most accurately reflect the user's cognitive model. Upon further study it may turn out that there are a number of common cognitive models; in that case, only a few comparisons would be required to discover which of the available cognitive models is closest to that of the user.

The pieces of the framework are laid out in Figure 4; those in dashed boxes are not yet realized in the overall system. Exploration of sounds is possible within the cogito system, as depicted in Figure 3. Although it is considered here in the context of PD, the framework is generic in the sense that PD could be replaced with another synthesis engine. The GUI is an item for future work, but we anticipate that the system could generate a perceptually-motivated PD patch to be used within PD to control more synthesis-directed patches. This would be a much more lightweight interface solution, appropriate once the composer or player understands the space of available sounds provided by the patch.

The pairwise comparison module makes use of the command-line (non-interactive) methods of invoking PD: the module specifies samples and PD realizes them. The system then presents these samples to the user, who evaluates and compares them. Based on the results, the system selects a new pair of samples and presents them to the user for comparison.
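A minimal sketch of this loop's analysis step, assuming judgments on a 0-to-1 dissimilarity scale and using scikit-learn's MDS implementation for the multidimensional scaling mentioned in Section 1 (the sound list, the ask_user callback, and the choice of two dimensions are all assumptions for illustration):

    import numpy as np
    from sklearn.manifold import MDS

    def collect_dissimilarities(sounds, ask_user):
        """Fill a symmetric dissimilarity matrix from pairwise judgments.

        ask_user(a, b) plays the two sounds and returns a judgment in
        [0, 1]; it stands in for the framework's comparison module.
        """
        n = len(sounds)
        d = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                d[i, j] = d[j, i] = ask_user(sounds[i], sounds[j])
        return d

    def perceptual_coordinates(dissim, n_dims=2):
        """Embed sounds so that distances approximate the judgments [5]."""
        mds = MDS(n_components=n_dims, dissimilarity="precomputed",
                  random_state=0)
        return mds.fit_transform(dissim)

    # Simulated judgments in place of a real listener.
    rng = np.random.default_rng(0)
    sounds = [f"sample_{i}.wav" for i in range(6)]
    d = collect_dissimilarities(sounds, lambda a, b: rng.uniform())
    coords = perceptual_coordinates(d)  # one coordinate row per sound

The recovered axes carry no names; the naming step above, in which the user labels pairs differing along a single recovered dimension, supplies them.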

Figure 4. Layout of the framework. Components: User, GUI, cogito, Parameter mapper, Pairwise Comparison and Analysis, Synthesizer, Sound Production; these mediate the activities of Playing the Instrument and Modifying the Instrument.

Of course, not all of the available parameter configurations will represent significant changes in the perceived sound. The task of the framework is to generate appropriate pairs that divide the space intelligently, based on the perceptual demands of the user, and to approach an acceptable solution quickly. The fact that the space is parameterized makes it easy to construct these variations.

5. DISCUSSION

This work may lead to specific interfaces to PD patches that embody the chosen perceptual dimensions. A functional version of the complete framework has the potential to dramatically democratize the composition process. Yet there are a number of open issues. For example, how does one perform the mapping from data to sound? How easy is it to predict the sound created from a given set of data?

The computer-literate musician or the music-literate computer scientist who programs in PD is only part of the puzzle. A musician who wants to explore these sound synthesis techniques must otherwise rely on an intermediary who is fluent in both environments. More capable software will allow novices of all types (musical, computing, and both) to start quickly and not get frustrated. A further goal of such systems should be to avoid the trap whereby quick-start systems cannot grow beyond a certain point. Perceptually customizable systems, as described above, have the potential to grow with users as their understanding of their own perceptual space grows and develops. The user must never be restrained by the computer system; the computer system must release these restraints.

Personal construct psychology [11] and related techniques provide a promising method of accessing a user's (often unconscious) cognitive model. Over time, a user may develop a more nuanced view of the synthesizer's output and a correspondingly more complicated interface. It should be a simple process, therefore, to rebuild the interface with access to dimensions of sound not previously available to the user. If the user is seen to share one of a set of common cognitive models, then these adaptations and developments may even be predictable.
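As one illustration of how the "Parameter mapper" of Figure 4 might address the data-to-sound question, the sketch below fits a linear least-squares map from perceptual coordinates to synthesis parameters; the linearity assumption and the toy values are ours, not a method specified by this paper:

    import numpy as np

    # Perceptual coordinates (e.g., from MDS) and the synthesis
    # parameters that produced the corresponding sounds; toy values.
    perceptual = np.array([[0.0, 0.1], [0.4, 0.9],
                           [0.8, 0.2], [0.9, 0.8]])
    synth_params = np.array([[0.1, 0.5, 0.3], [0.3, 0.9, 0.2],
                             [0.7, 0.4, 0.8], [0.9, 0.8, 0.7]])

    # Fit a linear map with an offset term: lstsq solves A @ x ~= b.
    A = np.hstack([perceptual, np.ones((len(perceptual), 1))])
    mapping, *_ = np.linalg.lstsq(A, synth_params, rcond=None)

    def to_synth(p):
        """Map a point in perceptual space to synthesis parameters."""
        return np.append(p, 1.0) @ mapping

    print(to_synth([0.5, 0.5]))  # parameters to realize with the patch

Whether a linear map suffices is exactly the predictability question raised above; a nonlinear regressor could be substituted without changing the surrounding framework.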

6. ACKNOWLEDGEMENTS

We thank Paul Schmiedge for implementation support. The authors also thank the University of Regina and acknowledge funding from the Natural Sciences and Engineering Research Council of Canada through their individual Discovery Grants.

7. REFERENCES

[1] M. B. Rosson and J. M. Carroll, Usability Engineering: Scenario-Based Development of Human-Computer Interaction, Morgan Kaufmann, 2002.

[2] J. Raskin, The Humane Interface, Addison-Wesley, 2000.

[3] E. B. Goldstein, Sensation and Perception, Brooks/Cole, Fifth edition, 1999.

[4] G. S. Dell and P. G. O'Seaghdha, "Stages of lexical access in language production," Cognition, vol. 42, pp. 287-314, 1992.

[5] J. B. Kruskal, "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis," Psychometrika, vol. 29, pp. 1-29, 1964.

[6] D. Wessel and M. Wright, "Problems and prospects for intimate musical control of computers," Computer Music Journal, vol. 26, no. 3, pp. 11-22, September 2002.

[7] M. Fallshore and J. W. Schooler, "The verbal vulnerability of perceptual expertise," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 21, no. 6, 1995.

[8] D. A. Norman, The Psychology of Everyday Things, Basic Books, New York, 1988.

[9] J. Marks et al., "Design galleries: A general approach to setting parameters for computer graphics and animation," in Computer Graphics: SIGGRAPH '97 Conference Proceedings, 1997.

[10] B. Shneiderman, Designing the User Interface: Strategies for Effective Human-Computer Interaction, Addison-Wesley, Reading, MA, Second edition, 1992.

[11] G. Kelly, The Psychology of Personal Constructs, Norton, 1955.