Convention Paper

Audio Engineering Society

Convention Paper Presented at the 128th Convention 2010 May 22–25 London, UK The papers at this Convention have been selected on the basis of a submitted abstract and extended precis that have been peer reviewed by at least two qualified anonymous reviewers. This convention paper has been reproduced from the author’s advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

Influence of Psychoacoustic Roughness on Musical Intonation Preference

Julián Villegas 1, Michael Cohen 1, Ian Wilson 2, and William Martens 3

1 Computer Arts Lab., University of Aizu, Japan

2 Center for Language Research, University of Aizu, Japan

3 Faculty of Architecture, Design and Planning, University of Sydney, NSW 2006 Australia

Correspondence should be addressed to Julián Villegas ([email protected])

ABSTRACT

An experiment to compare the acceptability of three different music fragments rendered with three different intonations is presented. These preference results were contrasted with those of isolated chords also rendered with the same three intonations. The least rough renditions were found to be those using Twelve-Tone Equal-Temperament (12-tet). Just Intonation (ji) renditions were measured as the roughest and least preferred.

Keywords: Preference, Pleasantness, Roughness, Tuning, Intonation, Musical Consonance.

1. INTRODUCTION

The auditory attribute known as ‘roughness’ is associated with fast amplitude modulations. It grows rapidly in the transition between the perception of a single tone and the perception of two tones with different frequencies (Fastl and Zwicker 2007). Qualitatively, roughness has been related to musical dissonance (von Helmholtz 1954).

According to Terhardt (1976), roughness constitutes one of the most important psychoacoustic factors harming sensory consonance (a compound of auditory attributes including loudness, roughness, and sharpness; see Fastl and Zwicker (2007) for a detailed description). In Terhardt’s study, sensory consonance and harmony (determined primarily by affinity of tones, tone compatibility, and root relationship) are related to the perception of musical consonance. For Terhardt, harmony dominates the acceptability of successive presentation of tones, whereas sensory consonance dominates the acceptability of their simultaneous rendition. Note that in music, melody is associated with the successive presentation of tones, and harmony with their simultaneous rendering, but in Terhardt’s discourse (2008), harmony “basically addresses the goals and methods of the conventional theory of harmony, i.e., to understand the organization and auditory effects of musical sounds, and to provide methods for organizing musical sounds such that they are appreciated by the ear” (italics added).

Extending Terhardt’s work, von Aures (1985b) proposed an expression to predict the relative pleasantness P of a sound directly from its roughness R, sharpness S, tonalness T, and loudness N values:

$$ P = e^{-0.55R} \, e^{-0.1113S} \, \left(1.24 - e^{-2.2T}\right) \, e^{-(0.023N)^2}. \qquad (1) $$
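As a minimal sketch, Equation 1 can be evaluated directly once the four psychoacoustic values are known. The function name is hypothetical, the coefficients are copied as printed in the text, and the last factor is read here as e^{-(0.023N)^2}, which is an interpretation of the printed exponent rather than something this section states explicitly. The units of R, S, T, and N are not given here and are left to the caller.

```python
import math


def relative_pleasantness(R, S, T, N):
    """Evaluate Equation 1 (hypothetical helper, coefficients as printed).

    R: roughness, S: sharpness, T: tonalness, N: loudness.
    Assumption: the loudness factor is exp(-(0.023 * N) ** 2).
    """
    return (
        math.exp(-0.55 * R)
        * math.exp(-0.1113 * S)
        * (1.24 - math.exp(-2.2 * T))
        * math.exp(-((0.023 * N) ** 2))
    )
```

Under this reading, predicted pleasantness decreases monotonically as roughness grows while the other three values are held fixed.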

Theories such as the one put forward by Terhardt are based mainly upon sensory factors. Affective and cognitive factors are not considered with the same emphasis, and it is not clear how well these theories can be applied to analysis of musical information (Shepard 2001, p. 153). Being aware of such limitations, it is still interesting to review how psychoacoustic theories have allowed some insights into music perception. Musical consonance seems to be correlated with sensory pleasantness, and (presumably) pleasantness is a good predictor of preference (Blood and Zatorre 2001).

In this research, we investigate the relationship between preference choices in music and psychoacoustic roughness. It is appropriate to differentiate between intrinsic and extrinsic roughness: the former is inherent in the acoustics of a sound source and contributes to its naturalness; the latter is caused by the combination of sounds produced by different sources and is (presumably) more related to the perception of musical dissonance.

We used “GoldenEar” (Villegas and Cohen 2010), an adaptive tuning object intended to achieve minimal extrinsic roughness. Adaptive tuning is a process by which pitches are altered dynamically to favor a given aspect of music, usually consonance of certain intervals.

2. GOLDENEAR

This Pure Data (Pd) patch (2009) extracts spectral components of simultaneous sources and retunes each tone in real-time to minimize predicted roughness. GoldenEar source code, libraries for Mac OS and MS Windows, documentation, and examples are available at (Villegas 2008).

Acknowledging the existence of several roughness models, we opted to use Vassilakis’ (2001) spectral model. He proposes an expression that is fast to compute, with results that resemble those of temporal models. This model is easy to implement and the same information required to calculate roughness is needed for the reintonation stage, optimizing overall realtime performance.

For Vassilakis, roughness $r$ of a sinusoid pair $\langle f_1, a_1 \rangle$, $\langle f_2, a_2 \rangle$ with frequencies $f_i$ and amplitudes $a_i$ can be calculated by the expression

$$ r = \frac{(a_1 a_2)^{0.1}}{2} \left( \frac{2 \min(a_1, a_2)}{a_1 + a_2} \right)^{3.11} \left( e^{-3.5F} - e^{-5.75F} \right), \qquad (2) $$

where

$$ F = S(\min(f_1, f_2)) \, |f_1 - f_2| \qquad (3) $$

and

$$ S(f) = \frac{0.24}{0.0207 f + 18.96}. \qquad (4) $$

Following previous approaches (Plomp and Levelt 1965; Sethares 1994), Vassilakis assumes that roughness of complex tones is the accumulation of the contribution of each possible pair of frequency components. Total roughness $R$ for a set of complex tones is estimated by

$$ R = \sum_{h=1}^{n} \sum_{i=1}^{n} \sum_{j=1}^{p} \sum_{k=1}^{p} r(\langle a_{hj}, f_{hj} \rangle, \langle a_{ik}, f_{ik} \rangle), \qquad (5) $$

where $n$ is the number of tones, $p$ is the number of frequency components, $a_{uv}$ and $f_{uv}$ are the amplitude and frequency of the $v$th frequency component of the $u$th tone, and $r$ is as defined in Equation 2. Note that von Aures (1985a) has shown that total roughness only equals the summation of each pairwise contribution if the constituent sinusoids excite different pitch analyzer filters in the basilar membrane.

2.1. Modifications to Vassilakis’ model

Since we are not interested in changing source spectra, we contract Equation 5 by excluding intrinsic roughness contributions:

$$ R_E = \sum_{h=1}^{n-1} \sum_{i=h+1}^{n} \sum_{j=1}^{p} \sum_{k=1}^{p} r(\langle a_{hj}, f_{hj} \rangle, \langle a_{ik}, f_{ik} \rangle). \qquad (6) $$
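The pairwise model of Equations 2 to 4 and the extrinsic sum of Equation 6 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the GoldenEar source: the function names are hypothetical, each tone is assumed to be a list of (frequency in Hz, linear amplitude) pairs, and `harmonic_tone` builds a made-up test spectrum with 1/k amplitudes purely for demonstration.

```python
import math
from itertools import combinations


def pair_roughness(f1, a1, f2, a2):
    """Equation 2: roughness of one sinusoid pair (hypothetical helper)."""
    if a1 <= 0 or a2 <= 0:
        return 0.0  # a silent component contributes nothing
    s = 0.24 / (0.0207 * min(f1, f2) + 18.96)  # Equation 4
    F = s * abs(f1 - f2)                       # Equation 3
    return (
        0.5 * (a1 * a2) ** 0.1
        * (2 * min(a1, a2) / (a1 + a2)) ** 3.11
        * (math.exp(-3.5 * F) - math.exp(-5.75 * F))
    )


def extrinsic_roughness(tones):
    """Equation 6: accumulate only pairs taken from two different tones.

    combinations(tones, 2) yields each unordered pair (h, i) with h < i,
    matching the index ranges h = 1..n-1 and i = h+1..n.
    """
    total = 0.0
    for tone_h, tone_i in combinations(tones, 2):
        for f_hj, a_hj in tone_h:
            for f_ik, a_ik in tone_i:
                total += pair_roughness(f_hj, a_hj, f_ik, a_ik)
    return total


def harmonic_tone(f0, partials=6):
    """Assumed test spectrum: harmonics of f0 with 1/k amplitudes."""
    return [(k * f0, 1.0 / k) for k in range(1, partials + 1)]
```

For example, `extrinsic_roughness([harmonic_tone(440.0), harmonic_tone(466.16)])` estimates the roughness between two such tones roughly a semitone apart, while a single tone on its own yields zero because no cross-source pairs exist.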

Equation 6 specifies that, for each iteration, only predicted roughness from pairs of frequency components belonging to different sources (h and i) is accumulated.

2.2. Vicinity

Without limits on the search for fundamental frequencies (i.e., with a progressively expanding search space), optimal adjustments would degenerately be the unison of all tones, perhaps several octaves higher than the original pitches. This is because unison is the least rough interval for similar spectra, and as fundamental frequencies increase, perceived roughness decreases. Furthermore, it is desirable to preserve the ‘character’ of a given interval. For instance, for harmonic spectra, minor thirds (m3) are generally rougher than major thirds (M3), but familiarity with these two intervals and their musical importance inhibit their mutual replacement, especially in the middle and upper ranges of the audible spectrum.

[Figure 1: input x feeds component extraction (sigmund~), which passes components (freq., amp.) to roughness minimization (goldenEar), which passes new fundamental freqs. to pitch adjustment (shifter~), which outputs x′.]

Fig. 1: Block diagram of the implementation.

According to Houtsma and Smurzynski (1990), the just noticeable pitch difference (pitch jnd) is less than 0.5% of the fundamental frequency of a complex tone containing harmonics below the 7th. This difference is around 8.3 cents for the most acute region of the human hearing spectrum (Hartmann 1997). Other than the minor seventh (m7) and the augmented fourth (aug4), discrepancies between Twelve-Tone Equal-Temperament (12-tet) and Just Intonation (ji) intervals are less than 16 cents, as shown in Table 1. A vicinity of ±8 cents, centered at the fundamental frequencies, is large enough to potentially “purify” most 12-tet intervals (that is, to turn them into pure intervals observing small integer ratios) without converting them into a different interval. For instance, a 12-tet m3 (300 cents) could be converted into a pure m3 (315.64 cents).

With the imposition of a vicinity, pitch drift (alteration of the original reference frequency) is not prevented but tethered. Therefore, special care must be taken in performance if pitch drift inhibition is desired. There is no consensus in this matter: although pitch drift in large amounts is noticeable and generally prevented, small amounts are necessary in practice to achieve perfect intervals.
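The arithmetic behind the ±8 cent vicinity can be checked with a short sketch. The helper names are hypothetical, and the reachability test assumes that the two tones of an interval may each move by up to the half-vicinity in opposite directions, so that the interval itself can change by up to twice that amount; this is an interpretation of the text, not a description of GoldenEar's search procedure.

```python
import math


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)


def within_reach(tet_cents, pure_ratio, half_vicinity=8.0):
    """True if a 12-tet interval can become the given pure ratio.

    Assumption: each of the two tones moves at most half_vicinity cents,
    so the interval can widen or narrow by at most 2 * half_vicinity.
    """
    return abs(cents(pure_ratio) - tet_cents) <= 2.0 * half_vicinity


# The minor third from the text: 12-tet 300 cents versus the pure ratio 6/5.
pure_m3 = cents(6 / 5)
m3_reachable = within_reach(300.0, 6 / 5)
```

Here `pure_m3` rounds to the 315.64 cents quoted above, and the 15.64 cent discrepancy falls inside the 16 cent span that two tones with ±8 cent vicinities can cover between them.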

2.3. Prototype overview

We created a Pd patch based on Eq. 6 to reduce extrinsic roughness. The realtime prototype can be described conceptually as comprising the three stages shown in Figure 1.

2.3.1. Component extraction

Sigmund~ outputs a pitch value and a list of frequency components ordered by amplitude. This analysis is performed whenever a new tone is recognized. With a sampling frequency of 48 kHz and an analysis window of 2048 samples, pitches starting a little above G2 are expected to be analyzed correctly.

[Figure 2: Pd patch. Labelled objects: pd component extraction (sigmund~), goldenEar 4 8 16 (roughness minimization), pd c2freq, pd shifter~ (pitch adjustment), catch~ pp, and pd output; controls: gain, threshold, start, end.]

Fig. 2: Patch used for stimuli generation. Thick lines connect audio objects, and thin lines everything else.

2.3.2. Roughness minimization (GoldenEar)

GoldenEar can be launched with up to three arguments: number of sound sources (four, in Figure 2), half the vicinity (8 cents, in the same figure), and number of spectral components considered in the analysis (16, in the same figure). Its leftmost inlet receives an amplitude threshold that can be set to compensate for background noise in live situations, etc. GoldenEar only processes each arriving tone if its pitch differs from the previous tone by more than half the size of the vicinity.

2.3.3.