Research Article
Twirling Triplets: The Qualia of Rotation and Musical Rhythm
Music & Science Volume 2: 1–20 © The Author(s) 2019 Article reuse guidelines: sagepub.com/journals-permissions DOI: 10.1177/2059204318812243 journals.sagepub.com/home/mns
Niels Chr. Hansen1 and David Huron1
Abstract

While musicologists have long noted that triplet rhythms evoke sensations of rotation in listeners, no theory has been proposed to account for this apparent association. To investigate this phenomenon, 33 excerpts of “spinning, rotating, twirling, or swirling” music were crowd-sourced from an online discussion forum. Analysis revealed a prominence of fast, repeated, isochronous patterns using stepwise pitch movement, with significantly more compound meters than generally found in Western music. Inspired by ecological acoustics, an Ecological Theory of Rotating Sounds (ETRoS) is proposed to explain these associations. The theory maps patterns of loudness fluctuations to trajectories of rotating sound sources. Two experiments tested the theory. In Experiment A, listeners rated how much binary, ternary, quaternary, and quinary figures (of 2–5 notes) evoked sensations of rotation. Experiment B used a two-alternative forced-choice paradigm pitting ecological quaternary stimuli (strong-medium-weak-medium) against unecological stimuli with permuted stress values more typical of Western music (strong-weak-medium-weak). Results indicate that perceived rotation increases with tempo and is poorly evoked by binary rhythms. Loudness patterns consistent with rotating trajectories were perceived as more rotating than unecological patterns, but only when pitch was also moving. Altogether, moderate support is provided for an acoustic-ecological account of rotating sounds.

Keywords

Ecological acoustics, music, rhythm, rotation, triplets

Submission date: 28 November 2017; Acceptance date: 18 October 2018
1 School of Music, Ohio State University, USA

Corresponding author: Niels Chr. Hansen, The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Locked Bag 1797, Penrith NSW 2751, Australia. Email: [email protected]

Creative Commons Non Commercial CC BY-NC: This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Introduction

Music commentators have long observed that music seems to have a special affinity with movement (e.g., Levitin, Grahn, & London, 2018). Even apart from dancing and the necessary movements of performers, music commonly evokes kinetic impressions among otherwise immobile listeners. In addressing music’s action affinity, researchers have employed various theoretical approaches. One example is embodied cognition, where perceptual and cognitive processes are linked to the human body and to the body situated in an environment (e.g., Godøy & Leman, 2010; Leman, 2007; Schiavio, 2014). Extending visuomotor mirror neurons to the auditory modality, others have demonstrated sound–action connections consistent with the existence of auditory-evoked mirror responses in supplementary motor areas (Aziz-Zadeh, Iacoboni, Zaidel, Wilson, & Mazziotta, 2004; Buccino et al., 2005).

Another approach to addressing music’s action affinity is research that draws on metaphor theory (Lakoff & Johnson, 1980; Zbikowski, 2002, 2008). Well-documented metaphors include the cross-modal association between pitch height and spatial elevation (Jeschonek, Pauen, & Babocsai, 2013; Wagner, Winner, Cicchetti, & Gardner, 1981; Walker et al., 2010) and the association of loudness with visual brightness (Lewkowicz & Turkewitz, 1980). Some cross-modal mappings seem straightforwardly related to physical phenomena, such as the association between decreasing loudness and increasing distance (see Neuhoff, 1998). Similarly, the association between pitch height and object size might be explained by the simple acoustic relationship between higher frequency and smaller mass or volume
(Eitan & Timmers, 2010; Marks, 1987; Mondloch & Maurer, 2004). However, other mappings seem to defy simple explanation. For example, an association between rising pitch and increasing size has been widely reported (Antovic, 2009; Eitan, Schupak, Gotler, & Marks, 2014; Kim & Iwamiya, 2008). Yet this finding appears to contradict the static association of high pitch with small size. That is, lower pitch is larger, yet falling pitches shrink (Eitan, 2013). Research has also chronicled various mapping asymmetries; a change in one direction (such as rising pitch) may evoke a stronger metaphorical relationship than an equivalent change in the opposite direction (see Eitan, 2013; Eitan & Granot, 2006). Aware of this complexity, Eitan (2013) has proposed multiple underlying explanations. Specifically, while some cross-modal associations are innate, others might arise from multisensory integration, represent true synesthesia, be learned from exposure to common environmental occurrences, or simply originate in unique cultural expressions (see Levitin et al., 2018).

In the current study, we focus on a single sound-movement association related to spinning, rotating, or twirling. We offer an account based on yet another theoretical approach, namely “ecological acoustics”, an approach that has not been featured in previous music-related research.

In his book, La Musique et l’Ineffable, Vladimir Jankélévitch (1961) wrote of “our sense that triplets ‘whirl’” (p. 91). Indeed, music listeners can readily point to musical examples in which subdivision of beats into three is associated with rotation, twirling, whirling, or spinning. Famous examples include Franz Schubert’s “Gretchen am Spinnrade,” Op. 2, D.118, and Bedřich Smetana’s swirling water motion at the beginning of “Die Moldau.” Other music scholars have observed this apparent relationship:

The repeat . . . explodes in scurrying triplets. . . . It embraces every progressive device of keyboard virtuosity . . . swirling triplets and sextuplets. (Mellers, 1954, p. 557 and p. 561)

the glockenspiel starts spinning roundabout triplets (Sarsfield, 2014, p. 34)

idea e could be accelerated to twirling triplets. (Mather & Karns, 2015, p. 47)

In the repertoire, musical markers of gaiety abound: lively tempo, major mode, clear and uncomplicated melodic organization, and simple rhythms with a swinging gait (triplet figures are common). (McKee, 2014, p. 171)
Dance treatises also associate spinning dances with steps counted in threes or sixes (Mädel, 1805, as cited in McKee, 2014). Such observations raise the question: why, and under which conditions, would triplets seem to spin? This article aims to answer this question systematically: first, musical excerpts of spinning or rotating music are collected and analyzed; second, inspired by the field of ecological acoustics (Gaver, 1993a, 1993b; Gibson, 1966, 1979; Li, Logan, & Pastore, 1991; see also Bregman, 1990), a theory is proposed to account for this phenomenon; finally, two experiments testing both the theory and the informal and analytical observations are reported.
Survey of music theorists

To study the relationship between acoustic features and rotating qualia in music more systematically, further musical examples were solicited from music scholars using a popular online discussion website sponsored by the Society for Music Theory (SMT). The following query was posted, with no mention of triplets to avoid biasing responses:

Dear collective wisdom,
Some of you may know Jankélévitch’s Music and the Ineffable (1961). In one passage, he talks about various rhythms as connoting spinning, rotating, twirling, or swirling. Examples might include the beginning of Smetana’s Die Moldau, Schubert’s Gretchen am Spinnrade, Saint-Saëns’ Omphale’s Spinning Wheel, and Dvořák’s The Golden Spinning Wheel. We’re looking for more musical examples. Suggestions welcome.
David Huron & Niels Chr. Hansen
Over the course of a few days, 19 postings identified 33 musical examples (henceforth referred to as “SMT examples”). The complete list is provided in the Appendix. Of the passages identified, 21 contain programmatic content (e.g., title or lyrics) explicitly alluding to rotating movement, whereas 12 are passages that a music scholar suggested are evocative of such movement. Examining the 27 scores that the authors could locate, a number of commonalities were observed (especially in the accompaniment textures) that may or may not be pertinent to the evoking of a spinning or rotating quality. Specifically, the nominally rotating passages were examined (in isolation from the works from which they were taken) with respect to tempo, dynamics, repetition, meter, isochrony, and pitch proximity.

Fifteen of the SMT examples include Italian tempo terms, with Allegro (33%) being most common. In order to determine whether this tempo is faster or slower than music in general, a distribution of tempo terms for a representative sample of 750 common-practice orchestral works composed between 1750 and 1900 was used as a comparison (Horn & Huron, 2015). Although this comparison sample may be biased in some way (London, 2013), Figure 1 nevertheless provides back-to-back bar graphs comparing the tempo distributions for the SMT examples with the Horn and Huron corpus. Note that 80% (12 out of 15) of the tempo-designated SMT examples exhibit tempi that are at or above the median tempo for the Horn and Huron distribution (technically Allegretto, but closer to Moderato than Allegro moderato). A Mann-Whitney U-test shows a statistically significant difference between the two distributions (W = 3,980.5, Z = 1.98, p = .047). In short, the nominally rotating SMT passages appear to exhibit somewhat faster tempi compared with common-practice Western music.

Figure 1. Back-to-back bar graph comparing the distribution of tempo terms for a sample of 15 nominally rotating passages (solid bars) compared with a general distribution (shaded bars) of tempo terms for 750 orchestral works (after Horn & Huron, 2015). SMT: Society for Music Theory

With regard to dynamics, the most common dynamic marking amongst the 26 SMT examples that contained dynamic markings was piano (Figure 2). Unlike for tempo, no statistically significant difference was found between the dynamics for the nominally rotating SMT passages and common music between 1750 and 1900 (W = 8,028.5, Z = 1.502, p = .134).

Figure 2. Back-to-back bar graph comparing the distribution of dynamic markings for a sample of 26 nominally rotating passages (solid bars) compared with a general distribution (shaded bars) of dynamic markings for 750 orchestral works (after Horn & Huron, 2015). SMT: Society for Music Theory

Table 1. Metric division and grouping types in Barlow and Morgenstern’s (1983) Dictionary of Musical Themes compared with the SMT list of musical examples “connoting spinning, rotating, twirling, or whirling” (see Appendix). Twelve irregular and ambiguous meter signatures are excluded from the Barlow & Morgenstern corpus.

Metric division                               Barlow and Morgenstern (1983)   SMT examples with score
Compound                                      1,287                           16
Simple                                        8,863                           11
Total                                         10,150                          27

Metric grouping                               Barlow and Morgenstern (1983)   SMT examples with score
Triple                                        3,418                           5
Other (single, duple, quadruple, quintuple)   6,732                           22
Total                                         10,150                          27

SMT: Society for Music Theory.

With regard to musical meter, Table 1 reports the frequency of occurrence of different metric types in the SMT examples compared with those found in 10,150 classical music themes from Barlow and Morgenstern’s (1983) Dictionary of Musical Themes. Musical meters are divided according to subdivision of the beat (i.e., “metric division”) and the number of beats per measure (i.e., “metric grouping”). Specifically, compound and simple meters contain subdivisions of the beat into three (e.g., 6/8, 9/8) or two (e.g., 2/4, 3/4), respectively; in terms of metric grouping, triple meters contain three beats per measure (e.g., 3/4, 9/8) whereas duple, quadruple, and quintuple meters contain two, four, or five beats per measure, respectively. A chi-square test suggests that the rotating examples submitted by music theorists include more compound meters than would be expected given the distribution of meters in Western classical music generally, χ²(1) = 52.3, p < .001.¹ Conversely, there is no statistically significant tendency for the nominally rotating passages to favor triple meter over non-triple meters, χ²(1) = 2.77, p = .096. In fact, the data are skewed in the direction opposite to favoring triple meter. In other words, the music connoting “spinning, rotating, twirling, or swirling” appears to favor subdividing the beat into three (i.e., compound meters) but not measures containing three beats (i.e., triple meters).

This finding, that rotation is associated with compound but not with triple meters, is likely to interact with tempo. Specifically, music theorists speak of a metric hierarchy ranging from beat subdivisions, through beats and measures, to hypermetric groups of measures (Lerdahl &
Jackendoff, 1983). Because the beat (or tactus) can only be perceived within the 200–1,800 ms range with a peak of maximum pulse salience at 600 ms (Fraisse, 1978), triple and compound meters primarily differ in terms of tempo (London, 2012). If music slows down, for example, tactus might change to what otherwise would be considered a subdivision of the beat. Indeed, when executing ritardandi, musicians mentally subdivide the beat to facilitate creating a smooth transition (Spitzer et al., 2017). This calls for systematic manipulation of tempo in empirical studies of the rotating qualia.

Apart from tempo, dynamics, and meter, accompaniment figurations were also analyzed. Each passage was classified as either isochronous or anisochronous, depending on whether the notated note values were constant or not. Fully 85% (23 out of 27) of the SMT-list accompaniments made use of isochronous figuration, most commonly using sixteenth notes. In combination with a generally fast tempo (Figure 1), the predominance of sixteenth notes attests to a rather fast pace in passages deemed to be evocative of “rotating.” Furthermore, the SMT examples exhibit a penchant for cyclic, repeated figures. Interestingly, 76% (19 out of 25) of the cyclic lengths were multiples of three, consisting of 3, 6, 9, or 12 notes.

Finally, melodic interval size was coded in terms of the median interval of the first six pitches. The median interval for the SMT examples is two semitones. This is identical to the median interval for a wide-ranging sample of world music reported by Huron (see Figure 5.1 in Huron, 2006). However, in the world-music sample, 23% of melodic intervals are unisons whereas only 4% (5 out of 125) of melodic intervals in the SMT examples are unisons. Omitting unisons from analysis increases the median melodic interval in the world-music sample to three semitones whereas the median interval remains two semitones in the SMT examples.
Thus, in nominally rotating passages, small melodic intervals are especially prevalent, even though unison intervals are rare.

By way of summary, passages of Western music that are nominally evocative of rotating or spinning appear to be associated with at least five musical features. Specifically, there is a marked tendency to employ (1) compound meters, (2) fast tempi with short note values, (3) isochronous rhythms, (4) repeated cyclic patterns with cardinalities favoring multiples of 3, and (5) small melodic intervals while simultaneously avoiding unison pitch repetitions. Finally, as demonstrated by the lack of an association of rotating qualia with triple meters, there is no evidence that nominally rotating passages favor a compound triple hierarchy. In light of this observation, in the ensuing discussion, the terms “duple”, “triple”, “quadruple”, and “quintuple” will be reserved for situations where stress patterns accord with the conventional metric hierarchy (such as the quadruple “strong-weak-medium-weak” pattern). In most cases, the more neutral terms “binary”, “ternary”, “quaternary”, and “quinary” will be used instead.
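The two chi-square statistics reported for Table 1 can be re-derived from the tabulated counts. The following is a minimal sketch, assuming a Pearson chi-square test on a 2 × 2 table without continuity correction (an assumption, since the text does not state which variant was used). The function name `pearson_chi2_2x2` is hypothetical.

```python
def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Counts from Table 1: rows are corpora (Barlow & Morgenstern, then SMT),
# columns are meter categories.
# Metric division: compound vs. simple.
chi2_division = pearson_chi2_2x2(1287, 8863, 16, 11)
# Metric grouping: triple vs. other.
chi2_grouping = pearson_chi2_2x2(3418, 6732, 5, 22)

print(f"division: chi2(1) = {chi2_division:.1f}")
print(f"grouping: chi2(1) = {chi2_grouping:.2f}")
```

Under that assumption, both statistics round to the values reported in the text (52.3 and 2.77), which suggests that no continuity correction was applied.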
Ecological acoustics

The use of physical models in the realm of ecological acoustics may be relevant in considering the question why triplets seem to spin. Ecological acoustics is a branch of “ecological psychology” associated with the work of James Gibson. Gibson (1966, 1979) argued that an analysis of the environment is crucial to explaining behavior. In the case of sound, an ecological, Gibsonian perspective would emphasize that listeners acquire practical everyday knowledge from sounds in their acoustical environment rather than abstract properties such as frequency, duration, and intensity (Fowler, 1990). For example, what are the acoustical properties of sounds that allow a listener to recognize that the source is made of metal or wood, or whether a sound source was struck, rubbed, or blown? Empirical work in ecological acoustics has indeed established that listeners can infer the length (Carello, Anderson, & Kunkler-Peck, 1998) and geometric width–height ratio (Lakatos, McAdams, & Causse, 1997) of objects that are dropped or struck as well as the hardness of the mallet that they were struck with (Freed, 1990) and whether the objects remained intact or broke into several pieces (Warren & Verbrugge, 1984). Similarly, listeners can infer the hand configuration used for clapping and perform above chance in guessing which individual produced a given clapping sound (Repp, 1987). In more musically pertinent studies, it has been argued that string bowing techniques (Halmrast, Guettler, Bader, & Godøy, 2010) and trumpet mouthpiece depth (Poirson, Petiot, & Gilbert, 2005) can also be inferred from the produced sounds.

A paradigmatic example of the ecological approach in the realm of sound is found in Warren and Verbrugge’s (1984) work on “breaking” and “bouncing” sounds. They considered how listeners are able to deduce from sounds whether an object has broken (in response to being dropped) or is still intact (and bouncing). In brief, whereas bouncing follows a predictable exponential temporal pattern of onsets consistent with Newtonian mechanics, breaking objects separate into component parts, typically differing in shape and mass. In this latter case, the aggregate temporal pattern represents a series of (several parts) bouncing, rather than a single bouncing object. Experiments showed that listeners are indeed sensitive to these temporal patterns and consequently successfully judge synthesized sounds to be either “breaking” or “bouncing” on the basis of experience with real-world acoustical patterns. The present study asks a similar question regarding rhythmic sounds that might be emitted when either a sound-producing object or the listener is rotating.
Figure 3. Four unique circumstances or scenarios in which rotating sounds can be distinguished according to the relationship between the sound source and the listener. Theoretically, the relative pattern of loudness (resulting from distance) is invariant, regardless of whether the sound source rotates, scenarios (a) and (b), or the listener rotates, (c) and (d), with the stationary part inside, (b) and (d), or outside, (a) and (c), the circular trajectory.
An Ecological Theory of Rotating Sounds (ETRoS)

In considering rotating sounds, four circumstances or scenarios may be distinguished (see Figure 3). One scenario has a sound source rotating while the listener remains outside of the rotating trajectory. A second scenario has a sound source rotating around a listener. A third scenario has the listener moving in a circle with a static sound source outside of the rotating trajectory. A fourth scenario has the listener moving in a circle around the sound source. For convenience, the initial discussion focuses on the first of these scenarios whereas the remaining three are addressed afterwards. Throughout, the listener is treated as a point receiver favoring no particular direction, thus ignoring any effects from having two spatially disjoint ears. Similarly, the sound source is assumed to exhibit isotropic radiation, that is, to emanate uniformly dispersed sound in all directions.

Figure 3(a) illustrates a sound source moving in a circular pattern with the listener outside of the movement trajectory. Note that the illustration supposes that the listener is facing towards the center of the trajectory and that the circular motion happens in the horizontal plane. Such a rotating sound source is expected to produce three notable acoustical effects. First, loudness should decrease with distance from the source to the listener. Second, due to the Doppler effect, a receding sound should be slightly lower in pitch, whereas an approaching sound should be slightly higher in pitch, with the amount of pitch change being proportional to the speed of the object. Finally, a truly rotating object will cause changes in perceived location due to inter-aural time and amplitude differences documented in the sound localization literature (e.g., Wightman & Kistler, 1992).
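To give a sense of scale for the Doppler cue mentioned above, the sketch below uses the standard formula for a source moving relative to a stationary listener, f_obs = f_src · c / (c − v_r), where v_r is the component of the source's velocity directed towards the listener. The speed of sound (343 m/s), the 1 m radius, the rotation rate, and the function name `doppler_shift_cents` are illustrative assumptions, not values or names from this study.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C (assumed)

def doppler_shift_cents(radial_speed):
    """Pitch shift in cents for a source moving at `radial_speed` m/s
    towards (positive) or away from (negative) a stationary listener."""
    ratio = SPEED_OF_SOUND / (SPEED_OF_SOUND - radial_speed)
    return 1200.0 * math.log2(ratio)

# Hypothetical source on a circle of radius 1 m, two rotations per second.
radius_m = 1.0
rotations_per_s = 2.0
tangential_speed = 2.0 * math.pi * radius_m * rotations_per_s

# The radial component never exceeds the tangential speed,
# so these two values bound the shift heard during one rotation.
print(f"approaching: {doppler_shift_cents(tangential_speed):+.1f} cents")
print(f"receding:    {doppler_shift_cents(-tangential_speed):+.1f} cents")
```

For speeds that are small relative to the speed of sound, the shift grows roughly in proportion to the radial speed, in line with the proportionality described in the text.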
If a musician endeavors to create a rotating effect, having the sound source move in a circular pattern sometimes proves impractical since many musical instruments are relatively immobile. With the notable exceptions of electroacoustic or digitally processed music, inter-aural time and amplitude differences are generally unavailable as cues. Similarly, most musical instruments are designed to produce discrete pitches unlike the continuous frequency shifts needed to emulate the Doppler effect. This renders dynamic level potentially the most important acoustic cue for evoking a sense of rotation in listeners. Accordingly, effects of rotating trajectories on loudness will be considered next.

Since this study was motivated by the apparent association of triplets with rotation, it seems appropriate to focus on intermittent (rhythmic) rather than continuous sounds. In the most basic situation, one might consider a series of isochronous clicks or short tones with fixed rotational phase. Figure 4 offers a more detailed analysis of the scenario shown in Figure 3(a). The figure illustrates a hypothetical listener in the presence of an intermittent sound-producing object following a rotating trajectory with constant velocity. Each of the various sub-figures illustrates sound onsets occurring at regular intervals, subdividing the rotation into two, three, four, or five, respectively. As noted above, loudness will vary systematically with distance to the listener.

The top row in Figure 4 illustrates the situation for a binary pattern where two sound events are emitted for each rotation. The first column illustrates a (rare) perfect phase alignment where one of the sounds is emitted directly in front of the listener. The middle column illustrates an equally rare phase alignment in which the two closest sounds are emitted at equal distance to the listener. The third column illustrates instances of more typically encountered phase relationships. As indicated in the thumbnail bar graph and accompanying rhythmic notation, the binary cases almost exclusively involve a loud/quiet alternation. By analogy, the second, third, and fourth rows of Figure 4 illustrate ternary, quaternary, and quinary patterns in which three, four, or five isochronous sound events are emitted for each rotation.
Figure 4. A listener placed outside the rotating trajectory of a sound-producing object will perceive different patterns of loudness in different instantiations of binary, ternary, quaternary, and quinary cases where two, three, four, or five sounds are emitted per cycle, respectively. Dotted lines connect sound events that form part of the same case (i.e., configuration of sound events), such that the images in the first two columns depict a single case whereas the images in the third column depict a multitude of different cases all stacked upon each other. Unlike the most common quaternary and quinary cases, binary and ternary loudness patterns are consistent with metrical hierarchies used in Western music-making.

In considering the common cases illustrated in the third column of Figure 4, it is worth emphasizing that, in the absence of any already existing metrical context, the listener will tend to hear the loudest event as marking the downbeat (Lerdahl & Jackendoff, 1983; London, 2012). Notably, the recurring strong-weak pattern most commonly present in the binary case (Figure 4) is consistent with duple meters in Western music-making. For the ternary case, either a strong-weak-medium or a strong-medium-weak pattern results. Both of these feature a single strongest event, consistent with common stress patterns in triple music-making. Conversely, in the case of the quaternary pattern, the most common loudness pattern exhibiting a strong-medium-weak-medium pattern contrasts notably with the ubiquitous strong-weak-medium-weak pattern widely regarded by musicians as characteristic of quadruple music-making. Similarly, the most common quinary pattern (strong-medium-weak-weak-medium) deviates from quintuple meters in Western music (most typically, strong-weak-medium-weak-weak or strong-weak-weak-medium-weak). Thus, while music in duple and triple meters exhibits stress patterns consistent with sound sources following circular trajectories, this is typically not the case for music in quadruple and quintuple meters.
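The loudness patterns described above follow from simple geometry, which can be sketched numerically. The following is an illustrative model only, assuming a point listener outside the circle, free-field inverse-square intensity decay, and an arbitrary phase offset; the function name `loudness_ranks` and all parameter values are hypothetical and are not taken from the study.

```python
import math

def loudness_ranks(n_events, phase_deg, radius=1.0, listener_dist=2.0):
    """Rank the events of one rotation from loudest (1) to quietest (n_events).

    The source emits n_events equally spaced sounds per rotation; phase_deg is
    the angle of the first event relative to the point nearest the listener.
    """
    intensities = []
    for k in range(n_events):
        angle = math.radians(phase_deg + 360.0 * k / n_events)
        # Squared source-listener distance from the law of cosines.
        dist_sq = (radius**2 + listener_dist**2
                   - 2.0 * radius * listener_dist * math.cos(angle))
        intensities.append(1.0 / dist_sq)  # inverse-square law (assumed)
    # Event indices sorted from loudest to quietest.
    order = sorted(range(n_events), key=lambda k: intensities[k], reverse=True)
    ranks = [0] * n_events
    for rank, k in enumerate(order, start=1):
        ranks[k] = rank
    return ranks

for n in (2, 3, 4, 5):
    print(n, loudness_ranks(n, phase_deg=10.0))
```

With a small positive phase offset, the quaternary case gives the rank sequence 1-3-4-2, which corresponds to strong-medium-weak-medium, and the quinary case gives 1-3-5-4-2, which corresponds to strong-medium-weak-weak-medium. Reversing the sign of the offset turns the ternary strong-weak-medium pattern into strong-medium-weak.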
Which trajectory?

Note that any given stress pattern is consistent with multiple possible trajectories. Figure 5 illustrates alternative trajectories that could theoretically produce the same loudness patterns. For example, a binary stress pattern might arise from a figure-of-eight trajectory (Figure 5(a)). Similarly, a ternary stress pattern might arise from a more complicated trajectory (Figure 5(b)). If velocity was, moreover, allowed to vary, the number of possible trajectories would increase considerably. So on what basis might a listener favor inferring one trajectory over another?

Figure 5. Although sound sources may follow complex trajectories (solid lines), a listener with eyes closed is likely to infer the simplest trajectory consistent with the perceived pattern, in this case, leading to (a) pendular or (b) circular percepts (dashed lines).

A parallel problem is known from visual perception. For example, in motion-picture film where apparent motion is induced from a sequence of still images, a rotating wheel may be perceived to move backwards whenever speed of rotation interacts with the number of frames per second. While, in reality, the wheel is rotating forwards, low frame rates may result in successive images being visually parsed to favor the simpler trajectory: a smooth backward motion rather than a jumpy forward motion involving multiple rotations between each frame (Watson & Ahumada, 1985). Sampling theory refers to this phenomenon as “aliasing” resulting from violating the Nyquist criterion, which requires sampling at no less than twice the highest frequency to be represented (Nyquist, 1928).

By analogy, the listener depicted in Figure 5(a) most likely infers simple, circular motion over a complex, figure-of-eight trajectory. However, an even simpler candidate trajectory exists: namely, linear back-and-forth, pendular-like motion. In addition to the cognitive preference for simplicity (Chater & Vitányi, 2003) exemplified here, ecological acoustics predicts that listeners infer the most common real-world movement conforming to the perceived pattern of loudness. Most people encounter linear, back-and-forth movements (e.g., rocking, bouncing, swinging) more often than circular motion which, arguably, is yet more commonly experienced than figure-of-eight trajectories. Thus, environmental exposure and cognitive simplicity preference both speak in favor of pendular and circular interpretations of Figure 5(a) and (b), respectively.
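The aliasing analogy invoked above can be made concrete with a small calculation. This is a minimal sketch, assuming a wheel with a single visible marker and a viewer who adopts the smallest frame-to-frame displacement consistent with successive images; the function name `apparent_rotation_per_s` and the example rates are hypothetical.

```python
def apparent_rotation_per_s(true_rotations_per_s, frames_per_s):
    """Apparent rotation rate (rotations/s) of a wheel with one visible marker
    when sampled at frames_per_s, assuming the viewer adopts the smallest
    frame-to-frame displacement consistent with successive images."""
    per_frame = true_rotations_per_s / frames_per_s
    # Wrap the per-frame displacement into the interval [-0.5, 0.5) rotations.
    wrapped = (per_frame + 0.5) % 1.0 - 0.5
    return wrapped * frames_per_s

# At 24 frames/s the Nyquist limit is 12 rotations/s.
print(apparent_rotation_per_s(5.0, 24.0))   # below the limit: forward motion
print(apparent_rotation_per_s(23.0, 24.0))  # above the limit: apparent backward motion
```

In the second call the wheel advances by almost a full turn between frames, so the simplest reading of the images is a slow backward rotation, mirroring the preference for the simplest trajectory discussed in the text.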
Whereas Figure 4 illustrates expected sound patterns for sources known to follow a circular trajectory, Figure 6 considers how different loudness patterns are likely to be perceived by listeners. A truly rotating sound source will not necessarily result in the perception of a rotating source. A rotating sound source emitting two sounds per cycle (as in Figure 6(a), for example) would most likely be perceived to result from back-and-forth oscillation, the simplest and most commonly encountered movement consistent with the perceived pattern.

Two possibilities are illustrated for the ternary case in Figure 6(b). Consider first the back-and-forth motion where the loudest event occurs when the pendulum is closest to the listener. As the pendulum moves two-thirds away, a second (weaker) sound is produced. After reversing direction at the end point, a third (weak) sound occurs in the identical position. So why would triplets favor a rotating trajectory rather than the illustrated back-and-forth motion? In the real world, sound onsets are strongly linked with changes of acceleration. For example, a percussive mallet recoils after striking an object. Similar principles apply to sounds produced by bowed or plucked strings and air movements. In a simple back-and-forth trajectory, there are only two points of change of acceleration. Assuming constant velocity, regardless of how one attempts to place three equidistant sound events, a maximum of one sound event would coincide with one of the points of maximum pendular acceleration change. At least one of the beats would therefore need to fall in mid-movement rather than at a turning point. Therefore, from an ecological perspective, back-and-forth trajectories offer a poorer fit than circular trajectories to ternary sound patterns.²

The same logic regarding circular motion as the most plausible trajectory applies to quinary rhythms (Figure 6(d)). Quaternary rhythms are more ambiguous. While accounting for the two sound events coinciding with endpoints, a pendular interpretation would leave another two events unexplained. Consequently, it is unclear whether quaternary rhythms favor rotating or pendular trajectories.
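The counting argument above can be checked with a short enumeration. The sketch below idealizes one back-and-forth cycle as a phase running from 0 to 1, with the two turning points at phases 0 and 1/2 (a symmetric, constant-speed assumption), and asks how many of n isochronous events can land exactly on a turning point. The function name `max_events_at_turning_points` is hypothetical.

```python
from fractions import Fraction

def max_events_at_turning_points(n_events):
    """Largest number of n_events isochronous sounds per cycle that can fall
    exactly on the two turning points of a back-and-forth motion.

    One full swing cycle is parameterized by phase in [0, 1); the turning
    points sit at phases 0 and 1/2 (a symmetric, constant-speed idealization).
    """
    turning_points = {Fraction(0), Fraction(1, 2)}
    best = 0
    # An alignment can only score if some event sits on a turning point,
    # so it suffices to try anchoring the first event at each turning point.
    for anchor in turning_points:
        phases = {(anchor + Fraction(k, n_events)) % 1 for k in range(n_events)}
        best = max(best, len(phases & turning_points))
    return best

for n in (2, 3, 4, 5):
    print(n, max_events_at_turning_points(n))
```

Under these assumptions, binary patterns place both events on turning points, ternary and quinary patterns at most one, and quaternary patterns two of four, which matches the ambiguity noted for the quaternary case.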
Figure 6. Due to the cognitive preference for simpler and more commonly encountered trajectories, binary patterns easily reduce to pendular motion, as in (a). Whereas this would imply change of direction with no co-occurring sound events for ternary and quinary patterns, (b) and (d), it is more ambiguous whether quaternary patterns, (c), are most parsimoniously represented as spinning or bouncing.

Orientation and position

Up until now, discussion has been limited to movement in a horizontal plane with the listener positioned outside the trajectory facing the sound source (Figure 3(a)). Alternative orientations and positions such as those suggested by Figure 3(b) to (d) will now be addressed.

First, consider the situation where a stationary listener is positioned inside the movement trajectory of the sound source (Figure 3(b)). In the unlikely case where the listener occupies the center of the circular trajectory, all sounds will be equally loud. As the head moves away from the center, dynamic patterns become increasingly differentiated, reaching maximum contrast when head position coincides with a point on the circle. Moving outside the circular trajectory, dynamic contrast declines again, asymptotically towards zero differentiation at infinite distance. Note that head position only affects the relative contrast of the pattern, not the pattern itself. In short, the analysis offered earlier applies equally well to scenarios where the listener is inside or outside the sound source trajectory.

Next, suppose that a listener is moving in a circle either around a stationary sound source (Figure 3(d)) or next to it (Figure 3(c)). Again, in the unlikely scenarios where the sound source occupies the center of the circle or is infinitely far away from it, all sounds will exhibit the same loudness with no differentiated pattern emerging. In all other cases, binary, ternary, quaternary, and quinary patterns will necessarily contain one sound event that is louder than the others, as outlined in the ETRoS model. Finally, regarding the plane of movement, any tilt (including a fully vertical plane) will similarly maintain the dynamic patterns described above.

In conclusion, although first exemplified with a horizontally moving sound source with a stationary point-source listener outside the trajectory, ETRoS generalizes to other scenarios of circular relative motion, regardless of which component is moving and whether its movement circumnavigates the stationary component or not.

If this ecologically-inspired theory is correct, the rotating qualia reported by music commentators should also be perceived by ordinary listeners. Accordingly, two experiments are reported. Experiment A examines listener judgments of spinning for rhythmic patterns with different cardinality (binary, ternary, quaternary, quinary) played at four different tempi. Experiment B manipulates relative intensity patterns as a more direct test of ETRoS.
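Returning to the position argument made earlier in this section, the dependence of dynamic contrast on listener position can be illustrated numerically. This is a sketch under assumed conditions (point listener, free-field inverse-square decay, unit radius). It reports the level difference between the nearest and farthest points of the trajectory, which is an upper bound on the contrast among the discrete events of any pattern. The function name `loudness_contrast_db` is hypothetical.

```python
import math

def loudness_contrast_db(listener_dist, radius=1.0):
    """Level difference (dB) between the nearest and farthest points of a
    circular trajectory, for a listener at listener_dist from its center.
    Assumes inverse-square intensity decay in a free field."""
    nearest = abs(radius - listener_dist)
    farthest = radius + listener_dist
    if nearest == 0.0:
        return math.inf  # listener on the circle: contrast is unbounded
    # Intensity ratio is (farthest / nearest) ** 2, hence 20 * log10 of the distance ratio.
    return 20.0 * math.log10(farthest / nearest)

for d in (0.0, 0.5, 0.9, 1.0, 1.1, 2.0, 10.0, 100.0):
    print(f"distance {d:6.1f} -> contrast {loudness_contrast_db(d):.2f} dB")
```

In this idealization the contrast is zero at the center, grows without bound as the listener approaches the circle, and falls back towards zero with increasing distance outside it, consistent with the qualitative description in the text.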
Experiment A
The first experiment tests the following hypotheses:
H1: Ternary patterns are judged by listeners as more evocative of rotating or spinning than non-ternary patterns (binary, quaternary, quinary).
H2: Listener ratings of spinning or rotating increase with tempo.
Method
Participants. Twenty-six participants (23 females; median age, 20 years; age range, 19–57 years), principally students from the Ohio State University Department of Speech and Hearing Sciences, were recruited. Most participants received course credit for their contribution. Participants ranged from 7 to 41 (median, 23.5) on the musical training subscale from the Goldsmiths Musical Sophistication Index (Müllensiefen, Gingras, Musil, & Stewart, 2014). Answering a single question from the Ollen Musical Sophistication Index (Ollen, 2006), of the 26 participants, 2 (8%) self-reported as non-musicians, 14 (54%) as music-loving non-musicians, 6 (23%) as amateur musicians, and 4 (15%) as serious amateur musicians. None self-reported as semi-professional or professional musicians.
Stimuli. To resolve which stimuli would be most suitable to test the two hypotheses, a range of pilot experiments were conducted with different pitch patterns. To induce the intended groupings of 2, 3, 4, or 5 notes, some pilot stimuli made use of pitch height (Figure 7(a)), others of contour changes (Figure 7(b)), and others again comprised repeated unison pitches where rhythmic structure was imposed
Figure 7. Examples of pilot stimuli using (a) pitch height; (b), (d), and (e), contour changes; or (c) dynamic accents in order to induce the intended groupings of 2, 3, 4, or 5 notes. A range of tempo values were tried, including 100, 150, 200, and 250 quarter notes per minute. Timbres comprised (a) and (b), piano, and (c), (d), and (e), marimba. For pilot stimuli using consistent note repetitions (i.e., (c)), two MIDI velocities were used, corresponding to 127 for accented notes and 95 for unaccented notes. Whereas the final stimuli depicted in Figure 8 were also used as pilot stimuli, none of the pilot data are included in the main analysis reported here.
solely by increasing the intensity of the first member of each grouping (Figure 7(c)). Pilot participants comprised lab members and colleagues of the experimenters who self-reported being unfamiliar with the experimental hypotheses. All pilot experiments were conducted online using headphones with participants asked to set their volume level to a “comfortable level where you can easily hear the sound”. Sound stimuli were presented using the Qualtrics presentation software and responses were provided using a computer mouse on a 7-point scale ranging from “1: Does definitely not sound like it is spinning or rotating” to “7: Definitely sounds like it is spinning or rotating.” In completing the initial pilot experiments, some individuals spontaneously reported having difficulty hearing repeated pitches (Figure 7(c)) as connoting any type of rotation. This was consistent with the analytic observation that only three of the 27 SMT examples employed unisons. Accordingly, further pilot experiments were conducted using repeated isochronous patterns emulating the SMT examples with constant dynamics and varying pitches in close proximity (Figure 7(a), (b), (d), and (e)). None of the pilot data are included in the data analysis for our main experiments. After several iterations, a stimulus set emerged that appeared to produce consistent results while maintaining pitch contour and tonal implications across different cardinalities. Specifically, D4-C#4 was used for binary, D4-C4-C#4 for ternary, D4-C#4-C4-C#4 for quaternary, and D4-C4-B4-C4-C#4 for quinary. Notably, these patterns are all consistent with Doppler shifts produced by a rotating sound source whereby pitch contour rises as the source approaches and falls as the source recedes away from the listener. They also follow Thomassen’s (1982, 1983) model
of pitch-related accent whereby perceived accents coincide with changes of pitch contour with ascending-descending pivots providing greater accent than descending-ascending pivots. Music-compositional practice is known to conform to this model (Huron & Royal, 1996). Each pattern was generated using a MIDI piano timbre at four different tempi: 200 notes/minute (300 ms tone duration), 300 notes/minute (200 ms), 400 notes/minute (150 ms), and 500 notes/minute (120 ms). Sequences of patterns were 6 seconds in duration and exported as MPEG-1 (“mp3”) files with a sampling rate of 44.1 kHz. To ensure that participants heard the first tone (D4) as the downbeat, a 500 ms fade-out with no fade-in was used. An example of a ternary pattern presented at 400 notes/minute can be found in Audio 1 (Supplemental material).
Procedure. Participants were tested individually via headphones in an Industrial Acoustics Corporation sound-attenuated room with the volume set to a participant-selected comfortable, yet audible, level. Each participant heard a unique random order of 32 sound files comprising two copies of each of the four tempo renditions for binary, ternary, quaternary, and quinary patterns presented with the Qualtrics software. Participants first viewed three muted computer animations illustrating rotating, bouncing, and figure-of-eight trajectories (see Video 1). Subsequently, participants received the following instructions:
Note that in this experiment, “spinning” and “rotating” are used as two complementary ways of describing a specific way of experiencing musical sounds. Some people may generally find one of these terms more suitable than the other. Also, for a
given individual, some sound passages may sound more like they are spinning while others may sound more like they are rotating. Others again will sound as neither spinning nor rotating. Therefore, if you think a sound passage sounds like it is spinning, but not rotating, then you should give it a high rating for spinning/rotating. Similarly, if you think a given sound passage sounds like it is rotating, but not spinning, then you should also give it a high rating for spinning/rotating. On the other hand, if a given sound passage neither sounds like it is spinning nor rotating, then you should give it a low rating on spinning/rotating.
Now that you have learned what we mean by “Spinning/Rotating Sounds,” you are ready to begin the listening experiment. You will hear 32 short sound passages. For each passage, your task is to rate how much the passage sounds like it is spinning/rotating. You provide your answer on a 7-point scale ranging from “1: Does definitely not sound like it is spinning or rotating” to “7: Definitely sounds like it is spinning or rotating.” You can play each sound passage as many times as you like, but please make sure to only play one sound at a time. There are no right and wrong answers so we recommend that you just go with your first intuition.
Following these instructions, participants listened to sound files that they activated by clicking with a computer mouse. Accompanying each stimulus, the following instruction was reiterated on the screen: “Please rate how much the passage sounds like it is spinning or rotating.” Responses were provided by clicking the relevant boxes on the computer screen, similarly using a mouse. Participants self-paced through this brief 5–6 min experiment with no explicit encouragements to take breaks.
Results
Test–retest reliability. Before data analysis, an exclusion criterion had been set to ensure that results reflected consistent responses. In the movement sciences, test–retest reliability is commonly used to quantify skillful performance, in that highly consistent actions or responses reflect mastery of a given skill (Weir, 2005). To adequately characterize the physical or physiological properties of expert motor skills, one would therefore focus on individuals who can perform the relevant movement patterns consistently every time and exclude incapable individuals who only add noise. Similarly, in the present experiment, to meaningfully characterize the musical properties of rotating sound stimuli, it is appropriate to first focus on listeners who consistently perceive certain stimuli as more evocative of rotating qualia than others. Note that this procedure does not bias results towards responses that are consistent with the experimental hypothesis, but merely towards responses that are less noisy. Thus, results could be reliable, yet inconsistent with the hypothesis. To ensure that results generalize more
Figure 8. The results from Experiment A show that, compared with non-ternary patterns, ternary patterns evoke a greater sense of spinning/rotating in listeners. This is, however, primarily attributable to low ratings for binary patterns. Spinning/rotating qualia, moreover, increase with musical tempo (spanning from 200 to 500 notes/minute).
broadly across individuals, confirmatory analysis on the full dataset was also planned. It was decided a priori that participants exhibiting test–retest correlations less than +.50 would be excluded. However, if necessary, this reliability threshold would be lowered to ensure that at least 50% of participants’ data was retained for data analysis. In Experiment A, test–retest reliability scores ranged from −.55 to +.97 with a mean of +.46. Following the a priori exclusion criterion, eight participants were excluded, leaving data from 18 participants.
Spinning/rotating ratings. Figure 8 displays average spinning/rotation ratings for the 16 stimulus types. Consistent with our main hypotheses, a 2 × 4 ANOVA with pattern (ternary vs. non-ternary) and tempo (200 vs. 300 vs. 400 vs. 500 notes/minute) as within-participant factors showed that ratings were generally higher for ternary than non-ternary patterns, F(1, 17) = 19.65, p < .001, ηp² = .536, and that ratings increased marginally significantly with tempo, F(1.3, 22.2) = 3.93, p = .050, ηp² = 0.188. Due to violation of the sphericity assumption for tempo, Greenhouse-Geisser correction was applied to the degrees of freedom for this factor. No significant interaction was found, F(3, 51) = 0.50, p = .685, ηp² = 0.029. Another 2 × 4 ANOVA conducted on data from all participants confirmed that these results remained robust when participants with low reliability were included. Specifically, significant main effects of pattern, F(1, 25) = 24.19, p < .001, ηp² = 0.492, and tempo, F(1.49, 37.1) = 5.08, p = .018, ηp² = 0.169, were found with non-significant interaction, F(3, 75) = 0.62, p = .599, ηp² = 0.024. Summing
up, ternary patterns were perceived as more spinning or rotating than non-ternary patterns on average. Moreover, while this difference was intact across different tempi, faster patterns were generally perceived as more spinning than slower ones. Notably, when assessing the different types of non-ternary stimuli, somewhat unanticipated responses emerged for quaternary and quinary rhythms (Figure 8). Overall, these types of non-ternary stimuli appeared to be perceived as no less spinning or rotating than ternary patterns. This raises the question whether the main effects of pattern might primarily be driven by lower ratings for binary patterns. To investigate this possibility, a post hoc one-way ANOVA was conducted on data from participants with supra-threshold reliability, comparing ratings for binary, ternary, quaternary, and quinary patterns explicitly. In addition to a significant main effect of pattern, F(1.6, 26.6) = 9.27, p = .002, ηp² = .353, planned simple contrasts showed that whereas binary patterns were significantly less spinning or rotating than ternary ones, F(1, 17) = 17.30, p = .001, ηp² = .504, quaternary, F(1, 17) = 1.30, p = .270, ηp² = .071, and quinary patterns, F(1, 17) = 0.54, p = .472, ηp² = .031, did not differ significantly from ternary ones. Thus, whereas the lower ratings for binary patterns were consistent with ETRoS, the lack of significantly higher spin/rotation for ternary compared with quaternary and quinary patterns seems inconsistent with the theory.
Response strategies. In post-experiment interviews, when prompted for explicit response strategies, 11 out of the 26 participants spontaneously reported that they had closed their eyes and internally visualized objects (or themselves) moving along with the sounds. Moreover, four individuals out of the full sample reported that they had entrained with the sounds, for example, by moving their wrists, heads, eyes, or the computer mouse in a circular trajectory.
All in all, given two individuals reporting use of both types of strategies, a total of 13 participants admitted using visualization and/or entrainment strategies. Such strategies may have been prompted by the animation videos shown to participants prior to the experiment (Video 1). Assessing whether these strategies were beneficial or not, Mann-Whitney U-tests showed that people using visualization strategies generally had higher reliability scores (Median = .782, n = 11) compared with people who did not use visualization strategies (Median = .511, n = 15), U = 38, Z = 2.31, p = .020. Entrainment strategies, on the other hand, were associated with significantly lower reliability scores (Median = .258, n = 4), U = 16, Z = 1.99, p = .048, compared with people not using such strategies (Median = .617, n = 22). Due to the limited use of entrainment strategies, however, this latter result needs to be interpreted with caution. Even though visualization and entrainment strategies may have influenced the reliability of participants’
responses, two separate 2 × 4 × 2 ANOVAs on data from all participants with pattern (ternary vs. non-ternary) and tempo as within-participant factors and visualization or entrainment strategy as dichotomous between-participant factors showed no main effects or interactions involving strategy-related factors (for all, p > .05). Thus, while visualizing objects may have led to more consistent responses and entraining with sounds may have led to less consistent responses, none of these strategies compromised the overall results reported previously.
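The a priori reliability screening described under test–retest reliability can be sketched as follows. This is a minimal illustration, not the authors' analysis script: it assumes Pearson correlation between the two presentations of each stimulus, and it assumes that a relaxed threshold is lowered only as far as needed to retain the required share of participants. The function names and the example scores are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length rating lists.

    Undefined (division by zero) if either list is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def screen_participants(reliability, threshold=0.50, min_fraction=0.50):
    """Return (effective_threshold, retained_ids).

    reliability maps participant id -> test-retest correlation.
    The threshold is relaxed only if fewer than min_fraction of the
    participants would otherwise be retained (assumed relaxation rule).
    """
    n_min = math.ceil(min_fraction * len(reliability))
    ranked = sorted(reliability.values(), reverse=True)
    effective = min(threshold, ranked[n_min - 1])
    retained = [pid for pid, r in reliability.items() if r >= effective]
    return effective, retained

# Hypothetical scores for four participants, not data from the study
example = {"p1": 0.97, "p2": 0.62, "p3": 0.41, "p4": -0.55}
print(screen_participants(example))
```

With 18 of 26 participants exceeding +.50 in Experiment A, the relaxation branch would not have been triggered under this reading of the criterion.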
Experiment B
ETRoS suggests that the rotating quality of triplets is not attributable to the ternary pattern itself, but to the conformity of its intensity profile to the behavior of actual rotating sound sources. Consequently, imposing ecologically-congruent intensity patterns on other rhythms should render them more rotating-sounding. For example, overriding the classic metric interpretation of quadruple rhythms, “strong-weak-medium-weak”, with “strong-medium-weak-medium”, should transform quaternary patterns to be consistent with an ecological interpretation of a rotating sound-producing or sound-receiving object. This suggests the following hypothesis:
H3: Quaternary patterns following a strong-medium-weak-medium intensity profile are perceived as evoking a greater sense of rotation than a matched pattern following a strong-weak-medium-weak profile.
Method
Participants. Experiment B was conducted immediately after Experiment A and employed the same participants.
Stimuli. In light of the strong tempo effects observed in Experiment A, all stimuli in Experiment B employed a constant tempo of 300 notes per minute. One stimulus type (“chromatic”) used the same quaternary pitch contour as in Experiment A (namely, D4-C#4-C4-C#4). Note that even though unison pitch patterns were infrequent in the SMT examples and were generally not perceived as spinning or rotating using neutral intensity manipulations in the pilot experiments, it remains possible that ecological intensity manipulations would be effective even for such patterns. Therefore, a second stimulus type (“repeated”) used groups of four repeated pitches (D4). As noted, the main manipulation pertained to intensity. Specifically, employing the principle that loudness is inversely proportional to distance, ecological patterns derived from the ETRoS model were pitted against unecological matched patterns (Figure 9). Intensity profiles for ecological stimuli were calculated as follows:
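[Figure 9, panels (a) and (b): phase positions of the four sound events on the circle and their mapping to MIDI velocity]
Degrees    Radians    Distance    MIDI velocity
10°        0.17       0.17*r      119
100°       1.75       1.53*r      53
-170°      -2.97      1.99*r      32
-80°       -1.40      1.28*r      65
[Figure 9, panel (c): notated chromatic and repeated stimulus patterns]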
Figure 9. Example of one of the 18 phase rotations used in Experiment B to create ecological and unecological quaternary stimulus patterns based on the Ecological Theory of Rotating Sounds (ETRoS). (a), (b): Ecological intensity profiles were determined by mapping the Euclidean distance between a listener and the points of sound emission to MIDI velocity values; (c): Unecological “twins” were subsequently created by permuting the ecological MIDI velocity values.
1. Given a phase position representing the loudest event in the cycle, three other phase positions were established by dividing the 360° of the circle into four. For example, a starting phase of 10° entails accompanying events sounding at 100°, −170°, and −80° (Figure 9(a)).
2. Euclidean distances were then calculated between each phase position and the hypothetical listener, assumed to be located at position zero on the circle (Figure 9(b)).
3. Using this distance measure, MIDI key velocities for each note event were interpolated linearly between the maximum MIDI velocity of 127 and a minimum of 31 (deemed to constitute a reasonably quiet though still clearly audible lower bound). This procedure follows from Dannenberg’s (2006) observation that, in the absence of any 100% fixed relationship, a simple square law provides an approximate mapping between MIDI key velocity and peak RMS amplitude. Conveniently, the inverse square-law relating distance to amplitude thus cancels out the square-law MIDI mapping.
4. Finally, steps 1–3 were repeated for 18 starting positions, each offset by 5°, spanning the arc from −40° to +45°.
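The four steps above can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions rather than the procedure actually used to generate the stimuli: the text does not say which distances anchor the linear interpolation, so the sketch assumes that zero distance maps to velocity 127 and one full diameter maps to velocity 31. The function and constant names are hypothetical.

```python
import math

V_MAX, V_MIN = 127, 31  # MIDI velocity bounds given in step 3

def ecological_profile(start_deg, radius=1.0):
    """Return a list of (phase_deg, distance, velocity) for the four events."""
    profile = []
    for k in range(4):
        # Step 1: four phases 90 degrees apart, wrapped to [-180, 180)
        phase = (start_deg + 90 * k + 180) % 360 - 180
        theta = math.radians(phase)
        # Step 2: Euclidean distance to a listener at phase 0 on the circle
        dx = radius * math.cos(theta) - radius
        dy = radius * math.sin(theta)
        distance = math.hypot(dx, dy)
        # Step 3: linear interpolation; ASSUMES distance 0 -> V_MAX and
        # one full diameter (2 * radius) -> V_MIN
        velocity = round(V_MAX - (V_MAX - V_MIN) * distance / (2 * radius))
        profile.append((phase, distance, velocity))
    return profile

# Step 4: 18 starting phases from -40 to +45 degrees in 5-degree steps
START_PHASES = list(range(-40, 50, 5))

for phase, distance, velocity in ecological_profile(10):
    print(f"{phase:5.0f} deg  distance {distance:.2f} * r  velocity {velocity}")
```

For a starting phase of 10°, this sketch gives velocities of 119, 53, 31, and 65, which is within one MIDI unit of the values shown in Figure 9 (119, 53, 32, 65). The small discrepancy presumably reflects a rounding or range detail that the text does not specify.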
For each ecological stimulus, a non-ecological “twin” was created by permuting the four MIDI velocity values (Figure 9(c)). Specifically, the highest MIDI key velocity was retained for the first (i.e., closest) sound event, the second highest value occupied position 3, and positions 2 and 4 were interchanged. For example, an ecological pattern of MIDI key velocities of 119, 53, 32, 65 (Audio 2 for chromatic and Audio 3 for repeated, Supplemental Material) would be reorganized as 119, 32, 65, 53 (Audio 4 for chromatic and Audio 5 for repeated, Supplemental Material). In this way, 36 stimulus pairs comprised chromatic and unison versions of 18 ecological intensity patterns conforming to a physical, ecological model of rotation and 18 unecological intensity patterns conforming to a musically more familiar strong-weak-medium-weak stress pattern.
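The permutation that produces an unecological twin can be sketched as below. The text fixes the strongest value in position 1 and the second strongest in position 3; how the two weakest values are assigned to positions 2 and 4 is inferred here from the single worked example (119, 53, 32, 65 becoming 119, 32, 65, 53) and should be read as an assumption. The function name is hypothetical.

```python
def unecological_twin(velocities):
    """Rearrange four MIDI velocities into a strong-weak-medium-weak order.

    velocities: ecological pattern with the loudest event first.
    """
    strongest = velocities[0]
    # Remaining three values sorted from loudest to quietest
    second, third, weakest = sorted(velocities[1:], reverse=True)
    # Position 3 gets the second-loudest value (stated in the text).
    # ASSUMPTION: the quietest value goes to position 2 and the
    # remaining value to position 4, as in the worked example.
    return [strongest, weakest, second, third]

print(unecological_twin([119, 53, 32, 65]))
```

Sorting the three weaker values, rather than shuffling fixed positions, keeps the strong-weak-medium-weak shape for starting phases on either side of zero, where the ecological order of the two medium values is reversed.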
For variety, a marimba sound was used, with slight reverberation to increase realism. All stimulus sequences were 6 seconds in duration, comprising six repetitions of the four-note patterns notated in Figure 9(c). They were prepared in Cubase 7.1.3 (Steinberg Media Technologies GmbH, Hamburg, Germany) and exported as MPEG-1 (“mp3”) files with a sampling rate of 44.1 kHz.
Procedure. Participants completed a two-alternative forced-choice task where they listened to two random permutations of the 36 stimulus pairs. Sound files were presented with the Qualtrics software and played once whenever the participant clicked the relevant on-screen display with a computer mouse. Headphones were used with the volume set to a participant-selected, comfortable, yet audible sound level. The task was self-paced with a voluntary break encouraged halfway through. Beforehand, participants received the following instructions:
You will now hear 72 pairs of short, repeated sound passages. For each pair, your task is to choose which passage sounds more like it is spinning/rotating. You give your answer by clicking on the frame surrounding the most spinning or rotating sound display. You can play each sound passage as many times as you like, but please make sure to only play one sound at a time. There are no right or wrong answers so we recommend that you just go with your first intuition.
The left–right placement of ecological and unecological files on the computer screen was counterbalanced. Each screen reiterated the instructions, “Which passage sounds more like it is spinning/rotating?” Recall that participants had already received the detailed instructions on how to conceptualize the rotating or spinning quality of sound stimuli.
Results
Test–retest reliability. Once again, an a priori test–retest reliability criterion was set to reduce noise in the measurements.
Since, however, the data in Experiment B were dichotomous ecological/unecological judgments rather than ratings on a 7-point scale (as in Experiment A), an alternative ratio-based exclusion criterion needed to be adopted. Consequently, it was decided that only participants responding the same way to 60% or more of the repeated paired judgments would be included in analysis. As in Experiment A, this criterion would be loosened as necessary to ensure that no more than half of the data were excluded. Test–retest reliability in Experiment B ranged between 44% and 89% with a mean of 62%. The inclusion criterion was met by 16 of the 26 participants.
Ecological versus unecological patterns. In analyzing the data from Experiment B, the statistical software R (R Core Team, 2014) and the “glmer” function from the lme4 package (Bates, Maechler, Bolker, & Walker, 2015) were used to conduct binary logistic regression with generalized linear mixed effects modelling by maximum likelihood (Laplace Approximation). This enabled accounting for the data dependency of multiple responses from each participant and for dual responses for each repeated stimulus pair. Using this method, an intercept with a log odds ratio significantly above zero would imply that participants were more likely than chance to select ecological stimuli over unecological ones as sounding more spinning or rotating. Consequently, whereas an intercept term was the only fixed effect entered into the initial model, participant ID and stimulus item were entered as random effects. This model was compared with a null model with the same random effects, but no intercept. Indeed, when including 1,152 data points from the 16 participants meeting the inclusion criterion, a significant intercept estimate of 0.79 (SE = 0.17), Z = 4.68, p < .001, was obtained. This model (AIC = 1,408.13) performed significantly better than the null model (AIC = 1,421.74), χ²(1) = 15.61, p < .001.
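To make the reported intercept easier to interpret, the short sketch below converts the log odds of 0.79 into a probability and recovers the likelihood-ratio statistic from the two AIC values. It is only an arithmetic check on the figures quoted above, not a re-analysis; the helper names are hypothetical, and the conversion from AIC relies on the standard definition AIC = 2k − 2 log L with the two models differing by one parameter.

```python
import math

def inv_logit(log_odds):
    """Convert a log odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def lr_statistic_from_aic(aic_null, aic_full, extra_params=1):
    """Likelihood-ratio statistic implied by two AIC values.

    Since AIC = 2k - 2 log L, the difference in -2 log L equals the
    AIC difference plus twice the number of extra parameters.
    """
    return (aic_null - aic_full) + 2 * extra_params

# 16 retained participants x 36 stimulus pairs x 2 presentations
n_trials = 16 * 36 * 2

p_ecological = inv_logit(0.79)
lr = lr_statistic_from_aic(1421.74, 1408.13)
print(n_trials, round(p_ecological, 3), round(lr, 2))
```

An intercept of 0.79 thus corresponds to choosing the ecological stimulus on roughly 69% of trials. Because the intercept is estimated conditionally on the random effects, this figure applies to a typical participant and stimulus pair and need not equal the raw proportion of ecological choices.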
Moreover, a post hoc analysis including 1,872 data points from all 26 participants confirmed this pattern with a significant intercept of 0.62 (SE = 0.14), Z = 4.50, p < .001. This model (AIC = 2,365.72) also outperformed the null model (AIC = 2,380.43), χ²(1) = 16.71, p