Psychomusicology, 12, 3-21 ©1993 Psychomusicology
PITCH PATTERN, DURATIONAL PATTERN, AND TIMBRE: A STUDY OF THE PERCEPTUAL INTEGRATION OF AUDITORY QUALITIES

William Forde Thompson
Donald Sinclair
Atkinson College, York University

The perceptual integration of pitch pattern, durational pattern, and timbre was examined in two experiments. In Experiment 1, a "texture" of note sequences was defined as two different note patterns played repeatedly. On each trial, 16 consecutive note patterns were played. Presentations established either a single texture (e.g., 1211212221121121) or a texture that changed to another texture (e.g., 1211212134334344). Listeners indicated if they perceived a texture change. If a texture change was created by introducing a pitch pattern or durational pattern that was not in the initial texture, the change was perceived easily. If a texture change was created by taking the pitch patterns, durational patterns, and timbres in the initial texture, and recombining these qualities to form another texture, the change was more difficult to perceive. In Experiment 2, two test sequences were presented simultaneously, followed by a comparison sequence. Listeners judged whether the comparison was one of the test sequences. If the comparison contained a pitch pattern or durational pattern that was not in either test sequence, then it was easy to discriminate from either test sequence. If the comparison was a novel combination of qualities from both test sequences, however, it was more likely to be perceived as one of the two test sequences. The research is discussed in view of Treisman and Gelade's (1980) feature-integration theory.
A music sequence may be described in terms of a number of qualities, including pitch pattern, durational pattern, timbre, and factors related to performance expression. Within music cognition, these qualities may be understood as influences on a listener's perception of music. Each quality may be controlled and tested as an independent variable in psychological research, though many researchers are interested in the extent to which qualities interact.

Interactive effects have been reported in a number of studies. For isolated tones or chords, perceptual interactions between pitch and timbre have been found (Beal, 1985; Krumhansl & Iverson, 1992; Melara & Marks, 1990). For sequences of notes, memory for pitch events has been shown to depend on temporal structure, implying a perceptual interaction between these qualities (Deutsch, 1980; Jones, Boltz, & Kidd, 1982).

Perceptual interactions do not always occur, however. Monahan and Carterette (1985) and Palmer and Krumhansl (1987a, 1987b) found that pitch pattern and temporal pattern make independent contributions toward broad judgments of music sequences. In Palmer and Krumhansl's (1987b) study, the temporal and pitch patterns of an excerpt from a Mozart piano sonata were artificially separated, and judgments for the separate components were collected. These judgments were then compared to judgments for the combined components. Correlation and regression analyses suggested that "no emergent perceptual qualities appeared for musical phrases containing both pitch and temporal patterns that were not found in the ratings of the separate components" (p. 512). The authors suggested that both the temporal pattern and the pitch pattern of familiar music are represented in memory, but that these qualities contribute independently to judgments of phrase "goodness" or "completeness."

The issue of independence of music qualities raises the question of the present research: To what extent are the qualities of note patterns integrated with one another in the perceptual process? Reports of interactive effects imply that auditory qualities are perceptually integrated; if they were not, there would be no basis for interactive effects. Reports that music qualities contribute independently to listeners' judgments leave open the possibility that the perceptual integration of auditory qualities is incomplete.

Perceptual Integration as a Stage of Processing

A useful discussion of the perceptual integration process as it relates to music stimuli has been provided by Deutsch (1986). Based on various findings, Deutsch argues that music stimuli are first "fragmented into their separate attributes, and this process of fragmentation is followed by a process of perceptual synthesis in which the different attribute values are recombined" (Chapter 32, p. 5-6). Some of the evidence for this idea came from findings in which qualities were improperly recombined, a phenomenon that may be referred to as an "illusory conjunction" (Treisman & Schmidt, 1982).

The scale illusion provides an example of an illusory conjunction of pitch pattern and location (Deutsch, 1975; Butler, 1979). Two major scales, one ascending and the other descending, were presented simultaneously to listeners.
Successive tones in the ascending scale were presented alternately to the two ears, starting with the left ear. Successive tones in the descending scale were also presented alternately to the two ears, but starting with the right ear. The scales were played ten times to listeners. Listeners did not perceive the precise manner in which pitches were combined with locations. Rather, listeners perceived pitches to be combined with locations in a way that grouped tones by proximity in pitch. This illusion indicates that pitch pattern can become improperly integrated with location information.

Illusory conjunctions were first studied in vision, as part of a series of investigations concerning "feature-integration theory" (Treisman & Gelade, 1980; Treisman & Schmidt, 1982). According to feature-integration theory, visual information processing involves an initial stage in which individual features such as lines and colors are registered. At a subsequent stage, requiring focused attention, these initially separable features are reintegrated into unitary objects.

In auditory perception, processing also may involve a stage of decomposition in which separable qualities or attributes are evaluated, and a stage in which these qualities are reintegrated. Evaluation of qualities may be
construed as a general process operating for both individual features, such as pitch, and qualities that emerge over time, such as pitch pattern and durational pattern. Physiological work on the evaluation process has focused on individual features, such as pure tones, noise bursts, or clicking sounds (Moore, 1988). However, the possibility that evaluation also occurs for qualities that emerge over time is supported by recent evidence that pitch pattern and durational pattern are neurally dissociated (Peretz & Morais, 1989; Peretz & Kolinsky, 1993). Integration, less understood, may be conceptualized as the formation of associations between qualities, based on contiguity in time. Interactive effects among qualities may or may not arise at the integration stage.

According to feature-integration theory, focused attention may function in the perceptual integration of visual features (Treisman & Gelade, 1980). The present research examined an analogous possibility: that focused attention may influence the extent to which auditory qualities are perceptually integrated. Thus, perceptual integration was evaluated both when listeners were attentive and when they were distracted.

A connection between attentional processes and interactions among music qualities has already been made by Jones (1976; 1982; see also Dowling, Lung, & Herrbold, 1987). However, her discussions have centered on fluctuations in the focus and direction of attention once a listener is already generally attentive. Jones has not compared perceptual processes when listeners are attentive to perceptual processes when listeners are distracted. In Jones's expectancy model, rhythmic regularity in music directs attention to temporally or harmonically accented elements. The effect is a rhythmicity of attentional focus, and a greater awareness and memory of those music elements to which attention is most strongly directed. Experimental support for the model has been reported by Jones, Kidd, and Wetzel (1981).
We examined the sensitivity of listeners to the manner in which pitch patterns, durational patterns, and timbres are combined to form melodic sequences. A lack of sensitivity to how pitch patterns, durational patterns, and timbres are combined is understood as a lack of perceptual integration among those qualities. Perceptual integration was evaluated under conditions of both attention and distraction. Experimental conditions were based, by analogy, on conditions designed to evaluate the "feature-integration" theory of visual attention (Treisman & Gelade, 1980). In Experiment 1, listeners provided judgments of relatively long chains of note sequences, or "textures," to examine the processing of tonal qualities in monophonic contexts. In Experiment 2, listeners compared monophonic sequences to two-part sequences, to examine the processing of tonal qualities in polyphonic contexts.

Experiment 1

Experiment 1 examined the sensitivity of listeners to the manner in which durational pattern, pitch pattern, and timbre are combined in monophonic contexts. The concept of a "texture of note sequences" was defined
for this purpose, and can be compared to visual textures used by Treisman and Gelade (1980). An example of a visual texture is a random display of red Os and blue Xs. If a texture of red Os and blue Xs is placed next to a texture of red Os and green Xs, the two textures should be easily discriminable, because they can be discriminated by color alone. If a texture of red Os and blue Xs is placed next to a texture of blue Os and red Xs, however, discrimination should be more difficult. In the latter case, texture discrimination requires a sensitivity to the manner in which color and shape are combined.

In Experiment 1, a texture of note sequences was defined as two brief sequences, played consecutively in a randomly varying order and at varying transpositions. Each sequence was a unique combination of pitch pattern, durational pattern, and timbre. Timbre differences were introduced to provide listeners with an easy way of identifying and distinguishing the two sequences that defined a texture. A change in texture was defined as any change to one or both of the two sequences defining the texture.

Three general conditions were examined. In the no-change condition, there was no texture change in the presentation. In the foreign and switch conditions, a texture change occurred approximately midway through the presentation. In the foreign condition, the new texture contained a quality that was not present in the initial texture. Specifically, both sequences of the new texture contained either a novel pitch pattern or a novel durational pattern. In the switch condition, the new texture contained the same pitch patterns, durational patterns, and timbres that were present in the initial texture, but the way in which those qualities were combined was changed. For example, the pitch pattern of one sequence was combined with the durational pattern and timbre of the other sequence, and vice versa.
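The difference between the foreign and switch conditions can be illustrated with a minimal Python sketch. It is not part of the original study: the class `NoteSequence`, the helper functions, and the labels "P1", "D1", "T1" and so on are hypothetical names introduced only for illustration, and each sequence is reduced to a triple of labels rather than actual notes. The foreign example assumes an initial texture whose two sequences share a pitch pattern.

```python
from typing import NamedTuple, Set, Tuple


class NoteSequence(NamedTuple):
    """One sequence, reduced to labels for its three qualities (hypothetical)."""
    pitch: str     # pitch pattern label, e.g. "P1"
    duration: str  # durational pattern label, e.g. "D1"
    timbre: str    # timbre label, e.g. "T1"


Texture = Tuple[NoteSequence, NoteSequence]


def switch_pitch(texture: Texture) -> Texture:
    """Switch condition: exchange pitch patterns, keep duration and timbre."""
    a, b = texture
    return (a._replace(pitch=b.pitch), b._replace(pitch=a.pitch))


def foreign_pitch(texture: Texture, new_pitch: str) -> Texture:
    """Foreign condition: give both sequences a pitch pattern not heard before."""
    a, b = texture
    return (a._replace(pitch=new_pitch), b._replace(pitch=new_pitch))


def qualities(texture: Texture) -> Set[str]:
    """Individual qualities present, ignoring how they are combined."""
    return {q for seq in texture for q in seq}


# Switch: same qualities before and after, but different combinations.
switch_initial = (NoteSequence("P1", "D1", "T1"), NoteSequence("P2", "D2", "T2"))
switch_new = switch_pitch(switch_initial)

# Foreign: the new texture introduces a quality absent from the initial one.
foreign_initial = (NoteSequence("P1", "D1", "T1"), NoteSequence("P1", "D2", "T2"))
foreign_new = foreign_pitch(foreign_initial, "P2")
```

In this sketch, `qualities(switch_new)` equals `qualities(switch_initial)` even though the sequences themselves differ, so a listener who encoded only the individual qualities would have no basis for detecting the change. In the foreign case the new texture contains a label ("P2") that the initial texture lacks.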
We predicted that texture changes should be easiest to detect for foreign conditions, that is, for changes in which the new texture was created by introducing a novel quality. This prediction is based on the idea that detection of these changes does not require sensitivity to the precise manner in which qualities are combined. Rather, detection of texture changes for these conditions merely requires evaluation of the individual qualities involved. Texture changes should be more difficult to detect for switch conditions, that is, for changes in which qualities in the initial texture were recombined to create the new texture. This latter prediction is based on the idea that detection of these changes requires a sensitivity to the precise manner in which the pitch patterns, durational patterns, and timbres of sequences were combined in each texture. A lack of sensitivity to the manner in which qualities were combined in each texture should lead to poorer detection of texture changes in the switch conditions, because both textures in this condition involve the same qualities.

Insensitivity to the manner in which qualities are combined indicates that the perceptual integration of these qualities is either incomplete or not entirely accurate. If perceptual integration is incomplete, qualities may remain partially dissociated in short-term memory. In the extreme case, listeners should remember only the individual qualities in the sequences (the pitch patterns, durational patterns, and timbres); the manner in which qualities were combined would not be encoded. This possible outcome predicts that the two textures for switch conditions should be equivalent from a processing perspective, and therefore difficult to discriminate. If perceptual integration is inaccurate, illusory conjunctions of qualities may be encoded for both textures. In that case, the perceived number of shared sequences in the two textures should be higher for switch conditions than for foreign conditions, because the two textures in switch conditions share the same qualities. This outcome again predicts that texture discrimination should be more difficult for switch conditions than for foreign conditions.

Experiment 1 also evaluated the role of attention by directly manipulating the attentional focus of listeners. All trials were performed twice by listeners: once undistracted and once while performing a distraction task. We predicted that attention may facilitate the perceptual integration of qualities.

Method

Subjects. Twenty-one adult listeners participated in the experiment. Listeners were at least moderately trained in music, having had a minimum of 5 years of formal training on a music instrument. All listeners reported having normal hearing.

Materials and conditions. Sequences were generated by a Roland U-20 multitimbral keyboard, under the control of a Macintosh SE-30 computer. Listeners were told that the task was analogous to a visual texture discrimination task, but in this case, textures were chains of six-note sequences. A single texture was defined as two six-note sequences, played repeatedly with no pauses between sequences, in a randomly varying order and at random transpositions within an octave. The two sequences that defined a texture always differed from each other in timbre. The timbres used were highly discriminable from each other.
This difference in timbre (along with knowledge that each sequence involved six notes) helped listeners to keep track of the starting point of sequences within the texture. The two sequences in a texture also differed from each other in durational pattern and/or pitch pattern. Any change to one or both of the two sequences was defined as a change in texture. The task of the listener was to decide if, at any point during the presentation, there was a change in texture.

Each presentation consisted of 16 six-note sequences. Each six-note sequence was played at a randomly determined pitch height such that its first note was between G3 and A♭4. These transpositions were necessary to ensure that texture changes were defined by changes to the pattern of notes, and could not be detected merely by listening for changes in duration to an individual pitch.

Before each presentation, listeners were presented the two six-note sequences that defined the initial texture of the presentation. This procedure provided listeners with a clear sense of the initial texture. In addition, the first two sequences of all texture presentations were always the two sequences that defined the initial texture of the presentation. After that,
the two sequences were played randomly with replacement, and therefore did not always alternate.

Presentations were either a single texture or involved a change in texture. For presentations involving a single texture, each of the 16 sequences was one of the two six-note sequences (e.g., 1211212221121121). In presentations involving a change in texture, the two sequences that defined the initial texture changed to two different sequences (e.g., 1211212134334344). Changes in texture occurred at a randomly selected position from sequence eight to sequence twelve in the presentation. Once a change in texture occurred, the two new sequences continued in a randomly varying order until the end of the presentation.

There were six presentations in which there was no change in texture and four presentations in which there was a change in texture. All sequences were created from the same set of two durational patterns, two pitch patterns, and two timbres. Listeners heard each of the 10 presentations twice: once while performing a distraction task, and once undistracted. The order in which the 20 trials were presented to listeners was randomized independently for each listener.

The sequences used for the four presentations involving texture changes are notated and labeled in Figure 1. The two sequences that define the initial texture are shown on the left-hand side of the figure, and the two sequences that define the changed texture are shown on the right-hand side of the figure. The timbre of each sequence, taken from the sampled sounds of the Roland U-20, is also indicated on the figure. The four presentations involving a change in texture are labeled: pitch-pattern foreign; durational-pattern foreign; pitch-pattern switch; and durational-pattern switch. In the presentation labeled pitch-pattern foreign, the two sequences of the initial texture shared the same pitch pattern, and the two sequences of the new texture had a different pitch pattern.
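The structure of a presentation and of a listener's trial list, as described above, can be sketched in Python. This is an illustrative reconstruction, not the software used in the study. The function names are hypothetical, and two details are assumptions: that the opening pair is always played in the order 1, 2, and that the change position (eight to twelve) is the position of the first sequence of the new texture.

```python
import random


def presentation_order(change_at=None, n_sequences=16, rng=random):
    """Return sequence labels (1-4) for one presentation.

    change_at: 1-based position of the first sequence of the new texture
    (an assumed reading of the text), or None for a no-change presentation.
    """
    # The first two sequences are the two that define the initial texture
    # (order 1, 2 is an assumption).
    order = [1, 2]
    for position in range(3, n_sequences + 1):
        changed = change_at is not None and position >= change_at
        # Random draw with replacement, so sequences need not alternate.
        order.append(rng.choice([3, 4] if changed else [1, 2]))
    return order


def random_change_position(rng=random):
    """Pick a change position from sequence eight to sequence twelve."""
    return rng.randint(8, 12)


def trial_list(rng=random):
    """Build the 20 trials for one listener in an independent random order."""
    presentations = ["no-change"] * 6 + [
        "pitch-pattern foreign",
        "durational-pattern foreign",
        "pitch-pattern switch",
        "durational-pattern switch",
    ]
    trials = [
        (presentation, attention)
        for presentation in presentations
        for attention in ("distracted", "undistracted")
    ]
    rng.shuffle(trials)
    return trials
```

Calling `"".join(map(str, presentation_order(change_at=9)))` yields a 16-character string in the same format as the example 1211212134334344, with labels 1 and 2 before the change and 3 and 4 after it.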
In the presentation labeled pitch-pattern switch, the new texture was created by switching the pitch patterns of the two sequences with each other. That is, the pitch pattern of one sequence was combined with the durational pattern and timbre of the other sequence, and vice versa. In the presentation labeled durational-pattern foreign, the two sequences of the initial texture shared the same durational pattern, and the two sequences of the new texture had a different durational pattern. In the presentation labeled durational-pattern switch, the new texture was created by switching the durational patterns of the two sequences of the initial texture with each other.

The tempo of the sequences was 200 quarter-note beats/min. All notes were based on equal-tempered tuning. Randomizations of transpositions and sequence order were determined independently for each listener.

Distraction task. In trials involving distraction, words appeared on the computer screen (e.g., moon, eat, orange, run). After each word appeared, the listener indicated, by clicking with the mouse on one of two boxes, whether the word was most commonly used as a verb or a noun. If a response
[Figure 3: music notation not recoverable. Surviving labels: Test sequence 3, Test sequence 4, Test sequence 6; parts marked Piano 1 and Piano 2.]
Figure 3. The six two-part test sequences used in Experiment 2. Piano 1 is the basic piano timbre of the Roland U-20. Piano 2 is the electric piano timbre of the Roland U-20.
[Figure 4: music notation not recoverable. Panels, each in the Piano 1 timbre: Match; Pitch pattern: Foreign; Durational pattern: Foreign; Pitch pattern: Switch; Durational pattern: Switch.]
Figure 4. Comparison sequences for test sequence 1. The comparison sequences notated represent the following five conditions: match, pitch-pattern foreign, durational-pattern foreign, pitch-pattern switch, and durational-pattern switch.
comparison sequences. Individual parts of all test sequences were derived from the same small set of timbres, rhythms, and pitch patterns. Therefore, the main effect is likely to relate to differences in emergent qualities, such as dissonance and rhythm, that resulted from combining the two test sequences. Mean ratings for the six pairs of test sequences notated in Figure 3 are: 3.27, 2.87, 3.51, 3.52, 3.31, and 3.45. There was a main effect of the type of comparison sequence on ratings, F(4, 68) = 23.41, p < .001. As can be seen in Figure 5, listeners gave higher ratings to match conditions than to mismatch conditions, F(1, 17) = 60.53, p