Voice Denumerability in Polyphonic Music of Homogeneous Timbres
Author(s): David Huron
Source: Music Perception: An Interdisciplinary Journal, Vol. 6, No. 4 (Summer, 1989), pp. 361-382
Published by: University of California Press
Stable URL: http://www.jstor.org/stable/40285438
© 1989 by The Regents of the University of California
Music Perception Summer 1989, Vol. 6, No. 4, 361-382
Voice Denumerability in Polyphonic Music of Homogeneous Timbres

DAVID HURON
University of Nottingham

An experiment was carried out to determine limitations in listeners' abilities to identify the number of concurrently sounding voices in polyphonic textures. As the number of concurrent voices in a musical texture increases, expert musicians are both slower to respond to the addition of new voices and more prone to identify incorrectly the number of voices present. For musical textures employing relatively homogeneous timbres, the accuracy of identifying the number of concurrent voices drops markedly at the point where a three-voice texture is augmented to four voices. Beyond three voices, confusions become commonplace; the most frequent type of confusion is underestimation of the number of voices present. Voice entries were found to be perceived more easily than voice exits, and entries of outer voices were found to be identified more easily than entries of inner voices. Compared with a nonmusician subject, musicians were found to be more accurate and consistent in denumerating concurrent voices, suggesting that an awareness of textural density may be a musically relevant skill.
Introduction

One of the distinctive features of music, compared with other auditory phenomena, is the deliberate juxtaposition of multiple concurrent sounds. The contrast with speech perception is particularly striking. In the classic "cocktail party" situation, attention mechanisms are deployed, not to follow several conversations concurrently, but rather, to allow listeners to follow a single conversation in noisy surroundings or to switch attention from one conversation to another. The preference for single-source speech perception is reinforced through social conventions in small groups that restrict speech acts to a single voice at a time (Sloboda & Edworthy, 1981). Although musicians distinguish a category of music on the basis of single-source sounds (monophony), in practice comparatively little music is composed in this form, especially in recent centuries. In music, multiple concurrent sources are the norm, whereas in speech, multiple concurrent sources are the exception.

Requests for reprints may be sent to David Huron, Conrad Grebel College, University of Waterloo, Ontario N2L 3G6, Canada.
If the perception of concurrent sources is important in music, one would expect to see evidence of listeners' abilities to attend to multiple concurrent sources. The history of perceptual psychology suggests that one might also see limits in listeners' abilities to attend to several sources concurrently. Posed as an empirical question, we might ask, "how many concurrent things can musicians hear?" In order to answer this question, we must first clarify what is meant by a "thing," for clearly, even in the perception of speech, many "things" (such as harmonics and formants) are present simultaneously.

The term musicians use to identify one of a series of concurrent sound activities is the word "voice." Although derived from the notion of a human voice, the musical term conveys a somewhat different meaning, one that will be followed throughout this article. By "voice," musicians mean a single "line" of sound, more or less continuous, that maintains a separate identity in a sound field or musical texture. Several vocalists singing in unison thus constitute a single musical "voice" by virtue of the cohesion of the line of sound and also the obliteration of the individual identity of the component singers' voices.

Independently of musicians, psychologists have also devised their own terminology to characterize the phenomenon of "lines of sound." Bregman and Campbell (1971) coined the term "auditory stream" to identify a basic auditory percept that maintains a perceptual unity and continuity over time. McAdams (1982) has coined the term "auditory image" as a more embracing concept for both short-lived and extended sound emanations; the word "image" suggests a more concrete mental representation of a sound as some sort of real source, such as a bell or a lawnmower. It is not yet clear whether a "voice" is equivalent to an auditory "stream" or "image," but clearly, in both cases, psychologists and musicians are grappling with how to characterize auditory "things" that continue to exist for an appreciable period of time and that yet may evolve with respect to pitch, timbre, or various other parameters. If the psychological and musical concepts are not equivalent, we might suppose that they are at least parallel.

Experimental work in the area of auditory stream segregation has delineated a number of factors that contribute to the formation of auditory streams or images (Heise & Miller, 1951; Norman, 1967; Bregman & Campbell, 1971; Dowling, 1973; van Noorden, 1975; Dannenbring & Bregman, 1976; Bregman & Dannenbring, 1977; Bregman, 1978; Bregman & Pinker, 1978; McAdams & Bregman, 1979; McAdams, 1984; etc.). But this work has not yet become sufficiently formalized for it to be possible to parse any arbitrary arrangement of sinusoidal frequencies into their perceived streams or images. Music, on the other hand, has formal notions of a voice as encapsulated in the notation of polyphonic music. Musical notation might, therefore, provide a convenient template against which listeners' judgments of concurrent activities may be compared.

In order to answer the question, "How many concurrent things can musicians hear?" we must also clarify what constitutes "hearing" something. Sloboda and Edworthy (1981) studied listeners' abilities to perceive two contrapuntal lines by examining their abilities to detect tonally incoherent pitch errors. The method used by Sloboda and Edworthy is most useful in studying intervoice perceptual mechanisms. However, the detection of harmonically incoherent errors in a sound field consisting of a number of concurrent voices would not constitute evidence of the listener's apprehension of each individual component line. (Indeed, Sloboda & Edworthy contend that listeners are unable to attend to more than one voice at a time and that something akin to figure-ground reversals allows the listener to be aware of other concurrent voices.)

To the extent that a listener is able to maintain a concrete mental image of a voice, one might expect the listener minimally to be able to report the number of such images held at any given moment in time, notwithstanding any confusions introduced by the processes of introspection and reporting. Thus, we might operationalize the notion of "hearing a voice" as a listener's ability to count or denumerate correctly the number of notated voices in a conventionally appropriate performance of a polyphonic musical score. The validity of this operational definition rests on the assumption that when composers notate a polyphonic passage consisting of (say) three voices, the composer does not expect or intend the listener to hear five voices or two voices, or some other number apart from three. In the genre of music dubbed "polyphonic," there is good reason to believe that this assumption is correct.
Of the innumerable auditory qualities that have been found to contribute to the formation of auditory streams, one of the most problematic is timbre. David Wessel (1979) has produced striking sound examples illustrating the importance of timbre in the formation of auditory streams. However, as there is currently no accepted theory of timbre, a study of the perception of voice concurrency will encounter severe difficulties if voices of differing timbres are used. Until it is possible to create multivoiced stimuli in which the voice timbres are optimally differentiated, it will not be possible to measure the absolute limits of the perception of multiple concurrent sound sources. However, it is still possible empirically to study the limitations of voice concurrency perception in sound fields that contain homogeneous timbres. To this end, an experiment was devised to examine the limits of concurrent voice denumeration in polyphonic music. In general outline, the experiment consisted of the playing of a recorded polyphonic work to expert subjects, whose judgments of the number of concurrent voices were gathered continuously throughout the performance.

Method

SUBJECTS
Six subjects participated in the experiment. Five subjects were expert musicians: four were faculty members in a university school of music, and one subject was a doctoral student in composition at the same institution. A sixth subject was musically naive, but was of similar general intellectual achievement. Subjects were anonymous volunteers.
STIMULUS
The musical work used as the stimulus was chosen to satisfy several needs. The music would have to maintain a largely homogeneous tone color between the voices and would also need to maintain a relatively consistent balance in loudness between parts. In addition, instruments such as the piano and harpsichord would need to be excluded on the grounds that they do not produce sustained tones. For these reasons, contrapuntal works for solo organ were examined. The work chosen was J. S. Bach's Fugue in E-flat major from the Clavierübung Part III, BWV 552 ("St. Anne"), for organ. This work is unique in that it contains three separate fugal expositions and so contains a considerable degree of textural variety. Specifically, the St. Anne contains "embedded" instances of one- and two-part writing, which are not to be found in fugues having a single exposition. The St. Anne fugue also employs up to five contrapuntal voices and so makes it possible to study more than the four concurrent polyphonic parts more commonly found in fugues. Two recordings of the work were used as stimuli: recordings by Helmut Walcha and by Peter Hurford.[1] By coincidence, one of the subjects knew the selected work intimately, having played the work in concert and in broadcast. Differences between individual subjects are discussed in the analysis section.

Subjects were read standardized instructions outlining the purpose of the experiment and the method of response. They were told that the purpose of the experiment was "to study the clarity with which you can track different voices or parts in a musical work." Subjects were instructed to press numbered keys (containing labels from "0" to "10"), corresponding to the number of voices or parts heard at any given moment in time, and to keep the key depressed for as long as the same number of voices was heard. Subjects were also instructed to respond as soon as possible to perceived changes in the number of concurrent voices.
Trials began with the subject holding down the key marked "0" until the music started. The music was reproduced over loudspeakers or headphones according to preference or as dictated by circumstance.
1. Deutsche Grammophon APM 14049 and Decca compact disk 417 711-2, respectively.
Subjects were given two practice trials using the five-voice fugue No. 4 in C-sharp minor from the first volume of Bach's Well-Tempered Clavier. After the second trial of the practice stimuli, a set of posttest questions was discussed. In the case of the musically unskilled subject (Subject 6), the practice session was extended somewhat: the opening expositions of several fugues were patiently reviewed, with the experimenter pointing out the entries of various voices and coaching the subject through simple passages of one, two, and three voices. Every effort was made to build up the confidence of Subject 6 and to ensure a high degree of motivation. After the two practice trials, subjects proceeded to the St. Anne fugue; except for Subject 3, all subjects did two replicate trials of the St. Anne. As with the practice trials, data were recorded on cassette tape, with one channel logging the music directly from the stimulus recording while the second channel recorded coded signals produced by the subjects' responses. Further posttest questions and discussion completed the experimental sessions.
DATA TRANSCRIPTION
The recorded data were manually encoded on printed scores and subsequently transferred to a computer-encoded version of the score. Each key depression was assigned to the particular vertical sonority in the score during which the key depression was made. Any response occurring before the onset of the next note was assigned to the sonority represented by the currently sounding note. Responses were transcribed twice in order to ensure accuracy. The chosen encoding procedure reduces the variability of encoding judgments compared with encoding responses to the nearest vertical sonority; but the chosen procedure achieves this reduced variability at the price of introducing a constant error that extends the average response time by approximately 0.1 sec. In order to achieve the greatest familiarity with the data, an explicatory script was written for each trial in which an attempt was made to explain the cause of each response given by the subject.

There are many possible sources that can account for the errors in responses. An obvious source of error is the reaction time in recognizing changes of texture and in responding by depressing keys. A second source of error arises from the speed with which the number of voices changes during the course of the work. Some passages contain rapid changes of texture. In the posttest debriefing many of the subjects mentioned the difficulty of deciding whether brief rests should be thought of as the termination of a voice. In Bar 84, for example, a brief passage occurs in which two voices alternate in an overlapping call-and-response fashion (Figure 1). In this passage, Subject 1 responded differently in each of his two trials. In the first trial, Subject 1 indicated "2 voices" throughout the entire passage, whereas in the second trial, Subject 1 made admirable (if belated) attempts to alternate between one and two. Neither of these responses could be construed as being wrong.
Such passages present obvious difficulties for analysis and underscore the need for a careful and coherent analysis strategy.
Fig. 1. Example of a problematic passage showing rapid changes of texture.
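The transcription rule described above (a key depression belongs to the vertical sonority that is sounding when the key is pressed) can be illustrated with a minimal sketch. The onset times, press times, and the function name `assign_to_sonority` are hypothetical and are not taken from the original study, which encoded responses by hand on printed scores.

```python
from bisect import bisect_right


def assign_to_sonority(onsets, press_time):
    """Return the index of the sonority sounding at press_time.

    onsets is a sorted list of sonority onset times in seconds.
    Returns None if the key was pressed before the first onset.
    """
    # bisect_right counts the onsets that are <= press_time; the last of
    # those is the currently sounding sonority.
    i = bisect_right(onsets, press_time) - 1
    return i if i >= 0 else None


# Hypothetical onset times (seconds) for four successive sonorities.
onsets = [0.0, 0.5, 1.0, 1.75]
# Hypothetical key-press times (seconds).
presses = [0.2, 0.5, 0.99, 2.4]

indices = [assign_to_sonority(onsets, t) for t in presses]
print(indices)  # [0, 1, 1, 3]
```

Because a press is never moved forward to a later sonority, this rule can only shift a response toward the earlier sonority, which is in keeping with the constant error that the text attributes to the encoding procedure.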
Analytic Results

ANALYSIS STRATEGY

Conceptually, we might posit two listening strategies by which a listener is able to identify the number of concurrent voices in a musical texture. One strategy is for the listener to attend to the entries and exits of voices, and maintain a running account of the additions and subtractions. Evidence for this strategy might be found where a listener gets "out of step" with the actual number of concurrent voices, but continues to increment and decrement voices correctly as they enter and exit from the texture. A second strategy may be that for any given passage with a stable number of voices, the listener "polls" the sound field and mentally counts the number of voices present. Evidence for this second strategy might be found in long passages of constant texture where, after a period of time, a listener revises a response to a different value.

In the posttest question period for the practice trials, two subjects independently mentioned that whenever a new voice enters the texture there was a temptation to increment the response number automatically. Both subjects stressed that one must be cautious in responding in this way because occasionally one of the other concurrent voices discreetly drops out of the texture at the same time or soon afterward. These introspective accounts suggest that subjects are aware of, and may use, both of the strategies outlined above. Our analysis of the data will attempt to capture aspects of voice concurrency identification that might be a result of either one or both of these two strategies. In the first instance, the analysis will stress the identification of voice entries and exits, while in the second part, the analysis will focus on extended passages of stable numbers of voices, which will hereafter be called "isotextural passages."

ANALYSIS OF VOICE ENTRIES AND EXITS
Reaction Times for Voice Entries
The data were analyzed, first with respect to reaction times for voice entries. Before data collection, 30 sites were identified in the score where new voices are clearly introduced into the musical texture. (A detailed list of entry sites is given in Appendix 1.) All 30 sites are "mono-incremental"; that is, a texture of N voices is followed by a texture of N + 1 voices. Reaction times were determined by first measuring the musical duration that elapsed between the entry of the new voice and the subject's response. This duration was then translated into an absolute elapsed time by calculating the tempo of the specific recorded performance at the corresponding point in the score. Reaction times were measured for each of the subject/trials in which the voice entry was identified correctly. Correct identification of a voice entry was defined as any case where the subject correctly indicated both the number of voices before the voice entry and the number of voices after the voice entry, but within 20 vertical sonorities after the entry site and before any other change of texture.

From the collected responses for the musician subjects, a total of 172 entries were identified correctly according to this criterion. The musically naive subject (Subject 6) identified correctly only three entry sites, a figure that may lie close to the chance level, although this level is difficult to calculate given the nature of the stimulus. Owing to the very poor performance on the part of Subject 6, the data for this subject have been excluded from the aggregate analysis of voice entries and exits.

The average reaction times for the various textural densities are given in Figure 2. It is clear from this figure that, as the number of concurrent voices in the texture increases, the musician subjects are slower to identify the addition of new voices. By itself, this relationship is difficult to interpret. Increasing reaction times might suggest greater difficulty in identifying the number of concurrent voices for textures of increasing density. Alternatively, the increasing reaction times might support the hypothesis that a sequential "counting" process is occurring (i.e., it takes five times longer to count to the number five than it does to count to the number one). The latter hypothesis would assume that the "polling" strategy was predominant and that voice entries merely act as a trigger for the reevaluation of the textural density.

Fig. 2. Response times to voice entries.

Unrecognized Voice Entries
In addition to the 172 correctly identified entries, the musician subjects failed to identify 91 entry points.[2] Subjects either failed to identify correctly the number of voices in the antecedent texture, or their first response after the entry site failed to indicate the correct number of voices in the new texture ("misidentified"), or the subject made no change in response whatsoever throughout the period of the changing texture ("unidentified"). Altogether, the misidentified and unidentified voice entries may be grouped together and dubbed "unrecognized" voice entries. Comparing the number of such unrecognized voice entries to the number of correct identifications gives us a numerical proportion that provides a useful measure of the success with which various entries are identified. Figure 3 shows the proportion of voice entries that were incorrectly identified or missed by the musician subjects. As the number of concurrent voices in a musical texture increases, listeners are less able to identify correctly new voice entries. In particular, there is a substantial worsening of performance in circumstances where a three-voice texture is augmented to four voices. Finally, we may point out that there is a broad agreement between the accuracy data given in Figure 3 and the reaction-time data given in Figure 2. This agreement lends support to the interpretation that the degradation of reaction times found in the denser textures is the result of increased difficulty in performing the task.

Fig. 3. Unrecognized voice entries.

2. Seven entries are missing because of no-response data for Subject 3.

Inner Voice Entries

In the entry of new voices to the texture, the pitch relationship of the entering voice to the other concurrently sounding voices was found to be important. A distinction can be made between voice entries whose region of pitch activity (tessitura) is embedded between active voices both higher and lower than itself (i.e., "inner voice") and those voice entries whose tessitura is the highest or lowest of the currently active voices (i.e., "outer voice"). With respect to response times and unrecognized entries, inner-voice entries showed noticeable differences from outer-voice entries. Tables 1 and 2 tabulate the differences.

The entry of an inner voice is more apt to be incorrectly identified or not identified at all. Even when inner-voice entries are correctly identified, on average it takes more than twice as long for a subject to respond than for a corresponding outer-voice entry. In short, entries of outer voices are identified more easily than entries of inner voices.
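The entry criterion and reaction-time measure used in this part of the analysis can be sketched as follows. This is only an illustration under stated assumptions: responses are taken to be available as one voice count per vertical sonority, the 20-sonority window is read as inclusive, and the function names, the sample data, and the tempo of 80 beats per minute are all hypothetical rather than drawn from the study.

```python
def first_correct_response(responses, entry_index, n_before, last_index, window=20):
    """Return the sonority index at which an entry is correctly identified.

    responses[i] is the voice count the subject was signalling at sonority i.
    entry_index is the sonority at which the new voice enters.
    last_index is the final sonority before the next change of texture.
    Returns None if the entry counts as unrecognized.
    """
    # The antecedent texture must have been reported correctly.
    if entry_index < 1 or responses[entry_index - 1] != n_before:
        return None
    end = min(entry_index + window, last_index, len(responses) - 1)
    for i in range(entry_index, end + 1):
        if responses[i] == n_before + 1:
            return i
        if responses[i] != n_before:
            return None  # first changed response was wrong ("misidentified")
    return None  # no change within the window ("unidentified")


def reaction_time_seconds(delay_in_beats, tempo_bpm):
    """Convert a musical duration in beats to seconds at a local tempo."""
    return delay_in_beats * 60.0 / tempo_bpm


# Hypothetical data: one response per sonority, each sonority an eighth note
# (half a beat) long, with a third voice entering at sonority 2.
responses = [2, 2, 2, 2, 3, 3, 3]
hit = first_correct_response(responses, entry_index=2, n_before=2, last_index=6)
delay_beats = (hit - 2) * 0.5
rt = reaction_time_seconds(delay_beats, tempo_bpm=80)
print(hit, rt)  # 4 0.75
```

Measuring the delay in musical time first and converting it with the local tempo, as the text describes, keeps the measure comparable across recordings that are played at different speeds.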
TABLE 1
Inner Voice/Outer Voice Response Times

Entry                   Average Response Time
Inner voice             2.299 sec (20)
Weighted outer voice    1.028 sec (110) [a]

[a] Calculated by taking the average response times for three-voice, four-voice, and five-voice outer-voice entries and weighting their averages according to the proportions of three-, four-, and five-voice inner-voice entries before producing an aggregate value. Note that the response times for the total outer-voice entries (0.817 sec [152]) and for the total outer-voice entries for three, four, and five voices (0.933 sec [110]) are also lower than the inner-voice entries.

TABLE 2
Unrecognized Entries

Entry          Errors
Inner voice    55.6% (25/45)
Outer voice    32.4% (73/225)
Timbral Effects

Another effect was found with respect to entries made in the pedal voice. Although the four keyboard parts share more or less homogeneous timbres throughout the performances, the pedal voice has several unique features. Good performance practice calls for the registration of the pedal voice to engage stops of the pedal division rather than coupling to the active keyboard division. In addition, the pedal voice characteristically adds a 16' or suboctave stop to the texture. Hence the pedal voice differs in both timbre and loudness from the other four voices. With respect to both response times and unrecognized entries, entries of the pedal voice displayed a different pattern of results from that of nonpedal voices (Tables 3 and 4). Entries in the pedal voice are somewhat easier to identify.

TABLE 3
Pedal Voice/Nonpedal Voice Response Times

Entry                     Average Response Time
Pedal voice               0.946 sec (32)
Matched nonpedal voice    1.978 sec (29) [a]

[a] Matched to pedal entries in terms of the proportions of four- and five-voice outer-voice entries.

TABLE 4
Unrecognized Entries

Entry             Errors
Pedal voice       57.4% (31/54)
Nonpedal voice    64.2% (52/81) [a]

[a] Matched to pedal entries in terms of the proportions of four- and five-voice outer-voice entries.

Voice Exits

A number of sites were identified in the score where voices exit from the musical texture. Although some of the sites are "mono-decremental" (N voices followed by N - 1 voices), many of the sites retire multiple voices simultaneously. Because the work begins with one voice and ends with a full complement of voices, and because there are several "multiple-decrement" points, the number of voice-exit sites is smaller than the number of voice-entry sites. Furthermore, many of the exit sites are unsuitable for analysis because they are embedded in rapid changes of texture. In order for a voice exit to be selected as an analysis site, it had to be both preceded and followed by an isotextural passage of at least one bar. Only 9 suitable voice-exit sites were found, as contrasted with 30 suitable voice-entry sites. (Appendix 2 provides a complete list of selected exit sites.)

Once again a correct identification of a voice exit was defined as any case where the subject had correctly indicated the number of voices before the voice exit and correctly indicated the number of voices after the voice exit, but within 20 vertical sonorities after the exit site and before any other change of texture. Not enough correct responses were made to permit reliable comparisons of reaction times or to compare various exit conditions.

Table 5 compares the proportions of unrecognized responses for voice entries and voice exits. Voice entries thus appear to be substantially easier to recognize than voice exits.
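As a quick arithmetic check, the proportions reported in Table 5 can be recomputed from the counts given in the text and the table (172 correct and 91 unrecognized entries; 51 unrecognized exits out of 81). The helper name `unrecognized_percent` is introduced here purely for illustration.

```python
def unrecognized_percent(unrecognized, total):
    """Percentage of sites that went unrecognized, to one decimal place."""
    return round(100.0 * unrecognized / total, 1)


# Voice entries: 172 correctly identified plus 91 unrecognized.
entry_total = 172 + 91
entry_percent = unrecognized_percent(91, entry_total)

# Voice exits: 51 unrecognized out of 81, as listed in Table 5.
exit_percent = unrecognized_percent(51, 81)

print(entry_total, entry_percent, exit_percent)  # 263 34.6 63.0
```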
TABLE 5
Proportions of Unrecognized Voice Entries and Exits

Unrecognized Voice    Proportion
Entry                 34.6% (91/263)
Exit                  63.0% (51/81)

ANALYSIS OF ISOTEXTURAL PASSAGES

Apart from an analysis of the entries and exits of voices, it is possible to examine errors in the denumerability of concurrent voices by examining responses to isotextural passages, that is, extended passages having an unchanging number of concurrent voices. Before data collection, 26 segments of the work were identified that were unequivocal in the number of voices present throughout the segment (Appendix 3). Nearly half of the work was discarded as unsuitable for analysis because of either frequent occurrences of rests or unisons or because of the introduction or retirement of voices faster than once a bar. The average length of the isotextural segments was two and one-half bars, with the four shortest segments being a bar in duration.

Of the 26 selected passages, 19 maintained solid part-writing throughout, with no rests in any of the concurrently sounding voices and no unisons. Seven of the 26 passages contained minor points of ambiguity: one of the passages contained a quarter-note rest in one of the parts. The other 6 passages contained brief incidences of unison pitch sharing by two voices,
although no unison was longer than a quarter note in duration, and only one passage contained more than two such instances. Because all of these aberrations occurred in musical passages exceeding three bars in length, they were deemed insignificant. Figure 4 shows the most contentious of the selected isotextural passages. The music immediately preceding the isotextural passage consists of three voices, with the fourth voice having entered in the alto voice at the beginning of Bar 9. The first seven quarter-notes of Bar 9 have been omitted from the analysis segment in order to minimize effects of reaction times (see below). The isotextural segment lies between the two vertical lines. As can be seen in Bar 9, there are two antecedent instances of unisons (one of which is a half note in duration). The analysis passage itself contains two instances of unisons. Again, because of the length of the passage, it was felt that these unisons were unlikely to have a significant impact on listeners' responses.

Fig. 4. One of 26 isotextural passages selected from J. S. Bach BWV 552. Unisons circled.

In order to isolate the isotextural segments from the possible effects of voice entries, the starting positions of the segments were delayed so that at least three-quarters of the response times for the voice entries occurred before the onset of the analysis segment. In four instances, this procedure shortened the isotextural segment to just under a bar in duration. Although this analysis strategy does not entirely eliminate the effects of reaction times to voice entries, it does considerably reduce their interaction. Thus, the following analysis of isotextural identification errors is largely independent of response times.

Having defined the 26 isotextural passages, the corresponding segments were extricated from the response data on a vertical-sonority-by-vertical-sonority basis and confusion data assembled. The proportion of confusion errors is plotted for each subject in Figure 5 according to the number of voices in the texture. The solid lines represent trials by musically trained subjects, and the dotted lines indicate two trials of the nonmusician subject. The nonmusician
had significant difficulties in all textural densities, although there is a worsening of performance as the number of voices is increased. As noted earlier, during the posttest questioning, Subject 1 was found to have an intimate knowledge of the work, having performed the St. Anne fugue in concert and in broadcast, and suggesting that he could play much of it from memory. Subject 1 displayed a superlative ability to identify correctly the number of concurrent voices. In spite of his knowledge of the work, Subject 1 nevertheless made noticeable confusion errors in identifying five-voice segments of the work.

Fig. 5. Confusion errors in isotextural passages. Dashed line = nonmusician; plus signs = Musician 4; boxes = Musician 5; x, circles, and triangles mark Musicians 1-3.

The other musicians display somewhat greater conformity in their responses. The most striking feature is the sharp increase in errors in the four-voice textures. Also noteworthy is the reduction of errors in the five-voice textures. Most of the five-voice textures are completed through the addition of the pedal voice, and it is possible that the reduction of errors in the five-voice situation is due to the enhanced perceptibility of the pedal (and outer voice effects) in the entry of the fifth voice. Overall, as the number of concurrent voices in a musical texture increases, listeners are less able to identify correctly the number of voices present. In particular, there is a marked worsening of performance at the point where a three-voice texture is augmented to four voices. These results corroborate the findings for unrecognized voice entries.

The scope and nature of the perceptual difficulty can be seen by analyzing the particular types of errors made in identifying the number of concurrent voices. The over- and underestimations can be seen in the confusion matrix. Table 6 shows a confusion matrix for all of the musician subjects including data for Subject 1.
TABLE 6
Confusion Matrix [a]

Actual Number                         Response                                        No. of    Error
of Voices       One     Two     Three   Four    Five    Six    No Response    Total   Errors    (%)
One             309*    6       15      0       3       0      3              336     27        8.0
Two             63      1368*   19      1       0       0      13             1464    96        6.6
Three           0       127     1121*   12      0       0      4              1264    143       11.3
Four            0       77      961     608*    14      0      44             1704    1096      64.3
Five            0       22      165     466     399*    4      8              1064    665       62.5

NOTE: Asterisk indicates correct responses.
[a] Musician subjects only.
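The error counts and error percentages in Table 6 follow directly from the row totals and the diagonal (correct) counts. The short sketch below recomputes them for each textural density; the variable names are illustrative only.

```python
# Row totals and correct (diagonal) counts from Table 6, keyed by the
# actual number of voices in the texture.
totals = {1: 336, 2: 1464, 3: 1264, 4: 1704, 5: 1064}
correct = {1: 309, 2: 1368, 3: 1121, 4: 608, 5: 399}

# Every response that is not on the diagonal counts as an error.
errors = {n: totals[n] - correct[n] for n in totals}
error_percent = {n: round(100.0 * errors[n] / totals[n], 1) for n in totals}

for n in sorted(totals):
    print(n, errors[n], error_percent[n])
```

The jump from 11.3% at three voices to 64.3% at four voices is the sharp increase discussed in the text.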
The numerical values are tabulations of the combined responses evident for each vertical sonority in the isotextural passages. The diagonal (entries marked with an asterisk) indicates correct responses, whereas all other entries in the matrix indicate mistaken identifications. The most common type of confusion is underestimation of the number of voices present. Ninety-two percent of errors are errors of underestimation. Underestimating the number of voices by one voice accounts for 80% of all identification errors. In both the four- and five-voice textures, the number of underestimations considerably exceeds the number of correct identifications.

Voice Denumerability As a Musical Skill

One might ask whether the ability to determine the number of concurrent voices can be properly considered a musical skill. A skill may be defined as a goal-driven behavior that consistently achieves a level of attainment in the pursuit of the goal that, in the absence of the skill, is not commonly achieved. We now have three largely independent measures of the success of the denumeration of voice concurrency: (1) the proportion of correct identifications of voice entries, (2) the proportion of correct identifications of voice exits, and (3) the proportion of correct identifications of the number of concurrent voices in isotextural passages. Because replicate data were recorded for five of the six subjects (including the nonmusician subject), it is also possible to measure the consistency of performance between successive trials by each subject. Moreover, this consistency measure can be evaluated without making any assumption of what constitutes "correct" behavior in the denumeration of polyphonic voices.

A consistency measure was calculated by comparing responses for both trials on a vertical-sonority-by-vertical-sonority basis. According to this form of measure, there is no guarantee whatsoever that the most consistent subject would necessarily achieve the highest score in denumerating voice concurrency. A consistency score of 1.000 would merely indicate that the subject responded in an identical fashion for each trial. The combined results are given in Table 7. As can be seen, there is a generally excellent correlation between these four independent measures. The higher the consistency of responses between trials, the greater the number of correctly identified entries and exits of voices, as well as the greater the score in the identification of the number of concurrent voices in isotextural passages.

As a group, the musicians displayed both higher performance scores and a markedly greater degree of consistency between responses for their repeated trials. Taken together, these results indicate a high degree of correlation between achievement and consistency, which is the hallmark of skilled behavior. This implies (but in no way proves) that the ability to denumerate voice concurrency is a musical skill.

TABLE 7
Results by Subject

Subject No.   Intertrial    Entries      Exits        Isotextures
              Consistency   Identified   Identified   Identified (%)

Musicians with replicate data:
1             .839          50           13           92.0
2             .827          43           9            66.2
4             .728          39           4            61.2
5             .679          23           4            41.6
Average       .768          38.75        7.5          65.3

Nonmusician:
6             .399          10           1            33.6

However, even assuming that voice denumeration is a musical skill, one might still raise doubts about its musical significance. In the posttest question period, only one musician claimed that the assigned task was similar to his/her normal listening approach to polyphonic music. All subjects characterized the task as "difficult" or "challenging," and nearly all said that they did not normally listen with such intensity. Such responses raise doubts as to whether the ability to identify the number of concurrent voices in a texture has any real musical importance. The quality of performance shown by the musicians in the experimental trials is not apt to be representative of the skill level found in ordinary musical listening.
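The intertrial consistency measure is described only in outline. The following is a minimal Python sketch, assuming the measure is the proportion of vertical sonorities that receive the same response in both trials. The function names and the example responses are hypothetical; only the per-subject values passed to `pearson` are taken from Table 7.

```python
from math import sqrt


def intertrial_consistency(trial_a, trial_b):
    """Proportion of vertical sonorities answered identically in two trials."""
    if len(trial_a) != len(trial_b) or not trial_a:
        raise ValueError("trials must cover the same non-empty set of sonorities")
    matches = sum(a == b for a, b in zip(trial_a, trial_b))
    return matches / len(trial_a)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical responses (number of voices reported) for eight sonorities.
first_trial = [1, 2, 3, 3, 4, 3, 2, 1]
second_trial = [1, 2, 3, 4, 4, 3, 3, 1]
print(intertrial_consistency(first_trial, second_trial))  # 6 of 8 agree: 0.75

# Values from Table 7 for subjects 1, 2, 4, 5, and 6.
consistency = [0.839, 0.827, 0.728, 0.679, 0.399]
entries = [50, 43, 39, 23, 10]
exits = [13, 9, 4, 4, 1]
isotextures = [92.0, 66.2, 61.2, 41.6, 33.6]
for label, scores in (("entries", entries), ("exits", exits),
                      ("isotextures", isotextures)):
    print(label, round(pearson(consistency, scores), 2))
```

A score of 1.0 from `intertrial_consistency` means only that the two trials were answered identically, not that either was answered correctly. With five subjects, the correlation coefficients are descriptive at best.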
Discussion

Use of recorded music as an experimental stimulus may have the virtue of preserving some degree of "ecological validity"; however, in using existing musical works there are factors that one inevitably wishes were better controlled. For example, all of the increases in the number of voices in the St. Anne fugue are mono-incremental; hence, there is no control for possible effects of varying antecedent textures. Is a listener more apt to misidentify a four-voice texture when preceded by a two-voice texture rather than a three-voice texture? Another interesting question is the signifying effect of thematic material. Bach always has voices enter by announcing the thematic subject. Would listeners be more apt to misidentify entries that did not use the fugal subject? In other words, is one of the functions of thematic material to operate as a perceptual cue in assisting in the identification of voice entries? All of these are questions that cannot be answered given the nature of the stimulus used for this experiment.

In designing this experiment, it was feared that musicians would be apt to assume that fugues normally have four voices. If this were the case, then much or all of the confusion evident with five voices might be merely an artifact of listeners' predisposed assumption of a maximum of four voices. Fortunately, all of the musician subjects spontaneously made five-voice responses in at least one of the practice trials using the five-voice Fugue No. 4 in C-sharp minor from the Well-Tempered Clavier. Hence, we might conclude that subjects were mentally primed for the possibility of more than four voices in the experimental trials. Of the nine experimental trials for the musician subjects, eight contained five-voice responses. These facts suggest that the subjects were not bound by an a priori assumption concerning the nature of the stimuli.

The case of Subject 1 may indicate either that there are musicians of exceptional ability in denumerating voice concurrency or that intimate knowledge of a work significantly enhances a listener's perception of the work. On the basis of this experiment alone, it is not possible to say which is the case. Nor is it possible to say to what degree our sample of five musicians is representative of musicians at large.
Denumeration versus Estimation

In the posttest question period, one of the subjects described in detail how he thought there were two different techniques for identifying the number of concurrent voices. When the number of voices was small, the subject said he could form quite definite images of the individual lines, and so had no difficulty counting them. But when the number of voices increased beyond a certain point, he said that he could no longer be certain of the number and so was forced simply to "gauge" the number of voices by comparing the current textural density with previous textures and estimating the number of voices. In the first case he could be sure of his answers, whereas in the second case he was aware that his answers were only estimates. When asked by the experimenter what he thought might be the threshold between these two approaches, the subject responded "three voices."

In light of the analysis results, this introspective account would appear to have some validity. We might suppose that in denumerating the number of concurrent voices in a musical texture both counting and estimation methods are used. Where a listener is able to form definite independent images of each voice, a discrete form of counting takes place, whereas in denser textures the listener is forced to gauge the overall density. It appears that in the perceptual denumeration of sounds of homogeneous timbre, listeners do not follow the arithmetic sequence (one, two, three, four, etc. to infinity) but proceed in a manner similar to the counting language of the San bushmen: auditorily we may count one, two, three, many, where one might admit only gradations of "manyness" rather than definite discrete values.
Relationship to Speech Research

Superficially, there might appear to be some similarities between the work reported here and experimental work in information-processing limitations in speech research. Research in the perception of multiple concurrent speech streams has demonstrated a serious inability of subjects to attend to more than one voice at a time (Broadbent, 1958). When asked to attend to one of two concurrent speech streams, auditors may be unable to report even whether the language of the unattended stream was English.

However, a number of factors preclude direct comparison between cross-channel speech tasks and music. First of all, musical voices maintain a degree of intercoordination that is virtually absent in multiple speech tasks (Sloboda & Edworthy, 1981). A musical experiment comparable to the multiple-speech research would perhaps entail the playing of several unrelated monophonic works concurrently, rather than a single polyphonic work. A second possibility is that the information content of a single speech stream may be greater than that normally found in a single monophonic musical voice. Thus, one might speculate that the use of multiple concurrent lines characteristic of music but not of speech may be a way of "filling up" or fully utilizing the information-processing capacity or bandwidth of music listeners. The latter view might also account for the fact that there is proportionally so little monophonic music that does not employ lyrics or engage in pseudo-polyphony. A third difficulty in relating this work to concurrent speech research is to be found in the definition of "hearing" as simply denumerating the number of sound streams rather than as attending to them. Merely denumerating the number of concurrent voices is not the same as attending to the "content" (semantic or otherwise) of the concurrent lines. Perhaps results similar to those found here would be found in speech perception if experimenters merely asked listeners to denumerate the number of concurrent speech streams perceived.
Conclusion

A number of conclusions follow from the analysis of the experimental results. First, musicians are indeed able to form independent mental images of multiple concurrent voices, at least to the extent that listeners are able to count their number. Whether these images are formed concurrently or sequentially is not known. Although abilities vary from musician to musician, all of the musicians were significantly more accurate and consistent in denumerating concurrent voices than a nonmusician. This suggests that an awareness of textural density may be a musically relevant skill. However, as the number of concurrent voices in a musical texture increases, listeners' abilities to denumerate concurrent voices degrade noticeably. Listeners are both slower to respond to the addition of new voices and more prone to identify incorrectly the number of voices present. For musical textures employing relatively homogeneous timbres, the accuracy of identifying the number of concurrent voices drops significantly at the point where a three-voice texture is augmented to four voices.

Beyond three-voice textures, confusions become commonplace. The most common type of confusion is underestimation of the number of voices. Underestimating the number of concurrent voices by one voice accounts for over 80% of the observed identification errors.

In general, voice entries were more easily perceived than voice exits. The ability to identify the entry of a voice varied according to the circumstances of its entry. For example, entries of outer voices were identified more easily than entries of inner voices. This difference between the perceptibility of inner-voice entries versus outer-voice entries lends support to Heinrich Schenker's view that outer parts are perceptually more important than inner parts.

Entries employing the pedal division of the organ were significantly easier to hear than matched nonpedal entries. This effect was attributed to differences in timbre between the pedal voice and the manual voices. Because the difference in timbre between the pedal and manual parts is musically rather modest, this result suggests that timbre may be a very strong factor in the identification of concurrent polyphonic voices. As this experiment was purposely limited to the condition of polyphonic textures employing homogeneous timbres, it would be wrong to extrapolate these results to textures employing voices with heterogeneous timbres (such as, say, a woodwind quintet). The exploration of the influence of timbre in assisting voice denumeration will need to await further experimental work.3

3. The author extends his thanks to Dr. Mark Haggard of the Institute of Hearing Research (Nottingham, U.K.) and Dr. Robert Pascall for offering valuable comments on an earlier draft of this article. This research was carried out under the financial support of the Social Sciences and Humanities Research Council of Canada.
References

Bregman, Albert. Auditory streaming: competition among alternative organizations. Perception and Psychophysics, 1978, 23, 391-398.
Bregman, Albert, & Campbell, Jeffrey. Primary auditory stream segregation and the perception of order in rapid sequences of tones. Journal of Experimental Psychology, 1971, 89, 244-249.
Bregman, Albert, & Dannenbring, Gary. Auditory continuity and amplitude edges. Canadian Journal of Psychology/Revue Canadienne de Psychologie, 1977, 31, 151-159.
Bregman, Albert, & Pinker, S. Auditory streaming and the building of timbre. Canadian Journal of Psychology, 1978, 32, 19-31.
Broadbent, D. E. Perception and communication. London: Pergamon, 1958.
Dannenbring, Gary, & Bregman, Albert. Effect of silence between tones on auditory stream segregation. Journal of the Acoustical Society of America, 1976, 59, 987-989.
Dowling, Walter James (Jay). The perception of interleaved melodies. Cognitive Psychology, 1973, 5, 322-337.
Heise, George, & Miller, George. An experimental study of auditory patterns. American Journal of Psychology, 1951, 64, 68.
McAdams, Stephen. Spectral fusion and the creation of auditory images. In M. Clynes (Ed.), Music, mind, and brain: the neuropsychology of music. New York: Plenum Press, 1982.
McAdams, Stephen. Spectral fusion, spectral parsing and the formation of auditory images. Ph.D. dissertation, Stanford University, 1984.
McAdams, Stephen, & Bregman, Albert. Hearing musical streams. Computer Music Journal, 1979, 3(4), 26-43, 60.
van Noorden, Leo Paulus A. S. Temporal coherence in the perception of tone sequences. Netherlands: Druk Vam Voorschoten, 1975.
Norman, Donald. Temporal confusions and limited capacity processors. Acta Psychologica, 1967, 27, 293-297.
Sloboda, John, & Edworthy, Judy. Attending to two melodies at once: the effect of key relatedness. Psychology of Music, 1981, 9(1), 39-43.
Wessel, David L. Timbre space as a musical control structure. Computer Music Journal, 1979, 3(2), 45-52.
Appendix 1: Voice Entries (30)

Bar No.   Note                    No. of Voices (antecedent/consequent)
1         First note              0/1
3         First quarter           1/2
7         First quarter           2/3
9         First quarter           3/4
14        First note              4/5
19        Second quarter          4/5
21        Seventh quarter         2/3
22        Fifth quarter           3/4
31        First note              4/5
39        Third eighth            1/2
43        Third eighth            2/3
45        Third eighth            3/4
49        Third eighth            2/3
51        Third eighth            3/4
59        Fifth quarter           1/2
62        Third eighth            2/3
73        Third eighth            2/3
77        Third eighth            3/4
83        Fourth eighth           1/2
85        Fourth eighth           2/3
89        Fourteenth sixteenth    2/3
90        Tenth eighth            3/4
92        Fourth eighth           4/5
101       Fourth eighth           3/4
104       Tenth eighth            3/4
105       Tenth eighth            4/5
108       Fourth eighth           4/5
112       Fourth eighth           2/3
113       Fourth eighth           3/4
114       Fourth eighth           4/5

Appendix 2: Voice Exits (9)

Bar No.   Note                    No. of Voices (antecedent/consequent)
15        Third quarter           5/4
16        Third quarter           4/3
37        Second quarter          5/1
47        Second quarter          4/2
71        Third eighth            3/2
82        Fourth eighth           4/1
87        Tenth eighth            3/2
102       Seventh sixteenth       4/3
107       Seventh sixteenth       5/4
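The antecedent/consequent columns of Appendices 1 and 2 lend themselves to a quick tally. The following Python sketch, with hypothetical variable and function names, transcribes the two columns and checks that every entry adds exactly one voice, whereas some exits remove more than one.

```python
from collections import Counter

# Antecedent/consequent voice counts transcribed from Appendix 1 (entries)
# and Appendix 2 (exits), in bar order.
ENTRIES = ("0/1 1/2 2/3 3/4 4/5 4/5 2/3 3/4 4/5 1/2 2/3 3/4 2/3 3/4 1/2 "
           "2/3 2/3 3/4 1/2 2/3 2/3 3/4 4/5 3/4 3/4 4/5 4/5 2/3 3/4 4/5")
EXITS = "5/4 4/3 5/1 4/2 3/2 4/1 3/2 4/3 5/4"


def parse(pairs):
    """Turn 'a/b a/b ...' into a list of (antecedent, consequent) tuples."""
    return [tuple(int(n) for n in pair.split("/")) for pair in pairs.split()]


entries = parse(ENTRIES)
exits = parse(EXITS)

# Every entry should raise the voice count by exactly one.
mono_incremental = all(after - before == 1 for before, after in entries)

# Tally entries by the texture they produce, and exits by voices removed.
entries_by_result = Counter(after for _, after in entries)
exit_sizes = Counter(before - after for before, after in exits)

print(len(entries), mono_incremental)
print(sorted(entries_by_result.items()))
print(sorted(exit_sizes.items()))
```

Because no entry in this stimulus adds more than one voice at a time, the data cannot separate the effect of the resulting texture from that of the antecedent texture, which is the limitation raised in the Discussion.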
Appendix 3: Isotextural Passages (26)

Bar   Note                     to   Bar   Note                 No. of Voices
1     First note                    2     Last note            1
3     Second quarter                6     Last note            2
7     Third quarter                 8     Last note            3
9     Eighth quarter                13    Last note            4
14    Second quarter                14    Last note            5
24    Fourth quarter                30    Last note            4
31    Third quarter                 36    Last note            5
37    Sixth eighth                  39    Second eighth        1
39    Seventh eighth                43    Second eighth        2
43    Sixth eighth                  45    Second eighth        3
45    Seventh eighth                46    Last note            4
47    Seventh eighth                49    Second eighth        2
49    Ninth eighth                  51    Second eighth        3
51    Seventh eighth                56    Fourth eighth        4
59    Twelfth eighth                62    Second eighth        2
65    Eleventh eighth               67    Sixth eighth         3
71    Eleventh eighth               73    Second eighth        2
74    Third eighth                  76    Last note            3
77    Ninth eighth                  81    Ninth eighth         4
82    Fifth eighth                  83    Sixth sixteenth      1
85    Eighth eighth                 87    Ninth eighth         3
88    First note                    89    Twelfth sixteenth    2
105   Twenty-third sixteenth        107   Sixth sixteenth      5
111   Sixth eighth                  112   Sixth sixteenth      2
112   Ninth sixteenth               113   Sixth sixteenth      3
114   Sixteenth sixteenth           117   Last note            5

note: Passages are inclusive from bar, note to bar, note.