TUESDAY MORNING, 8 NOVEMBER 1983                GOLDEN WEST ROOM, 9:00 TO 11:35 A.M.

Session F. Psychological Acoustics I: Pattern Processing in Space and Time

Lois L. Elliott, Chairman
Northwestern University, 2299 Sheridan Road, Evanston, Illinois 60201

Chairman's Introduction--9:00

Contributed Papers

9:05

F1. Behavioral evidence for central coding of azimuth as a function of stimulus frequency. Alan D. Musicant a) (Departments of Behavioral Science and Surgery, University of Chicago, Chicago, IL 60637)

In a previous report to the Society [Musicant and Butler, J. Acoust. Soc. Am. Suppl. 1 72, S93 (1982)] we presented data on the orderly progression of perceived azimuth of narrow bands of noise as a function of stimulus frequency. The patterns that resulted have been called Spatial Referent Maps (SRMs). A recent experiment suggests that these SRMs must involve a central mechanism for coding location in the horizontal plane. In this experiment, stimuli consisted of 1-kHz-wide noise bands with center frequency (CF) ranging from 4-14 kHz. There were 13 loudspeakers placed 15° apart at locations from 360° to 180° azimuth. Stimuli were presented from only that loudspeaker located at 270°. Subjects, listening monaurally, responded with the loudspeaker position from which they perceived the sound as originating. SRMs were then constructed. Subjects next performed the same task, but now listened monaurally with the pinna cavity of the open ear filled. The external meatus was open. Azimuthal judgements of stimulus location were collected and compared to the data collected earlier. Correlations between the means of the azimuthal judgements at each frequency for the two conditions ranged from 0.81-0.97 for seven subjects. The high degree of correlation suggests that, at least for narrow bands of noise, a central mechanism must exist that codes azimuthal sound source location as a function of stimulus frequency. a) Present address: Department of Neurophysiology, University of Wisconsin, Madison, WI 53706.

9:20

F2. Sidedness and perception for single echo of some ordinary sounds. Terry S. Zaccone and Earl D. Schubert (Hearing and Speech Science, Stanford University, Stanford, CA 94305)

Nine different sound-sample sequences ranging in temporal predictability from white noise to English speech were recorded with a single echo delayed from zero to 100 ms. The sounds were recorded in the presentation modes of monaural, dichotic, and mixed (original to both ears, delayed in one). The single-repetition intensity was equal to, and 3 and 6 dB greater than, that of the original sound. Subjects were asked to indicate when the echo was perceived and, in separate tests, on which side they perceived the sound. In general, the ability of the subjects to choose the side of the leading signal diminished as the repetition delay increased past 20 ms. They were expected to perceive temporal order rather than sidedness as the delay became longer than 20 ms. Even for sounds with pronounced envelope, the subjects were not able to regain identification of sidedness out to 100-ms delay. Presentation of different sound types resulted in significant differences in performance. White noise presented the most difficult task, as shown by the low percentage of identification of the leading ear. The female singer and violin yielded the highest percentages for leading-ear identification.

9:35

F3. Pitch and spectral estimation of speech based on auditory synchrony model. Stephanie Seneff (Department of Electrical Engineering & Computer Science, Rm. 36-521, Massachusetts Institute of Technology, Cambridge, MA 02139)

This paper describes a system for processing sonorant regions of speech, motivated by knowledge of the human auditory system. The spectral representation is intended to reflect a proposed model for human auditory processing of speech, which takes advantage of synchrony in the nerve firing patterns to enhance formant peaks. The auditory model is also applied to pitch extraction, and thus a temporal pitch processor is envisioned. The spectrum is derived from the outputs of a set of linear filters with critical bandwidths. Saturation and adaptation are incorporated for each filter independently. Each "spectral" coefficient is determined by weighting the amplitude response at that frequency (corresponding to mean firing rate) by a measure of synchrony to the center frequency of the filter. Pitch is derived from a waveform generated by adding the (weighted) rectified filter outputs across the frequency dimension. The system performance is evaluated by processing of a variety of signals, including natural and synthetic speech, and results are compared with other processing methods and with known psychoacoustical data from these types of stimuli. [Work supported in part by NINCDS and the System Development Foundation.]

9:50

F4. Effect of amplitude modulation upon fusion of spectral components. Albert S. Bregman, Jack Abramson (Department of Psychology, McGill University, 1205 Doctor Penfield Avenue, Montreal, Quebec, Canada H3A 1B1), and Christopher Darwin (Laboratory of Experimental Psychology, Sussex University, Brighton, England)

We studied the perceptual integration of two complex tones, each sent to both ears and formed by amplitude modulation (AM) of a carrier frequency CF by a sinusoid with modulation frequency MF. One tone always had CF = 1500 Hz and MF = 100 Hz. The other had CFs around 500 Hz and MFs around 100 Hz. Both harmonic and inharmonic partials, produced by AM, were employed. The method involved studying the competition between two organizations: (a) the fusion of the two tones, and (b) the tendency of the higher tone to be stripped out of the mixture by a competing sequential organization. Fusion was best when both tones had the same MF, even when the resulting partials did not form part of the same harmonic series. Fusion was enhanced when AM applied to the two tones was in phase. Results relate to the perceptual separation of simultaneous voices and favor a theory in which basilar membrane outputs that are amplitude modulated by the same glottal pulse will be allocated to the same voice.
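As a minimal illustration of the stimulus construction named in the F4 abstract (not the authors' actual procedure; function names and defaults here are hypothetical), sinusoidal AM of a pure carrier at CF by a modulator at MF produces three partials, at CF - MF, CF, and CF + MF, which form a harmonic series exactly when CF is an integer multiple of MF:

```python
import math

def am_tone(cf, mf, dur=0.5, sr=16000, depth=1.0):
    """Amplitude-modulate a sinusoidal carrier at cf (Hz) by a sinusoid at mf (Hz):
    s(t) = (1 + depth*cos(2*pi*mf*t)) * sin(2*pi*cf*t)."""
    return [(1.0 + depth * math.cos(2 * math.pi * mf * i / sr))
            * math.sin(2 * math.pi * cf * i / sr)
            for i in range(int(dur * sr))]

def am_partials(cf, mf):
    """AM of a pure carrier yields exactly three partials."""
    return (cf - mf, cf, cf + mf)

def partials_are_harmonic(cf, mf):
    """The three partials form a harmonic series iff cf is an integer multiple of mf."""
    return cf % mf == 0

# The fixed tone of the study, CF = 1500 Hz and MF = 100 Hz, gives partials at
# 1400, 1500, and 1600 Hz -- all harmonics of 100 Hz; shifting CF off a multiple
# of MF yields the inharmonic case.
```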

10:05

F5. Modulation transfer function using temporal-probe tones. Christopher Ahlstrom and Larry E. Humes (Division of Hearing and Speech, Vanderbilt University, Nashville, TN 37212)

Psychoacoustic MTFs were measured by embedding a probe tone in SAM speech noise. The speech noise was 70 dB SPL overall, and the modulation depth of this noise was 60 dB. The modulation frequencies were 0, 2.5, 5, 10, 20, and 35 Hz. The duration of each tone pip was 4.7 ms for frequencies 500, 1414, and 4000 Hz. Bandwidths (3 dB) at each of the frequencies were 19, 22, and 18 Hz. Skirt slopes were between 6 and 7 dB/octave. Simple masking nearly predicted changes in these MTFs in noise. A Speech Transmission Index (STI) derived from the MTFs in noise was no better correlated (r = 0.95) with speech recognition scores than was the
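The masker-plus-probe paradigm in the F5 abstract can be sketched schematically. This is a hedged illustration only, assuming sinusoidal amplitude modulation of a Gaussian-noise carrier and a simple additive tone pip; the names, sample rate, and amplitude scaling are illustrative, not the authors' calibration:

```python
import math
import random

def sam_noise(fm, depth=1.0, dur=1.0, sr=16000, seed=0):
    """Sinusoidally amplitude-modulated (SAM) noise: a Gaussian-noise carrier
    times (1 + depth*sin(2*pi*fm*t)); fm = 0 gives unmodulated noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0)
            * (1.0 + depth * math.sin(2 * math.pi * fm * i / sr))
            for i in range(int(dur * sr))]

def embed_probe(masker, freq, t0, pip_ms=4.7, sr=16000):
    """Add a brief tone pip (the temporal probe) to the masker, starting at
    t0 seconds and lasting pip_ms milliseconds."""
    out = list(masker)
    n0, n = int(t0 * sr), int(pip_ms * sr / 1000.0)
    for i in range(n):
        out[n0 + i] += math.sin(2 * math.pi * freq * i / sr)
    return out

# e.g., a 1414-Hz pip placed in 10-Hz SAM noise:
masker = sam_noise(10.0, dur=0.1)
trial = embed_probe(masker, 1414.0, t0=0.05)
```

Sweeping the modulation frequency fm and measuring probe detectability at each value is what traces out a psychoacoustic MTF.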

S9    J. Acoust. Soc. Am. Suppl. 1, Vol. 74, Fall 1983    106th Meeting: Acoustical Society of America