Automatic recognition of semivowels in word context

S97

91st Meeting: Acoustical Society of America

…ent limitations or accuracy of these procedures. When speech in an office environment is processed using reasonable quality microphones, tape recorders, analog-to-digital recording, and digital processing methods, the results are frequently disappointing [Holmes, J. Acoust. Soc. Am. 58, 747-749 (1975)]. The purpose of this paper is to describe several of the reasons why this disappointment occurs and to show how a combination of high quality direct recording into the computer and linear phase filtering, in conjunction with the covariance method at automatically derived anchor points, can result in theoretically expected glottal volume velocity waveforms for high intensity vowel sounds where glottal closure is known to occur. Timing relations with respect to the speech waveforms are also compared with other measures of glottal closure [Strube, J. Acoust. Soc. Am. 56, 1625-1629 (1974)].

2:42
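The covariance-method analysis at glottal-closure anchor points described above is, in modern terms, closed-phase LPC inverse filtering. The following is a minimal sketch of that idea, not the authors' procedure: the window placement, model order, and crude cumulative-sum integration are all illustrative assumptions.

```python
import numpy as np

def covariance_lpc(x, order):
    """Covariance-method LPC over the analysis frame x (no windowing),
    solving the normal equations for the predictor coefficients a_k in
    x[n] ~ sum_k a_k x[n-k]. Returns A(z) = [1, -a_1, ..., -a_p]."""
    n = len(x)
    phi = np.empty((order, order))
    psi = np.empty(order)
    for i in range(order):
        psi[i] = np.dot(x[order:n], x[order - 1 - i:n - 1 - i])
        for k in range(order):
            phi[i, k] = np.dot(x[order - 1 - i:n - 1 - i],
                               x[order - 1 - k:n - 1 - k])
    a = np.linalg.solve(phi, psi)
    return np.concatenate(([1.0], -a))

def glottal_flow_estimate(speech, closed_phase, order=12):
    """Fit LPC on a (hypothetical) closed-phase window, inverse filter the
    whole utterance, then integrate: the running sum of the residual is a
    rough stand-in for the glottal volume-velocity waveform."""
    a = covariance_lpc(speech[closed_phase], order)
    residual = np.convolve(speech, a, mode="same")  # ~ derivative of glottal flow
    return np.cumsum(residual)                      # crude integration
```

The covariance method (unlike the autocorrelation method) imposes no window taper, which is why placing the frame inside the glottal closed phase matters: there the vocal-tract filter is excitation-free and the all-pole fit is cleanest.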

RR7. Automatic recognition of semivowels in word context. Hiroya Fujisaki and Yasuo Sato (Department of Electrical Engineering, Faculty of Engineering, University of Tokyo, Bunkyo-ku, Tokyo, 113 Japan)

Based on an approximate formulation of the coarticulatory process at the acoustic level and a method for adaptation to individual differences, a scheme for reliable segmentation and recognition of connected vowels has already been established [H. Fujisaki et al., Speech Communication Seminar, Stockholm 1974]. The present paper describes a study for the extension of the scheme to recognition of vowels and semivowels in word context. Values of formant targets which represent a command for each phoneme, its duration, as well as the rate of transition between successive phonemes are extracted as parameters from time-varying patterns of formant frequencies. Among these parameters, the command duration is shown to be most effective in discriminating the semivowels /j/ and /w/ from the vowels /i/ and /u/. A scheme for recognition of vowels, semivowels, and their sequences based on formant targets and command duration is then proposed and tested experimentally using meaningful and nonsense words uttered by four speakers.
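The command-duration cue in the abstract above can be illustrated with a toy decision rule: a brief formant target (short command) suggests a semivowel such as /w/ or /j/, while a long-held target suggests the corresponding vowel /u/ or /i/. Everything numeric here (frame rate, slope threshold, duration threshold) is invented for illustration and is not from the paper.

```python
import numpy as np

def transition_duration(track, frame_ms=10.0, slope_thresh=2.0):
    """Duration (ms) of the moving part of a formant contour: count frames
    whose frame-to-frame change exceeds slope_thresh (Hz per frame)."""
    slope = np.abs(np.diff(np.asarray(track, float)))
    return int((slope > slope_thresh).sum()) * frame_ms

def classify(track, frame_ms=10.0, dur_thresh_ms=60.0):
    """Toy vowel/semivowel decision: if the steady (target-holding) portion
    of the contour is shorter than dur_thresh_ms, call it a semivowel."""
    total_ms = (len(track) - 1) * frame_ms
    steady_ms = total_ms - transition_duration(track, frame_ms)
    return "semivowel" if steady_ms < dur_thresh_ms else "vowel"
```

For example, an F2 contour that glides down to 300 Hz and stays there for 200 ms would be classified as the vowel /u/, while one that only touches the 300 Hz target briefly before gliding away would be classified as /w/.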


2:54

RR8. Continuous speech recognition via centisecond acoustic states. R. Bakis (Computer Sciences Department, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598)

Continuous speech was treated as if produced by a finite-state machine making a transition every centisecond. The observable output from state transitions was considered to be a power spectrum, a probabilistic function of the target state of each transition. Using this model, observed sequences of power spectra from real speech were decoded as sequences of acoustic states by means of the Viterbi trellis algorithm. The finite-state machine used as a representation of the speech source was composed of machines representing words, combined according to a "language model." When trained to the voice of a particular speaker, the decoder recognized seven-digit telephone numbers correctly 96% of the time, with better than 99% per-digit accuracy. Results for other tests of the system, including syllable and phoneme recognition, will also be given.

RR9. Vocabulary and syntactic complexity in speech understanding systems. Gary Goodman and D. Raj Reddy (Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA 15213)

Comparing the relative performances of speech-understanding systems has always been difficult and subject to speculation. Different tasks naturally require different vocabularies with varying acoustical similarities. Moreover, constraints imposed by the syntax may make recognition easier, even for vocabularies with high ambiguity. We define "inherent size" as a measure of vocabulary complexity and investigate its relation to recognition rates and the (apparent) vocabulary size. Word recognition is modeled as a probabilistic function of a Markov process using phoneme confusion probabilities derived from an articulatory position model. Multiple pronunciations and allophonic variations are allowed. Analysis of maximal word confusions leads to lower bounds for expected word recognition rates. Inherent vocabulary size is derived from these bounds. To evaluate syntactic constraints for finite state languages, an inherent size is computed for each state based on the subset of words which lead from that state. Then, average path complexity is computed for the language. This language complexity, when compared with the total vocabulary complexity and average path length, yields the degree of syntactic constraint. Analysis of several tasks will be reported.

3:18

RR10. Harpy, a connected speech recognition system. Bruce P. Lowerre and D. Raj Reddy (Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA 15213)

The Harpy connected speech recognition system is the result of our attempt to understand the relative importance of various design choices of two earlier speech recognition systems developed at Carnegie-Mellon University: the Hearsay-I system [D. R. Reddy et al., IEEE Trans. AU, 229-238 (1973); L. D. Erman, Technical Report, Computer Science Department, Carnegie-Mellon University (1974)] and the Dragon system [J. K. Baker, Ph.D. Thesis (Carnegie-Mellon University)]. Knowledge is represented as procedures in Hearsay-I and as a Markov network with a priori transition probabilities between states in Dragon. Hearsay-I uses a best-first search, while Dragon searches all the possible (acoustic-syntactic) paths through the network to determine the optimal path. Hearsay-I uses segmentation and labeling, and Dragon is a segmentation-free system. Systematic performance analysis of various design choices resulted in the Harpy system, which represents knowledge as a finite state transition network but without the a priori transition probabilities, searches only a few "best" paths, and uses segmentation to reduce the number of state probability updates that must be done. The system achieves between 70% and 100% sentence accuracy, depending upon the task, and runs between 1.5 and 10 times real time. Complete details of design, implementation, and experimental results are given in B. P. Lowerre, Ph.D. Thesis (Computer Science Department, Carnegie-Mellon University, 1976).

3:30

RR11. Parameter-independent techniques in speech analysis. Henry C. Goldberg and D. Raj Reddy (Computer Science Department, Carnegie-Mellon University, Pittsburgh, PA 15213)

Most present programs for feature extraction, segmentation, and phonetic labeling of speech are highly dependent upon the specific input parametric representation used. We have been developing approaches which are relatively independent of the choice of parameters. Vectors of parameter values are dealt with uniformly by statistical pattern-recognition methods. Such an approach permits systematic evaluation of different representations (be they formants, spectra, LPC's, or analog filter measurements). Cost can be balanced against performance. Given the large number of design choices in this area, careful attention must be paid to methods which are invariant under change of parametric representation. For segmentation, regions of acoustic change are detected by functions of parameter vector similarity and by amplitude cues. The basic model of signal detection theory can be applied to quantify the missed/extra-segment error trade-off. Error rates of 3.7% missed for 19% extra have been achieved. To account for allophonic variability when acquiring labeling knowledge (training), a clustering algorithm was developed to empirically discover the acoustic variations inherent in each phonetic class. Labeling then proceeds by nearest-neighbor match. Accuracy for 29 phonetic classes was 35% correct in the first choice and 60% correct in the first three choices. For details, see H. G. Goldberg, Ph.D. Thesis (Carnegie-Mellon University, 1975).

3:42
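The cluster-then-label approach of RR11 (discover allophonic variants of each phonetic class by clustering, then label new segments by nearest-neighbor match) can be sketched as follows. The feature dimensionality, cluster count, and every name below are illustrative assumptions, not details from the thesis.

```python
import numpy as np

def train_clusters(tokens_by_class, k=2, iters=10, seed=0):
    """For each phonetic class, run a tiny k-means over its training vectors
    so that allophonic variation is captured as up to k cluster centroids."""
    rng = np.random.default_rng(seed)
    centroids = {}
    for label, X in tokens_by_class.items():
        X = np.asarray(X, float)
        idx = rng.choice(len(X), size=min(k, len(X)), replace=False)
        c = X[idx].copy()
        for _ in range(iters):
            # assign each token to its nearest centroid, then recompute means
            assign = np.argmin(((X[:, None, :] - c[None, :, :]) ** 2).sum(-1), axis=1)
            for j in range(len(c)):
                if np.any(assign == j):
                    c[j] = X[assign == j].mean(axis=0)
        centroids[label] = c
    return centroids

def label_segment(x, centroids, n_best=3):
    """Rank phonetic classes by the distance from x to each class's nearest
    centroid; returning the top few mirrors first-choice vs. first-three scoring."""
    x = np.asarray(x, float)
    dist = {lab: float(np.min(((c - x) ** 2).sum(-1)))
            for lab, c in centroids.items()}
    return sorted(dist, key=dist.get)[:n_best]
```

Because distances are computed uniformly on parameter vectors, the same code applies whether the vectors are formants, spectra, or filter-bank outputs, which is the representation-independence the abstract argues for.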

RR12. Improvement of intelligibility and individuality of helium speech by use of autocorrelation function. J. Suzuki and M. Nakatsui (Radio Research Laboratories, Koganei, Tokyo 184 Japan)

Helium speech is hardly intelligible owing to the expansion of …

J. Acoust. Soc. Am., Vol. 59, Suppl. No. 1, Spring 1976

