Experiments on automatic detection of filled pauses using ... - INESC-ID
Recommend Documents
inhalation breath noises are becoming more important. One issue is the ..... âA Tramp Abroadâ than in the German TTS training corpus. This is not surprising ...
The experiment also investigated the effects of the prompts on utterance form and meaning of the callers. ... help is desired. Self-monitoring/error detection hypothesis. .... the TeliaSonera Customer Service Call Center in. Sundbyberg ...
play a role in turn-final floor-holding strategies, while NLFP is mostly connected to ... Keywords: Child speech; vowel lengthening; filled pause; floor-holding;.
Michiko Watanabe1, Yasuharu Den2, Keikichi Hirose3 and Nobuaki Minematsu4. 1Macquarie ..... [9] Takanashi, K., Maruyama, T., Uchimoto, K., and Isahara,.
shaped microphone arrays each with four omnidirectional sensors, and a ... The results of the first experiments of acoustic source localization are shown in Table ...
In this wor@, the automatic control and inspection of welding defects is made ..... A neural Networ@ For Weld penetration control In Gas tungsten Arc Welding, ...
pathological axial elongation and eyes that deviates from the ... Singapore. L. Tong is with the Singapore National Eye Center and the National. University of ...
D. U. Seo b. , G. C. Park b. aKorea Atomic Energy Research Institute, Yuseong-Gu, Daejeon, Korea, ... phases that were originally present in the near-surface.
ABSTRACT. A novel strategy to automatically detect cracks in road pavement surface imagery, acquired by a laser imaging system, is proposed. It includes a ...
To be able to explore the utility of laugh- ter segments for speaker recognition, however, we first need to build a robust system to detect laughter, which is the ...
CRF model has the advantages of modeling the relations of the sequential labels and is able to retain the long distance
There have been numerous efforts to develop CDAs using image processing and machine ... are extracted using convexity, an important perceptual organization cue. .... crater and non-crater regions using machine learning techniques.
representation of them as one's own original work [7]. Plagiarism comes from a latin verb that means, âto kidnapâ If we plagiarize it means that we are kidnapping ...
Automatic Pavement Cracks Detection using Image. Processing Techniques and Neural Network. Nawras Shatnawi. Department of Surveying and Geomatics ...
Jun 25, 2016 - In addition to the in-house test sets, Vision and .... In Proceedings of the 3DTV Conference: The True Vision-Capture, Transmission and Display.
Aug 29, 2016 - Helena Gómez-Adorno 1,*, Grigori Sidorov 1, David Pinto 2, Darnes Vilariño 2 and ... de Investigación en Computación, Av. Juan de Dios Bátiz S/N, ...... Keselj, V.; Peng, F.; Cercone, N.; Thomas, C. N-gram-based author ...
fessor Mike Middleton and Erik Larsson at the Department of. Geology, Chalmers University of Technology, for guidance at various stages of the work.
Jan 21, 2015 - 05/01/2015: Obama proposes free two years at community colleges, Malia Obama .... EvenTweet: Online Localized Event Detection from.
Mar 20, 2014 - paper, we classified images of eye fundus into no-exudate or have ..... The Wills Eye Hospital: Atlas of Clinical Ophthalmology (2nd ed.).
AUTOMATIC DETECTION OF SUB-KM CRATERS ON THE MOON. L. Bandeira, M. Machado, P. Pina,. CERENA/IST/UL, Lisbon, Portugal, {lpcbandeira, ...
Experiments on Sentence Boundary Detection. Mark Stevenson and Robert Gaizauskas. Department of Computer Science,. University of Sheffield.
Oct 25, 2018 - [15] simulated a section of underwater oil pipeline to study the possibility of using AE .... ereas D1âD5 represent the detail functions of the layers. Figure 4. ... Figure 5. Schematic diagram of pipeline leakage test platform. Figu
Experiments on automatic detection of filled pauses using ... - INESC-ID
Experiments on automatic detection of filled pauses using prosodic features H. Medeiros1,2 , H. Moniz1,3 , F. Batista1,2 , I. Trancoso1,4 , H. Meinedo1 1
Spoken Language Systems Lab - INESC-ID, Lisbon, Portugal 2 ISCTE - Instituto Universitário de Lisboa, Portugal 3 FLUL/CLUL, Universidade de Lisboa, Portugal 4 IST, Lisboa, Portugal
Abstract. This paper reports our first experiments on automatic detecting filled pauses from a corpus of university lectures using segmentation and prosodic features. This domain is informal, mostly spontaneous speech, and therefore difficult to process. Filled pauses correspond 1.8% of all the words in the corpus, and to 22.9% of all disfluency types, being the most frequent type. The detection of filled pauses has a great impact on the speech recognition task, since their detection has impact on the recognition of adjacent context and on the improvement of language models. We have performed a set of machine learning experiments using automatic prosodic features, extracted from the speech signal, and audio segmentation provided by the ASR output. For now, only force aligned transcripts were used, since the ASR system is not well adapted to this domain. The best results were achieved using J48, corresponding to about 61% F-measure. Results reported in the literature are scarce for this specific domain, and are significantly better for other languages. The proposed approach was compared with the approach currently in use by the in-house speech recognition system and promising results are achieved. This study is a step towards automatic detection of filled pauses for European Portuguese using prosodic features. Future work will extend this study for fully automatic transcripts, and will also tackle other domains, also exploring extended sets of linguistic features.
Keywords: Filled pauses, University lectures, Statistical methods, Prosody
1
Introduction
Disfluencies are linguistic mechanisms used for online editing a message. These phenomena have been studied in several areas such as Linguistics, Psycholinguistics, Text-to-speech and in Automatic Speech Recognition (ASR). The ASR system is often a pipeline with several modules where each one feeds the subsequent ones with more levels of information. Disfluencies are known to have impact in the ASR modules, since they are
frequently misrecognized and may also lead to the erroneous classification of adjacent words. This has impact in several natural language processing tasks such as part-of-speech tagging, audio segmentation, capitalization, punctuation, summarization, speech translation, etc. It is also known that if disfluencies are recognized, then more reliable transcripts are provided, allowing, for example, to disambiguate between sentence-like units and also between those units and disfluency boundaries, and for speaker characterization. Although the ASR system has to account for all the disfluent categories (filled pauses, prolongations, repetitions, deletions, substitutions, fragments, editing expressions, insertions, and complex sequences), the focus of the present study is on the detection of filled pauses. Filled pauses are amongst the most frequent disfluent types produced. In European Portuguese they are mostly characterized by: i) an elongated central vowel only (orthographically transcribed as “aa”; (ii) a nasal murmur only (“mm”); and (iii) a central vowel followed by a nasal murmur (“aam”). Filled pauses display several communicative functions, e.g., evidence for planning effort, mechanisms to take or hold the floor, to introduce a new topic in a dialogue. Figure 1 shows an example of a disfluent sequence containing the filled pause aa/”uh”.