Design and Implementation of the Note‐taking Style  Haptic Voice Recognition for Mobile Devices

14th ACM International Conference on Multimodal Interaction DoubleTree Suites Santa Monica, California. October 22‐26th, 2012

Seungwhan Moon
Franklin W. Olin College of Engineering, 1000 Olin Way, Needham, MA, U.S.A.

Khe Chai Sim
National University of Singapore, Computing 1, 13 Computing Drive, Singapore, Singapore 117417

Introduction
• Haptic Voice Recognition (HVR)

Figure: Haptic Input and Speech Input.

Introduction
• Haptic Voice Recognition (HVR)
• Boundary of Sentence (BoS)
• Boundary of Word (BoW)
• First Letter of Word (FLoW)
• …
• Synchronous
• Asynchronous

Note‐taking Style HVR
Motivation

Lecture Note
Haptic voice recognition
- combine speech / touch
- increases accuracy

Semantically Meaningful Keywords → Natural to write & take notes

Note‐taking Style HVR

Figure: Haptic Input (Haptic Note Sequence) and Speech Input, with the example "meeting tom at 6 pm".

Note‐taking Style HVR
1. An element in a haptic note sequence refers to a partially or fully spelled word in the decoded word sequence.
2. The number and the order of keywords in a haptic note sequence do not need to match those of words in the actual word sequence.
3. The exact time at which a haptic event occurs is ignored.
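The three rules above can be sketched as a simple consistency check. This is a minimal illustration and not the authors' implementation: the function names `note_matches_word` and `is_consistent` are hypothetical, and a note is assumed to match a word when it is a case-insensitive prefix of that word.

```python
def note_matches_word(note: str, word: str) -> bool:
    """Rule 1 (assumed reading): a note is a partial or full spelling of a word."""
    return word.lower().startswith(note.lower())


def is_consistent(notes: list[str], words: list[str]) -> bool:
    """Rules 2 and 3: every note must match some word in the sequence.

    The order of the notes, the number of notes, and the time at which
    each note was entered are all ignored.
    """
    return all(any(note_matches_word(n, w) for w in words) for n in notes)


words = "meeting tom at 6 pm".split()
print(is_consistent(["tom", "mee"], words))  # notes out of order, partially spelled
print(is_consistent(["lun"], words))         # no word starts with "lun"
```

A hypothesis that fails this check contradicts the user's notes, so a decoder could penalize or discard it.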

Note‐taking Style HVR
3 Types of Haptic Input Methods
1. Longhand Handwriting
2. Shorthand Handwriting
3. Virtual Keyboard

Figure: example input of the word "note" (handwritten "note"; keyboard "N O T E").

Note‐taking Style HVR
(Adapted) Gregg Shorthand Handwriting Recognition
1. Facilitates much faster and more effective input.
2. Adds ambiguity among letters that are phonetically similar.
3. Adapted to HVR: uses isolated letters to spell a word.

Demo


Algorithm Design

W: Word sequence
L: PLI (partial lexical information) sequence
O: Sequence of observed acoustic features
H: Sequence of observed haptic features

Haptic Voice Recognition → Finding the joint optimal solution for W, L given O, H.
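If "joint optimal solution" is read as maximizing a posterior probability, the problem stated above could be written as follows. The hat notation and the probabilistic objective are assumptions made for illustration; W, L, O, and H denote the word sequence, the PLI sequence, the observed acoustic features, and the observed haptic features.

```latex
(\hat{W}, \hat{L}) = \arg\max_{W,\, L} \; P(W, L \mid O, H)
```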

Algorithm Design
Weighted Finite State Transducer (WFST)

WFST components:
‐ Lattice of multiple word sequence hypotheses
‐ PLI model
‐ Lattice of permutations of the haptic note sequence

Shortest Path of Eq. (2) → Optimal solution for Eq. (1)
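The role of the permutation lattice and the shortest path can be illustrated with a brute-force stand-in. The sketch below is not a WFST implementation and not the authors' method: it enumerates every ordering of the haptic notes, keeps the word sequence hypotheses that accept at least one ordering, and returns the cheapest. The names `accepts_in_order` and `best_hypothesis`, the prefix-matching rule, and the costs in the example are all hypothetical.

```python
from itertools import permutations


def accepts_in_order(notes, words):
    """True if the notes match the words as an ordered subsequence of prefixes."""
    remaining = iter(words)
    # Each note consumes words from the shared iterator until one matches.
    return all(any(w.startswith(n) for w in remaining) for n in notes)


def best_hypothesis(hypotheses, notes):
    """Pick the lowest-cost (cost, words) pair compatible with the notes.

    Trying every permutation mimics a lattice of note orderings. If no
    hypothesis is compatible, fall back to the cheapest one overall.
    """
    orderings = set(permutations(notes))
    accepted = [
        (cost, words)
        for cost, words in hypotheses
        if any(accepts_in_order(order, words) for order in orderings)
    ]
    return min(accepted or hypotheses, key=lambda h: h[0])


# Toy hypotheses with made-up costs (lower is better).
hypotheses = [
    (1.5, "meeting tim at 6 pm".split()),
    (2.0, "meeting tom at 6 pm".split()),
]
cost, words = best_hypothesis(hypotheses, ["tom", "mee"])
print(cost, " ".join(words))
```

Enumerating permutations grows factorially with the number of notes, which is one reason a compact lattice representation is attractive.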

Algorithm Design
Using OpenFST …
‐ fstcompile
‐ fstcompose
‐ fstshortestpath
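As a rough illustration of how input for these tools might be prepared, the sketch below writes a linear transducer and a symbol table in the plain-text format that `fstcompile` reads. The helper names `linear_fst_text` and `symbol_table_text`, the file names in the comments, and the zero weights are hypothetical; the actual transducers used in this work are not shown in the slides.

```python
def linear_fst_text(words, weight=0.0):
    """Text-format FST for one word sequence: 'src dst ilabel olabel weight'."""
    lines = [f"{i} {i + 1} {w} {w} {weight}" for i, w in enumerate(words)]
    lines.append(str(len(words)))  # a line with a single state marks it final
    return "\n".join(lines)


def symbol_table_text(words):
    """Symbol table: one 'symbol id' pair per line, with <eps> as id 0."""
    symbols = ["<eps>"] + sorted(set(words))
    return "\n".join(f"{s} {i}" for i, s in enumerate(symbols))


words = "meeting tom at 6 pm".split()
print(symbol_table_text(words))
print(linear_fst_text(words))

# With the two outputs saved as syms.txt and hyp.txt, one possible pipeline is:
#   fstcompile --isymbols=syms.txt --osymbols=syms.txt hyp.txt hyp.fst
#   fstcompose hyp.fst notes.fst composed.fst   # notes.fst is a hypothetical second FST
#   fstshortestpath composed.fst best.fst
# Composition generally expects arc-sorted inputs (see fstarcsort).
```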

Experimental Results (1) Simulation
‐ Single user, 72 sentences, 100 iterations.
‐ N words (partially / fully spelled) are randomly chosen as artificial haptic events: N‐W3L / N‐W.
‐ Two signal‐to‐noise ratio (SNR) conditions: clean and 15 dB (artificially corrupted).
‐ Compared with FLoW and the Oracle Error Rate.
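The artificial haptic events described above could be generated along these lines. This is a sketch under assumptions: the function name `simulate_notes`, the use of the reference transcript as the source of keywords, and sampling without replacement are illustrative choices and are not stated in the slides.

```python
import random


def simulate_notes(sentence, n, prefix_len=None, rng=None):
    """Randomly pick up to n words from a sentence as artificial haptic notes.

    prefix_len=3 keeps only the first 3 letters of each word (N-W3L);
    prefix_len=None keeps the fully spelled words (N-W).
    """
    rng = rng or random.Random()
    words = sentence.split()
    chosen = rng.sample(words, min(n, len(words)))
    if prefix_len is None:
        return chosen
    return [w[:prefix_len] for w in chosen]


rng = random.Random(0)  # fixed seed so a run can be repeated
print(simulate_notes("meeting tom at 6 pm", 3, prefix_len=3, rng=rng))
print(simulate_notes("meeting tom at 6 pm", 3, rng=rng))
```

Repeating the draw many times per sentence, as in the 100 iterations above, gives a distribution of error rates rather than a single value.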

Experimental Results (1) Simulation

Figure: Simulation results (a) without any additional noise and (b) with artificial noise at SNR = 15 dB. The x‐axis denotes the number of randomly chosen keywords (N), and the y‐axis denotes the word error rate (WER). The red and blue lines show the Note‐taking‐style HVR performance with the first 3 letters of N randomly chosen words (N‐W3L) and with N fully spelled words (N‐W), respectively. The error bars indicate the standard deviations over the 100 iterations.

Experimental Results (1) Simulation
‐ Notable improvement in the Word Error Rate (WER) for both N‐W3L and N‐W in both SNR conditions.
‐ Larger N gives a greater improvement, with diminishing returns.
‐ Bottleneck at the Oracle Error Rate → performance depends on the quality of the speech recognizer.
‐ Large standard deviation of WER → the choice of keywords significantly affects the performance.
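For reference, WER is commonly computed as the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. The sketch below follows that standard definition; the function name `word_error_rate` is hypothetical, and the scoring tool actually used in the experiments is not named in the slides.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or a free match
            ))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("meeting tom at 6 pm", "meeting tim at 6 pm"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.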


Experimental Results (2) Preliminary User Studies
‐ Single user (72 sentences for each).
‐ 3 keywords (partially spelled: only the first 3 characters) are chosen: 3W3L.
‐ 3 different input methods: Shorthand / Longhand / Keyboard.
‐ Compared with BoS and FLoW.

Experimental Results (2) Preliminary User Studies

Table: Five haptic methods were applied in this experiment: Boundary of Sentences (BoS), 3 Words and 3 Letters (3W3L) via Shorthand, Longhand, and Keyboard input, and First Letter of Words (FLoW). The table reports the Word Error Rate (WER), the Keyword Error Rate (KER), and the absolute improvement in the error rate from the Automatic Speech Recognition (ASR) results to the Haptic Voice Recognition (HVR) results.


Experimental Results (2) Preliminary User Studies
‐ Notable improvement in the Word Error Rate (WER) and the Keyword Error Rate (KER).
‐ Greater improvement in KER → can enhance the user experience with the speech recognition system.
‐ Increased duration of speech → minimized by the use of partially spelled words and Gregg shorthand.

Conclusion
• Summary
  – Improvement in WER, and particularly in KER
  – Smaller increase in speech duration (Gregg shorthand, partial spelling)
  – Large standard deviations of WER

• Future Work
  – HVR API
  – Application in Spoken Document Retrieval (for online lectures, e‐learning, conferences, etc.)

