EXPERIMENTS ON ERROR-CORRECTIVE LANGUAGE MODEL ADAPTATION
Minwoo JEONG and Gary Geunbae LEE
Department of Computer Science and Engineering, Pohang University of Science & Technology (POSTECH)
E-mail: {stardust, gblee}@postech.ac.kr

ABSTRACT We present a new language model adaptation framework, integrated with an error handling method, to improve the accuracy of speech recognition and the performance of spoken language applications. The proposed error-corrective language model (ECLM) adaptation approach exploits the characteristics of the recognition environment and domain-specific semantic information to provide robustness and adaptability for a spoken language system. We present experiments on spoken dialogue tasks, with empirical results showing improved accuracy for both speech recognition and spoken language understanding. KEYWORDS: Language model adaptation, Automatic speech recognition, Spoken language understanding, Speech error correction

INTRODUCTION A spoken language interface is required in many application environments, such as mobile information retrieval, car navigation, and ubiquitous computing, but the low performance of speech recognition systems makes it difficult to extend such interfaces to new fields. For a spoken language system to support practical interaction between human and machine, the major problem is how to recover the application-level performance lost to incomplete speech recognition outputs. In particular, the accuracy of automatic speech recognition (ASR) is severely affected by changes in the recognition task. In a new recognition task, the lexical, syntactic, or semantic characteristics of the utterances are distributed differently from the training corpus. To recover from this mismatch and improve the abilities of spoken language systems, the speech recognizer should be adapted to the specific recognition domain. Various language model adaptation approaches have been investigated [1]. The most widespread adaptive language modeling approaches use n-best or lattice re-ranking (or re-scoring), an efficient method for adapting to newly observed data in the recognition task. Typically, n-best re-scoring assumes that the correct answer sentence is always included in the ASR hypotheses. But if a speech recognizer cannot provide the correct words or lattices, due to extremely noisy environments or atypical speaker characteristics, even the n-best hypotheses often fail to cover the correct answer sentence. To overcome this problem, we propose a new language model adaptation method that incorporates the channel characteristics of the domain environment as well as various types of higher-level linguistic knowledge [8]; this paper extends our previous work to a more general framework. This paper makes two contributions.
The first contribution is a unified model combining error correction and a linguistically driven adaptive language model. This can capture the characteristics of the recognition environment (e.g. domain, style, or speaker variations) together with linguistic variation. Our framework therefore provides a simple way to combine acoustic (or speaker) and language model adaptation. The second contribution is the integration of semantic information into the adaptive language model. The domain-specific language understanding model is tightly coupled to, and feeds back into, the language model. For robust spoken language systems, we present a general unified framework to correct ASR errors and to adapt to the linguistic variations of the recognition task. The adaptation framework can both handle ASR errors and combine high-level linguistic knowledge in a uniform manner.

Figure 1: Error Corrective Language Model Adaptation Framework. Adaptation data (ASR output and transcript pairs) train a channel model and bigram LM for the n-best CORRECTOR; a text corpus trains the semantic LM used by the RERANKER, which, together with n-best SLU, selects the 1-best output with its semantic frame.

ERROR CORRECTIVE ADAPTATION

Adaptation Framework In traditional spoken dialogue systems, the best ASR output, re-scored with a domain-adapted language model, is passed to the application layer, where a dialogue system analyzes it. This approach has two disadvantages. First, it uses only adaptive (and domain-specific) language models to re-score the n-best list or word lattice generated by the base speech recognizer, so these adaptation methods depend on the baseline recognizer, and the hypothesis space is fixed by the base n-best list or lattice. Second, previous adaptation approaches adapt the acoustic (or speaker) model and the language model independently, although acoustic or speaker variation sometimes influences language models. To address these limitations, we present an error-corrective adaptation which can correct or re-generate the hypothesis space and then select the best hypothesis from an expanded n-best list or word lattice. Our error corrective language model (ECLM) adaptation framework is depicted in Fig. 1. The framework contains two intermediate-level models, the CORRECTOR and the RERANKER. The CORRECTOR is the baseline ECLM, which generates new hypotheses whose errors are corrected by the channel model and a bigram language model. The RERANKER re-evaluates the generated hypotheses with domain-specific statistical language models. Finally, we find the best word sequence (hypothesis) w by the following equation:

w∗ = arg max_{w ∈ COR(o)} P_RE(w)    (1)

where o is the output of the speech recognizer and COR(o) is the set of hypotheses generated by the CORRECTOR model. Like other adaptive language models, our ECLM requires sufficient adaptation data. In our ECLM framework, we consider two different types of adaptation data. One is the small channel adaptation data A', consisting of hypothesis and reference pairs (h, r) from the base ASR, used by the CORRECTOR. The other is the linguistic feature corpus (a collection of sentences) A'', which is relevant to the RERANKER. The entire adaptation data is then A = A' ∪ A''. In general, a reference r in A' is a subset of A'' and can come from acoustic or speaker adaptation processes.

Figure 2: Example of a split and a merged error problem (from [11]).
    User speech: TAKE A TRAIN FROM CHICAGO TO TOLEDO
    ASR output:  TICKET TRAIN FROM CHICAGO TO TO LEAVE

CORRECTOR Model The CORRECTOR models the relationship between ASR hypotheses and correct hypotheses. The ASR hypotheses may be incorrect, may lack domain vocabulary, and may not reflect domain-specific or speaker-dependent models; the model therefore acts as a domain adaptor or error corrector. Under this assumption, we can model the CORRECTOR within the "source-channel" or "noisy-channel" paradigm. We assume that the first-pass ASR produces an output sequence containing erroneous words, and the adaptation procedure finds a new word sequence under an adapted language model distribution. The adaptation problem of ECLM can then be stated as follows: given the ASR output o = o_1, o_2, ..., o_n for an input sentence, find the best original word sequence¹ w∗ = w_1, w_2, ..., w_n that maximizes the posterior probability P_A(w|o). Applying Bayes' rule and dropping the constant denominator, we can rewrite this as:

w∗ = arg max_w P_A(w|o) = arg max_w P_A(o|w) · P_A(w).    (2)
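As a concrete illustration of Eq. (2), the sketch below scores candidate corrections by combining a channel model term with a bigram prior and takes the argmax. All probabilities here are invented toy values, not estimates from the paper's data.

```python
import math

# Toy channel model P_A'(o|w): probability that the ASR emits o when the
# speaker said w. Values are illustrative only.
channel = {
    ("ticket", "take"): 0.2, ("ticket", "ticket"): 0.8,
    ("a", "a"): 0.9,
}
# Toy bigram prior P_A(w), factored as a product of P(w_i | w_{i-1}).
bigram = {
    ("<s>", "take"): 0.3, ("<s>", "ticket"): 0.1,
    ("take", "a"): 0.5, ("ticket", "a"): 0.05,
}

def log_score(observed, candidate):
    """log P_A(o|w) + log P_A(w) under a one-to-one channel assumption."""
    lp = 0.0
    for o, w in zip(observed, candidate):
        lp += math.log(channel.get((o, w), 1e-6))    # channel term
    prev = "<s>"
    for w in candidate:
        lp += math.log(bigram.get((prev, w), 1e-6))  # language model term
        prev = w
    return lp

observed = ["ticket", "a"]
candidates = [["take", "a"], ["ticket", "a"]]
best = max(candidates, key=lambda c: log_score(observed, c))  # Eq. (2) argmax
```

In the real CORRECTOR the candidate set is not enumerated explicitly; the n-best Viterbi search over channel and bigram scores plays that role.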

At this point, we have a noisy-channel model for ECLM adaptation with two components: the channel model (or translation model) P_A(o|w), estimated from A', and the language model P_A(w), integrating the background corpus and the adaptation data A''. The search in Eq. (2) can be carried out efficiently by dynamic programming using an n-best Viterbi search algorithm. The output hypotheses are error-corrected (or newly generated) n-best sentences. The outputs of the CORRECTOR depend on its two sub-component models, the channel model P_A(o|w) and the language model P_A(w).

Learning the CORRECTOR Model The conditional probability P_A(o|w) reflects the channel characteristics of the ASR environment. If we assume that the output words produced by the ASR are independent of one another, we have the following formula:

P_A(o|w) = P_A'(o_1, o_2, ..., o_n | w_1, w_2, ..., w_n) = ∏_{i=1}^{n} P_A'(o_i|w_i)    (3)

where the channel model is estimated from data A', so P_A(o|w) = P_A'(o|w). However, this simple one-to-one model cannot handle split or merged errors, which frequently appear in ASR output, because errors influence the surrounding words and there is context dependency in the error word sequence. Fig. 2 shows an example of a split and a merged error. This problem has been addressed in statistical machine translation (MT) [3] and in post-error correction [11]. We refer to the number k of post-channel words o_i produced by a pre-channel word w_i as a fertility, with probability P_A'(f:k|w_i). We simplify the fertility model of IBM statistical MT model-4 and allow fertility within a 2-word window: P_A'(o_{i-1}, o_i|w_i) · P_A'(f:2|w_i) for the two-to-one channel probability, and P_A'(o_i|w_i, w_{i+1}) · P_A'(f:1/2|w_i) · P_A'(f:1/2|w_{i+1}) for the one-to-two channel probability. The fertility model can thus handle the (TICKET, TAKE A) and (TO LEAVE, TOLEDO) substitutions in the example of Fig. 2. Using the fertility model, we can generalize Eq. (3) beyond one-to-one mapping. Let L be the set of admissible fertility values (e.g., L = {0, 1/2, 1, 2} for the 2-window fertility model), with fertility k ∈ L. Assuming conditional independence in the channel model (e.g., P_A'(o_{i-1}, o_i|w_i) = P_A'(o_{i-1}|w_i) · P_A'(o_i|w_i)), we obtain the following equation:

P_A(o|w) = P_A'(o_1, o_2, ..., o_n | w_1, w_2, ..., w_n) = ∏_{i=1}^{n} Σ_{k∈L} P_A'(o_i|w_i) · P_A'(f:k|w_i)    (4)

where Σ_{k∈L} P_A'(f:k|w_i) = 1. To train the channel model, we need training data consisting of {w, o} pairs, i.e., manually transcribed strings and ASR outputs. We align each pair by minimizing the edit distance between w and o with dynamic programming. We can then calculate the probability of each substitution P_A'(o_i|w_i) by maximum-likelihood estimation (MLE). Let N(w_i) be the frequency of a source word and N(o_i, w_i) the frequency of events in which o_i substitutes for w_i. Then

P_MLE(o_i|w_i) = N(o_i, w_i) / N(w_i).    (5)

¹ Actually, we find the n-best list using a modified Viterbi search algorithm.
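The alignment-plus-MLE training just described can be sketched as follows. The tiny corpus is made up for illustration, and only one-to-one substitutions are counted (the fertility cases of Eq. (4) are omitted).

```python
from collections import Counter

def align(ref, hyp):
    """Levenshtein alignment of reference and hypothesis word lists by
    dynamic programming; returns (ref_word, hyp_word) pairs, with None
    marking an insertion or deletion."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    pairs, i, j = [], n, m          # backtrace
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                d[i][j] == d[i - 1][j - 1] + (0 if ref[i - 1] == hyp[j - 1] else 1)):
            pairs.append((ref[i - 1], hyp[j - 1])); i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((ref[i - 1], None)); i -= 1
        else:
            pairs.append((None, hyp[j - 1])); j -= 1
    return pairs[::-1]

def train_channel(corpus):
    """MLE estimate P(o|w) = N(o, w) / N(w) per Eq. (5), from aligned
    (reference, hypothesis) sentence pairs."""
    n_w, n_ow = Counter(), Counter()
    for ref, hyp in corpus:
        for w, o in align(ref, hyp):
            if w is not None and o is not None:  # substitutions only
                n_w[w] += 1
                n_ow[(o, w)] += 1
    return {ow: c / n_w[ow[1]] for ow, c in n_ow.items()}

corpus = [(["take", "a", "train"], ["ticket", "a", "train"]),
          (["take", "the", "train"], ["take", "the", "train"])]
p = train_channel(corpus)
```

Here "take" is seen twice, once recognized as "ticket", so P(ticket|take) = P(take|take) = 0.5 and P(train|train) = 1.0.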

If the adaptation data A' is insufficient for estimating the channel model parameters, the parameters P_A'(o_i|w_i) are generally overfitted. To overcome this problem, we apply well-known language model discounting techniques to the channel model: absolute, Good-Turing, and modified Kneser-Ney discounting. For further details of these discounting methods in language modeling, see [4]. Learning the language model P_A(w) is slightly different from learning the channel model. The probability P_A(w) is given by the language model and plays the pivotal role of the prior. The distribution P_A(w) can be defined using an n-gram, a structured language model, or any other statistical language modeling tool. In this paper, we use a simple bigram language model in the CORRECTOR to reduce the search space, and the RERANKER re-evaluates the resulting hypotheses: the CORRECTOR searches for the n-best hypotheses using the bigram, and the RERANKER re-scores them with more sophisticated language models.
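Of the three techniques, absolute discounting is the simplest to sketch. The version below subtracts a fixed discount from every seen count and spreads the freed mass uniformly over unseen outputs; this is a simplification of the recipes surveyed in [4] (real systems back off to a lower-order distribution), and the counts are invented.

```python
from collections import Counter

def absolute_discount(n_ow, n_w, vocab, d=0.5):
    """Absolute discounting for the channel model: subtract discount d from
    every seen count N(o, w) and redistribute the freed mass uniformly over
    the unseen outputs for each source word w."""
    p = {}
    for w, total in n_w.items():
        seen = [o for (o, w2) in n_ow if w2 == w]
        reserved = d * len(seen) / total         # mass taken from seen events
        unseen = [o for o in vocab if o not in seen]
        for o in seen:
            p[(o, w)] = (n_ow[(o, w)] - d) / total
        for o in unseen:
            p[(o, w)] = reserved / len(unseen)
    return p

# Invented counts: "take" recognized as itself 3 times, as "ticket" once.
n_ow = Counter({("take", "take"): 3, ("ticket", "take"): 1})
n_w = Counter({"take": 4})
p = absolute_discount(n_ow, n_w, vocab=["take", "ticket", "toledo"])
```

With these counts, P(take|take) drops from 0.75 to 0.625, P(ticket|take) from 0.25 to 0.125, and the unseen substitution "toledo" receives the reserved 0.25, so the distribution still sums to one.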

RE-RANKING MODEL WITH SPOKEN LANGUAGE UNDERSTANDING The RERANKER, the second part of ECLM, re-scores the n-best list or word lattice of the error-corrected speech recognizer. The probabilistic re-ranking approach is widely used in natural language processing (NLP), including language modeling [1] and natural language parsing [6]. The major benefit of re-ranking is computation that is more tractable than direct processing, and it can be adapted to specific domains without re-modeling the base ASR. In this section, we explain how to use domain-specific semantic knowledge in our re-ranking model.

RERANKER Model From the n-best outputs of the CORRECTOR, we can calculate the language model probability together with the CORRECTOR model score². We define the RERANKER model as the following formula:

P_RE(w) = λ P_COR(w) + (1 − λ) P_LM(w).    (6)
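A minimal sketch of Eq. (6); the interpolation weight and the hypothesis scores below are hypothetical, and in practice λ would be tuned on held-out data.

```python
def rerank_score(p_cor, p_lm, lam=0.5):
    """Eq. (6): linear interpolation of the normalized CORRECTOR score
    P_COR(w) and a re-ranking language model probability P_LM(w)."""
    return lam * p_cor + (1 - lam) * p_lm

# Hypothetical (P_COR, P_LM) pairs for two corrected hypotheses.
hyps = {
    "take a train": (0.6, 0.4),
    "ticket train": (0.4, 0.1),
}
best = max(hyps, key=lambda h: rerank_score(*hyps[h]))
```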

P_COR(w) is the scaled and normalized CORRECTOR model score, and P_LM is one of several language models trained on background and domain corpora. In this model, P_COR(w) plays the same prior role that the acoustic score plays in the speech recognizer. The trigram model and the class-based language model (CLM) [2] have been applied successfully in ASR tasks, so we choose these two models for our baseline RERANKER. We define the trigram P_LM in the following simple form:

P_trigram(w) = P(w_2|w_1, <s>) × ∏_{i=3}^{n} P(w_i|w_{i-1}, w_{i-2}) × P(</s>|w_n, w_{n-1})    (7)
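Eq. (7) can be sketched directly. The toy probability table below is illustrative, and unseen trigrams get a tiny floor probability instead of proper smoothing.

```python
import math

# Toy trigram table: (w_{i-2}, w_{i-1}, w_i) -> probability, with <s>/</s>
# boundary symbols. Values are invented.
trigrams = {
    ("<s>", "<s>", "take"): 0.2,
    ("<s>", "take", "a"): 0.5,
    ("take", "a", "</s>"): 0.1,
}

def trigram_logprob(words, p=trigrams, floor=1e-6):
    """Sentence log-probability in the spirit of Eq. (7): pad with start
    symbols, multiply conditional trigram probabilities, close with </s>."""
    ws = ["<s>", "<s>"] + words + ["</s>"]
    lp = 0.0
    for i in range(2, len(ws)):
        lp += math.log(p.get((ws[i - 2], ws[i - 1], ws[i]), floor))
    return lp

lp = trigram_logprob(["take", "a"])
```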

The CLM predicts a word sequence by the following equation:

P_CLM(w) = P(c_1|<s>) P(w_1|c_1) × ∏_{i=2}^{n} P(c_i|c_{i-1}) P(w_i|c_i) × P(</s>|c_n)    (8)

where c_i is the word class to which the i-th word belongs. The word cluster probability P(w_i|c_i) is learned by the Expectation-Maximization (EM) algorithm, and P(c_i|c_{i-1}) is then calculated from the uniquely assigned word class sequences.

² This CORRECTOR score is not a probability because of the constant term P(o) ignored in the Viterbi search. We assume the effect of P(o) is quite small, so we take P(o) as uniform over all hypotheses.

Using Language Understanding as a Semantic Language Model The aim of ECLM adaptation is to build a practical and robust spoken dialogue system. The spoken dialogue system has a pipeline structure consisting of a speech recognizer and domain-specific language understanding modules. In such dialogue applications, task-oriented spoken language understanding (SLU) can be used in our re-ranking language model. Like the class-based bigram, the semantic class-based language (bigram) model (SCLM) is defined by the following equation:

P_SCLM(w) = P(s_1|<s>) P(w_1|s_1) × ∏_{i=2}^{n} P(s_i|s_{i-1}) P(w_i|s_i) × P(</s>|s_n)    (9)

where s_i is the semantic class to which the i-th word belongs. Learning the SCLM has two phases: 1) assigning semantic classes s to the raw training corpus with the SLU model, and 2) estimating the two probability functions from the assigned semantic classes. The probabilities P(s_i|s_{i-1}) and P(w_i|s_i) are estimated from the corpus simply as:

P(s_i|s_{i-1}) = N(s_i, s_{i-1}) / N(s_{i-1})    (10)

P(w_i|s_i) = N(w_i, s_i) / N(s_i)    (11)

where N(x) represents the number of times the event x occurs in the (semantic-class-assigned) learning corpus. There are two main differences between CLM and SCLM: 1) a word belongs to a unique class in CLM but may belong to multiple classes in SCLM, and 2) the model is learned by the EM algorithm in CLM and by MLE in SCLM. Note, however, that SCLM requires a semantic-tagged corpus to train the SLU model, which is the norm for practical spoken dialogue systems.
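The two-phase SCLM estimation can be sketched from a class-tagged corpus. The slot names (FROMLOC, TOLOC, CITY) and the two utterances are hypothetical, not taken from the paper's data.

```python
from collections import Counter

# Semantic-class-tagged corpus: (word, slot) pairs per utterance, as a
# hypothetical output of phase 1 (SLU tagging).
tagged = [
    [("from", "FROMLOC"), ("chicago", "CITY"), ("to", "TOLOC"), ("toledo", "CITY")],
    [("to", "TOLOC"), ("chicago", "CITY")],
]

# Phase 2: MLE counts for Eqs. (10) and (11).
n_s, n_ss, n_ws = Counter(), Counter(), Counter()
for utt in tagged:
    prev = "<s>"
    for w, s in utt:
        n_ss[(s, prev)] += 1   # transition counts N(s_i, s_{i-1})
        n_ws[(w, s)] += 1      # emission counts N(w_i, s_i)
        n_s[s] += 1
        prev = s

def p_trans(s, prev):
    """Eq. (10): P(s_i | s_{i-1}), with the denominator summed over
    transition contexts so each row is normalized."""
    denom = sum(c for (_, p2), c in n_ss.items() if p2 == prev)
    return n_ss[(s, prev)] / denom if denom else 0.0

def p_emit(w, s):
    """Eq. (11): P(w_i | s_i) = N(w_i, s_i) / N(s_i)."""
    return n_ws[(w, s)] / n_s[s] if n_s[s] else 0.0
```

Here "chicago" accounts for 2 of the 3 CITY tokens, so P(chicago|CITY) = 2/3, and every TOLOC is followed by CITY, so P(CITY|TOLOC) = 1.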

EXPERIMENTAL RESULT To evaluate our ECLM, we built the CORRECTOR and RERANKER models under various experimental setups. In this section, we present empirical results for both ASR and SLU tasks.

Experimental Setup We evaluated our ECLM adaptation method using two data sets: the CU-Communicator travel corpus (airtravel) and a Korean tele-banking dialogue corpus (telebank). The airtravel set, created by the University of Colorado, is an over-the-telephone spoken dialogue corpus for developing dialogue systems that access live airline, hotel, and car rental information; it was collected over 461 days and consists of 2,211 dialogues, or 38,408 utterances in total. We removed most of the single-word responses, such as "Yes, please" or "No", which were not very useful for evaluating SLU performance; in general, such single words carry no domain-specific semantic concepts. After this cleanup, we selected 13,983 utterances and labeled the raw data with semantic categories. The semantic labels of airtravel were automatically generated by the Phoenix parser and manually corrected. In the airtravel set, the semantic frame has a two-level hierarchy, 31 first-level classes and 6 second-level classes, for a total of 96 class combinations across the 13,983 utterances. The telebank set is a Korean spoken dialogue corpus provided by Sogang University; it consists of 7 different sub-scenarios and about 22,000 utterances in total. We selected a sub-scenario related to loan information and removed meaningless utterances, leaving 2,239 utterances. Without initial automatic labeling, we manually tagged semantic frames on the raw text. In addition to word-level semantic labeling, sentence-level dialogue acts are labeled in the telebank set (see details in [7]). We defined 15 word-level semantic frame slots. For the two data sets, we used two ASR systems.
For both the airtravel and telebank sets, we trained the recognizers (both the acoustic model and the language model) with only the domain-specific speech corpus, and used no acoustic/speaker adaptation or noise filtering, because we wanted to evaluate our method fairly, without confounding it with other techniques. For airtravel, we used the open source speech recognizer Sphinx2 and the open source CMU-Communicator Feb-2000 acoustic model³, trained with a semi-continuous density model. The reported accuracy on the airtravel data is about 85%, but we obtained 73.85% accuracy in our experimental setup because we used only subset data (13,983 out of 38,408 utterances) without tuning, and the sentences of this subset are longer and more complex than the removed ones, most of which are single-word responses. For telebank, we built an HTK-based Korean speech recognizer (a morpheme-based recognition system) using 39-dimensional mel-frequency cepstral coefficient (MFCC) feature vectors. For each test sentence, we produced 100-best lists with these baseline speech recognizers. We divided each test set into 5 parts and evaluated the results using 5-fold cross validation in all our experiments. All language models were trained with modified Kneser-Ney discounting [4]. The CORRECTOR model generated 10 bests for each utterance in the lists, so about 1,000 best outputs were passed to the RERANKER.

³ Available at http://www.speech.cs.cmu.edu/Communicator

Figure 3: Result of the CORRECTOR model experiment on the airtravel set. WER achieved by adaptation using different channel model training sizes, comparing the baseline (trigram) with no smoothing, absolute, Good-Turing, and modified Kneser-Ney discounting.

Figure 4: Result of the CORRECTOR model experiment on the telebank set. WER achieved by adaptation using different channel model training sizes, with the same five configurations.

Experiment 1 - Effect of CORRECTOR Model First, we consider the CORRECTOR model. Figs. 3 and 4 show the effect of the CORRECTOR model (baseline ECLM without re-ranking). The baseline speech recognizer's word error rate (WER) is 26.15% on the airtravel set and 20.81% on telebank. Because our HTK-based Korean speech recognizer provides only bigram n-best lists, we re-scored the telebank data with a trigram language model, which reduced the baseline WER to 14.85%. Our CORRECTOR model improved performance as the training data increased. To reduce the overfitting problem, we used 3 different smoothing methods: absolute, Good-Turing, and modified Kneser-Ney smoothing. As Figs. 3 and 4 show, the discounting methods work well in general, and we conclude that the best training strategy is to employ the modified Kneser-Ney method.
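Since WER is the figure of merit throughout this section, here is a minimal sketch of how it is computed: a plain Levenshtein distance over words, divided by the reference length, applied to the Fig. 2 example.

```python
def wer(ref, hyp):
    """Word error rate: (substitutions + insertions + deletions) / len(ref),
    with all three edit operations costing 1."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[n][m] / n

# The split/merged error example from Fig. 2: 4 edits over 7 reference words.
ref = "take a train from chicago to toledo".split()
hyp = "ticket train from chicago to to leave".split()
```

Note that published evaluations use a standard scoring tool (e.g. NIST sclite) rather than an ad hoc implementation; this sketch only illustrates the metric.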
We performed the standard NIST Matched Pairs Sentence Segment Word Error (MAPSSWE) significance test [10] between the baseline trigram and our ECLM with the simple bigram to verify our results, and found that:

• the WER improvement of the airtravel set with ECLM adaptation (CORRECTOR with bigram model) over the baseline trigram model is significant (p < 0.001), and

• the improvement of the telebank set with ECLM adaptation (CORRECTOR with bigram model) over the baseline trigram model is also significant (p < 0.001).

Experiment 2 - Effect of RERANKER Model In the next experiment, we applied the re-ranking approach to our baseline ECLM adaptation. The outputs of experiment 1, the 1,000-best lists generated by the CORRECTOR, were re-evaluated by the RERANKER model of formulas (1) and (6). We ran the second-pass re-ranking algorithm over both the baseline ASR outputs and the CORRECTOR outputs, using the trigram, CLM, or our SCLM. To verify that the proposed model is helpful to both

Table 1: RERANKER results on the airtravel set. We used 3 different language models to re-rank the outputs of both the baseline ASR and the baseline ECLM adaptation. WER change relative to ASR (trigram) is shown in parentheses; Prec/Rec/F1 are SLU slot scores.

  Model            WER (%)         Prec    Rec     F1
  ASR (trigram)    26.15 (0.00)    74.51   66.08   70.04
  ECLM (trigram)   23.70 (-9.37)   75.42   67.81   71.41
  ASR (CLM)        26.72 (+2.18)   70.96   63.44   66.99
  ECLM (CLM)       23.72 (-9.29)   75.68   67.94   71.60
  ASR (SCLM)       26.08 (-0.27)   72.86   64.71   68.54
  ECLM (SCLM)      23.61 (-9.71)   75.69   68.06   71.67

Table 2: RERANKER results on the telebank set. Again, we used 3 different language models to re-rank the outputs of both the baseline ASR and the baseline ECLM adaptation.

  Model            WER (%)         Prec    Rec     F1
  ASR (trigram)    14.85 (0.00)    86.23   80.75   83.40
  ECLM (trigram)   12.95 (-12.79)  85.89   82.82   84.32
  ASR (CLM)        15.30 (+3.03)   86.11   81.08   83.52
  ECLM (CLM)       12.92 (-13.00)  85.95   82.87   84.38
  ASR (SCLM)       14.77 (-0.54)   86.47   81.26   83.78
  ECLM (SCLM)      12.85 (-13.47)  86.08   83.09   84.55

speech recognition and spoken language applications, we evaluated not only WER but also SLU performance. SLU is evaluated by the F1-measure (F1) on slot/value pairs, the harmonic mean of precision (Prec)⁴ and recall (Rec)⁵ for concept frame matching against the gold standard frames, defined as F1 = (2 · Prec · Rec)/(Prec + Rec). Tables 1 and 2 show the WER and SLU results of ECLM adaptation with various re-ranking language models (the RERANKER model). On both data sets, WER decreases, and semantic frame extraction performance increases, with the RERANKER model over the baseline ECLM adaptation (the CORRECTOR model alone). Thus, ECLM adaptation significantly improves both WER and SLU F1 (all ECLM results pass the MAPSSWE test with p < 0.001). The results also show the effect of the re-ranking language model in the RERANKER. In particular, the SCLM is better than both the trigram and the CLM, which shows that semantic information is useful to a spoken language system whether or not the CORRECTOR model is used. The best model is ECLM adaptation with SCLM: its results on the two test sets are 9.71% and 13.47% relative error reduction over the ASR trigram. Frame extraction performance on the gold standard transcripts (0% WER) is 89.76 and 88.67. This performance normally decreases roughly linearly with speech recognizer errors, but using ECLM we also improved frame extraction over the baseline F1 values of 70.04 and 83.40. Compared to airtravel, the telebank result is better because the speech recognition accuracy of telebank is higher than that of airtravel.
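The F1 definition above can be checked directly against the reported numbers:

```python
def f1(prec, rec):
    """Harmonic mean of precision and recall: F1 = 2*Prec*Rec / (Prec + Rec)."""
    return 2 * prec * rec / (prec + rec)

# Reproduces the F1 column for the ASR (trigram) rows of Tables 1 and 2.
airtravel_f1 = f1(74.51, 66.08)
telebank_f1 = f1(86.23, 80.75)
```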

CONCLUSION The main issues facing practical spoken language applications that provide an interface between human and machine are how to overcome the incompleteness of speech recognition and how to guarantee reasonable end-performance of the applications. Handling erroneously recognized outputs is therefore a key to developing robust spoken language systems. To address this problem, we proposed an ECLM adaptation approach that combines domain environment characteristics with high-level linguistic knowledge. First, our ECLM adaptation expands the sentence hypotheses from a speech recognizer using noisy-channel properties. Next, we re-rank the generated hypotheses with semantic information combined with high-level linguistic knowledge. Using ECLM adaptation, we demonstrated improved performance in both ASR and domain-specific SLU applications. The ECLM technique is better able to deal with erroneous ASR outputs and can be adapted with domain-specific text corpora and recognition results to improve the accuracy of speech recognition systems. The ECLM can be combined with general statistical language modeling methods, which successfully integrate higher-level linguistic knowledge. In our ECLM adaptation framework, the correction model is tightly coupled with the SLU component, whose semantic language model re-scores the corrected candidate sentences. The channel adaptation data can sometimes be insufficient. To address this data scarcity problem, we are considering active learning methods to reduce the effort of human transcription [5]. Active learning can be integrated with the acoustic or speaker adaptation process, because that process already uses paired transcripts and acoustic signals. Thus, active learning with speaker adaptation is a subject for future research on our ECLM adaptation. Moreover, signal-level features such as acoustic scores can help build a better channel model, so we are considering these types of features for modeling channel properties.

⁴ Prec = (# of selected relevant frames) / (total # of selected frames)
⁵ Rec = (# of selected relevant frames) / (total # of relevant frames in the gold standard)

ACKNOWLEDGEMENTS This research was supported by the Intelligent Robotics Development Program, one of the 21st Century Frontier R&D Programs funded by the Ministry of Commerce, Industry and Energy of Korea.

REFERENCES

[1] Bellegarda, J.R., 2004. Statistical language model adaptation: review and perspectives. Speech Communication, 42(1):93-108.

[2] Brown, P.F., Della Pietra, V.J., deSouza, P.V., Lai, J.C., and Mercer, R.L., 1992. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467-479.

[3] Brown, P.F., Della Pietra, V.J., Della Pietra, S.A., and Mercer, R.L., 1993. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19(2):263-311.

[4] Chen, S.F. and Goodman, J., 1998. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Center for Research in Computing Technology, Harvard University.

[5] Cohn, D., Atlas, L., and Ladner, R., 1994. Improving generalization with active learning. Machine Learning, 15(2):201-221.

[6] Collins, M. and Koo, T., 2005. Discriminative reranking for natural language parsing. Computational Linguistics, 31(1):25-69.

[7] Eun, J., Jeong, M., and Lee, G.G., 2005. A multiple classifier-based concept-spotting approach for robust spoken language understanding. In Proceedings of Interspeech 2005-Eurospeech, Lisbon.

[8] Jeong, M., Eun, J., Jung, S., and Lee, G.G., 2005. An error-corrective language-model adaptation for automatic speech recognition. In Proceedings of Interspeech 2005-Eurospeech, Lisbon.

[9] Malouf, R., 2002. A comparison of algorithms for maximum entropy parameter estimation. In Proceedings of the Sixth Conference on Natural Language Learning (CoNLL).

[10] Pallett, D., Fisher, W., and Fiscus, J., 1990. Tools for the analysis of benchmark speech recognition tests. In Proceedings of ICASSP, vol. 1, 97-100, Albuquerque, NM.

[11] Ringger, E.K. and Allen, J.F., 1996. Error correction via a post-processor for continuous speech recognition. In Proceedings of ICASSP.