Optimizing Features for Dialogue Act Classification
James D. O'Shea, Member, IEEE, Zuhair A. Bandar, and Keeley A. Crockett, Senior Member, IEEE
School of Computing, Mathematics and Digital Technology, Manchester Metropolitan University, Manchester, United Kingdom
[email protected]

Abstract— Natural language dialogue is an important component of interaction between ordinary users and complex computer applications. Short Text Semantic Similarity algorithms have been developed to improve the efficiency of producing sophisticated dialogue systems. Such algorithms are currently unable to discriminate between different dialogue acts (assertions, questions, instructions etc.), requiring the addition of efficient dialogue act classifiers to enhance them. The Slim Function Word Classifier (SFWC) has proved promising, particularly in its computational simplicity. This study optimizes the SFWC by clustering function word features using grammatical principles. Experiments show a significant improvement in classification accuracy for a selection of sentence forms which were challenging for the unoptimized SFWC. Results are expected to be applicable to many intelligent text processing applications ranging from question answering to meeting summarization.

Keywords— Natural language interfaces, Interactive Systems, Decision trees, Pattern recognition, Dialogue Systems, Function words
I. INTRODUCTION
The motivation for this work is the creation of a method of classifying Dialogue Acts (DAs) for use in intelligent text processing systems. The DA is a fundamental component of human communication [1] [2]. DA theory separates the point or purpose of an utterance (the DA) from the propositional content (what the utterance is about). Thus the propositional content “door is shut” gives rise to the DAs “The door is shut.” (assertion), “Is the door shut?” (question) and “Shut the door!” (instruction). Apart from general fields such as information retrieval [3] and question answering [4], specific applications such as tutoring systems [5], summarization of multi-party meetings [6] [7], analysis of online chat and social messaging [8] and deception detection [9] can also benefit from DA classification. Our main motivation is the application of DA classification to improve the performance of Goal-Oriented Conversational Agents (GO-CAs). GO-CAs are similar to Spoken Dialogue Systems (SDSs) but originate from agent-based systems as opposed to automatic speech recognition. Consequently, they use different tools and strategies.
The latest GO-CAs use short text semantic similarity algorithms to provide a numerical measure of the similarity between the user utterance and a set of targets [10]. However, these algorithms do not currently take specific account of the DA, so DA classifiers could improve their performance [11]. Initial work on the Slim Function Word Classifier (SFWC) for DAs showed promising Classification Accuracy (CA) with a single Decision Tree (DT) classifier [12] [13]. Function words are a closed class of structurally important words, e.g. articles (the) and prepositions (on). There is a high degree of overlap between function words and stop words, which have a high frequency of occurrence and are usually removed in Natural Language Processing (NLP). Later work suggested that improvements in CA could be obtained by using a small set of different DT classifiers for particular variants of DAs [14]. However, further improvements in CA are needed, and the fundamental research question in this paper is whether this can be achieved through the optimization of feature encoding.

This work investigates two hypotheses: that clustering the function words by their functional/grammatical properties will (1) increase the CA and (2) reduce the size of the DT SFWC classifiers. The corresponding null hypotheses are that there is no significant difference compared with the alphabetically sorted function words. Compactness of the DTs is important because smaller trees are likely to generalize better from the training set to the whole domain, and because they are more computationally efficient, resulting in better scalability when deployed over the web.

The rest of this paper is organized as follows: Section II reviews prior work on DA classification, Section III describes the SFWC approach and Section IV describes optimization of the SFWC. Section V presents the results of a series of experiments to evaluate the optimization technique, using question vs. nonquestion datasets, and Section VI contains conclusions and directions for future work.
II. PRIOR WORK ON DA CLASSIFICATION
Three aspects characterize the different approaches to DA classification: the taxonomy used, the feature extraction method and the classifier itself.
A. DA Taxonomies
The DA taxonomy determines how many classes of DA will need to be recognized by the complete classification system. No definitive DA taxonomy has emerged from prior research. Differences between taxonomies arise from issues such as the degree of granularity used in decomposing the DAs and whether or not the text originates from Automatic Speech Recognition (ASR). Searle [2] used 5 coarse-grained categories: Assertive (e.g. statements), Directive (e.g. orders), Commissive (e.g. promises), Expressive (e.g. apologies) and Declarations (e.g. declaration of war). This coarse-grained approach can be useful in practical dialogue systems [15], particularly in fields such as robotics [16]. In contrast, Dialogue Act Markup in Several Layers (DAMSL) is a model which supports very fine-grained categories [17]. It has clearly specified upper layers and flexible lower layers intended to adapt effectively to diverse domains. The highest level contains 4 categories of tags representing the independent dimensions: Communicative-Status, Information-Level, Forward-Looking Function and Backward-Looking Function. The documentation identifies 30 DA tags at the lowest level of detail; however, some of these allow further decomposition: for example, the Information-Level, Other-level category is described as capable of containing jokes, non-sequiturs and smalltalk. DAMSL reflects the strong orientation of most DA classification systems to the problems of ASR. For example, Communicative-Status is only applied to uninterpretable utterances and includes the tags Uninterpretable, Abandoned and Self-talk.
B. DA Feature Extraction
The sources of features for DA classification are prosodic information, contextual information and surface information in the sentence text. Prosodic features consist of patterns of rhythm, stress and intonation in speech and are well established in ASR [18]. Contextual information uses the immediate dialogue environment; the most useful contextual information comes from the DAs of earlier utterances [19]. Surface information is the most important source of features because it is always available from the text representation of an utterance, regardless of the communication channel used. The most common surface feature is the n-gram, a string of words extracted from a corpus using statistical methods [20], e.g. Can You [21]. Typically, n-grams are up to 4 words long. Variations include gappy n-grams (which allow other words to occur in the sequence [22]), context features (which indicate the presence or absence of specific words, e.g. contains-word-RENTAL-or-CAR [23]) and cue phrases (e.g. in any case [24]). Various other features have been used in a somewhat ad-hoc fashion, including the mood of the utterance, subject type, first verb type, etc. [25], the number of occurrences of the word OR in the presence of a question mark [19], and orthographic (e.g. comma) and tokenized lexical features [26].
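To make these surface features concrete, the following minimal sketch (in Python; the utterance and target words are invented examples, not taken from any of the cited systems) extracts word n-grams and a contains-word context feature:

```python
# Minimal sketch (illustrative only): extracting word n-grams and a
# "contains-word" context feature from an utterance.

def ngrams(tokens, n):
    """Return all contiguous n-word sequences in the token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def contains_word_feature(tokens, targets):
    """Binary context feature: does the utterance mention any target word?"""
    return any(t.lower() in targets for t in tokens)

utterance = "can you book a rental car".split()
print(ngrams(utterance, 2))                                  # bigrams, e.g. ('can', 'you')
print(contains_word_feature(utterance, {"rental", "car"}))   # True
```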
Some approaches, such as Feature Latent Semantic Analysis [27], explicitly remove stop words before making the classification. This removes almost all of the function words. Most of the prior work on feature extraction has problems of accuracy, difficulty of automation and computational complexity where NLP processes are used.

C. DA Classifiers
A wide range of classifiers has been used in studies to date, with Bayesian techniques proving the most popular [21]. Others include statistical models [28], Kohonen Networks [29], Backpropagation Artificial Neural Networks [30], C4.5 Decision Trees [19], Production Rules [23], Simple Heuristics [31], Hidden Markov Models [32], Support Vector Machines [18], Learning Vector Quantization [33] and Self-Organizing Maps [33]. Multi-classifier combinations have also been investigated, with a pairing of k-Nearest Neighbor and Bayesian classifiers proving most effective (81.25% CA on English and 78.93% on German) [30]. It is difficult to draw any conclusions about the relative performances of particular classifiers from reported CAs. They have been evaluated with different datasets tagged with different numbers and types of DA, ranging from 2 [21] to 232 different tags [34]. The sizes of the datasets also vary, from 81 [21] to 223,000 utterances in the Switchboard set [35]. The human population sample is also an issue: one dataset of 2,794 DAs was drawn from only two humans [36] and may not generalize beyond these two people. What can be considered good performance? One example was a CA of 89.27% in classifying the large ICSI Meeting Corpus into 5 DA categories [19]. Results can be variable, however; another study [37] achieved 92.9% on suggestions but scored 0 on queries.
III. THE SLIM FUNCTION WORD CLASSIFIER (SFWC) FOR DIALOGUE ACTS
The Slim Function Word Classifier (SFWC) was devised to overcome the problems of scalability and feature extraction. It is founded on the hypothesis that the function words in an utterance alone provide sufficient information to classify the DA. It has been evaluated in classifying questions vs. nonquestions [12] and instructions vs. noninstructions [13].

A. SFWC Feature Extraction
Feature extraction begins with a pre-processing stage to expand contractions such as it's to it is. The utterance is tokenized using a function word lookup table (an extract is shown in Table I). Originally this made no assumptions about structure within the set of function words and simply allocated the tokens in ascending alphabetical order of the words, so related words could be dispersed throughout the table (e.g. the pronouns I, we and you were allocated the tokens 99, 227 and 260 respectively). Utterances are converted into a fixed-length vector of tokens, set at 25 words (by taking the previously used upper bound of 20 [38] and adding an extra margin of 5 words to allow for preambles or cue phrases that may accompany questions, instructions etc.). Each slot in the vector is an attribute that may be used by a classifier. All of the content words (nouns, verbs, adjectives or adverbs) are replaced by the same wildcard token, 0. Each function word is replaced by its numeric token (range 1 - 264) from the table.

TABLE I. A SAMPLE OF FUNCTION WORDS AND TOKENS FROM THE LOOKUP TABLE

Function word    Token
above            3
...              ...
myself           131
namely           132
...              ...
yours            262
yourself         263
yourselves       264
For utterances of fewer than 25 words, the otherwise empty positions in the vector are filled with the token 300, indicating "no word in this position". Filling the empty slots with a token prevents the slots from being treated as missing attributes by a classifier and also allows the classifier to exploit the length of the utterance as an attribute if this assists in the classification. An example of a tokenized question is given in Table II. In the tokenized form, the function word "does" is replaced by token 56 from the function word table. The content words "wearing caps" are not found in the table so each is replaced by 0, the function word "or" is replaced by token 156 from the table, etc. When the final word "loss" has been replaced by 0, there are fewer than 25 tokens so the remaining positions are filled with the token 300.

TABLE II. A QUESTION AND ITS TOKENIZED FORM

Question: does wearing caps or hats contribute to hair loss
Tokens:   56, 0, 0, 156, 0, 0, 212, 0, 0, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300

B. SFWC Decision Tree Classification
Prior work on the SFWC [12] found C4.5 decision trees to be superior to Bayesian and Multi-Layer Perceptron classifiers. They may also provide greater insight into the problem domain (due to the transparency of their rules), and highly computationally efficient implementations are possible. The SFWC treated each token position in the sentence as an attribute, and splits were made based on the numerical value of the token. Attribute values were on a nominal scale in [12] and [13]. The function words were sorted into alphabetical order, making no use of grammatical sub-classes. Suppose that, in a particular word position, personal pronouns are significant values of the attribute. These are widely dispersed in the alpha-sorted approach. Separating them from the other function words would require a number of splits in the decision tree. Each split loses information, so the construction process may exhaust the training set without learning useful properties of the domain. Quinlan [39] addressed such problems by grouping attribute values. Similarly, this work clusters function words so that functional classes and subclasses are in contiguous bands in the tokenizing table (I = 1, we = 2, you = 3 etc.). This supports splitting that keeps the words in a subclass together and separates them from other subclasses, which should preserve information, leading to better trees.

C. SFWC Evaluation and Benchmark Datasets
The current versions of the SFWC are intended for use in text-based systems such as GO-CAs. Datasets intended for SDSs are not suitable for this work because they contain terse and irrelevant DAs such as comm, which controls the audio channel. The use of small population samples also risks the data being unrepresentative. Consequently this study uses specifically designed datasets [11], available from http://www.semanticsimilarity.net.
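As an illustration, the following is a minimal Python sketch of the encoding described in section A, assuming a toy three-entry extract of the lookup table (the token values shown are those used in Tables I and II; the full table covers 264 function words, and the contraction-expansion pre-processing step is omitted):

```python
# Sketch of SFWC feature extraction, assuming a toy extract of the
# function word lookup table (the real table assigns tokens 1-264).
FUNCTION_WORDS = {"does": 56, "or": 156, "to": 212}  # extract from Tables I/II

VECTOR_LEN = 25   # fixed-length feature vector
CONTENT = 0       # wildcard token for content words
EMPTY = 300       # "no word in this position"

def encode(utterance: str) -> list[int]:
    """Map each word to its function word token, or to the wildcard 0."""
    tokens = [FUNCTION_WORDS.get(w, CONTENT) for w in utterance.lower().split()]
    tokens = tokens[:VECTOR_LEN]                          # truncate long utterances
    return tokens + [EMPTY] * (VECTOR_LEN - len(tokens))  # pad short ones

print(encode("does wearing caps or hats contribute to hair loss"))
# -> [56, 0, 0, 156, 0, 0, 212, 0, 0, 300, 300, ..., 300]  (as in Table II)
```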
IV. OPTIMIZATION STRATEGY
This strategy requires the function words to be decomposed into categories and sub-categories. It uses the grammatical categories described in comprehensive analyses of the English language informed by the analysis of large corpora [40] [41] [42].

A. Decomposition of the Function Words
At the top level, function words are decomposed into Pronouns, Determiners, Link words, Abbreviations, Prepositions, Auxiliary verbs, Positives/Negatives, Interrogative introducers and Others (which do not fit naturally into any of the previously defined categories). The taxonomy is tree-structured, decomposing into leaf categories which contain the function words (see Table III). Many link words can also be used as simple adverbs or prepositions, but they are included by virtue of their function word properties. Also, the interrogative introducers that start with "wh" are often called wh-words. The full decomposition can be found in [11]. Where a word could fit into more than one category, it has been allocated to the most frequent sense or usage of the word, using the Cobuild dictionary [42].

TABLE III. FUNCTION WORD CATEGORIES

Category                      Examples
Pronouns
  Personal                    I, you
  Reflexive                   Myself, yourself
  Indefinite                  Something, anybody
  Possessive                  Mine, yours
Determiners
  Personal                    My, your
  Numerical                   First, second
  Demonstrative               This, those
  Article                     A, an, the
  Other Determiners           All, either
Positives/Negatives           Yes, no, not
Abbreviations                 e.g., i.e., etc.
Link words
  Co-ordinators               And, but
  Additives                   Again, also
  Resultatives                Hence, so
  Contrastives                Else, instead
  Time Indicators             After, meanwhile
  Concessives                 Anyhow, anyway
  Miscellaneous Links         Else, as
Auxiliary verbs
  Standard                    Be, have, is
  Modal                       Can, may, ought
Interrogative Introducers     Why, what, how
Prepositions                  by, at, for
Others                        Already, indeed

B. Application of the Taxonomy to Feature Encoding
Each function word is still represented by a unique code, but the words in a cluster are contiguous so that a single split in the tree can partition two clusters. Because some function words could participate in more than one cluster, these were placed at the boundaries of the clusters and the clusters were ordered (as far as possible) to allow the DT training algorithm to split inside the clusters where this would produce more accurate partitions. This was performed using a priori grammatical information [41].
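A small sketch illustrates the intended effect: when a grammatical class occupies a contiguous token band, a single numeric threshold test in the tree isolates the whole class. The pronoun tokens below follow the example in Section III (I = 1, we = 2, you = 3); the remaining values and band boundaries are invented for illustration:

```python
# Illustration of clustered encoding (band boundaries partly invented).
# Personal pronouns occupy one contiguous band, so a single numeric
# split such as "token <= 7" separates them from all other tokens;
# the alphabetical coding would need several splits to do the same.
CLUSTERED = {
    # personal pronouns: band 1-7
    "i": 1, "we": 2, "you": 3, "he": 4, "she": 5, "it": 6, "they": 7,
    # reflexive pronouns: next band (truncated)
    "myself": 8, "yourself": 9, "ourselves": 10,
}

def is_personal_pronoun(token: int) -> bool:
    return 1 <= token <= 7   # one decision tree split

print(is_personal_pronoun(CLUSTERED["we"]))      # True
print(is_personal_pronoun(CLUSTERED["myself"]))  # False
```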
V. EXPERIMENTAL PROCEDURE
A. Experimental Datasets
The datasets were based on two sets of sentences which are either straightforward questions or nonquestions (see Table IV). Various real-world question categorization schemes were reviewed to synthesize a set of generic categories [14]. For example, a straightforward question starts with an interrogative introducer or an auxiliary verb (e.g. who, can). A straightforward nonquestion starts with something else, such as a pronoun (e.g. we). A difficult nonquestion begins with a word that normally starts a question, but uses it in a different way, such as a link word (e.g. when). Seven datasets derived from these, the simplest and the 6 most challenging from [14], were chosen for this study:

1. Straightforward Questions vs. Straightforward Nonquestions, both without preambles
2. Straightforward Questions with 1-word preambles vs. Straightforward Nonquestions
3. Straightforward Questions with a mix of 1-3 word preambles vs. Straightforward Nonquestions
4. Straightforward Questions vs. Difficult Nonquestions, both without preambles
5. Straightforward Questions vs. Difficult Nonquestions, both with 2-word preambles
6. Straightforward Questions vs. an approximately 50/50 mix of Straightforward and Difficult Nonquestions, all without preambles
7. Simulated clauses

TABLE IV. EXAMPLE DIALOGUE ACT FORMS

DA Form                                           Example
Straightforward Question                          What are fuel cells?
Straightforward Nonquestion                       We are all about the city.
Difficult Nonquestion                             When psoriasis develops on the scalp, hair loss sometimes follows.
Straightforward Question with 1-word preamble     So, what are fuel cells?
Straightforward Nonquestion with 2-word preamble  In fact we are all about the city.
Difficult Nonquestion with 3-word preamble        Well I mean, when psoriasis develops on the scalp, hair loss sometimes follows.
Simulated clauses                                 How we work this out then will be down to the lawyers, am I considered covered by an employer sponsored retirement plan for the year if I do not participate in the plan?

Distinctive elements of a form are shown in bold.
The fundamental requirement for collecting data was representation of the natural language used in dialogue. Prior experience [11] suggested that using human participants to generate new data would have a low productivity of usable data. Consequently two types of web source were used. Questions were obtained from varied Usenet newsgroup Frequently Asked Question (FAQ) lists, supplemented by other suitable FAQ lists. Topics included HiFi/Ambisonics, the British Broadcasting Corporation and Pensions/Investments/Finance, amongst others. Nonquestions were obtained from blogs on corresponding topics (including hifiblog.com, bbc.co.uk/blogs and various pensions and investments blogs) to avoid introducing confounding factors. The outcome was a pool of 1,660 straightforward questions and 2,288 straightforward nonquestions. The tokenizing process can produce identical vectors from different sentences; the pool enabled the generation of 600 unique question and 600 unique nonquestion vectors for each variant. These quantities allow 60-fold cross-validation to be used, ensuring that over 1,000 training cases are available in each fold, consistent with a rule-of-thumb from Quinlan [39]. Each fold was run 10 times, so each column in the tables is the outcome of 600 DT trials. The simulated clause is somewhat artificial as it is constructed by concatenation (the example in Table IV is a Difficult Nonquestion followed by a Straightforward Question). It is possible that this approach misses features that would aid classification in the real world.

B. Experiments
Each of the 7 variants was investigated by conducting 4 C4.5 decision tree construction experiments (2 for each pruning method) following the approach reported in [14]. These experiments determined the highest classification accuracies obtainable and the smallest trees that could be obtained (before a significant drop in CA), using either confidence level pruning or minimum number of objects pruning. Tables V and VI provide an example of confidence level pruning applied to variant 4 (Straightforward Questions vs. Difficult Nonquestions, with no preambles attached to either class). In this case, the optimum CA was 91.72% and the tree size was reduced substantially from the baseline range to a range of 9 to 15 nodes without a statistically significant drop in CA (using the corrected re-sampled t-test).
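A sketch of this evaluation protocol is shown below, under stated assumptions: scikit-learn's CART implementation stands in for C4.5, and randomly generated vectors stand in for the real 600 question and 600 nonquestion vectors:

```python
# Sketch of the 60-fold, 10-repeat protocol (600 trials per setting),
# using scikit-learn's CART as a stand-in for C4.5 and random data
# in place of the real 1,200 token vectors.
import numpy as np
from sklearn.model_selection import RepeatedKFold
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 301, size=(1200, 25))   # 600 question + 600 nonquestion vectors
y = np.repeat([0, 1], 600)

accuracies = []
for train_idx, test_idx in RepeatedKFold(n_splits=60, n_repeats=10,
                                         random_state=0).split(X):
    clf = DecisionTreeClassifier().fit(X[train_idx], y[train_idx])
    accuracies.append(clf.score(X[test_idx], y[test_idx]))

print(len(accuracies), np.mean(accuracies))  # 600 trials, mean CA
```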
TABLE V. CLUSTERED FUNCTION WORDS, OPTIMIZED CA

Conf(b)       0.25     0.2      0.15     0.1      0.05
CA(c)         91.58    91.24    91.10    91.49    91.72
Tree Size(d)  61-105   57-99    49-99    35-77    25-69

TABLE VI. CLUSTERED FUNCTION WORDS, OPTIMIZED TREE SIZE

Conf          7.0E-8   6.0E-8   5.0E-8   4.0E-8   3.0E-8
CA            90.30    90.30    90.30    90.30    90.30
Tree Size     9-17     9-17     9-17     9-15     9-15

b. Conf = Confidence (the parameter controlling confidence interval pruning); c. CA = Classification Accuracy (average over the 600 trees); d. Tree Size = number of nodes in the constructed trees (minimum - maximum size over the 600 trees)
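Confidence-level pruning is specific to Quinlan's C4.5. As a rough analogue, the sketch below sweeps scikit-learn's cost-complexity pruning parameter (a substitute technique, not the method used in these experiments) to show how CA and tree size trade off as pruning is tightened:

```python
# Sketch of a pruning sweep analogous to Tables V and VI, using
# scikit-learn's cost-complexity pruning (ccp_alpha) in place of
# C4.5 confidence-level pruning; random data as in the previous sketch.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 301, size=(1200, 25))
y = np.repeat([0, 1], 600)

for alpha in [0.0, 0.001, 0.005, 0.01]:       # larger alpha -> smaller tree
    clf = DecisionTreeClassifier(ccp_alpha=alpha)
    ca = cross_val_score(clf, X, y, cv=10).mean()
    size = clf.fit(X, y).tree_.node_count
    print(f"alpha={alpha}: CA={ca:.4f}, nodes={size}")
```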
C. Results
Table VII summarizes the optimization across the 7 variants. It shows that in all variants except variant 1, there has been a significant increase in CA. Although the improvement in variant 7 is statistically significant, a maximum CA of 78.84% is not practical for real-world use. Except for variants 2 and 7, there has been a substantial decrease in the tree sizes. Thus the findings generally support the hypotheses that clustering provides better domain modeling, producing DTs which are both more accurate and more efficient. The lack of improvement in the CA of variant 1 is not surprising, as there was the least scope for improvement; however, with an unoptimized CA of 98.51%, even a small improvement at these levels can be useful. The increase in the range of tree sizes for variants 2 and 7 is surprising. It could be explained by the clustered form preserving more information at DT splits, allowing C4.5 to explore the structure of the features in more detail, at the expense of building a bigger tree. Supporting evidence for this conjecture was obtained by a further experiment on variant 2 which used confidence level pruning set to 0.005. This forced the tree sizes into a range of 25-63 nodes and resulted in a CA of 88.33%, which is consistent with the alphabetically sorted organization.

TABLE VII. CLASSIFICATION ACCURACIES FOR CLUSTERED COMPARED WITH ALPHABETICALLY SORTED FUNCTION WORDS

                                                      Alpha Sorted           Clustered
Variation                                             CA      Tree Size      CA       Tree Size
1. Straightforward Q-NQ no preambles                  98.51   29-67          98.67    7-15
2. Straightforward Q-NQ 1-word preamble               88.11   31-91          91.03*   73-133
3. Straightforward Q-NQ mixed preambles               79.22   113-189        90.88*   71-129
4. Straightforward Q difficult NQ no preambles        89.55   49-123         91.72*   25-69
5. Straightforward Q difficult NQ 2-word preambles    89.03   53-107         91.63*   19-27
6. Straightforward Q mixed S/D NQ no preambles        89.37   59-101         93.14*   27-47
7. Simulated clauses                                  66.62   3-11           78.84*   27-63

* indicates that the improvement over the alphabetically sorted approach is statistically significant (α = 0.05)
VI. CONCLUSIONS AND FUTURE WORK
The most effective classifier produced achieved a CA of 98.67%, and only variant 7 had a CA of less than 90%. The classifiers described in these experiments could already make a substantial improvement to the performance of GO-CAs, which can use dialogue to disambiguate misunderstandings that could arise from occasional incorrect classifications. However, exploring the ambiguity of grammatical class membership could lead to further improvements. The current approach uses a priori knowledge of the classes to locate words that could be in more than one class on the class boundaries, and to locate classes containing ambiguous words contiguously, giving C4.5 a little discretion in placing the decision boundaries. However, because a split can take place within a cluster, it is also possible for a suboptimal split to occur inside a cluster if the training set does not contain cases that define the boundary clearly. Forcing all words in the same class to have the same numeric token could result in better learning from the training sets. Alternatively, a genetic algorithm could be used to discover a better set of categories into which to group the function words, with the grammatical implications being explored later. A fuzzy approach could allow a function word to have varying degrees of set membership of several grammatical categories. Finally, the compound and complex sentences simulated by variant 7 may require the moving window technique [43], in which portions of the text are examined sequentially, in order to be classified effectively.
REFERENCES
[1] J. L. Austin, How to Do Things with Words: The William James Lectures Delivered at Harvard University in 1955, 2nd ed. Harvard University Press, 1975.
[2] J. R. Searle, Mind, Language and Society. Weidenfield & Nicholson, 1999.
[3] E. Hagen, User Modeling and User-Adapted Interaction, vol. 9, pp. 167-213, 1999.
[4] D. Toney, S. Rosset, A. Max, O. Galibert, and E. Bilinski, "An evaluation of spoken and textual interaction in the RITEL interactive question answering system," in The Sixth International Language Resources and Evaluation Conference (LREC'08), Marrakech, Morocco, 2008.
[5] D. Litman and K. Forbes-Riley, "Correlations between dialogue acts and learning in spoken tutoring dialogues," Natural Language Engineering, vol. 12, pp. 161-176, May 2006.
[6] A. Dielmann and S. Renals, "Recognition of dialogue acts in multiparty meetings using a switching DBN," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, pp. 1303-1314, September 2008.
[7] F. Yang, G. Tur, and E. Shriberg, "Exploiting dialogue act tagging and prosodic information for action item identification," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2008), Las Vegas, NV, 2008, pp. 4941-4944.
[8] E. N. Forsyth, "Lexical and discourse analysis of online chat dialog," in International Conference on Semantic Computing, Irvine, CA, 2007.
[9] D. P. Twitchell, J. F. Nunamaker Jr., and J. K. Burgoon, "Using speech act profiling for deception detection," Intelligence and Security Informatics, Lecture Notes in Computer Science, vol. 3073, pp. 403-410, 2004.
[10] K. O'Shea, Z. Bandar, and K. Crockett, "Towards a new generation of conversational agents based on sentence similarity," Lecture Notes in Electrical Engineering, vol. 39, pp. 505-514, 2009.
[11] J. O'Shea, "A Framework for Applying Short Text Semantic Similarity in Goal-Oriented Conversational Agents," PhD thesis, Manchester Metropolitan University, Manchester, 2010, p. 413.
[12] J. O'Shea, Z. Bandar, and K. Crockett, "A machine learning approach to speech act classification using function words," Lecture Notes in Artificial Intelligence, vol. 6071, pp. 82-91, 2010.
[13] J. O'Shea, Z. Bandar, and K. Crockett, "Using a slim function word classifier to recognise instruction dialogue acts," Lecture Notes in Artificial Intelligence, vol. 6682, pp. 26-34, 2011.
[14] J. O'Shea, Z. Bandar, and K. Crockett, "A multi-classifier approach to dialogue act classification using function words," Transactions on Computational Collective Intelligence, vol. VII, pp. 119-143, 2012.
[15] K. Crockett, Z. Bandar, J. O'Shea, and D. McLean, "Bullying and debt: Developing novel applications of dialogue systems," in The 6th IJCAI Workshop on Knowledge and Reasoning in Practical Dialogue Systems, Pasadena, CA, 2009, pp. 1-9.
[16] T. Längle, T. C. Lüth, E. Stopp, G. Herzog, and K. G., "KANTRA — A natural language interface for intelligent robots," in Intelligent Autonomous Systems (IAS 4), Amsterdam, 1995, pp. 357-364.
[17] J. Allen and M. Core, "Draft of DAMSL: Dialog Act Markup in Several Layers," University of Rochester, Rochester, NY, 1997.
[18] R. Fernandez and R. W. Picard, "Dialog act classification from prosodic features using support vector machines," in Speech Prosody, 2002.
[19] D. Verbree, R. Rienks, and D. Heylen, "Dialogue-act tagging using smart feature selection; results on multiple corpora," in IEEE Spoken Language Technology Workshop, 2006, pp. 70-73.
[20] M. Louwerse and S. Crossley, "Dialog act classification using n-gram algorithms," in The Nineteenth International Florida Artificial Intelligence Research Society Conference (FLAIRS 2006), 2006, pp. 758-763.
[21] S. Keizer, R. op den Akker, and A. Nijholt, "Dialogue act recognition with Bayesian networks for Dutch dialogues," in Third SIGdial Workshop on Discourse and Dialogue, Philadelphia, PA, 2002, pp. 88-94.
[22] A.-M. Popescu, O. Etzioni, and H. Kautz, "Towards a theory of natural language interfaces to databases," in IUI'03 International Conference on Intelligent User Interfaces, Miami, FL, 2003.
[23] R. Prasad and W. Walker, "Training a dialogue act tagger for human-human and human-computer travel dialogues," in The 3rd SIGdial Workshop on Discourse and Dialogue, Philadelphia, PA, 2002, pp. 162-173.
[24] J. Hirschberg and D. Litman, "Empirical studies on the disambiguation of cue phrases," Computational Linguistics, vol. 19, pp. 501-530, 1993.
[25] T. Andernach, "A machine learning approach to the classification of dialogue utterances," 1996.
[26] D. J. Litman, "Cue phrase classification using machine learning," Journal of Artificial Intelligence Research (JAIR), vol. 5, pp. 53-94, 1996.
[27] B. Di Eugenio, Z. Xie, and R. Serafin, "Dialogue act classification, higher order dialogue structure, and instance-based learning," Dialogue and Discourse, vol. 1, pp. 1-24, 2012.
[28] S. Lesch, T. Kleinbauer, and J. Alexandersson, "A new metric for the evaluation of dialog act classification," in Dialor'05, the Ninth Workshop on the Semantics and Pragmatics of Dialogue (SEMDIAL 2005), Nancy, France, 2005.
[29] T. Andernach, M. Poel, and E. Salomons, "Finding classes of dialogue utterances with Kohonen networks," in ECML/MLnet Workshop on Empirical Learning of Natural Language Processing Tasks, Prague, Czech Republic, 1997, pp. 85-94.
[30] L. Levin, C. Langley, A. Lavie, D. Gates, D. Wallace, and K. Peterson, "Domain specific speech acts for spoken language translation," in 4th SIGdial Workshop on Discourse and Dialogue, Sapporo, Japan, 2003.
[31] N. Webb, M. Hepple, and Y. Wilks, "Error analysis of dialogue act classification," in Proceedings of the 8th International Conference on Text, Speech and Dialogue, Karlovy Vary, Czech Republic, 2005.
[32] A. Stolcke, K. Ries, N. Coccaro, E. Shriberg, R. Bates, D. Jurafsky, P. Taylor, R. Martin, C. Van Ess-Dykema, and M. Meteer, "Dialogue act modeling for automatic tagging and recognition of conversational speech," Computational Linguistics, vol. 26, pp. 339-373, 2000.
[33] K. Jokinen, T. Hurtig, K. Hynna, K. Kanto, M. Kaipainen, and A. Kerminen, "Self-organizing dialogue management," in The 2nd Workshop on Natural Language Processing and Neural Networks (NLPRS2001), Tokyo, Japan, 2001, pp. 77-84.
[34] R. Serafin, B. Di Eugenio, and M. Glass, "Latent semantic analysis for dialogue act classification," in The 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Edmonton, Canada, 2003.
[35] N. Webb, M. Hepple, and Y. Wilks, "Dialogue act classification based on intra-utterance features," in AAAI 2005, Pittsburgh, PA, 2005.
[36] A. Venkataraman, A. Stolcke, and E. Shriberg, "Automatic dialog act labeling with minimal supervision," in 9th Australian International Conference on Speech Science and Technology, 2002.
[37] S. Wermter and M. Lochel, "Learning dialog act processing," in COLING 1996, 16th International Conference on Computational Linguistics, 1996, pp. 740-745.
[38] J. D. O'Shea, Z. Bandar, K. Crockett, and D. McLean, "A comparative study of two short text semantic similarity measures," Lecture Notes in Artificial Intelligence, vol. 4953, pp. 172-181, 2008.
[39] J. R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann, 1993.
[40] R. Carter and M. McCarthy, Cambridge Grammar of English. Cambridge University Press, 2006.
[41] R. Quirk, S. Greenbaum, G. Leech, and J. Svartvik, A Comprehensive Grammar of the English Language. Harlow, UK: Addison Wesley Longman Ltd., 1985.
[42] J. Sinclair, Collins Cobuild English Dictionary for Advanced Learners, 3rd ed. HarperCollins, 2001.
[43] M. C. Fairhurst and M. S. Hoque, "Moving window classifier: Approach to off-line image recognition," Electronics Letters, vol. 36, pp. 628-630, March 2000.