Intention Extraction From Text Messages

Insu Song and Joachim Diederich
School of Business & Information Technology
James Cook University Australia
{insu.song,joachim.diederich}@jcu.edu.sg
Abstract. Identifying the intentions of users plays a crucial role in providing better user services, such as web search and automated message handling. There is a significant literature on extracting speakers' intentions and speech acts from spoken words, and this paper proposes a novel approach to extracting intentions from non-spoken words, such as web-search query texts and text messages. Unlike spoken words, such as in a telephone conversation, text messages often contain longer and more descriptive sentences than conversational speech. In addition, text messages contain a mix of conversational speech and non-conversational content such as documents. The experiments describe a first attempt at extracting writers' intentions from Usenet text messages. Messages are segmented into sentences, and each sentence is converted into a tuple (performative, proposition) using a dialogue act classifier. The writers' intentions are then formulated from the tuples using constraints on felicitous human communication.
1 Introduction

1.1 Motivation
In recent years, machine learning techniques have shown significant potential for automated message processing, in particular dialog act classification for spoken words [5, 9, 13], but the use of such classifiers on non-spoken words has not been investigated. Almost every user of the Internet uses one or more text messaging services, including email, newsgroups, and instant messaging. In 1999, an estimated 3 billion email messages were sent every day in the USA [6]. The study by Kraut et al. [7] showed that interpersonal communication is a stronger driver of Internet use than information and entertainment applications. That is, text messaging services are virtually everywhere and constantly demand our attention as messages arrive. This ubiquity and the accelerating growth in the number of text messages make it important that we develop automated text message processing, and text-message-based query processing in general. However, conversational speech (e.g., telephone conversations) differs from electronic text messages (e.g., email, newsgroup articles). The main difference is that in text messages the speaker turns (i.e., who is talking) are explicitly specified by a non-text field (i.e., the sender) of the message, so there is no need to identify speaker turns. Another difference is that text messages contain longer and more descriptive sentences than conversational speech. That is, a text message is a mix of conversational speech
and normal text documents. This paper proposes a method of extracting intentions from online text messages, such as web query strings and instant messages. The experiments describe a first attempt to extract intended acts from Usenet newsgroup messages. Dialog-act classifiers were constructed to extract dialog-acts, and the senders' intentions were then deduced based on intentional speech act theories [2, 1, 14, 10]. Predicate forms (verb-object-subject) of each dialog-act unit, together with the extracted dialog-acts, were then used to represent a set of intended acts (speech acts) of message senders.

1.2 Background

This work is based on text mining, in particular text classification, dialog-act classification, and the intention-based theory of speech acts. The following section provides a brief overview of the core techniques, focussing on dialog act classification and speech act theory.

Dialog Act Classification. Traditionally, syntactic and semantic processing approaches have been used for dialogue act classification, but they are error prone [9] and require intensive human effort in defining linguistic structures and developing grammars [13]. Recently, many statistical understanding approaches for dialog act classification have been proposed to overcome these problems [5, 9, 13]. All of these approaches use unigram Naive Bayesian classifiers, except [13], which uses an n-gram model with a dynamic programming algorithm. Another advantage of the probabilistic approaches over the traditional syntactic and semantic processing approaches is that no expensive and error-prone deep processing is involved.

Intention Based Theory of Speech Acts. In 1990, Cohen and Levesque [2] developed a theory in which speech acts are modelled as actions performed by rational agents in the furtherance of their intentions, based on their intentional logic [1]. Unlike their previous plan-based speech act theory, this theory is rooted in a more general theory of rational action [14].
The foundation upon which they built this model of rational action was their theory of intention [1]. The theory was formalized in modal logic using their logic of intention. Although the semantics of the intentional logic itself was given in possible-worlds terms in [1, 8], it was Singh [10] who proposed a semantics of speech acts, i.e., their conditions of satisfaction. He proposed several useful constraints on communication. Among the most interesting are the normative constraints for felicitous communication. An example is shown below:

comm(x, y, ⟨directive, p⟩) → intends(x, p)

This says that if x directs y to achieve p, then x intends p. He proposes six such constraints, arguing that they restrict the possible models to agents that we can assume to be rational. That is, if an agent is rational, then when uttering a directive speech act the agent must intend what it is saying. We develop an intention extraction function for message handling assistants based on these constraints in Section 3.2.
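Such a normative constraint can also be read as an executable check over a model: every directive an agent utters must also be among its intentions. The sketch below illustrates the directive constraint above; the predicate encodings, agent names, and data are invented for illustration and are not part of Singh's formalism.

```python
# Constraint: comm(x, y, <directive, p>) -> intends(x, p).
# A model is "rational" only if every directive the speaker utters
# also appears among the speaker's intentions.
def satisfies_directive_constraint(comms, intends):
    """comms: set of (speaker, hearer, force, prop) tuples;
    intends: set of (agent, prop) tuples."""
    return all((x, p) in intends
               for (x, y, force, p) in comms if force == "directive")

comms = {("alice", "bob", "directive", "close(door)")}
print(satisfies_directive_constraint(comms, {("alice", "close(door)")}))  # True
print(satisfies_directive_constraint(comms, set()))  # False
```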
Fig. 1. The number of dialog units and the number of words in each data set.
2 Experimental Evaluation
2.1 Methodology

A preliminary study has been undertaken on extracting intentions from online text messages. Three sets of online text messages were obtained from Usenet newsgroups. Each data set comprises DA-annotated (dialog-act annotated) units, called dialog-act units (DUs), as shown in Figure 1. Dialog-act units (DUs) in data set A are either questions for information or yes-no questions; these were collected from various Usenet FAQ forums. Dialog-act units in data set B were collected from a single Usenet discussion group. Dialog-act units in data set C were collected from another Usenet discussion group (comp.ai.neural-nets), different from that of data set B. The proportions of unknown words in the test sets for data sets A, B, and C were 19.9%, 14.8%, and 33.5%, respectively.

The resulting dialog-act units are represented as attribute-value vectors ("bag of words" representation), where each distinct word corresponds to a feature whose value is the frequency of the word in the text sample. Values were transformed with regard to the length of the sample. Some functional words (e.g., "the" and "a") were removed, and stemming was performed on each extracted dialog-act unit. In summary, input vectors for machine learning consist of attributes (the words used in the sample) and values (the transformed frequencies of the words). Outputs are the dialog acts of each dialog-act unit; that is, multiclass decision tasks were learned.

All the dialog units in the data sets were labelled with 21 dialog acts. Of these 21 dialog acts, we used only 11, because the others are not useful for practical purposes such as message handling. For example, dialog acts that are realized with fixed phrases, like THANK or GREETING, provide no essential information for message handling, and recognizing them is trivial.
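The feature construction described above can be sketched as follows. The stop-word list, the crude suffix stemmer, and the length normalization below are simplified stand-ins for whatever tools the experiments actually used; the paper does not name its stemming algorithm or stop list.

```python
# Bag-of-words features: remove functional words, stem, and
# normalize word frequencies by the length of the sample.
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "to"}  # illustrative stop list

def stem(word):
    # toy suffix-stripping stemmer (stand-in for a real one)
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def to_feature_vector(sentence):
    words = [stem(w) for w in sentence.lower().split() if w not in STOP_WORDS]
    counts = Counter(words)
    n = len(words) or 1
    # value = word frequency transformed by sample length
    return {w: c / n for w, c in counts.items()}

vec = to_feature_vector("the network converged and the training converged")
print(vec)
```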
Compound sentences and unreadable sentences that could not be automatically segmented were tagged as NOTAG and not included in the data set. The assertive dialog act was further divided into four dialog acts to test classification of combinations of performatives and concepts. The dialog acts chosen for our experiments are listed in the left column of Figure 2, together with their types and example dialogs.

A total of four classifiers were implemented. Two were based on Naive Bayesian approaches: multivariate-Bernoulli (NB-B) and multinomial (NB-M). The other two were based on the neural network approach: a single-layer perceptron classifier (SLP) and a multi-layer perceptron classifier (MLP). Only the MLP classifier was a non-linear classifier; all the others were linear classifiers.
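As an illustration of the multinomial Naive Bayes (NB-M) variant, the following is a minimal sketch with add-one smoothing. The dialog-act labels and training sentences are invented toy examples, not the paper's data sets.

```python
# Multinomial Naive Bayes over bag-of-words dialog-act units:
# score(act) = log P(act) + sum over words of log P(word | act).
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """samples: list of (word_list, dialog_act) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, act in samples:
        class_counts[act] += 1
        word_counts[act].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def classify_nb(words, class_counts, word_counts, vocab):
    total = sum(class_counts.values())
    best_act, best_score = None, float("-inf")
    for act, n in class_counts.items():
        score = math.log(n / total)
        # add-one smoothing over the shared vocabulary
        denom = sum(word_counts[act].values()) + len(vocab)
        for w in words:
            score += math.log((word_counts[act][w] + 1) / denom)
        if score > best_score:
            best_act, best_score = act, score
    return best_act

train = [
    (["please", "send", "me", "the", "file"], "directive"),
    (["could", "you", "post", "the", "code"], "directive"),
    (["the", "network", "converged", "quickly"], "assertive"),
    (["i", "think", "the", "results", "are", "good"], "assertive"),
]
model = train_nb(train)
print(classify_nb(["please", "post", "the", "results"], *model))  # directive
```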
Fig. 2. Dialog acts for the experiment.
Fig. 3. Performance comparison
Unlike the Naive Bayes classifiers, the neural network classifiers (i.e., SLP and MLP) have a few parameters that must be set manually based on empirical study. We found that both the SLP and MLP classifiers converge reasonably within 1000 iterations. We set both the learning rate η and the momentum α to 0.1 in all our tests. The number of hidden units used for each data set is shown in Figure 3. For the neural network classifiers, each test was conducted ten times and the results were averaged.

2.2 Results

Figure 3 shows the accuracy of the classifiers; the test results of the neural network classifiers are averages over ten runs, and the highest accuracy in each feature set is highlighted. As Figure 3 shows, overall the SLP classifier outperformed all other classifiers in accuracy. The SLP classifier correctly recognized 60 dialog acts out of 91 dialog units in data set C, i.e., 66%. Even though data set C was collected from a different site, independent of the training data set B, this result is better than that of [12], which reports 58% accuracy on performative classification and 39.7% on dialog act classification. Figure 4 shows the recall (rec) and precision (prec) measures for the dialog acts on feature set B. As we can see from the table, the SLP classifier again performs best, with an average recall of 42% and 48% precision. The NB-M classifier comes second, with an average recall of 38% and 42% precision. However, unlike with the accuracy measure, the MLP classifier had higher recall and precision rates.
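The single-layer set-up can be sketched as one linear output unit per dialog act, trained with the delta rule using the learning rate and momentum of 0.1 reported above. The data, dimensions, and zero-initialized weights below are toy assumptions, not the experimental configuration.

```python
# Single-layer linear classifier trained with the delta rule plus momentum.
def train_slp(X, y, n_classes, eta=0.1, alpha=0.1, iters=1000):
    """X: list of feature vectors, y: list of class indices."""
    dim = len(X[0])
    W = [[0.0] * dim for _ in range(n_classes)]   # one output unit per class
    dW = [[0.0] * dim for _ in range(n_classes)]  # momentum terms
    for _ in range(iters):
        for x, target in zip(X, y):
            for c in range(n_classes):
                out = sum(w * xi for w, xi in zip(W[c], x))
                err = (1.0 if c == target else 0.0) - out
                for i in range(dim):
                    # gradient step plus a fraction of the previous step
                    dW[c][i] = eta * err * x[i] + alpha * dW[c][i]
                    W[c][i] += dW[c][i]
    return W

def predict(W, x):
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return scores.index(max(scores))

X = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]  # toy bag-of-words vectors
y = [0, 1, 0]                          # toy dialog-act labels
W = train_slp(X, y, n_classes=2)
print(predict(W, [1, 0, 1]))
```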
Fig. 4. Recall and precision rates for all dialog acts.
Fig. 5. Confusion matrix for dialog acts (Multinomial Naive-Bayes on data set B).
Figure 5 shows a confusion matrix for the classification experiment using the multinomial Naive Bayes classifier on feature set B. The table shows that the classifier tends to assign A FACT (assert facts) to directive dialog units and A SELF SITUA (assert self situation) to commissive dialog units. A OPINION dialog units are misclassified about equally often as commissive or assertive. A SELF SITUA and A FACT tend to mop up dialog units of other classes. This tendency was so severe in the NB-B classifier that most dialog units belonging to infrequent classes were classified as either A SELF SITUA or A FACT.
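A confusion matrix of this kind can be tabulated as below, with rows indexed by true dialog acts and columns by predicted ones; the labels here are illustrative stand-ins for the classes in Figure 5, not the actual experimental output.

```python
# Tabulate a confusion matrix: m[true][predicted] = count.
from collections import defaultdict

def confusion_matrix(true_labels, predicted_labels):
    m = defaultdict(lambda: defaultdict(int))
    for t, p in zip(true_labels, predicted_labels):
        m[t][p] += 1
    return m

true = ["directive", "commissive", "directive", "opinion"]
pred = ["a_fact", "a_self_situa", "directive", "commissive"]
cm = confusion_matrix(true, pred)
print(cm["directive"]["a_fact"])  # a directive misread as "assert facts"
```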
3 Extracting Intentions from Messages
3.1 Intention Extraction Function: Iext

We describe an intention extraction function defined in [11]. The intention extraction function, Iext, maps human communication messages (e.g., emails, SMS messages) to a set of intentions. It defines a relationship between messages and senders' intentions. Let us suppose that each sentence in a message contains only one performative verb, that
is, we do not consider compound sentences that contain several performatives. Then, since every sentence in a message can be put into performative form by using an appropriate performative verb [10], a sentence can be written as a pair ⟨a, p⟩, where a is a performative (e.g., the dialog-act of a DU extracted using a dialog-act classifier) and p is a proposition (e.g., a DU in verb-object-subject form). A message can contain several sentences. Thus, a message m can be represented as an ordered set of such pairs, i.e., m = (⟨a1, p1⟩, . . . , ⟨an, pn⟩), where n is the number of sentences in the message. If we assume that the sentences are independent, ignoring their order, we can then define an intention extraction function Iext as follows:

Iext : M → 2^I,

where M is the set of all messages and I is the set of all intentions.

3.2 Intentions From Speech Acts
We now describe an implementation of the intention extraction function Iext, which combines a dialog-act classifier for online text messages with a mapping function that deduces specific intentions from the extracted dialog-acts and the constraints on felicitous communication proposed by Singh [10]. Figure 6 shows the algorithm of the intention extraction function, Iext, which is discussed in detail below.

Speech act theory is an attempt to build a logical theory that explains dialogue phenomena in terms of the participants' mental states [3]. The theory models how a speaker's mental state leads to communicative actions and what assumptions the speaker makes about the hearers. To clarify the terms used in speech act theory, consider an example of a speech act: "Please shut the door." This sentence can be rewritten as "I request you to shut the door." Here, the performative verb is "request" and the proposition is "shut the door". We can classify the speech act of the sentence as having directive illocutionary force. In this paper, we adopt the classification made by Singh [10]. Using the formal speech act theory developed by Cohen and Levesque [2–4], we can deduce speakers' intentions from their speech acts. For example, upon receiving a sentence containing a request speech-act and a proposition prop, we can deduce the following mental states of the speaker using the definition of the request speech-act in [2]:

(Bel speaker ¬p ∧ ¬q)  (1)
(Int speaker p)  (2)
(Des speaker ⋄q)  (3)
where q is ∃e (Done hearer e; prop?), and p is a mutual belief between speaker and hearer about certain conditions for the speech act to be appropriate. The intuition is that when certain conditions are met (e.g., the hearer is cooperative), the hearer will perform an action e to bring about prop. We now propose a simple method by which a message handling agent (MA) can acquire a sender's intentions, adapting the normative constraints for felicitous communication proposed by Singh [10]. First, we introduce some notions to define an illocutionary
force extractor function, Fext. Let A = F × Prop be the set of all speech acts, where F is the set of all illocutionary forces and Prop is the set of all propositions. If a ∈ A is a speech act, it is defined as a = ⟨i, p⟩, where i is an illocutionary force and p is a proposition. Then the illocutionary force extractor function, Fext, is defined as:

Fext : S → A,  (4)

where S is the set of all sentences. This function is implemented using the dialog-act classifier developed in the previous sections. In Section 1.2, we briefly discussed the semantics of speech acts proposed by Singh [10]. His semantics of speech acts is given for a speaker and a hearer, but we need an interpretation for an MA that reads a message received for its user from a sender. The following is an interpretation of the types of illocutionary forces for an MA. Once the type of the speech act in a sentence has been identified, we can reason about the sender's intention in saying the sentence. If the illocutionary force of a sentence is directive (e.g., "I request you p"), the intention of the sender is to make the user intend p. If it is assertive (e.g., "I inform you p"), the intention is to make the user believe p. If it is commissive (e.g., "I promise you p"), the intention is to make the user believe that the sender intends p, so that it will be true at some future time. If it is permissive, the intention is to make the user believe that the user is not obliged to the sender not to intend p. If it is prohibitive, the intention is to make the user believe that the user is obliged to the sender not to intend p. From this, we can formulate an algorithm for the intention extraction function, Iext, shown in Figure 6. Formulas (1), (2), (3), and (4) can be used to deduce the sender's mental states from the intentions extracted.
Set I := { };
Set x := Sender(message);
For every sentence s in message:
    ⟨i, p⟩ ← Fext(s);
    If i is directive, add (Int x (Int user p)) to I;
    If i is assertive, add (Int x (Bel user p)) to I;
    If i is declarative, add (Int x (Bel user p)) to I;
    If i is commissive, add (Int x (Bel user (Int x p))) to I;
    If i is permissive, add (Int x E(♢(Int user p))) to I;
    If i is prohibitive, add (Int x A(¬(Int user p))) to I;
Return I;

Fig. 6. Intention extraction function: Iext
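The algorithm in Figure 6 can be transcribed almost line for line into running code. In the sketch below, Fext is stubbed with a fixed lookup standing in for the dialog-act classifier, intentions are represented as nested tuples, and all names and sentences are invented for illustration.

```python
# Iext (Figure 6): map each sentence's (force, proposition) pair to a
# sender intention. The permissive/prohibitive cases use E/A and the
# modal operators as plain tuple tags.
RULES = {
    "directive":   lambda x, p: ("Int", x, ("Int", "user", p)),
    "assertive":   lambda x, p: ("Int", x, ("Bel", "user", p)),
    "declarative": lambda x, p: ("Int", x, ("Bel", "user", p)),
    "commissive":  lambda x, p: ("Int", x, ("Bel", "user", ("Int", x, p))),
    "permissive":  lambda x, p: ("Int", x, ("E", ("Diamond", ("Int", "user", p)))),
    "prohibitive": lambda x, p: ("Int", x, ("A", ("Not", ("Int", "user", p)))),
}

def iext(sender, sentences, fext):
    """sentences: iterable of raw sentences; fext: sentence -> (force, prop)."""
    intentions = set()
    for s in sentences:
        force, prop = fext(s)
        rule = RULES.get(force)
        if rule:  # unusable forces (cf. NOTAG) are skipped
            intentions.add(rule(sender, prop))
    return intentions

# stubbed classifier output for two toy sentences
fext = lambda s: {"Please send the report": ("directive", "send(report)"),
                  "I will review it": ("commissive", "review(it)")}[s]
I = iext("alice", ["Please send the report", "I will review it"], fext)
print(("Int", "alice", ("Int", "user", "send(report)")) in I)  # True
```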
4 Discussion and Future Work
This is the first report of a novel approach to extracting intentions from non-spoken online text messages that combines classifiers with semantic processing, in particular a speech-act mapping model. Considerable further research is
required. The approach of extracting intentions using machine learning to process online messages has the potential to improve query processing in Web-search domains. This approach, which combines dialog-act classifiers for online messages with constraints on felicitous communication, can be further extended by using alternative feature representations of the text data sets, such as concept terms or semantic terms. There is great potential for incorporating these information extraction techniques into automated message processing, and into query processing more generally.
References

1. Philip R. Cohen and Hector J. Levesque. Intention is choice with commitment. Artificial Intelligence, 42(2–3):213–261, March 1990.
2. Philip R. Cohen and Hector J. Levesque. Performatives in a rationally based speech act theory. In Meeting of the Association for Computational Linguistics, pages 79–88, 1990.
3. Philip R. Cohen and Hector J. Levesque. Rational interaction as the basis for communication. In Intentions in Communication, pages 221–255. MIT Press, Cambridge, Massachusetts, 1990.
4. Philip R. Cohen and Hector J. Levesque. Communicative actions for artificial agents. In Proc. ICMAS '95, pages 65–72, San Francisco, CA, USA, 1995. The MIT Press, Cambridge, MA, USA.
5. P. Garner, S. Browning, R. Moore, and M. Russell. A theory of word frequencies and its application to dialogue move recognition. In Proc. ICSLP '96, volume 3, pages 1880–1883, Philadelphia, PA, October 1996.
6. Jacek Gwizdka. Reinventing the inbox: supporting the management of pending tasks in email. In CHI '02 Extended Abstracts on Human Factors in Computing Systems, pages 550–551. ACM Press, 2002.
7. Robert Kraut, Tridas Mukhopadhyay, Janusz Szczypula, Sara Kiesler, and William Scherlis. Communication and information: alternative uses of the internet in households. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 368–375. ACM Press/Addison-Wesley, 1998.
8. Anand S. Rao and Michael P. Georgeff. Modeling rational agents within a BDI-architecture. In Proceedings of the 2nd International Conference on Principles of Knowledge Representation and Reasoning (KR'91), pages 473–484. Morgan Kaufmann, San Mateo, CA, USA, 1991.
9. Norbert Reithinger and Martin Klesen. Dialogue act classification using language models. In Proc. Eurospeech '97, pages 2235–2238, Rhodes, Greece, September 1997.
10. Munindar P. Singh. A semantics for speech acts. In Readings in Agents, pages 458–470. Morgan Kaufmann, San Francisco, CA, USA, 1997. (Reprinted from Annals of Mathematics and Artificial Intelligence, 1993.)
11. Insu Song and Pushkar Piggott. Modelling message handling system. In Proc. 16th Australian Joint Conference on Artificial Intelligence, pages 53–64. Springer-Verlag, 2003.
12. F. Toshiaki, D. Koll, and A. Waibel. Probabilistic dialogue act extraction for concept based multilingual translation systems. In Proc. ICSLP '98, volume 6, pages 2771–2774, Sydney, Australia, 1998.
13. Ye-Yi Wang and Alex Waibel. Statistical analysis of dialogue structure. In Proc. Eurospeech '97, pages 2703–2706, Rhodes, Greece, September 1997.
14. Michael Wooldridge. An Introduction to Multiagent Systems. John Wiley & Sons, 2001.