IADIS International Conference Applied Computing 2006 (ISBN: 972-8924-09-7)
RECOGNIZING WELL-FORMED IMPLEMENTATION INTENTIONS IN A COMPUTER-BASED HEALTH IMPROVEMENT SYSTEM

Marco De Boni, Unilever Corporate Research, Colworth, Bedford MK44 1LQ
Robert Hurling, Unilever Corporate Research, Colworth, Bedford MK44 1LQ
David L. Hunt, Tessella Support Services plc, Stevenage SG1 2DX
ABSTRACT
We present an algorithm, built for an exercise behavior improvement system, which determines whether a sentence can be considered a well-formed implementation intention. We identify a number of salient features, whose weights are then optimized using a genetic algorithm. The resulting algorithm is shown to successfully classify sentences taken from actual users.

KEYWORDS
Health Informatics, Dialogue System, Implementation Intentions.
1. INTRODUCTION

The health risks associated with obesity are an increasing concern for developed western countries: Prentice & Jebb (1995) highlighted the contribution of declining physical activity to the rise in obesity, and Tate et al. (2001) the benefits of an internet-based behavioral weight loss programme. Prestwich et al. (2003) discussed the benefits of increasing exercise levels and, building on the work of Milne et al. (2002), demonstrated the positive impact of combining motivational information with specific implementation intentions (Gollwitzer 1993). An implementation intention for exercise specifies the time, place and type of extra exercise to be engaged in over the following few weeks.

With this in mind we have developed a mini-dialogue system that provides feedback on a person's exercise implementation intention, defining two key aspects of well-formed implementation intentions. First, as proposed by Gollwitzer (1993), they should specify a precise time, place and type of exercise: for example, 'I will go to the gym for 20 minutes next Tuesday' is a more specific implementation intention than 'I will do more exercise next week'. Second, they should also indicate a strong intention: Ajzen's (1985) Theory of Planned Behaviour identifies intention as the mediator between attitudes and behavior; the stronger the intention, the more likely the behavior is to occur. So, for example, 'I will go to the gym for 20 minutes next Tuesday' indicates a stronger intention than 'I'll try to go to the gym for 20 minutes next Tuesday'. The automated dialogue measures the specificity and strength of intention of a person's written implementation intention and suggests appropriate improvements.
2. RELATED WORK

Automated dialogue systems try to interpret the meaning of user sentences in order to provide an appropriate response. At the simplest level this is done through pattern matching, while more complex systems attempt to translate sentences into a meaning representation language, usually some form of logic (for an overview of dialogue systems see, for example, Androutsopoulos and Aretoulaki 2003). In our case we were not interested in the meaning associated with a user input, but in the intention behind the meaning. Consequently, we could not rely on conventional methods of interpretation and representation and had to find a new approach. Recent research has underlined the importance of detecting psychological states such as emotion from language for a variety of tasks such as computer-aided learning, psychotherapy and document classification (Beineke et al. 2004; Pang and Lee 2004; Bechtel and Gottschalk 2004; Litman and Forbes-Riley 2004). We are unaware, however, of any work in the area of measuring the strength of intentions, and in particular implementation intentions, from dialogue input.
3. THE ALGORITHM

We constructed the algorithm for classifying sentences as well-formed implementation intentions by first identifying the features necessary to automatically classify sentences. A number of different approaches have been used to interpret the psychological characteristics of language, including content analysis (e.g. Pennebaker and King 1999), particular syntactic classes (e.g. Wiebe 2000), semantic features (e.g. Bloehdorn and Hotho 2004) and bags of words (Beineke et al. 2004). A first attempt using n-gram analysis of words did not yield any useful insight; a number of components were instead identified from the psychological literature and chosen as features, and a machine learning technique was then applied to optimize their use. From the psychological literature we can see that a set number of components determine whether sentences convey well-formed implementation intentions, in particular a) the ambiguity of the words contained in the sentence, b) the presence of definite times/places and exercise types and c) the use of strong or weak exercise intention words. In a slightly more formal way, we are looking at the following algorithm:

Function well-formed-intention(tokenized and tagged sentence) returns a classification and a measure for each aspect considered
    Measure the ambiguity of words in the sentence
    Measure the presence of definite times/places/exercise types
    Measure the presence of strong or weak exercise intention words
    Combine the measures to give a classification
End Function
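As an illustration only, this overall structure can be read as the following Python sketch; the feature functions, weights and decision threshold are placeholders (the feature measures are described in Section 4 and the weights are learned in Section 6), not the values used in the actual system.

def well_formed_intention(sentence, feature_functions, weights, threshold=0.5):
    # sentence: the tokenized and tagged user input
    # feature_functions: dict mapping a feature name to a function returning a normalized score
    scores = {name: fn(sentence) for name, fn in feature_functions.items()}
    # Combine the individual measures into a single well-formedness score
    total = sum(weights[name] * scores[name] for name in scores)
    # Return both the classification and the per-feature measures (used later for feedback)
    return total >= threshold, scores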
4. FEATURES USED FOR CLASSIFICATION

4.1 Recognizing the Ambiguity of Sentence Meaning

The first element we took into consideration when deciding the type of intention was the kind of words used in the sentence. Ambiguity of sentence meaning was calculated using WordNet (Fellbaum 1998), a database which organizes words according to their meaning: each meaning of a word is assigned a unique code (called a "synset") which defines synonyms (synonyms share the same synset). Ambiguous words were taken to be words with a high number of hyponyms, i.e. more specific words linked to them by is-a relationships (as a speaker could have used one of these more specific words to indicate the same concept), and with a high number of synonyms (polysemy was taken to be an indicator of ambiguity). In order to find these features we looked at the number of synsets associated with each word and the number of hyponyms associated with each of these synsets. The function we used was:

Function sentence-ambiguity(tagged and tokenized sentence) returns a score indicating the ambiguity of a sentence
    Score = 0
    For each Word/Part-of-Speech pair:
        Find all the synsets corresponding to the Word/PoS pair
        Count all the hyponyms of all the synsets:
            HyponymCount = Sum(hyponyms of synsets corresponding to Word/PoS)
        Count how many synonyms (polysemy) the word has:
            PolysemyCount = number of synsets corresponding to Word/PoS
        Score = Score + HyponymCount + PolysemyCount
    End For
End Function
The score was then normalized by taking into account the number of words in a sentence, using the simple formula score = score / text length.
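A minimal Python sketch of this measure, assuming the NLTK interface to WordNet; the part-of-speech mapping and the handling of words missing from WordNet are assumptions, not details given above.

from nltk.corpus import wordnet as wn  # requires the WordNet corpus, e.g. nltk.download('wordnet')

def sentence_ambiguity(tagged_sentence):
    # tagged_sentence: list of (word, pos) pairs, with pos a WordNet tag such as wn.NOUN or wn.VERB
    score = 0
    for word, pos in tagged_sentence:
        synsets = wn.synsets(word, pos=pos)                      # all senses of this Word/PoS pair
        polysemy_count = len(synsets)                            # many senses -> more ambiguous
        hyponym_count = sum(len(s.hyponyms()) for s in synsets)  # many more specific alternatives -> more ambiguous
        score += hyponym_count + polysemy_count
    # Normalize by sentence length, as described above
    return score / len(tagged_sentence) if tagged_sentence else 0.0

Normalizing by sentence length keeps a long, detailed plan from being scored as more ambiguous simply because it contains more words.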
4.2 Recognizing Specific Places, Times, Types of Exercise

We used WordNet again to gather an initial set of specific times, places and types of exercise. These initial word lists were then expanded to cover obvious gaps and inconsistencies. We then added numbers representing times to the list of specific times. A score was assigned for each category, calculated respectively as the number of specific place, time and exercise words found in each sentence, again normalized by the number of words in the sentence. We did not attempt to correct spelling mistakes, but this was not found to be a problem in the experiments we carried out. A sketch of this list-based matching follows.
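The sketch below uses simple list matching; the word lists shown are hypothetical fragments standing in for the curated lists described above.

import re

# Hypothetical fragments of the curated word lists (the real lists were seeded from WordNet and then expanded)
TIME_WORDS = {"monday", "tuesday", "wednesday", "lunchtime", "morning", "evening", "weekend"}
PLACE_WORDS = {"gym", "pool", "park", "work", "home"}
EXERCISE_WORDS = {"run", "running", "walk", "cycling", "swim", "swimming"}

NUMERIC_TIME = re.compile(r"\d{1,2}([:.]\d{2})?(am|pm)?")  # e.g. "7", "7.30", "6pm"

def specificity_score(tokens, word_list, match_numbers=False):
    # tokens: lower-cased, lemmatized words of the sentence
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens
               if t in word_list or (match_numbers and NUMERIC_TIME.fullmatch(t)))
    return hits / len(tokens)  # normalized by sentence length

A sentence then receives three scores, e.g. specificity_score(tokens, TIME_WORDS, match_numbers=True), specificity_score(tokens, PLACE_WORDS) and specificity_score(tokens, EXERCISE_WORDS).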
4.3 Recognizing Strong and Weak Intentions

A list of weak (e.g. "I'll try and...", "I could...") and strong (e.g. "I will VB", "most definitely") intention words was compiled by a psychologist. A "weak" and a "strong" score were assigned to each sentence, calculated as the number of weak and strong intention words found, normalized by the total number of words in the sentence. Again we did not attempt to correct spelling mistakes.
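A corresponding sketch for the intention-strength scores; the phrase lists below are illustrative stand-ins for the psychologist-compiled lists.

# Illustrative fragments of the strong/weak intention phrase lists
STRONG_PHRASES = ["i will", "i am going to", "most definitely"]
WEAK_PHRASES = ["i'll try", "i could", "i might", "if i have time"]

def intention_score(sentence_text, phrases):
    words = sentence_text.lower().split()
    if not words:
        return 0.0
    hits = sum(sentence_text.lower().count(p) for p in phrases)
    return hits / len(words)  # normalized by the total number of words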
5. DATA COLLECTION

Data was collected through a web-based questionnaire; respondents were almost equally split between males and females and were homogeneously spread as regards age. We collected a set of 350 paragraphs (which could be made up of more than one sentence), written by as many different users, which contained exercise intentions. The gathered sentences were manually annotated by a psychologist and a linguist to decide whether they were to be considered well-formed implementation intentions or not. 48% of the total retained sentences were classified as well-formed intentions and the remaining 52% as not well-formed intentions. The following are examples of not well-formed implementation intentions:

"I will try and talk [sic] a 30 minute brisk walk, whenever I have time."

"check out opening times of the gym and swimming pool. find out if anyone would like to get fit together. make a reasonable plan of what I am going to do."
The following, by contrast, are examples of well-formed implementation intentions:

"I'll make sure I go running two times a week in the mornings before work (rather than letting it slip), I will attend at least one strengthening class a week at the gym and I will do one long run (10 miles+) on the weekends."

"I will go for a walk at lunchtime at least 3 days a week."
6. OPTIMIZATION OF THE ALGORITHM

The collected sentences were split equally into a training and a test sample; they were then tokenized, tagged and lemmatized. One of the key requirements was to understand in detail the weaknesses (if any) of the intention behind the input sentence, in order to be able to provide useful feedback to users, suggesting ways in which the commitment to exercise should be rephrased in order to be more effective.
The simplest way to do this was to treat the algorithm as a weighted sum of the features identified above, as this would enable us to immediately identify the most "important" features contributing to the overall measure of well-formedness of implementation intentions. In other words, we wanted to determine w1..wn in the following:

Score(s) = w1*g1(s) + w2*g2(s) + ... + wn*gn(s)
where s is a sentence, Score(s) is a score which can usefully be used to discriminate between well-formed and not well-formed intentions, and g1..gn are the functions which return the normalized scores described above for a) the ambiguity of meaning, b) specificity of time, c) specificity of place, d) specificity of exercise and e) strength of intention of s.

We first used linear regression with the least-squares method to determine the weights, obtaining a precision of 78.1% on the not well-formed intentions and 77.4% on the well-formed intentions (F-measure: 0.778). We then tried an alternative method to decide the weight to give to each component, using a genetic algorithm to optimize the weights (see Mitchell 1997 or Al-Attar 1994 for a general introduction to genetic algorithms). A number of other approaches could have been chosen, but we chose genetic algorithms for availability, ease of use and understandability of the meaning of the output. The genetic algorithm used in this work was based on Weber et al. (1995). A simple gradient optimizer (see Mitchell 1997 for an introduction to gradient optimizers) was then implemented to further refine the optimization: the fittest point in the search space is used as the starting point; new points are then generated by adding or subtracting a constant to just one parameter at a time; the fittest of these points is chosen and the process repeated. If none of the new points is fitter, the constant is halved and the process is repeated until the constant becomes too small.

Noun ambiguity appeared to be the only feature that did not prove useful, possibly because many nouns would be specific times or places; however, the fact that the standard deviation in this parameter's value from the genetic algorithm is high makes it difficult to conclude that it is a dispensable feature. The algorithm was able to correctly classify sentences as well-formed implementation intentions 100% of the time, and as not well-formed 88.9% of the time (F-measure 0.941).
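A Python sketch of the local refinement step described above (a simple pattern search over the weights); the fitness function, step size and stopping constant are illustrative assumptions rather than the values used in the system.

def refine_weights(start_weights, fitness, step=0.1, min_step=1e-3):
    # start_weights: fittest weight vector found by the genetic algorithm
    # fitness: function mapping a weight vector to, e.g., classification accuracy on the training set
    best = list(start_weights)
    best_fitness = fitness(best)
    while step > min_step:
        best_candidate, best_candidate_fitness = None, best_fitness
        for i in range(len(best)):
            for delta in (step, -step):
                candidate = list(best)
                candidate[i] += delta          # perturb just one parameter at a time
                f = fitness(candidate)
                if f > best_candidate_fitness:
                    best_candidate, best_candidate_fitness = candidate, f
        if best_candidate is not None:         # a fitter point was found: move to it
            best, best_fitness = best_candidate, best_candidate_fitness
        else:                                  # no improvement: halve the constant and retry
            step /= 2
    return best, best_fitness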
7. IMPLEMENTATION

The algorithm was then implemented within a web-based exercise improvement system, allowing users to write an exercise implementation intention and receive immediate feedback on whether their intention should be rephrased in a more specific way. Feedback was based on the scores received from each feature, with a bias towards the more heavily weighted features (so, for example, if an ill-defined intention sentence was lacking in both specific places and times, feedback would include a suggestion to include specific places and times). A typical interaction would proceed as follows:

System: "Write in your own words, how you will overcome the barrier that has been preventing you from exercising as much as you want. Be as specific and detailed as you can - it really will help!"

User: "I'll try and do some cycling"

System: "Thanks. Now try and rephrase your commitment, being more specific about what you will do (e.g. being more specific about location/time)."

User: "I'll do some cycling on Wednesday evening."

System: "That's great, work on your commitment!"
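A minimal sketch of how this feedback selection could work, reusing the per-feature scores and learned weights; the message texts, feature names and cut-off value are illustrative assumptions, not the system's actual prompts.

# Hypothetical feedback snippets for low-scoring features
FEEDBACK = {
    "time":     "being more specific about when you will exercise",
    "place":    "being more specific about where you will exercise",
    "exercise": "saying exactly what kind of exercise you will do",
    "strong":   "making a firmer commitment (e.g. 'I will' rather than 'I'll try')",
}

def feedback_message(scores, weights, cutoff=0.05):
    # Select the low-scoring features, ordered so the most heavily weighted come first
    lacking = sorted((f for f in FEEDBACK if scores.get(f, 0.0) < cutoff),
                     key=lambda f: weights.get(f, 0.0), reverse=True)
    if not lacking:
        return "That's great, work on your commitment!"
    suggestions = " and ".join(FEEDBACK[f] for f in lacking[:2])
    return "Thanks. Now try and rephrase your commitment, " + suggestions + "."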
8. CONCLUSION AND FURTHER WORK

We developed an algorithm to recognize well-formed implementation intentions using a number of different sentence features. We then optimized the weight given to these features using a genetic algorithm. The algorithm was shown to perform well, correctly categorizing 85% of the well-formed intention sentences and 89% of the not well-formed intention sentences in the test set.
The algorithm has now been implemented within a wider exercise behavior system and future work will evaluate its performance within this system. As noted above, bad or variant spelling did not appear to be a serious issue in the samples we collected, perhaps because of the high literacy of the users from whom the sentences were gathered; nevertheless, there were occasional misspellings and lapses. Further work will therefore also include using a spell-checker to correct the most common mistakes, a necessary step if the algorithm is to be used with a wider audience.
REFERENCES

Ajzen, I., 1985. "From intentions to action: A theory of planned behaviour." In Kuhl, J. and Beckman, J. (Eds.), Action Control: From Cognitions to Behaviours, pp. 11-39. Springer, NY.
Al-Attar, A., 1994. "A Hybrid GA-Heuristic Search Strategy", AI Expert, USA, September.
Androutsopoulos, I., and Aretoulaki, M., 2003. "Natural Language Interaction", in Mitkov, R. (Ed.), The Oxford Handbook of Computational Linguistics, Oxford University Press.
Bechtel, R. J., and Gottschalk, L. A., 2004. "Detection of Neuropsychiatric States of Interest in Text", Proceedings of the 2004 AAAI Fall Symposium on Dialogue Systems for Health Communication, AAAI Press.
Beineke, P., Hastie, T., and Vaithyanathan, S., 2004. "The Sentimental Factor: Improving Review Classification Via Human-Provided Information", Proceedings of ACL 2004, Barcelona.
Bloehdorn, S., and Hotho, A., 2004. "Boosting for Text Classification with Semantic Features", Proceedings of the MSW 2004 workshop at the 10th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
Bruce, R. F., and Wiebe, J., 2000. "Recognizing subjectivity: a case study of manual tagging", Natural Language Engineering, 6 (2).
Fellbaum, C., 1998. WordNet: An Electronic Lexical Database, MIT Press.
Gollwitzer, P. M., 1993. "Goal achievement: The role of intentions." In Stroebe, W. and Hewstone, M. (Eds.), European Review of Social Psychology, 4, pp. 141-185. Wiley, Chichester, England.
Han, E.-H., Karypis, G., and Kumar, V., 1999. "Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification", Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining.
Joachims, T., 1998. "Text categorization with support vector machines: learning with many relevant features", Proceedings of ECML 1998.
Litman, D., and Forbes-Riley, K., 2004. "Predicting Student Emotions in Computer-Human Tutoring Dialogues", Proceedings of ACL 2004, Barcelona.
Milne, S., Orbell, S., and Sheeran, P., 2002. "Combining motivational and volitional interventions to promote exercise participation: Protection motivation theory and implementation intentions." British Journal of Health Psychology, 7, pp. 163-184.
Mitchell, T. M., 1997. Machine Learning, McGraw-Hill, New York.
Pang, B., and Lee, L., 2004. "A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts", Proceedings of ACL 2004, Barcelona.
Pennebaker, J. W., and King, L. A., 1999. "Linguistic styles: Language use as an individual difference", Journal of Personality and Social Psychology, 77, pp. 1296-1312.
Prentice, A. M., and Jebb, S. A., 1995. "Obesity in Britain: Gluttony or Sloth." British Medical Journal, 311, pp. 437-439.
Prestwich, A. J., Lawton, R. J., and Conner, M. T., 2003. "The use of implementation intentions and the decision balance sheet in promoting exercise behaviour." Psychology & Health, 18 (6), pp. 707-721.
Tate, D. F., Wing, R. R., and Winett, R. A., 2001. "Using Internet technology to deliver a behavioral weight loss program." Journal of the American Medical Association, 285, pp. 1172-1177.
Turney, P., 2002. "Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews", Proceedings of ACL 2002, Philadelphia.
Weber, L., Wallbaum, S., Broger, C., and Gubernator, K., 1995. "Optimisation of the Biological Activity of Combinatorial Compound Libraries by a Genetic Algorithm", Angew. Chem. Int. Ed. Engl., 34, No. 20.