Emotion Recognition in Intelligent Tutoring Systems for Android-Based Mobile Devices

Ramón Zatarain-Cabada¹, María Lucía Barrón-Estrada¹, Giner Alor-Hernández², Carlos A. Reyes-García³

¹ Instituto Tecnológico de Culiacán, Juan de Dios Bátiz s/n, Col. Guadalupe, Culiacán, Sinaloa, 80220, México
² Instituto Tecnológico de Orizaba, Avenida Oriente 9 No. 852, Col. Emiliano Zapata, C. P. 94320, Orizaba, Veracruz, México
³ Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Luis Enrique Erro No. 1, Sta. Ma. Tonantzintla, Puebla, 72840, México

{rzatarain, lbarron}@itculiacan.edu.mx, [email protected], [email protected]
Abstract. In this paper, we present a Web-based system for learning basic mathematics. The system comprises several components: a social network for learning, an intelligent tutoring system, and an emotion recognizer. We developed the system to be accessible from any computer platform as well as from Android-based mobile devices. We also built a neural network for the identification of student emotions and a fuzzy system for tracking students' pedagogical states. In experiments with the emotion recognizer we obtained a success rate of 96%. Furthermore, the system (including the social network and the intelligent tutoring system) was tested with real students and the results were very satisfactory.

Keywords: Intelligent Tutoring Systems, Affective Computing, Social Intelligence, Artificial Neural Networks, Mobile Learning.
1 Introduction
The use of new technologies in education – social networking applied to intelligent tutoring systems, affective computing applied to learning systems, and mobile computing – is creating an inflection point in the way we learn. An inflection point is a paradigm change in how a process is carried out; in education, it means that the traditional methods used to learn certain activities, particularly academic ones, are fully renovated. In recent years, the Web has evolved from a static platform serving content into an entity that constantly produces, renews, and shares not only information but also knowledge. This way of operating, where users not only consume information and knowledge but also produce it, has been called "Web 2.0" [1] or harnessing collective intelligence [2]. Moreover, the concept of social software has emerged as part of Web 2.0: a medium that allows people not only to connect to repositories containing information and knowledge (e.g. learning objects), but also to connect to other people. Blogs, wikis, and social networks are examples of communities of knowledge or social software; the latter is the most significant case because of its explosive growth in our society. For example, social networks such as Facebook© have over 1.11 billion users and Twitter© over 645 million. A social network is an online communication tool that allows users to create public or private profiles, create and display their own as well as other users' online social networks, and interact with people in their networks [3].

Affective computing [4] is a field of research that integrates different scientific disciplines, seeking to enable computers to behave intelligently and interact naturally with users through the ability to recognize, understand, and express emotions. Knowing the emotional state of a person provides relevant information about his/her psychological state and offers the possibility of deciding how to respond to it. Research in this field aims to develop software systems that identify and respond to the emotions of a user (e.g. a student). Emotions are detected by different devices (PC camera, PC microphone, special mouse, neuro-headset, among others) that can be placed on a computer or on the person [5]. These devices pick up the user's signals (facial image, voice, pressure applied to the mouse, heart rate, stress level, to mention but a few) and send them to the computer to be processed; the resulting emotional state is then obtained in real time. In the field of education, an affective system seeks to change a user's negative emotional state (e.g. confused) into a positive one (e.g. engaged), in order to establish an emotional state appropriate for learning. The latest related work on emotion recognition in intelligent tutoring systems (ITS) incorporates different hardware- and software-based methods to recognize student emotions [5, 6, 7, 8].

In recent years, many studies have demonstrated the benefits of web-based learning and mobile learning [9, 10, 11]. Different learning methodologies, such as hybrid learning, blended learning, and mobile learning, have been proposed to increase the efficacy of web-based learning. We decided to implement our social software system as a Web-based and mobile application rather than a desktop application because of the advantages of the former (platform independence, access from anywhere at any time, no software installation, among others).

In this work, we present a software system that incorporates emotion recognition and support into an ITS for mathematics, which is part of a learning social network. Our main contribution is the integration of different components: a social network for learning, an intelligent tutoring system, and an emotion recognizer. Emotions are recognized by capturing the students' facial expressions. Another contribution is the integration of the ITS inside a social network, which allows students to work collaboratively in a natural way. The output of the affective recognizer (the current student emotion) is merged with the pedagogical or cognitive results of the math exercises, forming the input to a fuzzy system that decides what kind of exercise to present to the student. To implement this process we built a neural network for emotion recognition and a fuzzy system for tracking the student's pedagogical states.
2 Explicit Invocation Architecture
We developed an application for web browsers and Android-based mobile devices, allowing the Intelligent Tutoring System (ITS) to be accessed from both PCs and mobile devices (e.g. smartphones and tablets). The ITS requests the execution of an application in order to extract the facial features. This task is done via the web browser by invoking a program installed on the PC or mobile device. The program takes a picture of the student's face and measures features of the eyes and mouth, which are submitted to the ITS. The extracted information is sent to a Web server hosting a neural network that determines the corresponding emotion and returns it to the ITS. Once the emotion is obtained, the ITS uses it as an input together with other values derived from the results of the student's exercises. Figure 1 shows the Explicit Invocation Architecture; its main components are briefly explained below, after a short sketch of the feature-upload exchange.
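As a rough illustration of this client-server exchange, the sketch below posts the measured opening distances to the recognition server and reads back the emotion label. The endpoint URL, parameter names, and plain-text reply format are our assumptions for illustration, not the system's actual interface.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class FeatureUploader {

    /** Sends the measured opening distances and returns the emotion label. */
    public static String sendFeatures(double mouth, double leftEye, double rightEye)
            throws Exception {
        // Hypothetical endpoint of the Web server that hosts the neural network.
        URL url = new URL("http://tutoring-server.example/recognize");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);

        // Feature values measured by the facial feature extractor.
        String body = "mouth=" + mouth + "&leftEye=" + leftEye + "&rightEye=" + rightEye;
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }

        // The server classifies the features and replies with a label such as "happiness".
        try (Scanner in = new Scanner(conn.getInputStream(), "UTF-8")) {
            return in.hasNextLine() ? in.nextLine() : "neutral";
        }
    }
}
```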
Fig. 1. Explicit Invocation Architecture

Facial feature extraction.

Face. This component is responsible for finding the human face in the picture using the Haar-like feature cascades method, implemented with the OpenCV library.

Mouth, Right and Left Eye. Once the student's face is detected, these components are invoked to find the mouth, the right eye, and the left eye. For efficient image processing, an ROI (Region of Interest) method was used, delimiting the search space and discarding the rest of the image. Once the objects in the image are found, a series of transformations is performed that facilitates the search for edges on the objects and the calculation of their opening distances. These data form the input to the neural network for emotion classification. The main transformations of the facial feature extraction process are described next.
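A minimal sketch of this detection stage, using OpenCV's Java bindings, might look as follows. The cascade file names and image path are placeholders, not the authors' actual code.

```java
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.MatOfRect;
import org.opencv.core.Rect;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;
import org.opencv.objdetect.CascadeClassifier;

public class FaceRegionDetector {
    static { System.loadLibrary(Core.NATIVE_LIBRARY_NAME); }

    public static void main(String[] args) {
        // Haar cascades distributed with OpenCV; paths are placeholders.
        CascadeClassifier faceCascade =
                new CascadeClassifier("haarcascade_frontalface_default.xml");
        CascadeClassifier eyeCascade = new CascadeClassifier("haarcascade_eye.xml");

        Mat image = Imgcodecs.imread("student.jpg");
        Mat gray = new Mat();
        Imgproc.cvtColor(image, gray, Imgproc.COLOR_BGR2GRAY);

        // Find the face with the Haar-like feature cascade.
        MatOfRect faces = new MatOfRect();
        faceCascade.detectMultiScale(gray, faces);

        for (Rect face : faces.toArray()) {
            // Delimit the search space: further detection runs on the face ROI only.
            Mat faceROI = gray.submat(face);

            MatOfRect eyes = new MatOfRect();
            eyeCascade.detectMultiScale(faceROI, eyes);
            for (Rect eye : eyes.toArray()) {
                System.out.printf("eye at (%d, %d), size %dx%d%n",
                        eye.x, eye.y, eye.width, eye.height);
            }
        }
    }
}
```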
To calculate the opening distances (points) of the mouth, left eye, and right eye, different transformations were performed on the regions of interest where the objects are found in the image. These transformations allow the application to perform feature extraction efficiently with respect to image size, in addition to cleaning image noise and manipulating certain pixels to identify the objects in the regions of interest. The Gaussian averaging operator was used to clean the image obtained by the application. Initially, the Gaussian function g was applied, where the contribution of the coordinates x, y is controlled by the variance σ², according to equation 1 [12]:

$$ g(x, y, \sigma) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{x^2 + y^2}{2\sigma^2}} \qquad (1) $$
Applying the Gaussian averaging operator produced smoother images, removing fine detail from the photo and allowing a greater focus on large structures. Another influential factor in the images is brightness, which may hinder the transformation process because it prevents the definition of some edges and structures. We therefore applied histogram equalization, in which the image passes through a non-linear process that redistributes brightness to make it more suitable for recognition. The process produces an image with a flat histogram, where all levels contain a similar number of points. For a range of M levels, the histogram plots the number of points per level against the level. For the input (old) and output (new) images, the number of points per level is denoted as O(l) and N(l) (for 0 ≤ l ≤ M), respectively, and the cumulative counts of the mapped levels must match (Equation 2):

$$ \sum_{l=0}^{p} N(l) = \sum_{l=0}^{q} O(l) \qquad (2) $$

Since the output histogram is uniformly flat, the cumulative histogram up to any level p should be a proportional fraction of the total sum. The number of points per level in the output image is therefore the ratio of the total number of points to the range of levels of the output image (Equation 3):

$$ N(l) = \frac{1}{M} \sum_{l=0}^{M} O(l) \qquad (3) $$
The last transformation is a thresholding that distinguishes the starting points for calculating the opening distances, in order to determine the border points of the mouth, right eye, and left eye. Given a specified level, the pixels are set to only two colors: white for high levels and black for low levels (Figure 2). The probability distribution of the intensity levels is represented by the normalized histogram (Equation 4):

$$ p(l) = \frac{N(l)}{N} \qquad (4) $$

where N is the total number of pixels in the image.
Fig. 2. Application of the thresholding operator
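In OpenCV's Java bindings, each of the three transformations above corresponds to a single call. The kernel size, σ, and threshold level below are illustrative choices, not the values used by the authors.

```java
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;

public class RoiPreprocessor {
    static { System.loadLibrary(Core.NATIVE_LIBRARY_NAME); }

    /** Applies the three transformations described above to a grayscale ROI. */
    public static Mat prepare(Mat roi) {
        // Gaussian averaging (equation 1): removes noise, keeps large structures.
        Mat smoothed = new Mat();
        Imgproc.GaussianBlur(roi, smoothed, new Size(5, 5), 1.5);

        // Histogram equalization (equations 2-3): compensates for uneven brightness.
        Mat equalized = new Mat();
        Imgproc.equalizeHist(smoothed, equalized);

        // Thresholding (equation 4 context): binarizes the ROI so that the border
        // points of the mouth and eyes can be located.
        Mat binary = new Mat();
        Imgproc.threshold(equalized, binary, 128, 255, Imgproc.THRESH_BINARY);
        return binary;
    }
}
```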
3 Intelligent Tutoring System (ITS)
Figure 3 shows the architecture of the ITS, which is similar to that of traditional tutoring systems. The architecture has three main modules.

The Domain Module represents the knowledge of the expert and handles different concepts related to Knowledge Space Theory [13]. This theory provides a sound foundation for structuring and representing the domain module for personalized or adaptive learning. It applies concepts from combinatorics, and we use it to model particular or personalized courses. The knowledge base of this module is stored in a particular XML-based format.

The Student Module provides information about student competencies and learning capabilities through a diagnostic test. The Student Module can be seen as a sub-tree of all the knowledge stored in the domain. Every student has a static profile, which stores personal and academic information, and a dynamic profile, which stores information obtained from navigation in the tutor and from the emotion recognition process.

The Tutoring Module presents exercises to the students according to their level in the problem. We implemented production rules (procedural memory) and facts (declarative memory) via a set of XML-based rules. Furthermore, we developed a new knowledge tracing algorithm based on fuzzy logic, which is used to track the student's cognitive states by applying the set of rules (XML-based and fuzzy rules) to the set of facts.
Fig. 3. ITS Architecture

As mentioned, emotion recognition is done using a facial feature extraction process that follows Ekman's theory [14]. We considered five emotional states: anger, happiness, sadness, surprise, and neutral. As part of the affective sub-module, the ITS also has an affective/pedagogical agent. The agent appears when the student makes a mistake (pay attention, explain, suggest, or think actions), when the student correctly completes the problem (acknowledge or congratulate actions), or when the student asks for help (announce or confuse actions). Figure 4 shows an interface of the ITS, developed in Spanish, where "Genie" is delivering a congratulatory message and a reminder to ask for help if needed.
Fig. 4. Genie: An Affective Agent (the message is in Spanish)

3.1 The Neural Network for Recognizing Emotional States

The emotion recognition system was built in three steps. The first was an implementation to extract features from face images (the algorithm explained before) in a corpus used to train the neural network. The second step was the implementation of the neural network itself; we used the Java-based algorithms implemented in Weka [15] to perform classification with a feed-forward neural network. The third step integrated extraction and recognition into a fuzzy system, which is part of the ITS. For training and testing the neural network we used the RaFD corpus (Radboud Faces Database) [16], a database of 8,040 different facial expression images covering a set of 67 models, both men and women. Once the emotional state is extracted from the student, it is sent to the fuzzy system. The fuzzy system takes the emotion value together with other parameters from the student's exercise, such as time, number of errors, and requests for help, and produces a math exercise (see Figure 5). The difficulty of the math exercise depends on the parameters entering the fuzzy system.
Fig. 5. Feature Extraction and Emotion Classification
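A minimal sketch of the training step with Weka's MultilayerPerceptron follows. The ARFF file name and the network options are assumptions for illustration; the paper does not specify the exact configuration used.

```java
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class EmotionTrainer {
    public static void main(String[] args) throws Exception {
        // Hypothetical ARFF file: facial features labeled with the five emotions.
        Instances data = DataSource.read("rafd_features.arff");
        data.setClassIndex(data.numAttributes() - 1);  // last attribute = emotion

        MultilayerPerceptron mlp = new MultilayerPerceptron();
        mlp.setHiddenLayers("a");   // one hidden layer: (attributes + classes) / 2 units
        mlp.setTrainingTime(500);   // training epochs
        mlp.buildClassifier(data);

        // Classify one instance; Weka returns the index of the predicted class.
        double predicted = mlp.classifyInstance(data.instance(0));
        System.out.println("predicted: " + data.classAttribute().value((int) predicted));
    }
}
```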
3.2 The Fuzzy Expert System

In the ITS, a fuzzy expert system was implemented with a new knowledge tracing algorithm, which tracks the student's pedagogical states by applying a set of rules. The benefit of using fuzzy rules is that they allow inferences even when their conditions are only partially satisfied. As established above, the fuzzy system uses four input linguistic variables: error, help, time, and emotion. These variables are loaded when the student solves an exercise. The output variable of the fuzzy system is the difficulty and type of the next exercise. A sketch of this inference appears below.
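The sketch below shows how such an inference could work in plain Java, using triangular membership functions and weighted-average defuzzification. The membership breakpoints, the three example rules, and the numeric encoding of emotion are our illustrative assumptions, not the rule base actually used in the system.

```java
public class FuzzyExerciseSelector {

    // Triangular membership function with breakpoints a <= b <= c.
    static double tri(double x, double a, double b, double c) {
        if (x <= a || x >= c) return 0.0;
        return x < b ? (x - a) / (b - a) : (c - x) / (c - b);
    }

    /**
     * Maps the four input variables to a difficulty in [0, 1].
     * emotion is assumed to be encoded in [0, 1], negative to positive affect.
     */
    static double nextDifficulty(double errors, double help, double minutes, double emotion) {
        // Rule 1: IF errors are low AND emotion is positive THEN difficulty is high (0.9).
        double r1 = Math.min(tri(errors, -1, 0, 3), tri(emotion, 0.4, 1.0, 1.6));
        // Rule 2: IF errors are high OR help is frequent THEN difficulty is low (0.2).
        double r2 = Math.max(tri(errors, 2, 5, 9), tri(help, 1, 3, 6));
        // Rule 3: IF time is long AND emotion is negative THEN difficulty is medium-low (0.4).
        double r3 = Math.min(tri(minutes, 5, 12, 25), tri(emotion, -0.6, 0.0, 0.6));

        // Weighted-average defuzzification over the fired rules; a rule contributes
        // even when its conditions hold only partially.
        double num = r1 * 0.9 + r2 * 0.2 + r3 * 0.4;
        double den = r1 + r2 + r3;
        return den == 0 ? 0.5 : num / den;  // default: medium difficulty
    }

    public static void main(String[] args) {
        // Example: one error, one help request, 4 minutes, fairly positive emotion.
        System.out.printf("next difficulty: %.2f%n", nextDifficulty(1, 1, 4, 0.8));
    }
}
```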
4 Results
Figure 6 illustrates the interfaces of the software in its two versions: Android-based (top of the figure) and Web-based (bottom). The Android-based version shows the emotion recognition of a user, part of an exercise, and a message (left). The Web-based interface shows the social network, a math exercise (an integer division), and the pedagogical/affective agent. The social network provides all the functionality common in this type of Web 2.0 application (creating a user profile, creating a community, making friends, and accessing a course (the ITS), among others).
Fig. 6. Emotion recognition process in the software

Following a pretest-intervention-posttest evaluation design [17], we tested the classification precision of the neural network with two different tools, Weka and Matlab, and tested the intelligent tutoring system (Web-based version) with students of two schools (one public and one private) in Mexico.
The left part of Figure 7 shows the results of the neural network trained with Matlab. We created a two-layer feed-forward network with sigmoid hidden neurons and linear output neurons, trained with the Levenberg-Marquardt back-propagation algorithm. The regression values, which measure the correlation between outputs and targets, were very close to 1, indicating an almost perfect linear association between target and actual output values (independent and dependent variables). In other words, the values Y (actual outputs) predicted from X (target values) by the regression model coincide almost exactly with the observed values of Y, so very few prediction errors occur. The right part of Figure 7 shows the results with the Weka tool, where we can observe the error levels when applying the classifier to the RaFD corpus. We obtained excellent results, with a success rate of 96.9466 % in the emotion recognition process. The small prediction errors shown by Matlab are equivalent to the instances classified as incorrect in Weka (3.0534 %).
Fig. 7. Training and testing the neural network with Matlab and Weka
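Error rates like those in Figure 7 are typically obtained in Weka through cross-validation. A sketch of that measurement follows; the ARFF file name and the choice of 10 folds are our assumptions.

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class EmotionEvaluation {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("rafd_features.arff");  // hypothetical file
        data.setClassIndex(data.numAttributes() - 1);

        // 10-fold cross-validation of the feed-forward classifier.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new MultilayerPerceptron(), data, 10, new Random(1));

        System.out.printf("correctly classified:   %.4f %%%n", eval.pctCorrect());
        System.out.printf("incorrectly classified: %.4f %%%n", eval.pctIncorrect());
        System.out.println(eval.toMatrixString("Confusion matrix"));
    }
}
```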
Figure 8 shows the evaluation results for multiplication and division exercises with 33 students (9 from public schools and 24 from private schools, all in Culiacán, Mexico). We can observe progress in 27 students (the other six did not change their rating). Based on the results obtained with Weka, and because the tool is open source, we decided to integrate this classifier with the feature extractor and the intelligent tutoring system (as mentioned). Another reason for using Weka in the tutoring system is that its source code is Java, which is the programming language used to develop Android-based applications. The intelligent tutoring system and the fuzzy expert system were implemented with CSS3, HTML5, Java, and JSP. We have tested the Web application with real students and will soon begin testing the Android-based version.
Fig. 8. Results of using the ITS with 33 students
5 Conclusions and Future Work
The implementation of this work was a complex job because we integrated different technologies: a social network implemented with web programming languages (HTML, JavaScript, and Java Servlets) and a MySQL database, emotion recognition (Java programming with the OpenCV and JavaCV libraries), feed-forward neural networks (taken from the Weka software and adapted to our system), a fuzzy system also implemented in Java, and the software for Android-based mobile devices (Java for Android). As future work, we are considering creating our own corpus of emotions oriented to the teaching-learning process, covering more mathematics material in the ITS, and conducting further experiments with more students.

Funding. The work described in this paper is fully supported by a grant from the DGEST (Dirección General de Educación Superior Tecnológica) in Mexico under the program "Projects of Scientific Research and Technological Innovation". Additionally, this work was sponsored by the National Council of Science and Technology (CONACYT) and the Public Education Secretary (SEP) through PROMEP.
References

1. O'Reilly, T.: What is Web 2.0 (2005), www.oreillynet.com
2. Hage, H., Aimeur, E.: Harnessing Learner's Collective Intelligence: A Web 2.0 Approach to E-Learning. In: Proceedings of the 9th International Conference on Intelligent Tutoring Systems (ITS 2008), 438-447. Springer-Verlag, Berlin Heidelberg (2008)
3. Boyd, D., Ellison, N. B.: Social network sites: Definition, history and scholarship. Journal of Computer-Mediated Communication 13(1), article 11 (2007), http://jcmc.indiana.edu/vol13/issue1/boyd.ellison.html
4. Picard, R. W.: Affective Computing. M.I.T. Media Laboratory Perceptual Computing Section Technical Report No. 321 (1995)
5. Arroyo, I., Woolf, B., Cooper, D., Burleson, W., Muldner, K., Christopherson, R.: Emotion sensors go to school. In: Proceedings of the 14th International Conference on Artificial Intelligence in Education, 17-24. IOS Press, Amsterdam (2009)
6. Calvo, R. A., D'Mello, S.: Affect Detection: An Interdisciplinary Review of Models, Methods, and Their Applications. IEEE Transactions on Affective Computing 1, 18-37 (2010)
7. Baker, R. S. J. D., D'Mello, S. K., Rodrigo, M. M. T., Graesser, A. C.: Better to be Frustrated than Bored: The Incidence, Persistence, and Impact of Learners' Cognitive-Affective States During Interactions with Three Different Computer-Based Learning Environments. International Journal of Human-Computer Studies 68(4), 223-241 (2010)
8. Sabourin, J., Rowe, J. P., Mott, B. W., Lester, J. C.: When Off-Task is On-Task: The Affective Role of Off-Task Behavior in Narrative-Centered Learning Environments. In: Biswas, G., Bull, S., Kay, J., Mitrovic, A. (eds.) AIED 2011. LNCS, vol. 6738, 534-536. Springer, Heidelberg (2011)
9. Gardner, L., Sheridan, D., White, D.: A Web-based learning and assessment system to support flexible education. Journal of Computer Assisted Learning 18, 125-136 (2002)
10. Costa, D. S. J., Mullan, B. A., Kothe, E. J., Butow, P.: A web-based formative assessment tool for Masters students: a pilot study. Computers & Education 54(4), 1248-1253 (2010)
11. Chen, G. D., Chang, C. K., Wang, C. Y.: Ubiquitous learning website: scaffold learners by mobile devices with information-aware techniques. Computers & Education 50, 77-90 (2008)
12. Nixon, M., Aguado, A.: Feature Extraction & Image Processing, Second edition. Academic Press (2008)
13. Doignon, J.-P., Falmagne, J. C.: Knowledge Spaces. Springer-Verlag (1999)
14. Ekman, P., Oster, H.: Facial expressions of emotion. Annual Review of Psychology 30, 527-554 (1979)
15. Weka Official Homepage. University of Waikato, New Zealand, http://www.cs.waikato.ac.nz/ml/weka/
16. Langner, O., Dotsch, R., Bijlstra, G., Wigboldus, D., Hawk, S., van Knippenberg, A.: Presentation and validation of the Radboud Faces Database. Cognition & Emotion 24(8), 1377-1388 (2010), DOI: 10.1080/02699930903485076
17. Ainsworth, S.: Evaluation methods for learning environments. Tutorial at AIED 2005 (2005), http://www.psychology.nottingham.ac.uk/staff/Shaaron.Ainsworth/aied_tutorialslides2005.pdf