EEG-based classification of positive and negative affective states

Maja Stikic, Robin R. Johnson, Veasna Tan, Chris Berka
Advanced Brain Monitoring Inc., Carlsbad, CA, USA

Correspondence: Maja Stikic, Advanced Brain Monitoring Inc., 2237 Faraday Avenue, Suite 100, Carlsbad, CA 92008, USA, email: [email protected], phone: (760) 720 0099

Maja Stikic received her PhD degree in computer science from the Technical University of Darmstadt, Germany, in 2009. Her thesis was concerned with scalable recognition of daily activities for context-aware applications in the field of wearable and ubiquitous computing. Since 2009 she has been a research scientist at Advanced Brain Monitoring, Inc., Carlsbad, California, working on EEG-based predictive algorithms for cognitive performance decrements and workload assessment, team performance, and leadership emergence. Her research interests include machine learning, human-computer interaction, signal processing, neuroscience, and electroencephalography.

Robin R. Johnson, PhD, is Director of Psychophysiological Research at Advanced Brain Monitoring. Dr. Johnson's interests focus on cognitive neuroscience applications in stress and fatigue management; assessment, diagnostics, and treatment efficacy assessment of chronic disease states that result in cognitive impairment; training; and education. Her graduate work focused on psychoneuroimmunology, studying the effects of social stress on the disease process in an animal model of multiple sclerosis, including the behavioral deficits, with an emphasis on the basic mechanisms underlying learning and memory. Dr. Johnson has extensive experience in project management, study design, and behavioral analytics from her graduate work. For the past 7 years she has managed multiple human research projects for Advanced Brain Monitoring, Inc. in the field of cognitive neuroscience.

Veasna Tan, M.A., received her MA degree in Psychology from Pace University in New York, NY and her BA in Psychology from Syracuse University in Syracuse, New York. She has worked on research based in cognitive psychology and test administration. She is currently a Project Manager at Advanced Brain Monitoring. Her responsibilities include IRB oversight for multiple projects, assistance with grant proposals, management of participants, data analysis, and study design.

Chris Berka, CEO and Co-Founder of Advanced Brain Monitoring, has over 25 years of experience managing clinical research and developing and commercializing new technologies.

She is co-inventor of eight patented and eleven patent-pending technologies and is the principal investigator or co-investigator for grants awarded by the National Institutes of Health, DARPA, ONR, OSD, and NSF that have provided more than $20 million in research funds to Advanced Brain Monitoring. Ms. Berka played a key role in the growth of an AMEX public company that patented and commercialized a forensic test to liquefy hair to identify drug use, recruiting a team of scientific advisors, implementing clinical trials, and co-authoring a business plan that raised over $2.5 million. She has 10 years of experience as a research scientist, with publications on the analysis of the EEG correlates of cognition in healthy participants and patients with sleep and neurological disorders. She received her B.A. with distinction in Psychology/Biology from Ohio State University and completed graduate studies in Neuroscience at the University of California, San Diego.

EEG-based classification of positive and negative affective states

This study aimed to identify the neurophysiological correlates of two primary aroused affective states related to positive and negative emotions, and to create a classification model for each second of data. General and individualized models were built on EEG data recorded from 98 healthy participants while they watched two contrasting ~20-minute videos, one to elicit a negative affective state and the other to induce positive affect. A comparative performance analysis indicated that the individualized models outperformed the general model by 20% and achieved an accuracy of over 90%. The final classifiers were cross-validated on an additional 63 participants and achieved similar results, indicating that the models were not overfitted to the original data. The generalization capability of the classifiers was further estimated in a related study in which 63 participants returned to watch videos that incorporated narratives with varying levels of fairness, justice, and character identification.

Keywords: emotion recognition; physiological signals; electroencephalography; discriminant function analysis

1. Introduction

Affective Brain Computer Interfaces (aBCI) have attracted increasing attention in recent years due to their potential to enable a number of compelling applications in domains such as healthcare, entertainment, marketing, education, and the arts. [1-4] Emotion recognition has emerged as a notable research topic in this field because it provides a window on the internal mental state of the user, an elemental component in the development of more intuitive, adaptive human-computer interaction techniques. A wide range of techniques is used for automatic recognition of affective states based on facial expressions, body language, or verbal cues such as changes in the tone of speech or voice intonation. [5-9]
These existing methods, however, are all limited by the main challenge in emotion recognition: observation of external indicators of emotion fails to account for the subconscious component of emotion, thoughts, and veiled feelings and, more importantly, can easily be subject to deception. For this reason, researchers have begun exploring the underlying neural constructs governing human emotion in search of physiological explanations for psychological phenomena. [10,11] The use of physiological measures was limited in the past by the obtrusive nature of earlier instrumentation, but this has changed with the advent of miniaturized sensors and embedded platforms capable of supporting complex signal processing techniques in real time. Physiological signals typically used to derive emotional biomarkers include electrocardiography (ECG) [12,13], electrodermal activity (EDA) [14], electromyography (EMG) [15], skin temperature [16], or multi-modal combinations of these [17]. These modalities capture physiological changes indicative of anxiety, and are therefore an indirect reflection of changes in emotional states. Neuroimaging techniques such as electroencephalography (EEG) could provide further insight into the internal affective states of the user. Compared to other neuroimaging techniques, EEG-based emotion assessment is inexpensive and provides continuous measures with high temporal resolution. [4,18,19]

Studies on emotion recognition commonly draw on two primary emotional concepts. The first assumes that more complex emotions are made up of a finite array of "basic emotions" and thereby focuses on a discrete set of basic emotional constructs, such as anger, fear, sadness, happiness, or surprise. [20-23] The second approach uses a two-dimensional scale based on valence (ranging from negative/unpleasant to positive/pleasant) and arousal (ranging from calm to excited). [24-26] The majority of emotion recognition approaches (e.g., [4,21,27,28]) employ the latter valence-arousal model due to the universality of this two-dimensional model. A thorough review of studies on the neurophysiological correlates of the valence and arousal of emotions is presented in [29].

Different methods of emotion elicitation have been proposed in order to obtain data that correspond to a particular emotional state. These include participant-elicited methods (in which the participants are asked to either mimic a particular emotion or recall that emotion from the past) and event-elicited methods (in which the participant is presented with a set of stimuli such as images, sounds, or video clips designed to induce the emotion of interest). The participant-elicited methods are often used for emotion recognition based on facial expressions or speech, while the event-elicited methods are more convenient for neurophysiological approaches. Different brain regions are activated during the presentation of different stimulus modalities. [30] Two validated and commonly used databases of visual and audio stimuli are the International Affective Picture System (IAPS) [31] and the International Affective Digitized Sounds (IADS) [32]; both have been successfully administered in a number of studies. [4,33-35] Although normative ratings have been established for both the IAPS and IADS, studies continue to gather and utilize subjective responses from each individual in order to take individual differences into account. [28,36] As past experiences play an important role in emotion elicitation, a single stimulus can result in divergent ratings. An extensive comparison of different studies in [37] showed that [28,36] obtained higher performance scores independent of stimulus type. Interestingly, these two studies used continuous stimuli, music [28] and movies [36], to elicit the targeted emotions. These two stimulus types have been successfully applied in other studies as well (e.g., [4,18]). Lately, there have also been attempts to collect a unified database of music videos for emotion induction. [38]

A wide range of machine learning methods has been employed in the field of emotion recognition, ranging from linear discriminant functions [28], through k-nearest neighbor (kNN), multilayer perceptron (MLP), and Support Vector Machines (SVM) (all evaluated in [18]), to fractal-based approaches [4] and regression techniques [39]. The major challenge for all of these methods has been the inter-participant and intra-participant variability inherent in physiological data. Typically, efforts to address individual variability apply data normalization techniques to account for these differences. [36] A recent study examined intra-participant variability by analyzing the physiological variations of a single participant. [17] Its findings indicate that features of different emotions recorded on the same day tend to cluster more tightly than features of the same emotion recorded on different days. The majority of studies, however, include a relatively small number of participants and thus do not allow a detailed evaluation of the generalization capabilities of the trained models.

This paper presents the results of an EEG-based study with a larger cohort of 161 participants that develops a classification model for two classes of mainly aroused emotions: negative and positive affective states (in terms of their valence). These two states were induced by continuous stimuli (the same videos used in an already established protocol [40-42]). Previous studies have shown rich EEG-based correlates of these relatively complex affective states. [38,43,44] For this reason, we used relatively high spatial resolution, covering all relevant scalp regions and recording the EEG data with 20 electrodes, to identify which parts of the brain are most activated when these two affective states are elicited. Furthermore, in order to develop an algorithm suitable for real-time monitoring in real-world settings, we did not apply computationally expensive approaches. Moreover, we investigated several ways of overcoming individual differences in EEG signals by building both general and individualized models on a large sample, cross-validating them, and analyzing their benefits and shortcomings.

Lastly, the generalization capability of the classifiers was further estimated in an experiment (comprising 63 participants) involving narrative storytelling followed by a charity donation questionnaire. The goals of this experiment were to examine the effects of varying narratives on affective states, the potential dependency of charity donations on the affective states previously induced by the narratives, and lastly to identify and characterize the corresponding neural correlates. So far, very little narrative storytelling research (e.g., [2]) has been directed towards neurocognitive responses to story and to specific story elements in order to establish the internal causation and linkages for the externally observed effects. One of the most comprehensive compendiums of story-based research from a variety of multi-disciplinary science fields is presented in [45].

2. Methods

In this section, we outline the study protocols, including both the training and testing datasets, and we detail the acquisition system together with the signal processing, data analysis, and evaluation procedures.

2.1. Participants

The study protocol was approved by the Chesapeake IRB (Columbia, MD). Healthy participants between the ages of 18 and 70 were recruited through ads posted on the web. All participants voluntarily responded to the web posting or found out about the study through word of mouth. Each participant was compensated $20 per hour for their time after completion of the study. The study protocol comprised a telephone pre-screening, an orientation, and two separate experimental sessions. The eligibility pre-screening criteria briefly covered anything known to alter EEG signals.

Anyone with general health problems such as psychiatric, neurological, behavioral, attention, or sleep disorders, as well as any pulmonary or eating disorder, diabetes, high blood pressure, or a history of stroke was excluded from the study. Those who reported regularly using pain medications, stimulants such as amphetamines, or illicit drugs, or consuming excessive alcohol or tobacco on a daily basis were also excluded. In addition, those with head injuries within the past 5 years or who were pregnant were not included in the study. Interested callers who passed this initial screener were scheduled for the orientation session. Out of the total of 167 participants who passed the screening criteria, 161 completed the emotion elicitation experimental session; six participants failed to comply with study protocols, voluntarily withdrew during the course of the study, or had poor EEG data quality. A summary of the participant demographics revealed that 52% were female, and the overall average age was 36.3 years. The first 98 participants were used for the initial building of the emotion classifier. As the study progressed, an additional 63 participants completed the study and were used for cross-validation. A subset of 63 participants (57% female, average age 38.2) from the 161 participants used to build and cross-validate the classifier were able and willing to return during the course of the study and complete the narrative storytelling experiment. The resulting data were utilized for an additional estimation of the emotion classifier's generalization capabilities.

2.2. Study protocols

The orientation session comprised the informed consent procedure followed by a computerized survey consisting of demographic information and a series of subjective psychological questionnaires used to eliminate subjects at high risk of anxiety, stress, or depression. The orientation session lasted approximately 2 hours. The participants were then scheduled to arrive at the research facility between 8:00am and 9:00am for the experimental sessions.

They were asked to ensure they got a full 7.5-9 hours of sleep on the nights leading up to their appointments. They were also instructed to try to stay still during the experimental sessions to reduce the amount of movement, which is known to contaminate EEG.

2.2.1. Experimental session 1: positive/negative affective states dataset

The testbed utilized commercially available videos (Figure 1) to elicit the affective states of interest, based on prior studies using similar protocols [40-42]. Positive affect was induced by humorous clips from "America's Funniest Home Videos" (AFV), and negative affect by a battle scene from the war drama "Saving Private Ryan" (SPR). The positive video consisted of the opening theme song (lasting approximately 1 minute), followed by a 16-minute compilation of AFV clips. The 19-minute negative video began with the approximately 2-minute opening scene of SPR (the visit to the cemetery of the fallen comrades), followed by the "Omaha Beach" battle scene. The introductory parts of the videos were not used in the data analysis. In order to avoid carryover effects, the presentation order of the videos was counterbalanced: half of the participants viewed the full 19-minute SPR video first followed by the full 16-minute AFV video, and the other half viewed AFV first followed by SPR. After each video, the participants were asked to rate it in terms of valence (negative, neutral, or positive) and arousal (on a 1-10 scale). The videos were presented using E-Prime®, developed by Psychology Software Tools Inc., Sharpsburg, PA.

2.2.2. Experimental session 2: narrative storytelling dataset

The goal of this experiment was to identify and characterize the neural correlates of the effects of narratives on affective states. The participants watched a ~19-minute video narrative (Figure 2) built around the archetypal themes of fairness and justice, situated in a contemporary and cross-culturally applicable context.

Specifically, the story involved themes of injustice against women, illegal immigrants, and people with disabilities in order to elicit strong negative emotional responses. The 11-part story was developed with 3 variable segments to enable alternate character descriptions that could potentially increase or decrease empathy and character identification for both the main character and the antagonist. The last variable segment further allowed the story resolution to diverge between two versions: a "most just" or "least just" ending for the main character. Although the story was deliberately intended to elicit an overall negative reaction, the variable segments may have increased or decreased the level of that reaction. Each segment of the story was specifically designed to engage the participant and sustain attention. The introduction provided background material that included the theme of illegal immigration. As the narrative continued, the main sympathetic character, Maria, was introduced either as a strong independent woman or as a victim. The next 4 segments remained the same in both story versions; during these segments a supporting character, Freight, was introduced through both a mild sense of humor and conflict. The seventh story segment varied in its introduction of the antagonist, Ramon. Ramon was presented as a negative character; however, depending on the story version, the audience was given either a very negative view of his character or a mildly less negative view. The following 3 segments did not vary between story versions and described the main conflict between Maria and Ramon. The final story segment contained the two variable versions of the resolution: "least just", in which the antagonist was not punished for the crime he committed, and "most just", in which the antagonist could not escape justice. Overall, the "most just" version depicted Maria as the strong independent woman and Ramon as mildly negative, and the "least just" version depicted Maria as the victim and Ramon as very negative.

Participants watched either the "least just" (N=30) or the "most just" (N=33) version of the story. Afterwards, participants completed a set of post-video questionnaires that asked if they would like to donate to a particular charity out of a list of 3 foundations related to the narrative. Charity donations would come directly out of their compensation for participation.

2.3. Data recording and signal processing

EEG was acquired using the B-Alert® X24 wireless sensor headset (Advanced Brain Monitoring Inc., Carlsbad, CA) shown in Figure 3. This system has 20 referential EEG channels: 19 located according to the International 10-20 system at the Fp1, Fp2, Fz, F3, F4, F7, F8, T3, T4, T5, T6, Cz, C3, C4, Pz, P3, P4, O1, and O2 sites, plus an additional referential EEG channel located at POz (Figure 4). Linked reference electrodes were located behind each ear on the mastoid bone. The EEG signals were sampled at 256 Hz after being filtered with a 0.1 Hz high-pass filter and a 100 Hz low-pass filter. Data were captured unobtrusively and transmitted wirelessly via Bluetooth to a host computer, where acquisition software stored the data. Sharp notch filters were applied to remove environmental interference from the power network. The proprietary acquisition software also included algorithms [46] that detect artifacts in the time-domain EEG signal, such as spikes, amplifier saturation, or excursions that occur during the onset or recovery of saturations. Eye blinks were identified and decontaminated by a wavelet-transform-based algorithm [46]. The decontaminated EEG signal was transformed from the time domain to the frequency domain for further analysis by applying the fast Fourier transform (FFT), and log-transformed power spectral densities (PSDs) were calculated for each 1-second epoch. Next, excessive muscle activity (EMG), which is particularly relevant for aBCI systems, was detected by identifying epochs in which a) the PSD bins from 35 Hz to 40 Hz were above a certain threshold and b) the square root of the sum of the PSD bins from 70 Hz to 128 Hz was above a defined cut-off value. The threshold values were customized during the EEG sensor development. Epochs with detected EMG artifact were discarded from further analysis.
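For illustration only, a minimal Python sketch of this per-epoch PSD computation and EMG screening is given below. It is not the proprietary B-Alert pipeline: the threshold constants (EMG_GAMMA_THR, EMG_HF_THR) are placeholders, since the actual cut-offs were customized during sensor development, and applying criterion (a) to the mean of the 35-40 Hz bins is an assumption.

```python
# Sketch only: per-epoch log-PSD computation and EMG screening as described
# above. Threshold values are placeholders (assumptions), not the customized
# cut-offs used with the B-Alert headset.
import numpy as np
from scipy.signal import welch

FS = 256              # sampling rate (Hz)
EMG_GAMMA_THR = 2.0   # assumed threshold for the 35-40 Hz bins
EMG_HF_THR = 5.0      # assumed cut-off for sqrt of summed 70-128 Hz power

def log_psd(epoch):
    """Log-transformed PSD of a 1-s, single-channel epoch (256 samples)."""
    freqs, psd = welch(epoch, fs=FS, nperseg=FS)   # 1 Hz bins, 0-128 Hz
    return freqs, np.log10(psd + 1e-12)

def has_emg(epoch):
    """Apply the two EMG criteria described in the text to one epoch."""
    freqs, psd = welch(epoch, fs=FS, nperseg=FS)
    gamma_bins = psd[(freqs >= 35) & (freqs <= 40)]
    hf_bins = psd[(freqs >= 70) & (freqs <= 128)]
    # Treating criterion (a) as the mean of the 35-40 Hz bins is an assumption.
    return (np.log10(gamma_bins.mean() + 1e-12) > EMG_GAMMA_THR and
            np.sqrt(hf_bins.sum()) > EMG_HF_THR)
```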

2.4. Data analysis

We utilized and compared the effectiveness of two types of features: PSDs and wavelet coefficients. The PSD bins from 1 Hz to 40 Hz were grouped into the standard EEG bandwidths (i.e., delta, theta slow/fast/total, alpha slow/fast/total, sigma, beta, and gamma). Bins above 40 Hz were intentionally excluded from the analysis to minimize potential EMG artifacts, and the gamma band used in the analysis was therefore cut off at 40 Hz. This part of the feature vector comprised 200 variables (10 PSD bandwidths for 20 EEG channels). Next, the PSD-bandwidth variables were also summarized over all EEG channels and over the frontal, central, parietal, occipital, temporal left/right, midline, and left/right brain regions; this yielded 100 variables (10 PSD bandwidths over 10 regions). Lastly, wavelet coefficients were derived for each EEG channel in the dyadic 0-2, 2-4, 4-8, 8-16, 16-32, and 32-64 Hz bands by applying the Coiflet wavelet filter [47]; this added 120 variables (20 EEG channels over 6 bands). All 420 variables were extracted on a second-by-second basis.
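As a rough sketch (not the authors' implementation), the per-epoch, per-channel feature extraction could be organized as follows. The exact band edges, the Coiflet order ('coif1'), and the use of band energy as the wavelet summary are all assumptions.

```python
# Sketch only: PSD bandwidth grouping and Coiflet wavelet-band features for one
# 1-s, single-channel epoch. Band edges and wavelet order are assumed, not
# taken from the study.
import numpy as np
import pywt

BANDS = {            # assumed edges for the bandwidths named in the text
    "delta": (1, 3), "theta_slow": (3, 5), "theta_fast": (5, 7),
    "theta_total": (3, 7), "alpha_slow": (8, 10), "alpha_fast": (10, 13),
    "alpha_total": (8, 13), "sigma": (12, 15), "beta": (13, 30),
    "gamma": (30, 40),
}

def psd_band_features(log_psd, freqs):
    """Group the 1 Hz log-PSD bins of one channel into the 10 bandwidths
    (summed here; averaging would be an equally plausible summary)."""
    return {name: log_psd[(freqs >= lo) & (freqs <= hi)].sum()
            for name, (lo, hi) in BANDS.items()}

def wavelet_band_features(epoch):
    """Coiflet decomposition of a 1-s epoch (fs = 256 Hz) into the dyadic
    0-2, 2-4, 4-8, 8-16, 16-32, and 32-64 Hz bands. For such short epochs
    boundary effects are substantial; a longer window may be used in practice."""
    coeffs = pywt.wavedec(epoch, "coif1", level=6)
    # coeffs = [cA6 (0-2 Hz), cD6 (2-4), cD5 (4-8), cD4 (8-16), cD3 (16-32),
    #           cD2 (32-64), cD1 (64-128)]; the 64-128 Hz band is discarded.
    return [np.sum(c ** 2) for c in coeffs[:6]]   # band energy (assumed summary)
```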

The most discriminative variables were selected from the training set (the AFV/SPR dataset) using the step-wise discriminant analysis variable selection procedure in the SAS software package. In each step, a set of F-tests was performed as the selection criterion to determine the explanatory power of the variables and to select which variables to include in or exclude from the model. The selection criterion was the 0.05 significance level of an F-test from the analysis of covariance, where the variables already in the model acted as covariates and the variable under consideration was the dependent variable. Furthermore, a singularity criterion was applied at each step to preclude the entry of variables whose squared correlation with the variables already in the model was significant (p = 0.05).

The selected variables were then utilized for building the general model of positive and negative affective states. Two different classification approaches were applied: linear Discriminant Function Analysis (lDFA) and quadratic Discriminant Function Analysis (qDFA) [48]. The classifiers fit a multivariate normal density to each class of interest, with either a pooled covariance estimate (lDFA) or covariance matrices stratified by class (qDFA). Individual variability in the EEG data could confound the EEG-based assessment of affective states. To account for the variability of individual physiological signatures of affective states, we also built an individualized model for each participant by training the model on that participant's data, using the same set of selected variables as in the general model.
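For illustration, the sketch below shows how the general and individualized lDFA/qDFA models could be reproduced with scikit-learn's discriminant analysis classifiers, which are a stand-in for the SAS implementation actually used. X, y, and participant_ids are assumed arrays of selected features, per-second class labels, and participant identifiers.

```python
# Sketch only: pooled-covariance (lDFA) vs. per-class-covariance (qDFA)
# classifiers, fit either on all participants (general) or per participant
# (individualized). Inputs X, y, participant_ids are assumed.
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

def fit_general_models(X, y):
    """General models trained on the pooled data of all participants."""
    ldfa = LinearDiscriminantAnalysis().fit(X, y)
    qdfa = QuadraticDiscriminantAnalysis().fit(X, y)
    return ldfa, qdfa

def fit_individualized_models(X, y, participant_ids):
    """One lDFA/qDFA pair per participant, trained on that participant's
    epochs only, re-using the same selected variables as the general model."""
    models = {}
    for pid in np.unique(participant_ids):
        mask = participant_ids == pid
        models[pid] = fit_general_models(X[mask], y[mask])
    return models
```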

2.5. Evaluation procedure

The data from the initial positive/negative affective states dataset (N=98 participants, more than 187,000 instances) were used for model development, i.e., for training the general model. The trained model was evaluated in several ways. First, in the initial model development phase, auto-validation was used to test the feasibility of the model by testing it on the training data. Afterwards, the generalization capability of the model was assessed by leave-one-participant-out cross-validation, i.e., by testing it on data that were not used for training: the model was trained on the positive/negative data from 97 participants and then tested on the remaining participant's data, the procedure was repeated for all 98 participants, and the results were averaged across all cross-validation rounds. Furthermore, the leave-one-participant-out cross-validation was generalized to leave-n-participants-out cross-validation to investigate the behavior of the classifier for increasing n. In this case, data from n randomly selected participants were used for testing and the remaining 98-n participants were used for training; this was successively repeated until data from all participants had been utilized for testing. Finally, the general model was further cross-validated on the 112,800 test instances from the additional data recording in which 63 participants watched the same set of videos (i.e., AFV and SPR).

Moreover, for each participant (N=98), the lDFA and qDFA models were individualized by training them on the participant's positive/negative data and evaluating them in terms of sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy. These classifiers utilized the same set of features as the general model (selected on the N=98 participants), so the generalizability of the selected features was in this case validated on the same N=63 participants that were used for cross-validation of the general model. In addition, we explored the possibility of using shorter video sequences for training the individualized models, by using the first 1 to 15 minutes of the AFV and SPR videos for training and the remainder of the videos for testing.

Lastly, the generalization of the classifiers was estimated on the narrative storytelling dataset. Each story segment was classified with both the general and the individualized models based on the participant's EEG data. In order to assess the participant's affective state (i.e., positive or negative), the second-by-second class posterior probabilities were averaged over each story segment. These averaged story segment classifications were then further analyzed in order to examine the classifiers' ability to track the subtle differences between the "least just" and "most just" story versions in terms of their varying levels of fairness, justice, and character identification. The averaged story segment classifications were also utilized in an lDFA classification to predict whether the participants would donate to the charity.
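A compact sketch of the participant-wise cross-validation schemes, using scikit-learn's group-aware splitters, is shown below. It is an approximation of the procedure described above (the study cycles through random subsets until every participant has been tested, whereas GroupShuffleSplit simply draws a fixed number of random splits), and X, y, and participant_ids are the assumed arrays from the previous sketches.

```python
# Sketch only: leave-one-participant-out and leave-n-participants-out
# cross-validation of the general lDFA model, grouped by participant.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneGroupOut, GroupShuffleSplit

def leave_one_participant_out(X, y, participant_ids):
    accs = []
    for train, test in LeaveOneGroupOut().split(X, y, groups=participant_ids):
        model = LinearDiscriminantAnalysis().fit(X[train], y[train])
        accs.append(model.score(X[test], y[test]))
    return np.mean(accs)

def leave_n_participants_out(X, y, participant_ids, n, n_splits=10, seed=0):
    # Hold out n randomly chosen participants per split (approximation of the
    # scheme behind Figure 5).
    splitter = GroupShuffleSplit(n_splits=n_splits, test_size=n, random_state=seed)
    accs = []
    for train, test in splitter.split(X, y, groups=participant_ids):
        model = LinearDiscriminantAnalysis().fit(X[train], y[train])
        accs.append(model.score(X[test], y[test]))
    return np.mean(accs)
```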

3. Results

In this section, we present the variable selection results, the model development and cross-validation results of the classifiers, the subjective ratings of the AFV and SPR videos, and the narrative storytelling data analysis results.

3.1. Variable selection results

Overall, 48 variables were selected as significant by the step-wise analysis. Table 1 shows the first 30 selected variables, since the remaining selected variables had notably lower F-test scores. These variables were selected on the entire dataset, and they were used for both the general and the individualized models. To test their stability, we also selected the most discriminative variables a) for each fold of the leave-one-participant-out cross-validation and b) for each participant separately. Table 1 also shows how many times each variable was selected in these two cases. The selected variables changed little when a single participant was omitted: on average, each variable was selected in 70 of the 98 folds. On the other hand, the variables selected for each participant separately varied much more, and in this case the variables selected on the entire dataset were selected for 15 participants, on average. Overall, out of the 48 selected variables, the most predictive came from the gamma frequency band, followed by variables from the theta and beta bands. The majority of the selected variables came from the frontal, temporal, and prefrontal EEG sites (27%, 21%, and 10%, respectively). Wavelet coefficients were less dominant (23% of the selected variables) and were typically selected in the last steps of the step-wise variable selection.

As the EMG spectrum has a broad range, it is likely that EMG artifact was still present to some extent in the decontaminated EEG signal and in the most informative gamma PSD bandwidth. We investigated this further by identifying the sessions with a large proportion (>20%) of EMG-contaminated epochs. The gamma PSD bandwidths averaged over all sessions were compared to those obtained after the highly contaminated sessions (15% of the AFV and 5% of the SPR videos) were discarded. There were no statistically significant differences between the gamma values for these two groups, suggesting that EMG artifact did not greatly contaminate the gamma band.

3.2. Model development results

The classification results during model development (i.e., auto-validation) are shown in Table 2. The general and individualized lDFA and qDFA models were evaluated in terms of sensitivity, specificity, PPV, NPV, and accuracy. The general model was further cross-validated in a leave-one-participant-out manner; this approach is not applicable to the individualized models in this setting. Lastly, the leave-one-participant-out approach was extended to leave-n-participants-out.

The general model achieved reasonable classification results. During auto-validation, accuracy was 74.2% for lDFA and 70.7% for qDFA. During leave-one-participant-out cross-validation, there was no significant drop in the accuracy of the classifiers (lDFA: 73.0%, qDFA: 67.8%); the overall drops in the classification results were 1.1% and 3.7% for lDFA and qDFA, respectively. This indicated that the classifiers were not overfitted to the training data. When comparing the lDFA and qDFA classifiers, qDFA achieved higher specificity and PPV than lDFA, but the sensitivity and NPV of lDFA were higher; i.e., qDFA misclassified the positive class as negative more often, whereas lDFA misclassified the negative class as positive more often.

Ultimately, the lDFA classification results were more stable, and the overall accuracy of the lDFA classifier was higher by 3.5% and 5.2% during auto-validation and leave-one-participant-out cross-validation, respectively. Figure 5 shows the accuracy of the general lDFA and qDFA classifiers in the leave-n-participants-out experiments. We varied n from 2 up to 97, at which point the data of only a single participant were used for training and the trained classifier was tested on the remaining 97 participants. The plot shows that accuracy is stable for n < 80, suggesting that data from about 20 individuals could be sufficient for building a successful classifier.

Upon evaluation of the model efficacy for individual participants, we found that the general model worked well for many participants but was very weak for some. Accuracy was above 70% for 66% and 48% of the participants for lDFA and qDFA, respectively. However, for 17% and 25% of the participants, accuracy was below 60% for lDFA and qDFA, respectively, and for a few participants (3 for lDFA and 5 for qDFA) accuracy was below chance level. This indicated a large individual variability in the EEG data.

To account for the significant variability between individuals, we also built an individualized model for each participant. As Table 2 shows, the individualized models' performance was excellent (over 90%); the accuracy of the lDFA classifier was 92.8% and qDFA achieved an accuracy of 94.7%. Inspection of the individual participants' results showed that the variance was lower than for the general model and the overall performance was better. For all participants, accuracy was above 70% for lDFA and above 80% for qDFA. Accuracy over 90% was achieved for 79% and 83% of the participants for lDFA and qDFA, respectively.

Figure 6 shows the average accuracy of the individualized models that were trained on shorter portions of the AFV/SPR videos (1-15 minutes) and tested on the rest of the videos. The plot shows that using 6 minutes of the videos already yields classification results above 80%.

3.3. Cross-validation results

The cross-validation classification results are shown in Table 3. The results were in the same range as the model development results (general model: lDFA accuracy of 74.3% and qDFA accuracy of 69.4%; individualized models: both lDFA and qDFA accuracy of 94.5%), suggesting that the general model was not overfitted and that its selected variables were also effective for the individualized models of the cross-validation participants. Surprisingly, in some cases the cross-validation results were even slightly higher than the auto-validation results; however, these differences were not statistically significant. Again, the individualized models were more stable, as their standard deviation of accuracy was, on average, 13% lower than that of the general model.

3.4. Subjective ratings of the videos

The subjective ratings of the AFV and SPR videos were analyzed. The AFV video was rated as positive by the majority of the participants (94.6% and 98.4% of the participants during the model development and cross-validation phases, respectively). None of the model development participants rated the AFV video as negative, and none of the cross-validation participants rated it as neutral. Average arousal scores were 7 and 6.7 for the model development and cross-validation participants, respectively. The SPR video received slightly higher average arousal scores (7.5 from the model development participants and 7.7 from the cross-validation participants), but its valence ratings were less consistent.

The SPR video was rated as negative, neutral, or positive by 83%, 10.6%, and 6.4% of the model development participants, respectively. Similarly, 95%, 1.7%, and 3.3% of the cross-validation participants rated the SPR video as negative, neutral, or positive, respectively. In order to account for the individual differences in the subjective ratings, we ran an analysis in which we excluded the participants who rated the AFV video as negative or neutral and those who rated the SPR video as either positive or neutral. Furthermore, we also excluded the participants who assigned an arousal score below 6 to either of the videos. After exclusion of these participants, the classification results improved only slightly (by up to 2% and 5% during model development and cross-validation, respectively), and the differences were not statistically significant based on a two-tailed t-test (p > 0.05).

3.5. Narrative storytelling data analysis results

In this section, we present the estimation of the emotion classifiers' generalization capabilities on the narrative storytelling dataset. First, we investigated the differences in the predicted negative/positive affective state levels during the "least just" and "most just" versions of the story, and second, we examined whether these differences in the classifications are sufficient to predict whether the participants would donate to the charity.

3.5.1. Narrative storytelling segment classifications

Figure 7 shows the averaged probabilities of the negative class for each story segment of both the "least just" and "most just" story versions. Due to space limitations, we omit the positive class, as the posterior probabilities of the two classes of interest (i.e., positive and negative) sum to 1.

However, the averaged negative probabilities were relatively high (between 0.47 and 0.91), so one can conclude that the participants felt more negative during the narrative storytelling, which was expected given the story scenario. The plots show that the "most just" story version typically induced a lower level of negative affective state than the "least just" version. The difference was most notable for the individualized qDFA model; in that case, the last story segments, in which the different story endings were introduced, induced significantly different levels of negative emotion based on a t-test (p = 0.01).

3.5.2. Charity donation analysis

Next, the charity donation analysis was performed in three ways: (1) the numbers of charity donations by the participants who heard the "least just" and the "most just" story versions were compared; (2) the story segment classifications of the participants who donated to the charity and those who did not were compared (for both story versions); and (3) it was predicted whether the participants would donate to the charity based on their induced levels of positive/negative affective states during the story segments.

The donation analysis showed that participants who heard the "most just" story version donated to the charity more often than participants who heard the "least just" version: overall, 46.2% and 23.8% of the participants donated after the "most just" and "least just" story versions, respectively. Even though the negative emotion levels were overall slightly higher during the "least just" story, the "most just" story was designed to induce stronger feelings of justice, and participants who heard it were accordingly more willing to donate to the charity.

In Figure 8, the negative class levels averaged over the story segments are shown for all four models (i.e., general/individualized lDFA/qDFA), depending on the story version and on whether the participants donated to the charity. The plots show that, in the case of the "least just" story, the participants who donated had overall higher negative affective state levels than the participants who did not donate, for all four classification models. In the case of the "most just" story, the individualized models behaved the same way, but according to the general models the induced levels of negative emotion were lower for the participants who donated. However, the general models predicted similar negative emotion levels for the participants who did not donate for both story versions, "least just" and "most just".

Based on the negative affective state levels averaged over the story segments, we trained a classifier to predict whether the participants donated to the charity. Figure 7 shows that the "least just" and "most just" story versions induced different levels of emotion already during the first segment, although that segment was identical in both versions; the participants' reactions depended on their personal views of the presented information and their previous experiences with illegal immigration. To account for this difference, we used the data from the first segment as a baseline, and the posterior probabilities for the following 10 segments were subtracted from the baseline probability. Due to the relatively small sample size in this experiment, we trained only an lDFA classifier; however, we evaluated all four emotion models by training a separate donation classifier for each. The results are shown in Table 4. Overall, the general qDFA model characterized the participants who donated slightly better than the other models, achieving a classification accuracy of 72.3%. This model was further cross-validated in the leave-one-participant-out manner, and it achieved classification results only somewhat above chance level (52%).
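A minimal sketch of this donation-prediction step is given below. segment_probs (an assumed n_participants x 11 array of averaged negative-class probabilities) and donated (an assumed binary vector) are illustrative inputs, and scikit-learn's LDA stands in for the lDFA classifier used in the study.

```python
# Sketch only: baseline-referenced segment probabilities fed to an lDFA
# classifier to predict charity donations, evaluated leave-one-out.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

def donation_features(segment_probs):
    """Subtract the later 10 segment probabilities from the first-segment baseline."""
    baseline = segment_probs[:, :1]          # identical first segment
    return baseline - segment_probs[:, 1:]   # shape: (n_participants, 10)

def donation_accuracy(segment_probs, donated):
    X = donation_features(segment_probs)
    clf = LinearDiscriminantAnalysis()
    # With one row per participant, leave-one-out equals leave-one-participant-out.
    return cross_val_score(clf, X, donated, cv=LeaveOneOut()).mean()
```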

4. Discussion

In this paper, we have presented an extensive study and a comparative evaluation of different methods to discriminate between positive and negative affective states based on their EEG-derived profiles. We have contrasted general and individualized models by examining their effectiveness in accounting for the individual differences inherent in neurophysiological data. We have also compared the performance of the two different classifiers (lDFA and qDFA) in both settings, user-dependent and user-independent. Lastly, we have utilized self-ratings of the emotions induced during the AFV/SPR videos to validate the ground truth and potentially improve the results. In the following, we summarize the important criteria our algorithm had to meet in order to be applicable in real-world applications, and the achieved results. We also contrast the proposed algorithm with related work and discuss the main emotion recognition challenges identified during our study, as well as the limitations of the proposed approach and directions for future work.

In order to implement the emotion recognition algorithm in real time, the method should not be computationally expensive. We utilized relatively simple yet effective classifiers (lDFA and qDFA) and validated the generalizability of the trained classifiers in several ways. First, the models were evaluated in a leave-one-participant-out cross-validation manner on the AFV/SPR dataset comprising 98 participants, to test whether the algorithms were able to account for the individual differences in the data during the initial model development. Second, the final model was cross-validated on the AFV/SPR data from an additional 63 participants. The cross-validation results were in the same range as the model development results, indicating that the selected variables and classifiers were not overfitted to the training data.

Individualized models were able to discriminate between the negative and positive affective states with an accuracy of over 90% in all settings, outperforming the general models by more than 20%. Third, the current models were found to be effective not only across different participants, but to some extent across different emotion induction tasks as well. Their generalization capability was estimated on the data from the narrative storytelling experiment, in which 63 participants watched video narratives with varying levels of fairness, justice, and character identification. The preliminary results suggest that the positive/negative class estimates during different segments of the story could be helpful in predicting whether the participants would donate to the charity, but given the relatively low classification results in this experiment, further investigation is still needed.

Overall, the most predictive variables in our dataset came from the gamma frequency band. This is in line with [49-51], where it was shown that the gamma band is suitable for EEG-based emotion classification. However, the gamma band is also known to be contaminated by EMG due to the latter's broad spectral range. [52,53] We aimed to minimize the EMG influence on the gamma band by cutting it off at 40 Hz. Furthermore, we instructed the participants to keep movement to a minimum; however, as shown in Section 3.1, some AFV sessions were heavily contaminated with EMG, presumably due to mirthful laughter. This is difficult to avoid, especially when individuals behave naturally. Even if the gamma band was influenced by EMG to some extent, we believe that the results of this large study are still essential for aBCI.

In our study, we replicated the testbed from [40,50] based on the AFV and SPR videos, and our results agree with that study's conclusion that the gamma frequency band is a good indicator of the difference between positive and negative affective states. However, compared to [40,50], which analyzed only the PSD bandwidths over 9 EEG channels during the positive/negative affective states in a relatively small number of participants (up to 20), we: a) extended the study to a much larger number of participants (N=161) and 20 EEG channels, b) examined the individual differences in the EEG data, and c) developed a classifier that is able to recognize the two affective states of interest with high accuracy.

The current study sought to develop a method for positive/negative emotion recognition that could be easily applied in different aBCI-based applications, but this is only the first step in developing an emotion recognition system that can be implemented in real time and utilized in real-world settings. In order to bridge the gap between state-of-the-art emotion recognition approaches and the deployment of emotion recognition systems in the field, a few challenges still need to be overcome. All previous studies, including ours, have been performed under laboratory conditions; in order to move out of the lab, the algorithms need to be evaluated in real-world emotion elicitation situations. Our algorithm was only partially able to re-identify the neurophysiological correlates of positive/negative affective states during narrative storytelling and to predict charity donations. In the future, we plan to investigate this further and cross-validate the classifier on a new set of videos that induce less extreme positive/negative emotions than the AFV/SPR videos. This might also provide better insight into why some classification results were even higher during cross-validation in our study. In our study, we identified a larger number of movement-based artifacts than usual by utilizing our artifact decontamination algorithm, and we aimed to minimize their effect on the EEG data. However, in more naturalistic settings, a larger amount of EMG artifact is expected, and more robust artifact detection algorithms might be required.

Acknowledgements. This work was supported by the Defense Advanced Research Projects Agency (government contract number #D12PC00367). The views, opinions, and/or findings contained in this article are those of the authors and should not be
interpreted as representing the official views or policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the Department of Defense. The authors would like to thank Stephanie Korszen for excellent editing advice. References 1. Allison B., Graimann B., Graser A. Why Use a BCI if You are Healthy? In: Brainplay 07, Brain-Computer Interfaces and Games Workshop at ACE (Advances in Computer Entertainment), 2007. 2. Cavazza M., Pizzi D., Charles D., Cogt T., Andre E. Emotional Input for Character-based Interactive Storytelling. In: Proceedings of the 8th International Conference of Autonomous Agents and Multiagent Systems. 2009; 1: 313-320. 3. Bos D.P.O, Reuderink B., van de Laar, B., Gurkok H., Muhl C., Poel M., Nijholt A., Heylen D. Brain-Computer Interfacing and Games. Brain-Computer Interfaces HCI Series. 2010; 149-178. 4. Liu Y., Sourina O., Nguyen, M.K. Real-time EEG-based Emotion Recognition and its Applications. Transactions on Computational Science. 2011; 12:256-277. 5. Black M.J., Yacoob Y. Recognizing Expressions in Image Sequences Using Local Parameterizes Models of Image Motion. International Journal of Computer Vision. 1997; 25(1): 23-48. 6. Zeng Z., Pantic M., Glenn R.I., Huang T.S. A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions. IEEE Trans. on Pattern Analysis and Machine Intelligence. 2009; 31(1): 39-58. 7. Adolphs R. Recognizing Emotion from Facial Expressions: Psychological and Neurological Mechanisms. Behavioral and Cognitive Neuroscience Reviews. 2002; 1(1): 21-62. 8. Dellaert F., Polzin T., Waibel A. Recognizing Emotion in Speech. In: 4th International Conference on Spoken Language. 1996; 1970-1973. 9. Castellano G., Villalba S.D., Camurri A. Recognising Human Emotions from Body Movement and Gesture Dynamics. In: Affective Computing and Intelligent Interaction, LNCS. Vol. 4738. 2007; 71-28. 10. LeDoux J.E. Emotion Circuits in the Brain. Annual Review of Neuroscience. 2000; 23: 155184. 11. Nasoz F., Alvarez K., Lisetti C.L., Finkelstein N. Emotion Recognition from Physiological Signals for Presence Technologies. International Journal of Cognition, Technology, and Work - Special Issue on Presence. 2004; 6(1): 4-14.

12. Ya X., Guangyuan, L., Min H., Wanhui, W., Xiting, H. Analysis of Affective ECG Signals Towards Emotion Recognition. Journal of Electronics. 2010; 27(1): 8-14. 13. Rattanyu K., Mizukawa M. Emotion Recognition Based on ECG Signals for Service Robots in the Intelligent Space During Daily Life. Journal of Advanced Computational Intelligence and Intelligent Informatics. 2011; 15(5): 582-591. 14. Henriques R., Paiva A., Antunes C. Accessing Emotion Patterns from Affective Interactions using Electrodermal Activity. In: Affective Computing and Intelligent Interaction (ACII 2013). 15. Nakasone A., Predinger H., Ishizuka M. Emotion Recognition from Electromyography and Skin Conductance. In: The Fifth International Workshop on Biosignal Interpretation. 2005; 219-222. 16. Maaoui C., Pruski A., Abdat, F. Emotion Recognition for Human-Machine Communication. In: IEEE International Conference on Intelligent Robots and Systems. 2008; 1210-1215. 17. Picard R., Vyzas E., Healey J. Toward Machine Emotional Intelligence: Analysis of Affective Physiological State. IEEE Trans. on Pattern Analysis and Machine Intelligence. 2001; 23(10): 1175-1191. 18. Wang X.W., Nie D., Lu B.L. EEG-Based Emotion Recognition Using Frequency Domain Features and Support Vector Machines. In: Proceedings of the 18th International Conference on Neural Information Processing. 2011; 734-743. 19. Lin Y.P., Wang C.H., Jung T.P., Wu T.L. EEG-Based Emotion Recognition in Music Listening. IEEE Transactions on Biomedical Engineering. 2010; 57 (7): 1798-1806. 20. Izard C.E. Basic Emotions, Relations Among Emotions, and Emotion-Cognition Relations. Psychological Review. 1992; 99: 561-565. 21. Stein N.L., Oatley K. Basic Emotions: Theory and Measurement. Cognition and Emotion. 1992; 6: 161-168. 22. Ekman, P. Basic Emotions. Handbook of Cognition and Emotion, 1999; 45-60. 23. Plutchik, R. Emotion: Theory, Research, and Experience. Theories of Emotion 1, Vol. 1, New York, Academic; 1980. 24. Russell, J.A. Affective Space is Bipolar. Journal of Personality and Social Psychology. 1979; 37(3): 345-356. 25. Lang, P.J. The Emotion Probe: Studies of Motivation and Attention. American Psychologist. 1995; 50(5): 372-385. 26. Rubin D.C., Talerico, J.M. A Comparison of Dimensional Models of Emotion. Memory. 2009; 17: 803-808. 27. Chanel G., Ansari-Asi K., Pun T. Valence-Arousal Evaluation Using Physiological Signals in an Emotion Recall Paradigm. In: IEEE International Conference of Systems, Man and Cybernetics; 2007.

28. Wagner J., Kim J., Andre E. From Physiological Signals to Emotions: Implementing and Comparing Selected Methods for Feature Extraction and Classification. In: IEEE International Conference on Multimedia and Expo; 2005. 29. Muehl C. Toward Affective Brain-Computer Interfaces: Exploring the Neurophysiology of Affect During Human Media Interaction. [dissertation]. University of Twente; 2012. 30. Muhl C., Brouwer A.M., van Wouwe N.C., van den Broek E.L., Nijboer F., Heylen D.K.J. Modality-specific Affective Responses and their Implications for Affective BCI. In: International Conference on Brain-Computer Interfaces; 2011. 31. Lang P.J., Bradley M.M., Cuthbert B.N. International Affective Picture System (IAPS): Technical Manual and Affective Ratings. NIMH Center for the Study of Emotion and Attention; 1997. 32. Bradley M.M., Lang P.J. International Affective Digitized Sounds (IADS): Stimuli, Instruction Manual and Affective Ratings. The Center for Research in
Psychophysiology, University of Florida, Gainesville; 1999. 33. Bos D.O. EEG-based Emotion Recognition: The Influence of Visual and Auditory Stimuli. http://hmi.ewi.utwente.nl/verslagen/capita-selecta/CS-Oude_Bos-Danny.pdf 34. Haag A., Goronzy S., Schaich P., Williams J. Emotion Recognition using Bio-Sensors: First Steps Towards an Automatic System. Affective Dialogue Systems, Lecture Notes in Computer Science, Vol. 3068, 2004; 36-48. 35. Berka C., Tan V., Fidopiastis C., Skinner A., Martinez D., Johnson R.R. Enhancing Cultural Training Platforms with Integrated Psychophysiological Metrics of Cognitive and Affective States. In: AHFE, Advances in Design for Cross-Cultural Activities Part I, 2012; 229-238. 36. Lisetti C.L., Nasoz F. Using Noninvasive Wearable Computers to Recognize Human Emotions from Physiological Signals. EURASIP Journal of Applied Signal Processing. 2004; 11: 1672-1687. 37. Chanel G. Emotion Assessment for Affective Computing Based on Brain and Peripheral Signals. [dissertation]. Universite de Geneve; 2009. 38. Koelstra S., Muhl C., Soleymani M., Lee J.S., Yazdani A., Ebrahimi T., Pun T., Nijholt A., Patras I. DEAP: A Database for Emotion Analysis Using Physiological Signals. IEEE Transactions on Affective Computing. 2012: 3(1): 18-31. 39. Yang Y.H., Lin, Y.C., Su Y.F., Chen H.H. A Regression Approach to Music Emotion Recognition. IEEE Transactions on Audio, Speech, and Language Processing. 2008; 16(2): 448-457. 40. Berk L., Cavalcanti P., Bains G. EEG Brain Wave Band Differentiation During a Eustress State of Humor Associated Mirthful Laughter Compared to a Distress State. Experimental Biology, San Diego, CA; 2012.

41. Berk L.S., Tan S.A., Fry W.F., Napier B.J., Lee J.W., Hubbard R.W., Lewis J.E, Eby W.C. Neuroendocrine and Stress Hormone Changes During Mirthful Laughter. The American Journal of the Medical Sciences. 1989; 298(6): 390-396. 42. Miller M., Mangano C., Park Y., Goel R., Plotnick G. D., Vogel, R.A. Impact of Cinematic Viewing on Endothelial Function. Heart. 2006; 92(2): 261-262. 43. Tran Y., Thuraisingham R.A., Wijesuriya N., Nguyen H.T. Detecting Neural Changes During Stress and Fatigue Effectively: A Comparison of Spectral Analysis and Sample Theory. IEEE EMBS Conference on Neural Engineering; 2007. 44. Chanel G., Kronegg J., Grandjean D., Pun T. Emotion Assessment: Arousal Evaluation Using EEGs and Peripheral Physiological Signals. International Workshop on Multimedia Content Representation. 2005; 4105: 1052-1063. 45. Kendall H. Story Proof: The Science Behind the Startling Power of Story. 2009; Westport, CT: Libraries Unlimited. 46. Berka C., Levendowski D.J., Lumicao M.N., Yau A., Davis G., Zivkovic V.T., Olmstead R.E., Tremoulet P.D., Craven P. EEG Correlates of Task Engagement and Mental Workload in Vigilance, Learning, and Memory Tasks. Aviation, Space, and Environmental Medicine. 2007; 78(5): B231-244. 47. Wei D. Coiflet-Type Wavelets: Theory, Design, and Applications. [dissertation]. The University of Texas at Austin; 1998. 48. Hardle W., Simar L. Applied Multivariate Statistical Analysis. Springer Berlin Heidelberg; 2007. 49. Li M., Lu B.L. Emotion Classification Based on Gamma-Band EEG. In: IEEE Engineering in Medicine and Biology Society; 2009. 50. Berk L.S., Pawar P., Alphonso C., Rekapalli N., Arora R., Cavalcant, P. Humor Associated Mirthful Laughter Enhances Brain EEG Power Spectral Density Gamma Wave Band Activity (31-40Hz). In: Society for Neuroscience; 2013. 51. Aftanas L.I., Reva N.V., Varlamov, A.A., Pavlov S.V., Makhnev V.P. Analysis of Evoked EEG Synchronization and Desynchronization in Conditions of Emotional Activation in Humans: Temporal and Topographic Characteristics. Neuroscience and Behavioral Physiology. 2004; 34(8): 859-867. 52. Goncharova I., McFarland D.J, Vaughan T.M, Wolpaw J.R. EMG Contamination of EEG: Spectral and Topographical Characteristics. Clinical Neurophysiology; 2003, 114(9): 1580–1593. doi:10.1016/S1388-2457(03)00093-2 53. Van Boxtel A. Optimal Signal Bandwidth for the Recording of Surface EMG Activity of Facial, Jaw, Oral, and Neck Muscles. Psychophysiology; 2001, 38(1): 22–34.

Table 1. The first 30 variables selected by step-wise selection, with their F-test scores and their stability rankings (the number of times each variable was selected in the leave-one-participant-out cross-validation folds and in the individualized models).

Variable | F-test score | Ranking (leave-one-participant-out cross-validation) | Ranking (individualized models)
Fp2, gamma | 22582.4 | 98 | 37
T3, gamma | 16018.9 | 98 | 42
Temporal right, gamma | 5911.37 | 98 | 13
Fz, theta total | 2072.13 | 50 | 4
Fz, gamma | 1587.29 | 95 | 25
C4, gamma | 1160.16 | 94 | 11
All EEG-channels, delta | 1020.34 | 63 | 4
Fp2, beta | 942.09 | 80 | 18
Fp1, gamma | 904.17 | 98 | 39
Fp2, sigma | 847.84 | 95 | 16
O1, gamma | 774.46 | 84 | 21
Fz, alpha total | 773.21 | 72 | 5
O1, theta total | 770.27 | 41 | 5
T4, sigma | 479.85 | 95 | 5
C4, alpha slow | 465.27 | 59 | 2
T3, beta | 438.84 | 84 | 38
F4, gamma | 326.34 | 59 | 29
Frontal, alpha total | 302.28 | 20 | 0
P3, wavelet, 0-2 Hz | 296.27 | 54 | 11
All EEG-channels, gamma | 278.32 | 28 | 10
O2, gamma | 267.82 | 95 | 20
Fp1, wavelet, 0-2 Hz | 263.47 | 37 | 12
F7, theta fast | 253.97 | 57 | 5
T3, alpha total | 222.99 | 83 | 6
P3, beta | 216.35 | 45 | 7
F7, gamma | 195.47 | 87 | 20
Occipital, delta | 183.18 | 84 | 3
Fz, sigma | 170.46 | 76 | 3
T5, beta | 133.97 | 53 | 23
T4, beta | 35.6 | 22 | 37

Table 2. Model development classification results for the general and individualized lDFA and qDFA models averaged over 98 participants (mean ± standard deviation).

Model | Metric | General model (auto-validation) | General model (leave-one-participant-out cross-validation) | Individualized model (auto-validation)
lDFA | Sensitivity | 71.5% ± 25.3% | 70.5% ± 26.5% | 91.8% ± 6.6%
lDFA | Specificity | 76.1% ± 20.8% | 74.6% ± 22.1% | 93.8% ± 5.9%
lDFA | PPV | 76.0% ± 16.2% | 74.9% ± 16.6% | 93.4% ± 6.2%
lDFA | NPV | 77.2% ± 15.8% | 76.5% ± 16.0% | 92.1% ± 6.4%
lDFA | Accuracy | 74.2% ± 14.0% | 73.0% ± 14.1% | 92.8% ± 6.0%
qDFA | Sensitivity | 56.6% ± 28.8% | 52.3% ± 28.7% | 93.8% ± 6.9%
qDFA | Specificity | 86.4% ± 12.8% | 82.0% ± 16.5% | 95.7% ± 4.6%
qDFA | PPV | 78.6% ± 15.2% | 73.8% ± 17.2% | 95.4% ± 4.5%
qDFA | NPV | 69.3% ± 14.9% | 67.4% ± 15.2% | 94.4% ± 5.6%
qDFA | Accuracy | 70.7% ± 13.7% | 67.8% ± 13.8% | 94.7% ± 4.8%

Table 3. Cross-validation classification results for the general and individualized lDFA and qDFA models averaged over 63 participants (mean±standard deviation).

Model | Metric | General model | Individualized model
lDFA | Sensitivity | 70.3% ± 29.5% | 93.6% ± 4.5%
lDFA | Specificity | 77.8% ± 22.3% | 95.4% ± 4.6%
lDFA | PPV | 77.9% ± 15.5% | 95.3% ± 4.3%
lDFA | NPV | 77.5% ± 15.8% | 93.8% ± 4.2%
lDFA | Accuracy | 74.3% ± 13.5% | 94.5% ± 4.2%
qDFA | Sensitivity | 50.3% ± 30.2% | 92.7% ± 8.3%
qDFA | Specificity | 87.6% ± 13.3% | 96.0% ± 5.4%
qDFA | PPV | 79.6% ± 16.4% | 96.1% ± 4.3%
qDFA | NPV | 67.3% ± 15.5% | 93.5% ± 6.5%
qDFA | Accuracy | 69.4% ± 14.7% | 94.5% ± 4.8%

Table 4. Charity donation classification results for the general and individualized lDFA and qDFA models.

Metric | General lDFA model | General qDFA model | Individualized lDFA model | Individualized qDFA model
Sensitivity | 58.8% | 64.7% | 62.5% | 62.5%
Specificity | 63.3% | 76.7% | 71.4% | 75.0%
PPV | 47.6% | 61.1% | 55.6% | 58.8%
NPV | 73.1% | 79.3% | 76.9% | 77.8%
Accuracy | 61.7% | 72.3% | 68.2% | 70.5%

Figure 1. Positive/negative testbed: a sample of the commercially available videos.

Figure 2. The participants during the narrative storytelling experiment.

Figure 3. EEG wireless sensor headset.

Figure 4. Map of the channel locations for the utilized EEG headset.

Figure 5. Leave-n-participants-out classification results.

Figure 6. Accuracy of the individualized lDFA and qDFA models trained on the first 1-15 min of the AFV/SPR videos and tested on the rest of the videos.

Figure 7. Average posterior probabilities of negative class for each story segment of the "least just" and "most just" story versions for general/individualized lDFA/qDFA models.

Figure 8. Delineation of the average posterior probabilities of the negative class for each story segment of the participants who donated to the charity and the participants who did not donate in the case of the "least just" and "most just" story versions for general/individualized lDFA/qDFA models.