2015 International Conference on Affective Computing and Intelligent Interaction (ACII)

Towards Incorporating Affective Feedback into Context-Aware Intelligent Environments

Deba Pratim Saha, Thomas L. Martin and R. Benjamin Knapp
Institute for Creativity, Arts and Technology, Virginia Tech, Blacksburg, VA 24061
Email: {dpsaha, tlmartin, benknapp}@vt.edu

Abstract—Determining the relevance of the services provided by intelligent environments is a critical step in implementing a reliable context-aware ambient intelligent system. Providing explicit indications to the system is an effective way to communicate this relevance. However, such explicit indications come at the cost of the user's cognitive resources. In this work, we aim to create a novel pathway of implicit communication between users and their ambient intelligence by using the user's stress as a feedback signal to the intelligent system. In addition, following a few very recent works, we propose using proven laboratory stressors to collect ground truth data for stressed states. We present results from a preliminary pilot study. These results show promise for creating this implicit channel of communication and demonstrate the feasibility of using laboratory stressors as a reliable method of collecting ground truth for stressed states.

Fig. 1: Affective Feedback Block Diagram [blocks: User (human senses, mental model); Intelligent Environment (system/environment model, e.g., SVM; model parameters, e.g., kernel weights); user context from a laser tracker for hand position; environmental context (out of scope); affective feedback from affect generation, based on physiology measured by wearable sensors]

Keywords—Intelligent Environment; Technostress; Stroop Test

I. INTRODUCTION

The influx of embedded and networked computing devices has ushered in the age of the 'Internet of Things' (IoT). IoT enables richer context-aware (CA) systems and intelligent environments (IE), which are becoming increasingly closely coupled with a user's daily life. In addition, a new class of computing systems has emerged in the past decade that "relate to, arise from or deliberately influence human emotions"; these are collectively termed Affective Computing (AC) systems (Rosalind W. Picard). Given the projected growth of IoT and the current rate of mobile-device penetration, technology will need to become more human-centric [1]. As we will see in later sections, this close coupling of IEs with their users makes the users more prone to technostress. We use this technostress in an innovative way, as feedback to the IE about the relevance of its services to the user. Doing so requires a more natural, or implicit, channel of communication between the IE and its users, a concept owed to Roddy Cowie's seminal article [2]. This implicit channel will enable the IE to become aware of the user's moods and feelings and, eventually, to reconfigure its responses based on this knowledge. Our current work presents a prototype implementation of an approach to investigate whether the inferences drawn by a CA system can be aided by a physiology-based AC system in the form of affective feedback, as shown in Figure 1. That is, we present a method of creating an implicit communication channel between an intelligent environment and its user. In doing so, we also propose a novel approach to collecting ground truth data for stressed states via commonly used laboratory stressors.

This paper is organised as follows. In the Background section, we discuss the salient features of an IE, how stress is elicited in a real-life IE and how affective feedback can be used in an IE. In the Methods and Experimental Setup section, we describe our setup of an IE and the method used to capture ground truth data for stressed states. Finally, we present the results of our system evaluation and a short discussion in the Results and Discussion sections, respectively.

There are numerous interconnections between the inferences of CA and AC systems, as their individual inferences share a symbiotic relationship, as argued by Doctor et al. [1] and Hammal et al. [3], respectively. Recent works have improved the overall performance of an activity classifier and of an affective computing system through sensor-level fusion of activity, context and affective data streams. This fusion essentially addresses the measurement issues that arise in noisy real-life environments. For example:

a) Yang et al. [4] present an activity recognition pipeline which uses galvanic skin response (GSR), skin temperature (ST) and heat flux (HF) sensors in conjunction with inertial measurement units (IMU). This system achieves an activity recognition accuracy of 75%, compared to 60% using only physiological signals and 45% using only accelerometer data.

b) Sun et al. [5] used accelerometer data to detect activity and used physiological sensors such as electrocardiogram (ECG) and GSR to build a stress recognition pipeline. They hypothesize that activity cannot be overlooked in stress detection and report 92% accuracy in stress-state detection.

However, none of these works uses inference-level fusion, in which inferences drawn from the logical or internal context (e.g., an AC system) aid the inference engine for the physical or external context (e.g., a CA system). Inference-level fusion is exactly our aim in this work. See Baldauf [6] for details on these terms, which were coined by Hofer et al. [7] and Prekop et al. [8], respectively.

978-1-4799-9953-8/15/$31.00 ©2015 IEEE
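The following Python sketch makes the contrast between sensor-level and inference-level fusion concrete. It shows one hypothetical form inference-level fusion could take: stress onsets inferred by an AC classifier are used to flag the CA system's recent service decisions as likely unhelpful. The `ServiceEvent` type, the `flag_unhelpful` function and the 6-second association window are illustrative assumptions, not part of the system described in this paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceEvent:
    """A service delivered by the CA system (hypothetical structure)."""
    t: float      # time in seconds at which the service was delivered
    service: str  # hypothetical identifier, e.g. the bin whose LED was lit


def flag_unhelpful(events: List[ServiceEvent],
                   stress_onsets: List[float],
                   window: float = 6.0) -> List[ServiceEvent]:
    """Inference-level fusion sketch: a CA decision is flagged as likely
    unhelpful if the AC system infers a stress onset within `window`
    seconds after it (the window length is an assumption)."""
    return [
        ev for ev in events
        if any(0.0 <= s - ev.t <= window for s in stress_onsets)
    ]


events = [ServiceEvent(0.0, "bin-3"), ServiceEvent(20.0, "bin-7")]
print([ev.service for ev in flag_unhelpful(events, stress_onsets=[22.5])])  # ['bin-7']
```

In a full system, the flagged events would feed back into the CA model, for example by down-weighting the service that was judged unhelpful. That update step is left open here.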

II. BACKGROUND

A. Intelligent Environments
Augusto et al. have defined an Intelligent Environment as a "physical space in which the actions of numerous networked controllers, each controlling a specific aspect of the environment, is orchestrated by self-programming pre-emptive processes in such a way as to create an interactive holistic functionality that enhances dweller's experiences" [9]. Augusto et al. argue that an IE should be intelligent and sensible enough to identify when and how to provide a service that helps users achieve their current goals. At the same time, it should preserve their privacy, safety and autonomy. A critical aspect of an IE is its interactivity with the user, through which it gathers feedback on the accuracy or appropriateness of the services it provides. Incorporating such feedback usually means creating a channel of communication, either explicit or implicit [2]. This channel helps determine the relevance of the system's interventions towards fulfilling the user's goals.

B. Technostress and Intelligent Environments
Technostress has been defined by psychologist Craig Brod as "a modern disease of adaptation caused by an inability to cope with computer technologies" [10]. In simpler terms, technostress is the perception of hassles during interaction with technology, caused by system crashes, response delays or the demanding learning curve of new modalities. It is a psychological as well as biological stressor. It activates many biological subsystems, including the central nervous system (CNS) and the autonomic nervous system (ANS). Technostress produces elevated levels of stress hormones (e.g., cortisol, adrenaline) as well as heightened activity of the sympathetic division of the ANS (SNS). The SNS in turn regulates heart rate, skin conductance, muscle tension and blood pressure [11]. A leading cause of technostress is "achievement stress", which is heightened by system failures during time-pressured tasks, i.e., tasks with hard deadlines [10]. Unlike a general computing system, an IE is intended to be closely coupled with a user's daily life. Its malfunctions (i.e., incorrect service responses) may therefore at times cause increased achievement stress and hence higher technostress. If recognised accurately, this technostress signal can provide a unique view into a user's (dis)approval of the services provided by the IE, and capturing it is exactly our current goal.

C. Affective Feedback to Intelligent Environments
As described in Section II-A, user feedback is an integral component of an IE and is critical to implementing one in practice. Explicit feedback is a robust way of capturing the user's intentions and needs, but it comes at the expense of the user's cognitive resources. Implicit feedback, on the other hand, infers the user's approval of system interventions from behavioral cues or physiological responses, which are often generated at a subconscious level [12]. AC systems often take the form of a feedback loop. The system maps sensory data acquired from visible, audible, behavioral and physiological cues to the user's affective states. It then reorients its own responses by continuously learning the user's preferences in real time [13], [14], [15]. Figure 1 gives a block-level overview of the affective feedback implemented in our system. Such affective feedback loops have been used effectively to reorient smart-home behavior by inferring user (dis)approval through vision-based facial feature understanding [16]. In this work, we present a personalized model of stress for use in an affective feedback loop. In this loop, the system recognizes the patterns of ANS responses that occur while the user is experiencing technostress in the intelligent environment.

D. Physiological Aspects of Stress Recognition
Human affective states are complex, composite signals. They have historically been inferred using modalities such as audio-visual, behavioral, gestural and physiological sensing [17]. Physiological sensing is a reliable, non-invasive way of capturing responses directly from the ANS. These are implicit responses that may occur without conscious awareness or beyond cognitive intent [12]. ANS sensing therefore provides a way to measure nascent emotional states unaltered by conscious efforts at emotion regulation. Kreibig [18] presented an extensive review of the relationship between human physiological responses and emotional states. The review points out that ANS responses appear more pronounced in negative emotions (such as stress or anger) than in positive emotions (such as happiness or excitement). However, physiological sensing presents several other challenges:
a) natural confounders, such as food or caffeine intake and physical activity, are inevitably present;
b) responses to stress vary widely between people, because people with different personalities adopt different coping strategies.
These variations make it difficult to design a generalised classifier model for a wide range of users [19]. In addition, collecting ground truth data in such natural settings places practical limits on realising such systems. Plarre et al. [19] argue that proven stressors such as mental arithmetic and public speaking offer a more practical way to collect and annotate ground truth data.

The human ANS innervates various organs such as the heart, skin, pupils and muscles, and their activity can be measured non-invasively with wearable on-body sensors. Such signals include ECG, heart rate variability (HRV), respiratory sinus arrhythmia (RSA), GSR (also called electrodermal activity, EDA), pupil dilation (PD), ST, respiratory rate (RR), blood volume pulse (BVP), blood pressure (BP) and muscle tension (MT), among others [18]. Riedl et al. [10] note, drawing on evolutionary theory, that the fight-or-flight response to stress activates the hypothalamic-pituitary-adrenal (HPA) axis and the SNS division of the ANS. In the current study, we use GSR, ECG and HRV signals, as described below.

1) Galvanic Skin Response: Eccrine sweat glands are present in large concentrations on the palms and soles. They are very reliable indicators of SNS activity [10], which is heightened during the perception of stress. EDA is an umbrella term for changes in the electrical properties of the skin, measured across specific active sites, that occur when eccrine glands secrete sweat during aroused SNS activity [12]. Skin conductance (SC) is the most widely used measure of EDA. It has two components: a slowly changing background component called the skin conductance level (SCL) and a rapidly changing component called the skin conductance response (SCR). EDA is an established measure of SNS arousal, as it is arguably the only physiological variable that reflects SNS activity uncontaminated by parasympathetic nervous system (PNS) activity [12]. Event-related phasic SCRs (ER-SCR) are quite informative: their rise time, decay time, amplitude and latency vary widely with the nature of the stimulus applied. In the present work, we used a modified version of Jaimovich's EDA preprocessing pipeline, which is provided as a set of MATLAB subroutines with his thesis [20]. The algorithm:
a) takes the raw EDA time series and resamples it to 50 Hz;
b) removes electrical noise with an FIR filter with a 0.5 Hz cut-off frequency;
c) outputs time-annotated SCR and SCL values.
To compute the ER-SCR features, we followed the procedure described in Kim et al.'s work [21], [12].

2) Instantaneous Heart Rate and HRV: Heart rate variability (HRV) is the variation in instantaneous heart rate derived at each beat from the ECG data. It is widely studied in psychophysiology because it reflects the balance between the sympathetic and parasympathetic activity of the ANS [22]. We used Jaimovich's ECG preprocessing pipeline, also provided as a set of MATLAB subroutines with his thesis [20], which outputs a time-annotated instantaneous HR time series. The raw ECG data comes from the ECG sensor setup described in Section III-C. It is processed as follows:
a) the data is detrended and filtered with an FIR high-pass filter using a Kaiser window and a 3 Hz cut-off frequency;
b) heart rate is extracted at each beat using a moving window with a threshold of 2 standard deviations (SD) and a beat change ratio of 20%.
Our analysis uses the following time-domain features:
- the mean and SD of the RR-interval and HRV time series;
- the root mean square of successive differences of RR intervals (RMSSD);
- the percentage of successive RR-interval differences greater than x ms in a given window (pNNx).
We also use frequency-domain features of the HRV time series: the sub-band powers in the low-frequency band (pLF, 0.03-0.15 Hz) and the high-frequency band (pHF, 0.15-0.4 Hz), and their ratio (rLFHF). These features are known to represent the ANS-modulated sympathovagal balance of heart rate [22]. pLF and pHF represent the predominance of SNS and PNS activity, respectively, on heart rate, and their ratio is expected to increase during stressed states [22].

III. METHODS AND EXPERIMENTAL SETUP

To design an IE as described in Section II-A, we prototyped an order-picking system using a smaller version of industrial shelves/racks, instrumented to provide task assistance to picking personnel. However, as we will see in Section III-A, such systems often malfunction, which evidently induces technostress in the personnel. As discussed in Section II-B, sensing the user's technostressed state when the IE malfunctions gives a unique view into the user's subconscious (dis)approval of the IE's service. Creating this implicit communication with the user by recognising the onset of technostressed states is thus the exact goal of this work. We note here a requirement for the new CA system with affective feedback to outperform a baseline CA system without it. The new system must not tolerate any false positives (FP), even at the cost of a low recognition accuracy for technostressed states. An FP occurs when the system wrongly senses the user to be technostressed. The new system would then try to reorient a service that was actually helpful to the user, so every FP directly degrades its performance relative to the baseline CA system. However, such a comparison with a baseline system is beyond the scope of this work and will be addressed in future work.

In addition, as discussed in Section II-D, collecting ground truth for stressed states in real-life situations such as ours poses a real challenge. Our hypothesis, supported by a recent work [19], is that we can circumvent this by training our classifier on proven laboratory stressors such as Paced Stroop tests. As we will see in the Results section, a self-calibrating classifier trained on laboratory stressors can indeed be used to sense technostressed states in real-life IE settings.

A. Order-Picking Experiment (OPE) Setup
Order picking is the process of collecting the supply items for a particular order from warehouse racks and sorting them according to order requisitions for delivery. It is one of the major tasks in warehouses across the globe, accounting for up to 60% of their operating costs [23]. To reduce this operational cost as well as human errors, the IE provides order-picking personnel with various kinds of context-aware task assistance. Task-assistance systems commonly used in the industry are:
- paper pick lists (pick-by-list, LST);
- illuminated bin indicators (pick-by-light, LHT);
- heads-up-display (HUD) assisted picking [23].
Such systems are usually fitted with laser trackers, which determine the current pick by detecting the personnel's reach into each bin. However, these laser trackers are frequently prone to mistrigger errors, a major source of irritation for these personnel [23]. The IE that was designed to aid a worker's task actually hinders them from achieving their goals. This induces "achievement stress", one of the potent causes of technostress [10]. In practice, the worker has to correct these mistrigger errors by explicitly indicating the error to the IE, using buttons etc. Our prototype (shown in Figure 2a) is based on the pick-by-light scheme. The bins are fitted with bin-indicating LEDs (bLED), which show the personnel which bin to pick from, and with wrong-pick LEDs (wLED), which indicate a wrong pick. Figure 3 shows an indicative timeline diagram of the user study for a typical user.

Fig. 2: Experimental Setup Snapshots: (a) Instrumented OPE Setup; (b) Sample IC-PST Figure

Fig. 3: Experiment Timeline Diagram (Indicative) [timeline from Start to End, alternating Rest and Experiment blocks, including the Order Picking Experiment]

During our experiments, we deliberately indicate a wrong pick at predetermined times, even though the participants know they are picking from the correct bin. This simulates the mistrigger errors of the IE that cause technostress. To help distinguish positive affective (PA) states from negative affective (NA) states, we interspersed LHT and LST orders, as seen in Figure 3. PA is induced when the user switches from LST to LHT and receives correct (surprise-inducing and task-aiding) CA assistance from the IE. NA is induced when the user receives incorrect (stress-inducing and hindrance-posing) CA assistance from the IE. Our system is operated in a Wizard-of-Oz fashion. The experimenter, who can see the participant's progress in a task, triggers both the CA assistance and the mistrigger/pick-place error indications. In our experiment, each user receives a paper pick list containing 14 order bin numbers (i.e., 14 items per task). Of these:
- 5 orders have no task assistance (LST);
- 5 orders have correct task assistance ({LHT + bLED} states);
- 4 orders have incorrect task assistance ({LHT + wLED} states).
The total time for the tasks was not stipulated. However, to simulate the real-life warehouse scenario, the participants were asked to finish their tasks in the minimum possible time.

B. Paced Stroop Test (PST)
In its classical form, the Stroop color-word interference test requires the user to choose the font color of a word. The word itself names a color, which is either the same as the font color or different from it. In the congruent version (C-PST), the font color matches the color named by the word; in the incongruent version (IC-PST), it does not. The Stroop color-word interference test is a standard laboratory cognitive stressor and can induce heightened ANS activity in users [24], [25], [26]. The Paced Stroop test is a modified version in which each iteration of the Stroop test is active for a stipulated time, say 3 seconds [25]. Task pacing has been shown to enhance the stress-inducing capability of the Stroop test compared to the self-paced version, because producing the correct response demands more mental/cognitive effort [26]. In our setup, each Stroop figure is shown for 3 seconds, for a total of 180 seconds (i.e., 60 pairs of Stroop figures and responses). We used two versions of the Paced Stroop test:
- InPhase-PST: one 60-second block (20 pairs) of C-PST, preceded and followed by 60 seconds each of IC-PST (2 x 20 pairs).
- Random-PST: 30 pairs of C-PST interspersed with 30 pairs of IC-PST. The order appears random to a participant, but the sequence is programmed to be the same for all participants.

C. Hardware Setup
As described in Section II-D, our setup acquires EDA and ECG data using the BioEmo and BioBeat sensors, respectively, from Biocontrol Systems (www.biocontrol.com, accessed 4/22/15). BioEmo is an exosomatic skin conductance sensor designed to be worn on the medial or distal phalanges of the fingers, in direct contact with the skin. The sensor could slip during the physical activity of the OPE, so we secured BioEmo to the ring and middle fingers with adhesive tape. Jaimovich notes that BioEmo's output follows a logarithmic relation with the skin conductance measured across the fingers [20]: a 10% change in the logarithm of the output voltage represents a conductance of 1.08 mS. BioBeat is an ECG sensor comprising gold-plated electrodes worn as a chest band. Both sensors connect to iCubeX wiMicroDig digitizers sampling at 200 Hz. The digitizers stream data wirelessly over Bluetooth to a nearby laptop, which runs both the OPE and the PST and logs the physiological data.

IV. DATA ANALYSIS

A. Data Normalization, Feature Extraction and Reduction
As discussed in Section II-D1, baseline skin conductance levels differ significantly between individuals. Within-person data normalization has been shown to improve stress classification [19], [27]. To deal with these individual baseline differences, we normalise the raw time-series data by computing studentized residuals: each data point minus the mean, divided by the standard deviation of the time series in a given window. This normalization makes the algorithm self-calibrating to personal baseline differences.
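Below is a minimal NumPy sketch of the within-person normalization described above. It assumes non-overlapping windows of a fixed number of samples; the window length used in the study is not stated, and `studentize` is a hypothetical helper name. Each window is centred and scaled by its own statistics, so a signal with a shifted baseline or a different gain yields the same normalized values. This invariance is what makes the downstream classifier self-calibrating.

```python
import numpy as np


def studentize(x, window):
    """Within-window z-scoring ("studentized residuals") of a 1-D signal.

    Each non-overlapping window of `window` samples is mean-subtracted and
    divided by its sample standard deviation. Windows with zero spread
    (or a single sample) are mapped to zeros.
    """
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for start in range(0, len(x), window):
        seg = x[start:start + window]
        if len(seg) < 2:
            continue
        sd = seg.std(ddof=1)
        if sd > 0:
            out[start:start + window] = (seg - seg.mean()) / sd
    return out


# Two users with different baselines and gains produce identical outputs.
user_a = np.array([1.0, 2.0, 3.0, 4.0])
user_b = 10.0 * user_a + 5.0
print(np.allclose(studentize(user_a, 4), studentize(user_b, 4)))  # True
```

A sliding (overlapping) window would work equally well; non-overlapping windows are used here only because they keep the sketch short.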

As described in Section II-D, we extract a set of fourteen features from the GSR and ECG time-series data; these features have been reported in the literature as discriminative in stress-related studies [25], [19], [27]. According to Figner's report [28], a common window of interest for EDA feature extraction extends up to 6 seconds after stimulus onset. We therefore extract features from a 6-second window called StimWin. Only time-domain EDA features are included: the mean amplitude, rise time and fall time of the phasic ER-SCR. Both time- and frequency-domain features are extracted from the HRV time series, which is derived from the ECG data as described in Section II-D2. The time-domain features are:
- the mean and SD of HR computed at each beat;
- the mean and SD of R-R peak intervals;
- the root mean square of successive differences of R-R peak intervals;
- the percentage of successive R-R interval differences within the StimWin that exceed 20 ms and 50 ms.
The frequency-domain features are the total spectral power in the LF band, the total spectral power in the HF band, and the ratio of these two powers. To project the observations onto a space with orthogonal basis vectors, we use principal component analysis (PCA). PCA orthogonalizes the features while preserving the variance of the dataset along the new orthogonal bases. It can therefore also be used to discard those dimensions that do not explain a significant amount of the variance of the dataset.
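The HRV features listed above can be computed along the lines of the NumPy sketch below. It is an illustration under stated assumptions, not the Jaimovich MATLAB pipeline used in the study. The assumptions are:
- the input is a list of RR intervals in milliseconds;
- pNNx uses the standard successive-difference definition;
- the band powers come from a Hann-windowed periodogram of the tachogram, linearly resampled at 4 Hz (both the resampling rate and the spectral estimator are assumptions).

The function names are hypothetical. Spectral features need a record long enough to resolve the 0.03 Hz lower edge of the LF band, which corresponds to a period of about 33 s.

```python
import numpy as np


def hrv_time_features(rr_ms):
    """Time-domain HRV features from the RR intervals (ms) in one window."""
    rr = np.asarray(rr_ms, dtype=float)
    d = np.diff(rr)                 # successive RR-interval differences (ms)
    hr = 60000.0 / rr               # beat-to-beat heart rate (beats/min)
    return {
        "mean_hr": float(hr.mean()),
        "sd_hr": float(hr.std(ddof=1)),
        "mean_rr": float(rr.mean()),
        "sd_rr": float(rr.std(ddof=1)),
        "rmssd": float(np.sqrt(np.mean(d ** 2))),
        "pnn20": 100.0 * float(np.mean(np.abs(d) > 20.0)),
        "pnn50": 100.0 * float(np.mean(np.abs(d) > 50.0)),
    }


def hrv_band_powers(rr_ms, fs=4.0, lf=(0.03, 0.15), hf=(0.15, 0.40)):
    """LF power, HF power and LF/HF ratio of an RR tachogram.

    Assumed estimator: linear resampling at `fs` Hz, mean removal, and a
    Hann-windowed one-sided periodogram summed over each band.
    """
    rr_s = np.asarray(rr_ms, dtype=float) / 1000.0
    beat_t = np.cumsum(rr_s)                       # beat times (s)
    grid = np.arange(beat_t[0], beat_t[-1], 1.0 / fs)
    x = np.interp(grid, beat_t, rr_s)              # evenly sampled tachogram
    x = x - x.mean()
    w = np.hanning(len(x))
    psd = np.abs(np.fft.rfft(x * w)) ** 2 / (fs * np.sum(w ** 2))
    psd[1:] *= 2.0     # one-sided spectrum; doubling the Nyquist bin is harmless here
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    df = f[1] - f[0]
    p_lf = float(psd[(f >= lf[0]) & (f < lf[1])].sum() * df)
    p_hf = float(psd[(f >= hf[0]) & (f < hf[1])].sum() * df)
    ratio = p_lf / p_hf if p_hf > 0 else float("nan")
    return p_lf, p_hf, ratio


# Example window of five RR intervals (ms); values are illustrative only.
print(hrv_time_features([800, 830, 790, 850, 840])["pnn20"])  # 75.0
```

In the example, the successive differences are 30, -40, 60 and -10 ms. Three of the four exceed 20 ms, which gives pNN20 = 75%.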

Discarding such low-variance dimensions reduces the dimensionality of the data and alleviates the curse of dimensionality. A reasonable cut-off is 99% of the variance, i.e., retaining the smallest number of dimensions N that together explain more than 99% of the total variance [29].

B. Support Vector Machines (SVM)
The Support Vector Machine is a very widely used maximum-margin discriminative classifier. Because it makes no assumptions about the underlying data distribution, it is known to classify a wide variety of problems with good accuracy. Prior works on physiology-based stress recognition have shown that SVM outperforms various other classifiers [30], [31], [19], [27]. SVM is predominantly a binary classifier. Its primary objective is to find the hyperplane that separates the classes with the maximum margin, where the margin is the minimum distance between the separating hyperplane and a point in the dataset. The soft-margin formulation also tolerates a small number of points inside the margin, at a cost controlled by C. For a training dataset of labeled points $D = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$ with $y_i \in \{+1, -1\}$, the soft-margin SVM has the following dual formulation:

$$\max_{\boldsymbol{\alpha}} \; L_{\text{dual}} = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$$

$$\text{subject to} \quad 0 \le \alpha_i \le C \;\; \forall i \in D, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0 \qquad (1)$$

where $K(\mathbf{x}_i, \mathbf{x}_j)$ is the kernel function. It maps data vectors into a more expressive feature space, which aids the classification of non-linear datasets. Zhai et al. [32] showed that the sigmoid kernel outperformed the Gaussian kernel for stress recognition. Our findings agree, so we used the sigmoid kernel to compile our results.

V. RESULTS

The goal of this work is to correctly identify physiological states that correspond to the onset of mental stress (i.e., technostress) induced by incorrect responses from the IE, namely the {LHT + wLED} states in the OPE. We address this goal by learning a statistical model from a subset of this labelled dataset and from the ground truth obtained with a laboratory stressor (the PST dataset). The model is verified per user using leave-one-sample-out cross-validation (LOSO-CV): it predicts the class, stressed (S) vs. not stressed (NS), of a previously unseen input sample from the OPE dataset. We collected data from 7 participants (5 males, 2 females) aged 20-30 years, under a research protocol approved by the Virginia Tech IRB (#14-689). Participants represented a wide range of nationalities, ethnicities and physiques, though no conscious effort was made to select participants on any discriminating attribute. After preprocessing, we extracted fourteen features from a StimWin-long segment starting at the onset of each stimulus. For the PST dataset, stimulus onset was the time when a new PST window appeared on the screen. For the OPE dataset, it was the time when the LEDs for the {LHT + xLED}, x ∈ {w, b}, states were lit. These features were presented to the pattern recognition pipeline to learn a statistical model. As described in Section III-B, the physiological data collected during the PST serves as ground truth for the cognitive states S and NS: C-PST corresponds to the NS state and IC-PST to the S state. Based on our experiment design, we present three sets of results:
a) train only on OPE data and cross-validate (CV) on OPE data;
b) train only on PST data and predict on OPE data;
c) train on combined PST+OPE data and CV on OPE data.
We are particularly interested in the results of the prediction/CV performed on the OPE data.

A. Performance Evaluation Criteria
The performance of such models is commonly summarized by classification accuracy. However, accuracy is not an adequate metric for class-imbalanced learning problems such as ours [33], [34]. In practical situations, stressed states can reasonably be assumed to be rare compared to normal, non-stressed states, so a practical stress classifier almost always has to deal with imbalanced classes. For imbalanced cases, classifier performance is quantified with a confusion matrix and its derived measures: precision (p), recall (r), the G-score and the Fβ-score. The latter two are defined as:

$$G = \sqrt{p\,r}, \qquad F_\beta = \frac{(\beta^2 + 1)\, p\, r}{\beta^2 p + r}$$

where β tunes the relative weight of p and r. Sasaki [35] notes that Fβ becomes increasingly precision-oriented as β decreases below 1. As discussed in Section III, our problem calls for a heavy penalty on any FP while still rewarding good classification of technostressed states. We therefore need an Fβ-score that rewards very low FP (i.e., one dominated by p), and we select β = 0.1.

B. Model Performance Evaluation
We present the results of training an SVM classifier with a sigmoid kernel for each user.

Case-I: Train on OPE data and CV on OPE data. Here, we trained our classifier on features extracted from the OPE dataset and evaluated it with LOSO-CV for each user; the results are presented in Table I and Figure 4. The model clearly underperforms: it did not recognise a single technostressed state for User C or User E.

Case-II: Train on PST data and predict on OPE data. In this case, we trained our classifier on features extracted from the PST dataset and predicted on the OPE dataset; the results are presented in Table I and Figure 4. Classifier performance improves for most users, benefiting from the additional ground truth data provided by the PST dataset. However, as the table shows, a very low FP count could not be achieved for all users.

Case-III: Train on combined data and CV on OPE data. Here, we trained our classifier on features extracted from the combined PST and OPE datasets. LOSO-CV was then performed on the OPE dataset for each user; the results are presented in Table I and Figure 4. Both the G-score and the Fβ-score improve compared to both previous cases. In particular, FP is reduced and the classification of technostressed states improves for all users. We therefore conclude that the model benefits from the additional ground truth data provided by the combined PST and OPE datasets.
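The three training configurations above can be sketched with scikit-learn as below. This is a hypothetical reconstruction, not the authors' code, and it makes the following assumptions:
- the PCA step keeps the components that explain 99% of the variance (Section IV-A);
- the classifier is a soft-margin SVC with a sigmoid kernel (Section IV-B), left at scikit-learn's default `C` and `gamma`;
- Case-III is interpreted as leaving out one OPE sample at a time while always keeping all PST samples in the training set;
- labels are 1 for S and 0 for NS.

The function names are illustrative, and the random arrays at the end are placeholders, not study data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def build_model(C=1.0):
    """PCA keeping >= 99% of the variance, then a sigmoid-kernel soft-margin SVM."""
    return make_pipeline(
        PCA(n_components=0.99, svd_solver="full"),
        SVC(kernel="sigmoid", C=C),
    )


def case_i(X_ope, y_ope):
    """Case-I: LOSO-CV on OPE data only."""
    y_pred = cross_val_predict(build_model(), X_ope, y_ope, cv=LeaveOneOut())
    # Rows: actual class (NS=0, S=1); columns: predicted class (NS=0, S=1).
    return confusion_matrix(y_ope, y_pred, labels=[0, 1])


def case_ii(X_pst, y_pst, X_ope, y_ope):
    """Case-II: train on PST data, then predict every OPE sample."""
    model = build_model().fit(X_pst, y_pst)
    return confusion_matrix(y_ope, model.predict(X_ope), labels=[0, 1])


def case_iii(X_pst, y_pst, X_ope, y_ope):
    """Case-III (interpretation): LOSO-CV over OPE samples, with all PST
    samples added to every training fold."""
    y_pred = np.empty_like(y_ope)
    for i in range(len(y_ope)):
        keep = np.arange(len(y_ope)) != i
        X_train = np.vstack([X_pst, X_ope[keep]])
        y_train = np.concatenate([y_pst, y_ope[keep]])
        y_pred[i] = build_model().fit(X_train, y_train).predict(X_ope[i:i + 1])[0]
    return confusion_matrix(y_ope, y_pred, labels=[0, 1])


# Random placeholder data shaped like one user's sessions (14 features each).
rng = np.random.default_rng(0)
X_ope, y_ope = rng.normal(size=(14, 14)), np.array([0] * 10 + [1] * 4)
X_pst, y_pst = rng.normal(size=(60, 14)), np.array([0] * 20 + [1] * 40)
print(case_iii(X_pst, y_pst, X_ope, y_ope))
```

The PCA step sits inside the pipeline, so it is refitted on each training fold. This keeps the held-out sample from influencing the projection.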

TABLE I: Userwise confusion matrix, G-score and Fβ-score calculations (described in Section V-A) for Cases I to III (discussed in Section V-B). Each count is listed as actual class→predicted class, and S (stressed) is the positive class. A typical result, say for User F in Case-III, reads TN=9, FP=1, FN=1, TP=3.

Case-I: train on OPE data, CV on OPE data
User  NS→NS  NS→S  S→NS  S→S  G-score  Fβ-score
A         9     1     3    1  0.35     0.49
B         6     4     2    2  0.41     0.33
C         7     3     4    0  0.00     0.00
D         8     2     3    1  0.29     0.33
E         9     1     4    0  0.00     0.00
F         7     3     1    3  0.61     0.50
G*        6     4     2    1  0.26     0.20

Case-II: train on PST data, predict on OPE data (counts listed as actual→predicted)
User  NS→NS  NS→S  S→NS  S→S  G-score  Fβ-score
A        10     0     3    1  0.50     0.97
B         5     5     1    3  0.53     0.38
C         6     4     3    1  0.22     0.20
D        10     0     0    4  1.00     1.00
E         3     7     1    3  0.47     0.30
F         9     1     3    1  0.35     0.49
G*        6     4     0    3  0.65     0.43

VI. DISCUSSION

Encouraged by the evidence from this pilot study pointing towards the possibility of using laboratory stressors as ground truth for real-life stressors, we plan to implement a real-time affective feedback loop in a similar IE in the future. Prior works such as [19], [27] have used other kinds of social, mental and physical stressors to induce stress in users, such as public speaking, mental arithmetic and the cold pressor challenge. In addition to ECG and EDA, some recent stress recognition studies have shown respiration rate [19], [27], pupillary dilation [25] and trapezius muscle EMG [36] to be good predictors of mental stress. These findings open new avenues for modeling and sensing stress episodes. We plan to explore the use of varied classes of laboratory stressors to model the ground truth of a real-life stressor. To do this, we would experimentally weight the contribution of each kind of stressor that might constitute a real-life stressor. In future work, we also plan to incorporate a continuous temporal model based on Dynamic Bayesian Networks and to explore an accumulation-decay model along the lines of [19].
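One possible form of the accumulation-decay idea is a leaky accumulator over per-window classifier outputs. The sketch below is a hypothetical illustration, not the model of [19]. The decay factor and threshold are arbitrary assumptions, chosen only to show how isolated detections fade while sustained ones raise a flag.

```python
from typing import Iterable, List


def accumulate_decay(detections: Iterable[float], decay: float = 0.8,
                     threshold: float = 2.0) -> List[bool]:
    """Hypothetical leaky accumulator over per-window stress outputs.

    At each step the running score is multiplied by `decay`, which forgets
    old evidence. The new detection is then added: 0 = not stressed,
    1 = stressed, or a probability. A stress episode is flagged while
    score >= threshold.
    """
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    score = 0.0
    flags = []
    for d in detections:
        score = decay * score + d
        flags.append(score >= threshold)
    return flags
```

With these illustrative settings, three consecutive stressed windows are needed before a flag is raised, and a single isolated detection never raises one.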

As described in Section III, our goal was to train a classifier that produces the fewest false positives (FP) while also accurately classifying stressed states. The results in Section V-B are encouraging. Figure 4 shows that introducing the PST dataset in the training phase raises the Fβ-score for nearly every user.
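The metric definitions themselves are given in Section V-A. The sketch below assumes both metrics are computed for the stressed class, with β = 0.1 so that precision (few false positives) dominates:

- G-score = √(precision × recall)
- Fβ = (1 + β²)·P·R / (β²·P + R)

These definitions are assumptions inferred from the values in Table I, not taken from the text. With them, for example, User A in Case-III (TP = 2, FP = 0, FN = 2) gives G ≈ 0.71 and Fβ ≈ 0.99, matching the table.

```python
import math
from typing import Tuple


def stress_scores(tp: int, fp: int, fn: int, beta: float = 0.1) -> Tuple[float, float]:
    """G-score and F-beta for the stressed (S) class.

    Assumptions: G = sqrt(precision * recall), and beta = 0.1 so that
    precision dominates. Both scores are reported as 0 when TP = 0.
    """
    if tp == 0:
        return 0.0, 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    g = math.sqrt(precision * recall)
    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return g, f_beta
```

Choosing β well below 1 makes a false positive cost far more than a missed stressed state. This matches the stated goal of minimising FP.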

TABLE I (continued). Case-III: train on combined data, CV on OPE data (counts listed as actual→predicted)
User  NS→NS  NS→S  S→NS  S→S  G-score  Fβ-score
A        10     0     2    2  0.71     0.99
B         8     2     1    3  0.67     0.61
C         9     1     3    1  0.35     0.50
D        10     0     2    2  0.71     0.99
E         8     2     1    3  0.67     0.61
F         9     1     1    3  0.75     0.75
G*        9     1     1    2  0.67     0.67

It should be noted that by tuning the hyperparameters of the SVM model, we were able to reduce the FP count to zero for all users. However, this resulted in near-zero correct classifications of stressed states, which is why those results are not reported. The OPE experiment was conceptualised to mimic a real-life setting for an IE that senses user context and provides relevant services. This setting also introduces many sources of noise into the sensor data, primarily motion artifacts. We used adhesive tape to affix the EDA sensors, which reduced sensor-fitting issues. Even so, the sensors used in this experiment are not designed for ambulatory use. Segments with corrupted data were dropped during the pre-processing stage using the methods described in Section II-D.
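The FP/TP trade-off noted above can be illustrated with a decision-threshold sweep. This is a simplified, hypothetical stand-in for hyperparameter tuning. Instead of retuning the SVM, it raises the score threshold at which a sample is labelled S and counts true and false positives at each step. The scores are assumed to be classifier confidence values for the stressed class.

```python
from typing import List, Sequence, Tuple


def threshold_sweep(scores: Sequence[float], labels: Sequence[str],
                    thresholds: Sequence[float]) -> List[Tuple[float, int, int]]:
    """For each threshold, return (threshold, TP, FP), where 'score >= threshold' means S.

    Raising the threshold can only remove positive predictions, so TP and FP
    are both non-increasing: FP can be driven to zero, but TP shrinks with it.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    out = []
    for th in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= th and y == "S")
        fp = sum(1 for s, y in zip(scores, labels) if s >= th and y == "NS")
        out.append((th, tp, fp))
    return out
```

Any threshold high enough to eliminate every false positive also discards every true positive whose score falls below it. This mirrors the near-zero stressed-state classifications described above.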

Using this pilot study, we sought to answer two basic questions for real-life IEs. Is it possible:

a) to create an inference-level fusion between AC and CA systems, thereby creating an implicit channel of communication between a user and the IE, and
b) to use laboratory stressors as ground truth for real-life stressors during ambulatory sensing of stress?

Section V shows the results of this pilot study. By training the SVM models individually for each user, we were able to find similar patterns in the physiological data acquired during sessions in which two kinds of stressors were presented to a user: technostress in the OPE experiment and cognitive stress in the PST experiment. This is evident from the improvement in Case-III results for all users (except User D) compared with Case-II and Case-I. These results provide preliminary evidence that statistical parameters corresponding to stress-related physiological responses can be learned computationally from a proven laboratory stressor. The same parameters can then be used to classify stress responses in real-life settings (technostress in this case). Section II-B describes how technostress is elicited when a system malfunctions, thereby hindering a user's progress. Thus, a parametric model capable of classifying technostress states can convey the user's feedback to the IE through the implicit channel of communication [2], completing an affective feedback loop in an intelligent environment.
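To make the feedback loop concrete, the sketch below shows one hypothetical way the classifier's verdict could be fed back to the IE as implicit feedback. Each service keeps a relevance estimate. The estimate is pulled toward 0 when the user is classified as stressed after the service is delivered, and toward 1 otherwise. The class name, the update rule and the learning rate are all assumptions for illustration, not part of the system described in this paper.

```python
from typing import Dict


class RelevanceTracker:
    """Hypothetical affective feedback loop keeping per-service relevance estimates.

    After the environment delivers a service, the stress classifier's verdict
    for the following window is fed back as implicit feedback. Stressed counts
    as evidence the service was wrong; not stressed counts as evidence it was
    acceptable. Estimates are exponential moving averages in [0, 1].
    """

    def __init__(self, rate: float = 0.2, prior: float = 0.5):
        if not 0.0 < rate <= 1.0:
            raise ValueError("rate must be in (0, 1]")
        self.rate = rate
        self.prior = prior
        self.relevance: Dict[str, float] = {}

    def feedback(self, service: str, stressed: bool) -> float:
        """Update and return the relevance estimate for one service."""
        old = self.relevance.get(service, self.prior)
        target = 0.0 if stressed else 1.0
        new = old + self.rate * (target - old)  # move a fraction toward target
        self.relevance[service] = new
        return new
```

An IE could then suppress or revise services whose estimate falls below some cut-off, without asking the user for explicit input.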

VII. CONCLUSION

In this work, we set out to explore the possibility of determining the relevance of services provided by an intelligent environment by creating an affective feedback loop. Following a few recent works, we also hypothesized that proven laboratory stressors such as paced Stroop tests can serve as effective ground-truth collection instruments, even in ambulatory settings. Preliminary results from this pilot study are encouraging and suggest that this novel idea merits further investigation.

Fig. 4: Userwise Fβ-score comparison for Cases I, II and III (Fβ-score from 0 to 1 on the vertical axis; Users A to G on the horizontal axis).

REFERENCES

[1] F. Doctor, R. Iqbal, and V. Zamudio, “Introduction to the thematic issue on Affect Aware Ubiquitous Computing,” Journal of Ambient Intelligence and Smart Environments, vol. 7, no. 1, pp. 3–4, Jan. 2015.
[2] R. Cowie, E. Douglas-Cowie, N. Tsapatsoulis, G. Votsis, S. Kollias, W. Fellenz, and J. Taylor, “Emotion recognition in human-computer interaction,” IEEE Signal Processing Magazine, vol. 18, no. 1, pp. 32–80, Jan. 2001.
[3] Z. Hammal and M. Teodosia Suarez, “Towards Context Based Affective Computing,” in 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII), Sep. 2013, pp. 802–802.
[4] S.-I. Yang and S.-B. Cho, “Recognizing human activities from accelerometer and physiological sensors,” in IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, 2008. MFI 2008, Aug. 2008, pp. 100–105.
[5] F.-T. Sun, C. Kuo, H.-T. Cheng, S. Buthpitiya, P. Collins, and M. Griss, “Activity-Aware Mental Stress Detection Using Physiological Sensors,” Mobile Computing, Applications, and Services, pp. 211–230, Jan. 2012.
[6] M. Baldauf, S. Dustdar, and F. Rosenberg, “A Survey on Context-Aware Systems,” Int. J. Ad Hoc Ubiquitous Comput., vol. 2, no. 4, pp. 263–277, Jun. 2007.
[7] T. Hofer, M. Pichler, G. Leonhartsberger, J. Altmann, and W. Retschitzegger, “Context-awareness on mobile devices - the hydrogen approach,” in Proceedings of the 36th Annual Hawaii International Conference on System Sciences, 2002, pp. 292–302.
[8] P. Prekop and M. Burnett, “Activities, context and ubiquitous computing,” Computer Communications, vol. 26, no. 11, pp. 1168–1176, Jul. 2003.
[9] J. C. Augusto, V. Callaghan, D. Cook, A. Kameas, and I. Satoh, “Intelligent Environments: A manifesto,” Human-centric Computing and Information Sciences, vol. 3, no. 1, p. 12, 2013.
[10] R. Riedl, H. Kindermann, A. Auinger, and A. Javor, “Computer Breakdown as a Stress Factor during Task Completion under Time Pressure: Identifying Gender Differences Based on Skin Conductance,” Advances in Human-Computer Interaction, vol. 2013, p. e420169, Oct. 2013.
[11] R. Riedl, “On the Biology of Technostress: Literature Review and Research Agenda,” SIGMIS Database, vol. 44, no. 1, pp. 18–55, Nov. 2012.
[12] J. J. Braithwaite, D. G. Watson, R. Jones, and M. Rowe, “A Guide for Analysing Electrodermal Activity (EDA) & Skin Conductance Responses (SCRs) for Psychological Experiments,” Psychophysiology, vol. 49, pp. 1017–1034, 2013.
[13] S. H. Fairclough, “Fundamentals of physiological computing,” Interacting with Computers, vol. 21, no. 1-2, pp. 133–145, Jan. 2009.
[14] D. Novak, A. Nagle, and R. Riener, “Linking recognition accuracy and user experience in an affective feedback loop,” IEEE Transactions on Affective Computing, pp. 1–1, 2014.
[15] R. Calvo and S. D’Mello, “Affect Detection: An Interdisciplinary Review of Models, Methods, and Their Applications,” IEEE Transactions on Affective Computing, vol. 1, no. 1, pp. 18–37, Jan. 2010.
[16] K.-I. Benta, A. Hoszu, L. Văcariu, and O. Creţ, “Agent Based Smart House Platform with Affective Control,” in Proceedings of the 2009 Euro American Conference on Telematics and Information Systems: New Opportunities to Increase Digital Citizenship, ser. EATIS ’09. New York, NY, USA: ACM, 2009, pp. 18:1–18:7.
[17] Z. Zeng, M. Pantic, G. Roisman, and T. Huang, “A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 1, pp. 39–58, Jan. 2009.
[18] S. D. Kreibig, “Autonomic nervous system activity in emotion: A review,” Biological Psychology, vol. 84, no. 3, pp. 394–421, Jul. 2010.
[19] K. Plarre, A. Raij, S. Hossain, A. Ali, M. Nakajima, M. al’Absi, E. Ertin, T. Kamarck, S. Kumar, M. Scott, D. Siewiorek, A. Smailagic, and L. Wittmers, “Continuous inference of psychological stress from sensory measurements collected in the natural environment,” in 2011 10th International Conference on Information Processing in Sensor Networks (IPSN), Apr. 2011, pp. 97–108.
[20] J. Jaimovich, “Emotion recognition from physiological indicators for musical applications,” Ph.D., Queen’s University Belfast, 2013.
[21] K. H. Kim, S. W. Bang, and S. R. Kim, “Emotion recognition system using short-term monitoring of physiological signals,” Medical and Biological Engineering and Computing, vol. 42, no. 3, pp. 419–427, May 2004.
[22] J. Kim and E. Andre, “Emotion recognition based on physiological changes in music listening,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 12, pp. 2067–2083, Dec. 2008.
[23] A. Guo, S. Raghu, X. Xie, S. Ismail, X. Luo, J. Simoneau, S. Gilliland, H. Baumann, C. Southern, and T. Starner, “A Comparison of Order Picking Assisted by Head-up Display (HUD), Cart-mounted Display (CMD), Light, and Paper Pick List,” in Proceedings of the 2014 ACM International Symposium on Wearable Computers, ser. ISWC ’14. New York, NY, USA: ACM, 2014, pp. 71–78.
[24] D. F. Barbosa, F. J. A. Prada, M. F. Glanner, O. d. T. Nóbrega, and C. Córdova, “Cardiovascular response to Stroop test: comparison between the computerized and verbal tests,” Arquivos Brasileiros de Cardiologia, vol. 94, no. 4, pp. 507–511, Apr. 2010.
[25] J. Zhai and A. Barreto, “Stress Detection in Computer Users Based on Digital Signal Processing of Noninvasive Physiological Variables,” in 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2006. EMBS ’06, Aug. 2006, pp. 1355–1358.
[26] P. Renaud and J.-P. Blondin, “The stress of Stroop performance: physiological and emotional responses to color-word interference, task pacing, and pacing speed,” International Journal of Psychophysiology, vol. 27, no. 2, pp. 87–97, Sep. 1997.
[27] Y. Shi, M. H. Nguyen, P. Blitz, B. French, S. Fisk, O. D. L. Torre, A. Smailagic, D. P. Siewiorek, M. A. Absi, E. Ertin, T. Kamarck, and S. Kumar, “Personalized Stress Detection from Physiological Measurements,” in 2010 International Symposium on Quality of Life Technology, 2010.
[28] B. Figner and R. Murphy, “Using skin conductance in judgment and decision making research,” in A Handbook of Process Tracing Methods for Decision Research: A Critical Review and User’s Guide. Psychology Press, May 2011.
[29] A. H. H. Ngu, Q. Z. Sheng, D. Q. Huynh, and R. Lei, “Combining Multi-visual Features for Efficient Indexing in a Large Image Database,” The VLDB Journal, vol. 9, no. 4, pp. 279–293, Apr. 2001.
[30] J. Hernandez, R. R. Morris, and R. W. Picard, “Call Center Stress Recognition with Person-Specific Models,” in Affective Computing and Intelligent Interaction, ser. Lecture Notes in Computer Science, S. D’Mello, A. Graesser, B. Schuller, and J.-C. Martin, Eds. Springer Berlin Heidelberg, 2011, pp. 125–134.
[31] C. Setz, B. Arnrich, J. Schumm, R. La Marca, G. Troster, and U. Ehlert, “Discriminating Stress From Cognitive Load Using a Wearable EDA Device,” IEEE Transactions on Information Technology in Biomedicine, vol. 14, no. 2, pp. 410–417, Mar. 2010.
[32] J. Zhai, A. Barreto, C. Chin, and C. Li, “Realization of stress detection using psychophysiological signals for improvement of human-computer interactions,” in IEEE SoutheastCon, 2005. Proceedings, Apr. 2005, pp. 415–420.
[33] M. Kubat and S. Matwin, “Addressing the Curse of Imbalanced Training Sets: One-Sided Selection,” in Proceedings of the Fourteenth International Conference on Machine Learning. Morgan Kaufmann, 1997, pp. 179–186.
[34] M. Kubat, R. C. Holte, and S. Matwin, “Machine Learning for the Detection of Oil Spills in Satellite Radar Images,” Machine Learning, vol. 30, no. 2-3, pp. 195–215, Feb. 1998.
[35] Y. Sasaki, “The truth of the F-measure,” Teach Tutor mater, pp. 1–5, 2007.
[36] J. Wijsman, B. Grundlehner, J. Penders, and H. Hermens, “Trapezius Muscle EMG As Predictor of Mental Stress,” in Wireless Health 2010, ser. WH ’10. New York, NY, USA: ACM, 2010, pp. 155–163.