Modelling Automatic Continuous Emotion Recognition using Machine Learning Approach

Student Name: Yona Falinie Binti Abd Gaus
Supervisor Name: Dr Hongying Meng
Department: Department of Electronic and Computer Engineering, Brunel University London

Motivation


Mental health problems affect one in four citizens at some point in their lives. However, first steps towards understanding the behaviour of people suffering from mood disorders have been limited to categorical emotion descriptions such as happy, sad, fear and surprise. Our approach is to advance emotion recognition by modelling behavioural cues of human affect as a small number of continuously valued time signals, since the emotion felt at the present time may be influenced by the emotion felt at previous times. It involves machine learning techniques to predict time-continuous emotion from multivariate time-series data.

Methodology

Figure 1: Feature Extraction Method from Multi-Modality Behavioural Cues. Video data is processed with VGGFace and AlexNet; the audio signal yields an LLD descriptor and first, second and third high-level audio features; the physiological signals comprise ECG, EDA, HR/HRV, SCL and SCR.

Fusion of Labels from Multi-Modality
• More weight is given to individual features that performed well.
• A weighted mean of the class scores is taken, with the weights derived from performance on the development set.
• Previous researchers applied this fusion to classification, while we apply it to regression analysis (see the sketch below).
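The poster does not give the exact fusion formula; the following is a minimal sketch of one plausible exponent-weighted scheme, in which each modality's development-set score is raised to a fixed exponent and normalised (the function name, exponent and example scores are illustrative assumptions):

```python
import numpy as np

# Hypothetical helper: exponent-weighted decision fusion. Each modality's
# development-set score is raised to an exponent t and normalised to give
# that modality's weight; the fused output is the weighted mean per frame.
def exponent_weighted_fusion(predictions, dev_scores, t=2.0):
    scores = np.asarray(dev_scores, dtype=float) ** t  # emphasise strong modalities
    weights = scores / scores.sum()                    # weights sum to 1
    stacked = np.vstack(predictions)                   # (n_modalities, n_frames)
    return weights @ stacked                           # weighted mean per frame

# Example: fuse per-frame arousal predictions from three modalities.
audio_pred = np.random.rand(100)
video_pred = np.random.rand(100)
ecg_pred = np.random.rand(100)
fused = exponent_weighted_fusion([audio_pred, video_pred, ecg_pred],
                                 dev_scores=[0.55, 0.58, 0.43])
```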



Features

Audio: LLD and MFCC features were chosen because they form a compact, expert-knowledge-based feature set that leads to high robustness when modelling emotion from speech. The temporal nature of audio is also taken into account by applying a deep autoencoder on top of the audio data.

Video: LGBP-TOP and facial landmarks were chosen as features from the video data. Deep learning features are also explored, to obtain 4096-dimensional features from each video clip.

Figure 2: CNN architecture for feature extraction from video data.

Physiological signals: More and more wearable devices now include physiological sensors, such as electrodermal activity (EDA) or electrocardiogram (ECG) sensors, and can be purchased at an affordable cost.
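As an illustration of that deep autoencoder over the audio features, here is a minimal PyTorch sketch (the layer sizes and the 130-dimensional LLD input are assumptions; the poster does not specify the architecture):

```python
import torch
import torch.nn as nn

# Minimal sketch: a deep autoencoder over per-frame audio LLD/MFCC vectors.
# The bottleneck activations serve as "high-level" audio features that can
# be concatenated with the original ones.
class AudioAutoencoder(nn.Module):
    def __init__(self, n_lld=130, n_hidden=64, n_bottleneck=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_lld, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_bottleneck),
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_bottleneck, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_lld),
        )

    def forward(self, x):
        code = self.encoder(x)                # high-level audio features
        return self.decoder(code), code

model = AudioAutoencoder()
frames = torch.randn(500, 130)                # 500 frames of 130-D LLDs (dummy)
recon, features = model(frames)
loss = nn.functional.mse_loss(recon, frames)  # reconstruction objective
```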

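Similarly, the 4096-dimensional deep video features of Figure 2 can be read out of the penultimate fully connected layer of a pre-trained CNN. Below is a minimal sketch using torchvision's AlexNet (torchvision ≥ 0.13; the preprocessing is an assumption, and VGGFace, which torchvision does not ship, would be used analogously):

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Minimal sketch: 4096-D activations of AlexNet's penultimate FC layer.
alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
alexnet.eval()
feature_net = torch.nn.Sequential(
    alexnet.features,
    alexnet.avgpool,
    torch.nn.Flatten(),
    *list(alexnet.classifier.children())[:-1],  # drop the 1000-way layer
)

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

frame = torch.rand(1, 3, 300, 300)              # one dummy video frame
with torch.no_grad():
    feat = feature_net(preprocess(frame))       # shape: (1, 4096)
```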
Machine Learning


Figure 3: Overview of the proposed approach to the affective dimension recognition system. Data (video data, audio signal, physiological signals) passes through feature extraction (LGBP-TOP, facial landmarks, LLD descriptors, baseline features) and regression training; a chain of LSTM units links consecutive time steps and produces the continuous outputs Y1–Y4.

• Since the available features are generally very high-dimensional, the LSTM can only be fully utilised in the second stage of the regression method.
• The LSTM can also capture the dynamic relationship between consecutive units of expression in each affective dimension (a minimal sketch follows below).
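As an illustration of this second-stage regression, here is a minimal PyTorch sketch of an LSTM that maps a sequence of fused per-frame features to a continuous affect value per frame (the feature and hidden sizes are assumptions):

```python
import torch
import torch.nn as nn

# Minimal sketch: frame-level continuous emotion regression with an LSTM.
class LSTMRegressor(nn.Module):
    def __init__(self, n_features=100, n_hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, n_hidden, batch_first=True)
        self.head = nn.Linear(n_hidden, 1)

    def forward(self, x):                    # x: (batch, time, n_features)
        out, _ = self.lstm(x)                # hidden state at every time step
        return self.head(out).squeeze(-1)    # (batch, time) continuous labels

model = LSTMRegressor()
clips = torch.randn(8, 300, 100)             # 8 clips, 300 frames, 100-D features
pred = model(clips)                          # per-frame arousal/valence predictions
```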
Result and Analysis

        AVEC2014          AVEC2016
        CORR     RMSE     CORR     RMSE
A       0.5546   0.0921   0.582    0.143
D       0.5538   0.1010   #        #
V       0.5548   0.0570   0.434    0.144
Mean    0.5544   0.0834   0.508    0.1435

*A=arousal, V=valence, D=dominance
*CORR=Pearson Correlation Coefficient, RMSE=Root Mean Square Error
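For reference, the two reported metrics can be computed per affective dimension as follows (a minimal sketch):

```python
import numpy as np
from scipy.stats import pearsonr

# CORR and RMSE between predicted and gold continuous labels.
def corr_rmse(pred, gold):
    corr, _ = pearsonr(pred, gold)                # Pearson correlation (CORR)
    rmse = np.sqrt(np.mean((pred - gold) ** 2))   # root mean square error (RMSE)
    return corr, rmse
```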
Discussion
• Adding CNN features to the hand-crafted features greatly improves performance in every dimension.
• Concatenating the original audio features with the deep-autoencoder audio features also clearly improves performance in every dimension.
• For the physiological signals, the ECG modality on the arousal dimension and the HR/HRV modality on the valence dimension give results that are competitive with the audio and video modalities.
• Exponent-weighted decision fusion increases the CORR values with little computational time.

Future Work
• We plan to adjust the LSTM parameters so that the model captures the temporal relationship between consecutive units of each affective dimension while taking into account only a subset of past observations.
• There are also several hyperparameters in the SVR that we can tune to improve both speed and results (a tuning sketch is given below).
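As a rough illustration of that SVR tuning, here is a minimal sketch using scikit-learn's GridSearchCV (the parameter grid and dummy data are assumptions):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

# Dummy stand-ins for fused features and continuous labels.
X = np.random.rand(200, 100)
y = np.random.rand(200)

grid = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10],
                "epsilon": [0.01, 0.1],
                "gamma": ["scale", 0.01]},
    scoring="neg_root_mean_squared_error",
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)                      # best hyperparameter combination
```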

