A Drowsiness Detection Scheme Based on Fusion of Voice and Vision Cues

Anirban Dasgupta, Bibek Kabi, Anjith George, SL Happy, Aurobinda Routray

Abstract—Drowsiness level detection of an individual is very important in many safety critical applications such as driving. There are several invasive and contact based methods, such as the use of blood biochemicals or brain signals, which can estimate the level of drowsiness very accurately. However, these methods are difficult to implement in practical scenarios, as they cause discomfort to the user. This paper presents a combined voice and vision based drowsiness detection system well suited to detecting the drowsiness level of an automotive driver. Being non-contact methods, the vision and voice based detection schemes have the advantage of feasible implementation. The authenticity of these methods has been cross-validated using brain signals.

Keywords—drowsiness; voiced-silence ratio; PERCLOS

I. INTRODUCTION

A major cause of road accidents is the loss of attention in automotive drivers, which has been a concern for transportation safety over the years. Hence, on-board monitoring of the alertness level of the driver is necessary to prevent such occurrences. Alertness level in human beings can be assessed using different measures such as the Electroencephalogram (EEG) [1], ocular features [2][3][4], blood samples [5], speech [6], skin conductance [7], etc. The EEG and blood biochemical based methods have been reported to be the most authentic cues for estimating the reduction in alertness level [8]. However, being contact based, their implementation is impractical [9]. In this paper, image and voice cues have been considered for implementation by virtue of being non-contact methods. The voice based algorithm works on the voiced-to-silence ratio [6]. The vision based algorithm computes a ratio called PERCLOS [10], which is based on the eye closure rate, and utilizes near infra-red (NIR) illumination for night driving. EEG based methods have been used for off-board validation of the vision and voice based algorithms. The NIR data was tested using the database reported in [11]. There have been attempts to develop similar vision based systems, such as [12]–[16]. However, these systems rely only on image information, which may be improved by using a multimodal approach. The system reported in this paper improves on existing systems by incorporating a voice based algorithm, which provides additional information related to drowsiness. The voice based approach has recently been established as a measure of drowsiness [6] and is also experimentally validated in this work.

The alertness monitoring system described in this work consists of a single USB camera with a maximum frame rate of 30 fps at a resolution of 640×480 pixels. The camera is placed on the steering column, just behind the steering wheel, to obtain the best view of the driver’s face, while the microphone is placed on the dashboard. The embedded processing unit is an x86 architecture, Intel Atom processor based Single Board Computer (SBC). The SBC is powered at 12 V DC from the car supply; the typical current drawn from the input source is approximately 1200 mA, corresponding to a typical power of about 15 W. A voltage regulator unit, comprising an LM317 IC along with some resistors, a capacitor and an inductor, is placed before the input to remove high voltage spikes from the car supply. An NIR lighting arrangement, consisting of a matrix of 3×8 Gallium Arsenide LEDs, is connected across the same supply in parallel with the embedded platform. The NIR module is operated at 10 V DC and draws a typical current of 250 mA. The lighting system is connected through a Light Dependent Resistor (LDR) to automatically switch on the NIR module in the absence of sufficient illumination. A seven inch LED touch screen is used to display the results, and a set of speakers is installed for generating the voice alarm.

The paper is organized as follows. Section II discusses the voice based algorithm, Section III presents the vision based algorithm, and Section IV concludes the paper.

II. VOICE BASED SYSTEM

The non-obtrusive nature of speech data and its sensor-free acquisition make it advantageous for drowsiness detection over contact-based methods. Moreover, speech is easier to record even under extreme environmental conditions and noisy surroundings such as high temperature, high humidity, crowded traffic, motor sound and horn sound.
In this work, we define loss of alertness in two ways: 1) loss of alertness due to drowsiness and 2) loss of alertness associated with response time. Two sets of experiments, with corresponding frameworks, were developed to assess the alertness level. Acoustic fatigue detection schemes require the estimation of various speech parameters that are affected by a low alertness level due to fatigue, such as pitch, duration of voiced and unvoiced speech, Linear Predictive Coding (LPC) coefficients, Linear Predictive Cepstral Coefficients (LPCC) [17], Mel Frequency Cepstral Coefficients (MFCC) [18], [19], formants, etc. [20]. These estimated features are fed to a classifier to decide upon the level of alertness.
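Several of these speech parameters are straightforward to estimate per frame. As an illustration, the sketch below estimates pitch from a single voiced frame via autocorrelation; this is a minimal stand-alone example, and the function name, sampling rate and frame length are illustrative rather than taken from the paper's implementation.

```python
import numpy as np

def estimate_pitch(frame, fs, fmin=60.0, fmax=500.0):
    """Estimate the pitch of a voiced frame via autocorrelation,
    searching only lags that correspond to fmin..fmax Hz."""
    frame = frame - np.mean(frame)           # remove any DC offset
    ac = np.correlate(frame, frame, mode="full")
    ac = ac[len(ac) // 2:]                   # keep non-negative lags
    lag_min = int(fs / fmax)                 # shortest plausible period
    lag_max = int(fs / fmin)                 # longest plausible period
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return fs / lag

# A synthetic 25 ms "voiced" frame: a 200 Hz tone sampled at 8 kHz
fs = 8000
t = np.arange(int(0.025 * fs)) / fs
frame = np.sin(2 * np.pi * 200.0 * t)
pitch = estimate_pitch(frame, fs)            # close to 200 Hz
```

The lag search is restricted to the 60-500 Hz fundamental-frequency range that the paper cites for voiced speech, which keeps the autocorrelation peak from being captured by the trivial zero-lag maximum.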

We have developed two algorithms for assessing alertness through 1) voiced and unvoiced classification (Experiment-1) and 2) voiced and silence classification (Experiment-2).

A. Experiment Conducted

Some tasks require subjects to divide their attention or inhibit their response, as in the Stroop Test [21]. The Stroop test is a popular neuropsychological test widely used to examine the nature of automatic and controlled cognitive processes, disturbances in cognition caused by psychiatric and/or neurological disorders, and the neurocognitive architecture of selective attention. There are three variants of this test: neutral, congruent and incongruent. In the incongruent version, each stimulus contains a mismatch between word and color, which distracts attention and demands attentional focus: the name of a color is written in a different color (for example, the word ‘BLUE’ may be written in green). Naming the color of the ink, instead of reading the word, therefore takes longer and is more prone to errors than when the color of the ink matches the name of the color. This is called the Stroop effect. In the present experiment, the subject is asked to read the Stroop set row-wise. Scoring is carried out on the basis of the total time taken to complete the set and the number of correct responses. A sample set is shown in Fig. 1.

B. Method

Speech signals can be classified into three broad classes based on the mode of generation: silence, voiced speech and unvoiced speech. When the speaker is silent, the signal sensed is due to background and sensor noises. Voiced speech is produced by the vibrations of the vocal cords, whereas unvoiced sounds are due to turbulence of air in the vocal tract (mouth, tongue, velum etc.). From a signal processing perspective, the classification can be done based on energy and zero crossing rates: silence and unvoiced speech have lower energy and higher zero crossing rates compared to voiced speech. The

vocal fold vibrations are quasi-periodic, i.e. the vibrations can be assumed periodic if the signal is of short duration (10-30 ms). For this reason, speech is processed in small windows of size 10-30 ms (called frames); in this work, a window size of 25 ms is used. Voiced and unvoiced classification is done using a support vector machine trained on MFCC features, as shown in Fig. 2. An energy feature is used to classify voiced and silence segments: after framing and windowing, a 6-level Daubechies wavelet decomposition is applied to calculate the per-band energy, because the fundamental frequency of voiced segments ranges from 60 to 500 Hz. The ratio of the energy in the bands between 62.5 and 1000 Hz to that of all bands is computed to classify each frame as voiced or silence. Each frame of the speech sample is labelled as voiced or silence, and the voiced-to-silence ratio decides whether the subject is alert or not. The wavelet based per-band energy methodology to compute the voiced-to-silence ratio is shown in Fig. 2.
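The per-frame voiced/silence decision can be sketched as follows. This is a minimal illustration, not the paper's implementation: FFT band energies are used here as a simpler stand-in for the 6-level Daubechies wavelet decomposition, and the function name, 0.5 energy-ratio threshold and sampling rate are assumptions.

```python
import numpy as np

def voiced_silence_ratio(signal, fs, frame_len=0.025, thresh=0.5):
    """Label 25 ms frames as voiced or silence and return their ratio.

    A frame is called voiced when the 62.5-1000 Hz band (the range
    used in the paper) holds more than `thresh` of its total energy.
    """
    n = int(frame_len * fs)
    voiced = silence = 0
    for start in range(0, len(signal) - n + 1, n):
        frame = signal[start:start + n] * np.hamming(n)
        spec = np.abs(np.fft.rfft(frame)) ** 2          # power spectrum
        freqs = np.fft.rfftfreq(n, d=1.0 / fs)
        band = spec[(freqs >= 62.5) & (freqs <= 1000.0)].sum()
        total = spec.sum() + 1e-12                      # avoid divide-by-zero
        if band / total > thresh:
            voiced += 1
        else:
            silence += 1
    return voiced / max(silence, 1)

# Half a second of a 200 Hz tone ("voiced") followed by broadband
# noise (treated as non-voiced): the ratio comes out near 1.0.
np.random.seed(0)
fs = 8000
t = np.arange(int(0.5 * fs)) / fs
tone = np.sin(2 * np.pi * 200.0 * t)
noise = np.random.randn(int(0.5 * fs))
ratio = voiced_silence_ratio(np.concatenate([tone, noise]), fs)
```

A tonal frame concentrates its energy near its fundamental inside the 62.5-1000 Hz band, while broadband noise spreads energy up to the Nyquist frequency, so the band-to-total ratio cleanly separates the two cases.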

Fig. 1 Methodology of speech based algorithm

Fig. 2 Scheme of speech processing

Fig. 3 Duration of Stroop test variation with increase in drowsiness



Fig. 4 Voiced-to-silence ratio against Stroop test stage for three subjects, with increasing drowsiness

The results indicate that the algorithm is robust enough to detect the level of drowsiness across subjects. Hence, this algorithm was implemented for detection of drowsiness in automotive drivers.

Fig. 5 Some detection results using PCA

C. System Design

The system takes voice samples from the driver as input and has been installed on a car. The microphone is fixed on the dashboard while the speakers are kept at the bottom of the steering wheel. Power is drawn from the car battery at 12 V DC. The driver is asked to speak for a duration of 20 seconds; the recording is then analyzed to classify the driver's state of alertness.

Fig. 6 Eye localization using LBP

III. VISION BASED SYSTEM

A. Computation of PERCLOS

The vision based method relies on the computation of PERCLOS, which may be defined as the percentage of time the eyelids are closed over the pupil [22]. The computation of PERCLOS proceeds in four steps:

• Face Detection
• Eye Detection
• Eye State Classification
• Drowsiness Detection

The face is detected online using a classifier trained with Haar-like features [23]. Detection accuracy was found to be low for a moderate amount of in-plane rotation of the subject's face, so an affine transformation [24] of the input frame is performed to detect in-plane rotated faces; this was found to be robust for rotations up to an angle of 30°. The detection accuracy was also low when the lighting conditions were extremely bright or dark. Bi-Histogram Equalization (BHE) [25] of the input frame is used to compensate for the varying lighting conditions. With BHE, the face detection accuracy improved from 92% to 94% under on-board conditions, making the algorithm robust against illumination variation. Eye detection is carried out using block Local Binary Patterns (LBP) [26]; during night driving, passive NIR lighting is used to illuminate the face. The eye detection rate is 96.5%. Once the eyes are detected, the eye states are classified into open and closed. A Support Vector Machine (SVM), being a robust binary classifier [27], is used for classifying the eye states; a linear SVM gives a successful classification rate of 98.6% and has been employed for this purpose.

Fig. 7 Sample training images for eye state classification using SVM

Once the eye state is classified, the closed and open eye frames are counted over a window of 3 minutes. The PERCLOS value for that window is computed and, based on a threshold of 20%, the driver is adjudged drowsy or alert. The overall vision based algorithm works at 10 fps.

B. Combined System

The combined vision and voice based system has been programmed to operate in a parallel fashion. When both the voice and vision based systems declare the driver drowsy, an alarm is sounded. In this manner, the driver is warned of being drowsy and is advised to refresh himself. Such a technology is intended to prevent road accidents that occur due to inattentiveness or drowsiness of the driver. The system has been validated against highly

authentic EEG signals and psychomotor vigilance tests under laboratory conditions.
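The PERCLOS windowing and the combined decision rule described above can be sketched as below. This is a minimal illustration under stated assumptions: the class and function names are hypothetical, the frame-label convention (True = eyes closed) is ours, and only the 20% threshold, the 3-minute window and the 10 fps rate come from the paper.

```python
from collections import deque

PERCLOS_THRESHOLD = 0.20        # 20 % eye closure, per the paper
WINDOW_FRAMES = 10 * 60 * 3     # 3-minute window at the reported 10 fps

class PerclosMonitor:
    """Rolling PERCLOS over the most recent window of eye-state labels.

    Each frame label comes from the eye-state classifier:
    True = eyes closed, False = eyes open.
    """
    def __init__(self, window=WINDOW_FRAMES):
        self.states = deque(maxlen=window)  # old frames drop off automatically

    def update(self, eyes_closed):
        """Record one frame and return the current PERCLOS value."""
        self.states.append(bool(eyes_closed))
        return sum(self.states) / len(self.states)

    def is_drowsy(self):
        """True when the fraction of closed-eye frames exceeds 20 %."""
        if not self.states:
            return False
        return sum(self.states) / len(self.states) > PERCLOS_THRESHOLD

def sound_alarm(vision_drowsy, voice_drowsy):
    """Fusion rule: alarm only when BOTH subsystems flag drowsiness."""
    return vision_drowsy and voice_drowsy
```

The `deque` with `maxlen` keeps the window rolling without manual bookkeeping; the AND-fusion mirrors the paper's rule that the alarm sounds only when both the voice and vision subsystems agree.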

IV. CONCLUSION AND FUTURE SCOPE

This paper presents a system for drowsiness detection in automotive drivers based on image and voice cues. On-board monitoring of the alertness level of an automotive driver has been a challenging research problem in transportation safety and management. We have developed a robust real-time embedded platform to monitor the loss of attention of the driver based on image and speech signals. The voice based algorithm uses the computation of the ratio of voiced to unvoiced speech, while the image based algorithm relies on the computation of PERCLOS. The algorithms run in parallel on embedded hardware in real time. They have been cross-validated using brain signals and psychomotor tests, and the system has been tested systematically both off-board and on-board. A limitation of the system is its reduced accuracy for drivers wearing spectacles; preliminary work addressing this has been reported in [28]. There is further scope for improvement in the voice based algorithm: removal of unwanted noise and passengers' speech has not yet been taken into consideration.

REFERENCES

[1] S. Kar and A. Routray, “Effect of sleep deprivation on functional connectivity of EEG channels,” IEEE Trans. Syst., Man, Cybern.: Syst., vol. 43, no. 3, pp. 666–672, May 2013.
[2] W. Qing, S. BingXi, X. Bin, and Z. Junjie, “A PERCLOS-Based Driver Fatigue Recognition Application for Smart Vehicle Space,” 2010 Third Int. Symp. Inf. Process., pp. 437–441, Oct. 2010.
[3] E. H. Kim, V. R. Vicci, and T. L. Alvarez, “Saccade and vergence interaction during fatigued versus non-fatigued sessions,” 2009 IEEE 35th Annu. Northeast Bioeng. Conf., pp. 1–2, Apr. 2009.
[4] T. Ito, S. Mita, T. Nakano, and S. Yamamoto, “Driver Blink Measurement by the Motion Picture Processing and its Application to Drowsiness Detection,” pp. 168–173, Sep. 2002.
[5] B. Nayak, S. Kar, and A. Routray, “A biomedical approach to retrieve information on driver’s fatigue by integrating EEG, ECG and blood biomarkers during simulated driving session,” in 4th Int. Conf. Intelligent Human Computer Interaction, 2012.
[6] L. S. Dhupati, S. Kar, A. Rajaguru, and A. Routray, “A novel drowsiness detection scheme based on speech analysis with validation using simultaneous EEG recordings,” in 2010 IEEE Int. Conf. Automation Science and Engineering, 2010, pp. 917–921.
[7] M. M. Bundele, “Detection of Fatigue of Vehicular Driver using Skin Conductance and Oximetry Pulse: A Neural Network Approach,” pp. 739–744, 2009.
[8] C. Papadelis, C. Lithari, and C. Kourtidou-Papadeli, “Monitoring Driver’s Sleepiness On-Board for Preventing Road Accidents,” pp. 485–489, 2009.
[9] S. L. Happy, A. Dasgupta, P. Patnaik, and A. Routray, “Automated alertness and emotion detection for empathic feedback during e-Learning,” in Proc. 2013 IEEE 5th Int. Conf. Technology for Education (T4E), 2013.
[10] R. Hanowski, D. Bowman, A. Alden, and W. Wierwille, “PERCLOS+: Moving Beyond Single-Metric Drowsiness Monitors,” Strategies, 2008.
[11] S. L. Happy, A. Dasgupta, A. George, and A. Routray, “A video database of human faces under near Infra-Red illumination for human computer interaction applications,” in 2012 4th Int. Conf. Intelligent Human Computer Interaction (IHCI), 2012, pp. 1–4.
[12] P. Zheng and C. Q. Zhang, “Hardware Design of Fatigue Detection System Based on PERCLOS,” 2011 Third Int. Conf. Multimedia Inf. Netw. Security, pp. 1–5, Nov. 2011.
[13] A. Dasgupta, A. George, S. L. Happy, and A. Routray, “A Vision-Based System for Monitoring the Loss of Attention in Automotive Drivers,” IEEE Trans. Intell. Transp. Syst., vol. 15, no. 1, pp. 1–14, 2013.
[14] L. Lang and H. Qi, “The Study of Driver Fatigue Monitor Algorithm Combined PERCLOS and AECS,” 2008 Int. Conf. Comput. Sci. Softw. Eng., pp. 349–352, 2008.
[15] A. Dasgupta, A. George, S. L. Happy, A. Routray, and T. Shanker, “An on-board vision based system for drowsiness detection in automotive drivers,” Int. J. Adv. Eng. Sci. Appl. Math., vol. 5, no. 2–3, pp. 94–103, Aug. 2013.
[16] S. Arun, M. Murugappan, and K. Sundaraj, “Hypovigilance warning system: A review on driver alerting techniques,” 2011 IEEE Control Syst. Grad. Res. Colloq., pp. 65–69, Jun. 2011.
[17] A. Samantaray, B. Kabi, A. Routray, and K. Mahapatra, “A novel approach of speech emotion recognition with prosody, quality and derived features using SVM classifier for a class of North-Eastern Languages,” Recent Trends …, 2015.
[18] B. Kabi, A. Samantaray, and A. Routray, “Voice Cues, Keyboard Entry and Mouse Click for Detection of Affective and Cognitive States: A Case for Use in Technology-Based Pedagogy,” … Educ. (T4E), …, 2013.
[19] B. Kabi, S. Sahoo, and A. Routray, “Floating to Fixed-point Translation with its Application to Speech-based Emotion Recognition,” … (EAIT), 2014 Fourth …, 2014.
[20] S. Shukla, S. R. M. Prasanna, and S. Dandapat, “Stressed speech processing: Human vs automatic in non-professional speakers scenario,” 2011 Natl. Conf. Commun., pp. 1–5, Jan. 2011.
[21] D. Wu, C. G. Courtney, B. J. Lance, S. S. Narayanan, M. E. Dawson, K. S. Oie, and T. D. Parsons, “Optimal Arousal Identification and Classification for Affective Computing Using Physiological Signals: Virtual Reality Stroop Task,” IEEE Trans. Affect. Comput., vol. 1, no. 2, pp. 109–118, Jul. 2010.
[22] W. Wierwille, “Historical perspective on slow eyelid closure: Whence PERCLOS,” in Tech. Conf. Ocular Measures of Driver Alertness, 1999.
[23] S. Gupta, A. Dasgupta, and A. Routray, “Analysis of training parameters for classifiers based on Haar-like features to detect human faces,” in 2011 Int. Conf. Image Information Processing, 2011, pp. 1–4.
[24] P. K. Mohanty, S. Sarkar, and R. Kasturi, “Designing Affine Transformations based Face Recognition Algorithms,” 2005.
[25] Y.-T. Kim, “Contrast enhancement using brightness preserving bi-histogram equalization,” IEEE Trans. Consum. Electron., vol. 43, no. 1, pp. 1–8, 1997.
[26] T. Ahonen, A. Hadid, and M. Pietikäinen, “Face description with local binary patterns: application to face recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 12, pp. 2037–2041, Dec. 2006.
[27] E. Osuna, R. Freund, and F. Girosi, “Training support vector machines: an application to face detection,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 1997, pp. 130–136.
[28] M. Singvi, A. Dasgupta, and A. Routray, “A real time algorithm for detection of spectacles leading to eye detection,” in 2012 4th Int. Conf. Intelligent Human Computer Interaction (IHCI), 2012, pp. 1–6.
