Embedded Fatigue Detection using Convolutional Neural Networks with Mobile Integration

Mohammed Ghazal, Yasmine Abu Haeyeh, Abdelrahman Abed, and Sara Ghazal
Abu Dhabi University, Abu Dhabi, UAE
Email: [email protected]

Abstract—Fatigued or drowsy drivers pose a significant risk of causing life-threatening accidents, yet many sleep-deprived drivers get behind the wheel and expose lives to danger. In this paper, we propose a low-cost, real-time embedded system for fatigue detection using convolutional neural networks (CNN). Our system starts by spatially processing the video signal with a real-time face detection algorithm to establish a region of interest and reduce computations. The video signal comes from a camera module mounted on the car dashboard and connected to an embedded Linux board set to monitor the driver's eyes. Detected faces are then passed to an optimized fatigue recognition CNN binary classifier to detect fatigued or normal driving. When temporally persistent fatigue is detected, alerts are sent to the driver's smart phone, and possibly to others, so that preventive measures can be taken before accidents happen. Our testing shows that the system can robustly detect fatigue and can effectively be deployed to address the problem.

Keywords—Convolutional neural network, fatigue detection, embedded systems, IoT, signal processing

I. INTRODUCTION

Fatigued drivers are a serious cause of fatal accidents. According to the USA National Highway Traffic Safety Administration (NHTSA), 7% of all crashes and 16.5% of fatal crashes involved drowsy driving. The societal cost of fatigue-related fatal and injury crashes is estimated at $109 billion in the US, not including property damage [1]. To overcome this serious issue, much research has been conducted on designing robust fatigue detection systems that warn fatigued drivers before they fall asleep and lose control of the vehicle.

Fatigue has many definitions. One definition is a lack of energy, physically or mentally, which does not lend itself to robust detection. We can also define it as drowsiness when it refers to a lack of sleep. These definitions focus on the cause rather than the consequences or signs of fatigue. We adopt a definition based on the physical signs of fatigue. This definition is necessary to set up our recognition problem, and it is aligned with the definition used by researchers in the field. Fatigue is defined by its physical manifestations, such as measurable variations in heart rate, observable behavior, the electrical resistance of the skin, brain activity, or body temperature [2].

Physiological signals such as the electroencephalogram (EEG) have been used for estimating brain dynamics [3]. Physical and mental activities related to driving can be extracted from EEG signals [4]. Electrocardiogram (ECG) signals have also been used to measure heart rate variability (HRV) as a feature

for drowsiness detection; ECG provides an increased level of detection reliability under different driving conditions [5]. While these biometric methods produce high detection accuracy, they require electrode contacts attached to the driver's head or chest to produce the required measurements. They are therefore more difficult to deploy in the real world than contact-less techniques. As a result, the majority of the work in this area focuses on non-intrusive methods that detect fatigue or distraction from the driver's behavior. These non-intrusive methods monitor the steering wheel movement, the standard deviation of the lane position, and the steering wheel angle using sensors embedded in different places inside the vehicle [6]. Such measurements are biased by driving skills, vehicle status, and road conditions, and are thus difficult to control. In contrast, extracting the driver's visual features, such as eye blinking, yawning, and head movement, to measure different levels of fatigue using computer vision techniques is promising. Research has shown that the latency in visual response, reflected in the Psychomotor Vigilance Task (PVT) metric, is directly connected to the percentage of eyelid closure over time (PERCLOS) [7]. We can therefore use this parameter to detect the driver's hypo-vigilance and estimate the level of fatigue and distraction [8]. Yawning, which can be inferred from the geometrical features of the mouth, can be an early warning sign of fatigue, although yawning detection is more susceptible to false detections [9]. Visual approaches are easy to deploy, do not interfere with the driving process, and provide early fatigue detection. While they are relatively less accurate than approaches based on bio-electric signals, they remove a major deployment barrier by being contact-less. Recent advances in deep learning have given visual approaches a boost in performance.

Since early and critical symptoms of fatigue can be identified from the driver's eyes, our proposed fatigue detection system monitors eye closure using a deep convolutional neural network running in real-time on an embedded Linux board installed on the dashboard. The proposed system is low-cost and inexpensive to deploy.

II. RELATED WORK

Previous works attempted to address the problem of fatigue and drowsiness detection using feature-extraction-based techniques [10]–[12]. The main aim is to monitor the driver's loss of attention. While these feature-extraction-based techniques [13], [14] provide faster detection and reduced

computational complexity, they are more sensitive to illumination conditions and imaging resolution. Recently, researchers have shifted their focus to monitoring the driver's face from the unprocessed image using deep learning, without explicit feature extraction. Well-trained machine learning algorithms offer high precision and robustness in fatigue recognition. A simple CNN-based drowsy driver detection system was proposed in [15], achieving an average accuracy of 78% across multiple subjects using customized training data sets. The work in [16] presents a deeper CNN for detecting driver drowsiness from input RGB video. The proposed network combines three deep networks, namely AlexNet, VGG-FaceNet, and FlowImageNet, and uses transfer learning to reduce the required training set size. Evaluated on the NTHU driver drowsiness detection benchmark video data set, it achieved a detection accuracy of 73%. A faster and more precise real-time drowsiness detection system was presented in [17], which compresses the model into a lighter one that can be deployed on an embedded system. The authors of [17] report a processing speed of 14.9 frames per second and an accuracy of 89.5%, relying on the eyes and mouth as the main facial features for drowsiness recognition.

III. DEEP CONVOLUTIONAL NEURAL NETWORKS

Conventional machine learning techniques rely on hand-designed feature extraction algorithms which attempt to summarize the source data; the summaries are then fed to classifiers to reach conclusions. Deep convolutional neural networks (DCNN) have proved able to robustly and precisely extract features and reach a conclusion without the need for user design or intervention. DCNNs were first inspired by the cat's visual cortex and are built from multiple layers performing convolutional transforms with parameterized filters, followed by decimation through pooling and ending with a fully connected network for final classification. An illustrative representation of a DCNN is shown in Fig. 1. The depicted network consists of seven layers starting from the input layer; the layer sizes and numbers of channels are given on the figure. The input image is convolved with multiple filter kernels in the convolution layers, whose filter weights are learned. The output volumes of the convolutional layers are processed by the next pooling layer for downsampling. The idea is to reduce the size while maintaining the important information; reducing the size also reduces the number of trainable parameters. The most common form is max-pooling. A convolutional layer followed by a pooling layer forms a pair or block, and the depth of the network is achieved by repeating these blocks. The feature maps generated by the sequence of convolution and pooling layers feed a fully connected layer at the end, which uses the information summarized so far (the automatically produced feature space) to generate an output with one entry per class in the classification problem. Deeper CNNs perform better when trained on very large data sets. Transfer learning allows researchers to reduce the demand for data by

Fig. 1. Schematic diagram of a deep convolutional neural network.

starting the search for the optimal parameters for the problem at hand from initial conditions obtained by solving other similar problems for which data is available. For example, transfer learning from a network already trained on faces reduces the need for data to practical levels.
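A minimal Keras sketch of this idea follows (Keras is the library our implementation uses, per Section VI); the weight file name, the freezing policy, and the optimizer settings here are illustrative assumptions, not the paper's exact recipe:

```python
# Minimal transfer-learning sketch in Keras: start from a network trained
# on a related face task, freeze the early feature-extraction layers, and
# retrain only the classifier head on the new (smaller) data set.
# 'face_pretrained.h5' is a hypothetical weight file used for illustration.
from keras.models import load_model

model = load_model('face_pretrained.h5')
for layer in model.layers[:-2]:   # keep the learned convolutional features
    layer.trainable = False
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(...) now searches only the unfrozen parameters,
# starting from initial conditions learned on the related problem.
```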

IV. REAL-TIME FACE DETECTION

Face detection is a typical first stage of many driver face monitoring systems, establishing a region of interest and reducing computational cost. Two main approaches have been adopted in the literature: 1) feature-based approaches relying on skin color or face shape; and 2) machine-learning-based approaches trained for face detection. Although feature-based approaches are faster, they are less accurate in the presence of noise. Learning-based approaches, on the other hand, are more effective but require more computing time. We use a learning-based technique for face detection based on the Haar-like features and cascade classifiers of Viola and Jones [18]. An integral image is used to reduce the initial image processing required for face detection and to compute the rectangle features efficiently. The value of the integral image at point (x, y) equals the sum of all pixels above and to the left of (x, y), and can be computed in one pass over the original image using

I(x, y) = i(x, y) + I(x − 1, y) + I(x, y − 1) − I(x − 1, y − 1),   (1)

where the integral image I(x, y) = Σ_{x′ ≤ x, y′ ≤ y} i(x′, y′), and i(x, y) is the original image.
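As a quick numeric illustration of Eq. (1), a standalone sketch (not part of the detection pipeline itself):

```python
# Numerical check of Eq. (1): a two-dimensional cumulative sum yields the
# integral image, and the one-pass recurrence reproduces it exactly.
import numpy as np

i = np.arange(16, dtype=np.int64).reshape(4, 4)   # toy "image"
I = i.cumsum(axis=0).cumsum(axis=1)               # integral image (inclusive)

# Recurrence from Eq. (1), with out-of-range terms treated as zero:
x, y = 2, 3
left = I[x - 1, y] if x > 0 else 0
up = I[x, y - 1] if y > 0 else 0
diag = I[x - 1, y - 1] if x > 0 and y > 0 else 0
assert I[x, y] == i[x, y] + left + up - diag
```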

To achieve fast and precise face detection, Viola and Jones used AdaBoost to build a simple and efficient face detector from the set of image features computed using the integral image technique. We use Haar-like features and cascade classifiers to detect faces and regions within faces, such as the eyes.
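A short sketch of this detection stage using the stock Haar cascades shipped with OpenCV; the use of OpenCV and the input file name are assumptions for illustration, as the paper does not name its implementation:

```python
# Sketch of Viola-Jones face and eye detection using OpenCV's bundled
# Haar cascades; 'driver.jpg' is a hypothetical input frame.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_eye.xml')

frame = cv2.imread('driver.jpg')
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
    face_roi = gray[y:y + h, x:x + w]            # region of interest
    eyes = eye_cascade.detectMultiScale(face_roi)
```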

V. PROPOSED FATIGUE DETECTION SYSTEM

A. Network Architecture

In our proposed fatigue detection system, we use a model composed of two double convolutional layers, each followed by a max-pooling layer, leading to two fully connected hidden layers and a final Softmax layer for classification, as illustrated in Fig. 2.

Fig. 2. Proposed fatigue detection network architecture.

1) Convolutional Layer: The convolutional layer is the core of any convolutional network, accomplishing the main role of extracting input image features. It is usually the first layer and consists of multiple learnable filters which produce activation maps. It accepts an input of size (W × H × D), which corresponds to our image size of (100 × 100 × 1), since we convert RGB images into grayscale to focus on facial features rather than skin tone. We use double convolutional layers as this has experimentally produced a good balance between accuracy and computations.

2) Max-Pooling Layer: A common non-linear downsampling method that reduces the spatial size of the representation in order to decrease the number of parameters, which in turn reduces computation time and controls over-fitting.

3) Dropout: The dropout layer can act as an image noise reduction technique when applied after a max-pooling layer. When used with fully connected layers, it deactivates part of the neurons, which improves generalization by forcing the layer to learn the same concept with different neurons and thus avoids over-fitting.

4) Fully-Connected Layer: Dense hidden layers take the output volume of the last max-pooling layer and produce an N-dimensional vector, where N corresponds to the number of classes. In our case, the model chooses between two classes (fatigued and control).

5) Model Parameters: We build our sequential model by stacking linear layers. The system is trained after defining an optimizer and a loss function. We use the Adam optimizer, a stochastic optimization technique, and categorical cross-entropy as the loss function, i.e., the objective that training tries to minimize. For performance evaluation, we use the accuracy.

B. System Operation

The proposed real-time fatigue detection system is shown in Fig. 3. We start by loading weights from a pre-trained network to transfer the learning and reduce dependency on large data sets. We then capture a frame from the camera and apply face detection to establish a region of interest. The face image is then run through the classifier to label the frame as showing signs of fatigue or not. If the label persists over time, a warning is delivered to subscribing smart phones.

Fig. 3. Proposed fatigue detection system.

VI. TESTING

A. Experimental Platform

We focus on resource-limited devices and aim for a solution that is scalable, inexpensive, and robust. To test the system, we used a Raspberry Pi 3 Model B due to its low cost, portability, and connectivity features. Cloud integration is used to deliver the alarm to subscribing smart phones (including the driver's, and possibly those of his or her support system through work or family). The Raspberry Pi has 1 GB of RAM and a 1 GHz processor. We used the Raspberry Pi camera module for capturing the pictures. For training, we used an iMac with a 3.06 GHz Intel Core i3 and an ATI Radeon HD 4670 GPU with a core clock speed of 750 MHz, running Ubuntu 16.04 with the Keras deep learning library on a Theano back end.

Fig. 4. Raspberry Pi 3 Model B with 5 MP camera.

B. Database

For training our fatigue detection system, we used the Closed Eyes in the Wild (CEW) data set [19], which contains 2423 subjects, of which 1192 have both eyes closed. We use 1231 subjects with eyes open from the Labeled Faces in the Wild (LFW) database. The cropped coarse faces are resized to (100 × 100) pixels. Our data set was labeled and divided randomly into training and validation sets following a 10-fold cross-validation approach to ensure we selected the best-skilled training model.
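A sketch of this preparation step, assuming the cropped faces are stored in two class folders; the folder names and the use of scikit-learn's KFold utility are illustrative assumptions:

```python
# Data preparation sketch: load grayscale face crops, resize to 100x100,
# and split with 10-fold cross-validation.
import os
import cv2
import numpy as np
from sklearn.model_selection import KFold

def load_images(folder, label):
    samples = []
    for name in os.listdir(folder):
        img = cv2.imread(os.path.join(folder, name), cv2.IMREAD_GRAYSCALE)
        if img is not None:
            samples.append((cv2.resize(img, (100, 100)), label))
    return samples

samples = load_images('closed_eyes', 1) + load_images('open_eyes', 0)
X = np.array([s[0] for s in samples], dtype='float32')[..., np.newaxis] / 255.0
y = np.array([s[1] for s in samples])

for train_idx, val_idx in KFold(n_splits=10, shuffle=True).split(X):
    pass  # train on X[train_idx], validate on X[val_idx], keep the best model
```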

Fig. 5. Real-time fatigue detection system.

C. Real-Time Fatigue Detection

As shown in Fig. 5, the detection system is capable of differentiating between alert and fatigued drivers. During the real-time detection process, an alarm is sent immediately to the driver's smart phone once fatigue symptoms are detected.

D. Experimental Results and Analysis

During training, the system reported a cross-validation accuracy of 95.8%. We also tested our proposed system's performance on never-before-seen test subjects and obtained an average accuracy of 95%. In Fig. 6, the testing accuracy is compared with the training accuracy to find the optimal number of training epochs. We obtain the testing accuracy by testing the model on subjects outside the training data. If the curves of testing and training accuracy start to depart consistently, then we need to stop the training at an earlier epoch to avoid over-fitting, which occurs when the model has over-learned the training data set. We can observe from the graph that 12 epochs are sufficient for accurate results in our model.

When performing real-time testing using the Raspberry Pi 3 with its 5 MP camera module, we observed a delay in the detection process due to the reduced frame rate. We speed up the Pi camera response using a dedicated thread (separate from the main thread) to read frames from the camera module and are therefore able to increase the frame rate of our pipeline. This speedup is obtained by reducing I/O latency and ensuring the main thread is not blocked, allowing us to grab the most recent frame read by the camera at any moment in time. Using this multi-threaded approach, our video processing pipeline is never blocked.
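A sketch of such a threaded frame grabber, shown here with OpenCV's VideoCapture for illustration; a deployment reading from the Pi camera module (e.g., via the picamera interface) would follow the same pattern:

```python
# Threaded frame grabber: a background thread continuously reads from the
# camera so the main (classification) thread always takes the most recent
# frame without blocking on I/O.
import threading
import cv2

class ThreadedCamera:
    def __init__(self, src=0):
        self.cap = cv2.VideoCapture(src)
        self.frame = None
        self.running = True
        threading.Thread(target=self._reader, daemon=True).start()

    def _reader(self):
        # Keep overwriting self.frame so read() always sees the latest frame.
        while self.running:
            ok, frame = self.cap.read()
            if ok:
                self.frame = frame

    def read(self):
        return self.frame

    def stop(self):
        self.running = False
        self.cap.release()
```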

Fig. 6. Proposed system accuracy over training time.

E. Comparison to Related Work

In [15], the authors proposed a driver drowsiness detection system based on a deep learning classifier; their system achieved an average accuracy of 78% across all test subjects. To compare the performance of our system to the state of the art, we applied their model to the same training data we used, so that both models can be compared in terms of accuracy per number of epochs. The results are shown in Fig. 7.

Fig. 7. Comparison of accuracy between the proposed model and the model of previous work [15].

VII. CONCLUSION

In this paper, we propose a real-time monitoring system consisting of a Raspberry Pi 3 with an attached camera module and a deep learning fatigue detection pipeline. Our proposed system starts with face detection, followed by deep convolutional neural network classification using a Keras back end. During monitoring, the system relies on persistent detection of closed eyes to raise alarms to the driver's smart phone and possibly others. Our model achieves 95% accuracy in our testing. The proposed method is real-time, low-cost, and can be deployed easily on existing vehicles.

ACKNOWLEDGMENT

This work is sponsored by the Office of Research and Sponsored Programs of Abu Dhabi University under grant number. The authors thank Sarah Hasan Baras for her work on the implementation.

REFERENCES

[1] P. Fischer, J. Adkins, D. Davila, Ch. DeWeese, V. Harper, and J. Stephen Higgins, "Wake up call! Understanding drowsy driving and what states can do," 2016.
[2] H. Cai and Y. Lin, "An experiment to non-intrusively collect physiological parameters towards driver state detection," in Proceedings of the SAE World Congress, Detroit, MI, USA, Apr. 2007.
[3] Y. T. Liu, Y. Y. Lin, S. L. Wu, C. H. Chuang, and C. T. Lin, "Brain dynamics in predicting driving fatigue using a recurrent self-evolving fuzzy neural network," IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 2, pp. 347–360, Feb. 2016.
[4] G. Borghini, G. Vecchiato, J. Toppi, L. Astolfi, A. Maglione, R. Isabella, C. Caltagirone, W. Kong, D. Wei, Z. Zhou, L. Polidori, S. Vitiello, and F. Babiloni, "Assessment of mental fatigue during car driving by using high resolution EEG activity and neurophysiologic indices," in 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 6442–6445, Aug. 2012.
[5] K. T. Chui, K. F. Tsang, H. R. Chi, B. W. K. Ling, and C. K. Wu, "An accurate ECG-based transportation safety drowsiness detection scheme," IEEE Transactions on Industrial Informatics, vol. 12, no. 4, pp. 1438–1452, Aug. 2016.

[6] Z. Li, S. Li, R. Li, B. Cheng, and J. Shi, "Online detection of driver fatigue using steering wheel angles for real driving conditions," Sensors, vol. 17, no. 3, p. 495, Mar. 2017.
[7] P. S. Rau, "Drowsy driver detection and warning system for commercial vehicle drivers: Field operational test design, data analyses, and progress," in Proceedings: International Technical Conference on the Enhanced Safety of Vehicles, 2005.
[8] M.-H. Sigari, M. Fathy, and M. Soryani, "A driver face monitoring system for fatigue and distraction detection," International Journal of Vehicular Technology, vol. 2013, 2013.
[9] S. Abtahi, B. Hariri, and S. Shirmohammadi, "Driver drowsiness monitoring based on yawning detection," in 2011 IEEE International Instrumentation and Measurement Technology Conference, pp. 1–4, May 2011.
[10] A. Dasgupta, A. George, S. L. Happy, and A. Routray, "A vision-based system for monitoring the loss of attention in automotive drivers," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 4, pp. 1825–1838, Dec. 2013.
[11] W. Zhang, B. Cheng, and Y. Lin, "Driver drowsiness recognition based on computer vision technology," Tsinghua Science and Technology, vol. 17, no. 3, pp. 354–362, June 2012.
[12] J. J. Yan, H. H. Kuo, Y. F. Lin, and T. L. Liao, "Real-time driver drowsiness detection system based on PERCLOS and grayscale image processing," in 2016 International Symposium on Computer, Consumer and Control (IS3C), pp. 243–246, July 2016.
[13] H. Hajjdiab and I. Al Maskari, "Plant species recognition using leaf contours," in 2011 IEEE International Conference on Imaging Systems and Techniques, pp. 306–309, May 2011.
[14] R. Laganière, H. Hajjdiab, and A. Mitiche, "Visual reconstruction of ground plane obstacles in a sparse view robot environment," Graphical Models, vol. 68, no. 3, pp. 282–293, 2006.
[15] K. Dwivedi, K. Biswaranjan, and A. Sethi, "Drowsy driver detection using representation learning," in 2014 IEEE International Advance Computing Conference (IACC), pp. 995–999, Feb. 2014.
[16] S. Park, F. Pan, S. Kang, and C. D. Yoo, "Driver drowsiness detection system based on feature representation learning using various deep networks," in ACCV Workshops, 2016.
[17] B. Reddy, Y. H. Kim, S. Yun, C. Seo, and J. Jang, "Real-time driver drowsiness detection for embedded system using model compression of deep neural networks," in 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 438–445, July 2017.
[18] P. Viola and M. Jones, "Robust real-time face detection," in Proceedings Eighth IEEE International Conference on Computer Vision (ICCV 2001), vol. 2, 2001.
[19] F. Song, X. Tan, X. Liu, and S. Chen, "Eyes closeness detection from still images with multi-scale histograms of principal oriented gradients," Pattern Recognition, 2014.
