Real Time Driving Data Collection and Driver Verification using CMAC-MFCC
M. Khalid, W. Abdul and N. Kamaruddin
Center for Computational Intelligent, School of Computer Engineering, Nanyang Technological University, Blk N4 #2A-36, Nanyang Avenue, Singapore 639798
email: [email protected], tel/fax: 65 6790-4948 / 65 6792-6559
Abstract - Numerous studies have examined driver behavior in order to understand the factors that contribute to high accident rates. Driving abnormalities may be one of the many factors behind accidents, and detecting them could help prevent accidents. In this paper we present a simple and effective method for in-car data acquisition that collects real-time driving data. These data are used to investigate driver behavior, focusing on the driver’s response on the brake and gas pedals as well as its rate of change. From these data, we demonstrate a simple yet effective technique for driver verification. Driver profiles were created using the cerebellar model articulation controller (CMAC) feature map, taking the brake and gas pedal pressure signals as inputs. From the CMAC outputs, relevant features were extracted using Mel-Frequency Cepstral Coefficients (MFCC). These features were used to verify drivers using multilayer perceptrons (MLP) as classifiers. The driver verification performance indicates positive development toward an intelligent vehicle driver verification system that may enhance the driver’s security, safety and comfort.
Keywords: brake pedal pressure, gas pedal pressure, CMAC, MFCC, driver verification.
1. Introduction
With increased emphasis on the practicality and safety of motor vehicles, the recognition of drivers and their driving behavior has gained importance. Most drivers are aware of the effects of drinking and of using a mobile phone while driving. However, little consideration has been given to other factors that can be even more distracting, such as fatigue, stress and the driver’s psychological state, all of which may seriously impair driving. In addition, biometric authentication technology has progressed considerably in recent years, as the cost of identity theft and stolen goods has become unbearable. In recent developments, fingerprint anti-theft devices can be purchased and installed in vehicles so that only registered drivers can drive the car [1]. In our previous studies, we modeled a driver’s behavior from the way they apply pressure on the brake and gas pedals [2] and used a multi-dimensional CMAC for driver behavior profiling [3]. To extend that work and contribute to this research area, we acquire relevant driving data and investigate whether relevant features can be extracted from the CMAC output. We focus on using only the brake and gas pedal pressures to propose a technique for modeling an in-car driver verification system.
CMAC
The Cerebellar Model Articulation Controller (CMAC) was proposed by Albus (1975) [4] as a simple model of the cerebellar cortex of mammals. It is a simple neural network architecture that provides the advantages of fast learning and a high convergence rate [5]. It uses a table look-up technique to represent nonlinear functions. The numerical information is stored in a distributed manner at memory locations as weights. Each weight is associated with a basis function that outputs a non-zero value in a specified region of the input. The CMAC input is quantized by a lattice constructed from the basis functions. To speed up learning and spread information to adjacent basis functions, the CMAC updates a group of weights associated with the basis functions that are close to a given point, and thus yields generalization capability. An input vector is a collection of N readings from real-world sensors and/or measures of the desired goal. The input space consists of the set of all possible input vectors. The number N of input vector components and the number of outputs are arbitrary within some practical limits. The CMAC algorithm maps any input it receives into a set of C points in a large “conceptual” memory in such a way that two inputs that are “close” in input space will have their C points overlap in the memory, with more overlap for closer inputs. If two inputs are far apart in the input space, there will be no overlap in their C-element sets in the memory, and therefore no generalization. One of the benefits of CMAC is that it has local generalization, meaning that similar inputs produce similar outputs and dissimilar inputs produce nearly independent outputs.
Another benefit of CMAC is its rapid convergence relative to back-propagation, meaning that the number of iterations required to converge is much smaller with CMAC. The rapid convergence time allows CMAC neural networks to work in real-time settings.
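The table look-up and local generalization described above can be illustrated with a minimal sketch. It is not the feature map used in [3]: the class name CMAC and the parameters n_tilings, resolution and lr are hypothetical, and the two inputs are assumed to be values normalized to [0, 1] (for example brake and gas pressure). Each input activates C = n_tilings weights, one per shifted tiling, and an LMS-style update spreads the error equally over them.

```python
class CMAC:
    """Minimal 2-input CMAC with C overlapping tilings (illustrative only)."""

    def __init__(self, n_tilings=8, resolution=16, lo=0.0, hi=1.0, lr=0.2):
        self.c = n_tilings        # number of weights activated per input (C)
        self.res = resolution     # cells per axis in each tiling
        self.lo, self.hi = lo, hi
        self.lr = lr              # learning rate (assumed value)
        side = resolution + 1     # one extra cell per axis absorbs the shift
        self.side = side
        self.w = [[0.0] * (side * side) for _ in range(n_tilings)]

    def _active(self, x, y):
        """Return the index of the activated cell in each tiling."""
        span = self.hi - self.lo
        cells = []
        for t in range(self.c):
            offset = t / self.c   # tiling t is shifted by t/C of a cell
            ix = int((x - self.lo) / span * self.res + offset)
            iy = int((y - self.lo) / span * self.res + offset)
            ix = min(max(ix, 0), self.res)
            iy = min(max(iy, 0), self.res)
            cells.append(ix * self.side + iy)
        return cells

    def predict(self, x, y):
        """Output is the sum of the C activated weights."""
        return sum(self.w[t][i] for t, i in enumerate(self._active(x, y)))

    def train(self, x, y, target):
        """Spread the output error equally over the C activated weights."""
        cells = self._active(x, y)
        error = target - sum(self.w[t][i] for t, i in enumerate(cells))
        for t, i in enumerate(cells):
            self.w[t][i] += self.lr * error / self.c


net = CMAC()
for _ in range(50):
    net.train(0.5, 0.5, 1.0)
center = net.predict(0.5, 0.5)    # close to the trained target
near = net.predict(0.52, 0.5)     # shares some tiles with the trained point
far = net.predict(0.9, 0.1)       # shares no tiles with the trained point
```

In this sketch, training on a single point raises the output for nearby inputs that share tiles with it and leaves distant inputs unchanged, which is the local generalization property described above.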
MFCC
The input brake and gas pedal pressure signals are divided into frames. Each frame is windowed with a Hamming window and transformed to the frequency domain using the Fast Fourier Transform (FFT). The MFCC features are then extracted based on the logarithmic frequency bands positioned on the Mel scale [6,7]. Although the original idea of MFCC is to mimic the way the cochlea extracts the features needed by the brain, in our case we use its ability to distribute frequencies on a logarithmic scale. The MFCC are based on a short-time spectrum, in which the signal is decomposed into a superposition of a finite number of sinusoids. The power spectrum bins are grouped and smoothed according to the perceptually motivated Mel-frequency scaling, which in this case serves as a logarithmic scaling. The spectrum is then segmented into 12 critical bands by means of a filter bank that typically consists of overlapping triangular filters. Finally, a Discrete Cosine Transform (DCT) applied to the logarithm of the filter bank outputs yields vectors of de-correlated MFCC features.
Figure 1 The block diagram of MFCC feature extraction (inputs X(f), Mel scale filter banks, Log ( ), Discrete Cosine Transform, Mel Frequency Cepstral Coefficients F1 to F12).
MLP
Multilayer Perceptrons (MLPs) are feed-forward neural networks trained with the standard back-propagation algorithm. The network is trained in a supervised manner according to a desired response. MLP networks are general-purpose, flexible, nonlinear models consisting of a number of units organized into multiple layers. The complexity of the MLP network can be changed by varying the number of layers and the number of units in each layer. Given enough hidden units and enough data, it has been shown that MLPs can approximate virtually any function to any desired accuracy [8]. They have been shown to approximate the performance of optimal statistical classifiers in difficult problems. MLPs are valuable tools in problems where one has little or no knowledge about the form of the relationship between input vectors and their corresponding outputs. Supervised learning for MLP is basically carried out as follows:
1. For each input pattern, the network produces an output pattern.
2. It compares the actual output with the desired one from the training set and calculates an error.
3. It adjusts its weights a little to reduce the error (sliding down the slope).
4. It repeats this n times for every example in the training set until it has minimized the errors.
During learning, error information is propagated back through the network and used to adjust the connection weights. The standard back-propagation algorithm used is gradient descent, whose mathematics can be found in numerous neural network textbooks such as [5].
2. In-car Data Acquisition setup and route
Two KYOWA Thin pedaling Force Transducers (LPRA-S1/LPR-B-S1) were attached to the gas and brake pedals of the vehicle respectively. The transducers are connected to the KYOWA Signal Conditioner (CDV-400) to tap the signal. The signal conditioner’s output is connected to the EMANT 300 USB Data Acquisition (DAQ) Module [9] to record the gas and brake pedal pressure signals using an application on a notebook. The software was specially developed for this purpose. As shown in Figure 2, the waveform capture application, written in C#, communicates with the DAQ modules. The data are sampled at 200 Hz, and the waveforms are recorded and displayed in real time as shown in Figure 3.
Figure 2 Block diagram of gas/brake pedal connections
Two Digital Audio Tape recorders with 4 microphones were used to record the driver’s voice, in-car conversations, vehicle engine noise and background noise on four separate channels. These microphones were attached to the driver’s collar, the dashboard, the passenger seat and the leg compartment respectively. One video camcorder was attached to the side window of the passenger seat next to the driver, fixed onto the window using a MANFROTTO camera support. This camera was used to record the road conditions throughout the drive. One web camera was attached to the lower windscreen in front of the driver to record his/her facial expressions throughout the drive. The video was captured using a webcam recording application on another laptop. In addition, a mobile phone and its hands-free set were placed in the vehicle for the driver to use.
Figure 3 Application Interface
The driving route consists of 6 segments and 4 phases for the driver to follow. The route was planned to put the driver in situations where stress, distraction and frustration were likely to occur. These situations were created through looping and winding roads, roads with sharp bends and driving in a housing-estate area. In the third phase, the test subject was also required to answer a phone survey from our staff in order to distract him. Periods of familiarization and rest were included as well. The drives were taken on both weekdays and weekends, to represent a typical daily commute and weekend traffic respectively. In total, the drive covers 25.61 kilometers and takes approximately 60 minutes, depending on traffic conditions.
3. Data Collection Procedure
The data collection was carried out during vacation time, when test subjects were easier to obtain. Drivers were required to have a valid Singapore driver’s license and at least 2 years of driving experience. In total, we collected data from 11 drivers during a one-week period using a rental vehicle. Two of the drivers are female, and all drivers are between 24 and 25 years old. Once the in-car DAQ setup is complete and tested to be working, the test subject is brought to the vehicle, which is parked at the starting location of the route. The test subject is required to fill in a form, followed by a short briefing conducted in the car. The briefing mainly covers safe handling of the vehicle, the route of the drive and the tasks the subject will be required to perform. He/she is advised to exhibit his/her normal driving behavior. Next, the starting mileage is recorded and data synchronization is performed prior to moving off. During the drive, one observer sits next to the driver in case the driver requires any clarification or assistance. Another observer is seated in the rear seat diagonally behind the driver to ensure that the equipment is in working order. There are 6 synchronization stations along the route, where the driver is required to stop the vehicle so that all the equipment can be synchronized. The objective of these synchronization stations is to ensure that the collected data stay aligned in time. All drives were conducted between morning (approximately 10.00 AM) and mid-afternoon (approximately 5.00 PM), in clear weather and under broadly similar traffic conditions.
4. Data Preprocessing
The data collected from the video cameras, the brake/gas pedal pressure sensors and the digital audio tapes are first synchronized with each other by reviewing them against the six synchronization points recorded during the drive. We then cut all the synchronized data into their respective segments. For this paper, we are interested in the brake and gas pedal pressures. To remove unwanted noise, the gas and brake pedal pressure signals are first filtered with a median filter, which removes spikes while retaining the amplitude details of the original signal. A window size of n = 20 at the sampling frequency of 200 Hz was used. Following the filtering process, we down-sampled the signals to 65 Hz to allow better synchronization with the rest of the driving data. We then extracted 5 smaller stop-go sections from each of the segments. Similar to [3], the motivation for using just the stop-go regions instead of the entire signals is intuitive, since little or no information relating to driving behavior is present when the vehicle is stationary. Next, for each of these signal pairs, the derivative was calculated. Experiments conducted in [2] indicate higher accuracy in driver identification when the derivatives are combined with the original signals. These data, upon normalization, are used as inputs to the CMAC feature map for driver profiling.
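The preprocessing chain above (median filtering, down-sampling, differentiation, normalization) can be sketched as follows. Only the window size (20), the input rate (200 Hz) and the output rate (65 Hz) come from the text; the resampling method (linear interpolation), the derivative estimator (first difference) and the normalization (min-max scaling to [0, 1]) are assumptions made for this illustration, and all function names are hypothetical.

```python
from statistics import median


def median_filter(signal, window=20):
    """Sliding median over `window` samples; the window is clipped at the edges."""
    half = window // 2
    return [median(signal[max(0, i - half): i + half])
            for i in range(len(signal))]


def resample_linear(signal, fs_in=200.0, fs_out=65.0):
    """Resample by linear interpolation (assumed method, not from the paper)."""
    duration = (len(signal) - 1) / fs_in
    n_out = int(duration * fs_out) + 1
    out = []
    for k in range(n_out):
        pos = k * fs_in / fs_out              # fractional index into the input
        i = min(int(pos), len(signal) - 2)
        frac = pos - i
        out.append(signal[i] * (1 - frac) + signal[i + 1] * frac)
    return out


def derivative(signal, fs=65.0):
    """First difference scaled by the sampling rate (assumed estimator)."""
    return [(b - a) * fs for a, b in zip(signal, signal[1:])]


def normalize(signal):
    """Min-max scaling to [0, 1] (assumed normalization)."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0.0 for _ in signal]
    return [(v - lo) / (hi - lo) for v in signal]


def preprocess(raw):
    """Return the normalized pedal signal and its normalized derivative."""
    clean = median_filter(raw, window=20)
    down = resample_linear(clean)
    return normalize(down), normalize(derivative(down))
```

Applied separately to the brake and gas channels of one stop-go section, a function like preprocess would give the amplitude and derivative inputs that are then passed to the CMAC feature map.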
5. Driver profiling using CMAC
Using the CMAC feature map model created in our previous work [3], output tables, rendered as 3-D mesh plots, were obtained from the 5 processed signals per segment per driver. This was repeated for all 6 segments of the 11 drivers. Figure 4 shows a sample output mesh plot for one of the male drivers.
Figure 4 3-D mesh plot of one driver’s amplitudes of brake against gas and for derivatives of the brake against gas.
From the above 3-D mesh plots, we extracted angular ‘slices’ passing through the origin. We believe that these ‘slices’ represent the driver’s profile for either the brake/gas component or its derivative component. A total of 9 angular slices can be extracted from one CMAC output plot. Hence, in total we have 54 ‘slices’ per driver (9 slices for each of the 6 segments).
Figure 5 One of the angular ‘slices’ from the driver’s CMAC output of the brake and gas derivative
Figure 5 shows one of the extracted angular ‘slices’ from one driver’s 3-D CMAC output for the derivative of his brake and gas signals. Because this signal represents relative frequency, derived from the CMAC weights, versus resolution, it is hard to extract features from it directly using MFCC. The signal was therefore transformed into amplitude versus frequency in Matlab. The resulting plot is shown in Figure 6.
Figure 6 The transformed version of the angular ‘slice’ in Figure 5
Feature Extraction using MFCC
Each of these signals is used as input to the MFCC tool to extract the relevant mel-frequency cepstra. The MFCC tool used is VOICEBOX for Matlab, written and maintained by Mike Brooks [8]. The functions used are melcepst.m and melbankm.m. The parameters used in calculating the mel cepstrum of the signal are as follows: Sample rate = 100Hz. Hamming window in time domain. Number of cepstral coefficients = 11. Length of frame = 12. Frame increment = 8.
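For readers without access to Matlab, the generic MFCC pipeline (framing, Hamming window, power spectrum, triangular mel filter bank, logarithm, DCT) can be sketched in plain Python. This is not the VOICEBOX implementation and it does not reproduce its options. The function names are hypothetical, the mel formula is the common 2595·log10(1 + f/700) form, and the frame length (64) and increment (32) are assumed values chosen so that each of the 12 triangular filters covers several DFT bins; only the 100 Hz sample rate, the 12 bands and the 11 coefficients are taken from the text.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power of the one-sided DFT of a Hamming-windowed frame."""
    n = len(frame)
    win = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    x = [s * w for s, w in zip(frame, win)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
        im = -sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
        spec.append(re * re + im * im)
    return spec


def mel_filterbank(n_filters, n_fft, fs):
    """Overlapping triangular filters with edges equally spaced in mel."""
    top = hz_to_mel(fs / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = fs / n_fft
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            f = k * bin_hz
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(signal, fs=100.0, frame_len=64, hop=32, n_filters=12, n_ceps=11):
    """Return one list of n_ceps cepstral coefficients per frame."""
    bank = mel_filterbank(n_filters, frame_len, fs)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        # A small floor avoids log(0) for bands with no energy.
        logs = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                for row in bank]
        # DCT-II of the log band energies; coefficient 0 is skipped in this sketch.
        feats.append([
            sum(v * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, v in enumerate(logs))
            for c in range(1, n_ceps + 1)
        ])
    return feats
```

Each row returned by mfcc corresponds to one frame, so a transformed angular slice yields a small matrix of frames by 11 coefficients, which is the shape of the feature instances used in the experiments.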
Hence, we obtained 11 cepstral coefficients and 11 instances for each of the angular slice input signals, making a total of 594 instances per driver (54 slices × 11 instances). These outputs are merged to form our dataset for the following driver verification experiments.
Experimental Setup
In this paper, we use only a multilayer perceptron (MLP) neural network to investigate the possibility of driver verification from the real-life data we collected. From the results obtained in 5.1, we have 11 features each for the gas and brake amplitude (GB) and for the derivative of the gas and brake signals (dGB). Combining both (GB+dGB) gives 22 features. To be able to identify a driver, we need to recognize more than one of his driving styles while retaining knowledge of the styles he has exhibited before. Hence, we carry out the experiments using a single MLP model across all folds of the K-fold cross validation, thus allowing the network to ‘age’ with previously learnt training sets. Preliminary work on model selection involved taking only two drivers, (D1) and (D2), for each of the datasets; for each dataset, a network was created, trained and tested. Experiments were conducted to find an optimized network architecture for verification. We used local validation experiments to get a rough idea of the best number of neurons and layers, with emphasis on 10 to 20 neurons for one-layer architectures and 10 to 20 neurons for two- and three-layer architectures. From the results, the MLP architecture with 20 neurons in two hidden layers performed best, achieving up to 83.3% accuracy and taking 187 s to train. As in our preliminary experiments, the dataset and its desired outputs were randomized and split into 6 folds. Three different types of experiments were carried out for different combinations of drivers per dataset. Firstly, we created a dataset with two classes containing the instances of one driver and an equal number of instances drawn from the other ten drivers. The rationale for this is that in application, most of the time, the vehicle verification system just needs to verify one driver (the owner) against a set of drivers that are considered foreign. Secondly, we created a dataset with two classes containing the combined instances of two drivers and an equal number of instances from the other nine drivers. Following the same rationale, in this case we want to measure how accurately the vehicle can detect two drivers sharing the vehicle against a set of foreign drivers. This case can be considered practical, as many car owners have a spouse or another driver using the same car. Lastly, we move one step further by creating a dataset to identify three drivers against a set of foreign drivers.
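The construction of the two-class datasets and the 6-fold split can be sketched as below. The text does not say how the foreign instances were drawn or how the folds were formed, so uniform random sampling without replacement and simple interleaved folds are assumptions. The function names and the dictionary layout (driver label mapped to a list of feature vectors) are hypothetical.

```python
import random


def build_verification_set(instances_by_driver, owners, seed=0):
    """Label owner instances 1 and an equal number of foreign instances 0."""
    rng = random.Random(seed)
    positives = [x for d in owners for x in instances_by_driver[d]]
    pool = [x for d, xs in instances_by_driver.items()
            if d not in owners for x in xs]
    # Assumed: foreign instances are sampled uniformly without replacement.
    negatives = rng.sample(pool, len(positives))
    data = [(x, 1) for x in positives] + [(x, 0) for x in negatives]
    rng.shuffle(data)
    return data


def k_fold_splits(data, k=6):
    """Yield (train, test) pairs; fold i holds every k-th row starting at i."""
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, test
```

With 594 instances per driver, build_verification_set(features, owners=["D1"]) would give 1188 labelled rows for the one-driver case, and passing two or three driver labels as owners gives the shared-vehicle cases.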
Figure 7 Accuracy of GB, dGB, (GB+dGB) for 1 driver against the rest (D1-D11)
In Figures 7 and 8, separate runs were carried out on the datasets containing each of the drivers against the rest. As observed, the (GB+dGB) dataset gives the highest classification accuracy, ranging from a high of 91.33% to a low of 83.4% (D6), compared with (GB) and (dGB) alone. One factor behind this higher accuracy is that the network receives double the number of features compared with (GB) or (dGB) alone. It therefore has more information available, which results in much higher performance. This result is very similar to the experimental results reported in [1]. Another observation is that for datasets (GB) and (dGB), although the accuracy is not as high as for the combination, the derivative of the brake and gas signals shows only a slight increase over the amplitude signals alone. However, for drivers D1, D5 and D11, the amplitude signals (GB) show slightly better accuracy. This anomaly might be due to the different driving styles of these drivers, such as a lower rate of change of the pedals, so that the computed derivative does not carry enough information.
Figure 8 Accuracy of GB, dGB, (GB+dGB) against 2 driver combinations
The earlier observation also holds for the datasets containing 2 drivers against the rest and 3 drivers against the rest. However, the accuracy for (GB+dGB) dropped to 83.67% and 79.38% respectively, as shown in Figures 8 and 9. This is expected: when two or three drivers are merged into one class, their data do not necessarily complement each other (the uniqueness of the data is reduced), thus resulting in lower verification accuracy.
Figure 9 Accuracy of GB, dGB, (GB+dGB) against 3 driver combinations
The results indicate, firstly, that driver verification is possible from our collected data, using the angular slices extracted from the CMAC output plots. Secondly, for verification of more than one driver, further experiments need to be carried out, as the accuracy is not as good as for single-driver verification.
6. Conclusion and Future Works
From the experimental results, we conclude that combining the derivative of the gas/brake signals with the amplitude of the gas/brake signals yields relatively higher accuracy for driver verification. However, the accuracy drops when more than one driver is combined into a single class of the dataset. We have also verified the suggestion from [2] that features can indeed be extracted from CMAC output plots of driver profiles and used in driver verification. These results provide an indication of the practicality of using CMAC-MFCC coupled with either a neural network (NN) or a fuzzy neural network (FNN) for a real-time in-car driver verification system, as part of the continuing development of intelligent automobiles. The work does not end here, however: we should explore ways to address the reduced accuracy when more than one driver is combined for verification. It may also be possible to use the angular slices and feature extraction to determine the driver’s emotional state. Fuzzy neural networks such as the Adaptive Neuro Fuzzy Inference System (ANFIS) and the Generic Self-organizing Fuzzy Neural Network (GenSoFNN) may be used to achieve better performance for multiple-driver verification. In addition, methods such as the Gaussian Mixture Model (GMM) can be used for further optimization on the extracted data.
7. References
[1] SID Protect Inc. announces the release of the SID, a fingerprint anti-theft device for vehicles, (2007), Retrieved 10, March, 2008, Website: http://www.justbiometrics.com/press.html
[2] W. Abdul, C. K. Tan, H. Abut, K. Takeda, Driver Recognition Using FNN and Statistical methods, Biennial on DSP for in-vehicle and Mobile Systems, 2005, Sesimbra, Portugal
[3] W. Abdul, W. G. Toh and N. Kamaruddin, Understanding Driver Behavior Using Multi-Dimensional CMAC, Information, Communications & Signal Processing, 2007 6th International Conference, 10-13 Dec. 2007
[4] Albus, J. S. (1975). A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC), ASME J. Dynamic Systems, Measurement, and Control, 97(3), pp. 220-227.
[5] Francisco, J., Gonzalez, S., Anibal, R., Figueiras, V. and Artes, R. (1998). Generalizing CMAC Architecture and Training, IEEE Transactions on Neural Networks, 9(6), pp. 1509-1514.
[6] W. Abdul, G. S. Ng, R. Dickiyanto, Speaker authentication system using soft computing approaches, Neurocomputing, 2005, vol. 68, pp. 13-37
[7] Rabiner, L., Juang, B., Fundamentals of Speech Recognition, Prentice Hall, 1993.
[8] VOICEBOX: Speech Processing Toolbox for Matlab, (2007), Retrieved 10, March, 2008, Website: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
[9] “Low Cost USB 24-bit DAQ Training Kit”, (2007), Retrieved 21, August, 2007, Website: http://www.emant.com/