Dual tree complex wavelet transform based features for automated alcoholism identification
Manish Sharma · Pragya Sharma · Ram Bilas Pachori · U. Rajendra Acharya
Abstract
A novel automated system for the identification of alcoholic subjects using electroencephalography (EEG) signals is proposed in this study. The proposed system employs dual tree complex wavelet transform (DTCWT) based features and sequential minimal optimization support vector machine (SMO-SVM), least square support vector machine (LS-SVM) and fuzzy Sugeno classifiers (FSC) for the automated identification of alcoholic EEG signals. The EEG signals are decomposed into several sub-bands (SBs) using DTCWT. The features extracted from DTCWT based SBs are fed to FSC, SMO-SVM, and LS-SVM classifiers to evaluate the best performing classifier. The 10-fold cross-validation scheme is used to mitigate overfitting of the model. We have obtained the highest classification accuracy (CAC) of 97.91%, the area under receiver operating characteristic curve (AU-ROC) of 0.999 and Matthew's correlation coefficient (MCC) of 0.958 for our proposed alcoholic diagnosis model. Our alcoholism detection system performed better than the existing systems in terms of all three measures: CAC, AU-ROC and MCC.

Keywords Electroencephalography (EEG), analytic complex wavelet transform, computer aided diagnosis system, support vector machine (SVM)

M. Sharma, Department of Electrical Engineering, Institute of Infrastructure, Technology, Research and Management (IITRAM), Ahmedabad, India. E-mail: [email protected]
Pragya Sharma, Department of Electronics and Communication Engineering, Acropolis Institute of Technology and Science, Indore, India. E-mail: pragyasharma1512@gmail.com
Ram Bilas Pachori, Discipline of Electrical Engineering, Indian Institute of Technology Indore, Indore, India. E-mail: [email protected]
U. R. Acharya, Department of Electronics and Computer Engineering, Ngee Ann Polytechnic, Singapore 599489, Singapore; Department of Biomedical Engineering, School of Science and Technology, SUSS University, Singapore; and Department of Biomedical Engineering, Faculty of Engineering, University of Malaya, Malaysia
1 Introduction
Alcoholism ranks among the top five causes of mortal abuses [7, 41]. Alcoholic subjects feel compelled to satiate their cravings despite knowing the health-related consequences and social adversities [3]. The Diagnostic and Statistical Manual of Mental Disorders has recognized alcoholism as a psychological disorder [34]. Moreover, in psychoanalytical questioning, patients tend to withhold information due to embarrassment, which results in a poor (50%) positive rate of alcoholism detection [32]. Genetic, psychological, and environmental factors play a crucial role in alcohol addiction [1]. Such inflictions combined with persistence may contribute to degradation in job performance in adults, and to poor academic performance and learning abilities in adolescents [33]. Perpetual consumption of alcohol can also contribute to brain contraction and white matter deformity [47]. As per an estimate, 2 billion people are regular consumers of alcohol-based beverages and, amongst them, 81.7 million have a diagnosable alcoholic addiction. Another alarming aspect of such an addiction is increased disability-adjusted life-years (DALY), which has affected 139 million people worldwide [44]. Alcohol consumption accounts for 5.1% of diseases, and in 2012 approximately 5.9% (about 3.3 million) of all deaths worldwide [22, 44, 45] were due to alcohol. According to a report published in 2009, alcoholics are prone to injury conditions and more than 200 diseases including alcohol reliance, liver cirrhosis, and cancer [25]. Therefore, for effective treatment of chronic and acute alcoholism, it is prudent to ensure early detection of alcoholism, thereby reducing the dangers of impairment and death.

Along with behavioral changes, alcohol intake alters the power levels of various brain waves such as theta [48], delta and beta waves [49], which can be useful factors in the diagnosis of alcoholism using EEG signals. In the recent past, neurophysiologists have used this approach as an alternative to direct questioning [7]. Due to the unavailability of algorithms that deal with an entire human brain, researchers relied upon derived methods that employ EEG signals. However, the non-linear and non-stationary characteristics of EEG signals make the analysis a challenging task [43]. To handle the classification problems associated with the non-stationary nature and non-linearity of EEG signals, advanced signal processing techniques in combination with machine learning based approaches are employed [10, 14, 54, 56]. These techniques are used to extract relevant features from EEG signals, which are then fed to classification algorithms to identify the respective class labels of a multiple-class dataset [27]. A three-step strategy is often employed to separate various classes of EEG signals [12, 19, 20]. The first step is to transform the original time-domain EEG signal into some other domain such as the frequency domain or the joint time-frequency domain. The signal transformation is followed by feature extraction, wherein some useful features are obtained from the transformed version of the signal.
Finally, the extracted features are fed to some classification algorithm that labels features corresponding to either normal or abnormal subject [8, 29].
Manish Sharma et al.
The accurate screening of alcohol use disorder (AUD) is a challenging task due to the subjectivity involved in self-test reports [51]. Due to memory loss and dishonest behavior of the patients, the self-test reports may lead to false diagnosis. Thus, the motive of the proposed study is to bring forth a machine learning and signal processing based model for the computerized identification of healthy and alcoholic subjects using EEG signals. The model facilitates faster analysis of normal and abnormal conditions pertaining to AUD. In the literature, the real-valued discrete wavelet transform (DWT) [31] and its variants such as the tunable-Q wavelet transform (TQWT) [46] and the time-frequency optimized three band wavelet filter bank (TBWFB) [53] have been used in alcoholism identification systems. These systems [46, 53] have produced promising classification performance.

However, complex-valued wavelet transforms, which have certain additional desirable properties for analyzing EEG signals, have not yet been explored in alcoholism detection and diagnosis. This motivated us to evaluate the efficacy of the complex-valued analytic dual tree wavelet transform in the automated alcoholism detection system. In our proposed study, we have evaluated the performance of the dual tree complex wavelet transform (DTCWT) in the identification of alcoholism. The useful discriminating features have been extracted from the sub-bands (SBs) obtained from the wavelet decomposition using the DTCWT. Though DWTs offer near-optimal representations of many real-world signals, they suffer from oscillations around singularities, shift variance, aliasing and lack of directionality. These drawbacks of the DWT can be overcome by applying the DTCWT. Unlike DWTs, DTCWTs are redundant tight frames that have the ability to overcome the above limitations [52]. Having performed wavelet decomposition, the next step is the extraction of features from the SBs. In the proposed study, we have extracted l2 norms (L2Ns) and log-energy entropies (LEEs) from the decomposed SBs. The extracted discriminating features are then fed to the classifiers to separate them into two groups corresponding to normal and alcoholic subjects. Among supervised machine learning based classification algorithms, the support vector machine (SVM) [21, 24, 58] and its variants, including the least squares support vector machine (LS-SVM), are popular for the classification of bio-signals. In this study, we have used three different classifiers, namely LS-SVM, SMO-SVM and the fuzzy Sugeno classifier (FSC), to distinguish normal and alcoholic EEG signals.

A few automated classification systems have been proposed in earlier studies for the screening of alcoholic and normal subjects using signal processing techniques [7, 11, 31, 46]. The most recent approach, proposed by Sharma et al. [53], employed TBWFB based features for screening of alcoholic EEG signals and reported very good classification performance. Patidar et al. [46] exploited TQWT and correntropy (CE) based features for the identification of alcoholism. A computer-aided diagnosis (CAD) system has been developed by Faust et al. [31], who employed wavelet packet decomposition (WPD), energy measures and an LS-SVM classifier to separate EEG signals into alcoholic and normal groups. Acharya et al. [11] applied nonlinear features such as approximate entropy (AppEnt), largest Lyapunov exponent (LLE) and higher order spectra (HOS) with an SVM classifier. Kannathal et al. [38] used correlation dimension (CD) [26], Hurst exponent (HE), LLE, sample entropy (SampEnt), AppEnt, and HOS [30]. However, they did not use any
Automated identification of alcoholism
classifiers for automated classification. Ehlers et al. [26] employed chaos theory based features and discriminant analysis without using any classification algorithm for alcoholic identification. In this study, a new DTCWT feature based CAD system is proposed. The proposed CAD system yielded promising performance in detecting alcoholism. The classification performance obtained using the proposed model is found to be better than that of the existing models.

2 Method and Material
Fig. 1 shows the main steps used to develop the proposed automated alcoholic detection system. The first step is to acquire the raw EEG data to be tested. The raw EEG data is then filtered to remove artifacts. In the next step, the processed EEG signals are decomposed into SBs. In this work, we have employed DTCWT for signal decomposition. We then computed L2N and LEE features from the decomposed SBs. These features are then ranked using Student's t-test, and the most relevant ones are selected. The selected features are then applied to the SMO-SVM, LS-SVM and FSC classifiers, adopting a 10-fold CV strategy, for distinguishing normal and alcoholic EEG signals.

Fig. 1: Steps used for automatic alcoholism detection: (1) acquisition of EEG signals; (2) artifact removal through filtering; (3) signal decomposition using DTCWT; (4) computation of L2N and LEE features; (5) feature ranking and feature selection; (6) classification using LS-SVM, SMO-SVM, and FSC with 10-fold CV; (7) identification of alcoholic and normal EEG signals.

2.1 Dataset/Raw EEG signal
In this work, we have used the publicly available EEG data from University of California, Irvine knowledge discovery (http://kdd.ics.uci.edu/databases/eeg/eeg.data.html). The EEG dataset was obtained from a total of 122 different subjects. A total of 120 trials were taken for every subject, where different stimuli were shown [2]. The measurements were acquired by placing 64 electrodes on the scalp of the subject. The complete 10/20 international montage was employed. The trials contaminated by excessive eye or body movement were discarded. The grounding of subjects was accomplished by a nose electrode with less than 5 kΩ impedance. The data collection process has been described by Zhang et al. [61]. The subjects were split into two groups: alcoholic and normal. Pictures of objects from the 1980 Snodgrass and Vanderwart picture set were used as stimuli, to which the subjects were exposed either as a single stimulus (S1) or as two stimuli (S1 and S2). The two stimuli were shown in either a matched or a non-matched condition. In the matched condition, S1 and S2 were identical, and in the non-matched condition, S1 and S2 differed. A total of 30 EEG recordings were acquired for each of the normal and alcoholic classes. The recorded EEG signals of 32 sec duration were later sampled with a sampling frequency of 256 Hz. This resulted in 60 time series each containing 16,446 samples. These recorded time series were then divided into 8 sec segments containing 2,048 samples each. Thus, we have used a total of 240 segments, 120 each for the normal and alcoholic classes, so the data used in the study is completely balanced. Fig. 2 shows sample segments of normal and alcoholic EEG signals.
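The segmentation step described above (8 sec windows at 256 Hz, i.e. 2,048 samples per segment) can be sketched as follows. This is only an illustrative sketch: the function name `segment_signal`, the non-overlapping windowing, and the placeholder recording of zeros are assumptions, not details taken from the dataset description.

```python
def segment_signal(samples, fs_hz=256, segment_sec=8):
    """Split a 1-D sequence into non-overlapping segments of segment_sec seconds.

    Trailing samples that do not fill a whole segment are dropped
    (an assumption made here for simplicity).
    """
    seg_len = fs_hz * segment_sec  # 256 Hz * 8 s = 2048 samples
    n_full = len(samples) // seg_len
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(n_full)]


# Hypothetical recording, just long enough for four full segments
# (placeholder zeros, not real EEG data).
recording = [0.0] * (4 * 2048)
segments = segment_signal(recording)
segments_per_recording = len(segments)
total_segments = 60 * segments_per_recording  # 60 recordings in total
```

With four segments per recording, 60 recordings give the 240 segments (120 per class) used in the study.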
2.2 Denoising and filtering
Raw EEG data may be contaminated by various kinds of noise and artifacts, including electrode and muscle movement and power line interference. Several methods have been used to remove artifacts and noise from EEG signals [50]. We have used a Butterworth 7-tap finite impulse response filter with a cutoff frequency of 48 Hz in order to reject high-frequency components.
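To make the low-pass filtering step concrete, the sketch below designs a generic 7-tap FIR low-pass filter with a 48 Hz cutoff at a 256 Hz sampling rate. The Hamming-windowed sinc design used here is an assumption chosen for illustration and is not claimed to reproduce the filter used in the study; the function names `lowpass_fir` and `fir_filter` are likewise hypothetical.

```python
import math


def lowpass_fir(num_taps=7, cutoff_hz=48.0, fs_hz=256.0):
    """Hamming-windowed sinc low-pass FIR taps, normalized to unit DC gain."""
    fc = cutoff_hz / fs_hz          # cutoff in cycles per sample
    mid = (num_taps - 1) / 2        # center of the symmetric filter
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal (infinite) low-pass impulse response sampled at offset k.
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]


def fir_filter(signal, taps):
    """Causal FIR filtering by direct convolution (zero initial state)."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if i - k >= 0:
                acc += h * signal[i - k]
        out.append(acc)
    return out


taps = lowpass_fir()
dc_response = fir_filter([1.0] * 20, taps)[-1]
nyquist_response = fir_filter([(-1.0) ** n for n in range(40)], taps)[-1]
```

A constant input passes through with unit gain once the filter has filled, while a component at the Nyquist frequency (128 Hz) is strongly attenuated, which is the qualitative behavior expected of a filter that rejects high-frequency components.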
2.3 Signal transformation
The EEG signals exhibit non-stationary and non-linear nature. They contain spikes and transient events [13].
Fig. 2: EEG signals: alcoholic (top), normal (bottom).

Wavelet transforms have proven to be a very effective tool to analyze and classify various subclasses of non-stationary EEG signals. In this study, we have employed the DTCWT designed by Kingsbury [52]. The DTCWT is a redundant variant of the DWT with a few additional desirable properties. A limitation of the DWT is that it is a shift-variant transform. This limitation can be overcome by using the DTCWT, which exhibits nearly shift-invariant behavior. Further, it possesses the desirable property of directional selectivity in two and higher dimensions. Though it is a redundant transform, unlike the DWT, the redundancy factor is only 2 for one-dimensional EEG signals. Besides, the DTCWT is a complex analytic transform whereas the DWT is a real-valued transform. In the DTCWT architecture, two wavelet filter trees are employed to get the wavelet coefficients corresponding to the real and imaginary parts of the complex wavelet. In the case of the DWT, we have a real-valued orthogonal or biorthogonal basis Ψ(t). On the other hand, in the case of the DTCWT, we have a complex wavelet basis Ψ_C(t) = Ψ_R(t) + jΨ_I(t), where Ψ_R(t) and Ψ_I(t) individually form orthogonal bases. According to classical wavelet transform theory [42], a signal z(t) can be represented in terms of scaling and wavelet functions as

$$ z(t) = \sum_{n=-\infty}^{\infty} s(n)\,\Phi(t-n) + \sum_{m=0}^{\infty} \sum_{n=-\infty}^{\infty} w(m,n)\,2^{m/2}\,\Psi(2^{m}t-n) \tag{1} $$

where Ψ(t) and Φ(t) denote the real-valued bandpass wavelet and lowpass scaling function, respectively. The s(n) and w(m, n) are the scaling and wavelet coefficients, respectively. They can be obtained as

$$ s(n) = \int_{-\infty}^{\infty} z(t)\,\Phi(t-n)\,dt \tag{2} $$

$$ w(m,n) = 2^{m/2} \int_{-\infty}^{\infty} z(t)\,\Psi(2^{m}t-n)\,dt \tag{3} $$

Unlike the DWT, the DTCWT coefficients of the signal z(t) are obtained by projecting it onto the complex-valued wavelet basis Ψ_C(2^m t − n) and the complex scaling basis Φ_C(t − n). We then obtain complex-valued wavelet and scaling coefficients:

$$ w_C(m,n) = w_R(m,n) + j\,w_I(m,n), \quad j = \sqrt{-1} \tag{4} $$

$$ s_C(n) = s_R(n) + j\,s_I(n) \tag{5} $$

where Φ_C(t) = Φ_R(t) + jΦ_I(t) and Ψ_C(t) = Ψ_R(t) + jΨ_I(t). Ψ_R(t) is real and even, and jΨ_I(t) is imaginary and odd. The computation of Ψ_R(t) and Ψ_I(t) requires a two-tree implementation, hence the name DTCWT. The DTCWT thus exploits two real-valued DWTs.

Fig. 3: Typical dual tree analysis filter bank structure.

A typical one-dimensional dual tree filter bank is shown in Fig. 3. The upper branch of the tree, comprising the real-valued filters h0(n) and g0(n), yields the real part of the coefficients, whereas the lower branch, composed of the real-valued filters h1(n) and g1(n), yields the imaginary part of the coefficients. The filters g0(n) and g1(n) are analysis lowpass filters and h0(n) and h1(n) are analysis highpass filters for the first stage. Both sets {g0(n), h0(n)} and {g1(n), h1(n)} satisfy the perfect reconstruction (PR) condition [55]. Further, the filters are chosen such that the complex wavelet Ψ_C(t) = Ψ_R(t) + jΨ_I(t) is approximately analytic. Equivalently, Ψ_I(t) can be considered an approximate Hilbert transform of Ψ_R(t). In this study, 8-length
filters designed by Abdelnour and Selesnick [4, 5] and Kingsbury [39] have been chosen for the filter banks of the underlying dual tree structure. We have applied 8 levels of DTCWT decomposition to each EEG signal. The 8-level decomposition yielded 9 SBs for which L2Ns and LEEs are calculated, as discussed in the next section. The wavelets are generated through the cascade iteration of the underlying filter banks (FBs). Fig. 4 shows the real and imaginary parts of the complex wavelet along with the absolute value of the complex wavelet function considered in this study. Fig. 5 depicts the spectrum of the complex wavelet shown in Fig. 4. From the figure, it is clear that the generated wavelet is almost analytic.

Fig. 4: Real part Ψ_R(t), imaginary part Ψ_I(t) and absolute value |Ψ_C(t)| of the complex wavelet.
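As a small worked example of the decomposition described above, the sketch below counts the coefficients in each of the 9 SBs for a 2,048-sample segment and checks the redundancy factor of 2. It assumes dyadic downsampling with periodic boundary handling, so that each level exactly halves the length; the function name `subband_lengths` is hypothetical.

```python
def subband_lengths(n_samples=2048, levels=8):
    """Number of coefficients per sub-band in one tree of the dual tree.

    Returns the detail sub-bands SB-1..SB-levels followed by the final
    approximation sub-band (assumes exact halving at every level).
    """
    lengths = []
    n = n_samples
    for _ in range(levels):
        n //= 2
        lengths.append(n)   # detail coefficients at this level
    lengths.append(n)       # approximation coefficients at the last level
    return lengths


lengths = subband_lengths()
# Each complex coefficient holds one real value from each of the two trees.
real_values = 2 * sum(lengths)
redundancy = real_values / 2048
```

Under these assumptions the 9 SBs hold 1024, 512, 256, 128, 64, 32, 16, 8 and 8 complex coefficients, i.e. 4,096 real values for 2,048 input samples, which matches the redundancy factor of 2 stated above for one-dimensional signals.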
Fig. 5: Spectrum of the complex analytic wavelet (magnitude versus ω in rad/sample).

2.4 Feature extraction
In this study, we computed L2Ns and LEEs from the 9 SBs obtained from the 8-level DTCWT decomposition of each EEG signal. The L2Ns and LEEs have been used as distinguishing features to separate normal and alcoholic EEG signals. Thus, a total of 18 features are considered in this study. Let x(n) be an SB of an EEG signal; then the L2N and LEE of the SB, denoted as l_2(x) and log E(x), are given by

$$ l_2(x) = \sqrt{\sum_{n} |x(n)|^{2}} \tag{6} $$

$$ \log E(x) = \sum_{n} \log_e |x(n)|^{2} \tag{7} $$

Having computed the features for each EEG signal, the next step is to perform feature ranking and feature selection as explained below.

2.5 Feature ranking and Feature selection
In this study, we have computed 18 features. However, not all 18 features are fed to the classifier. We ranked the computed features using Student's t-test [37]. To reduce the size of the feature vector, the most significant and relevant subset of the ranked features is obtained using the forward sequential feature selection (FSFS) technique as described below. The feature ranking and selection process gives a subset of the computed features that provides the best classification performance. It also reduces the computational complexity and thereby the time required to train and test the classifier, which improves the speed of the classification scheme.

Student's t-test assesses the statistical significance of every feature by computing the t-statistic (t-value), that is, the difference between the class means normalized by a pooled variance estimate. The most significant feature has the highest t-value. Thus, the t-value indicates the distinguishing capability of every computed feature. The FSFS forms an optimal feature subset by selecting features sequentially, in the order of their rank, from the 18 computed features. The selection criterion is the minimization of the misclassification rate of the designed model. The criterion is evaluated each time a new feature is added to the subset. Features are added until adding a further feature no longer decreases the criterion, and the subset formed just before that point is regarded as the optimal subset. The optimal subset is smaller than the original feature set but gives the best classification performance. Thus, we refrain from feeding all features to the classifier at once, which ensures that only relevant features that optimize the classification accuracy are used. In our study, the optimal feature set contains only the top 15 features for the SVM classifiers and the top 14 features for FSC (Table 3).
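The feature computation of Eqs. (6) and (7), the t-value ranking and the forward selection loop can be sketched together as below. This is a minimal sketch under stated assumptions: the small constant `eps` guarding against log(0), the use of the absolute pooled-variance t-statistic, and the user-supplied `error_of` callback (standing in for the cross-validated misclassification rate of a classifier) are illustrative choices, and all function names are hypothetical.

```python
import math
from statistics import mean, variance


def l2_norm(subband):
    """Eq. (6): square root of the sum of squared coefficient magnitudes."""
    return math.sqrt(sum(abs(c) ** 2 for c in subband))


def log_energy_entropy(subband, eps=1e-12):
    """Eq. (7): sum of natural logs of squared magnitudes.

    eps is an assumption added here so that zero coefficients do not
    produce log(0).
    """
    return sum(math.log(abs(c) ** 2 + eps) for c in subband)


def pooled_t_value(a, b):
    """Absolute two-sample t-statistic using the pooled variance estimate."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return abs(mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def rank_features(class_a, class_b):
    """Return feature indices sorted by decreasing t-value.

    class_a and class_b are lists of feature vectors, one per EEG segment.
    """
    n_feat = len(class_a[0])
    t = [pooled_t_value([row[j] for row in class_a],
                        [row[j] for row in class_b]) for j in range(n_feat)]
    return sorted(range(n_feat), key=lambda j: t[j], reverse=True)


def forward_select(ranked, error_of):
    """Add features in rank order while the error criterion keeps decreasing."""
    chosen, best = [], float("inf")
    for j in ranked:
        err = error_of(chosen + [j])
        if err >= best:      # no further improvement: stop adding features
            break
        chosen, best = chosen + [j], err
    return chosen
```

In use, `l2_norm` and `log_energy_entropy` would be applied to each of the 9 SBs of a segment to build its 18-element feature vector, `rank_features` would order the features, and `forward_select` would grow the subset until the classifier error stops decreasing.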
2.6 Classification and Cross validation
In the proposed study, we have employed three different classifiers: two are variants of the support vector machine (SVM) [57, 60] and the other is the fuzzy inference based fuzzy Sugeno classifier (FSC) [23]. The rationale behind using different classifiers is that we do not know a priori which classifier will yield the best classification result. The performance of a classifier depends upon the application at hand and the nature of the extracted features. Further, it has been established through empirical studies in the existing literature [30, 46] that both SVMs and FSC are good choices for classifying normal and alcoholic groups using non-linear features extracted from EEG signals. Hence, we employed both SVM and FSC for the classification in this study. To design SVMs, one must choose an appropriate kernel function. Kernel-based SVMs were proposed by Boser et al. [21] and Cristianini et al. [24]. Note that there is no analytical way of choosing the best kernel function, and the performance of an SVM varies with the choice of kernel function. Only through extensive empirical studies and experience can one decide which kernel function will perform best for a given application. In this study, we have used two variants of SVM, namely SMO-SVM and LS-SVM [58], taking the Gaussian radial basis function (RBF) as the kernel function, as it has already been established that LS-SVM and
SMO-SVM with RBF perform well in classifying normal and alcoholic EEG signals [53]. The RBF kernel can be defined as

$$ K(u_m, u_l) = \exp\left(-\frac{\|u_m - u_l\|^{2}}{2\sigma^{2}}\right) \tag{8} $$

The width of the RBF is controlled by the parameter σ. In this study, to find its optimal value, σ is varied from 1 to 20 in steps of 0.1.

Fuzzy system models have been found to be effective tools in identifying alcoholism [30]. The central part of a fuzzy system is a set of if-then rules with fuzzy implications and membership functions. The membership function f(u) expresses the degree to which an input u = {u_1, u_2, ..., u_n} belongs to a fuzzy set. In earlier studies [30], the fuzzy classifier yielded promising performance in the automated identification of alcoholism. Hence, we have employed a fuzzy inference based classifier referred to as FSC. The FSC based fuzzy inference system (FIS) uses cluster information to model the data behavior with a minimum number of rules. A FIS formulates a mapping from the given input to an output employing fuzzy logic. The mapping gives the basis for classifying various groups or patterns. The FIS is a rule-based system which employs fuzzy logic instead of Boolean logic in order to make a decision [15, 36]. A FIS comprises inputs, outputs and a set of fuzzy rules, which are formed by employing the information of the training patterns to cover the entire feature space. The subtractive clustering technique (SCT) [16] has been employed to obtain the FIS, wherein the number of if-then rules is determined by the number of clusters. The SCT is used when the number of clusters in the given data set is not known a priori. The SCT is a fast one-pass algorithm to estimate the number of clusters and the corresponding cluster centers.
The SCT considers each data point as a potential cluster center and, depending upon the density of data points in its vicinity, estimates the likelihood that the data point defines a cluster center. A vector or a scalar with entries in [0, 1] is supplied by the user to the algorithm in order to cluster the data. Each entry of the vector acts as the radius of the cluster [23]. The radius decides the range of influence of a cluster center in each of the data dimensions. The optimal values of the radii are found to be between 0.2 and 0.4. Smaller values result in a larger number of small clusters. In this work, we have chosen a scalar value of 0.4, which implies that each cluster center has a spherical neighborhood of influence with a radius of 0.4. We used the Gaussian function as the input membership function because its performance is found to be good [40]. The width of the function is determined by the SCT.
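Two numerical ingredients of this section lend themselves to a short sketch: the RBF kernel of Eq. (8) with its σ grid, and the density-based center search of subtractive clustering. The code below is a simplified illustration and not the implementation used in the study: the squash factor of 1.25, the single stopping ratio of 0.15, the toy data points and all function names are assumptions, and the data are assumed to be scaled to the unit hypercube.

```python
import math


def rbf_kernel(u, v, sigma):
    """Eq. (8): Gaussian RBF kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Candidate kernel widths: 1.0, 1.1, ..., 20.0 (steps of 0.1).
sigma_grid = [round(1.0 + 0.1 * i, 1) for i in range(191)]


def subtractive_clustering(points, radius=0.4, squash=1.25, stop_ratio=0.15):
    """Simplified subtractive clustering; returns the chosen cluster centers."""
    alpha = 4.0 / radius ** 2                # density neighborhood
    beta = 4.0 / (squash * radius) ** 2      # suppression neighborhood

    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Potential of each point: how many neighbors lie within its influence.
    potential = [sum(math.exp(-alpha * d2(p, q)) for q in points) for p in points]
    first_peak = max(potential)
    centers = []
    while True:
        k = max(range(len(points)), key=lambda i: potential[i])
        peak = potential[k]
        if peak < stop_ratio * first_peak:   # remaining density too low
            break
        centers.append(points[k])
        # Subtract the influence of the new center from all potentials.
        potential = [p - peak * math.exp(-beta * d2(points[i], points[k]))
                     for i, p in enumerate(potential)]
    return centers


# Hypothetical toy data: two tight groups in the unit square.
points = [(0.1, 0.1), (0.12, 0.1), (0.1, 0.12),
          (0.9, 0.9), (0.88, 0.9), (0.9, 0.88)]
centers_wide = subtractive_clustering(points, radius=0.4)
centers_narrow = subtractive_clustering(points, radius=0.01)
```

On this toy data a radius of 0.4 finds one center per group, whereas a very small radius turns every point into its own center, illustrating how the radius controls the number of clusters and hence the number of fuzzy rules.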
To avoid the possible data redundancy and overfitting of the model as well as to ensure robust and stabilized performance, the 10-fold cross validation (CV) is used. On the completion of each fold, the confusion matrix is computed, and the performance is measured in terms of four measures, namely classification accuracy (CAC), classification sensitivity (CSE), classification specificity (CSP) and Matthew’s correlation coefficient (MCC). After the completion of 10 folds, the average of all four measures is calculated.
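The fold construction and the four performance measures can be sketched as follows. The stratified, unshuffled fold assignment is an assumption (the text does not state how folds were formed), and the confusion-matrix counts used in the example are hypothetical values chosen to be consistent with the LS-SVM percentages reported in Table 3 for 120 alcoholic and 120 normal segments; they are not taken from the study.

```python
import math


def stratified_folds(labels, k=10):
    """Assign sample indices to k folds, keeping class proportions equal."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def classification_measures(tp, fn, tn, fp):
    """Return (CAC %, CSE %, CSP %, MCC) from confusion-matrix counts."""
    cac = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    cse = 100.0 * tp / (tp + fn)
    csp = 100.0 * tn / (tn + fp)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return cac, cse, csp, mcc


labels = [1] * 120 + [0] * 120          # 1 = alcoholic, 0 = normal
folds = stratified_folds(labels, k=10)

# Hypothetical pooled counts: 119 of 120 alcoholic and 116 of 120 normal
# segments classified correctly.
cac, cse, csp, mcc = classification_measures(tp=119, fn=1, tn=116, fp=4)
```

With these counts the sketch gives CAC = 97.917%, CSE = 99.167%, CSP = 96.667% and MCC ≈ 0.9586. In the study the measures are averaged over the 10 folds, so pooled counts such as these are only an approximation of that procedure.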
2.7 Results and discussion
In this section, we discuss the classification performance of our proposed model. The proposed study is also compared with the existing studies on automatic alcoholism identification using EEG signals. In Table 1, we present the mean and standard deviation values of all 18 features used in this study. For both features (L2N and LEE), we can notice a significant difference in mean values for normal and alcoholic signals. Further, we can observe that the feature values for the alcoholic signals are lower than those for the normal EEG signals. This indicates that normal signals are more random than alcoholic signals [7]. We have also used the Kruskal-Wallis test to test the null hypothesis that the values of a given feature for the two classes come from the same distribution. The Kruskal-Wallis test gives a p-value for each feature. The p-value is the probability of observing a test statistic at least as extreme as the one obtained, assuming that the null hypothesis is true. If the p-value is lower than the predetermined significance level, the null hypothesis is rejected. Thus, the p-value can be considered an indicator of the statistical significance of the given feature: the smaller the p-value, the better the discriminating ability of the feature. Table 2 shows the p-values for all features. From this table, it is evident that the p-values for all features except the L2N of SB-1 are less than 0.0001. Thus, all features are statistically significant except the L2N of SB-1. Also, the p-value corresponding to the L2N of SB-7 is the lowest (7.0362e-32). Hence, the L2N feature of SB-7 is considered the best feature.

The box plots help to visualize the discrimination ability of features between the two classes. The box plots can also be used to compare the discriminative abilities of the individual features. The box plots for the L2N and LEE features corresponding to all SBs are depicted in Fig. 6 and Fig. 7. From the figures, it is evident that the box plots corresponding to most of the L2N and LEE features for the normal and alcoholic classes are well separated. Further, it is clear that the box plots for the two classes corresponding to the L2N features extracted from SB-6 to SB-9 are almost non-overlapping. Similarly, the LEE features corresponding to SB-3 and SB-7 to SB-9 are well separated between the two classes. This indicates the high discriminating ability of these features.
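For two groups, the Kruskal-Wallis statistic used for Table 2 can be sketched as below. This is a simplified illustration: average ranks are used for tied values but the usual tie correction of the statistic is omitted, the p-value uses the chi-square approximation with one degree of freedom, and the function name and toy values are hypothetical.

```python
import math


def kruskal_wallis_two_groups(a, b):
    """Return (H, p) for the two-group Kruskal-Wallis test.

    Ties receive average ranks; the tie correction factor is omitted
    (an assumption made to keep the sketch short).
    """
    pooled = sorted((v, g) for g, grp in enumerate((a, b)) for v in grp)
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # ranks are 1-based
        for t in range(i, j + 1):
            ranks[t] = avg_rank
        i = j + 1
    rank_sum = [0.0, 0.0]
    for r, (_, g) in zip(ranks, pooled):
        rank_sum[g] += r
    h = (12.0 / (n * (n + 1))
         * (rank_sum[0] ** 2 / len(a) + rank_sum[1] ** 2 / len(b))
         - 3 * (n + 1))
    h = max(h, 0.0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2.0))
    return h, p


# Hypothetical feature values for two fully separated groups.
h_sep, p_sep = kruskal_wallis_two_groups([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
# Identical groups: no evidence against the null hypothesis.
h_same, p_same = kruskal_wallis_two_groups([1, 2, 3], [1, 2, 3])
```

Well-separated groups give a large H and a small p-value, while identical groups give H = 0 and p = 1, which mirrors how the p-values in Table 2 are read as indicators of discriminating ability.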
Fig. 6: Box plots for the LEE features extracted from SB-1 to SB-9; the notations A and N represent the SB of alcoholic and normal EEG signals, respectively.

Fig. 7: Box plots for the L2N features extracted from SB-1 to SB-9; the notations A and N represent the SB of alcoholic and normal EEG signals, respectively.
Table 1: Feature statistics (mean ± standard deviation) of L2N and LEE for the 9 SBs. SB-1 to SB-8 denote the 8 detail DTCWT coefficient sets and SB-9 represents the approximation DTCWT coefficients.
SB   L2N (Alcoholic)    L2N (Normal)       LEE (Alcoholic)       LEE (Normal)
1    66.94 ± 34.70      73.67 ± 34.39      -2856.57 ± 543.47     -2154.06 ± 549.91
2    68.72 ± 22.68      88.83 ± 26.04      207.29 ± 360.12       637.64 ± 305.64
3    85.54 ± 24.92      128.49 ± 34.58     575.90 ± 156.85       820.55 ± 159.09
4    101.46 ± 33.22     175.46 ± 73.08     435.96 ± 91.17        580.54 ± 118.48
5    83.97 ± 23.51      152.81 ± 51.28     240.27 ± 39.98        320.58 ± 42.29
6    83.14 ± 27.40      138.88 ± 39.29     143.00 ± 22.25        177.35 ± 20.47
7    84.52 ± 28.49      158.16 ± 43.14     84.49 ± 12.32         105.97 ± 10.76
8    85.12 ± 33.93      158.06 ± 47.11     48.39 ± 6.98          59.34 ± 5.85
9    130.51 ± 67.78     253.65 ± 91.07     54.01 ± 9.46          66.52 ± 7.30

Table 2: p-values obtained from the Kruskal-Wallis test.
SB   L2N           LEE
1    0.153         2.536e-18
2    4.248e-09     8.484e-18
3    3.513e-22     2.146e-22
4    3.521e-18     5.841e-18
5    6.771e-30     2.001e-30
6    1.607e-25     2.760e-23
7    7.0362e-32    3.308e-29
8    9.933e-28     1.435e-26
9    3.643e-22     4.264e-21

Table 3 shows the performance of the model in terms of CAC, CSE, CSP and MCC. The optimal value of the parameter σ for the RBF kernel is also mentioned in the table for both SVM classifiers. We have mentioned in the table the number of selected features (NSF) used to obtain the highest classification performance for each classifier. The ROC can be employed to measure the collective efficacy of the features in discriminating the two classes. The ROC is a graphical representation of the true positive rate versus the false positive rate. Using the SMO-SVM classifier, the area under the receiver operating characteristic (ROC) curve (Fig. 8) is found to be 0.999. A value close to 1 indicates almost perfect classification performance.

Table 3: Classification performance using three different classifiers.
Classification measure   LS-SVM     SMO-SVM    FSC
CAC (%)                  97.917     97.917     95.833
CSE (%)                  99.167     98.333     97.5
CSP (%)                  96.667     97.5       94.166
MCC                      0.95863    0.95837    0.9171
NSF                      15         15         14

Fig. 8: ROC curve for SMO-SVM (sensitivity versus 1 − specificity).

Recently, a few studies have been conducted to identify alcoholism using machine learning techniques. Table 4 gives a summary of various previous studies on automated alcoholism detection using EEG signals from the same database. The entries are listed in decreasing order of year of publication. The system proposed by Patidar et al. [46] achieved the maximum CAC of 97.02% and MCC of 0.9494 by means of TQWT and CE features. The model presented the best classification performance so far. The results obtained in our study offer further improvement. Our proposed model has surpassed the TQWT based model [46] in terms of both CAC and MCC. Faust et al. [30] achieved 92.4% accuracy employing HOS features. In another work, Faust et al. [31] attained a CAC of 95.8% using wavelet packet decomposition, relative energy features and a k-nearest neighbor (kNN) classifier. In a few studies [26, 38, 59], alcoholic and normal EEG signals are classified without machine learning algorithms. In Table 4, the entries corresponding to classification performance mentioned as
Table 4: Comparison of various studies on automated diagnosis of alcoholism using the same database.
Authors                        Classifier               Features                                   Classification performance
The proposed work              SMO-SVM, LS-SVM, FSC     DTCWT, L2N and LEE                         CAC: 97.91%, AU-ROC: 0.999, MCC: 0.9583
Sharma et al. [53]             LS-SVM                   TBWFB, energy                              CAC: 97.08%
Patidar et al. [46]            LS-SVM                   TQWT, CE                                   CAC: 97.02%, MCC: 0.9494
Faust et al. [30]              DWT, FSC, kNN            HOS                                        CAC: 92.4%
Faust et al. [31]              kNN                      WPD, relative energies                     CAC: 95.8%
Tcheslavski and Gonen [59]     NC                       parametric spectrum coherence measure      NA
Acharya et al. [11]            SVM                      AppEnt, SampEnt, LLE, HOS                  CAC: 91.3%, CSE: 90%, CSP: 93.3%
Faust et al. [28]              ANN, SVM                 Burg's PSD                                 AU-ROC: 0.822
Kannathal et al. [38]          NC                       CD, LLE, entropy, HE                       NA
Ehlers et al. [26]             NC                       CD                                         NA
PSD = Power spectral density; NC = No classifier; NA = Not applicable
not applicable (NA) indicate that no classification performance was reported. Thus, our proposed system achieved the best results in all three categories: CAC, AU-ROC and MCC. As shown in Table 3, the maximum CAC is found to be 97.91%, with an MCC of 0.9583 and an AU-ROC of 0.999. Further, the CSE measures the proportion of correctly identified subjects who suffer from the disease. The maximum CSE obtained by the model is found to be 99.16%. The CSP measures the proportion of correctly identified subjects having normal health, which in our study is found to be 96.67%.

In this work, we have used SVM based classifiers as well as FSC to classify normal and alcoholic EEG signals. SVMs classify data using an optimized hyperplane. The SVM is a powerful machine learning method for handling nonlinear data with small sample sizes in two-class classification problems. Rule-based classifiers, including FSC, can perform well on large, linear data sets in multiclass classification tasks. SVMs are the best choice for two-class classification problems because, with an SVM, an m-class problem is transformed into m two-class problems. This transformation may lead to the problem of unclassifiable regions. To overcome this limitation in the case of an m-class problem, fuzzy classifiers may be used, in which one can define polyhedral pyramidal membership functions to avoid unclassifiable regions [35]. Thus, fuzzy classifiers may be more efficient when the degree of membership also has to be determined.
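The AU-ROC value quoted in this section can be related to classifier output scores through the rank interpretation of the area under the ROC curve: it equals the probability that a randomly chosen positive (alcoholic) sample receives a higher score than a randomly chosen negative (normal) sample. The sketch below computes it directly from that definition; the function name and the scores are hypothetical, and the study does not state how its AU-ROC was computed.

```python
def auc_from_scores(pos_scores, neg_scores):
    """Area under the ROC curve as the fraction of correctly ordered pairs.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores (higher means "more likely alcoholic").
perfect = auc_from_scores([0.9, 0.8, 0.7], [0.3, 0.2, 0.1])
imperfect = auc_from_scores([0.9, 0.8, 0.7], [0.6, 0.5, 0.75])
```

A value of 1 means every positive sample is scored above every negative one, so an AU-ROC of 0.999 corresponds to almost all alcoholic/normal pairs being ordered correctly.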
3 Conclusion
We have proposed a new automated alcoholic identification system using DTCWT coupled with SVM and FSC classifiers using EEG signals. The complex analytic DTCWT decomposes EEG signals into various SBs. The L2N and LEE features are extracted from these SBs. The extracted features are ranked using t-test ranking and are fed to LS-SVM, SMO-SVM and FSC classifiers. The 10-fold cross-validation is used to avoid redundancy and over-fitting so as to develop a robust model. Our proposed model presented an excellent classification performance in terms of CAC, MCC and AU-ROC. Hence, the study also established the superiority of the complex-valued DTCWT over the real-valued DWT in the characterization of alcoholism. The proposed CAD system is non-invasive and cost-effective. Thus, it may be used in remote hospitals situated in villages of developing countries. It is recommended that the model be tested on a large dataset before being introduced in a practical setup.

The study can be extended in future to detect the early stage of alcoholism. Our technique may help to reduce road accidents by identifying drivers who drive under the influence of alcohol. One can also formulate an index [9, 46] to quantify alcoholism using DTCWT based features. The proposed model can also be refined to detect other neural abnormalities like autism [17], Alzheimer's disease [18] and Down syndrome using EEG signals. In this study, we have considered only a two-class problem; therefore, as expected, SVMs have performed better than FSC. It may be interesting to compare the performance of SVM and FSC classifiers where we have to discriminate not only normal and alcoholic classes but also determine the degree of alcoholism. Other fuzzy methods like fuzzy equivalence relations [6] and fuzzy support vector machines [35] can also be explored in our future works on the identification of alcoholism.
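The feature extraction and ranking steps summarized in the conclusion can be sketched as follows. This is only an illustration under stated assumptions: L2N is taken to be the L2 norm of the sub-band coefficients, LEE is taken to be the log-energy entropy (sum of the logarithm of the squared coefficient magnitudes), and the ranking uses the absolute Welch t-statistic. The DTCWT decomposition, the classifiers and the 10-fold cross-validation are omitted; the sub-bands here are synthetic complex values, and all function names are hypothetical.

```python
import math
import random
from statistics import mean, variance


def l2_norm(coeffs):
    """L2N (assumed definition): Euclidean norm of the sub-band coefficients."""
    return math.sqrt(sum(abs(c) ** 2 for c in coeffs))


def log_energy_entropy(coeffs, eps=1e-12):
    """LEE (assumed definition): sum of log(|c|^2); eps guards against log(0)."""
    return sum(math.log(abs(c) ** 2 + eps) for c in coeffs)


def welch_t(a, b):
    """Absolute Welch t-statistic between two samples of one feature."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se if se else 0.0


def rank_features(class0, class1):
    """Rank feature indices by decreasing |t|; each row is one feature vector."""
    n_features = len(class0[0])
    scores = [
        welch_t([row[j] for row in class0], [row[j] for row in class1])
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


def feature_vector(subbands):
    """Concatenate [L2N, LEE] for every sub-band of one signal."""
    feats = []
    for sb in subbands:
        feats += [l2_norm(sb), log_energy_entropy(sb)]
    return feats


def fake_subband(scale, n=64):
    """Synthetic complex coefficients standing in for one DTCWT sub-band."""
    return [complex(random.gauss(0, scale), random.gauss(0, scale)) for _ in range(n)]


random.seed(0)
# Synthetic data: the two classes differ only in the energy of the first sub-band.
normal = [feature_vector([fake_subband(1.0), fake_subband(1.0)]) for _ in range(20)]
alcoholic = [feature_vector([fake_subband(3.0), fake_subband(1.0)]) for _ in range(20)]

order, scores = rank_features(normal, alcoholic)
print("feature ranking (best first):", order)
```

In this synthetic example the two features of the first sub-band (indices 0 and 1) rank ahead of those of the second, since only the first sub-band differs between the classes.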
References
1. Alcohol use disorder. http://www.mayoclinic.org/diseases-conditions/alcohol-use-disorder/basics/definition/con-20020866?reDate=29052017. Accessed: 2017-05-29
2. EEG database data set. https://archive.ics.uci.edu/ml/datasets/eeg+database. Accessed: 2017-05-29
3. What is an alcoholic? How to treat alcoholism. http://www.medicalnewstoday.com/articles/157163.php. Accessed: 2017-05-29
4. Abdelnour, A.F., Selesnick, I.W.: Nearly symmetric orthogonal wavelet bases. In: Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), vol. 6 (2001)
5. Abdelnour, A.F., Selesnick, I.W.: Symmetric nearly orthogonal and orthogonal nearly symmetric wavelets. Arabian Journal for Science and Engineering 29(2), 3–16 (2004)
6. Acharya, U.R., Bhat, P.S., Iyengar, S., Rao, A., Dua, S.: Classification of heart rate data using artificial neural network and fuzzy equivalence relation. Pattern Recognition 36(1), 61–68 (2003). DOI http://dx.doi.org/10.1016/S0031-3203(02)00063-8. URL http://www.sciencedirect.com/science/article/pii/S0031320302000638
7. Acharya, U.R., Bhat, S., Adeli, H., Adeli, A., et al.: Computer-aided diagnosis of alcoholism-related EEG signals. Epilepsy & Behavior 41, 257–263 (2014)
8. Acharya, U.R., Fujita, H., Sudarshan, V.K., Bhat, S., Koh, J.E.: Application of entropies for automated diagnosis of epilepsy using EEG signals: A review. Knowledge-Based Systems 88, 85–96 (2015)
9. Acharya, U.R., Mookiah, M.R.K., Koh, J.E., Tan, J.H., Bhandary, S.V., Rao, A.K., Hagiwara, Y., Chua, C.K., Laude, A.: Automated diabetic macular edema (DME) grading system using DWT, DCT features and maculopathy index. Computers in Biology and Medicine 84, 59–68 (2017)
10. Acharya, U.R., Ng, E., Eugene, L.W.J., Noronha, K.P., Min, L.C., Nayak, K.P., Bhandary, S.V.: Decision support system for the glaucoma using Gabor transformation. Biomedical Signal Processing and Control 15, 18–26 (2015)
11. Acharya, U.R., Sree, S.V., Chattopadhyay, S., Suri, J.S.: Automated diagnosis of normal and alcoholic EEG signals. International Journal of Neural Systems 22(03), 1250011 (2012)
12. Acharya, U.R., Sree, S.V., Krishnan, M.M.R., Molinari, F., Saba, L., Ho, S.Y.S., Ahuja, A.T., Ho, S.C., Nicolaides, A., Suri, J.S.: Atherosclerotic risk stratification strategy for carotid arteries using texture-based features. Ultrasound in Medicine & Biology 38(6), 899–915 (2012)
13. Acharya, U.R., Sree, S.V., Swapna, G., Martis, R.J., Suri, J.S.: Automated EEG analysis of epilepsy: a review. Knowledge-Based Systems 45, 147–165 (2013)
14. Acharya, U.R., Yanti, R., Zheng, J.W., Krishnan, M.M.R., Tan, J.H., Martis, R.J., Lim, C.M.: Automated diagnosis of epilepsy using CWT, HOS and texture parameters. International Journal of Neural Systems 23(03), 1350009 (2013). DOI 10.1142/S0129065713500093. PMID: 23627656
15. Amo, A.d., Montero, J., Biging, G., Cutello, V.: Fuzzy classification systems. European Journal of Operational Research 156(2), 495–507 (2004)
16. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Kluwer Academic Publishers, Norwell, MA, USA (1981)
17. Bhat, S., Acharya, U.R., Adeli, H., Bairy, G.M., Adeli, A.: Automated diagnosis of autism: in search of a mathematical marker. Reviews in the Neurosciences 25(6), 851–861 (2014)
18. Bhat, S., Acharya, U.R., Dadmehr, N., Adeli, H.: Clinical neurophysiological and automated EEG-based diagnosis of the Alzheimer's disease. European Neurology 74(3-4), 202–210 (2015)
19. Bhati, D., Sharma, M., Pachori, R.B., Gadre, V.M.: Time-frequency localized three-band biorthogonal wavelet filter bank using semidefinite relaxation and nonlinear least squares with epileptic seizure EEG signal classification. Digital Signal Processing 62, 259–273 (2017)
20. Bhattacharyya, A., Sharma, M., Pachori, R.B., Sircar, P., Acharya, U.R.: A novel approach for automated detection of focal EEG signals using empirical wavelet transform. Neural Computing and Applications pp. 1–11 (2016)
21. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152. ACM (1992)
22. Cherpitel, C.J.: Alcohol and injuries: emergency department studies in an international perspective. World Health Organization (2009)
23. Chiu, S.L.: Fuzzy model identification based on cluster estimation. J. Intell. Fuzzy Syst. 2(3), 267–278 (1994). URL http://dl.acm.org/citation.cfm?id=2656634.2656640
24. Cristianini, N., Shawe-Taylor, J.: An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press (2000)
25. Druesne-Pecollo, N., Tehard, B., Mallet, Y., Gerber, M., Norat, T., Hercberg, S., Latino-Martel, P.: Alcohol and genetic polymorphisms: effect on risk of alcohol-related cancer. The Lancet Oncology 10(2), 173–180 (2009)
26. Ehlers, C.L., Havstad, J., Prichard, D., Theiler, J.: Low doses of ethanol reduce evidence for nonlinear structure in brain activity. The Journal of Neuroscience 18(18), 7474–7486 (1998)
27. Ethem, A.: Introduction to machine learning (adaptive computation and machine learning). MIT Press, Cambridge, Mass. (2004)
28. Faust, O., Acharya, R., Allen, A.R., Lin, C.: Analysis of EEG signals during epileptic and alcoholic states using AR modeling techniques. IRBM 29(1), 44–52 (2008)
29. Faust, O., Acharya, U.R., Adeli, H., Adeli, A.: Wavelet-based EEG processing for computer-aided seizure detection and epilepsy diagnosis. Seizure 26, 56–64 (2015)
30. Faust, O., Yanti, R., Yu, W.: Automated detection of alcohol related changes in electroencephalograph signals. Journal of Medical Imaging and Health Informatics 3(2), 333–339 (2013)
31. Faust, O., Yu, W., Kadri, N.A.: Computer-based identification of normal and alcoholic EEG signals using wavelet packets and energy measures. Journal of Mechanics in Medicine and Biology 13(03), 1350033 (2013)
32. Fleming, M.F.: Strategies to increase alcohol screening in health care settings. Alcohol Research and Health 21(4), 340 (1997)
33. Giancola, P.R., Moss, H.B.: Executive cognitive functioning in alcohol use disorders. In: Recent developments in alcoholism, pp. 227–251. Springer (1998)
34. Guze, S.B.: Diagnostic and statistical manual of mental disorders (DSM-IV). American Journal of Psychiatry 152(8), 1228–1228 (1995)
35. Inoue, T., Abe, S.: Fuzzy support vector machines for pattern classification. In: Neural Networks, 2001. Proceedings. IJCNN '01. International Joint Conference on, vol. 2, pp. 1449–1454 (2001). DOI 10.1109/IJCNN.2001.939575
36. Ishibuchi, H., Nakaskima, T.: Improving the performance of fuzzy classifier systems for pattern classification problems with continuous attributes. IEEE Transactions on Industrial Electronics 46(6), 1057–1068 (1999). DOI 10.1109/41.807986
37. Kailath, T.: The divergence and Bhattacharyya distance measures in signal selection. IEEE Transactions on Communication Technology 15(1), 52–60 (1967)
38. Kannathal, N., Acharya, U.R., Lim, C.M., Sadasivan, P.: Characterization of EEG: a comparative study. Computer Methods and Programs in Biomedicine 80(1), 17–23 (2005)
39. Kingsbury, N.: A dual-tree complex wavelet transform with improved orthogonality and symmetry properties. In: Image Processing, 2000. Proceedings. 2000 International Conference on, vol. 2, pp. 375–378. IEEE (2000)
40. Kreinovich, V., Quintana, C., Reznik, L.: Gaussian membership functions are most adequate in representing uncertainty in . . . In: Proceedings of NAFIPS'92: North American Fuzzy Information Processing Society Conference, Puerto Vallarta, pp. 618–624 (1992)
41. Lim, S.S., Vos, T., Flaxman, A.D., Danaei, G., Shibuya, K., Adair-Rohani, H., AlMazroa, M.A., Amann, M., Anderson, H.R., Andrews, K.G., et al.: A comparative risk assessment of burden of disease and injury attributable to 67 risk factors and risk factor clusters in 21 regions, 1990–2010: a systematic analysis for the global burden of disease study 2010. The Lancet 380(9859), 2224–2260 (2013)
42. Mallat, S.: A wavelet tour of signal processing. Academic Press (1999)
43. Mitchell, T.M., Michell, T.: Machine learning. McGraw-Hill series in computer science (1997)
44. Organization, W.H.: Global status report on alcohol and health 2014. World Health Organization (2014)
45. Organization, W.H., et al.: Global status report on alcohol 2004 (2004)
46. Patidar, S., Pachori, R.B., Upadhyay, A., Acharya, U.R.: An integrated alcoholic index using tunable-Q wavelet transform based features extracted from EEG signals for diagnosis of alcoholism. Applied Soft Computing 50, 71–78 (2017)
47. Porjesz, B., Begleiter, H.: Alcoholism and human electrophysiology. Alcohol Research and Health 27(2), 153–160 (2003)
48. Rangaswamy, M., Porjesz, B., Chorlian, D.B., Choi, K., Jones, K.A., Wang, K., Rohrbaugh, J., O'Connor, S., Kuperman, S., Reich, T., et al.: Theta power in the EEG of alcoholics. Alcoholism: Clinical and Experimental Research 27(4), 607–615 (2003)
49. Rangaswamy, M., Porjesz, B., Chorlian, D.B., Wang, K., Jones, K.A., Bauer, L.O., Rohrbaugh, J., Oconnor, S.J., Kuperman, S., Reich, T., et al.: Beta power in the EEG of alcoholics. Biological Psychiatry 52(8), 831–842 (2002)
50. Repovš, G.: Dealing with noise in EEG recording and data analysis. In: Informatica Medica Slovenica, vol. 15, pp. 18–25 (2010)
51. Rumpf, H.J., Hapke, U., Meyer, C., John, U.: Screening for alcohol use disorders and at-risk drinking in the general population: psychometric performance of three questionnaires. Alcohol and Alcoholism 37(3), 261–268 (2002)
52. Selesnick, I.W., Baraniuk, R.G., Kingsbury, N.C.: The dual-tree complex wavelet transform. IEEE Signal Processing Magazine 22(6), 123–151 (2005)
53. Sharma, M., Deb, D., Acharya, U.R.: A novel three-band orthogonal wavelet filter bank method for an automated identification of alcoholic EEG signals. Applied Intelligence (2017). DOI 10.1007/s10489-017-1042-9. URL https://doi.org/10.1007/s10489-017-1042-9
54. Sharma, M., Dhere, A., Pachori, R.B., Acharya, U.R.: An automatic detection of focal EEG signals using new class of time–frequency localized orthogonal wavelet filter banks. Knowledge-Based Systems 118, 217–227 (2017)
55. Sharma, M., Dhere, A., Pachori, R.B., Gadre, V.M.: Optimal duration-bandwidth localized antisymmetric biorthogonal wavelet filters. Signal Processing 134, 87–99 (2017)
56. Sharma, M., Pachori, R.B., Acharya, U.R.: A new approach to characterize epileptic seizures using analytic time-frequency flexible wavelet transform and fractal dimension. Pattern Recognition Letters 94, 172–179 (2017). DOI https://doi.org/10.1016/j.patrec.2017.03.023. URL http://www.sciencedirect.com/science/article/pii/S0167865517300995
57. Suykens, J., Vandewalle, J.: Least squares support vector machine classifiers. Neural Processing Letters 9(3), 293–300 (1999). DOI 10.1023/A:1018628609742. URL https://doi.org/10.1023/A:1018628609742
58. Suykens, J.A., Vandewalle, J.: Least squares support vector machine classifiers. Neural Processing Letters 9(3), 293–300 (1999)
59. Tcheslavski, G.V., Gonen, F.F.: Alcoholism-related alterations in spectrum, coherence, and phase synchrony of topical electroencephalogram. Computers in Biology and Medicine 42(4), 394–401 (2012)
60. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer New York (2000). DOI 10.1007/978-1-4757-3264-1. URL https://doi.org/10.1007/978-1-4757-3264-1
61. Zhang, X.L., Begleiter, H., Porjesz, B., Wang, W., Litke, A.: Event related potentials during object recognition tasks. Brain Research Bulletin 38(6), 531–538 (1995)
Manish Sharma received the B.E. degree in Electronics Engineering from Pandit Ravi Shankar Shukla University, Raipur, India, in 1998; the M.E. degree in Electrical Engineering from GSITS, Indore, in 2004; and the Ph.D. degree from the Indian Institute of Technology (IIT) Bombay, Mumbai, in 2015. Dr. Sharma was the recipient of the Award for Excellence in Research Work from IIT Bombay in 2015 for his outstanding Ph.D. work. He received a postdoctoral fellowship from IIT Indore during 2016. He was awarded the prestigious "ERCIM Alain Bensoussan Postdoctoral Fellowship" of the European Union in December 2017 for carrying out research at the Department of Electronic Systems, the Norwegian University of Science and Technology (NTNU), Norway. He is currently an Assistant Professor with the Department of Electrical Engineering, IITRAM, Ahmedabad, India. His primary research interests are in the areas of time-frequency methods, multirate systems, filter banks and wavelets, machine learning, big data, and their applications in communication and biomedical signal processing.

Pragya Sharma obtained the B.E. degree in Electronics and Communication Engineering and the M.Tech. degree in Embedded Systems and VLSI Design from Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal, India, in 2013 and 2017, respectively. Her research areas include biomedical signal processing and VLSI design.

Ram Bilas Pachori received the B.E. degree with honors in Electronics and Communication Engineering from Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal, India, in 2001, and the M.Tech. and Ph.D. degrees in Electrical Engineering from the Indian Institute of Technology (IIT) Kanpur, Kanpur, India, in 2003 and 2008, respectively. He worked as a Postdoctoral Fellow at Charles Delaunay Institute, University of Technology of Troyes, Troyes, France, during 2007-2008. He served as an Assistant Professor at the Communication Research Center, International Institute of Information Technology, Hyderabad, India, during 2008-2009. He served as an Assistant Professor at the Discipline of Electrical Engineering, IIT Indore, Indore, India, during 2009-2013. He worked as an Associate Professor at the Discipline of Electrical Engineering, IIT Indore, Indore, India, during 2013-2017, where he has been working as a Professor since 2017. He worked as a Visiting Scholar at the Intelligent Systems Research Center, Ulster University, Northern Ireland, UK, during December 2014. His research interests are in the areas of biomedical signal processing, nonstationary signal processing, speech signal processing, signal processing for communications, computer-aided medical diagnosis, and signal processing applications.

U. Rajendra Acharya, PhD, DEng, is a senior faculty member at Ngee Ann Polytechnic, Singapore. He is also (i) Adjunct Professor at University of Malaya, Malaysia, (ii) Adjunct Faculty at Singapore Institute of Technology-University of Glasgow, Singapore, and (iii) Associate Faculty at Singapore University of Social Sciences, Singapore. He received his Ph.D. from National Institute of Technology Karnataka (Surathkal, India) and DEng from Chiba University (Japan). He has published more than 400 papers in refereed international SCI-IF journals (345), international conference proceedings (42) and books (17), with more than 14,000 citations in Google Scholar (h-index of 60) and a ResearchGate RG Score of 45.95. He has worked on various funded projects, with grants worth more than 2 million SGD. He has three patents and has licensed two software packages to a local private company: (i) automated extraction of retinal vasculature and (ii) automated detection of age-related macular degeneration (AMD). He is on the editorial board of many journals and has served as Guest Editor for many journals. His major academic interests are in biomedical signal processing, biomedical imaging, data mining, visualization and biophysics for better healthcare design, delivery and therapy. He is ranked in the top 1% of the Highly Cited Researchers (2016 and 2017) in Computer Science according to the Essential Science Indicators of Thomson (http://hcr.stateofinnovation.com/).