Lung sound classification using cepstral-based statistical features

Nandini Sengupta^a,*, Md Sahidullah^b, Goutam Saha^a

a Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology Kharagpur, Kharagpur-721 302, India.
b Speech and Image Processing Unit, School of Computing, University of Eastern Finland, Joensuu-80101, Finland.
Abstract

Lung sounds convey useful information related to pulmonary pathology. In this paper, short-term spectral characteristics of lung sounds are studied to characterize them for the identification of associated diseases. Motivated by the success of cepstral features in speech signal classification, we evaluate five different cepstral features to recognize three types of lung sounds: normal, wheeze and crackle. Subsequently, for fast and efficient classification, we propose a new feature set computed from the statistical properties of cepstral coefficients. Experiments are conducted on a dataset of 30 subjects using an artificial neural network (ANN) as the classifier. Results show that the statistical features extracted from mel-frequency cepstral coefficients (MFCCs) of lung sounds outperform commonly used wavelet-based features as well as standard cepstral coefficients, including MFCCs. Further, we experimentally optimize different control parameters of the proposed feature extraction algorithm. Finally, we evaluate the features for noisy lung sound recognition. We have found that the newly investigated features are more robust than existing features and show better recognition accuracy even at low signal-to-noise ratios (SNRs).

Keywords: Artificial Neural Network (ANN), Auscultation, Discrete Wavelet Transform (DWT), Mel-frequency Cepstral Coefficients (MFCCs), Spectral Features, Statistical Features.

* Corresponding Author
Email addresses: [email protected] (Nandini Sengupta), [email protected] (Md Sahidullah), [email protected] (Goutam Saha)

Preprint submitted to Computers in Biology and Medicine, June 9, 2016
1. Introduction

Lung sound characteristics and their diagnoses form an indispensable part of pulmonary pathology [1]. Medical practitioners use diverse techniques to identify lung sound characteristics, but most of the available methods are not always convenient. The simplest and most popular technique to determine pathological conditions from chest auscultation is the stethoscope. However, it is unreliable owing to factors like: (a) the inexperience of the physician, leading to an inability to recognize lung sound abnormalities and dysfunctions; and (b) the low sensitivity of the human ear to the lower frequency band of lung sounds [2]. Advanced techniques like chest X-rays, spirograms and arterial blood gas analysis, on which physicians heavily rely, are also not without limitations: arterial blood gas analysis is invasive and expensive, X-ray radiation is harmful to the body, and the spirogram is a subjective process. Furthermore, the non-stationary nature of lung sounds, which becomes severe in the case of abnormal subjects [2], makes the detection task even more difficult for physicians [3]. Hence, more economical, patient-friendly, non-invasive, objective and automated or computerized techniques are desirable. Recent studies in computerized procedures have not only improved over simple auscultation processes, but have also furnished new insights into the analysis of lung sounds for diagnostic purposes. An automatic lung sound analysis system mainly progresses through the steps of pre-processing, feature extraction and classification. In the pre-processing step, the collected signal is prepared for subsequent processing by reducing the heart sound effect [4], filtering or sample rate conversion, and amplitude normalization. Feature extraction derives a compact representation of a large set of data without losing discriminative information. Classification then assigns different signals to their corresponding groups.
A systematic and extensive study of the various features and classifiers used for computer-based lung sound recognition can be found in a recent paper [5]. Wavelet-based features have proved the most popular among these, since they capture the non-stationarity of lung sounds [2]. Compared to this, however, cepstral features along with a Gaussian mixture model (GMM) classifier have shown better classification results [6]. Though the GMM with mel-frequency cepstral coefficients (MFCCs) yielded better recognition accuracy in the case of crackle, it was when MFCCs were used with an artificial neural network (ANN) that comparatively better results for wheeze and normal lung sounds were registered [7]. Over the years, a number of studies have been published where wavelet and cepstral features are used for this task [8–15]. Autoregressive (AR) coefficients have also been used in certain cases [16–21], but, being highly sensitive to numerical precision, they were transformed into cepstral coefficients [22]. Another frequently used set of features for lung status recognition are those derived from the power spectrum of the signal [23–29]. Yet, being overly sensitive to noise, these power-spectrum based features show poor results. Motivated by the great success of cepstral features in speech pattern classification, in this work we explore cepstral features further for lung sound recognition. In speech processing, they are a compact representation of the spectral envelope that separates source and filter [30]: the source signal is due to the vibrations of the vocal cords, whereas the vocal tract acts as a filter [22]. In the context of lung sounds, the chest wall or thoracic wall acts as the filter through which the lung sounds pass [31, 32]. Besides, cepstral coefficients are suitable for pattern recognition tasks as they are uncorrelated with each other. In our study, we consider three types of lung sounds: normal, wheeze and crackle.
Towards this end, five popular cepstral features are studied, falling under two broad categories: (i) features based on all-pole modeling, and (ii) features based on the filterbank method. Under the first category, the linear prediction cepstral coefficient (LPCC) and its perceptual form, the perceptual linear prediction cepstral coefficient (PLPCC) [33], are considered. In the filterbank-based approach, the linear frequency cepstral coefficient (LFCC), its perceptual form, the mel-frequency cepstral coefficient (MFCC) [34], and a complementary feature of MFCC, i.e., the inverted mel-frequency cepstral coefficient (IMFCC) [35], are studied. Next, we propose a new feature extraction method that uses statistical properties of the cepstral coefficients. The proposed features outperform the wavelet-based features in terms of classification accuracy. In addition, the proposed features require significantly less time for the classification task compared to the baseline cepstral features. Finally, we optimize different parameters of our proposed features and evaluate their performance in the presence of different noises. The rest of the paper is organized as follows. Section 2 briefly describes the database used in our study. In Section 3, short-term spectral characteristics of lung sounds are studied. In Section 4, cepstral features are described for the completeness of this paper. In Section 5, we study the statistical properties of cepstral coefficients and propose new features from this analysis. Experimental setups are described in Section 6. Results are discussed in Section 7. Finally, we conclude the paper in Section 8 by summarizing the work and providing some future directions.
2. Database description

Systematic collection of lung sound samples with reliable ground truth is an important part of this research. Our database consists of recorded lung sounds obtained from three different resources: the RALE database[1], the Audio and Bio-signal Processing Lab (IIT Kharagpur)[2], and the Institute of Pulmocare and Research (Salt Lake, Kolkata)[3]. The sampling frequency of the original recordings is 8000 Hz in all three cases. The ground truths were verified by experienced pulmonologists. The database includes three types of lung sounds: normal, crackle (both fine and coarse) and wheeze. As the heart sound is mixed with the recorded lung sound, lung sound separation is an essential step before the samples are modeled and characterized. In this work, the effect of the heart sound has been reduced from the collected data by adopting the method proposed in [36]. In this approach, the signal is first decomposed into intrinsic mode functions (IMFs) using the empirical mode decomposition (EMD) technique. After that, an energy-based heart sound peak detection method is applied to each IMF of the lung sound signal, followed by a boundary estimation algorithm to localize the spread of the heart sounds. Finally, the localized heart sounds are reduced from the signal using a filtering technique. The lung sound cycles were extracted using a Hilbert envelope based algorithm [37], and we use 72 cycles (24 from each class), collected from 30 different subjects, for our experiments.

[1] http://www.rale.ca/repository.htm
[2] http://www.ecdept.iitkgp.ernet.in/index.php/home/labs/bio-sig-proc
[3] http://www.pulmocareindia.org/
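The paper's EMD-based heart sound reduction [36] and Hilbert-envelope cycle extraction [37] are not spelled out here, so the following is only a minimal numpy-only sketch of envelope-based segment detection. It substitutes a moving-RMS envelope for the Hilbert envelope and uses an illustrative relative threshold; both the window length and the threshold are our assumptions, not the authors' settings.

```python
import numpy as np

def moving_rms_envelope(x, win):
    """Moving-RMS amplitude envelope (a stand-in for a Hilbert envelope)."""
    kernel = np.ones(win) / win
    return np.sqrt(np.convolve(x ** 2, kernel, mode="same"))

def find_cycle_boundaries(x, fs, win_ms=50, rel_thresh=0.2):
    """Return (start, end) sample indices of high-energy segments
    (candidate respiratory phases) found by thresholding the envelope."""
    env = moving_rms_envelope(x, max(1, int(fs * win_ms / 1000)))
    active = env > rel_thresh * env.max()
    edges = np.diff(active.astype(int))
    starts = np.where(edges == 1)[0] + 1
    ends = np.where(edges == -1)[0] + 1
    if active[0]:
        starts = np.r_[0, starts]
    if active[-1]:
        ends = np.r_[ends, len(x)]
    return list(zip(starts, ends))
```

In practice the detected segments would still need pairing of inspiration/expiration phases into full cycles and manual verification against the pulmonologists' ground truth.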
3. Spectral characteristics of lung sounds

Lung sounds can be broadly categorized into normal, abnormal, and adventitious [38, 39]. Normal lung sound refers to the respiratory sound of a healthy subject, hardly audible without a stethoscope. On the contrary, abnormal sound indicates the absence or decrease of normal sound [38], while adventitious sound refers to the superposition of additional sound on the normal sound. Adventitious sounds can again be continuous or discontinuous. Among the several adventitious lung sounds encountered in the medical field, viz. wheeze, crackle, cough, rhonchus, squawk, stridor and so on, the two most common, wheeze (continuous) and crackle (discontinuous), are considered for detailed study in this paper. Often, either wheeze or crackle is found to be a characteristic feature of most of the other adventitious pulmonary sounds mentioned above: acoustically, wheezes characterize cough sounds, even asthmatic ones; rhonchus is a low-pitched wheeze; squawk is a short wheeze preceded by a crackle; and stridor is discernibly a low-frequency wheeze [39].

Wheeze: Wheeze is a continuous adventitious waveform of duration 250 ms or more, perceptibly displaying a musical character [40]. Wheezing, usually the outcome of localized or diffused airway narrowing or obstruction in the passageway from the larynx to the small bronchi, can be induced by multiple
Figure 1: Time-domain characteristics and spectrogram of (a) normal, (b) wheeze, and (c) crackle lung sound cycle.
Figure 2: F-ratio of power-spectrum coefficients illustrating the separability of: (a) normal and wheeze, (b) normal and crackle, (c) wheeze and crackle, and (d) normal, wheeze and crackle.
causes like mucosal edema, external compression, partial obstruction by a tumor or foreign body, and so on [41]. The dominant frequency range typifying a wheeze normally exceeds 100 Hz [2, 41]. Wheezing is a symptom of diseases predominantly associated with the obstruction of airways: whereas reversible obstruction characterizes asthmatic tendencies [42], irreversible obstruction denotes chronic obstructive pulmonary diseases (COPDs) [43], for example, emphysema (caused by smoking), chronic bronchitis and cystic fibrosis.

Crackle: Discontinuous in nature, crackles normally last for 1 to 10 ms [31, 40]. In general, crackles are caused by the explosive opening of small airways [44]. They are short, explosive, non-musical sounds heard on inspiration and sometimes during expiration, with a distinctly wide frequency range extending up to 2000 Hz [38, 44]. Two types of crackles are commonly found: fine and coarse. Fine crackles are high pitched and last for less than 5 ms, as distinguished from coarse crackles, which are low pitched and have a duration of about 10 ms [45]. Crackles detected in patients can be symptomatic of alveolitis, interstitial lung diseases [46], congestive heart failure [47], carotid arterial stiffness [48], etc.

Various types of lung sounds have different spectral characteristics [49]. From the spectrogram plots in Fig. 1, it can be observed that crackles are distinguishable from normal and wheeze by their wide frequency range, usually extending up to 2000 Hz. It is also evident from Fig. 1 that, unlike crackle or wheeze, a normal respiratory sound has a narrower frequency range. We then performed a statistical analysis of the power-spectrum coefficients to study the class separability of the different frequency components of the short-term power spectrum. We use an F-ratio based method, which gives a separability measure between the data of two or more classes [50, 51].
For a given frequency component k, the F-ratio is computed as the ratio of the between-class variance to the within-class variance of the power spectrum coefficients:

F(k) = \frac{\sum_{i=1}^{C} [\bar{S}_i(k) - \bar{S}(k)]^2}{\sum_{i=1}^{C} \frac{1}{C_i} \sum_{j=1}^{C_i} [S_i^j(k) - \bar{S}_i(k)]^2}    (1)
where S_i^j(k) is the power spectrum coefficient of the j-th sample (j = 1, 2, ..., C_i) of the i-th class (i = 1, 2, ..., C) at frequency k, and \bar{S}_i(k) and \bar{S}(k) are the means of the k-th frequency component over the i-th class and over all classes, respectively. We compute F-ratios of the different frequency components for all pairs of classes to analyze class separability in detail. All the lung sound cycles belonging to the three classes are used for this analysis. The variation of the F-ratio with respect to frequency is shown in Fig. 2. The short-term power spectrum of each lung sound cycle is obtained using a 20 ms frame length with 50% overlap and a Hamming window. The main observation from Fig. 2 is that the short-term power spectrum in the low-frequency region (i.e., from 0 Hz to 1000 Hz) contains more discriminative information than the higher-frequency region (i.e., from 1000 Hz to 2000 Hz). In the pairwise F-ratio analysis, frequency content up to 1000 Hz is adequate to distinguish a signal between the normal and wheeze classes (Fig. 2(a)). However, the entire frequency band (i.e., from 0 Hz to 2000 Hz) contains discriminative information for the separability of normal and crackle, as shown in Fig. 2(b). We also notice that the discriminative information between wheeze and crackle is distributed over almost the entire frequency band. This is also true when all three classes are considered together, as in Fig. 2(d). From the above analysis, we can conclude that short-term features containing power spectrum information can be used to represent different lung sounds. Cepstral coefficients represent these spectral characteristics in a more robust manner. In the following section, we briefly review the computation of several cepstral features.
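Eq. (1) is straightforward to evaluate per frequency bin. A minimal numpy sketch, assuming the short-term power spectra have already been computed and grouped by class:

```python
import numpy as np

def f_ratio(class_spectra):
    """Per-bin F-ratio, Eq. (1): between-class variance of the class means
    divided by the averaged within-class variance.

    class_spectra: list of arrays, one per class, each (num_frames, num_bins).
    Returns an array of length num_bins.
    """
    class_means = np.array([S.mean(axis=0) for S in class_spectra])   # (C, K)
    grand_mean = class_means.mean(axis=0)                             # (K,)
    between = ((class_means - grand_mean) ** 2).sum(axis=0)
    within = sum(((S - S.mean(axis=0)) ** 2).mean(axis=0)
                 for S in class_spectra)
    return between / within
```

Bins with large F(k) carry more class-discriminative power-spectrum information, which is what Fig. 2 visualizes across frequency.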
4. Computation of cepstral features

Two all-pole modeling based features and three filterbank-based features are used in our study. Unlike wavelet-based features [9], cepstral features are based on short-term spectral analysis. Hence, lung sounds are first divided into overlapping frames and features are computed in a frame-wise manner.
4.1. All-pole modeling based features

All-pole modeling represents a signal in terms of a small number of parameters. It is based on the autoregressive (AR) model of a random process.

4.1.1. Linear prediction cepstral coefficients (LPCCs)

A short-term window of lung sound, s(n) (where n = 0, 1, 2, ..., M − 1 and M is the window size), is modeled using a q-th order linear prediction model:

s(n) = \sum_{k=1}^{q} a(k) s(n-k) + e(n)    (2)
where the a(k) are the linear prediction coefficients (LPCs) and e(n) is the prediction error [52]. The value of q is chosen such that it can effectively capture the real and complex poles in the frequency range up to the Nyquist frequency. The predictor coefficients are determined by minimizing the prediction error in the mean-square sense. These LPCs are typically converted into a more robust form, popularly known as linear prediction cepstral coefficients (LPCCs), using the following recursion:

c_0 = \ln G    (3)

c_m = a_m + \sum_{k=1}^{m-1} \left( \frac{k}{m} \right) c_k a_{m-k}, \quad 1 \le m \le q    (4)
where G is the gain parameter of the LPC model. LPCCs are traditional features in speech and speaker recognition [52, 53].

4.1.2. Perceptual linear prediction cepstral coefficients (PLPCCs)

In this case, the lung sound is first converted into a perceptually meaningful signal through several processing steps mimicking human auditory perception [33, 54, 55]. In the first step, the power spectrum is integrated using critical-band filters placed at intervals of one unit on the psychoacoustical Bark scale [56]. Next, the integrated spectrum is weighted using an equal-loudness curve approximating
LFCC: Linear frequency cepstral coefficient MFCC: Mel-frequency cepstral coefficient IMFCC: Inverted mel-frequency cepstral coefficient
Figure 3: Illustration of filter bank frequency responses for (a) LFCC, (b) MFCC, and (c) IMFCC.
the non-linear sensitivity of the human auditory system. Then, cube-root compression is performed on the weighted energy coefficients. Finally, LPCCs computed from these coefficients give the PLPCCs.

4.2. Filterbank-based cepstral features

Filterbank-based parametrizations are more widely accepted than the LP-based methods due to their computational advantage using the fast Fourier transform (FFT). Three types of such features are studied in this work.

4.2.1. Linear frequency cepstral coefficients (LFCCs)

In order to compute the filterbank-based features, the short-term power spectrum is computed from the lung sound frame, s(n), using an N_FFT-point FFT
as:

|Y(k)|^2 = \left| \sum_{n=1}^{M} s(n)\, e^{-j 2\pi n k / N_{FFT}} \right|^2, \quad 1 \le k \le \frac{N_{FFT}}{2} + 1    (5)

Next, a filterbank with linearly spaced triangular filters on the Hz scale is imposed on the spectrum, as shown in Fig. 3(a). The outputs \{e(i)\}_{i=1}^{Q} of the band-pass filters are calculated as a weighted summation of the respective filter response \psi_i(k) and the energy spectrum |Y(k)|^2:

e(i) = \sum_{k=1}^{N_{FFT}/2 + 1} |Y(k)|^2 \cdot \psi_i(k)    (6)
Finally, the discrete cosine transform (DCT) is performed on the log filterbank energies \{\log[e(i)]\}_{i=1}^{Q}, and the m-th LFCC is computed as:

C_m = \sqrt{\frac{2}{Q}} \sum_{l=0}^{Q-1} \log[e(l+1)] \cdot \cos\left( \frac{\pi m (2l+1)}{2Q} \right)    (7)
4.2.2. Mel-frequency cepstral coefficients (MFCCs)

The computation of MFCC is almost the same as that of LFCC, except that the filters are spaced non-linearly on the mel scale [57, 58]. According to psychophysical studies, human perception of the frequency content of sounds follows this nonlinear scale, defined as

f_{mel} = 2595 \log_{10}\left( 1 + \frac{f}{700} \right)    (8)

where f is the original frequency in Hz. Here the filters are spaced more closely in the low-frequency region than in the high-frequency region, as shown in Fig. 3(b). We expect MFCCs to perform better than LFCCs, as more emphasis is given to the highly discriminative section of the power spectrum.
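The filterbank cepstra of Eqs. (5)-(8) can be sketched in a numpy-only form. The parameters below match the paper's setup (20 filters, 1024-point FFT, 4 kHz sampling), but the exact filterbank construction (edge handling, bin rounding, normalization) is our assumption rather than the authors' implementation:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)          # Eq. (8)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, nfft, fs):
    """Triangular filters with centres equally spaced on the mel scale;
    replacing hz_to_mel/mel_to_hz with the identity gives the LFCC bank."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), num_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((num_filters, nfft // 2 + 1))
    for i in range(1, num_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):                   # rising slope
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):                  # falling slope
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def cepstra_frame(frame, fb, num_ceps):
    """Cepstra of one windowed frame via Eqs. (5)-(7)."""
    nfft = 2 * (fb.shape[1] - 1)
    power = np.abs(np.fft.rfft(frame, nfft)) ** 2       # Eq. (5)
    energies = fb @ power                                # Eq. (6)
    log_e = np.log(np.maximum(energies, 1e-12))
    Q = len(log_e)
    m = np.arange(num_ceps)[:, None]
    l = np.arange(Q)[None, :]
    dct = np.cos(np.pi * m * (2 * l + 1) / (2 * Q))      # Eq. (7)
    return np.sqrt(2.0 / Q) * (dct @ log_e)
```

Swapping the mel warping for its inverted counterpart yields the IMFCC variant discussed next.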
4.2.3. Inverted mel-frequency cepstral coefficients (IMFCCs)

IMFCCs are computed with the inverted mel scale. Here, the filters are more densely spaced in the higher-frequency region [35], as illustrated in Fig. 3(c); hence IMFCCs capture more detailed information from the high-frequency region. From the F-ratio analysis in Fig. 2, we have seen that high-frequency cues could be useful in discriminating normal from crackle, as well as wheeze from crackle lung sounds. For this reason, we study IMFCC features in the lung sound classification problem.
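Complementing the filterbank features above, the all-pole route of Sec. 4.1 reduces to the recursion of Eqs. (3)-(4) once the LPCs are available. A minimal sketch, assuming the LPCs a(1..q) and gain G have already been estimated (e.g. by the Levinson-Durbin procedure, not shown here):

```python
import numpy as np

def lpc_to_cepstrum(a, gain=1.0):
    """Convert LPCs (with the sign convention of Eq. (2)) to LPCCs via
    the recursion of Eqs. (3)-(4).

    a: 1-D array of the q predictor coefficients a(1)..a(q).
    Returns c[0..q], with c[0] = ln(gain) per Eq. (3)."""
    q = len(a)
    c = np.zeros(q + 1)
    c[0] = np.log(gain)
    for m in range(1, q + 1):
        acc = a[m - 1]                     # a_m  (0-indexed storage)
        for k in range(1, m):
            acc += (k / m) * c[k] * a[m - 1 - k]   # (k/m) c_k a_{m-k}
        c[m] = acc
    return c
```

For a second-order predictor this reproduces the known identities c_1 = a_1 and c_2 = a_2 + a_1^2/2, which is a convenient sanity check.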
5. Proposed statistical features and class-separability analysis

5.1. Feature formulation

Standard cepstral features are computed in a frame-wise manner. In speech processing applications, such as speech and speaker recognition, they are modeled as mixtures of multivariate Gaussian distributions with GMMs, since each mixture component approximately represents one acoustic class [59]. GMMs are suitable there because different speech sounds belong to different acoustic classes. Here, we hypothesize that each lung sound cycle of a particular disease class is produced by a specific physical structure of the lung, and hence each cycle can be modeled with a unimodal multivariate normal distribution, i.e., a single Gaussian distribution per cycle. First, we study the histograms of cepstral features for the different categories of lung sounds over the full database (i.e., considering all cycles). A representative plot is shown in Fig. 4. It can be observed that the features approximately follow normal distributions with different means and variances. We then perform the Jarque-Bera test to check the normality of the cepstral features within a cycle. This test uses the skewness and kurtosis of the distribution to decide whether the data come from a normal distribution with unknown mean and variance [60]. We have found that in most cases the cepstral coefficients follow a normal distribution. The p-values for the different features are shown in Fig. 5. We have also used the normal
[Figure 4 legend: mean ± std of the plotted feature. Normal: −0.0212 ± 0.0677; Wheeze: 0.0391 ± 0.0617; Crackle: −0.0007 ± 0.0047]
Figure 4: Figure showing histograms of features for normal, wheeze, and crackle lung sounds. We choose MFCCs for this representative plot.
probability plot, a graphical technique to verify whether the cepstral data follow a normal distribution [61]: if the data points approximately fall on a straight line, the corresponding features follow a normal distribution. Figure 6 plots a single dimension of the cepstral coefficients over all the frames of an arbitrarily chosen cycle. We notice from the illustration that the cepstral coefficients approximately follow a normal distribution. From the above analysis, we conclude that the cepstral features from each cycle can be approximated with a normal distribution. This leads us to define the probabilistic model of the cepstral coefficients of the k-th cycle as

p_k(x) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) \right)    (9)
where \mu_k and \Sigma_k are respectively the mean and covariance of the d-dimensional feature vector x, and d is the number of coefficients. Hence, we can parameterize the cepstral features of each cycle with their sample mean
Figure 5: The illustration of p-values obtained from Jarque-Bera test of different feature set at the 5% significance level. Plot of p-values are shown for all 24 cycles of each class for each dimension of feature vector.
Figure 6: A representative figure showing normal probability plot.
[Figure 7 block diagram: Lung Sound Cycle → Pre-processing → Framing and Windowing → Compute Cepstral Coefficients → Compute Statistical Features (Mean, Standard Deviation)]

Figure 7: Block diagram of proposed feature extraction method.
and variance across frames. We call these the statistical features, and they are computed as follows. Let T be the number of frames in a given lung sound cycle and N_F the number of coefficients; we compute the mean \mu and standard deviation \sigma (i.e., the square roots of the diagonal elements of \Sigma_k) of the cycle as
\mu_i = \frac{1}{T} \sum_{j=1}^{T} x_{ij}    (10)

and

\sigma_i = \left[ \frac{1}{T} \sum_{j=1}^{T} (x_{ij} - \mu_i)^2 \right]^{1/2}, \quad i = 1, 2, ..., N_F    (11)
where x_{ij} is the i-th cepstral coefficient of the j-th frame of the cycle. Note that the cepstral coefficients are uncorrelated with each other; hence, we can approximate the covariance matrix as diagonal [34, 59]. We propose to use this mean and standard deviation as the features for lung sound classification. The block diagram of this feature extraction method is shown in Fig. 7. The main advantage of this formulation is that the frame-wise cepstral features are represented as a single vector, which leads to faster training and classification. In the next subsection, we analyze the suitability of our proposed statistical features for class separability.

5.2. Class separability analysis using proposed features

It is of interest to see which type of statistical feature – mean or standard deviation – has greater influence on lung sound classification. Figure 4 shows
the distribution plot of an arbitrarily chosen cepstral feature. We observe that the means are relatively more distinct across classes than the standard deviations: as shown in the figure, the numerical values of the standard deviation, i.e., the square root of the variance of the baseline cepstral features, show smaller differences among classes than the means. When comparing normal and wheeze sounds, their standard deviations are 0.0677 and 0.0617, respectively, whereas their means are −0.0212 and 0.0391. Figure 8 shows the average of the statistical coefficients for the different classes. We notice in Fig. 8 that crackle is clearly separable using the standard deviation of MFCCs for all coefficients, whereas wheeze and normal are not separable. However, when using mean-based features of MFCCs, the lower-order coefficients (particularly 3 to 11) are distinguishable across classes. To quantify the discriminative ability of the two distribution parameters, we calculate the Rayleigh quotient [62, p. 120]. This is a measure of compactness within each class and separability among classes, and it is used in linear discriminant analysis (LDA). In LDA, higher-dimensional data are projected onto a lower-dimensional space so as to maximize separability in the projected space; the direction of projection is chosen to maximize the Rayleigh quotient. We compute this measure for each feature type separately. It is computed as
J(w) = \frac{|w^\top S_B w|}{|w^\top S_W w|}    (12)
where w is the direction of projection that maximizes the discrimination among the different classes, and S_B and S_W are the between-class and within-class scatter matrices before projection, respectively. w is obtained by solving the following generalized eigenvalue problem:
S_B w = \Lambda S_W w    (13)
where \Lambda is a diagonal matrix containing the eigenvalues as its diagonal elements.
Figure 8: Average of proposed statistical features ((a) mean and (b) standard deviation) for normal, wheeze, and crackle lung sounds. The statistical features are computed on MFCCs, and the average is taken over all the lung cycles of the entire database.
Better separability among classes and compactness within each class are achieved by maximizing J(w) [62]. We have calculated the Rayleigh quotient for both types of statistical features: for mean-based features it is 0.79, and for standard-deviation-based features it is 0.47 on the full database. Therefore, mean-based features are more discriminative than standard-deviation-based features. These observations agree with the results in Table 1 and Table 2, discussed in Section 7.
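The quotient of Eq. (12) and the eigenproblem of Eq. (13) can be sketched with numpy. The scatter-matrix definitions below (class-size-weighted between-class scatter, pooled within-class scatter) are the usual LDA conventions and are our assumption about the paper's exact computation:

```python
import numpy as np

def max_rayleigh_quotient(class_data):
    """Maximum of J(w) = (w' S_B w) / (w' S_W w), Eq. (12), over all w.

    class_data: list of (n_i, d) arrays, one per class. The maximizer
    solves S_B w = lambda S_W w, Eq. (13); here we take the largest
    eigenvalue of inv(S_W) S_B."""
    grand = np.vstack(class_data).mean(axis=0)
    d = grand.size
    SB = np.zeros((d, d))
    SW = np.zeros((d, d))
    for X in class_data:
        mu = X.mean(axis=0)
        diff = (mu - grand)[:, None]
        SB += len(X) * (diff @ diff.T)      # between-class scatter
        SW += (X - mu).T @ (X - mu)         # within-class scatter
    eigvals = np.linalg.eigvals(np.linalg.solve(SW, SB))
    return float(np.max(eigvals.real))
```

A larger value indicates classes that are simultaneously well separated and internally compact, which is how the 0.79 (mean) versus 0.47 (standard deviation) comparison should be read.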
6. Experimental setup

6.1. Pre-processing & feature extraction

In the pre-processing stage, signals are first down-sampled to 4000 Hz, as most of the lung sound information exists within 2 kHz. Then we perform amplitude normalization to reduce instrumental noise. We use a 20 ms frame length with 50% overlap for extracting short-term features, and a 1024-point FFT for computing the FFT-based power spectrum. As we use 20 filters for spectral analysis and all-pole modeling, the feature dimension is 20 for LFCCs, MFCCs, IMFCCs and LPCCs. For PLPCCs, 13 filters are used on the Bark scale for the range 0-2000 Hz, so the feature dimension is 13. The sample mean and standard deviation of each set of coefficients are calculated across frames as in Eqs. (10) and (11). This leads to a 20-dimensional feature vector for LPCC, LFCC, MFCC and IMFCC, and a 13-dimensional feature vector for PLPCC. For better efficiency, feature vectors are normalized to [-1, 1] prior to classification [2].

6.2. Classifier and performance evaluation

An ANN with a multilayer perceptron (MLP) architecture has been used as the classifier [63, 64]. It consists of 40 hidden-layer nodes and 3 output-layer nodes. Resilient backpropagation (RP) [2, 65] is used for training the ANN. The hidden-layer and output-layer activation functions are chosen as tan-sigmoid and log-sigmoid, respectively, for better classification accuracy and efficiency [2]. We iterate the whole classification procedure 25 times and calculate the average accuracy. We use leave-one-out cross-validation to evaluate our methodology. In leave-p-out cross-validation, p observations are used as the test set and the remaining observations are used to train the class models [18, 66]. In our experiments, one of the available 72 cycles is used as the test sample while the remaining cycles are used for training; this is repeated 72 times so that every cycle is tested. The classification accuracy is computed as
Classification accuracy = (No. of correctly classified cycles / Total no. of cycles under test) × 100.    (14)

We further compute the sensitivity and specificity metrics [67]. Sensitivity measures the proportion of abnormal cycles that are correctly identified as abnormal and can be written as

Sensitivity = (No. of correctly classified abnormal cycles / Total no. of abnormal cycles under test) × 100.    (15)
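The per-cycle statistical features of Eqs. (10)-(11) and the leave-one-out protocol with the accuracy of Eq. (14) can be sketched as follows. A nearest-centroid rule stands in for the paper's MLP trained with resilient backpropagation, purely to keep the sketch dependency-free; it is not the authors' classifier.

```python
import numpy as np

def statistical_features(cepstra):
    """Eqs. (10)-(11): per-cycle mean and population standard deviation
    of the frame-wise cepstra (cepstra: (T, NF) array)."""
    return cepstra.mean(axis=0), cepstra.std(axis=0)

def nearest_centroid_predict(train_X, train_y, x):
    # Stand-in classifier: assign x to the class with the closest mean.
    labels = np.unique(train_y)
    centroids = np.array([train_X[train_y == c].mean(axis=0) for c in labels])
    return labels[np.argmin(np.linalg.norm(centroids - x, axis=1))]

def leave_one_out_accuracy(X, y):
    """Eq. (14): each cycle is tested once while all others train the model."""
    n = len(y)
    correct = 0
    for i in range(n):
        mask = np.arange(n) != i
        if nearest_centroid_predict(X[mask], y[mask], X[i]) == y[i]:
            correct += 1
    return 100.0 * correct / n
```

In the paper's setup, X would hold one mean-based (or standard-deviation-based) vector per lung sound cycle (72 rows), and the loop would retrain the ANN 72 times.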
Feature | Crackle | Wheeze | Normal | Sensitivity | Specificity | Overall | Time
Wavelet (31.25-1000 Hz) | 100.00 | 91.00 | 74.83 | 95.91 | 74.83 | 88.61 | 4.22
Wavelet (0-2000 Hz) | 100.00 | 90.66 | 82.83 | 95.50 | 82.83 | 91.16 | 3.76
Baseline LPCC | 100.00 | 95.83 | 91.67 | 97.91 | 91.67 | 95.83 | 723.56
Baseline PLPCC | 100.00 | 95.83 | 91.67 | 97.91 | 91.67 | 95.83 | 691.36
Baseline LFCC | 100.00 | 95.83 | 91.67 | 97.91 | 91.67 | 95.83 | 717.70
Baseline MFCC | 100.00 | 95.83 | 95.83 | 97.91 | 95.83 | 97.20 | 725.50
Baseline IMFCC | 100.00 | 95.83 | 87.50 | 97.91 | 87.50 | 94.44 | 730.98

LPCC: Linear prediction cepstral coefficient; PLPCC: Perceptual linear prediction cepstral coefficient; LFCC: Linear frequency cepstral coefficient; MFCC: Mel-frequency cepstral coefficient; IMFCC: Inverted mel-frequency cepstral coefficient.

Table 1: Lung sound classification performance for wavelet and baseline cepstral features. Classification accuracy (in %) is shown for crackle, wheeze and normal lung sounds. Sensitivity, specificity and overall accuracy are also computed. The last column shows the computational time (in minutes).

On the other hand, specificity measures the proportion of normal cycles that are correctly identified as normal, and it can be expressed as

Specificity = (No. of correctly classified normal cycles / Total no. of normal cycles under test) × 100.    (16)
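Eqs. (15)-(16) can be sketched directly from per-cycle labels. One detail is our reading rather than an explicit statement in the paper: an abnormal cycle counts as "correctly classified" only when its exact class (wheeze or crackle) is predicted, which appears consistent with Table 1, where the sensitivity 97.91 equals the average of the crackle and wheeze accuracies.

```python
import numpy as np

def sensitivity_specificity(y_true, y_pred, normal_label=0):
    """Eqs. (15)-(16): sensitivity over the pooled abnormal classes
    (wheeze + crackle) and specificity over the normal class, in percent.
    An abnormal cycle is counted as correct only on an exact class match
    (our assumption, see lead-in)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    abnormal = y_true != normal_label
    sens = 100.0 * np.mean(y_pred[abnormal] == y_true[abnormal])
    spec = 100.0 * np.mean(y_pred[~abnormal] == y_true[~abnormal])
    return sens, spec
```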
6.3. System setup

All algorithms are implemented in the MATLAB (Version 2013b) programming language. The experiments are conducted on a Lenovo MT-M 3492-H20 desktop computer with an Intel(R) Core(TM) i5-3470S CPU, 8 GB RAM and the Windows 7 operating system.
7. Results & discussion

7.1. Comparison with baseline methods

We have conducted experiments with our proposed features as well as the wavelet-based approach [2]. First, wavelet coefficients are calculated using Daubechies mirror filters of order 8 (db8) with five decomposition levels, encompassing a frequency range of 31.25-1000 Hz. Then the mean of the absolute values of the subband coefficients, the standard deviation, the average power and the ratio of the means of adjacent subbands are calculated to form 19-dimensional feature vectors, which are used with the neural network classifier. For a more appropriate comparison, the frequency range is then extended to 0-2000 Hz, giving feature vectors of dimension 27; these are again applied to the neural network classifier with the same configuration. To evaluate the performance of the baseline cepstral features, frame-wise cepstral coefficients are fed separately to the neural network, and its outputs are combined across all the frames to determine the target class for each lung sound cycle. The classification results of all the baseline methods are shown in Table 1.

Table 1 demonstrates that the cepstral features perform better than the wavelet-based features. This is possibly because short-term spectral characteristics represent the lung sound information more effectively than statistical measures of the lung sound signal in the wavelet domain. Besides, cepstral features are more robust in the presence of nuisance variations in the signal; we study this further in Section 7.3. In terms of sensitivity and specificity, the cepstral features also outperform the baseline wavelet-based features. Further, Table 1 shows that among all the studied cepstral features, the baseline MFCCs yield the best results. This could be because MFCCs capture information relevant to auditory perception: in medical diagnosis, expert pulmonologists can detect the differences between normal and adventitious sounds by listening through a stethoscope, so the auditory-perception-specific information captured by MFCCs is well suited to the recognition task. On the other hand, the baseline IMFCCs give the worst performance among the cepstral features. The likely reason is that the inverted mel scale has higher resolution at high frequencies than at low frequencies; because of the wider spacing of filters at lower frequencies, this feature fails to capture the discriminative information of the three classes, as discussed in Section 3. We also observe that crackle sounds are accurately classified using all the studied features, whereas identification of normal and wheeze sounds is relatively difficult.

| Parameter | Feature | Crackle | Wheeze | Normal | Sensitivity | Specificity | Overall | Time |
| Mean | LPCC | 99.83 | 95.66 | 95.00 | 97.91 | 95.00 | 96.80 | 3.59 |
| Mean | PLPCC | 100.00 | 93.50 | 91.00 | 97.25 | 91.00 | 94.83 | 3.70 |
| Mean | LFCC | 99.50 | 95.00 | 98.30 | 97.41 | 98.33 | 97.61 | 3.69 |
| Mean | MFCC | 99.67 | 94.50 | 97.50 | 97.41 | 97.50 | 97.20 | 3.43 |
| Mean | IMFCC | 100.00 | 89.16 | 87.67 | 95.41 | 87.66 | 92.27 | 3.99 |
| Standard deviation | LPCC | 94.00 | 73.00 | 73.83 | 88.58 | 73.83 | 80.27 | 7.02 |
| Standard deviation | PLPCC | 98.50 | 70.60 | 72.16 | 84.66 | 72.16 | 80.44 | 31.12 |
| Standard deviation | LFCC | 96.60 | 77.16 | 83.16 | 90.25 | 83.16 | 85.60 | 35.31 |
| Standard deviation | MFCC | 95.83 | 76.50 | 68.30 | 87.67 | 68.33 | 80.22 | 33.43 |
| Standard deviation | IMFCC | 97.66 | 77.30 | 71.16 | 90.00 | 71.17 | 82.05 | 34.99 |

LPCC: linear prediction cepstral coefficient; PLPCC: perceptual linear prediction cepstral coefficient; LFCC: linear frequency cepstral coefficient; MFCC: mel-frequency cepstral coefficient; IMFCC: inverted mel-frequency cepstral coefficient.

Table 2: Lung sound classification performance for the proposed statistical features. Classification accuracy (in %) is shown for crackle, wheeze and normal lung sounds. Sensitivity, specificity and overall accuracy are also computed. The last column shows the computational time (in minutes).

| Accuracy | Static | Static + ∆ | Static + ∆ + ∆∆ |
| Crackle (%) | 99.67 | 100.00 | 99.50 |
| Wheeze (%) | 94.50 | 91.50 | 91.30 |
| Normal (%) | 97.50 | 99.00 | 94.80 |
| Overall (%) | 97.20 | 96.80 | 95.20 |

Table 3: Effect of dynamic coefficients on lung sound classification accuracy for the proposed MFCC features.

Results in Table 2 show the lung sound recognition performance with the proposed statistical features. We find that the proposed mean-based features give better results than the standard-deviation-based features, and the performance of the mean-based features is comparable to that obtained with the baseline cepstral coefficients. We also note that the computational time required for our proposed
Figure 9: Effect of frame length on lung sound recognition accuracy for the proposed MFCC features. Accuracy curves are shown for crackle, wheeze, normal and overall; the overall accuracy peaks at 97.83% for a frame length of 50 ms.
approach for training and testing is significantly reduced compared to the baseline cepstral features. For instance, the baseline MFCC features require 725.50 minutes, whereas their statistical counterpart takes 3.43 minutes. This is expected, since the baseline method evaluates features in a frame-wise manner: if a lung cycle consists of T frames, then T data samples per cycle are processed for training and testing. In contrast, the proposed features are represented by a single vector, similar to the wavelet features, and their dimension equals the number of filters in the filterbank. The standard-deviation-based features give poor performance because of their lower discriminative capability, as discussed in Section 5.2. Note that the proposed features have so far been studied with an ad hoc configuration borrowed from speech processing. In the next section, we optimize the configuration parameters of these features for lung sound recognition.
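The single-vector representation described above can be sketched as follows; this is our illustrative Python rendering (the paper's experiments used MATLAB), and the function name is ours.

```python
import numpy as np

# Sketch of the proposed statistical features: a lung sound cycle yielding
# T frame-wise cepstral vectors (a T x D matrix) is collapsed into a single
# D-dimensional vector by taking the sample mean (or standard deviation)
# over the frames.

def statistical_feature(cepstra, stat="mean"):
    """cepstra: (T, D) array of frame-wise cepstral coefficients."""
    cepstra = np.asarray(cepstra, dtype=float)
    if stat == "mean":
        return cepstra.mean(axis=0)          # mean-based feature (best in Table 2)
    if stat == "std":
        return cepstra.std(axis=0, ddof=1)   # standard-deviation-based feature
    raise ValueError("stat must be 'mean' or 'std'")

# A cycle with T=4 frames and D=3 coefficients reduces to one 3-dim vector:
C = np.array([[1.0, 2.0, 0.5],
              [1.2, 1.8, 0.7],
              [0.8, 2.2, 0.3],
              [1.0, 2.0, 0.5]])
mean_feat = statistical_feature(C, "mean")   # -> array([1. , 2. , 0.5])
```

Because the classifier then sees one vector per cycle instead of T, the training and testing workload shrinks by roughly a factor of T, which matches the large drop in computational time reported above.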
7.2. Experimental optimization of cepstral features

7.2.1. Effect of delta and double-delta features

Dynamic MFCC features (delta and double-delta coefficients) capture temporal information, i.e., the trajectories of the MFCCs over time. The proposed MFCCs with appended delta and double-delta features are considered for lung sound classification, and the experimental results are shown in Table 3. We observe that the MFCCs as stand-alone features show higher accuracy than when combined with the delta or double-delta coefficients; noticeably, the double-delta features perform even worse than the delta features. We conclude that the temporal information of our proposed features is not useful for lung sound recognition. Therefore, the delta and double-delta features are not considered in our subsequent experiments.

7.2.2. Optimization of frame length

Short-term features of speech signals are usually extracted with a window of about 20 ms. In this section, we analyze how the frame length used when computing the MFCCs affects lung sound recognition accuracy. We conduct experiments with frame lengths from 10 ms to 200 ms. Figure 9 illustrates the effect of the frame length on the overall classification accuracy as well as on the accuracies for each of the three classes. We observe that the classification accuracy for crackle is consistently better than that for normal or wheeze across all analysis-window lengths, and wheeze detection accuracy is the lowest of the three. Further, we find that the individual accuracies for normal, wheeze and crackle sounds are not much affected by the variation in frame length. The accuracy for normal sounds is best (99.16%) at a window length of 50 ms, while the best recognition accuracy for wheeze, 95.17%, is obtained at a slightly longer frame length of 70 ms. In Fig. 9, the solid line representing the overall accuracy attains its maximum of 97.83% at a frame length of 50 ms.
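The framing step whose window length is varied here can be sketched as follows; this is an illustrative Python version (the paper's code was MATLAB), and the 50% frame overlap, Hamming window and 8 kHz sampling rate are our assumptions, not the paper's stated settings.

```python
import numpy as np

# Sketch of short-term framing for MFCC computation with a configurable
# analysis-window length (10-200 ms in the experiments above).

def frame_signal(x, fs, frame_ms, hop_ms=None):
    """Split 1-D signal x into windowed frames of frame_ms milliseconds."""
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * (hop_ms if hop_ms is not None else frame_ms / 2) / 1000)
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    # Index matrix: row i selects samples [i*hop, i*hop + frame_len)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)    # shape (n_frames, frame_len)

fs = 8000                              # assumed sampling rate, for illustration
x = np.random.randn(fs)                # 1 s of dummy signal
frames_50ms = frame_signal(x, fs, 50)  # 400-sample frames, 200-sample hop
# 1 s at 8 kHz with 50 ms frames and 25 ms hop -> 39 frames of 400 samples
```

Each row of the returned matrix would then feed the filterbank and DCT stages of the MFCC pipeline, so a longer window trades time resolution for finer spectral resolution, which is why the optimum for lung sounds can differ from the 20 ms common in speech.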
Figure 10: A representative example of the effect of noise (additive white Gaussian at 10 dB SNR) on the wavelet-based baseline features (left) and the proposed sample-mean-based features (right) for normal, wheeze and crackle sounds; clean and noisy feature values are overlaid against the coefficient index.
| Noise | Feature | 0 dB | 10 dB | 20 dB | 30 dB | 40 dB |
| White | Wavelet | 35.49 ± 0.22 | 49.03 ± 0.32 | 83.73 ± 0.28 | 89.16 ± 0.22 | 90.48 ± 0.09 |
| White | Proposed | 63.31 ± 1.05 | 84.90 ± 1.13 | 75.35 ± 0.98 | 85.90 ± 1.06 | 94.98 ± 0.30 |
| Babble | Wavelet | 55.07 ± 1.34 | 83.82 ± 0.49 | 87.87 ± 1.38 | 89.89 ± 0.41 | 90.49 ± 0.10 |
| Babble | Proposed | 37.92 ± 1.01 | 67.04 ± 2.40 | 88.09 ± 1.47 | 97.16 ± 0.42 | 97.83 ± 0.17 |
| Volvo | Wavelet | 59.58 ± 1.26 | 87.77 ± 0.82 | 89.60 ± 0.36 | 90.17 ± 0.15 | 90.52 ± 0.25 |
| Volvo | Proposed | 75.72 ± 0.89 | 93.33 ± 3.69 | 97.57 ± 0.10 | 97.74 ± 0.11 | 97.73 ± 0.12 |

Table 4: Performance (overall accuracy in %) of lung sound classification with the wavelet-based and the proposed mean-based features (computed from MFCCs) in the presence of additive white Gaussian noise, babble noise and volvo noise.

7.3. Performance evaluation in the presence of noise

In the real world, lung sounds can be distorted by noise due to transmission channel effects, errors in data acquisition, etc. In this section, we evaluate the performance of the proposed and baseline features in the presence of three additive noises: white Gaussian, babble and volvo noise. Figure 10 illustrates the effect of white Gaussian noise at 10 dB SNR on the proposed mean-based statistical features and the wavelet-based features for the three classes; the proposed mean-based features are visibly more robust than the wavelet-based features. We evaluate the performance under these noises by artificially degrading the test samples at different SNR levels (0 dB-40 dB). The comparative results are shown in Table 4. For white Gaussian noise, the proposed features outperform the wavelet-based features at low SNRs (less than 20 dB). In the presence of babble noise, they yield better results at higher SNRs (greater than 10 dB). Furthermore, for volvo noise, we obtain improved performance at all SNR levels. Interestingly, for white noise, the performance of the proposed features drops suddenly at 20 dB SNR; this is possibly due to a regularization effect caused by the noise contamination [68]. Considering the overall performance under all three noisy conditions, we conclude that the proposed mean-based features outperform the wavelet-based features.
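The degradation step can be sketched as follows; this is a standard way of scaling additive noise to a target SNR, written here as an illustrative Python function (the function name and test signals are ours, not from the paper). Since SNR(dB) = 10 log10(P_signal / P_noise), the noise is scaled so that P_signal / P_noise equals 10^(SNR/10).

```python
import numpy as np

# Sketch: degrade a test signal with additive noise at a chosen SNR,
# as in the 0-40 dB evaluations of Table 4.

def add_noise(signal, noise, snr_db):
    signal = np.asarray(signal, dtype=float)
    noise = np.asarray(noise, dtype=float)[:len(signal)]
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale noise so that p_signal / p_scaled_noise = 10**(snr_db / 10)
    scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + scale * noise

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 200 * np.arange(8000) / 8000)  # dummy test signal
white = rng.standard_normal(8000)                          # white Gaussian noise
noisy = add_noise(clean, white, snr_db=10)

# Verify the achieved SNR of the degraded signal:
resid = noisy - clean
snr = 10 * np.log10(np.mean(clean ** 2) / np.mean(resid ** 2))  # 10 dB by construction
```

Babble or car-interior (volvo) noise would simply replace the `white` array with the corresponding recorded noise segment.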
8. Conclusions

Acoustic signals produced by the lungs during inspiration and expiration convey useful information about the lung status. In this study, the capability of statistical parameters of cepstral features to recognize three types of lung sounds is investigated. Irrespective of subjective factors such as auditory sensitivity or a physician's inexperience, spectral information related to lung sounds can be procured and utilized, and different lung sounds can thereby be segregated efficiently. Conventional classification methods based on wavelets or on baseline MFCCs are, no doubt, satisfactory. However, cepstral features not only yield better results than wavelet-based features; the statistical features derived from cepstral coefficients also incur much less computational overhead than the baseline cepstral features. The proposed features are further optimized, and their performance in the presence of three different additive noises is also studied separately. We have found that cepstral features are better than wavelet-based features in terms of classification accuracy. The statistical features extracted from the cepstral coefficients approximate the class-specific information well and reduce the computational time appreciably. The sample mean is found to be more suitable for classification than the standard deviation. During the experimental optimization of the proposed features, we observe that dynamic features are not useful for lung sound recognition. We have also found that the optimum analysis-window size for short-term feature extraction of lung sounds is moderately longer than what is frequently used in speech analysis. In the experiments with noisy lung sounds, the proposed optimized features outperform existing features at most of the noise levels studied in this paper.
In the present study, we have emphasized the formulation of the feature vectors and their experimental optimization, while the classifier is kept fixed. As a continuation of this work, research will be conducted to find the classifier best suited to the proposed features. The proposed features can also be studied for the analysis of other biomedical signals such as heart sounds.
Conflict of interest statement The authors declare that they have no conflict of interest.
Acknowledgement The work was supported by the Ministry of Human Resource Development, Govt. of India. We are thankful to Dr. Ashok Mondal for his help in preparing the database for the experiments. We would also like to thank the reviewers for their careful reading of the manuscript and helpful comments. The work of the second author was partially supported by Academy of Finland grants (proj. no. 253120 and 283256).
References

[1] Z. Moussavi, Fundamentals of respiratory sounds and analysis, in: J. Enderle (Ed.), Synthesis Lectures on Biomedical Engineering, Vol. 1, Morgan & Claypool Publishers, 2006, pp. 1–68.
[2] A. Kandaswamy, C. Kumar, R. Ramanathan, S. Jayaraman, N. Malmurugan, Neural classification of lung sounds using wavelet coefficients, Computers in Biology and Medicine 34 (6) (2004) 523–537.
[3] S. Chowdhury, A. Majumder, Frequency analysis of adventitious lung sounds, Journal of Biomedical Engineering 4 (4) (1982) 305–312.
[4] M. Pourazad, Z. Moussavi, F. Farahmand, R. Ward, Heart sounds separation from lung sounds using independent component analysis, in: Proceedings of 27th Annual International Conference of the Engineering in Medicine and Biology Society, 2005, pp. 2736–2739.
[5] R. Palaniappan, K. Sundaraj, N. U. Ahamed, Machine learning in lung sound analysis: A systematic review, Biocybernetics and Biomedical Engineering 33 (3) (2013) 129–135.
[6] M. Bahoura, Pattern recognition methods applied to respiratory sounds classification into normal and wheeze classes, Computers in Biology and Medicine 39 (9) (2009) 824–843.
[7] A. D. Orjuela-Cañón, D. Gómez-Cajas, R. Jiménez-Moreno, Artificial neural networks for acoustic lung signals classification, in: Proceedings of Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, Springer, 2014, pp. 214–221.
[8] G. Serbes, C. Sakar, Y. Kahya, N. Aydin, Feature extraction using time-frequency/scale analysis and ensemble of feature sets for crackle detection, in: Proceedings of 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE, 2011, pp. 3314–3317.
[9] S. Pittner, S. Kamarthi, Feature extraction from wavelet coefficients for pattern recognition tasks, Pattern Analysis and Machine Intelligence, IEEE Transactions on 21 (1) (1999) 83–88.
[10] X. Lu, M. Bahoura, An integrated automated system for crackles extraction and classification, Biomedical Signal Processing and Control 3 (3) (2008) 244–254.
[11] J.-C. Chien, H.-D. Wu, F.-C. Chong, C.-I. Li, Wheeze detection using cepstral analysis in Gaussian mixture models, in: Proceedings of 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE, 2007, pp. 3168–3171.
[12] M. Bahoura, C. Pelletier, Respiratory sounds classification using cepstral analysis and Gaussian mixture models, in: Proceedings of 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vol. 1, 2004, pp. 9–12.
[13] P. Mayorga, C. Druzgalski, R. Morelos, O. Gonzalez, J. Vidales, Acoustics based assessment of respiratory diseases using GMM classification, in: Proceedings of 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE, 2010, pp. 6312–6316.
[14] S. Khan, N. Jawarkar, V. Ahmed, Cell phone based remote early detection of respiratory disorders for rural children using modified stethoscope, in: Proceedings of 2012 International Conference on Communication Systems and Network Technologies, 2012, pp. 936–940.
[15] C.-H. Chen, W.-T. Huang, T.-H. Tan, C.-C. Chang, Y.-J. Chang, Using K-nearest neighbor classification to diagnose abnormal lung sounds, Sensors 15 (6) (2015) 13132–13158.
[16] B. Sankur, Y. P. Kahya, E. Ç. Güler, T. Engin, Comparison of AR-based algorithms for respiratory sounds classification, Computers in Biology and Medicine 24 (1) (1994) 67–76.
[17] S. Alsmadi, Y. Kahya, Design of a DSP-based instrument for real-time classification of pulmonary sounds, Computers in Biology and Medicine 38 (1) (2008) 53–61.
[18] G.-C. Chang, Y.-F. Lai, Performance evaluation and enhancement of lung sound recognition system in two real noisy environments, Computer Methods and Programs in Biomedicine 97 (2) (2010) 141–150.
[19] H. Martinez-Hernandez, C. Aljama-Corrales, R. Gonzalez-Camarena, V. Charleston-Villalobos, G. Chi-Lem, Computerized classification of normal and abnormal lung sounds by multivariate linear autoregressive model, in: Proceedings of 27th Annual International Conference of the Engineering in Medicine and Biology Society, IEEE, 2006, pp. 5999–6002.
[20] S. Charleston-Villalobos, G. Martinez-Hernandez, R. Gonzalez-Camarena, G. Chi-Lem, J. Carrillo, T. Aljama-Corrales, Assessment of multichannel lung sounds parameterization for two-class classification in interstitial lung disease patients, Computers in Biology and Medicine 41 (7) (2011) 473–482.
[21] A. Cohen, D. Landsberg, Analysis and automatic classification of breath sounds, Biomedical Engineering, IEEE Transactions on BME-31 (9) (1984) 585–590.
[22] L. Rabiner, R. Schafer, Introduction to digital speech processing, Foundations and Trends in Signal Processing 1 (1) (2007) 1–194.
[23] A. Abbas, A. Fahim, An automated computerized auscultation and diagnostic system for pulmonary diseases, Journal of Medical Systems 34 (6) (2010) 1149–1155.
[24] R. Riella, P. Nohama, J. Maia, Method for automatic detection of wheezing in lung sounds, Brazilian Journal of Medical and Biological Research 42 (7) (2009) 674–684.
[25] S. Xie, F. Jin, S. Krishnan, F. Sattar, Signal feature extraction by multiscale PCA and its application to respiratory sound classification, Medical & Biological Engineering & Computing 50 (7) (2012) 759–768.
[26] F. Jin, S. Krishnan, F. Sattar, Adventitious sounds identification and extraction using temporal–spectral dominance-based features, Biomedical Engineering, IEEE Transactions on 58 (11) (2011) 3078–3087.
[27] S. Rietveld, M. Oud, E. Dooijes, Classification of asthmatic breath sounds: preliminary results of the classifying capacity of human examiners versus artificial neural networks, Computers and Biomedical Research 32 (5) (1999) 440–448.
[28] L. Waitman, K. Clarkson, J. Barwise, P. King, Representation and classification of breath sounds recorded in an intensive care setting using neural networks, Journal of Clinical Monitoring and Computing 16 (2) (2000) 95–105.
[29] M. Munakata, H. Ukita, I. Doi, Y. Ohtsuka, Y. Masaki, Y. Homma, Y. Kawakami, Spectral and waveform characteristics of fine and coarse crackles, Thorax 46 (9) (1991) 651–657.
[30] D. Childers, D. Skinner, R. Kemerait, The cepstrum: A guide to processing, Proceedings of the IEEE 65 (10) (1977) 1428–1443.
[31] A. Sovijärvi, L. Malmberg, G. Charbonneau, J. Vanderschoot, F. Dalmasso, C. Sacco, M. Rossi, J. Earis, Characteristics of breath sounds and adventitious respiratory sounds, European Respiratory Review 10 (77) (2000) 591–596.
[32] J. Xu, J. Cheng, Y. Wu, A cepstral method for analysis of acoustic transmission characteristics of respiratory system, Biomedical Engineering, IEEE Transactions on 45 (5) (1998) 660–664.
[33] H. Hermansky, Perceptual linear predictive (PLP) analysis of speech, The Journal of the Acoustical Society of America 87 (4) (1990) 1738–1752.
[34] M. Sahidullah, G. Saha, Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker recognition, Speech Communication 54 (4) (2012) 543–565.
[35] S. Chakroborty, A. Roy, G. Saha, Improved closed set text-independent speaker identification by combining MFCC with evidence from flipped filter banks, International Journal of Signal Processing 4 (2) (2007) 114–122.
[36] A. Mondal, P. Bhattacharya, G. Saha, Reduction of heart sound interference from lung sound signals using empirical mode decomposition technique, Journal of Medical Engineering & Technology 35 (6-7) (2011) 344–353.
[37] A. Mondal, P. Bhattacharya, G. Saha, Detection of lungs status using morphological complexities of respiratory sounds, The Scientific World Journal 2014.
[38] R. Palaniappan, K. Sundaraj, N. U. Ahamed, A. Arjunan, S. Sundaraj, Computer-based respiratory sound analysis: a systematic review, IETE Technical Review 30 (3) (2013) 248–256.
[39] S. Reichert, R. Gass, C. Brandt, E. Andrès, Analysis of respiratory sounds: state of the art, Clinical Medicine. Circulatory, Respiratory and Pulmonary Medicine 2 (2008) 45.
[40] R. Murphy, S. Holford, W. Knowler, Visual lung-sound characterization by time-expanded wave-form analysis, New England Journal of Medicine 296 (17) (1977) 968–971.
[41] H. Gong, Wheezing and asthma, in: H. Walker, W. Hall, J. Hurst (Eds.), Clinical Methods: The History, Physical, and Laboratory Examinations, Butterworths, Boston, 1990, Ch. 37, pp. 203–206.
[42] M. Masoli, D. Fabian, S. Holt, R. Beasley, The global burden of asthma: executive summary of the GINA Dissemination Committee report, Allergy 59 (5) (2004) 469–478.
[43] D. Mannino, COPD: epidemiology, prevalence, morbidity and mortality, and disease heterogeneity, Chest 121 (5 suppl) (2002) 121S–126S.
[44] P. Forgacs, The functional basis of pulmonary sounds, Chest 73 (3) (1978) 399–405.
[45] P. Piirilä, A. Sovijärvi, Crackles: recording, analysis and clinical significance, European Respiratory Journal 8 (12) (1995) 2139–2148.
[46] G. Epler, C. Carrington, E. Gaensler, Crackles (rales) in the interstitial pulmonary diseases, Chest 73 (3) (1978) 333–339.
[47] A. Davie, C. Francis, L. Caruana, G. Sutherland, J. McMurray, Assessing diagnosis in heart failure: which features are any use?, QJM 90 (5) (1997) 335–339.
[48] J. Blacher, B. Pannier, A. P. Guerin, S. J. Marchais, M. E. Safar, G. M. London, Carotid arterial stiffness as a predictor of cardiovascular and all-cause mortality in end-stage renal disease, Hypertension 32 (3) (1998) 570–574.
[49] A. Bohadana, G. Izbicki, S. Kraman, Fundamentals of lung auscultation, New England Journal of Medicine 370 (8) (2014) 744–751.
[50] J. Wolf, Efficient acoustic parameters for speaker recognition, The Journal of the Acoustical Society of America 51 (6B) (1972) 2044–2056.
[51] S. Nicholson, B. Milner, S. Cox, Evaluating feature set performance using the F-ratio and J-measures, in: Proceedings of 5th European Conference on Speech Communication and Technology, Vol. 97, 1997, pp. 413–416.
[52] L. Rabiner, B. Juang, Fundamentals of Speech Recognition, 1st Edition, Prentice Hall, 1993.
[53] B. Atal, Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification, The Journal of the Acoustical Society of America 55 (6) (1974) 1304–1312.
[54] M. Sahidullah, S. Chakroborty, G. Saha, On the use of perceptual line spectral pairs frequencies and higher-order residual moments for speaker identification, International Journal of Biometrics 2 (4) (2010) 358–378.
[55] B. Gold, N. Morgan, D. Ellis, Speech and Audio Signal Processing: Processing and Perception of Speech and Music, John Wiley & Sons, 2011.
[56] E. Zwicker, Subdivision of the audible frequency range into critical bands (Frequenzgruppen), The Journal of the Acoustical Society of America 33 (2) (1961) 248–248.
[57] S. Davis, P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, Acoustics, Speech and Signal Processing, IEEE Transactions on 28 (4) (1980) 357–366.
[58] O. Douglas, Speech Communications: Human and Machine, 2nd Edition, Universities Press, 2009.
[59] D. Reynolds, R. Rose, Robust text-independent speaker identification using Gaussian mixture speaker models, Speech and Audio Processing, IEEE Transactions on 3 (1) (1995) 72–83.
[60] C. Jarque, A. Bera, Efficient tests for normality, homoscedasticity and serial independence of regression residuals, Economics Letters 6 (3) (1980) 255–259.
[61] J. Chambers, W. Cleveland, P. Tukey, B. Kleiner, Graphical Methods for Data Analysis, 1st Edition, Duxbury Press, 1983.
[62] R. Duda, P. Hart, D. Stork, Pattern Classification, 2nd Edition, John Wiley & Sons Inc., 2006, Ch. 3, pp. 111–112.
[63] S. Haykin, Neural Networks and Learning Machines, 3rd Edition, Prentice Hall, 2008.
[64] S. Ari, G. Saha, In search of an optimization technique for artificial neural network to classify abnormal heart sounds, Applied Soft Computing 9 (1) (2009) 330–340.
[65] M. Riedmiller, H. Braun, A direct adaptive method for faster backpropagation learning: The RPROP algorithm, in: IEEE International Conference on Neural Networks, IEEE, 1993, pp. 586–591.
[66] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, 2013.
[67] T. Fawcett, An introduction to ROC analysis, Pattern Recognition Letters 27 (8) (2006) 861–874.
[68] C. M. Bishop, Training with noise is equivalent to Tikhonov regularization, Neural Computation 7 (1) (1995) 108–116.