Automatic detection, segmentation and classification of snore related signals from overnight audio recording

K. Qian1, Z. Y. Xu1,*, H. J. Xu2,*, Y. Q. Wu1, Z. Zhao1

1 School of Electronic and Optical Engineering, Nanjing University of Science and Technology, No. 200, Xiaolingwei Street, Nanjing, People's Republic of China
2 Department of Otolaryngology, Beijing Hospital, No. 1, Dahua Road, Beijing, People's Republic of China

E-mail: [email protected]; [email protected]

Abstract: Snore related signals (SRS) have been found to carry important information about the snore source and obstruction site in the upper airway of an Obstructive Sleep Apnea/Hypopnea Syndrome (OSAHS) patient. An overnight audio recording of an individual subject is the preliminary and essential material for further study and diagnosis. Automatic detection, segmentation and classification of SRS from overnight audio recordings are significant for establishing a personal health database and for researching the area on a large scale. In this study, we focus on how to implement such an intelligent method by combining acoustic signal processing with machine learning techniques. We propose a systematic solution that includes SRS event detection, classifier training, and automatic segmentation and classification. An overnight audio recording of a severe OSAHS patient is taken as an example to demonstrate the feasibility of our method. Both experimental data testing and subjective testing by 25 volunteers (17 males and 8 females) demonstrated that our method can be effective for automatic detection, segmentation and classification of SRS from original audio recordings.

1 Introduction

Snoring is a highly prevalent disorder in the community: an early investigation of 5,713 people [1] found that 19% suffered from snoring (24.1% of the male and 13.8% of the female population). Loud snoring is also a typical symptom of Obstructive Sleep Apnea/Hypopnea Syndrome (OSAHS), a chronic and frequently occurring sleep disease [2]. OSAHS is characterized by posterior pharyngeal occlusion for at least 10 s with apnea/hypopnea and attendant arterial oxyhemoglobin desaturation [2]. If left untreated, this life-threatening sleep disorder may induce high blood pressure, coronary heart disease, pulmonary heart failure and even nocturnal death [3].

Recent studies have found that snore related signals (SRS) carry important information about the snore source and obstruction site in the upper airway (UA) of OSAHS patients [4]. This significant discovery has motivated several researchers to develop acoustic-based approaches that could provide inexpensive, convenient and noninvasive monitoring and screening apparatuses for OSAHS [4]. Portable devices based on SRS analysis for monitoring the health status of subjects could help overcome the considerably low diagnosis rate of OSAHS: approximately 93% of affected women and 82% of affected men in the United States were undiagnosed and untreated [5]. In addition, real-time surveillance of personal sleep quality in a smart home could be implemented with these acoustic-based techniques [6]. In medical practice, doctors increasingly demand to locate the obstruction site in the UA rather than only to screen for OSAHS [7].

An overnight audio recording of an individual subject is the preliminary material for further study and diagnosis. Duckitt et al. proposed a method for automatic detection and segmentation of snoring from ambient acoustic data by combining spectral features and Hidden Markov Models (HMM) [8]. In their study, they achieved 82%-89% accuracy in identifying snores. However, their study did not address classifying different SRS, which might be useful for doctors to find anatomical variations of the UA of a subject during a whole night. Similarly, Liao and Su studied how to separate episodes from all-night audio recordings into snoring and non-snoring sounds with a probability-distribution-based classifier [9]. Recently, Mesquita et al. studied the time interval between snores in subjects with OSAHS from all-night audio recordings [10]. They found that the time interval might be a good feature for distinguishing normal and abnormal snores, which indicates the importance of analyzing overnight audio recordings. However, the studies above did not address automatic segmentation and classification of different SRS from overnight audio recordings. Doctors and specialists in the OSAHS field could acquire convenient and efficient methods for establishing a health database for individual subjects if this task could be implemented intelligently.

In this paper, we propose an automatic method for detecting, segmenting and classifying different SRS from overnight audio recordings. This intelligent approach combines acoustic signal processing with machine learning techniques, which would be helpful for doctors and engineers developing medical diagnostic devices for the smart home. A kernel function, which originated in the image processing field, was utilized to set a suitable energy threshold for SRS event detection and achieved good performance in this study.
Several frequently used acoustic features were adopted and compared with a K-NN model to classify different SRS. A selection factor (SF) was designed as a criterion to find the optimal classifier for the subsequent automatic segmentation and classification of the audio recording. In the experiments, an overnight audio recording of an OSAHS patient provided by Beijing Hospital, People's Republic of China, was taken as the material. This patient had severe OSAHS and generated the most complex groups of SRS during a night compared with the other patients in the hospital's database; it is therefore reasonable to use this recording to demonstrate the feasibility of the proposed method. Finally, experimental data testing and a subjective test involving 25 volunteers (17 males and 8 females) were held to assess the quality of the classified audio episodes. Our research covers the following areas and offers the corresponding solutions:

1. We propose a systematic solution for detecting, segmenting and classifying SRS data from original audio recordings;
2. We extract several frequently used acoustic features to train a K-NN based classifier and compare the contribution of each kind of feature to the recognition rate;
3. The method and software framework could be applied as a practical tool for doctors and specialists to establish a segmented and classified SRS database for further biomedical research and personal health care uses.

2 Theory and method design

In this section, we present the design of the software method for automatic detection, segmentation and classification of overnight audio recordings. The hardware system consists of a single-channel low-noise microphone (Studio-Project® B3, USA) positioned roughly 0.3 m [7] above the patient's head and a data acquisition card (M-Audio® Fast Track Ultra 8R, USA) configured at a sampling frequency of 16 kHz with 16-bit resolution. The sleep laboratory with this data acquisition system is shown in Fig. 1, and the experiments were approved by the ethics committee of Beijing Hospital, People's Republic of China.

Fig. 1 Sleep laboratory environment and overnight audio recording data acquisition system

2.1 Main framework

The main framework of the method is presented in Fig. 2, and combines signal processing and machine learning techniques.

Fig. 2 Main framework of automatic detection, segmentation and classification of SRS

In this process, the overnight audio recording is read frame by frame. SRS events are detected, and subsequently a trained classifier operating on extracted acoustic features labels the SRS frames and assigns them to different groups. To eliminate random noise and misclassifications, we set Td as a condition verification time: runs of SRS frames of the same class lasting more than Td (here set to 180 ms [7]) are saved as the final classified audio episodes. Within this framework, SRS event detection and the trained classifier are the essential parts, which are described in detail in Sections 2.2 and 2.3.
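To make the flow concrete, here is a minimal Python sketch of this main loop. The helpers detect_srs_event and classify_frame are hypothetical placeholders for the detector of Section 2.2 and the classifier of Section 2.3, and the per-frame length is an assumed value (the paper fixes only Td = 180 ms):

```python
# Minimal sketch of the main framework: read frames, detect SRS events,
# classify them, and keep only runs of one class lasting more than Td.
# detect_srs_event() and classify_frame() are hypothetical placeholders.

FRAME_MS = 30          # assumed frame length (not fixed per-frame in the text)
TD_MS = 180            # condition verification time Td from the paper

def segment_recording(frames, detect_srs_event, classify_frame):
    """Return (class_label, start_frame, end_frame) episodes longer than Td."""
    episodes, run_label, run_start = [], None, 0
    min_frames = TD_MS // FRAME_MS
    for i, frame in enumerate(frames):
        label = classify_frame(frame) if detect_srs_event(frame) else None
        if label != run_label:                      # a class change ends the run
            if run_label is not None and i - run_start >= min_frames:
                episodes.append((run_label, run_start, i))
            run_label, run_start = label, i
    if run_label is not None and len(frames) - run_start >= min_frames:
        episodes.append((run_label, run_start, len(frames)))
    return episodes
```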

2.2 SRS events detection

The original audio recording usually contains large amounts of silent episodes (e.g., normal breathing, background noise, interference, etc.), which need to be eliminated by voice activity detection (VAD). In speech signal processing, there are numerous techniques for VAD, such as the zero-crossing rate, short-time energy thresholds, and machine-learning-based approaches [11]. However, these methods are restricted by environmental conditions and the specific acoustic signals involved [11]. Fortunately, analysis of SRS is an offline task for doctors and researchers, which makes it possible to acquire enough prior knowledge for further processing. SRS event detection in our method is performed using the energy en calculated from running time frames:

$$e_n = \sum_{k=0}^{L_f-1} S^2(k) \qquad (1)$$

where en represents the energy of the nth frame, S(k) is the quantized amplitude of the audio recording, and Lf is the length of each frame. Fig. 3 describes SRS event detection in a finite audio recording episode. In this figure, ne represents the frame index of the beginning of the currently read audio episode, Eth is the experimental energy threshold, j is a pointer in the function, and d is the length of the continuous time duration (3 frames in this case). Frames that continuously exceed Eth are labeled as activity frames, namely the SRS events.
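A minimal sketch of this detector in Python, assuming the audio samples are available as a NumPy array; equation (1) gives the frame energy, and d consecutive frames above Eth mark an event:

```python
import numpy as np

def frame_energies(samples, frame_len):
    """Energy e_n of each non-overlapping frame, as in equation (1)."""
    n_frames = len(samples) // frame_len
    frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sum(frames.astype(np.float64) ** 2, axis=1)

def detect_srs_events(energies, e_th, d=3):
    """Mark frames as SRS events when at least d consecutive frames exceed E_th."""
    active = energies > e_th
    events, run_start = np.zeros_like(active), None
    for i, a in enumerate(np.append(active, False)):   # sentinel closes the last run
        if a and run_start is None:
            run_start = i
        elif not a and run_start is not None:
            if i - run_start >= d:                     # keep only runs of length >= d
                events[run_start:i] = True
            run_start = None
    return events
```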

Fig. 3 Process of SRS events detection

Setting the experimental energy threshold Eth: The value of Eth is essential in the SRS event detection algorithm. Enough samples from the training database give us the statistical energy distribution of both SRS and non-SRS frames. Motivated by the good performance of kernel regression in the image processing field [12], we adopted a kernel function to estimate the probability density of the energy for both SRS and non-SRS data:

$$\hat{f}_h(e) = \frac{1}{N}\sum_{n=1}^{N} K_h(e - e_n) = \frac{1}{Nh}\sum_{n=1}^{N} K\left(\frac{e - e_n}{h}\right) \qquad (2)$$

where K(·) is the kernel, a symmetric but not necessarily positive function that integrates to one (in this case K(·) is the normal function Φ(·)), h > 0 is a smoothing parameter called the bandwidth, and en (n = 1, 2, 3, …, N) are the energy samples drawn from some distribution with an unknown density f. Compared with parametric models such as the Rayleigh and Gamma distributions, the kernel distribution fully utilizes the prior knowledge in the data [12]. Fig. 4 illustrates the results of fitting the energy distributions of SRS and non-SRS data with kernel, Rayleigh and Gamma distributions. As a non-parametric model, the kernel distribution achieves the best fit owing to its full use of the dataset. All the experimental computing and data processing in this paper was done with MATLAB R2012b by MathWorks®, USA.
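Although the computations in the paper were done in MATLAB, equation (2) with a normal kernel corresponds to SciPy's gaussian_kde; a sketch with placeholder energy samples:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Placeholder energy samples for SRS and non-SRS frames; in practice these
# come from manually labeled frames of the training database.
srs_energies = np.random.gamma(shape=5.0, scale=2.0, size=2000)
non_srs_energies = np.random.gamma(shape=1.5, scale=0.5, size=2000)

# Gaussian-kernel density estimates, i.e. equation (2) with K = Phi.
f_srs = gaussian_kde(srs_energies)
f_non_srs = gaussian_kde(non_srs_energies)

e_grid = np.linspace(0, 30, 1000)
print(f_srs(e_grid)[:5], f_non_srs(e_grid)[:5])  # estimated densities on a grid
```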

Fig. 4 Energy distribution fitting by kernel, Gamma and Rayleigh functions: (a) SRS data; (b) non-SRS data

We defined the false alarm rate Pf and the missing alarm rate Pm, respectively, as:

$$P_f = P(\mathrm{SRS} \mid \mathrm{non\text{-}SRS}) = \int_{E_{th}}^{\infty} \hat{f}_h(e_n \mid \mathrm{non\text{-}SRS})\, de_n \qquad (3)$$

$$P_m = P(\mathrm{non\text{-}SRS} \mid \mathrm{SRS}) = \int_{0}^{E_{th}} \hat{f}_h(e_n \mid \mathrm{SRS})\, de_n \qquad (4)$$

For an overnight audio recording, there are sufficient SRS events for further study; therefore the false alarm rate Pf has a greater influence than the missing alarm rate Pm on the subsequent steps. Our task is to establish a classified SRS database for further study, so the processing is offline. Offline processing provides sufficient data for us to set a suitable energy threshold Eth based on Pf (Eth is set so that Pf < 0.0001), as Fig. 5 shows.
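A sketch of this threshold selection under the same assumptions: Eth is raised until the upper-tail mass of the non-SRS density (equation (3)) falls below 10^-4; integrate_box_1d is gaussian_kde's built-in definite integral:

```python
import numpy as np
# f_non_srs is the gaussian_kde fit of the non-SRS energies from the previous sketch.

def select_energy_threshold(f_non_srs, p_f_max=1e-4, e_max=1000.0, step=0.1):
    """Smallest E_th whose non-SRS tail mass (false alarm rate P_f) < p_f_max."""
    e_th = 0.0
    while e_th < e_max:
        # P_f = integral of the non-SRS density from E_th to infinity, eq. (3).
        p_f = f_non_srs.integrate_box_1d(e_th, np.inf)
        if p_f < p_f_max:
            return e_th
        e_th += step
    raise ValueError("no threshold below e_max achieves the requested P_f")
```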

Fig. 5 Eth selection by kernel-function-estimated probability density: (a) Eth setting; (b) zoomed-in view of the Eth setting

It is reasonable to assume that the data acquisition system in the sleep laboratory operates in a quiet environment, so the background noise can be expected to maintain a relatively stable level during the whole night; manually selected non-SRS data can therefore represent the whole non-SRS distribution from a statistical point of view. To assess the performance of the SRS event detector, we randomly selected a ten-minute audio file and evaluated its false alarm rate Pf and missing alarm rate Pm, as shown in Table 1. The theoretical values were calculated numerically from the statistical model, and the experimental values are the results of automatic machine detection compared with the specialist's manual labeling. In the ten-minute audio file we selected, there were 129 episodes of SRS events labeled by the specialist, and 126 of them were detected by the SRS event detector. Both Pf and Pm achieved better performance in the experiments than the theoretical values.

Table 1 The levels of Pf and Pm for the SRS event detector

                       Pf            Pm
Theoretical value      0.0077526%    17.34%
Experimental value     0%            3/129 = 2.33%

In practice, the detection results also depend on the length of d mentioned in Fig. 3: Pf decreases as d grows, while Pm shows the contrary trend. For the subsequent classification, Pf is more harmful than Pm, because false alarm events bring redundant information to the trained classifier and would impair the quality of the final segmented and classified audio files (through misclassification or interference).

Fig. 6 SRS events detection results of an episode

Fig. 6 gives the SRS event detection results of our algorithm for an episode. The detection process is based on the widely used Neyman-Pearson rule in signal detection [11]. We can see that even though some SRS might be missed, false detections seldom occur. The start and end points of an SRS event are labeled clearly by the proposed method.

2.3 Training the classifier

Training the classifier needs some manually labeled SRS as the training database, which is established from only a small proportion of the overnight audio recording (below 5%). The classifier training process is shown in Fig. 7.

Fig. 7 Process of training a classifier

2.3.1 Features extraction

As in speech signal processing, several researchers have studied both temporal and spectral features of SRS [5]. Significant findings indicate that some acoustic features can establish a relationship between characteristic properties and anatomical variations in the UA of OSAHS patients [5], [7]. Multi-feature extraction for building machine learning models has been demonstrated to be effective in classifying OSAHS and non-OSAHS snores [13]. Abeyratne et al. performed an in-depth comparison of the diagnostic performance on OSAHS of different classes of features using logistic regression techniques [14]. In this paper, we adopted five classes of frequently used features extracted from SRS, namely frequency, energy, formant, Mel-scale frequency cepstral coefficient (MFCC) and Empirical Mode Decomposition (EMD) features.

Frequency features extraction: Some basic frequency features were found useful in distinguishing groups of SRS in our previous study [7]. We defined fcenter, fpeak and fmean as follows:

$$\sum_{f_i=0}^{f_{center}} X_{f_i} = \sum_{f_i=f_{center}}^{f_c} X_{f_i} \qquad (4)$$

$$X_{f_{peak}} = \max\left\{X_{f_i},\ f_i = 0, \ldots, f_c\right\} \qquad (5)$$

$$f_{mean} = \frac{\sum_{f_i=0}^{f_c} f_i \cdot X_{f_i}}{\sum_{f_i=0}^{f_c} X_{f_i}} \qquad (6)$$

where Xfi is the absolute amplitude of the SRS spectrum at the frequency fi Hz calculated by the Fast Fourier Transform (FFT), and fc is the cut-off frequency of the SRS spectrum, which is 8 kHz in our study. In our previous experiments, fmean was a good indicator of the structure of the SRS spectrum, which is helpful for finding the anatomical variations of the UA. To obtain detailed information about the spectrum distribution of each sub-band of the SRS, we modified equation (6) as:

$$f_{mean}(j) = \frac{\sum_{f_i=1000(j-1)}^{1000j} f_i \cdot X_{f_i}}{\sum_{f_i=1000(j-1)}^{1000j} X_{f_i}}, \quad j = 1, 2, \ldots, 8 \qquad (7)$$

yielding the 1000-Hz sub-band fmean features of the SRS.
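A NumPy sketch of these frequency features, assuming a single SRS episode sampled at 16 kHz (so fc = 8 kHz) and the magnitude spectrum used in equations (4)-(7):

```python
import numpy as np

def frequency_features(x, fs=16000):
    """f_center, f_peak, f_mean and 1000-Hz sub-band f_mean, eqs. (4)-(7)."""
    spec = np.abs(np.fft.rfft(x))                 # X_fi: magnitude spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    # f_center: frequency splitting the spectral amplitude into equal halves (4)
    cumsum = np.cumsum(spec)
    f_center = freqs[np.searchsorted(cumsum, cumsum[-1] / 2.0)]
    f_peak = freqs[np.argmax(spec)]               # frequency of maximum amplitude (5)
    f_mean = np.sum(freqs * spec) / np.sum(spec)  # amplitude-weighted mean (6)
    # 1000-Hz sub-band f_mean for j = 1..8 (7)
    sub_means = []
    for j in range(1, 9):
        band = (freqs >= 1000 * (j - 1)) & (freqs < 1000 * j)
        sub_means.append(np.sum(freqs[band] * spec[band]) / np.sum(spec[band]))
    return f_center, f_peak, f_mean, sub_means
```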

Energy features extraction: The power ratio at the frequency of 800 Hz is capable of classifying SRS generated by different obstruction sites in the UA [7]. We defined this feature as:

$$PR_{800} = \lg\left(\frac{\sum_{f_i=0}^{800} X_{f_i}^2}{\sum_{f_i=800}^{f_c} X_{f_i}^2}\right) \qquad (8)$$

Like PR800, the 1000-Hz sub-band energy distributions [15] are also good indicators of the spectrum distribution of the SRS:

$$BER(j) = \frac{\sum_{f_i=1000(j-1)}^{1000j} X_{f_i}^2}{\sum_{f_i=0}^{f_c} X_{f_i}^2}, \quad j = 1, 2, \ldots, 8 \qquad (9)$$
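A corresponding sketch for the energy features; the orientation of the PR800 ratio follows our reconstruction of equation (8) and should be checked against [7]:

```python
import numpy as np

def energy_features(x, fs=16000):
    """PR_800 (eq. (8)) and 1000-Hz sub-band energy ratios BER(j) (eq. (9))."""
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = spec ** 2                          # X_fi^2: power spectrum
    pr800 = np.log10(np.sum(power[freqs <= 800]) / np.sum(power[freqs > 800]))
    total = np.sum(power)
    ber = [np.sum(power[(freqs >= 1000 * (j - 1)) & (freqs < 1000 * j)]) / total
           for j in range(1, 9)]
    return pr800, ber
```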

Formant features extraction: The UA can be regarded as an acoustic filter in generating SRS: it attenuates the transfer of sound energy at certain frequencies while allowing maximal energy through at the resonance frequencies (formants) [16]-[17]. We utilized linear predictive coding (LPC) to extract the formant features of SRS, since the biophysics of speech production and the generation of snoring sounds share many similarities [16]. In speech signal processing, LPC establishes an all-pole model:

$$H(z) = \frac{1}{1 - \sum_{k=1}^{p} \alpha_k z^{-k}} \qquad (10)$$

where αk (k = 1, 2, …, p) is a set of predictor coefficients calculated directly from the SRS via the Yule-Walker equations and solved by the Levinson-Durbin algorithm [11]. In our study, an 18th-order LPC analysis was performed and the angles of the complex roots with positive imaginary parts were calculated, which represent the formant frequencies. We extracted the first three formant frequencies, namely F1, F2 and F3, as formant features for further machine learning.

MFCC features extraction: Mel-scale frequency cepstral coefficients (MFCC) have been demonstrated to be effective in automatic speech recognition systems [11]. Following the previous work of Liao and Lin [18], we adopted MFCC parameters for the analysis of different SRS. For each frame of SRS, the Mel-scale mapping is performed using the relation [18]:

$$\mathrm{Mel}(f) = 1125\ln\left(1 + \frac{f}{700}\right) \qquad (11)$$

Thirteen Mel cepstral coefficients (MFCCi, i = 1, …, 13) were estimated using triangular Mel filter banks.
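A self-contained sketch of the formant extraction: the Levinson-Durbin recursion solves the Yule-Walker equations for the 18th-order LPC polynomial, and the angles of its roots give candidate formants; the Hamming window and the simple autocorrelation estimate are our assumptions:

```python
import numpy as np

def lpc_coefficients(x, order=18):
    """Levinson-Durbin solution of the Yule-Walker equations for LPC."""
    x = x * np.hamming(len(x))                     # assumed analysis window
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a, err = np.zeros(order + 1), r[0]
    a[0] = 1.0
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err   # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1][:i]      # update predictor
        err *= (1.0 - k * k)
    return a                                       # a = [1, -alpha_1, ..., -alpha_p]

def formants(x, fs=16000, order=18, n_formants=3):
    """First formant frequencies from the angles of the LPC roots, eq. (10)."""
    roots = np.roots(lpc_coefficients(x, order))
    roots = roots[np.imag(roots) > 0]              # keep roots with positive angle
    freqs = np.sort(np.angle(roots) * fs / (2 * np.pi))
    return freqs[:n_formants]                      # F1, F2, F3
```

For the MFCC features, an off-the-shelf routine such as librosa.feature.mfcc(y=x, sr=fs, n_mfcc=13) can stand in for the triangular Mel filter bank estimation described above.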

EMD features extraction: Snoring sounds are typical non-stationary signals [5], and empirical mode decomposition (EMD) has superior performance in processing non-stationary signals owing to its adaptivity in choosing basis functions [19]. Motivated by the ability of EMD-based features to classify different sound signals [20], we defined the SRS EMD features from the intrinsic mode functions (IMFs):

$$E_i = \int_{-\infty}^{+\infty} \left|c_i(t)\right|^2 dt, \quad i = 1, 2, \ldots, n \qquad (12)$$

where c1(t), c2(t), …, cn(t) are the n IMFs of s(t) and contain different frequency components. The sum of the energies of the n IMFs, E, should be approximately equal to the total energy of s(t) when the residual rn is ignored. We obtain a normalized EMD energy feature vector:

$$T = \left[E_1/E,\ E_2/E,\ \ldots,\ E_n/E\right] \qquad (13)$$

which reflects the energy distribution over the different frequency components of s(t). In addition, the EMD energy entropy is defined as [20]:

$$H_E = -\sum_{i=1}^{n} p_i \log_2 p_i \qquad (14)$$

where pi = Ei/E is the proportion of the energy of ci(t) in the whole signal energy. The EMD-based features above can indicate the intrinsic energy distribution of the original SRS.
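Once IMFs are available from any EMD implementation (for example PyEMD's EMD().emd(s); the paper does not name a library), equations (12)-(14) reduce to a few lines:

```python
import numpy as np

def emd_energy_features(imfs):
    """Normalized IMF energies T (eq. (13)) and EMD energy entropy H_E (eq. (14)).

    `imfs` is an (n, len(s)) array of intrinsic mode functions obtained from any
    EMD implementation; the residual is excluded, as in the text.
    """
    energies = np.sum(imfs ** 2, axis=1)       # discrete version of E_i, eq. (12)
    total = np.sum(energies)                   # E: total energy without the residual
    p = energies / total                       # p_i = E_i / E, the vector T
    h_e = -np.sum(p * np.log2(p))              # EMD energy entropy, eq. (14)
    return p, h_e
```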

2.3.2 Features selection

Selecting a subspace of effective features from the original feature space is significant for improving the performance of a machine learning algorithm and for saving computation time and resources [21]. Relief algorithms are feature subset selection methods that are applied in a preprocessing step before the model is learned [21]. The basic Relief algorithm handles only two-class problems, while its extension ReliefF copes with multi-class problems. The Relief/ReliefF algorithm is based on the K-nearest neighbours (K-NN) model [22], and its main idea is as follows:

1. set all weights W[A] := 0.0;
2. for i := 1 to m do begin
3.   randomly select an instance Ri;
4.   find K nearest hits Hj;
5.   for each class C ≠ class(Ri) do
6.     from class C find K nearest misses Mj(C);
7.   for A := 1 to n do
8.     $W[A] := W[A] - \sum_{j=1}^{K} \mathrm{diff}(A, R_i, H_j) / (m \cdot K) + \sum_{C \neq \mathrm{class}(R_i)} \left[\frac{P(C)}{1 - P(\mathrm{class}(R_i))} \sum_{j=1}^{K} \mathrm{diff}(A, R_i, M_j(C))\right] / (m \cdot K);$
9. end;

where class(Ri) is the class index of instance Ri, diff(A, I1, I2) is the distance between instances I1 and I2 on feature A of the n-dimensional feature space, and P(C) is the prior probability of class C estimated from the training set. From the weights W[A] generated by the algorithm above, we obtain the contribution rate of each subset of the original feature space. Features with high contribution rates are selected, and the dimension of the feature space is thereby reduced.
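A compact Python sketch of steps 1-9; using the per-feature absolute difference normalized by the feature range for diff is our assumption, one common choice for numeric attributes:

```python
import numpy as np

def relieff(X, y, n_iter=100, k=10, rng=None):
    """ReliefF feature weights W[A] following steps 1-9 above.

    X: (N, n) feature matrix; y: (N,) integer class labels.
    diff() is taken as the per-feature absolute difference, scaled by feature range.
    """
    rng = np.random.default_rng(rng)
    N, n = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                          # avoid division by zero
    classes, counts = np.unique(y, return_counts=True)
    priors = dict(zip(classes, counts / N))
    W = np.zeros(n)
    for _ in range(n_iter):                        # m random instances R_i
        i = rng.integers(N)
        ri, ci = X[i], y[i]
        dist = np.sum(np.abs(X - ri) / span, axis=1)
        dist[i] = np.inf                           # exclude the instance itself
        same = np.where(y == ci)[0]
        hits = same[np.argsort(dist[same])][:k]    # K nearest hits H_j
        W -= np.sum(np.abs(X[hits] - ri) / span, axis=0) / (n_iter * k)
        for c in classes:
            if c == ci:
                continue
            other = np.where(y == c)[0]
            misses = other[np.argsort(dist[other])][:k]   # K nearest misses M_j(C)
            weight = priors[c] / (1.0 - priors[ci])
            W += weight * np.sum(np.abs(X[misses] - ri) / span, axis=0) / (n_iter * k)
    return W
```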

2.3.3 Classifier modification

The K-NN model was adopted for its good performance in pattern recognition [22] and in our previous work; in our study, K was set to 10 after experiments. k-fold validation is an effective technique for tuning the classifier to make the utmost use of the training database [22]: the training database is equally separated into k groups of samples, and in each step of k-fold validation the kth group is selected as testing data while the other k-1 groups are regarded as training data. Considering the preceding feature selection method, we defined a selection factor (SF) to evaluate the performance of a trained classifier:

$$SF = \frac{ACC_{max} - \alpha}{n} \times 100 \qquad (15)$$

where ACCmax is the maximum recognition rate achieved by the classifier in k-fold validation, α is the acceptable recognition rate (in this case k is set to 10 and α is 80%), and n is the dimension of the feature space. The optimal 'marginal utility' in training a classifier is obtained by maximizing SF.
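A sketch of this model selection with scikit-learn, used here as a stand-in for the authors' MATLAB workflow: a 10-NN classifier is scored by 10-fold validation for each candidate feature subset, and the SF of equation (15) picks the winner:

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def selection_factor(X, y, alpha=0.80, k_folds=10):
    """SF of eq. (15) for one feature subset, via 10-fold cross-validation."""
    clf = KNeighborsClassifier(n_neighbors=10)
    accs = cross_val_score(clf, X, y, cv=k_folds)   # per-fold recognition rates
    acc_max = accs.max()                            # ACC_max over the k folds
    return (acc_max - alpha) / X.shape[1] * 100.0   # penalized by dimension n

# Example: compare candidate subsets (e.g. ReliefF-ranked MFCC columns) and
# keep the one that maximizes SF:
# best = max(candidate_subsets, key=lambda cols: selection_factor(X[:, cols], y))
```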

3 Experimental results and discussions

3.1 Materials

An overnight audio recording of an OSAHS patient was provided by the Department of Otolaryngology, Beijing Hospital, Beijing, People's Republic of China. This patient had severe OSAHS and generated the most complex groups of SRS during a night compared with the other patients in the hospital's database. We found four dominant classes of SRS in this overnight audio recording, and OSAHS specialists also indicated that the four classes of SRS are clearly distinct from each other from a generation-mechanism point of view. Fig. 8 shows the waveforms and spectrograms of the four different SRS. Our objective is to demonstrate the feasibility of the proposed intelligent method rather than to reveal a medical discovery. From the waveforms and spectrograms, we can see that class A SRS has the largest amplitude while class D has the smallest. Classes B and C have relatively regular waveforms compared with A and D. The spectrogram of class D SRS resembles a modulated signal and is the most distinctive among the four. As confirmed by the specialist, among the four basic classes of SRS data, A, B and C could be snore signals while D could be non-snore signals. Since all four basic classes within a whole snoring cycle could be useful for doctors to study, we call them snore related signals (SRS). In future work, we will investigate the anatomical meaning of the different SRS and study the transition relationships between the classes.

Fig. 8 Waveforms and spectrograms of four classes of SRS: (a) class A; (b) class B; (c) class C; (d) class D

3.2 Classifier training results

We selected a 30-minute audio file from the overnight (8-hour) audio recording as the training database, and the remaining episodes were regarded as the testing database. Six groups of acoustic features (frequency-based, energy-based, formant-based, MFCC-based, EMD-based and the overall feature set) were extracted to train a 10-NN classifier, and the maximum recognition rate of 10-fold validation with each group of features was calculated. Fig. 9 shows the recognition rates for the different features and for different feature contribution rates calculated by the ReliefF algorithm (100% means no reduction of dimension). Fig. 10 gives the SF values of each classifier established in Fig. 9; we found that MFCC with a contribution rate of 80% gave the best classifier. In this model, the MFCC-based features were reduced to a 3-dimensional space, which is much more efficient for real-time processing than the original 13-dimensional MFCC-based classifier.

Fig. 9 Performance on recognition by different features and contribution rates

Fig. 10 SF values of different features and contribution rates

3.3 Detection, segmentation and classification results

In our study, we proposed an automatic method for the detection, segmentation and classification of different SRS data. Note that in this approach, the three steps of detection, segmentation and classification are tightly combined. We embedded the SRS event detector (Section 2.2) and the trained classifier (Section 3.2) into the main process of our software framework, so that this intelligent system can automatically detect, segment and classify different SRS from the original audio recording. Fig. 11 gives the results of automatic detection, segmentation and classification of different SRS from an audio recording file. Although some episodes are missing in the segmented and classified audio file, the result is sufficient for doctors and specialists to collect the SRS data of an individual OSAHS patient. The proposed method thus performs well in automatically detecting, segmenting and classifying SRS from the original audio file.

Fig. 11 Results of automatic detection, segmentation and classification of different SRS from an audio recording file

3.4 Subjective testing of classified audio episodes

A ten-minute audio recording from the testing database was used in a subjective test to evaluate the quality of the classified audio episodes produced by our method. We adopted the mean opinion score (MOS) method [23], where similarity is the proportion of classified audio episodes that are identical to the manually selected standard audio episode of one class. Table 2 shows the MOS standard we utilized.

Table 2 The standard of the MOS method we utilized

Rating    Similarity     Quality
5         above 80%      Excellent
4         60%-80%        Good
3         40%-60%        Fair
2         20%-40%        Poor
1         below 20%      Bad

There were 25 volunteers (17 males and 8 females) taking part in the MOS-based subjective testing. They assessed the classified audio episodes by observing and listening; the evaluation report is listed in Table 3. The trained classifier recognized the SRS of class B best, which achieved the highest score in this subjective testing. Female volunteers seemed to be more cautious than males in marking the classified audio episodes. Overall, all of the classified audio episodes had Good to Excellent quality with the proposed intelligent method. In future work, we will utilize more medical prior knowledge to train the classifier, which could be useful for establishing personal health databases for individual subjects.

Table 3 Report of MOS-based subjective testing

                   A (snore)   B (snore)   C (snore)   D (non-snore)
Male     Mean      4.61        5.00        4.61        4.67
         Std       0.50        0.00        0.50        0.49
Female   Mean      4.43        4.86        4.29        4.57
         Std       0.79        0.38        0.76        0.53
Overall  Mean      4.56        4.96        4.52        4.64
         Std       0.58        0.20        0.59        0.49

Compared with the work of other researchers [8]-[9], we proposed a software framework that combines the segmentation task with classification. Duckitt et al. studied how to utilize an HMM-based model to classify different acoustic signals from the audio file; however, their research did not address classifying different SRS data, which might be more useful for doctors undertaking further in-depth studies of the anatomical structure variations of OSAHS patients. Liao and Su proposed a technique to separate apnea and normal snore signals, but they did not address how to automatically establish an SRS database for individual OSAHS patients. Our method is demonstrated to be an effective and practical framework for establishing an SRS database from the original audio recordings of OSAHS patients. Doctors and relevant specialists can make further studies using the segmented and classified database according to their needs and experience.

4 Conclusions

In this study, we proposed an automatic method for detecting, segmenting and classifying different SRS from overnight audio recordings, which combines acoustic signal processing with machine learning techniques. SRS event detection and classifier training and modification were introduced in detail. Motivated by its use in image processing, a kernel function was found to be effective for setting a suitable energy threshold in this acoustic-based study. An overnight audio recording of an OSAHS patient provided by Beijing Hospital was taken as an example to show the feasibility of the proposed method. Manually labeled SRS data were used to train a K-NN classifier with k-fold validation and the SF criterion; finally, a 10-NN model with 3-dimensional MFCC features was selected as the best classifier. MOS-based subjective testing demonstrated that our method achieves Good to Excellent quality for the classified audio episodes of every class. Our method could be widely used in the following areas:

a. automatic detection, segmentation and classification of different SRS from overnight audio recordings for doctors;
b. establishment of SRS databases of individual OSAHS patients for further research and study;
c. development of real-time personal sleep quality monitoring devices in the smart home.

5 Acknowledgements

This study was supported by the National Natural Science Foundation of China under grant No. 61271410. The authors wish to express their appreciation to all the doctors of the Department of Otolaryngology, Beijing Hospital, P. R. China, for their assistance and organization of this study. We also appreciate the help of Ms. Jennie with the grammar of this article.

6 References

[1] Lugaresi E., Cirignotta F., Coccagna G., Piana C.: 'Some epidemiological data on snoring and cardiocirculatory disturbances', Sleep, 1980, 3, (3-4), pp. 221-224
[2] Young T., Palta M., Dempsey J., Skatrud J., Weber S., Badr S.: 'The occurrence of sleep-disordered breathing among middle-aged adults', New Engl. J. Med., 1993, 328, pp. 1230-1235
[3] Banno K., Kryger H. M.: 'Sleep apnea: clinical investigations in humans', Sleep Med., 2007, 8, pp. 400-426
[4] Young T., Evans L., Finn L., Palta M.: 'Estimation of the clinically diagnosed proportion of sleep apnea syndrome in middle-aged men and women', Sleep, 1997, 20, pp. 705-706
[5] Pevernagie D., Aarts M. R., Meyer D. M.: 'The acoustics of snoring', Sleep Medicine Reviews, 2010, 14, pp. 131-144
[6] Rougui E. J., Istrate D., Souidene W.: 'Audio sound event identification for distress situations and context awareness', Proc. 31st Int. Conf. of IEEE EMBS, Minneapolis, Minnesota, USA, September 2009, pp. 3501-3504
[7] Xu Huijie, Huang Weining, Yu Lisheng, Chen Lan: 'Spectral analysis of snoring sound and site of obstruction in obstructive sleep apnea/hypopnea syndrome', Journal of Audiology and Speech Pathology, 2011, 19, (1), pp. 28-32
[8] Duckitt D. W., Tuomi K. S., Niesler R. T.: 'Automatic detection, segmentation and assessment of snoring from ambient acoustic data', Physiol. Meas., 2006, 27, pp. 1047-1056
[9] Liao Wenhung, Su Yisyuan: 'Classification of audio signals in all-night sleep studies', Proc. 18th Int. Conf. on Pattern Recognition, Hong Kong, China, August 2006, pp. 302-305
[10] Mesquita J., Solà-Soler J., Fiz A. J., Morera J., Jané R.: 'All night analysis of time interval between snores in subjects with sleep apnea hypopnea syndrome', Med. Biol. Eng. Comput., 2012, 50, pp. 373-381
[11] Rabiner R. L., Schafer W. R.: 'Theory and Applications of Digital Speech Processing' (Pearson Education, Inc., 2010)
[12] Takeda H., Farsiu S., Milanfar P.: 'Kernel regression for image processing and reconstruction', IEEE Transactions on Image Processing, 2007, 16, (2), pp. 349-366
[13] Karunajeewa S. A., Abeyratne R. U., Hukins C.: 'Multi-feature snore sound analysis in obstructive sleep apnea-hypopnea syndrome', Physiol. Meas., 2011, 32, pp. 83-97
[14] Abeyratne R. U., Silva D. S., Hukins C., Duce B.: 'Obstructive sleep apnea screening by integrating snore feature classes', Physiol. Meas., 2013, 34, pp. 99-121
[15] Azarbarzin A., Moussavi K. M. Z.: 'Automatic and unsupervised snore sound extraction from respiratory sound signals', IEEE Transactions on Biomedical Engineering, 2011, 58, (5), pp. 1156-1162
[16] Ng K. A., Koh S. T., Baey E., Lee H. T., Abeyratne R. U., Puvanendran K.: 'Could formant frequencies of snore signals be an alternative means for the diagnosis of obstructive sleep apnea?', Sleep Medicine, 2008, 9, pp. 894-898
[17] Ng K. A., Koh S. T., Baey E., Puvanendran K.: 'Role of upper airway dimensions in snore production: acoustical and perceptual findings', Annals of Biomedical Engineering, 2009, 37, (9), pp. 1807-1817
[18] Liao Wenhung, Lin Yukai: 'Classification of non-speech human sounds: feature selection and snoring sound analysis', Proc. 2009 IEEE Int. Conf. on Systems, Man, and Cybernetics, San Antonio, TX, USA, October 2009, pp. 2695-2700
[19] Huang N. E., Shen Zheng, Long R. S., Wu C. M., Shih H. H., Zheng Quanan, Yen Nai-Chyuan, Tung Chi Chao, Liu H. H.: 'The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis', Proc. R. Soc. Lond. A, 1998, 454, (1971), pp. 903-995
[20] Yang Yu, Yu Dejie, Cheng Junsheng: 'A roller bearing fault diagnosis method based on EMD energy entropy and ANN', Journal of Sound and Vibration, 2006, 294, pp. 269-277
[21] Robnik-Sikonja M., Kononenko I.: 'Theoretical and empirical analysis of ReliefF and RReliefF', Machine Learning, 2003, 53, pp. 23-69
[22] Theodoridis S., Koutroumbas K.: 'Pattern Recognition' (Academic Press of Elsevier B.V., 2009, 4th edn.), pp. 595-909
[23] Ribeiro F., Florêncio D., Zhang Cha, Seltzer M.: 'CROWDMOS: an approach for crowdsourcing mean opinion score studies', Proc. 2011 IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Prague, Czech Republic, May 2011, pp. 2416-2419