IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 17, NO. 2, MARCH 2013


Acoustic Signal Classification of Breathing Movements to Virtually Aid Breath Regulation

Ahmad Abushakra and Miad Faezipour, Member, IEEE

Abstract—Monitoring breath and identifying breathing movements are of established importance in many biomedical research areas, especially in the treatment of people with breathing disorders, e.g., lung cancer patients. Moreover, the virtual reality (VR) revolution and its implementation on ubiquitous hand-held devices has many implications and could be used as a simulation technology for healing purposes. In this paper, a novel method is proposed to detect and classify breathing movements. The overall VR framework is intended to encourage subjects to regulate their breath by classifying the breathing movements in real time. This paper focuses on the portion of the overall VR framework that deals with classifying the acoustic signal of respiration movements. We apply Mel-frequency cepstral coefficients (MFCCs), along with speech segmentation based on voice activity detection and linear thresholding, to the acoustic signal of breath captured using a microphone, in order to depict the differences between the inhale and the exhale in the frequency domain. For every subject, 13 MFCCs of all voiced segments are computed and plotted. The inhale and exhale phases are differentiated using the sixth MFCC, which carries important classification information. Experimental results on a number of individuals verify our proposed classification methodology.

Index Terms—Acoustic signal of breath, exhale, inhale, Mel-frequency cepstral coefficient (MFCC), segmentation, threshold, voice activity detection (VAD).

Manuscript received June 12, 2012; revised October 12, 2012; accepted January 25, 2013. Date of publication February 1, 2013; date of current version March 8, 2013. The authors are with the University of Bridgeport, Bridgeport, CT 06604 USA (e-mail: [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JBHI.2013.2244901

I. INTRODUCTION

A. Motivation

The increasingly high volume of patients diagnosed with cancer, a leading killer in the U.S. and all over the world, has urged many researchers in the medical sciences and engineering fields to investigate prevention and treatment/cure techniques more seriously. The cancer types diagnosed with the greatest frequency in the U.S. are lung cancer, breast cancer, prostate cancer, colon cancer, and melanoma, with lung cancer being the most prevalent cancer diagnosed among all types. According to the American Cancer Society, there are 221,130 new cases of lung cancer diagnosed every year, leading to approximately 156,940 deaths each year [1]. The virtual reality (VR) revolution has implications in many fields and could also be used as a simulation technology for healing purposes [2].
This has been an indication to use VR to assist lung cancer patients. In addition, smartphone devices are nowadays becoming increasingly popular in several aspects of daily life. The effectiveness and acceptance of smartphone applications by patients have also become an appealing theme. Smartphone devices have now become like general-purpose computers, offering functionality that was traditionally available only on desktop computers [3]. The small device size, long battery life, handy and affordable features, and embedded high-performance processors make them even more attractive for many applications. VR using smartphones has also evolved rapidly in the scale and scope of VR applications developed for medical purposes [4]. Applying VR technology to assist lung cancer patients via smartphones is a novel idea, yet a big challenge, as lung cancer is directly related to the functionality of the respiratory system.

Our goal is to assist subjects in regulating their breath virtually. This could be especially useful for patients with certain breathing disorders, such as lung cancer patients. Interestingly, virtual therapy has been shown to increase the chance of survival of patients with cancer by more than 50% [4], [5]. This type of therapy relieves the patient in a virtual world, which directly increases their level of hope, empowering the immune system. Soothing and relaxing treatments, even in a fictitious world, have been a very effective tool for treatment and recovery [5].

To this end, the intended overall VR framework of this study aims at increasing the oxygen intake of the patient. Oxygen is supplied from the lungs to the entire body via blood cells. Therefore, any form of exercise that regulates the respiratory system enhances the lung functionality in providing oxygen to the rest of the body and, as a result, eliminates lung cancer symptoms [6]. Lung functionality refers to how well the lungs work and is usually checked to find the cause of breathing problems, such as shortness of breath. It includes how much air can be taken into the lungs, how much air can be blown out of the lungs, how fast this can be done, how well the lungs deliver oxygen to the blood, and the strength of the breathing muscles.

The purpose of this study is to help patients regulate their breath, which will consequently enhance lung functionality. The system detects the breathing movements to encourage taking the next breath deeper if the previous one was insufficient. The idea here is to build a virtual therapy platform on a smartphone device to assist breath regulation. This will be done by permitting users/patients to see a virtually real image of their lungs while they are breathing, so that when their inhalation or exhalation is less than normal, they will be motivated to take their next breath more efficiently. This, in turn, will lead to increasing the oxygen percentage in their blood. The overall VR framework is shown in Fig. 1.

Fig. 1. Smartphone application of the intended virtual therapy platform.

Fig. 2. Breath cycles showing different phases.

The real challenge is finding a way to detect the breathing movements of individuals in real time and associate these movements with the VR interface for the patient. Therefore, it is also important to understand the physiological characteristics of breathing disorders for various patients and to further classify the breathing movements in real time to aid patients in regulating their breath through virtual therapy.

In this paper, we focus on developing a novel, efficient, and feasible technique to classify breathing movements. This is one of the most important parts of the intended virtual therapy platform. In the other parts of this platform, we have developed a lung capacity computation module to estimate the capacity of the lung while breathing. We are also working on making this system run in real time by storing incoming breathing samples in memory units of the framework and reading from the memory for processing. The signal processing is performed on small chunks of the data at a time, making it behave as if it were running in real time. The implementation of the VR platform on the smartphone, however, is not yet complete and is still in progress.
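To illustrate the chunk-based processing described above, the following minimal sketch buffers incoming samples and hands fixed-size chunks to a processing callback. It is an illustrative, hypothetical example only (not the authors' implementation); the chunk length, the 44.1 kHz rate, and the `process_chunk` callback are assumptions.

```python
import numpy as np
from collections import deque

FS = 44100          # sampling rate reported in the paper (44.1 kHz)
CHUNK = FS // 2     # assumed chunk length: 0.5 s of audio


class ChunkProcessor:
    """Buffer incoming breathing samples and process them one chunk at a time."""

    def __init__(self, chunk_size=CHUNK):
        self.chunk_size = chunk_size
        self.buffer = deque()

    def push(self, samples, process_chunk):
        """Append new samples; run process_chunk on every full chunk."""
        self.buffer.extend(np.asarray(samples, dtype=float))
        while len(self.buffer) >= self.chunk_size:
            chunk = np.array([self.buffer.popleft() for _ in range(self.chunk_size)])
            process_chunk(chunk)   # e.g., VAD segmentation followed by MFCC classification


if __name__ == "__main__":
    # Feed 3 s of synthetic audio and print the energy of each processed chunk.
    proc = ChunkProcessor()
    proc.push(np.random.randn(3 * FS), lambda c: print("chunk energy:", np.mean(c ** 2)))
```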

C. Contribution

In this paper, we apply Mel-frequency cepstral coefficients (MFCCs) to the speech/voice segments of the acoustic signal of respiration to depict the differences between the inhale and the exhale in the frequency domain. MFCC features are widely used to describe the different acoustic and physical traits of voice. MFCC has the ability to capture the characteristics of the channel spectrum and to simulate the human auditory function, whose approximation of speech is linearly spaced on the frequency scale [8]. MFCCs are based on the known variation of the human ear's critical bandwidths with frequency.

Our technique is based on segmenting the breathing cycles into speech and silence phases and computing the MFCC features of the speech/voice segments. For segmentation purposes, we apply the voice activity detection (VAD) technique, one of the most important functions for silence and speech detection [9], to the acoustic signal of breath. After segmentation, 13 MFCC features of each voiced segment are computed. The sixth MFCC clearly shows distinct properties among the MFCC features, and it is used to differentiate the inhale and exhale durations by employing a linear thresholding function.

The proposed classification technique is a critical component of our overall intended VR platform to assist breath regulation. Accurate differentiation between inhale and exhale will lead to a more reliable and precise modeling of the lung functionality during the breathing process in a virtual environment, which will eventually achieve the goal of motivating users/patients to regulate their breath.

D. Paper Organization

The rest of this paper is organized as follows. Section II reviews earlier work related to breathing movement classification. Section III describes the adopted VAD technique, the Mel-frequency analysis, and the thresholding method used to differentiate the inhale from the exhale; the dataset used and the overall classification procedure proposed in this study are also explained in detail. Section IV presents the results obtained and discusses the performance of our technique, together with a comparison of our classification procedure with other approaches. Finally, Section V presents conclusions and recommendations for future work.

B. Background

In [7], it is shown that the breathing cycle is divided into four different phases: the inspiratory phase, inspiratory pause, expiratory phase, and expiratory pause. The breathing cycle is defined as starting with the onset of inspiration, from the moment the air inflow starts. When the airflow stops, the inspiratory phase ends and the inspiratory pause begins, lasting until the air begins to flow out of the lungs, at which point the expiratory phase starts. The expiratory phase is followed by the expiratory pause, which lasts until the end of the breathing cycle, as shown in Fig. 2.

In general, the exhale phase has a higher average energy than the inhale, owing to the fact that air exits the lungs in addition to the sound that is generated. This characteristic can be effectively used for classification purposes.
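As a toy illustration of this energy cue (not the classification method proposed later in this paper), the following sketch compares the average energy of two already-segmented breath sounds. The segment boundaries are assumed to be given; the function names are hypothetical.

```python
import numpy as np


def average_energy(segment):
    """Mean squared amplitude of one audio segment."""
    segment = np.asarray(segment, dtype=float)
    return float(np.mean(segment ** 2))


def label_by_energy(seg_a, seg_b):
    """Heuristic from the background discussion: within one breathing cycle,
    the higher-energy segment is typically the exhale."""
    if average_energy(seg_a) >= average_energy(seg_b):
        return ("exhale", "inhale")
    return ("inhale", "exhale")
```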

II. RELATED WORK

A plethysmograph is an instrument for determining changes in volume within the body. Pulmonary plethysmographs are routinely used to measure the volume of the lungs when the muscles of respiration are relaxed. These devices are widely used in clinical setups; however, their high cost and complicated setup make them very inconvenient to use at home. To this end, several signal processing techniques have been introduced to measure and differentiate the inhale from the exhale. One approach is to determine the average energy of the signal in the time domain [10]. Voiced and unvoiced speech detection techniques have also been previously introduced [11]. Another technique is to identify the maximum amplitude that the signal reaches,
combined with the number of maximum peaks; the highest amplitude indicates the expiration phase [8]. The authors in [12] and [13] described a method for signal segmentation based on the amplitude of the signal. After fine tuning, this technique was applied to the acoustic signal of breath, and the segmentation results can be seen in Fig. 3. However, this technique is hard to implement and is also unreliable, since it requires continuous adjustment of the marking coefficients for the breath signal.

Fig. 3. Exhale detection by maximum amplitude.

Another segmentation method has been implemented by calculating the average energy of the signal per phase time, as shown in Fig. 4. This technique is a more reliable approach, since it depends on the whole period of the breathing cycle. An improved version of this technique was introduced in [14]–[16], where the average energy combined with the zero-crossing rate is used for segmentation and exhalation detection.

Fig. 4. Exhale detection by average energy.

Researchers have subsequently turned to frequency-domain analysis for detecting breathing movements. In [17], the authors focused on frequency-domain analysis of the signal to classify the inhale from the exhale. They showed that, in the frequency domain, the magnitude of the signal at the beginning and in the middle carries information about the differences between inspiration and expiration, and that the repeated breathing pattern and the differences between inspiration and expiration can be identified in the frequency behavior of the signal [17]. In that study, it was observed that a smooth beginning and an abrupt ending are often evident during inspiration, whereas the expiration phase of athletes shows an abrupt beginning and a smooth ending; the two phases are separated by the inspiratory and expiratory pauses. The work presented in this paper also relies on frequency analysis of the breathing signal for differentiating the inhale and the exhale.

III. PROPOSED WORK

We aim at detecting the breathing movements, i.e., classifying which parts of the signal correspond to the inhale and which correspond to the exhale. The classification procedure is based upon frequency-domain analysis. Fig. 5 shows an overall view of our classification technique.

A. Segmentation

In our study, the VAD technique is used for segmentation purposes. VAD is one of the most effective functions for differentiating between silence and speech phases [9]. The basic assumption of the VAD algorithm is that the spectrum of the speech signal changes quickly, whereas the background noise is relatively stationary and changes slowly. In addition, the active speech level is usually higher than the background noise level. In this study, the basic VAD algorithm is fine-tuned for the acoustic signal of breath to differentiate the silence and breathing movement phases (see Fig. 6). The output of VAD segmentation is a set of speech/voiced segments that contain the inhale and exhale durations, along with silence phases. We are interested in the voiced segments for further analysis.

The VAD algorithm is implemented by first filtering the signal to remove the undesired low-frequency components and, second, calculating the power over different window sizes of the fast Fourier transform (FFT) of the input signal [9]. Assuming that X(K) are the FFT bins of the input signal x(n), the VAD algorithm is described as follows.

1) Calculate the signal energy $E_n$ as

$$E_n = \sum_{K = K_1}^{K_2} |X(K)|^2 \qquad (1)$$

where $K_1$ and $K_2$ are the nearest integers of the frequency indices $K$. The energy of the signal in a short window is calculated as

$$E_s(j) = (1 - \alpha_s)\,E_s(j) + \alpha_s E_n \qquad (2)$$

and the long-window signal energy is calculated as

$$E_l(j) = (1 - \alpha_l)\,E_l(j) + \alpha_l E_n. \qquad (3)$$

In these formulations, $j$ is the window number, and $\alpha_s$ and $\alpha_l$ are the window-length factors; the subscript "s" denotes the short window and "l" the long window.

2) The noise level at the Nth frame ($N_f$) is computed, and a threshold $T_r$ used for the signal energy comparison is calculated as

$$T_r = \frac{N_f}{1 - \alpha_l} + \mathrm{margin}. \qquad (4)$$

The "margin" is a reasonable value chosen to avoid toggling between voice and silence when the noise level is flat.

3) The current frame signal energy is compared against the threshold, and the decision is made as follows:

$$\mathrm{VAD\ Flag} = \begin{cases} 1, & \text{if } E_n > T_r \\ 1, & \text{if } E_n \le T_r \text{ and hangover not expired} \\ 0, & \text{if } E_n \le T_r \text{ and hangover expired.} \end{cases} \qquad (5)$$

A hangover period is used when transitioning from active speech to silence in order to avoid false detection of silence at the tail end of speech. During the tail period of the speech, before the hangover counter expires, the signal frames are classified as active speech.

In this study, we used length(signal)/L frames, where L = 528 is the window size in the frequency domain, $\alpha_s = 1/16$, and $\alpha_l = 1/128$. After splitting the acoustic breath signal into silence and speech phases using VAD, the next step is to determine whether the speech phases are related to the inhale or the exhale. This procedure of our proposed classification technique is described in Section III-B.
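The following is a minimal sketch of the energy-based VAD decision in (1)–(5), assuming frame-wise FFT magnitudes are already available. The noise-level estimate (the running minimum of the long-window energy), the margin, and the hangover length are illustrative assumptions, not the authors' exact settings; the smoothing factors follow the values reported above.

```python
import numpy as np


def vad_flags(frames, k1, k2, alpha_s=1 / 16, alpha_l=1 / 128,
              margin=1.0, hangover_frames=8):
    """Energy-based VAD following (1)-(5): smoothed short/long-window energies,
    a noise-derived threshold, and a hangover counter.

    `frames` is an iterable of per-frame FFT magnitude arrays |X(K)|.
    Returns a list with 1 (speech/breath) or 0 (silence) per frame.
    """
    es = el = 0.0
    noise_level = None
    hangover = 0
    flags = []
    for X in frames:
        en = float(np.sum(np.abs(X[k1:k2 + 1]) ** 2))      # (1) band energy
        es = (1 - alpha_s) * es + alpha_s * en              # (2) short window
        el = (1 - alpha_l) * el + alpha_l * en              # (3) long window
        # Assumed noise estimate: smallest long-window energy seen so far.
        noise_level = el if noise_level is None else min(noise_level, el)
        tr = noise_level / (1 - alpha_l) + margin           # (4) threshold
        if en > tr:                                         # (5) decision
            flags.append(1)
            hangover = hangover_frames
        elif hangover > 0:                                  # hangover not expired
            flags.append(1)
            hangover -= 1
        else:
            flags.append(0)
    return flags
```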

Fig. 5. Block diagram of the proposed breathing movement classification technique.

Fig. 6. Segmenting the acoustic signal of breath using VAD.

B. Mel-Frequency Analysis

The Mel frequency reflects the human ear's perception of different voice frequencies. The perceived scale is approximately linear with frequency below 1 kHz and logarithmic above it; the human ear is sensitive to low frequencies and rather ambiguous toward high frequencies. Mel-frequency analysis simulates these hearing characteristics of the human ear by converting the spectrum to a nonlinear spectrum based on Mel-frequency coordinates and then converting it to the cepstral domain [18], [19]. The Mel frequency is obtained by the following transformation:

$$\mathrm{Mel}(f) = 2595 \log(1 + f/700). \qquad (6)$$

The MFCC extraction steps are as follows.

1) Pre-emphasis: The voice signal initially goes through a high-pass filter to suppress the effect of the vocal cords

$$H(z) = 1 - \mu z^{-1}. \qquad (7)$$

2) Frame blocking: N samples are combined into an observation unit called a frame. In order to avoid abrupt changes between two adjacent frames, an overlapping area containing M sampling points is kept between them. Usually, N is 256 or 512 and M is about one half or one third of N.

3) Windowing: Each frame is multiplied by a Hamming window

$$s'(n) = s(n)\,w(n) \qquad (8)$$

where w(n) are the coefficients of the Hamming window.

4) FFT: The time-domain signal is transformed into the frequency-domain energy distribution X(k).

5) Triangular bandpass filtering: The output energy of each filter is computed. The N triangular bandpass filters are linearly spaced in Mel-frequency coordinates

$$m(l) = \sum_{k = o(l)}^{h(l)} W_l(k)\,|X(k)|. \qquad (9)$$

In this formula, l = 1, 2, ..., L refers to the order of the filter, and o(l) and h(l) are the lower- and upper-limit frequencies of the lth triangular filter. Considering c(l) to be the central frequency of the lth triangular filter, $W_l(k) = \frac{k - o(l)}{c(l) - o(l)}$ if $o(l) \le k \le c(l)$, and $W_l(k) = \frac{h(l) - k}{h(l) - c(l)}$ if $c(l) \le k \le h(l)$.

6) Discrete cosine transform (DCT): Finally, the DCT is performed on the logarithmic energy of the filter outputs to obtain the Lth-order MFCC:

$$c_{\mathrm{mfcc}}(i) = \sqrt{\frac{2}{N}} \sum_{l=1}^{L} \log m(l)\,\cos\!\left(\frac{i\pi}{L}\left(l - \frac{1}{2}\right)\right). \qquad (10)$$

Thirteen MFCCs are extracted, following the Mel-scale perception of the human ear.
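The sketch below mirrors steps 1)–6) above (pre-emphasis, framing, Hamming windowing, FFT, triangular Mel filters, log, and DCT). The specific settings (26 filters, 512-sample frames with 50% overlap, μ = 0.97, 13 coefficients) are common defaults chosen here for illustration and are not necessarily the configuration used by the authors.

```python
import numpy as np


def mel(f):                      # (6)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def inv_mel(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc(signal, fs=44100, frame_len=512, hop=256, n_filters=26, n_coeffs=13, mu=0.97):
    """Compute MFCCs of one voiced segment, following steps 1)-6)."""
    x = np.asarray(signal, dtype=float)
    # 1) Pre-emphasis: y[n] = x[n] - mu * x[n-1]
    x = np.append(x[0], x[1:] - mu * x[:-1])
    x = np.pad(x, (0, max(0, frame_len - len(x))))          # guard very short segments
    # 2)-3) Frame blocking with overlap, then Hamming windowing
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    window = np.hamming(frame_len)
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    # 4) FFT magnitude spectrum
    spec = np.abs(np.fft.rfft(frames, frame_len))
    # 5) Triangular Mel filter-bank energies m(l)
    mel_pts = np.linspace(mel(0.0), mel(fs / 2.0), n_filters + 2)
    bins = np.floor((frame_len + 1) * inv_mel(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, frame_len // 2 + 1))
    for l in range(1, n_filters + 1):
        o, c, h = bins[l - 1], bins[l], bins[l + 1]
        fbank[l - 1, o:c] = (np.arange(o, c) - o) / max(c - o, 1)   # rising edge
        fbank[l - 1, c:h] = (h - np.arange(c, h)) / max(h - c, 1)   # falling edge
    m = np.log(np.maximum(spec @ fbank.T, 1e-10))
    # 6) DCT of the log filter-bank energies; keep the first 13 coefficients
    l_idx = np.arange(1, n_filters + 1)
    i_idx = np.arange(n_coeffs)[:, None]
    dct = np.cos(np.pi * i_idx * (l_idx - 0.5) / n_filters)
    return np.sqrt(2.0 / n_filters) * (m @ dct.T)   # one row of 13 MFCCs per frame
```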

Our technique is based on segmenting the breathing cycles and computing the MFCC features of each speech segment. In our analysis, the sixth MFCC is observed to have distinctive characteristics, as explained hereafter.

The MFCC feature of interest is the sixth one. From the implementation, it is noted that the sixth coefficient comes from filter bank number 6, which passes frequencies in the range of 2200–3100 Hz [9]. The physical features and characteristics related to this range of frequencies are as follows.

1) First: The relation between the amplitude of the frequencies inside this range and the speed of the speech is prominent. When the same letter is pronounced at different speeds, the position of the letter in the time domain varies accordingly, and the resonances change when the signal is plotted in the frequency domain. When the speed of the sound is higher, the resonances occur at higher frequencies; the second resonance is shifted to the right, off scale.

2) Second: The speech sounds that mainly exist in this spectral range can be detected. Assuming that each letter of speech is dominant in a specific part of the spectrum, certain spectra will be most similar to the inhale and the exhale pronunciations, respectively. The pronunciation of "F" is most similar to the inhale, and the pronunciation of the exhale is close to that of "H."

Hence, the MFCCs are determined and examined for the largest variations. It is found that the sixth MFCC of the inhale is greater than that of the exhale, which indicates that the letter "F" is more prominent in that range of the spectrum than the exhale. This concept is applied in our study. It has also been observed that the period of the exhale is greater than the period of the inhale; in other words, the inhale is "pronounced" faster than the exhale. Assuming that the speech sounds of the inhale and the exhale are otherwise similar, the sixth MFCC of the inhale is predicted to be greater than that of the exhale, as the resonance frequency is shifted to the right (higher).

C. Linear Thresholding

This paper builds upon our previous study, which employs MFCC features to classify the inhale and the exhale [20]. After determining the relation between the sixth MFCC value and the breath phases, a threshold level is employed to automate the classification procedure. The threshold level is defined as the level that separates two regions; in our study, it separates the sixth MFCC values of the two speech phases (inhale and exhale). Values above the threshold level are assigned to the inhale phase, and values below the threshold to the exhale phase. The threshold value is determined by calculating the average (mean) value of the sixth MFCC over all speech samples. The patient-adaptive threshold level in this study is configured using the following equation:

$$\mathrm{Threshold} = \frac{\sum_{q=1}^{R} \mathrm{MFCC}_6(q)}{R} \qquad (11)$$

where R is the number of exhales and inhales, and q is the index of the current phase for a subject/patient. The proposed classification procedure works properly when the amplitude and average energy of the exhale are higher than those of the inhale.

D. Classification Procedure

The following steps summarize the classification procedure in our study.
1) First, the recorded signal of each speaker/subject is split into speech and silence segments. The splitting is implemented using the VAD technique, as described earlier.
2) Second, the 13 MFCCs of each speech/voiced segment of the same speaker are calculated.
3) Third, the ith MFCC of all samples is extracted and kept aside from the other MFCCs.
4) Fourth, all the ith MFCCs related to the same speaker's breathing segments are plotted.
5) Finally, the sixth MFCCs of the breathing segments of each speaker are further analyzed using the linear thresholding method, where the inhale and exhale utterances are differentiated.
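A compact sketch of the patient-adaptive thresholding in (11) and of the classification steps above is given next. It assumes that one sixth-MFCC value per voiced segment of a single subject has already been obtained (e.g., by summarizing the per-frame MFCCs of each segment); that summarization step is an assumption for illustration.

```python
import numpy as np


def classify_segments(sixth_mfcc_values):
    """Classify each voiced segment of one subject as inhale or exhale using
    the patient-adaptive linear threshold of (11): the mean of the sixth MFCC
    over all R segments; values above the mean are taken as inhale."""
    values = np.asarray(sixth_mfcc_values, dtype=float)
    threshold = values.mean()                            # (11)
    return ["inhale" if v > threshold else "exhale" for v in values]


if __name__ == "__main__":
    # Ten alternating breathing segments of one subject (synthetic example values).
    demo = [4.1, -2.3, 3.8, -1.9, 4.4, -2.7, 3.6, -2.1, 4.0, -2.5]
    print(classify_segments(demo))
```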
E. Dataset

The participants of this study are two male lung cancer patients from Bridgeport Hospital and 123 students from the University of Bridgeport (50 male and 73 female). The students are adults with no reported history of breathing problems. Each participant was informed about the study and the procedure before the actual recording took place. Each volunteer was asked to breathe ten cycles in a noise-free environment. The recording of all speech samples took place at the same location for all subjects to provide uniformity. In our experiments, the SONY VAIO VPCEB42FM built-in microphone (Realtek High Definition Audio) [21] was used. The microphone was placed approximately 3 cm away from the speakers. All samples were recorded with a sampling frequency of 44.1 kHz. Portions from all volunteers were then imported into MATLAB for normalization and processing.

IV. EXPERIMENTAL RESULTS

A. Setup and Results

In this study, the assumption is that the patient attends the breathing exercise session in a quiet area and is not performing any other activity (walking, exercising). Each virtual therapy session is assumed to be no longer than 30 min, as a longer session may exhaust the patient. The experiments in this study were based on 10–20 s recordings, collected from ten inhales and ten exhales. Our technique has been tested on 125 different speakers/subjects. Each record was captured using a microphone and contained ten breathing cycles. The sound utterances were segmented using the VAD technique and classified into inhale and exhale using the sixth MFCC computation and linear thresholding. As discussed in Section III-B, the sixth MFCC corresponds to the frequency band in which the inhale ("F"-like) and exhale ("H"-like) sounds differ most; this characteristic shows distinct properties among the MFCC features, and it has therefore been used to differentiate the inhale and exhale durations.

Fig. 7 shows the sixth MFCC values for inhale and exhale classification of four different subjects. The results obtained reflect the effectiveness of the proposed approach. As shown in the graphs, we were able to clearly differentiate between the exhale and inhale of various subjects. Fig. 8 also shows that the exhale and inhale values can be separated accurately by simply using a linear threshold function of the average values. This procedure has been applied on multiple subjects, which further supports the hypothesis proposed at the beginning of our study.

Fig. 7. Sixth MFCC values for inhale and exhale classification of four different subjects.

Fig. 8. Linear threshold of the sixth MFCC value for inhale and exhale classification.

Table I shows the accuracy of our inhale and exhale classification procedure. The table contains the results of 125 subjects, including two lung cancer patients, where the sixth MFCC values were plotted for all ten speech segments. The sixth MFCC value of the inhale is greater than that of the exhale with an accuracy that exceeds 90%. Employing a linear threshold further classifies inhales and exhales with 90–100% accuracy. This demonstrates the reliability and efficiency of the proposed technique, even on lung cancer patients.

TABLE I. BREATHING MOVEMENT CLASSIFICATION ACCURACY

It is important to note that in some acoustic signals of breath (e.g., in athletes), the inspiration phase has a higher amplitude and average energy than the expiration, as shown in Fig. 9. Hence, it is important to include such subjects in our experiments in order to report the classification results accurately. Nevertheless, most subjects have sixth MFCC values for the inhale that are greater than those for the exhale, with an accuracy that exceeds 90%.

Fig. 9. Breath cycles for an athletic player showing higher inhale average energy than exhale.

As far as reliability in terms of SNR is concerned, the VAD technique, a well-known method to separate voice from silence (even in the presence of noise), has been fine-tuned for our acoustic breathing signals. Our method performs at the same typical SNR level as the VAD approach. The minimum SNR in our work was 10 dB; the system started performing unreliably for SNR values below 10 dB. Bluetooth microphones may therefore also work reliably with our algorithm.

Our study consists of three main processing units that produce the classified output: VAD, MFCC computation, and linear thresholding. The linear thresholding module has computational complexity on the order of O(n), where n is the number of breathing cycles processed in one run. The complexity of our system is dominated by the FFT computation in the VAD and MFCC stages; the overall complexity is O(n log n), where n is the number of acoustic samples for the FFT computation. The experiments were run on an Intel Core i3 machine with 4 GB RAM and 64-bit processing. With these specifications, the classified output was observed after 0.18 s per breathing cycle. The complexity is reasonable and the overall delay is not noticeable by humans, making this study feasible for real-time processing on a smartphone.

To date, there is no online database that provides acoustic signals of respiration with annotated inspiration and expiration samples. Our classification results from VAD segmentation, MFCC computation, and thresholding are based on our own observations and have been cross-validated by our biology collaborator.

B. Comparison With Other Techniques

Table II provides a comparison of our breathing movement classification for inhale and exhale (binary classification) with other techniques. The techniques presented in [8], [10], and [11] are based on time-domain analysis of the acoustic signal of breath. In these works, the inhale and exhale phases were highlighted on the signal waveform; however, no numerical value of the classification accuracy was reported.

The work in [17] presented a nonrestrictive breathing monitoring system utilizing a microphone, which is another breathing movement detection technique. The advantage of this method is that it is based on the acoustic signal captured by a microphone, which is simple, portable, and well suited for modern breathing monitoring. It also allows breathing investigations in a wide spectrum of clinical situations, due to its handy implementation. However, the authors do not address acoustic signal filtering from a software point of view; rather, they focus on hardware filtering of the acoustic signal to classify the breathing movements. One major drawback of this technique is that it also does not provide a performance accuracy value for inhale and exhale duration detection. Hence, it cannot be compared numerically with other classification techniques.

TABLE II. COMPARISON OF BREATHING MOVEMENT CLASSIFICATION TECHNIQUES

In [22] and [23], the authors introduce a noise cancellation method for extracting authentic lung sounds from noisy auscultation environments. One concern here is that the noise sources may have frequency bands similar to those of the lung sounds and may not be statistically independent of them. This methodology utilizes the time-split nature of breathing sounds, rather than frequency separation or statistical independence, and uses traditional filtering and whitening techniques. The reported performance accuracy was highly dependent on noise factors.

V. CONCLUSION AND FUTURE DIRECTIONS

In this paper, a novel automated technique was proposed to detect and classify breathing movements. The breathing cycle consists of four phases: the inspiration and expiration phases, each surrounded by pause (silence) phases. The silence phases can be detected using VAD as a segmentation technique, which splits the breathing movement into silence and voiced (exhale/inhale) phases. By studying the voiced phases, it was observed that the sixth MFCC shows the clearest distinction among all the MFCCs in classifying the inhale from the exhale, as all values within the inhale or exhale phases are very close to one another for a particular subject. The difference in the sixth MFCC value between the inhale and the exhale is attributed, first, to the difference in pronunciation speed and, second, to the inhale pronunciation being most similar to "F" and the exhale pronunciation being close to "H." After the MFCC computation, a linear thresholding technique was used to separate the inhale from the exhale value sets. The overall accuracy, calculated after segmentation, MFCC computation, and thresholding, shows that the proposed approach is effective for breathing movement detection and classification.

As a future direction, we plan to make use of other frequency-based analyses of breathing and blowing samples to increase the accuracy of classifying the inhale and exhale phases to 99% and higher. In addition, the inclusion of athletic data samples and of certain noise interrupts, such as sneezing or coughing, is left for future investigation.

This study was part of a larger research project to assist users/patients in regulating their breath through virtual therapy. In continuation of this research, we intend to fully implement the VR therapy platform by integrating the proposed classification technique with a high-quality animated application and a lung capacity estimation module on a smartphone.

ACKNOWLEDGMENT

The authors would like to thank Prof. B. Barkana, Dr. M. J. Autuori, and A. Abumunshar for their useful discussions and suggestions regarding this work.

REFERENCES

[1] Cancer Facts and Figures (2011). [Online]. Available: http://www.cancer.org/acs/groups/content/@epidemiologysurveilance/documents/document/acspc-029771.pdf
[2] S. Wang, Z. Mao, C. Zeng, H. Gong, S. Li, and B. Chen, "A new method of virtual reality based on Unity 3D," in Proc. 18th IEEE Int. Conf. Geoinformatics, Jun. 2010, pp. 1–5.
[3] E. A. Suma, D. M. Krum, and M. Bolas, "Sharing space in mixed and virtual reality environments using a low-cost depth sensor," in Proc. IEEE Int. Symp. VR Innovation, Mar. 2011, pp. 349–350.
[4] H. Liang, H. Song, Y. Fu, X. Cai, and Z. Zhang, "A remote usability testing platform for mobile phones," in Proc. IEEE Int. Conf. Comput. Sci. Autom. Eng., Jun. 2011, pp. 312–316.
[5] B. S. Siegel, Love, Medicine & Miracles: Lessons Learned About Self-Healing From a Surgeon's Experience With Exceptional Patients. New York, NY, USA: HarperCollins, 1990.
[6] J. Wong, E. Lyn, E. Wilson, G. Lowe, M. Sharpe, J. Robertson, A. Martinez, and E. Aird, "The application of breathing control for treatment of lung cancer with CHARTWEL," in Proc. 22nd Annu. Int. Conf. Eng. Med. Biol. Soc., 2000, vol. 4, pp. 2741–2743.
[7] P. Hult, B. Wranne, and P. Ask, "A bioacoustic method for timing of the different phases of the breathing cycle and monitoring of breathing frequency," Med. Eng. Phys., vol. 22, pp. 425–433, 2000.
[8] A. Harma, "Automatic identification of bird species based on sinusoidal modeling of syllables," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Apr. 2003, vol. 5, pp. 545–548.
[9] M. Sen and W. Tian, Real-Time Digital Signal Processing Implementations and Applications, 2nd ed. Hoboken, NJ, USA: Wiley, 2006.
[10] J. K. Lee and C. D. Yoo, "Wavelet speech enhancement based on voiced/unvoiced decision," presented at the 32nd Int. Congr. Expo. Noise Control Eng., Seogwipo, Korea, Aug. 2003.
[11] B. Atal and L. Rabiner, "A pattern recognition approach to voiced-unvoiced-silence classification with applications to speech recognition," IEEE Trans. Acoust. Speech Signal Process., vol. ASSP-24, no. 3, pp. 201–212, Jun. 1976.
[12] S. Ahmadi and A. S. Spanias, "Cepstrum-based pitch detection using a new statistical V/UV classification algorithm," IEEE Trans. Speech Audio Process., vol. 7, no. 3, pp. 333–338, May 1999.
[13] Y. Qi and B. R. Hunt, "Voiced-unvoiced-silence classifications of speech using hybrid features and a network classifier," IEEE Trans. Speech Audio Process., vol. 1, no. 2, pp. 250–255, Apr. 1993.
[14] L. Siegel, "A procedure for using pattern classification techniques to obtain a voiced/unvoiced classifier," IEEE Trans. Acoust. Speech Signal Process., vol. ASSP-27, no. 1, pp. 83–88, Feb. 1979.
[15] T. L. Burrows, "Speech processing with linear and neural network models," Ph.D. dissertation, Cambridge Univ. Eng. Dept., Cambridge, U.K., 1996.
[16] J. K. Shah, A. N. Iyer, B. Y. Smolenski, and R. E. Yantorno, "Robust voiced/unvoiced classification using novel features and Gaussian mixture model," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2004, pp. 17–21.
[17] J. Kroutil and M. Husak, "Detection of breathing," in Proc. 7th IEEE Int. Conf. Adv. Semicond. Devices Microsyst., Oct. 2008, pp. 167–170.
[18] H. Wang, Y. Xu, and M. Li, "Study on the MFCC similarity-based voice activity detection algorithm," in Proc. 2nd IEEE Int. Conf. Artif. Intell. Manage. Sci. Electron. Commerce, Aug. 2011, pp. 4391–4394.
[19] Z. Wu and Z. Cao, "Improved MFCC-based feature for robust speaker identification," IEEE Tsinghua Sci. Technol. J., vol. 10, no. 2, pp. 158–161, 2005.
[20] A. Abushakra, M. Faezipour, and A. Abumunshar, "Efficient frequency-based classification of respiratory movements," in Proc. IEEE Int. Conf. Electro/Inf. Technol., May 2012, pp. 1–5.
[21] VAIO Laptop Computers VPCEB42FM/BJ (2011). [Online]. Available: http://www.docs.sony.com/release/VPCEB3_Series.pdf
[22] H. Wang, L. Y. Wang, H. Zheng, R. Haladjian, and M. Wallo, "Lung sound/noise separation for anesthesia respiratory monitoring," WSEAS Trans. Syst., vol. 3, pp. 1839–1844, Jun. 2004.
[23] H. Zheng, H. Wang, L. Y. Wang, and G. G. Yin, "Cyclic system reconfiguration and time-split signal separation with applications to lung sound pattern analysis," IEEE Trans. Signal Process., vol. 55, no. 6, pp. 2897–2913, Jun. 2007.

Ahmad Abushakra received the B.S. degree in management information systems from Philadelphia University, Philadelphia, PA, USA, in 2004, and the M.S. degree in management information systems/e-business from the Arab Academy for Banking and Finance Science, Amman, Jordan, in 2007. He is currently working toward the Ph.D. degree in computer science and engineering at the University of Bridgeport, Bridgeport, CT, USA. After graduating, he was with the Arab Education Forum as a Web Developer until August 2008. In September 2008, he became an Information Technology Manager for the Arab Education Forum. Since September 2010, he has been a Research/Graduate Assistant at the School of Engineering, University of Bridgeport. His research interests include smartphones, virtual reality, and mobile web applications/web technologies.

Miad Faezipour (S'06–M'10) received the B.Sc. degree in electrical engineering from the University of Tehran, Tehran, Iran, and the M.Sc. and Ph.D. degrees in electrical engineering from the University of Texas at Dallas, USA. She has been an Assistant Professor of computer science and engineering at the University of Bridgeport (UB), Bridgeport, CT, USA, and the Director of the Digital/Biomedical Embedded Systems and Technology Lab since July 2011. Prior to joining UB, she was a Postdoctoral Research Associate with the University of Texas at Dallas, collaborating with the Center for Integrated Circuits and Systems and the Quality of Life Technology laboratories. Her research interests include biomedical signal processing and behavior analysis techniques, high-speed packet processing architectures, and digital/embedded systems. Dr. Faezipour is a member of the IEEE Engineering in Medicine and Biology Society and IEEE Women in Engineering.