
IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 61, NO. 2, FEBRUARY 2014

Unconstrained Video Monitoring of Breathing Behavior and Application to Diagnosis of Sleep Apnea

Ching-Wei Wang*, Member, IEEE, Andrew Hunter, Member, IEEE, Neil Gravill, and Simon Matusiewicz

Abstract—This paper presents a new real-time automated infrared video monitoring technique for detection of breathing anomalies, and its application in the diagnosis of obstructive sleep apnea. We introduce a novel motion model to detect subtle, cyclical breathing signals from video, a new 3-D unsupervised self-adaptive breathing template to learn individuals' normal breathing patterns online, and a robust action classification method to recognize abnormal breathing activities and limb movements. This technique avoids imposing positional constraints on the patient, allowing patients to sleep on their back or side, with or without facing the camera, fully or partially occluded by the bed clothes. Moreover, shallow and abdominal breathing patterns do not adversely affect the performance of the method, and it is insensitive to environmental settings such as infrared lighting levels and camera view angles. The experimental results show that the technique achieves high accuracy (94% for the clinical data) in recognizing apnea episodes and body movements and is robust to various occlusion levels, body poses, body movements (i.e., minor head movement, limb movement, body rotation, and slight torso movement), and breathing behavior (e.g., shallow versus heavy breathing, mouth breathing, chest breathing, and abdominal breathing).

Index Terms—Action recognition, behavior analysis, breathing monitoring, obstructive sleep apnea (OSA).

Manuscript received January 9, 2013; revised August 19, 2013; accepted August 27, 2013. Date of publication August 29, 2013; date of current version January 16, 2014. This work was jointly supported by United Lincolnshire Hospitals NHS Trust and the University of Lincoln. Research ethics approval was gained from the Derbyshire Research Ethics Committee (REC number 08/H0401/12). Asterisk indicates corresponding author. *C.-W. Wang is with the Graduate Institute of Biomedical Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan (e-mail: [email protected]). A. Hunter is with the University of Lincoln, Lincoln, LN6 7TS, U.K. (e-mail: [email protected]). N. Gravill and S. Matusiewicz are with the Medical Physics Department and Medicine School of the United Lincolnshire Hospital, Lincoln, LN2 5QY, U.K. (e-mail: [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TBME.2013.2280132

I. INTRODUCTION

Obstructive Sleep Apnea (OSA) [1] is a condition with severe complications including reduction in cognitive function, cardiovascular disease, stroke, fatigue, and excessive daytime sleepiness. OSA is characterized by repetitive obstruction of the upper airways during sleep, resulting in oxygen desaturation and frequent arousal events marked by violent awakening. Although OSA affects around 4% of men

and 2% of women [1], [2], the majority of affected individuals, perhaps 80–90% [3], [4], remain undiagnosed. The gold-standard diagnostic tool for OSA is polysomnography (PSG), which measures a wide range of parameters, including brain waves (EEG), eye movements, skeletal muscle activation, electrocardiogram (ECG)/heart rate, airflow, respiratory effort, and blood oxygen saturation, using a range of sensors. However, PSG is costly, labor-intensive (not least in analyzing the data), and invasive, which may disturb sleep and compromise the findings.

A popular cost-effective, less invasive alternative combines pulse oximetry (to measure blood oxygen saturation [SpO2] and heart rate) with infrared (IR) video monitoring. The clinician identifies suspicious areas on the pulse oximetry trace (defined by a dip of more than 4% in the oxygen saturation level) and reviews the corresponding video data to reach a diagnosis. However, the pulse oximetry traces of some OSA patients do not show all the abnormalities, forcing the clinician to review a significant amount of the video data. To reduce the workload, some existing video systems [5] try to detect patient movement, utilizing patterned sheets and IR light to detect gross degrees of motion, which at least identifies periods of activity, even if it does not identify what the activities are. However, if the patterned cover is removed by the patient, the system fails. There is thus a growing interest in alternative, more robust, automated approaches to the diagnostic assessment of OSA.

The contact-type approaches include thoracic–abdominal bands [7], which track changes in body circumference during the respiratory cycle, stick-on electrodes such as the ECG method [8], the nasal temperature probe [9], and contact-type microphones for audio analysis to monitor tidal volumes from breathing activity [10]. The main disadvantages of these approaches are consequent on their invasiveness: they may be uncomfortable, which disturbs sleep and compromises results, and patient movement may dislodge sensors or compromise readings. The noninvasive techniques include noncontact audio analysis [11], [12], vibration sensors [13], thermal imaging [14]–[16], and Doppler radar sensors [17], [18] designed to identify breathing. A major challenge for noncontact audio analysis is the extraction of breathing sounds from sensor signals contaminated by environmental noise [11], [12]. The vibration sensors require expensive specialized hardware and impose positional and postural constraints. The thermal imaging techniques have been used to capture a breathing signal by detecting the breath as it is expelled [14]–[16].



Fig. 1. Unstable signals appear in the depth images using a low-cost 3-D camera (Kinect).

Radar sensors have been proposed for monitoring of cardiac and respiratory motion [17], [18]. However, in both methods there are strict positional constraints (the mouth/nose region must be targeted), and the region of interest must not be occluded. These requirements are not easily fulfilled when monitoring humans during sleep.

In this paper, we investigate the use of IR video in detecting apnea events. This has the advantages of using standard, low-cost hardware and being noninvasive. However, there are several major technical issues: the breathing motion is barely perceptible (due to obscuration by bed clothing and the subtlety of the breathing movements), and, being cyclical, the movements are prone to self-occlusion. Consequently, standard motion detection and activity recognition methods do not function well. An interesting alternative exists in modern low-cost 3-D cameras (e.g., Kinect, Xtion Pro, CamBoard nano, or Gesture Camera), which have been suggested for respiratory motion detection [19], [20]. However, our preliminary analysis indicates that the image signal from the Kinect is much more unstable than the IR image (see Fig. 1), making filtering of noise and detection of subtle breathing patterns more difficult. The use of a standard IR camera alone, which may already be available in sleep labs, also helps to reduce the complexity of the technology.

The standard motion detection methods include difference of frames (DOF) and optic flow. DOF can be formulated as

D(t) = |I(t) - I(t - k)|    (1)

where I(t) is the intensity image at frame/time t, k is the selected time interval, and D(t) is the frame difference. If k = 1, D(t) is the difference of consecutive frames. The difference is thresholded to produce a binary difference map

B(x, y, t) = \begin{cases} 1, & \text{if } D(x, y, t) > \alpha \\ 0, & \text{otherwise} \end{cases}    (2)

where α is a selected threshold. As breathing movement is so subtle, the value of α in (2) must be set to such a small value (e.g., α = 1) to detect differences that noise problems become excessive, particularly as IR sensors suffer from high noise levels [22].
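As a concrete illustration, the thresholded frame differencing of (1) and (2) amounts to the following minimal sketch (NumPy; the function and variable names are ours, not part of the original system):

import numpy as np

def dof_map(frame_t, frame_t_minus_k, alpha=1):
    """Binary difference map B(., ., t) per (1)-(2) for two grayscale frames."""
    # D(t) = |I(t) - I(t - k)|, evaluated per pixel
    d = np.abs(frame_t.astype(np.int16) - frame_t_minus_k.astype(np.int16))
    # B = 1 wherever the difference exceeds the threshold alpha
    return (d > alpha).astype(np.uint8)

With breathing-scale motion, α has to sit near the sensor noise floor, so the resulting map is dominated by IR noise; this is the failure mode the PLIM of Section II-A is designed to avoid.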


Optic flow tends to fail in regions with a largely homogeneous appearance [23], which is typical of bed clothing. Furthermore, objects that move in a straight line but oscillate forward and backward tend to have low salience [21]. These issues make optic flow unsuitable for our problem domain.

Activity recognition extracts a compact representation of spatiotemporal features and uses this to classify activities. A popular recent approach treats the video sequence as a 3-D space-time volume (of intensities, gradients, optical flow, or other local features). Efros et al. [24] perform action recognition by correlating optical flow measurements from low-resolution videos. Bobick and Davis [25] proposed a static vector image as a temporal template to represent human movement, where the vector value at each point is a function of the motion properties at the corresponding spatial location; they introduced the motion history image (MHI) and motion energy image (MEI), spatiotemporal models that can be matched to stored models of known actions. However, the technique is view sensitive, requiring the "shapes" of actions in the same category to be similar and the shapes of actions in different categories to be dissimilar. In our domain, there is little constraint on the subject's sleeping posture, and the shape of breathing varies. MEI and MHI are derived from DOF, and indeed Bobick and Davis [25] suggest that a more robust motion detection mechanism is required in situations where the test subject moves slowly. In addition, the MHI is vulnerable to spatial motion self-occlusion occurring within a temporal window, due to overwriting. Extensions to the MHI have been designed to handle self-occlusion [26], [27], but as these are based on DOF techniques, they remain unsuitable for our domain. Gorelick et al. [28] also use spatiotemporal volumes for action recognition, modeling human actions as silhouettes of a moving torso and protruding limbs undergoing articulated motion; these space-time shapes may be used to classify actions. Another popular technique is to track space-time interest points [29] to generate spatiotemporal "words" [30] (using the bag-of-words representation), and to classify these using probabilistic techniques [31], [32]. However, the lack of distinctive patterns on the bed cover makes this unsuitable for our domain.

This paper presents a new real-time IR video monitoring technique for detecting abnormal breathing activities. It extends our previous work published in short form in [6]. Here, we introduce improved models for both motion detection and activity recognition that are less sensitive to noise than our earlier approach. Our evaluation demonstrates that these achieve high accuracy in recognizing abnormal breathing events and body movements. The organization of the paper is as follows. The proposed algorithm is introduced in Section II. Section III shows the experimental results on 15 video sequences featuring simulated apnea episodes and four clinical clips featuring actual episodes. Section IV concludes the paper.

II. BREATHING DETECTION

This section presents a new IR video monitoring approach for anomalous breathing behavior detection. No positional constraint on the patient is imposed (other than by the orientation and position of the bed), allowing patients to sleep on their back or side, with or without facing the camera. The technique works with subjects either fully or partially obscured by a bed cover.



Fig. 2. Sample activity map generated using PLIM, illustrated across a breathing cycle. The subject’s breathing is detected around the edge of the subject and folds in the bed clothes. The activity level grows and falls in a cyclical manner through the breathing cycle.

Fig. 3. Activity level, e_t, may be used to detect events, with normal breathing (b, d), apnea events (e), and body movements (a, c) giving rise to significantly different levels of e_t.

The system monitors the degree of motion; while this remains below a threshold, the subject is undergoing normal breathing; when it exceeds the threshold, a motion event has occurred. The system learns a template to characterize normal breathing motion patterns online, and uses this to classify motion events as body movements, normal breathing episodes, deep breathing episodes, or apnea episodes.

A. Motion Detection for Breathing Analysis

To address the limitations of DOF methods with respect to the detection of breathing, we have developed a persistent luminous impression model (PLIM). PLIM is derived from the concept of background modeling, which updates a model of the background to discount transitory noise sources while allowing adaptation to long-term changes [33], [34]; however, the PLIM is tuned to detect subtle motion rather than to segment foreground objects. In background differencing, the background model is updated over time to avoid accumulated errors. In contrast, the PLIM is designed to accumulate errors to enhance the breathing signals and to differentiate between breathing activity and body movement. The PLIM incorporates slow adaptation, allowing pose changes to be accommodated while allowing cyclical movements to be detected; see Fig. 2. A simple measure of activity level can be extracted from the PLIM and used to identify motion events; see Fig. 3.

Given an M × N image and a frame rate of F frames/s, the PLIM is initialized using the image values

P(x, y, 0) = I(x, y, 0).    (3)

At time t, the PLIM is updated using

\Delta(x, y, t) = I(x, y, t) - P(x, y, t - 1)    (4)

P(x, y, t) = P(x, y, t - 1) + \begin{cases} 1, & \Delta(x, y, t) > 0 \\ 0, & \Delta(x, y, t) = 0 \\ -1, & \Delta(x, y, t) < 0. \end{cases}    (5)

The PLIM activity map A(x, y, t) is defined as

A(x, y, t) = \begin{cases} 1, & \text{if } I(x, y, t) - P(x, y, t) > \alpha \\ 0, & \text{otherwise} \end{cases}    (6)

where α is the detection threshold, a parameter of the model. During normal sleep, breathing causes subtle movements away from and back toward any arbitrarily chosen starting point. These are observable as a cyclical growth and decline of regions in the PLIM activity map; see Fig. 2. We define the activity level, e_t, as the number of detected pixels in the activity map A(x, y, t) computed using (6) at time t:

e_t = \sum_{(x, y)} A(x, y, t).    (7)
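A minimal NumPy sketch of one PLIM step, following (3)–(7) (the function and variable names are ours; P is initialized with the first frame as in (3)):

import numpy as np

def plim_step(P, frame, alpha):
    """One PLIM update: returns the new model P, the activity map A, and e_t."""
    delta = frame.astype(np.int16) - P.astype(np.int16)          # (4)
    P = np.clip(P.astype(np.int16) + np.sign(delta), 0, 255)     # (5): move P one grey level toward I
    P = P.astype(np.uint8)
    A = ((frame.astype(np.int16) - P.astype(np.int16)) > alpha).astype(np.uint8)  # (6)
    return P, A, int(A.sum())                                    # (7): e_t is the count of active pixels

Because P creeps toward the current frame by at most one grey level per frame, slow cyclical breathing motion keeps re-exposing pixels above α, while a static pose change is eventually absorbed by the model.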

B. State Algorithm for Action Segmentation

Normal breathing is (barely) perceptible in the e_t level; however, motion events manifest as significant perturbations; see Fig. 3. We segment motion events by identifying the start and end times, t_s and t_e, where e_t rises above and subsequently falls below thresholds described below. The sequences between motion events, where the activity level is very low, correspond to periods of normal breathing. We therefore use a two-state algorithm, which switches between the normal breathing state and the motion event state; a simplified sketch of this switching logic is given below. It is possible to classify motion events using only the duration and peak values from the corresponding section of the e_t time series, but this is insufficient to distinguish some movements (e.g., slight head movements) from apnea episodes. A more sophisticated approach using online breathing templates is introduced in the next section.

C. Templates for Normal Breathing Activity

The sleeping subject undergoes protracted periods of normal breathing, where there is only slight movement in particular areas (e.g., around the rib cage, shoulder, throat, mouth, or abdomen, depending on the posture and individual breathing behavior). By identifying where this movement occurs, it is possible to distinguish even quite subtle body movements from breathing, and to classify breathing actions as apnea, moderate deep breathing, or normal breathing episodes. We use a template-based method to capture the regions of movement corresponding to normal breathing.
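The two-state switching of Section II-B can be sketched as follows (illustrative only: here e_e and e_s are passed in as fixed numbers, whereas Section II-D defines them adaptively, and the names are ours):

def segment_motion_events(activity_levels, e_e, e_s, n=10):
    """Split an e_t time series into motion events; spans in between are normal breathing."""
    events, state, run, start = [], "breathing", 0, None
    for t, e in enumerate(activity_levels):
        if state == "breathing":
            run = run + 1 if e >= e_e else 0
            if run >= n // 2:                  # sustained rise: a motion event has started
                state, start, run = "event", t - run + 1, 0
        else:
            run = run + 1 if e <= e_s else 0
            if run >= n:                       # sustained fall: back to normal breathing
                events.append((start, t - run + 1))
                state, run = "breathing", 0
    return events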



Fig. 5. Switching algorithm state. Assuming the algorithm has switched to the normal breathing state at t_s^1, the rise of e_t above the threshold e_e at time t_e, sustained for n/2 steps up to t_match, switches the state to a motion event and triggers template matching at t_match. The fall of e_t below the threshold e_s for n steps triggers the switch back to normal breathing at t_s^2.

Fig. 4. Sample sequence illustrating adaptive online breathing template construction and activity recognition. (1) Initially, a normal breathing template is constructed; (2) an apnea episode occurs and is identified using the template-matching model; (3) adaptation of the existing breathing template continues; (4) a body movement is detected based on the template matching result (template mismatch); (5) a normal breathing template is reconstructed by the proposed adaptive online breathing template construction model; and (6) another apnea episode is detected using the template-matching model with the newly constructed template.

However, as the subject tends to change body pose periodically during sleep, the breathing motion region changes over time too; the template model is therefore reconstructed after body pose changes. The method is related to the MHI, described previously, but given the partial, noisy, and occasional signals, we use a simple binary template augmented with an online construction algorithm to produce a self-adapting normal breathing template based on the individual's breathing behavior. A blank template is initially created, and when the state switches to normal breathing, the adaptive construction proceeds until the algorithm switches to the motion event state. If the motion event is classified as a breathing event, which implies that the body pose remains the same, the previous template is retained and used when the state changes back to normal breathing. On the other hand, if the motion event is a body movement, a new template is created to capture breathing activity in the new pose. Fig. 4 illustrates the adaptive online template construction process and the recognition of events in a particular scenario.

The template construction algorithm needs to capture intermittent and limited breathing motion signals while discarding noise. To suppress noise, signals are included in the template only if they appear at least twice within a certain period, and are retained if they repeat reasonably often. Signals that stop repeating are usually discarded, except when the number of signals on the template is low, in which case they are retained, as it is then more important to accumulate data than to avoid noise. The binary template T_t is updated at each time step t using an auxiliary integer-valued cumulative image T_t^g with values in the range [0, 255]. Defining the template quality level q_t = \sum_{x,y} T_t as the number of set pixels in the template, each pixel of T_t is updated as follows:

T_t^g = \begin{cases} 255, & \text{if } A_t = 1 \text{ and } T_{t-1}^g > 0 \\ \delta, & \text{if } A_t = 1 \text{ and } T_{t-1}^g = 0 \\ T_{t-1}^g - \epsilon, & \text{if } A_t = 0 \text{ and } q_{t-1} > \lambda \text{ and } T_{t-1}^g > 0 \\ 0, & \text{otherwise} \end{cases}    (8)

T_t(x, y) = \begin{cases} 1, & \text{if } T_t^g(x, y) > \delta \\ 0, & \text{otherwise} \end{cases}    (9)

where δ = 100, ε = 4, and λ = 0.0012 WH are empirically determined parameters of the algorithm, and we have omitted the pixel indices (x, y) for brevity. The template quality threshold λ is also ultimately used to determine whether the template contains sufficient information to be used for motion event classification.
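A direct transcription of the template update (8)–(9) (a sketch under our own naming; A is the current activity map, Tg the cumulative image, and lam_pixels = 0.0012·W·H):

import numpy as np

DELTA, EPS = 100, 4   # delta and epsilon from (8)-(9)

def update_template(Tg, A, lam_pixels):
    """One update of the cumulative image Tg and binary template T, per (8)-(9)."""
    q_prev = int((Tg > DELTA).sum())                       # q_{t-1}: set pixels of the previous template
    Tg_new = np.zeros_like(Tg)                             # the "otherwise" case of (8)
    Tg_new[(A == 1) & (Tg > 0)] = 255                      # repeated signal: saturate
    Tg_new[(A == 1) & (Tg == 0)] = DELTA                   # first appearance: provisional value
    fade = (A == 0) & (Tg > 0) & (q_prev > lam_pixels)     # decay only once the template is rich enough
    Tg_new[fade] = np.clip(Tg[fade].astype(np.int16) - EPS, 0, 255).astype(Tg.dtype)
    T = (Tg_new > DELTA).astype(np.uint8)                  # (9)
    return Tg_new, T

Pixels that activate only once decay back to zero, so isolated noise never enters the binary template, while pixels that keep re-activating during breathing are latched at 255.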

D. State Transition Rules

The start point t_s of a normal breathing cycle is triggered when the activity level, e_t, drops below the selection threshold λ for n = 10 time steps. The end point t_e of the normal breathing cycle is triggered when the activity level rises above the adaptive threshold e_e, given by

e_e = \max(q(t^*)\nu, \lambda)    (10)

where ν = 1.3 is determined empirically and t^* is a periodic time sample taken on every mth frame (m = 40 for 15 frames/s video clips). Hence, the activity level must rise by 130% within a short period ≤ m to indicate the end of a normal breathing episode. The template is compared with the activity map at time t_match, after t_e, where the activity level has risen above the adaptive threshold e_e for n/2 time steps; see Fig. 5. At frame 0, when no template has been built, e_e is temporarily set as e_e = λν², which is defined empirically and soon replaced by values generated from the individual breathing pattern. The end time t_e that terminates the current template construction, the template matching time t_match, and the starting time t_s^new at which to build the next normal breathing template are defined below in terms of e_e and e_s, with a stabilizing factor n for the switching status, which is defined empirically (n = 10):

t_e = \arg\min_{t \in T_1} t    (11)

where T_1 = \{t : e_t \geq e_e \text{ and } t > t_s^{old}\}, and t_s^{old} is set to 0 in the beginning,

t_{match} = \arg\min_{t \in T_2} \left| \sum_{l=t_e}^{t} 1 - n/2 \right|    (12)

where T_2 = \{t : e_t \geq e_e \text{ and } t > t_e\},

t_s^{new} = \arg\min_{t \in T_3} \left| n - \sum_{l=t_{match}}^{t} 1 \right|    (13)

where T_3 = \{t : e_t \leq e_s \text{ and } t > t_{match}\}.

E. Action Recognition by Template Matching

The motion events are classified using one of two techniques, based on the breathing template if it is usable (i.e., q_t ≥ λ), or on the activity level otherwise. The activity map is compared with the template using a normalized matching score, s = w_2/w_1, where w_1 is the proportion of the template intersecting the activity map, and w_2 the proportion of the activity map not intersecting the template:

w_1 = \frac{\sum (T \wedge A)}{\sum T}    (14)

w_2 = \frac{\sum (\neg T \wedge A)}{\sum A}    (15)

s = w_2 / w_1.    (16)

The action is classified using two empirically defined thresholds, γ_1 = 0.03 and γ_2 = 0.004:

action = \begin{cases} o_1, & \text{if } s \geq \gamma_1 \\ o_2, & \text{if } \gamma_2 \leq s < \gamma_1 \\ o_3, & \text{if } s < \gamma_2 \end{cases}    (17)

where o_1 is a body movement event, o_2 an apnea event, and o_3 a deep breathing event. The matching score is a measure of the degree of novelty of the action with respect to the template, normalized for both the activity map and the template size; the time complexity is O(p), where p ≥ q.

F. Simple Action Recognition Model

When the breathing template is insufficient to support matching (e.g., due to shallow breathing), q_t < λ, we instead classify the motion events using the duration d and the activity level value e_{t_m}:

d = t_e - t_s    (18)

action = \begin{cases} o_1, & \text{if } e_{t_m} \geq \theta_m \text{ or } d \geq \theta_d \\ o_2, & \text{if } d \geq \theta_d/2 \\ o_3, & \text{otherwise} \end{cases}    (19)

where the thresholds are θ_m = κWH and θ_d = βF, with κ = 0.26 and β = 4.6 defined empirically.
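Both decision rules, and the dispatch between them, fit in a short sketch (our own names; T and A are the binary template and activity map, W, H, F the frame width, height, and frame rate, and the activity-level value e_{t_m} is passed in as peak_activity):

import numpy as np

def classify_by_template(T, A, gamma1=0.03, gamma2=0.004):
    """Template-based classification of a motion event, per (14)-(17)."""
    w1 = np.logical_and(T, A).sum() / max(int(T.sum()), 1)                  # (14)
    w2 = np.logical_and(np.logical_not(T), A).sum() / max(int(A.sum()), 1)  # (15)
    s = w2 / max(w1, 1e-9)                                                  # (16)
    if s >= gamma1:
        return "o1: body movement"
    if s >= gamma2:
        return "o2: apnea"
    return "o3: deep breathing"

def classify_simple(duration, peak_activity, W, H, F, kappa=0.26, beta=4.6):
    """Fallback classification from duration and activity level, per (18)-(19)."""
    theta_m = kappa * W * H          # activity-level threshold
    theta_d = beta * F               # duration threshold, in frames
    if peak_activity >= theta_m or duration >= theta_d:
        return "o1: body movement"
    if duration >= theta_d / 2:
        return "o2: apnea"
    return "o3: deep breathing"

def classify_event(T, A, q_t, duration, peak_activity, W, H, F, lam=0.0012):
    """Use template matching when the template is usable (q_t >= lam*W*H), else the simple model."""
    if q_t >= lam * W * H:
        return classify_by_template(T, A)
    return classify_simple(duration, peak_activity, W, H, F)

A large s means that most of the observed activity falls outside the learned breathing region, which is what separates a genuine body movement from the exaggerated but co-located activity of an overbreathing episode.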

G. Adjustable Parameters

TABLE I. EFFECTIVE VALUES OF MODEL PARAMETERS

The algorithm has a number of adjustable parameters. An initial set of effective parameter values was heuristically determined; then, each parameter was experimentally varied in turn. The operating values were determined using three video clips from the simulated dataset, which contain various events including overbreathing and body movement events. Where parameters produced effective performance over a range of values, the most effective value was chosen for each parameter. Table I illustrates the range of effective parameter values. The reported results utilize the values (α = 10, λ = 0.0012 WH, γ_1 = 0.03, γ_2 = 0.004, ν = 1.3, n = 10, κ = 0.26, β = 4.6). The algorithm is not very sensitive to the settings of most of these parameters provided they are within the effective range; we discuss the more sensitive parameters below.

The front-end motion detector parameter α influences the motion detection results. When α is small (e.g., α = 6), more motion is captured, as is noise; when α is too high, all motion is filtered out. As a result, the selection of α is important and can influence the settings of other parameters such as λ. An effective range of 8–10 was identified, and a large value (α = 10) was chosen to filter out high IR noise. Another important and relatively sensitive parameter, λ, determines whether to use the template-matching method or the simple action recognition model. A range of λ values was tested (0.00105–0.00165), and a low value (λ = 0.00117) was selected in order to utilize the template as often as possible. Other parameters (γ_1, γ_2, ν, n, κ, β) are set using the mean of the effective range.
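The operating point reported above can be summarized in a single configuration object (an illustrative summary, not code from the paper; λ and κ are fractions of the frame area W·H, and β is in seconds):

PARAMS = {
    "alpha":  10,      # PLIM detection threshold (grey levels), effective range 8-10
    "lam":    0.0012,  # template quality threshold, as a fraction of W*H
    "gamma1": 0.03,    # matching-score threshold separating body movement from apnea
    "gamma2": 0.004,   # matching-score threshold separating apnea from deep breathing
    "nu":     1.3,     # rise factor for the adaptive end-of-breathing threshold e_e
    "n":      10,      # stabilizing factor for state switching (time steps)
    "kappa":  0.26,    # activity threshold theta_m, as a fraction of W*H
    "beta":   4.6,     # duration threshold theta_d, in seconds (theta_d = beta * F frames)
}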



The same parameter values were used successfully on two separate datasets, which have significantly different environmental settings (including illumination, camera viewpoint and angle, camera distance to the subject, and bed and clothing configuration), which indicates that the proposed method is robust.

III. EXPERIMENTS

We have evaluated the new technique in the identification of normal breathing, apnea events, and body movement events. The apnea events are identified via the overbreathing event that occurs at the end of every apnea episode. Body movement is also used as an indicator of waking up by clinicians: if a body movement directly follows an overbreathing event, it supplies additional evidence of an apnea episode. Consequently, the evaluation of the proposed technique is based on detection of overbreathing events and body movement events.

Two datasets are used in our evaluation: the simulation dataset and the clinical dataset, which use different models of camera, in different settings, and with different camera positions with respect to the subject. The simulation dataset (15 video clips) features actors simulating a wide variety of motion and body movement events. This allows us to evaluate a range of scenarios with various occlusion levels, body poses, body movements (i.e., minor head movement, limb movement, body rotation, and slight torso movement), breathing behavior (e.g., shallow versus heavy breathing, mouth breathing, chest breathing, and abdominal breathing), and sequences of linked events (i.e., apnea–body movement and body movement–apnea). Two Sony IR camcorders (DCRHC-30E) were utilized, with three different shooting angles, at 15 frames/s and a resolution of 320 × 240. In order to simulate the sleep-lab environment, there was no visible lighting in the filming room and the subjects were partially covered by a sheet. The experimental data were collected from two subjects with three main postures (i.e., lying on the back, lying on one side facing the camera, and lying on the other side with their back to the camera). The data were collected on different days, from multiple camera positions, with the subjects wearing different clothing. The activities, such as normal breathing, obstructive apnea, and body movement, were simulated by the subjects. Furthermore, one of the subjects has shallow breathing patterns. To produce a reference standard, the experimental video contents were manually marked by a human observer who defined all motion events except for deep breathing events, including the frame numbers of the beginning and end of each event. Deep breathing activity is marked as normal breathing.

The clinical evaluation system is installed in the sleep lab of the Lincoln County Hospital. The video system contains three IR cameras: two wall-mounted cameras, one on each side of the bed, targeting the upper body of the patient from different angles, and one on the ceiling capturing the full body view. In these experiments, the wall-mounted cameras were used. Three symptomatic subjects (one severe and two moderate) and six nonsymptomatic subjects were recruited to spend one night sleeping in the sleep lab for 8 h of video recording.

Fig. 6. Experimental results of the simulated data: action classification outcome and reference standard.

For the symptomatic data, five video clips are randomly sampled from the 8 h recording of the severe OSA patient, and four video clips are randomly sampled from the moderate OSA sufferers (two from each). Each clip lasts 15 min and contains 22500 frames. Six video clips are randomly sampled from the nonsymptomatic recordings, one from each subject. To produce a reference standard, the data were manually marked by the author, who was trained by the medical experts from the Lincoln County Hospital to identify apnea episodes.

The output of the algorithm is a list of apnea and body movement episodes, each with the associated beginning and end frame numbers. These episodes are compared to the reference standard. We define an event to be correctly recognized if the majority of frames (>85%) covered by the estimated event have the correct labeling.

Fig. 6 illustrates the classification process on the simulation dataset. Fig. 7 shows the quantitative classification results in the form of confusion matrices [24], [35] for both datasets; the rows represent the reference standard, the columns the algorithm's results. On the simulation dataset, the diagonal average of the confusion matrix is 95.5%, demonstrating that the method achieves high accuracy in recognizing apnea episodes and body movements.



TABLE II VAHI VALUES

Fig. 7. Confusion matrix of action classification on (a) Simulated data. (b) Clinical data.

Fig. 8. Template matching examples. Each row contains the raw image I, activity map A, online constructed action template T , matching result at a given time step (yellow: ∼ T ∧ A; blue: T ∧ ∼ A; white: T ∧ A). The upper three rows are apnea episodes, showing relatively low matching levels; the lower three rows are body movement episodes.

We observe that the method misses apnea episodes occurring directly after a body movement episode, as shown in video clip 8, as it segments temporally contiguous episodes as one. In practice, this scenario is implausible, as apnea does not occur when the patient is awake, and body movement indicates waking up. On the other hand, if minor body movements happen right after an apnea episode (e.g., in video clip 11, and frequently in the clinical data), the method classifies the entire event as an apnea episode. In such cases, the human observer also defines the entire session as an apnea episode. For the clinical dataset, the diagonal average is 94%, demonstrating high accuracy in recognizing apnea and body movement episodes on real clinical data. It is worth noting that the nonsymptomatic patients may experience some apnea episodes (a normal occurrence), and some such episodes were identified. Some template matching outputs for the clinical dataset are shown in Fig. 8.

a) Classification of symptomatic and nonsymptomatic subjects: The apnea–hypopnea index is generally used for evaluation of the severity of OSA in PSG studies, and is calculated as the average number of apneas (airflow during breath reduced by >90%) plus hypopneas (airflow during breath reduced by between 50% and 90%) per hour of sleep. It is normal for nonsymptomatic subjects to have a few apnea episodes during sleep, and generally the pulse oximetry traces of nonsymptomatic subjects also show a small number of oxygen desaturation episodes (ODI
