

IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 52, NO. 4, APRIL 2005

Automated Extraction of Temporal Motor Activity Signals From Video Recordings of Neonatal Seizures Based on Adaptive Block Matching

Nicolaos B. Karayiannis*, Senior Member, IEEE, Abdul Sami, James D. Frost, Jr., Merrill S. Wise, and Eli M. Mizrahi

Abstract—This paper presents an automated procedure developed to extract quantitative information from video recordings of neonatal seizures in the form of motor activity signals. This procedure relies on optical flow computation to select anatomical sites located on the infants’ body parts. Motor activity signals are extracted by tracking selected anatomical sites during the seizure using adaptive block matching. A block of pixels is tracked throughout a sequence of frames by searching for the most similar block of pixels in subsequent frames; this search is facilitated by employing various update strategies to account for the changing appearance of the block. The proposed procedure is used to extract temporal motor activity signals from video recordings of neonatal seizures and other events not associated with seizures. Index Terms—Adaptive block matching, block matching, motion tracking, motor activity signal, neonatal seizure, optical flow, segmentation, video recording.

I. INTRODUCTION

Seizure occurrence represents one of the most frequent clinical signs of central nervous system dysfunction in the newborn [6], [21], [30]. These disturbances of cerebral function are often associated with significant long-term sequelae such as neurological impairment, developmental delay, and postnatal epilepsy [4]–[6], [19], [22], [24]. Thus, prompt recognition of neonatal seizures by nursery personnel is very important with regard to diagnosis and management of underlying neurological problems. The development of portable electroencephalographic (EEG)/video/polygraphic monitoring techniques has allowed investigators to assess and characterize neonatal seizures at the bedside and has permitted retrospective review [10], [24], [26]. Automated processing and analysis of video recordings of neonatal seizures can generate novel methods for extracting

Manuscript received September 5, 2003; revised September 26, 2004. This work was supported in part by the National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health (NIH), under Grant 1 R01 EB00183 and in part by the National Institute of Neurological Disorders and Stroke, NIH, under Contract N01-NS-2316. Asterisk indicates corresponding author. *N. B. Karayiannis is with the Department of Electrical and Computer Engineering, University of Houston, 4800 Calhoun, Houston, TX 77204-4005 USA (e-mail: [email protected]). A. Sami is with the Department of Electrical and Computer Engineering, University of Houston, Houston, TX 77204-4005 USA. J. D. Frost, Jr. is with the Peter Kellaway Section of Neurophysiology, Department of Neurology, Baylor College of Medicine, Houston, TX 77030 USA. M. S. Wise and E. M. Mizrahi are with the Peter Kellaway Section of Neurophysiology, Department of Neurology, and the Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030 USA. Digital Object Identifier 10.1109/TBME.2005.845154

quantitative information that is relevant only to the seizure. A video system based on automated analysis potentially offers several advantages. Infants who are at risk for seizures could be monitored continuously using relatively inexpensive and noninvasive video techniques that supplement direct observation by nursery personnel. This would represent a major advance in seizure surveillance and offer the possibility for earlier identification of potential neurological problems and subsequent intervention. Video recordings of neonatal seizures are highly redundant because infants may not move excessively in their beds while seizures affect specific parts of their bodies. Thus, the extraction of quantitative motion information from videotaped seizures must focus only on the moving part of the infant’s body that is affected by the seizure. This can be accomplished by two complementary procedures designed to extract temporal motion strength and motor activity signals from video recordings of neonatal seizures [12]–[15]. In principle, motor activity signals are obtained by projecting to the horizontal and vertical axes an anatomical site located on the body part affected by the seizure. The automated extraction of motor activity signals from video recordings of neonatal seizures requires a procedure that can select anatomical sites located on moving body parts and a procedure that can track the anatomical site of interest throughout the frame sequence. Motion tracking was performed in an earlier study [13] by the KLT algorithm [28], [29]. The KLT algorithm was generally successful when used to extract motor activity signals from video recording of neonatal seizures [13]. However, in some cases, the algorithm lost anatomical sites that were located on moving body parts tracked throughout the frame sequence. This paper presents the results of a study aimed at the automated extraction of quantitative motion information from video recordings of neonatal seizures. 
This study relied on optical flow computation and morphological filtering to select the initial location of anatomical sites located on the infants’ moving body parts. Motor activity signals were extracted in this study by relying on adaptive block matching to track the anatomical site(s) of interest throughout the entire frame sequence.

II. EXTRACTION OF TEMPORAL MOTOR ACTIVITY SIGNALS

The long-term goal of the project described in this paper is the development of an automated system for the recognition and characterization of neonatal seizures based on their video recordings. The system under development would involve

0018-9294/$20.00 © 2005 IEEE


a single video camera for each bed in the neonatal intensive care unit. Each video camera would be connected to a dedicated PC or a server for data processing and analysis. In accordance with the structure of the seizure recognition system under development, the motion tracking method proposed in this paper was developed for video recordings acquired by a single camera mounted above the infant’s bed. The exact distance between the camera and the bed varied depending on the individual patient’s body size, but it was within the range of 0.9–1.5 m. The infant was positioned at the center of the bed just below the camera.

The extraction of quantitative information from videotaped seizures must focus only on the moving parts of the infant’s body that are relevant to the seizure [13]. Neonatal seizures are events occurring in three dimensions, but they can be quantified by sequences of 2-D projections. More specifically, neonatal seizures can be quantified by projecting the location of selected anatomical sites to the horizontal and vertical axes. As the seizure progresses in time, these projections will produce temporal signals recording motor activity of the body parts of interest. Fig. 1 illustrates the mechanism that was used for generating temporal motor activity signals tracking the movements of different parts of the infant’s body during focal clonic and myoclonic seizures. Myoclonic and focal clonic seizures affect the infants’ extremities [13], [24]. Fig. 1 depicts a single frame containing the sketch of an infant’s body with four selected anatomical sites. In this particular configuration, the site located on the left leg is represented by a pair of projections to the horizontal and vertical axes, respectively. The sites located on the right leg, left hand, and right hand are represented by analogous pairs of projections.
As the infant moves its extremities, the locations of the sites in the frame will change, as will the projections of the sites to the horizontal and vertical axes. Recording the values of the projections from frame to frame of the videotaped seizure will generate four pairs of temporal signals, namely one pair of horizontal and vertical signals for each of the left leg, the right leg, the left hand, and the right hand. For a given set of anatomical sites, each seizure will produce signature signals depending on its type and location.

III. AUTOMATED SELECTION OF ANATOMICAL SITES LOCATED ON MOVING BODY PARTS

The automated extraction of quantitative motion information from video recordings of neonatal seizures requires a procedure for selecting anatomical sites located on moving body parts. Such a procedure can be developed by estimating the motion velocities of the body parts and placing the anatomical site of interest on the moving body part with the highest velocity. In this study, the selection of anatomical sites of interest relied on optical flow computation, which was used to estimate the motion velocity fields at the frames of the video recordings.

A. Quantifying Motion in Video by Optical Flow Methods

Optical flow is the term used to indicate the velocity field generated by the relative motion between an object and the camera in a sequence of frames [9]. Optical flow provides important


Fig. 1. Extraction of temporal motor activity signals by projecting four selected anatomical sites to the horizontal and vertical axes.

information for analyzing motion in video. In the absence of any additional assumptions about the nature of motion, optical flow computation based on two successive frames is an ill-posed problem. A problem is called ill-posed if its solution is not unique and/or if its solution does not depend continuously on the data [1], [16]. Optical flow computation recently provided the basis for extracting quantitative motion information from video recordings of neonatal seizures [14], [15]. This approach to optical flow computation relied on elements of regularization theory and produced a methodology for the development of regularized optical flow computation methods based on a broad variety of smoothness constraints [15]. This methodology was used to produce new regularized optical flow methods by employing a smoothness constraint relying on spline functionals of the first and second order. For first-order spline functionals, this methodology leads to the optical flow computation method proposed by Horn and Schunck [9]. The Horn-Schunck method was employed in this work for selecting anatomical sites located on moving body parts.

B. Selecting Anatomical Sites Based on Motion Velocity Fields

The velocity fields produced by optical flow computation provide an estimate of motion in a sequence of frames. The areas of the video frames that contain moving body parts were segmented in this study by thresholding the magnitudes of the velocity vectors. This process produced velocity patches, which are composed of the pixels with motion velocities above a certain threshold. The frames produced by this segmentation process still contained spurious patches (i.e., small groups of pixels assigned high-velocity values by optical flow computation). The reduction of such spurious patches was attempted by relying on mathematical morphology [8], [20]. More specifically, spurious patches were reduced by employing the OPENING and CLOSING morphological operations.
The OPENING of an object $A$ by a structuring element $B$, denoted as $A \circ B$, is the erosion of $A$ by $B$ followed by the dilation of the result by $B$. In mathematical terms, $A \circ B = (A \ominus B) \oplus B$, where $A \ominus B$ and $A \oplus B$ denote the erosion and dilation of the object $A$ by the structuring element $B$, respectively. The CLOSING of an object $A$ by a structuring element $B$, denoted as $A \bullet B$, is the dilation of $A$ by $B$ followed by the erosion of the result by $B$. In mathematical terms, $A \bullet B = (A \oplus B) \ominus B$.
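The segmentation step can be illustrated with a minimal sketch, assuming the velocity magnitudes are already available as a 2-D list. The function names, the threshold `tau`, and the treatment of pixels outside the frame as background are assumptions made for illustration. The structuring element is a discrete circle inscribed in a 5 × 5 window.

```python
from itertools import product


def disc_offsets(radius=2):
    """Offsets of a discrete circle inscribed in a (2r+1) x (2r+1) window."""
    r = radius
    return [(dy, dx) for dy, dx in product(range(-r, r + 1), repeat=2)
            if dy * dy + dx * dx <= r * r]


def threshold(speed, tau):
    """Binary mask of the pixels whose velocity magnitude exceeds tau."""
    return [[1 if v > tau else 0 for v in row] for row in speed]


def erode(mask, offsets):
    """Erosion; pixels outside the frame count as background (assumption)."""
    h, w = len(mask), len(mask[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                     for dy, dx in offsets))
             for x in range(w)] for y in range(h)]


def dilate(mask, offsets):
    """Dilation by a symmetric structuring element."""
    h, w = len(mask), len(mask[0])
    return [[int(any(0 <= y - dy < h and 0 <= x - dx < w and mask[y - dy][x - dx]
                     for dy, dx in offsets))
             for x in range(w)] for y in range(h)]


def opening(mask, offsets):
    """Erosion followed by dilation: removes small spurious patches."""
    return dilate(erode(mask, offsets), offsets)


def closing(mask, offsets):
    """Dilation followed by erosion: fills small holes and gaps."""
    return erode(dilate(mask, offsets), offsets)


def segment_moving_parts(speed, tau, radius=2):
    """Threshold the velocity magnitudes, then apply OPENING and CLOSING."""
    offsets = disc_offsets(radius)
    return closing(opening(threshold(speed, tau), offsets), offsets)
```

In this sketch an isolated high-velocity pixel is removed by the opening, while a patch large enough to contain the structuring element survives with rounded corners.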



OPENING smoothes the contour of an object and breaks narrow bridges. CLOSING also smoothes the contour of an object. However, in contrast with OPENING, CLOSING fuses narrow breaks, eliminates small holes, and fills gaps in the contour. In this study, each segmented frame was processed first by the OPENING morphological operator and then by the CLOSING morphological operator. Both operators employed the same structuring element, which was selected to be a circle within a 5 × 5 square window.

The initial location of the anatomical site selected for tracking was determined as 1) the center of the velocity patch with the largest area, and 2) the center of the velocity patch with the maximum average velocity. However, the tracking algorithm occasionally failed to track the anatomical sites selected according to either of these schemes; this is because these schemes tend to place the initial anatomical site in homogeneous areas of the moving body parts. The experiments indicated that tracking a site lying in a homogeneous area of the moving body part is a much more challenging task than tracking a site located in an area rich in texture, such as an area closer to the edges. Thus, the selection of an anatomical site for tracking can benefit from the results of a texture analysis performed for the region around the initial site location. In this study, texture analysis was performed for a window of size 5 × 5 centered at the initial site location. Texture analysis relied on the concept of the entropy, which was defined in terms of the co-occurrence matrix as described below [17].

Let $f(x, y)$ be the intensity level at the location $(x, y)$ of the frame, and let $(\Delta x, \Delta y)$ be the displacement between the two locations compared. If the intensities of the pixels take values between 0 and $L - 1$, then the $(i, j)$ entry $c(i, j)$ of the co-occurrence matrix is defined as the number of pixels for which $f(x, y) = i$ and $f(x + \Delta x, y + \Delta y) = j$, with $0 \le i, j \le L - 1$. The co-occurrence matrix is typically computed by selecting a small displacement. The entropy is obtained for each pixel of interest by computing the co-occurrence matrix over a processing window of size $W \times W$. The entropy at a pixel of interest is defined as

$$H = -\sum_{i=0}^{L-1} \sum_{j=0}^{L-1} p(i, j) \log p(i, j) \tag{1}$$

where $p(i, j) = c(i, j) / \sum_{m=0}^{L-1} \sum_{n=0}^{L-1} c(m, n)$. According to its definition, the entropy takes its lowest values when computed in homogeneous areas of the frame. The entropy in (1) attains its highest values when computed in textured areas of the frame. In this study, the size of the processing window was 15 × 15. The feature selected for tracking was located at the point of the 5 × 5 window corresponding to the largest entropy value.
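The entropy-based site refinement can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the displacement of one pixel to the right (`dy=0, dx=1`), the function names, and the tie-breaking rule are all hypothetical choices. The window sizes follow the text, with a 15 × 15 processing window (`half=7`) and a 5 × 5 candidate window (`search_half=2`).

```python
import math
from collections import Counter


def cooccurrence_entropy(frame, cy, cx, half=7, dy=0, dx=1):
    """Entropy of the co-occurrence matrix over a (2*half+1)^2 window.

    The displacement (dy, dx) = (0, 1) is an assumed default.
    """
    h, w = len(frame), len(frame[0])
    counts = Counter()
    for y in range(max(0, cy - half), min(h, cy + half + 1)):
        for x in range(max(0, cx - half), min(w, cx + half + 1)):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                # Count the pair (intensity here, intensity at displaced pixel).
                counts[(frame[y][x], frame[y2][x2])] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def most_textured_site(frame, cy, cx, search_half=2, half=7):
    """Return the (y, x) point of the candidate window with the largest entropy.

    Ties are broken in favor of the larger coordinates (arbitrary choice).
    """
    h, w = len(frame), len(frame[0])
    candidates = [
        (cooccurrence_entropy(frame, y, x, half), (y, x))
        for y in range(max(0, cy - search_half), min(h, cy + search_half + 1))
        for x in range(max(0, cx - search_half), min(w, cx + search_half + 1))
    ]
    return max(candidates)[1]
```

A site placed near the boundary between a flat region and a textured region is moved toward the textured side, which is the behavior the texture analysis is meant to produce.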

Fig. 2. Block matching illustration: Motion of a block of pixels is estimated by searching for the most similar block of pixels in a search window at subsequent frames.
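The search illustrated in Fig. 2 can be sketched as an exhaustive comparison of the reference block with every candidate block in a search window, using the mean square error as the similarity measure. The function names are hypothetical, and `search_half` (the maximum displacement searched in each direction) is an assumed parameterization of the search window.

```python
def block_at(frame, cy, cx, half):
    """Square block of size (2*half+1) centered at (cy, cx)."""
    return [row[cx - half:cx + half + 1] for row in frame[cy - half:cy + half + 1]]


def mse(a, b):
    """Mean square error between two blocks of equal size."""
    n = len(a) * len(a[0])
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n


def match_block(frame, reference, cy, cx, search_half):
    """Exhaustive search around (cy, cx); returns (error, y, x) of the best match."""
    half = len(reference) // 2
    h, w = len(frame), len(frame[0])
    best = None
    # Candidate centers are clipped so that every block lies inside the frame.
    for y in range(max(half, cy - search_half), min(h - half, cy + search_half + 1)):
        for x in range(max(half, cx - search_half), min(w - half, cx + search_half + 1)):
            err = mse(block_at(frame, y, x, half), reference)
            if best is None or err < best[0]:
                best = (err, y, x)
    return best
```

The cost of this search grows with the area of the search window, which is consistent with the trade-off between search window size and computational effort.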

C. Selecting Anatomical Sites for Multiple Moving Body Parts

The proposed procedure was developed to perform tracking of multiple anatomical sites located on moving body parts. This tracking is necessary because neonatal seizures are frequently associated with motion of multiple extremities. Optical flow computation consistently identified anatomical sites located on moving body parts. However, in some cases, different velocity patches produced by optical flow computation were actually located on the same body part, which, thus, resulted in redundant tracking. This problem was overcome by systematically eliminating overlapping velocity patches by a labeling procedure identified in all subsequent frames, so that only patches representing anatomical sites located on different body parts were retained.

IV. EXTRACTION OF MOTOR ACTIVITY SIGNALS FROM VIDEO BASED ON ADAPTIVE BLOCK MATCHING

This section presents a brief review of block matching and outlines the adaptive block matching methods employed in this study to track selected anatomical sites throughout the entire frame sequence.

A. Motion Tracking Based on Block Matching

Block matching is a popular correlation-based approach to motion estimation [11] and tracking [25], [27]. Block matching relies on the assumption that a block of pixels remains constant over time and motion [11]. This assumption is valid only if the frame rate is sufficiently high. The site to be tracked is typically defined as the center of a square block of pixels (e.g., 11 × 11, 15 × 15), which is termed the reference block. The reference block is tracked by searching for the most similar block in subsequent frames according to some similarity measure (see Fig. 2). The search is typically constrained to a search window, which has to be appropriately chosen. A large search window allows accurate tracking of rapid movements that could be lost if the search window was smaller. However, a very large window also increases the likelihood of a mismatch. Moreover, increasing the size of the search window increases considerably the computational effort associated with block matching.

B. Adaptive Block Matching Methods

The reference block, to which the block of pixels is matched, has to be updated to take into consideration the changes in the appearance of the target. Such an extension of block matching is often referred to as adaptive block matching. According to the adaptive block matching method, the reference block is tracked by searching for the most similar block in subsequent frames according to some similarity measure, such as the mean square error used in this study. Tracking of the reference block requires the adoption of a strategy to be used for updating the reference


block throughout the frame sequence. The update strategies employed in this experimental study are described as follows [25].

1) Single-Frame Update Strategy: The single-frame strategy is the simplest update strategy that can be employed for adaptive block matching. According to this strategy, the reference block is replaced after every $N$ frames by the block of pixels at the current tracking position. If $N = 1$, the reference block is updated after every frame. On the other hand, the reference block is never replaced if $N = \infty$. If the reference block is updated too often, then the site tracked may be lost because of the accumulation of errors due to camera jitter. If the reference block is not updated often enough, the site tracked may be lost again because the site tracked can change with time to a degree that no good match exists between the site and the reference block.

2) Multiframe Update Strategy: The multiframe update strategy searches for the best match in the current frame (say the $k$th frame) based on $M$ reference blocks, with $M > 1$. The reference blocks employed by this update strategy are the best matching blocks found in the $M$ previous frames. Let $\mathbf{b}$ be a candidate block of pixels in the current frame, and let $\mathbf{b}_{k-i}$ be the best matching block in the $(k-i)$th frame. According to the multiframe update strategy, the best match in the current frame is determined based on the following similarity measure:

$$D(\mathbf{b}) = \sum_{i=1}^{M} w_i \, d(\mathbf{b}, \mathbf{b}_{k-i}) \tag{2}$$

where $d(\mathbf{b}, \mathbf{b}_{k-i})$ measures the similarity between $\mathbf{b}$ and $\mathbf{b}_{k-i}$ and $w_i$ are real weights. The weights can be determined to ensure that the search for the best match in the current frame is influenced more intensely by the best matching blocks found in the most recent frames. Such a scheme can be realized by computing weights that decrease as $i$ increases and by normalizing them so that

$$\sum_{i=1}^{M} w_i = 1. \tag{3}$$

3) Finite Impulse Response (FIR) Filtering Update Strategy: The FIR filtering update strategy searches for the best match in the current frame based on a reference block obtained as a linear combination of the best matching blocks in the $M$ previous frames. According to this update strategy, the reference block for the $k$th frame is obtained as

$$\mathbf{r}_k = \sum_{i=1}^{M} a_i \, \mathbf{b}_{k-i} \tag{4}$$

where $\mathbf{b}_{k-i}$ is the best matching block in the $(k-i)$th frame and $a_i$ are real coefficients. The name of this update strategy underlines the resemblance of (4) to a linear FIR filter. If $a_1 = 1$, and $a_i = 0$ for $i > 1$, then the update strategy based on (4) reduces to the single-frame strategy described above. In general, the coefficients $a_i$ can be selected to be decreasing functions of $i$ to ensure that the reference block resembles the best matching blocks in the most recent frames. The use of (4) ensures that the reference block adapts to the changing appearance of the site tracked, which makes this update strategy capable of tracking the features throughout the given frame sequence. On the other hand, the weighted averaging involved in (4) tends to cancel the noise that might be present in the best matching blocks, which makes this update strategy resistant to noise.

4) Kalman Filtering Update Strategy: Kalman filtering addresses the general problem of estimating the state $\mathbf{x}_k$ of a discrete-time controlled process that is governed by the linear stochastic difference equation [2], [3]

$$\mathbf{x}_k = \mathbf{A} \mathbf{x}_{k-1} + \mathbf{B} \mathbf{u}_{k-1} + \mathbf{w}_{k-1} \tag{5}$$

with a measurement $\mathbf{z}_k$ generated by

$$\mathbf{z}_k = \mathbf{H} \mathbf{x}_k + \mathbf{v}_k. \tag{6}$$

In (5), the matrix $\mathbf{A}$ relates the state at the previous time step $k-1$ to the state at the current step $k$, whereas the matrix $\mathbf{B}$ relates the optional control input $\mathbf{u}$ to the state $\mathbf{x}$. In (6), the matrix $\mathbf{H}$ relates the state to the measurement $\mathbf{z}_k$ at a given step $k$. The random variables $\mathbf{w}_k$ and $\mathbf{v}_k$ represent the process and measurement noise, respectively. It is assumed that $\mathbf{w}_k$ and $\mathbf{v}_k$ are independent, white, and Gaussian. Let $\hat{\mathbf{x}}_k^-$ be the a priori state estimate at step $k$ given knowledge of the process before step $k$, and let $\hat{\mathbf{x}}_k$ be the a posteriori estimate at step $k$ given measurement $\mathbf{z}_k$. The a priori estimate $\hat{\mathbf{x}}_k^-$ can be updated as

$$\hat{\mathbf{x}}_k^- = \mathbf{A} \hat{\mathbf{x}}_{k-1} + \mathbf{B} \mathbf{u}_{k-1}. \tag{7}$$

The a posteriori estimate $\hat{\mathbf{x}}_k$ can be obtained in terms of $\hat{\mathbf{x}}_k^-$ through the update equation

$$\hat{\mathbf{x}}_k = \hat{\mathbf{x}}_k^- + \mathbf{K}_k (\mathbf{z}_k - \mathbf{H} \hat{\mathbf{x}}_k^-) \tag{8}$$

where $\mathbf{z}_k - \mathbf{H} \hat{\mathbf{x}}_k^-$ is called the measurement innovation or the residual and $\mathbf{K}_k$ is the Kalman gain. Under certain assumptions, the Kalman gain $\mathbf{K}_k$ at a given step $k$ can be calculated as [2], [3]

$$\mathbf{K}_k = \mathbf{P}_k^- \mathbf{H}^T (\mathbf{H} \mathbf{P}_k^- \mathbf{H}^T + \mathbf{R})^{-1} \tag{9}$$

where $\mathbf{R}$ is the measurement error covariance matrix.

Kalman filtering can be used in adaptive block matching because it can estimate the reference block $\hat{\mathbf{r}}_k$ at frame $k$ for the best-matching block $\mathbf{b}_{k-1}$ of the previous frame. If the motion of the block between two adjacent frames can be modeled by pure translation, the state transition matrix is the identity matrix, i.e., $\mathbf{A} = \mathbf{I}$. According to this formulation, the reference block for the $k$th frame can be obtained as

$$\hat{\mathbf{r}}_k = \hat{\mathbf{r}}_{k-1} + \mathbf{K}_k (\mathbf{b}_{k-1} - \hat{\mathbf{r}}_{k-1}) \tag{10}$$

where $\hat{\mathbf{r}}_{k-1}$ is the reference block at frame $k-1$. For $\mathbf{H} = \mathbf{I}$, (9) indicates that the Kalman gain matrix $\mathbf{K}_k$ for the $k$th frame can be calculated as

$$\mathbf{K}_k = \mathbf{P}_k^- (\mathbf{P}_k^- + \mathbf{R})^{-1} \tag{11}$$

where $\mathbf{R}$ is proportional to the identity. The a priori covariance matrix is defined for the $k$th frame as $\mathbf{P}_k^- = E[\mathbf{e}_k^- (\mathbf{e}_k^-)^T]$, where $\mathbf{e}_k^- = \mathbf{x}_k - \hat{\mathbf{x}}_k^-$. The a priori covariance matrix can be updated at step $k$ as

$$\mathbf{P}_k^- = \mathbf{P}_{k-1} + \mathbf{Q} \tag{12}$$

where $\mathbf{P}_{k-1}$ is the a posteriori covariance matrix at step $k-1$ and $\mathbf{Q}$ is the process noise covariance. The a posteriori covariance matrix $\mathbf{P}_k$ is then updated as $\mathbf{P}_k = (\mathbf{I} - \mathbf{K}_k) \mathbf{P}_k^-$.
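The FIR and Kalman filtering updates of the reference block can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes that the covariance matrices are all proportional to the identity, so that the Kalman gain reduces to a single scalar applied to every pixel. The function names and the parameters `p_prev`, `q`, and `r` are hypothetical.

```python
def fir_reference(best_blocks, coeffs):
    """Reference block as a linear combination of recent best-matching blocks.

    best_blocks[0] is the most recent block; coeffs[i] weights best_blocks[i].
    """
    rows, cols = len(best_blocks[0]), len(best_blocks[0][0])
    return [[sum(a * blk[y][x] for a, blk in zip(coeffs, best_blocks))
             for x in range(cols)] for y in range(rows)]


def kalman_reference(reference, best_block, p_prev, q, r):
    """One Kalman update of the reference block with A = I and H = I.

    p_prev, q, r are scalar variances (assumption: covariances are multiples
    of the identity). Returns (new_reference, p_post, gain).
    """
    p_prior = p_prev + q                 # a priori covariance
    gain = p_prior / (p_prior + r)       # scalar Kalman gain
    new_reference = [[ref + gain * (obs - ref) for ref, obs in zip(rrow, brow)]
                     for rrow, brow in zip(reference, best_block)]
    p_post = (1.0 - gain) * p_prior      # a posteriori covariance
    return new_reference, p_post, gain
```

In this scalar form, a small measurement variance `r` drives the gain toward 1, so the reference block is replaced by the latest best match, as in the single-frame strategy with an update after every frame. A large `r` keeps the reference block close to its previous value.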



TABLE I RMS ERROR VALUES IN PIXELS COMPUTED FOR THE MOTOR ACTIVITY SIGNALS PRODUCED BY ADAPTIVE BLOCK MATCHING RELYING ON THE SINGLE-FRAME, MULTIFRAME, FIR FILTERING, AND KALMAN FILTERING UPDATE STRATEGIES FOR MYOCLONIC SEIZURES, FOCAL CLONIC SEIZURES, AND RANDOM INFANT MOVEMENTS

TABLE II RMS ERROR VALUES IN PIXELS COMPUTED FOR THE MOTOR ACTIVITY SIGNALS PRODUCED BY ADAPTIVE BLOCK MATCHING THAT RELIED ON REFERENCE BLOCKS AND SEARCH WINDOWS OF VARIOUS SIZES

TABLE III RMS ERROR VALUES IN PIXELS COMPUTED FOR THE MOTOR ACTIVITY SIGNALS PRODUCED BY THE FEATURE TRACKERS BASED ON DISPLACEMENT ESTIMATION AND ADAPTIVE BLOCK MATCHING FOR MYOCLONIC SEIZURES, FOCAL CLONIC SEIZURES, AND RANDOM INFANT MOVEMENTS

V. EXPERIMENTAL RESULTS

This section presents some results of an experimental study, which relied on a large database containing a broad variety of neonatal seizures. This database, which was developed by the Clinical Research Centers for Neonatal Seizures (CRCNS), contains video/EEG/polygraphic recordings of several hundred individual seizures, which have been characterized and classified by a team of physicians in terms of their electrographic and behavioral features [23]. The CRCNS database contains simultaneous EEG and analog video recordings on a split screen. The camera was a Javelin Electronics “Chromachip,” model JE3362, with a National TV zoom lens. A fixed zoom setting was used in the acquisition of the individual video recordings, which guarantees that the spatial resolution remained constant throughout each video recording. The analog video recordings were digitized using a PXC200A frame grabber. The temporal sampling rate was 30 frames/second, which is considered high enough to capture sudden and rapid motion. The digitized frames were of spatial resolution 352 × 240 pixels. After the elimination of the EEG recordings, the video recordings produced sequences of frames of size 203 × 240 pixels. No compression of any form was used at any stage of data acquisition and processing. Although this increased considerably the storage hardware required for this study, this choice was deemed

necessary to ensure that the results of the experiments were not affected by data distortion due to a lossy compression scheme. The performance of the feature trackers tested in the experiments is summarized in Tables I–III, which show the root-mean-square (RMS) error values obtained for video recordings of myoclonic seizures, focal clonic seizures, and random infant movements. The reference signals were obtained by manually tracking the anatomical sites located on moving body parts. The RMS error values represent the distance in pixels between the anatomical site tracked by the motion trackers tested in the experiments and that tracked manually every five frames for the entire frame sequence. Figs. 3–5 show the motor activity signals produced in some of the experiments summarized in Tables I–III. The locations of the moving body parts during the clinical event are shown in the first frame of each video recording that contains detectable motion. These frames are shown in Figs. 3(a), 4(a), and 5(a) together with the velocity patches obtained after the application of the morphological operators. The moving body part in each video


Fig. 3. (a) A selected frame from the video recording of a myoclonic seizure affecting the infant’s left hand; motor activity signals produced by adaptive block matching tested with a 50 × 50 search window and a 15 × 15 reference block by relying on (b) the single-frame update strategy, (c) the multiframe update strategy, (d) the FIR filtering update strategy, and (e) the Kalman filtering update strategy.

recording is shown within a box. The motor activity signals produced by the motion trackers tested in the experiments can be evaluated based on the coordinates of the anatomical site that was manually tracked throughout the entire frame sequence, which are shown in Figs. 3–5 as markers.

A. Evaluation of Update Strategies

The first set of experiments evaluated the performance of motion trackers relying on adaptive block matching when the reference block was updated by the single-frame, multiframe, FIR filtering, and Kalman filtering update strategies. Adaptive block matching was tested with a reference block of


Fig. 4. (a) A selected frame from the video recording of a focal clonic seizure affecting the infant’s left leg; motor activity signals produced by adaptive block matching tested with a 50 × 50 search window and a 15 × 15 reference block by relying on (b) the single-frame update strategy, (c) the multiframe update strategy, (d) the FIR filtering update strategy, and (e) the Kalman filtering update strategy.

size 15 × 15. The search window was a square block of size 50 × 50. Adaptive block matching was attempted by employing the single-frame update strategy with . This update strategy produced the best results for . The multiframe update strategy was used in the experiments with and . The update strategy that relied on FIR filtering was tested using and the same set of weights used in the multiframe strategy. Table I summarizes the results produced by adaptive block matching for 24 video recordings from the CRCNS database. The outcome of this experimental study indicated that the performance of adaptive block matching depends rather strongly on the update strategy employed for the reference block. According to Table I, the highest RMS error



Fig. 5. (a) A selected frame from the video recording of a random movement of the infant’s left hand; motor activity signals produced by adaptive block matching tested with a 50 × 50 search window and a 15 × 15 reference block by relying on (b) the single-frame update strategy, (c) the multiframe update strategy, (d) the FIR filtering update strategy, and (e) the Kalman filtering update strategy.

values were produced when adaptive block matching relied on the single-frame update strategy. The performance of adaptive block matching improved in most cases when the single-frame update strategy was replaced by the multiframe or the FIR filtering update strategies. The computational overhead associated with FIR filtering was compensated by the performance gain realized in most cases when this update strategy replaced the multiframe update strategy. With few exceptions, the best performance was achieved when adaptive block matching relied on Kalman filtering for updates of the reference block. The interpretation of the RMS error values shown in Table I can be facilitated by Figs. 3–5, which show the motor activity signals

produced by the motion trackers mentioned above for three of the video recordings involved in this experimental study. In the myoclonic seizure shown in Fig. 3, the infant’s left hand is moving to the left and then toward the right of the frame between frames 140 and 180 (Fig. 3 shows only frame 142). The same movement was also observed between frames 190 and 230, but the amplitude of motion in this time interval was higher. This movement was captured by the temporal motor activity signal obtained as the projection of the moving body part to the horizontal axis. The motor activity signal obtained as the projection of the moving body part to the vertical axis indicates that the left hand is also moving toward the top of the frame. Manual tracking of motion in this video recording indicated that the temporal motor activity signals shown in Fig. 3 constitute a satisfactory representation of the actual motor activity along the vertical direction. According to Fig. 3, there are some significant differences between the motor activity signals produced by adaptive block matching when it relied on various update strategies. Manual tracking revealed that the infant’s left hand is moving to the left, then to the right, and again back to the left of the frame in the time interval between frames 140 and 230. The motion between frames 190 and 220 was captured and quantified by adaptive block matching when the reference block was updated using Kalman filtering. However, adaptive block matching failed to track the infant’s left hand between frames 190 and 220 when it relied on the other three strategies for updates of the reference block.

Fig. 4 shows the temporal motor activity signals produced for a focal clonic seizure affecting the infant’s left leg. For this seizure, the single-frame and multiframe update strategies failed to track the feature after frame 23. This is apparent by comparing the motor activity signals shown in Fig.
4(b) and (c) with the coordinate markers produced by manual tracking. The failure of these two update strategies is also consistent with the corresponding RMS error values shown in Table I. Although the update strategy that relied on FIR filtering captured the rhythmicity of motion along the horizontal direction, the amplitude of the signal shown in Fig. 4(d) was not consistent with that observed in the video recording. Moreover, FIR filtering failed to track the infant’s left leg along the vertical direction as indicated by the signal shown in Fig. 4(d). According to Fig. 4(e), Kalman filtering exhibited the best performance among all four update strategies tested in the experiments. Manual tracking of motion in this video recording indicated that the update strategy relying on Kalman filtering produced motor activity signals that constitute an accurate representation of the actual motor activity observed in the video recording. These signals describe the rhythmic motion of the infant’s left leg along the vertical direction and, to a lesser degree, along the horizontal direction.

Fig. 5 shows the temporal motor activity signals produced for a random movement of the infant’s left hand. The single-frame and multiframe update strategies failed to track the moving body part throughout the frame sequence. The FIR and Kalman filtering update strategies produced similar motor activity signals for this event as indicated by comparing Fig. 5(d) and (e). However, comparison of the motor activity signals shown in Fig. 5(d) and (e) indicates that Kalman filtering was more successful in tracking the motion of the



infant’s left hand along the vertical direction. Fig. 5(d) and (e) also reveal some important differences between the motor activity signals produced by random movements of the infant’s extremities and movements associated with myoclonic seizures. The motor activity signals produced for random movements are typically composed of “bell-shaped” segments. Such segments are consistent with the fact that random movements are typically smoother and slower than those associated with myoclonic seizures.

B. Realization of Adaptive Block Matching

This set of experiments investigated how the performance of adaptive block matching is affected by the sizes of the reference block and the search window. Table II shows the RMS error values obtained from the motor activity signals produced by adaptive block matching for a myoclonic seizure affecting the infant’s left foot. Adaptive block matching was tested with reference blocks of sizes 7 × 7, 11 × 11, 15 × 15, and 21 × 21. The size of the search window varied from 25 × 25 to 100 × 100. In all of these experiments, adaptive block matching relied on Kalman filtering for updates of the reference block. For a reference block of a fixed size, the RMS error values decreased as the size of the search window increased; this is because increasing the search window size increases the likelihood that it contains the target block. However, increasing the size of the search window above a certain threshold also increases the likelihood of mismatch; this explains the increase of the RMS error values observed as the size of the search window increased above a certain threshold. Similarly, the RMS error values decreased as the size of the reference block increased when adaptive block matching was tested with a fixed search window size; this is because increasing the size of the reference block enhances its uniqueness and, thus, reduces the likelihood of mismatch.
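The interplay between reference block size and search window size can be illustrated with a minimal sketch of an exhaustive block search. This is not the implementation used in the study. The function name `match_block`, the sum of absolute differences (SAD) as the similarity measure, and the requirement that a candidate block lie entirely inside the search window are all assumptions made for illustration, and the reference block is kept fixed (no update strategy is applied).

```python
import numpy as np

def match_block(frame, ref_block, center, search_size):
    """Return the (row, col) center of the block in `frame` most similar to `ref_block`.

    Candidate blocks must lie entirely inside a square search window of
    half-width search_size // 2 around `center` (assumed convention).
    Similarity is the sum of absolute differences (assumed criterion).
    """
    b = ref_block.shape[0]            # side of the square reference block (odd)
    half_b = b // 2
    half_w = search_size // 2
    rows, cols = frame.shape
    ref = ref_block.astype(np.float64)
    best_cost, best_center = np.inf, center
    # Candidate centers are restricted so that the block fits inside the window.
    for r in range(center[0] - half_w + half_b, center[0] + half_w - half_b + 1):
        for c in range(center[1] - half_w + half_b, center[1] + half_w - half_b + 1):
            top, left = r - half_b, c - half_b
            if top < 0 or left < 0 or top + b > rows or left + b > cols:
                continue              # candidate falls outside the frame
            cand = frame[top:top + b, left:left + b].astype(np.float64)
            cost = np.abs(cand - ref).sum()
            if cost < best_cost:
                best_cost, best_center = cost, (r, c)
    return best_center

# Synthetic example: a random texture shifted by 3 rows down and 2 columns left.
rng = np.random.default_rng(0)
frame0 = rng.random((100, 100))
frame1 = np.roll(frame0, shift=(3, -2), axis=(0, 1))
ref_block = frame0[33:48, 33:48]      # 15 x 15 block centered at (40, 40)
found = match_block(frame1, ref_block, (40, 40), 50)   # (43, 38)
```

In this sketch, a search window no larger than the reference block leaves a single candidate position, so the displaced block cannot be found, whereas a very large window admits many more candidates that may resemble the reference block in real video.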
However, increasing the size of the reference block above a certain threshold also increases the likelihood that the target block is not contained in its entirety by the search window; this is reflected by the increase of the RMS error values observed as the size of the reference block increased above a certain threshold. According to Table II, the highest RMS error values were obtained when adaptive block matching was tested with a search window of size 25 × 25 and reference blocks of various sizes. High RMS error values were also obtained when adaptive block matching was tested with a reference block of size 7 × 7 and search windows of various sizes. The lowest RMS error value was obtained for a reference block of size 11 × 11 and a search window of size 75 × 75. The second lowest value of the RMS error was obtained when adaptive block matching was tested with a reference block of size 15 × 15 and a search window of size 50 × 50. This latter combination was found to be computationally less demanding. Thus, it was selected for the realization of adaptive block matching employed for the rest of this experimental study.

C. Evaluation of Adaptive Block Matching

This set of experiments evaluated adaptive block matching and compared its performance with that of block motion estimation, which is the displacement estimation method employed by the KLT algorithm [28], [29]. Adaptive block matching was

Fig. 6. Stages of the procedure proposed for the automated selection of anatomical sites located on the moving body parts. Velocity patches before and after morphological filtering and sites selected for tracking in a video recording of a focal clonic seizure affecting (a)–(c) the infant’s right hand, (d)–(f) the infant’s right leg, and (g)–(i) the infant’s left hand.

tested with a 15 × 15 reference block and a search window of size 50 × 50. Adaptive block matching relied on Kalman filtering for updates of the reference block. Table III summarizes the results obtained in the experiments for 24 video recordings. According to Table III, the motion tracker based on displacement estimation failed occasionally to track the anatomical site of interest, a fact that is reflected by relatively high values of the RMS error. Displacement estimation was often outperformed by adaptive block matching, which produced considerably lower values of the RMS error. In the few cases in which displacement estimation outperformed adaptive block matching, the differences between the RMS error values were less significant.

D. Automated Tracking of Multiple Body Parts

Fig. 6 describes the process of selecting anatomical sites for tracking in the video recording of a focal clonic seizure affecting multiple body parts. Fig. 6(a)–(c) describes the selection of the site located on the infant’s right hand, which begins moving at frame 30. Fig. 6(a) shows the white velocity patches, which were obtained based on the motion velocity fields produced by the Horn–Schunck method. These patches contain the areas of the frame that were assigned motion velocity vectors of magnitude greater than a threshold. Fig. 6(b) shows the velocity patches obtained after the application of the morphological operators. In this particular case, the morphological operators eliminated one of the two patches, which also happened

to have the most irregular shape. Fig. 6(c) shows the anatomical site located on the infant’s right hand, which was selected for tracking by adaptive block matching. Fig. 6(d)–(f) describes the process of selecting an anatomical site located on the infant’s right leg. Fig. 6(d) and (e) shows the velocity patches before and after the application of the morphological operators, respectively, whereas Fig. 6(f) shows the site on the right leg selected for tracking at frame 34. Fig. 6(f) also shows the site located on the infant’s right hand, which was tracked by adaptive block matching between frames 30 and 34. Note that this site moved to a location on the infant’s right hand that is closer to the boundary between the hand and the background. Finally, Fig. 6(g)–(i) describes the process of selecting an anatomical site located on the infant’s left hand. Fig. 6(i) shows all three sites that were selected for tracking by the proposed automated procedure.

Fig. 7 shows the temporal motor activity signals produced by the proposed automated procedure for a focal clonic seizure affecting multiple body parts. Adaptive block matching relied on Kalman filtering for updates of the reference block. Fig. 7(a) shows the location of the three anatomical sites tracked by adaptive block matching at three selected frames of the sequence. Fig. 7(b) shows the anatomical site selected for tracking at frame 30 of the sequence. Fig. 7(c) shows the temporal motor activity signals produced by tracking the anatomical site located on the infant’s right hand. The motor activity signals shown in Fig. 7(c) constitute a satisfactory representation of the actual motion as indicated by the coordinate markers produced by manual tracking of this anatomical site. At frame 34, the automated procedure also selected an anatomical site located on the infant’s right leg. Fig. 7(d) shows the two anatomical sites tracked at frame 34, whereas Fig. 7(e) shows the motor activity signal produced by tracking the site located on the infant’s right leg. The motor activity signal shown in Fig. 7(e) captured the rhythmicity of motion in the time interval between frames 0 and 140. However, adaptive block matching did not track the motion between frames 140 and 180, as indicated by the flat segment shown for this time interval in Fig. 7(e). The signal captured the rhythmicity of motion along the vertical direction. However, Fig. 7(e) indicates that there was some error between the estimated and the actual trajectory of the anatomical site located on the infant’s right leg. Fig. 7(f) shows the new anatomical site selected for tracking at frame 35 together with the other two sites already tracked by adaptive block matching. Fig. 7(g) shows the motor activity signal produced by tracking the anatomical site located on the infant’s left hand. According to Fig. 7(g), adaptive block matching captured the rhythmicity of motion of the infant’s left hand. However, this method underestimated the amplitude of the infant’s movements, which is apparent from comparing the motor activity signals shown in Fig. 7(g) with the coordinate markers produced by manual tracking. Overall, Fig. 7 indicates that tracking of the three anatomical sites throughout the frame sequence produced “saw-tooth-like” motor activity signals. This experimental outcome reveals the rhythmicity of the movements of the three body parts affected by this focal clonic seizure.

Fig. 7. (a) Selected frames of the video recording of a focal clonic seizure affecting multiple body parts, (b) the new anatomical site on the infant’s right hand selected for tracking at frame 30, (c) the corresponding motor activity signal, (d) the new anatomical site on the infant’s right leg selected for tracking at frame 34, (e) the corresponding motor activity signal, (f) the new anatomical site on the infant’s left hand selected for tracking at frame 35, and (g) the corresponding motor activity signal.

VI. CONCLUSIONS

This paper introduced an automated procedure for the extraction of temporal motor activity signals from video recordings of neonatal seizures. This procedure employs an optical flow computation technique developed to select anatomical sites of interest located on moving body parts and a tracking method based on adaptive block matching. The proposed procedure was used to extract temporal motor activity signals from a database of video recordings of myoclonic seizures, focal clonic seizures, and random infant movements not associated with seizures. This paper showed that adaptive block matching can be used to extract motor activity signals from video recordings of neonatal seizures. The outcome of this experimental study indicated that the performance of adaptive block matching depends rather strongly on the update strategy employed for the reference block. More specifically, the update strategy that relied on FIR filtering outperformed both the multiframe and the single-frame update strategies. On the other hand, the multiframe update strategy performed better than the single-frame update strategy. Although the multiframe and FIR filtering update strategies are more effective than the single-frame strategy, they incur higher computational and memory cost. The use of Kalman filtering in this application is not computationally demanding because this method relies on a single update equation


for the reference block. Another advantage of this method is its reliance on a recursive update scheme, which makes it flexible and easy to implement. Finally, the experiments indicated that Kalman filtering was the most reliable among the update strategies employed in this study for block matching. Adaptive block matching was also compared with the displacement estimation method employed by the KLT algorithm. This method estimates the displacement of a block of pixels between successive video frames by relying on a block motion model involving pure translation. This comparison indicated that adaptive block matching was more accurate and reliable than displacement estimation. Nevertheless, the outcome of these experiments indicates that the reliability and accuracy of automated motion tracking can be improved by combining displacement estimation and adaptive block matching. Such a motion tracking scheme would rely on displacement estimation to initialize the search for the best matching block in the next frame, which would be performed by adaptive block matching. The data available for this project were recorded by a single camera. As a result, the proposed method would be capable of tracking motion that takes place in the 2-D plane that includes the infant’s bed. Since the videotaped clinical events take place in three dimensions, the proposed method may not be able to track movements of the infant’s extremities toward or away from the camera unless certain precautions are taken during data acquisition. More specifically, the method proposed to extract motor activity signals would capture and quantify only quantifiable motion, that is, the part of the movement that did not follow a trajectory exactly perpendicular to the camera. A review of the CRCNS database indicated that all clinical events contained quantifiable motion. 
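The notion of quantifiable motion can be made concrete with a small geometric sketch. It assumes an idealized orthographic camera and reads “perpendicular to the camera” as motion along the camera’s optical axis; neither assumption is stated in this paper, and the function name `in_plane_component` is hypothetical. Under these assumptions, the quantifiable part of a 3-D displacement is what remains after removing its component along the optical axis.

```python
import numpy as np

def in_plane_component(displacement, optical_axis):
    """Return the part of a 3-D displacement that lies in the image plane.

    Assumes an orthographic camera whose viewing direction is `optical_axis`
    (an illustrative simplification, not the recording setup of the study).
    """
    axis = np.asarray(optical_axis, dtype=np.float64)
    axis = axis / np.linalg.norm(axis)            # unit vector along the optical axis
    d = np.asarray(displacement, dtype=np.float64)
    return d - np.dot(d, axis) * axis             # remove the component along the axis

# A movement with both in-plane and toward-camera components keeps its in-plane part.
visible = in_plane_component([3.0, 4.0, 5.0], [0.0, 0.0, 1.0])
# A movement exactly along the optical axis leaves nothing to quantify.
hidden = in_plane_component([0.0, 0.0, 7.0], [0.0, 0.0, 1.0])
```

Only a movement directed exactly along the optical axis yields a zero in-plane component in this idealized setting, which matches the observation that such movements cannot be tracked from a single camera.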
In fact, the recording of quantifiable motion can be guaranteed by calibrating the distance between the bed and the camera and by positioning the infant in such a way that there is a nonzero angle between the camera and the infant’s extremities. Establishing a protocol for infant placement provides a simple and inexpensive alternative to the utilization of multiple cameras, a choice that would increase the cost of the video recognition and characterization system under development. The effectiveness of the automated procedure described in this paper becomes clear by observing the motor activity signals produced for a focal clonic seizure affecting multiple body parts. Such seizures are not uncommon in clinical settings and present a real challenge to the development of automated procedures for quantifying motion in their video recordings. Note that the automated procedure described in this paper selected good anatomical sites on the moving body parts and was successful in tracking these sites throughout the frame sequences. The outcome of this experimental study revealed the potential of the proposed procedure as a computational tool in the development of an automated system for seizure recognition and characterization, which is the long-term goal of this study.

REFERENCES

[1] M. Bertero, T. A. Poggio, and V. Torre, “Ill-posed problem in early vision,” Proc. IEEE, vol. 76, pp. 869–889, Aug. 1988.
[2] S. M. Bozic, Digital and Kalman Filtering. London, U.K.: Edward Arnold Publishers, 1979.
[3] D. E. Catlin, Estimation, Control, and Discrete Kalman Filter. New York: Springer-Verlag, 1989.


[4] R. R. Clancy and A. Legido, “Postnatal epilepsy after EEG-confirmed neonatal seizures,” Epilepsia, vol. 32, pp. 69–76, 1991.
[5] J. H. Ellenberg, D. G. Hirtz, and K. B. Nelson, “Age at onset of seizures in young children,” Ann. Neurol., vol. 15, pp. 127–134, 1984.
[6] G. M. Fenichel, Neonatal Neurology, 3rd ed. New York, NY: Churchill-Livingstone, 1990.
[7] M. Ghanbari, “The cross-search algorithm for motion estimation,” IEEE Trans. Commun., vol. 38, pp. 950–953, Jul. 1990.
[8] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2nd ed. Upper Saddle River, NJ: Prentice Hall, 2002.
[9] B. K. P. Horn and B. G. Schunck, “Determining optical flow,” Artif. Intell., vol. 17, pp. 185–203, 1981.
[10] J. R. Ives, N. R. Mainwaring, L. J. Gruber, G. R. Cosgrove, H. W. Blume, and D. L. Schomer, “128-channel cable-telemetry EEG recording system for long-term invasive monitoring,” Electroencephalogr. Clin. Neurophysiol., vol. 79, pp. 69–72, 1991.
[11] J. Jain and A. Jain, “Displacement measurement and its application in interframe image coding,” IEEE Trans. Commun., vol. COM-29, pp. 1799–1808, Dec. 1981.
[12] N. B. Karayiannis, “Advancing videometry through applications: Quantification of neonatal seizures from video recordings,” in Proc. 14th Int. Conf. Digital Signal Processing, Santorini, Greece, Jul. 1–3, 2002, pp. 11–21.
[13] N. B. Karayiannis, S. Srinivasan, R. Bhattacharya, M. S. Wise, J. D. Frost Jr., and E. M. Mizrahi, “Extraction of motion strength and motor activity signals from video recordings of neonatal seizures,” IEEE Trans. Med. Imag., vol. 20, pp. 965–980, Sep. 2001.
[14] N. B. Karayiannis and G. Tao, “Extraction of temporal motion velocity signals from video recordings of neonatal seizures by optical flow methods,” in Proc. 25th Annu. Int. Conf. IEEE Engineering in Medicine and Biology Society, Cancun, Mexico, Sep. 17–21, 2003, pp. 874–877.
[15] N. B. Karayiannis, B. Varughese, G. Tao, J. D. Frost Jr., M. S. Wise, and E. M. Mizrahi, “Quantifying motion in video recordings of neonatal seizures by regularized optical flow methods,” IEEE Trans. Image Processing, to be published.
[16] N. B. Karayiannis and A. N. Venetsanopoulos, “Regularization theory in image restoration: The stabilizing functional approach,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 38, pp. 1155–1179, Jul. 1990.
[17] C. Kermad and C. Collewet, “Improving feature tracking by robust points of interest selection,” in Proc. Vision Modeling and Visualization Conf., Stuttgart, Germany, Nov. 21–23, 2001, pp. 415–421.
[18] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro, “Motion-compensated interframe coding for video conferencing,” in Proc. National Telecommunication Conf., Nov. 1981, pp. 531–535.
[19] E. D. Mellits, K. R. Holden, and J. M. Freeman, “Neonatal seizures. II. A multivariate analysis of factors associated with outcome,” Pediatrics, vol. 70, pp. 177–185, 1982.
[20] F. Meyer and S. Beucher, “Morphological segmentation,” J. Vis. Commun. Image Represent., vol. 1, pp. 21–46, 1990.
[21] E. M. Mizrahi, “Neonatal seizures,” in Childhood Seizures: Pediatric and Adolescent Medicine, S. Shinnar, N. Amir, and D. Branski, Eds. Basel, Switzerland: Karger, 1995, vol. 6, pp. 18–31.
[22] E. M. Mizrahi, “Acute and chronic effects of seizures in the developing brain: Lessons from clinical experience,” Epilepsia, vol. 40, pp. 42–50, 1999.
[23] E. M. Mizrahi, R. R. Clancy, J. K. Dunn, D. Hirtz, L. Chapieski, S. McGuan, P. Cuccaro, R. A. Hrachovy, M. S. Wise, and P. Kellaway, “Neurologic impairment, developmental delay, and postneonatal seizures 2 years after EEG-video documented seizures in near-term and term neonates: Report of the clinical research centers for neonatal seizures,” Epilepsia, vol. 42 (Suppl 7), pp. 102–103, 2001.
[24] E. M. Mizrahi and P. Kellaway, “Characterization and classification of neonatal seizures,” Neurology, vol. 37, pp. 1837–1844, 1987.
[25] A. M. Peacock, S. Matsunaga, D. Renshaw, J. Hannah, and A. Murray, “Reference block updating when tracking with the block matching algorithm,” IEE Electron. Lett., vol. 36, no. 4, pp. 309–310, 2000.
[26] D. Rector, P. Burk, and R. M. Harper, “A data acquisition system for long-term monitoring of physiological and video signals,” Electroencephalogr. Clin. Neurophysiol., vol. 87, pp. 380–384, 1993.
[27] H. Smith, C. Richards, S. Brandt, and N. Papanikolopoulos, “Visual tracking for intelligent vehicle-highway systems,” IEEE Trans. Veh. Technol., vol. 45, pp. 744–759, Nov. 1996.
[28] J. Shi and C. Tomasi, “Good features to track,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Seattle, WA, Jun. 1994, pp. 593–600.
[29] C. Tomasi and T. Kanade, “Detection and Tracking of Point Features,” Carnegie Mellon Univ., Pittsburgh, PA, Tech. Rep. CMU-CS-91-132, Apr. 1991.
[30] J. J. Volpe, Neurology of the Newborn. Philadelphia, PA: Saunders, 1995.



Nicolaos B. Karayiannis (S’85–M’91–SM’01) was born in Greece on January 1, 1960. He received the Diploma degree in electrical engineering from the National Technical University of Athens, Athens, Greece, in 1983 and the M.A.Sc. and Ph.D. degrees in electrical engineering from the University of Toronto, Toronto, ON, Canada, in 1987 and 1991, respectively. From 1983 to 1984, he was a Research Assistant at the Nuclear Research Center “Democritos,” Athens, where he was engaged in research on multidimensional signal processing. From 1984 to 1991, he was a Research and Teaching Assistant at the University of Toronto. He is currently a Professor in the Department of Electrical and Computer Engineering, University of Houston, Houston, TX. He has published more than 130 papers, including 60 in technical journals, and is the co-author of the book Artificial Neural Networks: Learning Algorithms, Performance Evaluation, and Applications (Norwell, MA: Kluwer, 1993). His current research interests include biomedical imaging and video, computer vision, image and video coding, neural networks, intelligent and neuro-fuzzy systems, wireless communications and networking, and pattern recognition. Dr. Karayiannis is the recipient of the 1994 W. T. Kittinger Outstanding Teacher Award and the University of Houston El Paso 2000 Energy Foundation Faculty Achievement Award. He is also a co-recipient of a Theoretical Development Award for a paper presented at the Artificial Neural Networks in Engineering’94 Conference. He is an Associate Editor of the IEEE TRANSACTIONS ON NEURAL NETWORKS and the IEEE TRANSACTIONS ON FUZZY SYSTEMS. He also served as the General Chair of the 1997 International Conference on Neural Networks (ICNN’97), held in Houston, TX, on June 9–12, 1997. He is a member of the Technical Chamber of Greece.

Abdul Sami was born in Hyderabad, India, in 1981. He received the B.S. degree in electronics and communications engineering from the Gandhi Institute of Technology and Management, India, in June 2002 and the M.S. degree in electrical engineering from the University of Houston, Houston, TX, in May 2004. He is currently a Research Assistant in the Electrical and Computer Engineering Department, University of Houston. His research interests include computer vision, signal processing, image processing, and image analysis.

James D. Frost, Jr., was born in Porterville, CA. He received the B.A. degree in biology from Stanford University, Stanford, CA, in 1958 and the M.D. degree from Baylor University College of Medicine, Houston, TX, in 1962. From 1962 to 1963, he was an intern with The Methodist Hospital in Houston and completed a fellowship in Clinical Neurophysiology in 1965. He has been a member of the faculty at Baylor College of Medicine since 1963, and he is currently a Professor in the Department of Neurology and the Department of Neuroscience at Baylor. He is also Deputy Chief of the Neurophysiology Service at The Methodist Hospital and is a member of the medical staffs at Texas Children’s Hospital, St. Luke’s Episcopal Hospital, and the Harris County Hospital District. He established the first clinical sleep laboratory in the Houston area at The Methodist Hospital in 1971 and served as its medical director until 1997. Specific areas of clinical and research interest have included sleep disorders, epilepsy, and both basic and clinical neurophysiology. His current research interests include the development of automated methods for analyzing and interpreting EEG activity and investigation of the pathophysiology and treatment of childhood seizure disorders. Dr. Frost has received Scientific-Technical Contribution Awards from the National Aeronautics and Space Administration in 1970 and 1971 for research conducted during the Skylab program, and in 1974, he was awarded the NASA Medal for Exceptional Scientific Achievement for this work. He has been awarded several patents for devices providing improved methods for EEG acquisition and analysis. He has served on the editorial boards of the journal Clinical Neurophysiology and the Journal of Clinical Neurophysiology, and he has been a member of several National Institutes of Health review committees. He recently co-authored the book Infantile Spasms: Diagnosis, Management, and Prognosis (Norwell, MA: Kluwer, 2003). 
He is a member of the American Epilepsy Society, the American Clinical Neurophysiology Society, the Harris County Medical Society, and the Texas Medical Association.

Merrill S. Wise is Associate Professor of Pediatrics and Neurology at Baylor College of Medicine, Houston, TX. He trained as a pediatric neurologist and clinical neurophysiologist, and he serves as Medical Director of the Epilepsy Monitoring Unit at Texas Children’s Hospital. He has received National Institutes of Health funding in the areas of neonatal seizures and sleep disorders in children. His focus within the neonatal seizure field has involved seizures in the very low birthweight infant. With regard to epilepsy, he is involved with the characterization of seizures using EEG/video monitoring and with epilepsy surgery for intractable epilepsy. Dr. Wise is a member of numerous professional organizations, including the Child Neurology Society, the American Epilepsy Society, the American Clinical Neurophysiology Society, the American Academy of Neurology, and the American Academy of Pediatrics. He has helped develop a number of practice guidelines in the field of sleep medicine, and he is a reviewer for Pediatrics and SLEEP.

Eli M. Mizrahi received the B.S. degree in psychology from Emory University, Atlanta, GA, in 1971 and the M.D. degree from the University of Miami, Miami, FL, in 1975. He was an intern and a resident in pediatrics with Albert Einstein College of Medicine, Bronx, NY, a resident in neurology (pediatric neurology) with Stanford University Medical Center, Stanford, CA, and a post-doctoral fellow in clinical neurophysiology at Baylor College of Medicine, Houston, TX, under the direction of Peter Kellaway, Ph.D. In 1982, he joined the faculty at Baylor. He is Head of the Peter Kellaway Section of Neurophysiology, Professor of Neurology and Pediatrics, and Vice-Chairman, Department of Neurology, Baylor College of Medicine. He also serves as Chief, Neurophysiology Services at The Methodist Hospital and St. Luke’s Episcopal Hospital and Chief, Neurophysiology Laboratory Services, Texas Children’s Hospital, all located in Houston, TX. He directs the Baylor Comprehensive Epilepsy Center and the Clinical Research Center for Neonatal Seizures, both based at The Methodist Hospital. A main focus of his research has been on neonatal seizures. He has investigated the clinical aspects of characterization and classification, electroencephalographic and EEG-video features, pathophysiology, and therapies. He has worked in collaboration with other investigators at Baylor and the University of Houston on computer analysis of both EEG and video imaging of neonatal seizures. Dr. Mizrahi received a clinician-scientist investigator award (K08) from the National Institutes of Health (NIH) early in his career, followed by additional NIH funding throughout his career. He was awarded the Michael Prize from the Stiftung Michael, Bonn, Germany, in 1988 and the American Epilepsy Society/Milken Family Medical Foundation Clinical Research Award in 1992. 
He has served as the Chair of the Professional Advisory Board for the Epilepsy Foundation of America and President of the American Clinical Neurophysiology Society.
