Multimodal Information Fusion for Automated Recognition of Complex Agitation Behaviors of Dementia Patients

Qiang Qiu, Siang Fook Foo, Aung Aung Phyo Wai, Viet Thang Pham, Jayachandran Maniyeri, Jit Biswas
Institute for Infocomm Research, Singapore
[email protected]

Philip Yap
Alexandra Hospital, Singapore
philip [email protected]

Abstract – This paper presents a new approach using multimodal information fusion for automated recognition of complex agitation behavior among persons with dementia. In particular, we present a hierarchical information fusion framework to model complex agitation behaviors based on a clinical agitation scale widely used in hospitals. We present the detailed feature extraction and selection used to represent low-level atomic agitation behaviors, the classifiers for atomic agitation behavior recognition, and the inference combiners used to detect and rate the onset of complex agitation behaviors in dementia patients using multimodal sensors such as pressure sensors, ultrasound sensors, video cameras, acoustic sensors, etc., in a hospital ward.

Keywords: Multimodal information fusion, Agitation

1 Introduction

There is increasing interest worldwide in applying the latest developments in information and data fusion to both military [8, 9, 10] and non-military applications [11, 12, 13]. The emergence of new sensor devices, advanced processing techniques, and improved processing hardware has made real-time fusion of data increasingly viable. Traditionally, data and information fusion systems have been used extensively for target tracking, automated identification of targets, surveillance (air, surface, and space), intelligent transportation systems for collision avoidance, etc. Though there is some work applying information fusion to the healthcare industry, such as multi-spectral mammography for tumor detection using multiple images [14] and instrumented equipment to predict or estimate "health" from sensor data [15], there is limited work on the applicability of information fusion using multimodal sensors in a general fashion for clinical purposes.


This paper is motivated by our work with a local hospital to develop a behavioral understanding system to automatically detect and rate the onset of agitation behavior in persons with dementia, based on the widely used clinical scale called the Scale for the Observation of Agitation in Persons with Dementia (SOAPD) [18]. Agitation is a common and difficult behavioral symptom of dementia, especially Dementia of the Alzheimer's Type (DAT), that can distress and disable patients and caregivers, prompt institutionalization, and lead to physical and pharmacologic restraints. Agitation occurs at some time in about half of all patients with dementia. Associated behaviors include aggression, combativeness, disinhibition, and hyperactivity. SOAPD is useful in clinical trials to evaluate interventions for agitation in persons with DAT (such as using an audiotape of special memories to reduce agitation and improve withdrawn mood) and to examine behavioral symptoms in the context of disease progression. Despite its value, SOAPD rating is highly non-trivial to administer in clinical settings, especially when several patients are rated at the same time; much manpower and time are needed. Our work investigates the effectiveness of non-intrusive, automated surveillance for rating some aspects of SOAPD automatically and objectively. Not only could technology then replace the intensive manpower needed to use this tool, it could also extend the monitoring of agitation in dementia to patients' own homes, thereby encouraging researchers to examine interventions at home that might decrease agitation and make a positive difference for patients with DAT. In this paper, we present how we model and perform multimodal information fusion for coding some aspects of SOAPD automatically and objectively.

The rest of the paper is organized as follows: Section 2 discusses the issues in modeling complex agitation behaviors and the information fusion approach we adopted. Section 3 describes the detailed feature extraction and selection, from single or multiple sensor modalities, used to represent low-level atomic agitation behaviors, and the atomic agitation behavior classifiers are presented in Section 4. Section 5 presents the complex agitation behavior inference, and some preliminary results collected from a real test-bed deployed in a local hospital are presented in Section 6. Section 7 concludes with a discussion of future work.

2 Modeling Complex Agitation Behavior

We aim to provide valuable insights into the modeling process and fusion considerations when performing automated complex agitation behavior recognition using multimodal sensors such as pressure sensors, ultrasound sensors, video cameras, audio sensors, etc. From the SOAPD scale, we identify seven categories of agitation behavior. Each category consists of many complex agitation behaviors. As an illustration, the up/down agitation behavior category consists of complex agitation behaviors such as rolling left and right, getting up and down, trying to get out, etc. The first challenge is to translate the qualitative complex agitation behavior model in SOAPD into a quantitative computational behavior model for real-time automated recognition. To do so, we use a layered hierarchical approach to modeling the agitation behaviors, so that we can perform information fusion at each level of the hierarchy with ease and with flexibility for future extension in case more sensor modalities are added. We first decompose the highly complex agitation behaviors into many atomic agitation behaviors. For example, the category of up/down movement agitation behaviors is decomposed into atomic agitation behaviors such as rolling left, rolling right, no rolling, sitting up, sitting down, etc., and the category of outward movement agitation behaviors is further decomposed into atomic agitation behaviors such as kicking, hitting, etc. Each atomic agitation behavior can be modeled using features obtained from one sensor modality or from multiple sensor modalities. The feature selection process is important, as it provides the base information from which we derive transformed features. Both the original and the transformed features are then used by classifiers, such as Hidden Markov Models, in the next stage for atomic agitation behavior recognition. Finally, higher-level complex agitation behaviors that are more relevant to the doctors and the clinical scale can be deduced from the atomic agitation behaviors through Bayesian networks.
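To make the layered decomposition concrete, the sketch below shows one way the hierarchy might be encoded. Only the two categories and the behavior names mentioned above are taken from the text; the data structure itself is an illustrative assumption, not the paper's actual model.

```python
# An illustrative encoding of the layered SOAPD behavior hierarchy:
# category -> complex behaviors -> atomic behaviors (names from the text).
SOAPD_HIERARCHY = {
    "up_down_movement": {
        "complex": ["rolling left and right", "getting up and down",
                    "trying to get out"],
        "atomic": ["rolling_left", "rolling_right", "no_rolling",
                   "sitting_up", "sitting_down"],
    },
    "outward_movement": {
        "complex": ["outward motion"],
        "atomic": ["kicking", "hitting"],
    },
    # ... the five remaining SOAPD categories are omitted here
}
```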

3 Features for Atomic Agitation Behavior Recognition

Each atomic agitation behavior is modeled using features extracted from the multimodal sensors, such as pressure sensors, audio sensors, video cameras, ultrasound sensors, infrared sensors, etc. Each extracted feature can be an original observation from a single sensor, or one transformed from combined sensor observations. Due to the curse of dimensionality, given a fixed training sample size for designing a classifier, additional features may degrade the accuracy of the classification process. On the other hand, reducing the number of features too far may leave an insufficient basis for classification. It is therefore crucial to carefully adopt an "optimal" set of features to represent each atomic agitation behavior. The "optimal" feature set is usually generated through a feature selection and/or feature extraction process. Feature selection refers to methods that use an algorithm to select the best input feature set, while feature extraction refers to methods that create new feature sets based on transformations or combinations of the original feature sets [6]. While choosing features with good discrimination ability for atomic agitation behavior recognition, we take two important considerations into account. First, due to the bandwidth and energy constraints of wireless sensor networks, we try to avoid acquiring sensor observations that will eventually be discarded; however, given the complexity of the diverse multimodal sensors being deployed, in practice we may have to perform simple trials to assist in the process. Second, for future reuse and easy evaluation, we should, whenever possible, try to maintain the physical meaning of each transformed/combined feature. With these considerations, some traditional methods of constructing feature sets may not be ideal for our SOAPD behavior classification system. For example, it is common practice to extract the most expressive features from sensor observations using Principal Component Analysis (PCA) and then discard the transformed features with low discrimination ability. However, the linear transformation in PCA can only be performed once the sensor observations are fully acquired, and typically some of those observations are discarded after the PCA process. This is not good practice in a wireless sensor network environment, where constraints on network bandwidth and battery energy are emphasized. Furthermore, the physical interpretation of each feature cannot be easily maintained through PCA. We adopt two criteria when generating the features used for atomic agitation behavior recognition: feature extraction based on the particular agitation behavioral description, and feature selection based on the quality of information conveyed.

3.1 Feature extraction based on body movements of an atomic agitation behavior

Intuitively, each atomic agitation behavior differs from the others in the specific physical body movements associated with it. When the physical movements associated with an atomic behavior can be easily identified and described by transformed or combined sensor observations, it is usually a good idea to include those extracted features in the representation of the atomic behavior. We present two successful examples from our experience.

Example 1 (Representation of Kicking): Kicking is differentiated from other atomic behaviors by a sequence of continuous movements of the two legs, and the center coordinates of the two feet are obtained using video skin segmentation techniques. To capture these unique leg movements, kicking is represented in terms of the variation, between adjacent video frames, of the following 5 scalars: the two feet positions, the distance between the feet, and the distances between the two feet and the head. We adopt a Hidden Markov Model (HMM) [7] to recognize the atomic kicking behavior based on these chosen features. In this HMM, we employ a multivariate Gaussian mixture distribution for the emission probability and estimate all the model parameters with the Expectation-Maximization (EM) algorithm from a training data set. With this approach, the recognition rate for kicking behavior is above 90%.

Example 2 (Representation of Rolling): Rolling behavior is usually identified by a sequence of continuous movements of the whole body, and the current position of the body on the bed is observed by 9 pressure sensors under the mattress. A good representation of rolling behavior is the variation of the position of the body centroid, which is a combined result derived from the observations of the 9 pressure sensors.
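The following sketch illustrates how the features in these two examples could be computed per frame. The exact definitions are assumptions for illustration (e.g., taking a foot "position" as its radial distance from the image origin), not the paper's precise formulas.

```python
# Per-frame features for the two examples above; definitions are
# illustrative assumptions, not the paper's exact formulas.
import numpy as np

def kicking_features(left_foot, right_foot, head):
    """left_foot, right_foot, head: (x, y) centres from skin segmentation.
    Returns the 5 scalars: both feet positions (radial distances),
    foot-to-foot distance, and each foot's distance to the head."""
    lf, rf, hd = map(np.asarray, (left_foot, right_foot, head))
    return np.array([
        np.linalg.norm(lf),        # left foot position (radial)
        np.linalg.norm(rf),        # right foot position (radial)
        np.linalg.norm(lf - rf),   # distance between the two feet
        np.linalg.norm(lf - hd),   # left foot to head
        np.linalg.norm(rf - hd),   # right foot to head
    ])
    # The HMM is trained on the frame-to-frame variation of these scalars.

def body_centroid(pressures, coords):
    """Pressure-weighted centroid of the 9 mattress sensors.
    pressures: (9,) readings; coords: (9, 2) sensor positions on the bed."""
    w = np.asarray(pressures, float)
    return (np.asarray(coords) * w[:, None]).sum(axis=0) / max(w.sum(), 1e-9)
```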

3.2 Feature selection based on the quality of information conveyed

When an atomic behavior is not obvious, i.e., not sufficiently identified by a set of body movement descriptors (transformed or combined features), we propose to include the most expressive original features, i.e., raw sensor observations, in the atomic behavior representation, selected on the basis of conditional entropy and mutual information [4]. Let $x$ be the nature states, and $z_i$ and $z_j$ be observations from sensors $i$ and $j$. From the experiments conducted, we found that features selected based on the conditional entropy $H(x, z_j \mid z_i)$ and the mutual information $I(x; z_j \mid z_i)$ tend to give a good classification rate, as these two quantities indicate how much missing information from sensor $j$ can be predicted from sensor $i$, and how much unique information can only be observed by sensor $i$. It is important to note that the candidate features for the behavior representation are usually correlated, because they are observations from spatially correlated sensors, i.e., sensors deployed close together to observe the same underlying events. This information-quality-based method for feature selection is a simplified version of the entropy-based weight scheme of [5].

Feature Dependency Reward Weights: First, among correlated features, we should reward a feature, i.e., the observations of a sensor, according to its ability to enhance the certainty of others, based on information dependency. When the observations of a sensor add more certainty to other correlated sensors, they can usually be used for more accurate prediction of missing data at those sensors. Such dependency reward is translated into a set of reward weights assigned to each feature. The weights are assigned through the following type of objective function, proposed in [5]:

$$\text{Minimize} \sum_{j \in N} \left( w^r_{ij}\, H(x, z_j \mid z_i) \right)^2 \quad \text{subject to} \quad \sum_{j \in N} w^r_{ij} = 1$$

Here the conditional entropy $H(x, z_j \mid z_i)$ is chosen as the scalar measuring the ability of a sensor to enhance the certainty of others. Under this objective, sensors that add more certainty to others, i.e., have smaller conditional entropy, are entitled to higher weights. We can adopt a general histogram-based approach to estimate the conditional entropy. Let a group of $n$ sensors be denoted by the set $N = \{1, 2, \dots, n\}$. Based on the optimal set of weights for this type of objective function derived in [5], we obtain the weight matrix $W^r$:

$$W^r = \begin{pmatrix} w^r_{1,1} & w^r_{1,2} & \dots & w^r_{1,n} \\ w^r_{2,1} & w^r_{2,2} & \dots & w^r_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ w^r_{n,1} & w^r_{n,2} & \dots & w^r_{n,n} \end{pmatrix}$$

where

$$w^r_{ij} = \frac{1}{H(x, z_j \mid z_i)^2 \sum_{k \in N} \frac{1}{H(x, z_k \mid z_i)^2}}$$

so the reward weight for feature $j$ is

$$w^r_j = \frac{1}{n} \sum_{i \in N} w^r_{ij}$$

Feature Redundancy Penalty Weights: When evaluating correlated features, we should also penalize information redundancy among them. The mutual information $I(x; z_j \mid z_i)$ is chosen as the scalar in the objective function used to assign weights to each feature. As $I(x; z_j \mid z_i) = H(x \mid z_i) - H(x \mid z_i, z_j)$, higher weights are assigned to sensors that gain less information from others. Following the same steps used to compute the reward weights $w^r_i$, we derive a penalty weight $w^p_i$ for each feature.

Feature Selection Criteria: For correlated features, considering both the dependency reward and the redundancy penalty, we define the selection criterion for feature $i$ as

$$c_i = I(x; z_i) \cdot w^r_i \cdot w^p_i$$

When nature states, i.e., SOAPD atomic behaviors, can only be sufficiently described by multiple spatially correlated sensors, such quality-based selection is shown to be a good selection strategy.
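A minimal numpy sketch of this criterion is given below, assuming the sensor readings are discretized and all entropies are estimated from histograms (as the text suggests). Function names, the bin count, and the numerical safeguards are illustrative assumptions.

```python
# Quality-based sensor/feature scoring: c_i = I(x; z_i) * w_i^r * w_i^p,
# with histogram-based entropy estimation. A sketch, not the paper's code.
import numpy as np

def H(counts):
    """Shannon entropy (bits) of a count table of any shape."""
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def counts(*cols, bins=8):
    """N-way histogram over the given columns (one column per variable)."""
    h, _ = np.histogramdd(np.column_stack(cols), bins=bins)
    return h

def quality_scores(x, Z):
    """x: (T,) state labels; Z: (T, n) readings from n correlated sensors.
    Returns one score c_i per sensor; activate sensors in decreasing order."""
    n = Z.shape[1]
    Hc = np.zeros((n, n))   # H(x, z_j | z_i)
    Ic = np.zeros((n, n))   # I(x; z_j | z_i)
    for i in range(n):
        zi = Z[:, i]
        H_zi, H_xzi = H(counts(zi)), H(counts(x, zi))
        for j in range(n):
            H_xzjzi = H(counts(x, Z[:, j], zi))
            H_zjzi = H(counts(Z[:, j], zi))
            Hc[i, j] = H_xzjzi - H_zi                   # chain rule
            Ic[i, j] = H_xzi - H_zi - H_xzjzi + H_zjzi  # H(x|zi)-H(x|zi,zj)
    # Reward weights: w_ij^r = (1/Hc_ij^2) / sum_k (1/Hc_ik^2), averaged over i.
    inv_r = 1.0 / np.maximum(Hc, 1e-9) ** 2
    wr = (inv_r / inv_r.sum(axis=1, keepdims=True)).mean(axis=0)
    # Penalty weights: same construction with conditional mutual information,
    # so sensors that gain less information from others get higher weights.
    inv_p = 1.0 / np.maximum(Ic, 1e-9) ** 2
    wp = (inv_p / inv_p.sum(axis=1, keepdims=True)).mean(axis=0)
    # I(x; z_i) = H(x) + H(z_i) - H(x, z_i)
    Ixz = np.array([H(counts(x)) + H(counts(Z[:, i])) - H(counts(x, Z[:, i]))
                    for i in range(n)])
    return Ixz * wr * wp
```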


Figure 1: HMM recognition accuracy for rolling behavior: quality-based sensor selection vs. entropy-based sensor selection

As an illustration, we deployed pressure sensors under the mattress, whose observations are likely to be correlated, to observe rolling behavior. We first activated the sensors one by one in the order decided by the value of the entropy reduction, $H(x) - H(x \mid z_i)$, which is a good sensor selection criterion in certain applications, e.g., location tracking: sensors whose observations introduce a larger entropy reduction, i.e., add more certainty to the state estimation, have higher priority to be selected. We then activated the sensors one by one in the order given by the proposed selection criterion, $(H(x) - H(x \mid z_i)) \cdot w^r_i \cdot w^p_i$: features, i.e., observations from sensors, that introduce more certainty, infer more information about other correlated sensors, and contain more unique information have higher priority to be selected. From Figure 1, we observe that the features selected for the rolling behavior representation by the proposed criterion have better discrimination power for classification than those chosen by entropy-based selection.

4 Atomic Agitation Behavior Classifier

As illustrated, each atomic agitation behavior, e.g., kicking, hitting, rolling, etc., is represented in terms of multiple original, transformed, or combined features, i.e., sensor observations. The objective of the atomic agitation behavior classifier is to establish decision boundaries in the feature space that separate different behaviors. Such decision boundaries are usually determined by the probability distribution of the patterns belonging to each behavior, which is learned from the sensor data.

Figure 2: Recognition of atomic agitation behaviors and complex agitation behaviors

As shown in Figure 2, our atomic agitation behavior classifier consists of two layers: the Hierarchical Hidden Markov Models (HHMM) layer and the Support Vector Machines (SVM) layer. For each atomic agitation behavior, one HMM is constructed based on observations from a single sensor modality. Though an HMM built on observations from multiple sensor modalities together may give better classification performance, we adopt the current design so that sensor modalities can be operated independently of one another, which is important for practical real-time deployment and for future expansion when a new sensor modality is added. For example, we can switch off all ultrasonic sensors without affecting the classification currently done using the pressure sensors. For ambiguous atomic agitation behaviors, i.e., those whose representations have low discrimination ability, the classification results from the corresponding HMMs are likely to be inaccurate. These erroneous classification results from a selected set of HMMs are the inputs to the respective SVMs, learned over those HMM outputs on a larger time scale, in order to reduce or eliminate the classification ambiguity.

4.1 Atomic Agitation Behavior HMM Classifier

The Hierarchical Hidden Markov Models (HHMM) layer shown in Figure 2 consists of many HMMs. For each atomic agitation behavior, one HMM is built based on features from each sensor modality. An HMM here serves as a probabilistic data fusion model to estimate the nature states, $x$, from the sensor observations, $z$. The HMM example shown in Figure 3 is used to recognize the "RollRight" behavior in bed based on the observations from nine pressure sensors. It is important to note that, for simplicity, the behavior representation here is decided by selecting all relevant features of one modality, i.e., without feature selection or extraction involved. For this example, the corresponding HMM $\theta = \{A, B, \pi\}$ is parameterized as follows:

Emission Probability ($B$): The nature states are defined as $x = \{RollRight, UnknownAction\}$. Each state is manually assigned some discrete value. Each nature state, a human behavior here, is described by a hidden variable $x_i$ whose value is not directly specified by the vector of sensor observations $z = \{z_1, z_2, \dots, z_9\}$ from the nine pressure sensors, but is associated with an emission probability, $p(z \mid x_i)$. In our model, a multivariate Gaussian mixture is used for the emission probability.

Transition Probability ($A$): The sequence of nature states, a human behavior sequence, is associated with a matrix of transition probabilities indicating the probability of going from one nature state $i$ to another state $j$, i.e., $p(x_j \mid x_i)$.

Initial state distribution ($\pi$): The initial state probabilities, $p(x_i)$, are assumed to be the proportions of each behavior in the training data.

We use a set of manually labelled training data, denoted $Z_{train}$, to estimate the model $\theta = \{A, B, \pi\}$ through the Expectation-Maximization (EM) algorithm [3]. The parameters estimated by the EM algorithm locally maximize $\sum_{X} p(X \mid Z_{train}, \theta)\, p(Z_{train} \mid \theta)$.

Figure 3: An HMM to recognize "RollRight" in bed

As shown in the following equation, the objective of this HMM fusion model is to estimate the most likely state sequence between times $[0, K]$, $\hat{x} = \{\hat{x}^{(0)}, \hat{x}^{(1)}, \dots, \hat{x}^{(K)}\}$, based on the corresponding sensor measurements $z = \{z^{(0)}, z^{(1)}, \dots, z^{(K)}\}$. Here $z^{(k)}$ is an $N$-dimensional vector, i.e., the observations from the $N$ sensors at time $k$. We adopt the Viterbi algorithm for the state estimation:

$$\hat{x} = \arg\max_{x} \; p(x^{(0)}) \prod_{k=0}^{K-1} p(x^{(k+1)} \mid x^{(k)}) \prod_{k=0}^{K} p(z^{(k)} \mid x^{(k)})$$
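For concreteness, a sketch of fitting and decoding one such single-modality HMM is given below, using the hmmlearn library as a stand-in for the paper's own EM/Viterbi implementation. The state count, mixture count, and data shapes are assumptions mirroring the "RollRight" example.

```python
# Fitting one behavior HMM with Gaussian-mixture emissions (EM training)
# and decoding new data with Viterbi. A sketch using hmmlearn, not the
# paper's implementation; data here is random placeholder input.
import numpy as np
from hmmlearn.hmm import GMMHMM

# Concatenated observation sequences from the 9 pressure sensors,
# with `lengths` marking each labelled training episode.
Z_train = np.random.rand(500, 9)
lengths = [100, 150, 250]

# Two hidden states (RollRight vs. UnknownAction) with mixture-of-Gaussian
# emissions; Baum-Welch (EM) estimates A, B and pi from the training data.
model = GMMHMM(n_components=2, n_mix=3, covariance_type="diag",
               n_iter=50, random_state=0)
model.fit(Z_train, lengths)

# Viterbi decoding of a new observation window into a state sequence.
Z_new = np.random.rand(60, 9)
state_seq = model.predict(Z_new)   # most likely hidden-state sequence
```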

It is noted that each SOAPD HMM classifier is learned from single-modality observations. Observations from sensors of multiple modalities provide a multidimensional description of each atomic agitation behavior. However, when learning a classifier from combined multi-modality observations, we also create a rigid dependency among those sensor modalities. For example, given an HMM classifier learned from pressure, ultrasonic, and video sensor observations together to recognize "rolling on bed", during classification this HMM classifier can only be activated when observations from all three modalities are present, and it needs to be retrained whenever any of the previously included modalities is absent or an additional modality is to be included. Thus, to avoid this drawback, each atomic agitation behavior classifier is learned from single-modality observations.

4.2 Hierarchical HMM Classifiers

Different atomic agitation behaviors exhibit different levels of difficulty to classify based on sensor observations. In general, good classification performance can be obtained when the values of the data to be classified are grouped with a small spread within the same class and separated with a large margin between different classes. Thus, actions associated with more dynamic observation patterns, such as kicking and hitting, usually have lower recognition accuracy than actions associated with more rigid observation patterns, such as rolling. To make it possible for the successful classification of "easy" behaviors to help improve the overall classification performance, we design the agitation behavior classifiers in a hierarchical way. The order of the nodes in the tree shown in Figure 4(c) indicates the order in which the different HMM classifiers are applied. The data associated with the atomic behaviors successfully labeled by a parent classifier are filtered from the input to its children classifiers. Though the input data to the parent and children classifiers may not necessarily be the same, as behaviors can be represented from different modalities, the data for the different classifiers should be indexed on the same time axis to make such hierarchical data filtering possible. When multimodality sensor observations are present, hierarchical classifiers are particularly significant for classifying randomly mixed behaviors, because a "difficult" behavior from one modality's point of view can be an "easy" behavior from another's. For example, pressure sensors show very clear observation patterns for the behavior "rolling" but are almost blind to "hitting". At the same time, hitting can be clearly recognized using video sensors based on the coordinates of the hand color segments in the picture frame; however, rolling can often interfere with the recognition of hitting, as many patients tend to move their arms while rolling, which may increase the false-positive error rate in recognizing "hitting". If we place the hitting (video) classifier as a child node of the rolling (pressure) classifier, i.e., the input to the "hitting" classifier contains no observations taken while rolling, we can expect better recognition accuracy for "hitting". The major potential drawback of the hierarchical classifier is that a misclassification at a parent classifier can affect the classification accuracy of all its children. Thus, in order to decide the structure of the tree, we first carefully evaluate each individual classifier through a validation process, e.g., N-fold cross validation. Only classifiers with sufficient classification accuracy, e.g., a classification rate above 95%, are appropriate to be parents of others, and classifiers with higher classification accuracy are placed closer to the root. Two extreme cases are shown in Figure 4(a) and (b). In Figure 4(a), all individual classifiers have sufficient classification accuracy, so the classifiers can be operated in order one by one; in Figure 4(b), none of the individual classifiers has sufficient classification accuracy to be a parent of others, so all classifiers have to be operated in parallel.

Figure 4: Hierarchical HMM classifiers for atomic agitation behaviors
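As a rough illustration of the filtering rule just described, the sketch below removes the time segments confidently labeled by a parent classifier from its children's input. The classifier interface and the tree encoding are invented for illustration; only the shared time axis mirrors the text.

```python
# Hierarchical classification: segments confidently labeled by a parent
# are filtered from its children's input. Interfaces are illustrative.
def classify_hierarchical(node, segments):
    """node: dict with 'clf' (whose .predict(seg) returns a label or None
    when unsure) and 'children'; segments: dict time_index -> features."""
    labels = {}
    remaining = dict(segments)
    for t, seg in segments.items():
        label = node["clf"].predict(seg)
        if label is not None:        # confidently recognized at this node
            labels[t] = label
            remaining.pop(t)         # filter it from the children's input
    for child in node.get("children", []):
        # Children may use a different modality, but share the time axis,
        # so filtering by time index remains valid.
        labels.update(classify_hierarchical(child, remaining))
    return labels
```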

4.3 SVM Classifiers to Learn Ambiguous Actions from HMM Results

Support Vector Machine (SVM) [2, 1] is a supervised binary classification algorithm. In SVM, given a labeled training set $\{(z_1, x_1), (z_2, x_2), \dots, (z_n, x_n)\}$, where $z_i \in R^N$ is a feature vector, i.e., the representation of an agitation behavior, and $x_i \in \{-1, 1\}$ is its binary class label, the following optimization problem is solved:

$$\min_{w, b, \xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{n} \xi_i$$

subject to $x_i (w^T \phi(z_i) + b) \ge 1 - \xi_i$, $\xi_i \ge 0$. Here the function $\phi(z)$ maps training vectors $z_i$ into a higher-dimensional space; a kernel function $K(z_i, z_j) = \phi(z_i)^T \phi(z_j)$ is used to specify such a mapping.

In our SOAPD classification system, additional SVM classifiers are constructed on the outputs of the HHMM layer to enhance the classification accuracy of ambiguous behaviors. Some atomic SOAPD behaviors are ambiguous when described by single-modality observations. Such ambiguity is mainly due to the fact that several different SOAPD actions can result in similar observation patterns at a particular sensor modality. Very often, such ambiguity can be eliminated or reduced with observations from certain other modalities. Instead of building a separate HMM classifier for each ambiguous action based on a multimodality representation, which leads to unnecessary multimodality dependency, we recognize those ambiguous actions from the misclassified HMM results through additional SVM classifiers. For example, the SOAPD action "attempt to get out of bed" typically generates observation patterns at the pressure sensors very similar to the atomic agitation behavior "rolling", and patterns at the ultrasonic sensors similar to the atomic agitation behavior "sitting up". Therefore, the false-positive classification results from the "rolling" HMM classifier based on pressure sensor observations and the "sitting up" HMM classifier based on ultrasonic sensor observations are good patterns to be classified as the action "attempt to get out of bed" by an SVM classifier constructed on the HMM classifiers' results. We choose the radial basis function (RBF) kernel in our SOAPD system: $K(z_i, z_j) = \exp(-\gamma \|z_i - z_j\|^2)$, $\gamma > 0$. In order to extend SVM binary classification to $k$-class classification, we adopted the LIBSVM implementation [1] to perform $C^2_k$ pair-wise 2-class classifications.
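A rough sketch of this SVM layer is shown below, using scikit-learn's SVC (a wrapper around the LIBSVM implementation cited above) with the RBF kernel. How the HMM-layer outputs are turned into feature vectors is an assumption here, and the data is placeholder input.

```python
# SVM layer over HMM outputs: RBF kernel, pairwise (one-vs-one) multi-class
# scheme matching the C(k,2) pairwise classification described in the text.
import numpy as np
from sklearn.svm import SVC

# Each row: HMM-layer outputs over a time window, e.g. per-behavior
# log-likelihoods or 0/1 decisions from the selected HMM classifiers
# (an illustrative feature construction).
X_train = np.random.rand(200, 6)
y_train = np.random.randint(0, 3, 200)   # e.g. rolling / sitting up /
                                         # attempt to get out of bed

svm = SVC(kernel="rbf", gamma=0.5, C=1.0, decision_function_shape="ovo")
svm.fit(X_train, y_train)
pred = svm.predict(np.random.rand(10, 6))
```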

5 Complex Agitation Behavior Inference

In many situations, the knowledge we are interested in may not be directly expressed by the classification results on atomic agitation behaviors; or knowledge of the relationship among different atomic agitation behaviors in describing a patient's complex behavior is lacking. Bayesian networks give us another powerful alternative to obtain the required information from the atomic agitation behavior classification results. Thus, we initially learn the Bayesian networks from labeled atomic behavior classification results. During real monitoring, such trained Bayesian networks are used to obtain knowledge of complex agitation behavior based on the atomic behavior classification results.

5.1 Explore High-Level Complex Agitation Behavior

In most situations, the doctors or caregivers are interested in a high-level description of a patient's agitation behaviors. Such a high-level behavior description is usually an overall conclusion on several atomic actions over a certain period of time. For example, outward motion behavior is an overall assessment of one category of aggressive atomic agitation behaviors, such as kicking and hitting, and short-duration SOAPD agitation behavior is an overall assessment based on different agitation behaviors within 16 seconds. In order to derive high-level behavior descriptions, we employ Bayesian networks. Assuming we know which atomic agitation behaviors are associated with each complex agitation behavior, we construct Bayesian networks using the atomic agitation behavior classification results as leaves and placing the high-level behaviors to be inferred as the parents of the relevant atomic agitation behaviors. An example Bayesian network is shown in Figure 5.

Figure 5: Bayesian networks for complex agitation behaviors
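With the tree shape described above (complex behavior as parent, atomic results as leaves), posterior inference reduces to a naive-Bayes computation. The sketch below illustrates this; all probability values and behavior names are invented for illustration.

```python
# Inferring a high-level behavior from atomic classification results with
# a parent-to-leaves Bayesian network. CPT values here are invented.
prior = {"outward_motion": 0.2, "none": 0.8}
# P(atomic classifier fires | complex behavior), one CPT per leaf
cpt = {
    "kicking": {"outward_motion": 0.7, "none": 0.05},
    "hitting": {"outward_motion": 0.6, "none": 0.05},
}

def posterior(observed):
    """observed: dict leaf -> bool, from the atomic behavior classifiers."""
    post = {}
    for h, p in prior.items():
        for leaf, seen in observed.items():
            p *= cpt[leaf][h] if seen else (1 - cpt[leaf][h])
        post[h] = p
    z = sum(post.values())
    return {h: p / z for h, p in post.items()}

print(posterior({"kicking": True, "hitting": False}))
```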

5.2 Explore Associated Modalities

In many situations, the structure of the Bayesian networks for high-level complex agitation behavior inference using multimodal sensors is not obvious. The unknown structure of these Bayesian networks is mainly due to two reasons. First, we may not know in advance whether a sensing modality can provide a proper description of the atomic agitation behaviors associated with a particular complex agitation behavior. For example, without some model validation process, we usually cannot determine whether observations from a particular set of ultrasonic sensors can be used to reliably recognize the atomic rolling-on-bed behavior. Usually, the methods for feature selection discussed in Section 3 can be used for such a sensor modality association problem. Second, the same complex agitation behavior can be performed very differently from patient to patient, i.e., a complex agitation behavior can be associated with a different set of atomic agitation behaviors for different patients. For example, one patient may tend to scream during agitation while another never does. Therefore, it is more appropriate to customize a Bayesian network by learning from the training data.

We measure the fitness of a Bayesian network $B$ for a particular patient by the Bayesian Information Criterion (BIC),

$$L(E, B) = -\sum_{i=1}^{m} \log p(v_i) + \frac{|B|}{2} \log m$$

where the data $E$ consist of $m$ samples, $v_i$ is an $n$-dimensional vector of values of the $n$ variables, $p(v_i)$ is the joint probability that the variables take the values specified by $v_i$, and $|B|$ is the number of parameters in $B$.
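For reference, the BIC score above is straightforward to compute once the model's joint probabilities for the training samples are available; this small helper is a direct transcription of the formula, not code from the paper.

```python
# BIC fitness: L(E, B) = -sum_i log p(v_i) + (|B|/2) log m
import numpy as np

def bic_score(p_v, n_params):
    """p_v: joint probabilities p(v_i) for the m samples; n_params: |B|."""
    p_v = np.asarray(p_v, float)
    m = len(p_v)
    return -np.sum(np.log(p_v)) + 0.5 * n_params * np.log(m)
```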

6 Experimentation

We deployed an observational system in a hospital ward consisting of multiple modalities of sensors such as pressure sensors, ultrasound sensors, passive infrared sensors, video cameras, audio sensors, etc., to perform information fusion on the data collected from dementia patients as shown in Figure 6.




Figure 6: SOAPD observational system in a ward

We also built a mock-up test-bed with similar sensor modalities in our lab, as shown in Figure 7. The objective of the mock-up test-bed is to bridge the gap in obtaining meaningful and useful data for the information fusion process, so that better recognition rates and false alarm rates can be obtained. In the deployment process, we found that a patient may not display the full range of behaviors expected from agitated dementia patients as expressed in the SOAPD classification. Hence, mock-up data can be generated in reasonable quantity in our laboratory to train the various feature detection and classification algorithms. Concurrently, the data collected from both settings are analyzed offline and online. However, we will not go into the details of the deployment, as it is described in [16, 17]. For our experiments, we are able to achieve a recognition rate of more than 90% for atomic agitation behaviors such as rolling left, rolling right, getting up, getting down, kicking, etc.

For complex agitation behavior recognition, we can achieve a much better recognition rate of about 92%, with at least a 50% reduction in the false alarm rate, in three of the complex agitation behavior categories. We do realize that it is still a huge challenge to detect the complete set of complex agitation behavior categories in an unconstrained environment based on SOAPD. Research is currently being done on how to further detect and classify agitation behaviors over time and with more modalities for information fusion, so that the recognition rate and false alarm rate can be improved.

Figure 7: SOAPD test-bed in the lab

7 Conclusion

In this paper, we presented the challenges of using multimodal information fusion for automated recognition of agitation behaviors in persons with dementia. In particular, we presented a hierarchical fusion framework consisting of feature extraction and selection from single or multiple sensor modalities to represent low-level atomic agitation behaviors, atomic agitation behavior classifiers, and complex agitation behavior inference using Bayesian combiners, along with experimental results obtained from a real test-bed deployed in a hospital ward. We hope to extend the work to include more sensor modalities for information fusion so that successful recognition of complex agitation behaviors in an unconstrained environment can be achieved. We hope that our joint effort with a local hospital will enable us to achieve our objective of deploying the system pervasively in patients' homes.

References

[1] C. C. Chang and C. J. Lin, LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[2] V. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.

[3] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning. Springer, 2001.

[4] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley, 1991.

[5] O. A. Basir and H. C. Shen, "Interdependence and Information Loss in Multi-Sensor Systems", Journal of Robotic Systems, vol. 16, no. 11, pp. 597-612, 1999.

[6] A. K. Jain, et al., "Statistical pattern recognition: a review", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4-37, 2000.

[7] L. Rabiner, "A tutorial on Hidden Markov Models and selected applications in speech recognition", Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.

[8] D. L. Hall and J. Llinas, Handbook of Multisensor Data Fusion. CRC Press, 2001.

[9] Multi-INT Fusion Performance Study, Joint C4ISR Decision Support Center Study DCS-02-00, 2001.

[10] J. Uhlmann, et al., "Algorithms for multiple target tracking", American Scientist, vol. 80, no. 2, 1992.

[11] Y. Zhu, et al., "Passing Vehicle Detection from Dynamic Background Using Robust Information Fusion", IEEE Intelligent Transportation Systems Conference, 2004.

[12] G. Abowd, et al., "The Aware Home: A Living Laboratory for Technologies for Successful Aging", Proceedings of the AAAI Workshop on Automation as Caregiver, pp. 1-7.

[13] D. Comaniciu, "Nonparametric Information Fusion for Motion Estimation", IEEE International Conference on Computer Vision and Pattern Recognition, pp. 59-66, 2003.

[14] H. Szu, et al., "Early Tumor Detection by Multiple Infrared Unsupervised Neural Nets Fusion", Proceedings of the 25th Annual International Conference of the IEEE EMBS, 2003.

[15] http://www.hesmagazine.com/

[16] V. Foo, Q. Qiu, J. Biswas and A. Wai, "Fusion Considerations in Monitoring and Handling Agitation Behavior for Persons with Dementia", International Conference on Information Fusion, 2006.

[17] J. Biswas, et al., "Agitation Monitoring of Persons with Dementia based on Acoustic Sensors, Pressure Sensors and Ultrasound Sensors: A Feasibility Study", International Conference on Ageing, Disabilities and Independence, 2006.

[18] A. C. Hurley, et al., "Measurement of Observed Agitation in Patients with Dementia of the Alzheimer's Type", Journal of Mental Health and Aging, 1999.
