Strategies for Inference Mechanism of Conditional Random Fields for Multiple-Resident Activity Recognition in a Smart Home

Kuo-Chung Hsu, Yi-Ting Chiang, Gu-Yang Lin, Ching-Hu Lu, Jane Yung-Jen Hsu and Li-Chen Fu

Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan

Abstract. Multiple-resident activity recognition is a major challenge in building a smart-home system. In this paper, conditional random fields (CRFs) are chosen as our activity recognition models to address this challenge. We evaluate several strategies, including a CRF with iterative inference and a CRF with decomposition inference, that extend the commonly used CRFs so that they can be applied to a multiple-resident environment. We use the multi-resident CASAS data collected at WSU (Washington State University) to validate these strategies. The results show that data association of non-obtrusive sensor data is vital to improving the performance of activity recognition in a multiple-resident environment. Furthermore, the study suggests that human interaction should be taken into consideration for further accuracy improvement.

Keywords: smart home, activity recognition, multiple residents, conditional random field (CRF), data association

1 Introduction

Activity recognition, a mechanism in ubiquitous computing for classifying human activities, is a key enabler of many context-aware applications in a smart home. For example, to maximize residents' comfort, a smart-home system should be able to recognize their activities and provide appropriate services whenever necessary (e.g., switching on a light if a resident is studying while the environment is dim). Another important application is healthcare; activity recognition can provide instant assistance for caregivers in the event of emergencies (e.g., detecting an abnormality and triggering an alarm if a resident sleeps much longer than usual). Because of these benefits, many researchers have focused on how to sense the activities of residents efficiently.

Numerous approaches to multiple-resident activity recognition at house scale have been proposed in the literature. Generally speaking, these approaches include asking residents to carry sensors [1] (e.g., RFIDs, infra-red receivers), installing cameras in the environment [2-3], and deploying pervasive sensors [4-5] (e.g., pressure sensors, reed switches) for later activity reasoning. What seems to be lacking, however, is attention to human-centric concerns: a smart-home system, unlike a public space, should not ignore ergonomic factors. For example, camera-based solutions may invade residents' privacy, and wearable sensors usually cause inconvenience; both make these sensors unsuitable for a smart-home environment targeting maximum comfort. Therefore, Lu et al. categorize deployed devices into seamless and seamful types to take residents' ergonomic concerns into account, and they suggest using as many seamless devices as possible in a smart-home environment [6]. Accordingly, selecting suitable devices is nontrivial, and using non-obtrusive and pervasive sensors is recommended for a smart-home environment.

Existing studies on non-obtrusive activity recognition primarily concentrate on environments with only a single resident [7-9]. In a multiple-resident environment, however, the high complexity of tracking several residents and the difficulty of modeling interpersonal interactions make multi-user activity recognition intractable. Among these challenges, the data association problem with non-obtrusive sensors is of vital importance. The data association problem is defined as determining which resident triggered each sensor event. Because few clues are available, much research addresses this problem using various filters. For example, Wilson and Atkeson propose a system for simultaneously tracking multiple people and recognizing whether each is "active" or "inactive" [4]. In this paper, we propose and compare several approaches to resolving data association for recognizing 15 activities that are frequently performed by multiple residents.
The inferred activities, if reliable, can be a strong clue for many applications, and may make automation and healthcare practical in a multiple-resident environment.

The paper is organized as follows. Section 2 provides an overview of the dataset used to verify our approach. Section 3 presents our problem definition and several strategies for training multiple-resident activity models. Section 4 shows the experimental results and compares the models. Finally, Section 5 summarizes the paper.

2 CASAS Dataset

The dataset, “WSU Apartment Testbed, ADL Multiresident Activities”, was collected by the CASAS project in the WSU smart workplace [10]. It contains 26 files, one per participant pair, each holding sensor events and their corresponding annotations, which we use to validate our proposed models. Each file contains about 400 to 800 sensor events, for a total of 17,238 events in the dataset. The data represent two residents who were asked to perform 15 activities of daily living (ADLs), which are shown in Table 1. Some of these are individual activities, which are performed by one resident and are independent of the activities performed by the other resident. Cooperative activities, on the other hand, are performed by one or more residents, and the activities involved are dependent on one another. The deployed sensors include motion sensors, item sensors and cabinet sensors (see Fig. 1(a)). Each sensor event and its annotation are output in the format (Date, Time, SensorID, Value, ResidentID, TaskID). Details of the dataset can be found in [5].

Fig. 1. The sensor deployment, panels (a) and (b)

Table 1. The 15 activities that residents were asked to perform in the CASAS project

Activity
Individual: Filling medication dispenser; Hanging up clothes; Reading magazine; Sweeping floor; Setting the table; Watering plants; Preparing dinner
Cooperative: Moving furniture; Playing checkers; Paying bills; Gathering and packing picnic food

3 Multiple-Resident Activity Recognition

Previously proposed approaches generally follow the same procedure for recognizing the activities of multiple residents. First, sensor data are preprocessed and transformed into feature vectors, from which the parameters of the activity models are learned. The transformed features are then used to determine the data association variables, and the determined variables in turn help to infer the activities. Fig. 2(a) shows the flowchart of this general procedure. To describe our models, several issues must be addressed. The first is the problem we intend to handle and the notation we use. The second is how to preprocess sensor data. The third is the background on conditional random fields (CRFs) applied to multiple-resident activity recognition. The fourth is the mechanism for iteratively obtaining the data association variables and the activities. The last is decomposition inference given the data association.

Fig. 2. The flowcharts of multiple-resident activity recognition
(a) The flowchart for general multiple-resident activity recognition: Raw data, Preprocessing, Data Association Recognizer, Activity Recognizer, Activity
(b) The flowchart for iterative training of data association and activity recognition: Raw data, Preprocessing, Activity Recognizer, Data Association Recognizer, Converge, Activity

3.1 Problem Formulation

This section introduces the notation for recognizing the activities of two residents. Given $n$ observations $O_{1:n}$, with each observation $O_t = (\mathrm{Date}_t, \mathrm{Time}_t, \mathrm{SensorID}_t, \mathrm{Value}_t)$ recorded at time $t$, let the performed activities be $A_{1:n}$ and the data association sequence be $\theta_{1:n}$, where $A_t \in \{1, \ldots, 15\}$ is an index to an activity of interest and $\theta_t \in \{1, 2\}$ indicates that the observation $O_t$ is triggered by the $\theta_t$-th resident. To be unambiguous, activity $A_t$ is defined as the activity performed by the $\theta_t$-th resident at time $t$. The purpose of multiple-resident activity recognition is to determine $A_{1:n}$ and $\theta_{1:n}$ given $O_{1:n}$, without prior knowledge of the relationship between $A_{1:n}$ and $\theta_{1:n}$.

3.2 Data Preprocessing

For each time slice, the preprocessed data are fed into the recognizer. We consider three types of preprocessed data: raw data, environment data and room-level data. The raw data are the observations $O_t$ with the date and time removed. The environment data contain all sensed information in the house, obtained by continuously renewing the status of each sensor. For example, suppose there are five sensors and the environment data at time $t-1$ are (0,1,0,0,1). If the third sensor triggers an "on" event at time $t$, the environment data are updated to (0,1,1,0,1). The room-level data are derived from the environment data: domain knowledge is used to divide the experimental environment into several room-scale regions (see Fig. 1(b)), and each room is represented by a feature that is "on" if and only if at least one of the motion sensors in the room is "on".
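The environment and room-level representations described above can be illustrated with a minimal sketch. The function names and the room layout `ROOMS` are hypothetical and not taken from the paper; the sketch also assumes that every sensor index listed for a room is a motion sensor.

```python
from typing import Dict, List, Tuple


def update_environment(state: Tuple[int, ...], sensor_index: int, value: int) -> Tuple[int, ...]:
    """Return a copy of the environment vector with one sensor's status renewed."""
    updated = list(state)
    updated[sensor_index] = value
    return tuple(updated)


def room_level(state: Tuple[int, ...], rooms: Dict[str, List[int]]) -> Dict[str, int]:
    """A room feature is 1 if and only if at least one of its sensors is on."""
    return {room: int(any(state[i] for i in indices)) for room, indices in rooms.items()}


# The example from the text: five sensors, the third one turns on at time t.
env_prev = (0, 1, 0, 0, 1)
env_now = update_environment(env_prev, sensor_index=2, value=1)

# Hypothetical room layout (an assumption): sensor indices grouped per room.
ROOMS = {"kitchen": [0, 2], "living_room": [1], "bedroom": [3, 4]}
rooms_prev = room_level(env_prev, ROOMS)
rooms_now = room_level(env_now, ROOMS)
```

In this toy layout the hypothetical kitchen feature switches from off to on once the third sensor fires, while the other two rooms are unchanged.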

3.3 Conditional Random Field (CRF)

In this paper, we apply conditional random fields (CRFs) [11] to multiple-resident activity recognition because they are discriminative, undirected graphical models that can be applied to sequence labeling problems. Given a sequence of observations, we can define feature functions in CRFs to model the dependency between two consecutive states in the problem domain. By letting the activities we want to recognize be the states and the collected sensor data be the observations, we can apply CRFs to activity recognition. It has been shown that CRFs perform well on activity recognition problems [7, 12-13].

Let $X$ and $Y$ be two sets of random variables representing the observations and the states, respectively. We can define an undirected graph $G(V,E)$ such that $V$ corresponds to $X$ and $Y$, and $E$ represents the relationships (in the form of feature functions) between these vertices (random variables). If $C$ is the set of cliques in $G$, a conditional random field defines the conditional probability

$$P(Y \mid X) = \exp\left( \sum_{c \in C} \lambda_c f_c(Y_c, X_c) \right) / Z(X),$$

where $Y_c$ and $X_c$ are the random variables in a clique $c$ of $G$, $Z(X)$ is a normalization factor, the $f_c$ are the feature functions, and the $\lambda_c$ are the parameters of the CRF.

For example, Fig. 3 shows a typical linear-chain conditional random field (LCRF) model, where $X = \{X_1, \ldots, X_n\}$ is an observation sequence and $Y = \{Y_1, \ldots, Y_n\}$ is a state sequence. This model has two kinds of feature functions: the $f_t$ for the transition model and the $g_t$ for the observation model. The conditional probability in this model is

$$P(Y \mid X) = \frac{1}{Z(X)} \exp\left( \sum_{t=1}^{n-1} \lambda_t f_t(y_t, y_{t+1}) + \sum_{t=1}^{n} \mu_t g_t(x_t, y_t) \right),$$

where $\{\lambda_t\}_{t=1}^{n-1}$ and $\{\mu_t\}_{t=1}^{n}$ are the parameters to be learned from training data. Optimization methods such as conjugate gradient [14] and improved iterative scaling [15] can be used to train CRFs. The most commonly used method is a limited-memory quasi-Newton method called L-BFGS [16].
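To make the linear-chain conditional probability concrete, the following minimal sketch evaluates it on a toy problem by enumerating every state sequence to obtain the normalization factor. It is an illustration under simplifying assumptions, not the training or inference code used in the paper: the weighted transition and observation terms are collapsed into two lookup tables (`TRANS`, `EMIT`) that are shared across time steps, and all values are made up.

```python
import itertools
import math


def score(y, x, trans, emit):
    """Unnormalized log-score: transition terms plus observation terms."""
    s = sum(trans[y[t]][y[t + 1]] for t in range(len(y) - 1))
    s += sum(emit[y[t]][x[t]] for t in range(len(y)))
    return s


def conditional_probability(y, x, trans, emit, n_states):
    """P(y | x) = exp(score(y, x)) / Z(x), with Z(x) computed by brute force."""
    z = sum(
        math.exp(score(candidate, x, trans, emit))
        for candidate in itertools.product(range(n_states), repeat=len(x))
    )
    return math.exp(score(y, x, trans, emit)) / z


# Toy weights (assumptions): two states, two observation symbols.
TRANS = [[0.5, -0.5], [-0.5, 0.5]]   # weight for moving from state i to state j
EMIT = [[1.0, 0.0], [0.0, 1.0]]      # weight for state i emitting symbol k
X_OBS = [0, 0, 1]

p_matching = conditional_probability((0, 0, 1), X_OBS, TRANS, EMIT, n_states=2)
p_mismatch = conditional_probability((1, 1, 0), X_OBS, TRANS, EMIT, n_states=2)
```

Brute-force enumeration is exponential in the sequence length and is used here only because it mirrors the formula directly; practical CRF implementations compute the normalization factor with dynamic programming.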

3.4 Conditional Random Field with Iterative Inference

Fig. 3. The graphical representation of the linear-chain conditional random field

Fig. 4. The graphical representation of the CRF for multiple-resident activity recognition

Generally, multiple-resident activity recognition follows the procedure shown in Fig. 2(a). However, we observe that an estimated activity can be informative for the data association recognizer, and vice versa. Hence, we design a reciprocal procedure to exploit the close relationship between the two kinds of information (see Fig. 2(b)). The graphical representation of the CRF we propose is shown in Fig. 4; it associates the activity $A_t$ with the data association variable $\theta_t$ and the observation $O_t$. To estimate activity and data association iteratively with the reciprocal procedure, three variant CRFs are used, as shown in Fig. 5. The first model initializes the activity sequence $A_{1:t}$ (Fig. 5(a)), the second infers the data association sequence $\theta_{1:t}$ given $O_{1:t}$ and $A_{1:t}$ (Fig. 5(b)), and the third infers the activity sequence $A_{1:t}$ by treating $O_{1:t}$ and $\theta_{1:t}$ as observations (Fig. 5(c)). Of these structures, the first and the third are linear-chain CRFs (LCRFs). After the models are learned, the preprocessed data are used to generate $A^{(0)}_{1:t}$, and the data association variables and the activities are then inferred iteratively. The inference algorithm is shown in Algorithm 1.
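The reciprocal procedure described above can be sketched as a simple fixed-point loop. In this sketch the three trained CRFs are passed in as plain callables (`init_activities`, `infer_association`, `infer_activities`), which are hypothetical stand-ins; the `max_iters` cap is also an assumption added as a safeguard and is not part of the paper's description.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def iterative_inference(
    observations: Sequence,
    init_activities: Callable[[Sequence], List[int]],
    infer_association: Callable[[Sequence, List[int]], List[int]],
    infer_activities: Callable[[Sequence, List[int]], List[int]],
    max_iters: int = 20,
) -> Tuple[List[int], Optional[List[int]]]:
    """Alternate between association and activity inference until neither changes."""
    activities = init_activities(observations)      # initial activity estimate
    association = None
    for _ in range(max_iters):
        new_association = infer_association(observations, activities)
        new_activities = infer_activities(observations, new_association)
        converged = new_association == association and new_activities == activities
        association, activities = new_association, new_activities
        if converged:
            break
    return activities, association


# Toy stand-ins for the three models (assumptions, for illustration only).
obs = [1, 2, 3]
acts, assoc = iterative_inference(
    obs,
    init_activities=lambda o: [v % 2 for v in o],
    infer_association=lambda o, a: [1 if v == 0 else 2 for v in a],
    infer_activities=lambda o, th: [v % 2 for v in o],
)
```

With real models each callable would wrap a decoding call on the corresponding trained CRF; the loop itself only needs the sequences to be comparable for equality.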

Fig. 5. Three CRFs used in the conditional random field with iterative training
(a) The linear-chain CRF for initialization
(b) The CRF to infer the data association sequence $\theta_{1:t}$
(c) The CRF to infer the activity sequence $A_{1:t}$

Algorithm 1 The inference algorithm in the CRF with iterative inference
Input: observations $O_{1:t}$
Initialize $A^{(0)}_{1:t}$ given $O_{1:t}$ by the CRF in Fig. 5(a).
While $A^{(i)}_{1:t} \neq A^{(i-1)}_{1:t}$ or $\theta^{(i)}_{1:t} \neq \theta^{(i-1)}_{1:t}$:
    Infer $\theta^{(i+1)}_{1:t}$ by $O_{1:t}$ and $A^{(i)}_{1:t}$ by the CRF in Fig. 5(b).
    Infer $A^{(i+1)}_{1:t}$ by $O_{1:t}$ and $\theta^{(i+1)}_{1:t}$ by the CRF in Fig. 5(c).

3.5 Conditional Random Field with Decomposition Inference

The previous model describes the activities of both residents with a single model, which implies that two adjacent activities are related even if they are performed by different residents. This is not a reasonable implication; moreover, it seems unreasonable to use a single chain to model multiple residents. In decomposition inference, we assume that the influence of interaction between residents is negligible. As a result, given the data association, single-resident activity sequences can be used to infer the activities of each person, and a multiple-resident problem can be decomposed into sub-problems that use single-resident models. In this paper, we use two approaches to determine the data association variables, a transition table and the conditional random field with iterative inference, which are described below.

Transition Table

If we know that a sensor event $s_t$ at time $t$ is generated by a given resident, we can use the relationship between $s_t$ and $s_{t+1}$ to infer who generates the next event $s_{t+1}$. The easiest way to do this is to compute the transition probability between any two sensor events $s_i$ and $s_j$. We define the transition probability $P(s_i \to s_j)$ as a value proportional to the number of times each person triggers $s_i$ and $s_j$ adjacently. At each time slice, we assign the event to the resident with the greater transition probability.

Conditional Random Field with Iterative Inference

As mentioned above, a CRF with iterative inference generates both the association and the activities. The generated association is used to divide the whole problem into two single-resident activity sequences. The procedure is given in Algorithm 2.
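The transition-table association described above leaves some details open, so the following is only one possible reading, sketched under stated assumptions: a separate count table is kept per resident, raw counts are compared instead of normalized probabilities, unseen transitions count as zero (no smoothing), ties go to the lowest resident ID, and each resident's first sensor event is assumed to be known. All function and variable names are hypothetical.

```python
from collections import defaultdict


def train_tables(events, residents):
    """Count, per resident, how often sensor b directly follows sensor a."""
    tables = {r: defaultdict(int) for r in set(residents)}
    last = {}
    for sensor, resident in zip(events, residents):
        if resident in last:
            tables[resident][(last[resident], sensor)] += 1
        last[resident] = sensor
    return tables


def associate(events, tables, first_events):
    """Assign each event to the resident whose previous event best explains it."""
    last = dict(first_events)        # assumption: each resident's start is known
    association = []
    for sensor in events:
        best = max(
            sorted(tables),          # sorted so that ties go to the lowest ID
            key=lambda r: tables[r].get((last[r], sensor), 0),
        )
        association.append(best)
        last[best] = sensor
    return association


# Toy annotated training stream: resident 1 moves A <-> B, resident 2 moves C <-> D.
tables = train_tables(["A", "C", "B", "D", "A", "C"], [1, 2, 1, 2, 1, 2])
predicted = associate(["B", "D", "A"], tables, first_events={1: "A", 2: "C"})
```

Because unseen transitions score zero here, many real events would end in ties; this is the situation in which the smoothing mentioned in the discussion of Experiment 2 becomes necessary.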

Algorithm 2 The inference algorithm in the CRF with decomposition training
Input: observations $O_{1:t}$
Infer $\theta_{1:t}$ given $O_{1:t}$ by Transition Table or CRF with Iterative Inference.
Decompose $O_{1:t}$ into two groups according to $\theta_{1:t}$.
Apply Linear-Chain CRF to infer $A_{1:t}$ according to the decomposed observations.
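The decomposition step can be sketched as follows, assuming the data association has already been inferred. The callable `label_sequence` is a hypothetical stand-in for a trained single-resident linear-chain CRF decoder; the helper names are illustrative and not from the paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def decompose(observations: Sequence, association: Sequence[int]) -> Dict[int, List[Tuple[int, object]]]:
    """Split the joint stream into one (time index, observation) list per resident."""
    groups: Dict[int, List[Tuple[int, object]]] = {}
    for t, (obs, resident) in enumerate(zip(observations, association)):
        groups.setdefault(resident, []).append((t, obs))
    return groups


def decomposition_inference(
    observations: Sequence,
    association: Sequence[int],
    label_sequence: Callable[[List], List],
) -> List:
    """Label each resident's sub-sequence separately, then merge by time index."""
    activities = [None] * len(observations)
    for items in decompose(observations, association).values():
        indices = [t for t, _ in items]
        labels = label_sequence([obs for _, obs in items])
        for t, label in zip(indices, labels):
            activities[t] = label
    return activities


# Toy stand-in labeler (an assumption): tags each item with its position
# inside the single-resident sub-sequence.
merged = decomposition_inference(
    ["a", "x", "b", "y"],
    [1, 2, 1, 2],
    label_sequence=lambda seq: [s.upper() + str(i) for i, s in enumerate(seq)],
)
```

The toy labeler makes it visible that each resident's events are labeled as their own sequence before being written back to their original time positions.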

4 Experimental Results and Discussion

In this section we present the experimental results. Each experiment is evaluated using leave-one-out cross-validation over the 26 files provided by the CASAS project. The first experiment compares the performance of several solutions for multiple-resident activity recognition under the three types of preprocessing (raw data, environment data and room-level data) to validate their accuracy and to investigate the effect of data preprocessing. The second experiment shows the correlation between data association and activity recognition: based on the CRFs with decomposition training, we couple them with different data association mechanisms and analyze the accuracy.

4.1 Experiment 1: Tendency of Data Preprocessing

Table 2 shows the average accuracy of the three models we are interested in. The best one exceeds the accuracy reported in [5] (0.6060), but is still not satisfactorily accurate. Moreover, the CRF decomposition method outperforms the others, which shows that if we can properly separate the dataset, we can apply a single-resident model to the multiple-resident problem. The insufficient accuracy may be due to the nature of human activity patterns when multiple people are present. Intuitively, the CRF with iterative training is better suited than the one with decomposition training to files with more interactions, whereas inferring the activities of the two residents separately performs well when the residents' activities are weakly related. The dataset, however, is a mixture of individual and cooperative activities, which limits the accuracy of both.

Table 2 also reports the accuracy of the models coupled with each type of preprocessed data. Surprisingly, the highest accuracy occurs when a model is coupled with the raw data. The decline in accuracy with more manipulation is probably attributable to sensitivity to noise; that is, taking all information into consideration may result in overfitting to the noise. Furthermore, the lower accuracy of the room-level data seems to stem from the weak dependency between room-level features and their associated activities.

Table 2. Average / variance pairs for distinct models and data preprocessing

Model | raw data | environment data | room level data
CRF with iterative training | 0.5067 / 0.0139 | 0.3323 / 0.0950 | 0.2449 / 0.0065
CRF with decomposition (transition) | 0.5953 / 0.1070 | 0.3026 / 0.0083 | 0.2333 / 0.0131
CRF with decomposition (iterative) | 0.6416 / 0.0128 | 0.3425 / 0.0118 | 0.2655 / 0.0090

4.2 Experiment 2: Correlation between data association and accuracy

In this experiment, we compare the existing mechanisms with the CRF with decomposition given the data association. As shown in Table 3, the accuracy of data association and the accuracy of activity recognition are positively correlated. In addition, the low accuracy of the model coupled with the transition table may be due to three factors: noise, smoothing, and the relevance between data association and activity. First, the transition accuracy may be highly sensitive to noise. Second, if a transition does not appear in the training phase, data smoothing is required. Finally, the dataset suggests that some activities are performed by a specific resident.

Table 3. The accuracies for distinct data association mechanisms

Data association mechanism | accuracy of data association | accuracy / variance of activity
given association | 1.0000 | 0.8137 / 0.0248
transition table | 0.5099 | 0.5067 / 0.0139
iterative training | 0.7287 | 0.6416 / 0.0128

5 Conclusion

Multiple-resident activity recognition is an important yet challenging problem. We address it by applying CRFs with different strategies, which include different ways to train a CRF and to preprocess the dataset. We also investigate the importance of data association in multiple-resident activity recognition. In the first experiment, the CRF decomposition method outperforms the other classifiers we proposed, which shows that if we can properly decompose the dataset for each individual, we can apply a single-resident model to a multiple-resident problem to some degree. However, the limited accuracy of the CRF with decomposition training indicates that totally ignoring the interactions among residents compromises recognition accuracy. Moreover, in this experiment, preprocessing the dataset results in lower accuracy, contrary to our expectation. This result suggests that preprocessing must take into account factors such as the noise in the dataset and prior knowledge of the environment. In the second experiment, we show that data association is a vital issue in a multiple-resident environment: with a good algorithm for the data association problem, activity recognition accuracy may be improved effectively. This study shows that data association is an unavoidable issue in multiple-resident activity recognition. It also reveals that modeling human interactions is necessary to make the system more reliable. We therefore encourage researchers to develop more appropriate solutions for these two critical issues so that the methods proven effective for single-resident activity recognition can be extended and applied to a multiple-resident environment [17].

References

1. L. Wang, T. Gu, X. Tao, and J. Lu, "Sensor-Based Human Activity Recognition in a Multi-user Scenario," in Ambient Intelligence, 2009, pp. 78-87.
2. L. McCowan, D. Gatica-Perez, S. Bengio, G. Lathoud, M. Barnard, and D. Zhang, "Automatic analysis of multimodal group actions in meetings," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, pp. 305-317, 2005.
3. N. Nguyen, S. Venkatesh, and H. Bui, "Recognising Behaviours of Multiple People with Hierarchical Probabilistic Model and Statistical Data Association," presented at the British Machine Vision Conference, 2005.
4. D. H. Wilson and C. G. Atkeson, "Simultaneous Tracking and Activity Recognition (STAR) Using Many Anonymous, Binary Sensors," presented at Pervasive, 2005.
5. G. Singla, D. Cook, and M. Schmitter-Edgecombe, "Recognizing independent and joint activities among multiple residents in smart environments," Ambient Intelligence and Humanized Computing Journal, 2009.
6. C.-H. Lu, C.-L. Wu, and L.-C. Fu, "Hide and Not Easy to Seek: A Hybrid Weaving Strategy for Context-aware Service Provision in a Smart Home," presented at the IEEE Asia-Pacific Services Computing Conference, Yilan, Taiwan, 2008.
7. T. v. Kasteren, A. Noulas, G. Englebienne, and B. Krose, "Accurate activity recognition in a home setting," presented at the Proceedings of the 10th International Conference on Ubiquitous Computing, Seoul, Korea, 2008.
8. J. Ye, A. K. Clear, L. Coyle, and S. Dobson, "On using temporal features to create more accurate human-activity classifiers," presented at the 20th Conference on Artificial Intelligence and Cognitive Science, UCD Dublin, Ireland, 2009.
9. Y.-H. Chen, C.-H. Lu, K.-C. Hsu, and L.-C. Fu, "Preference Model Assisted Activity Recognition Learning in a Smart Home Environment," presented at the International Conference on Intelligent RObots and Systems, St. Louis, MO, USA, 2009.
10. W. S. University. CASAS Smart Home Project. Available: http://ailab.eecs.wsu.edu/casas/
11. J. Lafferty, A. McCallum, and F. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," presented at ICML, 2001.
12. D. L. Vail, M. M. Veloso, and J. D. Lafferty, "Conditional random fields for activity recognition," in International Conference on Autonomous Agents and Multi-agent Systems, 2007.
13. C.-C. Lian and J. Y.-J. Hsu, "Probabilistic Models for Concurrent Chatting Activity Recognition," in Proceedings of IJCAI-2009, Pasadena, 2009.
14. J. R. Shewchuk, "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain," 1994.
15. J. N. Darroch and D. Ratcliff, "Generalized iterative scaling for log-linear models," Annals of Mathematical Statistics, 1972.
16. J. Nocedal, "Updating Quasi-Newton Matrices with Limited Storage," Mathematics of Computation, vol. 35, pp. 773-782, 1980.
17. T. Kudo. (2007). CRF++: Yet Another CRF toolkit. Available: http://crfpp.sourceforge.net/