Research Article

An activity of daily living primitive–based recognition framework for smart homes with discrete sensor data

International Journal of Distributed Sensor Networks
2017, Vol. 13(12)
© The Author(s) 2017
DOI: 10.1177/1550147717749493
journals.sagepub.com/home/dsn

Rong Chen1, Danni Li2 and Yaqing Liu1

Abstract

A proven approach to recognizing activities of daily living is to train a classifier on feature vectors created from streamed sensor data. However, there is still room to improve feature extraction techniques, because activity of daily living data are often nominal or ordinal, and ordinal data can be less discriminative due to the great uncertainty in their level of measurement. This article provides a framework built on a novel notion of activity of daily living primitive and introduces an enhanced feature selector with linear time complexity. The framework extends traditional approaches in three ways: (1) it defines activity of daily living primitives and constructs a primitive vocabulary, (2) it reduces data when representing raw activity data, and (3) it selects an appropriate primitive set for each testing activity. The empirical results reveal that a pre-trained portable primitive vocabulary not only outperforms the existing baseline frameworks but also greatly facilitates the deployment and management of activity recognizers.

Keywords

Activity of daily living, activity recognition, discrete sensor data, activity of daily living primitive, recognition cost and portability

Date received: 31 August 2017; accepted: 23 November 2017
Handling Editor: Jui-Pin Yang

Introduction

Activity recognition systems deployed in smart homes are characterized by their ability to detect human actions and goals and thus to provide residents with appropriate assistance. Such assistive technologies have been adopted by smart homes and healthcare applications in practice1,2 and have delivered promising results, offering either more attentive services to elderly residents3-5 or responsive assistance in emergency situations.6 Technology advances will continue to be integrated ever more deeply into our daily lives. In the field of daily activity recognition, various sensors collect raw activity data streams, and a huge amount of sensor data is generated. Good examples are cameras,7 wearable sensors,8 and radio-frequency identification (RFID) sensors,9 which often yield continuous data. There are also non-obtrusive and pervasive sensors, distributed transparently throughout the space, that generate discrete sensor data of on-off type. The well-known Center for Advanced Studies in Adaptive Systems (CASAS) database,10 collected over a long period by Washington State University (WSU), contains abundant experimental data and has seen great application value in many research works.11,12

1 Information Science and Technology College, Dalian Maritime University, Dalian, China
2 Beijing Zhuanzhuan Limited Company, Beijing, China

Corresponding author:
Rong Chen, Information Science and Technology College, Dalian Maritime University, Dalian 116026, China. Email: [email protected]

Creative Commons CC BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License (http://www.creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (http://www.uk.sagepub.com/aboutus/openaccess.htm).

Many researchers have studied more powerful classifiers to improve the accuracy of activity recognition. These classifiers mostly derive from several mainstream models, to name a few: naive Bayes (NB),13 the hidden Markov model (HMM),14 the support vector machine (SVM),15 and conditional random fields (CRF),16 each with its own characteristics; for example, NB has a remarkable speed advantage due to its simple calculation principle, while CRF captures context better than the others. Weka is a well-known collection of machine learning algorithms for data mining tasks17 that can be applied to train activity models and recognize activities. Balancing the advantages and disadvantages of the classifiers above, we chose the CRF implementation in Weka as the classifier for our activity model training and recognition tasks.

Even with a capable classifier such as CRF, how activity features are extracted still constrains a recognition framework's ability, so the construction of activity features is another key part of recognition architecture design. Depending on the type of sensor data, such as continuous data recorded by accelerometers, discrete data collected by non-intrusive sensors, and video and audio data filmed by cameras, the features can be vectors,18 events,19 or spatial and temporal information.20 Facing these various activity data types, the activity recognition field extracts features in several corresponding ways: (1) feature vector construction: continuous data streams are segmented into a sequence of sliding time windows, and statistics over each window, such as the maximum, minimum, and mean, serve as feature elements assembled in particular orders. (2) Sensor event marking: if a sensor is triggered, that sensor event is marked 1; otherwise, it is marked 0. The marked results become features used for training and recognition. (3) Position tracking: a participant is tracked by cameras while executing activities, and the resulting spatial information is used to set spatial and temporal constraints as activity features. However, the feature extraction techniques above, mainstream and promising as they are, are largely more sensitive to scalar values of activities than to ordinal ones. This issue can destroy the portability of an activity recognition architecture because ordinal measurement can be erroneous in some cases. Unlike the available techniques, M Zhang and AA Sawchuk21 proposed the concept of the motion primitive to construct a bag-of-features-based activity recognition framework for continuous sensor data (i.e. signals) rather than discrete data (events). The difference is important because many statistical and data mining algorithms can handle one type but not the other. Though their motion primitives often lack real physical meaning, this framework can in principle be an all-purpose tool in that it represents different activities by a "vocabulary" instead of scalar values.

In this article, we are motivated to propose the idea of the activity of daily living (ADL) primitive for smart homes associated with discrete data and to apply it in practice. We create an activity recognition framework based on ADL primitives (AR-ADLP) to recognize discrete home activities and thereby improve recognition accuracy and framework portability.

The remainder of this article is organized as follows. Section "Related work" introduces related work. Section "Overview of AR-ADLP framework" gives an overview of the AR-ADLP framework. Section "ADL primitive mining" details the processes of mining and modeling activities. Section "Activity pattern selection" presents the activity pattern selection method for testing data. In section "Experiments," we report on the experimental setting and results and offer some discussion. Finally, in section "Conclusion," we draw our conclusions.

Related work

In the field of non-video activity recognition, the mainstream feature extraction technologies for representing activity fall into two categories: feature vector construction and sensor event marking. The former is usually applied to continuous activity data, such as data collected by accelerometers and gyroscopes,22 whereas the latter is often used for discrete sensor data, which is typically recorded by non-obtrusive and pervasive sensors.23 Zhan et al.24 extracted feature vectors representing activities from continuous acceleration data over sliding windows. Sun et al.25 first extracted low-level signal features from accelerometer and Global Positioning System (GPS) data and then used a Dirichlet process Gaussian mixture model (DPGMM) to map the feature vectors, demonstrating the effectiveness of their recognition method. However, for the discrete data collected in smart home environments, the feature vector construction methods used on continuous data are usually not appropriate; sensor event marking is more suitable for extracting features and representing activities from discrete data. To name a few examples, Krishnan and Cook19 characterized different activities using different window lengths of sensor events and added past contextual information into the features, which led to better performance for daily activity recognition. Wen and Zhong26 treated activity patterns as triggered sensor event sequences without specific sensor state values and proposed a similarity measure algorithm that achieved handsome recognition performance for daily activities.

Hong and Nugent27 applied a sensor segmentation algorithm to partition a stream of time-series sensor events and inferred activities from the sensor segments with good accuracy. The aforementioned works leverage extracted representative information as features to present activities, but the portability of the resulting activity models is weakened because many numerical and state values are involved in building the activity features. Some well-known methods have this disadvantage. For example, in Bao and Intille's28 research, the mean, energy, frequency-domain entropy, and correlation of accelerometer values were calculated and used as activity elements. Similarly, Stikic et al.29 divided an accelerometer data stream into sliding time windows and calculated the mean, variance, energy, and the like to construct 96-dimensional feature vectors representing activity. As another example, Zhang et al. considered mean, variance, correlation, and entropy as basic features; features like zero crossing rate, mean crossing rate, and first-order derivative were also considered because they had been applied successfully to similar recognition problems, and physical features were included innovatively. All these features were extracted from both a three-axis accelerometer and a gyroscope.30 Although these activity representation methods achieved good results, unlike our approach they share a common flaw: too many specific sensor values are involved, which is the key factor reducing portability. Once the sensor data supplier changes, the activity models have to be constructed again. Little is known about this point in the field of activity recognition; based on this consideration, we propose a novel method using pre-trained ADL primitives to improve a recognition system's portability.

In addition, most previous research feeds activity features into a classifier to train activity models directly, without any further processing. Lee and Cho31 calculated the difference between previous and current acceleration, the average acceleration, and the standard deviation of acceleration as features, which were input into a mixture-of-experts (ME) model to infer activities without any other operations. Tapia et al.32 took temporal information, in addition to which sensors fired, as activity features, and the evidence from these features entered the nodes of a naive Bayesian network directly. Without special processing after feature extraction, much noisy data and many unnecessary attributes remain when training classifiers, which is likely to reduce recognition accuracy. In contrast to these works, we add an activity reduction module to the recognition framework to delete data that we consider useless for recognition. In this reduction module, raw activity data represented by the corresponding primitive set still include some invalid data that are not involved in that primitive set. Once this step is finished, the number of activity labels to be trained and recognized is reduced and the recognition accuracy improves accordingly.

Overview of AR-ADLP framework

We incorporate the concept of the ADL primitive into the traditional activity recognition framework, transforming classical activity recognition into a string matching problem; the result is a novel recognition framework, AR-ADLP.

Preliminary

To make the exposition precise, we first define some terms: home activity data, ADL primitive, and primitive judging threshold.

Definition 1. Home activity data. A smart home environment is based on the deployment of $n$ binary sensors, organized in a sensor set $S = \{s_1, \ldots, s_n\}$. Home activity data are defined as a sequence of $m$ triggered sensor events, $D = \{s_{i_1}, s_{i_2}, \ldots, s_{i_m}\}$, s.t. $s_{i_j} \in S$ for $j = 1, \ldots, m$.

Definition 2. ADL primitive. In the raw activity data, there are many sensor event pairs constituted by two successively triggered sensor events, $\langle s_{i_j}, s_{i_k} \rangle$, s.t. $1 < k \le m$, $j < k$. We define a sensor event pair as an ADL primitive when the frequency of its occurrence in a certain activity is over the primitive judging threshold $J_{thresh}$ (i.e. $\mathrm{times}(\langle s_{i_j}, s_{i_k} \rangle) > J_{thresh}$). The aim of the primitive judging threshold $J_{thresh}$ is to select representative sensor event pairs to be ADL primitives.
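For illustration, consider a short hypothetical stream D = {M13, M14, M13, M14, M15} (the sensor IDs are borrowed from the experimental data described later; the stream itself is invented): its successively triggered pairs are ⟨M13, M14⟩ (twice), ⟨M14, M13⟩ (once), and ⟨M14, M15⟩ (once), so with Jthresh = 1 only ⟨M13, M14⟩ qualifies as an ADL primitive.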

AR-ADLP architecture

The essential components of the AR-ADLP framework are shown in Figure 1, where the arrows represent the steps of activity recognition. AR-ADLP has the usual two stages of a recognition framework, a training stage and a recognition stage; the difference is that AR-ADLP extends them with ADL primitive construction and data reduction.

During the training stage, the labeled training activity data are input into the ADL primitive construction module to generate the primitive vocabulary by counting the event pairs that frequently co-occur in the labeled data streams. From this naive vocabulary, the data reduction procedure is invoked for each activity to choose its own primitives, those triggered more often by that activity. Through data reduction, primitive strings that have little intersection with the data streams labeled as a concrete activity are removed, so the primitive strings become relatively clean ones, which are trained to learn activity models in the final step.


Figure 1. Architecture of AR-ADLP framework.

In the recognition stage, we represent the raw unlabeled data by the same method to build primitive sets. Next comes the key part of this stage, pattern selection, which calculates similarities between the primitive set of the testing data and all primitive sets of the training data. Specifically, the distance between two sets is calculated by equation (1), which is used to calculate the similarity in equation (2). After the similarity calculations, we select the training activity pattern with the maximum similarity as the testing activity pattern. Finally, the represented and reduced testing data can be recognized against the trained activity models

$$\mathrm{dist}(X_{ap}, Y_{ap}) = \sum_{i \in X_{ap} \wedge i \notin Y_{ap}} \mathrm{count}(X_{ap}(i)) + \sum_{j \in Y_{ap} \wedge j \notin X_{ap}} \mathrm{count}(Y_{ap}(j)) \qquad (1)$$

where $X_{ap}(i)$ stands for the $i$th ADL primitive in the primitive set of activity $X$, and $\mathrm{count}(X_{ap}(i))$ stands for the occurrence counter of $X_{ap}(i)$; the first sum ranges over primitives included in $X_{ap}$ but not in $Y_{ap}$ (i.e. $i \in X_{ap} \wedge i \notin Y_{ap}$), and the second sum is symmetric

$$\mathrm{similarity}(X_{ap}, Y_{ap}) = 1 - \frac{\mathrm{dist}(X_{ap}, Y_{ap})}{\max(\mathrm{len}(X_{ap}), \mathrm{len}(Y_{ap}))} \qquad (2)$$

where $\mathrm{len}(X_{ap})$ stands for the number of primitives in the primitive set $X_{ap}$.

There are two situations after calculating similarities, namely, a successful match and an unsuccessful match. If the testing data obtain a training activity pattern directly (i.e. there is exactly one maximum among the similarities), we call this a successful match. If two or more similarities attain the maximum, we introduce equation (3) as a correction factor to handle the unsuccessful match

$$\mathrm{cfactor}(X_{ap}, Y_{ap}) = \sqrt{\left( \frac{\sum_{i \in X_{ap} \wedge Y_{ap}} \mathrm{count}(X_{ap}(i))}{\mathrm{len}(X_{ap})} - \frac{\sum_{i \in X_{ap} \wedge Y_{ap}} \mathrm{count}(Y_{ap}(i))}{\mathrm{len}(Y_{ap})} \right)^2} \qquad (3)$$

The smaller the value of $\mathrm{cfactor}(X_{ap}, Y_{ap})$, the more similar the two sets are. Equation (2) is extended into the new similarity equation (4) for the unsuccessful match situation

$$\mathrm{similarity}(X_{ap}, Y_{ap}) = \left( 1 - \frac{\mathrm{dist}(X_{ap}, Y_{ap})}{\max(\mathrm{len}(X_{ap}), \mathrm{len}(Y_{ap}))} \right) \cdot \left( 1 - \mathrm{cfactor}(X_{ap}, Y_{ap}) \right) \qquad (4)$$
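To make the pattern selection arithmetic concrete, the following is a minimal Python sketch of equations (1)-(4). It assumes each primitive set is stored as a collections.Counter keyed by sensor event pairs; the function names are ours, not the authors' implementation.

```python
from collections import Counter

def dist(x_ap: Counter, y_ap: Counter) -> int:
    """Equation (1): total occurrence count of primitives present in one set but absent from the other."""
    return (sum(c for p, c in x_ap.items() if p not in y_ap)
            + sum(c for p, c in y_ap.items() if p not in x_ap))

def similarity(x_ap: Counter, y_ap: Counter) -> float:
    """Equation (2): distance normalized by the larger primitive set size."""
    return 1 - dist(x_ap, y_ap) / max(len(x_ap), len(y_ap))

def cfactor(x_ap: Counter, y_ap: Counter) -> float:
    """Equation (3): square root of a squared difference of normalized shared counts."""
    shared = x_ap.keys() & y_ap.keys()
    return abs(sum(x_ap[p] for p in shared) / len(x_ap)
               - sum(y_ap[p] for p in shared) / len(y_ap))

def corrected_similarity(x_ap: Counter, y_ap: Counter) -> float:
    """Equation (4): similarity rescaled by the correction factor, used only to break ties."""
    return similarity(x_ap, y_ap) * (1 - cfactor(x_ap, y_ap))
```

Since the square root of a square is an absolute value, cfactor reduces to the absolute difference of the two normalized shared-count sums.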

To keep the form of the testing data consistent with the training data, after the pattern selection module the raw testing data are reduced using the selected primitive set. Last but not least, the usable testing data are input into the CRF classifier together with the activity models trained in the previous stage, and we obtain the final recognition results, which can be judged by the evaluation measures.

Based on the description above, the main points of the AR-ADLP framework are primitive vocabulary construction in the training stage and pattern selection in the testing stage. In addition, compared with the traditional activity recognition framework, our framework defines sensor event pairs as the basic activity elements, thus transforming the complex activity recognition problem into a string matching one. AR-ADLP is also very different from Mi Zhang's motion primitive framework, whose activity representation method is based on simple intuitive statistics that ignore the temporal relationship between primitives. In contrast, the AR-ADLP framework not only builds temporal relationships into the ADL primitive design but also introduces a reduction module to decrease the interference of noisy data and unnecessary attributes.
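As a rough sketch of how a raw stream might be represented by its primitive set and then reduced, the helpers below assume a stream is a list of sensor IDs in trigger order; the names and the exact reduction rule (keep only events that participate in some selected primitive) are our reading of the description above, not the authors' code.

```python
from collections import Counter
from itertools import pairwise  # successive event pairs; Python 3.10+

def to_primitive_set(stream: list, vocabulary: set) -> Counter:
    # Represent a raw event stream by counting the vocabulary pairs it triggers.
    return Counter(p for p in pairwise(stream) if p in vocabulary)

def reduce_stream(stream: list, primitive_set: Counter) -> list:
    # Data reduction sketch: drop events that occur in none of the selected primitives.
    keep = {sensor for pair in primitive_set for sensor in pair}
    return [event for event in stream if event in keep]
```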

ADL primitive mining

In this section, we introduce the algorithms used in the process of ADL primitive mining. We describe how to mine activity features in order to construct ADL primitives, which transform the complex activity recognition problem into a string matching one and thus reduce the difficulty of training and testing.

Algorithm 1. ADL primitive construction.
Input: The raw activity data D = {si1, si2, ..., sim} and the sensor set S = {s1, ..., sn}.
Output: The primitive vocabulary V.
1. Initialize sensorPairMatrix;
2. Foreach ⟨sij, sij+1⟩ ∈ D do
3.   increment sensorPairMatrix⟨sij, sij+1⟩;
4. endForeach
5. Foreach e ∈ sensorPairMatrix do
6.   If e > Jthresh then
7.     add e to V;
8.   endIf
9. endForeach
10. Return V;

Algorithm 1 gives the procedure of ADL primitive construction. The algorithm starts by building an n×n matrix, sensorPairMatrix, covering all sensors (Line 1). An element of sensorPairMatrix stands for the number of times the sensor event pair ⟨sij, sij+1⟩ is triggered in the activity D = {si1, si2, ..., sim}, s.t. 1 ≤ j < m (Lines 2-4). Each element e of sensorPairMatrix that exceeds the primitive judging threshold Jthresh is considered an ADL primitive, and the corresponding sensor event pair is added to the primitive vocabulary V (Lines 5-9).

We designed successively triggered sensor event pairs as primitives, which are pooled together into a commonly used primitive vocabulary for all activities. For a concrete activity, a subset of the primitive vocabulary is selected by calling the data reduction procedure, Algorithm 2, which removes irrelevant primitives by checking whether a primitive occurs in the labeled data streams representing that activity (Lines 6-8) and whether its occurrence count (i.e. times(⟨sj, sj+1⟩)) exceeds the expected threshold (Line 9).

Algorithm 2. Data reduction.
Input: The labeled activities LA = {la1, la2, ..., lal}, the raw activity data D = {si1, si2, ..., sim} associated with each labeled activity lai, and the primitive vocabulary V.
Output: The primitive vocabulary Vi associated with each lai.
1. Foreach lai ∈ LA do
2.   Vi = {};
3.   Foreach ⟨sij, sik⟩ ∈ V do
4.     times(⟨sij, sik⟩) = 0;
5.   endForeach
6.   Foreach sj ∈ D and sj+1 ∈ D do
7.     If ⟨sj, sj+1⟩ ∈ V then increase times(⟨sj, sj+1⟩);
8.   endForeach
9.   If times(⟨sj, sj+1⟩) > Jthresh then add ⟨sj, sj+1⟩ to Vi;
10. endForeach
11. Return Vi;
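A compact Python sketch of Algorithms 1 and 2 follows, assuming each activity's raw data is a list of sensor IDs in trigger order; build_vocabulary and reduce_to_primitive_sets are illustrative names under these assumptions, not the paper's implementation.

```python
from collections import Counter
from itertools import pairwise  # Python 3.10+

def build_vocabulary(streams: list, j_thresh: int) -> set:
    """Algorithm 1 sketch: count successive sensor event pairs over the training
    streams and keep those triggered more than j_thresh times."""
    pair_counts = Counter(p for stream in streams for p in pairwise(stream))
    return {pair for pair, times in pair_counts.items() if times > j_thresh}

def reduce_to_primitive_sets(labeled: dict, vocabulary: set, j_thresh: int) -> dict:
    """Algorithm 2 sketch: for each labeled activity, keep the vocabulary pairs
    that recur more than j_thresh times in that activity's own data."""
    result = {}
    for label, stream in labeled.items():
        times = Counter(p for p in pairwise(stream) if p in vocabulary)
        result[label] = {pair for pair, c in times.items() if c > j_thresh}
    return result
```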

Algorithm 3. Activity pattern selection.
Input: The testing primitive set of activity X: Xap; the set of training primitive sets A = {Y1ap, Y2ap, ..., Ykap}.
Output: The pattern selection result: patternId.
1. Initialize simArray;
2. Foreach Yiap ∈ A do
3.   simArray(i) ← calculate similarity between Xap and Yiap;
4. endForeach
5. If unsatisfied selection condition then
6.   Foreach Yiap ∈ A do
7.     newSimArray(i) ← correct similarity between Xap and Yiap;
8.   endForeach
9. endIf
10. patternId ← the serial number of the maximum of simArray or newSimArray;
11. Return patternId;

Activity pattern selection

In this section, the procedure of activity pattern selection in the recognition stage is organized as Algorithm 3, which plays the decisive role in whether unlabeled testing data select an activity pattern appropriately. The algorithm calculates similarities between a testing primitive set Xap and all elements of the set of training primitive sets A = {Y1ap, Y2ap, ..., Ykap} to find an activity pattern for activity X. In Line 1, Algorithm 3 creates an array named simArray to record the similarities; the length of simArray is k, the number of elements of set A. In Lines 2-4, the k similarities between the primitive set Xap and each element of A are calculated using equation (2) and recorded as elements of simArray. If we cannot select a unique maximum among the k similarities, which we call the unsatisfied selection condition, the k similarities are corrected using the correction factor (i.e. equation (3)) to become new similarities via equation (4) and recorded in newSimArray (Lines 5-9); otherwise, the selection condition is satisfied. Finally, patternId is set to the serial number of the maximum of simArray or newSimArray (Line 10).
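Reusing the similarity helpers sketched after equations (1)-(4), Algorithm 3 might look as follows in Python (again a sketch under our naming assumptions):

```python
def select_pattern(x_ap, training_sets: list) -> int:
    """Algorithm 3 sketch: return the index of the most similar training
    primitive set, breaking ties with the corrected similarity of equation (4)."""
    sims = [similarity(x_ap, y_ap) for y_ap in training_sets]
    if sims.count(max(sims)) > 1:  # unsatisfied selection condition: tied maxima
        sims = [corrected_similarity(x_ap, y_ap) for y_ap in training_sets]
    return sims.index(max(sims))  # serial number (patternId)
```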

Experiments

Our experiments use the database of daily living activities collected by Cook and Edgecombe33 in the WSU apartment testbed, and our recognition results are compared with Cook's results. To evaluate the AR-ADLP framework, we set two evaluation criteria, namely, the total activity recognition accuracy, rate, and the standard deviation of activity recognition accuracy, srate. The former measures how often the proposed framework is correct, and the latter measures how balanced its performance is across activities. In the rest of this section, we introduce two experiments in detail. The first experiment aims to set proper values for the experimental parameters, and the second runs the whole AR-ADLP pipeline to demonstrate its recognition performance.

Sensor data

Various types of sensors capture different aspects of user actions; for example, motion sensors give direct location information, and item sensors tell us whether users touch objects. In total, 39 sensors were used to gather the raw experimental activity data; their IDs and functions are summarized in Table 1.

Table 1. Types of sensor data.

Sensor ID    | Function
M01-M26      | Motion sensor
I01-I05      | Item sensor
I06          | Medicine container sensor
I07          | Pot sensor
I08          | Phone book sensor
D01          | Cabinet sensor
AD1-A, AD1-B | Water sensor
AD1-C        | Burner sensor
Asterisk     | Phone sensor

Evaluation metrics

To validate the effectiveness of the AR-ADLP framework, we introduce two evaluation metrics, the total activity recognition accuracy rate and the standard deviation of activity recognition accuracy srate, as follows:

1. The total activity recognition accuracy rate reflects the comprehensive recognition ability of the classifier. When using this metric, we divide the recognized activity labels into two categories, correct recognition labels and incorrect recognition labels. The equation for rate is

$$\mathrm{rate} = \frac{l_r}{l_r + l_w} \qquad (5)$$

where $l_r$ stands for the number of correct recognition labels and $l_w$ for the number of incorrect recognition labels. The total recognition accuracy ranges from 0 to 1; the closer rate is to 1, the more activity labels are recognized correctly, that is, the better the comprehensive recognition ability of the classifier. In our experiments, rate is the most important evaluation metric.

2. The standard deviation of activity recognition accuracy srate reflects the robustness of our framework in recognizing different activities. To determine srate, we first calculate the average recognition accuracy classrate over all activities and then their standard deviation. classrate is given by equation (6) and srate by equation (7)

$$\mathrm{classrate} = \frac{\sum_{i=1}^{n} \mathrm{act}_i}{n} \qquad (6)$$

$$\mathrm{srate} = \sqrt{\frac{\sum_{i=1}^{n} (\mathrm{act}_i - \mathrm{classrate})^2}{n - 1}} \qquad (7)$$

where $n$ stands for the total number of activities and $\mathrm{act}_i$ for the recognition accuracy of the $i$th activity. The smaller the value of srate, the more balanced the recognition ability of the proposed framework across different kinds of activities.
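Both metrics are straightforward to compute; a minimal sketch with our own helper names:

```python
import statistics

def total_rate(l_r: int, l_w: int) -> float:
    """Equation (5): overall recognition accuracy."""
    return l_r / (l_r + l_w)

def accuracy_spread(per_activity_acc: list) -> float:
    """Equations (6) and (7): sample standard deviation of the per-activity
    accuracies (statistics.stdev divides by n - 1, matching equation (7))."""
    return statistics.stdev(per_activity_acc)
```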


Experiment 1: parameter selection

Before recognizing activities, some preparatory work is needed. In this experiment, we describe the detailed processes of determining the element number of an ADL primitive and the primitive judging threshold.

Parameter 1. The element number in a primitive. The essence of an ADL primitive in a smart home environment is a combination of successively triggered sensor events. To ensure the effectiveness of primitives, their element number should be chosen appropriately. An ADL primitive should include as much information as possible, so one primitive should consist of two or more sensor events. Given that triples, quads, and other multi-tuples can all be extended from two-tuples, we decided that a two-tuple of successively triggered sensor events satisfying certain conditions becomes an ADL primitive; that is, we set the element number in a primitive to two.

Parameter 2. Primitive judging threshold. When a participant performs an activity, he or she triggers one or a few sensors. The corresponding triggered sensor events may be vital or inessential for representing the activity, so we need a judging parameter to select valid sensor events as primitive elements; this parameter is the primitive judging threshold Jthresh. Some sensor events are necessary to complete the activity, while others merely trace the participant's paths or preparation for the target activity. The sensors in the former group are triggered frequently, while those in the latter group are triggered fewer times but over relatively wide areas. We consider the sensors in the latter group unnecessary and delete them; in this way, the amount of activity data can be reduced.

Setting the primitive judging threshold Jthresh takes three steps: (1) determine the occurrence number of each sensor event pair, times(⟨sij, sik⟩). (2) Divide the times into three grades, 0 ≤ times < 10, 10 ≤ times < 100, and times ≥ 100, and compute the share of pairs in each grade, which reflects the size of the triggered area. For example, if the first grade's share is the highest, the sensor event pairs mostly occur few times over relatively wide areas, which suggests these pairs are probably triggered in preparation for the activity. (3) Set Jthresh to the upper bound of the grade with the highest share. The rationale is that, leaving aside the sensor event pairs that merely prepare the activity, the pairs in the other two grades are considered the key elements representing the activity.

Taking part of the activity data in our experiment as an example, we now walk through the determination of Jthresh. Table 2 shows all the sensor event pairs occurring in this data segment and their occurrence times.

Table 2. Sensor event pairs in the activity (occurrence times in parentheses).

⟨M13, M13⟩ (1)     ⟨M13, M14⟩ (11)    ⟨M13, M15⟩ (9)     ⟨M13, M16⟩ (3)
⟨M13, M17⟩ (5)     ⟨M13, AD1-B⟩ (1)   ⟨M14, M13⟩ (10)    ⟨M14, M14⟩ (2)
⟨M14, M15⟩ (13)    ⟨M14, M16⟩ (9)     ⟨M14, M17⟩ (2)     ⟨M14, I08⟩ (1)
⟨M15, M13⟩ (3)     ⟨M15, M14⟩ (4)     ⟨M15, M16⟩ (20)    ⟨M15, M17⟩ (5)
⟨M15, AD1-B⟩ (3)   ⟨M16, M13⟩ (6)     ⟨M16, M14⟩ (1)     ⟨M16, M15⟩ (8)
⟨M16, M16⟩ (1)     ⟨M16, M17⟩ (15)    ⟨M16, AD1-B⟩ (5)   ⟨M17, M13⟩ (7)
⟨M17, M14⟩ (13)    ⟨M17, M15⟩ (3)     ⟨M17, M16⟩ (2)     ⟨M17, M17⟩ (49)
⟨M17, M18⟩ (16)    ⟨M17, I07⟩ (1)     ⟨M17, D01⟩ (1)     ⟨M17, AD1-B⟩ (31)
⟨M17, AD1-C⟩ (7)   ⟨M18, M13⟩ (3)     ⟨M18, M14⟩ (4)     ⟨M18, M15⟩ (1)
⟨M18, M16⟩ (1)     ⟨M18, M17⟩ (12)    ⟨M18, M18⟩ (10)    ⟨M18, AD1-B⟩ (1)
⟨I01, I01⟩ (1)     ⟨I01, I05⟩ (1)     ⟨I02, I01⟩ (1)     ⟨I05, D01⟩ (1)
⟨I07, M15⟩ (1)     ⟨I08, M14⟩ (1)     ⟨D01, D01⟩ (1)     ⟨D01, M17⟩ (1)
⟨D01, I02⟩ (1)     ⟨AD1-B, M17⟩ (36)  ⟨AD1-B, M18⟩ (5)   ⟨AD1-B, AD1-B⟩ (21)
⟨AD1-B, AD1-C⟩ (6) ⟨AD1-C, M17⟩ (5)   ⟨AD1-C, M18⟩ (1)   ⟨AD1-C, AD1-B⟩ (7)
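The grade shares of step (2) and the threshold choice of step (3) can be sketched as follows, assuming the pair counts of Table 2 are available as a dictionary (the helper name is ours):

```python
def grade_shares(pair_counts: dict) -> dict:
    """Step (2) sketch: percentage of sensor event pairs in each occurrence grade."""
    grades = {"0 <= times < 10": 0, "10 <= times < 100": 0, "times >= 100": 0}
    for times in pair_counts.values():
        if times < 10:
            grades["0 <= times < 10"] += 1
        elif times < 100:
            grades["10 <= times < 100"] += 1
        else:
            grades["times >= 100"] += 1
    total = sum(grades.values())
    return {grade: 100 * count / total for grade, count in grades.items()}

# Step (3) sketch: when the lowest grade dominates, Jthresh becomes its upper bound, 10.
```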


Table 3. Occurrence time shares of sensor event pairs.

Grade           | 0 ≤ times < 10 | 10 ≤ times < 100 | times ≥ 100
Times share (%) | 77.2           | 22.8             | 0

The shares of the three grades are shown in Table 3. According to these two tables, we set the upper bound of the grade with the highest share, 10, as the primitive judging threshold; that is, sensor event pairs whose occurrence times exceed 10 become primitives. The final set of primitives for this portion of activity data is shown in Table 4.

After the ADL primitives are selected for the vocabulary, each activity can pick out its own primitives as a set and be represented in the form of those primitives for the remainder of the recognition process. However, a primitive is by nature a sensor event pair without inherent meaning; to represent activity data streams in a natural, well-understood way, we give each primitive a physical meaning. By comparing raw activity data streams with their primitive sets, we find that when a user executes activities, there are fixed action paths and characteristic sensors that are triggered. Take the activity "make a phone call" as an example. Its primitive set includes 12 primitives: ⟨M01, M07⟩, ⟨M07, M08⟩, ⟨M08, M09⟩, ⟨M09, M14⟩, ⟨M13, M13⟩, ⟨M13, M14⟩, ⟨M13, asterisk⟩, ⟨M14, M13⟩, ⟨M14, M14⟩, ⟨M23, M01⟩, ⟨I08, M13⟩, and ⟨asterisk, M13⟩. Except for ⟨M13, asterisk⟩, ⟨I08, M13⟩, and ⟨asterisk, M13⟩, the primitives involve only motion sensors, from which we can detect the user's path through the house while making a phone call; to these primitives we give the physical meaning of location points. The primitives ⟨M13, asterisk⟩ and ⟨asterisk, M13⟩ involve the phone sensor; combining this with the experimental activity setting, we give ⟨M13, asterisk⟩ the physical meaning "pick up the phone" and ⟨asterisk, M13⟩ the physical meaning "hang up the phone." In the same way, since sensor I08 is the phone book sensor, we take the physical meaning of ⟨I08, M13⟩ to be the end of using the phone book. Based on these primitives with physical meaning, we can infer and describe the execution of "make a phone call" as follows: the participant moves to the phone and uses the phone book, perhaps looking up a particular number, then picks up the phone to dial the number and listens to the message, and finally hangs up the phone. When all primitives are given physical meaning, we can represent the execution processes of activities in this natural way.

Experiment 2: activity recognition

To evaluate the recognition performance of the AR-ADLP framework objectively, we use a cross-validation strategy to ensure the rigor and reliability of the experimental results, and we compare our recognition results with Cook's recognition accuracy on the same activity database. In the cross-validation strategy, we divide all experimental data into four groups; three of them form the training dataset and the remaining group is the testing data, so there are four iterations in the whole recognition process.

Table 4. Selected sensor event pairs as primitives (occurrence times in parentheses).

⟨M13, M14⟩ (11)   ⟨M14, M13⟩ (10)   ⟨M14, M15⟩ (13)   ⟨M15, M16⟩ (20)
⟨M16, M17⟩ (15)   ⟨M17, M14⟩ (13)   ⟨M17, M17⟩ (49)   ⟨M17, M18⟩ (16)
⟨M17, AD1-B⟩ (31) ⟨M18, M17⟩ (12)   ⟨AD1-B, M17⟩ (36) ⟨AD1-B, AD1-B⟩ (21)

Table 5. Four iterations' acrs of cross-validation strategy.

        | Make a phone call | Wash hands | Cook   | Eat    | Clean
itera_1 | 1                 | 0.9875     | 0.9981 | 1      | 0.9594
itera_2 | 1                 | 1          | 0.9929 | 0.9605 | 0.9583
itera_3 | 1                 | 0.9551     | 0.9958 | 1      | 0.9797
itera_4 | 1                 | 1          | 1      | 0.9844 | 0.9104


Figure 2. Comparison of acrs of AR-ADLP framework and Cook’s method.

Table 6. rate and srate of AR-ADLP framework in four iterations.

      | itera_1 | itera_2 | itera_3 | itera_4
rate  | 0.9861  | 0.9800  | 0.9903  | 0.9743
srate | 0.0173  | 0.0211  | 0.0193  | 0.0389

In these four iterations, each group of experimental data becomes the testing data exactly once. The average of the four iterations' results is taken as the experimental result for evaluating our framework's recognition effectiveness.

There are five activities to recognize, and their accuracy results (acrs) are shown in Table 5, in which each row stands for an iteration and each column for an activity. From these acrs we take the best acr (iteration one) and the average acr obtained by averaging all acrs. Figure 2 depicts the accuracy comparison of the AR-ADLP framework (the best and the average case), Cook's recognition method with NB and HMM, and Zhang's method with SVM. The figure shows that the acr of Zhang's SVM method has the worst performance. This is because M Zhang and AA Sawchuk's21 framework works on a bag of features defined by the mean, standard deviation, root mean square, averaged derivatives, and mean crossing rate of continuous sensor data (signals), which are not well defined on discrete sensor data (events). For a fairer comparison, we therefore treat Cook's recognition results as the baseline in the following experiments.

In Figure 2, the best-case acrs of the activities "wash hands" and "clean" are 3.75% and 0.94% higher, respectively, than those of Cook's HMM method, and the best case matches Cook's HMM on the activities "make a phone call" and "eat," both at 100%. Although the best iteration's acr on the activity "cook" is 0.19% lower than Cook's HMM, in terms of average acr values our framework is 3.57% and 0.2% higher than Cook's HMM on "wash hands" and "clean" and 0.33% and 1.38% lower on "cook" and "eat." Overall, in terms of acrs, the AR-ADLP framework achieves outstanding and satisfactory results.

Besides comparing the acrs of each activity, we also compare rate and srate. Under the cross-validation strategy, the results of the four iterations are shown in Table 6. Table 6 shows that the best recognition accuracy rate is achieved in the third iteration, while its recognition stability parameter srate is only slightly worse than the first iteration's srate. From this table, we average rate and srate and compare them with Cook's recognition results in Table 7. Table 7 shows that the rate and srate of the AR-ADLP best group are both better than the others; in particular, the AR-ADLP best group is 1.03% higher in rate than Cook's best accuracy (Cook's HMM group) and 8.03% higher than Cook's NB group.


Table 7. Comparison of rate and srate in AR-ADLP group and Cook's group.

                  | rate   | srate
AR-ADLP (best)    | 0.9903 | 0.0193
AR-ADLP (average) | 0.9827 | 0.0242
Cook (NB)         | 0.9100 | 0.0785
Cook (HMM)        | 0.9800 | 0.0273

Table 8. Confusion matrix of labels recognized in the experiment group (rows: real labels; columns: recognized labels; diagonal entries are the numbers of correctly recognized activities).

                  | Make a phone call | Wash hand | Cook | Eat | Clean
Make a phone call | 228               | 0         | 0    | 0   | 0
Wash hand         | 1                 | 91        | 0    | 0   | 0
Cook              | 0                 | 2         | 506  | 0   | 0
Eat               | 0                 | 0         | 3    | 185 | 0
Clean             | 0                 | 0         | 0    | 18  | 360

Table 9. Confusion matrix of labels recognized in the control group (rows: real labels; columns: recognized labels; diagonal entries are the numbers of correctly recognized activities).

                  | Make a phone call | Wash hand | Cook | Eat | Clean
Make a phone call | 231               | 15        | 0    | 0   | 0
Wash hand         | 1                 | 116       | 2    | 0   | 0
Cook              | 0                 | 0         | 592  | 0   | 0
Eat               | 0                 | 0         | 6    | 209 | 0
Clean             | 0                 | 0         | 0    | 34  | 405

Moreover, for the AR-ADLP average group, the rate is 0.27% better than Cook's HMM group, and the srate is slightly below Cook's. That is to say, the proposed AR-ADLP framework performs better on recognition accuracy and outperforms Cook's methods in the stability and robustness of recognizing different kinds of activities.

A confusion matrix is a useful table that presents both the class distribution in the data and the classifier's predicted class distribution, with a breakdown of error types. We construct two confusion matrices (Tables 8 and 9), one for the recognition framework with the activity reduction module integrated (the experiment group) and one without it (the control group). Each row in a confusion matrix stands for the real activity label in the testing data and each column for the recognized label; the element in the ith row and jth column is the number of instances of activity label i recognized as activity label j, so the diagonal elements give the number of correctly recognized activity labels. Compared with the control group, the experiment group indeed has fewer activity labels to recognize. Tables 8 and 9 show that the control group, which processes more labels overall, has higher absolute true positives but also more misclassifications than the experiment group. Considering the true negative rate (tnr), the control group's tnr is lower than that of the experiment group: it decreases from 0.32 to 0.16 for "make a phone call," from 0.23 to 0.10 for "wash hands," from 0.53 to 0.44 for "cook," from 0.31 to 0.18 for "eat," and from 0.42 to 0.29 for "clean." We can therefore conclude that the AR-ADLP framework not only improves recognition ability but also reduces the total number of activity labels that need to be recognized, relieving the burden of the whole recognition process.

Discussions

The goal of our experiments is to verify the recognition performance of the AR-ADLP framework. All the tables and figures show that the rate and srate of our framework are better than those of the best experimental configuration of Cook's methods, indicating that the AR-ADLP framework is more accurate and more robust across various activities. In another respect, we assign each ADL primitive an actual physical meaning in real daily living, so the represented activities can be easily understood.

The AR-ADLP framework is not universal; it is suitable only for smart environments that generate discrete data (events). The temporal attribute of an event can be used to determine the temporal relationship between events. This information is useful for ordering event sequences on a timeline and for better presentation of discrete data. ADL primitives are easy to incorporate into a new smart home system because they capture the simplest temporal relationship. To do so, we first learn ADL primitives by counting events that frequently co-occur in annotated activity data. We then determine the primitives for a given activity through the data reduction process. Finally, events and the learned primitives are treated as features to train a specific classifier.

There are two less-than-ideal acrs for the activity "cook" in our best and average groups, which are slightly worse than Cook's HMM method, but this does not negate the effectiveness of the AR-ADLP framework. Our analysis suggests that the high similarity between the primitive sets of the activities "cook" and "eat" leads to the poorer recognition results; that is, the trained classifier misjudges some primitives belonging to "cook" as primitives belonging to "eat." This problem can be fixed by adding more information about sensor locations, which will be a key point of our future research.

Conclusion

In this article, we propose AR-ADLP, a novel ADL primitive-based activity recognition framework. In doing so, we define the primitive, construct a specific primitive set for each experimental activity, and represent activity data in the form of primitive strings. According to our experimental results, the AR-ADLP framework improves activity recognition accuracy and, with its pre-trained primitive vocabulary, facilitates deployment in other environments. Compared with a traditional recognition framework without ADL primitives, such as Cook's, both the total recognition accuracy and the standard deviation of activity recognition accuracy of our framework are better, and the total number of activity labels that need to be recognized is smaller, which means the new framework has more powerful recognition ability, better robustness, and less recognized activity data. In the future, we would like to add area density factors such as sensor locations to our recognition framework to address the situation where two activities have similar primitive sets.


Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.


Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (nos 61672122, 61402070, and 61602077), the Natural Science Foundation of Liaoning Province of China (no. 2015020023), the Educational Commission of Liaoning Province of China (no. L2015060), and the Fundamental Research Funds for the Central Universities (no. 3132016348).


ORCID iD

Rong Chen: http://orcid.org/0000-0001-5848-6398

References
1. Kasteren TV, Noulas A, Englebienne G, et al. Accurate activity recognition in home setting. In: Proceedings of the 10th international conference on ubiquitous computing, Seoul, South Korea, 21-24 September 2008. New York: ACM.
2. Patterson DJ, Fox D, Kautz H, et al. Fine-grained activity recognition by aggregating abstract object usage. In: Proceedings of the ninth IEEE international symposium on wearable computers, Osaka, Japan, 18-21 October 2005. New York: IEEE.
3. Chernbumroong S, Cang S, Atkins A, et al. Elderly activities recognition and classification for applications in assisted living. Expert Syst Appl 2013; 40: 1662-1674.
4. Mocanu S, Mocanu I, Anton S, et al. AmIHomCare: a complex ambient intelligent system for home medical assistance. In: Proceedings of the 10th international conference on applied computer and applied computational science, Venice, 8-10 March 2011, pp.181-186. Stevens Point, WI: World Scientific and Engineering Academy and Society.
5. Kasteren TV, Englebienne G and Kröse B. An activity monitoring system for elderly care using generative and discriminative models. Pers Ubiquit Comput 2010; 14: 489-498.
6. Mocanu I and Florea AM. A model for activity recognition and emergency detection in smart environments. In: Proceedings of the first international conference on ambient computing, applications, services and technologies, Barcelona, 23-29 October 2011. Wilmington, DE: IARIA.
7. Song B, Kamal AT, Soto C, et al. Tracking and activity recognition through consensus in distributed camera networks. IEEE T Image Process 2010; 19(10): 2564-2579.
8. Cheng J, Amft O and Lukowicz P. Active capacitive sensing: exploring a new wearable sensing modality for activity recognition. In: Proceedings of the 8th international conference on pervasive computing, Helsinki, 17-20 May 2010, pp.319-336. Berlin: Springer.
9. Niu L, Chen R, Zhou X, et al. An empirical study of DRER model in managing temporal RFID data. ICIC Express Lett 2011; 5(2): 461-472.
10. WSU CASAS datasets, http://ailab.wsu.edu/casas/datasets.html
11. Cook DJ, Crandall A, Singla G, et al. Detection of social interaction in smart spaces. Cybern Syst 2010; 41(2): 90-104.
12. Cook DJ. Learning setting-generalized activity models for smart spaces. IEEE Intell Syst 2012; 27(1): 32-38.
13. Kasteren TV and Kröse B. Bayesian activity recognition in residence for elders. In: 3rd IET international conference on intelligent environments, Ulm, 24-25 September 2007, pp.209-212. Herts: IET.
14. Wang F, Zhu H, Tian B, et al. A HMM-based method for anomaly detection. In: 4th IEEE international conference on broadband network and multimedia technology (IC-BNMT), Shenzhen, China, 28-30 October 2011, pp.276-280. New York: IEEE.
15. Fahim M, Fatima I, Lee S, et al. Activity recognition based on SVM kernel fusion in smart home. Comput Sci Appl 2012; 203: 283-290.
16. Zhao L, Wang X, Sukthankar G, et al. Motif discovery and feature selection for CRF-based activity recognition. In: 20th international conference on pattern recognition, Istanbul, 23-26 August 2010, pp.3826-3829. New York: IEEE.
17. Weka, http://www.cs.waikato.ac.nz/ml/weka/
18. Dernbach S, Das B, Krishnan NC, et al. Simple and complex activity recognition through smart phones. In: 8th international conference on intelligent environments, Guanajuato, Mexico, 26-29 June 2012, pp.214-221. New York: IEEE.

19. Krishnan NC and Cook DJ. Activity recognition on streaming sensor data. Pervasive Mob Comput 2014; 10(Part B): 138-154.
20. Joumier V, Romdhane R, Bremond F, et al. Video activity recognition framework for assessing motor behavioural disorders in Alzheimer disease patients. In: International workshop on behaviour analysis and video understanding, Sophia Antipolis, 23 September 2011, pp.9-20. New York: ACM.
21. Zhang M and Sawchuk AA. A bag-of-features-based framework for human activity representation and recognition. In: Proceedings of the 2011 international workshop on situation activity & goal awareness (SAGAware '11), Beijing, China, 2011, pp.51-56. New York: ACM.
22. Grünerbl A, Muaremi A, Osmani V, et al. Smartphone-based recognition of states and state changes in bipolar disorder patients. IEEE J Biomed Health 2015; 19(1): 140-148.
23. Wan J, O'Grady MJ and O'Hare GMP. Dynamic sensor event segmentation for real-time activity recognition in a smart home context. Pers Ubiquit Comput 2015; 19(2): 287-301.
24. Zhan K, Faux S and Ramos F. Multi-scale conditional random fields for first-person activity recognition on elders and disabled patients. Pervas Mob Comput 2015; 16: 251-267.
25. Sun FT, Yeh YT, Cheng HT, et al. Nonparametric discovery of human routines from sensor data. In: IEEE international conference on pervasive computing and communications, Budapest, 24-28 March 2014, vol. 9074, pp.11-19. New York: IEEE.

26. Wen J and Zhong M. Activity discovering and modelling with labelled and unlabelled data in smart environments. Expert Syst Appl 2015; 42(14): 5800-5810.
27. Hong X and Nugent CD. Segmenting sensor data for activity monitoring in smart environments. Pers Ubiquit Comput 2013; 17(17): 545-559.
28. Bao L and Intille SS. Activity recognition from user-annotated acceleration data. In: International conference on pervasive computing, Linz, 21-23 April 2004, pp.1-17. Berlin: Springer.
29. Stikic M, Larlus D and Schiele B. Multi-graph based semi-supervised learning for activity recognition. In: International symposium on wearable computers, Linz, 4-7 September 2009, pp.85-92. New York: IEEE.
30. Zhang M and Sawchuk AA. Human daily activity recognition with sparse representation using wearable sensors. IEEE J Biomed Health 2013; 17(3): 553-560.
31. Lee YS and Cho SB. Activity recognition with android phone using mixture-of-experts co-trained with labeled and unlabeled data. Neurocomputing 2014; 126(3): 106-115.
32. Tapia EM, Intille SS and Larson K. Activity recognition in the home using simple and ubiquitous sensors. In: International conference on pervasive computing, Linz, 21-23 April 2004, pp.158-175. Berlin: Springer.
33. Cook DJ and Edgecombe MS. Assessing the quality of activities in a smart environment. Methods Inf Med 2009; 48(5): 480-485.