Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎


Adaptive mobile activity recognition system with evolving data streams

Zahraa Said Abdallah (a), Mohamed Medhat Gaber (b,*), Bala Srinivasan (a), Shonali Krishnaswamy (c)

(a) Faculty of Information Technology, Monash University, Melbourne, Australia
(b) School of Computing Science and Digital Media, Robert Gordon University, UK
(c) Institute for Infocomm Research (I2R), Singapore

Article history: Received 18 February 2014; Received in revised form 10 September 2014; Accepted 13 September 2014

Abstract

Mobile activity recognition focuses on inferring current user activities by leveraging the sensory data available on today's sensor-rich mobile phones. Supervised learning with static models has been applied pervasively for mobile activity recognition. In this paper, we propose a novel phone-based dynamic recognition framework with evolving data streams for activity recognition. The novel framework incorporates incremental and active learning for real-time recognition and adaptation in streaming settings. As the stream evolves, we refine, enhance and personalise the learning model in order to accommodate the natural drift in a given data stream. Extensive experimental results using real activity recognition data show that the novel dynamic approach improves the performance of recognising activities, especially across different users. © 2014 Elsevier B.V. All rights reserved.

Keywords: Ubiquitous computing; Mobile application; Activity recognition; Stream mining; Incremental learning; Active learning

1. Introduction

Data stream mining has unique characteristics that make it more challenging than static data mining. In typical streaming settings, data has an infinite length; therefore, traditional mining techniques that require several passes over the data cannot be applied in a streaming environment. The concept-drifting nature of streaming data makes it hard to predict and classify new incoming data. The prior knowledge of the data eventually becomes outdated as the stream evolves. Thus, the classification model has to be continuously updated and refined to cope with changes occurring naturally in the data stream.

There are many emerging applications in which data streams play an important and natural role; one of these is activity recognition. Activity recognition aims to provide accurate and opportune information on people's activities and behaviour. It has become an emerging field in the areas of pervasive sensory data processing and ubiquitous computing because of its important proactive and preventive applications. The increased interest in the field of human activity recognition contributes to numerous domains, such as health care [18,32], surveillance and security [26,11], personal informatics [32,5] and just-in-time systems [13,25].

Activity recognition (AR) has been widely studied with different approaches and from different perspectives. The state of the art in

activity recognition research has focused on traditional classification learning techniques. First, data is collected and annotated by domain experts. Then, the labelled data is used to build and train the classifier model. When the model is ready, the recognition system is used to predict activities from the sensory data. A wide range of classification models has been deployed for activity recognition, such as Decision Trees, Naive Bayes and Support Vector Machines. The premise underlying machine learning in activity recognition is that new activities can be recognised using prior knowledge of previously collected data representing different activities. State-of-the-art activity recognition systems rely strongly on prior knowledge, ignoring the essential post-deployment adaptation and refinement that occur naturally in a dynamic streaming environment. Moreover, personalisation of the model to suit a particular user has received little attention in the research area. What is walking for one user may well be running for another; therefore, tuning the general model to recognise a given user's personalised activities is crucial for building a robust activity recognition system. Thus, it is crucial to handle the emerging changes in activities that result from modifications of users' activity patterns, or from personalisation of a user's activities, as the stream evolves.

In this paper, we aim to build a personalised and adaptive framework for activity recognition that incrementally learns from an evolving data stream. The developed framework deals with high speed, multi-dimensional streaming data to learn, model and recognise a user's personalised activities.

* Corresponding author.

http://dx.doi.org/10.1016/j.neucom.2014.09.074; 0925-2312/© 2014 Elsevier B.V. All rights reserved.

Please cite this article as: Z.S. Abdallah, et al., Adaptive mobile activity recognition system with evolving data streams, Neurocomputing (2014), http://dx.doi.org/10.1016/j.neucom.2014.09.074

The novel approach extends the state of the art in activity recognition with the following contributions:

 Dynamic incremental learning from evolving data streams: To handle concept drift in the data stream, we introduce a novel incremental, automatic learning approach that learns from unlabelled data to refine and adapt the learning model to the most recent changes in the data stream in real time.
 Effective active learning with the lowest cost: Manual labelling of a data stream is usually impractical, costly and time consuming. Therefore, we propose a novel active learning approach that asks for labels for only a small amount of the most uncertain data in the stream, for dynamic model adaptation and refinement.
 Mobile real-time AR application: We demonstrate the effectiveness of our light-weight framework by deploying the activity recognition streaming application on a mobile platform for real-time activity recognition.

To the best of our knowledge, no recognition/streaming system addresses all the aforementioned contributions in a single framework. We term this framework STAR, which stands for STream learning for mobile Activity Recognition. The rest of the paper is organised as follows. Section 2 provides a discussion of the research context. A detailed treatment of the proposed framework and its details is presented in Sections 3 and 4. Section 5 reports the experimental results and analysis. Finally, Section 6 concludes the paper with a summary.

2. Related work

Activity recognition is a very wide research area that has been investigated from many perspectives. Immense research effort has been directed towards learning methods for recognising activities from sensory data. Most of these learning methods consider the deployment of supervised machine learning approaches. In supervised learning, labelled data is collected to train a classification model. Then, the built model is deployed for recognising activities from incoming unlabelled data. Various supervised learning techniques are commonly used for activity classification; these methods are reviewed in [22,21]. The vast majority of activity recognition research does not consider the adaptation of the classification model beyond the training phase [33]. It also assumes the availability of a significant amount of labelled data in order to build an efficient classification model. These assumptions are unrealistic, especially when dealing with streaming sensory data.

An evolving data stream encounters various kinds of change. One of the main causes of change concerns the personalisation of the recognition model. Different individuals might perform the same activity in various ways. The most accurate recognition results can be obtained if we train the learning model with annotated data for a specific user. Researchers have investigated the impact of training the model on personalised data compared with training it on general data collected from different users, and demonstrated improved accuracy when subject-specific data is used for training instead of the general model [16,29]. Retraining the model on user-specific annotated data is not always applicable because of the scarcity of labelled data, especially in a streaming environment. Moreover, model reconstruction is not realistic for real-time recognition. Recent research in [28] adjusted the learning model from person A for another person B using selected confident samples.
The proposed algorithm is an integration of an SVM classifier and a clustering approach for updating the model automatically. However,

the proposed system has not been evaluated in a streaming setting. The deployment of an activity recognition system in a streaming environment imposes more challenges, as the change of data distribution when the stream evolves may cause the model to drift away from the actual data distribution. Recently, a few studies have considered the streaming nature of data for activity recognition. Krishnan and Cook [15] developed an efficient technique for handling streaming data based on windowing. This system is based on the fact that different activities can be characterised by various window lengths. The study evaluated different windowing techniques for analysing a stream of activities. Despite this study's addition to the field of activity recognition, other research gaps still need to be addressed. A limitation of the proposed approach is its inability to adapt the model post deployment: the classifier is built with training data, with no flexibility to be adapted and personalised after the training phase.

MARS [9,10], which stands for Mobile Activity Recognition System, is an incremental system for predicting activities in a data stream on a mobile device. The learning process in MARS is divided into two phases: training and recognition. In the training phase, the user performs activities and annotates them interactively while data is collected from mobile sensors. The collected annotated data is saved for building the model offline. In the recognition phase, the incoming unlabelled data is then classified based on the offline-built model. The study compared the results of both a static decision tree and an incremental Naive Bayes classifier for evaluating the system performance. The proposed algorithms are light-weight and thus can be deployed on mobile devices. Another recent work by Lockhart and Weiss [17] presented the Actitracker system for mobile activity recognition in the health care domain.
Actitracker uses a Random Forest model to build a general/universal classifier that can be replaced by a personalised model for a particular user. The system handles the data stream with a fixed time window and transmits data to a backend server for processing. Although the techniques presented in the aforementioned systems offer an efficient approach that combines activity recognition with stream mining in a ubiquitous environment, some challenges remain to be addressed. These systems assume the availability of labelled data for each user; each individual has to collect and annotate data that represents the personalised activities for building the model. When a new subject uses the system, the model has to be retrained with data collected and annotated for this particular user. Training the model in this way is impractical in streaming settings, which require an automated and incremental approach for 'adjusting' the model to fit a particular user. Moreover, the annotation process is time consuming, error-prone and not applicable in a streaming environment.

Due to the scarcity of labelled data in a streaming environment, selecting only the most profitable data, with the minimum effect on performance, is crucial for effective recognition. According to Muslea et al. [20], the main goal of an active learning algorithm is to find the most profitable and least costly data to label. Unlike incremental learning, which does not require user input, active learning queries the user for the true label. Many approaches have been proposed for efficient active learning in data streams, such as in [30,19,12]. Stikic et al. [27] employed a multi-sensor approach to choose important data to be labelled. Two approaches were evaluated: selecting the data the classifier is least confident about, and selecting data on which two classifiers have a high degree of disagreement. Results showed improved performance when active learning is applied.

Our framework differs from these studies in numerous aspects.
State-of-the-art activity recognition approaches rely on a static model that assumes no change occurs to the model beyond training. The few studies that addressed the adaptation issue did so with



a complete retraining of the model. However, deploying such a model in streaming settings requires real-time adaptation with limited availability of labelled data. Alternatively, our proposed approach uses a dynamic model that can be refined and adapted automatically as the stream evolves. The model is also personalised to a particular user with no need for retraining. Our technique is deployed in the mobile streaming environment for real-time recognition and adaptation. Finally, our technique addresses the limited availability of labelled data by integrating batch active learning into data stream mining for activity recognition, selecting only the most profitable data to label, in batches.
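The batch active-learning idea described above (label only the most uncertain windows, within a fixed budget) can be sketched as follows. This is a minimal illustration, not the paper's exact selection criterion; the `confidence` scores and budget are hypothetical.

```python
# Batch active learning sketch: from a batch of per-window prediction
# confidences, query true labels only for the least confident windows.

def select_queries(confidences, budget):
    """Return indices of the `budget` least-confident windows."""
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return ranked[:budget]

# Windows with low classifier confidence are sent to the user for
# labelling; the rest are learned from automatically (incremental learning).
confidences = [0.91, 0.35, 0.78, 0.52, 0.97]
queries = select_queries(confidences, budget=2)
# -> [1, 3]: the two most uncertain windows
```

Batching the queries, rather than interrupting the user per instance, keeps the labelling cost bounded while the stream evolves.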

3. STAR top level description

In this section, we introduce our novel STream learning for mobile Activity Recognition framework – termed STAR – along with its phases and components. In terms of the learning paradigm, STAR is divided across two phases: an offline modelling phase and an online recognition and adaptation phase. In the offline modelling phase, we build a learning model (LM) from a set of annotated sensory data that represents different activities. The output of the offline modelling phase is a fine grained learning model that represents the activities present in the training examples. In the online phase, we introduce a dynamic recognition technique for unlabelled streaming data with different activities. The key challenge in this phase is to enable incremental and continuous learning so that the recognition system can cope with and reflect the expected changes in a single user's activities or across different users. Fig. 1 shows the top level view of STAR with its phases and components.

The overall process of our approach is depicted in Algorithm 1; parts of the algorithm are described in subsequent sections. The modelling component builds the initial learning model from annotated data. We split the data stream into equal-size chunks of unlabelled data (line 3). In order to detect concurrent activities, we apply online clustering to each chunk of data (line 4). The prediction and adaptation techniques are deployed on each cluster (lines 6 and 7). The model is refined with recent data. The updated


model replaces the old one for future prediction as the stream evolves (line 8).

Algorithm 1. STAR top level algorithm.

Input: TrainingData: annotated data for building LM; Stream: sensory data evolving from sensors
Output: Predicted: predicted activity labels
1: LM ← ModellingComponent(TrainingData)
2: while Stream not empty do
3:   WindowData_t ← Stream data of the sliding window at time t
4:   WindowClusters_t ← OnlClus(WindowData_t)
5:   for all Cluster_i ∈ WindowClusters_t do
6:     Predicted_ti ← PredictActivityLabel(Cluster_i, LM)
7:     LM_new ← UpdateFramework(Predicted_ti, Cluster_i)
8:     LM ← LM_new
9:   end for
10: end while
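Algorithm 1's control flow can be mirrored in a short Python sketch. The modelling, clustering, prediction and update components are passed in as stand-in stubs here (the paper's actual components are detailed in Section 4); the toy model, activity labels and one-dimensional "sensor" stream below are purely illustrative.

```python
# Minimal sketch of the STAR top-level loop (Algorithm 1), with the
# framework components injected as functions.

def star(training_data, stream, window_size,
         modelling, onl_clus, predict, update):
    lm = modelling(training_data)            # line 1: build initial model
    predicted = []
    window = []
    for sample in stream:                    # line 2: consume the stream
        window.append(sample)
        if len(window) < window_size:
            continue
        clusters = onl_clus(window)          # line 4: cluster the window
        for cluster in clusters:             # lines 5-9
            label = predict(cluster, lm)     # line 6: recognise activity
            lm = update(lm, label, cluster)  # lines 7-8: refine the model
            predicted.append(label)
        window = []                          # slide to the next window
    return predicted

# Toy run: the model is a dict of labelled 1-D centroids; the nearest
# centroid wins, and the matched centroid drifts towards the new data.
lm0 = {"walk": 1.0, "run": 10.0}
labels = star(
    training_data=None, stream=[1, 2, 9, 11], window_size=2,
    modelling=lambda _: dict(lm0),
    onl_clus=lambda w: [sum(w) / len(w)],     # one cluster per window
    predict=lambda c, lm: min(lm, key=lambda k: abs(lm[k] - c)),
    update=lambda lm, lab, c: {**lm, lab: (lm[lab] + c) / 2},
)
# labels -> ["walk", "run"]
```

The point of the loop structure is that the model returned by each update replaces the old one before the next window is processed, so recognition always uses the most recently refined model.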

Offline phase: The aim in this phase is to build a flexible, expandable, fine grained and light-weight learning model for activity recognition. The novel dynamic model consists of a set of clusters, each containing its corresponding sub-clusters. Sub-clusters represent different patterns inside a cluster. The modelling component processes annotated sensory data for training. Only summarised characteristics of the data remain in memory to represent the various activities for the following phase. The light weight and flexibility of the created learning model allow the future model tailoring necessary to best fit a drifting data stream.

Online phase: The online phase consists of both the recognition and adaptation components. The two components are implemented on board the mobile phone for real-time recognition and adaptation. Firstly, we deploy the light-weight learning model and integrate it with the recognition component for predicting incoming streaming sensory data. Then, we apply our novel stream mining ensemble prediction technique for recognising different activities online with continuous windowing of the data stream.

Fig. 1. Top Level Framework.




Simultaneously, the adaptation component refines the learning model in an incremental and active manner. The adaptation component aims to cope with the changes in the data stream by tailoring and enhancing the trained model to best fit the incoming data in real time.
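The incremental refinement described above can be pictured as a weighted merge of a cluster's summary statistics with a newly recognised window cluster, with no retraining on raw data. This is an illustrative one-dimensional sketch under that assumption; the exact update rules STAR uses over its cluster characteristics are given in Section 4.

```python
# Sketch of incremental model refinement: update a cluster's summary
# statistics (weight and centroid) by a weighted merge with new data.

def merge_summary(weight, centroid, new_weight, new_centroid):
    """Merge an existing summary with a new window cluster's summary."""
    total = weight + new_weight
    merged_centroid = (weight * centroid + new_weight * new_centroid) / total
    return total, merged_centroid

# 100 past instances around 2.0, merged with 25 new instances around 4.0.
w, c = merge_summary(weight=100, centroid=2.0, new_weight=25, new_centroid=4.0)
# -> weight 125, centroid 2.4
```

Because only summaries are merged, the update is O(1) per cluster regardless of how many raw instances the model has already absorbed.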

4. Methodology

In this section, we present a detailed description of the methodology of the STAR system. We start with the methodology developed for the modelling component, followed by the two online components: recognition and adaptation.

4.1. Modelling component

In this component, we aim to build a robust and flexible learning model (LM), which is the basis for successful recognition and adaptation. For the learning model to be efficient, a set of key criteria has to be considered. One of these criteria is the model granularity; building a fine grained model is expected to enhance the overall recognition accuracy. We achieve this in our technique by building a learning model that separates out subsets of a cluster with distinct characteristics. We refer to these subsets as sub-clusters of the cluster. Sub-clusters of a particular cluster represent different patterns inside that particular activity. Since sub-clusters provide microscopic information about each cluster, a fine grained model is expected to be more efficient than using the clusters themselves directly.

One key challenge in the modelling process is to keep the balance between the model's light weight and its fine granularity. Although we process the entire set of training examples to build the fine grained model, we extract only the essential summarised characteristics of the clusters to build the model and dismiss all raw data. The small size of the model is essential for deployment on devices with limited resources such as mobile phones. Another significant criterion of an efficient learning model is its flexibility to update continuously along the evolving stream. Model adaptation occurs beyond the offline learning phase, in the streaming environment.
Unlike other techniques that entirely retrain the model with new data, we apply an incremental approach that only updates the summarised characteristics of the model, with no need to re-process the entire set of training examples, as they have already been dismissed at this point. Adaptation of simple characteristics such as the cluster centroid, size and other basic characteristics is sufficient to cope with evolving data streams for detecting and handling changes, and subsequently refining the model accordingly.

The methodology for building the learning model starts by applying a supervised learning technique to the annotated sensory data collected for training. Thus, the initial learning model, composed of a number of clusters, is created. Each cluster corresponds to one of the labelled activities that exist in the training data. After creating K clusters, we further split each cluster into smaller sub-clusters. We apply a clustering technique to each cluster to form sub-clusters representing different patterns inside the cluster. Typically, any activity spans a range of patterns, either for one user or across various users. The simple walking activity, for instance, contains different patterns such as strolling, jogging, or a normal-pace walking pattern. Walking for one person could be jogging for another. A fine grained model that consists of different patterns/sub-clusters inside each activity/cluster plays a significant role in efficient recognition. The process of generating the sub-cluster model is illustrated in Fig. 2.

Besides the model's fine granularity, we aim in the modelling component to maintain a light-weight model that can be deployed effectively in a ubiquitous environment. Therefore, we extract a summary of the statistics of each cluster and its corresponding sub-clusters, and then dismiss all raw data instances when concluding the offline phase. Cluster/sub-cluster characteristics are the extracted information describing each cluster/sub-cluster and distinguishing it from others. Characteristics of the learning model (LM) span three levels: sub-cluster characteristics, cluster characteristics and holistic characteristics. Sub-cluster (sc) characteristics portray an abstract description of each pattern/sub-cluster within a cluster, distinguishing it from other sub-clusters belonging to the same cluster. Sub-cluster characteristics include the following:

 Weightsc is the total number of data instances that belong to the sub-cluster (sc).

 Centroid_sc is one of the most important extracted features. It locates the centre of each sub-cluster in the data domain. For n-dimensional data instances, Centroid_sc is an n-dimensional vector of the mean values of the instances inside the sub-cluster, as per Eqs. (4.1) and (4.2):

$Centroid_{sc} = \{d_1, d_2, d_3, \ldots, d_n\}$   (4.1)

$d_j = \frac{\sum_{i=1}^{m} P_{i,j}}{Weight_{sc}}$   (4.2)

where d_j is the centroid of the j-th feature and P_{i,j} is the j-th feature of the i-th data sample within the sub-cluster (sc).

 WISCSD_sc (WIthin Sub-Cluster Standard Deviation) measures the dispersion of instances within the sub-cluster. Each activity pattern, represented by a sub-cluster in the LM, has its own standard deviation. The standard deviation of the n-dimensional instances within the sub-cluster is calculated as shown in Eq. (4.3):

$WISCSD_{sc} = \sqrt{\frac{\sum_{i=1}^{m} \left(EDistance(P_i, Centroid_{sc})\right)^2}{Weight_{sc}}}$   (4.3)

where EDistance is the Euclidean distance and P_i is an n-dimensional data instance inside the sub-cluster (sc).

 Radii_sc is the maximum distance between the sub-cluster centroid and any data sample belonging to the sub-cluster, as explained in Eq. (4.4):

$Radii_{sc} = \max\{EDistance(P_i, Centroid_{sc})\ \forall P_i \in sc\}$   (4.4)

where EDistance is the Euclidean distance, P_i is an n-dimensional data instance inside the sub-cluster (sc) and Centroid_sc is the sub-cluster centroid.
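The sub-cluster statistics of Eqs. (4.1)-(4.4) are simple to compute in one pass over the raw instances before those instances are dismissed. A minimal pure-Python sketch (function and variable names are illustrative, not from the paper):

```python
import math

# Compute the sub-cluster summary statistics of Eqs. (4.1)-(4.4)
# from raw n-dimensional instances.

def sub_cluster_stats(points):
    """points: list of n-dimensional instances (lists of floats)."""
    weight = len(points)                                    # Weight_sc
    dims = len(points[0])
    centroid = [sum(p[j] for p in points) / weight          # Eqs. (4.1)-(4.2)
                for j in range(dims)]
    dist = lambda p: math.dist(p, centroid)                 # EDistance
    wiscsd = math.sqrt(sum(dist(p) ** 2 for p in points)    # Eq. (4.3)
                       / weight)
    radii = max(dist(p) for p in points)                    # Eq. (4.4)
    return weight, centroid, wiscsd, radii

# Four points at the corners of a square around (1, 1).
pts = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
weight, centroid, wiscsd, radii = sub_cluster_stats(pts)
# -> weight 4, centroid [1.0, 1.0], WISCSD = Radii = sqrt(2)
```

After this summary is stored, the raw `points` can be discarded; the model keeps only the four numbers per sub-cluster (plus the characteristics defined next).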

Fig. 2. Modelling component.



 AvDist_sc is the average distance between instances within the sub-cluster: the sum of the distances between sub-cluster instances and the respective centroid, divided by the number of examples. The average distance is calculated as per Eq. (4.5):

$AvDist_{sc} = \frac{\sum_{i=1}^{m} EDistance(P_i, Centroid_{sc})}{Weight_{sc}}$   (4.5)

where EDistance is the Euclidean distance and P_i is an n-dimensional data instance inside the sub-cluster.

 Density_sc (density of the sub-cluster) is defined according to the density function in Eqs. (4.6) and (4.7):

$Density_{sc} = \frac{Weight_{sc}}{Volume_{sc}}$   (4.6)

$Volume_{sc} = \frac{4}{3}\,\pi\, Radii_{sc}^{\,n}$   (4.7)

where Volume_sc is the volume of the sub-cluster as a hypersphere in an n-dimensional space.

Cluster characteristics represent a higher level description of the extracted features corresponding to clusters. The description of cluster characteristics is as follows:

 Nsub_c is the number of sub-clusters within the cluster, which corresponds to the number of patterns inside an activity cluster.
 Centroid_c is the mean of all sub-cluster centroids that belong to the cluster. The centroid is defined in Eq. (4.8):

$Centroid_{c} = \frac{\sum_{i=1}^{Nsub_c} Centroid_{sc_i}}{Nsub_c}$   (4.8)

 Weight_c is the total number of data instances within the cluster.
 Radii_c measures and controls the decision boundary of each cluster in the domain space. It is defined as the maximum distance between the cluster centroid and all data instances inside the cluster. Radii_c is calculated as per Eq. (4.9):

$Radii_{c} = \max\{EDistance(P_i, Centroid_{c})\ \forall P_i \in c\}$   (4.9)

where EDistance is the Euclidean distance and P_i is an n-dimensional data instance inside the cluster.

 WICSD_c measures the dispersion of data instances within the cluster. The standard deviation is calculated as per Eq. (4.10):

$WICSD_{c} = \sqrt{\frac{\sum_{i=1}^{m} \left(EDistance(P_i, Centroid_{c})\right)^2}{Weight_{c}}}$   (4.10)

 GF is the gravitational force among sub-clusters of the cluster. Inspired by physical laws, there exists a natural attraction force between any two objects in the universe, called the gravitational force. According to Newton's universal law of gravity, the strength of gravitation between two objects is directly proportional to the product of the masses of the two objects, and inversely proportional to the square of the distance between them. The gravitational force among sub-clusters is defined in Eq. (4.11). The GF_c characteristic is a two-dimensional array of the gravitational force generated between each pair of sub-clusters within the cluster:

$GF_{i,j} = G\,\frac{Weight_{sc_i}\, Weight_{sc_j}}{r^2}$   (4.11)

where G is the constant of universal gravitation, Weight_{sc_i} is the weight of sub-cluster i, Weight_{sc_j} is the weight of sub-cluster j, and r is the Euclidean distance between the centroids of sc_i and sc_j.

 Density_c is calculated according to Eq. (4.6), but with Weight_c and Radii_c replacing Weight_sc and Radii_sc respectively.
 DMax is the maximum distance between any pair of sub-cluster centroids.

Finally, the most abstract level represents the holistic characteristics of the entire training data. Thus, the holistic characteristics represent a collective view of the data. The characteristics are as follows:

 Nclus is the number of clusters/activities in the training data.
 Centroid_glob is the global centroid of all instances in the training data, which is analogous to the centre of all cluster centroids.
 Capacity is the total number of processed training instances.

All of the extracted characteristics facilitate incremental and continuous learning due to their simplicity of calculation. After extracting all characteristics offline, a cluster-based, fine grained and light-weight learning model (LM) is ready for the recognition phase. All raw data is dismissed at the end of this modelling stage.

4.2. Recognition component

The recognition and adaptation components operate in the STAR online phase. In the streaming settings of the online phase, sensory data is generated continuously. The objective of the recognition component is to perform the required computations in order to predict activities in the stream with a single scan of the data, using only a bounded amount of resources on the mobile phone. As the stream evolves, the recognition component assesses sensory data to predict the performed activities. We start by integrating the learning model (LM) with the recognition component. The integrated unit of the recognition component (RC) receives sensory data from a continuous sliding window and produces as output the predicted label of the activity performed during that window. Fig. 3 shows how the recognition component functions in the streaming settings. Firstly, a continuous, fixed-size sliding window is applied to the stream from time t to t + window size. The small window of data is saved to an online buffer for real-time processing while the incoming data instances of the following window, at t + window size, are accumulated. The recognition component predicts activities in each window with a cluster-based approach.

In order to explain the idea of the cluster-based approach, we have to make a clear distinction between techniques for handling data and techniques for processing data. Data handling techniques are tailored methods developed especially for mining data streams. The reason for developing these methods is the high speed nature of data streams, which requires special approaches capable of handling the incoming data. Examples of these techniques are sampling, windowing, data segmentation, etc. In our technique, we apply a fixed-size, continuous sliding window for handling the stream of data. Approaches for data processing differ in terms of how the selected data (collected data, sample data, window data, data chunk, etc.) is processed for classification.
State-of-the-art data mining techniques such as Decision Trees, SVM and KNN process each instance of the tabular data individually for classification. In stream classification, most techniques – such as VFDT [8] – classify each instance in the data chunk/window. A few studies have considered a cluster-based approach for classification in streaming settings, such as [24], yet these techniques are not tailored to the context of activity recognition. In our approach, we process window data collectively; the processing unit in STAR is the window cluster instead of window instances. Thus, this approach is tailored to activity recognition. As people perform activities in a sequential manner (i.e., performing one activity followed by another), an activity recognition data stream typically consists of a sequence of chunks that represent various activities. Instead of processing each instance, which is costly in a streaming

Please cite this article as: Z.S. Abdallah, et al., Adaptive mobile activity recognition system with evolving data streams, Neurocomputing (2014), http://dx.doi.org/10.1016/j.neucom.2014.09.074i



Fig. 3. Recognition Component.

environment, our recognition approach deals with activities as clusters rather than responding to each single instance in the stream. Thus, the activity predicted with the cluster-based approach corresponds to the major activity performed in the target time window. Another advantage of using the cluster-based approach is the preservation of the limited resources on the mobile device, especially when dealing with an unbounded, very fast sensory data stream.

Capturing concurrent and interleaved activities: One of the challenges with the windowing, cluster-based approach to recognition is how to handle concurrent and interleaved activities. Typically, some activities are highly correlated, such as walking and standing. Even in a small window, a person may have pauses of the "stand" activity, for instance, while "walking". With our cluster-based approach we aim to detect the major activity performed in a single window; in this example, only the major activity, namely walking, would be detected. However, capturing concurrent and highly related activities improves the overall efficiency of the system. Therefore, we combine the cluster-based approach with a concurrent activity detection technique to be able to detect concurrent and interleaved activities in each window/cluster. In order to capture concurrent activities in a single window, we propose an online clustering method (OnlClus) for each window, as explained in Fig. 3. OnlClus aims to detect and separate concurrent activities that occur in the same window. At this step, we apply well-known clustering techniques such as K-means or EM to the small window of data. The formed clusters of each window are then assessed with an ensemble classifier for recognising their corresponding activities.

4.2.1. Ensemble classification technique

The flow of the recognition component continues with the application of an ensemble classification technique to the clusters of each window.
The ensemble classifier is a light-weight algorithm that bases its predictions on a hybrid of similarity measures. The classifier deploys an ensemble of four measures to assess the similarity of each new cluster (of the current window) to the LM clusters. Whereas state-of-the-art classification methods adopt a single measure for prediction, combining various measures benefits from the strength of each and provides a comprehensive understanding of the data from various perspectives. Each measure votes for its own "best candidate" cluster from that measure's perspective. The classifier then decides upon the predicted label as the one with the majority of votes among all measures. The deployed measures are distance, density, deviation and gravity. The deployment of this ensemble classification for clustering, classification and activity recognition, albeit in a static environment, has been presented in our previous work in [1–3] respectively. A formal definition of the learning model is given by:

LM = {C1{sc11, sc12, ..., sc1P1}, C2{sc21, sc22, ..., sc2P2}, ..., Cn{scn1, scn2, ..., scnPn}}

When a new cluster Cnew emerges, the ensemble classifier is applied to choose the best candidate among the n clusters, with m sub-clusters in total, in the LM, where m = ∑_{i=1}^{n} P_i. The number of sub-clusters varies from one cluster to another based on the diversity of patterns within the cluster. The four assessing measures are the following:

• Distance: This basic measure focuses on the closeness and the separation of incoming data from LM clusters. Cj is the best candidate from the distance perspective if the Euclidean distance between Centroid_Cnew and Centroid_sci is the shortest among all n clusters with m sub-clusters, where sc_i ∈ C_j.
• Gravity: This measure concerns not only the closeness but also the weight of the clusters. Each cluster has its own gravitational force generated from its weight. The gravity measure focuses on the attraction among clusters caused by their gravitational force, as per Eq. (4.11). The bigger the weight of the cluster, the stronger the gravitational force produced around it, and therefore the higher the probability that it attracts more data. When the gravitational force between Cnew and sc_i is the largest among all n clusters with m sub-clusters, where sc_i ∈ C_j, then Cj is the best candidate from the gravity perspective.
• Density: Among the measures is density, which concerns data cohesiveness and dispersion at both the cluster level and the collective level of all clusters. The density perspective studies the impact on an LM cluster's/sub-cluster's density if new data joins that particular cluster. To choose the best candidate from the density perspective, we compute the virtual density difference (VDD) for both cases of gain and loss. The VDD computes the virtual gain/loss when Cnew is merged with an existing LM sub-cluster, compared to the sub-cluster's original density Density_sc. Cj is the best candidate from the density perspective when the VDD of Cnew, if merged with sc_i, is either the maximum gain or the minimum loss among all n clusters with m sub-clusters, where sc_i ∈ C_j.
• Deviation: Unlike density, deviation focuses on a cluster's internal cohesiveness around its centroid. When a new cluster Cnew emerges, we test the impact on the WISCSD (within sub-cluster standard deviation) of each sub-cluster. The best candidate from the deviation perspective is the one whose WISCSD is least affected when merged with the new data.

The hybrid similarity measures approach chooses the best candidate cluster by taking the majority vote among the above four measures.
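The voting scheme described above can be sketched in a few lines of Python. The measure names follow the paper, but the function signature and the toy score dictionaries are our own illustration, not the paper's implementation:

```python
from collections import Counter

def ensemble_vote(candidates, measure_scores):
    """Pick the candidate sub-cluster with the majority vote across measures.

    measure_scores maps each measure name to a dict of candidate -> score,
    where a higher score means a better match under that measure. Returns
    (winner, confidence), or (None, confidence) when no decision is reached.
    """
    votes = Counter()
    for scores in measure_scores.values():
        best = max(candidates, key=lambda c: scores[c])
        votes[best] += 1
    winner, count = votes.most_common(1)[0]
    confidence = count / len(measure_scores)
    if confidence < 0.5:                 # fewer than 2 of 4 measures agree
        return None, confidence
    ties = [c for c, v in votes.items() if v == count]
    if confidence == 0.5 and len(ties) == 2:
        return None, confidence          # a 2-2 tie is also unresolved
    return winner, confidence

# Toy scores: three measures favour "walk", one favours "stand".
scores = {
    "distance":  {"walk": 0.9, "stand": 0.1},
    "gravity":   {"walk": 0.8, "stand": 0.2},
    "density":   {"walk": 0.7, "stand": 0.3},
    "deviation": {"walk": 0.2, "stand": 0.6},
}
result = ensemble_vote(["walk", "stand"], scores)  # -> ("walk", 0.75)
```

With three of the four measures agreeing, the predicted label is accepted with 75% confidence; a 2-2 split between two candidates returns no decision, matching the uncertain cases discussed in Section 4.3.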

Please cite this article as: Z.S. Abdallah, et al., Adaptive mobile activity recognition system with evolving data streams, Neurocomputing (2014), http://dx.doi.org/10.1016/j.neucom.2014.09.074i


We consider all measures in this paper equally weighted. Therefore, the confidence level for choosing Cj has to be greater than or equal to 50%, i.e., at least two measures voted for the same cluster.

4.3. Adaptation component

To tackle the challenges raised by the dynamic nature of data streams, STAR has to be continuously updated so that it stays consistent with the most recent changes in the data stream. In this component, we apply batch incremental and active learning approaches to cope with the drifting nature of the data stream. The component updates the LM in batches with the entire selected cluster data.

4.3.1. Data selection for incremental and active learning

The adaptation component requires neither all instances to be labelled nor all instances to be fed back to the model; only the most uncertain data needs to be annotated. Table 1 lists the decisions (active learning, incremental learning or neither) based on the certainty level of the prediction algorithm applied in the recognition component. The first case of active learning occurs when the confidence level is at the lowest certainty of 25%. This means that each measure votes for a different candidate cluster and no pair agrees. The other case of active learning happens when the four measures are split between exactly two candidate clusters with equal votes for each; the confidence level in this case is 50% for each of the two candidates. In both of these cases, no decision is made by the recognition component. Thus, the adaptation component inquires of the user about the cluster's true label. The batch-active learning technique inquires about the label of the entire new cluster instead of inquiring about instances individually. The cluster data, attached with its true label, is fed back to the system in order to adapt the LM incrementally. The most uncertain recognition occurs when the confidence level is 50% (with no equal votes for exactly two clusters).
In this case, the recognition component does present a decision about the predicted activity, however with the lowest level of confidence. To strengthen the learning model and boost the efficiency of the system, we apply incremental learning in this uncertain case: the incremental learning data is fed back to adapt the learning model to the most recent changes and to refine its characteristics accordingly. Both the active and incremental approaches adapt the learning model to the most recently predicted activity. However, active learning updates the LM with the true label provided by the user, while incremental learning updates it with the predicted label of unlabelled data.

4.3.2. Adaptation methodology

The adaptation component aims to refine the characteristics extracted in the LM, in real time, to the most recent changes in the data stream. One of the most dominant characteristics of the LM is the centroid. In refining the model, we update the smallest entity, the sub-cluster, which consequently updates the bigger entities: the cluster and the holistic characteristics. The label of the predicted cluster is either provided by the user (in active learning) or predicted by the


recognition component (in incremental learning). The adaptation algorithm applies the distance method to choose the sub-cluster to update inside the already selected cluster. For the selected sc_j ∈ C_i, we run the centroid adaptation algorithm. A numerically stable algorithm for the weighted centroid update is given in Algorithm 2. It computes the mean based on Knuth [14].

Algorithm 2. UpdatescCentroid (data, Weightdata, Centroidsc, Weightsc).

sumWeight = Weightsc
for x in data
    temp = Weightdata + sumWeight
    delta = x − Centroidsc
    R = delta × Weightdata / temp
    Centroidsc = Centroidsc + R
    sumWeight = temp
end for

The centroid update algorithm is extended for variance and standard deviation updates [6]. Algorithm 3 illustrates the refinement of the WISCSD for the selected sc_j.

Algorithm 3. UpdateWISCSD (data, Centroidsc, WISCSDold, Weightsc).

n = Weightsc
M = WISCSDold² × n
for x in data
    n = n + 1
    delta = x − Centroidsc
    Centroidsc = Centroidsc + delta / n
    M = M + delta × (x − Centroidsc)
end for
variance = M / n
WISCSD = SQRT(variance)
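The two update routines above translate into a short, runnable Python sketch. Function and variable names are ours; the accumulation of sumWeight inside the loop, which the printed listing leaves implicit, is made explicit so that the running weighted mean is correct for more than one incoming point:

```python
import math

def update_centroid(data, weight_data, centroid, weight_sc):
    """Numerically stable weighted centroid update (Knuth-style running mean).

    Each incoming point carries the same weight `weight_data`; the sub-cluster
    already summarises `weight_sc` points around `centroid`.
    """
    sum_weight = weight_sc
    for x in data:
        temp = weight_data + sum_weight
        delta = x - centroid
        centroid += delta * weight_data / temp  # R = delta * w / temp
        sum_weight = temp
    return centroid, sum_weight

def update_wiscsd(data, centroid, wiscsd_old, weight_sc):
    """Welford-style update of the within-sub-cluster standard deviation."""
    n = weight_sc
    m = wiscsd_old ** 2 * n          # running sum of squared deviations
    for x in data:
        n += 1
        delta = x - centroid
        centroid += delta / n
        m += delta * (x - centroid)
    return math.sqrt(m / n), centroid

# A sub-cluster summarising two points around centroid 0 absorbs [3, 3]:
c, w = update_centroid([3.0, 3.0], 1.0, 0.0, 2.0)   # c -> 1.5, w -> 4.0
sd, c2 = update_wiscsd([2.0, 4.0], 0.0, 0.0, 0)      # sd -> 1.0, c2 -> 3.0
```

The first call yields the exact mean of {0, 0, 3, 3} without revisiting the dismissed raw data, which is the point of storing only summarised characteristics in the LM.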

Another essential characteristic that requires an update is Density_scj. The density calculation was previously described in Eq. (4.6). When merging new instances into sc_j, we recalculate the density based on the new parameters, first updating the sub-cluster's weight by adding the weight of Cnew. Another important characteristic to update is Radii_scj. The radii update relies on the position of Cnew relative to the existing sub-clusters. As illustrated in Fig. 4, three cases may occur when updating Radii_scj. Radii_scj remains the same if the data of Cnew is fully contained inside the sub-cluster sc_j, as shown in Fig. 4(a). The other two cases, illustrated in Fig. 4(b) and (c), occur when Cnew intersects with, or is fully separated from, sc_j. In these cases, Radii_scj is adapted as in Eq. (4.12):

NewRadii = (EDis(Centroid_sc, Centroid_new) + Radii_sc + Radii_new) / 2    (4.12)

where EDis is the Euclidean distance. Since the higher level characteristics are based on the lower level ones, the adaptation algorithm recalculates both the cluster and the holistic characteristics with the updated values of the sub-clusters' characteristics. For example, the cluster gravity GF is

Table 1
Confidence level with recognition and adaptation decision.

Conf. level (%) | Description                                              | Recognition decision             | Adaptation decision
25              | Each measure chooses a different cluster                 | None                             | Active
50              | Equal votes between exactly 2 clusters Clusi and Clusj   | Inquire to choose Clusi or Clusj | Active
50              | Two measures vote for Clusi with no equal votes          | Clusi                            | Incremental
75              | Three measures vote for Clusi                            | Clusi                            | None
100             | All measures vote for Clusi                              | Clusi                            | None
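The decision logic of Table 1 amounts to a small dispatch function. The following Python sketch is our own illustration; the vote-count dictionary and function name are assumptions, not the paper's API:

```python
def adaptation_decision(votes):
    """Map ensemble vote counts to (recognition, adaptation) decisions.

    `votes` maps candidate cluster -> number of measures (out of 4) voting
    for it. Mirrors Table 1: 25% triggers active learning, a 2-2 tie triggers
    active learning, a plain 50% triggers incremental learning, and 75% or
    100% needs no adaptation.
    """
    top = max(votes.values())
    confidence = top / 4
    leaders = [c for c, v in votes.items() if v == top]
    if confidence == 0.25:                       # every measure disagrees
        return None, "active"
    if confidence == 0.5 and len(leaders) == 2:  # tie between two clusters
        return None, "active"
    if confidence == 0.5:                        # weakest accepted prediction
        return leaders[0], "incremental"
    return leaders[0], "none"                    # 75% or 100% agreement
```

For example, `adaptation_decision({"walk": 2, "stand": 1, "sit": 1})` returns `("walk", "incremental")`, the most uncertain case in which a prediction is still made.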


Fig. 4. Radii update cases. (a) Contained, (b) intersected and (c) separated.

automatically updated according to Eq. (4.11) with the updated sub-cluster weights and centroids.

4.4. Changes from the initial learning version [4]

The initially published version, StreamAR, has been greatly improved in STAR. The most important aspects in which it differs from the present version are as follows. StreamAR assumes the availability of labelled data at any time; this is unrealistic in a real environment, where labelled data is rare. STAR, on the other hand, requires only a small amount of the most profitable data to be annotated beyond the modelling phase (with the active learning approach). Moreover, the incremental learning in StreamAR adapts the learning model only with user-annotated data; the adaptation process is therefore not automated and relies on the user's input. In contrast, STAR applies an automated data selection technique for incremental learning that relies only on the learning process, with no further requirement for the user's input. Furthermore, in terms of the underlying learning model, the learning model in STAR is fine-grained, with sub-clusters within each cluster. Only the essential summarised characteristics are extracted from each generated sub-cluster, to maintain the light-weight nature of the learning model. Finally, STAR is deployed in a ubiquitous environment: all recognition, processing and adaptation occur on a mobile phone with limited resources. In a nutshell, as StreamAR is an earlier version of STAR, significant improvements distinguish STAR in terms of the underlying learning model, the concepts of learning, the deployment on a mobile platform and the results.

5. Experimental evaluation

In this section, we discuss the datasets used for evaluation, the experimental setup, and our findings.

5.1. Datasets

We conducted our experiments on three real activity recognition datasets. Two of these datasets contain data collected from a mobile accelerometer sensor, while the third is collected from on-body sensors. The datasets are as follows:

• OPPORTUNITY [23]: This dataset contains daily human activities recorded in a sensor-rich environment: a room simulating a studio flat with a kitchen, a deckchair and outdoor access, where subjects performed daily activities. It records 72 sensors of 10 modalities, integrated in the environment, in objects and on the body. The activity recognition environment and scenario were designed to generate many activity primitives, yet in a realistic manner. Thus, the dataset has labels for user activities (such as sitting, walking and running) streaming from various data sources. The dataset consists of annotated complex, multi-dimensional and naturalistic activities, with a particularly large number of atomic activities (around 30,000 for each segment). The OPPORTUNITY dataset contains four annotated activities for five subjects across five different segments.
• WISDM [16]: This dataset is collected from the user's mobile phone accelerometer sensor. Six activities were performed by users during data collection, namely walking, jogging, sitting, standing, upstairs and downstairs. The dataset was collected by different users and contains more than 1 million annotated accelerometer readings.
• Smart Phone Accelerometer Data (SPAD) [7]: This dataset contains manually collected and annotated data from the accelerometer sensor of a smart phone, used to recognise four different activities: walk, still, run and drive.

5.2. Experimental setup

In order to evaluate the performance of STAR in a ubiquitous environment, we ran all the experiments on a mobile phone. For all experiments, part of the data is used to build the learning model offline; other data, from different users, is used for testing. The testing data is a stream of unlabelled data that is not used for training the model; labels are only revealed for evaluation purposes. The default window size contains 20 instances unless otherwise stated. For each window of data, we apply EM to detect concurrent activities, and we limit the number of concurrent activities in each single window to two. In this analysis, we define the following performance measures:

• Tp is the total number of existing class instances correctly classified,
• ALRate is the number of triggered active learning inquiries, and
• CPur is the cluster purity percentage, i.e. the percentage of instances with the majority label inside the cluster.

We have performed the processing time analysis on both a mobile phone and a desktop. The mobile phone is a Galaxy SIII Android phone with a quad-core 1.4 GHz processor and 1 GB of RAM, while the desktop has a Core i7 processor with a speed of 2.7 GHz and 4 GB of RAM. Two parameters influence the processing time: the data dimensionality, in terms of the number of features, and the percentage of incremental learning along the stream. We start by analysing STAR's processing


time with the 3-D data collected from the mobile accelerometer sensor. Both WISDM and SPAD are 3-D datasets, with sampling rates of 20 Hz and 5 Hz respectively. A window size of 20 thus spans 1 s of the WISDM dataset and 4 s of SPAD. The processing time for each window in both datasets is 50–80 msec on the mobile phone, and drops to only 4–10 msec when deployed on the desktop. The sampling interval in both datasets is longer than the processing time, so STAR successfully fulfils the real-time constraint on these datasets. In the OPPORTUNITY dataset, the number of features is 110. The processing time for a window of 20 instances averages 900 msec on the mobile phone and 57–90 msec for the desktop deployment. The sampling rate of the OPPORTUNITY dataset is 30 Hz (approximately one sample every 33 msec), so the observation time for a window of 20 instances is about 700 msec. STAR thus still performs real-time, or almost real-time, recognition (for the mobile deployment) despite the high dimensionality of the data. The sampling rate for WISDM and OPPORTUNITY is 20–30 Hz, while SPAD has a lower rate of 5 Hz. A window size of 20 instances corresponds to an approximate observation length of 0.7 s, 1 s and 4 s for the OPPORTUNITY, WISDM and SPAD datasets respectively. The reason for the small window size, despite the high sampling rate of some datasets (i.e., the OPPORTUNITY and WISDM datasets), is to keep the computational complexity to a minimum. The number of instances in each window determines the order of the computational complexity. The recognition algorithm, as a cluster-based approach, tests clusters of new data. We apply EM clustering to each window, with a complexity of O(nki), where n is the number of instances in each window, k is the number of clusters and i is the number of iterations.

As the observation time for each window varies from 0.7 s to 4 s, we limit the number of interleaved, concurrent or transitional activities to two, which is very reasonable given the small window size. Thus, the complexity of clustering the data in each window is O(2ni). The main computational cost arises from the clustering algorithm. While the recognition technique's complexity is O(m), where m is the number of sub-clusters in the learning model, the complexity of the incremental adaptation technique is O(nm), where n is the number of instances within a single window and m is the number of sub-clusters in the learning model. Given that the adaptation process is activated for only a subset of the data along the stream (approximately 35% of the data triggers model adaptation), the adaptation technique has little impact on the overall computational complexity. The computational complexity of a typical KNN approach is O(nM), where n is the number of instances in the stream and M is the number of stored instances; O(nM) is clearly much higher than O(nm), since m (the number of sub-clusters) ≪ M (the number of instances stored in the training phase).
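The window-length figures quoted above follow from a one-line calculation relating window size and sampling rate; a minimal sketch (dataset rates as reported in the paper):

```python
def observation_length(window_size, sampling_hz):
    """Time span, in seconds, covered by one window of raw sensor samples."""
    return window_size / sampling_hz

# With the default window of 20 instances:
lengths = {hz: observation_length(20, hz) for hz in (30, 20, 5)}
# 30 Hz (OPPORTUNITY) -> ~0.67 s, 20 Hz (WISDM) -> 1 s, 5 Hz (SPAD) -> 4 s
```

This is why the same 20-instance window elapses under a second on OPPORTUNITY but four seconds on SPAD.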


5.3. Analysis and discussion

Fig. 5 displays screenshots of the developed mobile application for activity recognition. The accelerometer sensor readings along the three axes are displayed at the top, as shown in the screenshots. The application displays, in real time, the sequence of recognised activities that correspond to each window, as per Fig. 5(a). In case STAR is unable to recognise the window activity, an active learning alert pops up for the user to select the correct label for the unrecognised activity, as shown in Fig. 5(b). The selected label, along with the confusing data, is fed back to the system in order to adapt the model instantaneously. In Fig. 5(c), the application confirms the user's selection of the "sitting" activity as an input for active learning. The application then continues to recognise the streaming activity with the adapted model, as per Fig. 5(d).

5.3.1. Learning model

One of the key contributions of STAR is the ability to build a fine-grained, light-weight learning model. Table 2 shows the characteristics of the different datasets and the corresponding learning model size in kilobytes. As shown in the table, the size of the learning model with high-dimensional training data is just above 100 KB. On the other hand, the model size is as small as a few kilobytes for datasets with smaller dimensions. The number of training instances used to build the model has no effect on the model size, as we only store the extracted characteristics and dismiss the actual raw data. Only the number of attributes and the number of clusters/sub-clusters within each cluster affect the model size. Opp-Si denotes the data across all segments for subject i in the OPPORTUNITY dataset. Hence, the only difference among the Opp-Si training datasets is the

Fig. 5. STAR Mobile Application Screen Shots.

Table 2
Model size and training data characteristics.

Training data | No. attributes | No. instances | No. classes | LM size (KB)
Opp-S1        | 110            | 134,613       | 4           | 102
Opp-S2        | 110            | 133,023       | 4           | 102
Opp-S3        | 110            | 124,320       | 4           | 102
Opp-S4        | 110            | 105,082       | 4           | 102
WISDM6act     | 3              | 32,262        | 6           | 9
WISDM4act     | 3              | 166,226       | 4           | 4
SPAD-train    | 3              | 2622          | 4           | 4


number of instances; the model size remains static regardless of the number of instances. For the accelerometer-only datasets, the only difference between WISDM6act and WISDM4act is the number of classes: WISDM4act is a subset of WISDM6act with data representing only four activities. Each class contains a set of sub-clusters. Therefore, fewer classes means fewer characteristics to be stored and consequently a smaller learning model.

5.3.2. Dynamic adaptation

The main objective of the adaptation component is to accommodate concept drift in the stream at minimum cost. The adaptation component applies both incremental and active learning techniques to fulfil this goal. For an efficient system, the ALRate should be kept to a minimum along the stream, so as not to bother the user with too many inquiries. At the same time, active learning has to operate along the stream at a balanced rate in order to handle expected changes when necessary. Fig. 6(a) and (b) displays the behaviour of active learning on a sample of the OPPORTUNITY dataset that elapses over 45 min. In this run, the LM is built on all segments of subject S4, and all data of subject S1 is used for testing. The ALRate is the number of triggered active learning inquiries. The upper line in Fig. 6 displays the instances of the stream moving across the four activities over time. The lower line represents the ALRate along the stream, which corresponds to the evolving activities. We note several observations from this evaluation. Firstly, the number of inquiries is balanced along the stream: active learning is uniformly triggered along the stream and is therefore expected to handle concept drift efficiently whenever it occurs. Secondly, the ALRate is reasonably small, totalling around 5% of the processed data. Thirdly, active learning is tightly related to the recognition decision and to the rate of change of activities. Plateau areas, which represent a sequence of data from the same activity, are most likely to trigger fewer active

learning inquiries. Also, the ALRate may increase at the beginning of the stream in order to personalise the model; eventually, the ALRate decreases, reflecting the adaptation of the system to recent changes in the stream, as shown at the beginning of the stream in Fig. 6(b). In contrast, the ALRate increases with a high rate of activity change, especially in transitional periods. Both transitional activities and frequent alternation between activities are the most common triggers of frequent active learning. Thus, the technique could be improved by including a filtering step that recognises the transitional activities at a high level (prior to applying STAR). This filter would isolate the data that represents the transitional activities, so that active learning would only be triggered in case of real change. Another approach to keeping the active learning rate at a minimum is to postpone the label inquiry until the change becomes persistent. In the current version of the algorithm, active learning is triggered as soon as the most uncertain data is detected. The alternative approach, given also the high sampling rate,

Table 3
STAR performance with OPPORTUNITY.

Test | Technique | TP   | Stand (w=45%) | Walk (w=26%) | Lie (w=5%) | Sit (w=24%) | Pur
S1   | STAR (%)  | 80.7 | 88.4          | 56.5         | 93.1       | 52.1        | 99.2
S1   | DT (%)    | 63   | 57.4          | 68.1         | 46.6       | 65.5        | –
S1   | SVM (%)   | 76.8 | 88            | 39.3         | 66.4       | 33.2        | –
S2   | STAR (%)  | 74.9 | 82.5          | 50.1         | 89.9       | 47.3        | 99.1
S2   | DT (%)    | 54.5 | 49.1          | 70           | 39.2       | 48.2        | –
S2   | SVM (%)   | 71.8 | 71.5          | 56.4         | 60.6       | 87.1        | –
S3   | STAR (%)  | 68.6 | 87            | 44.3         | 86.1       | 7.2         | 98.9
S3   | DT (%)    | 56.9 | 76.7          | 56.7         | 0          | 40          | –
S3   | SVM (%)   | 64.7 | 65.3          | 57.4         | 27.7       | 56.2        | –
S4   | STAR (%)  | 71.1 | 84.2          | 43.6         | 90         | 30.3        | 98.6
S4   | DT (%)    | 43.4 | 30.1          | 70           | 0          | 50.2        | –
S4   | SVM (%)   | 52.9 | 71.4          | 13.7         | 69.1       | 59.6        | –

Fig. 6. Active learning inquiries along the stream.

Fig. 7. Effect of incremental learning.


is to keep the suspected data in a short-term memory until a real and recurrent change is detected and validated. Once the change is validated, it can be used to adapt the system with the true labels. In order to test the correlation between the active and incremental learning rates, we first run STAR with only active learning enabled (incremental learning deactivated) and evaluate the percentage of data that requires active learning. We then run STAR with both active and incremental learning enabled. Fig. 7 shows the inverse relationship between the incremental and active learning rates. The evaluation is carried out on N different subsets of the OPPORTUNITY dataset; each run has the same initial setup of training on a single subject's data and testing on a new one. The percentage of

Table 4
STAR performance with WISDM.

Technique | TP   | Walking (w=35.7%) | Jogging (w=34.42%) | Sitting (w=6%) | Standing (w=4.6%) | Stairs (w=19.2%)
STAR (%)  | 71.2 | 80.4              | 65                 | 99.1           | 99.6              | 49.2
DT (%)    | 41.4 | 24.6              | 86.7               | 0              | 0                 | 13.6
SVM (%)   | 44.3 | 58.5              | 52.7               | 0              | 0                 | 27.7


incremental learning varies from one run to another based on the nature of the data itself. Our focus in this figure is the effect of incremental learning on active learning, comparing the "solo-active" bar with the "active with incremental" bar. The "solo-active" bar represents the fraction of data that triggers active learning in the different runs when incremental learning is disabled. The "active with incremental" bar, on the other hand, displays the fraction of data that requires active learning when both active and incremental learning are enabled in STAR. The upper line relates only to the "active with incremental" bar: it represents the fraction of data that requires incremental learning in each run. The figure shows that an average of 20–35% of the data is used to update the model with the incremental learning approach. The percentage of active learning data is much lower, averaging 8% of the processed data. We compare the behaviour of active learning with and without incremental learning enabled. When the percentage of incremental learning is small, i.e., in the first 5 runs, the percentage of data requiring active learning with incremental learning enabled is greater than or equal to the percentage for solo-active. The inverse happens as the incremental learning percentage increases (i.e., above 27%, after run 6). As shown in Fig. 7, the results show that with a reasonable increase in the percentage of incremental learning (from around

Fig. 8. Snapshot of the WISDM dataset. (a) Excluding stairs and (b) including stairs.


30%), the percentage of active learning decreases when incremental learning is enabled, compared to "solo-active". This applies for each individual run with an increasing amount of incremental learning. In a nutshell, applying incremental learning has a positive impact on reducing the rate of active learning inquiries. Incremental learning has the advantage of updating the learning model with the selected data automatically. As the percentage of incremental learning increases, the percentage of data requiring active learning decreases. This means that the system requires fewer inquiries for annotated data via active learning when incremental learning adapts the system automatically.

5.3.3. Recognition accuracy

In this section, we evaluate STAR's performance on the different datasets. Table 3 displays the STAR measures across users in the OPPORTUNITY dataset. We build the learning model from the data of one user and evaluate the framework on other users' data. Each row represents the average evaluation measures for a particular subject's data; the training model for each subject is built from the data of all other subjects. STAR is evaluated against static classification methods, i.e. decision trees (DT) and support vector machines (SVM), both of which have been applied pervasively in the AR literature. We run the traditional classification methods with the built-in techniques of Weka 3.7 [31]. Despite the efficiency of DT and SVM for activity recognition in a static environment, these techniques lack the ability to adapt to evolving data streams. Also, traditional classification techniques preserve the training data and allow many iterations over the data for prediction; this is not realistic in streaming settings, especially when deploying on a device with limited resources such as a mobile phone. The reported results also display the adaptation effect separately for each activity, where w is the percentage of the activity's occurrences in the data stream. The effectiveness of detecting concurrent activities in each window is indicated by the CPur percentage. CPur is the average purity of all clusters generated along the stream. A high CPur value indicates well-separated, pure clusters and demonstrates the ability to recognise concurrent activities in a stream window.

As illustrated in Table 3, STAR's performance in terms of TP rate always outperforms decision trees (DT), and STAR shows a significant accuracy improvement over SVM for S3 and S4. The "Lie" class is effectively predicted by STAR. For the heavily weighted "Stand" class, STAR shows stable accuracy, well above 80%. "Stand" is a challenging activity because it can easily overlap with other activities, such as walking. The "Lie" class is small (only 5%) and hence tends to be misclassified; for instance, DT failed to detect any occurrence of the lie class in both S3 and S4. We secondly evaluate STAR on the WISDM dataset. Of the data collected in the WISDM lab, users 36 and 27 performed the five activities. To show the personalisation impact on STAR, we train the model on one of them and test on the other. The sub-cluster learning model tends to aggregate different patterns of the same activity together; therefore, we combine the "Upstairs" and "Downstairs" activities into a single "Stairs" class with sub-clusters inside. Table 4 shows the performance of STAR compared to the traditional classification techniques. The average cluster purity CPur is 99.9%. Fig. 8 shows a snapshot of the activities in the WISDM dataset. In Fig. 8(a), we hide the "Stairs" class for better visibility of the other classes. STAR can efficiently detect various user activities across multiple classes. The small classes of "Sitting" and "Standing" are totally misclassified by the other methods, while STAR's recognition rate is up to 99% for the same classes. As shown in the graph, the "Sitting" and "Standing" classes are small yet dense clusters. Fig. 8 also depicts the overlap between the stairs class and all other classes, which results in poor recognition accuracy for the "Stairs" class.

Finally, we evaluate on the SPAD dataset, which is collected from a mobile accelerometer sensor. Table 5 shows that STAR outperforms all the other static techniques in terms of prediction accuracy. Because of its small size, the "Run" class is the most challenging activity in this dataset, as it overlaps with the "Walk" activity. Although the "drive" activity is small too, its recognition rate is high because it is well distinguished from the other classes/activities, as shown in Fig. 9.

Table 5
STAR performance with SPAD (TP rates in %; w denotes the class weight in the dataset).

Technique   TP      Walk            Run             Still          Drive
                    (w = 60.6%)     (w = 12.4%)     (w = 20%)      (w = 7%)
STAR        97.21   100             77.9            100            99.1
DT          95.39   100             65.7            100            94.8
SVM         68.32   100             5               0              100
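The overall TP column in Table 5 is consistent with a class-frequency-weighted mean of the per-class TP rates. The short sketch below (illustrative function name; weights as reported, so results match only up to rounding) approximately reproduces the reported figures:

```python
def weighted_tp(per_class_tp, class_weights):
    """Overall TP rate (%) as the class-weight-weighted mean of per-class TP rates."""
    return sum(tp * w for tp, w in zip(per_class_tp, class_weights))

# Class weights for Walk, Run, Still, Drive from Table 5.
weights = [0.606, 0.124, 0.20, 0.07]

star = weighted_tp([100, 77.9, 100, 99.1], weights)  # ≈ 97.2  (reported: 97.21)
dt   = weighted_tp([100, 65.7, 100, 94.8], weights)  # ≈ 95.38 (reported: 95.39)
svm  = weighted_tp([100, 5, 0, 100], weights)        # ≈ 68.22 (reported: 68.32)
print(round(star, 2), round(dt, 2), round(svm, 2))
```

This also makes the SVM result interpretable: its high overall score is carried almost entirely by the heavily weighted "Walk" class, while "Run" and "Still" are nearly or entirely missed.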

6. Conclusion

Activity recognition is an important aspect of pervasive computing, especially when dealing with non-stationary streaming data. In this paper, we have developed STAR, an adaptive and lightweight framework for recognising activities in a ubiquitous

Fig. 9. Snapshot of the SPAD dataset.

Please cite this article as: Z.S. Abdallah, et al., Adaptive mobile activity recognition system with evolving data streams, Neurocomputing (2014), http://dx.doi.org/10.1016/j.neucom.2014.09.074

Z.S. Abdallah et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎

environment. The developed technique learns incrementally from the evolving data stream and adapts the system accordingly. The unsupervised continuous learning approach aims at coping with concept drift along the stream; STAR therefore performs automated adaptation and personalisation of the recognition system to fit a particular user or context. STAR's ability to select the most profitable data at the lowest cost for active learning is another important contribution, especially in a streaming environment where labelling cost is very high. The proposed active learning technique is well suited to choosing only a small amount of data to be labelled, and the system is further refined with the returned true labels. The developed technique is evaluated with real datasets, in streaming settings, and on a mobile phone. The experiments showed that STAR can successfully recognise activities from an evolving data stream using the incremental learning approach. The results also demonstrated the efficiency of the developed personalised system when applied across different users. STAR's low computational cost and real-time recognition are further important results. The technique employed by the current version of STAR ensures efficient recognition of activities in a dynamic streaming environment. Future work will focus on data preprocessing techniques that may further enhance the capabilities of the proposed technique. Another direction is to improve the active learning approach so that it generates the lowest number of effective inquiries. Finally, as important as adapting to changes in existing/known activities, further development will focus on detecting novel/unknown activities that the recognition system has not seen before deployment.

Zahraa Said Abdallah is a final-year Ph.D. student at Monash University, Australia. She was ranked first in the OPPORTUNITY Activity Recognition Challenge. OPPORTUNITY is a FET-Open (Future and Emerging Technologies Open Call) project under the Information and Communication Technologies theme of the 7th Framework Programme of the European Commission.

Mohamed Medhat Gaber is a Reader at the School of Computing Science and Digital Media, Robert Gordon University. He received his Ph.D. from Monash University, Australia; his Ph.D. thesis was nominated for the Mollie Holman award for best computer science dissertation in 2006. After completing his Ph.D., he held appointments with the University of Sydney, CSIRO, and Monash University, all in Australia. Prior to joining Robert Gordon University, he worked for the University of Portsmouth in the UK as a Senior Lecturer. He has published over 100 papers, co-authored


one monograph-style book, and edited/co-edited 5 books on data mining and knowledge discovery. His publications have attracted well over 1800 citations, and as of September 2014, his h-index is 20, and g-index is 40, with 110 cited publications. He has served in the program committees of major conferences related to data mining, including ICDM, PAKDD, ECML/PKDD and ICML. He has also co-chaired over 10 workshops and special sessions on various data mining topics. He is recognised as a fellow of the UK Higher Education Academy (HEA). He is also a member of the International Panel of Expert Advisers for the Australasian Data Mining Conferences. In 2007, he was awarded the CSIRO teamwork award. More information is available at: 〈http://mohamedmgaber.weebly.com/〉.

Shonali Krishnaswamy is the Head of the Data Analytics Department at the Institute for Infocomm Research (I2R), Singapore. Prior to this she was an Associate Professor (2009–2013), a Senior Lecturer (2006–2009) and an Australian Research Council (ARC) Australian Post Doctoral (APD) Fellow (2003–2005) at Monash University. She was the Director of the Centre for Distributed Systems and Software Engineering, one of the five research centres in the Faculty of IT, from 2010 to 2011, and the Higher Degrees by Research Coordinator (2009–2010) for the Caulfield School of IT.

Bala Srinivasan is a Professor of Information Technology in the Faculty of Information Technology, Monash University, Australia. He has more than 30 years of experience in academia, industry and research organisations. He has authored and jointly edited technical books and published in international journals and conferences in the areas of multimedia databases, data communications, data mining and distributed systems. He holds a Bachelor of Engineering Honours degree (gold medallist) in Electronics and Communication Engineering from Guindy Engineering College, University of Madras, India, and a Master's and a Ph.D. degree, both in Computer Science, from the Indian Institute of Technology, Kanpur, India.
