Synchronous Dynamic View Learning: A Framework for Autonomous Training of Activity Recognition Models using Wearable Sensors

Seyed Ali Rokni
Washington State University
School of Electrical Engineering and Computer Science
Pullman, WA 99164-2752, USA
[email protected]

Hassan Ghasemzadeh
Washington State University
School of Electrical Engineering and Computer Science
Pullman, WA 99164-2752, USA
[email protected]

ABSTRACT Wearable technologies play a central role in human-centered Internet-of-Things applications. Wearables leverage machine learning algorithms to detect events of interest such as physical activities and medical complications. These algorithms, however, need to be retrained upon any change in the configuration of the system, such as addition/removal of a sensor to/from the network or displacement/misplacement/mis-orientation of the physical sensors on the body. We challenge this retraining model by stimulating the vision of autonomous learning, with the goal of eliminating the labor-intensive, time-consuming, and highly expensive process of collecting labeled training data in dynamic environments. We propose an approach for autonomous retraining of the machine learning algorithms in real time without the need for any new labeled training data. We focus on a dynamic setting where new sensors are added to the system and worn on various body locations. We capture the inherent correlation between observations made by a static sensor view, for which trained algorithms exist, and the new dynamic sensor views, for which an algorithm needs to be developed. By applying our real-time dynamic-view autonomous learning approach, we achieve an average accuracy of 81.1% in activity recognition using three experimental datasets. This accuracy represents more than a 13.8% improvement achieved through automatic labeling of the sensor data in the newly added sensor. This performance is only 11.2% lower than the experimental upper bound, where labeled training data are collected with the new sensor.

CCS CONCEPTS • Human-centered computing → Ubiquitous and mobile computing; • Computing methodologies → Machine learning;

KEYWORDS Wearable Sensors, Machine Learning, Signal Processing, IoT

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. IPSN 2017, Pittsburgh, PA USA © 2017 ACM. 978-1-4503-4890-4/17/04. . . $15.00 DOI: http://dx.doi.org/10.1145/3055031.3055087

ACM Reference format: Seyed Ali Rokni and Hassan Ghasemzadeh. 2017. Synchronous Dynamic View Learning: A Framework for Autonomous Training of Activity Recognition Models using Wearable Sensors. In Proceedings of The 16th ACM/IEEE International Conference on Information Processing in Sensor Networks, Pittsburgh, PA USA, April 2017 (IPSN 2017), 12 pages. DOI: http://dx.doi.org/10.1145/3055031.3055087

1 INTRODUCTION

Internet of Things (IoT) is emerging as a promising paradigm for a large number of application domains such as environmental and medical monitoring, transportation, manufacturing, smart grids, and home automation [9, 10, 13, 20]. Many of these applications involve humans, where humans and things operate synergistically [22]. At the heart of these systems is human monitoring, where wearables are utilized for sensing, processing, and transmission of physiological and behavioral data. Typically, sensors acquire physical measurements, use computational models such as machine learning and signal processing algorithms for local data processing, and communicate the results to the cloud. The computational algorithms allow for continuous and real-time extraction of clinically important information from wearable sensor data.

These algorithms, however, need to be reconfigured (i.e., retrained) upon any change in the configuration of the system, such as addition/removal of a sensor to/from the network, displacement/misplacement/mis-orientation of the sensors, replacement/upgrade of the sensors, adoption of the system by new users, or changes in the physical/behavioral status of the user. Without reconfiguration, such uncertainties lead to a dramatic decrease in the performance of the computational algorithms. For example, the accuracy of activity recognition can drop from 98% to 33% due to sensor displacement [21]. As another example, sensor displacement can increase the error of regression-based MET (Metabolic Equivalent of Task) estimation by a factor of 2.3 [1]. The reason for such a sudden performance degradation is that the machine learning algorithms learn a model based on a set of training data collected in a controlled environment. The outcome of the algorithms, however, changes significantly when the system is utilized in a situation different from that of the training.

A major challenge with reconfiguration of the machine learning algorithms is that retraining these algorithms requires collecting a sufficiently large amount of labeled training data. Collecting enough labeled data is a time-consuming, labor-intensive, and expensive process that has been identified as a major barrier to personalized medicine [27]. This problem becomes more challenging considering

that wearables are deployed in highly dynamic and uncontrolled environments, mainly due to their direct and continuous exposure to end-users and their living environments. Thus, it is imperative that we develop automatically reconfigurable algorithms as wearable sensors, the settings in which they are utilized, and their configurations change in end-user settings. We envision that wearables of the future must be autonomous in the sense that their underlying computational algorithms reconfigure automatically without the need for collecting any new labeled training data. Our pilot application in this study is activity recognition, where wearables are used to detect and profile activities of daily living or physical exercise of the user.

In this paper, we make one of the earliest attempts at developing an autonomous learning framework for wearables. Specifically, we focus on cases where a new sensor is added to the system and the new sensor is worn/used on various body locations. Expanding pattern recognition capabilities from a single-setting algorithm with a predefined configuration to a dynamic setting, where sensors can be added, displaced, and used unobtrusively, is challenging. In such cases, successful knowledge transfer is needed to improve the learning performance while avoiding expensive data collection and labeling efforts. We address this problem by proposing a novel method to transfer the activity recognition capabilities of an existing static sensor to a newly added dynamic sensor. After training, the dynamic sensor can act independently; the reliability of the network therefore increases because the system can tolerate failure of the static sensor.

Prior studies demonstrated that we can synchronize wireless sensors in a wearable network [5, 11]. Therefore, a newly added sensor can accumulate labels provided by old sensors and build its own dataset. Furthermore, we previously proposed a calibration method for transferred labels that performs clustering in the joint feature space of both sensors [19]. However, these prior studies assume that the location of the new sensor is fixed. In reality, wearable sensors can potentially move in body coordinates as they are utilized by end-users. Addressing this problem is challenging: a prior study showed that clustering sensor observations is not an effective approach for separating sensor observations associated with various body locations [18].

Our approach allows us to transfer machine learning knowledge from an existing sensor, called the static view, to a new dynamic sensor, called the dynamic view, and these capabilities are calibrated through the sensing observations made in the dynamic view. We refer to this problem as Synchronous Dynamic View Learning (SDVL). We formulate this problem using linear programming, introduce a similarity-graph modeling of the problem, and propose an iterative updating approach to solve it. We evaluate the performance of the proposed autonomous learning approach using real data collected with wearable motion sensors.

2 RELATED WORK

Our approach to realizing a self-configurable system is based on the concept of transfer learning. Transfer learning applies knowledge learned from one problem domain, the source (e.g., existing sensors in this case), to a new but related problem, the target (e.g., a new sensor). Transfer learning approaches can be classified into instance transfer, feature representation transfer, parameter transfer, and relational knowledge transfer [14]. Our approach of developing

autonomous learning solutions for sensor additions in synchronous views falls within the category of instance transfer. In order to transfer instances, TrAdaBoost, an extension of AdaBoost, proposed a boosting algorithm to enable transferring knowledge from one domain to another by using training data from the source domain to incrementally build up the training data for the target domain [7]. However, in dynamic environments, the assumption that both source and target systems have the same feature space and operate in very similar domains is not realistic.

In transfer learning research, when source and target domains work on the same data points but have different feature spaces, multi-view learning approaches [24] can be used. In smart home applications, previous research showed that it is possible to avoid the data collection phase by transferring classifier models of activity recognition from one home to another with a similar activity recognition system [25]. To address the need for a common feature space, the authors proposed "meta-features", which are the common ground on which the knowledge transfer can happen. However, defining meta-features in real time is challenging. Moreover, that work targeted binary sensors in smart home settings. Similarly, the authors in [10] introduced a Feature-Space Remapping method to enable transfer learning between domains with different feature spaces.

Teacher/learner (TL) transfer learning has been used when there is no direct access to training data. Instead, the source-trained classifier operates simultaneously with the target learner and provides labels of newly observed data points to gather enough training data. Although noticeably fewer studies have focused on teacher/learner approaches, such approaches have shown promising outcomes in increasing transfer learning performance. Several studies applied the teacher/learner model to develop an opportunistic system capable of performing reliable activity recognition in a dynamic sensor environment [5, 11, 17]. The work in [5] showed that by synchronizing the current sensor and the new sensor, the existing sensor can provide labels of future activities. This approach requires the two sensors to be worn for a long time until a sufficient number of different activities have been performed by the user. Furthermore, it needs a constant rate of data transmission between the existing sensor and the new sensor. One of the main problems of this approach, and of many other teacher/learner approaches, is that the accuracy of the learner is bounded by the accuracy of the teacher. In addition, it needs a very reliable source classifier because the only source of ground truth labels is the source sensor. Therefore, the learner is completely reliant upon labels provided by the teacher. The authors in [19] suggested a calibration method for transferred labels by clustering observations in a combined feature space of source and target. However, they assumed that the location of the target sensor is fixed, which is not a realistic assumption for mobile sensors such as smartphones, which also move in the body coordinates of the user. In this paper, we aim to introduce a method for transferring and refining labels while the placement of the target sensor changes in real time.

In this paper, we use a graph propagation method to refine the transferred labels. A similar approach is used in the area of semi-supervised learning when the dataset contains few labeled


and many unlabeled instances [28]. Leveraging graph-based semi-supervised learning has shown promising results in natural language processing tasks such as part-of-speech tagging [23] and machine translation [26]. In contrast to semi-supervised learning, we have a set of uncertainly labeled data with different levels of accuracy. We use label propagation methods to refine low-accuracy labels into labels of higher accuracy.

Figure 1: Synchronous Dynamic View Learning: (a) Initial setting with an available activity recognition algorithm for an existing/static sensor; (b) a new dynamic (e.g., moving) sensor is added to the system and can be freely worn on various body locations; (c) the subject starts performing activities while the static sensor sends information about its currently recognized activity to the new sensor; the newly added sensor refines the received information and learns its own classifier; (d) after the training phase, both static and dynamic sensors can collaborate in the activity recognition task.

3 SDVL FRAMEWORK

Figure 1 shows the evolution of a wearable network when a new dynamic sensor is added to a system with a previously trained machine learning model (i.e., an activity recognition classifier), the process of autonomous learning, and the new activity recognition algorithm in action. Initially, the network consists of a fixed number of sensors (e.g., one sensor worn on the 'ankle') with trained machine learning models for activity recognition. We refer to the collection of the existing sensors as the 'static view'. A new untrained sensor (e.g., a smartphone) is added to the system. The new sensor can be naturally used on different body locations (e.g., 'wrist', 'waist', and 'pocket'). We refer to this newly added dynamic sensor as the 'dynamic view'. Our synchronous dynamic view learning captures sensor data in both views simultaneously, labels instances captured by the 'dynamic' view based on the collective knowledge of 'static' and 'dynamic', and constructs a new classification algorithm for activity recognition. During the data collection phase, the dynamic sensor can be placed on different body locations while data gathering continues. As soon as a new classification algorithm is constructed using our SDVL approach, the system can be used to detect physical activities with higher accuracy in a collaborative fashion. Additionally, it is possible to remove the static node from the network and continue only with the newly added sensor. This is consistent with real scenarios where dynamic wearable sensors are exposed to robust but temporarily available static sensors. For example, a wearable sensor can learn from sensors placed in a home or office (e.g., Kinect, cameras, smart-home sensors). The knowledge transferred from these static sensors can be used to train dynamically changing wearable sensors for the purpose of activity recognition.

3.1 Problem Definition

An observation X_i made by a wearable sensor at time t_i can be represented as a D-dimensional feature vector, X_i = {f_{i1}, f_{i2}, ..., f_{iD}}. Each feature is computed from a given time window and a marginal probability distribution over all possible feature values. The activity recognition task is composed of a label space A = {a_1, a_2, ..., a_m} consisting of the set of labels for activities of interest, and a conditional probability distribution P(A|X_i), which is the probability of assigning a label a_j ∈ A given an observed instance X_i.

Although a sensor with a trained classifier can detect some activities, its observations are limited to the body segment on which the sensor is worn. Therefore, some activities are not recognizable from one sensor's view point. For example, a sensor worn on the 'ankle', although expert in detecting lower-body activities such as 'walking', cannot easily detect activities such as 'eating' that mainly involve upper-body motions. To increase the accuracy of activity recognition, it is desirable to add more sensors to the network to cover a larger set of physical activities. We note that advances in embedded sensor design and wearable electronics allow end-users to utilize new wearable sensors. In such a realistic scenario, the newly added sensor has not been trained to collaborate with existing sensors in detecting physical activities. The general goal of our study is to develop an autonomous learning algorithm for the newly added sensor to detect activities that are not recognizable by existing sensors. Without loss of generality, we assume in our formulation of the SDVL problem that the static view consists of a single sensor. In the case of multiple static sensors, an ensemble classifier over all static sensors can be used as an integrated static view.

Problem 1 (SDVL). Let P = {p_1, ..., p_k} be a set of k possible sensor placements. Let P_s ⊂ P be the set of placements of existing sensors with an integrated trained classification model. Furthermore, let P_t ⊂ P be the set of possible placements of the newly added sensor. Moreover, let A = {a_1, a_2, ..., a_m} be a set of m activities/labels that the system aims to recognize, and X = {X_1, X_2, ..., X_N} a set of N observations made by the dynamic sensor when placed at any body location l ∈ P_t. The Synchronous Dynamic View Learning (SDVL) problem is to accurately label observations made by the dynamic sensor such that the mislabeling error is minimized. Once these instances are labeled,


it is straightforward to develop a new activity recognition classifier using the labeled data. In Figure 1, Ps = {‘ankle’} as illustrated in Figure 1a. As shown in Figure 1b, Pt = {‘wrist’, ‘waist’, ‘pocket’} denoting that the dynamic sensor may be worn on three different locations on the body. During autonomous learning (i.e., in Figure 1c), as the user performs physical activities in A, instances in X are observed by the dynamic sensor.
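To make this representation concrete, the sketch below shows one way an observation X_i could be computed from a time window of raw samples; the specific statistics, window length, and function names are illustrative assumptions rather than the exact feature set used in our experiments.

```python
import numpy as np

def extract_feature_vector(window: np.ndarray) -> np.ndarray:
    """Compute a D-dimensional observation X_i from one sensor window.

    `window` is a (samples x axes) array of raw readings (e.g., accelerometer
    samples covering the window that starts at time t_i). Any fixed set of
    per-axis statistics yields a D-dimensional feature vector.
    """
    feats = []
    for axis in range(window.shape[1]):
        x = window[:, axis]
        feats.extend([
            x.mean(),           # mean
            x.std(),            # standard deviation
            x.min(),            # minimum
            x.max(),            # maximum
            np.median(x),       # median
            x.max() - x.min(),  # peak-to-peak amplitude
        ])
    return np.asarray(feats)

# Example: a 2-second window at 50 Hz with 3 accelerometer axes -> D = 18
X_i = extract_feature_vector(np.random.randn(100, 3))
```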

3.2 Problem Formulation

The SDVL problem described in Problem 1 can be formulated as follows.

$$\text{Minimize} \quad \sum_{i=1}^{N} \sum_{j=1}^{m} x_{ij}\,\epsilon_{ij} \qquad (1)$$

Subject to:

$$\sum_{i=1}^{N} x_{ij} \le \Delta_j, \quad \forall j \in \{1, \dots, m\} \qquad (2)$$

$$\sum_{j=1}^{m} x_{ij} = 1, \quad \forall i \in \{1, \dots, N\} \qquad (3)$$

$$x_{ij} \in \{0, 1\}, \quad \forall i, j \qquad (4)$$

where x_{ij} is a binary variable indicating whether or not instance X_i ∈ X is assigned activity label a_j, and ε_{ij} denotes the error due to such a labeling. The constraint in Equation 2 guarantees that a_j is assigned to at most Δ_j instances, where Δ_j is the upper bound on the number of instances with a_j as their possible label. Since each instance must be assigned to exactly one actual activity label, the constraint in Equation 3 guarantees that each instance X_i receives one and only one activity label. If ε_{ij} and Δ_j are known, this problem can be viewed as a Generalized Assignment Problem [6]. The main challenge in solving the SDVL problem is that the ground truth labels are not available in the dynamic view and therefore the ε_{ij} are unknown a priori. In this paper, we estimate ε_{ij} using a combination of the confusion matrix of the static view and the disagreement among hypothetically similar instances in the dynamic view.

The static sensor has limited ability in detecting activities, depending on the physical placement of the sensor on the body. In order to quantify the capability of the static sensor for activity recognition, we define an 'ambiguity relation' as follows.

Definition 3.1 (Ambiguity Relation). Let s_p refer to a static sensor placed on body location p. An ambiguity relation AR_p for sensor s_p is defined as a mapping P × A → A such that for each sensor location p ∈ P and activity pair a_1, a_2 ∈ A, ((p, a_1), a_2) ∈ AR_p if and only if there exists a sensor s_p placed on body joint p using which a_1 and a_2 are unrecognizable. In other words, there exists an observation instance with predicted label a_1 made by s_p that corresponds to the actual activity a_2.

The ambiguity relation represents confusion in detecting a particular activity given an instance of the sensor data. We use the concept of ambiguity relation to determine the deficiency of a sensor in detecting particular activities given a collection of training instances in the static view. To quantify this deficiency of the static view with a given location p and each ((p, a_i), a_j) ∈ AR_p, we define L_p(a_i, a_j) as the likelihood deficiency of sensor s_p in confusing activity a_i with activity a_j.

Definition 3.2 (Likelihood Deficiency). Given a sensor s_p and a pair of activities a_i, a_j ∈ A, where ((p, a_i), a_j) ∈ AR_p, let n_p(a_i, a_j) denote the number of instances of activity a_j predicted by s_p as a_i. The likelihood deficiency L_p(a_i, a_j) is given by:

$$L_p(a_i, a_j) = \frac{n_p(a_i, a_j)}{\sum_{a_k \in A} n_p(a_i, a_k)} \qquad (5)$$

We note that ((p, a_i), a_j) ∉ AR_p implies L_p(a_i, a_j) = 0. Once the likelihood deficiencies are computed, the static view, instead of sending only its predicted label, transmits its predicted label combined with the likelihoods of the actual label. We call this composite label a semi-label; it is generated by the static view and transmitted to the dynamic view for the purpose of autonomous learning.

Definition 3.3 (Semi-Label). A semi-label SL_i generated by a sensor s_p is a 2-tuple (a_i, W) where |W| = |A| and W_j = L_p(a_i, a_j). In other words, W is a likelihood vector over all possible actual labels when the predicted label is a_i.

In this paper, we use the confusion matrix of the classification algorithm used in the static view to retrieve the values of the likelihood deficiency function. We then resolve this recognition deficiency by combining information from multiple sensors through the Minimum Disagreement Labeling process described in Section 4.
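In practice, Equation 5 amounts to row-normalizing the confusion matrix of the static classifier, and a semi-label pairs the predicted activity with the corresponding row. The sketch below illustrates this computation under assumed counts; it is an illustration of Definitions 3.2 and 3.3, not the exact implementation used in our system.

```python
import numpy as np

def likelihood_deficiency(confusion) -> np.ndarray:
    """Row-normalize the static view's confusion matrix (Equation 5).

    confusion[i, j] = n_p(a_i, a_j): number of instances of actual activity
    a_j that the static classifier predicted as a_i. Row i of the result is
    the likelihood vector L_p(a_i, .).
    """
    confusion = np.asarray(confusion, dtype=float)
    totals = confusion.sum(axis=1, keepdims=True)
    return np.divide(confusion, totals,
                     out=np.zeros_like(confusion), where=totals > 0)

def semi_label(predicted: int, L: np.ndarray):
    """Build the semi-label (a_i, W) of Definition 3.3 for a predicted activity index."""
    return predicted, L[predicted]

# Example with m = 3 activities (counts are hypothetical)
conf = [[40, 8, 2],
        [10, 35, 5],
        [1, 4, 45]]
L = likelihood_deficiency(conf)
pred, W = semi_label(1, L)   # static view predicts a_1; W sums to 1
```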

3.3 Autonomous Learning Overview

In our dynamic view learning approach, the set of static sensors acts as a 'Teacher' and sends a semi-label of the current activity to the dynamic view in real time. The dynamic view, also called the 'Learner', learns from the information provided by the 'Teacher' and boosts the labeling accuracy by combining its local observations with the received semi-labels. The challenging task, then, is how to convert semi-labels to exact labels such that the overall labeling error is minimized.

Algorithm 1 Synchronous Dynamic View Learning Algorithm

while (more sensor data are required) do
    Static performs activity recognition on its observation at time t.
    Static detects activity a_t.
    Static computes semi-label SL_t for activity a_t.
    Static sends SL_t to the sensors in the Dynamic view.
    Dynamic receives semi-label SL_t.
    Dynamic assigns SL_t to its observation X_t at time t.
    if Dynamic sensor needs displacement then
        change the location of the Dynamic sensor
Use Minimum Disagreement Labeling (MDL) to compute the best distribution for each semi-label

Algorithm 1 shows our heuristic approach for solving the optimization problem described in Problem 1. We assume that the sensors within the two views are worn on the body at the same time while the user performs physical activities. Because the static and dynamic sensors may have different clocks, the dynamic sensor needs to know to which observation each semi-label corresponds. We resolve the problem of different clocks by first synchronizing


the static and dynamic sensors. At time t, the static sensor detects an instance of the current activity, a_t; it extracts the corresponding semi-label SL_t and transmits (SL_t, t), of size |A|, to the dynamic sensor via a wireless link. Since |A| is usually very small, semi-label values can also be buffered and sent as a batch. When the dynamic sensor receives (SL_t, t), it assigns SL_t to its observation at time t. When sufficient instances of each activity have been gathered, Minimum Disagreement Labeling (MDL) is applied to the gathered data to calibrate the transferred labels and label all instances in the dynamic view.
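On the learner side, this transfer phase boils down to attaching each received (SL_t, t) pair to the time-aligned local observation. The sketch below is a simplified illustration of that bookkeeping; the stream and window structures are assumptions, not part of the actual radio or signal processing pipeline.

```python
def collect_training_pairs(semi_label_stream, dynamic_windows, featurize):
    """Pair each received semi-label with the synchronized dynamic observation.

    semi_label_stream: iterable of (t, SL_t) tuples produced by the static view.
    dynamic_windows:   mapping from timestamp t to the dynamic sensor's raw window.
    featurize:         function turning a raw window into a feature vector X_t.
    Returns the (X_t, SL_t) pairs that MDL (Section 4) later refines.
    """
    pairs = []
    for t, SL_t in semi_label_stream:
        window = dynamic_windows.get(t)
        if window is None:              # no synchronized observation for this timestamp
            continue
        pairs.append((featurize(window), SL_t))
    return pairs
```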

4 MINIMUM DISAGREEMENT LABELING

The intuition behind minimum disagreement labeling is that instances of the same location and activity are similar and close in the feature space.

Problem 2 (Minimum Disagreement). Let X = {X_1, X_2, ..., X_N} be a set of N observations representing N instances in the feature space R^D. Furthermore, let SL = {SL_1, SL_2, ..., SL_N} be the semi-labels associated with the N observations in X. Minimum Disagreement Labeling (MDL) is the problem of refining each SL_i such that the amount of disagreement among similar instances is minimized.

We solve the problem of minimum disagreement in two phases. First, we construct a similarity graph. We then refine labels through neighboring nodes in the similarity graph.

Definition 4.1 (Similarity Graph). A Similarity Graph G(V, E) is a weighted graph where each node in V represents an instance in X (i.e., X = V). Each node in the graph maintains a vector of its own feature values and the probability distribution of its labels (i.e., its semi-label). Each edge e_{ij} ∈ E represents the amount of similarity between instances X_i and X_j. In this paper, we use a Gaussian kernel [3] to quantify the amount of similarity between X_i and X_j. We denote this similarity measure by η(X_i, X_j) and compute it as follows:

$$\eta(i, j) = e^{-\frac{\|X_i - X_j\|^2}{2\sigma^2}} \qquad (6)$$

Because instances of different activities should not contribute to each other's labels, we build the similarity graph using a k-NN scheme, which is one of the most popular approaches for similarity graph construction [12]. Therefore, we measure edge weights in the similarity graph using the following equation:

$$e_{ij} = \begin{cases} \eta(i, j) & \text{if } i \in \kappa(j) \text{ or } j \in \kappa(i) \\ 0 & \text{otherwise} \end{cases} \qquad (7)$$

where κ(i) is the set of k nearest neighbors of node X_i based on the defined similarity function. With the introduced similarity graph and label refinement, we can now estimate the error ε_{ij} in Equation 1 by measuring the amount of disagreement between similar instances. This also converts the SDVL problem into the problem of minimizing the energy function (disagreement) E given by

$$E(F_i) = \frac{1}{2} \sum_{j \in \kappa(i)} \|F(i) - F(j)\|^2 \qquad (8)$$

where F_i, a vector of the same size as SL_i, denotes the classification function that assigns the probability distribution of labels to point X_i. Eventually, the label for X_i is given by arg max_{v ≤ |A|} F_{iv}.
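A direct way to realize Equations 6 and 7 is to evaluate the Gaussian kernel for every pair of observations and keep only the edges between k nearest neighbors. The sketch below illustrates this construction; the bandwidth sigma and the dense distance computation are simplifying assumptions.

```python
import numpy as np

def similarity_graph(X: np.ndarray, k: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Build the k-NN similarity graph of Equations 6-7 for (N x D) observations X."""
    N = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    eta = np.exp(-d2 / (2.0 * sigma ** 2))                      # Equation 6
    np.fill_diagonal(d2, np.inf)                                # a node is not its own neighbor
    knn = np.argsort(d2, axis=1)[:, :k]                         # kappa(i): k nearest neighbors
    mask = np.zeros((N, N), dtype=bool)
    mask[np.repeat(np.arange(N), k), knn.ravel()] = True
    mask |= mask.T                                              # i in kappa(j) or j in kappa(i)
    return np.where(mask, eta, 0.0)                             # Equation 7
```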

4.1 Label Refinement

The idea of label refinement is similar to label propagation, a well-known approach in the area of semi-supervised learning when the dataset contains both labeled and unlabeled instances [28]. The intuition behind this idea is that similar instances should have the same labels. The method first connects similar instances and then transfers labels of known neighbors to the unlabeled instances. We utilize this idea to refine the semi-labels provided by the static sensor. In contrast to semi-supervised learning, which deals with a set of labeled data and a set of unlabeled data, we have a set of uncertainly labeled data with different levels of accuracy. We use label propagation to refine less accurate labels into more accurate ones. In other words, the activity recognition capability of the static sensor, due to its placement, is limited because the static view cannot reliably distinguish among some activities. Therefore, there is no guarantee that labels predicted by the source view are perfect. Additionally, the dynamic sensor, because it is carried on different body locations, may have some discriminating features that can separate instances of different activities that overlap in the static feature space.

To provide a clear understanding of the label refinement process, we have magnified parts of the graph shown in Figure 2a and highlighted those nodes in Figure 2b, where different colors correspond to different label distributions. The label/color of each node is determined according to the maximum probability. There are 3 orange nodes, 1 green node, and 1 blue node forming the 5 closest neighbors of the center node, whose initial distribution is similar to that of its blue neighbor. After performing one update iteration, as shown in Figure 2c, the label distribution of the center node changes and its color is converted to orange.

We refine the distribution of the labels for one node through its neighbors. A larger edge weight allows for more contribution to the neighbor's label. First, we normalize the adjacency matrix of G by defining a contribution matrix C such that

$$C_{ij} = \frac{e_{ij}}{\sum_{k \in V} e_{ik}} \qquad (9)$$

which specifies that

$$\sum_{j} C_{ij} = 1, \quad C_{ij} \ge 0 \qquad (10)$$

During each iteration of label propagation, each node takes a fraction of the label information from its neighborhood and maintains some part of its initial distribution. Consequently, for an instance X_i at time t + 1 we have:

$$F_i^{t+1} = \alpha \sum_{j} C_{ij} F_j^{t} + (1 - \alpha)\, SL_i \qquad (11)$$

where 0 ≤ α ≤ 1 is a mixing coefficient that specifies the fractional contribution of the neighbors. The algorithm starts with the initial semi-labels, that is, F_i^0 = SL_i. Let F^t = {F_1^t, F_2^t, ..., F_n^t} be the classification function for the n observations. We write the batch update as follows:

$$F^{t+1} = \alpha C F^{t} + (1 - \alpha)\, SL \qquad (12)$$
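A minimal sketch of the refinement procedure of Equations 9-12 is given below, using the mixing coefficient alpha = 0.75 and the small, fixed number of iterations selected experimentally in Section 5.3; the dense matrix operations are a simplification of the neighborhood updates.

```python
import numpy as np

def refine_semi_labels(E, SL, alpha: float = 0.75, iters: int = 5) -> np.ndarray:
    """Refine semi-labels on the similarity graph (Equations 9-12).

    E  : (N x N) edge weights e_ij, e.g., from similarity_graph().
    SL : (N x m) initial semi-label distributions received from the static view.
    Returns F; the refined label of X_i is the argmax over row i.
    """
    E = np.asarray(E, dtype=float)
    SL = np.asarray(SL, dtype=float)
    row_sums = E.sum(axis=1, keepdims=True)
    C = np.divide(E, row_sums, out=np.zeros_like(E), where=row_sums > 0)  # Equation 9
    F = SL.copy()                                                          # F^0 = SL
    for _ in range(iters):
        F = alpha * (C @ F) + (1.0 - alpha) * SL                           # Equation 12
    return F

# Usage: labels = refine_semi_labels(E, SL).argmax(axis=1)
```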


Figure 2: Similarity graph and propagation process: (a) similarity graph construction: each instance is connected to its k (e.g., k = 5) closest neighbors; (b) prior to label propagation, the center node has an incorrect label; (c) as a result of label propagation, similar nodes have similar label distributions.

We can continue this updating process until it converges to a value such that F^{t+1} = F^t. We prove that this update process eventually converges; however, we also show that after several iterations the rate of change decreases dramatically, so we can stop at an approximately optimal solution.

Theorem 4.2 (Convergence). Equation 12 will eventually converge.

Proof. Based on Equation 12,

$$F^{t} = (\alpha C)^{t-1} SL + (1 - \alpha) \sum_{i=0}^{t-1} (\alpha C)^{i} SL \qquad (13)$$

where F^0 = SL. We defined C such that Σ_j C_{ij} = 1 and C_{ij} ≥ 0; therefore, based on the Perron-Frobenius theorem [15], we know that the spectral radius of C is less than or equal to 1. Furthermore, since 0 ≤ α ≤ 1,

$$\lim_{t \to \infty} (\alpha C)^{t-1} = 0 \qquad (14)$$

$$\lim_{t \to \infty} \sum_{i=0}^{t-1} (\alpha C)^{i} = (I - \alpha C)^{-1} \qquad (15)$$

This shows that both parts of the equation converge to a constant value. ∎

We note that the matrix inversion could be calculated in O(n^2.37) [8]. Furthermore, each iteration of the update algorithm can be performed in O(n), and we show that with a constant number of iterations we can stop at a point approximately close to the optimal solution. There is a trade-off between the lowest-energy graph and computation. We discuss stopping criteria in Section 5.3.

5 VALIDATION

In this section, we demonstrate the effectiveness of the proposed transfer learning approach using data collected in experiments involving human subjects performing various daily physical activities while wearing motion sensor nodes. In addition to our collected data, we validate our approach using two publicly available datasets.

5.1 Data Collection

We designed an experiment to gather wearable motion sensor data during daily physical activities. The data collection took place between July and December 2015. Nine healthy subjects aged between 20 and 29 years were recruited to participate in this experimental study. Each participant was asked to perform a set of 22 physical activities, ranging from high-intensity activities such as running or jumping to medium- and low-intensity activities such as normal walking and sitting while typing. During each experiment, subjects wore 5 Shimmer [4] sensors on the 'chest', 'back', 'right-arm', 'left-thigh', and 'head'. The relative orientation of all the sensors with respect to the body coordinates remained consistent across all participants and throughout data collection. All non-strenuous activities were performed for 1 minute, while the other activities were performed for a set number of repetitions. The data were continuously collected at a frequency of 50 Hz during each experimental activity and sent wirelessly to a gateway computer. The collected data contained over 28,500,000 samples of acceleration, angular velocity, and magnetic field. The data were used to extract 10 statistical features from individual sensor streams.

5.2 Publicly Available Datasets

To test the generalizability of our approach, we also evaluated the performance of SDVL on two publicly available datasets. The first public dataset used in this paper is the OPPORTUNITY dataset (OPP) [16], which is used to evaluate our approach on locomotion activities. In OPP, four subjects operated in a room simulating daily life activities by interacting with real objects while seven inertial measurement units (IMUs) were placed on their back, (right/left) (upper/lower) arms, and left and right shoes. Since the sensors on the shoes have a different output format, we eliminated those sensors from our analysis in this paper. Each subject performed 5 runs of activities of daily living (ADL) following a given scenario consisting of Grooming, Relaxing, Preparing Coffee, Drinking Coffee, Preparing Sandwich, Eating Sandwich, and Cleaning up, wrapped in a Starting and Breaking phase. The activities are annotated at five different levels, but in this paper we used the locomotion (i.e., sit, stand, lie, and walk) annotations.

The second public dataset, Sport and Daily Activities (SDA), was used to ensure that SDVL works not only on a controlled dataset but also when subjects are allowed flexibility in how to perform activities [2]. Subjects performed 19 daily and sport activities while wearing 5 motion sensor nodes on their 'Torso (TO)', 'Left Arm (LA)', 'Left Leg (LL)', 'Right Arm (RA)', and 'Right Leg (RL)' body locations. Each sensor node had a 3-axis accelerometer, a 3-axis gyroscope, and a 3-axis magnetometer. The sampling frequency was set to 25 Hz, and the 5-minute signals were divided into 5-second segments so that 480 (= 60 × 8) signal segments were obtained for each activity from all users. The collected dataset contained over 9,120,000 samples of acceleration, angular velocity, and magnetic field. In this dataset, there are obvious inter-subject variations in the speed and intensity of the same activities because the subjects were asked to perform the activities in their own style and were not restricted in how the activities should be performed.


5.3 Algorithm Design Choices

The mixing coefficient, as a design parameter in our algorithm, can be optimized experimentally and according to the specifics of the sensors and events/activities of interest. In this section, we describe how this parameter is set for the subsequent analyses in this paper. Furthermore, we experimentally set the number of iterations based on the amount of improvement made in consecutive iterations of the labeling algorithm. We chose 10% of the training data as a cross-validation set and performed 10-fold validation to tune the parameters.

5.3.1 Mixing Coefficient. Our iterative approach presented in Section 4.1 uses the parameter α as a mixing coefficient to balance between the current probability distribution of a node and the average of its neighbors. A very small α means that the contribution of the neighbors is very small, making the algorithm take longer to converge to a final solution. In addition, a very large α makes the algorithm more susceptible to noise. Figure 3a shows the labeling accuracy of the algorithm as the mixing coefficient ranges from α = 0.05 to α = 0.95. As shown in this figure, the accuracy is initially 67.5% at α = 0.05. The accuracy increases as α grows until it reaches its maximum value of 70.2% at α = 0.75. From this point on, the accuracy starts dropping until it reaches 61.5% at α = 0.95. Based on this experimental analysis, we set the mixing coefficient to α = 0.75 for our autonomous learning framework.

5.3.2 Stopping Criteria. Since Equation 12 is convex, it has a global optimum. Using our iterative approach, the algorithm gets closer to the optimal solution in every step. Figure 3b shows how additional iterations decrease the objective function on the dataset. During the initial iterations, we observe more significant changes in the energy function of the graph, while these changes become less significant after 5 iterations. There is a trade-off between the running time of the algorithm and a lower-energy graph. We approximate the optimal solution with 5 iterations, where the decrease rate of the energy function drops sharply. In addition, the optimal error rate varies from one experiment to another, yet the pattern remains consistent across all experiments, as illustrated in Figure 3b.

5.3.3 Static Node Classification Algorithm. Table 1 shows the accuracy of each of the five sensors alone in recognizing the 22 activities using 3 different machine learning classifiers: Decision Tree (DT), kNN (k-Nearest-Neighbor), and SVM (Support Vector Machine).


Figure 3: Impact of various parameters on performance of the autonomous learning algorithm. This includes the impact of different mixing coefficient values on the accuracy of the activity recognition algorithm (a), and the impact of the number of iterations on the error (energy) of the algorithm (b).

Table 1: Accuracy (%) of activity recognition algorithms including Decision Tree (DT), k-Nearest-Neighbor (kNN), and Support Vector Machine (SVM) using data of individual sensor nodes

| Location | DT   | kNN (k = 3) | SVM  | Avg. (All Classifiers) |
|----------|------|-------------|------|------------------------|
| Head     | 63.2 | 45.5        | 73.3 | 60.7                   |
| Chest    | 61.0 | 45.8        | 66.5 | 57.8                   |
| Back     | 52.1 | 37.7        | 54.2 | 48.0                   |
| R-Arm    | 64.1 | 52.6        | 71.4 | 62.7                   |
| L-Thigh  | 57.4 | 37.9        | 58.8 | 51.4                   |
| Avg.     | 59.6 | 43.9        | 64.8 | –                      |

With this analysis, our goals were: (1) to compute the accuracy of a single-sensor network, to validate our hypothesis that a single sensor may not provide sufficiently high accuracy in detecting a large set of physical activities; and (2) to examine which classification algorithm provides the highest level of accuracy and robustness when used for activity recognition on different body locations. As shown in Table 1, due to inter-subject variations in activity speed and intensity, no classifier works well. The accuracy of a single-sensor system ranges from 37.7%, using a kNN classifier trained for the sensor placed on the 'back', to 73.3%, using an SVM classifier for the 'head' sensor. One explanation for the low accuracy of the single-sensor scenario is that the weakness of the classifier is associated with the signals and extracted features at each body location rather than with the choice of the machine learning algorithm. Since a single location is not prominent enough to distinguish all the different activity types, changing the classification algorithm cannot improve the recognition accuracy. On average, the SVM classifier achieves the highest accuracy among the classification algorithms, as shown in Table 1. Thus, for the rest of our analysis in this paper, we use the SVM classifier as the machine learning algorithm used by the static sensor for activity recognition and for the determination of semi-labels.

5.4 Analysis Methodology

In this section, we present a comparison of our algorithms with prior research in the area of transfer learning for mobile wearable

devices. Prior to presenting the results, we describe our comparative evaluation approach. Research in the area of transfer learning for wearables is new. To the best of our knowledge, there exist only three such algorithms, namely Naive and System-Supervised suggested in [5], and Plug-n-Learn presented in [19], all of which are applicable to the synchronous teacher/learner transfer learning studied in this paper. We compare our SDVL technique with these three algorithms as well as with the experimental upper bound obtained using ground truth labels gathered during our data collection.

• Naive: The authors in [5] proposed the Naive approach as reusing the classifier of the 'teacher' in the 'learner'. They emphasized that this method only works when 'teacher' and 'learner' are completely similar (e.g., the sensors are co-located on the body and are homogeneous). In this approach, the static sensor sends its trained classifier to the dynamic sensor. For example, in the case of SVM as used in this paper, the static sensor sends the support vectors. As stated before, the limitation of this approach is that the two views must be co-located.

• System-Supervised: This method refers to the case where the 'learner' labels its observations based on labels predicted by the 'teacher'. In other words, both teacher and learner sensors should be synchronized in advance. Upon any new observation, the classifier trained for the 'teacher' predicts the label of the current activity and sends its prediction to the 'learner' for the purpose of activity recognition. According to its authors, the accuracy of this method is limited by the accuracy of the teacher sensor.

• Plug-n-Learn: This approach is designed to address the limitation of the system-supervised method, namely that the accuracy of the learner sensor is limited by that of the teacher sensor [19]. It designs a combinatorial view by incorporating both teacher and learner feature spaces to partition current observations. It then retrieves more accurate labels by solving an assignment problem. However, the underlying assumption in this approach is that the on-body location of the learner sensor is always fixed. In this paper, we show how this method fails when the learner sensor moves around the body.

• Experimental Upper Bound: The experimental upper bound is achieved when the dynamic sensor has access to a classifier trained on ground truth labels for the different body locations.

Our analysis compares the accuracy of the labels provided in the 'dynamic' view using SDVL with that of each of the competing methods. For the experimental setting, we choose a sensor node as the static sensor and assume that the dynamic sensor can move among all remaining body locations. We divided the obtained sequence of observations into three parts for (1) training the static classifier; (2) the transfer learning phase used to train the dynamic sensor; and (3) measuring the accuracy of our framework. After training an activity recognition model for the static sensor, we assumed that the dynamic sensor is added to the system and moved over all other locations except the location of the static sensor. We followed the SDVL

framework to train the dynamic sensor using semi-labels provided by the static sensor. We compared the classifier of the dynamic sensor, built upon the computed labels, with all the competing algorithms based on the following metrics:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (16)$$

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (17)$$

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (18)$$

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (19)$$

where TP, TN, FP, and FN stand for true positives, true negatives, false positives, and false negatives, respectively. Each metric captures a different aspect of classifier performance. Specifically, in the context of our teacher/learner transfer learning, we aim to increase recall without sacrificing precision.
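For completeness, the sketch below computes the metrics of Equations 16-19 directly from raw confusion counts; it is a direct transcription of the formulas above.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion counts (Equations 16-19)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1
```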

6 RESULTS

6.1 Accuracy of Transferred Labels

As discussed in the previous section, we start by comparing the accuracy of the automatically labeled data produced by the different transfer learning algorithms. Obviously, the upper bound labeling accuracy is 100%, which applies to the case where labeled training data are collected. On the other hand, in the Naive approach, the static sensor sends its classifier once and we only use the provided classifier in the dynamic view; the only requirement is that the feature spaces of the two views be compatible. Since this approach does not produce transferred labels, we cannot include it in this comparison. The investigation of training label accuracy is essential because these labels are used as training data and their accuracy significantly contributes to the accuracy of the final classifier trained in the dynamic view.

Figure 4 shows the average accuracy of the training labels over all subjects. According to Figure 4a, on the IRH dataset, the accuracy of the calibrated labels using SDVL ranged from 69.9% to 72.7%, which shows 15.9% and 29.6% improvement over the system-supervised and plug-n-learn methods, respectively. Looking at Figure 4b, we observe 87.5% accuracy using SDVL, resulting in 21.1% and 42.2% enhancement compared with system-supervised and plug-n-learn on the OPP dataset. Similar results can be observed on the SDA dataset, as shown in Figure 4c.

6.2 Accuracy of Activity Recognition

In Section 6.1, we compared the accuracy of the training labels produced by the different methods. To create an activity recognition model, we developed an SVM classifier on the data labeled by each algorithm under comparison. According to Figure 5a, the average accuracy of SDVL is 70.2% on the IRH dataset. This shows a 15.4% improvement over system-supervised, 30.7% compared to plug-n-learn, and 51.8% compared to the Naive approach. We note that this accuracy is about 15% less than the experimental upper bound. The effect of enhancing the accuracy of labeling is more considerable in Figure 4b: SDVL boosts the accuracy of the training labels by more than 23% and 42% over system-supervised and plug-n-learn, respectively. This suggests that SDVL could be more effective than the other approaches in distinguishing locomotion activities.

However, plug-n-learn works better on average over all the different scenarios, according to Figure 4b, although there are several cases where it works worse than the Naive approach. One explanation for this observation is that all pairs of node locations are in close proximity and may capture similar patterns in the OPP dataset (i.e., Left/Right Upper/Lower Arms). As previously discussed, in situations where static and dynamic nodes are close to each other, the decision made using a static classifier on the dynamic feature space can be more accurate. We can observe this in the scenario where the static sensor location is Back, where plug-n-learn works better. We also note that the accuracy of SDVL is 6.9% less than the experimental upper bound.

In addition to the IRH and OPP datasets, we conducted similar experiments on the SDA dataset. As Figure 4c shows, activity recognition using SDVL has an average accuracy of 88%, which is 2.6% and 31% higher than system-supervised and plug-n-learn, respectively, and approximately 11% less than the experimental upper bound. Here, the Naive approach achieves the lowest performance with 34.5% accuracy on average.


Figure 4: Accuracy of training labels of different algorithms. The results are presented for the IRH dataset (a), OPP dataset (b), and SDA dataset (c).


Figure 5: Accuracy of activity recognition using different algorithms. The results are presented for the IRH dataset (a), OPP dataset (b), and SDA dataset (c).

Table 2: Precision and Recall of IRH dataset

Precision (%):
| Static Node | Naive | PNL   | Sy-Su | SDVL  | Upper Bound |
|-------------|-------|-------|-------|-------|-------------|
| Head        | 21.77 | 40.5  | 56.54 | 63.45 | 86.79       |
| Chest       | 22.42 | 35.29 | 50.6  | 63.17 | 86.95       |
| Back        | 21.84 | 38.47 | 43.21 | 64.39 | 87.26       |
| Right Arm   | 26.29 | 37.14 | 53.52 | 65.32 | 86.69       |
| Left Thigh  | 21.48 | 38.82 | 45.83 | 64.15 | 86.68       |
| Average     | 22.76 | 38.04 | 49.94 | 64.1  | 86.87       |

Recall (%):
| Static Node | Naive | PNL   | Sy-Su | SDVL  | Upper Bound |
|-------------|-------|-------|-------|-------|-------------|
| Head        | 13.05 | 41    | 64.39 | 69.94 | 86.45       |
| Chest       | 20.74 | 37.62 | 56.51 | 70.48 | 86.58       |
| Back        | 19.38 | 39.5  | 47.07 | 69.91 | 86.94       |
| Right Arm   | 19.28 | 39.3  | 58.23 | 70.3  | 86.43       |
| Left Thigh  | 19.44 | 40.12 | 47.64 | 70.18 | 86.41       |
| Average     | 18.38 | 39.51 | 54.77 | 70.16 | 86.56       |

Table 3: Precision and Recall of OPP dataset

Precision (%):
| Static Node     | Naive | PNL   | Sy-Su | SDVL  | Upper Bound |
|-----------------|-------|-------|-------|-------|-------------|
| Back            | 65.79 | 24.08 | 88.89 | 80.18 | 92.93       |
| Right Upper Arm | 59.74 | 31.94 | 62.38 | 85.28 | 92.66       |
| Right Lower Arm | 43.31 | 31.85 | 61.68 | 79.38 | 93.03       |
| Left Upper Arm  | 60.1  | 31.77 | 60.58 | 75.59 | 93.01       |
| Left Lower Arm  | 59.16 | 32.22 | 36.58 | 74.1  | 92.77       |
| Average         | 57.62 | 30.37 | 62.02 | 78.91 | 92.88       |

Recall (%):
| Static Node     | Naive | PNL   | Sy-Su | SDVL  | Upper Bound |
|-----------------|-------|-------|-------|-------|-------------|
| Back            | 45.25 | 39.3  | 71.98 | 87.67 | 93.42       |
| Right Upper Arm | 40.76 | 45.28 | 66.63 | 88.48 | 93.84       |
| Right Lower Arm | 39.16 | 43.09 | 63    | 86.78 | 93.5        |
| Left Upper Arm  | 41.57 | 40.9  | 63.18 | 80.93 | 93.5        |
| Left Lower Arm  | 41.43 | 43.34 | 46.94 | 83.35 | 93.25       |
| Average         | 41.63 | 42.38 | 62.35 | 85.44 | 93.50       |

Table 4: Precision and Recall of SDA dataset

Precision (%):
| Static Node | Naive | PNL   | Sy-Su | SDVL  | Upper Bound |
|-------------|-------|-------|-------|-------|-------------|
| Torso       | 45.22 | 53.43 | 83.76 | 85.86 | 97.41       |
| Right Arm   | 45.79 | 53.03 | 80.35 | 78.04 | 97.46       |
| Left Arm    | 46.24 | 53.65 | 86.57 | 84.92 | 97.49       |
| Right Leg   | 41.7  | 54.03 | 86.67 | 88.08 | 97.38       |
| Left Leg    | 46.55 | 52.85 | 93.02 | 93.82 | 97.36       |
| Average     | 45.1  | 53.4  | 86.07 | 86.14 | 97.42       |

Recall (%):
| Static Node | Naive | PNL   | Sy-Su | SDVL  | Upper Bound |
|-------------|-------|-------|-------|-------|-------------|
| Torso       | 34.69 | 56.5  | 83.78 | 87.02 | 97.33       |
| Right Arm   | 36.25 | 56.67 | 79.06 | 82.68 | 97.38       |
| Left Arm    | 35.82 | 57.69 | 86.11 | 87.42 | 97.41       |
| Right Leg   | 29.8  | 57.87 | 88    | 90.11 | 97.29       |
| Left Leg    | 36.05 | 56.01 | 91.14 | 91.96 | 97.29       |
| Average     | 34.52 | 56.94 | 85.61 | 87.83 | 97.34       |

Figure 6: F1 measure of the classifier using different algorithms. The results are presented for the IRH dataset (a), OPP dataset (b), and SDA dataset (c).

6.3 Precision, Recall and F1-Measure

In Table 2, we compare precision and recall for the different approaches on the IRH dataset. In both measures, the Naive method achieves the lowest performance, with an average precision of 23% and recall of 18%, while the upper bound precision and recall are approximately 87% on average. Both precision and recall improved using SDVL compared to system-supervised and plug-n-learn. However, the recall improvement is more considerable than the precision enhancement. One explanation is as follows: in system-supervised, if the static sensor makes a wrong decision, it will never be recovered by the dynamic sensor. In SDVL, however, a wrong decision by the static node could be revised based on the similarity graph over the observations of the dynamic node.

In Table 3, we observe a similar case where the recall improvement is larger than that of precision on the OPP dataset. However, the average accuracy of plug-n-learn is lower. Analogous to the explanation for classifier accuracy, in this dataset some pairs of static and dynamic sensors have similar locations, which causes the Naive approach to work better. For the SDA dataset, system-supervised and SDVL produce comparable results because the accuracy of the static sensor was high and contributed to the accuracy of the system-supervised method.

The F1-measure combines precision and recall and provides a powerful metric for comparing the effectiveness of two classifiers. Figure 6a compares the F1-measure of the five different methods on the IRH dataset. On average, SDVL improves this metric by 15.7% and 28.6% over the system-supervised and plug-n-learn methods, respectively. As expected, the F1 of the Naive approach is quite low, with an average of 17.6%. Similarly, on the OPP and SDA datasets, SDVL successfully boosts the F1-measure by 23.5% and 2.4%, respectively.

6.4 Effect of Static Sensor Accuracy

In Section 6.3, we demonstrated that SDVL can revise decisions made by the static sensor. However, it is necessary to highlight the limitations of the proposed approach. In particular, when the prediction of the static sensor is better than a random guess, the correct labels will be reinforced through the propagation process. However, in real scenarios, there might be cases where one sensor has no ability to distinguish several activities. For example, observations of a sensor on the 'ankle' are very similar for upper-body activities such

as Typing and Writing. In this section, we show a learnability limitation of our approach. We consider a scenario where the source sensor has no ability to distinguish among a subset of activities. In other words, when any of those activities occurs, the source classifier operates no better than a random guess. We simulated a scenario in which the static sensor cannot separate instances of three activities in the SDA dataset: Sitting, Standing, and Lying. We refer to this set of activities as a confusion set. We put a filter at the output of the static sensor's classifier that changes those predictions to a uniformly random guess among the activities in the confusion set. Analogously, we picked three similar activities (i.e., Sitting, Writing, and Typing) from the IRH dataset and performed the same analysis.

Figure 7 compares the original accuracy of SDVL and the accuracy of SDVL when the source is noisy. As shown in Figure 7, confusion of these three activities causes about a 10% drop in the accuracy of the final classifier on the IRH dataset. The accuracy drop was about 6% for the SDA dataset. This shows that SDVL is sensitive to the accuracy of the static classifier. In cases where the source of knowledge (i.e., the static sensor) is not accurate, querying external sources of knowledge such as other sensors, ambient sensors, or even the user could be effective. Specifically, active learning methods can be useful to minimize the number of such queries to external sources.


Figure 7: Accuracy of the classifier using the original SDVL and the corrupted-source scenario. The results are presented for the IRH dataset (a) and the SDA dataset (b).


6.5 Effect of Feature Space

In the previous experiments, we extracted the same set of features for both static and dynamic sensors. As described in Section 3.3, the static sensor sends only semi-labels to the dynamic sensor. There is no restriction on the dynamic view having a different feature space from that of the static sensor. In fact, as long as the two sensors can be synchronized and observe the same phenomena, the dynamic sensor can rely on the semi-labels provided by the static sensor to develop its model based on its own feature space. We conducted an experiment with the SDVL solution in which the static and dynamic sensors have different feature spaces. To this end, we examined two scenarios. In the first scenario, the static sensor had three modalities (i.e., accelerometer, gyroscope, and magnetometer) while the dynamic sensor contained only an accelerometer. We compared this scenario with the original SDVL setting where the dynamic sensor had the same modalities. For the second scenario, both static and dynamic sensors had the same modalities while the dynamic sensor had a more complex feature set. Figure 8 shows that in both experiments a stronger capability to separate instances in the dynamic view leads to better classification performance. In particular, by removing the gyroscope and magnetometer from the dynamic sensor, the accuracy of the final classifier decreases by 5.1% on average. Similarly, with a more complex feature set, the classification accuracy of the dynamic view increases by a small margin of 0.6% on average.

Figure 8: Accuracy of the classifier using different feature spaces on the SDA dataset. The results are presented for different modalities (a) and different feature complexity (b).

7 DISCUSSION AND FUTURE WORK

We use several heuristic methods to tackle the autonomous learning problem. For example, using the confusion matrix to extract semi-labels is not always a practical solution; by using the history of confusion of a sensor placed on a specific part of the body, or by comparing the different signal patterns that different activities can generate, we might be able to discover perfect semi-labels. Moreover, the minimum-disagreement algorithm is not always a suitable option, especially when the accuracy of the static sensor is significantly low; in that situation, wrong labels contribute more and spread more widely through the graph. We plan to devise different similarity measurements and customize algorithm parameters to address this issue. In addition, since all similarity measurements use the features of the dynamic sensor, different feature selection strategies result in different feature spaces and may consequently lead to different similarity graphs. We note that the feature space remains unchanged even when the dynamic sensor moves. In fact, if the dynamic sensor could detect its own location, the problem would reduce to several disjoint instances of Plug-n-Learn [19], presented in related work, and each location could have its own model.

The challenge addressed in this paper was training a new sensor in a dynamic setting without the need for ground-truth labels. Beyond being trained, the newly added sensor could also collaborate with the static sensor to develop a more reliable recognition model. Collaboration can happen through different mechanisms, such as an ensemble classifier or weighted decision fusion, and could lead to an accuracy much closer to the practical upper bound. Our study in this paper is a first step towards designing next-generation wearables that are computationally autonomous and can automatically learn machine learning algorithms.
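As one concrete illustration of the collaboration mechanism mentioned above, the two views could combine their class-probability estimates with reliability-based weights. The short Python sketch below is a hypothetical realization of weighted decision fusion, not part of the evaluated system; in practice the weights would be derived from each view's estimated accuracy.

    import numpy as np

    def fuse_predictions(static_proba, dynamic_proba, w_static=0.6, w_dynamic=0.4):
        """Weighted decision fusion of two views' class-probability matrices
        (shape: n_windows x n_classes, with columns in the same class order)."""
        fused = w_static * static_proba + w_dynamic * dynamic_proba
        return fused.argmax(axis=1)

    # Example with hypothetical 3-class probability estimates for one window.
    static_proba = np.array([[0.7, 0.2, 0.1]])
    dynamic_proba = np.array([[0.4, 0.5, 0.1]])
    fused_class = fuse_predictions(static_proba, dynamic_proba)  # array([0])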

Dynamic attributes of wearables are not limited to real-time sensor addition and movement: a sensor can be removed, upgraded, or replaced; sensors can produce noisy signals; and the system can be adopted by new users. Our ongoing research involves the development of transfer learning algorithms that address the dynamically evolving behavior of human-centered monitoring applications. In this paper, we evaluated SDVL for sensor-addition scenarios using three different datasets. We believe that our SDVL technique is applicable to any scenario where the 'static' and 'dynamic' views observe the phenomenon of interest synchronously. In the future, we plan to study the effectiveness of our dynamic-view learning approach in networks with heterogeneous or asynchronous sensors.

8 CONCLUSION

As wearable sensors become more prevalent, their functions become more complex and they operate in highly dynamic environments, so machine learning algorithms for these sensors cannot be designed for only one specific setting. To address this dynamic nature, we proposed a dynamic-view learning approach that uses the knowledge of existing sensors to autonomously train activity recognition models in newly added sensors, learning computational algorithms in dynamically changing settings without any need for labeled training data in the dynamic view. Our experiments show that we can combine the knowledge of the existing system with the observations of a new sensor to boost the accuracy of the predicted labels without ground-truth labels in dynamic views. Our experimental analysis demonstrates promising results compared to the best available solutions, with an overall accuracy improvement of 13.8% across three datasets.

ACKNOWLEDGMENTS

The authors would like to thank the shepherd Dr. Shahriar Nirjon and anonymous reviewers for providing valuable feedback. This work was supported in part by the National Science Foundation, under grant CNS-1566359. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding organizations.

REFERENCES

[1] Parastoo Alinia, Ramyar Saeedi, Bobak Mortazavi, Ali Rokni, and Hassan Ghasemzadeh. 2015. Impact of sensor misplacement on estimating metabolic equivalent of task with wearables. In 2015 IEEE 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN). IEEE, 1–6.
[2] Billur Barshan and Murat Cihan Yüksek. 2014. Recognizing daily and sports activities in two open source machine learning environments using body-worn sensor units. Comput. J. 57, 11 (2014), 1649–1667.
[3] Christopher M Bishop and others. 2006. Pattern recognition and machine learning. Vol. 4. Springer, New York.
[4] Adrian Burns, Barry R Greene, Michael J McGrath, Terrance J O'Shea, Benjamin Kuris, Steven M Ayer, Florin Stroiescu, and Victor Cionca. 2010. SHIMMER–A wireless sensor platform for noninvasive biomedical research. Sensors Journal, IEEE 10, 9 (2010), 1527–1534.
[5] Alberto Calatroni, Daniel Roggen, and Gerhard Tröster. 2011. Automatic transfer of activity recognition capabilities between body-worn motion sensors: Training newcomers to recognize locomotion. Eighth International Conference on Networked Sensing Systems (INSS'11), Penghu, Taiwan 6 (2011).
[6] Dirk G Cattrysse and Luk N Van Wassenhove. 1992. A survey of algorithms for the generalized assignment problem. European Journal of Operational Research 60, 3 (1992), 260–272.
[7] Wenyuan Dai, Qiang Yang, Gui-Rong Xue, and Yong Yu. 2007. Boosting for transfer learning. In Proceedings of the 24th International Conference on Machine Learning. ACM, 193–200.
[8] A. M. Davie and A. J. Stothers. 2013. Improved bound for complexity of matrix multiplication. Proceedings of the Royal Society of Edinburgh: Section A Mathematics 143, 2 (2013), 351–369.
[9] Ramin Fallahzadeh, Samaneh Aminikhanghahi, Ashley Nichole Gibson, and Diane J Cook. 2016. Toward personalized and context-aware prompting for smartphone-based intervention. In Engineering in Medicine and Biology Society (EMBC), 2016 IEEE 38th Annual International Conference of the. IEEE, 6010–6013.
[10] Kyle D Feuz and Diane J Cook. 2015. Transfer Learning across Feature-Rich Heterogeneous Feature Spaces via Feature-Space Remapping (FSR). ACM Transactions on Intelligent Systems and Technology (TIST) 6, 1 (2015), 3.
[11] Marc Kurz, Gerold Hölzl, Alois Ferscha, Alberto Calatroni, Daniel Roggen, Gerhard Tröster, Sagha, and others. 2011. The opportunity framework and data processing ecosystem for opportunistic activity and context recognition. International Journal of Sensors Wireless Communications and Control 1, 2 (2011), 102–125.
[12] Markus Maier, Matthias Hein, and Ulrike Von Luxburg. 2007. Cluster identification in nearest-neighbor graphs. In Algorithmic Learning Theory. Springer, 196–210.
[13] Saeed D Manshadi and Mohammad E Khodayar. 2016. Expansion of Autonomous Microgrids in Active Distribution Networks. IEEE Transactions on Smart Grid (2016).
[14] Sinno Jialin Pan and Qiang Yang. 2010. A Survey on Transfer Learning. Knowledge and Data Engineering, IEEE Transactions on 22, 10 (Oct 2010), 1345–1359.
[15] Oskar Perron. 1907. Zur Theorie der Matrices. Math. Ann. 64, 2 (1907), 248–263.
[16] Daniel Roggen, Alberto Calatroni, M. Rossi, Thomas Holleczek, Kilian Förster, Gerhard Tröster, Paul Lukowicz, David Bannach, Gerald Pirkl, Alois Ferscha, Jakob Doppler, Clemens Holzmann, Marc Kurz, Gerald Holl, Ricardo Chavarriaga, Hesam Sagha, Hamidreza Bayati, Marco Creatura, and José del R. Millán. 2010. Collecting complex activity data sets in highly rich networked sensor environments. In Seventh International Conference on Networked Sensing Systems.
[17] Daniel Roggen, Kilian Förster, Alberto Calatroni, and Gerhard Tröster. 2013. The adARC pattern analysis architecture for adaptive human activity recognition systems. Journal of Ambient Intelligence and Humanized Computing 4, 2 (2013), 169–186.
[18] Seyed Ali Rokni and Hassan Ghasemzadeh. 2016. Autonomous sensor-context learning in dynamic human-centered internet-of-things environments. In Proceedings of the 35th International Conference on Computer-Aided Design. ACM, 75.
[19] Seyed Ali Rokni and Hassan Ghasemzadeh. 2016. Plug-n-learn: automatic learning of computational algorithms in human-centered internet-of-things applications. In Proceedings of the 53rd Annual Design Automation Conference. ACM, 139.
[20] Seyed Ali Rokni, Hassan Ghasemzadeh, and Niloofar Hezarjaribi. 2016. Smart Medication Management, Current Technologies, and Future Directions. Handbook of Research on Healthcare Administration and Management (2016), 188–204.
[21] Ramyar Saeedi, Janet Purath, Krishna Venkatasubramanian, and Hassan Ghasemzadeh. 2014. Toward Seamless Wearable Sensing: Automatic On-Body Sensor Localization for Physical Activity Monitoring. In The 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC).
[22] J.A. Stankovic. 2014. Research Directions for the Internet of Things. Internet of Things Journal, IEEE 1, 1 (Feb 2014), 3–9.
[23] Amarnag Subramanya, Slav Petrov, and Fernando Pereira. 2010. Efficient graph-based semi-supervised learning of structured tagging models. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 167–176.
[24] Shiliang Sun. 2013. A survey of multi-view machine learning. Neural Computing and Applications 23, 7-8 (2013), 2031–2038. DOI:http://dx.doi.org/10.1007/s00521-013-1362-6
[25] TLM van Kasteren, Gwenn Englebienne, and Ben JA Kröse. 2010. Transferring knowledge of activity recognition across sensor networks. In Pervasive Computing. Springer, 283–300.
[26] Atefeh Zafarian, Ali Rokni, Shahram Khadivi, and Sonia Ghiasifard. 2015. Semi-supervised learning for named entity recognition using weakly labeled training data. In Artificial Intelligence and Signal Processing (AISP), 2015 International Symposium on. IEEE, 129–135.
[27] David Zakim and Matthias Schwab. 2015. Data collection as a barrier to personalized medicine. Trends in Pharmacological Sciences 36, 2 (2015), 68–71.
[28] Xiaojin Zhu, Zoubin Ghahramani, John Lafferty, and others. 2003. Semi-supervised learning using Gaussian fields and harmonic functions. In ICML, Vol. 3. 912–919.
[22] J.A. Stankovic. 2014. Research Directions for the Internet of Things. Internet of Things Journal, IEEE 1, 1 (Feb 2014), 3–9. [23] Amarnag Subramanya, Slav Petrov, and Fernando Pereira. 2010. Efficient graphbased semi-supervised learning of structured tagging models. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 167–176. [24] Shiliang Sun. 2013. A survey of multi-view machine learning. Neural Computing and Applications 23, 7-8 (2013), 2031–2038. DOI:http://dx.doi.org/10.1007/ s00521-013-1362-6 [25] TLM van Kasteren, Gwenn Englebienne, and Ben JA Kröse. 2010. Transferring knowledge of activity recognition across sensor networks. In Pervasive computing. Springer, 283–300. [26] Atefeh Zafarian, Ali Rokni, Shahram Khadivi, and Sonia Ghiasifard. 2015. Semisupervised learning for named entity recognition using weakly labeled training data. In Artificial Intelligence and Signal Processing (AISP), 2015 International Symposium on. IEEE, 129–135. [27] David Zakim and Matthias Schwab. 2015. Data collection as a barrier to personalized medicine. Trends in pharmacological sciences 36, 2 (2015), 68–71. [28] Xiaojin Zhu, Zoubin Ghahramani, John Lafferty, and others. 2003. Semisupervised learning using gaussian fields and harmonic functions. In ICML, Vol. 3. 912–919.