A Closed-loop Deep Learning Architecture for Robust Activity Recognition using Wearable Sensors

Ramyar Saeedi§, Skyler Norgaard†, and Assefaw H. Gebremedhin§
§School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164-2752
Email: {rsaeedi, assefaw}@eecs.wsu.edu
†Department of Computer Science, Kalamazoo College
Email: [email protected]
Abstract—Human activity recognition (HAR) plays a central role in health-care, fitness and sport applications because of its potential to enable context-aware human monitoring. With the increase in popularity of wearable devices, we are witnessing a large influx of available human activity data. For effective analysis and interpretation of these heterogeneous and high-volume streaming data, we need powerful algorithms. In particular, there is a strong need for developing algorithms for robust classification of human activity data that specifically address challenges associated with dynamic environments (e.g. different users, signal heterogeneity). We use the term robust here in two orthogonal senses: 1) leveraging related data in such a way that knowledge is transferred to a new context; and 2) actively reconfiguring machine learning algorithms such that they can be applied in a new context. In this paper, we propose an architecture that combines an active learning approach with a novel deep network. Our deep neural network exploits both Convolutional and Long Short-Term Memory (LSTM) layers in order to learn hierarchical representations of features and capture time dependencies from raw data. The active learning process allows us to choose the best instances for fine-tuning the deep network to the new setting in which the system operates (i.e. a new subject). We demonstrate the efficacy of the architecture using real human activity data. We show that the accuracy of activity recognition reaches over 90% by annotating less than 20% of unlabeled data.
I. INTRODUCTION
Physical activity monitoring is a crucial component in many applications, including sports, gait analysis, and, more importantly, patient monitoring (e.g. heart failure, cardiovascular disease, diabetes). It is well known that regular daily activity improves blood glucose control and can prevent or delay diabetes and heart disease. Other observed benefits of physical activity include positive effects on lipids, blood pressure, cardiovascular condition, mortality, and overall quality of life [1], [2]. Recent advancements in sensors, electronics, and wireless communications allow continuous and remote monitoring of physical activities. In particular, wearable motion sensors (such as accelerometers and gyroscopes [3]) with embedded wireless connectivity are commonly used for human activity recognition (HAR). Activity recognition refers to the detection of activities ranging from simple body postures (e.g. standing, sitting) to complex activities (e.g. lunge, jumping), depending on the application.
Supervised machine learning algorithms form the core intelligence of embedded software in wearable systems. However, these approaches are limited in several ways in their ability to meet the needs of contexts involving wearables. First, current machine learning algorithms rely on engineered features to address increasingly complex activity recognition tasks. Human activity recognition is challenging due to the large variability and heterogeneity in activity data [4]–[6]. We need to learn a better representation of the data to build generalized models; learned representations often result in much better performance than can be obtained with hand-designed representations [7]–[10]. Second, the collected data also includes non-relevant activities and non-activities in addition to the activities of interest throughout the lifetime of the system, making it necessary to have a specific representation for each activity. Third, a system is generally designed for a limited number of predefined settings (e.g. a fixed set of sensor locations, a fixed set of activities) or operating environments (e.g. a specific group of subjects). Therefore, the accuracy of machine learning algorithms drops whenever the system's setting or operating environment changes [11]–[13]. Fourth, a user's needs may change over time. For example, in a physical activity monitoring system, we may need to detect a new set of activities [14]. The HAR system should be able to interact with the users to update the activity recognition algorithms, while minimizing disturbance to the user's life and the expenses related to active learning [14]–[17].

Next-generation mobile devices need to be capable of adapting to dynamic environments in order to generate appropriate feedback and make accurate decisions. Our goal is to design systems that separate the factors of variation in the observed data. Deep learning techniques are promising for achieving this goal for several reasons: 1) deep learning approaches have the potential to uncover features that are tied to the dynamics of human movements; 2) the complexity of a deep neural network can help mitigate the effects of variations in the data, such as walking at different paces; and 3) a deep architecture can capture the relation between consecutive data segments. Furthermore, as an added benefit, we can use the output of the soft-max layer to measure the confidence of our model in data labeling.

In this paper, we propose an active deep learning architecture targeted for HAR systems. The architecture is designed
to meet the following requirements: a) learn features that are representative of data in different contexts, making the model capable of knowledge extraction from related datasets (i.e. other subjects); b) incorporate an uncertainty detection unit along with the HAR machine learning model; and c) decrease the total number of queries needed to reconfigure the deep network in the active learning phase. Our main contributions are outlined below.

a) Closed-Loop Activity Recognition (Section III). We propose a closed-loop architecture for interactive reconfiguration in wearable activity recognition systems. Fig. 1 gives a high-level overview of the architecture.

b) Deep Architecture (Section IV). We design a deep neural network that combines different types of layers to be used in the closed-loop architecture. The Convolutional layers are used for representation learning, and the Long Short-Term Memory (LSTM) layers capture the long-term dependency in activity data.

c) Experimental Evaluation (Section V). Using a publicly available human physical activity dataset, we evaluate the efficacy of the active deep architecture in robust activity recognition. We show that using the architecture increases the accuracy of activity recognition by up to 9% compared to a state-of-the-art deep architecture and by 31% compared to traditional classifiers. The accuracy of HAR reaches over 90% by annotating less than 20% of unlabeled data.
II. PRELIMINARIES
We begin with a brief review of preliminaries around deep learning and active learning that we will use later in the active deep architecture.

A. Deep Learning

Deep learning is a class of machine learning based on the idea of learning multiple levels of representations, as opposed to task-specific algorithms. Deep learning is well suited for applications where factors of variation are present in the input data. A deep architecture learns high-level features that are useful for different settings or tasks, corresponding to the underlying factors that appear in more than one setting. Such a high-level abstraction of data can be used to design robust human activity recognition systems in the presence of signal heterogeneity.

Specifically, Convolutional Neural Networks (CNNs) and LSTM networks are of interest in this paper. CNNs can capture local dependencies in inertial sensors' data, since nearby acceleration readings are likely to be correlated [18]–[20]. The CNN layers preserve feature scale invariance, which results in a more general representation of activity data. This property helps capture data heterogeneity (e.g. different walking patterns). LSTM, a specific type of recurrent neural network, is suitable for applications where there are very long time lags of unknown size between important events. LSTM units maintain memory cells in which data can be stored and from which it can be read at each step [19], [21]. During training, the network learns what to store and when to allow reading/writing so as to minimize the classification error. Using both
CNNs and LSTMs in a deep architecture helps to develop general models for data classification.

B. Active Learning

The key idea behind active learning is to allow the learning algorithm to choose the data from which it learns. The system should strive to minimize the cost of querying while maintaining the system's accuracy [22]. There are situations in which unlabeled data is abundant but manual labeling is expensive. In such scenarios, learning algorithms can actively query the user/experts for labels. In active learning, a learner often begins with a small number of instances in the labeled training set XL. However, an activity recognition system quite often has to start with no labeled data at all for the current context [14]. This requires more sophisticated strategies in the initialization phase of the active learning process. We show that the deep learning architecture benefits from related training data to learn features for activity recognition systems.

The decision of whether or not to query an instance can be framed in several ways. One approach is to evaluate instances using some measure of how informative an instance is, since more informative instances are more likely to be queried [23]. However, considering only the informativeness of instances is prone to querying outliers. To address this, we use a measure proposed in [22] called information density (ID). ID is defined as follows:
$$\phi^{ID}(x) = \phi^{SE}(x) \times \left( \frac{1}{|X_U|} \sum_{x_u \in X_U} \mathrm{sim}(x, x_u) \right)^{\beta} \qquad (1)$$
Here, the informativeness of x is weighted by its average similarity to all other instances in the unlabeled set XU, subject to a parameter β that controls the relative importance of the density term. In the formulation above, the sequence entropy φSE measures the base informativeness. The density measure requires us to compute the similarity of two instances; we measure the similarity between two instances using cosine similarity.

One potential drawback of ID is that the number of required similarity calculations grows quadratically with the number of instances in XU. However, these densities are only needed for a limited number of data instances in each active learning phase. Thus, when employing information density in real-world interactive activity recognition, the density scores are computed once and cached on the back-end server for use during the actual active learning process.
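To make Eq. (1) concrete, the following Python sketch computes information density scores for a batch of unlabeled instances. It assumes the base informativeness φSE is the entropy of the model's soft-max output (as described in Section IV) and uses cosine similarity via normalized dot products; the function names are ours for illustration, not part of the authors' implementation.

```python
import numpy as np

def entropy(probs):
    """Base informativeness phi_SE: entropy of the soft-max output.
    `probs` has shape (num_instances, num_classes)."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def information_density(phi_se, X_unlabeled, beta=1.0):
    """Eq. (1): weight each instance's informativeness by its average
    cosine similarity to all instances in the unlabeled set X_U,
    raised to the power beta. `X_unlabeled` holds one flattened
    feature vector per row."""
    # Row-normalize so that dot products become cosine similarities.
    norms = np.linalg.norm(X_unlabeled, axis=1, keepdims=True)
    X = X_unlabeled / np.clip(norms, 1e-12, None)
    sim = X @ X.T                 # pairwise cosine similarities
    density = sim.mean(axis=1)    # average similarity to X_U
    return phi_se * density ** beta
```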
III. CLOSED-LOOP DEEP ACTIVITY RECOGNITION
The proposed active deep architecture, to be used for interactive reconfiguration of wearable HAR, is depicted in Fig. 1. The setup is performed in two phases. The first phase pre-trains the deep neural network using the related labeled data. The second phase actively updates the deep neural network by getting feedback from experts to annotate informative unlabeled data when necessary. We go through the details of these two phases, and related system development matters, in the rest of this section.
Figure 1: High-level overview of the proposed active deep learning architecture.
Pre-training Phase. The pre-training phase is offline, since it is performed on the back-end server prior to real-time execution of activity recognition. Using the related labeled data XR, as shown in Fig. 1, the deep neural network is trained for activity recognition. Our deep network model is fed with activity data segments through all layers. The network is trained to: 1) capture the structure of the data (feature learning); and 2) act as the initial activity recognition system (context-related). The context-related layers are trained for the related context and, depending on the similarity between the related data and the current context, they need to be re-trained in the active learning phase.

Active Learning Phase. In this phase, which is carried out in real-time, the system gradually adds data instances to the set of unlabeled data XU. Whenever the active learning phase is enabled (e.g. when operating in a new context), the most informative set of unlabeled data is selected to fine-tune the weights in the context-related layers. The active learning phase may run on the gateway or the back-end server, depending on the computational capability of the gateway.

Algorithm. The algorithm corresponding to the closed-loop activity recognition is outlined in Algorithm 1 in pseudocode form; a Python sketch of the loop is given after Algorithm 1. The algorithm takes as input the related labeled data XR, the set of unlabeled data XU in the new context, the activity set A, the number of instances for each round of query K, and the uncertainty threshold uth. It gives as output the deep neural network model for HAR, MHAR. Algorithm 1 calls the routine INITIALIZEDNN, which corresponds to the pre-training phase, using the related labeled data XR. The while-loop subsequent to the network initialization corresponds to the active learning phase. In each iteration of the while-loop, the most informative instances are selected via the function TOPK (recall the discussion of information density in Section II-B), which returns the K instances with the highest information density from the unlabeled data set XU. Then, the newly labeled data are added to the labeled data and the network parameters are updated using the new labeled data (UPDATEDNN). This process is repeated until we iterate over all the selected instances or the learner's uncertainty score falls below the threshold uth for all Ai ∈ A. In other words, the active learning phase is stopped when the deep architecture is confident enough to label activities in the new context (i.e. a new subject). The model MHAR is returned as the output of the algorithm. In the next section, we discuss the details of the deep architecture that we propose.
Algorithm 1 ACTIVEDEEPLEARNING
Input: XR, XU, A = {A1, A2, ..., Ak}, K, uth
Output: Deep model for HAR, MHAR
1:  XS ← ∅, XL ← ∅, uM ← 1
2:  MHAR ← INITIALIZEDNN(XR, A)   /* pre-training */
3:  while uM > uth do
4:    (X, uM) ← TOPK(K, XU)        /* query strategy */
5:    for all xi ∈ X do
6:      yi ← QUERY(xi)             /* ask annotator(s) */
7:      XS ← XS ∪ (xi, yi)
8:    end for
9:    XL ← XL ∪ XS
10:   MHAR ← UPDATEDNN(XL)         /* active learning */
11:   XS ← ∅
12: end while
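The sketch below mirrors Algorithm 1 in Python, reusing the `entropy` and `information_density` helpers from Section II-B. The model interface (a Keras-style `fit`/`predict`) and the `query_annotator` oracle are assumptions made for illustration; in the actual system the labels come from human annotators.

```python
import numpy as np

def active_deep_learning(model, X_R, y_R, X_U, K=20, u_th=0.05):
    """A minimal sketch of Algorithm 1. `model` is a compiled
    Keras-style classifier; (X_R, y_R) is the related labeled data;
    X_U holds the unlabeled instances from the new context."""
    model.fit(X_R, y_R)                       # pre-training phase
    X_L, y_L = [], []                         # labeled set XL
    u_M = 1.0                                 # learner uncertainty uM
    while u_M > u_th and len(X_U) > 0:
        # Query strategy: the K unlabeled instances with the
        # highest information density (Eq. 1).
        probs = model.predict(X_U)
        scores = information_density(entropy(probs),
                                     X_U.reshape(len(X_U), -1))
        top = np.argsort(scores)[-K:]
        u_M = scores[top].max()               # uncertainty estimate
        for i in top:
            # `query_annotator` is an external oracle (a human expert,
            # or ground-truth labels in the experiments).
            y_L.append(query_annotator(X_U[i]))
            X_L.append(X_U[i])
        X_U = np.delete(X_U, top, axis=0)     # remove queried instances
        # Active learning: fine-tune the context-related layers.
        model.fit(np.stack(X_L), np.array(y_L))
    return model
```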
IV. DEEP LEARNING ARCHITECTURE
The proposed deep learning architecture is depicted in Fig. 2. The network comprises eight layers with weights: five Convolutional layers, two recurrent LSTM layers, and one soft-max layer. The deep network model works by passing each activity data segment through the CNN layers to produce fixed-length vector representations for the next layer. At the output of each CNN layer, we have a different level of abstraction of the activity data. After feature learning, the LSTM layers take over to capture the temporal dependence of activity data. The output of the last LSTM layer is fed to a soft-max layer, which produces a distribution over the activity labels. Max-pooling layers, of size 2 × 1, follow the first, fourth, and fifth Convolutional layers. The ReLU and hyperbolic tangent non-linearities are applied to the outputs of the Convolutional layers and the LSTM layers, respectively.

The first Convolutional layer filters the 3D input tensor with 128 kernels of size 5 × 1 × 3 with a stride of 1 sample. The second Convolutional layer takes as input the (response-normalized and pooled) output of the first Convolutional layer and filters it with 96 kernels of size 5 × 1 × 128. The third Convolutional layer has 64 kernels of size 5 × 1 × 96 connected to the outputs of the second Convolutional layer. The fourth Convolutional layer has 48 kernels of size 5 × 1 × 64, and the fifth Convolutional layer has 32 kernels of size 5 × 1 × 48. Because we use zero padding to compute the convolutions, the second dimension of the data fed through the network remains the same unless max-pooling is applied (e.g. after the first layer). The output of the fifth Convolutional layer is reshaped to flatten the second and third dimensions of the data. This flattened input is then fed into the recurrent layers of the network. Both recurrent layers feature 128 LSTM units. Their output is then fed into the fully connected layer, where the soft-max activation function is applied in order to compute the most likely class of the action.

Below, we describe specific features that we exploit in the deep learning architecture.

3D Input Structure. The input to the model is a 3-dimensional time series of sensor data of the form lenw × nums × 3, where lenw represents the window length and nums represents the number of sensors. Inspired by the RGB representation of images, we add a notion of depth,
corresponding to the X, Y and Z components of each sensor module (e.g. accelerometer, gyroscope).

Figure 2: An illustration of the 3D-ConvLSTM architecture. The network's input is 3-dimensional; the network includes 11 layers, counting the max-pooling layers.

Max-Pooling. Max-pooling helps to make the representation approximately invariant to small translations of the input: if we translate the input by a small amount, the values of most of the pooled outputs do not change. The max-pooling layers also decrease the number of parameters and, therefore, the training time.

Built-in Active Learning. The soft-max layer produces a distribution over the activity labels. Using the output of the soft-max layer, we compute the base informativeness φSE (Section II-B) for all the collected unlabeled instances xui ∈ XU. This is used to calculate the information density in the active learning phase.

Implementation. We implemented the proposed deep architecture in the Keras library. The pre-training phase and active learning phase are run on a Tesla K80 GPU. We use fully supervised model training with back-propagation. The network parameters are optimized by minimizing the cross-entropy loss function using mini-batch gradient descent.
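The following Keras sketch assembles the layer configuration described above. It is a reconstruction from the text rather than the authors' code: details the paper does not state (the exact reshape, the metrics) are our assumptions, the optimizer settings follow Section V-B, and `num_classes` defaults to the 18 OPPORTUNITY classes (including the Null class).

```python
from tensorflow.keras import layers, models, optimizers

def build_3d_convlstm(len_w=60, num_s=37, num_classes=18):
    """3D-ConvLSTM sketch: five Conv layers, two LSTM layers, and a
    soft-max layer. Input shape: (window length, number of sensors, 3),
    the depth channels being the X/Y/Z components of each sensor."""
    x = inp = layers.Input(shape=(len_w, num_s, 3))
    # Five Convolutional layers with 5x1 kernels, stride 1 and zero
    # padding; 2x1 max-pooling follows layers 1, 4 and 5.
    for i, k in enumerate([128, 96, 64, 48, 32]):
        x = layers.Conv2D(k, (5, 1), padding='same', activation='relu')(x)
        if i in (0, 3, 4):
            x = layers.MaxPooling2D(pool_size=(2, 1))(x)
    # Flatten the sensor and depth dimensions, keeping the time steps:
    # (60/2/2/2, 37, 32) -> (7, 37 * 32).
    x = layers.Reshape((-1, num_s * 32))(x)
    # Two recurrent layers with 128 LSTM units each (tanh activation).
    x = layers.LSTM(128, return_sequences=True)(x)
    x = layers.LSTM(128)(x)
    # Fully connected soft-max layer: distribution over activity labels.
    out = layers.Dense(num_classes, activation='softmax')(x)
    model = models.Model(inp, out)
    # Cross-entropy loss; RMSProp with the settings from Section V-B.
    model.compile(optimizer=optimizers.RMSprop(learning_rate=0.001, rho=0.9),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    return model
```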
V. EXPERIMENTAL EVALUATION
In this section, we experimentally assess the efficacy of the proposed architecture using real human activity data. The section is organized as follows. We describe the dataset we use in Section V-A. We present the architecture design settings in Section V-B. In Section V-C, we evaluate the performance of the deep architecture with and without active learning. Finally, we compare this work to a state-of-the-art deep human activity recognition system (DeepConvLSTM [19]) in Section V-D.

A. Dataset Description

We used the OPPORTUNITY activity recognition dataset [24]. The data is collected from sensors configured on four subjects who perform Activities of Daily Living (ADL) (e.g. drink from cup, open door, close door, open dishwasher). There are 18 classes in this dataset, including a Null class. The Null
class refers to either non-relevant activities or non-activities. The full setup includes both ambient and on-body sensors. However, in this paper, we only consider tri-axial wearable sensors, as we study the use of deep learning in wearable systems. The sensor set includes accelerometers, gyroscopes, and magnetometers at seventeen different locations on the body. The sampling frequency of each sensor is 30 Hz. More details about the dataset can be found in [24].

B. Architecture Design Settings

We discuss below the design settings we used in the data processing flow.

Data Preparation. We applied a moving average filter to the activity data to remove high-frequency noise from the sensor data. After data filtering, a two-second sliding window was used for data segmentation. Consequently, we have 60 × 37 × 3 input tensors. We use a 50% overlap (one second) between consecutive segments of activity data. A sketch of this step, together with the feature extraction below, is given at the end of this subsection.

Deep Network Setting. We train the deep network (3D-ConvLSTM) with batches of 100 input tensors. We used the RMSProp optimizer with a learning rate of 0.001 and a decay factor of 0.9. We also shuffle the batches at each training epoch of the pre-training phase. We applied dropout as a regularization technique with probability p = 0.5 during the pre-training phase. Because the instances that are queried for annotation are scarce, we revoke the dropout setup in the active learning phase. For DeepConvLSTM, we used the same settings as reported by the authors in their paper [19].

Feature Extraction. A variety of features can be extracted from human activity data segments. Previous studies have shown that statistical features such as mean, standard deviation, peak-to-peak, maximum, and minimum are effective for activity recognition systems [4], [14]. We used these features to transform each segment of raw data into a feature vector for the traditional classifiers included in our experiments.

Active Learning Parameters. The query strategy submodule selects the top K = 20 instances with the highest information density in each query round to submit to the annotators. We use the ground truth labels to label the selected instances. We set β = 1 for the information density calculation.
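As an illustration of the data preparation and feature extraction steps above, the sketch below applies a moving-average filter, segments the stream into two-second windows with 50% overlap, and computes the statistical features used by the traditional classifiers. The moving-average filter length is not stated in the text, so it is a placeholder parameter here.

```python
import numpy as np

def segment(signal, fs=30, win_sec=2.0, overlap=0.5, ma_len=3):
    """Filter and segment a stream of shape (num_samples, num_sensors, 3)
    into windows of shape (60, num_sensors, 3) at fs = 30 Hz."""
    # Moving-average filter along time to suppress high-frequency noise.
    kernel = np.ones(ma_len) / ma_len
    smoothed = np.apply_along_axis(
        lambda ch: np.convolve(ch, kernel, mode='same'), 0, signal)
    # Two-second windows (60 samples) with 50% (one-second) overlap.
    win = int(win_sec * fs)
    step = int(win * (1.0 - overlap))
    return np.stack([smoothed[s:s + win]
                     for s in range(0, len(smoothed) - win + 1, step)])

def stat_features(window):
    """Engineered features for the traditional classifiers: mean,
    standard deviation, peak-to-peak, maximum and minimum per channel."""
    funcs = [np.mean, np.std, np.ptp, np.max, np.min]
    return np.concatenate([f(window, axis=0).ravel() for f in funcs])
```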
Figure 3: HAR F1-score of deep learning methods compared to traditional classifiers for the subject-independent scenario: (a) activity data without the Null class; (b) activity data including the Null class.
C. Performance Evaluation

We compare our 3D-ConvLSTM architecture with three traditional classifiers using engineered features: decision tree (DT), random forest (RF), and K-nearest neighbor (K-nn). We also compare our architecture with a state-of-the-art deep network, DeepConvLSTM [19]. For the purpose of thorough comparison, we define several scenarios as follows.

Fully-Trained. The training and testing sets have the same data distribution (i.e. same subject and fixed sensor locations). This scenario gives an upper bound on classification accuracy.

Subject-Independent. The computational model is built based on all subjects except one, which is used for testing. This scenario is used to evaluate the effect of subject heterogeneity on the performance of the HAR models, and shows how different computational models (i.e. traditional classifiers, deep architectures) behave in the presence of variation.

Active-Training. In this scenario, each machine learning algorithm is initially built as a subject-independent model. We then actively update the model by adding labeled data. We use this scenario to show how the active learning approach increases the accuracy of different machine learning models.

To compare performance, we use the F1-score for all models. The F1-score combines precision and recall. The rationale for using the F1-score is that the activity data classes are imbalanced, and hence raw accuracy may not be a good way to compare performance. Fig. 3 shows bars corresponding to the F1-scores of HAR for the different classifiers in the subject-independent scenario. Subject 1 corresponds to the case where the machine learning models are trained on all subjects except subject 1. Because the Null class in the OPPORTUNITY dataset constitutes approximately 75% of the data, we build the HAR models for both cases, with and without the Null class.

Fig. 3a shows the F1-score for all machine learning algorithms trained on the OPPORTUNITY dataset excluding the Null class. The deep networks are pre-trained for 100 epochs. We can see that our proposed deep architecture enables an increase in F1-score of up to 14% compared to DeepConvLSTM [19] and up to 40% compared to traditional classifiers.

Fig. 3b shows the F1-score for the case where the Null class is included. The average F1-score of the 3D-ConvLSTM architecture is higher by up to 13% compared to RF, by up to 11% compared to K-nn, and by up to 8% compared to DeepConvLSTM. It can be seen that our architecture consistently gives F1-scores higher than 80%, while the F1-scores of the other classifiers fluctuate across subjects. This shows that the 3D-ConvLSTM architecture is robust against subject heterogeneity. The DT classifier has the lowest F1-score.

Fig. 4 shows F1-scores for machine learning models equipped with active learning. We compare our architecture with: 1) a random forest based classifier; and 2) the DeepConvLSTM HAR architecture [19]. We choose the RF classifier because its F1-score for the subject-independent model is higher than those of the other classifiers, as shown in Fig. 3. Fig. 4 shows that by querying only 20% of the data, the proposed deep architecture outperforms DeepConvLSTM and the RF classifier by up to 12% and 25%, respectively. Furthermore, the F1-score of our proposed architecture scales up at a higher rate compared to the other models, and is only 1% to 3.2% below the upper-bound accuracy of a fully-trained deep architecture. The number of epochs is set to 20 in the active learning phase. Most importantly, while new samples are added to update the proposed 3D-ConvLSTM architecture, the F1-score increases monotonically in most cases. This shows that the features learned by our architecture are more robust to variation and noise in the activity data.

D. Deep Architecture Results

1) Number of Epochs: The number of epochs determines the time needed to pre-train or train the final deep neural network. We explore the effect of the number of epochs on the F1-score of the deep architectures. Fig. 5 shows the F1-score of our architecture and the DeepConvLSTM [19] architecture for different numbers of epochs. The reported accuracies are the average of 10 different runs over all subjects. As we increase the number of epochs, we see a rise in the F1-score for both architectures. Furthermore, our architecture shows better results than DeepConvLSTM [19] as we increase the number of epochs. We reach maximum accuracy after 40 and 100 epochs for the cases with and without the Null class, respectively.
Figure 4: F1-score in the active learning phase versus the number of queries in the active-training scenario, for (a) Subject 1 through (d) Subject 4. 400 instances roughly correspond to 20% of the data for each subject.
Table I: Comparison of training time per epoch, in seconds (3D-ConvLSTM vs DeepConvLSTM).

Deep Architecture          S1    S2    S3    S4
3D-ConvLSTM (this work)    8.2   9.1   9.4   9.2
DeepConvLSTM [19]          56    57    57    60
2) Execution Time Comparison: In this subsection, we compare the training runtime of our deep architecture (i.e. 3D-ConvLSTM) with that of the DeepConvLSTM architecture. We train both networks for all subject-independent scenarios (i.e. four deep networks). The results are summarized in Table I. We ran the training phase for all scenarios for 10 epochs and report the average runtime. The training time per epoch for the 3D-ConvLSTM architecture is roughly an order of magnitude lower than for the DeepConvLSTM architecture. Since both networks are typically trained for 50 to 100 epochs, this per-epoch advantage carries over directly to the total training time, making our architecture roughly an order of magnitude faster than the state-of-the-art architecture (i.e. DeepConvLSTM). The max-pooling layers and the smaller number of kernels decrease the number of parameters in the deep architecture, which in turn decreases the training time of the 3D-ConvLSTM architecture.
Figure 5: The accuracy of activity recognition versus the number of epochs in the training phase (comparison between 3D-ConvLSTM and DeepConvLSTM).
VI. RELATED WORK

The robustness of prediction models for wearables has been widely studied in the literature. In [12], a set of heuristics is presented for activity recognition with sensors displaced within a single body segment. The authors show that, within certain limits and with a modest accuracy drop, activity recognition is tolerant to displacement. Forster et al. proposed a genetic programming-based feature selection algorithm whose goal is to find a set of discriminative and variation-tolerant features that also reduce the energy requirements of the system [7]. The results showed an average activity recognition accuracy of 80%.
However, in the dynamic environment of wearables, the assumption that training data is available for different configurations is unrealistic. Fallahzadeh et al. [25] propose a network-based approach for cross-subject knowledge transfer in an activity recognition system. The algorithm leverages feature-space cross-subject similarities by constructing a network over the data in the source and target subjects and performing community detection to extract core observations in the target data. In another work, a multi-view learning approach is presented for wearables with multiple sensors, in which one of the sensors acts as the source domain [26]. It is shown that by using an additional wearable node the performance increases by up to 15%.

Two drawbacks of such transfer learning approaches are that: 1) they use engineered features in the knowledge transfer procedure; and 2) it is assumed that there is always a relation between the source and target contexts. In [9], a deep architecture using only Convolutional Neural Networks (CNN) is proposed to capture local dependency and mitigate the effects of variation. The authors also exploit a modified weight-sharing technique for the accelerometer data to obtain further improvements. As mentioned in earlier sections of this paper, in [19], a deep architecture for HAR based on Convolutional and LSTM recurrent units is proposed for multi-sensor wearable systems. The advantage of that work is that it explicitly models the temporal dynamics of activity data in addition to representation learning. However, none of the above deep architectures considers active development of the deep architecture.
VII. CONCLUSION
We proposed a novel deep architecture to support the reconfiguration of wearable human activity recognition systems operating in dynamic environments. In contrast to traditional methods that rely on domain knowledge and feature engineering, we used a deep architecture to learn hierarchical representations of features from raw data, as well as to capture the temporal dependencies of human activity data. The architecture includes an active learning module: whenever the system faces uncertainty (i.e. the ID score for incoming samples is high) during real-time execution, the active learning phase is triggered to update the deep architecture. This allows us to choose the best samples for fine-tuning the model to the new setting in which the system operates (e.g. a new user). We demonstrated the efficacy of the architecture using real activity data collected in different contexts.

We plan to extend this work in several directions. First, we plan to design deep architectures for more complex heterogeneity, such as mis-orientation and misplacement of sensors. In particular, we plan to develop knowledge transfer algorithms for location-independent human activity recognition. Second, we plan to explore the suitability of integrated methods for transfer and active learning, instead of applying them sequentially in the system development process.
ACKNOWLEDGMENT

This work was supported by the National Science Foundation CAREER Award IIS 1553528 and by the National Science Foundation Research Experiences for Undergraduates Program under Grant No. 1460917.

REFERENCES

[1] M. C. Riddell and R. J. Sigal, "Physical activity, exercise and diabetes," Canadian Journal of Diabetes, vol. 37, no. 6, pp. 359–360, 2013.
[2] A. J. Cooper, S. Brage, U. Ekelund, N. J. Wareham, S. J. Griffin, and R. K. Simmons, "Association between objectively assessed sedentary time and physical activity with metabolic risk factors among people with recently diagnosed type 2 diabetes," Diabetologia, vol. 57, no. 1, pp. 73–82, 2014.
[3] H. Ghasemzadeh, S. Ostadabbas, E. Guenterberg, and A. Pantelopoulos, "Wireless medical embedded systems: A review of signal processing techniques for classification," IEEE Sensors Journal, 2012, in press.
[4] R. Saeedi, N. Amini, and H. Ghasemzadeh, "Patient-centric on-body sensor localization in smart health systems," in Signals, Systems and Computers, 2014 48th Asilomar Conference on. IEEE, 2014, pp. 2081–2085.
[5] R. Saeedi, H. Ghasemzadeh, and A. H. Gebremedhin, "Transfer learning algorithms for autonomous reconfiguration of wearable systems," in Big Data (Big Data), 2016 IEEE International Conference on. IEEE, 2016, pp. 563–569.
[6] A. Stisen, H. Blunck, S. Bhattacharya, T. S. Prentow, M. B. Kjærgaard, A. Dey, T. Sonne, and M. M. Jensen, "Smart devices are different: Assessing and mitigating mobile sensing heterogeneities for activity recognition," in Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems. ACM, 2015, pp. 127–140.
[7] K. Forster, P. Brem, D. Roggen, and G. Troster, "Evolving discriminative features robust to sensor displacement for activity recognition in body area sensor networks," in Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), 2009 5th International Conference on. IEEE, 2009, pp. 43–48.
[8] T. Plötz, N. Y. Hammerla, and P. Olivier, "Feature learning for activity recognition in ubiquitous computing," in IJCAI Proceedings, International Joint Conference on Artificial Intelligence, vol. 22, no. 1, 2011, p. 1729.
[9] M. Zeng, L. T. Nguyen, B. Yu, O. J. Mengshoel, J. Zhu, P. Wu, and J. Zhang, "Convolutional neural networks for human activity recognition using mobile sensors," in Mobile Computing, Applications and Services (MobiCASE), 2014 6th International Conference on. IEEE, 2014, pp. 197–205.
[10] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[11] R. Saeedi, J. Purath, K. Venkatasubramanian, and H. Ghasemzadeh, "Toward seamless wearable sensing: Automatic on-body sensor localization for physical activity monitoring," in The 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2014.
[12] K. Kunze and P. Lukowicz, "Using acceleration signatures from everyday activities for on-body device location," in Wearable Computers, 2007 11th IEEE International Symposium on. IEEE, 2007, pp. 115–116.
[13] R. Saeedi, R. Fallahzadeh, P. Alinia, and H. Ghasemzadeh, "An energy-efficient computational model for uncertainty management in dynamically changing networked wearables," in International Symposium on Low Power Electronics and Design (ISLPED). IEEE/ACM, 2016.
[14] R. Saeedi, K. Sasani, and A. H. Gebremedhin, "Co-MEAL: Cost-optimal multi-expert active learning architecture for mobile health monitoring," in The 8th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (BCB'17). ACM, 2017.
[15] R. Lomasky, C. E. Brodley, M. Aernecke, D. Walt, and M. Friedl, "Active class selection," in European Conference on Machine Learning. Springer, 2007, pp. 640–647.
[16] B. Settles, M. Craven, and L. Friedland, "Active learning with real annotation costs," in Proceedings of the NIPS Workshop on Cost-Sensitive Learning, 2008, pp. 1–10.
[17] K. Kunze, P. Lukowicz, H. Junker, and G. Tröster, "Where am I: Recognizing on-body positions of wearable sensors," in Location- and Context-Awareness. Springer, 2005, pp. 264–275.
[18] F. J. O. Morales and D. Roggen, "Deep convolutional feature transfer across mobile activity recognition domains, sensor modalities and locations," in Proceedings of the 2016 ACM International Symposium on Wearable Computers. ACM, 2016, pp. 92–99.
[19] F. J. Ordóñez and D. Roggen, "Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition," Sensors, vol. 16, no. 1, p. 115, 2016.
[20] A. Anderson, K. Shaffer, A. Yankov, C. D. Corley, and N. O. Hodas, "Beyond fine tuning: A modular approach to learning on small data," arXiv preprint arXiv:1611.01714, 2016.
[21] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, "Long-term recurrent convolutional networks for visual recognition and description," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2625–2634.
[22] B. Settles and M. Craven, "An analysis of active learning strategies for sequence labeling tasks," in Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2008, pp. 1070–1079.
[23] B. Settles, "Active learning literature survey," Computer Science Technical Report 1648, University of Wisconsin, Madison, 2010.
[24] D. Roggen, A. Calatroni, M. Rossi, T. Holleczek, K. Förster, G. Tröster, P. Lukowicz, D. Bannach, G. Pirkl, A. Ferscha et al., "Collecting complex activity datasets in highly rich networked sensor environments," in Networked Sensing Systems (INSS), 2010 Seventh International Conference on. IEEE, 2010, pp. 233–240.
[25] R. Fallahzadeh and H. Ghasemzadeh, "Personalization without user interruption: Boosting activity recognition in new subjects using unlabeled data," in Proceedings of the 8th International Conference on Cyber-Physical Systems. ACM, 2017, pp. 293–302.
[26] S. A. Rokni and H. Ghasemzadeh, "Plug-n-learn: Automatic learning of computational algorithms in human-centered internet-of-things applications," in Proceedings of the 53rd Annual Design Automation Conference. ACM, 2016, p. 139.