UBICOMP/ISWC ’16 ADJUNCT, SEPTEMBER 12-16, 2016, HEIDELBERG, GERMANY

Mining Hidden Correlations between Sleep and Lifestyle Factors from Quantified-Self Data

Zilu Liang
National Institute of Advanced Industrial Science and Technology, Japan
The University of Tokyo, Japan

Mario Alberto Chapa Martell
The University of Tokyo, Japan

Takuichi Nishimura
National Institute of Advanced Industrial Science and Technology, Japan

Abstract

It has been widely recognized that discovering the factors that potentially contribute to personal sleep is as important as understanding sleep patterns per se. However, in large quantified-self datasets, contributing factors may correlate with sleep only when their values fall within certain ranges. Existing correlation analysis based on the Pearson correlation coefficient cannot identify such hidden dependencies. We propose a new method based on association rule mining. Our method not only discovers hidden correlations that existing methods cannot, but also provides users with actionable knowledge to guide sleep improvement through lifestyle change.

Author Keywords
Personal informatics; sleep; activity tracking; quantified self; health; data mining; association rules mining.

ACM Classification Keywords
H.2.8. Data mining; J.3. Health; K.m. Miscellaneous.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author. Copyright is held by the owner/author(s). Ubicomp/ISWC'16 Adjunct, September 12-16, 2016, Heidelberg, Germany ACM 978-1-4503-4462-3/16/09. http://dx.doi.org/10.1145/2968219.2968319

Introduction

Nowadays many people use self-tracking tools to collect their sleep data as well as lifestyle data such as steps walked, exercise, and sedentary time. Although understanding the factors that contribute to sleep is recognized as critical for guiding sleep improvement, personal sleep data have rarely been analyzed together with other personal data.

Figure 1: An example of local correlation. Minutes sedentary and sleep efficiency are not globally correlated. However, when minutes sedentary is within the segment [800, 1000], the two attributes are positively correlated (local correlation).

Recently, several systems have been developed to discover correlations between sleep and lifestyle factors [1-3]. However, correlation analysis techniques such as the Pearson correlation coefficient or information gain can only discover dependencies between two attributes over the entire instance space, which we call global correlation. In some cases, the dependence between two attributes is strong only within certain ranges. As shown in Figure 1, the Pearson correlation coefficient reveals barely any global correlation between minutes sedentary and sleep efficiency. Nevertheless, if we divide the instance space of minutes sedentary into three segments, [400, 600), [600, 800), and [800, 1000], it is clear that minutes sedentary and sleep efficiency are positively correlated when minutes sedentary lies within [800, 1000]. We call this kind of segmental correlation between two attributes local correlation, as opposed to global correlation. Since segmental correlations are unlikely to be discovered by traditional correlation analysis techniques, we also call them hidden correlations. In this paper we present a method for discovering hidden segmental correlations between personal sleep and lifestyle factors. Our method is based on association rule mining. We will show that our method not only discovers local correlations effectively but also generates actionable knowledge that guides users to improve their sleep through lifestyle change.
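The difference between global and local correlation can be illustrated with a short plain-Python sketch. The data are synthetic and hypothetical (they are not the data behind Figure 1), and the helper names `pearson` and `segmental_pearson` are our own. The toy series is deliberately constructed so that the global Pearson coefficient is exactly zero while the coefficient restricted to the segment [800, 1000] is exactly one.

```python
import math
from typing import List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient (assumes neither input is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def segmental_pearson(xs: List[float], ys: List[float], lo: float, hi: float) -> float:
    """Pearson correlation computed only on entries whose x lies in the closed segment [lo, hi]."""
    pairs = [(x, y) for x, y in zip(xs, ys) if lo <= x <= hi]
    return pearson([x for x, _ in pairs], [y for _, y in pairs])


# Synthetic, hypothetical data: minutes sedentary vs. sleep efficiency (%).
sedentary = [400, 450, 500, 550, 600, 800, 850, 900, 950, 1000]
efficiency = [94, 92, 90, 88, 86, 86, 88, 90, 92, 94]

print(round(pearson(sedentary, efficiency), 3))                       # 0.0 (no global correlation)
print(round(segmental_pearson(sedentary, efficiency, 800, 1000), 3))  # 1.0 (perfect local correlation)
```

Restricting the computation to the entries inside a segment is the same kind of check the paper uses to validate discovered rules. Real self-tracking data would not produce such clean values.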


Related Work

Association rule mining was originally developed for discovering interesting relationships between items in market transaction datasets and was later applied to many other fields, such as bioinformatics [4]. An association rule is a relationship of the form A ⇒ B, where A is called the left-hand side (LHS) and B the right-hand side (RHS). Support, confidence, and lift are commonly used measures for evaluating the interestingness of a rule. In this paper, we convert self-tracking datasets into the format of market transaction datasets, in which sleep and lifestyle factors are the attributes. The goal is to discover association rules whose LHS contains one lifestyle factor restricted to a segment of its instance space and whose RHS is sleep quality.
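The three interestingness measures named above have standard definitions:
- support(A ⇒ B) is the fraction of transactions containing both A and B.
- confidence(A ⇒ B) = support(A ∪ B) / support(A).
- lift(A ⇒ B) = confidence(A ⇒ B) / support(B).

The sketch below computes these measures over day-level transactions. To mirror the stated goal, it enumerates only rules with a single LHS item and a sleep-quality RHS. It uses brute-force enumeration rather than Apriori, and the item strings, thresholds, and function names are illustrative assumptions.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(lhs: Iterable[str], rhs: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """support(LHS ∪ RHS) / support(LHS): how often the rule holds when its LHS holds."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    return support(lhs | rhs, transactions) / support(lhs, transactions)


def lift(lhs: Iterable[str], rhs: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """confidence / support(RHS); values above 1 indicate a positive association."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)


def single_factor_rules(transactions, rhs_items, min_support, min_confidence):
    """Brute-force search for rules {a} => {b}, with b drawn from `rhs_items`."""
    lhs_items = sorted({i for t in transactions for i in t} - set(rhs_items))
    rules = []
    for a in lhs_items:
        for b in rhs_items:
            s = support({a, b}, transactions)
            c = confidence({a}, {b}, transactions)
            if s >= min_support and c >= min_confidence:
                rules.append((a, b, s, c, lift({a}, {b}, transactions)))
    return rules


# Hypothetical day-level transactions.
days = [
    frozenset({"steps=high", "sleep=good"}),
    frozenset({"steps=high", "sleep=good"}),
    frozenset({"steps=high", "sleep=bad"}),
    frozenset({"steps=low", "sleep=bad"}),
    frozenset({"steps=low", "sleep=good"}),
]

for a, b, s, c, l in single_factor_rules(days, ["sleep=good", "sleep=bad"], 0.2, 0.6):
    print(a, "=>", b, round(s, 3), round(c, 3), round(l, 3))
# steps=high => sleep=good 0.4 0.667 1.111
```

The three other candidate rules in the toy data reach the support threshold but fall below the confidence threshold, so they are discarded.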

Mining Association Rules between Sleep and Lifestyle

We assume that users' quantified-self data are collected on a daily basis using wearable devices, mobile apps, or diaries. To avoid unnecessary complexity, we selected sleep efficiency (SE) as the sleep quality metric. The proposed data mining algorithm can equally be applied to other sleep quality metrics, such as sleep duration and sleep onset latency. Lifestyle factors include steps, distance, minutes very active, minutes lightly active, weight, water intake, bed time, and so on. The goal of the mining task is to discover association rules whose LHS is a lifestyle factor and whose RHS is good or bad sleep, so that each association rule represents a segmental correlation between a lifestyle factor and sleep.
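A day-level record can be turned into a transaction by mapping each lifestyle value to an item that names its segment, then adding a sleep-quality item. This minimal sketch rests on several assumptions made for illustration: the segment boundaries, the field names (`steps`, `sleep_efficiency`), and the choice that an SE of exactly 95% counts as good sleep. The 95% boundary itself comes from the discretization step described in this paper.

```python
import bisect
from typing import Dict, FrozenSet, List

GOOD_SLEEP_THRESHOLD = 95.0  # SE (%) boundary; treating SE >= 95 as good sleep is an assumption


def segment_label(value: float, edges: List[float]) -> str:
    """Name the segment of `edges` that contains `value`.

    Segments are half-open [e_i, e_{i+1}) except the last, which is closed.
    Values outside [edges[0], edges[-1]] are clamped into the first or last segment.
    """
    i = bisect.bisect_right(edges, value) - 1
    i = min(max(i, 0), len(edges) - 2)
    close = "]" if i == len(edges) - 2 else ")"
    return f"[{edges[i]:g},{edges[i + 1]:g}{close}"


def to_transaction(day: Dict[str, float], edges: Dict[str, List[float]]) -> FrozenSet[str]:
    """Encode one day's record as nominal items: one per lifestyle factor plus sleep quality."""
    items = {f"{name}={segment_label(day[name], cuts)}" for name, cuts in edges.items()}
    items.add("good sleep" if day["sleep_efficiency"] >= GOOD_SLEEP_THRESHOLD else "bad sleep")
    return frozenset(items)


# Hypothetical segment boundaries and a hypothetical day.
edges = {"steps": [3000, 9000, 14547, 16206, 20263]}
day = {"steps": 15000, "sleep_efficiency": 96.0}
print(sorted(to_transaction(day, edges)))  # ['good sleep', 'steps=[14547,16206)']
```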

SESSION: NEWFRONTIERSQS

The mining procedure is described informally below. As algorithm design is not the main concern of this paper, the mathematical formulation of the problem and the formal description of the algorithm are omitted.

Discretization of Attributes

Association rule mining was developed for nominal attributes. However, quantified-self data usually contain many continuous or non-numerical attributes, which must be converted into nominal attributes. For sleep efficiency, we used 95% as the partition boundary to classify sleep into good sleep and bad sleep [5]. For lifestyle factors, we used three unsupervised discretization methods:
- an interval-based method (equal-width interval binning);
- a frequency-based method (equal-frequency interval binning);
- a cluster-based method (k-means clustering).

Each continuous attribute was discretized into N segments, where N is a user-specified parameter. The choice of discretization method may directly affect the identified segmental correlations.

Mining Algorithm

The algorithm used for mining hidden local correlations was a modified version of the algorithm proposed in [6]. Its general flow was as follows:

1. Data cleaning: removing measurement errors and unreliable data entries, e.g., entries with steps < 1000 or with bed time within [7:00, 19:00].
2. Data preprocessing: converting continuous and non-numerical attributes into nominal attributes, as described in the previous subsection.
3. Data mining: running the Apriori association rule mining algorithm [7].
4. Rule pruning: removing redundant rules, and filtering out rules whose LHS consists of more than one factor (for ease of interpretation).

Figure 2: Histogram of sleep efficiency in Dataset I~III.
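The three unsupervised discretization methods used in step 2 can be sketched as functions that return segment boundaries for a single attribute. This is a standard-library sketch, not the authors' implementation. Three details are our own assumptions:
- the function names;
- the deterministic order-statistic initialization of 1-D k-means (Lloyd's algorithm);
- the rule of placing boundaries midway between adjacent centroids.

```python
from typing import List


def equal_width_edges(values: List[float], n: int) -> List[float]:
    """Interval-based: split [min, max] into n segments of identical width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n
    return [lo + i * width for i in range(n)] + [hi]


def equal_frequency_edges(values: List[float], n: int) -> List[float]:
    """Frequency-based: edges at sorted positions so each segment holds about len(values)/n entries.

    Repeated values can produce duplicate edges; these are dropped, so fewer than n segments may result.
    """
    s = sorted(values)
    cuts = [s[i * len(s) // n] for i in range(n)] + [s[-1]]
    return sorted(set(cuts))


def kmeans_1d_edges(values: List[float], n: int, max_iter: int = 100) -> List[float]:
    """Cluster-based: 1-D k-means (Lloyd's algorithm); edges lie midway between adjacent centroids."""
    s = sorted(values)
    # Deterministic initialization at evenly spaced order statistics (an assumption).
    centroids = [s[(2 * i + 1) * len(s) // (2 * n)] for i in range(n)]
    for _ in range(max_iter):
        clusters: List[List[float]] = [[] for _ in range(n)]
        for v in s:
            nearest = min(range(n), key=lambda k: abs(v - centroids[k]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster becomes empty.
        updated = [sum(c) / len(c) if c else centroids[k] for k, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    centroids = sorted(centroids)
    mids = [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]
    return [s[0]] + mids + [s[-1]]


vals = [1, 2, 3, 10, 11, 12, 20, 21, 22]  # hypothetical attribute values
print(equal_width_edges(vals, 3))      # [1.0, 8.0, 15.0, 22]
print(equal_frequency_edges(vals, 3))  # [1, 10, 20, 22]
print(kmeans_1d_edges(vals, 3))        # [1, 6.5, 16.0, 22]
```

On these toy values all three methods separate the three visible groups, but they place the boundaries differently. This is one way the choice of method can change the segments that appear in mined rules.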

Experiment and Results

In this section we present:
- basic statistics of the datasets;
- the association rule mining results;
- the impact of parameter N (the number of discretization segments);
- the individuality of the discovered association rules.

Data Collection

We prepared five datasets for evaluating the proposed method. Dataset I was collected between December 2015 and May 2016 (approximately 180 days) using a Fitbit Charge HR. Datasets II~V were collected between July 2015 and September 2015 in our previous study [3], under ethics approval obtained from the University of Melbourne. After data cleaning, the datasets used for the final analysis had 7 attributes: sleep efficiency, bed time, steps, distance, minutes sedentary, minutes very active, and minutes lightly active. Other attributes, such as weight, water, and floors climbed, were not included because they were either unreliable or not tracked. The histograms of sleep efficiency, shown in Figure 2 and Figure 3, indicate that personal sleep demonstrates strong individuality.

Association Rule Mining Results

We first discretized the lifestyle attributes of Dataset I into 10 segments using the frequency-based discretization method. Applying the Apriori association rule mining algorithm yielded 10 association rules. None of these rules was redundant, so no rule was removed in the pruning process. The LHS of every rule contained only one lifestyle factor. The discovered rules are as follows:

Rule 1: Minutes very active=[8,16) => good sleep
Rule 2: Minutes very active=[33,38) => good sleep
Rule 3: Minutes lightly active=[301,335) => good sleep
Rule 4: Minutes sedentary=[618,639) => good sleep
Rule 5: Minutes sedentary=[807,972] => good sleep
Rule 6: Steps=[14547,16206) => good sleep
Rule 7: Steps=[18658,20263] => good sleep
Rule 8: Distance=[9.56,9.85) => good sleep
Rule 9: Distance=[10.95,12.47) => good sleep
Rule 10: Bed time=[23:26,23:32) => good sleep

Figure 3: Histogram of sleep efficiency in Dataset IV~V.

Figure 4: The impact of parameter N on mining results.

Rule No. | Segmental Pearson Correlation | Global Pearson Correlation | Global Distance Correlation
1 | 0.73 | 0.14 | 0.19
2 | -0.13 | 0.14 | 0.19
3 | -0.11 | -0.06 | 0.19
4 | -0.75 | 0.06 | 0.19
5 | 0.58 | 0.06 | 0.19
6 | -0.23 | 0.08 | 0.17
7 | 0.22 | 0.08 | 0.17
8 | -0.22 | 0.08 | 0.18
9 | 0.44 | 0.08 | 0.18
10 | 0.47 | -0.02 | 0.21

Taking the first rule as an example, the interpretation is as follows: when the subject was very active for 8~16 minutes during the day, it is very likely that she would have good sleep at night. This provides straightforward guidance on what the subject could do to sleep better (e.g., be very active for 8~16 minutes per day). Similarly, the subject may walk more than 14000 steps during the day in order to sleep well. The other rules can be interpreted in a similar way. To validate the discovered association rules, we extracted the data entries where the LHS is satisfied. We then analyzed the segmental Pearson correlation between the lifestyle factor and sleep within that subset. Table 1 compares segmental Pearson correlations, global Pearson correlations, and global distance correlations [8] between sleep and the relevant lifestyle factors.
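Distance correlation [8], used for the global comparison in Table 1, can reveal dependence that Pearson's coefficient misses, because it is not restricted to linear relationships. The plain-Python sketch below implements the sample distance correlation in its V-statistic form, based on double-centered distance matrices. It applies the measure to a hypothetical y = x² series, for which the Pearson coefficient is exactly zero. This is an illustration under these assumptions, not the authors' code. Its O(n²) memory use suits only small day-level datasets.

```python
import math
from typing import List


def _double_centered(xs: List[float]) -> List[List[float]]:
    """Pairwise |x_j - x_k| matrix with row means, column means and grand mean removed."""
    n = len(xs)
    d = [[abs(a - b) for b in xs] for a in xs]
    means = [sum(row) / n for row in d]  # symmetric matrix: row means equal column means
    grand = sum(means) / n
    return [[d[j][k] - means[j] - means[k] + grand for k in range(n)] for j in range(n)]


def distance_correlation(xs: List[float], ys: List[float]) -> float:
    """Sample distance correlation (V-statistic form of Szekely et al.); lies in [0, 1]."""
    n = len(xs)
    a, b = _double_centered(xs), _double_centered(ys)
    dcov2 = sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / n ** 2
    dvar_x = sum(v * v for row in a for v in row) / n ** 2
    dvar_y = sum(v * v for row in b for v in row) / n ** 2
    if dvar_x == 0 or dvar_y == 0:
        return 0.0  # a constant variable: distance correlation is defined as 0
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient (assumes neither input is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


# Hypothetical, purely nonlinear relationship: y = x^2 on a symmetric range.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
print(round(pearson(xs, ys), 3))               # 0.0
print(round(distance_correlation(xs, ys), 2))  # 0.52
```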


Table 1: A comparison of Segmental Pearson Correlation, Global Pearson Correlation, and Global Distance Correlation between sleep and lifestyle factors (frequency-based discretization, N=10).

Association rule mining discovered hidden segmental correlations that traditional global correlation analysis failed to discover. Global Pearson correlations showed that none of the lifestyle factors was globally correlated with sleep. However, global distance correlations indicated that dependencies did exist, as the distance correlation coefficients were non-zero [8]. Indeed, the lifestyle factors were correlated with sleep locally, within the segments defined in the association rules. For example, the global Pearson correlation between minutes sedentary and sleep efficiency is 0.06, indicating no correlation. However, association rule mining discovered a rule stating that minutes sedentary=[807, 972] is associated with good sleep. After extracting the data entries where minutes sedentary lies in the segment [807, 972], the local correlation between minutes sedentary and sleep in that subset is 0.58. This indicates a moderate positive correlation and confirms the association rule. It is also worth noting that rules No. 2 and No. 3 did not yield high local Pearson correlations. A possible explanation is that Pearson correlation captures only linear correlation, whereas the segmental dependencies are not necessarily linear.

Impact of Parameter N

Still applying the frequency-based discretization method to Dataset I, we investigated the impact of parameter N (N = 2~10), the number of discretization segments. Figure 4 plots the number of rules discovered as N changes. The general trend is that as N increased:
(1) the number of rules discovered decreased;
(2) the number of redundant rules (= total number of rules - number of pruned rules) decreased;
(3) the number of compact rules (i.e., rules whose LHS has only one factor) increased.

Individuality of Discovered Associations

We applied the proposed approach to Datasets II~V. The identified association rules demonstrated strong individuality. For instance, Table 2 summarizes the rules related to minutes very active. On the one hand, this lifestyle factor was associated with good sleep for P1, P3, and P4, but the ideal quantity differed for each person. On the other hand, the same quantity of active minutes (e.g., minutes very active=[6, 20)) could be associated with good sleep in one person (P3) but with bad sleep in another (P2).

ID | Association Rules
P1 | Minutes very active=[47,96] => good sleep
P1 | Minutes very active=[32,47) => good sleep
P1 | Minutes very active=[13,32) => good sleep
P1 | Minutes very active=[2,6) => bad sleep
P2 | Minutes very active=[20,30) => bad sleep
P2 | Minutes very active=[30,56) => bad sleep
P2 | Minutes very active=[6,20) => bad sleep
P3 | Minutes very active=[6,20) => good sleep
P3 | Minutes very active=[50,58) => good sleep
P4 | Minutes very active=[40,50) => good sleep
P4 | Minutes very active=[58,92) => good sleep
P4 | Minutes very active=[29,40) => good sleep

Table 2: Individuality in identified association rules for four participants. The same lifestyle factor (minutes very active) has distinct associations with sleep efficiency for each person.


Discussion

By comparing the discovered association rules with segmental Pearson correlation coefficients, we confirmed that our method can discover hidden segmental correlations that traditional global correlation analysis cannot. As for the impact of parameter N, the general trend is that a small N leads to more redundant rules and longer rules. In addition, the cluster-based discretization method is very sensitive to parameter N. However, analyses under different scenarios are required to draw general conclusions. The identified association rules demonstrated strong individuality as well as commonality. The relationship between the two (individuality and commonality) may be an interesting topic worth exploring. We also observed that the association mining results were bounded by users' lifestyle context. The optimal ranges of lifestyle factors indicated by association rules are relative to users' current lifestyles rather than absolutely ideal ranges. To explore the personal optimal ranges of a lifestyle factor, self-tracking experiments need to be carefully designed in a controlled manner [9].

Conclusion

In this study, we proposed a new method based on association rule mining to discover hidden segmental dependencies between sleep and lifestyle from personal quantified-self data. Experimental results confirmed that our method effectively discovered segmental correlations that traditional correlation analysis techniques failed to detect. The results also showed that the discovered association rules provided actionable knowledge on lifestyle change for sleep improvement.


This is an ongoing research project. Our overarching goal is to establish a comprehensive data mining protocol for analyzing the factors that contribute to personal sleep. As a next step, we intend to incorporate significance testing into the association rule mining process. In addition to sleep efficiency, we will extend the dimensions of sleep quality by considering more metrics, such as minutes asleep, minutes awake, sleep onset latency, and number of awakenings, and will modify the proposed method accordingly. Finally, the proposed method will be tested and evaluated on more quantified-self datasets.

Acknowledgements

This study was supported by the New Energy and Industrial Technology Development Organization (NEDO). In addition, the first author gratefully acknowledges support from the Australian Government Endeavour Research Fellowship and Microsoft BizSpark.

References

1. Frank Bentley, Konrad Tollmar, Peter Stephenson, et al. 2013. Health Mashups: Presenting Statistical Patterns between Wellbeing Data and Context in Natural Language to Promote Behavior Change. ACM Transactions on Computer-Human Interaction (TOCHI) 20(5): 30.
2. Eun Kyoung Choe, Bongshin Lee, Matthew Kay, et al. 2015. SleepTight: Low-Burden, Self-Monitoring Technology for Capturing and Reflecting on Sleep Behaviors. In Proc. of UbiComp 2015, 121-132.
3. Zilu Liang, Bernd Ploderer, Wanyu Liu, et al. In press. SleepExplorer: a visualization tool to make sense of correlations between personal sleep data and contextual factors. Personal and Ubiquitous Computing.
4. C. Creighton and S. Hanash. 2003. Mining gene expression databases for association rules. Bioinformatics 19(1): 79-86.
5. M. A. Carskadon and W. C. Dement. 2005. Normal human sleep: an overview. In M. H. Kryger, T. Roth, and W. C. Dement (Eds.), Principles and Practice of Sleep Medicine, 4th ed. Elsevier Saunders, Philadelphia, PA, 13-23.
6. Zilu Liang, Bernd Ploderer, Mario Alberto Chapa Martell, and Takuichi Nishimura. 2016. A Cloud-based Intelligent Computing System for Contextual Exploration on Personal Sleep-tracking Data Using Association Rule Mining. In Proc. of ICISC 2016.
7. R. Agrawal and R. Srikant. 1994. Fast algorithms for mining association rules in large databases. In J. B. Bocca, M. Jarke, and C. Zaniolo (Eds.), Proc. of the 20th International Conference on Very Large Data Bases (VLDB), Santiago, Chile, 487-499.
8. G. J. Székely, M. L. Rizzo, and N. K. Bakirov. 2007. Measuring and testing independence by correlation of distances. Annals of Statistics 35(6): 2769-2794.
9. E. O. Lillie, B. Patay, J. Diamant, et al. 2011. The n-of-1 clinical trial: the ultimate strategy for individualizing medicine? Personalized Medicine 8(2): 161-173.