Correlations between objective behavioral features

Systematic review

Correlations between objective behavioral features collected from mobile and wearable devices, and depressive mood symptoms in affective disorders: A systematic review

Darius A. Rohani1,2, Maria Faurholt-Jepsen3, Lars V. Kessing3, Jakob E. Bardram1,2

1 Department of Applied Mathematics and Computer Science, Technical University of Denmark
2 Copenhagen Center for Health Technology, Denmark
3 Psychiatric Center Copenhagen, Rigshospitalet

Abstract

Background: Several studies have recently reported on the correlation between objective behavioral features collected via mobile and wearable technologies and depressive mood symptoms in affective disorders (unipolar disorder and bipolar disorder). However, individual studies have reported different and sometimes contradictory results, and no quantitative systematic review of the correlation between objective behavioral features and depressive mood symptoms has been published.

Objective: The objectives of this systematic review were to 1) provide an overview of correlations between objective behavioral features and depressive mood symptoms reported in the literature, and 2) investigate the strength and statistical significance of these correlations across studies. The answers to these questions could help identify which objective features have shown the most promising results across studies.

Methods: A systematic review of the scientific literature, reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, was conducted. IEEE Xplore, ACM Digital Library, Web of Science, PsycINFO, PubMed, DBLP computer science bibliography, HTA, DARE, Scopus and ScienceDirect were searched and supplemented by hand examination of reference lists. The search ended on April 27, 2017 and was limited to studies published 2007–2017.

Results: A total of 46 studies were eligible for the review. These studies identified and investigated 85 unique objective behavioral features covering 17 different sensor data inputs. These features can be categorized into seven overall categories. Several features were found to have statistically significant and consistent correlation directionality with mood assessment (e.g., the amount of home stay, sleep duration, vigorous activity), while others showed directionality discrepancies across the studies (e.g., amount of SMS sent, time spent between locations, frequency of smartphone screen activity).

Conclusions: Several studies showed consistent and statistically significant correlations between objective behavioral features collected by mobile and wearable technology and depressive mood symptoms. Hence, continuous and everyday monitoring of behavioral aspects in affective disorders could be a promising supplementary objective measure to estimate depressive mood symptoms. However, the evidence is limited by methodological issues in individual studies and by a lack of standardization of 1) the collected objective features, 2) the mood assessment methodology, and 3) the statistical methods applied. Therefore, consistency in data collection and analysis is needed in future studies to make replication studies and meta-analyses possible.

Keywords: Mood Disorder; Affective Disorder; Depression; Depressive Mood Symptoms; Bipolar Disorder; Objective Features; Correlation; Behavior; Sensor Data; Mobile Health; mHealth; Smartphone; Wearables; Systematic Review

Introduction

Recently, there has been an increasing body of research investigating the use of mobile and wearable technologies as a treatment intervention for depression [1]. Several mobile solutions have been proposed that use self-monitoring and intervention-based treatment of depression (see [2–5] for a review). One particular research approach taken by many research groups has been to investigate how objectively measured behavioral features such as 'location' and 'social interaction' correlate with depression, and thereby try to differentiate euthymic and depressed states (e.g., [6–11]). For example, by using a smartphone application passively recording information from sensors in the phone, Saeb et al. [7] showed a statistically significant correlation between six different objective features, including smartphone usage frequency, and self-assessed mood using the Patient Health Questionnaire 9 (PHQ-9) scale [12] in non-clinical samples. Similarly, Faurholt-Jepsen et al. [6] found five different objective features, including the number of outgoing text messages, which had a statistically significant positive correlation with depression severity as assessed using the Hamilton Depression Rating Scale (HDRS) in patients with bipolar disorder (BD).

The diagnostic process, as well as the process of symptom severity assessment in affective disorder, is based upon a combination of clinical evaluations and patient information, and there is a lack of objective markers of, for example, trait and state. Digital behavioral markers have been defined as higher-level features, reflecting behaviors, cognitions, and emotions, which are measured using low-level features and sensor data collected from digital technology, including mobile and wearable computing [13]. Many studies have found statistically significant correlations between objective behavioral features collected from mobile and wearable technology and mood symptoms in non-clinical samples of participants without psychiatric illness (e.g., [14–17]) as well as in clinical samples of patients diagnosed with psychiatric disorders (e.g., [11,18–20]). The discovery of such significant correlations between objective features and depressive mood symptoms has raised great enthusiasm regarding using mobile and wearable technology in the treatment and monitoring of depression and other affective disorders. It has been argued that such an approach may provide an easy and objective way to monitor illness activity and could serve as a digital marker of mood symptoms in affective disorders [13,18]. Thus, if there is a well-established correlation between a specific digital marker, such as the number of steps taken, and depressive mood symptoms, it would in practice be possible to develop an entirely automatic monitoring system. When, for example, the measured objective feature deviates from healthy behavior, an alarm or trigger could be raised in the clinic, which could then contact the patient [21].

However, when looking across individual studies, it is not easy to identify which objective features consistently correlate with depressive mood symptoms and in what way. Some studies show similar results, while others show contradictory results. For example, Beiwinkel et al. [22] found a statistically significant negative correlation between the number of outgoing text messages and the HDRS, whereas Faurholt-Jepsen et al. [6] found a statistically significant positive correlation. Asselbergs et al. [15] found a negative correlation between smartphone usage frequency and depressive symptoms, while Saeb et al. [7] found the opposite. No prior work has presented a comprehensive quantitative overview of objectively collected mobile features and how they relate to depressive mood symptoms.
A more qualitative overview has recently been provided by Dogan et al. [5], which highlights different mobile systems developed for affective disorders that record subjective and objective features. They describe the findings of 29 different studies divided into different feature categories such as physical activity, location, and phone usage, in a study-by-study evaluation. Hence, a relevant question is to what degree studies show similar or different correlations between objective features and depressive mood symptoms, and how strong these correlations are. The purpose of this paper is to provide a systematic review of the available studies investigating the correlation between objectively collected features from mobile and wearable technologies and depressive mood symptoms measured using various methods. The systematic review aims to answer: (i) Which objective features have been collected? (ii) What is the correlation between objective features and depressive mood symptoms? (iii) Are the correlations similar across studies collecting the same features? Answering these questions could help us identify which objective features have shown the most consistency across multiple studies and assist in designing future studies using technologies for objective assessment of depressive mood symptoms.

Methods

We initiated the systematic review by following the PICO worksheet guideline [23]. We then conducted and reported the systematic review according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [24].

Inclusion and Exclusion criteria

The included original papers met the following inclusion criteria:
(1) The study involved any type of objectively measured features.
(2) The data were collected from a smartphone or other nonintrusive consumer-based mobile or wearable technology.
(3) Participants were assessed on a mood scale, which includes self-reported scales (e.g., PHQ-9) or clinical diagnostic scales (e.g., HDRS) used within psychiatry to quantify abnormal depressed mood, either before, during, or after the study period.
(4) A comparison of the objective features and the assessed depression scale, between or within subjects, was available or provided upon request from the respective corresponding author.
(5) As defined by the PICO Search Strategy, the following publication types were included: Meta-Analysis, Cohort study, Systematic Review, Case-Control Study, Randomized Controlled Trial, and Case series/report.

To ensure a broad inclusion of studies investigating the relationship between objective features and mood symptoms, the third criterion was deliberately chosen to reflect a broad selection of clinical and non-clinical participants rated on different mood scales. This includes both commonly used and clinically verified rating scales such as the HDRS and PHQ-9, as well as non-standard scales designed for a specific usage or technology, such as the 7-point (-3 to 3) scale used in the MONARCA project [25,26].

We excluded original papers on the following premises:
(1) Non-quantitative studies, or studies in which only subjective features were collected.
(2) No English version of the paper was available.
(3) Studies that included participants with disorders other than mood disorders.
(4) Non-human participants.
(5) Studies within social media, since this has been thoroughly investigated elsewhere [27].
(6) Studies with participants < 18 years of age [28], to keep the focus on behavioral objective features collected on adults.
(7) Studies before 2007/01/01.
(8) Articles that have not been published through peer review.
(9) The following publication types: trial protocols, in vitro/lab research, animal research, and editorials/letters/opinions.

Search strategy

The corresponding author (DAR) searched the following databases on the 25th of November 2016 to target both clinical and technical scientific literature: IEEE Xplore, ACM Digital Library, Web of Science, PsycINFO, PubMed, DBLP computer science bibliography, HTA, DARE, Scopus and ScienceDirect. Systematic reviews and meta-analysis publications were included in the search for a subsequent cited reference search, which was conducted on the 27th of April 2017. A broad database-specific search string was designed to target all studies that investigate mood disorders within a mobile setting. The specific search string for PubMed was:

(smartphone OR mobile OR wearable OR “smart phone” OR app OR apps) AND (depression OR bipolar OR unipolar OR “affective disorder” OR “mental health” OR “mood disorder”) AND (“2007/01/01”[Date – Publication] : “2017/01/01”[Date – Publication]) AND English[Language]

The search strings for the other databases can be found in the Supplementary Appendix 1. The resulting publications were combined into one large spreadsheet, using an in-house Matlab script, with header information: database, title, author, publication year, publication type, and publisher.
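The in-house Matlab script is not reproduced here. As a rough illustration only, the combination step could look like the following Python sketch. The function names, the record layout, and the rule for spotting repeated records (matching on a normalized title) are all assumptions made for this example, not the authors' procedure.

```python
from typing import Dict, List

# Header fields named in the text; the snake_case spelling is an assumption.
HEADER = ["database", "title", "author", "publication_year",
          "publication_type", "publisher"]


def normalize_title(title: str) -> str:
    """Lower-case a title and keep only letters and digits."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def combine_exports(exports: Dict[str, List[dict]]) -> List[dict]:
    """Merge per-database record lists into one list of rows.

    Records whose normalized title was already seen are skipped
    (hypothetical duplicate rule, chosen for illustration).
    """
    combined: List[dict] = []
    seen = set()
    for database, records in exports.items():
        for record in records:
            key = normalize_title(record.get("title", ""))
            if key in seen:
                continue
            seen.add(key)
            row = {field: record.get(field, "") for field in HEADER}
            row["database"] = database  # tag each row with its source
            combined.append(row)
    return combined
```

Each resulting row carries the six header fields, so the list can be written out as a spreadsheet with any CSV writer.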

Study selection

After removal of duplicates, studies were screened for eligibility in two phases. In phase 1, one author (DAR) excluded studies based on the title. The title revealed several exclusion criteria, including different disorders (Alzheimer's disease, schizophrenia, diabetes, chronic pain, autism, Parkinson's disease, PTSD, or anorexia nervosa), non-human experiments, smartphone addiction topics, a focus on diary methods involving only subjective data, the use of Internet-based interventions, and non-medical topics such as bipolar electricity. In phase 2, one author (DAR) went through the abstract. If eligible, the full text was retrieved and reviewed. Exclusion reasons included no objective features collected, only self-assessment, and studies concerning emotion. The resulting list, together with review papers from phase 1, was then used in a cited reference search by two authors (DAR, JEB) to produce the final list. The final list was critically investigated by all authors, which led to the exclusion of 16 papers due to outcome measures that did not represent mood assessment (e.g., happiness scales [29–31], Quality of Life [32], or Satisfaction With Life Scale [33], as these do not reflect abnormal depressed mood), or wearables that were not deemed consumer-based, e.g., a Holter monitor [34] or multisensory clothing [35–37]. Several studies only reported correlation strengths or did not include correlation results between the objective features and the outcome assessment (e.g., [9,14,31,38]). The corresponding authors of these studies were contacted via email, and the relevant data were acquired in all cases.

Results of the study selection are outlined in Figure 1.

Data extraction

Data were extracted from the final list by one author (DAR) in a predetermined format validated by a second author (JEB). The data were extracted into two separate tables: one for non-clinical samples of participants without psychiatric illness (Table 1), and one for clinical samples of patients diagnosed with Unipolar Disorder (UD) or BD (Table 2). The division into two tables was reviewed by all authors. Both tables list the following data for each study: first author, year of publication, the specification of the mobile device, number of participants, participant demographics (% female, mean ± SD age), days of the study, and the outcome depression scale. Table 2 includes a diagnosis column. The supplementary material contains expanded versions of Tables 1 and 2 (labeled Appendix 2 and 3), which also include information about the recruitment method and the method of assessing the relation between objective features and mood symptoms (e.g., Pearson's correlation, two-sample t-test). Tables S1 and S2 in the supplementary material provide a detailed overview of the different features for each study, classified into a feature category, the sensor used, a short description, and the results with respect to the mood assessment.

Data analysis

We were interested in investigating the correlation between behavioral objective features and depressive mood symptoms across all the included individual studies. To do this, we first identified all types of objective features that have been applied in the eligible studies. The features were presented in a nomenclature list to create a standardized definition across all studies (Table 3). Furthermore, we visualized the popularity of the different features and their statistical significance with respect to the correlation with mood assessment (Figure 2). Secondly, we were interested in investigating the strength of the correlation between objective features and depressive mood symptoms (i.e., the correlation coefficient) across the included studies. The investigation was carried out by combining the directionality of the correlation values for identical objective features, weighted by the respective sample size; this weighted directionality is plotted on the x-axis and the total sample size (log-transformed) on the y-axis. This was done in two separate graphs, one representing non-clinical samples of participants (Figure 3, representing data from Table 1) and the other representing clinical samples of patients diagnosed with either UD or BD (Figure 4, representing data from Table 2). The two groups would most likely display different behavior, and the separation was done on this premise. However, a combined result is displayed in the supplementary files (Appendix 4) for the convenience of the reader.

A positive directionality indicates that a larger quantity of the respective feature tends to give a higher depression score, i.e., a lower mood score (positive correlation with the depression score), while a negative directionality indicates that a larger quantity of the feature value tends to give a lower depression score, i.e., a higher mood score (negative correlation with the depression score). All correlation values with outcome measures in which larger values represent a better mood outcome were multiplied by -1, to achieve the same weighted correlation directionality across studies. A meta-analysis of the specific correlation values was not considered for this systematic review. The heterogeneity across the studies was too substantial to do any valid meta-analysis of correlations. Not only were different analysis methods applied (e.g., some using within-subject correlation, others between-subject correlation; some using day-averaged data, others week-averaged data), but different apparatus and mood assessments were also used. However, there is a clear correlation directionality invariance shown by studies comparing different analysis methods [6,22], and by studies replicating the same analysis methods on different datasets [7,39], which argues that the directionality is a stable metric. Regarding the specific correlation values, we still encourage the reader to look at the results across studies using Tables S1 and S2 as a reference.
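The sign convention described above (multiplying by -1 whenever a larger outcome value means better mood) can be illustrated with a minimal Python sketch. The function name and the boolean flag are hypothetical and introduced only for this example.

```python
def to_depression_direction(correlation: float,
                            higher_is_better_mood: bool) -> float:
    """Orient a correlation so that a positive value always means
    'more of the feature goes with more depressive symptoms'.

    higher_is_better_mood is a hypothetical flag: True for scales where
    larger values mean better mood, False for depression scales where
    larger values mean more severe symptoms.
    """
    return -correlation if higher_is_better_mood else correlation
```

For instance, an illustrative correlation of 0.4 with a mood scale where higher is better becomes -0.4 after orientation, while the same value measured against a depression severity scale is left unchanged.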

Results

From 3,507 potentially eligible studies, 46 met the criteria of the review. A flowchart of the screening process is shown in Figure 1.

[INSERT FIGURE 1 HERE]
Caption: A flowchart illustrating the number of reviewed studies through the different phases. An exhaustive cited search was performed on the eligible studies, as represented by the “Additional records identified through cited search” box.

Characteristics of the included studies are summarized in Tables 1 and 2. Table 1 lists studies including non-clinical samples of participants (n = 20) and Table 2 lists studies including clinical samples of patients diagnosed with either UD or BD (n = 26). A more detailed overview of the included studies is listed in the supplementary material as Tables S1 and S2.

Table 1. Summary of the included studies with non-clinical samples of participants, listing: author details, technology used, including operating system (OS) and name of sensor software (when specified), number of participants (N) with percentage of females (F), age information, the study duration (days), and the applied mood scale.

| First author; year | Technology | N (%F) | Age | Days | Scale |
|---|---|---|---|---|---|
| Asselbergs; 2016 [15] | Android; Funf | 27 (78%) | 21.1±2.2 | 36 | 10p mood |
| Baras; 2016 [40] | Android; EmotionStore | 10 (10%) | N/A^a | 14 | BRUMS^b |
| Becker; 2016 [41] | Android; Funf | 27 (N/A) | N/A | 42 | Mood |
| Ben-Zeev; 2015 [42] | Android | 47 (N/A) | 22.5 | 70 | PHQ-9 |
| Berke; 2011 [43] | Multisensor (waist) | 8 (50%) | 85.3±4.1 | 10 | CES-D |
| Canzian; 2015 [9] | Android; MoodTraces | 28 (46%) | 31 | 71 | PHQ-8 |
| Cho; 2016 [44] | Phone records | 532 (56%) | 57 | N/A | BDI-21 |
| Chow; 2017 [45] | Android | 72 (51%) | 19.8±2.4 | 17 | DASS-21 |
| DeMasi; 2016 [46] | Android | 44 (61%) | N/A | 56 | BDI-21 |
| Edwards; 2016 [47] | Digi-Walker Pedometer | 39 (59%) | 21.82 | 7 | PHQ-9 |
| Farhan; 2016 [17] | Android/iOS; LifeRhythm | 79 (74%) | 18-25 | N/A | PHQ-9 |
| Mark; 2016 [48] | Fitbit flex | 40 (50%) | N/A | 12 | Affect balance |
| Matic; 2011 [16] | Windows M. 6.5; MyExperience | 9 (33%) | 28.4±2.8 | 7 | rPOMS |
| Mehrotra; 2016 [49] | Android | 25 (N/A) | N/A | 30 | PHQ-8 |
| Mestry; 2015 [14] | Android | 2 (50%) | 22 | 34 | DASS-21 |
| Pillai; 2014 [50] | Actigraph | 39 (74%) | 19.55±3.2 | 7 | BDI-21 |
| Saeb; 2015 [7] | Android; Purple robot | 28 (71%) | 28.9±10.1 | 14 | PHQ-9 |
| Saeb; 2016 [39] | Android; Studentlife | 48 (21%) | N/A | 70 | PHQ-9 |
| Wang; 2014 [51] | Android; Studentlife | 48 (21%) | N/A | 70 | PHQ-9 |
| Wang; 2015 [52] | Android; Studentlife | 37 (N/A) | N/A | 70 | PHQ-9 |
| Total | | 1,189 | | | |

^a N/A: Not available
^b Depression subscale of BRUMS

Table 2. Summary of the included studies with clinical samples of participants diagnosed with UD or BD, listing: author details, technology used, including operating system (OS) and name of sensor software (when specified), number of participants (N) with percentage of females (F), the clinical diagnosis (Diag), age information, the study duration (Days), and the applied mood scale.

| First author; year | Technology | N (%F) | Diag | Age | Days | Scale |
|---|---|---|---|---|---|---|
| Abdullah; 2016 [53] | Android; MoodRhythm | 7 (71%) | BD | 25-64 | 28 | SRM II-5 |
| Alvarez-Lozano; 2014 [11] | Android; Monarca | 18 (N/A) | BD | N/A | 150 | 7p mood |
| Beiwinkel; 2016 [22] | Android; SIMBA | 13 (39%) | BD | 47.2±3.8 | 365 | HDRS |
| Berle; 2010 [54] | Actigraph | 23 (56%) | UD | 42.8±11 | 14 | Group difference |
| Dickerson; 2011 [55] | iOS; Empath | 1 (100%) | UD | 83 | 14 | 10p mood |
| Doryab; 2016 [18] | Android | 6 (50%) | UD | >18 | 20 | CES-D |
| Faurholt-Jepsen; 2012 [56] | Actiheart | 20 (60%) | UD | 45.2±12 | 3 | Group difference |
| Faurholt-Jepsen; 2015 [57] | Actiheart | 18 (61%) | UD | 45.6±11.1 | 3 | HDRS-17 |
| Faurholt-Jepsen; 2016 [58] | Android; Monarca | 28 (68%) | BD | 30.3±9.3 | 84 | HDRS-17 |
| Faurholt-Jepsen; 2014 [10] | Android; Monarca | 17 (71%) | BD | 33.4±9.5 | 90 | HDRS-17 |
| Faurholt-Jepsen; 2015 [59] | Android; Monarca | 61 (67%) | BD | 29.3±8.4 | 182 | HDRS-17 |
| Faurholt-Jepsen; 2016 [6] | Android; Monarca | 29 (62%) | BD | 30.2±8.8 | 84 | HDRS-17 |
| Gershon; 2016 [60] | Actigraph | 37 (62%) | BD | 34.4±10.4 | 46 | Group difference |
| Gonzales; 2014 [61] | Actigraph | 42 (64%) | BD | 41.0±11.2 | 7 | IDS-C-30 |
| Grünerbl; 2015 [62] | Android | 10 (80%) | BD | 33-48 | 84 | 7p mood |
| Guidi; 2015 [20] | Android | 1 (100%) | BD | 36 | 98 | mood state |
| Hauge; 2011 [63] | Actigraph | 25 (44%) | UD | 42.9±10.7 | 14 | Group difference |
| Krane-Gartiser; 2014 [64] | Actigraph | 12 (58%) | BD | 39.9±15.6 | 1 | Group difference |
| Loprinzi; 2014 [65] | Actigraph (hip) | 2574 (51%) | UD | 46.3 | 7 | Group difference |
| Miwa; 2007 [66] | Armband; SenseWear Pro | 5 (0%) | UD | 35.1 | 87 | Group difference |
| Muaremi; 2014 [67] | Android | 6 (N/A) | BD | 18-65 | 76 | 7p mood |
| O’Brien; 2016 [8] | Actigraph | 59 (73%) | UD | 74±6 | 7 | MADRS |
| Osmani; 2013 [68] | Android | 5 (100%) | BD | N/A | 90 | -3:3 mood |
| Palmius; 2016 [69] | Android; AMoSS | 36 (75%) | BD | 44±14 | 60 | QIDS-SR16 |
| St-Amand; 2013 [70] | Actigraph | 14 (50%) | BD | 44.6±11 | 14 | Group difference |
| Todder; 2009 [71] | Actigraph | 27 (48%) | UD | 49±13 | 7 | Group difference |
| Total | | 3,094 | | | | |

We identified seven overall behavioral feature categories, which we denote as "Feature Categories". These categories used 17 unique data inputs to analyze 85 different objective features. The same features were used across studies, yielding 176 investigated features, with information about directionality with respect to the mood score in 88% (n = 155) of the cases. The other cases (n = 21) report on accuracy and weightings by combining objective features into single evaluations, which was mostly observed in research papers with classification models [53,62,67,69]. The seven feature categories are shown in Figure 2 and defined and described in Table 3. The size of each pie chart in Figure 2 reflects the total number of features across all studies, and each pie chart is divided into proportions of reported statistically significant (green) and statistically non-significant (red) correlation results, and missing statistical evaluation (grey, e.g., [31,40]). A complete overview of the features and their correlation or prediction of depressive mood symptoms is included in Table S3, with Table S4 illustrating the studies that contributed to each of the categories. The supplementary files also include a graph illustrating the data inputs and how they contributed to the different categories.

Table 3. An overview of the included features together with the data input name (column 2) separated into seven distinct categories (column 1).

| Feature Category | Feature – data input |
|---|---|
| Social: Features describing social behavior, including activity related to phone calls, texting, social network size, and other people in the user's context. | Call duration (incoming/outgoing) – Call log; Call frequency (incoming/outgoing) – Call log; Calls missed – Call log; Maximum call duration – Call log; Number of conversations – Call log; SMS received (characters) – SMS log; Characters in SMS (sent/received) – SMS log; SMS (sent/received) – SMS log; Speak duration – Call log; Devices seen – Bluetooth |
| Physical activity: Features describing physical activity, including movement and step count. | Activity (afternoon, day, evening, morning, night) – Accelerometer; Autocorrelation – Accelerometer; Vigorous activity – Accelerometer; Distance – Accelerometer, GPS; Energy expenditure – Multiple sensors; Fourier analysis – Accelerometer; Inactivity duration – Accelerometer; Jerk – Accelerometer; Movement duration – GPS; Movement speed – Accelerometer, GPS; Movement speed variance – GPS; RMSSD – Accelerometer; Sample entropy – Accelerometer; Standard deviation of stillness – Accelerometer; Steps – Accelerometer, Pedometer |
| Location: Features describing mobility, including GPS tracking, clustering of location (e.g., home stay), and transition time. | Cell tower ID – GSM; Home stay – GPS; Location clusters – GPS; Break duration – FM radio signal; Circadian rhythm – GPS; Entropy – GPS; Home to location cluster – GPS; Max distance between clusters – GPS; Raw entropy – GPS; Routine index – GPS; Transition time – GPS; Location variance – GPS; Coverage area – GPS |
| Device: Features describing device (smartphone or wearable) usage, including app usage, lock/unlock events, and classification of app usage. | Communication/social usage – App; Duration – App; Browser usage – App; Images taken – Camera; Number of running apps – App; Response time – Notification; Screen active duration/frequency – Screen; Screen clicks – Screen; Time from arrival till seen – Notification; Time from seen till acted – Notification; Data transmitted – Wifi |
| Subject: Features capturing the subject's physical state, including sleep and voice. | Deep sleep / total sleep – Accelerometer; Deviation of F0 – Microphone; Envelope – Microphone; Fitness – ECG; Fundamental frequency – Microphone; Harmonics-to-noise ratio – Microphone; Pauses in recording – Microphone; Short turns during conversation – Microphone; Sleep (duration, efficiency, onset latency) – Accelerometer; Standard deviation pitch frequency – Microphone; Laying down – Camera; Standard deviation sleep – Accelerometer |
| Environment: Features collected from the physical surroundings of the user. | Intensity level – Light sensor; Humidity – Internet |
| Bio: Biometric features related to the subject's body. | Heart rate (sleep, day) – LED light sensor, ECG; Skin conductance – EDA |

[INSERT FIGURE 2]
Caption: Each pie chart represents a category of features (see Table 3). The size of each pie chart indicates the number of different features included in each category. Green color reflects the number of features that statistically significantly correlated with depressive mood symptoms, red indicates statistically non-significant correlations, while grey indicates missing information on the statistical significance of the correlation value.

An in-depth analysis of each feature occurring in at least two studies is shown in Figures 3 and 4 for non-clinical samples of participants and clinical samples of participants, respectively. Figures 3 and 4 were constructed as follows. The x-axis is a weighted correlation directionality (wD) defined as:

$$ wD(x) = \sum_{m=1}^{M} \operatorname{sgn}\big(F_x(m)\big)\,\frac{n(m)}{N(x)}, \qquad M \geq 2 $$

$F_x(m)$ is the correlation value of a unique feature $x$, such as SMS sent, in study $m$. $M$ is the total number of studies reporting $F_x$, $n(m)$ is the number of participants in study $m$, and $N(x)$ is the combined total number of participants across those studies. 'sgn' denotes the sign operation, which is -1 for values below zero and 1 for values above zero. As an example, consider the correlation between 'screen active frequency' and mood symptoms. According to Table S3, this is analyzed in two studies: one study with N = 28 shows a positive correlation, whereas one study with N = 27 shows a negative correlation. This yields a wD of:

$$ wD = 1 \cdot \frac{28}{27 + 28} + (-1) \cdot \frac{27}{27 + 28} \approx 0.02 $$

A wD value of 1 would indicate that all studies have a positive correlation between the measured feature and the mood assessment. This means that consistency across studies would place the feature on either +1 (consistent positive) or -1 (consistent negative) on the x-axis.
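As a minimal sketch of the weighted directionality described above, the following Python function computes wD from a list of (correlation value, sample size) pairs. The function names are hypothetical, the correlation magnitudes used in the usage note are placeholders (only their signs matter), and rejecting a correlation of exactly zero is an assumption, since the text defines the sign operation only for non-zero values.

```python
from typing import Iterable, Tuple


def sign(value: float) -> int:
    """Return -1 for values below zero and +1 for values above zero."""
    if value == 0:
        # Assumption: the text does not define sgn(0), so it is rejected.
        raise ValueError("sign is undefined for a zero correlation here")
    return 1 if value > 0 else -1


def weighted_directionality(studies: Iterable[Tuple[float, int]]) -> float:
    """Compute wD for one feature.

    studies: one (correlation value, number of participants) pair per
    study, with correlations already oriented so that positive means
    a higher depression score.
    """
    studies = list(studies)
    if len(studies) < 2:
        raise ValueError("wD requires at least two studies (M >= 2)")
    total = sum(n for _, n in studies)  # N(x): combined sample size
    return sum(sign(f) * n / total for f, n in studies)
```

Calling `weighted_directionality([(0.3, 28), (-0.2, 27)])` mirrors the 'screen active frequency' example and gives 1/55, which rounds to 0.02. Two studies of equal size with opposite signs give 0, and studies that all share the same sign give +1 or -1.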

The y-axis shows the total number of participants on which the feature was measured, log-transformed to accommodate the large diversity in sample sizes. Studies of non-clinical samples measuring Call frequency have the highest average number of participants per study (n = 370), while studies of clinical samples measuring Humidity have the lowest (n = 6). As in Figure 2, the size of the feature pie chart represents M, which is the total number of studies of that particular feature. Similarly, the pie charts are divided into statistically significant (green) correlations, statistically non-significant (or lack of reporting) (red) correlations, and missing information on statistical significance (grey). In total, Figures 3 and 4 provide an overview of the correlation between statistically significant features and depressive mood symptoms. Each feature is followed by the result reported in the figure, which is the wD value, the number of studies that included the feature (n), the percentage of statistically significant cases (s), and the mean and standard deviation of the participants included in the “n” studies (m±SD). For non-clinical samples of participants (Figure 3): With the exception of Call duration (wD = -.04, n = 4, s = 25%, m±SD = 278.50±293.32), Call frequency (wD = -.04, n = 3, s = 33.33%, m±SD = 370.67±279.44), Screen active frequency (wD = .02, n = 2, s = 50%, m±SD = 27.5±0.71) and Transition time (wD = -.26, n = 2, s = 0%, m±SD = 38.00±14.14), studies agree on the correlation direction, since most features are at either -1 or +1. Home stay (wD = 1, n = 4, s = 75.00%, m±SD = 56.75±23.32), Circadian location rhythm (wD = -1, n = 2, s = 100%, m±SD = 38.00±14.14) and Entropy (wD = -1, n = 6, s = 83.33%, m±SD = 51.67±22.98) had the largest proportion of statistically significant studies, whereas Distance (wD = -1, n = 4, s = 0%, m±SD = 45.75±24.10), Movement speed (wD = -1, n = 2, s = 0%, m±SD = 63.50±21.92), and Transition time had no statistically significant studies.
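The per-feature summary values reported above (n, s, and m±SD) can be reproduced with a short Python sketch. The function name is hypothetical, and which of the two studies counts as statistically significant in the usage note is a placeholder chosen for illustration. The sample standard deviation (n - 1 denominator) is used because it matches the reported 27.5±0.71 for Screen active frequency with sample sizes 27 and 28.

```python
from statistics import mean, stdev
from typing import List, Tuple


def summarize_feature(studies: List[Tuple[int, bool]]) -> dict:
    """Summarize one feature from (sample size, is_significant) pairs,
    one pair per study. Requires at least two studies."""
    sizes = [n for n, _ in studies]
    significant = sum(1 for _, is_sig in studies if is_sig)
    return {
        "n": len(studies),                        # number of studies
        "s": 100.0 * significant / len(studies),  # % significant cases
        "m": mean(sizes),                         # mean sample size
        "sd": stdev(sizes),                       # sample SD (n - 1)
    }
```

For example, `summarize_feature([(28, True), (27, False)])` gives n = 2, s = 50%, and m±SD = 27.5±0.71 after rounding.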
Similarly, for the clinical samples of patients (Figure 4): With the exception of Distance (wD = .30, n = 2, s = 0%, m±SD = 10.00±4.24), Humidity (wD = 0, n = 2, s = 0%, m±SD = 6.00±0.00), SMS sent (wD = -.50, n = 4, s = 25%, m±SD = 30.00±21.76) and Activity (wD = -.81, n = 9, s = 66.67%, m±SD = 23.33±16.10), studies agree on the correlation direction, since the features are at either -1 or +1 on the wD axis. Cell tower ID (wD = -1, n = 3, s = 66.67%, m±SD = 19.67±8.33), Screen active duration (wD = 1, n = 3, s = 66.67%, m±SD = 21.33±6.66), and Activity had the largest percentage of statistically significant studies, whereas Distance, SMS received (wD = -1, n = 2, s = 0%, m±SD = 45.00±22.63), and Humidity had the lowest.

[INSERT FIGURE 3]
Caption: Features collected from at least two studies using non-clinical samples of participants. The x-axis (wD) represents a weighted directionality of the correlation between the feature and mood symptoms. Positive values represent a larger depressive score and vice versa. The y-axis represents the logarithm of the total number of participants across all studies for this feature. The size of each pie chart represents the number of studies that recorded the feature, while the green, red, and grey areas represent statistically significant correlations, statistically non-significant correlations, and missing statistical significance, respectively.

[INSERT FIGURE 4]
Caption: Features collected from at least two studies including clinical samples of patients. The x-axis (wD) represents a weighted directionality of the correlation between the feature and mood symptoms. Positive values represent a larger depressive score and vice versa. The y-axis represents the logarithm of the total number of participants across all studies for this feature. The size of each pie chart represents the number of studies that recorded the feature, while the green, red, and grey areas represent statistically significant correlations, statistically non-significant correlations, and missing statistical significance, respectively.

Several objective features were only included in a single study. Therefore, their relationship to a depressive mood scale cannot be compared across studies as done in Figures 3 and 4. Some of these features are quite creative and worth mentioning. The most promising results for the non-clinical samples include the time spent in break rooms (ρ = -0.21, p > 0.05) [16], and a lower standard deviation of stillness, which can be interpreted as a more uniform activity pattern (β = -3.3, p