CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE Concurrency Computat.: Pract. Exper. 2014; 26:1215–1230 Published online 20 June 2013 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/cpe.3064
SPECIAL ISSUE PAPER
A methodology for assessing the predictable behaviour of mobile users in wireless networks
Radu-Corneliu Marin 1, Ciprian Dobre 1,*,† and Fatos Xhafa 2
1 University Politehnica of Bucharest, Computer Science, Splaiul Independentei 313, Bucharest, Romania
2 Universitat Politecnica de Catalunya, Computer Science, Girona Salgado 1-3, 08034 Barcelona, Spain
SUMMARY
The analysis of the predictability of human behaviour is an emerging topic in the ubiquitous computing community. Recent endeavours in studying the human behaviour are either based on synthetic models or on real mobile user traces, but what they are mainly focusing on is location: discovering travel patterns or anticipating the whereabouts of mobile users. We extend the analysed context by studying the wireless behaviour of mobile users: interactions with both peers and network devices in academic and office environments. We propose a methodology and a set of guidelines to assess and aid in analysing wireless mobile traces. We prove that the human behaviour is predictable when the studied traced sets are convergent, complete and correct, and we obtain a surprising invariability in interacting with wireless access points, as we are able to pinpoint a wireless user to one of two locations at any specific hour.
Copyright © 2013 John Wiley & Sons, Ltd.
Received 30 April 2013; Revised 3 May 2013; Accepted 6 May 2013
KEY WORDS: predictability; human behaviour; entropy; user interactions and human mobility; delay tolerant network analysis; opportunistic interactions
1. INTRODUCTION
The mobile computing paradigm has seen a sudden, yet impressive, shift in tides as smartphones and tablet PCs, with enhanced computing ability and connectivity, have taken the mobile consumer market by storm. Today, smartphone sales account for more than 30% of mobile sales worldwide [1], and such figures are estimated to grow even more in the upcoming years. The emerging field of human dynamics can only benefit from this sudden abundance of high-end mobile devices, as they are today equipped not only with cutting-edge hardware, but also with high-tech sensors that enable the sensing of the environment and context inference. Furthermore, smartphones and tablet PCs are today equipped with networking capabilities that enable high-speed data access by means of Wi-Fi and mobile broadband connections; also, Bluetooth connectivity is used for wireless interaction among devices. Thus, such mobile devices pose a great opportunity for complex system researchers in statistical physics, as human behavioural patterns have already been recognised by several previous studies in different real mobile user traces [2–4].
The importance of quantifying, analysing and comprehending complex human behavioural patterns can more easily be expressed through the significance of the real-life problems they are attempting to solve: predicting the outbreak of human and electronic viruses, containing such viral spreads, improving crowd control or even designing safer and better public transport systems.
*Correspondence to: Ciprian Dobre, University Politehnica of Bucharest, Spl. Independentei 313, Romania.
† E-mail: [email protected]
Currently, mobile carriers are already using user traces [5] for discovering buying habits, smart advertising and contextually improved searches. In this paper, we analyse experiences related to tracing mobile interactions, extracting synergic patterns and exploring the limits in foreseeing the whereabouts of individuals in relation to the virtual communities they are part of. The paper extends the results presented in [6] with an extensive analysis of interactions with wireless access points (APs), in which we propose a methodology and guidelines to be used to determine the completeness and correctness of a trace set, but also to determine the predictability of mobile users interacting with wireless APs. As such, we generate and examine more realistic situations and, furthermore, we reproduce our analysis on other mobility traces as well, publicly available in the CRAWDAD [7] database.
The rest of the paper is structured as follows: Section 2 offers an overall perspective of background work in human dynamics, as well as in predictability study from information theory. Section 3 covers the experimental setup and tracing application, whereas Sections 4, 5 and 6 provide a detailed description of the analysis and interpretation of experimental results regarding the social human patterns. We present the motivation for proposing the predictability analysis methodology in Section 7, and we conclude the paper with Section 8, which provides a summary of our observations and presents future work.
2. RELATED WORK
Researchers in human dynamics are trying to detect human behavioural patterns and to exploit such predictable sequences so as to foresee situations of interest with low error rates. Unfortunately, predictability is hard to express and estimate when information is lacking or in the absence of determinism. One of the most powerful tools in describing predictable behaviour of ergodic sources is the computation of entropy [8, 9]. In fact, entropy is probably the most popular property used to approximate predictability. However, entropy is not the sole instrument of researchers. For example, Musolesi and Mascolo [3] use a correlogram of residuals of a time series to assess the correctness of a general purpose predictor for context information. Ihler et al. [10] propose modelling normal periodic behaviour as a time-varying Poisson process model, which is further modulated by a hidden Markov process to account for anomalies; on the basis of the aforementioned processes, they introduce a Bayesian unsupervised learning framework to differentiate between patterns and unusual events. Last but not least, DelSole [11] presents an encompassing review of the most recent advances in information theory and introduces a framework for exploring predictability that unifies techniques for analysing predictable behaviour such as ‘linear regression, canonical correlation analysis, singular value decomposition, discriminant analysis, data assimilation’ and, most importantly, predictable component analysis. Most techniques are borrowed from fields that have been studying the predictability of natural phenomena since long before the emergence of human dynamics (e.g. earthquake forecasting [12]). More recent studies in human dynamics focus mainly on location and travel patterns.
In [4], the authors estimate the physical location of users using Kalman filters applied on real mobile traces (syslog) and map interactions with wireless APs onto synthetic tracks with a median relative error of 17%; they extract mobility models from synthetic tracks and user pause times, which are easily dispatched in simulators. Also, on the basis of large amounts of tracing data from smartphones (in this case Foursquare’s movement data set [13]), Noulas et al. [14] study human mobility patterns over multiple cities worldwide and report evidence of a proportionality between discrepancies in human behaviour from different cities and the heterogeneity of locations distributed across multiple urban landscapes. Moreover, the authors extract a universal law of human mobility, which is proved to capture realistic movements in different locations by quantifying the number of locations visited from source to destination, rather than the exact physical location. Song et al. [2] measure the entropy of the trajectories of users based on large amounts of tracing data and compute the mobility predictability by means of Fano’s inequality (the authors report a remarkable result of 93%). Probably the most important finding is the invariability of predictability in accordance with the distance covered by users.
Most studies in human dynamics are based on large amounts of tracing data extracted from smartphone users. These data sets are usually collected from specific providers such as Foursquare [13] and MIT’s Reality Mining [15], and in some particular cases directly from carriers [2]. A small niche of researchers are also experimenting with disseminating tracing applications amongst volunteers [4, 16, 17]. As physical location has gradually become a stereotype in context-aware computing, a need for more complex context is already expressed by various studies [18, 19]. In [20], Hui and Crowcroft study the impact of predictable human interactions on forwarding in Pocket Switched Networks. By applying vertex similarity on a data set extracted from the MIT’s Reality Mining traces [15], they observe that adaptive forwarding algorithms can be built by using the history of past encounters. Furthermore, the authors design a distributed forwarding algorithm on the basis of node centrality.
Also, the behaviour of mobile users in wireless networks is becoming an interesting topic, especially because a larger range of applications are moving towards mobile platforms. Whether it is predicting resource availability over ad hoc networks [21] for computationally intensive applications or simulating and analysing the dynamics of human behaviour over wireless networks by using synthetic models [22], the wireless presence of mobile users is creating quite an impact on distributed computing [23]. When correlated with contextual information, the wireless behaviour of mobile users creates the need for intelligent collaboration [24].

3. TRACING EXPERIMENT
Our initial premise for the following experiments is that synergic patterns in academic and office environments are subject to repeatability.
As explained in Section 2, as opposed to previous more generic studies [2, 14], we focus on environments where human behaviour can be predictable and try to understand the physical laws governing the human processes. Our work, from this perspective, is somewhat similar to [4]; however, we cover a more generic space in determining the social and connectivity predictability patterns in the case of academic and office environments. This section presents the tracing application developed for this purpose, as well as the experimental setup details.

3.1. HYCCUPS Tracer
The HYbrid Contextual Cloud for Ubiquitous Platforms comprised of Smartphones (HYCCUPS) Tracer is an Android application designed to collect contextual data from smartphones. The application runs in the background and collects traces for multiple features (see later in the text), which can further be classified by the temporality of acquisition into static or dynamic, or by the semantic interpretation into availability or mobility features. Static properties are determined at application start-up and comprise the device’s traits, whereas dynamic features are momentary values acquired on demand. On the other hand, availability features represent values pertaining to the overall computing system state, whereas mobility features describe the interaction of the device with the outside world. As such, the features collected by the HYCCUPS Tracer are as follows:
• Minimum and maximum frequency: static properties describing the bounds for Dynamic Voltage/Frequency Scaling.
• Current frequency: momentary value of the frequency according to Dynamic Voltage/Frequency Scaling.
• Load: the current CPU load computed from /proc/stat.
• Total memory: static property of the device describing the total amount of memory.
• Available memory: momentary value that represents the amount of free memory on the device (bear in mind that, in Android, free memory is wasted memory).
• Out of memory: asynchronous event notifying that the available memory has reached the minimal threshold and, in consequence, the Out Of Memory Killer will stop applications.
• Memory threshold: the minimal memory threshold that, when reached, triggers the Out Of Memory events.
• Sensor availability: static property that conveys the presence of certain sensors (e.g. accelerometer and proximity).
• Accelerometer: the accelerometer modulus is a mobility feature that characterises fine grain movement (if available).
• Proximity: proximity sensor readings (if available).
• Battery state: the current charging level (expressed in %) and also the current charge state.
• User activity: availability events representing user actions that trigger opening/closing application activities.
• Bluetooth interactions: momentary beacons received from nearby paired devices.
• AllJoyn interactions: interactions over Wi-Fi modelled using the AllJoyn framework. AllJoyn is an open-source, peer-to-peer software development framework, which offers the means to create ad hoc, proximity-based, opportunistic interdevice communication. The true impact of AllJoyn is expressed through the ease of development of peer-to-peer networking applications provided by: common APIs for transparency over multiple operating systems; automatic management of connectivity, networking and security; and, last but not least, optimisation for embedded devices.
• Wi-Fi scan results: temporised wireless AP scan results.
Tracing is executed both periodically, with a predefined timeout, and asynchronously on certain events (such as AllJoyn interactions or user events). This paper concentrates on dynamic mobility features, represented by the last three tracing features in the previous list.

3.2. Experimental setup
The tracing experiment lasted for 65 days, in March–May 2012, and took place at the Faculty of Automatic Control and Computer Science, University Politehnica of Bucharest. A total of 66 volunteers participated. They were selected so as to cover a wide range of study years and specialisations: one first year Bachelor student, one third year Bachelor student, 53 fourth year Bachelor students, three Master students, two faculty members and six external participants (participants selected from office environments). The experiment implied an initial start-up phase (hereafter called the pairing session), where all participants were asked to meet and pair all devices for Bluetooth interactions, as illustrated in Figure 1. The participants were asked to start the HYCCUPS Tracer each weekday between 10:00 and 18:00, as we assumed this is the interval in which most participating members are attending classes or work. As expected, the volunteers in our experiment did not always follow the instructions conscientiously. Nonetheless, the results proved that the captured tracing data was sufficient for our needs.
Figure 1. Experiment start-up: pairing session.
4. SOCIAL STRUCTURES
As expected, the social aspects of mobile interactions have influenced tracing data, as participants tend to interact more with users in their community or social circle (an aspect previously identified by other authors as well). In this section, we describe the methodology applied to discover social structures and we present the communities formed during our experiment.

4.1. MobEmu
To determine social structures, we extended the MobEmu emulator [25]. The emulator is designed to parse data traces whose schema conforms to the well-known CRAWDAD formats [7]. It further performs various actions at every given time interval, depending on the structure of the trace files. As such, the emulator is capable of computing and reporting results on: total contacts per node, number of encounters with external or internal nodes and contact durations. MobEmu is implemented in Java, and each participant is assigned a corresponding node object; each node is attributed with a unique ID, as well as with all other tracing information presented in Section 3 (battery statistics, load, frequency, etc.). Right after an interaction, a contact object is created containing the unique IDs of the peers, the time stamp and the duration of the contact.
Given that MobEmu is easily extensible, we have augmented it by adding an on-demand distributed community detection algorithm, namely k-CLIQUE [26], in the main function of the emulator, which checks for contacts at every time interval by looping through all traces. k-CLIQUE is a dynamic algorithm for on-the-fly community detection by means of analysing a node’s history of past encounters. It is customisable through two parameters: the contact threshold, which is the total amount of time of synergy between two nodes before they are considered a part of the same community, and the community threshold, which is the number of community nodes that two nodes must have in common to belong to the same social structure.
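The role of the two thresholds can be illustrated with a minimal sketch. It is written in Python for brevity (MobEmu itself is implemented in Java) and is a simplification under stated assumptions rather than the k-CLIQUE algorithm of [26] or the MobEmu code: the names `Node`, `on_contact` and `detect_communities` are hypothetical, contacts are assumed to be `(id_a, id_b, duration_in_seconds)` tuples, and the merging of community information between peers is omitted.

```python
from collections import defaultdict


class Node:
    """One trace participant with simplified community-detection state."""

    def __init__(self, node_id, contact_threshold, community_threshold):
        self.node_id = node_id
        self.contact_threshold = contact_threshold      # seconds of cumulated contact
        self.community_threshold = community_threshold  # common nodes required
        self.contact_time = defaultdict(float)          # peer id -> cumulated seconds
        self.familiar = set()                           # peers met for long enough
        self.community = {node_id}                      # local community of this node

    def on_contact(self, peer, duration):
        """Update the local state after a contact with `peer`."""
        self.contact_time[peer.node_id] += duration
        # Contact threshold: enough cumulated time makes the peer familiar.
        if self.contact_time[peer.node_id] >= self.contact_threshold:
            self.familiar.add(peer.node_id)
            self.community.add(peer.node_id)
        # Community threshold: a peer that is familiar with enough members
        # of our community is added to it as well.
        if len(peer.familiar & self.community) >= self.community_threshold:
            self.community.add(peer.node_id)


def detect_communities(contacts, contact_threshold, community_threshold):
    """Replay contacts in order and return each node's local community."""
    nodes = {}
    for a, b, duration in contacts:
        for node_id in (a, b):
            if node_id not in nodes:
                nodes[node_id] = Node(node_id, contact_threshold, community_threshold)
        nodes[a].on_contact(nodes[b], duration)
        nodes[b].on_contact(nodes[a], duration)
    return {node_id: node.community for node_id, node in nodes.items()}
```

Because every node keeps its own state, two nodes can end up with different views of the same community, which matches the asymmetry discussed in the text.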
The MobEmu emulator running k-CLIQUE for every time interval determines an array of community nodes for each participant. A fact worth mentioning is that communities need not be similar, as interactions are not always symmetrical: Bluetooth beacons are not synchronised, and link errors lead to AllJoyn desynchronisations.

4.2. HYCCUPS communities
Academic and office environments are naturally grouped into social communities; in our case, groups of students, faculty members and office colleagues. The Faculty of Automatic Control and Computers within the University Politehnica of Bucharest is structured as follows: there are 4 years for Bachelor students, split up into four groups of about 30 persons each, and 10 Masters tracks, with about 20 students each. By running MobEmu with k-CLIQUE on our tracing data, we have computed the HYCCUPS communities; the results are illustrated in Figure 2(a) for AllJoyn interactions and in Figure 2(b) for Bluetooth contacts.
Figure 2. Detected communities using a contact threshold of 1200 s and a community threshold of 6: (a) AllJoyn; (b) Bluetooth.
In computing the communities, we have varied the two k-CLIQUE parameters, namely the contact threshold and the community threshold, as follows:
• Contact threshold = 3600 s, community threshold = 8: this configuration proved to be too restrictive, as we ended up ignoring interactions and even omitting nodes from the communities;
• Contact threshold = 600 s, community threshold = 4: as opposed to the previous configuration, this one is placed at the other extreme, being too permissive, as we obtained an almost full-mesh community;
• Contact threshold = 1200 s, community threshold = 6: this is the appropriate balance between the previous two configurations, as can be observed in Figures 2(a) and 2(b).
As expected, there is a high degree of connectivity, considering we usually obtain one large community. This is easily explained by the spatial restraint, as almost all participants are students of the same school, and therefore tend to interact more on the grounds of the university. However, there is a slight difference, as interacting over Bluetooth tends to isolate stray mini-communities, as can be seen in Figure 2(b). We believe that the key factor in this separation is range, as Bluetooth is designed for shorter ranges (about 5–10 m), whereas Wi-Fi APs (wireless routers) have ranges of up to 30–40 m.

5. PREDICTABILITY OF WIRELESS INTERACTIONS
After ascertaining the social structures in our experiment, we explored the predictable behaviour of participants while interacting with peers. As stated in Section 3, we considered in these experiments both Bluetooth and AllJoyn interactions. As such, we analysed and compared the tracing data for both types of synergy. We observed in these experiments [6] that AllJoyn interactions tend to occur much more often than those on Bluetooth: Wi-Fi encounters add up to 20,658, whereas Bluetooth encounters sum up to only 6969, which amounts to only 33.73% of the Wi-Fi total. We believe that such results are explained by the short range of Bluetooth, which was also observed in the community analysis in the previous section.
We studied next the hourly interactions of individuals on a daily basis, and as such, we computed the probability that an individual interacts at least once each day at the same hour. Figure 3 shows the entropy of hourly interactions; a correction to [6] has been applied, as entropy was normalised by the total number of interactions. As seen, AllJoyn hourly interactions peak almost as low as Bluetooth. We point out that the comparison between the two peer-to-peer solutions actually comes down to a compromise between range and power saving, as more powerful radios lead to much faster battery depletion.

Figure 3. Distribution of entropy of interacting for Bluetooth and AllJoyn.

6. THE WIRELESS BEHAVIOUR OF MOBILE USERS
As seen in the previous sections, despite its many benefits, Bluetooth proves to have considerable drawbacks relating to our study. In consequence, we decided to focus our analysis on the predictability of Wi-Fi synergy. Bearing in mind that AllJoyn creates peer-to-peer sessions over wireless networks, we consider wireless APs to be the backbone on top of which devices interact; we were particularly interested in the patterns of visitation of such network devices. The following subsections present the methodology and guidelines proposed to study wireless interactions, and also the applicability of the methodology on various traces, including the HYCCUPS trace set.

6.1. Methodology and guidelines for analysing wireless behaviour
This subsection presents a proposal for a methodology not just as a basis for studying already existing mobile traces that involve scanning of nearby wireless APs, but also as a set of guidelines for future tracing application developers to consider when designing and developing tracers. The basic principles for the methodology are inspired by the analysis conducted by Song et al. in [2]. In their paper, the authors study the limits of predictability in the mobility and behaviour of mobile users over cell towers. Here, we try to formalise, adapt and enhance their analysis to map it onto wireless network traces. As such, we need to fill the gap between their analysed context and ours, namely bridging the difference between the range of cell towers and wireless APs.
Seeing that the main focus of this methodology is interaction with wireless APs, we need to define a measure of sufficiency of the tracing data, namely the observed interval sufficiency. This measure determines if the tracing data converged to a point where it is sufficiently informed to perform additional operations on it. Moreover, we define the observed interval sufficiency as the minimum interval in which the discovery of APs converges. Recalling that we are dealing with Wi-Fi networks in academic and office environments, we can assess that the surroundings of such a tracing experiment are limited, and, as such, patterns are visible sooner than in mobile networks (e.g. Global System for Mobile Communications). This should limit the tracing interval to several months or weeks, rather than a full year. Also, an important factor of our analysis is the number of subjects involved in the experiment, as well as their conscientiousness (or control over the tracing application; we will emphasise more on conscientiousness later in this section). As such, we have empirically discovered that a minimum of 10 to 20 users are necessary to provide statistical correctness of the analysis (see later subsections for examples). Naturally, the next step in such an analysis is to formalise the interactions between users and wireless APs. The reader should be aware that the tracing experiments at the focus of this methodology must contain the temporised results of wireless AP scans. As such, we define a virtual location (VL) as the most relevant AP scanned by a user during an hour. Considering that multiple APs can be scanned by a user during an hour, we need to define a heuristic for choosing the most relevant one. As opposed to [6], we represent VLs as basic service set identifiers, as service set identifiers are prone to name clashes, a fact that may negatively influence our study. 
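The observed interval sufficiency introduced earlier in this subsection can be made concrete with a small sketch. It is only one possible reading of "the minimum interval in which the discovery of APs converges": the function names, the input shape (pairs of a 1-based week index and a BSSID) and the 5% growth tolerance are assumptions made for illustration, not values taken from the paper.

```python
def cumulative_ap_discovery(scans):
    """Return the cumulative number of distinct APs seen by the end of each week.

    `scans` is an iterable of (week, bssid) pairs with 1-based week indexes.
    """
    by_week = {}
    for week, bssid in scans:
        by_week.setdefault(week, set()).add(bssid)
    if not by_week:
        return []
    seen, counts = set(), []
    for week in range(1, max(by_week) + 1):
        seen |= by_week.get(week, set())
        counts.append(len(seen))
    return counts


def observed_interval_sufficiency(counts, tolerance=0.05):
    """Return the first week after which discovery has (approximately) converged.

    Convergence is assumed to mean that every later week adds at most
    `tolerance` times the number of APs already known. Returns None when
    the trace never reaches that point.
    """
    for week in range(1, len(counts)):
        later_weeks = range(week, len(counts))
        if all(counts[w] - counts[w - 1] <= tolerance * counts[w - 1]
               for w in later_weeks):
            return week
    return None
```

For example, a trace whose cumulative counts are `[2, 3, 3, 3]` is reported as sufficient after week 2, because no later week adds new APs.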
Although it may seem unintuitive at first, choosing hours to be the analysis temporal step has many reasonable explanations. First of all, tracing applications that gather wireless scan results might use different timing intervals; we consider an hour to be the maximum interval that still yields sufficiently different tracing results. Furthermore, the object of our study refers to academic and office environments in which an hour is usually the unit of work (considering work schedules for example).
We propose the use of two VL choosing heuristics:
1. First Come First Served (FCFS): choose the first sighting of an AP as the most relevant VL. The purpose of this heuristic is to mimic a pseudo-random algorithm of picking VLs.
2. Alpha: choose the most outstanding VL by weighing both the number of sightings during an hour and the average signal strength. Basically, we choose a VL as the AP that maximises the expression $\alpha \cdot \mathrm{count}(VL_i) + (1 - \alpha) \cdot \mathrm{avg}(\mathrm{sigStr}(VL_i))$.
If FCFS describes a pseudo-random heuristic, Alpha offers more control over choosing APs; by tweaking the alpha factor, we can guide the algorithm towards more realistic situations. When alpha is lower, signal strength is more important than the number of sightings, thus better mapping on a situation with reduced mobility (closed surroundings), where there are fewer APs and the signal strength is the most valuable feature one can use to model human mobility. On the other hand, when increasing alpha, we turn our attention towards situations with a high range of mobility; signal strength is only momentary, whereas sighting an AP multiple times shows a predictable pattern.
On the basis of these two heuristics, we introduce the term VL sequence as the result of splitting up the entire tracing interval into hourly intervals, and generating a chain of VL symbols, one for each hour of the monitored period. Whenever the VL of a user is unknown for a time segment, it is marked with a special symbol (e.g. ‘?’). Such shortcomings in tracing data, also known as lack of conscientiousness, are approximated by means of the knowledge coefficient, similar to the q parameter used by [2], which characterises the fraction of segments in which the location is unknown. Also similar to [2], we choose a lower limit of 20% for our knowledge coefficient, which we found to be sufficient for our needs.
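A minimal sketch of the two heuristics and of VL sequence generation follows. The data layout is an assumption: each hourly bucket is a list of `(timestamp, bssid, signal)` scan results, and signal strengths are assumed to be already scaled so that they are comparable with sighting counts (the text does not state how the two terms are normalised). The function names are hypothetical, and the way the 20% limit is applied is left to the caller.

```python
from collections import defaultdict

UNKNOWN = "?"  # special symbol for hours without any scan result


def fcfs(scans):
    """First Come First Served: the AP sighted first in the hour wins."""
    return min(scans, key=lambda scan: scan[0])[1]


def alpha(scans, a):
    """Alpha: maximise a * count(VL) + (1 - a) * avg(sigStr(VL))."""
    signals = defaultdict(list)
    for _, bssid, signal in scans:
        signals[bssid].append(signal)

    def score(bssid):
        values = signals[bssid]
        return a * len(values) + (1 - a) * sum(values) / len(values)

    return max(signals, key=score)


def vl_sequence(scans_by_hour, hours, choose):
    """Build one VL symbol per hour, marking hours without scans as unknown."""
    return [choose(scans_by_hour[h]) if scans_by_hour.get(h) else UNKNOWN
            for h in hours]


def unknown_fraction(sequence):
    """Fraction of hourly segments whose VL is unknown (q in [2])."""
    return sequence.count(UNKNOWN) / len(sequence)
```

A sequence for Alpha (0.7) would then be obtained with `vl_sequence(scans_by_hour, hours, lambda s: alpha(s, 0.7))`.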
In total, a set of 12 sequences was generated for each user: 1 FCFS and 11 Alpha sequences (by sweeping the alpha value from 0 to 1 with a step of 0.1). We believe these 12 sequences to be sufficient in analysing the predictability of interacting with wireless APs. On the basis of these VL sequences, we computed three measures of entropy for each user:
• $S_{rand}$ is the entropy of a $user_i$ travelling in random patterns and is defined as
\[ S_{rand}(i) = \log_2(N_i), \]
where $N_i$ is the total number of VLs that $user_i$ has discovered.
• $S_{unc}$ is the entropy of spatial travelling patterns without taking into account the temporal component of an interaction (also named temporally uncorrelated entropy). It is defined as
\[ S_{unc}(i) = -\sum_{j=1}^{N_i} p_i(j) \log_2(p_i(j)), \]
where $p_i(j)$ is the probability of $user_i$ to interact with a specific $VL_j$.
• $S_{est}$ is the estimated entropy computed by means of a variant of the Lempel–Ziv algorithm [27].
As we previously mentioned, we computed the estimated entropy, $S_{est}$, by means of a variant of the Lempel–Ziv data-compression algorithm, considering the history of past encounters. By doing so, we correlated the temporal dimension with the VL interaction patterns. As such, we constructed an estimator that computes entropy as
\[ S_{est} = \left( \frac{1}{n} \sum_{i} \Lambda_i \right)^{-1} \ln n, \]
where $n$ is the length of the symbol sequence and $\Lambda_i$ is the length of the shortest substring that appears starting from the index $i$, but which is not present for indexes lower than $i$.
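The three entropy measures defined above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used for the reported results: the input is a non-empty VL sequence (a list of symbols) from which unknown segments are assumed to have been removed by the caller, a substring counts as previously seen only if it lies entirely before index i, and a match that reaches the end of the sequence is assigned the remaining length plus one. The estimator keeps the natural logarithm, as written in the formula.

```python
import math
from collections import Counter


def s_rand(seq):
    """Random entropy: log2 of the number of distinct VLs."""
    return math.log2(len(set(seq)))


def s_unc(seq):
    """Temporally uncorrelated entropy from the visit frequencies."""
    total = len(seq)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(seq).values())


def _occurs_before(seq, i, length):
    """True if seq[i:i+length] appears entirely within seq[:i]."""
    target = seq[i:i + length]
    return any(seq[j:j + length] == target for j in range(i - length + 1))


def s_est(seq):
    """Lempel-Ziv style estimate: (mean shortest-new-substring length)^-1 * ln n."""
    n = len(seq)
    total = 0
    for i in range(n):
        length = 1
        # Grow the substring until it has not been seen before index i,
        # or until it would run past the end of the sequence.
        while i + length <= n and _occurs_before(seq, i, length):
            length += 1
        total += length
    return (n / total) * math.log(n)
```

For the toy sequence A, B, A, B the shortest new substrings have lengths 1, 1, 3 and 2, so the estimate is (4/7) ln 4, whereas both the random and the temporally uncorrelated entropy equal 1 bit.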
Kontoyiannis et al. [28] prove that $S_{est}$ converges to the real entropy when $n$ approaches infinity. We have chosen these three measures of entropy to cover both synthetic models (random walks on graphs) and real-life models (spatial and spatio-temporal visitational patterns). Song et al. [2] also use the probability $\Pi$ of a predictor to correctly and accurately anticipate the future locations of users. We find the three measures of entropy to be sufficient for our needs, as this paper mainly focuses on analysing the predictability of user interactions, rather than building predictors for human mobility.
We consider $S_{est}(i) \le S_{unc}(i) \le S_{rand}(i) < \infty$ [2] to be a reasonable assumption for each $user_i$, as a participant taking random actions will be less predictable than another one frequenting VLs regardless of time. Also, both are less predictable than a real user taking logical decisions. Moreover, we propose this inequality to be the goal of our statistical analysis.

6.2. Analysis of the HYCCUPS trace
During the HYCCUPS experiment, 6650 APs were discovered; Figure 4 shows the distribution of distinct APs discovered for various weekly intervals. As can be observed, 10 weeks are sufficient for the number of discovered APs to converge and, as such, we can state that the most frequented wireless network devices have already been detected. Notably, most participants have limited mobility, as they meet few APs. These restricted travel patterns favour interactions, as individuals are clustered into communities situated in closed surroundings, in the range of a few preferential wireless APs. By applying the proposed methodology on the HYCCUPS trace, we obtained VL sequences with 368 symbols (8 h × 46 weekdays), each symbol corresponding to an outstanding VL for a specific hour.
Unfortunately, the lack of conscientiousness of volunteer participants has left its imprint on the tracing data, as by applying such a knowledge coefficient we trimmed down more than half of the participants. Figure 5 shows the distributions of entropy P(Srand ), P(Sunc ), respectively P(Sest ) for (a) FCFS and (b) Alpha (0.7). We found that Sest 6 Sunc 6 Srand holds for our experiments. Table I shows the peaks for all types of sequences, as proof that the goal inequality holds on all cases. Figure 6 presents a comparison of the distributions for the three proposed entropies, on all VL sequences. As expected, FCFS presents one of the most skewed distributions for each of the proposed entropies. This proves that such pseudo-random simulations tend to suffer from unrealistic traits. Most surprisingly, the cases for Alpha (1) shows that in the HYCCUPS trace, the signal strength was a tie-breaker for choosing VLs; one interpretation could be that users involved in the experiment had a low range of mobility, and they travelled mostly in surroundings limited to the
Figure 4. Distribution of number of discovered access points (APs) for each week in the HYCCUPS trace. [Plot: number of discovered Wi-Fi APs (y-axis) per week (x-axis).]
Figure 5. The inequality of entropies for Srand, Sunc, Sest for the HYCCUPS trace.

Table I. Entropy peaks for all sequences in the HYCCUPS trace.

Sequence     Estimated    Temporally uncorrelated    Random
FCFS         0.7718473    2.618399                   5.087463
Alpha 0      0.6865041    2.287341                   4.857981
Alpha 0.1    0.6845432    2.22213                    4.857981
Alpha 0.2    0.6845432    2.22213                    4.70044
Alpha 0.3    0.6831773    2.20681                    4.857981
Alpha 0.4    0.681429     2.16964                    5
Alpha 0.5    0.6724427    2.136612                   4.643856
Alpha 0.6    0.6705612    2.099739                   4.643856
Alpha 0.7    0.6569588    2.081863                   5.209453
Alpha 0.8    0.654626     2.076209                   5.247928
Alpha 0.9    0.6573192    2.069322                   4.807355
Alpha 1      0.7494014    2.596805                   5.459432

Figure 6. Comparison of entropy distributions for all sequences for the HYCCUPS trace.
university grounds and certain office locations. As can also be observed, an Alpha value around 0.7 generates results close to normal distributions. On the basis of these results, obtained by applying the proposed methodology, we can remark that the wireless behaviour of users in the HYCCUPS trace is subject to predictability, as any real user can be pinpointed to one of 2^0.68 ≈ 1.6 locations, whereas a user taking random decisions will be found in any one of 2^4.94 ≈ 30.7 locations.
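The location counts quoted above follow from reading an entropy of S bits as the uncertainty of a uniform choice among 2^S locations. A minimal sketch of that conversion, with a hypothetical helper name, is:

```python
def equivalent_locations(entropy_bits):
    """Number of equally likely locations that carries `entropy_bits` bits of uncertainty."""
    return 2 ** entropy_bits


# Entropy values quoted in the text for the HYCCUPS trace.
print(round(equivalent_locations(0.68), 1))  # 1.6
print(round(equivalent_locations(4.94), 1))  # 30.7
```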
6.3. Analysis of other traces

This subsection presents the application of the proposed methodology to two external traces obtained from the CRAWDAD [7] database: we chose Rice [16] and Nodobo [17] as independent data traces to validate our previous conclusions. These external traces have previously been studied in relation to mobility prediction, and we found them relevant for further evaluating the applicability of our guidelines and proposed methodology.

6.3.1. Rice. The Rice [16] data trace is composed of cellular and Wi-Fi scan results from the Rice community in Houston, Texas. Ten subjects participated in the tracing experiment, which lasted for 44 days (from 16 January 2007 to 28 February 2007). During the experiment, a total of 6055 wireless APs were discovered; Figure 7 shows the distribution of discovered APs. Again, as seen, the 8 weeks of the experiment are almost sufficient for convergence. Consequently, the proposed methodology can be applied to study the predictability of interacting with wireless APs. The distributions of entropy P(Srand), P(Sunc) and P(Sest) for (a) FCFS and (b) Alpha (0.7) are illustrated in Figure 8. The entropy inequality still holds for all generated sequences, given the peaks for the studied entropies shown in Table II.
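The convergence of discovered APs, which is read from Figures 4 and 7, can also be checked numerically. The sketch below is an assumption-laden illustration: this section states no numeric convergence criterion, so the `window` and `tolerance` parameters, the function names and the toy data are all hypothetical.

```python
def cumulative_distinct(weekly_sightings):
    """Running count of distinct AP identifiers after each week.

    `weekly_sightings` is an iterable with one collection of AP identifiers
    (for example BSSIDs) per week.
    """
    seen = set()
    totals = []
    for week in weekly_sightings:
        seen.update(week)
        totals.append(len(seen))
    return totals


def has_converged(totals, window=2, tolerance=0.05):
    """True if the share of APs first seen in the last `window` weeks
    is at most `tolerance` of all APs discovered so far."""
    if len(totals) <= window or totals[-1] == 0:
        return False
    growth = (totals[-1] - totals[-1 - window]) / totals[-1]
    return growth <= tolerance


# Toy data, made up for illustration only.
weeks = [{"ap1", "ap2", "ap3"}, {"ap2", "ap4"}, {"ap1", "ap4"}, {"ap3"}]
totals = cumulative_distinct(weeks)
print(totals, has_converged(totals))  # [3, 4, 4, 4] True
```

A trace whose cumulative count keeps growing week after week would fail this check, which is the situation a short tracing period produces.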
Figure 7. Distribution of number of discovered access points (APs) for each week in the Rice trace. [Plot: number of discovered Wi-Fi APs (y-axis) per week (x-axis).]
Figure 8. The inequality of entropies for Srand, Sunc, Sest for the Rice data trace: (a) FCFS; (b) Alpha 0.7.
Table II. Entropy peaks for all sequences in the Rice data trace.

Sequence     Estimated    Temporally uncorrelated    Random
FCFS         2.048435     3.768333                   5.321928
Alpha 0      1.551438     2.836445                   5.087463
Alpha 0.1    1.553135     2.844426                   5
Alpha 0.2    1.553135     2.844426                   5
Alpha 0.3    1.529704     2.819934                   4.954196
Alpha 0.4    1.531354     2.813175                   4.807355
Alpha 0.5    1.551438     2.834167                   4.954196
Alpha 0.6    1.551438     2.823333                   4.754888
Alpha 0.7    1.50697      2.758835                   4.807355
Alpha 0.8    1.529704     2.781814                   4.584963
Alpha 0.9    1.575544     2.865607                   4.392317
Alpha 1      1.767827     3.172387                   5.209453
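The goal inequality can be checked mechanically against the tabulated peaks. The short sketch below uses the Alpha 0.7 rows of Tables I to III; the function name is hypothetical and the values are copied from the tables, not recomputed.

```python
def goal_inequality_holds(estimated, uncorrelated, random_):
    """True if Sest <= Sunc <= Srand for one row of peak values."""
    return estimated <= uncorrelated <= random_


# Alpha 0.7 peaks as (estimated, temporally uncorrelated, random).
peaks = {
    "HYCCUPS": (0.6569588, 2.081863, 5.209453),   # Table I
    "Rice": (1.50697, 2.758835, 4.807355),        # Table II
    "Nodobo": (0.754087, 0.6207715, 1.584963),    # Table III
}

for trace, row in peaks.items():
    print(trace, goal_inequality_holds(*row))
# HYCCUPS True
# Rice True
# Nodobo False
```

The Nodobo row fails because its estimated peak exceeds its temporally uncorrelated peak.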
Furthermore, Figure 9 presents the distributions for the three measures of entropy on all generated sequences. As opposed to HYCCUPS, the distributions of the estimated entropy are heavily skewed, but still consistent. This can be a consequence of the knowledge factor. More surprisingly, although the Rice data trace contains data from only 10 users, they all have a high degree of collected knowledge: all users have a computed knowledge factor of over 60%. This increased informational gain may also affect the random entropy, as seen in Figure 9(c): each generated sequence is generally different from the others. This further shows that, in real life, random heuristics are not able to accurately simulate human behaviour.

As in the HYCCUPS analysis, signal strength is also a tie-breaker in choosing the outstanding VL. As such, the evidence shows that, while tracing wireless APs, the quality of an AP is more important than the number of sightings. Also, in both traces, FCFS seems to have the same behaviour: the distributions are skewed and the peaks are higher, but they still do not reflect the worst-case scenario.

By applying the proposed methodology, we showed that the users who collected the traces in the Rice experiment are subject to repeatability. Furthermore, a real user can be pinpointed to one of 2^1.6 ≈ 3.03 locations, whereas a random user can be found in one of 2^4.9 ≈ 29.85 locations. The estimated entropy is considerably higher than the one previously shown for the HYCCUPS data trace, which can be explained by the higher knowledge factor in the Rice experiment.

6.3.2. Nodobo. The Nodobo [17] data trace was collected by means of a social sensor software suite for Android devices (also dubbed Nodobo). The experiment involved 21 subjects and lasted for 23 days (it actually lasted for a longer period, but we chose this subset because it was the longest contiguous trace interval), from 9 September 2010 to 1 November 2010.
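The knowledge factor used to filter participants is defined earlier in the paper. Purely as an illustration, and assuming it measures the share of expected hourly slots for which a device actually reported scan results, the filtering step could be sketched as follows. The definition, the function names and the toy data are assumptions of this sketch, not the authors' implementation.

```python
def knowledge_factor(observed_slots, expected_slots):
    """Assumed definition: fraction of expected (day, hour) slots
    for which at least one scan result was recorded."""
    expected = set(expected_slots)
    if not expected:
        return 0.0
    return len(expected & set(observed_slots)) / len(expected)


def filter_participants(observed_by_user, expected_slots, threshold):
    """Keep users whose knowledge factor is strictly above `threshold`."""
    return {
        user
        for user, slots in observed_by_user.items()
        if knowledge_factor(slots, expected_slots) > threshold
    }


# Toy data, made up for illustration: 5 days x 8 hours = 40 expected slots.
expected = [(day, hour) for day in range(5) for hour in range(8)]
observed = {
    "user_a": expected[:30],  # 30 of 40 slots -> 0.75
    "user_b": expected[:4],   # 4 of 40 slots  -> 0.10
}
print(filter_participants(observed, expected, threshold=0.6))  # {'user_a'}
```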
When the methodology is applied, the Nodobo trace yields less accurate results, because the tracing period proves to be too short and too few users have a knowledge factor over 20%. As such, in
Figure 9. Comparison of entropy distributions for all sequences for the Rice trace: (a) estimated entropy; (b) temporally uncorrelated entropy; (c) random entropy.
following the exact guidelines of the proposed methodology, such a trace proves to be hardly appropriate for predictability studies. Figure 10 shows, for example, the distribution of the discovered nodes; as seen, this distribution fails to converge. In comparison with the previous two data traces, in each of which more than 6000 wireless APs were discovered, the Nodobo data trace contains only 153. One should bear in mind that the low knowledge factor is not necessarily caused by a lack of wireless APs in the vicinity of mobile users, but rather by the lack of conscientiousness of the volunteers involved in the experiment.

To further validate the methodology and guidelines, we attempted to analyse the predictability of interacting with wireless APs in Nodobo. Figure 11 illustrates the distributions of entropy P(Srand), P(Sunc) and P(Sest) for (a) FCFS and (b) Alpha (0.7). Together with the peak values for all sequences, shown in Table III, these allow us to state that the goal inequality of predictability does not hold. Furthermore, when attempting to compare the distributions for all sequences (as presented in Figure 12), the insufficiency of both the tracing interval and the sample size heavily impacts the statistical analysis. As a result, the results in this case are quite inconclusive. The proposed methodology applied to Nodobo has mathematically demonstrated that data traces with insufficient information tend to be inadequate for determining any measure of predictability of
Figure 10. Distribution of number of discovered access points (APs) for each week in the Nodobo data trace. [Plot: number of discovered Wi-Fi APs (y-axis) per week (x-axis).]
Figure 11. The inequality of entropies for Srand, Sunc, Sest for the Nodobo data trace: (a) FCFS; (b) Alpha 0.7.
Table III. Entropy peaks for all sequences in the Nodobo trace.

Sequence     Estimated    Temporally uncorrelated    Random
FCFS         0.754087     0.6207715                  1.584963
Alpha 0      0.754087     0.6207715                  1
Alpha 0.1    0.754087     0.6207715                  1.584963
Alpha 0.2    0.754087     0.6207715                  1.584963
Alpha 0.3    0.754087     0.6207715                  1.584963
Alpha 0.4    0.754087     0.6207715                  1.584963
Alpha 0.5    0.754087     0.6207715                  1.584963
Alpha 0.6    0.754087     0.6207715                  1.584963
Alpha 0.7    0.754087     0.6207715                  1.584963
Alpha 0.8    0.754087     0.6207715                  1.584963
Alpha 0.9    0.754087     0.6207715                  1.584963
Alpha 1      0.754087     0.6207715                  1.584963
Figure 12. Comparison of entropy distributions for all sequences for the Nodobo trace: (a) estimated entropy; (b) temporally uncorrelated entropy; (c) random entropy.
wireless behaviour (an intuitive result, but our numerical results demonstrate its formal validity). The guidelines prove to be efficient in filtering the trace set before the statistical analysis is performed, in both positive cases (e.g. HYCCUPS and Rice) and negative cases (Nodobo).

7. MOTIVATION

After having defined and discussed all of the needed terms and theories, we now present a summary of our conclusions, together with the motivation behind the proposed methodology. In this paper, we proposed a guide to analysing the predictability of interactions between mobile devices and wireless APs, as we believe these devices to be the gateways to intelligent collaboration between mobile user agents. Although it may seem that we cover only a niche of applicability in terms of mobile software (in our case interactions, for example, over AllJoyn), we actually present a clear methodology for extracting pattern information from user traces. Such a result opens new frontiers for mobile application optimisations that can tap into the vast amounts of contextual data exchanged in wireless networks. The remainder of this section covers issues in the methodology and guidelines that may seem problematic at first sight.

First of all, the reader might be curious as to why the methodology is oriented towards periodic wireless scan results and not towards the APs the device actually connects to. Unintuitive as it may seem, we are not interested in the connected wireless APs, as extracting patterns from such tracing data would reveal more about the predictability of the software in the mobile operating system that is responsible for choosing which AP to connect to. Many factors may influence these patterns, such as favouring previously cached BSSIDs, locked networks and even the personal preferences of the user. We consider that such patterns do not reflect the true wireless behaviour of a mobile user, and we believe them to be much too restrictive.
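To make the use of scan results more concrete, the sketch below reduces one hour of scan results to a single outstanding VL in two ways. It is a rough illustration only: the exact sequence definitions are given in the methodology section of the paper, and the scoring used here (FCFS as the first AP seen in the hour, Alpha as a weight between normalised sighting counts and normalised mean signal strength, with signal strength breaking ties) is an assumption, as are all names and the toy scan data.

```python
def outstanding_vl_fcfs(scans):
    """Assumed FCFS rule: the AP seen first within the hour.

    `scans` is a list of (timestamp, bssid, rssi_dbm) tuples.
    """
    return min(scans, key=lambda scan: scan[0])[1]


def outstanding_vl_alpha(scans, alpha):
    """Assumed Alpha rule: alpha weights the number of sightings,
    (1 - alpha) weights the mean signal strength."""
    counts = {}
    rssi_sums = {}
    for _, bssid, rssi in scans:
        counts[bssid] = counts.get(bssid, 0) + 1
        rssi_sums[bssid] = rssi_sums.get(bssid, 0.0) + rssi
    mean_rssi = {b: rssi_sums[b] / counts[b] for b in counts}
    max_count = max(counts.values())
    weakest, strongest = min(mean_rssi.values()), max(mean_rssi.values())

    def score(bssid):
        sightings = counts[bssid] / max_count
        if strongest == weakest:
            quality = 1.0
        else:
            quality = (mean_rssi[bssid] - weakest) / (strongest - weakest)
        return alpha * sightings + (1 - alpha) * quality

    # Equal scores are resolved in favour of the stronger mean signal.
    return max(counts, key=lambda b: (score(b), mean_rssi[b]))


# Toy scan results for one hour, made up for illustration.
hour_scans = [
    (0, "ap1", -80), (1, "ap2", -50), (2, "ap1", -82),
    (3, "ap3", -60), (4, "ap3", -62),
]
print(outstanding_vl_fcfs(hour_scans))        # ap1
print(outstanding_vl_alpha(hour_scans, 0.0))  # ap2
print(outstanding_vl_alpha(hour_scans, 1.0))  # ap3
```

With Alpha set to 1, ap1 and ap3 have the same number of sightings, so the stronger mean signal of ap3 decides the outcome.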
By analysing the scan results and constructing various types of sequences based on obtained results, we actually analyse multiple
possibilities at once, as opposed to connected APs, which would reflect a single path over networks. By searching on multiple paths, we are able to extract patterns that match various situations. This was pointed out in the description of our methodology: when using FCFS sequences, we choose to observe a random possible path in a user’s travel patterns, whereas an Alpha 0 sequence better characterises patterns in closed surroundings, where the quality of the signal is more important.

Second of all, the existence of the knowledge factor is also problematic. This is not an issue of the methodology, but of the volunteers’ conscientiousness in running the tracing application. Many unfortunate situations may arise: the device runs out of battery, the device malfunctions or the user decides to stop the tracing application.

By measuring the entropy of interacting with APs, we obtain patterns of the user’s behaviour and are able to understand how the user reaches different situations in a predictable manner. As such, we are not analysing just the wireless behaviour of the users, but we are also capturing patterns of invariability in other traits of the user’s behaviour.

The purpose of the proposed methodology is not only to study predictability, but also to act as a rule of thumb for developers of tracing applications. Moreover, as the results have shown, the various studied situations (based on generated sequences) present an inherent predictability. On the basis of this supposition, we intend to design an algorithm for mobile operating systems that optimises the informational gain of a mobile user by choosing wireless APs based on the learned tracing data. As mentioned previously, such traces converge in a matter of weeks or months and, as such, we consider the amount of collected and aggregated trace data to be reasonable in size.
The basic idea behind the algorithm is that the trace data are sufficient for predicting future encounters with wireless APs in multiple real-life situations (based on the various generated sequences).

8. CONCLUSIONS

In this paper, we have presented a tracing experiment that took place from March to May 2012 at the Faculty of Automatic Control and Computers, University Politehnica of Bucharest, Romania, with the sole purpose of collecting contextual data in order to further analyse the repeatability and predictability of mobile interactions in academic and office environments. We have inspected the tracing data and investigated user synergy from two points of view: the group view and the individual view. We have applied a distributed community detection algorithm and have found that confined surroundings lead to the creation of large, highly adhesive communities. However, the wireless communication medium can have an important influence, as short-range solutions such as Bluetooth tend to isolate loosely coupled micro-communities. As for the individual’s perspective on interactions, we focus more on AllJoyn, because Bluetooth interactions occur three times less often than AllJoyn interactions.

Furthermore, we propose a methodology and a set of guidelines to be used in analysing the predictability of interaction between mobile users and wireless APs, based on the study of Song et al. [2]. By applying the methodology to three cases (HYCCUPS, Rice and Nodobo), we show that mobile users have a predictable wireless behaviour, provided that the trace sets are complete, correct and sufficiently informed. We believe that studying the predictability of human behaviour based on real mobile user traces can prove to be the key to intelligent mobile collaboration in opportunistic networks composed of smartphones, which will eventually lead to lower power consumption and will be able to harness the full potential of contextual data through distributed context aggregation and detection.
ACKNOWLEDGEMENTS
This work was supported by project ‘ERRIC – Empowering Romanian Research on Intelligent Information Technologies/FP7-REGPOT-2010-1’, ID: 264207. The work has been co-funded by the Sectoral Operational Programme Human Resources Development 2007 to 2013 of the Romanian Ministry of Labour, Family and Social Protection through the Financial Agreement POSDRU/89/1.5/S/62557. Special thanks go to the volunteers who participated in the presented tracing experiments.
REFERENCES
1. Goasduff L, Pettey C. Gartner says worldwide smartphone sales soared in fourth quarter of 2011 with 47 percent growth. Available from: http://www.gartner.com/it/page.jsp?id=1924314 [Accessed: 22/02/2012].
2. Song C, Qu Z, Blumm N, Barabási AL. Limits of predictability in human mobility. Science 2010; 327(5968):1018–1021.
3. Musolesi M, Mascolo C. Evaluating context information predictability for autonomic communication. In A World of Wireless, Mobile and Multimedia Networks, International Symposium on, Vol. 0, 2006; 495–499.
4. Kim M, Kotz D. Extracting a mobility model from real user traces. In Proceedings of IEEE INFOCOM, 2006; 1–13.
5. Carrier IQ. Available from: http://www.carrieriq.com/ [Accessed: 30/03/2012].
6. Marin RC, Dobre C, Xhafa F. Exploring predictability in mobile interaction. In Proceedings of Third International Conference on emerging Intelligent Data and Web Technologies (EIDWT 2012), 2012; 133–139.
7. CRAWDAD. Available from: http://crawdad.cs.dartmouth.edu/ [Accessed: 04/05/2012].
8. Ebeling W, Frommel C. Entropy and predictability of information carriers. Biosystems 1998; 46(1-2):47–55.
9. Ebeling W, Molgedey L, Kurths J, Schwarz U. Entropy, complexity, predictability, and data analysis of time series and letter sequences. In The Science of Disasters. Springer: Berlin, Heidelberg, 2002; 2–25.
10. Ihler A, Hutchins J, Smyth P. Learning to detect events with Markov-modulated Poisson processes. ACM Transactions on Knowledge Discovery from Data (TKDD) 2007; 1(3):13.
11. DelSole T, Tippett MK. Predictability: recent insights from information theory. Reviews of Geophysics 2007; 45(4):RG4002+.
12. Zechar JD, Schorlemmer D, Liukis M, Yu J, Euchner F, Maechling PJ, Jordan TH. The Collaboratory for the Study of Earthquake Predictability perspective on computational earthquake science. Concurrency and Computation: Practice and Experience 2010; 22(12):1836–1847.
13. Foursquare. Available from: https://foursquare.com/ [Accessed: 02/05/2012].
14. Noulas A, Scellato S, Lambiotte R, Pontil M, Mascolo C. A tale of many cities: universal patterns in human urban mobility. PLoS ONE 2012; 7(5):e37027.
15. MIT Media Lab: Reality Mining. Available from: http://reality.media.mit.edu/ [Accessed: 01/05/2012].
16. Rahmati A, Zhong L. CRAWDAD data set rice/context (v. 2007-05-23), 2007. Downloaded from http://crawdad.cs.dartmouth.edu/rice/context [Accessed: 04/10/2013].
17. McDiarmid A, Irvine J, Bell S, Banford J. CRAWDAD data set strath/nodobo (v. 2011-03-23), 2011. Downloaded from http://crawdad.cs.dartmouth.edu/strath/nodobo [Accessed: 04/10/2013].
18. Schmidt A, Beigl M, Gellersen HW. There is more to context than location. Computers & Graphics 1999; 23(6):893–901.
19. Visan A, Istin M, Pop F, Cristea V. Bio-inspired techniques for resources state prediction in large scale distributed systems. International Journal of Distributed Systems and Technologies (IJDST) 2011; 2(3):1–18.
20. Hui P, Crowcroft J. Predictability of human mobility and its impact on forwarding. In 2008 Third International Conference on Communications and Networking in China, IEEE, 2008; 543–547.
21. Finger M, Bezerra GC, Conde DR. Resource use pattern analysis for predicting resource availability in opportunistic grids. Concurrency and Computation: Practice and Experience 2010; 22(3):295–313.
22. Doci A, Springer W, Xhafa F. Impact of the dynamic membership in the connectivity graph of the wireless ad hoc networks. Scalable Computing: Practice and Experience 2009; 10(1):25–34.
23. Pascual VS, Xhafa F. Evaluation of contact synchronization algorithms for the Android platform. Mathematical and Computer Modelling 2013; 57(11–12):2895–2903.
24. Barolli L, Anno J, Xhafa F, Durresi A, Koyama A. A context-aware fuzzy-based handover system for wireless cellular networks and its performance evaluation. Journal of Mobile Multimedia 2008; 4(3):241–258.
25. Ciobanu RI, Dobre C, Cristea V. Social aspects to support opportunistic networks in an academic environment. In Proceedings of the 11th International Conference on Ad-hoc, Mobile, and Wireless Networks, ADHOC-NOW 2012. Springer: Berlin, Heidelberg, 2012; 69–82.
26. Hui P, Yoneki E, Chan SY, Crowcroft J. Distributed community detection in delay tolerant networks. In Proceedings of 2nd ACM/IEEE International Workshop on Mobility in the Evolving Internet Architecture, MobiArch ’07. ACM: New York, NY, USA, 2007; 7:1–7:8.
27. Ziv J, Lempel A. Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory 1978; 24(5):530–536.
28. Kontoyiannis I, Algoet PH, Suhov YM, Wyner A. Nonparametric entropy estimation for stationary processes and random fields, with applications to English text. IEEE Transactions on Information Theory 1998; 44:1319–1327.