An Index for Characterizing Spatial Bursts of Movements: A Case Study with Geo-Located Twitter Data
E.-K. Kim¹, A. M. MacEachren¹
¹GeoVISTA Center, Department of Geography, Pennsylvania State University, University Park, PA 16802
Email: {eun-kyeong.kim; maceachren}@psu.edu
1. Introduction
Human movements have often been modelled as diffusion processes and as a random walk (e.g. Brockmann et al. 2006) because mobility patterns include both predictable factors (related to regular working, commuting, and sleeping activities) and many unknown and unpredictable factors (Gonzalez et al. 2008). Recent studies on human mobility have revealed that human trajectories can be approximated by Lévy flights, a special type of random walk in which spatial distances between subsequent events (i.e. movements) follow power-law distributions (e.g. Brockmann et al. 2006, Gonzalez et al. 2008). A power-law distribution of spatial distances signifies that people make a large number of short-distance movements alternating with a small number of long-distance movements, a pattern this study defines as ‘spatial bursts’ of movements. Such patterns are distinguished from Brownian motion, a random walk that models movements of physical particles (Klafter et al. 1996, Gonzalez et al. 2008, Reynolds and Rhodes 2009).
Lévy flight patterns have been found in movement behaviors of animals including albatrosses (e.g. Viswanathan et al. 1996) and spider monkeys (e.g. Ramos-Fernandez et al. 2004), and Lévy flights are known to help maximize the efficiency of foraging for resources (Reynolds and Rhodes 2009). For humans, Lévy flights can potentially be associated with movement behaviors that minimize traveling costs (e.g. time, money) to achieve objectives. At a country scale, for example, people make overseas trips far less often than daily trips within a city; at a local scale, to reduce traveling costs, people tend to make multipurpose trips (i.e. trying to achieve multiple purposes at one or a few zones) (Hanson 1980).
As a parameter, the exponent of a power-law distribution captures the form of the distribution of spatial distances. However, such exponents are not easy to interpret or to compare with each other because they have no upper or lower bounds indicating the degree of spatial regularity or irregularity. This study proposes an index for detecting spatial movement bursts. The index is adapted from an existing index that detects temporal event bursts (e.g. Goh and Barabási 2008). This study applies the proposed index to movement patterns in geo-located Twitter data to determine whether collective movement behaviors reflected in geo-located tweets follow Lévy flight patterns, random patterns, or regular patterns at a county level, and explores how such patterns relate to socioeconomic traits of the corresponding county. This study complements traditional time-geographic approaches that characterize movement patterns with a space-time path/prism within a space-time cube (e.g. Hägerstrand 1970, Kwan 2004), as well as approaches that apply spatiotemporal data mining to movement behaviors (e.g. Dodge et al. 2012).
2. Research Questions
We focus on three sets of research questions: (1) about the proposed index, (2) about spatial burst patterns of counties based on collective mobility patterns, and (3) about the association between spatial bursts and socioeconomic characteristics of each county.
1a) What index is effective for characterizing spatial bursts of movements?
1b) What pros and cons does the proposed index have? How is it interpreted?
2a) How regular are collective movement patterns of geo-located Twitter users who live in each county? Are they regular, random, or spatially bursty?
2b) How are values of the proposed index spatially distributed across U.S. counties?
3a) Do spatial distributions of values of the index relate to socioeconomic characteristics of counties (e.g. population size, transportation infrastructure)?
3. Movements in Geo-Located Twitter Data
This study applies the proposed index to characterize movement patterns in geo-located Twitter data. Twitter data have been used as a proxy for global-, national-, or city-scale movements (e.g. Hawelka et al. 2014). Twitter users make up only roughly 18% of the population, and only about 2% of tweets are posted with location information (see: http://www.pewinternet.org/2013/12/30/demographics-of-key-social-networking-platforms/). Nevertheless, Twitter data can potentially serve as an alternative or complement to traditional census or survey-based data on travel, which are expensive and infrequently collected (Handy 1996). The Twitter data used here, and in a related study of geographical variation in movement as reflected in tweet locations (Kim and MacEachren submitted), were collected with the Twitter API from 10/01/2012 to 09/30/2013 throughout the U.S. After cleaning to remove duplicates and other data errors, the data set contains approximately 698 million geo-tweets that were posted by about 5.5 million users; on average, the number of tweets per user is about 126.12. Each tweet record contains information about the tweet and the corresponding user, including ID, time, geographic coordinates, and text. Each tweet location was joined with a U.S. county boundary file from the U.S. Census.
4. An Index for Characterizing Spatial Bursts of Movements
4.1 Lévy flight and Power-law distribution
Spatial bursts of movements can be characterized by Lévy flight patterns (Klafter et al. 1996). Figure 1 shows two contrasting patterns: A and B represent movement distances from an origin to a destination, and black lines represent stops, so that the intervals between black lines are the spatial distances between consecutive stops. Pattern A draws spatial distances from a Poisson random process, whereas pattern B draws them from Lévy flights, in which many short-distance movements (τa) alternate with long-distance movements (τb); a cluster of short-distance movements is called a ‘burst’ (Barabási 2005).
Figure 1. Spatial distances of movements based on (a) a Poisson random process and (b) Lévy flights. Modified from Barabási (2005, 208).
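The contrast between the two panels can be reproduced with a small simulation. The sketch below is illustrative only and is not the procedure used to draw Figure 1: it assumes exponential step lengths for the Poisson-like pattern and Pareto step lengths (shape 1.5, an arbitrary choice) for the Lévy-like pattern, and the function names are hypothetical.

```python
import random
import statistics

def simulate_distances(n, seed=0):
    """Return (poisson_like, levy_like) lists of n step lengths each."""
    rng = random.Random(seed)
    # Pattern A: exponential gaps, as between events of a Poisson process (mean 1).
    poisson_like = [rng.expovariate(1.0) for _ in range(n)]
    # Pattern B: Pareto (power-law tailed) gaps; shape 1.5 is an assumed value.
    levy_like = [rng.paretovariate(1.5) for _ in range(n)]
    return poisson_like, levy_like

def max_to_median(distances):
    """Ratio of the longest step to the typical step; large values mean rare long jumps."""
    return max(distances) / statistics.median(distances)

poisson_like, levy_like = simulate_distances(10_000)
print("pattern A max/median:", round(max_to_median(poisson_like), 1))
print("pattern B max/median:", round(max_to_median(levy_like), 1))
```

Under these assumptions the heavy-tailed pattern should show a far larger ratio of longest to typical step, which is the visual signature of pattern B.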
Lévy flight patterns can be signified by a power-law distribution of movement distances. We define τ as the spatial distance between two subsequent geo-tagged tweets, and its frequency as the number of geo-tagged tweets falling in each spatial distance range. Then, the power-law probability function is defined as follows:

\[ P(\tau) = C\,\tau^{-\alpha}, \quad \text{where } C \text{ is a constant and } \alpha \text{ is the exponent} \tag{1} \]

With the logarithmic transformation, the power-law function is

\[ \log P(\tau) = \log C - \alpha \log \tau \tag{2} \]

which is similar in form to the standard linear equation \( y = ax + b \), with slope \( a = -\alpha \) and intercept \( b = \log C \). The exponent α captures the form of the distribution: the higher the exponent, the higher the probability of short-distance movements (Barabási 2005).

4.2 Proposed Index: Spatial Burstiness Index
The exponent derived through the power-law calculation above is difficult to interpret since it is unbounded. This study modifies the temporal burstiness index, which characterizes temporal burst patterns, to provide a more easily interpreted index for spatial burst patterns. While Lévy flight patterns show a power-law distribution of spatial distances between events, temporal ‘bursts’ show a power-law distribution of inter-event time intervals (Barabási 2005). To complement the power-law exponent, Goh and Barabási (2008) proposed the burstiness index, B, in order to facilitate comparing different dynamic systems:

\[ B = \frac{\sigma_\tau - m_\tau}{\sigma_\tau + m_\tau} \tag{3} \]

where \( m_\tau \) and \( \sigma_\tau \) are the mean and the standard deviation of inter-event time intervals. Burstiness is defined within a range from -1 to 1: a burstiness of 1 represents completely bursty patterns, a burstiness of -1 represents completely regular patterns, and a burstiness of 0 expresses completely neutral patterns obtained from a Poisson random process.
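The three reference values follow directly from the burstiness definition of Goh and Barabási (2008), B = (σ − m)/(σ + m), with m and σ the mean and standard deviation of the intervals. The short derivation below is a sketch of the limiting cases, assuming strictly positive mean intervals:

```latex
\begin{align*}
\text{regular (constant } \tau\text{):} \quad & \sigma_\tau = 0
  && \Rightarrow\; B = \frac{0 - m_\tau}{0 + m_\tau} = -1, \\
\text{Poisson (exponential } \tau\text{):} \quad & \sigma_\tau = m_\tau
  && \Rightarrow\; B = \frac{m_\tau - m_\tau}{m_\tau + m_\tau} = 0, \\
\text{heavy-tailed } \tau\text{:} \quad & \sigma_\tau / m_\tau \to \infty
  && \Rightarrow\; B = \frac{\sigma_\tau / m_\tau - 1}{\sigma_\tau / m_\tau + 1} \to 1.
\end{align*}
```

The Poisson case uses the fact that the intervals of a Poisson process are exponentially distributed, for which the standard deviation equals the mean.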
This study replaces inter-event time intervals by spatial distances in the above equation to obtain the proposed index, spatial burstiness, which characterizes spatial burst patterns over the same range from -1 to 1.
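A minimal sketch of the computation, assuming the Goh and Barabási (2008) form B = (σ − m)/(σ + m) applied to a list of movement distances. The function name and the use of the population standard deviation are assumptions made for illustration; the source does not specify either.

```python
import statistics

def spatial_burstiness(distances):
    """Spatial burstiness B = (sigma - m) / (sigma + m) of movement distances.

    m is the mean and sigma the population standard deviation (an assumed choice).
    """
    if len(distances) < 2:
        raise ValueError("need at least two distances")
    m = statistics.fmean(distances)
    sigma = statistics.pstdev(distances)
    if sigma + m == 0:
        raise ValueError("all distances are zero; index undefined")
    return (sigma - m) / (sigma + m)

regular = [5.0] * 20            # identical steps: sigma = 0
bursty = [0.1] * 19 + [500.0]   # many short steps and one long jump
print(spatial_burstiness(regular))  # -1.0
print(round(spatial_burstiness(bursty), 2))
```

Identical steps give exactly -1, while a sequence dominated by short steps with one long jump gives a clearly positive value.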
5. Characterization of Counties by Spatial Burstiness Index
This study applies the proposed spatial burstiness index to individual Twitter users' trajectories of geo-tagged tweets, used as a proxy for their movements, in two different ways: county-based aggregation and individual-based aggregation. First, spatial distances between successive geo-tagged tweets of each individual are calculated. Second, spatial distances are aggregated to calculate the spatial burstiness index. For county-based aggregation, the spatial burstiness index of each county represents how bursty the collective tweeting patterns are for individuals who are assumed to live in that county (Figure 2). For individual-based aggregation, the average value of the spatial burstiness index of individuals within each county is calculated (Figure 3). Since the number of geo-tagged tweets varies across individual users, and users may behave (or move) differently depending on the number of tweets, this study applies different thresholds for the minimum number of tweets that individual users need to have (Figures 2 and 3). Maps show that highly populated counties tend to be more bursty.
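The two aggregation schemes can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes great-circle (haversine) distances, takes each user's home county as a given input (the source does not say how it was assigned), uses the population standard deviation, and all function and parameter names are hypothetical.

```python
import math
import statistics
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))

def burstiness(distances):
    """B = (sigma - m) / (sigma + m); assumes a positive mean distance."""
    m = statistics.fmean(distances)
    sigma = statistics.pstdev(distances)
    return (sigma - m) / (sigma + m)

def county_indices(users, min_tweets=10):
    """users: iterable of (home_county, [(lat, lon), ...]) with tweets in time order.

    Returns ({county: B of pooled distances}, {county: mean of per-user B}).
    """
    pooled = defaultdict(list)    # county-based aggregation
    per_user = defaultdict(list)  # individual-based aggregation
    for county, points in users:
        if len(points) < min_tweets:  # apply the tweet-count threshold
            continue
        dists = [haversine_km(a, b) for a, b in zip(points, points[1:])]
        if sum(dists) == 0:  # user never moved (or no distances); index undefined
            continue
        pooled[county].extend(dists)
        per_user[county].append(burstiness(dists))
    county_based = {c: burstiness(d) for c, d in pooled.items()}
    individual_based = {c: statistics.fmean(b) for c, b in per_user.items()}
    return county_based, individual_based
```

Pooling distances before computing the index lets users with many tweets dominate a county's value, whereas averaging per-user indices weights every qualifying user equally, so the two maps need not agree.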
Figure 2. Spatial burstiness index of counties (a) with threshold 10 tweets and (b) with threshold 100 tweets. In both cases, coastal states appear to be more spatially bursty.
Figure 3. The average of spatial burstiness index of individuals belonging to the same county (a) with threshold 10 tweets and (b) with threshold 100 tweets.
6. Expected Results and Further Work
This study proposed a spatial burstiness index and applied it to geo-located Twitter data, but we still need to understand how values of the index relate to socioeconomic factors of counties, which is our third research question. Some associations between the spatial burstiness index and a county's population size or transportation infrastructure are expected.
Acknowledgements
Data were collected by the Salathe Group at Penn State (http://www.salathegroup.com/) and we appreciate the access to these data.
References
Barabási A-L, 2005, The origin of bursts and heavy tails in human dynamics. Nature, 435(7039):207–211.
Brockmann D, Hufnagel L, and Geisel T, 2006, The scaling laws of human travel. Nature, 439(7075):462–465.
Dodge S, Laube P, and Weibel R, 2012, Movement similarity assessment using symbolic representation of trajectories. International Journal of Geographical Information Science, 26(9):1563–1588.
Goh K-I and Barabási A-L, 2008, Burstiness and memory in complex systems. EPL (Europhysics Letters), 81(4):48002.
Gonzalez MC, Hidalgo CA, and Barabási A-L, 2008, Understanding individual human mobility patterns. Nature, 453(7196):779–782.
Hanson S, 1980, Spatial diversification and multipurpose travel: implications for choice theory. Geographical Analysis, 12(3):245–257.
Hawelka B, Sitko I, Beinat E, Sobolevsky S, Kazakopoulos P, and Ratti C, 2014, Geo-located Twitter as the proxy for global mobility patterns. Cartography and Geographic Information Science, 41(3):260–271.
Hägerstrand T, 1970, What about people in regional science? Papers in Regional Science, 24(1):7–24.
Kim E-K and MacEachren AM, submitted, Moving people: Regularity analysis using geolocated Twitter data. GIScience 2014: 8th International Conference on Geographic Information Science, Vienna, Austria.
Kwan M-P, 2004, GIS Methods in Time-Geographic Research: Geocomputation and Geovisualization of Human Activity Patterns. Geografiska Annaler: Series B, Human Geography, 86(4):267–280.
Ramos-Fernandez G, Mateos JL, Miramontes O, Cocho G, Larralde H, and Ayala-Orozco B, 2004, Lévy walk patterns in the foraging movements of spider monkeys (Ateles geoffroyi). Behavioral Ecology and Sociobiology, 55(3):223–230.
Viswanathan GM, Afanasyev V, Buldyrev SV, Murphy EJ, Prince PA, and Stanley HE, 1996, Lévy flight search patterns of wandering albatrosses. Nature, 381:413–415.