Exploiting Schelling Behavior for Improving Data ... - CiteSeerX

Exploiting Schelling Behavior for Improving Data Accessibility in Mobile Peer-to-Peer Networks Long Vu, Klara Nahrstedt

University of Illinois at Urbana-Champaign Illinois, USA

{longvu2,klara}@cs.uiuc.edu

ABSTRACT In 1969, Thomas Schelling proposed one of the most cited models in economics to explain how similar people (e.g. people with the same race, education, community) group together in American neighborhoods. Interestingly, we observe that the analogy of this model indeed exists in numerous real world scenarios where co-located people communicate via their personal wireless devices (e.g. cell phones, PDAs, Zune) in Peer-to-Peer (P2P) fashion. Schelling’s model therefore can potentially serve as a mobility model and offer a unique opportunity to efficiently disseminate messages in mobile P2P networks. In this paper, we exploit the natural grouping behaviors of humans presented by Schelling to expedite data dissemination in such networks. Particularly, we design a push-based scheme for dense network areas to maximize query hit and a pull-based scheme for sparse network areas to utilize network bandwidth. We ensure that our scheme is lightweight since query and response are automatically limited within groups of mobile nodes carried by similar people. Moreover, we avoid broadcast storms by assigning each message a broadcast timer and applying overhearing mechanism to reduce redundant transmissions. Our scheme also allows leaving nodes and arriving nodes to collaboratively answer queries and thus further improve data accessibility. Finally, our experiment results show that the proposed data dissemination scheme improves the query hit ratio significantly while utilizing network bandwidth efficiently and avoiding broadcast storms.

1.

INTRODUCTION

Data dissemination in mobile P2P networks has drawn significant attention from research community [3, 10, 12, 13, 18, 19, 20, 21]. Previous studies on data dissemination fall into two main categories. The first category is for sparse and intermittently connected networks where mobile nodes use a store-carry-forward paradigm and only exchange messages once they have physical contacts. This model follows the emerging idea of Delay Tolerant Networks [1, 22] but provides no guarantee about the quality of data dissemination. In contrast, the second category is for dense networks where mobile nodes are assumed to either move according to a repetitive pattern [4] or group [7, 11, 17]. For cooperative mobile P2P networks, such as military or mission-based networks, grouping methods work efficiently since nodes are pre-configured to move in groups, thus they always stay in close proximity during their missions to cooperatively expedite data dissemination. Although the strong grouping assumption is typical in pre-

Matthias Hollick

Technische Universität Darmstadt Darmstadt, Germany

[email protected] vious wireless networking research literature, it is indeed unrealistic since people move much more freely in reality, and thus the grouping of wireless devices carried by these people may break at any time. This is especially true for numerous scenarios such as shopping streets where users move toward their own destinations without any pre-configurations and requirements. As a result, the grouping of mobile individuals can occur instantly but not permanently. Let us consider a shopping street scenario where customers walk to their interested shops and exchange messages via Bluetooth or 802.11 wireless interfaces of their personal wireless devices. Given two co-located customers A and B, according to previous grouping algorithms [7, 17], A and B will definitely be grouped and required to collaborate in ad hoc mode. However, A and B may have different targeted shops, so they may move to different directions in very near future, causing ad hoc connection broken. Moreover, if A is interested in shoes and B is interested in food, what is the immediate incentive for A to relay the packet about food from B, and vice versa? In this case A and B fail to communicate, although grouped. In contrast, two customers A and C can definitely group and collaborate if they are co-located and have mutual interest on shoes. When A and C share an interest on shoes, they are likely to move toward the same shopping zone for shoes. This not only motivates A and C to group and exchange information of interest but also offers stable ad hoc connections between them. We therefore believe that sharing mutual interest is crucial to motivate mobile people to collaborate and maintain wireless connections. Interestingly, we observe numerous real-world scenarios and applications where people motivate themselves to collaborate if they share mutual interests on some topic. For example, audiences of an exhibition or students in a campus can group to exchange messages while heading toward the same destinations such as exhibition halls or libraries. More interestingly, grouping behavior of people who share similarities was presented in one of the most cited model in economics by Thomas Schelling [15]. According to this model, people move apart from each other if they have different interests; whereas, they group into clusters if they share mutual interests. In this paper, we exploit the natural grouping behaviors of humans presented by Schelling to expedite data dissemination in mobile P2P networks. Particularly, we first unveil two important properties of Schelling’s model. Then, we leverage these properties to design a pushbased scheme for dense network areas to maximize query hit and a pull-based scheme for sparse network areas to utilize network bandwidth. Our scheme is lightweight since

B1

A1

B2

B3

B4

A2

A3

A4

Figure 1:

People always move toward their similar neighborhoods

query and response are automatically limited within groups of mobile nodes carried by similar people. Moreover, we avoid broadcast storms by assigning each message a broadcast timer and applying overhearing mechanism to reduce redundant transmissions. Our scheme also allows leaving nodes and arriving nodes to collaboratively answer queries and thus further improve data accessibility. Finally, our experiment results show that the proposed data dissemination scheme improves the query hit ratio significantly while utilizing network bandwidth efficiently and avoiding broadcast storms. The paper is organized as follows. Section 2 presents Schelling’s original model, its properties, and representative examples where Schelling’s model exists in the context of wireless technology. Then, we discuss system models and system overview in section 3. Next, we present in detail our protocol design and data management in section 4 and section 5, respectively. Section 6 evaluates our scheme and protocols while section 7 summarizes related work. Finally, section 8 concludes the paper.

2.

SCHELLING BEHAVIOR

For clarification, we define the term similar individuals or similar people as follows: two individuals are similar if they share common similarity (race, education, community) or mutual interest on some topic (books, musics, movies).

2.1

Schelling’s Original Model

In 1969, Thomas Schelling proposed one of the most cited models in economics to explain how similar people group together in American neighborhoods [15]. According to Schelling, the grouping is created by movements of individuals who want at least a certain portion of similar neighbors. In other words, when an individual doesn’t satisfy with his current neighborhoods, he tries to move toward a place where he has more similar neighbors. Such movements eventually create clusters of similar individuals. Figure 1 shows the idea of Schelling’s model where a circle denotes an individual and shading patterns represent different interests. In this example, A1 moves toward its closest neighbor, A2 . When all people satisfy with their neighborhoods, the clustering reaches the stable equilibrium. Schelling’s model motivates our research on mobile P2P networks due to their similarities in terms of dynamics, decentralization, and self-organization. In what follows, we use group and cluster interchangeably.

2.2

Analysis of Schelling’s Model

From our observation, Schelling’s model has two important properties, which are later exploited by our protocols for improving data accessibility in mobile P2P networks.

2.2.1

Density of similar individuals increases at proximity of clusters

This property directly follows Schelling’s original model since individuals move towards their desired neighborhoods and thus create clusters of similar individuals at these neigh-

borhoods. As a result, the density of similar individuals increases significantly at the center and proximity of these clusters.

2.2.2

Similar individuals form small “moving” clusters during their movements

In Schelling’s model, each individual always moves to his final cluster where he satisfies with the neighborhood and stays. According to Schelling, on the ways to their final clusters, similar individuals form small clusters. However, individuals at the boundary of these clusters may not satisfy with their current mixed neighborhoods. Thus, they tend to move toward bigger clusters where they have better (similar) neighborhoods. When an individual at the boundary leaves, other inner individuals will stay at the boundary; this again might cause them to leave. This process creates small “moving” clusters, which merge to bigger clusters.

2.2.3

Schelling Behavior

From our observation, Schelling’s original model is mainly for economic and social phenomena where individuals gradually form groups on a very large timescale. For example, the formation of a China town in a city might take decades. However, with the flourish of wireless technologies, we find numerous scenarios exhibiting Schelling’s model on a much smaller timescale. For example, co-located customers can group for 20 minutes and exchange their mutually favorite product information via their wireless handheld devices, while heading towards the same mall. Further, Schelling’s model originally focuses only on the outcome of the grouping process, the final clusters. Meanwhile, we observe that in the context of wireless technologies, not only the outcome but also the grouping process itself can be exploited to expedite data dissemination. This motivates us to study the analogies of Schelling’s model (instead of the original model itself), where two properties of Schelling’s model appear in a much smaller timescale and the grouping process occur during the physical movements of people carrying wireless devices. In what follows, we use the term Schelling behavior to denote analogies of Schelling’s original model.

2.3

Schelling Behavior & Wireless Technology

In this section, we show that there are many scenarios where wireless technology and Schelling behavior co-exist. Our first scenario can be found in commercial sector. Let us consider shopping mall and shopping street where customers cluster while arriving their targeted shops. In this scenario, wireless base stations at shops can broadcast product advertisements, hot sales, discount. Meanwhile, customers are individuals in Schelling’s model who walk to shops and can form group (this group of customers is moving toward the shop as second property) to exchange their opinions, reviews, and comments about their mutually interested products via their wireless personal devices. Our second scenario is a campus life where places such as book stores, libraries, class rooms, bus stations, and coffee shops are visited frequently by university students. These places represent final clusters and students represents moving individuals in Schelling’s model. Similar to shopping street scenario, wireless base stations at these places broadcast announcements and advertisements to the coming students. Meanwhile, coming students can form moving groups due to their co-locations and similar movement patterns

n4

n n

n9 7

n

2

n

n

n

n 3

6

n 5

1 0

1

8

Point of Interest n

Mobile Node ni i

Figure 2: Network model

Figure 3: System overview

(similar walking speed and targeted places). Again, with their personal wireless devices, students can exchange their information during their movements. Our third scenario is a social event such as an art exhibition or music concert in the downtown area of a city. The event “attracts” interested audiences and plays the role of a final cluster in Schelling’s model. Wireless base station at the event can broadcast advertisements, content and showtimes of the event to arriving audiences. These audiences can form groups and exchange their opinions, suggestions, comments about the event. For a generalized presentation, we denote the shops, bookstores, and exhibitions in the above scenarios as Points of Interest (P oIs). We believe these three above examples represent a popular class of applications where P oIs “attract” incoming individuals. These P oIs and mobile individuals essentially create mobile P2P networks where Schelling behavior occurs. In this network, disseminating messages from P oIs to mobile individuals and among mobile individuals efficiently is challenging due to its dynamics and diversity. Therefore, we present a data dissemination scheme to address this challenge. In what follows, mobile individual and mobile node are used interchangeably.

discount in commercial applications, or announcements from the bookstores to arriving students in the campus life scenario. The broadcast frequencies of messages depend on their popularity, which is determined by the P oIs, the creators of messages. For example, for a hot sale with big discount, the shop will broadcast it more frequently than other normal sales. Without loss of generality, we assume the popularity of messages follows a Zipf-like distribution:

3.

SYSTEM MODEL AND OVERVIEW

In this section, we first present our network model and data model. Then, we discuss our design objectives. Finally, we present the overview of our system.

3.1

Network Model

We focus on a hybrid mobile P2P network where each Point of Interest (P oI) has a wireless base station to broadcast messages and mobile nodes moving toward their targeted P oIs. In our network, Schelling behavior occurs and its two properties hold. Figure 2 presents our network model where squares represent P oIs (a final cluster in Schelling behavior) and circles are nodes whose arrows denote movement directions. In this figure, different shading patterns represent different interests. Each mobile node n in our context is identified by a tuple (id, i), where id is n’s unique identity and i denotes n’s interest. This interest associates with one P oI. Although it can change its interest any time, n moves to only one targeted P oI at one time. n also has a limited amount of cache (memory) to store messages.

3.2

Data Model

Messages are created and periodically broadcasted by P oIs in a limited range. For example, messages can be sales and

1

f (r; θ; N ) = PNr

θ

1 i=1 iθ

(1)

In Equation 1, N is the total number of messages created by one P oI and r is the rank of a message. When θ is equal to one, the Zipf-like distribution becomes “more” classic Zipf. In this paper, we also assume that the request model of mobile nodes follows the above Zipf-like distribution since people usually request more popular data than less popular data in reality.

3.3

Design Objectives

First, the data dissemination scheme should maximize data accessibility of mobile nodes following Schelling behavior. In other words, the scheme should provide timely access for mobile nodes, especially for nodes in the close proximity of P oIs. This is intuitive because when the mobile users arrive closer to the P oIs, they expect to have more timely access to the data of their targeted P oIs. Second, the scheme should minimize message overhead to save network bandwidth due to its scarcity in wireless networks.

3.4

System Overview

Figure 3 shows the overview of our system design in which a square represents a P oI and circles represent mobile nodes. The dotted circles denote wireless broadcast (transmission) ranges. In our network, a mobile node moves toward its targeted P oI (e.g. nodes ni with 0 ≤ i ≤ 8) and after it arrives at its targeted P oI, it stays (nodes n3 , n4 , n6 ), and then leaves (nodes n9 , n10 ). The P oI has a wireless base station to disseminate information in a limited range. This base station periodically broadcasts messages, more popular and important messages will be disseminated more frequently. Due to the high density of similar nodes surrounding the P oI, base station creates a Broadcast Region by assigning each disseminated message m a Time-To-Live (T T L) value. The T T L specifies how many broadcasts mobile nodes will perform on m. On receiving a message m, the receiving mobile node n computes a broadcast timer for m and stores it. Later, when m’s timer expires, if m’s T T L is positive,

Broadcast Region

Arriving

7

n

n

3

1

n

2

n 5

1 0

1

n

n12 n

1 1

New

6

5

n

n9

2

6

4

n

n

3

n4

Staying

8

Figure 5: Mobile nodes can be in one of the three states: Point of Interest n i

Mobile Node ni

Figure 4: Broadcast Region is specified by the solid line n will re-broadcast m. Outside the Broadcast Region, due to the similarities in terms of movement patterns and interest, similar nodes can automatically group and collaborate to answer queries. In the following sections, we present in detail the system and protocol designs.

4.

DATA DISSEMINATION PROTOCOL

4.1

Point of Interest and Broadcast Region

Each P oI has a wireless base station, which periodically broadcasts messages in a limited transmission range. To increase data accessibility of mobile nodes in the surrounding area of the P oI, more popular and important messages will be disseminated more frequently. For example, a sale is valid for a certain period, when the period is going to expire, this sale becomes more important than other sales. The P oI, therefore, increases the broadcast frequency of this sale to broadcast it more before its expiration.

4.1.1

Broadcast Region

Because Schelling behavior exists, according to the first property, density of similar nodes in surrounding region of a P oI is much higher than that of other regions. To improve data accessibility we use the push model in which messages are rebroadcast by nodes in this region. However, to limit scope of flooding, each message m will be assigned a TimeTo-Live (T T L) value. A mobile node n rebroadcasts m as presented in Algorithm 1. Algorithm 1 Broadcasts a message INPUT: node n is holding message m OUTPUT: n broadcasts m and updates m’s T T L value. BEGIN if (m.broadcastT imer expires) && (m.T T L > 0) then m.T T L = m.T T L-1; n broadcasts m; end if END

This algorithm creates a broadcast region surrounding the P oI. To be precise, the Broadcast Region is defined as follows: “a Broadcast Region of a P oI is a region created by broadcasts of mobile nodes arriving the P oI. These mobile nodes only broadcast messages created by their interested P oI, if these messages have positive T T L values. ” According to this definition, each message has a different Broadcast Region (BR). This is because the size of BR depends on T T L value of the message, transmission range and

New, Arriving, Staying

speed of relaying nodes. In particular, for a message m, if relaying nodes move faster to their P oI or their transmission ranges are shorter, m’s broadcast region is smaller. Figure 4 shows a BR with the solid curve boundary. This region is created by transmission range of arriving nodes.

4.2

Mobile Node States

Figure 5 shows a state machine where each circle is a state of mobile nodes in our network. Each mobile node can be in one of three states: N ew, Arriving, or Staying. In following sections, we discuss characteristics of these states, the transitions among them, and corresponding protocols of mobile nodes at each state.

4.3

New Node

In our context, each mobile node has only one interest. That means, at each time point, one mobile node can only move toward its current P oI. A mobile node, which is coming to its current P oI and is out side the BR of its targeted P oI, is in N ew state. In figure 4, nodes n7 , n8 are new mobile nodes because they are out side the BR. Meanwhile, n9 is leaving the P oI of n8 but at the same time n9 might be a new node of another P oI. There are two transitions at state N ew: transition 1 and 2. The former occurs when a new node switches interest to a new interest and starts moving toward the new P oI. The latter occurs when the mobile node enters the BR of its targeted P oI and changes state to Arriving. Mobile nodes detect the existence of BR by overhearing broadcast messages from nodes at the boundary of this region. To save network bandwidth we use a pull model for new nodes. In particular, when a new node has a query, it broadcasts its query to its neighborhoods; these neighbors collaboratively help to find answer for that query. Detail of this query and response process will be presented in Section 5.1.

4.4

Arriving Node

A new node becomes an arriving node if it enters the BR of its targeted P oI. Mobile node n1 in Figure 4 is an arriving node. The role of arriving nodes is to rebroadcast messages inside the BR created by the P oI, to exploit the first property of Schelling behavior. Therefore, inside the BR, mobile nodes have potential opportunities to learn the content of their P oI by receiving a rich set of broadcasts from similar neighborhoods. This improves data accessibility of mobile nodes inside the BR and thus improves data accessibility of the entire system. From Arriving state, a mobile node can switch either to N ew state or Staying state (transitions 3 and 4 in Figure 5). Switching to N ew state means the mobile node changes

Algorithm 2 Broadcast timer update

n 1

n 2

n 3

Figure 6: n1 is the broadcaster, n2 , n3 are receivers. The most distant node, n3 , will be the next broadcaster interest and starts moving toward the new P oI. Meanwhile, switching to Staying state means the mobile node enters the center of its current targeted P oI or the transmission range of the base station. To reduce message overhead and avoid broadcast storms inside the BR, each message m will be assigned a broadcast timer as presented in the next section.

4.4.1

Broadcast Timer Estimation

When an arriving node n receives a message m, n estimates m’s broadcast timer as follows: m.broadcastT imer =

TX · T imeU nit Dist(sender, n)

(2)

In Equation 2, m.broadcastT imer denotes the value of message m’s broadcast timer estimated by receiving node n. Notice that n can later be the sender of m. T X is the transmission range of mobile nodes and T imeU nit specifies the scale of the broadcast timer. For example, T imeU nit may be in seconds or minutes. The distance between sender and n (i.e. Dist(sender, n)) can be estimated by various techniques [2, 8]. In Equation 2, if n1 is the sender of m and n2 , n3 are receivers, and if Dist(n1 , n2 ) < Dist(n1 , n3 ), n3 will assign a shorter broadcast timer to m. Then, n3 will broadcast m prior to n2 , resulting in a larger region covered by m in the network. Figure 6 shows an example where n1 is the original broadcaster of m. n3 will be the next broadcaster because it is farther from n1 than n2 .

4.4.2 Overhearing Mechanism Besides broadcast timer, we use the overhearing mechanism in wireless network to avoid broadcast storms and save network bandwidth. In particular, whenever a mobile node n3 receives a message m, it performs Algorithm 2 to update m’s broadcast timer. If m doesn’t exist in n3 ’s cache, n3 will add m into its cache. By applying Algorithm 2, once n3 overhears m, it resets m’s broadcast timer, and thus we can minimize the redundant broadcast of m as well as maximize data dissemination coverage.

4.5

Staying Node

When an arriving node n enters the transmission range of base station at its targeted P oI, n switches to Staying state. For example, node n3 in Figure 4 is a staying node. At Staying state, because n is inside the transmission range of the base station, all of its requests will be answered instantly by its local cache, its neighbors or the base station at P oI. Similar to Arriving nodes, nodes in Staying state also assign broadcast timers for messages and apply overhearing mechanism to reset message timers when overhearing from its neighborhood.

INPUT: n3 receives m from n1 . OUTPUT: n3 updates m’s broadcast timer. BEGIN TX · T imeU nit tmp = Dist(n 1 ,n3 ) if m doesn’t exist in n3 ’s cache then m.broadcastT imer = tmp; n3 adds m and applies cache replacement in Section 5.1 else m.broadcastT imer = tmp; end if END

From Staying state, a mobile node can switch to N ew state. This is when a mobile node changes to a new interest. When a mobile node switches interest, it starts leaving its old P oI for its new P oI (e.g. node n10 in Figure 4).

5.

QUERY, RESPONSE, CONTEXT-SWITCHING

All nodes in our network can make queries at anytime but only arriving and staying nodes can rebroadcast messages. In this section, we first present query and response process. Then, we present the context-switching where a mobile node changes interest and how this context-switching can be used to improve data accessibility. Algorithm 3 Query and Response INPUT: a mobile node n has a query q for a message m OUTPUT: n has a query hit or a query miss BEGIN if (m exists in n’s memory) then return FOUND; // query hit else if (n is a “Staying” node) then n sends q to the base station of its targeted P oI and the base station returns response to n. else n broadcasts q to its neighbors (with query message format presented in Figure 7). Suppose n2 receives q. if (m exists in n2 ’s memory) then if (n2 is n1 ’s one-hop neighbor) then n2 returns response message to n directly else n2 returns response message to n via an underlying routing protocol such as AODV or DSR. end if else if (n2 .i == n.i) then n2 broadcasts q on behalf of n. The process repeats. else n2 discards the query q end if end if end if end if END

5.1

Query, Response, and Limited Flooding

Because Schelling behavior occurs in our network, the second property hold. That is, if co-located mobile nodes share mutual interests, they can automatically form groups, which “move” towards the P oIs . We exploit this property to improve our query and response process. In particular, when a node n has a query for a message m, it performs Algorithm 3. By following this algorithm, query and response are only forwarded within the similar nodes of n. Thus,

Sender ID

Sender’s Interest ID

Query

Sender ID

Receiver ID

Response

Query Message Response Message

Figure 7: Query and Response Message Formats we can limit the message overhead and save network bandwidth. Figure 7 shows the formats of query and response messages in which “Requester’s Interest ID” is used to limit the flooding scope. Particularly, node n only forwards the query whose original requester has the same interest as n, otherwise n discards the query. This is an intuitive behavior as co-located nodes with mutual interest have automatic incentive to collaborate. With this automatic grouping, our scheme does not need any group management protocol and thus reduces communication overhead. When node n receives any messages from the network, including response for its request and broadcast messages, n performs Algorithm 4 to update its memory. Algorithm 4 Cache Replacement INPUT: n receives m OUTPUT: n finishes cache replacement BEGIN if (m exists in n’s memory) then n updates m’s broadcast timer as in Algorithm 2 else if (n’s memory is full) then if (all messages belong to n’s current interest) then n replaces least popular message with m, if exists. Otherwise, n discards m. else //There are several messages of old interest n replaces least popular message of old interest with m end if else n adds m into its memory end if end if END

5.2

Context-switching

In our network, at one time, each node n has only one current interest. When n switches to a new interest, n starts moving toward its new P oI (e.g. P oI2 ) instead of its old P oI (e.g. P oI1 ). When n gets closer to P oI2 , n learns more about P oI2 through its queries. At the same time, its cache content changes gradually, with more and more messages of P oI2 replacing messages of P oI1 . This is where the contextswitching occurs. To leverage this context-switching, we apply a pull model of data dissemination for nodes outside the BR. In particular, when n switches interest to P oI2 , although it does not broadcast messages of P oI1 in its memory, n still answers queries from coming nodes to P oI1 . For example, in Figure 4, P oI1 is the square, when n9 is leaving P oI1 for P oI2 , it meets n7 whose current target is P oI1 . If n7 queries some message m and n9 overhears this query, n9 can answer if m still exists in n9 ’s memory. By this way, context-switching can be leveraged to improve data accessibility for coming nodes to P oI1 . Likewise, n9 may learn about its new interest P oI2 from nodes leaving P oI2 long before it enters the BR of P oI2 . Notice that in contextswitching, leaving nodes and coming nodes are considered to have partly mutual interest.

6.

EVALUATION

We implement a Java-based middleware layer simulation to evaluate our data dissemination scheme. Particularly, the simulation is mainly for the design of data dissemination protocol rather than the network stack and routing protocols. In this section, we first describe simulation settings, which give us Schelling behavior. Then, we rely on the Schelling behavior to evaluate our data dissemination scheme.

6.1

Existence of Schelling Behavior

Field Number of nodes Number of P oIs Node speed (walking speed) Area of simulation Node and P oI transmission range (T X) Staying period at a P oI Probability of changing interest (p)

Value/Unit 1000 6 random [1.0,2.0] (m/s) 1000x1000 (m2 ) [50,75,100] (m) random [20,50] (s) random [0.01,0.1]

Table 1: Network settings Table 1 presents the settings we use to simulate Schelling behavior. We implement a mobility model operating on a Manhattan grid street where P oIs are at the intersections of streets. A mobile node n in our simulation works as follows. Initially, n obtains a position, a speed, and a P oI, all at random. Then, n starts moving along with streets and toward its P oIs. During its movement, n also can switch interest with probability p. When arriving at its P oI, n stays for a random period, then changes interest (corresponding to a new P oI), and repeats the entire process. In this paper, we study the steady state of our simulation. We expect that if we use settings in Table 1, at steady state, the simulation shows us Schelling behavior. Figure 8 shows node distribution in steady state of our simulation and Figure 9 shows node distribution of one particular interest. Each shape in these figures denotes one interest, a big shape represents a P oI and a small shape is one mobile node. When a node changes to a new interest, its shape changes accordingly. These figures show that at steady state the Schelling behavior exists because (1) closer to a P oI, the density of nodes interested in this P oI increases and (2) similar nodes can group into clusters due to their close proximities on the ways toward their mutual targeted P oIs. Figure 9 also shows that density of similar nodes gets maximized at the P oI and decreases gradually at farther distance. To further validate Schelling behavior in our simulation, we define the notion of Interest Group as follows: 1. IF (n1 .i = n2 .i) && InT ransmissionRange(n1 , n2 ) THEN InterestGroup(n1 , n2 ) Here, InterestGroup(n1 ,n2 ) means n1 and n2 are in one Interest Group and InT ransmissionRange(n1 , n2 ) means n1 and n2 are within transmission range of each other. 2. IF InterestGroup(n1 ,n2 ) && InterestGroup(n1 ,n3 ) THEN InterestGroup(n2 ,n3 ) We then use this Interest Group definition to group similar nodes and compute the group size. Figure 10 shows the relationship between group size and distance from mobile nodes to the P oIs. In this figure, closer to the P oIs, group size increases. This confirms the first property of Schelling

100

TX = 50 (m) 700

Group Size (number of nodes)

90

600

500

400

80

70

60

50

40

30

20

10

300

0

0

200

400

600

800

1000

1200

Distance from mobile node to the PoI (m) 200

Figure 10: Distance to the P oI and the group size of similar nodes

100 200

300

400

500

600

700

800

Figure 8: Schelling behavior exists in steady state. Big shapes are P oIs and small ones are mobile users 700

600

500

Name Local Hit (L1 ) Similar Nodes Hit (S1 ) Leaving Nodes Hit (L2 ) Server Hit (S2 ) Total Hit Query Miss

400

Description/Unit(%) Query answered by local memory of mobile nodes Query answered by similar nodes in the neighborhoods of n Query answered by mobile nodes during their context-switchings is contributed by (i) “Staying” nodes, who directly access the base station or (ii) “Arriving” and “New” nodes through multi-hop relays L1 + S1 + L2 + S2 100 - Query Hit

Table 3: Definitions of Metrics

300

200

100 200

300

400

500

600

700

800

Figure 9: Schelling behavior for one interest behavior. At farther distances, group size varies from 5 to 20. This confirms that mobile nodes can group into small clusters on the ways to their targeted P oIs; thus, the second property of Schelling behavior holds. Given the Schelling behavior, we simulate data dissemination scheme with settings in Table 2. Then, we use metrics defined in Table 3 to evaluate our presented scheme. Field Number of messages created by a P oI Number of messages in one broadcast of P oI Node memory size M (all nodes are equal) TTL θ

Value/Unit 500 50,75,100 50,75,100 1,2,3,4 [0.6,..,1.0]

Table 2: Data Management Settings

6.2

Performance of Data Dissemination Scheme

In our experiments, we vary θ (see Equation 1), node transmission range T X, node memory size M , number of messages in one broadcast of a P oI, T T L, and p to evaluate our proposed data dissemination scheme.

6.2.1

Broadcast Region and Context-switching

Figure 11 shows how Broadcast Region (BR) and contextswitching improve “Total Hit”. Particularly, Figures 11(a) and 11(b) present the difference between nodes inside BR

and the ones outside this region. For nodes inside BR, we have a better “Local Hit” because they are more informed by their similar neighborhoods. Meanwhile, “Local Hit + Similar Nodes Hit + Leaving Nodes Hit” of nodes outside BR is comparable (i.e. from 60% to 64%) to that of nodes inside BR. This is because inside BR, nodes have very similar memory content as they tend to store most popular messages. Thus, when a node fails to answer a request, it is likely its neighborhood will fail. In contrast, nodes outside BR have more diverse memory contents and with context-switching, leaving nodes help coming nodes to answer queries. This confirms that context-switching does help to improve data accessibility, especially for nodes outside the BR. In contrast, Figure 11(a) shows that context-switching is not very effective for nodes within BR because these nodes can obtain information from their neighbors or base station. This explains why the two curves “Local Hit + Similar Nodes Hit + Leaving Nodes Hit” and “Local Hit + Similar Nodes Hit” look similar. Figures 11(a) and 11(b) also emphasize the effectiveness of BR where the “Total Hit” increases up to 100% while nodes outside BR can only get 64%. This is because nodes inside BR can collaboratively relay request to base station and return the answer to the requester. Meanwhile, communication of nodes outside BR is almost ad hoc, except very big groups whose several members are staying nodes. In these figures, when T X increases, “Total Hit” varies slightly. This is because inside the BR, nodes instantly get informed messages from neighborhood; thus, increasing T X does not improve much. On the other extreme, outside the BR the network is too sparse of similar nodes, increasing T X does not change the query hit ratio significantly. However, when

90

80

80

70

70

60

50

40

30

20

Local Hit Local Hit + Similar Nodes Hit Local Hit + Similar Nodes Hit + Leaving Nodes Hit Total Hit

10

0

50

60

70

80

90

90

80

70

60

50

40

60

50

40

30

30

20

20

10

10

0

100

100

Local Hit Local Hit + Similar Nodes Hit Local Hit + Similar Nodes Hit + Leaving Nodes Hit Total Hit

Query Hit (%)

100

90

Query Hit (%)

Query Hit (%)

100

50

60

Transmission Range (m)

70

80

90

0

100

Local Hit Local Hit + Similar Nodes Hit Local Hit + Similar Nodes Hit + Leaving Nodes Hit Total Hit 50

60


(a) Inside Broadcast Region

70

80

90

100


(b) Outside Broadcast Region

(c) For the entire network

Figure 11: Nodes inside BR can get 100% of Total Hit while nodes outside BR only get up to 64%. For the entire network, we get an average Total Hit ratio up to 87%. (θ = 0.8, T T L=2, and M =75) θ = 0.7 θ = 0.8 θ = 0.9 θ = 1.0

70

Query Hit (%)

Query Miss (%)

60

50

40

30

100

100

90

90

80

80

70

60

50

20

40

10

30

0

0

100

200

300

400

500

600

Query Hit (%)

80

20

70

60

50

TX=75,M=50, inside MRZ TX=75,M=75, inside MRZ TX=75,M=100, inside MRZ TX=75,M=50, outside MRZ TX=75,M=75, outside MRZ TX=75,M=100, outside MRZ

40

Total Hit (inside MRZ) Total Hit (outside MRZ) AVG Hit (entire network) 0

Distance To Points of Interest

(a) Distance sensitivity

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

30

20 0.55

0.09

0.6

0.65

0.7

Probability of Changing interest (p)

(b) Changing interest

0.75

0.8

θ value

0.85

0.9

0.95

1

1.05

(c) θ sensitivity

Figure 12: System sensitivity (θ = 0.8, T T L=2, and M =75) we compute the average query hit for the entire network (Figure 11(c)), the “Total hit” increases about 15% when increasing T X. This is because when we increase T X, the BR becomes larger and thus we have more nodes inside BR with very high “Total Hit”. In conclusion, for nodes inside BR, our scheme is robust with respect to T X. The context-switching mechanism contributes more for nodes outside BR and for smaller T X.

Impact of Distance and Changing Interest

Figure 12(a) shows that at further distance to P oIs, query miss increases. This is expected because when a node n comes closer to its P oI, n observes higher data availability provided by similar nodes in its neighborhood. Thus, n should have higher query hit ratio. Figure 12(b) shows when the probability of changing interest increases, query hit ratio of nodes inside BR is stable due to the rich set of information within BR (here T X = 75, M = 75, θ = 0.8, T T L = 2). Meanwhile, the query hit ratio of nodes outside BR decreases gradually due to the sparse network density. In this figure, p = 0.08 means at any time of the simulation, 80 nodes change their interests. This confirms the robustness of our scheme to p for nodes inside BR.

6.2.3

Percentage of Mobile Nodes (CDF)

6.2.2

100

Impact of θ and Node Memory Size

Figure 12(c) shows that inside the BR, increasing node memory size (M ) does not improve the query hit much because the high availability of data within this region. In contrast, outside BR, increasing M improves query hit considerably, from 5% to 10%. Figure 12(c) also presents impact of θ on data accessibility. In particular, when θ is closer to

90

80

70

60

50

40

30

20

Our scheme Basic scheme

10

0

0

100

200

300

400

500

600

Query Hit Delay (seconds)

Figure 13: Query Hit Delay 1, we have “more Zipf” broadcasts and requests; thus, they create better “match” and provide a higher query hit ratio. Again, impact of θ on nodes inside BR is less significant than that on nodes outside BR because nodes inside BR are well informed by their neighborhoods. Meanwhile, nodes outside BR communicate in ad hoc mode with limited memories. This result together with result in Sections 6.2.1 and 6.2.2 confirms that inside BR, our presented scheme is robust to T X, θ, p and M .

6.2.4

Query Hit Delay

We define the “Query Hit Delay” metric of our scheme for a node n as follows: QueryHitDelay(n) = t2 − t1

(3)

In Equation 3, t1 is the time at which n switches to a new interest (and thus n starts moving towards the correspond-

100

100

10000

90 80

60

60 50

40

40 30

20

80 Query Hit (%)

Query Hit (%)

70

Nodes inside MRZ (%)

80

20

Total Hit inside MRZ Total Hit outside MRZ AVG Hit (entire network) Nodes inside MRZ 2 3 TTL Value

4

Our scheme AVG query hit PureFlooding AVG query hit LimitedFlooding AVG query hit Our scheme overhead PureFlooding overhead LimitedFlooding overhead

20

0 1

100 40

10

0 0

60

0

5

Figure 14: T T L sensitivity and number of nodes inside BR T X = 75, M = 75, θ = 0.8 ing P oI). t2 is the time when n’s first query for the new interest gets a hit. Figure 13 compares our scheme with a basic scheme where node n holds queries since n switches to a new interest until n arrives at the P oI. At the P oI, n obtains 100% of query hit. “Query Hit Delay” metric of basic scheme can be computed from the distance between old and new P oIs, and the speed of n. Figure 13 shows that our scheme offers shorter delay (less than 170 (s)) for about 78% of nodes. Meanwhile, the basic scheme has a sudden change at time 170 (s) since nodes arrive at their new P oIs after 170 seconds (notice that distance between two P oIs varies, see Figure 8). This concludes that our dissemination scheme obtains better access time for all nodes in the network, regardless their distances to the targeted P oIs.

6.2.5 T T L sensitivity and Message Overhead Figure 14 shows when T T L increases, the “Total hit” of nodes inside BR is stable due to the high data availability. In contrast, “Total hit” of nodes outside the BR decreases because when the T T L increases, the radius of BR also increases (i.e. radius of BR∼T X · (T T L + 1) ). Meanwhile, node density decreases when the distance to the P oI increases, thus at the edge area of a larger BR, the node density is much lower. This results in lower query hit of nodes outside a larger BR. However, for the entire network, the average query hit is stable (∼82%) because larger T T L provides more number of nodes inside BR. Thus, larger T T L offers better data accessibility because more nodes inside BR have access to information of their targeted P oIs. To further evaluate our scheme (with T X=75,M =75,θ=0.8), we compare it with two other schemes: P ureF looding and LimitedF looding. These two schemes also rely on Schelling behavior of the network but they have different data dissemination strategies. P ureF looding broadcasts message naively without broadcast timer, T T L, context-switching. Query is answered by node n itself and nodes within n’s transmission range. LimitedF looding uses T T L to limit broadcasts. However, it does not have broadcast timers and context-switching. Moreover, its query process is the same as P ureF looding. All other simulation settings of three schemes are exactly the same. Figure 15 presents the average query hit ratio over the entire networks and average number of messages broadcast (overhead) by a node of three schemes. Particularly, our schemes improves the query hit ratio more than 20% because mobile nodes collaboratively relay and answer queries. Our scheme also outperforms P ureF looding and LimitedF looding in terms of av-

1 0

1

2 3 TTL Value

4

5

AVG # of messages sent by one node (log scale)

100

Figure 15: Our scheme improves significantly average query hit ratio while minimizing message overhead

erage message overhead sent by one node in the network. In particular, we observe that when T T L increases to 4, each mobile nodes in our scheme sends less than 200 messages while the corresponding values of LimitedF looding is 1000 and P ureF looding is 10000. In conclusion, our proposed scheme improves significantly average query hit ratio while minimizing message overhead.

7.

RELATED WORK

We first summarize previous studies on data dissemination in wireless networks. Then, we discuss limitations of existing mobility models currently used to evaluate data dissemination schemes in mobile P2P networks.

7.1

Data Dissemination Strategies

There have been numerous studies on data dissemination in mobile P2P networks [3, 10, 13, 18, 19, 12, 20, 21]. In this section, we present representative data dissemination schemes to highlight the difference between our proposed data dissemination scheme and the existing works. The first approach is broadcast-based data dissemination [14, 18, 19, 12, 20]. This is an intuitive approach because of the dynamic nature of wireless networks where connections among nodes are not stable. The broadcast, therefore, is the best way to disseminate messages. However, blind broadcast causes broadcast storms and cost network bandwidth. To avoid this, numerous methods have been proposed [5, 9, 16], in which mobile nodes actively broadcast messages and tune broadcast rate adaptively according to network condition. Although the broadcast is controlled, these schemes may still create broadcast storms in dense networks. The second approach to data dissemination is for intermitted wireless networks [1, 22]. The main idea of these schemes is leveraging the mobility and store-carry-forward paradigm to improve data delivery. These schemes essentially work for sparse networks but may not work for denser networks where the quality of dissemination is expected. For example, deadline or coverage of data delivery may not be guaranteed with these schemes. The third approach is topic-based (interest-based) data dissemination [3, 6, 22] where co-located mobile users only exchange information if they share mutual interest on some topic. However, these schemes are only for ad hoc networks and don’t exploit the big picture of the whole network where network density changes according to distance to P oIs as shown in Schelling behavior. Understanding the big picture

of the network and the behavior of mobile users is key to design an efficient data dissemination scheme.

tion in mobile P2P networks. Toward this end, our proposed data dissemination scheme is novel and widely applicable.

7.2

9.

Mobility Models and Data Dissemination

Mobility is a crucial factor in designing data dissemination schemes for mobile P2P networks. Currently, almost all existing data dissemination schemes for mobile P2P networks are evaluated by either Random Way Point [5, 16, 18, 19, 20] or Group-based [7, 11] mobility model. However, these mobility models have noticeable drawbacks. Obviously, Random Way Point Mobility model is unrealistic, especially in macro-scope scale, due to its repetitive movement patterns. On the other extreme, Group-based mobility model requires nodes to move in groups at all times and they can never change to new groups or moves independently. This is a very strong assumption, which makes Group-based model unrealistic. Obviously, Group-based mobility model can be used in disaster/recovery, task-based, mission-critical scenarios where team members are required to go in pre-configured groups. However, for civilian scenarios and daily activities, a strong grouping assumption may not hold. A recent work shows that movement is affected by the needs of humans to socialize or cooperate [11]. Thus, people who share mutual interest can group instantly to exchange messages.

8.

CONCLUSION

In this paper, we leverage the natural grouping and moving behaviors of humans presented in Schelling’s model, to design a novel data dissemination scheme for hybrid mobile P2P networks. First, we present two important properties of this model: (1) closer to the PoIs, density of similar individuals increases, and (2) due to their close proximities on the ways to their mutually targeted P oIs, individuals can group into small moving clusters. Then, we demonstrate various scenarios in real world where analogies of Schelling’s model and wireless technologies (i.e. Schelling behavior) indeed coexist such as shopping malls, shopping streets, campus life. This co-existence creates an ideal opportunity to expedite data dissemination in mobile P2P networks. Therefore, we propose an efficient and lightweight data dissemination scheme, which exploits Schelling behavior to improve data accessibility, especially for mobile nodes inside the Broadcast Region (BR). To this end, we design a push-based scheme by allowing mobile nodes in this region to rebroadcast messages collaboratively. In order to reduce message overhead and avoid broadcast storms, we assign each message m a broadcast timer, which is reset by applying overhearing mechanism. When mobile nodes are outside of the BR, we apply a pull-based scheme where requesters broadcast queries. This scheme suits the sparse network density, reduces message overhead and saves node energy. To further improve data accessibility, we present contextswitching where leaving nodes can participate in answering query of coming nodes. Our scheme is lightweight because there is no need of group membership management protocols and small groups of similar nodes can themselves limit the flooding of queries and responses. Our experiments show that based on the Schelling behavior, the proposed data dissemination scheme improves data accessibility significantly while minimizing network overhead. In summary, we believe wireless technology and Schelling behavior indeed co-exist in many real world scenarios and they offer a unique opportunity to expedite data dissemina-

REFERENCES

[1] M. Grossglauser and D. N. C. Tse. Mobility increases the capacity of ad-hoc wireless networks. In INFOCOM, pages 1360–1369, April 2001. [2] M. Hazas, C. Kray, H. Gellersen, H. Agbota, and G. Kortuem. A relative positioning system for co-located mobile devices. In Mobisys, 2005. [3] A. Heinemann, J. Kangasharju, F. Lyardet, and M. Muhlhauser. iClouds - peer-to-peer information sharing in mobile environments. European Conference on Parallel Processing, 2003. [4] X. Hong, M. Gerla, G. Pei, and C.-C. Chiang. A group mobility model for ad hoc wireless networks. In ACM International Workshop on Modeling and Simulation of Wireless and Mobile Systems, 1999. [5] A. Khelil, C. Becker, J. Tian, and K. Rothermel. An epidemic model for information diffusion in manets. In MSWiM, 2002. [6] G. Kortuem, J. Schneider, and D. Preuitt. When peer-to-peer comes faces-to-faces: Collaborative peer-to-peer computing in mobile ad hoc networks. In International Conference on Peer-to-Peer (P2P) Computing, 2001. [7] B. Li and K. Wang. Nonstop: Continuous multimedia streaming in wireless ad hoc networks with node mobility. IEEE Journal on Selected Areas in Communications, 21:1627–1641, 2003. [8] H. Lim, L.-C. Kung, J. C. Hou, and H. Luo. Zero-configuration, robust indoor localization: Theory and experimentation. In INFOCOM, 2006. [9] W. Lou and J. Wu. Double-covered broadcast (dcb): a simple reliable broadcast algorithm in manets. In INFOCOM, pages 2084–2095, 2004. [10] M. Motani, V. Srinivasan, and P. S. Nuggehalli. Peoplenet: Engineering a wireless virtual social network. In MOBICOM, 2005. [11] M. Musolesi and C. Mascolo. Designing mobility models based on social networks theory. Mobile Computing and Communications Review, 11, 2007. [12] S.-Y. Ni, Y.-C. Tseng, Y.-S. Chen, and J.-P. Sheu. The broadcast storm problem in a mobile ad hoc network. In MOBICOM, 1999. [13] M. Papadopouli and H. Schulzrinne. Effects of power conservation, wireless coverage anc cooperation on data dissemination among mobile devices. In MOBICOM, 2001. [14] T. Repantis and V. Kalogeraki. Data dissemination in mobile peer-to-peer networks. In International Conf. on Mobile Data Management (MDM), 2005. [15] T. C. Schelling. Models of segregation. American Economic Review, 59, 1969. [16] L. Vu and K. Nahrstedt. Adaptive mobility-assisted data dissemination in mobile disaster/recovery environments. In MILCOM, 2007. [17] K. H. Wang and B. Li. Group mobility and partition prediction in ad hoc wireless networks. In IEEE International Conference on Communications, 2002. [18] O. Wolfson and B. Xu. Opportunistic dissemination of spatio-temporal resource information in mobile peer to

[19]

[20]

[21]

[22]

peer networks. In International Conf. on Mobile Data Management (MDM), 2004. O. Wolfson, B. Xu, and R. M. Tanner. Mobile peer-to-peer data dissemination with resource constraints. In International Conf. on Mobile Data Management (MDM), 2007. O. Wolfson, B. Xu, H. Yin, and H. Cao. Search-and-discover in mobile p2p network databases. In International Conference on Distributed Computing Systems (ICDCS), 2006. X. Yang and A. Bouguettaya. Using a hybrid method for accessing broadcast data. In International Conf. on Mobile Data Management (MDM), 2005. W. H. Yuen, R. D. Yates, and S.-C. Mau. Exploiting data diversity and multiuser diversity in noncooperative mobile infostation networks. In INFOCOM, 2003.

Exploiting Schelling Behavior for Improving Data ... - CiteSeerX

Exploiting Schelling Behavior for Improving Data ... - CiteSeerX

Suggest Documents

Exploiting Data Mining techniques for improving the ... - CiteSeerX

Exploiting Data Mining techniques for improving the ... - CiteSeerX

Exploiting Equalization Techniques for Improving Data Rates in ...

Exploiting Network Parallelism for Improving Data Transfer Performance

Improving Memory after Interruption: Exploiting Soft ... - CiteSeerX

IMPROVING SIDE-CHANNEL ATTACKS BY EXPLOITING ... - CiteSeerX

mesoscale data assimilation for improving quantitative ... - CiteSeerX

Exploiting Redundancy for Flexible Behavior: Unsupervised Learning ...

Behavior Aware Data Placement for Improving Cache Line Level ...

Data layout transformation exploiting memory-level ... - CiteSeerX

Behavior and Techniques for Improving

Exploiting Unlabeled Data in Ensemble Methods - CiteSeerX

Exploiting Unlabeled Data in Ensemble Methods - CiteSeerX

583 IMPROVING STUDENT BUS-RIDING BEHAVIOR ... - CiteSeerX

Exploiting new genome data and web tools for ... - CiteSeerX

Exploiting Semantics of Web Services for Geospatial Data ... - CiteSeerX

Exploiting Locality for Data Management in Systems of ... - CiteSeerX

A Calculus for Exploiting Data Parallelism on Recursively ... - CiteSeerX

Exploiting Hâ Sampled-Data Control Theory for High ... - CiteSeerX

Improving Downlink UMTS Capacity by Exploiting Direct ... - CiteSeerX

thomas s. schelling

Exploiting multidimensional data for web site automation

Exploiting Semantics for Big Data Integration

Data Parallelism Exploiting for H.264 Encoder

Exploiting Schelling Behavior for Improving Data ... - CiteSeerX