Exploiting Location Information for Infostation-Based Hoarding

Uwe Kubach and Kurt Rothermel

Institute of Parallel and Distributed High-Performance Systems (IPVR), University of Stuttgart, Breitwiesenstr. 20-22, 70565 Stuttgart, Germany
fUwe.Kubach,
[email protected]
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. ACM SIGMOBILE 7/01 Rome, Italy © 2001 ACM ISBN 1-58113-422-3/01/07…$5.00

ABSTRACT

With the increasing popularity of mobile computing devices, the need to access information in mobile environments has grown rapidly. Since the information has to be accessed over wireless networks, mobile information systems often have to deal with problems like low bandwidth, high delay, and frequent disconnections. Information hoarding is a method that tries to overcome these problems by transferring information that the user will probably need in advance. The hoarding mechanism that we describe in this paper exploits the location dependence of the information access, which is often found in mobile information systems. Our simulation results show that it is beneficial to do so and that we achieve higher hit ratios than with a caching mechanism.

1. INTRODUCTION

The increasing popularity of mobile computing devices, such as laptops and PDAs, has fostered the need to access information in mobile environments. Thus, many mobile information systems have been developed recently, e.g. [1, 6]. One problem that arises in all these systems is that the information has to be accessed through wireless networks. This often leads to difficulties like low bandwidth, high delays, and frequent disconnections. In this paper we describe a hoarding mechanism that aims to overcome these difficulties. The basic idea of hoarding is to predict the information that a user will need in the future and to transfer this information to his mobile device before it is actually accessed. This process is called information hoarding. The problem with hoarding is to predict what information a user will need. A good prediction is important to avoid transferring information to the user's device that he will never request. Otherwise scarce resources, e.g. the time required for the transmission or the memory on the user's device, are wasted.

Our hoarding mechanism is based on the use of infostations [3]. Infostations are islands of wireless LANs, which are embedded in an area that is otherwise not covered by a wireless network at all or only by a wireless WAN (see Figure 1). The wireless LANs offer a high bandwidth (typically 11 MBit/sec), but cover only a small area, since their range is limited to a few hundred meters. We decided to use an infostation infrastructure for two reasons. The first one is that the installation costs for such an infrastructure are low and the second one is that people will not use, for example, a mobile guide, if they have to pay for expensive wireless WAN connections to access the offered information. Although there will be faster wireless WANs in the future, the infostation infrastructure will still be a good choice, since wireless LAN technology will always be faster and cheaper than wireless WAN technology.

Figure 1: Infostations spread over the city center of Stuttgart. The circles indicate the areas covered by the wireless LANs.
In our mechanism the hoarding is done exclusively at the infostations. If available, a wireless WAN is used only in the case of hoard misses, i.e. when an information item is requested that has not been hoarded before. Our mechanism is based on the assumption that the information access of the users is location dependent, i.e. that the information items the users preferably request depend on their locations. The idea of our mechanism is to gather information about which items the average user prefers in certain areas and to use this information to infer from a user's location the information items he will probably request.

The remainder of this paper is organized as follows: in Section 2 we introduce our mechanism in detail, discuss two variants of it that differ in the supported level of location-awareness, and suggest some further enhancements. Afterwards, we analyze our mechanism's performance in Section 3. Before presenting the results of the analyses, we describe the simulation model and the framework we used for our experiments. After discussing the related work in Section 4, we conclude the paper in Section 5.

2. MECHANISM

In the following we describe two variants of our mechanism. The basic mechanism only provides location-awareness at a relatively coarse level, whereas the enhanced mechanism supports location-awareness at any desired granularity. Before we describe the differences between the two variants, we give an overview of the basic functionality of the mechanism.

2.1 Preliminaries

As mentioned above, we assume that the information access is location dependent, i.e. the probability with which a user accesses a certain information item depends on his location. This is especially true for what we call an inherently location dependent information access, where the user accesses the information through his location. In situated information spaces [9], for example, the information is accessed through real world objects, and in location targeted advertising systems the user's location triggers the information transfer to his mobile device. But even in many systems where the location dependence is not that obvious, the information access is often location dependent. An example of such a system is a mobile tourist guide like [6]. There, the user will most probably access information about the tourist attractions that are close to him.

Furthermore, we assume that the information system provides access to a set of single, discrete information items, which can be downloaded independently of each other, and that each of these items has a unique identifier. The user mobility model, the information model, and the access model that we used for our simulations are described in detail in Section 3.

2.2 Overview

As mentioned before, our hoarding mechanism is based on the use of infostations. A proxy server that coordinates the hoarding processes is associated with each infostation. In the remainder of this paper we use the term infostation for the entirety of the wireless LAN and the associated proxy. The infostations serve geographically limited areas, their so-called hoarding areas. Exactly one hoarding area belongs to each infostation and each hoarding area is assigned to exactly one infostation. In reasonable configurations the hoarding area of an infostation should be bigger than the area covered by the wireless LAN.

The infostations maintain knowledge about which information items the users preferably access while moving through their hoarding area. When a user arrives at an infostation, this knowledge is used to predict the information items that he will most probably request during his subsequent trip through the infostation's hoarding area. Based on this prediction the infostations determine the information items to be hoarded on the user's device. The hoarding areas of adjacent infostations should overlap each other, so that the users can be supplied at each infostation with all the information that they will probably need until they reach the next infostation (see Figure 2).

Figure 2: Adjacent infostations (IS) and their hoarding areas (HA).

The hoarding process for each user is performed in a cyclic manner. The cycle consists of the following three phases:

1. Download: During the download period the user is located at an infostation, i.e. his mobile device is connected to a proxy server through a wireless LAN. Based on its knowledge about the preferred information items in its hoarding area, the proxy determines the information items that the user will most probably access before he reaches the next infostation, and transfers them to the user's device. The amount of data that is transferred to the device is limited either by the time the user stays within the area covered by the wireless LAN or by the memory available on the device. Since the available knowledge about the preferred information items and the determination of the items that will be transferred differ in the two variants of our mechanism, we describe these issues separately in subsections 2.3 and 2.4.

2. Disconnected Operation: The second phase of the cycle begins after the user has left the infostation. It lasts until he reaches the next infostation. During this period of disconnected operation, the user's information requests are answered with the information that has been hoarded at the previous infostation. If the user requests an information item that has not been hoarded, a hoard miss occurs. Then the request either has to be handled over the wireless WAN, if one is available and the user agrees to pay for the WAN connection, or it cannot be answered. Independent of whether or not a request succeeded, it is logged to a file together with the geographic position where the request occurred. The geographic position is determined with an appropriate location sensor on the user's device, such as a GPS sensor.

3. Upload: Finally, the third phase starts as the user reaches the next infostation. Then the information about the user's requests that has been logged on his device is uploaded to the proxy, which distributes this information to the other proxies. This distribution is done by sending the log file to each infostation, including the distributing infostation itself, in whose hoarding area at least one of the logged information requests occurred. The information in the log file is used by the proxies to update their knowledge about the preferred information items within their hoarding areas. Finally, the log file on the mobile device is deleted and the cycle restarts with the subsequent download process. Since the handling of the log files also differs in the two variants of the mechanism, it is again described separately in subsections 2.3 and 2.4.

Although the log files are transferred anonymously, the infostations have to know the address of a user's device in order to communicate with it. Thus, information about the user's behavior might be mapped to the address of his mobile device. This might lead to privacy problems if the user does not trust the operator of the infostation infrastructure.
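The distribution rule of the upload phase can be illustrated with a short sketch. It assumes, purely for illustration, that hoarding areas are circles around the infostation and that a log entry is an item identifier plus a planar position; the names `Infostation`, `LogEntry`, and `log_recipients` are hypothetical and not part of the described system.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Infostation:
    name: str
    x: float
    y: float
    radius: float  # assumption: the hoarding area is a circle of this radius

    def covers(self, px: float, py: float) -> bool:
        """True if the position lies inside this infostation's hoarding area."""
        return hypot(px - self.x, py - self.y) <= self.radius


@dataclass(frozen=True)
class LogEntry:
    item_id: str
    x: float    # position where the request occurred
    y: float
    hit: bool   # requests are logged whether or not they succeeded


def log_recipients(log, infostations):
    """Names of all infostations in whose hoarding area at least one
    logged request occurred (the receiving infostation may be among them)."""
    return [s.name for s in infostations
            if any(s.covers(e.x, e.y) for e in log)]


stations = [Infostation("A", 0, 0, 10), Infostation("B", 15, 0, 10),
            Infostation("C", 100, 100, 5)]
log = [LogEntry("museum", 8, 0, True), LogEntry("cafe", 12, 0, False)]
recipients = log_recipients(log, stations)  # A and B overlap here, C is far away
```

Because adjacent hoarding areas overlap, a single log file typically reaches more than one infostation, as in the example.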
2.3 Coarse Grained Variant

At the beginning of each download phase of the hoarding cycle, the infostation where the user is currently located has to decide which information items should be transferred to the user's device. As mentioned above, this decision is based on knowledge about which information items are most frequently accessed in the infostation's hoarding area. In this variant of our mechanism all this knowledge simply consists of a table that contains the access probabilities of certain information items. We call this table the access probability table (APT). The entries of the APT have the form $(i, a(i))$, where $i$ is the unique identifier of an information item and $a(i)$ is the access probability of $i$.

2.3.1 Maintenance of the APT

The infostation updates its APT each time an update period expires. During this period it counts how often each information item is requested in its hoarding area. The information about the requests is drawn from the log files that the infostation receives during the update period. Thus, at the end of every update period $\tau$, the infostation knows the number of requests $r_\tau(i)$ that have been made in its hoarding area for information item $i$ during the update period. Each update period ends when a configurable number of log files have been processed. So we can calculate the access probability for information item $i$ during the update period as

$$a_\tau(i) = \frac{r_\tau(i)}{n},$$

where $n$ is the total number of requests counted during the update period. After the update period has expired, the APT is updated as follows: in every entry of the APT the value of the access probability $a(i)$ is set to a new value $a'(i)$ according to the following formula:

$$a'(i) = \alpha \cdot a(i) + (1 - \alpha) \cdot a_\tau(i),$$

where $0 \le \alpha \le 1$ is a parameter that determines how strongly requests during former update periods are considered in the new value $a'(i)$. If there is an information item $i$ for which no entry in the APT exists and for which $a_\tau(i) > 0$, we add an entry $(i, a'(i))$. If the table size exceeds a configurable threshold, the entries with the lowest access probabilities are deleted from the APT.

2.3.2 Hoarding Decision

With the APT we can easily decide which information items should be transferred to the user's device when he wants to hoard information at an infostation. Let us assume that a maximum of $m$ information items can be transferred to the device due to memory restrictions on the device or due to restrictions concerning the maximum transfer time. Then, those $m$ items that have the highest access probabilities according to the APT are transferred.

2.4 Fine Grained Variant

The disadvantage of the coarse grained variant is that it cannot use any knowledge about the user's future location, which often is easily available and can be very useful for the hoarding decision. For example, a navigation system usually knows the exact path on which its user travels. If the navigation system could communicate this knowledge to the infostation and the infostation knew which information items are preferred in the crossed subareas of the hoarding area, the hoarding decision could be much more precise. This is illustrated in Figure 3. If we assume that the user only crosses the shaded squares after leaving the infostation and that the infostation knows which items are preferred in these squares, it would be enough to transfer only these items. A transfer of any items belonging to the residual squares would then not be necessary at all, and thus more items belonging to the crossed squares could be transferred.

Figure 3: A user's path through a hoarding area after leaving an infostation.

Another example where knowledge about the preferred information items in subareas would be beneficial is when a user moves towards a certain destination. Then, it is not very likely that he returns to subareas visited before. So when he prefetches information at an infostation, no information belonging to these subareas should be transferred to the mobile device. If we additionally know the direction in which the user is moving, we should be able to primarily prefetch information that belongs to the subareas in this direction.

The goal of the fine grained variant of our mechanism is to use such hints about the user's future location in order to improve the prediction of the information items he will probably need. Since this kind of information is not gained by the hoarding mechanism itself, we call it external knowledge. External knowledge might be offered by the applications or by the users themselves.

The basic idea of the fine grained variant is to divide each hoarding area into separate zones, which do not overlap, and to observe separately for each of these zones which information items are preferred within the zone. Thus, we maintain a separate APT for each of these zones. The geometry of the zones might be as simple as depicted in Figure 3, where the zones are equally sized squares, or the zones might reflect the geometry of buildings, streets, or other real world objects. As we will explain below, we also have to manage information about the visit probabilities of the zones, which is stored in so-called visit probability maps. Furthermore, the selection process of the information items to be hoarded differs from the first variant.
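As a point of reference for the fine grained variant, the coarse grained scheme of Section 2.3 can be condensed into a short sketch. The function names, the dictionary representation of the APT, and the tie-breaking by item identifier are assumptions made for illustration; an item without an APT entry is treated as having the old probability 0, and an update period without any requests leaves the table unchanged.

```python
from collections import Counter


def update_apt(apt, requested_items, alpha, max_entries):
    """One APT update: a'(i) = alpha * a(i) + (1 - alpha) * r(i) / n.

    apt             -- dict item id -> access probability a(i)
    requested_items -- item ids requested during the update period
    alpha           -- history parameter, 0 <= alpha <= 1
    max_entries     -- table size threshold (entries beyond it are dropped)
    """
    counts = Counter(requested_items)
    n = sum(counts.values())
    if n == 0:                      # assumption: nothing observed, keep the table
        return dict(apt)
    updated = {}
    for item in set(apt) | set(counts):
        old = apt.get(item, 0.0)    # assumption: missing entry means a(i) = 0
        updated[item] = alpha * old + (1 - alpha) * counts[item] / n
    # Keep only the entries with the highest access probabilities.
    kept = sorted(updated, key=lambda i: (-updated[i], i))[:max_entries]
    return {i: updated[i] for i in kept}


def hoarding_decision(apt, m):
    """The m items with the highest access probabilities."""
    return sorted(apt, key=lambda i: (-apt[i], i))[:m]


apt = update_apt({"a": 0.5, "b": 0.5}, ["a", "a", "c", "a"],
                 alpha=0.5, max_entries=2)
to_hoard = hoarding_decision(apt, m=1)
```

In this example item "c" enters the computation with probability 0.125 but is dropped again because the table is limited to two entries.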
2.4.1 Maintenance of APTs

Despite the fact that there is a separate APT for each zone, the maintenance of the APTs works similarly to the first variant. For every zone we count the information requests that occur there separately and consider only those requests in a zone's APT that occurred within the zone. We only count the first request for an information item $i$ during a user's visit within a zone. Additionally, we count the number of visits in each zone that occur during the update interval $\tau$. Thus we can calculate the entries of the APT of a zone $z$, which have the form $(i, a(i, z))$. Here $i$ is again the unique identifier of an information item and $a(i, z)$ is the probability with which a user requests information item $i$ when he visits zone $z$.

2.4.2 Visit Probability Maps

Basically, the visit probability map of a hoarding area $h$ is a function $v : Z \to [0..1]$, where $Z$ is the set of all zones within the hoarding area $h$ and $v(z)$ is the probability with which a user who visits the hoarding area $h$ also visits the zone $z$. We call these functions maps because of their graphical representation. For an example see Figure 4. It shows a visit probability map of the city center of Stuttgart, which we obtained by simulating the movements of tourists visiting Stuttgart. In this case the zones were equally sized squares. The brighter a square is depicted, the higher its visit probability.

Figure 4: Visualization of a visit probability map.

Each infostation maintains a visit probability map of its hoarding area. This works quite similarly to the maintenance of the APTs. During the same update period $\tau$, we count the number of visits $n_\tau(z)$ of each zone $z$. In each log file we only count the first visit of each zone. Thus, for the time period $\tau$, we can calculate the probability $v_\tau(z)$ with which a user who visits the infostation's hoarding area visits the zone $z$ as

$$v_\tau(z) = \frac{n_\tau(z)}{t},$$

where $t$ is the number of log files processed during the update period. After a time period $\tau$ has expired, the visit probability map is updated to new values $v'(z)$ according to the following formula:

$$v'(z) = \alpha \cdot v(z) + (1 - \alpha) \cdot v_\tau(z),$$

where $\alpha$ is again a parameter to control the influence of former time periods on the new values.

2.4.3 External Knowledge

In order to provide external knowledge, the user or the application can specify absolute or relative visit probabilities. Absolute visit probabilities can be used if any knowledge about the user's actual visit probabilities is available. Sometimes, however, only the fact that certain zones are more likely to be visited than others might be known, for example, if the user moves in a certain direction. Then no knowledge about the absolute values of the visit probabilities is available and only relative values can be specified. Both kinds of visit probabilities are specified in external visit probability maps. These external maps may cover the whole hoarding area or only parts of it, since sometimes only knowledge about parts of the hoarding area might be available. An external visit probability map that specifies absolute visit probabilities has the form $v_i : Z_i \to [0..1]$, where $Z_i \subseteq Z$. If more than one map is specified, we assume that $Z_i \cap Z_j = \emptyset$ if $i \ne j$.
An example where an external map with absolute visit probabilities for the whole hoarding area can be provided is the navigation system mentioned above. To provide the hoarding mechanism with its knowledge about the user's future path, the navigation system has to specify a map for the whole hoarding area, in which the visit probability of the zones crossed by the path is set to one and that of the other zones is set to zero (see Figure 5). Partial maps with absolute probabilities are needed, for example, when a mobile guide knows that certain buildings, e.g. museums or stores, are currently closed. Then, the visit probability of the zones covered by these buildings can be set to zero.

Figure 5: Visualization of an external map with absolute visit probabilities that specify a user's path.

For the hoarding decision the externally specified maps with absolute visit probabilities and the infostation's self-maintained map are integrated into one final map $\hat{v} : Z \to [0..1]$ as follows:

$$\hat{v}(z) = \begin{cases} \alpha_i \cdot v_i(z) + (1 - \alpha_i) \cdot v(z), & \text{if } z \in Z_i \\ v(z), & \text{if } z \notin \bigcup_{\forall i} Z_i \end{cases}$$

The parameter $0 \le \alpha_i \le 1$ controls how strongly the infostation's map is considered in zones that are covered by the external map $v_i$. If it is set to one, the infostation's map is not considered at all in these zones. This should be the default value if the external knowledge sources are considered reliable.

External maps with relative visit probabilities have the form $r_i : Z_i \to \mathbb{R}^+$, where again $Z_i \subseteq Z$ and $Z_i \cap Z_j = \emptyset$ if $i \ne j$. If the relative visit probability $r_i(z_1)$ is assigned to zone $z_1$ and $r_i(z_2)$ is assigned to zone $z_2$, this means that the user is expected to visit zone $z_1$ $\frac{r_i(z_1)}{r_i(z_2)}$ times as often as zone $z_2$. An example of a map with relative visit probabilities is given in Figure 6. This map states that the user will preferably move to the south after leaving the infostation.

Figure 6: Visualization of an external map with relative visit probabilities that specify a user's preferred direction.

Before a map $r_i$ with relative visit probabilities can be integrated into the infostation's map, it has to be transformed into a map with absolute visit probabilities. Basically, there is an infinite number of possibilities to choose the absolute values $v_i(z)$ for the transformed map $v_i$, since they only have to comply with the ratios specified by the relative probabilities:

$$v_i(z) = c \cdot \frac{r_i(z)}{\sum_{j \in Z_i} r_i(j)},$$

where $c \in \mathbb{R}^+$ should be chosen so that $\forall z \in Z_i : v_i(z) \le 1$. However, we want to avoid, for example, that an area that is rarely visited according to the infostation's map $v$ becomes a frequently visited one in the integrated map $\hat{v}$. Therefore we choose, if possible, the constant $c$ so that the average of the visit probabilities $v_i(z)$ of all zones in the transformed map is the same as the average of the visit probabilities $v(z)$ of these zones in the infostation's map, i.e.:

$$\frac{\sum_{z \in Z_i} c \cdot \frac{r_i(z)}{\sum_{j \in Z_i} r_i(j)}}{|Z_i|} = \frac{\sum_{z \in Z_i} v(z)}{|Z_i|}$$

Thus we get $c = \sum_{z \in Z_i} v(z)$ and

$$v_i(z) = \sum_{z' \in Z_i} v(z') \cdot \frac{r_i(z)}{\sum_{j \in Z_i} r_i(j)}.$$

If this calculation results in values $v_i(z) > 1$, it is not possible to comply with the relative visit probabilities specified in the map $r_i$ and at the same time to get the same average absolute visit probability as for the probabilities stored in the infostation's map. In this case we normalize the values $v_i(z)$ as follows:

$$v_i'(z) = \frac{v_i(z)}{\max_{j \in Z_i} v_i(j)}$$

This normalization changes the average value of the visit probabilities $v_i(z)$, but keeps it as close as possible to the average of the visit probabilities stored in the infostation's map. Finally the transformed map is integrated into the final map $\hat{v}$ in the same way as we described it for the external maps with absolute visit probabilities.

The external visit probability maps are only a low level interface to our mechanism. We can imagine that there are also application dependent higher level interfaces, which could, for example, allow a navigation application to specify a user's path through the coordinates of the visited locations. From these coordinates the external map could be automatically calculated, so that the user or application does not have to know anything about zones. Figure 7 summarizes the integration process of external knowledge.
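The map handling of Sections 2.4.2 and 2.4.3 can be sketched in a few lines. The sketch assumes that maps are dictionaries from zone identifiers to probabilities, that a log file is reduced to the sequence of zones it touches, and that zones missing from the infostation's map have visit probability 0; the function names are hypothetical, and the symbols `alpha` and `alpha_i` stand for the history parameter and the external-map weight.

```python
from collections import Counter


def update_visit_map(visit_map, logs, alpha):
    """v'(z) = alpha * v(z) + (1 - alpha) * n(z) / t, with t = number of logs."""
    t = len(logs)
    if t == 0:                      # assumption: no logs, keep the map
        return dict(visit_map)
    first_visits = Counter()
    for zones in logs:
        first_visits.update(set(zones))   # count each zone once per log file
    all_zones = set(visit_map) | set(first_visits)
    return {z: alpha * visit_map.get(z, 0.0)
               + (1 - alpha) * first_visits[z] / t
            for z in all_zones}


def relative_to_absolute(relative, visit_map):
    """Transform a non-empty relative map r_i into absolute probabilities v_i."""
    c = sum(visit_map.get(z, 0.0) for z in relative)   # preserves the average
    total = sum(relative.values())
    absolute = {z: c * r / total for z, r in relative.items()}
    peak = max(absolute.values())
    if peak > 1.0:                  # ratios and average cannot both be kept
        absolute = {z: v / peak for z, v in absolute.items()}
    return absolute


def integrate(visit_map, external_maps):
    """Final map: alpha_i * v_i(z) + (1 - alpha_i) * v(z) inside Z_i, v(z) elsewhere.

    external_maps -- list of (absolute map, alpha_i) with disjoint zone sets
    """
    final = dict(visit_map)
    for external, alpha_i in external_maps:
        for z, v_i in external.items():
            final[z] = alpha_i * v_i + (1 - alpha_i) * visit_map.get(z, 0.0)
    return final


station_map = {"north": 0.2, "south": 0.2, "east": 0.6}
heading_south = relative_to_absolute({"north": 1.0, "south": 3.0}, station_map)
final_map = integrate(station_map, [(heading_south, 1.0)])
```

In the example the hint "three times as likely to go south as north" redistributes the combined probability 0.4 of the two zones into 0.1 and 0.3, while the uncovered zone "east" keeps its value.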
Figure 7: Integration of external maps and the infostation's self-maintained map into the final map $\hat{v}$.

2.4.4 Hoarding Decision

The hoarding decision is based on the final visit probability map $\hat{v}$ and the APTs. When a user arrives at an infostation, we first calculate for every zone $z$ and each information item $i$ the probability that the zone $z$ is visited by the user and that he accesses information item $i$ within zone $z$ as $\hat{v}(z) \cdot a(i, z)$. Next we can determine, for all information items $i$, the probability $q(i)$ that the information item $i$ is not accessed by the user during his whole trip through the hoarding area:

$$q(i) = \prod_{z \in Z} \bigl(1 - \hat{v}(z) \cdot a(i, z)\bigr)$$

Furthermore, we get for the probability $a(i)$ that the user accesses the information item $i$ during his trip through the hoarding area:

$$a(i) = 1 - q(i) = 1 - \prod_{z \in Z} \bigl(1 - \hat{v}(z) \cdot a(i, z)\bigr)$$

If we again assume that a maximum of $m$ items can be transferred to the user's device, we finally select those $m$ items for the hoarding that have the highest access probabilities.

If no external knowledge is available, the results achieved with the two variants will be nearly the same, because then the access probabilities calculated in the second variant will be the same as the ones directly maintained in the APT in the first variant. However, smaller differences may occur, due to the fact that we always consider multiple requests to the same information item in the first variant, but do not count such requests in the second variant if they occur within the same zone. If external knowledge is available, it is highly beneficial to use the second variant, as we will show in Section 3.

2.5 Common Characteristics of Both Variants

An advantage of both variants is that they dynamically adapt to changes in the users' access and visit patterns. As an example, we can again consider a mobile guide. During the day the users might preferably access information about stores, malls, museums, etc., whereas in the evening they might prefer information about restaurants, bars, or cinemas. Such a change would be reflected in the APTs and thus the users will always get the information that is currently most popular. The time needed for such an adaptation depends on the history parameter $\alpha$ and the length of the update interval $\tau$.

Another advantage is that both variants scale very well. If an infostation is overloaded, this problem can be solved by adding one or more new infostations in the neighborhood of the overloaded one. It is even possible that two or more infostations serve exactly the same hoarding area. However, we have to make sure that the users do not visit an infostation that serves exactly the same hoarding area as another one visited before. Accordingly, the area supported by the whole information system can be easily enlarged by adding new infostations. The only problem with an increasing number of infostations is that the communication overhead also increases. The reason is that a log file received by an infostation from a mobile client has to be sent to more infostations. However, the log files usually have to be sent only to infostations that are close to the sender. Furthermore, this communication is done over a wired network and the log files are small (only a few KBytes). Thus, we do not expect any bandwidth problems.

A problem that has to be solved in both variants is the initialization of a new infostation or an infostation recovering from a failure. Often the information provider might have an idea about what information items the users prefer and where they will preferably go. Then the information provider can specify the initial APTs and for the second variant also an initial visit probability map. If this is not possible, we have to start with zero knowledge. However, as we show in Section 3, it does not take very long until acceptable hoarding decisions can be made. We can also imagine that each infostation periodically writes its knowledge to stable storage, which could then be used for recovery purposes.

2.6 Enhancements

Besides the users' locations, further hints can be used to improve the prediction of the information items that a user will request during the disconnected operation. In the following we describe two concepts that exploit such hints and that can be easily integrated in both of the above variants.
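It may help to restate the hoarding decision of Section 2.4.4 as a compact sketch. It assumes that the final visit probability map and the per-zone APTs are given as dictionaries; the function names and the tie-breaking by item identifier are illustrative choices, not part of the described mechanism.

```python
def trip_access_probabilities(final_map, zone_apts):
    """a(i) = 1 - prod over zones z of (1 - v(z) * a(i, z)).

    final_map -- dict zone -> final visit probability
    zone_apts -- dict zone -> {item id -> a(i, z)}
    """
    not_accessed = {}               # q(i), built up zone by zone
    for zone, apt in zone_apts.items():
        v = final_map.get(zone, 0.0)    # assumption: unknown zone is never visited
        for item, a in apt.items():
            not_accessed[item] = not_accessed.get(item, 1.0) * (1.0 - v * a)
    return {item: 1.0 - q for item, q in not_accessed.items()}


def select_items(probabilities, m):
    """The m items with the highest trip access probabilities."""
    return sorted(probabilities, key=lambda i: (-probabilities[i], i))[:m]


final_map = {"z1": 1.0, "z2": 0.5}
zone_apts = {"z1": {"x": 0.5}, "z2": {"x": 0.5, "y": 0.8}}
probabilities = trip_access_probabilities(final_map, zone_apts)
to_hoard = select_items(probabilities, m=1)
```

Items that do not appear in a zone's APT contribute a factor of one for that zone, so only the listed entries have to be multiplied.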
2.6.1 Channels
20
The channel concept aims to consider a user's pro le when selecting the information to be hoarded. For example, a tourist using a mobile guide will usually access other information items than a business person using the same system. The idea of the channel concept is to oer a separate channel for each kind of user pro le supported by the information system, e.g. tourists, business persons, and shoppers. Each user can subscribe to one or more channels. If the user subscribes to more than one channel, he has to specify his interest in each of the channels by assigning a value vc between 1 (little interest) and 5 (high interest) to each channel c that he has subscribed to. Although our channel concept would allow to use any number of dierent interest levels, we decided to use ve, since we think that this is a reasonable number. From the values vc we calculate a weight wc for each of the subscribed channels c as follows: wc = Pvc v 8
i
are ltered out. Dependent on the information system the infostations can provide one or more lters, which can be selected and parameterized by the client application or the user according to the individual preferences of the user. If a user needs a very speci c lter, that is not provided by the infostation, we can even imagine that he implements this lter on his own and sends it to the infostation in order to do the desired ltering there. Based on the hoarding decision, as we described it before, we only transferred the n most popular information items, i.e. we implicitly ltered out the less popular items. If we make this ltering explicit, we get a simple lter, that returns the n rst items of each ranking passed to it. More complex lters can, for example, be used if the accessed information space is structured with hyperlinks. Then, we can implement a lter that identi es unreachable items in the rankings. An unreachable item is an item that can not be accessed, since at least one of its predecessors in every access path to the item is not included in the ranking. Another lter may check if the user's device is able to display the information items in the ranking. Besides the ranking, this lter would also need a description of the user's device as a further parameter.
i
The weights of all unsubscribed channels are set to zero. For each channel separate APTs and, in the second variant, also separate visit probability maps are maintained, which exclusively contain information about the access probabilities and visit probabilities of the users that have subscribed to the respective channel. This requires some minor modi cations of the update algorithm of the APTs and visit probability maps. Basically, we have to make sure that the requests and visits in a log le received by an infostation are only considered in the APTs and maps of each channel according to the weight that has been assigned to the channel. For the hoarding decision we calculate for each channel the access probability ac (i) of an information item i as before. However, this time the calculation is based on the channelspeci c APTs and maps. From the channel-speci c access probabilities of an information item i, we calculate a global access probability ag (i) of the item as follows: X ag (i) = wc ac (i);
3. EVALUATION
So far, neither a standard simulation tool for mobile information access nor extensive traces of the accesses to mobile information systems are available. Thus, we developed our own simulation framework to evaluate the hoarding mechanism. Before we describe our experiments, we present the model and the environment that we used for our simulations. 3.1 Simulation Model
Our simulation model is based on a graph representation of the city center of Stuttgart with 115 vertices and 150 edges, which we derived from GIS data. For the sake of clearness we omitted some of the vertices and edges in the illustration of the graph in Figure 8. The vertices represent the locations that the users might visit and the edges model the connections between these locations. The model itself consists of the following three sub-models:
c
8
where wc is the weight of channel c as it has been determined for the downloading user. Finally, the items with the highest global access probabilities are hoarded on the user's device.
3.1.1 Mobility Model
Basically, our mobility model is a travel demand model [13], i.e. we assume that a user makes a trip because of a certain demand for the trip. We also assume that each user visits one or more locations, for whatever purpose he makes his trip. After deciding which locations to visit, he always visits them on the shortest possible path. Besides these destinations, the user will also visit some intermediate locations on his way to them. The mobility model assigns to every location a probability with which the location is selected as a destination. We distinguish between preferred locations, which are of special interest to the user, and non-preferred locations, which are not. Consequently, the probability $p_p$ that a preferred location is selected as a destination is usually higher than the probability $p_n$ that a non-preferred location is selected. For our experiments we declared ten locations where tourist attractions are located as preferred locations.
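As a rough illustration of the mobility model described above, the following Python sketch selects destinations with the probabilities $p_p$ and $p_n$ and connects them by shortest paths. It is not the Java framework used in the paper: the function names, the toy graph, the use of breadth-first search on edges of equal length, and visiting the destinations in list order (rather than on the shortest route covering all of them) are simplifying assumptions made for this example.

```python
import random
from collections import deque


def select_destinations(locations, preferred, p_p, p_n, rng):
    """Select each location as a destination with probability p_p or p_n."""
    return [loc for loc in locations
            if rng.random() < (p_p if loc in preferred else p_n)]


def shortest_path(graph, start, goal):
    """Breadth-first search; assumes that all edges have the same length."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    raise ValueError("goal is not reachable from start")


def build_trip(graph, start, destinations):
    """Visit the destinations in list order, each leg on a shortest path."""
    trip = [start]
    for dest in destinations:
        # Skip the first node of each leg, it is already the end of the trip.
        trip.extend(shortest_path(graph, trip[-1], dest)[1:])
    return trip


# Hypothetical toy graph with four locations; "C" is the preferred location.
graph = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
rng = random.Random(0)
destinations = select_destinations(sorted(graph), {"C"}, 1.0, 0.0, rng)
trip = build_trip(graph, "A", destinations)
```

With $p_p = 1$ and $p_n = 0$ only the preferred location is selected, so the trip passes the intermediate location "B" on the way from "A" to "C". Intermediate locations of this kind are exactly the ones a user visits without having chosen them as destinations.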
2.6.2 Plug-in Filters

In both variants of our mechanism, independent of whether or not we use channels, the hoarding decision is finally based on a ranking that lists information items according to their access probabilities. However, it may sometimes happen that an information item should not be hoarded although it has a high access probability. For example, it would make no sense to transfer a high-resolution image to a PDA whose display has a resolution of only 160 x 160 pixels. In such cases we need a post-processing of the rankings that have been calculated during the hoarding decision. This post-processing is done by the plug-in filters. Basically, a plug-in filter implements a method that takes a ranking as a parameter and returns a reduced ranking, in which all the items that the filter identified as useless have been removed.
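The hoarding decision with channels and plug-in filters can be sketched in a few lines of Python. The sketch computes the global access probability $a_g(i) = \sum_c w_c \, a_c(i)$, ranks the items, passes the ranking through the filters, and keeps the top entries. All names, the channel data, and the "hires_image" item kind are hypothetical and only serve the example; the paper does not prescribe this interface.

```python
def global_access_probabilities(channel_probs, weights):
    """Compute a_g(i) as the sum over all channels c of w_c * a_c(i)."""
    totals = {}
    for channel, probs in channel_probs.items():
        w_c = weights.get(channel, 0.0)  # unsubscribed channels weigh zero
        for item, a_c in probs.items():
            totals[item] = totals.get(item, 0.0) + w_c * a_c
    return totals


def rank(probabilities):
    """Sort items by descending access probability, ties by item name."""
    return sorted(probabilities, key=lambda item: (-probabilities[item], item))


def small_display_filter(ranking, item_kinds):
    """Hypothetical plug-in filter: remove high-resolution images."""
    return [item for item in ranking if item_kinds.get(item) != "hires_image"]


def hoard(channel_probs, weights, filters, hoardsize):
    """Rank by global access probability, apply the filters, cut to size."""
    ranking = rank(global_access_probabilities(channel_probs, weights))
    for plug_in in filters:
        ranking = plug_in(ranking)
    return ranking[:hoardsize]


# Hypothetical channel-specific access probabilities and user weights.
channel_probs = {
    "tourist": {"map": 0.6, "photo": 0.9, "menu": 0.1},
    "business": {"map": 0.2, "menu": 0.8},
}
weights = {"tourist": 0.75, "business": 0.25}
kinds = {"photo": "hires_image"}

probs = global_access_probabilities(channel_probs, weights)
hoarded = hoard(channel_probs, weights,
                [lambda ranking: small_display_filter(ranking, kinds)], 2)
```

In this example "photo" has the highest global access probability, but the filter removes it, so "map" and "menu" are hoarded. Modeling a filter as a function from ranking to ranking keeps the filters independent of how the ranking was computed, which matches the description of the filters as a post-processing step.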
Figure 8: A graph modeling the city center of Stuttgart.
3.1.2 Information Model

Since we want to consider location dependent information, we have to model the relationship between locations and information items. This is done in the information model. It assigns to every location a set of information items that can be accessed exclusively from this location. We denote the number of such items assigned to each preferred location as $a_p$ and the number assigned to each non-preferred location as $a_n$. We use two different numbers, since we assume that more information will be offered for the preferred locations, which are of special interest to the users. In addition, the information model defines a set of information items that can be accessed from every location. The number of items $a_g$ in this global information pool is a further parameter of the information model. We added the global information pool in order to be able to model an information access that is not inherently location dependent, where some information items might be accessed from all locations. Consider, for example, information about the public transportation system in a mobile guide scenario.

3.1.3 Access Model

The access model reflects the way the users access the information available at each location. It determines for every location how many of the information items that are potentially accessible from the location a user actually requests when he visits the location. As in the other sub-models, we have two different parameters $r_p$ and $r_n$ for preferred and non-preferred locations. They specify how many items a user requests at each kind of location. The access model has a further parameter $p_g$, which states the probability with which a user's request refers to the globally accessible information pool. Accordingly, the probability that the request refers to one of the items exclusively assigned to the current location is $1 - p_g$.

Once we know to which set of information items a request refers, we have to determine the item within this set that is actually accessed. Therefore, the access model defines the probability with which an item is selected from a given set. We assume that these probabilities are Zipf-distributed [4]. We also assume that a user can access the same information item multiple times during his visit of a location, and that he does not request any information when he revisits a previously visited location. We also made some experiments without the last assumption. They showed that it has almost no effect on the performance of the hoarding mechanism or on the hit ratio of the caching mechanism to which we compared our hoarding mechanism, since the users rarely visit the same location more than once.

3.2 Simulation Environment

As mentioned before, we developed a simulation framework for mobile information access, which we used for the evaluation of our hoarding mechanism. The framework has been implemented in Java using the JDK 1.2.2. Since one of the main design goals for the framework was flexibility, we encapsulated each sub-model in its own class. Thus the sub-models can easily be changed. For our experiments we configured the framework to use the sub-models described in the previous section. Unless otherwise stated, we used the default parameter settings that are summarized in Table 1. For the parameters we chose default values as we expect them for a typical mobile guide system. In addition to the parameters of the simulation model, we also examined the parameters of our mechanism, i.e. the history parameter $\alpha$, the length of the update interval, and the hoardsize. The hoardsize is the number of information items that can be hoarded on a user's device. We chose a default value of 80 information items for it, since we assumed that 4 MBytes
of memory, i.e. half the memory of today's standard PDAs, are available for hoarding and that the average size of an information item is 50 KBytes. If more memory becomes available for hoarding in future devices, this does not inevitably lead to a better performance of our mechanism, because then the time for which a user stays within an infostation's coverage area will become more important as a limiting factor.

Table 1: Default parameter settings.

Parameter                     Value
$p_p$                         0.3
$p_n$                         0.0
$a_p$                         100 items
$a_n$                         5 items
$a_g$                         1000 items
$r_p$                         10 items
$r_n$                         1 item
$p_g$                         0.0
$\alpha$                      0.5
i                             1
length of update interval     1000 log files
hoardsize                     80 items

Each simulation run consists of a configurable number of independent user trips. The simulation of a single trip starts with the determination of the destination locations. Afterwards, the shortest route covering all the destinations is calculated. Then the framework determines for each location on this route the information items that the user requests there. All the visited locations and the information requests are logged to a trace file. These trace files are finally used to test the hoarding mechanism. Besides the experiments described in this paper, we also made some experiments in which we did not calculate the shortest route, but assumed that the user always chooses the destination closest to him as his next destination, or that the destinations are visited in a random order. In both cases the results did not differ significantly from those presented here.

For the testing we used prototype implementations of both variants of our mechanism. These prototypes were also developed in Java using the JDK 1.2.2. The functionality of the prototypes includes the handling of incoming log files, the maintenance of the APTs and the visit probability map (second variant only), and the selection of the information items to hoard. In the second variant, we also implemented the possibility of using externally specified visit probability maps. However, we implemented neither the enhancements mentioned in Section 2.6 nor any remote communication between infostations and mobile clients. We did not need any remote communication, since we ran the whole simulation on a single system.

Figure 9: Hit ratios achieved with different hoardsizes. [Plot: hit ratio over hoardsize [#items]; curves: fine grained variant, coarse grained variant.]

Figure 10: Hit ratios depending on the probability of a preferred location to be selected as a destination. [Plot: hit ratio over selection probability; curves: fine grained variant, coarse grained variant, caching.]

3.3 Experiments

The metric that we use in our experiments to rate the performance of our hoarding mechanism is the hit ratio. This is the ratio of the number of information requests that can be answered with the information hoarded on the user's device to the total number of information requests he makes during his trip through the hoarding area. Where applicable, we compared the hit ratios achieved with the hoarding mechanism to those achieved with a caching algorithm. We assumed that the memory available for caching is not limited, i.e. we never remove any information item from the cache. Thus we can show that the value of any caching mechanism, independent of the replacement strategy used, is strongly limited in location dependent information systems, as we explain in more detail in Section 4. We also assumed that the cache is empty at the beginning of each experiment, because this exactly reflects the cache's state when a user starts using the information system.

For the experiments we considered one infostation, whose hoarding area covers the whole city center as it is represented by our graph. In preparation for every experiment, we initialized the infostation with 1000 log files. The experiments themselves consisted of the simulation of another 1000 independent trips. At the beginning of each trip the user performing the trip hoards the information at the infostation. The hoarding decision was always based on the APTs and the visit probability map as they were found after the initialization. For each trip we determined the achieved hit ratio. In the results, we show the average of all 1000 determined hit ratios.

In the analyses of the fine grained variant described in this paper, we always assumed that the exact path on which the user will travel is externally specified and that each location lies in a separate zone. In further experiments we also analyzed the
hit ratios that can be achieved when the user's direction is specified through external maps with relative visit probabilities. As expected, the achieved hit ratios were higher than those achieved without any external knowledge and lower than those achieved with the specification of the user's path.

In our first experiment we examined what hit ratios we can achieve with different hoardsizes (see Figure 9). Already with the default size of 80 items, we get hit ratios of 43% (coarse grained variant) and 70% (fine grained variant). For almost all hoardsizes, the hit ratios achieved with the fine grained variant are about 25 percentage points higher than those achieved with the coarse grained one. The reason for the non-linear increase is that the most popular items are hoarded first, i.e. already with small hoardsizes. The items additionally hoarded with big hoardsizes are the less popular ones, which are rarely accessed.

Next, we were interested in the effect that the simulation model parameters have on the hit ratios. The first of these parameters that we examined was the probability $p_p$ with which a preferred location is selected as a destination. The results (see Figure 10) show that for all probabilities both variants achieve higher hit ratios than the caching mechanism. The higher the probabilities are, the less benefit we get from using the fine grained variant. For high probabilities the fine grained variant cannot profit from the knowledge of a user's path, since all or almost all possible destinations are visited. Therefore fewer information items can be excluded from the hoarding on the grounds that the corresponding location will not be visited.

So far, we only considered a completely location dependent information access, where the users always access information that is exclusively available at their current location. In our next experiment we analyzed what happens if this is not the case, i.e. if the users also access information from the global information pool. Figure 11 shows the hit ratios for different access probabilities to the global information pool. Again, higher hit ratios are achieved with hoarding than with caching. As long as the global information pool is accessed only with low probabilities, the fine grained variant is clearly better than the coarse grained one, because then it can again exploit the externally specified knowledge. In contrast, the knowledge of the user's path is useless when the same information items are accessed from all locations.

When the access probability to the global information pool is zero, all information requests refer to one of the 1525 information items that are assigned to a certain location (10 preferred locations with 100 items each and 105 non-preferred locations with 5 items each). As soon as we start increasing the access probability to the global information pool, the 1000 items of the global pool might additionally be accessed. Thus the probability that the same item is reaccessed decreases in the beginning. Consequently, the cache hit ratio also decreases. With an increasing access probability to the global pool, however, this effect is more and more compensated by the concentration of the information requests on the items in the global information pool. Therefore the hit ratios finally increase. Due to this concentration the hit ratios of the hoarding mechanism also increase for high access probabilities to global information. For bigger global information pools the observed increase of the hit ratios is smaller and the turning point from decreasing to increasing hit ratios moves to higher access probabilities to the global pool.

With our next experiment we wanted to find out what effect the number of available information items has on the mechanisms' performance. Therefore, we varied the number $a_p$ of items assigned to the preferred locations (see Figure 12). If many items are available, only a small part of them can be hoarded. Consequently, the hoarding hit ratios decrease with an increasing number of information items. The caching hit ratios also decrease, since it is less probable that the same item is accessed more than once if many items are available. However, the effect of increasing the number of available information items weakens if an already big number of items is increased further. The reason for this is that the added information items are very rarely accessed due to the assumed Zipf distribution of the items' popularity.

Figure 11: Hit ratios depending on the probability of accesses to global information. [Plot: hit ratio over access probability; curves: fine grained variant, coarse grained variant, caching.]

Figure 12: Hit ratios depending on the number of information items assigned to preferred locations. [Plot: hit ratio over number of items; curves: fine grained variant, coarse grained variant, caching.]

We also analyzed how long it takes a new or recovered infostation to gather enough information about the users' behavior to achieve acceptable hit ratios. Our results (see Figure 13) show that the maximum hit ratio is almost achieved after processing the log files of only 20 users. During this experiment we always set the length of the update interval to the same value as the number
of processed log files. If no log file has been processed at all, the items with the smallest identifiers are transferred to the user's device. Since some of these items happen to be requested by the user during his subsequent trip through the hoarding area, the average hit ratio is higher than 0% although no log file has been processed.

Figure 13: Hit ratios depending on the number of processed log files. [Plot: hit ratio over number of processed log files; curves: fine grained variant, coarse grained variant.]

As described in Section 2, each infostation periodically updates its knowledge after a configurable number of log files has been processed. The length of this update interval also has an effect on the achieved hit ratios (see Figure 14). Although small update intervals allow the infostation to adapt its knowledge faster to changes in the users' behavior, the intervals should not be too small. Otherwise the hit ratios decrease, since the information gathered during each update period reflects only the behavior of a few individual users, which can differ from that of the average user. As the plot shows, a length of 20 to 30 log files for the update interval is a good choice, since for longer update intervals the hit ratios increase only slightly. Consequently, longer update intervals would only unnecessarily slow down the mechanism's reaction to changes in the users' behavior.

Figure 14: Hit ratios depending on the length of the update interval. [Plot: hit ratio over length of update interval [#processed log files]; curves: fine grained variant, coarse grained variant.]

The history parameter $\alpha$ introduced in Section 2 also affects the hit ratios (see Figure 15). If it is chosen too big, the infostations hardly learn from recently processed log files. However, if it is too small, the infostations forget the knowledge learned from older log files too fast. Thus, a moderate value between 0.5 and 0.7 should be chosen to achieve high hit ratios. Differing from our default value, we set the length of the update interval to 100 during this experiment in order to get more updates for the evaluation of the parameter $\alpha$.

Figure 15: Hit ratios depending on the history parameter $\alpha$. [Plot: hit ratio over alpha; curves: fine grained variant, coarse grained variant.]

4. RELATED WORK

In this section we discuss the work that has been done so far to improve mobile information access over wireless networks. We evaluate the usability of the existing approaches in the context of location dependent information systems and compare them to our solution.

Caching is a method that is also used in wired networks. The idea is to store information locally on the user's device once it has been transferred from the server. Thus, the information is already available on the device if it is reaccessed. If no more memory is available to store further information items locally, a caching strategy has to decide which information items should be removed from the cache. Most of these caching strategies are based on the assumption that there is a temporal locality in a user's request pattern. However, this might not be true in mobile environments, especially when location dependent information is accessed. For example, when a user moves from one location to another, he will probably no longer be interested in the information concerning the previous location. Therefore, new caching strategies, e.g. [14], have been developed, which rely on a geographical locality. However, independent of the replacement strategy, caching never speeds up the first access to an information item. Thus the hit ratios achieved in location dependent information systems are not high if the users do not frequently return to previously visited locations.

Chang et al. propose an asynchronous information access [5]. If an information request occurs while no or only a low bandwidth is available, it is delayed until a high bandwidth network connection is available. The problem with this approach is that the users might then no longer be interested in the requested information, as, in the meantime, they have moved on to another location.
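The limitation of caching noted in this section, namely that the first access to an item is always a miss, can be made concrete with a small Python sketch. It computes the hit ratio as defined in Section 3.3 for an initially empty cache with unlimited memory and for a set of hoarded items. The function names and the request trace are made up for illustration and are not taken from the simulation framework.

```python
def cache_hit_ratio(requests):
    """Hit ratio of an initially empty cache with unlimited memory."""
    cache = set()
    hits = 0
    for item in requests:
        if item in cache:
            hits += 1  # only repeated accesses can be answered locally
        cache.add(item)
    return hits / len(requests) if requests else 0.0


def hoard_hit_ratio(requests, hoarded):
    """Share of requests answered from the items hoarded before the trip."""
    hits = sum(1 for item in requests if item in hoarded)
    return hits / len(requests) if requests else 0.0


# Hypothetical trace of one trip: items a* belong to the first location,
# b* to the second and c* to the third; only "a1" is requested twice.
trace = ["a1", "a2", "a1", "b1", "b2", "c1"]
cache_ratio = cache_hit_ratio(trace)
hoard_ratio = hoard_hit_ratio(trace, {"a1", "b1", "c1", "z9"})
```

For this trace the cache answers only the repeated request for "a1", whereas the hoard answers four of the six requests even though one hoarded item ("z9") is never used. No replacement strategy can improve the cache result, because nothing is ever evicted in this sketch.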
Other approaches, like [2], [7], or [11], are based on broadcast dissemination of information. Their primary focus is to reduce the response time and to increase the scalability of the system. If they are location-aware, they are designed to provide the users with the information items they need at their current location, e.g. within the coverage area of one cell of the dissemination system. Therefore, location-aware, broadcast-based dissemination mechanisms do not make any predictions about the information items the users will need after leaving for another location. Furthermore, they are mainly based on the access patterns of the average user and cannot exploit external knowledge, which could be specified by the user or the application. If the access patterns of the individual users differ strongly, the efficiency of these approaches decreases. However, a broadcast-based information dissemination might be useful in our approach to decrease the bandwidth required for the hoarding processes at the infostations.

The first hoarding approaches that were especially designed to support users during disconnections, e.g. the one in the Coda file system [15], relied on user interactions and required a list of the user's preferred information items. This is not applicable in mobile information systems, because the users do not know in advance which information items they will access. Kuenning and Popek propose an automated hoarding mechanism [12], which uses semantic distances between files in order to predict which files a user will need. In contrast to our approach, the user's location is not considered there. The hoarding tool described by Tait et al. [16] also relies only on file access patterns.

Some other hoarding mechanisms use information about the user's location. In the Map-on-the-Move application [17] the user's position and movement pattern are used to determine the items to be hoarded.
The authors show that their intelligent hoarding mechanism works very well with the considered map application and that it decreases the response time of the system significantly. In this map application it is known which part of the map is accessed at each location. It is also assumed that the start and end points of a user's trip are known. All this eases the information hoarding. In contrast, our mechanism deals with uncertainty in both the user's route and the data accessed.

De Nitto Persone et al. assume, as we do, that information items can be mapped to certain areas [8]. Their work provides an analysis of the effectiveness of location-aware hoarding. Knowledge about the users' future movements is only used in the case of a linear movement, e.g. along a road. Then, more information belonging to the area in a user's preferred direction is hoarded than for the area in the opposite direction.

5. CONCLUSION

In this paper we presented a hoarding mechanism that uses information about a user's location in order to predict what information items he will need. We discussed two variants of the mechanism. The first one gathers information about the users' preferred items only at the coarse level of hoarding areas, whereas the second variant allows these areas to be divided into any desired number of separate zones. As we showed in our evaluation, the use of location information for the hoarding decision makes our mechanism well suited for mobile information systems where the information access is location dependent. As examples of such a location dependent information access we mentioned mobile guides and situated information spaces; however, there are many more. We even believe that the information access is location dependent in some way in almost every mobile information system. Our simulations also showed that caching, independent of the replacement strategy, achieves only low hit ratios in location dependent information systems.

For the near future we plan to implement the enhancements presented in Section 2.6 and to analyze the benefits we get from these enhancements. We also plan to develop an infostation-based information system, which will allow us to test our mechanism in a real world application. With such a real world application we will also be able to evaluate the effectiveness of further optimizations of our mechanism, e.g. an automatic inference of a user's channel, simplifications of the APTs, and alternative mechanisms for the maintenance of an infostation's knowledge. Another topic of our future research will be the cooperation of hoarding with other mobile data management mechanisms.

6. ACKNOWLEDGEMENTS

This research was supported by the Deutsche Forschungsgemeinschaft (DFG) within the research group Nexus [10] and the graduate training program Parallel and Distributed Systems (GKPVS). We thank Martin Bauer for reading a draft of this paper and the reviewers for their valuable comments and suggestions.

7. REFERENCES

[1] G. Abowd, C. G. Atkeson, J. Hong, S. Long, R. Kooper, and M. Pinkerton. Cyberguide: a mobile context-aware tour guide. Wireless Networks, 3(5):421-433, October 1997.
[2] S. Acharya and S. Muthukrishnan. Scheduling on-demand broadcasts: New metrics and algorithms. In Proceedings of the Fourth Annual International Conference on Mobile Computing and Networking (MobiCom '98), pages 43-54, Dallas, Texas, USA, October 1998.
[3] B. R. Badrinath, T. Imielinski, R. Frenkiel, and D. Goodman. Nimble: Many-time, many-where communication support for information systems in highly mobile and wireless environments. http://www.cs.rutgers.edu/badri/dataman/nimble/, 1996.
[4] L. Breslau, P. Cao, F. Li, G. Philips, and S. Shenker. Web caching and zipf-like distributions: Evidence and implications. In Proceedings of the IEEE INFOCOM '99, pages 126-134, New York City, NY, USA, March 1999.
[5] H. Chang, C. Tait, N. Cohen, M. Shapiro, S. Mastrianni, R. Floyd, B. Housel, and D. Lindquist. Web browsing in a wireless environment: Disconnected and asynchronous operation in artour web express. In Proceedings of the Third Annual International Conference on Mobile Computing and Networking (MobiCom '97), pages 260-269, Budapest, Hungary, September 1997.
[6] K. Cheverst, K. Davies, K. Mitchell, and A. Friday. Experiences of developing and deploying a context-aware tourist guide: the guide project. In Proceedings of the Sixth Annual International Conference on Mobile Computing and Networking (MobiCom 2000), pages 20-31, Boston, MA, USA, August 2000.
[7] N. Davies, K. Cheverst, K. Mitchell, and A. Friday. Caches in the air: Disseminating information in the guide system. In Proceedings of the 2nd IEEE Workshop on Mobile Computing Systems and Applications (WMCSA '99), pages 11-19, New Orleans, USA, February 1999.
[8] V. de Nitto Persone, V. Grassi, and A. Morlupi. Modeling and evaluation of prefetching policies for context-aware information services. In Proceedings of the Fourth Annual International Conference on Mobile Computing and Networking (MobiCom '98), pages 55-64, Dallas, Texas, USA, October 1998.
[9] G. Fitzmaurice. Situated information spaces and spatially aware palmtop computers. Communications of the ACM, 36(7):39-49, July 1993.
[10] F. Hohl, U. Kubach, A. Leonhardi, K. Rothermel, and M. Schwehm. Next century challenges: Nexus - an open global infrastructure for spatial-aware applications. In Proceedings of the Fifth Annual International Conference on Mobile Computing and Networking (MobiCom '99), pages 249-255, Seattle, WA, USA, August 1999.
[11] Q. Hu, D. L. Lee, and W.-C. Lee. Performance evaluation of a wireless hierarchical data dissemination system. In Proceedings of the Fifth Annual International Conference on Mobile Computing and Networking (MobiCom '99), pages 163-173, Seattle, WA, USA, August 1999.
[12] G. Kuenning and G. Popek. Automated hoarding for mobile computers. In Proceedings of the 16th ACM Symposium on Operating Systems Principles (SOSP '97), pages 264-275, St. Malo, France, October 1997.
[13] N. Oppenheim. Urban travel demand modeling: from individual choices to general equilibrium. Wiley-Interscience, New York, NY, USA, 1995.
[14] Q. Ren and M. Dunham. Using semantic caching to manage location dependent data in mobile computing. In Proceedings of the Sixth Annual International Conference on Mobile Computing and Networking (MobiCom 2000), pages 210-221, Boston, MA, USA, August 2000.
[15] M. Satyanarayanan, J. Kistler, P. Kumar, M. Okasaki, E. Siegel, and D. Steere. Coda: A highly available file system for a distributed workstation environment. IEEE Transactions on Computers, 39(4):447-459, April 1990.
[16] C. D. Tait, H. Lei, S. Acharya, and H. Chang. Intelligent file hoarding for mobile computers. In Proceedings of the First International Conference on Mobile Computing and Networking (MobiCom '95), pages 119-125, Berkeley, CA, USA, November 1995.
[17] T. Ye, H.-A. Jacobsen, and R. Katz. Mobile awareness in a wide area wireless network of info-stations. In Proceedings of the Fourth International Conference on Mobile Computing and Networking (MobiCom '98), pages 109-120, Dallas, TX, USA, 1998.