Protecting User Trajectory in Location-Based Services - IEEE Xplore

Dan Liao1,2, Hui Li1, Gang Sun1,2, Vishal Anand3

1 Key Lab of Optical Fiber Sensing and Communications (Ministry of Education), University of Electronic Science and Technology of China, Chengdu, China
2 Institute of Electronic and Information Engineering in Dongguan, UESTC, China
3 Department of Computer Science, The College at Brockport, State University of New York, USA

Abstract—Preserving user location and trajectory privacy while using location-based services (LBS) is an important issue. To address this problem, we first construct three kinds of attack models that can expose a user’s trajectory or path while the user is sending continuous queries to an LBS server. Then we propose the k-anonymity trajectory (KAT) algorithm, which is suitable for both single and continuous queries. Different from existing works, the KAT algorithm selects k-1 dummy locations using the sliding window based k-anonymity mechanism when the user makes a single query, and selects k-1 dummy trajectories using the trajectory selection mechanism for continuous queries. We evaluate and validate the effectiveness of our proposed algorithm by conducting simulations for the single and continuous query scenarios.

Keywords—location privacy; trajectory privacy; LBS; k-anonymity; single query; continuous queries

I. INTRODUCTION

Advances in sensing technology make it easy to obtain highly accurate location and positioning information from global positioning systems (GPS). Such location data is used in a variety of location-based services (LBS), which have rapidly become popular. With LBS, users can easily get the information they want on hand-held terminals (e.g., smartphones and tablets). For example, users can find the nearest restaurant or hospital [1]. However, using LBS raises privacy concerns. To use LBS, users send requests containing their precise location to the LBS provider (LP), and there is no guarantee that the LP is trustworthy. A malicious LP may steal or expose location data, or even sell user location information to a third party.

Many researchers have proposed solutions [2-5] to address privacy issues in LBS for single queries. The k-anonymity technique [6] is a widely used solution for maintaining privacy. In this technique, a user requesting service sends the true position together with k-1 dummy (false) locations, so that the LBS server cannot distinguish the user’s real location among the k locations. Single queries in LBS have been studied in recent years [7, 8]. In reality, however, users may send service requests continuously over a period of time. We call this kind of request “continuous queries”. The work in [9] proposed a virtual path programming solution for trajectory preservation, and [10] proposed a location anonymity scheme based on fake queries in continuous location-based services. In [11] the authors employed a Track False Data method with pseudonymization and perturbation, so that false tracks, rather than the user’s real trajectory data, are released when the user sends requests to the LP.

In this paper, we design an algorithm based on the k-anonymity technique to protect the user’s trajectory privacy. Our scheme is suitable not only for the continuous query scenario, but also for the single query scenario. The main contributions of this paper are as follows:

• The LP may be malicious and untrustworthy. When a user protects their trajectory using the traditional k-anonymity technique, there are three kinds of scenarios in which the user’s trajectory privacy may still be exposed. Accordingly, we propose three attack models: the shared attack, the roadless attack and the probability attack.

• To protect location privacy in the single query scenario, we propose the k-anonymity trajectory (KAT) algorithm. The KAT algorithm selects the k-1 dummy locations through the Sliding Window based k-anonymity mechanism.

• To protect trajectory privacy in the continuous query scenario, we introduce the maximum entropy and the trajectory selection mechanism into our KAT algorithm for choosing the k-1 dummy trajectories that resist the attacks.

II. PROBLEM MODELING

A. Attack Models

We assume that a user ui of the LBS employs the traditional k-anonymity technique to preserve privacy and that the LP is malicious. There are three possible cases in which the user’s trajectory privacy can be exposed. Accordingly, we propose three attack models against k-anonymity. In the k-anonymity technique, the value k means that there are k candidate locations: the user’s real location and k-1 dummy locations. Here, we set k=4 and suppose that the user sends three queries in a time period. Since the user sends each query to the LP with the real location and the k-1 dummy locations, each query includes 4 locations. Hence, we get three areas, termed “anonymous areas”, one for each of the three queries.

(1) Shared-attack

Fig.1: The shared-attack

As shown in Figure 1, there are four locations (denoted by the small squares) in each anonymous area (denoted by a circle). The malicious LP scans all trajectories between these locations and gets six trajectories (Ra, Rb, Rc, Rd, Re, Rf) by matching with a map (e.g., Google Maps). Trajectory privacy is easily exposed when only one trajectory is shared by all of the anonymous areas. Here the LP can easily deduce that the user’s real trajectory is Rb, because the intersection of the three anonymous areas lies on trajectory Rb.

978-1-4799-5952-5/15/$31.00 ©2015 IEEE

(2) Roadless-attack

As shown in Figure 2, before matching with a map (e.g., Google Maps), the LP may consider that the four trajectories are shared by the anonymous areas. However, after matching with the map, the LP discovers that there are no roads along some of the trajectories (drawn with dashed lines). Accordingly, the LP filters out the impossible trajectories and infers that the user’s real trajectory is Ra. Thus, when selecting dummy trajectories, we should avoid choosing locations that have no roads between them.
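The first two attacks can be illustrated with a minimal Python sketch. The data model is an assumption made only for illustration: each anonymous area is summarised by the set of trajectory labels passing through it, and `has_road` is a hypothetical lookup standing in for map matching. The function names `shared_attack` and `roadless_filter` are likewise hypothetical, and the assignment of trajectories to areas in the example is invented to mirror the situations of Figures 1 and 2.

```python
from typing import Dict, List, Set

# Hypothetical data model: a trajectory is identified by its label (e.g. "Rb"),
# and each anonymous area is the set of trajectories that pass through it.
Trajectory = str


def shared_attack(areas: List[Set[Trajectory]]) -> Set[Trajectory]:
    """Return the trajectories common to every anonymous area.

    If exactly one trajectory survives, the LP can single it out as real.
    """
    if not areas:
        return set()
    return set.intersection(*areas)


def roadless_filter(candidates: Set[Trajectory],
                    has_road: Dict[Trajectory, bool]) -> Set[Trajectory]:
    """Drop candidates that, after map matching, do not follow a real road."""
    return {t for t in candidates if has_road.get(t, False)}


if __name__ == "__main__":
    # Shared attack: three anonymous areas, only Rb crosses all of them.
    areas = [{"Ra", "Rb", "Rc"}, {"Rb", "Rd", "Re"}, {"Rb", "Rc", "Rf"}]
    print(sorted(shared_attack(areas)))

    # Roadless attack: four shared trajectories, but only Ra follows a road.
    shared = {"Ra", "Rb", "Rc", "Rd"}
    roads = {"Ra": True, "Rb": False, "Rc": False, "Rd": False}
    print(sorted(roadless_filter(shared, roads)))
```

In both cases the candidate set collapses to a single trajectory, which is exactly the situation a dummy-selection scheme has to avoid.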

Fig.2: The roadless-attack

(3) Probability-attack

Figure 3 indicates that the four trajectories are shared by all three anonymous areas, and matching with the map shows that all four trajectories are real. However, if the attacker gets hold of side information [1], e.g., the query probabilities of location-vertices and trajectory-edges, then he/she would find that three trajectories, Ra, Rc and Rd (drawn in red), have much smaller probability than trajectory Rb (drawn in green). Thus, the attacker can easily infer that the user’s real trajectory is Rb.

Fig.3: The probability-attack

B. System Model

For privacy preservation, we design a system model for LBS. Figure 4(a) shows the existing LBS system [1], and Figure 4(b) shows our proposed system model, which involves three major components: USER, PIDS and LP. The USER is the subscriber, the PIDS is the pseudonym identity server that generates pseudo identities for users, and the LP is the LBS provider. Our system model can be divided into three main stages as follows.

Fig.4: The LBS system

The first stage: pseudo identity (PID) distribution. When a new user wants to use the LBS, he registers with the PIDS, which verifies that the user is legitimate and distributes a pseudo identity to the user.

The second stage: the validation process. The main purpose of the validation process is to ensure that users do not fabricate the PID and to confirm once again that they are legitimate. If the user is legitimate, the LP returns some side information [1] to the user. In our paper, the side information is not limited to the number of queries at historical locations, but also includes the number of queries along the historical trajectories of users. So there are two kinds of side information in our LBS system: the historical probability table of location-vertices and the historical probability table of trajectory-edges.

For the historical probability table of locations (i.e., location-vertices), we divide the area into cells, similar to [1]. From the historical statistics of users’ single queries we obtain the number of requests in each cell and calculate the corresponding historical probability of each cell. We define the set of historical probabilities of locations as the historical probability table of location-vertices, denoted by p.

For the historical probability table of trajectories (i.e., trajectory-edges), there exist trajectories (edges) between cells. We calculate the historical probability of each edge based on the historical statistics of users’ continuous queries. We define the set of historical probabilities of edges as the historical probability table of trajectory-edges, denoted by q.

The third stage: request processing. Using the historical probability table of location-vertices or the historical probability table of trajectory-edges, the user chooses the k-1 dummy locations and sends the request to the LP. Then the LP responds to the user based on the request content.

III. PRELIMINARIES

(1) User location: The location of a user is denoted by d(x, y), where x is the latitude and y is the longitude.

(2) Request packet: The request packet of a user is denoted by Req=(PID, L, G, t, r, θmax, θmin), where PID is the pseudonym identity that uniquely identifies the user, L is the set of positions {d1, d2,…, dk} including the user’s real location and k-1 dummy locations, G is the minimum degree of privacy that the user can accept and implies the number of candidate locations/trajectories in our algorithm, t is the time at which the request is sent, and r is the service content of the request (e.g., entertainment, dining, dating). θmax and θmin are the maximum and minimum radii of the circular area, and together they determine the scope of the area from which the dummy locations are selected.

We denote by {p1, p2,…, pk} the probabilities of the k possible outcomes of an event, where the sum of all pi is 1. Then the entropy H for the k probabilities is defined as:

$$H = -\sum_{i=1}^{k} p_i \log_2 p_i .$$

The maximum is Hmax = log2 k, attained when pi = 1/k, i = 1, 2,…, k.

When a user sends a request Req to the LP, the content L of Req includes the candidate set of locations, denoted by {d1, d2,…, dk}, whose corresponding location-vertex probabilities are {p1, p2,…, pi,…, pk}. We note that the sum of these k selected probabilities is less than 1, because the probabilities in the whole historical probability table sum to 1. Thus, before computing the entropy, we need to normalize these probabilities; we denote the normalized values by {pd1, pd2, …, pdk}:

$$p_{d_j} = \frac{p_j}{\sum_{i=1}^{k} p_i}, \quad j = 1, 2, \ldots, k.$$

Now we can ensure that $\sum_{i=1}^{k} p_{d_i} = 1$, and we can compute the entropy:

$$H = -\sum_{i=1}^{k} p_{d_i} \log_2 p_{d_i} .$$

When a user sends requests continuously, there is a trajectory between two consecutive queries, and this trajectory has a historical trajectory probability q. By matching against the historical probability table of trajectory-edges, we can carefully choose the k-1 dummy trajectories, with probabilities {q1, q2,…, qk-1}. We first normalize the probabilities of the k trajectories, obtaining {q', q1', q2',…, qk-1'}:

$$q' = \frac{q}{q + \sum_{j=1}^{k-1} q_j}, \qquad q_i' = \frac{q_i}{q + \sum_{j=1}^{k-1} q_j}, \quad i = 1, 2, \ldots, k-1.$$

Then we have the corresponding entropy:

$$H = -q' \log_2 q' - \sum_{i=1}^{k-1} q_i' \log_2 q_i' .$$

(3) Anonymity degree: As mentioned above, G means that there are k candidate locations/trajectories. From the historical probability table of location-vertices/trajectory-edges, we can get the k corresponding probabilities. To ensure that the k locations have similar probabilities, so that the entropy approaches its maximum, we define the anonymity degree D:

$$D = 2^{H} \in [G-\varepsilon,\, G].$$

Because the maximum of H is log2 k, attained at pi = 1/k, i = 1, 2,…, k, we have D ≤ G. The larger the entropy H (and hence D), the more similar the k probabilities. Thus, by setting an appropriate value for ε, D limits how dissimilar the k probabilities may be.

IV. ALGORITHM DESIGN

In this section, we explain the proposed K-Anonymity Trajectory (KAT) algorithm. Fig. 5 describes the framework of the KAT algorithm. It uses the Sliding Window based k-anonymity (SWK) algorithm when sending a single request, or once before sending continuous requests. For continuous requests, the KAT algorithm calls the Trajectory Selection Mechanism (TSM) algorithm. If a request does not meet the requirement (e.g., the selected k trajectories cannot attain the anonymity degree D), the KAT algorithm falls back to calling the SWK algorithm. Algorithm 1 describes the pseudo code of the KAT algorithm.

[Flowchart; node labels: begin; input; "Is the first time?" (Y/N); "Apply for PID"; "Single or continuous request?" (single/continuous); SWK; "Determine the anonymous area"; "Generate the k sliding window"; TSM; "Select the dummy tracks"; "Meet the requirement?" (Y/N); output; end]

Fig.5: The framework of the KAT algorithm
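The acceptance test used by the KAT framework, the anonymity degree D = 2^H computed over normalized historical probabilities, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`normalize`, `entropy`, `anonymity_degree`, `meets_requirement`) are hypothetical, the example probabilities are made up, and the small floating-point tolerance on the upper bound is an addition of this sketch rather than part of the definition.

```python
import math
from typing import List, Sequence


def normalize(probs: Sequence[float]) -> List[float]:
    """Rescale the k selected historical probabilities so they sum to 1."""
    total = sum(probs)
    if total <= 0:
        raise ValueError("selected probabilities must have a positive sum")
    return [p / total for p in probs]


def entropy(probs: Sequence[float]) -> float:
    """Entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def anonymity_degree(raw_probs: Sequence[float]) -> float:
    """D = 2**H, with H computed on the normalized probabilities."""
    return 2 ** entropy(normalize(raw_probs))


def meets_requirement(raw_probs: Sequence[float], G: float, eps: float) -> bool:
    """Check D in [G - eps, G]; the 1e-9 slack is an assumption for rounding."""
    d = anonymity_degree(raw_probs)
    return G - eps <= d <= G + 1e-9


if __name__ == "__main__":
    # k = 4 candidates with equal historical probability: D reaches its maximum.
    print(anonymity_degree([0.1, 0.1, 0.1, 0.1]))
    # One candidate dominates: D drops well below k, so the set is rejected.
    print(meets_requirement([0.4, 0.01, 0.01, 0.01], G=4, eps=0.5))
```

The same functions apply to the continuous case by passing the trajectory-edge probabilities [q, q1, …, qk-1] of the real trajectory and the k-1 dummy trajectories instead of location-vertex probabilities.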

Algorithm 1: K-Anonymity Trajectory (KAT)
1: if (single request)
2:     call the SWK algorithm.
3: else if (continuous request)
4:     call the TSM algorithm.
5:     if (it is the first time request || the dummy trajectories don’t meet the requirement)
6:         call the SWK algorithm.
7:     end if
8: end if

A. The SWK Algorithm

In this algorithm, the user sets G, θmax and θmin based on the privacy requirements and gets the real location d1 using a positioning device (e.g., GPS). If time t = 0 and flag = 1, the request is the first of a continuous series of queries; if flag = 0, it is a single query. Algorithm 2 describes the pseudo code of the SWK algorithm.

Algorithm 2: Sliding Window based k-anonymity (SWK)
Input: p, G, θmax, θmin, d1, t, flag
Output: L={d1, d2,…, dk}
1: if ((t == 0 && flag == 1) || flag == 0)
2: Apply for a PID.
3: Compute the scope of anonymous area A by θmax, θmin and d1.
4: Sort the probabilities in the area A in ascending order and let pd1 be the probability of location d1.
5: if (number(pi < pd1) ≥ (k-1)/2 && number(pi > pd1) ≥ (k-1)/2)
6: Initialize the sliding window to include k probabilities, centered on pd1, slide it to both sides and compute the anonymity degree D.
7: while (D ∉ [G-ε, G])
8: Slide the window toward the right by one step.
9: if (pd1 ∉ sliding window || window.size()