
Continuous Skyline Monitoring over Distributed Data Streams

Hua Lu1, Yongluan Zhou2, and Jonas Haustad2

1 Dept. of Computer Science, Aalborg Uni., Denmark, [email protected]
2 Dept. of Mathematics and Computer Science, Uni. of Southern Denmark, Denmark, [email protected], [email protected]

Abstract. To monitor skylines over dynamic data, one needs to continuously update the skyline query results in order to reflect the new data values. This paper tackles the problem of continuous skyline monitoring on a central query server over dynamic data from multiple data sites. Simply sending the updates of tuple values to the server is cost-prohibitive. Therefore, we propose an approach where the central server collaborates with the data sites to monitor the possible skyline changes. By doing so, the processing load is distributed over all the nodes instead of only on the central server. Furthermore, the approach can minimize the bandwidth consumption between the server and the data sites, which is often critical in a widely distributed environment. Extensive experiments demonstrate that our proposal is efficient and effective.

M. Gertz and B. Ludäscher (Eds.): SSDBM 2010, LNCS 6187, pp. 565–583, 2010. © Springer-Verlag Berlin Heidelberg 2010

1 Introduction

A skyline query [4] returns from a multi-dimensional data set those points that are not dominated by others. A point is said to dominate another if it is not worse than the other in every single dimension and better in at least one dimension. Because of their power in retrieving interesting data according to multiple criteria, skyline queries can be used in many decision making applications.

In many applications, multi-dimensional data are generated from multiple dynamic data sites (e.g., base stations managing sensors or web sites on the Internet). Due to the dynamic nature of data sites, snapshot skyline queries in such environments make little sense for data interpretation and decision making. Instead, continuous skyline monitoring is necessary in such environments. By capturing the continuous query result changes as time elapses, such continuous skyline monitoring is able to detect potential significant events. For example, geologists, oceanographers and seismologists are able to conduct tsunami forecasting and forewarning by analyzing continuous multiple measures including water level, earthquake wave, fall, etc. Efficient skyline monitoring over relevant data sites can determine a successful warning of a deadly tsunami, which as a result may save millions of lives.

Another example is avalanche monitoring. The occurrence of an avalanche is closely related to the weather conditions, e.g., a rapid rise of temperature, a strong wind, heavy snowfall, as well as strong solar radiation during the day. Consider that a large number of weather stations with multiple sensors are installed in the mountain areas to monitor


the conditions of different places. Continuous skyline monitoring over such stations is able to maintain the locations having the most avalanche-favoring conditions. Continuous skyline monitoring can also be interesting in geographically distributed environments. Within a trade day, for example, a stock trader needs to be continuously aware of which stocks worldwide are worth investing, based on multiple attributes like last sale price, last buy price, volume, etc. Apparently, a continuous skyline monitoring over multiple dynamic stock information sources at different places (e.g., the US market and the Europe market overlap in trading hours) will serve that purpose well. Motivated by the aforementioned observations, we, in this paper, tackle the problem of efficient continuous skyline monitoring in a distributed environment characterized by a central server that acts as the query interface and multiple data sites each maintaining a large number of dynamic data tuples. Our solution for continuous skyline monitoring mainly consists of two phases: initialization and maintenance. In the initialization phase, the initial query result, i.e. the global skyline, is obtained by correctly merging local skyline from all data sites. Based on the initial skyline, all data tuples are categorized with respect to their membership in local skyline and global skyline. Such a categorization is maintained in an efficient way on both server and data sites. To support accurate and efficient skyline monitoring under dynamic data updates, a comprehensive case study for individual data updates is performed, which reveals the minimal skyline changes that can happen as time elapses. Consequently, in the maintenance phase, possible skyline changes are captured accordingly via efficient collaboration between the server and the data sites. In this way, unnecessary processing on server or data sites, and unnecessary communications between server and data sites are avoided. 
Furthermore, the processing load is distributed over the server and the data sites to avoid the server becoming the bottleneck.

In summary, we make the following major contributions in this paper. First, we formalize the continuous skyline monitoring problem in a generic two-tier distributed computing environment, and propose a two-phase solution for this problem. Second, we conduct a thorough case study on the possible incremental changes of continuous skyline results. Third, we develop an efficient two-tier continuous skyline maintenance approach based on the case study. Fourth, we evaluate our proposal through an extensive experimental study.

The remainder of this paper is organized as follows. Section 2 briefly reviews the related work on skyline queries. Section 3 formalizes the problem statement. Sections 4 and 5 detail the initialization and maintenance phases of continuous skyline monitoring, respectively. Section 6 presents the experimental results. Finally, Section 7 concludes the paper.

2 Related Work Skyline queries in the centralized data storage. Borzonyi et al. [4] introduced two algorithms Block Nested Loop (BNL) and Divide-and-Conquer (D&C). Chomicki et al. [5] proposed a Sort-Filter-Skyline (SFS) algorithm as BNL’s variant. Tan et al. [18] proposed Bitmap and Index algorithms. Kossmann et al. [10] proposed a Nearest Neighbor (NN) method. Papadias et al. [16] proposed a Branch-and-Bound Skyline (BBS)


method. Godfrey et al. [7] did a comprehensive analysis of previous skyline algorithms without indexing supports. Yiu and Mamoulis [22] proposed efficient algorithms to retrieve top-k points that dominate the largest number of points [16]. Lin et al. [13] proposed to select skyline points such that the total number of dominated points is maximized. Dellis and Seeger [6] defined the reverse skyline query that returns the points whose dynamic skyline [16] contains the query point. Morse et al. [14] proposed a lattice based skyline algorithm for data on low cardinality domains. Lee et al. [11] proposed to access points in Z-order in order to compute/update skylines more efficiently. Pei et al. [17] defined probabilistic dominance for uncertain data where each record has several instances. Skyline queries in distributed and dynamic environments. Balke et al. [3] addressed skyline operation over web databases where different dimensions are stored on different data sites. Wu et al. [21] proposed a parallel execution of constrained skyline queries in an overlay network. Huang et al. [8] proposed techniques for efficient distributed skyline query processing in MANETs. Zhu et al. [24] proposed a feedback-based skyline algorithm for geographically distributed servers on the Internet. Huang et al. [9] defined continuous skyline query in a moving setting where dynamic distance between a moving object and all static data point is yet another dimension in the skyline computation. Lin et al. [12] proposed an efficient skyline computation method over sliding window data stream model. Tao and Papadias [19] addressed the similar problem differently by lazy updates and pre-computation. Wu et al. [20] proposed to efficiently maintain skyline with point deletions. Handling point insertion and deletion are also addressed in [24]. Our setting in this paper is different in that updates change point values continuously rather than inserting or deleting points. Recently, Zhang et al. 
[23] studied the frequent skyline query over a sliding window on continuous objects from dynamic data sites. Given a window W_t^s (of size s timestamps until t) and a threshold θ (0 < θ ≤ 1), such a query returns all objects that appear in at least θ·s snapshot skylines of all the s ones within W_t^s. This paper differs from [23] in several important ways. First, each data site in our setting maintains a number of dynamic data tuples/records, while each client in [23] is a dynamic record. Second, our work maintains the real continuous skyline over the dynamic data, while [23] actually looks at several snapshot skylines within the given sliding window. In other words, the query in [23] is still executed one-time, rather than being maintained continuously as in our setting. Third, sampling based approximation is employed in [23] to reduce communication cost, while our continuous skyline query always maintains the exact skyline result efficiently. As a result, the solution in [23] cannot be applied directly to our problem in this paper.

Monitoring of distributed dynamic data. Streaming data are often generated from distributed sites. Hence, much recent research effort has been devoted to continuous monitoring over distributed dynamic data. Various query types have been studied. For instance, Babcock et al. [2] studied how to monitor top-k data items efficiently by adaptively setting filters at the data sites. Mouratidis et al. [15] adopted a similar approach to solve the problem of monitoring k nearest neighbors. Zhou et al. [25] studied the problem of multi-join queries over distributed streams. In this paper, we target a

much different type of queries, skylines, where we have to consider multiple ranking criteria at the same time. Hence, a much different solution is needed.

3 Problem Setting

We consider a typical distributed setting adopted by existing literature, such as [2]. There is a central server responsible for returning the global skyline results. In addition, there are N remote data sites, namely S1, S2, ..., SN. Each site maintains a set of dynamically changing tuples, each of which can be, for example, the readings from sensors at a particular weather station. The data sites communicate with the server for data updates and perform local filtering to minimize the communication cost.

At any time t, the tuple set on data site Si (1 ≤ i ≤ N) is captured in a local relation Ri(t) with the scheme ⟨idtuple, p1, p2, ..., pd, t∗⟩, where pi (1 ≤ i ≤ d) is a value of the corresponding attribute of interest, idtuple is the tuple identifier, and t∗ is the time when the tuple is obtained from the corresponding data site. Note that idtuple practically serves as the primary key of a local relation. In other words, only the latest instance of any dynamic tuple is kept in a local relation. For a specific tuple tpid with identifier id, we use tpid(t) to denote its instance at timestamp t. For simplicity, we also use tp.attrs to denote all attributes of tuple tp from p1 to pd.

Accordingly, we have a dynamic (virtual) global relation Rg(t) that is the union of all local relations at time t, i.e., Rg(t) = ⋃_{1≤i≤N} Ri(t). For simplicity of representation, we alter the global scheme to ⟨idsite, idtuple, p1, p2, ..., pd, t∗⟩, where idsite indicates the site from which the tuple comes. As a result, ⟨idsite, idtuple⟩ serves as the primary key of the global relation.

In environmental monitoring applications, weather stations equipped with several different types of sensors are deployed to monitor the meteorological conditions [1]. Each station, responsible for its own proximity, continuously reports a dynamic tuple of values, such as temperature, solar radiation, and wind speed.
To ease management, several stations are grouped together and their data are collected and relayed by a common base station, which we call a data site in this paper. For example, Table 1 shows three local relations and the corresponding global relation.

Given two tuples tp1 and tp2, we define the dominance relationship between them in terms of all their pi (1 ≤ i ≤ d) attributes. We say that tp1 dominates tp2, termed as tp1 ≺ tp2, if ∀1 ≤ i ≤ d, tp1.pi is no worse than tp2.pi; and ∃1 ≤ i ≤ d, tp1.pi is better than tp2.pi. Note that “better” and “worse” are generic in the sense that they have different indications in different contexts.

Refer to the example in Table 1. Intuitively, a higher value in any of the three reading attributes indicates a higher chance of avalanche. A reading tuple tpi dominates another one tpj if all of tpi’s attribute values are no smaller than tpj’s and at least one of tpi’s attribute values is higher than tpj’s. As a result, the local skylines and the global skyline are shown in Table 1(e) and (f) respectively.

The dominance definition above also applies to the different instances of the same tuple at different timestamps. For example, if tp1 at site 1 is updated to ⟨−1.005, 365.292, 5.283⟩ at 16:20, we say the new instance dominates the old one.
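The dominance test can be written down in a few lines. The following is a minimal sketch, assuming (as in the avalanche example) that a larger value is better in every dimension; the function name `dominates` and the tuple encoding are illustrative choices, not part of the original text.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a dominates b, assuming larger values are better in every dimension."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


# (temp., solar radi., wind speed) readings taken from Table 1(a)
old_tp1 = (-1.150, 365.292, 4.282)   # tp1 at site 1, obtained at 16:15
new_tp1 = (-1.005, 365.292, 5.283)   # the updated instance at 16:20
tp3 = (-6.900, 194.078, 6.646)
tp4 = (-1.280, 363.165, 2.548)

print(dominates(new_tp1, old_tp1))   # the new instance dominates the old one
print(dominates(old_tp1, tp4))       # tp1 dominates tp4 within site 1
print(dominates(old_tp1, tp3), dominates(tp3, old_tp1))  # incomparable pair
```

Two tuples for which neither call returns True are incomparable, which is the situation written as tp(t′) ∼ tp(t) in the case analysis of Section 5.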



Table 1. Snapshots of local/global relations and local/global skylines

(a) local relation #1
idtuple   temp.     solar radi.   wind speed   t∗
tp1       -1.150    365.292       4.282        16:15
tp2       -4.713    146.223       2.556        16:18
tp3       -6.900    194.078       6.646        16:18
tp4       -1.280    363.165       2.548        16:18

(b) local relation #2
idtuple   temp.     solar radi.   wind speed   t∗
tp1       -8.588    82.417        3.763        16:18
tp2       -1.087    309.46        1.592        16:17
tp3       -9.713    337.642       2.573        16:18

(c) local relation #3
idtuple   temp.     solar radi.   wind speed   t∗
tp1       8.787     267.455       2.866        16:18
tp2       -5.588    156.858       1.383        16:18
tp3       -14.338   247.250       2.662        16:18

(d) global relation
idsite   idtuple   temp.     solar radi.   wind speed   t∗
site1    tp1       -1.150    365.292       4.282        16:15
site1    tp2       -4.713    146.223       2.556        16:18
site1    tp3       -6.900    194.078       6.646        16:18
site1    tp4       -1.280    363.165       2.548        16:18
site2    tp1       -8.588    82.417        3.763        16:18
site2    tp2       -1.087    309.46        1.592        16:17
site2    tp3       -9.713    337.642       2.573        16:18
site3    tp1       8.787     267.455       2.866        16:18
site3    tp2       -5.588    156.858       1.383        16:18
site3    tp3       -14.338   247.250       2.662        16:18

(e) local skylines
idsite   local skyline
site1    {tp1, tp3}
site2    {tp1, tp2, tp3}
site3    {tp1}

(f) global skyline
idsite   global skyline tuples
site1    {tp1, tp3}
site2    {tp2}
site3    {tp1}
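The skylines listed in Table 1(e) and (f) can be reproduced with a brute-force computation. The sketch below is only an illustration under the assumption that larger readings are better in all three attributes; the helper names `dominates` and `skyline` and the dictionary layout are hypothetical and not taken from the paper.

```python
def dominates(a, b):
    """True if a dominates b, assuming larger values are better in every dimension."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(points):
    """Brute-force skyline: keys of the points that no other point dominates."""
    return {k for k, p in points.items()
            if not any(dominates(q, p) for j, q in points.items() if j != k)}


# (temp., solar radi., wind speed) per tuple, copied from Table 1(a)-(c)
LOCAL = {
    "site1": {"tp1": (-1.150, 365.292, 4.282), "tp2": (-4.713, 146.223, 2.556),
              "tp3": (-6.900, 194.078, 6.646), "tp4": (-1.280, 363.165, 2.548)},
    "site2": {"tp1": (-8.588, 82.417, 3.763), "tp2": (-1.087, 309.46, 1.592),
              "tp3": (-9.713, 337.642, 2.573)},
    "site3": {"tp1": (8.787, 267.455, 2.866), "tp2": (-5.588, 156.858, 1.383),
              "tp3": (-14.338, 247.250, 2.662)},
}

# One skyline per site, then the skyline of the union keyed by (idsite, idtuple)
local_skylines = {site: skyline(rel) for site, rel in LOCAL.items()}
global_relation = {(site, tid): p for site, rel in LOCAL.items() for tid, p in rel.items()}
global_skyline = skyline(global_relation)

print(local_skylines)
print(sorted(global_skyline))
```

Recomputing the skyline from scratch like this is what the monitoring approach in the following sections tries to avoid; it is useful mainly as a reference against which an incremental implementation can be checked.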

As all tuples are dynamic in our setting, the local skylines and the global skyline are also dynamic. Our goal is to monitor the global skyline continuously in a distributed environment as described above.

Problem: (Continuous Skyline Monitoring Query). A continuous skyline monitoring query, termed as CSMQ, is issued against the global relation Rg. It is activated at some time ts, the start time of the query, and terminated at some later time te, the end time of the query. The query result is maintained from time ts to time te in the following set: ∀t ∈ [ts, te], CSMQ(Rg) = {tp | tp ∈ Rg(t) ∧ ∄tp′ ∈ Rg(t), tp′ ≺ tp}.

Our solution in this paper consists of two phases. The initialization phase, to be presented in Section 4, obtains the initial query result for a CSMQ query, and initializes necessary settings to facilitate continuous query processing. The maintenance phase, to be presented in Section 5, continuously updates the query result according to the dynamic changes of relevant tuples.

4 Query Initialization

When a CSMQ query is activated at time ts, the initialization is conducted in the system as follows. The server first sends query requests to all data sites. Each data site Si in turn sends to the server its local skyline SKi(ts), which is a subset of its local relation. The server initializes the global skyline SKg according to the global scheme. When the server receives the local skyline from a data site, the incoming skyline is merged into the current global skyline. After all local skylines are received and merged, the server obtains the initial global skyline and sends necessary control information back to all data sites.

The merging procedure (shown in Algorithm 1) is intended to eliminate unqualified temporary skyline points from both the temporary skyline result that is stored in SKg, and the incoming skyline stored in SKin. Given a local skyline SKin from data site Si, a portion of SKin may not participate in the final global skyline. We call that portion false



Algorithm 1. merge (Incoming local skyline SKin, data site identifier idsite, current global skyline SKg)
SKfp[idsite] ← ∅
for each tuple tpi in SKin do
    tpi ← ⟨tpi.attrs, tpi.idtuple, idsite, tpi.t∗⟩
    for each tuple tpj in SKg do
        if tpi ≺ tpj then
            move tpj from SKg to SKfp[tpj.idsite]
        else if tpj ≺ tpi then
            move tpi from SKin to SKfp[idsite]
            break
SKg ← SKg ∪ SKin

positive skyline. We use an array SKfp [1..N ] to store all such false positive skylines for all data sites. At any timestamp, a tuple tp can be in one and only one state of three possibilities. First, tp can be a non-skyline point. We term this state NS. Second, tp can be a local skyline point on its data site but not a global skyline point with respect to all data sites. We term this state FS, according to the aforementioned false-positive skyline definition. Third, tp can be a global skyline point. We term this state GS. Our goal in this paper is to efficiently maintain the set of all tuples in the GS state when tuples are under possible updates. For that purpose, we are interested in the possible state switching for a single tuple. Figure 1 shows the state diagram of a single tuple.

Fig. 1. Tuple State Diagram (states NS, FS, and GS; FS and GS together make up the local skyline)

Table 2. SKlg and SKfp
idsite   SKlg          SKfp
site1    {tp1, tp3}    ∅
site2    {tp2}         {tp1, tp3}
site3    {tp1}         ∅

On each data site Si, we maintain the following structures: (1) Local skyline SKl, the set of local skyline tuple identifiers. (2) Local global skyline SKlg, the set of local skyline tuple identifiers that participate in the global skyline. This corresponds to the tuple state GS. (3) False-positive skyline SKfp = {tp.idtuple | tp ∈ SKl} \ SKlg, the set of local skyline tuple identifiers that do not participate in the global skyline. This corresponds to the tuple state FS.

On the server side, we maintain the following structures: (1) Global skyline SKg, the set of global skyline tuples. (2) An array of false-positive skyline tuples SKfp[1..N]. Here, ∀1 ≤ i ≤ N, SKfp[i] = Si.SKfp.



Table 3. Change cases from time t to t′ (t′ > t)

                    tp(t) ∈ NS         tp(t) ∈ FS             tp(t) ∈ GS
tp(t′) ∼ tp(t)      Case 1 (Q1, Q2)    Case 4 (Q1, Q2, Q4)    Case 7 (Q1, Q2, Q3)
tp(t) ≺ tp(t′)      Case 2 (–)         Case 5 (Q4)            Case 8 (Q1, Q3)
tp(t′) ≺ tp(t)      Case 3 (Q1, Q2)    Case 6 (Q1, Q2)        Case 9 (Q2)

Refer to the running example. The local data structures SKlg and SKfp are shown in Table 2. After all local skylines are merged, the server initializes the relevant data structures, and sends to each data site all its false-positive skyline tuple identifiers. When a data site receives that identifier set, it initializes local data structures accordingly. Note that the server and the sites will make use of their initializations subsequently in continuous skyline monitoring, as to be detailed in Section 5.
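The initialization phase can be illustrated with a small sketch that follows the structure of Algorithm 1 and feeds it the local skylines of the running example. It assumes larger attribute values are better; the names `merge`, `sk_g` and `sk_fp` and the dictionary-based layout are illustrative assumptions rather than the authors' implementation.

```python
def dominates(a, b):
    """True if a dominates b, assuming larger values are better in every dimension."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def merge(sk_in, site, sk_g, sk_fp):
    """Merge one local skyline into the global skyline, in the spirit of Algorithm 1.

    sk_in: dict idtuple -> attrs (incoming local skyline of `site`)
    sk_g:  dict (idsite, idtuple) -> attrs (current global skyline, updated in place)
    sk_fp: dict idsite -> set of false-positive tuple ids (updated in place)
    """
    sk_fp.setdefault(site, set())
    survivors = {}
    for tid, attrs in sk_in.items():
        dominated = False
        for key, other in list(sk_g.items()):   # copy: sk_g is modified while scanning
            if dominates(attrs, other):
                del sk_g[key]                   # old global point becomes a false positive
                sk_fp[key[0]].add(key[1])
            elif dominates(other, attrs):
                sk_fp[site].add(tid)            # incoming point is a false positive
                dominated = True
                break
        if not dominated:
            survivors[(site, tid)] = attrs
    sk_g.update(survivors)


# Local skylines from Table 1(e), with attribute values from Table 1(a)-(c)
LOCAL_SKYLINES = {
    "site1": {"tp1": (-1.150, 365.292, 4.282), "tp3": (-6.900, 194.078, 6.646)},
    "site2": {"tp1": (-8.588, 82.417, 3.763), "tp2": (-1.087, 309.46, 1.592),
              "tp3": (-9.713, 337.642, 2.573)},
    "site3": {"tp1": (8.787, 267.455, 2.866)},
}

sk_g, sk_fp = {}, {}
for site, sk_in in LOCAL_SKYLINES.items():
    merge(sk_in, site, sk_g, sk_fp)

print(sorted(sk_g))   # global skyline keys, cf. Table 1(f)
print(sk_fp)          # false-positive sets per site, cf. Table 2
```

After all three merges, the keys of `sk_g` grouped by site give the SKlg column of Table 2, and `sk_fp` gives the SKfp column that the server would send back to each site.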

5 Continuous Skyline Monitoring

In this section we elaborate how the result of a CSMQ query is continuously maintained in the presence of dynamic updates from data sites. To be able to design concrete algorithms for the maintenance phase, we first discuss the cases of possible skyline changes caused by dynamic tuples from data sites.

5.1 Cases of Possible Skyline Changes

As tuples get updated continuously, the global skyline also needs to be maintained accordingly and correctly. The initial skyline SKg(ts) obtained in Section 4 serves as the starting point of the continuous maintenance. Suppose the correct skyline at time t ≥ ts is SKg(t) = {tp | tp ∈ Rg(t) ∧ ∄tp′ ∈ Rg(t), tp′ ≺ tp}, and data site Si gets an updated tuple as tp(t′) at a later time t′ > t. We need to determine the correct skyline SKg(t′) at time t′. For that purpose, we need to consider for tp(t′) all or part of the three particular questions as follows.

Question 1. Is tp(t′) dominated by no skyline point in SKg(t) (or SKl(t))? If positive, tp(t′) will be in SKg(t′) (or SKl(t′)).

Question 2. Does tp(t′) dominate any skyline point in SKg(t) (or SKl(t))? If positive, relevant old skyline points will expire and be out of SKg(t′) (or SKl(t′)).

Question 3. Does tp(t′), a global skyline point at time t, stop dominating any non-skyline point solely dominated by tp(t) at time t? If positive, relevant old non-skyline points will enter SKg(t′).

In addition, it is also of interest to know the answer to the following question, so that the local and global structures for the false-positive skyline will be correctly updated. This is necessary for facilitating further continuous query processing.

Question 4. Does tp(t′) stop being a false-positive global skyline point because it is now dominated by some local skyline point? If positive, tp(t′) should be removed from the local SKfp and the server side SKfp[1..N].



These questions, on the other hand, are closely related to two aspects: the membership of tp(t) in SKg(t), and the dominance relationship between tp(t) and tp(t′). All possible cases with respect to these two aspects are listed in Table 3. In this table, each case is attached with the questions that can have positive answers, which indicate possible skyline changes. We proceed to explain how SKg(t) will evolve to SKg(t′) in each case.

Case 1. In this case, it is possible that tp(t′) at time t′ is no longer dominated by any skyline point in SKg(t) (Question 1), which makes it become a new skyline point and SKg(t′) = SKg(t) ∪ {tp(t′)}. It is also possible that tp(t′) dominates a subset SK ⊆ SKg(t) (Question 2), which drives any point in SK out of the skyline. It is noteworthy that Question 2 can have a positive answer only if the answer to Question 1 has been proved positive. As a result, the skyline will be SKg(t′) = (SKg(t) \ SK) ∪ {tp(t′)}. The answer to Question 3, in contrast, is negative in this case. Since tp(t) is not a skyline point at time t, it must be dominated by some skyline point p ∈ SKg(t). As a result, any non-skyline point dominated by tp(t) at time t must also be dominated by p, because of the transitivity of dominance [4]. It is therefore impossible for tp(t) to solely dominate any non-skyline point at time t.

Case 2. In this case, we have negative answers to all three questions. The answer to Question 1 is negative, because tp(t′) must be dominated by some skyline point p ∈ SKg(t) that dominates tp(t) at time t. The answer to Question 2 is also negative, because otherwise a skyline point q ∈ SKg(t) dominated by tp(t′) would also be dominated by the aforementioned p and could not be a skyline point at all. The answer to Question 3 is negative for the same reason as in Case 1. As a consequence, the skyline in this case will not change from time t to t′, i.e. SKg(t′) = SKg(t).

Case 3. This case is similar to Case 1.

Case 4.
This case is similar to Case 1, except that Question 4 should be checked because tp may no longer be a false-positive global skyline point. The answer can be decided locally as we only need to check the updating tp against all other local skyline points.

Case 5. This case is similar to Case 2, except that we need to check the answer to Question 4.

Case 6. This case is similar to Cases 1 and 3. Here, tp(t′) dominates its old instance tp(t), which was a local skyline point. Therefore, tp(t′) cannot be dominated by any local skyline point, and it is unnecessary to check the answer to Question 4.

Case 7. In this case, answers to all three questions can be positive, because tp(t) is a skyline point at time t. If the answer to Question 1 is positive, the skyline remains unchanged from time t to t′, i.e. SKg(t′) = SKg(t). Otherwise, the skyline will be SKg(t′) = SKg(t) \ {tp(t)}. It is still true that Question 2 can have a positive answer only if Question 1 does. If the answer to Question 2 is positive, the new skyline becomes SKg(t′) = SKg(t) \ SK, where SK is defined the same as in Case 1. If the answer to Question 3 is positive, which guarantees a set P ⊆ Rg(t) \ SKg(t) of all those non-skyline points that are solely dominated by tp(t) at time t but not by tp(t′)



at time t′, the skyline will become SKg(t′) = SKg(t) ∪ P. Here, checking Question 3 does not depend on the answer to Questions 1 or 2.

Case 8. This case is similar to Case 7, except that it is not necessary to check the answer to Question 2. In this case, tp(t) dominates its new instance tp(t′), which makes it impossible for tp(t′) to dominate any local/global skyline point, because otherwise tp(t) would already have dominated such skyline points.

Case 9. In this case we have a positive answer to Question 1 and a negative answer to Question 3. Since tp(t) is a skyline point, tp(t′) with a higher dominating capability must stay in the skyline and still dominate all those non-skyline points dominated by tp(t). The answer to Question 2 can be positive, because tp(t′) may dominate some old skyline points. Consequently, the new skyline becomes SKg(t′) = SKg(t) \ SK, where SK is defined the same as in Case 1.

With the case study on skyline changes, we are ready to design an efficient solution to dynamically maintain skylines in the distributed environment.

5.2 Processing on the Updating Data Site

When an update for tuple tp comes to data site Si at time t′, the site needs to decide the change case type. This requires two pieces of information: the dominance relationship between tp(t) and tp(t′), and whether tp(t) belongs to SKg(t) or not. The former is obtained by comparing tp(t) and tp(t′); the latter is obtained by checking the reduced local skyline signature SKlg. Both are done locally at data site Si.

After deciding the change case, site Si sends to the server a specific update message together with the corresponding information. Cases 1, 3 and 6 are processed according to Algorithm 2. Here, Question 2 is only checked when the answer to Question 1 is positive. Therefore, the process of checking Question 2 continues only if tp(tc) is not dominated by any local skyline point (line 1).
Particularly, tp(tc) is added into the local skyline if the case type is not 6 (lines 2–3). Furthermore, the old local/global skyline points that are dominated by tp(tc) are eliminated accordingly (lines 4–8). Then, a specific update message is sent to the server (line 9). Otherwise, tp(tc) is dominated and removed from SKfp (lines 10–11).

Case 4 is processed according to Algorithm 3. If the updating tuple tp(tc) is now dominated by some local skyline point (line 1), it is removed from the local false-positive skyline SKfp if its old instance is there (lines 2–3), and then a specific message is sent to the server to remove tp from SKfp there (line 4). Otherwise, tp(tc) is not dominated by any local skyline point. Similar to the procedure for Cases 1, 3 and 6, Question 2 is to be checked (lines 5–11).

Case 5 is processed according to Algorithm 4, which is similar to the first part of the procedure for Case 4.

Case 7 is processed according to Algorithm 5. Its first part (lines 1–6) is similar to the counterpart in Algorithm 2. Next, it continues to check the answer to Question 3, by obtaining those tuples that stop being dominated by their sole dominator tp at the updating time (lines 7–10). Here, DR(tp) denotes the dominating region [8] of a tuple tp. All such tuples are added into the local skyline (line 11). Finally, a specific message is sent to the server (line 12).
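The first step on the updating site, deciding the change case, amounts to a lookup in Table 3. The following is a minimal sketch under two assumptions: larger attribute values are better, and an update that leaves all values unchanged is treated like the incomparable row. The names `change_case` and `QUESTIONS` are hypothetical helpers introduced for illustration.

```python
def dominates(a, b):
    """True if a dominates b, assuming larger values are better in every dimension."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


# Questions that can have positive answers in each case, transcribed from Table 3
QUESTIONS = {
    1: {"Q1", "Q2"}, 2: set(), 3: {"Q1", "Q2"},
    4: {"Q1", "Q2", "Q4"}, 5: {"Q4"}, 6: {"Q1", "Q2"},
    7: {"Q1", "Q2", "Q3"}, 8: {"Q1", "Q3"}, 9: {"Q2"},
}


def change_case(state, old, new):
    """Return the case number (1..9) of Table 3.

    state: "NS", "FS" or "GS", the state of the old instance tp(t)
    old, new: attribute tuples of tp(t) and tp(t')
    """
    column_offset = {"NS": 0, "FS": 3, "GS": 6}[state]
    if dominates(old, new):
        row = 2            # tp(t) dominates tp(t')
    elif dominates(new, old):
        row = 3            # tp(t') dominates tp(t)
    else:
        row = 1            # incomparable (unchanged values also land here: assumption)
    return column_offset + row


# tp1 at site 1 is a global skyline point (GS); its update at 16:20 dominates the old instance
case = change_case("GS", (-1.150, 365.292, 4.282), (-1.005, 365.292, 5.283))
print(case, sorted(QUESTIONS[case]))
```

Only the old state and the two instances of the same tuple are needed, which is why the case can be decided locally without contacting the server.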



Algorithm 2. updateCase136 (Updating tuple tp, current timestamp tc, local global skyline SKlg)
1: if ∄tp′ ∈ SKl s.t. tp′ ≺ tp(tc) then
2:     if the case type is not 6 then
3:         add tp(tc) to SKl
4:     SKexp ← {id ∈ SKlg | tp(tc) ≺ Ri[id]}
5:     SKlg ← SKlg \ SKexp
6:     SKexpf ← {id ∈ SKfp | tp(tc) ≺ Ri[id]}
7:     SKfp ← SKfp \ SKexpf
8:     SKl ← {tp′ ∈ SKl | tp′.idtuple ∉ SKexp ∪ SKexpf}
9:     send ⟨UPT MSG CASE 136, tp, SKexp, SKexpf⟩ to the server
10: else if tp.idtuple ∈ SKfp then
11:     remove tp.idtuple from SKfp
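Lines 4–8 of Algorithm 2 recur in Algorithms 3, 5 and 7, so it may help to see them as one helper. The sketch below assumes that larger values are better, that the site keeps SKl, SKlg and SKfp as Python sets of tuple identifiers, and that the updating tuple's own identifier is skipped when collecting expired points; that last choice and the name `expire_dominated` are assumptions of this illustration.

```python
def dominates(a, b):
    """True if a dominates b, assuming larger values are better in every dimension."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def expire_dominated(tid, new, relation, sk_l, sk_lg, sk_fp):
    """Expire local skyline points dominated by the new instance of tuple `tid`.

    relation: dict idtuple -> attrs (the local relation Ri)
    sk_l, sk_lg, sk_fp: sets of tuple ids, modified in place
    Returns (sk_exp, sk_expf), the id sets that would be reported to the server.
    """
    # Assumption: the updating tuple itself is never reported as expired.
    sk_exp = {i for i in sk_lg if i != tid and dominates(new, relation[i])}
    sk_expf = {i for i in sk_fp if i != tid and dominates(new, relation[i])}
    sk_lg -= sk_exp
    sk_fp -= sk_expf
    sk_l -= sk_exp | sk_expf
    return sk_exp, sk_expf


# Site 1 of the running example: tp4 (state NS) receives a hypothetical update
relation = {"tp1": (-1.150, 365.292, 4.282), "tp2": (-4.713, 146.223, 2.556),
            "tp3": (-6.900, 194.078, 6.646), "tp4": (-1.280, 363.165, 2.548)}
sk_l, sk_lg, sk_fp = {"tp1", "tp3"}, {"tp1", "tp3"}, set()

sk_exp, sk_expf = expire_dominated("tp4", (-1.0, 366.0, 5.0), relation, sk_l, sk_lg, sk_fp)
print(sk_exp, sk_expf, sk_l)
```

In this example the updated tp4 dominates tp1 but not tp3 (its wind speed is lower), so only tp1 leaves the local structures; adding tp4 itself to SKl and sending the update message are left to the caller.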

Algorithm 3. updateCase4 (Updating tuple tp, current timestamp tc, local global skyline SKlg)
1: if ∃tp′ ∈ SKl s.t. tp′ ≺ tp(tc) then
2:     if tp.idtuple ∈ SKfp then
3:         remove tp.idtuple from SKfp
4:     send ⟨UPT MSG CASE 4 Q4, tp.idtuple⟩ to the server
5: else
6:     SKexp ← {id ∈ SKlg | tp(tc) ≺ Ri[id]}
7:     SKlg ← SKlg \ SKexp
8:     SKexpf ← {id ∈ SKfp | tp(tc) ≺ Ri[id]}
9:     SKfp ← SKfp \ SKexpf
10:     SKl ← {tp′ ∈ SKl | tp′.idtuple ∉ SKexp ∪ SKexpf}
11:     send ⟨UPT MSG CASE 4, tp, SKexp, SKexpf⟩ to the server

Case 8 is processed according to Algorithm 6. In this case, it is impossible for the disadvantaged new instance of the updating tuple to dominate any local/global skyline points. However, it is still possible that the new instance is dominated by some local/global skyline points. It is also possible that such a local skyline point dominating the tp(tc ) may be dominated by tp’s old instance. Therefore, we here check the answer to Question 3 first (lines 1–5), followed by steps of checking that to Question 1 (lines 6–10).

Algorithm 4. updateCase5 (Updating tuple tp, current timestamp tc, local global skyline SKlg)
1: if ∃tp′ ∈ SKl s.t. tp′ ≺ tp(tc) then
2:     if tp.idtuple ∈ SKfp then
3:         remove tp.idtuple from SKfp
4:     send ⟨UPT MSG CASE 5, tp.idtuple⟩ to the server



Algorithm 5. updateCase7 (Updating tuple tp, current timestamp tc, local global skyline SKlg)
1: if ∄tp′ ∈ SKl s.t. tp′ ≺ tp(tc) then
2:     SKexp ← {id ∈ SKlg | tp(tc) ≺ Ri[id]}
3:     SKlg ← SKlg \ SKexp
4:     SKexpf ← {id ∈ SKfp | tp(tc) ≺ Ri[id]}
5:     SKfp ← SKfp \ SKexpf
6:     SKl ← {tp′ ∈ SKl | tp′.idtuple ∉ SKexp ∪ SKexpf}
7: Pc ← {tp′ ∈ Ri | tp′ ∈ DR(Ri[tp.idtuple]) − DR(tp(tc))}
8: for each tuple tp′ ∈ Pc do
9:     if ∃tp″ ∈ SKl \ {tp(t)} s.t. tp″ ≺ tp′ then
10:         remove tp′ from Pc
11: SKl ← SKl ∪ Pc
12: send ⟨UPT MSG CASE 7, tp, SKexp, SKexpf, Pc⟩ to the server
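The Question 3 check (lines 7–10 of Algorithm 5, repeated as lines 1–4 of Algorithm 6) can be sketched as follows. The sketch assumes larger values are better and interprets "tp′ ∈ DR(old) − DR(new)" as "dominated by the old instance but not by the new one"; the function name `q3_candidates` and the hypothetical update values are illustrative only. As in the listing, candidates are checked only against the other local skyline points, not against each other.

```python
def dominates(a, b):
    """True if a dominates b, assuming larger values are better in every dimension."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def q3_candidates(tid, new, relation, sk_l):
    """Local tuples that were dominated by the old instance of `tid` but escape the new one.

    relation: dict idtuple -> attrs, still holding the old instance of `tid`
    sk_l: set of local skyline tuple ids
    """
    old = relation[tid]
    # Tuples inside the old dominating region but outside the new one
    pc = {i for i, p in relation.items()
          if i != tid and dominates(old, p) and not dominates(new, p)}
    # Drop candidates that some other local skyline point still dominates
    others = sk_l - {tid}
    return {i for i in pc
            if not any(dominates(relation[o], relation[i]) for o in others)}


# Site 1 of the running example; tp1 (state GS) gets a hypothetical, weaker update (Case 8)
relation = {"tp1": (-1.150, 365.292, 4.282), "tp2": (-4.713, 146.223, 2.556),
            "tp3": (-6.900, 194.078, 6.646), "tp4": (-1.280, 363.165, 2.548)}
sk_l = {"tp1", "tp3"}

pc = q3_candidates("tp1", (-2.0, 300.0, 4.0), relation, sk_l)
print(pc)
```

Here tp2 is still dominated by the weakened tp1, while tp4 is no longer dominated by tp1 and was never dominated by tp3, so tp4 is the only tuple that would be added to the local skyline and reported to the server in Pc.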

Algorithm 6. updateCase8 (Updating tuple tp, current timestamp tc, local global skyline SKlg)
1: Pc ← {tp′ ∈ Ri | tp′ ∈ DR(Ri[tp.idtuple]) − DR(tp(tc))}
2: for each tuple tp′ ∈ Pc do
3:     if ∃tp″ ∈ SKl \ {tp(t)} s.t. tp″ ≺ tp′ then
4:         remove tp′ from Pc
5: SKl ← SKl ∪ Pc
6: if ∃tp′ ∈ SKl s.t. tp′ ≺ tp(tc) then
7:     SKexp ← {tp.idtuple}
8:     SKl ← SKl \ SKexp
9: else
10:     SKexp ← ∅
11: send ⟨UPT MSG CASE 8, tp, SKexp, Pc⟩ to the server

Algorithm 7. updateCase9 (Updating tuple tp, current timestamp tc, local global skyline SKlg)
1: SKexp ← {id ∈ SKlg | tp(tc) ≺ Ri[id]}
2: SKlg ← SKlg \ SKexp
3: SKexpf ← {id ∈ SKfp | tp(tc) ≺ Ri[id]}
4: SKfp ← SKfp \ SKexpf
5: SKl ← {tp′ ∈ SKl | tp′.idtuple ∉ SKexp ∪ SKexpf}
6: send ⟨UPT MSG CASE 9, tp, SKexp, SKexpf⟩ to the server



Case 9 is processed according to Algorithm 7, which is similar to its counterparts in Algorithms 2 and 3.

5.3 Update Processing on the Server

Upon the receipt of an upstream update message from an updating data site, the server processes the message according to the procedure described in Algorithm 8. If the message type is UPT MSG Case 4 Q4 or UPT MSG Case 5, the updating tuple is moved out of the false-positive skyline array since it is no longer a false-positive global skyline point (lines 1–2). Otherwise, the server needs to check the answers to Questions 1, 2 and 3 locally, and send corresponding downstream messages to the involved data sites (lines 3–37).

Specifically, other message types are processed as follows. First, a temporary array is initialized to contain possible expiring global skyline points (lines 4–5). Second, those global skyline points that are no longer local skyline points on the updating site are removed from the global skyline (line 6). Next, if the message type is not UPT MSG Case 8, those false-positive global skyline points that are no longer local skyline points on the updating site are removed from the false-positive skyline array (lines 7–8). After that, all tuples that may enter the global skyline are added into set gs (lines 9–11), including the updating tuple and those sent by the site if it is a UPT MSG Case 7 or UPT MSG Case 8 message.

Subsequently, Question 1 is checked by comparing the updating tuple with the current global skyline points not coming from the updating site (lines 12–24). If the updating tuple is dominated, the process stops immediately (lines 15–20). Those existing global skyline points that come from the updating site but are dominated by the updating tuple are removed (lines 17–18). Otherwise, all invalid global skyline points are moved to the temporary array for later use (lines 21–22). After that, it is confirmed that the updating tuple is not dominated and it is (still) a global skyline point.
All remaining tuples in gs are added into the global skyline, and a corresponding downstream message is sent to the updating site (lines 23–24). Furthermore, invalid global skyline points, which are still local skyline points on their own data sites, are moved from the temporary array to the false positive skyline array, and a corresponding downstream message is sent to each involved data site for local update (lines 25–28). Finally, in the case of a UPT MSG Case 7 or UPT MSG Case 8 message, Question 3 is checked for each particular data site, in order to identify those current false positive skyline tuples that are no longer dominated (lines 29–37). All such tuples are obtained by dominance comparison with the updating tuple and all current global skyline tuples (line 33). If they do exist, they are moved from the false positive skyline to the global skyline (lines 35–36), and a corresponding downstream message is sent to the involved site for local update (line 37). 5.4 Passive Update Processing on Data Sites When the server processes an update message, some data sites may receive downstream messages from the servers. Such data sites (including the updating site) also need to do



Algorithm 8. serverUpdate (Update message msg, data site identifier id)
 1: if msg.type ∈ {UPT MSG Case 4 Q4, UPT MSG Case 5} then
 2:   remove the tuple identified by msg.tp.idtuple from SKfp[id]
 3: else
 4:   for i from 1 to N do
 5:     SKexp[i] ← ∅
 6:   SKg ← {tp ∈ SKg | tp.idsite ≠ id ∨ tp.idtuple ∉ msg.SKexp}
 7:   if msg.type ≠ UPT MSG Case 8 then
 8:     SKfp[id] ← SKfp[id] \ msg.SKexpf
 9:   gs ← msg.tp
10:   if msg.type ∈ {UPT MSG Case 7, UPT MSG Case 8} then
11:     gs ← gs ∪ {tp' ∈ msg.Pc | ∄tp'' ∈ SKg \ {msg.tp(t)} s.t. tp'' ≺ tp'}
12:   for each tuple tp in SKg do
13:     if tp.idsite = id then
14:       continue
15:     if msg.type = UPT MSG Case 9 and tp ≺ msg.tp then
16:       SKexp[id] ← SKexp[id] ∪ {msg.tp}
17:       if ∃tp' ∈ SKg s.t. tp'.idtuple = msg.tp.idtuple ∧ tp'.idsite = id ∧ msg.tp ≺ tp' then
18:         SKg ← SKg \ {tp'}
19:       remove msg.tp from gs
20:       break
21:     if msg.tp ≺ tp then
22:       move tp from SKg to SKexp[tp.idsite]
23:   SKg ← SKg ∪ gs
24:   send UPT MSG GS, {tp.idtuple | tp ∈ gs} to data site Sid
25:   for i from 1 to N do
26:     if SKexp[i] ≠ ∅ then
27:       SKfp[i] ← SKfp[i] ∪ SKexp[i]
28:       send UPT MSG GS2FS, SKexp[i] to data site Si
29:   if msg.type ∈ {UPT MSG Case 7, UPT MSG Case 8} then
30:     for i from 1 to N do
31:       if i = id then
32:         continue
33:       fps ← {tp' ∈ SKfp[i] | msg.tp(t) ≺ tp' ∧ msg.tp(tc) ⊀ tp' ∧ (∄tp'' ∈ SKg \ {msg.tp(t)} s.t. tp'' ≺ tp')}
34:       if fps ≠ ∅ then
35:         SKfp[i] ← SKfp[i] \ fps
36:         SKg ← SKg ∪ fps
37:         send UPT MSG FS2GS, {tp.idtuple | tp ∈ fps} to data site Si

passive updates on their local data structures accordingly. The passive update processing on a relevant data site is described in Algorithm 9. If the downstream message type is UPT MSG GS, it is sent to the updating data site to indicate that the updating tuple and/or some other local points (sent in the Pc set in a UPT MSG CASE 7 or UPT MSG CASE 8 message) are now global skyline points. Therefore, the corresponding tuple identifiers are added to the local SKlg structure (lines 1–2).



Algorithm 9. psvSiteUpdate (Update message msg)
 1: if msg.type = UPT MSG GS then
 2:   SKlg ← SKlg ∪ msg.id set
 3:   Px ← Px \ msg.id set
 4:   SKfp ← SKfp ∪ Px
 5:   Px ← ∅
 6: else if msg.type = UPT MSG FS2GS then
 7:   SKfp ← SKfp \ msg.id set
 8:   SKlg ← SKlg ∪ msg.id set
 9: else if msg.type = UPT MSG GS2FS then
10:   SKlg ← SKlg \ msg.id set
11:   SKfp ← SKfp ∪ msg.id set
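The three branches of Algorithm 9 amount to moving identifiers between two sets. The following Python sketch illustrates this under stated assumptions: the class SiteState, the function psv_site_update and the message type strings written with underscores are hypothetical names chosen for illustration, and Px is taken to hold the identifiers that the site sent upstream in its last local update. It is a sketch, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SiteState:
    """Identifier sets kept on one data site (hypothetical names mirroring the notation)."""
    sk_lg: set = field(default_factory=set)  # SKlg: local tuples that are global skyline points
    sk_fp: set = field(default_factory=set)  # SKfp: local skyline tuples that are false positives
    p_x: set = field(default_factory=set)    # Px: identifiers sent upstream by the last local update


def psv_site_update(state: SiteState, msg_type: str, id_set: set) -> None:
    """Apply one downstream message to the local identifier sets."""
    ids = set(id_set)
    if msg_type == "UPT_MSG_GS":
        # Identifiers confirmed by the server join the global skyline record;
        # whatever was sent upstream but not confirmed is a false positive.
        state.sk_lg |= ids
        state.sk_fp |= state.p_x - ids
        state.p_x = set()
    elif msg_type == "UPT_MSG_FS2GS":
        # Promotion: false positive -> global skyline.
        state.sk_fp -= ids
        state.sk_lg |= ids
    elif msg_type == "UPT_MSG_GS2FS":
        # Demotion: global skyline -> false positive.
        state.sk_lg -= ids
        state.sk_fp |= ids
    else:
        raise ValueError(f"unexpected downstream message type: {msg_type}")
```

In every branch an identifier only moves between sk_lg and sk_fp; none of them removes a tuple from the local skyline altogether.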

However, not all local skyline tuples sent to the server become global skyline points. Some may be eliminated by the server. Accordingly, we need to put the identifiers of such false positive skyline tuples into the local SKfp structure. To ease the processing, we maintain on the updating site a local variable Px = {tp.idtuple | tp ∈ Pc or tp is the updating tuple}. In particular, Px is set after each local update is done. When processing a UPT MSG GS message, the identifiers of the new false positive skyline tuples are obtained as the difference between Px and the identifier set returned in the message (line 3). Those identifiers are merged into SKfp and Px is reset to empty (lines 4–5). If the downstream message type is UPT MSG FS2GS, the involved tuple identifiers are moved from the local SKfp to SKlg, as such tuples are promoted into the global skyline (lines 6–8). If the message type is UPT MSG GS2FS, the involved tuple identifiers are moved from the local SKlg to SKfp, because such tuples are no longer global skyline points (lines 9–11).

It is noteworthy that a downstream message cannot force a tuple in the local SKlg or SKfp to move into the non-skyline set. For local skyline membership to expire, regardless of whether the tuple is in the global skyline or not, the update causing the change must come from the same data site. Such a change is processed before the data site sends an upstream message to the server, as described in Section 5.2.

5.5 Brief Analysis on Algorithm Costs

In this section, we briefly analyze the costs of the algorithms proposed above. Table 4 lists the notation used in this section. Table 5 lists the worst-case costs of the main updating algorithms, where we regard the dominance comparison between two tuples as the crucial operation. Algorithms 2, 3, 4, and 7 need to compare the updating tuple tp with each local skyline tuple in the worst case, which incurs a cost of O(si). In the worst case, Algorithms 5 and 6 need to compare the updating tuple tp with all local tuples in order to find all possible new local skyline points that used to be dominated by the old instance of tp, i.e., set Pc in the algorithms. Furthermore, set Pc is compared against the local skyline to find all those points that used to be dominated solely by the old instance of tp. Therefore, their worst-case cost is O(ri) + O(|Pc| · si).
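The two steps behind the O(ri) + O(|Pc| · si) bound can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, smaller values are assumed to be better on every dimension, and local_skyline is assumed to exclude the old instance of the updating tuple. The sketch also filters the candidates against one another, a step the text above does not spell out, so its second phase can cost slightly more than |Pc| · si comparisons.

```python
def dominates(a, b):
    """a dominates b: no worse on every dimension, strictly better on at least one.
    Assumes smaller values are better."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def newly_exposed(old_tp, new_tp, relation, local_skyline):
    """Return local tuples that become local skyline points once old_tp is replaced by new_tp.

    relation      -- all other local tuples (scanned once: the O(r_i) term)
    local_skyline -- current local skyline without old_tp (the s_i factor)
    """
    # Step 1: candidates Pc are the tuples the old instance dominated
    # and the new instance no longer dominates.
    candidates = [p for p in relation
                  if dominates(old_tp, p) and not dominates(new_tp, p)]

    # Step 2: keep a candidate only if nothing else dominates it.
    exposed = []
    for p in candidates:
        if any(dominates(s, p) for s in local_skyline):
            continue
        if any(dominates(q, p) for q in candidates if q is not p):
            continue
        exposed.append(p)
    return exposed
```

For instance, with old_tp = (1, 1), new_tp = (5, 5), remaining local skyline [(0, 6)] and other local tuples [(2, 2), (3, 3), (0, 6), (6, 6), (2, 7)], only (2, 2) is exposed: (3, 3) is dominated by (2, 2), (2, 7) by (0, 6), and (6, 6) is still dominated by the new instance.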

Table 4. Notation

Nota.   Description
S       Cardinality of SKg
F       Total number of false positive skyline points
ri      Cardinality of Ri on the i-th data site
si      Cardinality of SKlg on the i-th data site
fi      Cardinality of SKfp on the i-th data site

Table 5. Worst-Case Costs

Algorithm   Worst-Case Cost
Alg. 2      O(si)
Alg. 3      O(si)
Alg. 4      O(si)
Alg. 5      O(ri) + O(|Pc| · si)
Alg. 6      O(ri) + O(|Pc| · si)
Alg. 7      O(si)
Alg. 8      O(S · F)

The cost of Algorithm 8 mainly comes from two parts. It needs to compare the updating tuple with all global skyline points, which incurs O(S) in the worst case. Also, in the worst case for a UPT MSG Case 7 or UPT MSG Case 8 message, it needs to compare each false positive skyline point with all global skyline points in order to decide whether a false positive skyline point should be promoted into the global skyline or not. This incurs the cost of O(S · F ). As a result, the worst case cost of Algorithm 8 is O(S) + O(S · F ) = O(S · F ).
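To make the O(S · F) term concrete, the promotion check for Question 3 can be sketched in Python with an explicit comparison counter. The function names and the counter are hypothetical additions for illustration, smaller values are assumed to be better, and global_skyline is assumed to exclude the old instance of the updating tuple.

```python
def dominates(a, b):
    """a dominates b, assuming smaller values are better on every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def promote_false_positives(old_tp, new_tp, false_positives, global_skyline):
    """Return (promoted tuples, comparisons against the global skyline)."""
    promoted = []
    comparisons = 0
    for fp in false_positives:
        # Only tuples released by the update are of interest: dominated by
        # the old instance but not by the new one.
        if not dominates(old_tp, fp) or dominates(new_tp, fp):
            continue
        blocked = False
        for g in global_skyline:
            comparisons += 1
            if dominates(g, fp):
                blocked = True
                break
        if not blocked:
            promoted.append(fp)
    return promoted, comparisons
```

The inner loop runs at most once per global skyline point for each false positive, so the counter never exceeds len(false_positives) * len(global_skyline), which corresponds to the S · F bound.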

6 Experimental Studies

6.1 Experimental Settings

We call the solution proposed in this paper the Global approach, and compare it with two alternatives. In the Naive approach, each data site maintains its relation of up-to-date tuples, and the server maintains the global relation Rg and the global skyline SKg. A data site sends each tuple update to the server, which in turn triggers the global skyline update accordingly. The update is done in two steps: removing dominated points from Rg, and adding qualified points from Rg \ SKg to SKg. In the initialization phase, the server only computes the initial global skyline without obtaining any other information.

In the Local approach, each data site maintains its relation of up-to-date tuples and its local skyline, and the server maintains the global skyline only. A data site only sends to the server a tuple update that changes its local skyline. When the server receives a tuple update, it updates the global skyline by checking the answers to Questions 1, 2, and 3 defined in Section 5.1. Note that as no auxiliary information is maintained on the server, it has to send the necessary messages to the relevant data sites to answer Question 3. We regard the Local approach as part of our contribution, in the sense that it is a weakened version of the Global approach.

We consider three performance metrics in the experiments: (1) bandwidth consumption, where we count the total size of the tuples and identifiers sent between the server and the data sites; (2) server processing time, where we measure the average time spent on updating the results at the server; and (3) site processing time, i.e., the average processing time on all the data sites for maintaining the updates over the simulation period. All the algorithms are implemented in Java 1.6 and the experiments are run on a Linux desktop with an Intel Core 2 Duo CPU @ 1.86 GHz and 2 GB RAM.
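As a point of reference for the Naive baseline, a skyline over the server's full relation can be computed with a simple nested-loop pass, sketched below in Python. The paper does not give the baseline's code, so this is an assumed illustration only: the function names are hypothetical and smaller values are assumed to be better on every dimension.

```python
def dominates(a, b):
    """a dominates b, assuming smaller values are better on every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Return the non-dominated points, in the order they first qualify."""
    result = []
    for p in points:
        # Skip p if a point already kept dominates it.
        if any(dominates(q, p) for q in result):
            continue
        # Otherwise p evicts every kept point it dominates, then joins the result.
        result = [q for q in result if not dominates(p, q)]
        result.append(p)
    return result
```

Running such a pass over the whole global relation after each tuple update touches every stored tuple, which is the kind of server-side work that the incremental maintenance in the Global approach is designed to avoid.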

Table 6. Parameter Settings

Parameter           Settings
Dimensionality      2, 3, 4, 5
# of data sites     100, 200, ..., 500, ..., 800
Data distribution   Random (Indep.), Anti-corr.
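The update workload of this section (100 tuples per site, values normalized to [0, 1], and each update drawn from a Gaussian with standard deviation 0.05 around the tuple's old value) might be generated along the following lines. This Python sketch covers the independent (random) distribution only. The function names, the clamping of perturbed values back into [0, 1] and the choice of updating at least one tuple per round are assumptions made for illustration; the paper does not state how out-of-range values are handled.

```python
import random


def init_site(rng, num_tuples=100, dims=3):
    """Create one site's relation with independent uniform values in [0, 1]."""
    return [[rng.random() for _ in range(dims)] for _ in range(num_tuples)]


def update_round(tuples, rng, ratio=0.01, sigma=0.05):
    """Perturb a `ratio` fraction of the tuples in place; return the updated indices."""
    k = max(1, round(ratio * len(tuples)))      # assumption: at least one update per round
    chosen = rng.sample(range(len(tuples)), k)
    for idx in chosen:
        # Gaussian around the old value, clamped to [0, 1] (assumption).
        tuples[idx] = [min(1.0, max(0.0, rng.gauss(v, sigma))) for v in tuples[idx]]
    return chosen
```

A run would call init_site once per site with a seeded random.Random instance and then call update_round 100 times, forwarding each updated tuple to whichever approach is being measured.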

We fix the local cardinality of each site at 100 tuples, and update all tuples for 100 rounds. All tuple values on each dimension are normalized to the range [0, 1]. At each round, the ratio of tuples that actually get updated is set to 1% by default, and this ratio is also varied. The other experimental parameters are varied according to Table 6. Default setting values are shown in bold. Throughout the experiment, updates of each particular tuple are generated according to a Gaussian distribution with a standard deviation of 0.05 and a mean equal to the tuple's old value.

6.2 Experimental Results

Figure 2 reports the results on the effect of the number of sites on anti-correlated data sets. As the number of data sites increases, all approaches degrade. Regarding bandwidth consumption (Figure 2(a)), the global approach is the best, as it avoids a considerable part of the tuples and messages that would otherwise be sent over the network. Referring to Figure 2(b), the global approach incurs the least server processing time, as it maintains the continuous skyline incrementally and updates the result only when necessary. Referring to Figure 2(c), the global approach incurs slightly more site processing time than the naive approach. This is merely because the latter does no local processing at all, other than sending an updated tuple to the server whenever an update happens locally.

[Figure 2 plots omitted. Anti-correlated data, 3 dimensions; series: Global, Local, Naive; x-axis: Number of Sites. Panels: (a) Bandwidth consumption (kb of data transferred), (b) Server processing time (msec), (c) Site processing time (msec).]

Fig. 2. Effect of Site Number on Anti-correlated Data Sets

Figure 3 reports the results on the effect of tuple dimensionality on anti-correlated data sets. Higher dimensionality increases the costs of all approaches. The global approach still has the lowest bandwidth consumption and server processing time. The gap between it and the naive approach in site processing time becomes apparent, because the local skyline sizes become larger as dimensionality increases.

[Figure 3 plots omitted. Anti-correlated data, 500 sites; series: Global, Local, Naive; x-axis: Tuple Dimension (2 to 5). Panels: bandwidth usage, server processing time, site processing time (msec).]

Fig. 3. Effect of Dimensionality on Anti-correlated Data Sets

The same kinds of results on random data sets are reported in Figures 4 and 5. Regarding bandwidth consumption, as shown in Figures 4(a) and 5(a), the global approach is still the best. However, the local approach catches up with it and even outperforms it, with less server processing time, according to Figures 4(b) and 5(b). Local skyline sizes become smaller on random data sets, and therefore fewer updating tuples are sent to the server, which favors the local approach that updates the global skyline directly on the server side. Referring to Figures 4(c) and 5(c), the global approach reclaims its advantage with a short site processing time. This is because the smaller local skyline sizes reduce the local processing cost of the global approach, whereas the naive approach still needs to report each updating tuple to the server.

[Figure 4 plots omitted. Random data, 3 dimensions; series: Global, Local, Naive; x-axis: Number of Sites. Panels: bandwidth usage, server processing time, site processing time.]

Fig. 4. Effect of Site Number on Random Data Sets

[Figure 5 plots omitted. Random data, 500 sites; series: Global, Local, Naive; x-axis: Tuple Dimension (2 to 5). Panels: bandwidth usage, server processing time, site processing time.]

Fig. 5. Effect of Dimensionality on Random Data Sets


We also varied the tuple updating ratio from 0.1% to 1%, and observed similar trends across these ratios. Due to space limitations, we omit those results here.

7 Conclusion

In this paper we address continuous skyline monitoring in distributed environments. We target a generic type of computing environment with two tiers: a server as the query interface and multiple data sites, each managing a number of dynamic data tuples. Our solution consists of two phases: initialization and maintenance. We propose a complete set of techniques in order to maintain the continuous skyline results efficiently. First, in the initialization phase, the initial query result is obtained and the necessary membership information is initialized on both tiers. Second, a comprehensive case study is conducted to disclose the minimal skyline changes under dynamic data updates. Third, an effective two-tier collaboration is proposed to process possible skyline changes and to update the query results continuously in an incremental manner. The results of extensive experiments demonstrate that our proposal is efficient and scalable in terms of both communication costs and processing costs.

Acknowledgments. Yongluan Zhou is partially supported by research grant 09-073281 from the Danish Council for Independent Research, Natural Sciences (FNU).

References

1. SensorScope Project, http://sensorscope.epfl.ch/
2. Babcock, B., Olston, C.: Distributed top-k monitoring. In: Proc. SIGMOD, pp. 28–39 (2003)
3. Balke, W.-T., Guentzer, U., Zheng, J.X.: Efficient distributed skylining for web information systems. In: Bertino, E., Christodoulakis, S., Plexousakis, D., Christophides, V., Koubarakis, M., Böhm, K., Ferrari, E. (eds.) EDBT 2004. LNCS, vol. 2992, pp. 256–273. Springer, Heidelberg (2004)
4. Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: Proc. ICDE, pp. 421–430 (2001)
5. Chomicki, J., Godfrey, P., Gryz, J., Liang, D.: Skyline with presorting. In: Proc. ICDE, pp. 717–719 (2003)
6. Dellis, E., Seeger, B.: Efficient computation of reverse skyline queries. In: Proc. VLDB, pp. 291–302 (2007)
7. Godfrey, P., Shipley, R., Gryz, J.: Maximal vector computation in large data sets. In: Proc. VLDB, pp. 229–240 (2005)
8. Huang, Z., Jensen, C.S., Lu, H., Ooi, B.C.: Skyline queries against mobile lightweight devices in MANETs. In: Proc. ICDE, p. 66 (2006)
9. Huang, Z., Lu, H., Ooi, B.C., Tung, A.K.H.: Continuous skyline queries for moving objects. TKDE 18(12), 1645–1658 (2006)
10. Kossmann, D., Ramsak, F., Rost, S.: Shooting stars in the sky: An online algorithm for skyline queries. In: Proc. VLDB, pp. 275–286 (2002)
11. Lee, K.C.K., Zheng, B., Li, H., Lee, W.-C.: Approaching the skyline in z order. In: Proc. VLDB, pp. 279–290 (2007)
12. Lin, X., Yuan, Y., Wang, W., Lu, H.: Stabbing the sky: Efficient skyline computation over sliding windows. In: Proc. ICDE, pp. 502–513 (2005)
13. Lin, X., Yuan, Y., Zhang, Q., Zhang, Y.: Selecting stars: The k most representative skyline operator. In: Proc. ICDE, pp. 86–95 (2007)
14. Morse, M.D., Patel, J.M., Jagadish, H.V.: Efficient skyline computation over low-cardinality domains. In: Proc. VLDB, pp. 267–278 (2007)
15. Mouratidis, K., Hadjieleftheriou, M., Papadias, D.: Conceptual partitioning: An efficient method for continuous nearest neighbor monitoring. In: Proc. SIGMOD, pp. 634–645 (2005)
16. Papadias, D., Tao, Y., Fu, G., Seeger, B.: An optimal and progressive algorithm for skyline queries. In: Proc. SIGMOD, pp. 467–478 (2003)
17. Pei, J., Jiang, B., Lin, X., Yuan, Y.: Probabilistic skylines on uncertain data. In: Proc. VLDB, pp. 15–26 (2007)
18. Tan, K.L., Eng, P.K., Ooi, B.C.: Efficient progressive skyline computation. In: Proc. VLDB, pp. 301–310 (2001)
19. Tao, Y., Papadias, D.: Maintaining sliding window skylines on data streams. TKDE 18(3), 377–391 (2006)
20. Wu, P., Agrawal, D., Eğecioğlu, Ö., Abbadi, A.E.: Deltasky: Optimal maintenance of skyline deletions without exclusive dominance region generation. In: Proc. ICDE, pp. 486–495 (2007)
21. Wu, P., Zhang, C., Feng, Y., Zhao, B.Y., Agrawal, D., Abbadi, A.E.: Parallelizing skyline queries for scalable distribution. In: Ioannidis, Y., Scholl, M.H., Schmidt, J.W., Matthes, F., Hatzopoulos, M., Böhm, K., Kemper, A., Grust, T., Böhm, C. (eds.) EDBT 2006. LNCS, vol. 3896, pp. 112–130. Springer, Heidelberg (2006)
22. Yiu, M.L., Mamoulis, N.: Efficient processing of top-k dominating queries on multidimensional data. In: Proc. VLDB, pp. 483–494 (2007)
23. Zhang, Z., Cheng, R., Papadias, D., Tung, A.K.H.: Minimizing the communication cost for continuous skyline maintenance. In: Proc. SIGMOD, pp. 495–508 (2009)
24. Zhu, L., Tao, Y., Zhou, S.: Distributed skyline retrieval with low bandwidth consumption. TKDE 21(3), 384–400 (2009)
25. Zhou, Y., Yan, Y., Yu, F., Zhou, A.: PMJoin: Optimizing Distributed Multi-way Stream Joins by Stream Partitioning. In: Li Lee, M., Tan, K.-L., Wuwongse, V. (eds.) DASFAA 2006. LNCS, vol. 3882, pp. 325–341. Springer, Heidelberg (2006)