A Framework for Processing Cumulative Frequency Queries over Medical Data Streams

Ahmed Al-Shammari(1,2), Rui Zhou(1), Chengfei Liu(1), Mehdi Naseriparsa(1), and Bao Quoc Vo(1)

(1) Swinburne University of Technology, Melbourne, Australia
(2) University of Al-Qadisiyah, Al Diwaniyah, Iraq
{aalshammari, rzhou, cliu, mnaseriparsa, bvo}@swin.edu.au

Abstract. Processing medical data streams is becoming increasingly important because it extracts critical information from a continuous flow of patient data. Various problems have been studied on medical data streams, such as classification, clustering, and anomaly detection; however, the efficient evaluation of cumulative frequency queries has not been well studied. The cumulative frequency of patients' status can play an instrumental role in monitoring patients' health conditions. Efficiently processing cumulative frequency queries on medical data streams remains a challenging task due to the large volume of incoming data. In this paper, we therefore propose a novel framework for processing cumulative frequency queries over medical data streams to support online medical decision-making. The proposed framework includes two components: data summarisation and dynamic maintenance. For data summarisation, we propose a hybrid approach that combines two data structures and exploits a classification algorithm to select the more efficient data structure for computing the cumulative frequency. For dynamic maintenance, we propose an incremental approach for updating the cumulative frequencies when new data arrive. Experimental results on a real dataset demonstrate the efficiency of the proposed approach.

Keywords: Medical data streams, Cumulative frequency query, Binary Indexed Tree, Dynamic maintenance

1 Introduction

Data summarisation is an important data mining technique that aims to generate a concise representation of the underlying data [14,1]. The summarised data are compact yet still informative [9,2]. More recently, summarising medical streaming data has become a useful task in healthcare information systems, because the summary contains critical information about patients' health status. For instance, in clinical biostatistics, the cumulative frequency is often used to determine the number of observations that lie above or below a specific value, or between two bounding values [5,11]. Such counts can characterise vital signs such as blood pressure and electrocardiogram data. Consider, for example, the case where patients' data are recorded by different measurement devices. These raw data are then transferred to a medical centre for further analysis and are finally presented to medical professionals.

During this process, measurement devices capture the physical status of the human body, such as blood flow, blood pressure, and electrocardiogram signals, at very short intervals. To monitor patients' health status effectively, medical professionals need to check patients' health-related data over time. For monitoring purposes, a doctor may submit a cumulative frequency query that specifies lower and upper bound values; the system then retrieves the cumulative frequency of the relevant measurements within that range during a certain time interval. However, processing cumulative frequency queries on medical data streams is challenging because of the large volume of continuously arriving patient data. For instance, to monitor a patient with heart disease, a measurement device generates readings very frequently (every half a minute), producing a vast amount of streaming data that are technically difficult to process.

In this paper, we address two challenges in processing cumulative frequency queries on medical streaming data. First, an index structure is needed that can calculate cumulative frequencies efficiently. Second, an incremental maintenance approach is needed to keep these indexes up to date in a timely manner. We therefore propose a new framework for processing cumulative frequency queries on medical streaming data, which includes two components: (1) data summarisation and (2) dynamic maintenance. The main contributions of this paper are summarised as follows:

– We propose a novel framework for processing cumulative frequency queries over medical data streams. The framework includes a hybrid approach that combines two data structures and exploits a classification algorithm to minimise computation.
– We further propose an incremental maintenance approach for updating the data summaries in a sliding time window.
– We validate the proposed approaches with experiments on a real-world dataset and demonstrate their efficiency.

The rest of the paper is organised as follows: Section 2 presents the problem formulation, followed by the proposed solution in Section 3. Section 4 presents the experimental results. Related work is discussed in Section 5, and Section 6 concludes the paper.

2 Problem Formulation

We first show how a medical data stream is modelled and then define the necessary preliminary terms. Finally, we formally define the problem studied in this paper.

Definition 1. (Medical Data Stream): A medical data stream S is defined as a sequence of tuples $S = \langle s_1, t_1\rangle, \langle s_2, t_2\rangle, \ldots, \langle s_n, t_n\rangle$, where $s_i$ is a set of data values, $s_i.x_j$ denotes the measurement value of $s_i$ on the jth dimension/symptom, and $t_i$ is the timestamp associated with $\langle s_i, t_i\rangle$.

Definition 2. (Time/Tuple-based Window): A window w is a range that bounds the flow of a data stream. A time-based window is specified in time units, whereas a tuple-based window is specified by a number of tuples.

In this paper, we use time-based windows to manage the flow of medical data streams; tuple-based windows can be handled similarly.

Definition 3. (Frequency): The frequency $f(s_i.x_j)$ is the number of times that the attribute value $s_i.x_j$ appears within a specified time window w.

In practice, many medical applications need to check the total number of occurrences of measurement values within a range pre-specified by a doctor so that patients' status can be monitored. Whether a patient's status is normal or risky is usually determined by health-related boundaries, so there is a need to specify an interval and monitor the cumulative frequency of a particular attribute's values within it.

Definition 4. (Cumulative Frequency Query): Given an interval [l, u], a cumulative frequency query retrieves the sum of the frequencies of all values within the range [l, u], which can be expressed by the following formula:

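$$cf([l, u]) = \sum_{i=1}^{n} f(s_i.x), \quad s_i.x \in [l, u] \qquad (1)$$

where the sum ranges over the distinct attribute values $s_i.x$ in the window that fall within [l, u].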
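Definitions 3 and 4 and Formula (1) can be illustrated with a minimal Python sketch. It assumes, for illustration only, that each tuple carries a single measurement value and a numeric timestamp in seconds, and that a time-based window covers the timestamps in [t_start, t_end); the helper names and the readings are hypothetical.

```python
from collections import Counter


def window_values(stream, t_start, t_end):
    """Values of tuples whose timestamp lies in the time window [t_start, t_end)."""
    return [x for (x, t) in stream if t_start <= t < t_end]


def frequencies(stream, t_start, t_end):
    """Definition 3: f(v) = number of times value v appears in the window."""
    return Counter(window_values(stream, t_start, t_end))


def cumulative_frequency(stream, t_start, t_end, l, u):
    """Formula (1): sum of f(v) over the distinct values v with l <= v <= u."""
    f = frequencies(stream, t_start, t_end)
    return sum(count for v, count in f.items() if l <= v <= u)


# Illustrative (value, timestamp) pairs, one reading every 30 seconds
stream = [(95, 0), (120, 30), (95, 60), (99, 90), (95, 120),
          (100, 150), (64, 180), (66, 210), (95, 240), (77, 270)]

print(cumulative_frequency(stream, 0, 300, 90, 100))   # 6
print(cumulative_frequency(stream, 120, 300, 60, 80))  # 3 (values 64, 66, 77)
```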

Medical data streams evolve over time, and cumulative frequency results are likely to change accordingly. Because doctors' cumulative frequency queries may be continuous, answers should be provided as soon as possible, so the cumulative frequency summaries should be maintained incrementally as new data arrive. The maintenance of the cumulative frequencies C includes inserting new data stream values, updating the cumulative frequency values in $\{cf_1, \ldots, cf_n\}$, and deleting expired data stream values. We now summarise the problem studied in this paper:

Problem Definition. Given a medical data stream S, a set of cumulative frequency queries $Q = \{q_1, q_2, \ldots, q_n\}$, and their corresponding windows $W = \{w_1, w_2, \ldots, w_n\}$, the task is to efficiently retrieve and maintain the cumulative frequencies $C = \{cf_1, cf_2, \ldots, cf_n\}$ of the patients' data for the corresponding time windows W.

3 The Proposed Solution

This section describes the technical aspects of the proposed framework. Figure 1 shows its two main components: (1) data summarisation and (2) dynamic maintenance. In the first component, we discuss how to process cumulative frequency queries. In the second, we discuss how to maintain the cumulative frequency results as new stream items arrive and old stream items expire. The technical details of data summarisation and dynamic maintenance are presented in Sections 3.1 and 3.2, respectively.

Fig. 1: The components of the proposed framework (input: data streams; Data Summarisation: classifier, Binary Indexed Tree, baseline, Table1 to Tablen; Dynamic Maintenance: index initialisation, updating the index)

3.1 Data Summarisation

The first component of the proposed framework is data summarisation, and computing the cumulative frequency of data streams is one such summarisation process. We propose a new approach for processing cumulative frequency queries on medical data streams efficiently. Specifically, our approach combines a Binary Indexed Tree (BIT) based approach and a baseline approach, and we build a classifier to predict which of the two is better for computing the cumulative frequency cf on the medical data streams. Since the incoming data may span a small or a large range of values, we take the value range of the data streams into consideration and classify it into two categories: small and large. The BIT approach is employed to efficiently calculate the cumulative frequency of large query ranges, whereas the baseline approach is employed for small query ranges. The technical details of the baseline, BIT, and hybrid approaches are presented in Sections 3.1.1, 3.1.2, and 3.1.3, respectively.

3.1.1 Baseline Approach

The baseline approach employs an array A to store and update the frequencies of the values appearing within a specific time window of a medical data stream. For example, assume that B is the bag of attribute values in a specific time window, e.g., B = {95, 120, 95, 99, 95, 100, 64, 66, 95, 77}; then the value frequencies stored in A are f(64) = 1, f(66) = 1, ..., f(95) = 4, ..., f(100) = 1, f(120) = 1. Note that, for an absent value, the frequency field in A is 0, e.g., f(65) = 0. To compute the cumulative frequency of a value range [l, u], we apply a linear scan on A and aggregate the frequency values from f(l) to f(u) using Formula (1). Because of the linear scan, this process requires O(n) time, where n is the number of possible values. When a data value enters or leaves the window, the corresponding frequency can be updated in O(1) time. The baseline approach therefore has a cheap update and works well when the data value range is small. However, its O(n) query cost is unsatisfactory when the data value range is large, e.g., when the value range is enlarged by dividing data values into smaller intervals to provide better precision.
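A minimal sketch of the baseline array, assuming integer-valued measurements in a known domain [lo, hi] with an increment step of 1. The class name BaselineCounter and the domain bounds are illustrative assumptions, not part of the original design; the example reuses the bag B from the text above.

```python
class BaselineCounter:
    """Array-based frequency table A over the value domain [lo, hi] (step 1).

    Update: O(1). Range query: O(n), where n = hi - lo + 1 possible values.
    """

    def __init__(self, lo, hi):
        self.lo = lo
        self.freq = [0] * (hi - lo + 1)  # freq[v - lo] = f(v); absent values stay 0

    def add(self, value, delta=1):
        """A value enters (delta=+1) or leaves (delta=-1) the window."""
        self.freq[value - self.lo] += delta

    def f(self, value):
        return self.freq[value - self.lo]

    def cumulative(self, l, u):
        """Linear scan aggregating f(l) + ... + f(u)."""
        return sum(self.freq[l - self.lo : u - self.lo + 1])


B = [95, 120, 95, 99, 95, 100, 64, 66, 95, 77]
A = BaselineCounter(0, 200)  # assumed domain of integer readings
for v in B:
    A.add(v)

print(A.f(95), A.f(65))       # 4 0
print(A.cumulative(64, 100))  # 9
A.add(95, -1)                 # one occurrence of 95 expires from the window
print(A.cumulative(64, 100))  # 8
```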

Position in BIT   Binary representation   Frequencies stored (value range)
1                 0001                    [1]
2                 0010                    [1,2]
3                 0011                    [3]
4                 0100                    [1,4]
5                 0101                    [5]
6                 0110                    [5,6]
7                 0111                    [7]
8                 1000                    [1,8]

Fig. 2: The structure of BIT

3.1.2 Binary Indexed Tree Approach

Binary Indexed Tree (BIT) [7] is a data structure that accelerates the retrieval of the cumulative frequency of a given range, calculating it in O(log n) time. The core idea of BIT is that each array element stores the cumulative frequency of a specific range: each entry stores the sum of the frequencies of a block of values whose length is a power of two. Fig. 2 shows the frequencies stored in a BIT array, assuming data values start from 1. In this array, bit[1] stores the frequency of arr[1]; bit[2] stores the frequencies of arr[1] and arr[2]; bit[3] stores the frequency of arr[3]; and bit[4] stores the cumulative frequency of arr[1] to arr[4]. In general, bit[i] covers the values from i - 2^k + 1 to i, where 2^k is the lowest set bit of i. As a result, given a value n, the BIT can calculate the cumulative frequency of the values in [1, n] by summing one entry for each set bit of n; since the binary representation of n has ⌊log2 n⌋ + 1 bits, at most that many entries are visited. To process a cumulative frequency query q = [l, u], the BIT computes the prefix sum P(l - 1), which covers the values in [1, l - 1], and the prefix sum P(u), which covers the values in [1, u]. The difference P(u) - P(l - 1) is then the cumulative frequency of the range [l, u]. Note that, for ease of illustration, Fig. 2 adopts 1 as the starting value and 1 as the increment step between two adjacent values. In practice, the starting value can be arbitrary, and the increment step can be set to a larger or smaller value depending on users' needs. A mapping function takes a measured value as input and locates the BIT array entry that records the occurrence of that value.
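The BIT operations described above can be sketched in Python as follows. FenwickTree implements the standard update and prefix-sum traversals (i & -i isolates the lowest set bit), and ValueBIT is a hypothetical mapping layer that assumes a linear mapping pos = round((v - start) / step) + 1 from measured values to BIT positions; the paper does not specify its mapping function, and the domain and step below are illustrative.

```python
class FenwickTree:
    """Binary Indexed Tree over positions 1..n.

    tree[i] stores the sum of f over positions i - lowbit(i) + 1 .. i,
    where lowbit(i) = i & -i is the lowest set bit of i (see Fig. 2).
    """

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # index 0 is unused

    def update(self, i, delta):
        """Add delta to f(i); touches O(log n) entries."""
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i  # next entry whose range also covers position i

    def prefix(self, i):
        """P(i) = f(1) + ... + f(i); visits one entry per set bit of i."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # drop the lowest set bit
        return total

    def range_sum(self, l, u):
        """Cumulative frequency of positions l..u: P(u) - P(l - 1)."""
        return self.prefix(u) - self.prefix(l - 1)


class ValueBIT:
    """Hypothetical mapping layer: value v -> position round((v - start) / step) + 1."""

    def __init__(self, start, stop, step):
        self.start, self.step = start, step
        self.bit = FenwickTree(self.pos(stop))

    def pos(self, value):
        return round((value - self.start) / self.step) + 1

    def add(self, value, delta=1):
        self.bit.update(self.pos(value), delta)

    def cumulative(self, l, u):
        return self.bit.range_sum(self.pos(l), self.pos(u))


# Ranges covered by BIT positions 1..8, matching Fig. 2
print({i: (i - (i & -i) + 1, i) for i in range(1, 9)})
# {1: (1, 1), 2: (1, 2), 3: (3, 3), 4: (1, 4), 5: (5, 5), 6: (5, 6), 7: (7, 7), 8: (1, 8)}

bit = ValueBIT(start=40, stop=200, step=0.5)  # assumed value domain and precision
for v in [95, 120, 95, 99, 95, 100, 64, 66, 95, 77]:
    bit.add(v)
print(bit.cumulative(64, 100))  # 9
print(bit.cumulative(95, 95))   # 4
```

A half-unit step is used here only to show that the increment step can differ from 1; the query cost stays logarithmic in the number of BIT positions.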

3.1.3 Hybrid Approach

The baseline approach uses a linear scan on an array to retrieve the cumulative frequency of a user-specified value range. However, this incurs a high computational cost when the range is large and the interval increment is small. To improve performance, the BIT approach exploits an index to retrieve the cumulative frequency in O(log n) time. However, the BIT approach does not excel when the query range is small: in such cases, its overhead means it is not significantly faster than the baseline approach and may even be slower. To take advantage of both approaches, we propose a hybrid approach that employs a classifier to select the proper approach for processing a cumulative frequency query. A Decision Tree (DT) classification algorithm decides the more efficient approach based on probability estimation. DT is one of the simplest and most effective classification algorithms, and it is capable of learning from streaming data [13]. Its core idea is to choose optimal splitting attributes by estimating some statistics. The algorithm has two phases: (a) a training phase and (b) a testing phase. In the training phase, a decision tree is built from the inputs, outputs, and training medical data streams. To build the model, DT uses the entropy function

$$E(P_1, P_2) = -\sum_{i=1}^{2} P_i \log P_i$$

to decide on which input features the data will be split, where $P_1$ is the probability that an instance from S results in outcome 0 and $P_2$ is the probability that it results in outcome 1. We design the feature vector $\vec{v}$ to include three main features: the mean $m = \frac{1}{n}\sum_{i=1}^{n} s.x_i$, the variance $r = \frac{1}{n}\sum_{i=1}^{n} (s.x_i - m)^2$, and a set of cumulative frequency queries Q.
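As a sketch of the classifier inputs, the following computes the feature vector and the two-outcome entropy. Assumptions: the logarithm is base 2 (the base is not stated above), the variance divides by n as in the formula, and a query is encoded by its bounds (l, u); the function names are hypothetical.

```python
import math


def mean(xs):
    """m = (1/n) * sum of s.x_i over the window values."""
    return sum(xs) / len(xs)


def variance(xs, m):
    """r = (1/n) * sum of (s.x_i - m)^2 (population variance)."""
    return sum((x - m) ** 2 for x in xs) / len(xs)


def feature_vector(xs, query):
    """v = (m, r, l, u): window statistics plus the query bounds (assumed encoding)."""
    m = mean(xs)
    l, u = query
    return (m, variance(xs, m), l, u)


def entropy(labels):
    """E(P1, P2) = -(P1 log2 P1 + P2 log2 P2) for outcomes 0 (baseline) and 1 (BIT)."""
    n = len(labels)
    e = 0.0
    for outcome in (0, 1):
        p = labels.count(outcome) / n
        if p > 0:  # treat 0 * log 0 as 0
            e -= p * math.log2(p)
    return e


window = [95, 120, 95, 99, 95, 100, 64, 66, 95, 77]
print(feature_vector(window, (55, 75)))  # mean 90.6, variance about 259.84
print(entropy([0, 0, 1, 1]), entropy([1, 1, 1, 1]))  # 1.0 0.0
```

An entropy of 1.0 means the labels are evenly split between the two approaches; 0.0 means one approach always wins, so no further split is needed.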

Algorithm 1: Hybrid Approach
Input: Data stream S, cumulative frequency query q = [l, u]
Output: Cumulative frequency cf
1  S_train ← getSubset(S)
2  S_test ← S \ S_train
3  DT ← Train(S_train)
4  if Test(DT) then
5      m ← calculateAvg(s.x)
6      r ← calculateVar(s.x, m)
7      v ← createVector(m, r, q)
8      y ← classify(DT, v)
9      if y == 0 then
10         use baseline to retrieve cf for q
11     else
12         use BIT to retrieve cf for q
13 else
14     go to line 1
15 return cf
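The dispatch step of Algorithm 1 (lines 5-12) can be sketched as below. The trained decision tree is replaced by a hypothetical stand-in rule on the query width with an arbitrary threshold, purely so that the example runs; training and testing (lines 1-4 and 13-14) are omitted, and measured values are assumed to be integers used directly as BIT positions.

```python
from statistics import fmean, pvariance


def build_bit(freq):
    """Linear-time BIT construction from a 1-indexed frequency array (freq[0] unused)."""
    tree = list(freq)
    n = len(tree) - 1
    for i in range(1, n + 1):
        parent = i + (i & -i)  # the next entry whose range contains position i
        if parent <= n:
            tree[parent] += tree[i]
    return tree


def bit_prefix(tree, i):
    total = 0
    while i > 0:
        total += tree[i]
        i -= i & -i
    return total


def hybrid_cf(xs, freq, tree, q, classify):
    """Lines 5-12 of Algorithm 1: build the feature vector, classify, dispatch."""
    l, u = q
    m = fmean(xs)          # line 5: mean of the window values
    r = pvariance(xs, m)   # line 6: variance (divides by n)
    v = (m, r, l, u)       # line 7: feature vector (assumed encoding of q)
    y = classify(v)        # line 8: 0 -> baseline, 1 -> BIT
    if y == 0:
        return sum(freq[l:u + 1]), "baseline"  # linear scan over A
    return bit_prefix(tree, u) - bit_prefix(tree, l - 1), "BIT"


def stand_in_classifier(v):
    """Hypothetical placeholder for the trained DT: use the BIT for wide queries."""
    _, _, l, u = v
    return 1 if u - l >= 20 else 0  # arbitrary threshold, for illustration only


window = [95, 120, 95, 99, 95, 100, 64, 66, 95, 77]
freq = [0] * 201  # values 1..200 used directly as positions
for x in window:
    freq[x] += 1
tree = build_bit(freq)

print(hybrid_cf(window, freq, tree, (55, 60), stand_in_classifier))   # (0, 'baseline')
print(hybrid_cf(window, freq, tree, (60, 100), stand_in_classifier))  # (9, 'BIT')
```

Both branches return the same count for any query; the classifier only decides which structure answers it more cheaply.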

Algorithm 1 presents the steps for processing a cumulative frequency query using the hybrid approach. In lines 1-2, we prepare the training and testing data. In lines 5-7, we create the feature vector $\vec{v}$. In lines 9-12, we select the more efficient approach for processing the cumulative frequency query based on the output y of the DT classifier in line 8: if y is 0, we use the baseline approach; otherwise, we use the BIT approach. In line 14, if the test is not successful, we return to line 1 and repeat the training phase. In line 15, we return the cumulative frequency cf for the specified q.

3.2 Dynamic Maintenance

The second component of the proposed framework is dynamic maintenance. When the data stream slides into a new time window w, the cumulative frequencies stored in the BIT must be updated incrementally. The BIT approach maintains the cumulative frequencies in two primary steps: (1) index initialisation and (2) index update. First, we initialise the BIT index when the data of the first time window w arrive. To build the index, we extract the distinct attribute values in window w and compute the frequency f of each value; we then insert these values and their corresponding frequencies into the BIT using its insertion operation. Second, the cumulative frequencies of some existing attribute values may change in a new time window, because more instances of a value may arrive in the window or some instances may exit from it. To update the index structure efficiently, we maintain the index of cumulative frequencies using two operations:

– Increment: increments the cumulative frequency associated with the attribute value s.x of a new instance ⟨s, t⟩ in the current time window wc.
– Decrement: decrements the cumulative frequency associated with the attribute value s.x of an old instance ⟨s, t⟩ that was in the old window wo but has exited the current time window wc.

Algorithm 2: BIT Maintenance
Input: Old window wo, current window wc, index B
Output: Updated index B
1 foreach stream item ⟨s, t⟩ ∈ wc − wo do
2     Increment: f(s.x) = f(s.x) + 1 in B
3 foreach stream item ⟨s, t⟩ ∈ wo − wc do
4     Decrement: f(s.x) = f(s.x) − 1 in B
5 return B

Algorithm 2 presents the maintenance steps of the BIT approach. In lines 1-2, we use the increment operation to add the values of new stream items to the index. In lines 3-4, we use the decrement operation to remove the values of expired stream items. In line 5, we return the updated index B.
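A minimal sketch of Algorithm 2 for a time-based sliding window follows. It assumes timestamps arrive in non-decreasing order, integer values already mapped to BIT positions 1..n, and a window of the form (now - width, now]. Instead of computing the set differences wc - wo and wo - wc explicitly, it applies the increment when an item arrives and the decrement when an item expires, which yields the same per-item updates; the class name SlidingWindowBIT and the window width are hypothetical.

```python
from collections import deque


class FenwickTree:
    """Minimal BIT over positions 1..n (update and prefix sum in O(log n))."""

    def __init__(self, n):
        self.n, self.tree = n, [0] * (n + 1)

    def update(self, i, delta):
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix(self, i):
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total


class SlidingWindowBIT:
    """Keeps a BIT consistent with a time-based window (now - width, now]."""

    def __init__(self, n_values, width):
        self.bit = FenwickTree(n_values)
        self.width = width
        self.items = deque()  # (value, timestamp) pairs currently in the window

    def arrive(self, value, t):
        """Increment: item <s, t> enters the current window, then expire old items."""
        self.items.append((value, t))
        self.bit.update(value, +1)
        self.expire(t)

    def expire(self, now):
        """Decrement: items with timestamp <= now - width have left the window."""
        while self.items and self.items[0][1] <= now - self.width:
            value, _ = self.items.popleft()
            self.bit.update(value, -1)

    def cumulative(self, l, u):
        return self.bit.prefix(u) - self.bit.prefix(l - 1)


w = SlidingWindowBIT(n_values=300, width=120)  # 2-minute window (assumed)
for value, t in [(95, 0), (120, 30), (95, 60), (99, 90), (95, 120), (100, 150), (64, 180)]:
    w.arrive(value, t)
print([v for v, _ in w.items])  # [99, 95, 100, 64]
print(w.cumulative(90, 100))    # 3
```

Each arrival or expiry costs O(log n), so the index never has to be rebuilt when the window slides.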

4 Experimental Results

This section reports the experimental results for three approaches: (a) baseline, (b) BIT, and (c) hybrid. We conducted experiments on a real medical dataset, the Cuff-Less Blood Pressure Estimation dataset¹. The dataset is stored in MATLAB v7.3 format, and its total size is 3.17 GB. It includes four subsets, each containing 3000 records. The experiments are implemented in Java and executed on an Intel Core i5-3570 CPU at 3.40 GHz. We evaluate the performance of the proposed approaches by varying the size of the streaming data and the query bounds.

4.1 Varying the Data Stream Size

Fig. 3 presents the execution time of the proposed approaches with varying data stream sizes. We set the cumulative frequency query to q = [55, 75] for this set of experiments and use four subsets of data streams with different sizes, ordered as follows: Subset 3 < Subset 1 < Subset 4 < Subset 2. As the size of the data streams increases from 500 to 3000 records in steps of 500, Subset 2, which is the largest, requires the longest execution time of all subsets for all approaches. Conversely, Subset 3, which is the smallest, requires the shortest execution time for all approaches.

¹ https://archive.ics.uci.edu/ml/datasets/Cuff-Less+Blood+Pressure+Estimation

Fig. 3: Execution time for the baseline, BIT and hybrid approaches with varying data stream sizes. Panels: (a) Subset 1, (b) Subset 2, (c) Subset 3, (d) Subset 4; x-axis: size of data streams (500 to 3000 records); y-axis: execution time in milliseconds.

The execution time of all approaches grows linearly as the size of the data streams increases. Moreover, the hybrid approach shows the best performance compared with the other two approaches, because its decision tree classifier effectively selects the more efficient approach for each query. We also observe that the data stream size has the least effect on the performance of the hybrid approach.

4.2 Varying the Query Bounds

Fig. 4 presents the execution time of the proposed approaches with varying query bounds. We compare the baseline, BIT and hybrid approaches in processing the cumulative frequency of blood pressure values. The query bounds are categorised into four groups: (1) small (q = [55, 60]), (2) medium (q = [60, 70]), (3) large (q = [70, 90]), and (4) very large (q = [90, 130]). From Fig. 4(a), we observe that the baseline approach performs close to, and in some cases better than, the BIT approach. This is because, when the query bounds are small, the computational overhead of the BIT degrades its performance. Conversely, when the query bounds are larger, the BIT outperforms the baseline approach, as shown in Fig. 4(c) and Fig. 4(d). We also observe that the hybrid approach outperforms both the baseline and BIT approaches in all cases; the hybrid approach is therefore the best approach in all tested settings.

5 Related Work

This section highlights the main findings of studies related to the binary indexed tree and range query processing [15]. Most of the related work focuses on the Binary Indexed Tree (BIT), a data structure introduced by Fenwick [7] for efficiently calculating and maintaining cumulative frequencies. Dima et al. [6] employed the binary indexed tree to solve the Range Minimum Query (RMQ) problem.

Fig. 4: Execution time for the baseline, BIT and hybrid approaches with varying query bounds. Panels: (a) Small, (b) Medium, (c) Large, (d) Very Large; x-axis: query bounds (range queries); y-axis: execution time in milliseconds.

Dima et al. also showed that the binary indexed tree is faster than Segment/Range Trees and the Sparse Table algorithm. Bille et al. [3] proposed two succinct versions of the binary indexed tree for solving the partial sums problem. The first requires nk + O(n) bits of space while supporting the sum and update operations. The second achieves optimal time for both operations at the cost of increasing the space usage to nk + O(nk) bits. Han et al. [8] used a BIT to estimate the number of join results to support load shedding over data streams. Mladenović et al. [12] proposed a variable neighbourhood search approach to solve the pickup-and-delivery travelling salesman problem, using the binary indexed tree data structure to update and check solutions in the neighbourhoods. Tree-based index structures also play a significant role in answering range queries [4,10]. Zhu et al. [16] proposed a new binary index, called the binary obstructed tree (OB-tree), for indexing composite items in obstructed space. The basic idea of the OB-tree is to divide the obstructed space into non-obstructed subspaces to efficiently retrieve highly qualified candidates for range-based obstructed nearest neighbour (RONN) search. In contrast, this paper computes the cumulative frequency between the two bounds of a given query. Our review of the literature reveals that none of the existing approaches employs the BIT for processing cumulative frequency queries in a streaming environment.

6 Conclusion

In this paper, we propose a novel framework for processing cumulative frequency queries over medical data streams. The framework includes two main components: data summarisation and dynamic maintenance. In the first component, we propose a hybrid approach that combines the baseline and Binary Indexed Tree (BIT) approaches to answer cumulative frequency queries. In the second component, the hybrid approach maintains the cumulative frequency summaries by initialising and updating the BIT index. Experimental results show that the hybrid approach improves, on average, the performance of processing cumulative frequency queries. For future work, we will consider further improvements to the hybrid approach to make it more effective at retrieving and maintaining continuous cumulative frequency queries over very fast data streams.

Acknowledgement. This work was partially supported by the ARC Discovery Project under Grants No. DP170104747 and DP180100212.

References

1. A. M. Abbas, A. A. Bakar, and M. Z. Ahmad. Fast dynamic clustering SOAP messages based compression and aggregation model for enhanced performance of web services. Journal of Network and Computer Applications, 41:80–88, 2014.
2. A. Al-Shammari, C. Liu, M. Naseriparsa, B. Q. Vo, T. Anwar, and R. Zhou. A framework for clustering and dynamic maintenance of XML documents. In International Conference on Advanced Data Mining and Applications, pages 399–412. Springer, 2017.
3. P. Bille, A. R. Christiansen, N. Prezza, and F. R. Skjoldjensen. Succinct partial sums and Fenwick trees. In International Symposium on String Processing and Information Retrieval, pages 91–96. Springer, 2017.
4. L. Chen, Y. Gao, X. Li, C. S. Jensen, G. Chen, and B. Zheng. Indexing metric uncertain data for range queries. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 951–965. ACM, 2015.
5. L. Chen, Y. Gao, A. Zhong, C. S. Jensen, G. Chen, and B. Zheng. Indexing metric uncertain data for range queries and range joins. The VLDB Journal, 26(4):585–610, 2017.
6. M. Dima and R. Ceterchi. Efficient range minimum queries using binary indexed trees. Olympiads in Informatics, 9:39–44, 2015.
7. P. M. Fenwick. A new data structure for cumulative frequency tables. Software: Practice and Experience, 24(3):327–336, 1994.
8. D. Han, C. Xiao, R. Zhou, G. Wang, H. Huo, and X. Hui. Load shedding for window joins over streams. In Advances in Web-Age Information Management, 7th International Conference, WAIM 2006, Hong Kong, China, June 17-19, 2006, Proceedings, pages 472–483, 2006.
9. D. Hoplaros, Z. Tari, and I. Khalil. Data summarization for network traffic monitoring. Journal of Network and Computer Applications, 37:194–205, 2014.
10. H. Jung, Y. S. Kim, and Y. D. Chung. QR-tree: An efficient and scalable method for evaluation of continuous range queries. Information Sciences, 274:156–176, 2014.
11. S. Mitra, S. K. Pal, and P. Mitra. Data mining in soft computing framework: a survey. IEEE Transactions on Neural Networks, 13(1):3–14, 2002.
12. N. Mladenović, D. Urošević, A. Ilić, et al. A general variable neighborhood search for the one-commodity pickup-and-delivery travelling salesman problem. European Journal of Operational Research, 220(1):270–285, 2012.
13. L. Rutkowski, M. Jaworski, L. Pietruczuk, and P. Duda. The CART decision tree for mining data streams. Information Sciences, 266:1–15, 2014.
14. C. Wang, R. Zhang, X. He, G. Zhou, and A. Zhou. Event phase extraction and summarization. In International Conference on Web Information Systems Engineering, pages 473–488. Springer, 2016.
15. Y. Wang, A. Meliou, and G. Miklau. RC-Index: Diversifying answers to range queries. Proceedings of the VLDB Endowment, 11(7), 2018.
16. H. Zhu, X. Yang, B. Wang, and W.-C. Lee. Range-based nearest neighbor queries with complex-shaped obstacles. IEEE Transactions on Knowledge and Data Engineering, 30(5):963–977, 2018.
