Towards Minimizing Computational Cost of Range Aggregates against Uncertain Location-Based Queries

Y. Rajasekhar
M.Tech, Artificial Intelligence
Department of Computer Science, JNTUCEA, Anantapur
[email protected]

Abstract

Location-based queries are inherently uncertain. Such queries have become common in many real-time applications that rely on location-based services, and these applications typically work over a multi-dimensional search space. Processing uncertain location-based queries accurately is a difficult task. This paper presents a new algorithm to handle such queries. It makes use of range aggregates such as count, avg, and sum for query processing, and it computes these aggregates efficiently so as to support correct decision making. We built a prototype application to test the proposed algorithm. The experimental results show that the application can process uncertain location-based queries efficiently while consuming less processing power and incurring lower computation cost.

Index Terms – Aggregates, uncertain location-based queries

1. INTRODUCTION

Uncertainty is the main problem in location-based applications, where imprecision occurs in the query. Processing such queries requires efficient computation of range aggregates, which is a difficult task. Existing solutions are not up to the mark, as they cannot handle uncertain data points and uncertain query points together. The work in this paper is motivated by real-world examples such as deploying weapons in the military accurately without damaging civilian facilities, and planning police patrols. In the first example, it is important for military officers to compute range aggregates and avoid civilian casualties. They strive to reduce the number of civilian points that would be destroyed by adjusting the falling place of the projectile, so as to ensure that the target is not missed and, at the same time, the civilian facilities are not damaged. This circumstance is illustrated in Fig. 1.

Fig. 1 – Motivating Example

As seen in Fig. 1, there are several civilian points and query points. Here q1 and q2 represent query points, while the civilian points are p1 to p6. The army's projectile has a range of damage; that range has to be considered, and the civilian points must be kept out of the range of the projectile. When the projectile is used to hit targets such as q1 and q2, nearby residential places might be destroyed. The application has to compute the range aggregates and decide the falling place of the missile so as to spare as many civilian places as possible while still destroying the target. Another example that motivated this paper pertains to police patrol. Police should identify or estimate which route is best for a patrol to be effective. As shown in Fig. 1, Q represents various spots on the patrol route, and points such as p1 to p7 represent real-world objects such as hotels, schools, and hospitals. Keeping the current location of the patrolling vehicle in mind, the best route has to be computed so that the patrol serves its purpose. The algorithm proposed in this paper computes range aggregates and filters out civilian places, enabling the application to hit the target while ensuring that civilian places are not destroyed.


This paper focuses on processing uncertain location-based queries. The algorithm presented here computes range aggregates efficiently so that such queries can be answered correctly without causing problems to unintended locations.

2. PRIOR WORK

Location-based services play an important role in real-world applications, and they need to process uncertain location-based queries. However, processing such queries is not an easy job, and many previous works have focused on this problem. A classification of such works is found in [1], which covers probabilistic queries and query evaluation procedures. To process such queries, range aggregates have to be calculated; cost-based aggregates are given importance in this paper. The problem is significant, and the focus has been mainly on probability thresholding: for a given query region, threshold, and so on, the results are the objects that are the most probable ones. The Probability Thresholding Index (PTI), presented in [2], is another approach to this problem; its drawback is that it works well only for uncertain objects in single-dimensional space. Agarwal et al. [3] proposed an indexing technique for processing uncertain location-based queries. Tao et al. [4] proposed the Probabilistic Constrained Region (PCR) to support such queries, using the probability density function as an intermediate solution. They also proposed a pruning technique to reduce the search space and the processing time, thus lessening the computational cost, and they improved the technique presented in [5] with another solution that can handle range queries. Chen and Cheng [6] also focused on this kind of problem and used PCR-based pruning for validation. In [7], query processing based on range aggregates is proposed; two approaches are used to estimate the aggregates needed to process queries [8]. Considerable study is also made in [9] and [10] with respect to range queries, although those works impose a constraint on query processing: the objects in the dataset are forced to follow a Gaussian distribution. Afterwards, a ranking is given to the results to improve user satisfaction. Similar work is done in [11], where queries are processed using indexing in the presence of high-dimensional data. Clustering is a data mining approach for grouping similar objects; in [12], clustering is studied for processing conventional queries. This is the reason why these techniques cannot be applied directly to the problems addressed in this paper. The problem considered here is similar to that of [13], with one difference: in [13] the search space is considered to be a rectangle, which is not suitable for uncertain location queries. In this paper we consider computing range aggregates, which is not the same as [13]. From our experiments it is understood that the PCR method can be applied to the problem of computing range aggregates with slight modifications.

2.1 PROPOSED SYSTEM TO CALCULATE RANGE AGGREGATES

This section discusses the problem definition, the proposed algorithm, and the pruning technique used to improve the efficiency of the solution.

2.1.1 Problem Definition

This paper assumes a set of points in an n-dimensional search space. There are distances among the points, and the distance is computed using a relevant metric. The Euclidean Distance (ED) is used in this paper for distance computation, using the following equation.
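In standard form, for two n-dimensional points p = (p_1, ..., p_n) and q = (q_1, ..., q_n) (symbol names chosen here only for illustration), the Euclidean distance is

\[
ED(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}.
\]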

In uncertain location-based applications, range aggregates are computed based on the distance and on the query. Users may be interested only in points whose falling probability satisfies a given threshold. The falling probability can be calculated as follows.
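One common formulation, consistent with the imprecise-query model of [13] and given here only as an illustrative sketch rather than the paper's own definition, treats the falling probability of a data point p with respect to an uncertain query q and query distance γ as the probability mass of the query's location distribution that lies within distance γ of p:

\[
P_{fall}(q, p, \gamma) = \int_{\{x \,:\, ED(x, p) \le \gamma\}} f_q(x)\, dx,
\]

where f_q denotes the probability density function of the uncertain location of q. A point is then reported only if this probability reaches the user-specified threshold.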

2.1.2 Proposed Algorithm for Calculating Range Aggregates

Filtering and verification is the approach used in this paper to filter out the civilian places when a target has to be hit precisely. The set of points is organized in a tree named the R-Tree, as discussed earlier in [13]. An R-Tree contains many entries, and each entry is either an intermediate entry or an item that can be processed directly. Intermediate entries are groups of objects that must be processed in turn, while the individual items are processed one by one. When the algorithm encounters an intermediate entry, all the items in that entry are pushed back into the queue so that they are processed sequentially. The algorithm uses a threshold to identify the objects that need to be processed, and all decisions are made with respect to the distance and the query. The algorithm uses several notations, which are shown in Table 1.

Table 1 – Notations used in the algorithm

As seen in Table 1, the notations used in the pseudo code and their meanings are given. The algorithm that uses these notations is shown in Listing 1.

Listing 1 – Proposed Algorithm

Listing 1 shows the algorithm that takes all the points, organized as an R-Tree, and processes them. The tree also carries the query distance, the threshold filtering technique, and the uncertain query. The range aggregates are then calculated, and the algorithm returns the count of the validated points. The algorithm gives an exact solution, which, in the military example, can save civilian places while accurately destroying the target. Pruning the search space can improve the efficiency of this algorithm; for this purpose a pruning procedure is presented in Listing 2.
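As an illustration of this filtering-and-verification idea, a minimal Java sketch is given below. The entry structure, the sampled representation of the uncertain query, and all class and method names are assumptions made for this sketch, not the paper's actual code.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Illustrative R-Tree entry: either a single data point or an intermediate entry. */
class Entry {
    final double[] point;        // coordinates of a data point (leaf entries only)
    final List<Entry> children;  // child entries (intermediate entries only)

    Entry(double[] point) { this.point = point; this.children = null; }
    Entry(List<Entry> children) { this.point = null; this.children = children; }

    boolean isLeaf() { return point != null; }
}

public class FilterAndVerify {

    /** Euclidean distance between two n-dimensional points. */
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Counts the data points whose falling probability, with respect to the
     * uncertain query (represented here by a set of sampled query locations)
     * and the query distance gamma, reaches the probability threshold theta.
     */
    static int countQualifyingPoints(Entry root, double[][] querySamples,
                                     double gamma, double theta) {
        Deque<Entry> queue = new ArrayDeque<>();
        queue.add(root);
        int count = 0;

        while (!queue.isEmpty()) {
            Entry e = queue.poll();
            if (!e.isLeaf()) {
                // Intermediate entry: push its children back into the queue
                // so they are processed sequentially.
                queue.addAll(e.children);
                continue;
            }
            // Verification: estimate the falling probability as the fraction of
            // query samples that lie within distance gamma of the data point.
            int hits = 0;
            for (double[] q : querySamples) {
                if (distance(q, e.point) <= gamma) {
                    hits++;
                }
            }
            double fallingProbability = (double) hits / querySamples.length;
            if (fallingProbability >= theta) {
                count++;
            }
        }
        return count;
    }
}

In the military example, countQualifyingPoints would return the number of civilian points that fall within the damage range of a candidate falling place with sufficiently high probability.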

Listing 2 – Pruning Technique

As seen in Listing 2, the pruning procedure filters out some data points so that the computation of range aggregates is carried out much faster. It takes an entry of the R-Tree as input, along with the query threshold, the distance, and the anchor points, and filters out data points accordingly. By the time pruning is finished, the search space has been reduced, and the computational cost is reduced with it, making the proposed algorithm computationally efficient.
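A minimal Java sketch of how such distance-based pruning might look is given below; the bounding-rectangle representation, the minDist bound, and all names are assumptions of this sketch, not the paper's exact pruning rule.

import java.util.ArrayList;
import java.util.List;

/** Illustrative R-Tree node with a minimum bounding rectangle (MBR). */
class Node {
    final double[] low, high;     // corners of the MBR (low == high for a data point)
    final List<Node> children;    // empty for a data point

    Node(double[] low, double[] high, List<Node> children) {
        this.low = low; this.high = high; this.children = children;
    }
}

public class Pruning {

    /** Smallest possible distance from point q to the node's MBR. */
    static double minDist(Node n, double[] q) {
        double sum = 0;
        for (int i = 0; i < q.length; i++) {
            double d = 0;
            if (q[i] < n.low[i])       d = n.low[i] - q[i];
            else if (q[i] > n.high[i]) d = q[i] - n.high[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Keeps only the entries that can possibly fall within the query distance
     * gamma of the anchor point; everything else is pruned so that the
     * verification step touches fewer candidates.
     */
    static List<Node> prune(Node entry, double[] anchor, double gamma) {
        List<Node> candidates = new ArrayList<>();
        if (minDist(entry, anchor) > gamma) {
            return candidates;            // whole subtree is out of range: prune it
        }
        if (entry.children.isEmpty()) {
            candidates.add(entry);        // a data point that survives the filter
        } else {
            for (Node child : entry.children) {
                candidates.addAll(prune(child, anchor, gamma));  // recurse into the subtree
            }
        }
        return candidates;
    }
}

Entries whose minimum possible distance to the anchor already exceeds the query distance cannot contain any qualifying point, so their whole subtree is discarded; only the surviving candidates are passed on to the verification step.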

3. EXPERIMENTS AND RESULTS

A prototype application was built and used for the experiments. The application has a graphical user interface and was built in the Java programming language. The experiments were run on a PC with a Core 2 processor and 2 GB of RAM, and the prototype was developed in the NetBeans IDE. The application uses various system parameters, which are listed in Table 3.


Table 3 – System Parameters

As can be seen in the table above, the experiments used the listed system parameters. The experimental results are presented in the form of graphs, as shown below.

Fig. 2 – Filtering time vs. filtering performance

In Fig. 2, the X axis represents the number of anchor points while the Y axis represents the filtering time taken by the process. As seen in the graph, the proposed filtering technique, APF, performs better than all the other techniques. The graph also suggests that the filtering time is not much influenced by the number of anchor points.

Fig. 3 – Filtering performance vs. candidate size

Fig. 3 shows the filtering performance with respect to candidate size. A set of distance values is shown on the horizontal axis while the candidate size is shown on the vertical axis. The graph shows that when the number of distance values is increased, the corresponding candidate size is reduced.

Fig. 4 – Filter performance vs. space usage with respect to candidate size

As seen in Fig. 4, the number of anchor points is presented on the horizontal axis while the candidate size is shown on the vertical axis. The results show that APF performs better than any other technique. One observation confirmed by this graph is that as the number of anchor points rises, the performance of APF increases.

Fig. 5 – Filter performance vs. space usage with respect to filtering time

As seen in Fig. 5, the number of anchor points is presented on the X axis while the time taken for filtering is presented on the Y axis. The chart shows that when the number of anchor points is increased, the rate of APF increases.


Fig. 6 – Candidate size vs. query distance ("us" dataset)

As shown in Fig. 6, the query distances are represented by the X axis while the candidate size is represented by the Y axis. The performance of APF is compared with the other techniques, and APF proves to be better than the others.

Fig. 7 – Candidate size vs. query distance ("3d uniform points" dataset)

As shown in Fig. 7, the query distances are represented by the X axis while the candidate size is represented by the Y axis. As the number of anchor points grows, the performance of APF is better than that of the other techniques.

Fig. 8 – Filtering time vs. query distance ("us" dataset)

As shown in Fig. 8, the query distances are presented on the X axis while the filtering time is presented on the Y axis. The graph shows that the response time increases as the query distance increases.

Fig. 9 – Filtering time vs. query distance ("3d uniform points" dataset)

As shown in Fig. 9, the query distance and the filtering time are represented by the horizontal and vertical axes respectively. The filtering performance of the four techniques is presented, and the trend shown in the graph is that the response time increases as the query distance increases.

Fig. 10 – Query response time vs. point distance ("us" dataset)

As shown in Fig. 10, the query distance and the query response time are represented by the horizontal and vertical axes respectively. The query response performance of the four techniques is presented, and the performance of APF is better than that of all the other techniques.


Fig. 11 – Query response time vs. distance ("3d uniform points" dataset)

As shown in Fig. 11, the query distance and the query response time are represented by the horizontal and vertical axes respectively. The query response performance of the four techniques is presented, and the performance of APF is superior to that of all the other techniques.

Fig. 12 – Proposed Results

As shown in Fig. 12, the horizontal axis represents the request time while the vertical axis represents the response time.

4. CONCLUSION

In this paper, we studied the problem of answering uncertain range queries in an environment where multidimensional uncertain objects are present. We proposed an algorithm for efficiently computing range aggregates such as sum, max, and count to process uncertain location-based queries. An application was built in Java to test the proposed algorithm for its effectiveness, and the experimental results show that the proposed algorithm is computationally effective.

5. REFERENCES

[1] R. Cheng, S. Prabhakar, and D.V. Kalashnikov, "Evaluating Probabilistic Queries over Imprecise Data," Proc. ACM SIGMOD Int'l Conf. Management of Data, 2003.
[2] S. Prabhakar, J.S. Vitter, R. Cheng, Y. Xia, and R. Shah, "Efficient Indexing Methods for Probabilistic Threshold Queries over Uncertain Data," Proc. Int'l Conf. Very Large Data Bases (VLDB), 2004.
[3] C. Bohm, M. Gruber, P. Kunath, M. Schubert, and A. Pryakhin, "Prover: Probabilistic Video Retrieval Using the Gauss-Tree," Proc. IEEE 23rd Int'l Conf. Data Eng. (ICDE), 2007.
[4] Y. Tao, R. Cheng, X. Xiao, W.K. Ngai, S. Prabhakar, and B. Kao, "Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions," Proc. Int'l Conf. Very Large Data Bases (VLDB), 2005.
[5] X. Xiao, Y. Tao, and R. Cheng, "Range Search on Multidimensional Uncertain Data," ACM Trans. Database Systems, vol. 32, no. 3, pp. 1-54, 2007.
[6] S.-W. Cheng, P.K. Agarwal, Y. Tao, and K. Yi, "Indexing Uncertain Data," Proc. Symp. Principles of Database Systems (PODS), 2009.
[7] W. Zhang, Y. Zhang, S. Yang, and X. Lin, "Probabilistic Threshold Range Aggregate Query Processing over Uncertain Data," Proc. Joint Int'l Conf. Advances in Data and Web Management (APWeb/WAIM), 2009.
[8] N. Mamoulis, Y. Tao, X. Dai, M. Yiu, and M. Vaitis, "Probabilistic Spatial Queries on Existentially Uncertain Data," Proc. Int'l Symp. Large Spatio-Temporal Databases (SSTD), 2005.
[9] Y. Tao, K. Yi, P.K. Agarwal, and S.-W. Cheng, "Indexing Uncertain Data," Proc. Symp. Principles of Database Systems (PODS), 2009.
[10] C. Bohm, M. Schubert, and A. Pryakhin, "Probabilistic Ranking Queries on Gaussians," Proc. 18th Int'l Conf. Scientific and Statistical Database Management (SSDBM), 2006.
[11] P. Yu and C. Aggarwal, "On High Dimensional Indexing of Uncertain Data," Proc. IEEE 24th Int'l Conf. Data Eng. (ICDE), 2008.
[12] M. Pfeifle and H.P. Kriegel, "Density-Based Clustering of Uncertain Data," Proc. 11th ACM SIGKDD Int'l Conf. Knowledge Discovery in Data Mining (KDD), 2005.
[13] J. Chen and R. Cheng, "Efficient Evaluation of Imprecise Location-Dependent Queries," Proc. IEEE 23rd Int'l Conf. Data Eng. (ICDE), 2007.
