USE OF CONVEX HULL FOR DETECTION OF OUTLIERS IN OCEANOGRAPHIC DATA PERTAINING TO INDIAN OCEAN

1CH MURALI KRISHNA, 2TVS UDAYA BHASKAR, 3M KRANTHI KIRAN

1,3 Computer Science and Technology, Department of CSE, Anil Neerukonda Institute of Technology and Sciences (ANITS), Visakhapatnam.
2 Data and Information Management Group, Indian National Centre for Ocean Information Services (INCOIS), Hyderabad.
E-mail: [email protected], [email protected], [email protected]

Abstract— This work discusses a new method of identifying erroneous surface meteorology data using ICOADS data. An 'n' sided polygon (convex hull) with the least area encompassing all the points is constructed using the Jarvis March algorithm. The periphery points of the clusters formed when a parameter (e.g., air temperature, humidity) is plotted against longitude and latitude are used for building the polygons. Subsequently, the Point-In-Polygon (PIP) principle is used to classify the data as in or out of the polygon. It is observed that all possible outliers associated with the data can be identified using this method.

Keywords— Convex Hull, Polygon, Jarvis March, Point-In-Polygon (PIP), Outliers, ICOADS, AWS.

I. INTRODUCTION

Identifying erroneous data in the huge volume of ICOADS data, which comprises ocean parameters such as air temperature, humidity, and wind speed, is a difficult task because of the sheer size of the collection. This paper presents a new method of identifying erroneous surface meteorology data using the quality controlled data obtained from the International Comprehensive Ocean-Atmosphere Data Set (ICOADS). An 'n' sided polygon (convex hull) with the least area encompassing all the points is constructed using the Jarvis March algorithm [2]. The periphery points of the clusters formed when a parameter (e.g., air temperature, humidity) is plotted against longitude and latitude are used for building the polygons. Subsequently, the Point-In-Polygon (PIP) principle is used to classify the data as In (implying good) or Out (implying bad) of the polygon [6] [7].

ICOADS is a world ocean marine meteorological and surface ocean data set. The ICOADS data coverage is global, and data density varies depending on date, time and geographic position relative to different shipping routes and ocean observing systems. ICOADS datasets are formed by gathering and merging many national and international data sources that contain measurements and visual observations from ships (e.g., merchant, navy, and research), moored and drifting buoys, coastal observation stations, and other marine meteorological platforms. Each report contains observations of oceanographic, marine meteorological and surface oceanographic variables, such as sea surface and air temperatures, wind speed, pressure, humidity, and cloudiness [5] [9].

Geometric objects such as lines, points, and polygons are the basis of geometric algorithms. The convex hull of a set of points, CH(S), is the smallest convex set that contains all the points in the set S. The Jarvis March algorithm is a fundamental computational geometry algorithm used to compute the convex hull of a given set of points. This algorithm identifies the peripheral points of the convex hull for a given set of points.
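For reference, the definition of CH(S) given above can be written formally for a finite point set. The following is the standard textbook formulation, not a formula taken from this paper; the symbols p_i, λ_i and k (the number of points in S) are introduced here only for illustration.

```latex
\mathrm{CH}(S) = \left\{ \sum_{i=1}^{k} \lambda_i \, p_i \;\middle|\; p_i \in S,\ \lambda_i \ge 0,\ \sum_{i=1}^{k} \lambda_i = 1 \right\},
\qquad S = \{p_1, \dots, p_k\}
```

In words, the hull is the set of all weighted averages of the points of S, and its boundary vertices are the peripheral points that the Jarvis March algorithm reports.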

Fig. 1. An Example of Convex Hull [1]

Point-In-Polygon is a fundamental problem in two-dimensional computational geometry: determining whether a given point lies within a given closed polygon or not. There are several methods for this check, such as the winding number and ray tracing (ray casting) methods, among others [6] [7]. In this paper, we use the ray casting method to check whether a given point lies inside or outside the polygon.

In this work we present two fundamental computational geometry algorithms, i.e., the Jarvis March algorithm and the point-in-polygon principle. The Jarvis March algorithm is used to find the convex hull of a given set of input data, and the point-in-polygon principle is used to determine whether each point falls inside (good data) or outside (bad data) the convex hull ('n' sided polygon).

Proceedings of 58th IRF International Conference, 19th June, 2016, Pune, India, ISBN: 978-93-86083-41-8 43

II. THE CONVEX HULL ALGORITHM

2.1. Jarvis March Algorithm

We begin by presenting some preliminaries. We use the terms points, vertices, and polygon interchangeably throughout the paper. A polygon p is said to be simple [2] if it consists of straight, non-intersecting line segments, called edges, that are joined pairwise to form a closed path. The adjacent edges of the polygon meet only at their common endpoints, known as vertices. An edge connecting two points a and b is denoted by e(a, b). The x-coordinate and y-coordinate of a point c are denoted by x(c) and y(c) respectively. Here and throughout the paper, unless qualified otherwise, we take polygon to mean a simple polygon in the plane.

The convex hull CH(S) of a point set S is the smallest convex set that contains S. The convex hull of a point set S is represented by the set of vertices that defines the hull edges. Many algorithms have been presented [1] [2] [3] [4] for finding the convex hull. One of the output-sensitive algorithms for finding the convex hull is Jarvis March [2]. It is based on the idea of finding hull edges instead of hull vertices.
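The edge-by-edge idea described above can be illustrated with a minimal Python sketch. It is not the authors' Java implementation, which the paper does not list; the names `jarvis_march`, `cross` and `dist2` are hypothetical, and points are assumed to be (x, y) pairs such as (latitude, parameter value).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Cross product of vectors o->a and o->b (negative if b lies to the right of o->a)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between a and b."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def jarvis_march(points: Sequence[Point]) -> List[Point]:
    """Return the hull vertices in counter-clockwise order (illustrative sketch)."""
    pts = sorted(set(points))          # remove duplicates; pts[0] is the leftmost point
    if len(pts) < 3:
        return pts
    hull: List[Point] = []
    start = pts[0]                     # the leftmost point is always on the hull
    current = start
    while True:
        hull.append(current)
        # Pick any point other than the current one as the first candidate.
        candidate = pts[0] if pts[0] != current else pts[1]
        for p in pts:
            if p == current:
                continue
            turn = cross(current, candidate, p)
            # Replace the candidate if p lies to its right, or if p is
            # collinear with it but farther away from the current vertex.
            if turn < 0 or (turn == 0 and dist2(current, p) > dist2(current, candidate)):
                candidate = p
        current = candidate            # e(previous, current) is a hull edge
        if current == start:           # wrapped around: the hull is closed
            break
    return hull
```

Each pass of the outer loop finds one hull edge by scanning all n points, so h hull vertices cost O(nh) time in total, which is why the algorithm is called output-sensitive. For example, `jarvis_march([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)])` returns the four corners and drops the interior point.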

III. POINT-IN-POLYGON ALGORITHM

3.1. Ray Casting Principle

The Point-In-Polygon algorithm is a fundamental computational geometry algorithm that determines whether a point falls inside or outside a polygon. Many algorithms have been presented [6] [7] for this PIP test. One simple way of finding whether a point is inside or outside a simple polygon is to count how many times a ray, starting from the point and going in any fixed direction, intersects the edges of the polygon. If the point is outside (implying bad) the polygon, the ray will intersect its edges an even number of times. If the point is inside (implying good) the polygon, it will intersect the edges an odd number of times.

The point-in-polygon algorithm is based on a simple observation: if a point moves along a ray and crosses the edge of a polygon, possibly many times, then it alternately goes from inside to outside of the polygon, then from outside to inside, and so on. As a result, after every two edge crossings, a moving point that started outside the polygon is outside again. This observation can be proved mathematically using the Jordan Curve Theorem [8].

Figure 2 illustrates the typical case of a convex polygon with 7 sides. P is the point to be tested, to determine whether it lies inside the polygon or not using the point-in-polygon principle with the ray casting method. Here the ray crosses the polygon edges an odd number of times, which means the point is inside (implying good point) the polygon.

Fig. 2. An Example of Point-In-Polygon [6]

IV. FLOW CHART OF OUTLIER DETECTION

4.1. Flow Chart of Proposed Method

This flow chart shows how the method works and how the two computational geometry algorithms are integrated to process the input ICOADS data for outlier detection. The ICOADS data set is ocean marine meteorological and surface ocean data; it is formed by gathering and merging many national and international data sources that contain measurements and other visual observations.

Fig. 3. Flow Chart Representation of working

Outlier detection using the Jarvis March algorithm and the Point-In-Polygon principle works as follows. First, ICOADS data is passed as input to the Jarvis March algorithm, which processes the given raw data and produces the convex hull, also called the 'n' sided polygon, formed by the peripheral points of the input data. The polygon resulting from the Jarvis March algorithm is passed as input to the Point-In-Polygon algorithm. This algorithm uses the ray casting/ray tracing method and classifies the given input data into two parts, i.e., inside the convex hull (good data) and outside the convex hull (bad data). The points inside the convex hull are treated as non-outlier data and the points outside the convex hull are treated as outlier data. The outliers need to be eliminated from the input ICOADS data, and the outlier-eliminated data is passed again as input to the Jarvis March algorithm. It is a repetitive process.
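The ray casting test of Section 3.1 and the In/Out classification step of the flow chart can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names `point_in_polygon` and `classify_points` are hypothetical, the ray is cast horizontally towards +x, and points lying exactly on the polygon boundary are not treated specially.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: True if a horizontal ray from `point` crosses an odd number of edges."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]      # next vertex, wrapping to close the polygon
        # The edge can only be crossed if its endpoints lie on opposite
        # sides of the horizontal line through the point.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:                # crossing lies on the ray (to the right)
                inside = not inside        # each crossing flips inside/outside
    return inside


def classify_points(
    points: Sequence[Point], polygon: Sequence[Point]
) -> Tuple[List[Point], List[Point]]:
    """Split points into (good, bad): inside the polygon versus outside it."""
    good: List[Point] = []
    bad: List[Point] = []
    for p in points:
        (good if point_in_polygon(p, polygon) else bad).append(p)
    return good, bad
```

In the setting of this paper, `polygon` would be the hull built from quality controlled data and `points` the observations to be checked, for example (latitude, sea level pressure) pairs; the `bad` list then holds the candidate outliers.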


V. RESULTS & DISCUSSIONS

5.1. Results on an ICOADS Dataset

We ran experiments to evaluate the results of this algorithm with different input sizes and with different parameters plotted against longitude and latitude when generating the polygons. The parameters used in this evaluation are Sea Pressure, Humidity and Sea Surface Temperature against longitude and latitude. We also compare the performance and results against two other convex hull algorithms [1] [3].

Fig. 4. Jarvis March algorithm Lat Vs Sea-Pressure

Fig. 5. Quick Hull algorithm Lat Vs Sea-Pressure

Figure 4 and Figure 5 show the results of the Jarvis March algorithm and the Quick hull algorithm respectively, when Latitude is plotted against Sea-Pressure. Both algorithms give the same result, but they differ considerably in execution time and time complexity. Jarvis March is an output-sensitive algorithm: its running time, O(nh), depends on both the input size n and the number of hull vertices h. The way the two algorithms read and evaluate the points is completely different. In our experiments, Jarvis March showed faster execution than the other convex hull algorithms considered [1] [3]; its O(nh) bound is favorable when the number of hull vertices is small relative to the input size.

Fig. 6. Jarvis March Algorithm Lon Vs Sea-Pressure

Fig. 7. Jarvis March Algorithm Lon Vs Sea-Pressure

Figure 6 and Figure 7 show the experimental results of the Jarvis March algorithm when Longitude is plotted against Sea-Pressure. Figure 7 gives a clearer view of the convex hull; a few points are seen to lie outside the polygon.

5.2. Results on an AWS Raw Dataset

The Automated Weather Station (AWS) dataset that we have used comes from the real-time automatic weather stations operated by ESSO-INCOIS, Hyderabad. The AWS stations are located at different places along the Indian coastline, including Visakhapatnam, Chennai, Andaman and Nicobar Islands, Thiruvananthapuram, etc. There are a total of 24 AWS stations that have been continuously providing data to ESSO-INCOIS from March 1st 2013 onwards. The main objective of AWS is to measure the ocean surface meteorological (OSMET) parameters in order to validate and refine the forcing parameters (obtained from different OSMET agencies) for the Indian Ocean Forecasting System [10]. The AWS dataset contains different parameters received from the AWS stations, such as Air Temperature, Humidity, Sea Level Pressure, Wind Speed, etc.


Fig. 8. Jarvis March Algorithm Latitude Vs Sea Level Pressure with AWS Raw Data

Figure 8 shows the validation results of the raw AWS data against the ICOADS polygon. The ICOADS dataset is quality controlled data, whereas the AWS data is raw data received directly from the AWS stations. We applied the polygon generated above from the ICOADS dataset to the raw AWS data.

Fig. 9. Jarvis March Algorithm Longitude Vs Sea Level Pressure with AWS Raw Data

Figure 9 shows the validation results of the AWS data against the ICOADS dataset polygon. In Figure 8 and Figure 9 we observed peaks when validating the AWS Sea Level Pressure data against the ICOADS polygon. These peaks need to be re-examined, because they may reflect genuine climatic variation. If, on re-examination, the peaks are found to be erroneous, they should be removed from the raw data; if they result from climate change, they should be retained in the data.

Fig. 10. Jarvis March Algorithm result after removal of erroneous data points.

Figure 10 shows the validation result of the algorithm after removal of the erroneous data from the AWS dataset. It is clearly observed that the peaks seen in Figure 8 and Figure 9 have been eliminated from the final result using the Point-In-Polygon (PIP) principle.

5.3. Experimental Comparison of Algorithms

We implemented the classical Graham Scan and Quick hull algorithms and the algorithm proposed in this paper in Java on an Intel Inside i5-G50 2.30 GHz PC with 2GB of main memory. We used ICOADS data sets with the number of points ranging from 500 to 10,00,000 (one million) to evaluate the performance of the above three algorithms. All three compute the minimum convex hull of the given points.

Table 1. Comparison of three algorithms

The time cost is calculated as the average of the running times measured when computing the convex hull of the same points several times. The results are shown in Table 1. As can be seen from Table 1, our algorithm beats the classic algorithms in generating the minimum convex hull of the same point set, and all three algorithms give the same number of peripheral points for each input size.
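The averaging procedure used for the time cost, repeating the hull computation on the same points and taking the mean, can be sketched as below. This is only an illustration in Python of the measurement idea; the authors' measurements were made in Java, and the helper name `average_runtime` and the default of five repeats are assumptions.

```python
import time
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


def average_runtime(
    hull_fn: Callable[[Sequence[Point]], object],
    points: Sequence[Point],
    repeats: int = 5,
) -> float:
    """Mean wall-clock time, in seconds, of hull_fn(points) over `repeats` runs."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()        # high-resolution monotonic timer
        hull_fn(points)                    # same input on every run
        total += time.perf_counter() - start
    return total / repeats
```

Passing each convex hull implementation in turn as `hull_fn`, with the same `points`, gives comparable average timings of the kind the comparison relies on.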

CONCLUSIONS

In this paper, we have proposed a novel computational geometry method that builds the convex hull using the Jarvis March algorithm and classifies the resulting data using a second computational geometry algorithm, i.e., the Point-In-Polygon algorithm. The Jarvis March and Point-In-Polygon algorithms are integrated to detect outliers in ocean data. We have also illustrated how these algorithms work in Sections 2.1 and 3.1. The experimental results verify the promising performance of the Jarvis March and PIP algorithms. Using these algorithms, we have begun applying data mining concepts to improve the functionality of the Automated Quality Checking of ocean data at INCOIS.

REFERENCES

[1] R. L. Graham, "An efficient algorithm for determining the convex hull of a finite planar set", Information Processing Letters 1, pp. 132-133, 1972.
[2] R. A. Jarvis, "On the identification of the convex hull of a finite set of points in the plane", Information Processing Letters 2, pp. 18-21, 1973.
[3] C. Bradford Barber, David P. Dobkin, Hannu Huhdanpaa, "The Quickhull Algorithm for Convex Hulls", ACM Transactions on Mathematical Software, Vol. 22, pp. 469-483, 1996.
[4] F. P. Preparata, S. J. Hong, "Convex hulls of finite sets of points in two and three dimensions", Communications of the ACM, Vol. 20, pp. 87-93, 1977.
[5] Steven J. Worley, Scott D. Woodruff, "ICOADS Release 2.1 Data and Products", International Journal of Climatology, Vol. 25, pp. 823-842, 2005.
[6] Alciatore David G. and Rick Miranda, "A Winding number and point-in-polygon algorithm", Glaxo Virtual Anatomy Project Research Report, Department of Mechanical Engineering, Colorado State University, 1995.
[7] Kai Hormann and Alexander Agathos, "The point in polygon problem for arbitrary polygons", Computational Geometry Theory and Applications, Vol. 20, pp. 131-144, 2001.
[8] Tverberg Helge, "A proof of the Jordan curve theorem", Bull. London Math. Soc, Vol. 12(1), pp. 34-38, 1980.
[9] http://www.rda.ucar.edu/
[10] http://www.incois.gov.in/portal/datainfo/aws.jsp

ACKNOWLEDGMENTS

The authors wish to thank Dr. TVS Udaya Bhaskar (Scientist-"E"), INCOIS, Hyderabad, for his support and guidance throughout this project and the preparation of this manuscript. They also thank Mr. M Kranthi Kiran (Asst. Professor), ANITS, and Prof. S. C. Satapathy (Head of Dept.), ANITS, Visakhapatnam, for their support at the college and in this work.


