
2013 First International Symposium on Computing and Networking

Predicting Ship Behavior Navigating Through Heavily Trafficked Fairways by Analyzing AIS Data on Apache HBase

Wayan Mahardhika Wijaya

Yasuhiro Nakamura

Department of Computer Science National Defense Academy Yokosuka Shi, Kanagawa, Japan Email: [email protected]

Department of Computer Science National Defense Academy Yokosuka Shi, Kanagawa, Japan Email: [email protected]

of HDFS (Hadoop Distributed File System) as the data storage system and Hadoop MapReduce as the data processing framework. In addition to MapReduce, which is well known for its excellent performance in batch processing, Apache has also developed HBase, a distributed, scalable, column-oriented database built on top of HDFS [3]. HBase answers the need for interactive or real-time random read/write access to Big Data, a feature that MapReduce lacks. Many organizations are known to have adopted Apache's distributed systems, and their success stories have been widely published. It is reported that processing and analyzing Big Data with these newly developed technologies makes new solutions for knowledge acquisition and decision making possible. For companies, this can lead to greater profit margins. However, retrieving value from Big Data is not as simple as revealing a new result by merely summing some already known values. The value is discovered through an iterative modeling process that starts with a hypothesis and is followed by creating statistical data, visualizations, or semantic models. The results are validated, and another hypothesis is then made based on the newly discovered values. The process may involve a person interpreting visualizations or creating interactive knowledge-based queries, or the development of adaptive machine learning algorithms that can discover meaning.

Abstract—The newly developed Big Data oriented distributed systems such as Apache HBase have proven effective in storing and analyzing the exponentially growing volume of a wide variety of data such as sensor data, consumer generated media, web logs, and so forth. In this paper, Apache HBase, a distributed scalable big data store, is used to store, process, and analyze a large amount of spatiotemporal data generated by shipboard AIS transponders. The objective is to predict the behavior of ships navigating through heavily trafficked fairways around the gates of busy harbors. For that purpose, experiments were conducted using tens of gigabytes of real world AIS data. The data were processed to form historical ship tracks and were classified based on ship attributes such as type, draught, voyage destination, and country of origin. Finally, a simple algorithm was implemented to predict a target ship's behavior based on its attributes and movement characteristics. As a result, an acceptable prediction of ship movement is achieved. Furthermore, the experimental results also indicate that, in terms of data processing speed, this technique remarkably outperformed traditional GIS application software.

Keywords—Apache HBase, Apache Hadoop, Automatic Identification System, movement prediction, Geographic Information System, vessel track visualization, trajectory analysis, Big Data

I. INTRODUCTION

Nowadays, more and more organizations face growing complexity in dealing with their rapidly expanding volumes of data. They have to manage and analyse the data, and they need to extract value or meaning from a massive data stream that grows continuously and has already reached the limits of traditional data processing applications. This kind of data is known as Big Data, which refers to tremendous datasets that are challenging to store, search, share, visualize, and analyze [1]. Big Data is often defined by three characteristics: volume, variety, and velocity [2].

In this paper, tens of gigabytes of shipboard Automatic Identification System (AIS) data are stored and analyzed on Apache HBase operating on top of HDFS. The goal is to predict the movement of cargo ships navigating through heavily trafficked fairways in the vicinity of the gates of busy harbors by using only the analytical results of the AIS data set. AIS is a system that allows vessels to transmit data by radio to other vessels as well as to stations on shore. The transmitted information can be divided into three categories: (1) Static Information, which includes the vessel's name, International Maritime Organization (IMO) number, Maritime Mobile Service Identity (MMSI) number, and dimensions. (2) Dynamic Information, which contains the vessel's position, Speed Over Ground (SOG), Course Over Ground (COG), current status, and rate of turn. (3) Voyage-specific Information, which consists of destination, Estimated Time of Arrival (ETA), and draught [4]. The motivation of this research is to offer valuable information for ship operators as well as harbor traffic administrators

In an effort to meet the complexity of Big Data processing, new frameworks for distributed parallel data storage and processing such as Google File System, MapReduce, and Bigtable have been implemented following the publication of their technical specifications by Google. At this time, the most publicly recognized implementation of these frameworks is the distributed computing software developed by the Apache Software Foundation, known as Apache Hadoop. Hadoop mainly consists

978-1-4799-2795-1/13 $31.00 © 2013 IEEE DOI 10.1109/CANDAR.2013.39


by providing near-future estimates of ship behavior. The process of analyzing the data was initiated with the hypothesis that a ship's movement behavior, which forms its track, is determined by several parameters such as the course, speed, draft, type, time, season, country of origin, and voyage destination of the ship. Thus, it is logically possible to predict any ship's track based on those parameters. The next step is determining the HBase table schema used to store the AIS dataset. The schema needs to be designed in such a way that statistical data can be retrieved effectively, as the performance of HBase is greatly influenced by the table design [5]. Then, statistical data and visualizations are created based on the parameters that determine the ships' tracks. Based on this statistical data and the visualizations, and by implementing a simple algorithm, the future positions of target ships were predicted. The results are then validated through repeated experiments. In conclusion, an acceptable prediction of ship movement behaviour is practical to achieve with this method.

construction task. To examine the possibility of using a parallel distributed system for processing Geographic Information System (GIS) data, Nathan Thomas Kerr conducted research on parallel GIS processing using Hadoop and the Message Passing Interface (MPI) [10]. He concluded that parallel GIS processing is possible and can afford a significant decrease in processing time while using simple algorithms in cases where one or more datasets must be processed in their entirety. Furthermore, in [11] Shubin Zhang et al. proposed an algorithm called Spatial Join with MapReduce (SJMR). They compared the performance of the newly developed algorithm with the traditional Parallel Partition Based Spatial-Merge join (PPBSM) and concluded that SJMR outperformed PPBSM. So far, Hadoop has proven feasible for processing large data sets, including spatial data. However, MapReduce, as the core of the Hadoop data processing framework, was not designed to handle interactive or real-time random read/write access to the assigned data sets. To address this, in [12] Abhisek Sagar proposed Hadoop-withDatabase (HadoopDB), which uses a Database Management System (DBMS) as the storage and the MapReduce mechanism for parallelization. As a complement to Hadoop, which lacks an interactive data processing feature, Apache has developed HBase [13][3] as an open source implementation of Google Bigtable [14]. HBase is the answer when real-time random read/write access to very large datasets is required [15]. Research on using HBase to process multidimensional data was conducted by Shoji Nishimura [16], who proposed an extension of HBase called MD-HBase. The proposed system provides effective range searching for multi-dimensional data without sacrificing the scalability of HBase. Another study was carried out by Gimadeev Sergey, who presented a method and algorithms for optimizing access to spatial data in HBase [17]. The method uses quad-tree indexing to index the spatial data objects for further processing.

The contributions of this paper are as follows: (1) Ship movement behaviour is predicted by implementing the developed simple algorithms on a large data set of real-world AIS data stored and processed on Apache HBase. (2) A finding that HBase is markedly faster than a traditional desktop application when processing a large data set. The experiment section of this paper explains that HBase performed much better than MarineCadastre.Gov's TrackBuilder embedded in ESRI ArcGIS in the process of generating historical ship tracks to visualize the AIS data set. The remainder of this paper is organized as follows. Section II discusses several related works on distributed parallel processing using Hadoop and HBase as well as several studies on ship motion prediction using AIS data. In Section III, the details of the proposed method are presented. Following this, the results of the experiment based on the method are provided in Section IV. Finally, the conclusion and the future work of this paper are stated in Section V.

II. RELATED WORKS

B. Researches on AIS data

In 2000, the International Maritime Organization (IMO) adopted a requirement for all ships to be equipped with AIS capable of automatically providing information about the ship to other ships as well as to coastal authorities. The requirement became effective for all ships by 31 December 2004 [18]. Since then, the spatiotemporal data generated by shipboard AIS transponders has become a common subject of research in the field of maritime studies. This research has mainly been carried out with the purpose of discovering new value in the abundant spatiotemporal data that are continuously generated. The methods used for this purpose include statistical models as well as data mining techniques.

A. Researches on Distributed System

Distributed systems have been a common means of processing and analyzing large data sets, especially after the Apache Software Foundation introduced Hadoop [6] as an open source implementation of Google File System (GFS) [7] and MapReduce [8]. In [6] Konstantin Shvachko et al. described the architecture of HDFS and reported their experience of managing 25 petabytes of enterprise data at Yahoo! using Hadoop. The data were stored and processed on Hadoop clusters spanning 25,000 servers, with the largest cluster being 3,500 servers. They observed the scalability advantage of Hadoop and recommended managing multiple clusters as more prudent than a single large one. In recognition of its scalability, fault tolerance, and simple programming model, much research on processing massive spatial data using Hadoop has been conducted by researchers at academic institutions as well as professionals at enterprise research centers. In [9] Ariel Cary proposed a scalable solution for building R-tree indexes on large spatial databases using the MapReduce parallel programming model. The paper stated that R-tree construction times were significantly lowered with the parallel approach, which achieved close to linear scalability as more nodes were used in index

In [19], B. Ristic et al. proposed a method of anomaly detection and motion prediction based on statistical analysis of motion patterns in AIS data. It used an AIS dataset collected over a period of three months to generate vessel trajectories. Based on the trajectories, vessel motion patterns were extracted and used to perform anomaly detection and motion prediction. The motion prediction was conducted using the Gaussian sum tracking filter under the assumption that no anomaly exists. However, the paper does not provide numerical data presenting the extent of validity and effectiveness of


the proposed method. Another study on vessel trajectories generated from AIS data was conducted by Premalatha Sampath [4]. The paper presented a method for optimizing raw AIS data in order to conduct trajectory analysis and identify the characteristics of vessels in New Zealand waterways. The proposed method was used to generate the velocity profiles of three types of vessel, namely High Speed Craft, Passenger Vessel, and Ferry, and to create visualizations of ship trajectories.

TABLE I. BROADCASTTABLE TABLE SCHEMA

RowKey | ColumnKey | Value
A combination string of VesselTypeID.Draught.Status.MMSI | Geohash string | The static, dynamic, and voyage-specific information of AIS data

whether it is underway, at anchor, aground, and so forth. (7) Nationality. In busy international harbors, vessels of various nationalities delivering country-specific commodities enter and leave the harbors every business day. It is logical to assume that those vessels will navigate through different routes based on the direction they come from, or possibly according to their cargo. These seven parameters are then grouped into two categories, primary and secondary parameters. Location, course, and speed are categorized as primary parameters, while the rest are classified as secondary ones. In other words, the primary parameters are the spatiotemporal elements of the AIS data, whereas the secondary ones are the vessel's attributes.

In terms of safety in the maritime environment, Trika Pitana et al. proposed a hazard navigation map using AIS data [20], in which information from a ship database, such as the length and type of ship, is combined with AIS data to determine the dangerous score [21] when the ship is underway. The resulting dangerous score is then combined with GIS to develop a hazard navigation map. Following this research, a more realistic dynamic navigation hazard map was proposed in [5] by combining the relatively static data of the Electronic Navigational Chart (ENC) with the dynamic AIS data.

III. PROPOSED METHOD

The outline of a vessel movement pattern can be observed by visualizing the vessels' past tracks using the spatiotemporal information of the AIS data. The vessel attributes of the secondary parameters are then used to classify the movement patterns. In order to generate valid movement patterns for any vessel making way in the monitored sea area, an adequate amount of AIS data over a certain period of time must be available.

According to the previous works on distributed systems mentioned in Section II, it is feasible to accelerate the processing of large spatial data sets by using distributed systems such as Hadoop and HBase. On the other hand, the results of research on AIS data indicate that it is possible to estimate the movement of vessels based on the moving patterns extracted from historical vessel trajectories. Thus, an attempt to predict the moving behaviour of vessels using a large AIS dataset processed on HBase is a reasonable approach.

B. Designing HTable Schema and Creating Statistical Visualization

When the required data set is ready, the next step is to design the HBase table schema in such a way that it provides a fast and effective response to spatial queries such as the kNN query and the vessel trajectory query. Three tables need to be created, namely BroadcastTable, VesselTable, and VoyageTable. BroadcastTable is the main table, which holds the AIS data and answers the incoming queries. VesselTable holds only the static information of the vessel profile, while VoyageTable contains voyage-specific information. Both of these tables are only needed when constructing or updating the BroadcastTable. The schema of the BroadcastTable is shown in Table I.
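As an illustration of how such a composite row key supports prefix scans, the following is a minimal sketch in plain Python. It does not use any HBase client API. The zero-padded field widths, the encoding of draught in tenths of a metre, and the function names `make_row_key` and `scan_prefix` are assumptions made for this example; the paper does not specify the exact key encoding.

```python
def make_row_key(vessel_type_id: int, draught_m: float, status: int, mmsi: int) -> str:
    """Build a 'VesselTypeID.Draught.Status.MMSI' row key (hypothetical encoding).

    Assumption: draught is stored in tenths of a metre so that the '.' separator
    stays unambiguous, and all fields are zero-padded so that keys sort
    lexicographically in the same order as their numeric values.
    """
    draught_dm = round(draught_m * 10)
    return f"{vessel_type_id:02d}.{draught_dm:03d}.{status:02d}.{mmsi:09d}"


def scan_prefix(vessel_type_id: int, draught_m: float, status: int) -> str:
    """Return the 'VesselType.Draft.Status.' prefix shared by all matching row keys."""
    full_key = make_row_key(vessel_type_id, draught_m, status, 0)
    return full_key.rsplit(".", 1)[0] + "."


# Example: rows sorted by key, as a sorted key-value store would hold them.
rows = sorted([
    make_row_key(70, 10.5, 0, 366123456),
    make_row_key(70, 10.5, 0, 219000111),
    make_row_key(70, 8.2, 0, 431000222),
    make_row_key(80, 10.5, 0, 366999888),
])
prefix = scan_prefix(70, 10.5, 0)
matches = [key for key in rows if key.startswith(prefix)]
```

Because the MMSI is the last field, all broadcasts of vessels with the same type, draught, and status are adjacent in key order, which is what makes a single prefix scan sufficient in this sketch.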

A. Hypothesis Establishment

This paper was initiated by the hypothesis that any vessel navigating through fairways on its way to enter or leave a harbor follows a certain moving pattern based on its attributes and characteristics. Vessels having exactly the same or similar attributes and characteristics are likely to follow a similar moving pattern. The vessel attributes and characteristics that are used as parameters for determining a vessel's moving pattern and predicting its future position are retrieved from AIS data as follows: (1) Current location. This parameter is needed to gather the k nearest neighbors (kNN) of the target ship, which are used to determine its moving pattern. (2) Course and speed. These are selected under the logical assumption that ships having the same course and speed are likely to reach the same location in a certain period of time. (3) Vessel Type. This parameter is selected based on the real world observation that passenger ships generally take different routes from tankers or cargo ships. (4) Draft. A vessel is constrained by its draft when making way through waterways. In order to navigate safely, sufficient depth must be taken into account, so vessels with different drafts need different depths for waters to be considered navigable. (5) Destination. It is logical that vessels with different destinations will navigate through different routes. However, in AIS data the destination information is provided by the ship's operator, which means the information is prone to errors caused by misspelling, wrong abbreviations, or the operator forgetting to update the information. (6) Status. This parameter indicates the vessel's moving status

The combination of vessel type, draft, status, and MMSI is chosen as the row key of the BroadcastTable because it enables an effective response to the vessel trajectory query, which is a query that requests the trajectories of one or more vessels. These trajectories are then visualized using Java based open source GIS software so that visual analysis of vessel movement patterns is made possible. Statistical visualizations of vessel trajectories can be generated by customizing the vessel trajectory query with the previously mentioned parameters. A Geohash [3] is generated from the vessel's position coordinates (longitude, latitude) to construct the BroadcastTable's column key. Geohash is a technique that converts a two dimensional location coordinate of longitude and latitude into a one dimensional sortable string, similar to a Z-order space filling curve. The algorithm makes it possible to execute prefix searching as a simple and effective way to conduct two dimensional range searching.
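To make the Geohash idea concrete, the following minimal sketch implements the standard public Geohash encoding in Python. It is an illustration only and not the implementation used in the experiments; the function name `geohash_encode` and the default precision are assumptions.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard Geohash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    """Encode latitude/longitude into a Geohash string of `precision` characters.

    Bits are produced by repeatedly halving the longitude and latitude ranges,
    alternating between the two and starting with longitude; every 5 bits form
    one base-32 character.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, nbits = 0, 0
    use_lon = True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:
            chars.append(_BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


# A longer hash of the same point always extends the shorter one, which is
# what allows a prefix search to act as a coarse two dimensional range search.
coarse = geohash_encode(57.64911, 10.40744, precision=5)
fine = geohash_encode(57.64911, 10.40744, precision=8)
```

One known limitation of this scheme is that two points close to each other but on opposite sides of a cell boundary may share only a short prefix, so a prefix search is usually combined with a check of the adjacent cells.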


This feature is also applicable in providing a fast response to the kNN query.

C. Implementing Simple Prediction Algorithm

The main idea of this paper is predicting a vessel's future position based on the past tracks of vessels with similar attributes and characteristics. Thus, the algorithm logic is simply to find the vessels similar to the target ship and to assume that the target vessel will follow the same track as those similar vessels. In this paper, vessels of type cargo ship are chosen to implement the algorithm. Within the cargo ship type there are 10 subtypes of vessel, each assigned a different type code. The parameter Vessel Type mentioned in this section refers to this type code. The prediction algorithm is as follows:

• Convert the target ship location coordinate P(x, y) to a Geohash, where x is the longitude value and y is the latitude value. The target ship must have a destination.

• Scan the BroadcastTable using the row key prefix "VesselType.Draft.Status" as the scanning row key and the target ship location Geohash string as the column key. The result is the kNN of the target ship, and all of the neighbors are ships of the same type, similar draft, and the same status. The kNN algorithm used here is the Geohash based kNN algorithm [3].

• Select and return only the resulting neighbors that fulfill all of the following requirements: (1) the neighbor has the same destination and the same nationality as the target ship, and (2) it has a similar course and speed to the target ship.

• Compute the distance between the target ship position P(x, y) and each of the resulting neighbor positions N(x, y).

• Retrieve the neighbors' trajectories from the BroadcastTable. Each vessel trajectory is expressed as a geometry of type LineString represented in Well Known Text (WKT) format. The LineString consists of geometries of type Point expressed in longitude and latitude coordinates. The Points that form the LineString are sorted by time at 60 second intervals.
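The neighbor selection step can be sketched as follows. This is a hypothetical illustration in Python: the `Vessel` record, the function names, and in particular the tolerances for "similar" course and speed (15 degrees and 2 knots) are assumptions, since the paper does not state the thresholds it used.

```python
from dataclasses import dataclass


@dataclass
class Vessel:
    mmsi: int
    destination: str
    nationality: str
    cog: float  # course over ground, degrees
    sog: float  # speed over ground, knots


def course_difference(a: float, b: float) -> float:
    """Smallest angle between two courses, handling the 360 degree wrap-around."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def select_similar(target, neighbors, max_course_diff=15.0, max_speed_diff=2.0):
    """Keep neighbors with the same destination and nationality and a similar
    course and speed. The two thresholds are assumed values for illustration."""
    wanted_destination = target.destination.strip().upper()
    return [
        n for n in neighbors
        if n.destination.strip().upper() == wanted_destination
        and n.nationality == target.nationality
        and course_difference(n.cog, target.cog) <= max_course_diff
        and abs(n.sog - target.sog) <= max_speed_diff
    ]


target = Vessel(1, "LONG BEACH", "JP", cog=355.0, sog=12.0)
candidates = [
    Vessel(2, "long beach ", "JP", cog=5.0, sog=11.0),   # kept: 10 degrees apart
    Vessel(3, "LONG BEACH", "DK", cog=355.0, sog=12.0),  # dropped: nationality
    Vessel(4, "LONG BEACH", "JP", cog=90.0, sog=12.0),   # dropped: course
    Vessel(5, "LOS ANGELES", "JP", cog=355.0, sog=12.0), # dropped: destination
]
similar = select_similar(target, candidates)
```

The destination comparison normalizes case and whitespace only; it does not address the misspellings and abbreviations in operator-entered destinations that the paper mentions.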



Fig. 1. The Prediction Algorithm

• The future position of the target ship is calculated using the following formulas:

$$P(x_t, y_t) = N(x_t, y_t) - D(dx_{t_0}, dy_{t_0}) \tag{1}$$

$$D(dx_{t_0}, dy_{t_0}) = N(x_{t_0}, y_{t_0}) - P(x_{t_0}, y_{t_0}) \tag{2}$$

where $T = T_0 + \Delta T$ is the future time, $T_0$ is the current time, and $\Delta T$ is the time interval between $T_0$ and $T$. $P(x_{t_0}, y_{t_0})$ and $P(x_t, y_t)$ are the current and the future position of the target ship respectively. $N_1(x_{t_0}, y_{t_0}), N_2(x_{t_0}, y_{t_0}), \ldots, N_k(x_{t_0}, y_{t_0})$ are the positions of the k nearest neighbors at the current time, and $N_1(x_t, y_t), N_2(x_t, y_t), \ldots, N_k(x_t, y_t)$ are their positions at time $T$, which can be retrieved from their corresponding trajectories. At the current position, $dx$ and $dy$ are the longitude and latitude distances between the target and each of its neighbors respectively. Thus, $D(dx, dy)$ is the distance vector from the target ship to each of its neighbors, as shown in Fig. 1.

• If the movement prediction results in more than one future position, decide on one central position of the prediction by computing all Nearest Neighbors (aNN) of the predicted future positions.

Algorithm 1 Finding the central position
    predictedPositionList = list of future positions
    if predictedPositionList.size mod 2 == 0 then
        neighborsNum = predictedPositionList.size / 2
    else
        neighborsNum = (predictedPositionList.size − 1) / 2
    if predictedPositionList.size >= 3 then
        for each element in predictedPositionList
            calculate kNN where k = neighborsNum
        aNN = kNN of all elements in predictedPositionList
        count the appearances of each position in aNN
        central position = position with max count
    else
        randomly choose either of the predicted future positions as the central position

Fig. 2 illustrates an example of the algorithm implementation.

Having calculated the predicted position of the target ship, the accuracy of the prediction can be examined by simply calculating the distance between the target's real position and the predicted one.
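The prediction formulas and Algorithm 1 can be sketched together in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: positions are plain (longitude, latitude) tuples, distances between predicted positions are Euclidean distances in degrees, each position's own entry is excluded from its kNN, and the names `predict_positions` and `central_position` are hypothetical.

```python
import math
import random
from collections import Counter


def predict_positions(target_now, neighbor_tracks):
    """Apply equations (1) and (2) once per neighbor.

    target_now: (x, y) of the target ship at T0.
    neighbor_tracks: list of ((x, y) at T0, (x, y) at T) pairs, one per neighbor.
    """
    tx0, ty0 = target_now
    predictions = []
    for (nx0, ny0), (nxt, nyt) in neighbor_tracks:
        dx, dy = nx0 - tx0, ny0 - ty0             # equation (2): D = N(t0) - P(t0)
        predictions.append((nxt - dx, nyt - dy))  # equation (1): P(t) = N(t) - D
    return predictions


def central_position(predicted, rng=random):
    """Pick one central position following Algorithm 1."""
    n = len(predicted)
    if n == 0:
        raise ValueError("no predicted positions")
    if n < 3:
        return rng.choice(predicted)
    k = n // 2  # equals size/2 for even sizes and (size - 1)/2 for odd sizes
    counts = Counter()
    for i, p in enumerate(predicted):
        # Assumption: a position is not counted as its own neighbor.
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(p, predicted[j]))
        counts.update(others[:k])
    best_index, _ = counts.most_common(1)[0]
    return predicted[best_index]


# One neighbor that is (1, 1) ahead of the target now and at (3, 2) later
# puts the target at (2, 1) later.
single = predict_positions((0.0, 0.0), [((1.0, 1.0), (3.0, 2.0))])

# Five candidate positions on a line: the middle one appears most often in
# the other positions' neighbor lists.
centre = central_position([(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (10.0, 0.0), (-10.0, 0.0)])
```

When several positions tie for the highest count, this sketch returns the one that was counted first; the paper does not say how ties are resolved.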

IV. EXPERIMENT

In order to examine the validity of the proposed method as well as to evaluate the prediction results, real world AIS data were processed on HBase in fully distributed mode.

A. Experiment Dataset

The real world AIS data were provided by MarineCadastre.Gov and cover zone 11 of the UTM zoning (Fig. 3)


Fig. 2. Finding the central position of multiple predicted future positions

Fig. 4. Monitored Area for the Experiment

Fig. 5. HBase on Experimental Hadoop Cluster

Fig. 3. Universal Transverse Mercator zones (www.marinecadastre.gov)

B. Cluster Setup

HBase, which is the Hadoop database, works on HDFS. However, it is also deployable on other distributed file systems. In this experiment, HBase is deployed on a Hadoop cluster which consists of 11 Red Hat Linux machines with identical specifications. Each machine has eight Intel CPU cores and 24 gigabytes of RAM. One machine was set up as the Namenode and as the HMaster. The remaining 10 machines were configured as Datanodes and as Region Servers. Moreover, while functioning as datanodes and region servers, three of these machines were also set up as the ZooKeeper nodes (Fig. 5). In addition, one Windows PC with eight Intel CPU cores and 16 gigabytes of RAM and a Linux virtual machine with one CPU core and 4 gigabytes of RAM were set up as the HBase clients.

collected over a period of 2 years (January 1st, 2009 to December 31st, 2010). The data are in File Geodatabase (GDB) format, and after they were converted to Shapefile (SHP) format for further processing, the size reached 58.8 gigabytes. The data of July 2009, December 2009, August 2010, and December 2010 were used as the test data. As the goal of the proposed method is to predict ship movement on heavily trafficked fairways around the gates of busy harbors, two of the busiest harbors in the United States were chosen, namely Long Beach and Los Angeles Harbor. An area of approximately 33.825 kilometers wide and 51.971 kilometers long (latitude 33.4850 to 33.7894, longitude -118.5117 to -118.0440), which covers the two harbors and the fairways near their gates, was set up as the monitored area where the target vessels' behavior will be predicted. Furthermore, four areas, marked as West Entrance, East Entrance, Long Beach Exit, and Los Angeles Exit, are chosen as the areas of the initial position of the target vessels, as shown in Fig. 4. These four areas were chosen because each vessel in these areas has more than one possible main route. For example, a vessel at the east entrance has at least three possible routes: the route to the gate of Los Angeles Port, the route to the gate of Long Beach Port, and the route to the anchorage areas outside the two harbors.

C. Building HBase Table

The AIS data provided by MarineCadastre.Gov were in File Geodatabase format, which is a proprietary file format of ESRI. ESRI ArcGIS was used to convert the files to Shapefile. As described in Section III, three HBase tables (VesselTable, VoyageTable, and BroadcastTable) were created to store the AIS data. The VesselTable holds the vessel attributes, which are the static information of the AIS data such as MMSI, IMO Number, Call Sign, Name, Type, Length, Width, and Dimension Components. MMSI is set as the table's row key, and the remainder are set as column keys. The table value is the value of each attribute. The table stored 5,277 unique vessels. The vessels that were


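TABLE II. TRACKS BUILDING PERFORMANCE COMPARISON

Measured Process | HBase (in seconds) | ArcGIS (in seconds)
Reading Input Data | 71.9 (single value covering Reading Input Data, Track Building, and Adding Vessel Data) | 588.0
Track Building | (included above) | 480.0
Adding Vessel Data | (included above) | 479.0
Write to disk | 12.6 | NA
Total Time | 84.5 | 1,547.0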

used as the test data were not included in the VesselTable. The VoyageTable stores voyage-specific information of the AIS data such as Voyage ID, Destination, Cargo, Draught, ETA, Start Time, End Time, and MMSI. Voyage ID is set as the row key and the remainder are set as column keys. The schema of the BroadcastTable is as in Section III. It stores all types of the AIS data such as static and dynamic information as well as voyage-specific information. The dynamic information contains vessel position, SOG, COG, Heading, ROT, DateTime, and Status.

Fig. 6. Track Lines of Danish and Japanese Cargo Ships

is compulsory because it is required to visualize the vessel trajectories. Meanwhile, each of the secondary parameters can be recognized as contributing to a certain degree to forming the pattern of vessel tracks. In this case, the nationality parameter is found to have a significant role. For example, Fig. 6 clearly indicates that Japanese cargo ships visited only the Long Beach Harbor, while those from Denmark mostly ended up berthing at Los Angeles Port. This means that a cargo ship's moving pattern can be generally identified by its nationality.

D. Vessel Tracks Visualization

Visualization of vessel tracks was created using two different methods. The first method was querying the BroadcastTable on HBase to retrieve each ship's trajectory in WKT format. The trajectories contain not only the ship coordinate / location information but also the ship's attributes such as MMSI, IMO number, call sign, name, type, and so forth. The second one was using the AIS TrackBuilder designed for ESRI ArcGIS Desktop 10.1 provided by MarineCadastre.gov. AIS TrackBuilder is a tool that works on ESRI ArcGIS Desktop 10.1 to convert a collection of point features into track lines, and is designed to work with data available from MarineCadastre.gov. The software was executed as is on the Windows PC used as one of the HBase clients.
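As an illustration of the first method's output format, the following minimal Python sketch turns time-stamped position reports into a WKT LineString. The input layout, a list of (timestamp, longitude, latitude) tuples, and the function name `to_wkt_linestring` are assumptions for this example; retrieval from HBase and the attachment of vessel attributes are omitted.

```python
def to_wkt_linestring(points):
    """Build a WKT LINESTRING from (timestamp, lon, lat) tuples.

    Points are sorted by timestamp first, mirroring the time-ordered tracks
    described in the text. WKT coordinates are written as 'x y', that is,
    longitude followed by latitude.
    """
    ordered = sorted(points, key=lambda p: p[0])
    if len(ordered) < 2:
        raise ValueError("a LineString needs at least two points")
    coords = ", ".join(f"{lon} {lat}" for _, lon, lat in ordered)
    return f"LINESTRING ({coords})"


# Reports arriving out of order are put back into time order.
track = to_wkt_linestring([
    (120, -118.2, 33.71),
    (60, -118.21, 33.7),
])
```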

E. Movement Prediction

The movement prediction was conducted according to the algorithm in Section III. As indicated by the visual observation of the cargo ship trajectories, the nationality of a cargo ship significantly affected its track pattern. In order to show the extent of the influence of the nationality parameter on the pattern of ship movement, three movement prediction experiments were conducted: (1) Movement prediction with nationality and the other parameters but without draft. (2) Movement prediction with draft and the other parameters but without nationality. (3) Movement prediction using all of the seven parameters including nationality and draft. The result is shown in Fig. 7. The line chart displays the prediction accuracy for an interval of ten minutes into the future from the current time. The horizontal axis of the chart shows the distance in meters between the predicted position and the target's real position, whereas the vertical axis indicates the percentage of the predicted positions that fall within a certain range of the real position. The prediction accuracy line graph indicates that more predicted positions fall near the real position when the ship nationality parameter is used. In Fig. 7, prediction with the nationality parameter results in about 34 percent of the predicted positions falling within 500 meters of the target's real position, while prediction without the nationality parameter yields merely 20 percent.
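The accuracy measure described above, the share of predictions that fall within a given distance of the real position, can be sketched in Python as follows. The paper does not state which distance formula was used; the haversine great-circle distance with a mean Earth radius of 6,371 km is an assumption here, as are the function names.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def share_within(predicted, actual, threshold_m):
    """Fraction of predictions lying within threshold_m of the real position.

    predicted and actual are equally long lists of (lat, lon) pairs.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(
        1 for (plat, plon), (alat, alon) in zip(predicted, actual)
        if haversine_m(plat, plon, alat, alon) <= threshold_m
    )
    return hits / len(predicted)


# Two predictions: one exact, one a full degree of latitude (about 111 km) off.
ratio = share_within(
    [(33.7, -118.2), (33.7, -118.2)],
    [(33.7, -118.2), (34.7, -118.2)],
    threshold_m=500.0,
)
```

Evaluating `share_within` for a range of thresholds (500 m, 1000 m, and so on) would give the kind of distance-versus-percentage curve that Fig. 7 reports.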

The two methods were implemented and compared. The result of the comparison is displayed in Table II. The method using HBase clearly outperformed the one using the desktop application. To generate the track lines of all vessels navigating through the monitored area during July 2009, the first method took only around 72 seconds to retrieve data from HBase, sort the position information by time, and embed the vessel attributes to be written to local storage as a WKT file. On the other hand, the second method took about 18 times longer in total (1,547.0 seconds against 84.5 seconds). The possible reasons are: (1) The second method took too much time reading input data from local storage because the process is constrained by the I/O speed of the disk drive. (2) The AIS TrackBuilder application seems to separate the process of building the track lines from that of embedding the vessel attributes. It builds the tracks first, then reads the vessel attributes from local disk and adds them to the generated tracks, and finally writes the tracks to local storage. In contrast, the HBase client retrieved the complete AIS information including vessel attributes, sorted it, and simply wrote the position information and vessel attributes to local disk. Through observation of the visualized vessel trajectories, the contribution of the parameters mentioned in Section III in determining vessel moving patterns can be evaluated. Based on this visual observation, it is clear that the primary parameter, which is the spatiotemporal element of the AIS data,

In addition to the prediction accuracy and the influence of the contributing parameters, the time spent on each future position prediction was also measured. The result indicated that the time needed to predict the future position of a given target ship varied when the prediction was repeated at several different times. The measured times ranged from 1.2 to 7.5 seconds, with an average of 2.4 seconds for predicting one target ship. This inconsistency may be caused by the experimental circumstances, such as: (1) The communication between the clients, which send the prediction query to HBase, and the HBase region servers, which process the query and send the result back to the clients, is carried out over a Local Area Network shared by multiple users. (2) The machines used in the experiment were not dedicated to this research; they were common machines used by multiple users. (3) Apache HBase has a caching functionality that may affect the processing speed. The network traffic conditions, the machine load, and the HBase caching behavior at the time of each experiment may therefore have contributed to the inconsistency of the prediction times. However, further research is needed to examine this hypothesis.

Fig. 7. Position Prediction Accuracy (10 minutes from the current time). [Line chart titled "Prediction Accuracy for Interval = 10 minutes". Horizontal axis: Distance of Predicted Position from Real Position (Meter), 0 to 3500. Vertical axis: Number of Predicted Position / Total Number of Ships Predicted (%), 0 to 8. Series: Draft without Nationality; Nationality without Draft; Both Nationality and Draft.]

V. CONCLUSION AND FUTURE WORK

A simple method of predicting vessel behavior is proposed that utilizes the features of Apache HBase for processing large data sets. To examine the feasibility and validity of the method, experiments were carried out using real AIS data of ships navigating through the waters near Los Angeles and Long Beach Harbor over a period of two years. Based on the results of the experiments, the proposed method provides two benefits: (1) The method substantially outperformed the traditional GIS desktop application in processing a massive amount of AIS data to generate a visualization of vessel trajectories. (2) By developing and implementing only a simple algorithm, the method achieves an acceptable result in predicting ship movement over a relatively wide area of open sea. However, much work remains, because all parameters used in the ship movement prediction are currently treated as having equal weight. Therefore, the future work for this research is to develop a more advanced algorithm in which each parameter used in the prediction is assigned an appropriate weight so that the accuracy of the prediction can be improved.

REFERENCES

[1] Helen Sun and Peter Heller, Oracle Information Architecture: An Architect’s Guide to Big Data. Redwood Shores, CA: Oracle, August 2012.
[2] Paul C. Zikopoulos, Chris Eaton, Dirk deRoos, Thomas Deutsch, and George Lapis, Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. USA: McGraw-Hill, 2012.
[3] Nick Dimiduk and Amandeep Khurana, HBase in Action. Shelter Island, NY: Manning Publications Co., 2013.
[4] Premalatha Sampath and David Parry, “Trajectory Analysis using Automatic Identification System in New Zealand Waters”, International Journal of Computer and Information Technology, Vol. 2, pp. 132–136, 2013.
[5] Wayan Mahardhika Wijaya and Yasuhiro Nakamura, “Proposal of Dynamic Navigational Hazard Map Using Big Data Processing Technologies”, in press.
[6] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler, “The Hadoop Distributed File System”, IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010.
[7] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, “The Google File System”, SIGOPS Operating Systems Review, 37(5):29–43, 2003.
[8] Jeffrey Dean and Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, Communications of the ACM, 51(1):107–113, 2008.
[9] Ariel Cary, “Scaling Geospatial Searches in Large Spatial Databases”, Florida International University, 2011.
[10] Nathan Thomas Kerr, “Alternative Approaches To Parallel GIS Processing”, Arizona State University, December 2009.
[11] Shubin Zhang, Jizhong Han, Zhiyong Liu, Kai Wang, and Zhiyong Xu, “SJMR: Parallelizing Spatial Join with MapReduce on Clusters”, IEEE International Conference on Cluster Computing and Workshops, 2009.
[12] Abhisek Sagar, “Large Spatial Data Computation on Shared-Nothing Spatial DBMS Cluster via MapReduce”, Indian Institute of Technology, June 2012.
[13] Lars George, HBase: The Definitive Guide. O’Reilly Media, Inc., 2011.
[14] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, “Bigtable: A Distributed Storage System for Structured Data”, OSDI’06: Seventh Symposium on Operating System Design and Implementation, 2006.
[15] Tom White, Hadoop: The Definitive Guide, 3rd edition. North Sebastopol, CA: O’Reilly Media, Inc., 2012.
[16] Shoji Nishimura, Sudipto Das, Divyakant Agrawal, and Amr El Abbadi, “MD-HBase: A Scalable Multi-dimensional Data Infrastructure of Location Aware Service”, International Conference on Mobile Data Management (MDM), 2011.
[17] Gimadeev Sergey, “Methods and Algorithms for Optimizing Access to Spatial Data in the Database HBase”, Donetsk National Technical University, 2012.
[18] International Maritime Organization (IMO), “Guidelines for onboard operational use of shipborne automatic identification systems (AIS)”, Resolution A.917(22), 2002.
[19] B. Ristic, B. La Scala, M. Morelande, and N. Gordon, “Statistical Analysis of Motion Patterns in AIS Data: Anomaly Detection and Motion Prediction”, 11th International Conference on Information Fusion, pp. 1–7, 2008.
[20] Trika Pitana, A. A. Bagus Dinariyana, Ketut Buda Artana, M. Badrus Zaman, and Hilman Persada, “Development of Hazard Navigation Map by Using AIS data”, Journal of Maritime Researches, Vol. 1, No. 1, pp. 43–52, 2011.
[21] Ketut Buda Artana, Dinariyana DP, Masroeri, and Trika Pitana, “Combining AIS Data And Fuzzy Clustering To Measure Danger Score of Ships”, Journal of Maritime Researches, Vol. 1, No. 1, pp. 33–41, 2011.
