Fast decoding of Automatic Identification Systems (AIS) data

Henrik Holm
RISE Viktoria

Niklas Mellegård
RISE Viktoria

Abstract: Decoding large AIS-encoded data sets into clear text is time consuming. This paper details an approach to structuring the innermost part of AIS decoding to increase performance. The method is compared to existing Open Source implementations, as well as to a straightforward example. The proposed approach can increase decoding performance 20 to 30 times compared to these.

Index Terms: AIS, decoding

I. INTRODUCTION

Automatic Identification System (AIS) is a maritime tracking system that allows ships to broadcast their position, current maneuvers, and other information to receivers in their area [5]. One important purpose of AIS is to improve onboard navigational safety by giving the captain access to information about the ship's surroundings even when potential obstacles are obscured from sight or from on-board sensors (e.g. radar). AIS messages are broadcast from ships over VHF radio and can be received by other ships, coastal receivers, and satellites. These receivers can be joined together in networks to cover large areas, such as a country's coastline. Aggregating data from such sources, either by subscription to streaming services¹ or by crowd-sourcing², will over time result in large volumes of historical AIS data. When analysed, stored historical AIS data can provide valuable information, for instance about traffic patterns [10] and trends in shipping operations [6], or to measure compliance with nautical regulations [4].

The broadcast format of AIS was designed to be carried on low-bandwidth transmissions; the encoded format is therefore compact and suitable for efficient storage. Even though storage today is relatively cheap, the volume of AIS data that is generated is large and growing: as an illustration, approximately 1.5 GB of encoded AIS messages³ are generated in Swedish territorial waters per day. Analysing AIS data from a larger geographical area or time span has many of the characteristics of Big Data [3]; the volume of data poses challenges in terms of both storage and processing with standard tools.

While there are several Open Source libraries for decoding AIS data, they were not necessarily designed to handle large data volumes efficiently. Instead, many of the freely available libraries are intended for decoding a live AIS data stream, in which case the decoding rate is not the limiting factor on a modern computer. In light of the challenges described above, the use case considered here is to keep the data encoded and decode only the subsets of interest, instead of storing all data in decoded form. This exploits the compact encoded format to minimize storage requirements, but it calls for a highly optimized decoding algorithm to minimize the time spent on decoding. In this paper we focus on the innermost part of AIS decoding, since it is time consuming and, if optimized, has a large impact on the overall running time. More specifically, in this paper we show:
• How to structure the AIS data in memory, including an example in C (Section III)
• How to retrieve data fields, also including an example in C (Section III)
• That the above increases performance by a factor of 20 to 30 compared to a straightforward approach (Section IV)
• That the above increases performance by factors of 19 and 34 compared to two existing Open Source libraries (Section IV)

1 E.g. Swedish Maritime Administration - http://www.sjofartsverket.se/ais
2 E.g. AISHub - http://www.aishub.net
3 AIS messages as NMEA sentences

II. BACKGROUND AND APPROACH

Encoded AIS is received as NMEA 0183 sentences. The AIS data payload in each NMEA sentence consists of ASCII-armoured six-bit bytes⁴ in network byte order (i.e., big-endian) [8][5]. That is, only six bits of each byte carry data, and multi-byte values are sent most significant byte first, which is the reverse of the byte order used on modern Intel-based computers. Decoding AIS consists of:
1. Removing the ASCII armour
2. Converting the resulting six-bit bytes
3. Extracting the fields of information

To dearmour the data, the value⁵ 48 is subtracted from every byte. However, the eight values from 88 to 95 (ASCII characters "X" to "_") are not allowed according to the standard, so if a byte value is larger than 87, 56 (i.e., 48 + 8) is subtracted from that byte instead.
4 Throughout this paper, "six-bit bytes" refers to bytes in which only six of the available eight bits are used. Analogously, "eight-bit bytes" means that all eight bits are used.
5 All values in this paragraph are decimal.
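The dearmouring rule above, together with locating the payload inside its NMEA sentence, can be sketched in C as follows. This is a minimal, hypothetical sketch rather than the paper's code: the names nmea_payload and dearmour_char are introduced here for illustration, the sketch assumes an AIVDM/AIVDO sentence whose payload is the sixth comma-separated field, and it neither verifies the checksum nor reassembles multi-sentence messages.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: locate the AIS payload in a sentence such as
   "!AIVDM,1,1,,B,<payload>,0*hh", where the payload is the sixth
   comma-separated field. Sets *len and returns a pointer into the sentence,
   or NULL if the sentence has too few fields. The checksum is NOT checked. */
static const char *nmea_payload(const char *sentence, size_t *len) {
    const char *p = sentence;
    for (int commas = 0; commas < 5; commas++) {
        p = strchr(p, ',');
        if (p == NULL) return NULL;
        p++;                               /* step past the comma */
    }
    const char *end = strchr(p, ',');      /* payload ends at the next comma */
    if (end == NULL) return NULL;
    *len = (size_t)(end - p);
    return p;
}

/* Hypothetical helper: dearmour one character using the rule in the text
   (decimal values): subtract 48, or 56 if the value is larger than 87.
   Returns 0..63, or -1 for characters outside the allowed ranges 48..87 and
   96..119 (this rejects "X".."_", i.e. 88..95). */
static int dearmour_char(char ch) {
    int c = (unsigned char)ch;
    if (c >= 48 && c <= 87)  return c - 48;   /* '0'..'W' -> 0..39  */
    if (c >= 96 && c <= 119) return c - 56;   /* '`'..'w' -> 40..63 */
    return -1;
}
```

For an illustrative sentence whose payload begins with "1", dearmour_char returns 1 for that first character; the first six bits of an AIS payload carry the message type. Returning -1 for disallowed characters, instead of blindly subtracting, lets a caller reject corrupted input early.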

Conversion is the process of removing the unused bits (i.e., packing the six-bit bytes into eight-bit bytes). The converted bytes represent binary-encoded fields of data, ranging from a single bit to several bytes. Given a start bit and an end bit, extracting a field means masking out the bits that represent that field and loading them into a variable. There are 27 different message types [5], each with its own decoding scheme specifying which bits carry which field and of which data type. An IALA document further defines technical aspects of the messages and their content [1], but it is not publicly available, and the authors of this paper have not read it. The approach to speeding up decoding consists of the following three parts:
A. Use a lookup table to dearmour each incoming byte.
B. Process several bytes at a time instead of copying one bit at a time from the input bytes.
C. Store the converted bytes in reverse order, that is, put the first incoming byte at the end of the result buffer.
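Parts A and B can be illustrated with a small C sketch. It is not the authors' Section III implementation: the names dearmour_lut, init_dearmour_lut and convert_blocks are hypothetical, and converting four input characters per iteration is just one assumed way to "process several bytes at a time".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Part A (sketch): 256-entry table mapping every possible input byte to its
   six-bit value, or -1 for bytes the standard does not allow. */
static int8_t dearmour_lut[256];

static void init_dearmour_lut(void) {
    memset(dearmour_lut, -1, sizeof dearmour_lut);  /* byte 0xFF == -1 as int8_t */
    for (int c = 48; c <= 87; c++)  dearmour_lut[c] = (int8_t)(c - 48);  /* '0'..'W' */
    for (int c = 96; c <= 119; c++) dearmour_lut[c] = (int8_t)(c - 56);  /* '`'..'w' */
}

/* Part B (sketch): turn four armoured characters (4 x 6 = 24 bits) into three
   output bytes per iteration instead of copying one bit at a time. Output is
   big-endian: bit 0 of the stream is the most significant bit of out[0].
   Returns the number of bytes written, or -1 on an invalid character or a
   too-small output buffer. */
static int convert_blocks(const char *in, size_t n_in, uint8_t *out, size_t out_sz) {
    size_t n_out = (6 * n_in + 7) / 8;
    if (out_sz < n_out) return -1;

    size_t i = 0, o = 0;
    for (; i + 4 <= n_in; i += 4, o += 3) {
        int a = dearmour_lut[(uint8_t)in[i]];
        int b = dearmour_lut[(uint8_t)in[i + 1]];
        int c = dearmour_lut[(uint8_t)in[i + 2]];
        int d = dearmour_lut[(uint8_t)in[i + 3]];
        if ((a | b | c | d) < 0) return -1;     /* at least one entry was -1 */
        uint32_t w = ((uint32_t)a << 18) | ((uint32_t)b << 12) |
                     ((uint32_t)c << 6)  |  (uint32_t)d;
        out[o]     = (uint8_t)(w >> 16);
        out[o + 1] = (uint8_t)(w >> 8);
        out[o + 2] = (uint8_t)w;
    }

    /* Tail: 1 to 3 leftover characters, left-aligned in a 24-bit word */
    if (i < n_in) {
        uint32_t acc = 0;
        int nbits = 0;
        for (; i < n_in; i++) {
            int v = dearmour_lut[(uint8_t)in[i]];
            if (v < 0) return -1;
            acc = (acc << 6) | (uint32_t)v;
            nbits += 6;
        }
        acc <<= 24 - nbits;
        for (int k = 0; o < n_out; k++, o++)
            out[o] = (uint8_t)(acc >> (16 - 8 * k));
    }
    return (int)n_out;
}
```

For example, after init_dearmour_lut(), converting "1000" gives the three bytes 0x04 0x00 0x00, because the six-bit value 1 occupies bits 0 to 5. Groups of four characters are convenient because 24 is the smallest bit count divisible by both 6 and 8, so each iteration emits whole bytes without carrying state to the next one.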

After dearmouring, the string of six-bit bytes is converted into eight-bit bytes. In this version, conversion is done by copying every bit, one at a time, from the source string into a destination string, using two position indices, one for each string. Below is a straightforward approach to converting a string of six-bit bytes:

    #include <string.h>

    static int convert_simple(const char* input, int in_bytes, unsigned char* output, int output_sz){
        // The output needs ceil(6*in_bytes/8) bytes and must start zeroed,
        // since bits are OR-ed into it
        int out_bytes = (6*in_bytes + 7)/8;
        if (output_sz < out_bytes) return 0;
        memset(output, 0, out_bytes);
        for(int i = 0; i < 6*in_bytes; i++){
            // Indices into the input and output buffers
            int in_byte = i/6;
            int in_bit = i%6;
            int out_byte = i/8;
            int out_bit = i%8;
            // Get and dearmour the whole byte (subtract 56 above 87, else 48)
            int v = input[in_byte];
            v = v > 87 ? v - 56 : v - 48;
            // Shift right to the current bit
            v = (v >> (5 - in_bit)) & 0x01;
            // Shift left to the output location and set the bit
            output[out_byte] |= (unsigned char)(v << (7 - out_bit));
        }
        return 1;
    }

The data fields can be extracted after conversion. Each field uses only the minimum number of bits needed to store the data with the desired precision. Few fields occupy whole bytes, and in most cases a field neither starts nor ends on a byte boundary, so shifting and masking are usually necessary. As an example, to retrieve a field starting at bit 6 and ending at bit 13, byte 1 (containing bits 6 and 7) needs to be shifted left 8 steps, and byte 2 (containing bits 8 to 15) is assigned as the low byte. The resulting two bytes then need to be shifted right two steps, to remove bits 14 and 15, and masked to remove bits 0 to 5 (see Fig. 1 for bit indices).

Fig. 1. Dearmoured and converted bytes, including bit positions, in Version A. Converted 'Byte 1' is stored at the lowest memory address.

The example below shows a straightforward approach to retrieving values. A byte is loaded into the low end of the result; for each succeeding byte, the already loaded bytes are shifted left. Finally, the result is shifted right if there are unused bits at the far right, and then masked.

    static int retrieve_field_simple(const unsigned char* input, int startbit, int endbit){
        int startbyte = startbit/8;
        // Number of bytes the field touches (2 for bits 6 to 13)
        int nr_bytes = endbit/8 - startbyte + 1;
        // Values are never more than 4 bytes, but an unaligned 32-bit field
        // can touch 5 bytes, so accumulate in 64 bits
        unsigned long long v = 0;
        for (int i = 0; i < nr_bytes; i++) {
            // unsigned, so bytes >= 0x80 are not sign-extended
            unsigned char b = input[startbyte + i];
            // Make room for the next byte and load it in the low end
            v = v << 8;
            v |= b;
        }
        // Shift out the unused bits to the right of the end bit
        v = v >> (7 - endbit%8);
        // Clear the bits to the left of the field
        v = v & ~((~(0ULL)) << (endbit - startbit + 1));
        return (int)v;
    }
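Part C of the approach is only described in words in this section. One possible reading, sketched below under explicit assumptions, is that reversing the converted bytes lets a little-endian host fetch any field with a single unaligned 8-byte load, one shift and one mask, instead of the byte-by-byte loop of retrieve_field_simple. The names store_reversed and field_reversed are hypothetical; the sketch assumes a little-endian host, fields of at most 32 bits, and a reversed buffer padded with at least 7 zero bytes so the load never reads past its end. It is not the authors' Section III code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical part C sketch: put conv[0] at rev[n-1], conv[1] at rev[n-2], ...
   The caller must allocate rev with at least n + 7 bytes, the extra ones zeroed. */
static void store_reversed(const uint8_t *conv, size_t n, uint8_t *rev) {
    for (size_t i = 0; i < n; i++)
        rev[n - 1 - i] = conv[i];
}

/* Read rev as one little-endian integer: stream bit b (bit 0 = MSB of conv[0])
   then has significance 8*n - 1 - b, so field [startbit, endbit] starts at
   significance lo = 8*n - 1 - endbit. ASSUMES a little-endian host. */
static uint32_t field_reversed(const uint8_t *rev, size_t n, int startbit, int endbit) {
    int width = endbit - startbit + 1;
    assert(width >= 1 && width <= 32);
    assert(endbit >= 0 && (size_t)endbit < 8 * n);
    size_t lo = 8 * n - 1 - (size_t)endbit;
    uint64_t w;
    memcpy(&w, rev + lo / 8, sizeof w);   /* one unaligned-safe 8-byte load */
    w >>= lo % 8;                          /* drop bits right of the field   */
    return (uint32_t)(w & ((1ULL << width) - 1));  /* keep `width` bits      */
}
```

With the converted bytes 0x03 0xFC 0x00, the field from bit 6 to bit 13 in the example above comes out as 0xFF: the reversed buffer reads 0x00 0xFC 0x03, the load starts at byte 1, and the right shift by 2 drops bits 14 and 15. On a big-endian host the loaded word would need a byte swap first.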