
A Metadata Encoding for Memory-Constrained Devices

Farha Ali, Yvon Feaster, Sally K. Wahba, Jason O. Hallstrom

School of Computing, Clemson University, Clemson, SC 29634-0974

{fali, yfeaste, sallyw, jasonoh}@cs.clemson.edu

ABSTRACT

With the broad applicability of wireless sensor networks across fields, it is desirable to develop self-describing sensor nodes that can operate in a plug-n-play manner. In this paper, we present MoteML, a metadata encoding suitable for storage on memory-constrained devices, designed in support of this goal. MoteML is consistent with Sensor Web Enablement’s [23] Sensor Model Language (SensorML). More specifically, while MoteML does not conform to the SensorML schema, it can be translated into SensorML and vice versa. This paper explores the available solutions for storing self-describing information on memory-constrained sensor nodes and presents the design of MoteML. MoteML is a text-based encoding that captures a subset of SensorML in a template-based structure. This text data is then compressed using available text compression techniques. The resulting file is small enough to be stored on a memory-constrained embedded device.

Categories and Subject Descriptors

C.2.0 [Computer-Communication Networks]: Data communications; E.2 [Data Storage Representation]: Composite Structures; E.4 [Coding and Information Theory]: Data compaction and compression

General Terms

Design, Standardization, Wireless Communication

Keywords

Embedded networks, sensor networks, metadata, SensorML, MoteML

1. INTRODUCTION

A wireless sensor network (WSN) is a computer network that monitors and records sensed data. WSNs are used in monitoring civil infrastructure [1], aiding in disaster response [10], monitoring water resources [8], and monitoring health conditions, such as Parkinson’s disease and strokes [15]. A WSN typically consists of a set of deployed sensor nodes, middleware to collect data from the nodes, and a database to store the data [17, 25, 31]. For the data to be meaningful across application layers, potentially developed by different vendors, it is beneficial to represent the data in a standard format. Moreover, to fully understand the data collected from a WSN, metadata is necessary. This includes information about where the data was collected, any constraints, and the units of measurement. This, too, should be stored in a standard format.

In this paper, we focus on the Sensor Web Enablement (SWE) [23] standards introduced by the Open Geospatial Consortium (OGC) [19] and used to support in-network discovery of sensor nodes and decoding of the data they collect. Sensor Model Language (SensorML) [21] is SWE’s standard for describing sensor systems. SensorML is encoded using XML, and its schema provides a rich collection of elements that can be used to represent both data and metadata.

The natural question that arises when using SensorML is where to store the SensorML file for each node. One solution is to store the file in an external repository. When a data packet is received from a node, the corresponding file is fetched and used to interpret the data. Adopting this solution raises a number of new questions: When should the file be stored in the repository? How is the repository kept consistent with the deployed network? An alternative is to store SensorML files within each node. But this is difficult. On the one hand, a typical sensor node relies on a microcontroller that offers 64K of ROM, 4K of RAM, and 2K of EEPROM storage [2]. On the other hand, SensorML files are expressed using XML and are commonly on the order of 10K-30K in size, too large to be stored on a sensor node.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. 49th ACM Southeast Conference, March 24–26, 2011, Kennesaw, GA, USA. Copyright 2011 ACM 978-1-4503-0686-7/11/03 ...$10.00.
There are two possible solutions to consider: SensorML files may be (i) saved in the Transducer Electronic Data Sheet (TEDS) format [12] or (ii) compressed using XML compression techniques. The first approach presents problems. First, TEDS stores sensor information in fixed-size binary blocks. SensorML is a text-based standard; its data cannot be stored in fixed-size blocks. Second, TEDS stores sensor data in templates, the structure of which varies according to sensor type. In contrast, SensorML uses one general element for storing information about a sensor. Hence, it is difficult to map all the different types of TEDS sensor templates to a single SensorML element. The second approach, too, has problems. SensorML files compressed using this method are still too large to be stored within a typical sensor node. For example, using available XML compression utilities, a 10K-30K SensorML file can be compressed to a size in the range of 1K to 3K, which is potentially too large to be stored in a sensor node’s 2K EEPROM.

In this paper, we describe a metadata encoding approach that combines the capabilities of both solution candidates. We present MoteML, a text-based encoding technique for storing self-describing sensor data on memory-constrained sensor nodes. The text encoding can be compressed using standard compression techniques, and the resulting files are small enough to be stored in a sensor node’s EEPROM and still be translated into SensorML.

The remainder of the paper is organized as follows: Section 2 presents background information on SensorML, TEDS, and XML compression techniques. Section 3 describes MoteML. Section 4 details the implementation of tools to support MoteML. Section 5 provides an evaluation of MoteML. Section 6 discusses elements of related work. Finally, Section 7 concludes with a discussion of future work.

2. BACKGROUND

MoteML is designed to capture a key subset of SensorML using ideas adapted from TEDS and available XML compression schemes. Subsection 2.1 introduces SensorML. Subsection 2.2 presents TEDS, a standard for storing information about a sensor’s observational capabilities. Subsection 2.3 presents an overview of some standard XML compression techniques.

2.1 SensorML

The Open Geospatial Consortium (OGC) [19] is an international organization that develops standards for geospatial and location-based services. Sensor Web Enablement (SWE) [23] is one of many suites of standards developed by the OGC. SWE consists of a group of encoding and web service standards for locating sensor nodes, as well as accessing and understanding the data they collect. The Sensor Model Language (SensorML) [21] is the SWE encoding standard for describing a process of measurement. SensorML defines a sensor system using several different encoding standards present in the SWE suite. For example, the location of a sensor system is defined using SWE’s Geography Markup Language (GML) [18], and sensor data is encoded using SWE’s Common Data Model [22].

When using SensorML to describe a sensor platform, the System tag refers to the node platform and includes information about its identity, geographical location, and valid time frame for observation gathering. Multiple sensors can be mounted on a sensor platform; they are described in a ComponentList nested within the System tag. ComponentList may contain one or more Component tags, each used to describe an individual sensor hosted by the platform. The sensor description in a Component tag includes a list of inputs, outputs, and parameters. InputList contains information about the physical phenomena being measured, and OutputList contains the names of the outputs and their units of measure. Parameters are defined in SWE’s Common Data Model to specify output constraints, such as the accuracy and range of a given sensor.

The Common Data Model provides a rich collection of elements for describing numerical data values. Example elements include: (i) DataRecord, multiple numerical values grouped together; (ii) Quantity, a single numerical value; and (iii) QuantityRange, a numerical range.

This is only a partial treatment of SensorML. We use only the elements essential to describing the location and observational properties of a sensor system.
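The nesting described above can be pictured as a small object model. The following is only an illustrative sketch in Python: the class and field names are hypothetical and do not reproduce the SensorML schema. The example values are those of the sensor node used later in Section 3.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Quantity:
    """A single numerical value with an optional unit of measure."""
    name: str
    value: float
    uom: Optional[str] = None


@dataclass
class QuantityRange:
    """A numerical range with an optional unit of measure."""
    name: str
    bounds: Tuple[float, float]
    uom: Optional[str] = None


@dataclass
class Component:
    """One sensor hosted by the platform (hypothetical field names)."""
    sensor_id: str
    inputs: List[str] = field(default_factory=list)                # observed phenomena
    outputs: List[Tuple[str, str]] = field(default_factory=list)   # (name, unit)
    parameters: list = field(default_factory=list)                 # Quantity / QuantityRange


@dataclass
class System:
    """The node platform: identity, location, valid time, and its sensors."""
    node_id: str
    latitude: float
    longitude: float
    valid_time: Tuple[str, str]
    components: List[Component] = field(default_factory=list)


# The example node from Section 3, expressed in this simplified model.
temp_sensor = Component(
    sensor_id="T123",
    inputs=["temperature"],
    outputs=[("measuredTemp", "F")],
    parameters=[
        QuantityRange("measurementRange", (-5.0, 35.0), "F"),
        Quantity("accuracy", 0.002),
    ],
)
node = System(
    node_id="SS1234",
    latitude=45.45,
    longitude=-89.12,
    valid_time=("2005-11-21T20:00:00Z", "2011-11-21T20:00:00Z"),
    components=[temp_sensor],
)
```

The model deliberately mirrors only the System, Component, input, output, and parameter concepts named in the text; a real SensorML document carries considerably more structure.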

2.2 Transducer Electronic Data Sheets

IEEE 1451 specifies the Transducer Electronic Data Sheet (TEDS) standards for analog sensors. A TEDS document is used to store information about a sensor’s observational capabilities. The specification of a sensor is stored as a chain of binary blocks, each structured according to a given template. These blocks are separated by a 10-bit section that identifies the template to be used for decoding the data that follows. TEDS data can be stored in EEPROM or referenced externally. To decode TEDS data, templates are referenced from external resources.

There are three main types of TEDS templates: basic templates, standard transducer templates, and calibration templates. A TEDS sensor specification begins with a basic template, a block of 64 bits. It stores sensor information, such as the manufacturer ID, the model number, the version letter, the version number, and the serial number. Typically, the second template is a standard transducer type template. The structure of these templates varies according to sensor type, and they are identified using a numeric identifier. Optionally, the transducer type template can be followed by a calibration template, which is used to specify information about the data provided by the transducer. Calibration templates also vary according to the type of calibration data they store. Finally, a user data section may be included to store custom information about the sensor.
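To make the idea of a fixed-size binary block concrete, the sketch below packs the five basic-template fields into a single 64-bit word and unpacks them again. The field widths (14, 15, 5, 6, and 24 bits) are an assumption taken from common descriptions of IEEE 1451.4 and are not stated in this paper; the bit ordering (first field in the lowest bits) and the example values are likewise illustrative choices, not the standard's wire format.

```python
# Field widths in bits. ASSUMPTION: taken from common descriptions of
# IEEE 1451.4, not from this paper. They sum to 64.
BASIC_TEDS_FIELDS = [
    ("manufacturer_id", 14),
    ("model_number", 15),
    ("version_letter", 5),
    ("version_number", 6),
    ("serial_number", 24),
]


def pack_basic_teds(values: dict) -> int:
    """Pack the five fields into one integer, first field in the low bits."""
    word, shift = 0, 0
    for name, width in BASIC_TEDS_FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack_basic_teds(word: int) -> dict:
    """Inverse of pack_basic_teds: slice each field back out of the word."""
    values, shift = {}, 0
    for name, width in BASIC_TEDS_FIELDS:
        values[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return values


# Hypothetical example values, chosen only to exercise the round trip.
example = {
    "manufacturer_id": 43,
    "model_number": 7115,
    "version_letter": 1,
    "version_number": 2,
    "serial_number": 123456,
}
word = pack_basic_teds(example)
decoded = unpack_basic_teds(word)
```

Every field has a fixed position and width, which is why free-form text such as SensorML content does not map directly onto this kind of block.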

2.3 XML Compression

SensorML is encoded using XML, so the resulting files can be compressed using XML compression utilities. XML compression techniques are classified into two categories: text-based compression and XML-aware compression.

Standard text compression techniques can be used to compress SensorML. They do not, however, make use of a document’s XML structure. Here we briefly discuss two common utilities for text compression: Gzip and Bzip2.

Gzip [9] is based on the DEFLATE [6] algorithm, a combination of Lempel-Ziv 77 (LZ77) [7] and Huffman coding. LZ77 is a lossless, dictionary-based compression algorithm. More concretely, repeated strings of characters are replaced with compact references to earlier occurrences in the input. The dictionary is therefore built implicitly from the data already processed, both during compression and decompression. Huffman coding is an encoding algorithm that uses variable-length code tables to encode source data. DEFLATE uses LZ77 to find repeated strings of characters and then encodes the result using Huffman coding.

Bzip2 [4] is the most recent version of the Bzip utility. It uses the Burrows-Wheeler Transform (BWT) [16], also known as block sorting. BWT permutes each block of input text so that characters with similar contexts are grouped together, which makes the block easier to compress. Bzip2 then applies move-to-front [26] and Huffman coding.

Although both text compression techniques are promising, the average size of a SensorML file describing a sensor node and the sensors mounted on it is in the range of 10K-30K, and the text-compressed versions (not using XML-aware compression) are in the range of 1K to 3K, still too large for EEPROM storage. Accordingly, these techniques alone are not sufficient.
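The size comparison described above can be reproduced with standard library tools. The following minimal Python sketch uses zlib (which implements DEFLATE, the algorithm underlying Gzip) and bz2 to compress a document and compare the results with a 2K EEPROM budget. The sample document is a hypothetical stand-in, not real SensorML, so the sizes it yields say nothing about the 1K to 3K figures reported in the text.

```python
import bz2
import zlib

EEPROM_BYTES = 2 * 1024  # the 2K EEPROM figure quoted in the introduction


def compressed_sizes(text: str) -> dict:
    """Return the raw and compressed sizes (in bytes) of a text document."""
    raw = text.encode("utf-8")
    return {
        "raw": len(raw),
        "deflate": len(zlib.compress(raw, 9)),   # LZ77 + Huffman coding
        "bzip2": len(bz2.compress(raw, 9)),      # BWT + move-to-front + Huffman
    }


def fits_in_eeprom(size: int, capacity: int = EEPROM_BYTES) -> bool:
    """True if a file of the given size fits within the EEPROM capacity."""
    return size <= capacity


# Hypothetical XML-like sample; a real SensorML file would be used instead.
sample = "<System>" + "".join(
    f"<Component id='S{i}'><output uom='F'>temp{i}</output></Component>"
    for i in range(40)
) + "</System>"

sizes = compressed_sizes(sample)
report = {name: fits_in_eeprom(size) for name, size in sizes.items()}
```

Running the same functions over an actual SensorML file would show whether the general-purpose compressors alone bring it under the EEPROM limit.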

SS1234,2005-11-21T20:00:00Z,2011-11-21T20:00:00Z,45.45,-89.12#

Listing 1: Platform Template for Example Sensor System

XML-aware compression techniques are the next option to consider. These techniques are classified on the basis of how they make use of the document structure. Popular techniques and tools include XMill [14], XGrind [29], XComp [13], XWRT [28], and XMLPPM [5]. XMill separates the structure of an XML document from the data it contains. Structure and data are saved separately and then compressed using the most suitable text compression algorithm. XGrind does not separate data from structure. Instead, it saves XML tags and data using a dictionary-based encoding scheme. Character data is compressed using semi-adaptive Huffman coding. XComp parses an XML document, reorganizes its contents, and then sends it to a compression tool. XWRT is similar to XMill; it prepares a dictionary of frequently appearing strings (e.g., XML tags) and replaces them with references into the dictionary. Character data is then passed to a general text compressor. XMLPPM uses the prediction by partial matching (PPM) method, which uses the previous symbols in the symbol stream to predict the next symbol.

Although the above-mentioned algorithms are suitable for XML files in general, further compression can be achieved by exploiting the fact that the structure of a SensorML document need not be saved during compression.
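The idea of separating structure from data can be illustrated with a few lines of Python. This is a simplified sketch in the spirit of the XMill description above, not XMill's actual container format: the function names and the "#" placeholder are hypothetical, and attributes and trailing text are ignored for brevity.

```python
import xml.etree.ElementTree as ET
import zlib


def split_structure_and_data(xml_text: str):
    """Return (structure, data): tag events in document order, and text values."""
    structure, data = [], []

    def walk(node):
        structure.append(f"<{node.tag}")      # element opens
        text = (node.text or "").strip()
        if text:
            structure.append("#")             # placeholder: a value goes here
            data.append(text)
        for child in node:
            walk(child)
        structure.append(">")                 # element closes

    walk(ET.fromstring(xml_text))
    return structure, data


def compress_separately(xml_text: str):
    """Compress the structure stream and the data stream independently."""
    structure, data = split_structure_and_data(xml_text)
    return (
        zlib.compress("\n".join(structure).encode("utf-8")),
        zlib.compress("\n".join(data).encode("utf-8")),
    )


# Hypothetical document used only to show the split.
doc = "<System><id>SS1234</id><lat>45.45</lat></System>"
structure, data = split_structure_and_data(doc)
structure_blob, data_blob = compress_separately(doc)
```

If the structure stream is the same for every document of a given kind, it does not need to be stored with each document at all, which is the observation the final sentence above relies on.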

3. MOTEML

3 Temp Sensor,T123,temperature,realTemp,measuredTemp,F/&dcalibration,linearCalibration,steadyState,3/&measurementRange,F,-5.0 35.0/&gain,,1.0/&offset,,0.0/&paccuracy/&accuracy,Cel,0.002#

Listing 2

MoteML uses a text-based encoding, where a sensor node’s information is represented in fields and arranged in a predefined structure: templates. These templates can be directly related to SensorML elements. The information is then compressed using common text compression techniques. Generating a MoteML file consists of the following steps: First, sensor data is acquired using either an input file or a user interface. Second, the acquired data is saved using the template structure shown in Listings 1 and 2 (and discussed below). Finally, the resulting byte stream is compressed using text compression methods.

In MoteML, templates have been designed only for the fundamental elements of SensorML. With the exception of the Platform template, each template begins with a numeric ID. All MoteML templates end with ‘#’. The Platform template defines the properties of the sensor node on which one or more sensors are mounted. It is expressed as a comma-separated list of fields.

As an example, consider a sensor node geographically located at latitude and longitude coordinates of 45.45 and -89.12, respectively. The sensor node’s ID is SS1234, assigned by the network administrator. Data is collected between the timestamps 2005-11-21T20:00:00Z and 2011-11-21T20:00:00Z.¹ This sensor node has one temperature sensor mounted on it. The sensor’s ID is T123 and its name is Temp Sensor. The sensor measures temperature in °F, with an error tolerance of +/-0.002. The acceptable measured temperature values should be within the range -5 to +35 °F. A gain of 1.0 and an offset

¹ These timestamps are defined using Coordinated Universal Time using a
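The Platform template and the final compression step can be sketched as follows. This is a minimal Python illustration, not the authors' tooling: the function names are hypothetical, the field order is inferred from Listing 1, and the absence of spaces after the commas is an assumption. zlib stands in for whichever text compressor is actually used.

```python
import zlib

TEMPLATE_END = "#"  # every MoteML template ends with '#'


def encode_platform(node_id, start, end, latitude, longitude) -> str:
    """Build a Platform template: a comma-separated list of fields ending in '#'."""
    fields = [node_id, start, end, f"{latitude}", f"{longitude}"]
    return ",".join(fields) + TEMPLATE_END


def decode_platform(template: str) -> dict:
    """Recover the named fields from a Platform template string."""
    if not template.endswith(TEMPLATE_END):
        raise ValueError("template must end with '#'")
    node_id, start, end, lat, lon = template[:-1].split(",")
    return {
        "node_id": node_id,
        "start": start,
        "end": end,
        "latitude": float(lat),
        "longitude": float(lon),
    }


# The example sensor node described in the text.
platform = encode_platform(
    "SS1234", "2005-11-21T20:00:00Z", "2011-11-21T20:00:00Z", 45.45, -89.12
)

# Final step: compress the byte stream before writing it to EEPROM.
compressed = zlib.compress(platform.encode("ascii"), 9)
restored = decode_platform(zlib.decompress(compressed).decode("ascii"))
```

Because the field order is fixed by the template, no tag names need to be stored; a translator that knows the template can map each position back to the corresponding SensorML element.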
