Eleventh International Conference on Mobile Data Management

Use of data warehouse to manage data from wireless sensors networks that monitor pollinators

Ricardo Augusto Gomes da Costa

Carlos Eduardo Cugnasca

Escola Politécnica, Universidade de São Paulo São Paulo, Brazil [email protected]

Escola Politécnica, Universidade de São Paulo São Paulo, Brazil [email protected]

Abstract—This paper presents the use of a data warehouse as an alternative for managing data collected by Wireless Sensor Networks that monitor bees. Scientific research that uses Wireless Sensor Networks for habitat or animal monitoring usually produces a large amount of data that needs to be analyzed and normalized to help researchers and other people interested in the information. Once managed and compared with information from other sources and systems, these data can contribute to decision processes and seed further experiments. This paper proposes a model to extract, transform and normalize data collected by Wireless Sensor Networks that monitor pollinators and to load it into a data warehouse. Crossing the tabulated data with other sources, such as technical reports, is essential to improve data accuracy and create better data warehouse views. Hence, the data warehouse applied to this context stands out as a useful alternative that helps specialists obtain information for decision processes.

Keywords—wireless sensor network, data warehouse, animal monitor

I. INTRODUCTION

The Amazon is known for its diversity of flora and the large amount of water from regional rivers, and hosts a huge reservoir of natural species of animals and plants, many not yet known to scientists, while others are already endangered due to unauthorized exploitation and smuggling. Among these species in the Amazon region, a group of pollinators stands out, responsible for transferring pollen from one flower to another and therefore contributing to the development of fruits and seeds: bees.

Among the studies involving bees, those that stand out use technology to monitor the habitat of these animals in order to identify behaviors at different times of day, considering variations in temperature, light and humidity, and even to count the number of bees in a hive [1]. For such monitoring, Wireless Sensor Networks are used.

A Wireless Sensor Network (WSN) is a collection of components (nodes and a gateway) equipped with a radio transmitter, processor, sensors and memory, which can be densely distributed in a monitored region of interest. The nodes are able to collect data from different environments and to communicate with each other in order to transfer information over distances greater than the range of each node. The use of WSNs is restricted by energy consumption: nodes are powered by batteries and, depending on use, the energy can last from days to weeks [2].

With the help of a WSN, it is possible to monitor various characteristics of the environments in which bees are present, but these data, alone or simply collected over time, are difficult for users to interpret and, above all, do not contribute to rapid decision-making. For the monitored data to be retrieved productively by interested parties, they must be organized in a repository or database with an easily accessible interface through which the user can view consolidated information and make strategic decisions.

The description above refers to a Data Warehouse (DW), a set of technologies for decision support used by people interested in making decisions quickly and easily. DWs were initially designed to consolidate business data and assist business managers in making decisions [3], as well as to contribute directly to corporate competitiveness [4].

A major contribution of this paper is an alternative for managing data collected by a WSN, based on a model to extract, transform and normalize these data and load them into a DW. The results showed that crossing the tabulated data with other sources, such as technical reports, could improve data accuracy and help to create better data warehouse views. Hence, a data warehouse applied to this context proves to be a useful alternative that helps specialists obtain information for the decision process, which could lead, for example, to adjusting the temperature inside the beehive and observing the effects of humidity outside it.

The remainder of this paper is organized as follows. Section 2 examines related works on animal species monitoring with WSNs and on data extraction and analysis, and highlights the small amount of research in this area that uses a data warehouse to manage data collected by WSNs. Section 3 reviews the technologies and terminology used throughout the paper and presents the products used in the prototype developed. Section 4 presents the proposed architecture, focusing on the process of acquiring data from the WSN and delivering it to the DW. Section 5 shows the results obtained using data collected by a WSN in an experiment with pollinators. Section 6 concludes the paper and outlines future plans, which generalize the solution to other animal and plant species by abstracting it to focus on data from WSNs and the extract-transform-load operation into a DW.

978-0-7695-4048-1/10 $26.00 © 2010 IEEE DOI 10.1109/MDM.2010.72

II. RELATED WORKS

Monitoring animal species with WSNs is presented as a research field with great potential benefits for scientific communities and society as a whole [5] [6] [7] [8]. Paper [8] presents the use of a WSN to observe the behavior of zebras, wild horses and lions in a research center in Kenya. The experiment lasted about a year and monitored an area of thousands of square kilometers. The main purpose of this research was to monitor the animals in situations such as interacting with other animals, of the same species or not, while grazing, in motion or in contact with humans. The animals were equipped with sensor nodes and a GPS unit, the latter to record the location and speed of movement of the animal. These sensors collected data every three minutes, which was transmitted over the network to a temporary station; later, a vehicle would come and collect the data for storage and verification.

In [1], there is a discussion on the use of control networks and wireless sensor networks for the study of bees. It uses a sensor node to monitor the colonies, which measures the internal temperature and humidity, counts incoming and outgoing bees, controls temperature and has a microphone to capture the sounds generated by the bees. Outside, the environmental quantities were observed by a weather station.

To analyze data from a WSN, [9] introduces an approach based on tasking sensor networks through declarative queries. Given a user query, a manager creates a plan for executing the statement. A leader node is necessary to consolidate data from other nodes. Non-leader nodes have a scan operator that reads sensor values periodically and sends them to the leader node. However, aggregation operations remain a problem, since they depend on the leader node, which is exposed to the volatility of the underlying communication layer. TinyDB, presented in [10], is a distributed query processor that runs on each of the nodes in a sensor network, particularly the Berkeley mote platform, on top of TinyOS. Its features include the ability to select, join, project and aggregate data, while also incorporating requirements for reducing power consumption.

Both [9] and [10] focus on an important element of a WSN, the sensor node, either by modifying the sensor node source code or by assigning different responsibilities to this element. This paper proposes a generic approach to extract, transform and load data from a WSN and other related sources into a DW. It focuses on the layer between the WSN and the DW and is independent of the WSN architecture or manufacturer.

Even with a wide variety of applications for WSNs, there is a lack of alternatives for managing the data they collect with a view to decision-making. This differential is presented herein through the DW, a technology that is growing very rapidly and has great appeal for aiding strategic decisions.

III. TECHNOLOGIES

A. Wireless Sensor Network (WSN)

WSNs have been made possible by the rapid convergence of three technologies: microelectronics, wireless communications and micro-electromechanical systems. A WSN is a tool for distributed sensing of phenomena, and for processing and disseminating the collected data and processed information to one or more observers. The potential for observation and control of the real world allows WSNs to be regarded as a solution for various monitoring and control applications, such as environmental monitoring, infrastructure management, industrial monitoring and control, public safety and the environment in general, disaster areas and areas of risk to human life, transportation, medicine and military control [2]. Among the variety of equipment used for WSNs, the devices developed by Berkeley University and commercialized by Crossbow Technology Inc stand out [11]. Sensor nodes belonging to the MicaZ family are used here, as shown in Figure 1. The base station uses the MIB520 USB gateway, shown in Figure 2, and the MIB600 Ethernet gateway, both from Crossbow (XBow).

Figure 1. Sensor node MicaZ – MPR2400 [11].

The gateway is the device responsible for interfacing between the wireless sensor network and a computer, allowing the collected information to be stored in a database, handled by software and displayed on the computer screen. Before reaching the WSN gateway, data travel through several network nodes and are susceptible to interference, which can lead to loss or duplication of information and to errors, caused by overlapping monitoring areas and equipment failure.

Figure 2. MIB 520 – Gateway USB [11].
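The duplication mentioned above can be handled before the data reach the database. The following is a minimal sketch, not the procedure used in the paper: it assumes each reading carries a node identifier and a timestamp (the field names `node_id`, `time` and `temperature_c` are hypothetical) and treats two readings with the same pair as duplicates. Lost readings are not recovered by this sketch.

```python
def deduplicate(readings):
    """Keep the first reading seen for each (node_id, time) pair."""
    seen = set()
    unique = []
    for reading in readings:
        key = (reading["node_id"], reading["time"])
        if key in seen:
            continue  # same node and instant already kept: treat as a duplicate
        seen.add(key)
        unique.append(reading)
    return unique


# Illustrative values only: the second entry is the first one relayed twice.
readings = [
    {"node_id": 1, "time": "2010-01-07 23:45", "temperature_c": 24.3},
    {"node_id": 1, "time": "2010-01-07 23:45", "temperature_c": 24.3},
    {"node_id": 2, "time": "2010-01-07 23:45", "temperature_c": 24.1},
]
unique_readings = deduplicate(readings)
```

Keying on the node and the instant, rather than on the measured value, keeps two nodes that happen to report the same temperature from being merged.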


B. Data warehouse (DW)

A DW is a subject-oriented, integrated, time-variant and non-volatile collection of data that supports the managerial decision-making process [4]. Its main features are:

Subject-oriented: the data reflect a particular view of the company or of the monitored subject, not day-to-day or in-progress operations.

Integrated: data from different origins and of different types are organized in a coherent manner, for example an Oracle DBMS, a PostgreSQL DBMS, and XML and TXT files.

Time-variant: all the data stored in the DW are identified by a specific time interval.

Non-volatile: by definition, the data stored in the DW are stable and are not removed. Updates are rare, so the main operations are loads and queries. This allows a better overview of the business that is managed through the DW.

The Extraction, Transformation and Load (ETL) process is one of the most critical phases in the design of a DW. Figure 3 depicts an ETL process with different sources (two databases and XML and TXT files). The ETL process occurs in three steps:

1) Extraction: the first step is to define the data sources to be consulted. The sources can be databases, XML files, TXT files and any other structure that stores minimally organized data.

2) Transformation: this step standardizes the data according to the type and representation desired in the DW. For example, the Gender field can be represented in different ways in various data sources (M and F, Male and Female, or 0 and 1); these must be made uniform before going to the DW, so that the data are compatible and integrated.

3) Load: the last step is entering the data into the DW itself. This process usually occurs at times of little or no access to the DW, usually in the early hours of the morning, and may be a slow operation, consuming hours, depending on the volume of data to be inserted.
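The three steps above can be sketched with the Gender example. This is only an illustration under stated assumptions: the sources are in-memory lists, the warehouse is a plain list, and the choice that 0 means male and 1 means female is hypothetical, since the text does not say which code maps to which value.

```python
# Assumed mapping: which of 0/1 stands for which gender is not given in the text.
GENDER_CODES = {
    "m": "M", "male": "M", "0": "M",
    "f": "F", "female": "F", "1": "F",
}


def normalize_gender(raw):
    """Transformation: map every source representation to 'M' or 'F'."""
    key = str(raw).strip().lower()
    if key not in GENDER_CODES:
        raise ValueError(f"unknown gender value: {raw!r}")
    return GENDER_CODES[key]


def run_etl(sources, warehouse):
    """Extract rows from each source, transform them, and load them."""
    for rows in sources:                       # 1) extraction
        for row in rows:
            clean = dict(row, gender=normalize_gender(row["gender"]))  # 2) transformation
            warehouse.append(clean)            # 3) load
    return warehouse


# Three hypothetical sources, each with its own representation of the field.
source_a = [{"id": 1, "gender": "M"}]
source_b = [{"id": 2, "gender": "Female"}]
source_c = [{"id": 3, "gender": 1}]
warehouse = run_etl([source_a, source_b, source_c], [])
```

Rejecting unknown values instead of guessing keeps inconsistent source data from silently entering the warehouse.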

Online Analytical Processing (OLAP) allows navigation of the data in a DW, with a structure suited both to querying and to presenting information. OLAP navigation tools can move between different granularities of a cube. Through a process called Drill, the user can increase (Drill down) or decrease (Drill up) the level of detail of the data. For example, a report may be consolidated by state. With Drill down, the data are presented by city, district and so on, down to the lowest level possible. The opposite process, Drill up, causes data to be consolidated at higher levels. Another possibility offered by most OLAP navigation tools is rearranging columns and rows: the user can change their order, delete them or show those that are hidden in the data visualization.

IV. ARCHITECTURE

This section describes the architecture proposed to manage data from a WSN by extracting, transforming and loading this information into a DW. Figure 4 depicts a general model including different data sources. The architecture shows a WSN monitoring bees. This WSN has a gateway that collects and stores data in a database. One example of collected data is organized as shown in Table I, which presents data collected by just one sensor node every 10 seconds. Other sensor nodes collect temperature and humidity from their coverage areas. This model also supports data from other systems, for example a weather station that monitors temperature in some areas. To complete the set of supported data sources, tabulated technical reports are also considered; these could be governmental reports, academic reports or simple technical reports. The data source block in Figure 4 represents all sources supported by the proposed architecture.

The next step is the ETL process, which represents the most important activity. It begins with the Extraction of data from different sources. Figure 5 summarizes this process. From databases, data can be extracted by SQL scripts, usually with the SELECT statement. Assuming that the ETL process is executed once a day, the script has to get all operations performed on the current day and all modifications made to consolidated data.

TABLE I. EXAMPLE OF DATA COLLECTED BY A SENSOR NODE.

Time              Temperature (°C)   Humidity
1/7/2010 23:45    24.3               73
1/7/2010 23:55    24.4               73
1/8/2010 00:05    24.5               72
1/8/2010 00:15    24.6               72
1/8/2010 00:25    24.7               72

Figure 3. ETL Process with heterogeneous sources.

Figure 4. Architecture proposed.
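The daily extraction from a database described above can be sketched as a parameterized SELECT. This is a minimal sketch rather than the script used in the paper: it uses an in-memory SQLite database, and the table name `readings` and its columns are hypothetical. It covers only the readings of a given day, not modifications made to already consolidated data.

```python
import sqlite3


def extract_daily(conn, day):
    """Return all readings whose timestamp falls on `day` (ISO date string)."""
    cursor = conn.execute(
        "SELECT node_id, read_at, temperature_c, humidity "
        "FROM readings WHERE date(read_at) = ? ORDER BY read_at",
        (day,),
    )
    return cursor.fetchall()


# Hypothetical gateway database with illustrative values.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings "
    "(node_id INTEGER, read_at TEXT, temperature_c REAL, humidity REAL)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        (1, "2010-01-07 23:45:00", 24.3, 73),
        (1, "2010-01-07 23:55:00", 24.4, 73),
        (1, "2010-01-08 00:05:00", 24.5, 72),
    ],
)
rows = extract_daily(conn, "2010-01-08")
```

Passing the day as a parameter lets the same script be scheduled once a day without editing the query text.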

Assuming that technical reports hold tabular data, they could be extracted by simple Java code that reads the file and extracts all the information in it. Figure 5 depicts the process described above.

The next step is the Transformation process, which consists of normalizing all data so that each type is expressed in the same unit. As shown in Figure 5, the data could be expressed in different units or time zones. In this case, temperature is in °C and °F, so a conversion to the appropriate unit is necessary. The three tables in Figure 5 depict data collected at different times. The transformation process is responsible for consolidating these data in the same time zone and granularity. The granularity for time could be hours, days, months, etc. A DW usually expresses data from a long period. Hence, a small granularity, such as hour or minute, is usually discarded. Figure 6 shows the normalization process of the example above.

After extracting and transforming the data, it is necessary to Load this information into a DW. A DW is a database modeled with dimensional modeling. According to [12], dimensional modeling (DM) is the name of a logical design technique often used for data warehouses. It is different from, and contrasts with, entity-relation modeling (ER). DM consists of a fact table, which represents the fact that the specialist is interested in observing, and dimension tables, which represent the auxiliary entities. The data stored in the DW cannot be deleted. By concept, only INSERT and UPDATE operations are allowed. After the ETL process, the DW is populated.
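The extraction of a tabular report, the transformation to a common unit, time zone and granularity, and the load into a fact table and a dimension table can be sketched together. This is an illustrative sketch, not the prototype of the paper: the text mentions Java for reading reports, while Python is used here for brevity. The record fields, the CSV layout, the UTC-4 offset, the hourly granularity and the table names `dim_time` and `fact_temperature` are all assumptions.

```python
import csv
import io
import sqlite3
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical tabulated report: UTC timestamps, temperature in °F.
REPORT_CSV = "time,value,unit\n2010-01-08T04:05:00+00:00,77.0,F\n"

# Hypothetical WSN readings: local time (assumed UTC-4), temperature in °C.
WSN_READINGS = [
    {"time": "2010-01-07T23:45:00-04:00", "value": 24.3, "unit": "C"},
    {"time": "2010-01-07T23:55:00-04:00", "value": 24.5, "unit": "C"},
]


def read_report(text):
    """Extraction: parse a tabular report into the same shape as the WSN records."""
    return [
        {"time": row["time"], "value": float(row["value"]), "unit": row["unit"]}
        for row in csv.DictReader(io.StringIO(text))
    ]


def to_celsius(value, unit):
    """Transformation: express every temperature in °C."""
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32.0) * 5.0 / 9.0
    raise ValueError(f"unknown temperature unit: {unit!r}")


def transform(records):
    """Transformation: convert to UTC, truncate to the hour, average per hour."""
    buckets = defaultdict(list)
    for record in records:
        moment = datetime.fromisoformat(record["time"]).astimezone(timezone.utc)
        hour = moment.strftime("%Y-%m-%d %H:00")
        buckets[hour].append(to_celsius(record["value"], record["unit"]))
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


def load(conn, hourly):
    """Load: one time dimension table and one temperature fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_time "
        "(time_id INTEGER PRIMARY KEY, hour_utc TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_temperature "
        "(time_id INTEGER REFERENCES dim_time(time_id), avg_temperature_c REAL)"
    )
    for hour, average in hourly.items():
        conn.execute("INSERT OR IGNORE INTO dim_time (hour_utc) VALUES (?)", (hour,))
        (time_id,) = conn.execute(
            "SELECT time_id FROM dim_time WHERE hour_utc = ?", (hour,)
        ).fetchone()
        conn.execute("INSERT INTO fact_temperature VALUES (?, ?)", (time_id, average))
    conn.commit()


hourly = transform(WSN_READINGS + read_report(REPORT_CSV))
warehouse = sqlite3.connect(":memory:")
load(warehouse, hourly)
fact_rows = warehouse.execute(
    "SELECT t.hour_utc, f.avg_temperature_c FROM fact_temperature f "
    "JOIN dim_time t ON t.time_id = f.time_id ORDER BY t.hour_utc"
).fetchall()
```

Only INSERT statements are issued against the warehouse tables, in line with the rule above that stored data are not deleted.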

Figure 5. Transformation process.

V. RESULTS

To create a DW schema and manage its data, Mondrian [13], an OLAP server written in Java, was used to enable interaction with very large data sets stored in SQL databases. With Mondrian, it is possible to access data stored in databases using the MDX query language. MDX stands for 'multi-dimensional expressions' [13]. Below is an MDX query used in the context presented herein:

SELECT {[Measures].[Bee count], [Measures].[Temperature]} ON COLUMNS,
{[Time].[2010].[Q1]} ON ROWS
FROM [Bees]
WHERE [Humidity].[25]

This MDX query returns the bee count and temperature in a table when humidity is 25% in the first quarter of 2010. Table 2 depicts the results.

TABLE II. EXAMPLE OUTPUT FROM A MDX QUERY.

[Time]               [Measures].[Bee Count]   [Measures].[Temperature]
[2010].[Q1]          39                       26
[2010].[Q1].[Jan]    20                       25
[2010].[Q1].[Feb]    19                       24
[2010].[Q1].[Mar]    0                        0

This example demonstrates the ability of a DW to give quick answers to complex questions, such as: How many bees are outside the hive when the air humidity is above 80% and the temperature is below 23 °C? What is the average temperature inside the hive when the number of bees is greater than 50? These are the kinds of questions that specialists are interested in, because the answers could help them make decisions about their experiment or quickly generate an experiment report. Besides that, a DW consolidates old data. Thus, with simple data crossing it is possible to compare indicators such as Bee Count or temperature across different years or decades.

VI. CONCLUSION AND FUTURE WORKS

Animal monitoring represents an important class of sensor network applications. Each animal species has particular characteristics that attract scientists and other professionals to study them. Our focus was on bees, one of the most important pollinators on the planet. The association of WSN and DW is a little-explored research area. However, the benefits of using a DW to manage data collected by a WSN are shown here. Among those that stand out is the possibility of supporting decision-making based on charts and tables, both outputs from the DW. Besides, the ability to cross data to make custom reports helps different levels of users with a wide range of needs.

As future work, an abstraction of this architecture can be devised to help all kinds of experiments that use WSNs. Most experiments with WSNs have trouble with the large amounts of data collected. Searches are conducted by directly accessing the database with SQL commands or through plain screens with tabular data. The DW associated with the architecture proposed here could improve the observation of results, helping interested people make decisions about each experiment.

ACKNOWLEDGMENT

This work was supported in part by FAPEAM, which I would like to thank for the financial support.

REFERENCES

[1] C. Cugnasca, "Redes de controle e redes de sensores sem fio aplicadas às pesquisas com meliponíneos," in Anais do Encontro sobre Abelhas de Ribeirão Preto, São Paulo, Ribeirão Preto, 2008, p. 11.
[2] S. Meguerdichian, F. Koushanfar, M. Potkonjak, and M. B. Srivastava, "Coverage problems in wireless ad hoc sensor networks," in IEEE INFOCOM - Annual Joint Conference of the IEEE Computer and Communications Societies, 2001, pp. 1380-1387.
[3] S. Chaudhuri and U. Dayal, "An overview of data warehousing and OLAP technology," ACM SIGMOD Record, vol. 26, no. 1, pp. 65-74, March 1997.
[4] W. Inmon, Building Data Warehouses, 2nd ed.: Wiley Books, 1997.
[5] A. Mainwaring, D. Culler, J. Polastre, R. Szewczyk, and J. Anderson, "Wireless sensor networks for habitat monitoring," in 1st ACM international workshop on Wireless sensor networks and applications, Atlanta, 2002, pp. 97-99.
[6] N. Ramanathan et al., "Designing Wireless Sensor Networks as a Shared Resource for Sustainable Development," in 1st International Conference on Information and Communication Technologies and Development, Berkeley, USA, 2006.
[7] J. Polastre, "Design and Implementation of Wireless Sensor Networks for Habitat Monitoring," Berkeley University, Berkeley, USA, MSc Thesis 2003.
[8] P. Juang et al., "Energy-efficient computing for wildlife tracking: design tradeoffs and early experiences with ZebraNet," in SIGOPS Oper. Syst. Rev., New York, 2002, pp. 96-107.
[9] Y. Yao and J. Gehrke, "The cougar approach to in-network query processing in sensor networks," in ACM SIGMOD Record, vol. 31, New York, NY, USA, 2002.
[10] S. Madden, M. Franklin, J. Hellerstein, and W. Hong, "TinyDB: an acquisitional query processing system for sensor networks," in ACM Transactions on Database Systems (TODS), vol. 30, New York, NY, USA, 2005, pp. 122-173.
[11] Crossbow. (2010, January) Crossbow Technology: the company. [Online]. www.xbow.com
[12] R. Kimball, "A Dimensional Modeling Manifesto," in DBMS and Internet Systems, San Francisco, 1997, pp. 58-70.
[13] Mondrian. (2010, Feb) Pentaho - Open Source Business Intelligence. [Online]. http://mondrian.pentaho.org/