Building a Front End Interface for a Sensor Data Cloud

Ian Rolewicz, Michele Catasta, Hoyoung Jeung, Zoltan Miklos, and Karl Aberer
Swiss Federal Institute of Technology (EPFL)

Outline
 Background
 Frontend of TimeCloud
 Experiments
 Conclusions

Microsoft SensorMap  Web-based visualization of real-time sensor data

Planetary Skin  NASA–Cisco climate change monitoring platform – $39 billion – an online collaborative platform to process data from satellite, airborne, and sea- and land-based sensors around the globe

Swiss Experiment  Collaborative environmental research project

Large-Scale Sensor Data Management

[Figure: sensors s1–s3 stream time-stamped readings (t1, t2, t3) over the internet into per-sensor tables; readings may be missing ("?") or erroneous (e.g., 36.2 among values around 6.0).]

 Systems are typically distributed, federated

Pitfalls  Users are generally not computer experts – who manages the servers?

 Distributed systems are difficult to upgrade – how to apply a patch or roll out a new version?

 Deterministic, inflexible capacity – what if there are more users or more data this month?

 Complex processing over distributed data is hard – the servers must be running just to obtain the data!

Cloud-Based Sensor Data Management  No maintenance cost  High availability of data  Fast complex data processing – centralized environment

 Elastic, easy to scale up  Easy to patch and upgrade

TimeCloud  A cloud system for massive time-series management – Being developed at the Distributed Information Systems Laboratory, EPFL – Consists of a frontend and a backend

 Basic functionalities – Tables, graphs, password-protected and group-based data sharing

 Advanced built-in support (ongoing) – Detecting/notifying dead sensors, data cleaning – Dynamic metadata creation/join – User-subscribed R/MATLAB execution

 Third-party software (ongoing) – SensorMap, SwissEx Wiki

Backend  Scalable, fault-tolerant – Built upon Hadoop, HBase, and GSN

 Adaptive data storage – Partition-and-cluster (PaC) store

 Model-based cache – Minimize data transmission

 Model-coding join – Fast distributed join using bitmap
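The model-coding join is only named at this level in the slides; as a rough illustration of the general bitmap idea (all names below are invented for the sketch), one node can ship a compact bitmap of its join keys so that the other node filters its rows locally, and only matching rows ever cross the network:

    # Toy bitmap semi-join: encode the left side's keys as a bit array,
    # send the bitmap (not the keys), and filter the right side locally.

    def build_bitmap(keys, universe_size):
        bitmap = bytearray((universe_size + 7) // 8)
        for k in keys:
            bitmap[k >> 3] |= 1 << (k & 7)
        return bitmap

    def bitmap_contains(bitmap, k):
        return bool(bitmap[k >> 3] & (1 << (k & 7)))

    left_keys = [3, 17, 42]
    bitmap = build_bitmap(left_keys, universe_size=1024)  # 128 bytes on the wire

    right_rows = {17: 5.9, 99: 6.1, 42: 36.2}
    matches = {k: v for k, v in right_rows.items() if bitmap_contains(bitmap, k)}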

Frontend

Goals  Simple, intuitive, easy to use  Going beyond just displaying data  Minimize backend workload  Minimize data transmission

Key Approach: Model-Based Processing
• Probabilistic processing
• Error estimation
• Data cleaning
• Prediction
• Interpolation
• Compression
• Fault-tolerance …
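As a minimal toy example of what such processing looks like (not TimeCloud code; the data below is made up), a linear model fitted to raw readings can interpolate a missing value and attach an error estimate to the answer:

    # Fit a linear model to the observed readings, interpolate the gap at
    # t=3, and report the residual spread as a rough error estimate.
    import numpy as np

    t = np.array([1.0, 2.0, 4.0, 5.0])      # timestamps; t=3 is missing
    v = np.array([5.9, 6.1, 6.2, 6.3])      # raw sensor readings

    slope, intercept = np.polyfit(t, v, 1)  # linear regression model
    predict = lambda x: slope * x + intercept

    estimate = predict(3.0)                 # interpolated missing reading
    sigma = np.std(v - predict(t))          # error estimate from residuals
    print(f"t=3 -> {estimate:.2f} +/- {sigma:.2f}")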

Continuous moving queries – e.g., give an (in-car) pollution update every 30 minutes
Aggregate queries – e.g., COx emitted yesterday in Lausanne center

[Architecture figure: mobile sensor data (pollution values) – incomplete, inaccurate, correlated sensor readings – feeds a model-based middle layer with user-defined models, on top of a DBMS that stores the raw sensor values.]

Models
• Regression models (e.g., linear)
• Approximation models (e.g., Chebyshev)
• Correlation models (e.g., GAMPS)
• Probabilistic models (e.g., HMM)
• Interpolation models (e.g., Kriging)
• Signal processing (e.g., DFT)
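For instance, an approximation model doubles as a compressor; a sketch with NumPy's Chebyshev routines (illustrative, not the TimeCloud implementation):

    # Compress 100 raw readings into 7 Chebyshev coefficients and
    # reconstruct the series on demand, with a measurable error bound.
    import numpy as np
    from numpy.polynomial import chebyshev as C

    t = np.linspace(0.0, 1.0, 100)
    v = 6.0 + 0.3 * np.sin(8 * t) + 0.01 * np.random.randn(100)

    coeffs = C.chebfit(t, v, deg=6)         # store 7 numbers, not 100
    v_approx = C.chebval(t, coeffs)         # reconstruct at any timestamp
    max_err = np.max(np.abs(v - v_approx))  # worst-case approximation error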

Model-Based Views in DBs – MauveDB [SIGMOD'06], FunctionDB [SIGMOD'08]

Challenges in MSD

Model-based Processing in Frontend  Model-based views – Approximate results are returned first, instead of actual data – Actual data is fetched only when users ask for it (e.g., via a button in the GUI) – Less data transmission, fast visualization

 Model caching – Cache model parameters – Reuse them when switching from table to graph visualization, and vice versa

 Incremental visualization – Bring only what you see
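Putting the three ideas together, a minimal sketch (the function names and the linear model are assumptions made for illustration, not the actual TimeCloud API):

    # Serve a cached model first; hit the backend for raw data only when
    # the user explicitly asks for full precision.
    model_cache = {}  # sensor id -> (slope, intercept) of a fitted model

    def fetch_model_params(sensor):
        return (0.1, 5.8)  # stand-in for a small backend payload

    def fetch_raw(sensor, t_from, t_to):
        # Stand-in for the expensive backend scan over raw readings.
        return {t: 5.9 + 0.1 * t for t in range(t_from, t_to)}

    def get_series(sensor, t_from, t_to, full_precision=False):
        if full_precision:
            return fetch_raw(sensor, t_from, t_to)
        if sensor not in model_cache:          # model caching
            model_cache[sensor] = fetch_model_params(sensor)
        slope, intercept = model_cache[sensor]
        # Incremental visualization: only the visible window is computed.
        return {t: slope * t + intercept for t in range(t_from, t_to)}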

Implementation  Web-based interface  Displays tables and graphs – Visualizations implemented with Protovis, using its Visualization Zoo gallery for plotting graphs

 Python, with the Django framework and the YUI 2 library

Backend Data Model

 NULLs not stored in HBase → better for sparse data  Column families stored in separate files
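For illustration, a hedged sketch with the happybase HBase client (the table, column-family, and host names are assumptions): readings that do not exist are simply never written, and the two column families land in separate store files:

    # Write only the cells that exist; a missing reading costs nothing,
    # unlike a NULL in a fixed-schema relational row.
    import happybase

    conn = happybase.Connection('hbase-master')   # assumed host
    table = conn.table('sensor_data')             # families: raw, model

    table.put(b's1:t1', {b'raw:value': b'5.9'})
    table.put(b's1:t3', {b'raw:value': b'6.1',
                         b'model:residual': b'0.02'})
    # s1:t2 was never received -> no cell is stored at all.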

Frontend Screenshot  Model-based approximated data

Frontend Screenshot  Full precision

Frontend Screenshot  Model-cached graph plotting

Frontend Screenshot  Other graph plotting

Experiments

Performance Measure  Settings – Testbed: a cluster of 13 Amazon EC2 servers, each with:
• 15 GB memory
• 8 EC2 compute units
• 1.7 TB storage
• 64-bit platform

– One server: HBase Master + frontend
– 12 servers: HBase Region Servers

 Run – 1000 random reads over real sensor data stored in TimeCloud
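The measurement loop was presumably along these lines (a stand-in sketch against an in-memory dict, not the authors' harness):

    # Issue 1000 random point reads and record per-read latency.
    import random, time

    store = {f's1:t{i}': 5.9 + 0.001 * i for i in range(10_000)}
    keys = list(store)

    def random_read():
        key = random.choice(keys)
        t0 = time.perf_counter()
        _ = store[key]                       # stand-in for an HBase get()
        return time.perf_counter() - t0

    latencies = sorted(random_read() for _ in range(1000))
    print(f"median read latency: {latencies[500] * 1e6:.1f} us")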

Query Processing Times

Network Usage

Graph #    KB transferred (original)    KB transferred (approximated)
1          112.3                        23.3
2          124.5                        28.0
3          126.6                        25.9
4          120.2                        25.1
5          119.9                        26.8
6          124.4                        27.7
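Summed over the six graphs, the approximated transfers amount to 156.8 KB versus 727.9 KB for the originals – roughly a 4.6× (about 78%) reduction in network traffic.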

Conclusions  Introduced an advanced frontend for TimeCloud – Simple, intuitive, and easy to use – But going beyond just displaying data

 Model-based processing – Minimize data transmission over networks – Minimize backend workload

 Future work – Various model support – Design of additional visualizations

Thank you
