DIGITAL FILTER IMPLEMENTATION IN HADOOP DATA MINING SYSTEM

Dariusz Czerwiński
Lublin University of Technology (POLAND)


Agenda
• Introduction
• Aim of the work
• Filter implementation
• Conclusions


Introduction
• Data mining is very important in the IT industry and in science
• Data mining applications in DSP:
  – classification
  – clustering
  – segmentation
  – mining and sequential analysis
  – multi-dimensional visualization

Motivation (1)
• Commercial NI DIAdem measurement data mining system
  – superb post-processing capabilities
  – limit of 2 billion data points


Motivation (2)
• Quench measurement system for second-generation high-temperature superconducting (2G HTS) tape


Motivation (3)
• Long-duration measurements
  – 15 seconds: 6 million measured data points
  – 12 minutes: more than 2^31 (about 2 billion) data points and a CSV output file larger than 7 GB
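2^31 - 1 (2,147,483,647) is the largest value of a signed 32-bit integer, which is one plausible origin of a "2 billion data points" limit; that link is an assumption, since the slides do not explain where the limit comes from. The minimal Python sketch below counts the values of one CSV column row by row, so even a 7 GB file is never loaded into memory, and compares the count with that bound. The helper names count_points and exceeds_int32 are hypothetical; the column name I0 is taken from the filter implementation slide.

```python
import csv
import io

# Largest signed 32-bit integer: 2^31 - 1 = 2,147,483,647.
# Assumption: a bound of this kind is behind the "2 billion data points" limit.
INT32_MAX = 2**31 - 1

def count_points(stream, column):
    """Count non-empty values of one CSV column, reading one row at a time."""
    reader = csv.DictReader(stream)
    return sum(1 for row in reader if (row.get(column) or "").strip())

def exceeds_int32(n_points):
    """True when the point count no longer fits in a signed 32-bit index."""
    return n_points > INT32_MAX

# In-memory stand-in for a measurement file; the last row has no I0 value
sample = io.StringIO("t,I0\n0.0,1.5\n0.1,1.7\n0.2,\n")
n = count_points(sample, "I0")
print(n, exceeds_int32(n))  # 2 False
```

For a real recording, pass a file opened with open(path, newline="") instead of the StringIO sample; memory use stays constant regardless of file size.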


Aim of the work
• Implement a smoothing filter (Savitzky-Golay) in a Hadoop data mining system
• R environment?
  – RevoScaleR, RHadoop → Amazon EC2
  – R Enterprise → Oracle DB
  – Renjin → SaaS: Google App Engine, Amazon Beanstalk, Heroku
  – Omegahat RAmazonS3 → Amazon S3
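A Savitzky-Golay smoothing filter fits a low-order polynomial by least squares to each window of 2m + 1 samples and replaces the centre sample with the fitted value. Because that fit is linear in the data, the whole filter reduces to a fixed set of convolution coefficients. The minimal pure-Python sketch below derives those coefficients exactly with fractions by solving the normal equations; the function name savgol_coeffs is hypothetical and is not the interface of the R signal package.

```python
from fractions import Fraction

def savgol_coeffs(half_width, order):
    """Smoothing Savitzky-Golay coefficients for a window of 2*half_width + 1
    points and a least-squares polynomial of the given order."""
    if order >= 2 * half_width + 1:
        raise ValueError("order must be smaller than the window length")
    offsets = range(-half_width, half_width + 1)
    n = order + 1
    # Normal-equation matrix A^T A, where A[row][col] = offset ** col
    ata = [[Fraction(sum(x ** (r + c) for x in offsets)) for c in range(n)]
           for r in range(n)]
    # Solve (A^T A) z = e0 by Gauss-Jordan elimination in exact arithmetic
    rhs = [Fraction(1 if r == 0 else 0) for r in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if ata[r][col] != 0)
        ata[col], ata[pivot] = ata[pivot], ata[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(n):
            if r != col and ata[r][col] != 0:
                f = ata[r][col] / ata[col][col]
                ata[r] = [a - f * b for a, b in zip(ata[r], ata[col])]
                rhs[r] -= f * rhs[col]
    z = [rhs[r] / ata[r][r] for r in range(n)]
    # Weight of sample at offset x = value of the fitted polynomial at the centre
    return [sum(z[j] * x ** j for j in range(n)) for x in offsets]

print([str(c) for c in savgol_coeffs(2, 2)])
# ['-3/35', '12/35', '17/35', '12/35', '-3/35']
```

The result for a 5-point window and a quadratic fit is the classic (-3, 12, 17, 12, -3)/35 weight set; a cubic fit gives the same smoothing weights, because the odd-order terms do not affect the value at the window centre.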


Testbed
• Host machine: PC with an AMD Athlon X2 240 CPU (2 cores), 6 GB RAM and a 250 GB SATA HDD; host system Windows Professional x64; VMware Player v7.0 with VMware Tools installed
• Guest: Cloudera CDH 5.3.0.0 (CentOS 6.4 x64) with 4 GB RAM, 2 cores and a 40 GB disk


RHadoop idea
• rhdfs: basic connectivity to the HDFS file system
• rhbase: basic connectivity to HBase
• plyrmr: common data manipulation operations, as found in popular packages such as plyr and reshape2, on very large data sets stored on Hadoop
• rmr2: statistical analysis in R using Hadoop MapReduce on a Hadoop cluster
• ravro: reading and writing Avro files from the local file system and HDFS

MapReduce idea

Source: http://developer.yahoo.com
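The MapReduce flow in the figure (map each input record to key/value pairs, group the pairs by key, then reduce each group) can be imitated in a single Python process. This is a teaching sketch only, not the API of Hadoop or rmr2; the channel name I0 comes from the filter slide, while U0 and all function names are hypothetical.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run the map, shuffle and reduce phases in a single process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # map: record -> (key, value) pairs
            groups[key].append(value)       # shuffle: group values by key
    return {key: reducer(key, values)       # reduce: one call per key
            for key, values in sorted(groups.items())}

def mapper(record):
    channel, sample = record
    yield channel, (sample, 1)              # partial sum and count

def reducer(channel, pairs):
    total = sum(s for s, _ in pairs)
    count = sum(c for _, c in pairs)
    return total / count                    # mean value of the channel

records = [("I0", 1.0), ("I0", 3.0), ("U0", 5.0)]
print(map_reduce(records, mapper, reducer))  # {'I0': 2.0, 'U0': 5.0}
```

The mapper emits (sum, count) pairs rather than averages because partial sums and counts can be combined in any grouping, whereas averages of averages are wrong when groups differ in size.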


R environment setup
• Additional R packages
> install.packages(c("rJava", "Rcpp", "RJSONIO", "bitops", "digest", "functional", "stringr", "plyr", "reshape2", "caTools"))

• R Hadoop connectors and the signal package (which provides sgolayfilt), installed from local source archives
> install.packages("/home/cloudera/Downloads/rhdfs_1.0.8.tar.gz", repos = NULL, type = "source")
> install.packages("/home/cloudera/Downloads/rmr2_3.3.0.tar.gz", repos = NULL, type = "source")
> install.packages("/home/cloudera/Downloads/signal_0.7-4.tar.gz", repos = NULL, type = "source")

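Filter implementation
> library(rhdfs)
> library(rmr2)
> library(signal)   # provides sgolayfilt()
> hdfs.init()
> my.data = read.csv("/home/cloudera/filter/sample.csv")
> I = my.data[, "I0"]   # measured current, channel I0
> I.index = to.dfs(I)   # write the vector to HDFS
> sg = values(from.dfs(mapreduce(
+   input = I.index,
+   map = function(k, v) keyval(NULL, sgolayfilt(v)))))   # smooth each block of values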
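In the mapreduce() call above, each map invocation receives a block of values rather than the whole vector, so each block is filtered on its own. Assuming the blocks are contiguous (the slides do not say how rmr2 splits I.index), the Python sketch below shows the consequence: samples within half a window of a block border are treated as signal edges unless each block also borrows that many neighbouring samples. It uses the fixed 5-point quadratic weights and simply copies edge samples, which is simpler than whatever edge handling sgolayfilt applies; all names are hypothetical.

```python
# Fixed 5-point quadratic Savitzky-Golay smoothing weights: (-3, 12, 17, 12, -3) / 35
COEFFS = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]
HALF = len(COEFFS) // 2  # samples needed on each side of the centre

def smooth(x):
    """Filter every sample that has a full window; copy the HALF samples at each end."""
    y = list(x)
    for j in range(HALF, len(x) - HALF):
        y[j] = sum(c * x[j + i - HALF] for i, c in enumerate(COEFFS))
    return y

def smooth_in_chunks(x, chunk, overlap=0):
    """Filter x block by block, as independent map calls would.

    Each block borrows `overlap` samples from its neighbours before filtering,
    and only the block's own samples are kept in the output."""
    out = []
    for start in range(0, len(x), chunk):
        size = min(chunk, len(x) - start)
        lo = max(0, start - overlap)
        hi = min(len(x), start + size + overlap)
        y = smooth(x[lo:hi])
        out.extend(y[start - lo:start - lo + size])
    return out

samples = [float(i % 2) for i in range(20)]  # alternating worst-case "noise"
print(smooth_in_chunks(samples, 5) == smooth(samples))                # False
print(smooth_in_chunks(samples, 5, overlap=HALF) == smooth(samples))  # True
```

With an overlap of HALF samples, the block-wise result is identical to filtering the whole signal at once, at the cost of transferring a few extra samples per block.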

Experimental results

Measured data of instantaneous current

Filters comparison

Conclusions
• The filter implemented in the test environment gave very good results and allows big data to be handled
• The experimental results showed that a digital filter can be successfully built in a data mining system using the R programming language and environment
• Work is ongoing to implement the Savitzky-Golay digital filter using the reduce stage to apply the filter equation and the convolution coefficients, and to compare the results with those of the earlier implementation
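One way a reduce stage could carry the filter equation y_j = sum of c_i * x_(j+i) for i = -m..m is to let the map step send each sample, weighted by the matching coefficient, to every output position whose window contains it (keyed by that position), and let the reduce step sum the contributions per key. The sketch below simulates this in a single Python process with the 5-point quadratic weights (m = 2); it is a hypothetical design for illustration, not the author's ongoing implementation, and every name in it is invented.

```python
from collections import defaultdict

# 5-point quadratic Savitzky-Golay smoothing weights c_-2 .. c_2
COEFFS = {-2: -3 / 35, -1: 12 / 35, 0: 17 / 35, 1: 12 / 35, 2: -3 / 35}
HALF = 2

def map_sample(k, v, n):
    """Send x_k to every output y_j whose window contains it: y_j += c_(k-j) * x_k."""
    for i, c in COEFFS.items():
        j = k - i
        if HALF <= j < n - HALF:      # keep only outputs with a full window
            yield j, c * v

def reduce_output(j, contributions):
    """y_j is the sum of all weighted samples keyed by j."""
    return sum(contributions)

def convolve_mapreduce(x):
    groups = defaultdict(list)
    for k, v in enumerate(x):         # map phase
        for j, part in map_sample(k, v, len(x)):
            groups[j].append(part)    # shuffle: group by output index
    return [reduce_output(j, groups[j]) for j in sorted(groups)]

# A quadratic signal passes through a quadratic Savitzky-Golay filter unchanged
print(convolve_mapreduce([1.0, 4.0, 9.0, 16.0, 25.0, 36.0]))
```

The printed values are approximately [9.0, 16.0], the two interior samples that have a full window. Keying by output index moves the convolution sum into the reduce step, at the price of each sample being emitted 2m + 1 times.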

Contact
Dariusz Czerwiński
Institute of Computer Science
Lublin University of Technology
Nadbystrzycka 36B, 20-618 Lublin, Poland

• Thank you for your attention!

