Functional Models of Hadoop MapReduce with Application to Scan


Kiminori Matsuzaki
Kochi University of Technology

Brief History of (Hadoop) MapReduce
• 2004: Google proposed MapReduce [OSDI 2004]
• 2004-2006: Open-source MapReduce in Nutch
• 2006-: Hadoop project
• 2011 Dec.: Hadoop 1.0.0
• 2012: Industry standard in distributed processing
• 2013 Oct.: Hadoop 2.2.0 (first stable ver. 2.x)
[http://research.yahoo.com/files/cutting.pdf]
[http://www.guruzon.com/6/introduction/map-reduce/history-of-map-reduce]
[http://hadoop.apache.org/releases.html]

MapReduce in a Nutshell
• 3 phases, 2 user-defined functions
• Operates on (Key, Value) pairs

Misunderstandings
Functional community
• The OSDI paper said: “map/reduce were inspired by those in functional programming”
• Map/reduce in MapReduce differs in terms of
  – target data (set-like data, not lists or trees)
  – how they work (map/reduce are applied independently)
  – no associativity needed in reduce
DB community
• “MapReduce: A major step backwards” (2008)
  – No indexing, poor impl., DBMS-incompatibility, etc.

Functional Models
A functional model clearly describes the computation of the framework, especially by using types.
• R. Lämmel: “Google’s MapReduce programming model -- revisited.” Science of Computer Programming, 2008
  – Provides a functional model of Google’s MapReduce
  – Model is written in Haskell

Lämmel’s Functional Model
• Data: Map k v (Dictionary)
• map (mapper), with mapper :: k1 → v1 → [(k2, v2)]
• map (reducer), with reducer :: k2 → [v2] → Maybe (k2, v3)
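The shape of this model can be sketched in Python. This is only an illustration of the types shown on the slide, not Lämmel's Haskell code: the names map_reduce, word_count_mapper and word_count_reducer are hypothetical, a dict stands in for Map k v, and None stands in for Nothing.

```python
from collections import defaultdict

def map_reduce(mapper, reducer, data):
    """data is a dict k1 -> v1; the result is a dict keyed by the reducer's output key."""
    # Map: apply the mapper to every (k1, v1) entry independently.
    intermediate = [pair for k1, v1 in data.items() for pair in mapper(k1, v1)]
    # Group: collect all v2 values that share the same k2.
    groups = defaultdict(list)
    for k2, v2 in intermediate:
        groups[k2].append(v2)
    # Reduce: apply the reducer per key; a None result drops the key.
    output = {}
    for k2, values in groups.items():
        result = reducer(k2, values)
        if result is not None:
            out_key, v3 = result
            output[out_key] = v3
    return output

def word_count_mapper(doc_id, text):
    # k1 -> v1 -> [(k2, v2)]
    return [(word, 1) for word in text.split()]

def word_count_reducer(word, counts):
    # k2 -> [v2] -> Maybe (k2, v3)
    return (word, sum(counts))

docs = {"d1": "a b a", "d2": "b c"}
print(map_reduce(word_count_mapper, word_count_reducer, docs))
```

Note that the reducer receives the whole list of values for a key, so it is not restricted to an associative binary operator.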

Why Do Functional Models Matter?
• Understanding the computation
  – Avoid misunderstandings
• Proof of Correctness
  – Developing functional code to check
  – Proof using Coq (Related work [Ono 2011], [Jiang 2014])
• Program Calculation
  – Developing program-transformation rules
• Cost Model
  – Performance tuning (Related work [Dörre 2014])

Contributions in the Paper
• Two functional models of Hadoop MapReduce
  – Low-level model (based on implementation)
    • Nested input/output
    • Stateful mapper/reducer
    • Detailed modeling of Shuffling phase
  – High-level model (user-friendly specification)
    • for “secondary-sorting” technique
• Scan (prefix-sums) algorithm on the models
  – Three-phase algorithm (L-reduce, G-scan, L-scan)
  – BSP-based 2-superstep algorithm
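The three-phase scan named above can be illustrated with a sequential Python sketch. It is not the paper's MapReduce formulation: the function name three_phase_scan is hypothetical, the blocks are assumed to be non-empty, the operator is assumed to be associative, and an inclusive scan is assumed.

```python
from functools import reduce
from itertools import accumulate
from operator import add

def three_phase_scan(blocks, op=add):
    """Inclusive scan over the concatenation of `blocks` (a list of non-empty lists)."""
    # Phase 1 (L-reduce): reduce each block locally to one value.
    block_totals = [reduce(op, block) for block in blocks]
    # Phase 2 (G-scan): scan the per-block totals globally.
    prefix = list(accumulate(block_totals, op))
    # Phase 3 (L-scan): scan each block locally, then combine with the
    # total of all preceding blocks (kept on the left, so op need not commute).
    result = []
    for i, block in enumerate(blocks):
        local = list(accumulate(block, op))
        if i > 0:
            local = [op(prefix[i - 1], x) for x in local]
        result.extend(local)
    return result

print(three_phase_scan([[1, 2, 3], [4, 5], [6]]))
```

Phases 1 and 3 touch each block independently, which is what makes them candidates for parallel execution; only phase 2 needs the results of all blocks.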

Nested Input/Output in Hadoop

• Data → Split → Record
  – Split (e.g. 64MB)
  – Record (e.g. 1 line)
• Split ⇒ Unordered ⇒ Bag (multi-set)
  – Parallel (independently / order may change)
• Record ⇒ Ordered ⇒ List
  – Sequential (one-by-one / order preserved)
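A bag of lists can be mimicked in Python as follows. This is a minimal sketch with hypothetical names (make_splits, as_bag_of_lists); it assumes records are hashable and that splits are cut by record count rather than by bytes.

```python
from collections import Counter

def make_splits(records, split_size):
    """Cut a list of records into consecutive splits of at most split_size records."""
    return [records[i:i + split_size] for i in range(0, len(records), split_size)]

def as_bag_of_lists(splits):
    # Counter ignores the order of the splits (bag); each tuple keeps
    # the order of the records inside one split (list).
    return Counter(tuple(split) for split in splits)

splits = make_splits(["l1", "l2", "l3", "l4", "l5"], 2)
reordered_splits = list(reversed(splits))
reordered_records = [list(reversed(splits[0]))] + splits[1:]

print(as_bag_of_lists(splits) == as_bag_of_lists(reordered_splits))   # same input
print(as_bag_of_lists(splits) == as_bag_of_lists(reordered_records))  # different input
```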

Mapper Class in Hadoop

• Definition in Hadoop (simplified)

    class Mapper {
      void setup(Context);             // called once, before the first record
      void map(Key, Value, Context);   // called once per record
      void cleanup(Context);           // called once, after the last record

      void run(Context c) {
        setup(c);
        for (kv : split) {
          map(kv.k, kv.v, c);
        }
        cleanup(c);
      }
    }

• A simple case == map
• A stateful case == foldl
• A state-monadic map (in paper)
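The difference between the simple and the stateful case can be seen in a small Python imitation of the class. This is a sketch, not Hadoop's API: the Context is replaced by a plain emit callback, and TokenizeMapper and CountingMapper are hypothetical examples.

```python
class Mapper:
    def setup(self, emit):
        pass

    def map(self, key, value, emit):
        raise NotImplementedError

    def cleanup(self, emit):
        pass

    def run(self, split):
        """Process one split (an ordered list of (key, value) records)."""
        output = []
        emit = output.append
        self.setup(emit)
        for key, value in split:
            self.map(key, value, emit)
        self.cleanup(emit)
        return output

class TokenizeMapper(Mapper):
    # Simple case: no state, so run behaves like a map over the records.
    def map(self, key, value, emit):
        for word in value.split():
            emit((word, 1))

class CountingMapper(Mapper):
    # Stateful case: state is threaded through the records like a left fold
    # and only emitted in cleanup.
    def setup(self, emit):
        self.counts = {}

    def map(self, key, value, emit):
        for word in value.split():
            self.counts[word] = self.counts.get(word, 0) + 1

    def cleanup(self, emit):
        for word in sorted(self.counts):
            emit((word, self.counts[word]))

split = [(0, "a b"), (1, "a")]
print(TokenizeMapper().run(split))
print(CountingMapper().run(split))
```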

Shuffle Phase in Hadoop

• 3 sub-phases
  1. Partitioning (cf. reduce job)
  2. Sorting
  3. Grouping (cf. reducer func.)
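The three sub-phases can be sketched in Python as below. This is an illustration under assumptions, not Hadoop's implementation: shuffle, partition, sort_cmp and group_cmp are hypothetical names, comparators return a negative, zero or positive number, and each group is reported under the first key seen in it.

```python
from functools import cmp_to_key

def shuffle(pairs, num_reducers, partition, sort_cmp, group_cmp):
    # 1. Partitioning: decide which reducer receives each pair.
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[partition(key) % num_reducers].append((key, value))

    to_sort_key = cmp_to_key(sort_cmp)
    shuffled = []
    for part in partitions:
        # 2. Sorting: order the pairs of one partition by key.
        part = sorted(part, key=lambda kv: to_sort_key(kv[0]))
        # 3. Grouping: merge adjacent pairs whose keys compare equal.
        groups, prev_key = [], None
        for key, value in part:
            if groups and group_cmp(prev_key, key) == 0:
                groups[-1][1].append(value)
            else:
                groups.append((key, [value]))
            prev_key = key
        shuffled.append(groups)
    return shuffled

def cmp(a, b):
    return (a > b) - (a < b)

# Keys are (name, year): partition and group by name, sort by (name, year).
pairs = [(("b", 2), "x"), (("a", 3), "y"), (("a", 1), "z"), (("b", 1), "w")]
result = shuffle(pairs, 2,
                 partition=lambda k: ord(k[0][0]),
                 sort_cmp=cmp,
                 group_cmp=lambda k1, k2: cmp(k1[0], k2[0]))
print(result)
```

Sorting on the full key while grouping on only part of it is the usual shape of the secondary-sorting technique: each reducer call sees its values already ordered by the remaining key component.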

Toward High-Level Model
• 3-phase Implementation
  1. Partitioning (cf. reduce job)
  2. Sorting
  3. Grouping (cf. reducer func.)
  Sorting and grouping should be consistent
• Possibly any comparator
  Let comp a b = |a – b|
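One way to read the two points above is sketched in Python below. The interpretation is an assumption, since the slide gives no further detail: is_antisymmetric checks a basic law of an ordering comparator, and is_consistent checks that keys equal under the grouping comparator end up adjacent after sorting. All function names are hypothetical.

```python
from functools import cmp_to_key

def sign(x):
    return (x > 0) - (x < 0)

def is_antisymmetric(comp, samples):
    """An ordering comparator must satisfy sign(comp(a, b)) == -sign(comp(b, a))."""
    return all(sign(comp(a, b)) == -sign(comp(b, a)) for a in samples for b in samples)

def is_consistent(sort_cmp, group_cmp, keys):
    """After sorting, keys that the grouping comparator equates must be contiguous."""
    ordered = sorted(keys, key=cmp_to_key(sort_cmp))
    for i in range(len(ordered)):
        for k in range(i + 1, len(ordered)):
            if group_cmp(ordered[i], ordered[k]) == 0:
                if any(group_cmp(ordered[i], ordered[j]) != 0 for j in range(i + 1, k)):
                    return False
    return True

def natural(a, b):
    return (a > b) - (a < b)

def abs_diff(a, b):
    # The comparator from the slide: never negative, so not an ordering.
    return abs(a - b)

keys = [("a", 1), ("b", 1), ("a", 2)]
by_name = lambda x, y: natural(x[0], y[0])
by_year_then_name = lambda x, y: natural((x[1], x[0]), (y[1], y[0]))

print(is_antisymmetric(natural, [1, 2, 3]), is_antisymmetric(abs_diff, [1, 2, 3]))
print(is_consistent(natural, by_name, keys), is_consistent(by_year_then_name, by_name, keys))
```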