
Experiences with MapReduce, an Abstraction for Large-Scale Computation

Jeff Dean, Google, Inc.

Outline
• Overview of our computing environment
• MapReduce
  – overview, examples
  – implementation details
  – usage stats

• Implications for parallel program development


Problem: lots of data
• Example: 20+ billion web pages x 20KB = 400+ terabytes
• One computer can read 30-35 MB/sec from disk
  – ~four months to read the web (see the back-of-envelope sketch below)

• ~1,000 hard drives just to store the web
• Even more to do something with the data
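A quick back-of-envelope check of these figures, sketched in Python. The 20 billion pages, 20 KB/page, and 30-35 MB/sec read rate come from the slide; the ~400 GB per-drive capacity used for the drive count is an assumption for illustration, consistent with the disk sizes quoted later in the talk.

```python
# Back-of-envelope check of the "lots of data" numbers.
# Figures from the slide; the per-drive capacity is an illustrative assumption.

pages = 20e9                  # 20+ billion web pages
bytes_per_page = 20 * 1024    # 20 KB per page
total_bytes = pages * bytes_per_page                  # ~4.1e14 bytes

read_rate = 32.5e6            # ~30-35 MB/sec sequential read from one disk
seconds = total_bytes / read_rate
print(f"total data: {total_bytes / 1e12:.0f} TB")                 # ~410 TB (400+ terabytes)
print(f"single-machine read time: {seconds / 86400:.0f} days")    # ~145 days, i.e. ~four months

drive_capacity = 400e9        # assume ~400 GB usable per drive (illustrative)
print(f"drives just to store it: {total_bytes / drive_capacity:.0f}")   # ~1,000
```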


Solution: spread the work over many machines
• Good news: same problem with 1000 machines, < 3 hours (scaling check in the sketch below)
• Bad news: programming work
  – communication and coordination
  – recovering from machine failure
  – status reporting
  – debugging
  – optimization
  – locality

• Bad news II: repeat for every problem you want to solve
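Continuing the back-of-envelope check above: the "< 3 hours" claim is simply the single-machine read time divided across machines, assuming a perfectly even split with zero coordination overhead, which is exactly what the "bad news" items make hard to achieve in practice.

```python
# Ideal (embarrassingly parallel) scaling of the single-machine read time,
# assuming a perfectly even split and no coordination overhead.

single_machine_days = 146            # from the earlier estimate (~four months)
machines = 1000
hours = single_machine_days * 24 / machines
print(f"{machines} machines: ~{hours:.1f} hours")   # ~3-3.5 hours, depending on the assumed read rate
```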


Computing Clusters
• Many racks of computers, thousands of machines per cluster
• Limited bisection bandwidth between racks


Machines
• 2 CPUs
  – Typically hyperthreaded or dual-core
  – Future machines will have more cores
• 1-6 locally-attached disks
  – 200GB to ~2 TB of disk
• 4GB-16GB of RAM
• Typical machine runs:
  – Google File System (GFS) chunkserver
  – Scheduler daemon for starting user tasks
  – One or many user tasks


Implications of our Computing Environment

• Single-thread performance doesn't matter
  – We have large problems, and total throughput/$ is more important than peak performance

• Stuff breaks
  – If you have one server, it may stay up three years (1,000 days)
  – If you have 10,000 servers, expect to lose ten a day (see the sketch below)

• "Ultra-reliable" hardware doesn't really help
  – At large scales, super-fancy reliable hardware still fails, albeit less often
  – software still needs to be fault-tolerant
  – commodity machines without fancy hardware give better perf/$
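The "lose ten a day" figure follows from simple expected-value arithmetic on the per-machine rate above; a minimal check, assuming failures are roughly independent and each machine fails about once per 1,000 days.

```python
# Expected machine failures per day, assuming independent failures and the
# roughly once-per-1,000-days per-machine rate quoted above.

servers = 10_000
days_between_failures_per_server = 1_000   # ~three years of uptime per machine
print(servers / days_between_failures_per_server)   # 10.0 -> "expect to lose ten a day"
```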

How can we make it easy to write distributed programs?

MapReduce
• A simple programming model that applies to many large-scale computing problems

• Hide messy details in MapReduce runtime library:
  – automatic parallelization
  – load balancing
  – network and disk transfer optimization
  – handling of machine failures
  – robustness
  – improvements to core library benefit all users of library!

Typical problem solved by MapReduce
• Read a lot of data
• Map: extract something you care about from each record
• Shuffle and Sort
• Reduce: aggregate, summarize, filter, or transform
• Write the results

Outline stays the same; map and reduce change to fit the problem (a minimal sketch follows below)
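To make the outline concrete, here is a minimal, self-contained word-count sketch in Python. It mimics the map, shuffle-and-sort, and reduce phases locally in one process; the function names, the driver, and the use of Python are illustrative assumptions, not the C++ library the talk describes.

```python
# Minimal local sketch of the MapReduce outline: read, map, shuffle/sort,
# reduce, write. Word count is the classic example; everything here is an
# illustrative stand-in for the real distributed runtime.
from collections import defaultdict

def map_fn(key, value):
    # key: document name (unused), value: document contents
    for word in value.split():
        yield (word, 1)

def reduce_fn(key, values):
    # key: a word, values: all counts emitted for that word
    yield (key, sum(values))

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Map phase: apply map_fn to every (key, value) input record.
    intermediate = defaultdict(list)
    for k, v in inputs:
        for k2, v2 in map_fn(k, v):
            intermediate[k2].append(v2)        # shuffle: group values by intermediate key
    # Sort the intermediate keys, then reduce each key's value list.
    results = []
    for k2 in sorted(intermediate):
        results.extend(reduce_fn(k2, intermediate[k2]))
    return results

docs = [("doc1", "the quick brown fox"), ("doc2", "the lazy dog")]
print(run_mapreduce(docs, map_fn, reduce_fn))
# [('brown', 1), ('dog', 1), ('fox', 1), ('lazy', 1), ('quick', 1), ('the', 2)]
```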


More specifically…
• Programmer specifies two primary methods:
  – map(k, v) → <k', v'>*
  – reduce(k', <v'>*) → <v'>*
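A sketch of those two signatures using Python type hints; Python, the type variable names, and the stub functions are my choices for illustration, not the talk's actual C++ interface. map takes an input key/value record and emits zero or more intermediate pairs; reduce takes one intermediate key plus all of its values and emits output values.

```python
# Shape of the two user-supplied methods, expressed with Python type hints.
# The type variables (K, V, K2, V2) are illustrative names, not from the talk.
from typing import Iterator, Tuple, TypeVar

K, V, K2, V2 = TypeVar("K"), TypeVar("V"), TypeVar("K2"), TypeVar("V2")

def map_fn(k: K, v: V) -> Iterator[Tuple[K2, V2]]:
    """map(k, v) -> <k', v'>*  : emit zero or more intermediate key/value pairs."""
    ...

def reduce_fn(k2: K2, values: Iterator[V2]) -> Iterator[V2]:
    """reduce(k', <v'>*) -> <v'>*  : emit zero or more output values for one key."""
    ...
```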
