J. Parallel Distrib. Comput. 74 (2014) 2166–2179
SHadoop: Improving MapReduce performance by optimizing job execution mechanism in Hadoop clusters

Rong Gu a, Xiaoliang Yang a, Jinshuang Yan a, Yuanhao Sun b, Bing Wang b, Chunfeng Yuan a, Yihua Huang a,∗

a National Key Laboratory for Novel Software Technology, Nanjing University, 163 Xianlin Road, Nanjing, 210023, China
b Intel Asia-Pacific Research and Development Ltd, 880 ZiXing Road, Zizhu Science Park, Shanghai, 200241, China
Highlights

• Analyzed and identified two critical limitations of the MapReduce execution mechanism.
• Achieved the first optimization by implementing new job setup/cleanup tasks.
• Replaced heartbeats with an instant messaging mechanism to speed up task scheduling.
• Conducted comprehensive benchmarks to evaluate stable performance improvements.
• Passed a production-level test and integrated our work into Intel Distributed Hadoop.
Article info

Article history:
Received 9 February 2013
Received in revised form 4 October 2013
Accepted 26 October 2013
Available online 1 December 2013

Keywords: Parallel computing; MapReduce; Performance optimization; Distributed processing; Cloud computing
Abstract

As a widely-used parallel computing framework for big data processing today, the Hadoop MapReduce framework puts more emphasis on high data throughput than on low latency of job execution. However, more and more big data applications developed with MapReduce now require quick response times. As a result, improving the performance of MapReduce jobs, especially short jobs, is of great practical significance and has attracted increasing attention from both academia and industry. Many efforts have been made to improve the performance of Hadoop at the job scheduling or job parameter optimization level. In this paper, we explore an approach that improves the performance of the Hadoop MapReduce framework by optimizing the job and task execution mechanism. First, by analyzing the job and task execution mechanism in the MapReduce framework, we reveal two critical limitations on job execution performance. Then we propose two major optimizations to the MapReduce job and task execution mechanisms: first, we optimize the setup and cleanup tasks of a MapReduce job to reduce the time cost during the initialization and termination stages of the job; second, instead of relying on the loose heartbeat-based communication mechanism to transmit all messages between the JobTracker and TaskTrackers, we introduce an instant messaging communication mechanism to accelerate performance-sensitive task scheduling and execution. Finally, we implement SHadoop, an optimized and fully compatible version of Hadoop that aims at shortening the execution time of MapReduce jobs, especially short jobs. Experimental results show that, compared to the standard Hadoop, SHadoop achieves a stable performance improvement of around 25% on average across comprehensive benchmarks without losing scalability or speedup. Our optimization work has passed a production-level test at Intel and has been integrated into the Intel Distributed Hadoop (IDH).
To the best of our knowledge, this work is the first effort that explores optimizing the execution mechanism inside the map/reduce tasks of a job. Its advantage is that it can complement job scheduling optimizations to further improve job execution performance.
© 2013 Elsevier Inc. All rights reserved.
1. Introduction
∗ Correspondence to: Department of Computer Science and Technology, Nanjing University, 163 Xianlin Road, Nanjing, 210023, China.
E-mail addresses: [email protected] (R. Gu), [email protected] (X. Yang), [email protected] (Y. Sun), [email protected] (B. Wang), [email protected] (C. Yuan), [email protected], [email protected] (Y. Huang).
0743-7315/$ – see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.jpdc.2013.10.003
The MapReduce parallel computing framework [7], proposed by Google in 2004, has become an effective and attractive solution for big data processing problems. Through simple programming interfaces with two functions, map and reduce, MapReduce significantly simplifies the design and implementation of many data-intensive applications in the real world. Moreover,
MapReduce offers other benefits, including load balancing, elastic scalability, and fault tolerance, which make it a widely adopted parallel computing framework. Hadoop [1], an open-source implementation of MapReduce, has been widely used in industry and studied in academia. Both Google's MapReduce framework and Hadoop have been widely recognized for their high throughput, elastic scalability, and fault tolerance. They focus more on these features than on job execution efficiency, which results in relatively poor performance when using Hadoop MapReduce to execute jobs, especially short jobs.

The term "short job" has already been used in some related work [30,31]. There is no quantitative definition of short jobs yet; usually the term refers to MapReduce jobs with execution times ranging from seconds to a few minutes, as opposed to long MapReduce jobs that take hours. Facebook calls this type of job a "small job" in its recently released optimized version of Hadoop, Corona [24]. Some studies show that short jobs make up a large portion of MapReduce jobs [7,5]. For example, the average execution time of MapReduce jobs at Google in September 2007 was 395 s [7]. Response time matters most for short jobs in scenarios where users need answers quickly, such as queries or analysis over log data for debugging, monitoring, and business intelligence [30]. In a pay-by-the-time environment like EC2, improving MapReduce performance means saving monetary cost. Optimizing MapReduce's execution time also prevents jobs from occupying system resources for too long, which is good for a cluster's health [31].

Today there are a number of high-level query and data-analysis systems that provide services on top of MapReduce, such as Google's Sawzall [17], Facebook's Hive [22] and Yahoo!'s Pig [16]. These systems execute users' requests by converting SQL-like queries into a series of MapReduce jobs that are usually short.
These high-level declarative languages can greatly simplify the development of MapReduce applications without hand-coded MapReduce programs [21]. Thus, in practice, these systems play a more important role than hand-coded MapReduce programs. For example, more than 95% of Hadoop jobs at Facebook are not hand-coded but generated by Hive, and more than 90% of MapReduce jobs at Yahoo! are generated by Pig [12]. These systems are very sensitive to the execution time of the underlying short MapReduce jobs. Therefore, reducing the execution time of MapReduce jobs is very important to these widely-used systems.

For the above reasons, in this paper we concentrate on improving the execution performance of short MapReduce jobs. Having studied the Hadoop MapReduce framework in great detail, we focus on the internal execution mechanisms of an individual job and the tasks inside a job. Through in-depth analysis, we reveal two critical issues that limit the performance of MapReduce jobs. To address these issues, we design and implement SHadoop, an optimized version of Hadoop that is fully compatible with the standard Hadoop. Unlike work that improves performance at the job scheduling or job parameter optimization level, we optimize the underlying execution mechanism of each task inside a job. In our implementation, first, we optimize the setup and cleanup tasks, two special tasks executed for every MapReduce job, to reduce the time cost during the initialization and termination stages of the job; second, we add an instant messaging communication mechanism to the standard Hadoop for fast delivery of performance-sensitive task scheduling and execution messages between the JobTracker and TaskTrackers. This way, the tasks of a job can be scheduled and executed instantly without heartbeat delay. As a consequence, the job execution process becomes more compact and the utilization of the slots on the TaskTrackers is much improved.
Experimental results show that SHadoop outperforms the standard Hadoop and can achieve stable performance improvements of around 25% on average for comprehensive benchmarks. Our optimization work has passed a
production-level test at Intel and has been integrated into the Intel Distributed Hadoop [11]. To the best of our knowledge, this work is the first effort that explores optimizing the execution mechanism inside the map/reduce tasks of a job. Its advantage is that it can complement job scheduling optimization work to further improve job execution performance.

The rest of this paper is organized as follows: Section 2 introduces related work on MapReduce performance optimization and compares it with SHadoop. Section 3 analyzes the job/task execution mechanism in standard Hadoop MapReduce. Based on this, Section 4 describes our optimization methods for improving job execution efficiency in the standard Hadoop MapReduce. Section 5 discusses experiments and performance evaluations of our optimization work. Finally, we conclude this paper in Section 6.

2. Related work analysis

Many studies have been conducted to improve the performance of the Hadoop MapReduce framework at different levels or from different aspects. They fall into several categories. The first focuses on designing scheduling algorithms that optimize the execution order of jobs or tasks more intelligently [30,29,15,8,27,9,14,25]. The second explores how to improve the efficiency of MapReduce with the aid of special hardware or supporting software [31,3,28]. The third conducts specialized performance optimizations for particular types of MapReduce applications [13,20,26]. Some researchers also explore optimizing job configuration settings or parameters to improve execution performance [2].

Many researchers have shown interest in optimizing the scheduling policies in Hadoop. In 2009, Zaharia [30] proposed a task scheduling algorithm called LATE (Longest Approximate Time to End), which executes speculative tasks to improve Hadoop's performance on heterogeneous clusters.
To further improve the overall performance of Hadoop clusters, Hsin-Han [29] proposed a new Load-Aware scheduler to address the problem resulting from dynamic loading. A scheduler that is aware of the different types of jobs running on the cluster was designed by Radheshyam [15] for the same goal. Mohammad [8] proposed the Locality-Aware Reduce Task Scheduler (LARTS), another practical strategy for improving MapReduce performance. Similar work can be found in [27,9,14,25]. These studies improve Hadoop MapReduce performance through intelligent or adaptive job and task scheduling for different running circumstances. Our optimization work, on the other hand, focuses on optimizing the underlying job and task execution mechanism to reduce the execution time of each individual job and its tasks. Little work has been done at this level so far, and one advantage of our work is that it can complement the job scheduling optimizations above to further improve the performance of the Hadoop MapReduce framework.

To achieve higher execution efficiency in the Hadoop MapReduce framework, researchers have also tried to adopt special hardware accelerators or supporting software. Yolanda [3] and Miaond [28] explored approaches to improving Hadoop MapReduce with Cell BE processors and GPU acceleration, respectively. A software method for improving MapReduce performance is to use a distributed memory cache [31], in which the existing open-source tool Memcached [6] is adopted to provide a high-performance, distributed memory caching capability. Compared with these studies, SHadoop is easier to put into practical use because we only modify the source code of the standard Hadoop, avoiding any special hardware or supporting software, and ensure full compatibility with existing MapReduce programs and applications, including configurations.
Fig. 1. The state transition of a job during its execution.
Some other researchers focus on reducing the execution time of particular types of MapReduce jobs. For one-pass analysis applications, Boduo [13] proposed a Hadoop-based prototype using a new frequent-key-based technique for performance enhancement. The ideas proposed in [20,26] can help improve the execution efficiency of MapReduce jobs with heavy workloads in the shuffle and reduce phases. As each of these specialized optimizations pertains only to a certain type of application, they lack general applicability. Our optimization is a generalized approach to improving the performance of MapReduce jobs.

3. In-depth analysis of the MapReduce job execution process

In this section, we first give a brief introduction to the Hadoop MapReduce framework. Then we perform an in-depth analysis of the underlying execution mechanism and process of a MapReduce job and its tasks in Hadoop.

The Hadoop MapReduce framework, which is deployed on top of HDFS, consists of a JobTracker running on the master node and many TaskTrackers running on slave nodes. "Job" and "Task" are two important concepts in the MapReduce architecture. Usually, a MapReduce job contains a set of independent tasks. As a core component of the MapReduce framework, the JobTracker is responsible for scheduling and monitoring all the tasks of a MapReduce job. Tasks are assigned to the TaskTrackers, on which the map and reduce functions implemented by users are executed. When receiving a job, the MapReduce framework divides the input data of the job into several independent data splits. Each data split is then assigned to one map task, which is distributed to a TaskTracker chosen by data locality optimization. Multiple map tasks can run simultaneously on the TaskTrackers, and their outputs are sorted by the framework and then fetched by reduce tasks for further processing.
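As a concrete illustration of the split → map → sort → fetch → reduce flow just described, the following is a minimal single-process sketch in Python. It is purely illustrative: Hadoop implements this flow in Java across a cluster, and the function names and driver here are ours, not part of any Hadoop API.

```python
from itertools import groupby
from operator import itemgetter

def map_word_count(split):
    # Map task: emit a (word, 1) pair for each word in one input split.
    return [(word, 1) for word in split.split()]

def reduce_word_count(word, counts):
    # Reduce task: aggregate all counts fetched for one key.
    return (word, sum(counts))

def run_job(splits):
    # The framework sorts map outputs by key before reduce tasks fetch them;
    # here a single sorted list stands in for the distributed shuffle.
    intermediate = sorted(kv for split in splits for kv in map_word_count(split))
    return [reduce_word_count(key, [v for _, v in group])
            for key, group in groupby(intermediate, key=itemgetter(0))]

# Two "splits" standing in for HDFS data blocks.
print(run_job(["big data big", "data jobs"]))  # [('big', 2), ('data', 2), ('jobs', 1)]
```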
During the whole execution process of a job, the JobTracker monitors the execution of each task, reassigning failed tasks and altering the state of the job in each phase. To better explain our optimization work in the next section, we first present the execution state transition of a job in the MapReduce framework, and then analyze the execution process of a task.

The execution state transition of a job is illustrated in Fig. 1. Generally, the execution process is divided into three sequential phases: PREPARE, RUNNING and FINISHED. When a job is submitted to a Hadoop MapReduce cluster, the execution process works as follows:

(1) PREPARE phase: a job begins its journey from the START state. It enters the PREPARE.INITIALIZING state to initialize itself, performing initialization work such as reading the input data split information from HDFS and generating the corresponding map and reduce tasks on the JobTracker. After that, a special task called the "setup task" is scheduled to a TaskTracker to set up the job execution environment. At this point, the job execution reaches the PREPARE.SETUP state. When the setup task finishes successfully, the job enters the RUNNING phase.

(2) RUNNING phase: in this phase, the job starts from the RUNNING.RUN_WAIT state, in which it waits to be scheduled for execution by the MapReduce framework. When one of its tasks has been scheduled to a TaskTracker for execution, the job enters the RUNNING.RUNNING_TASKS state to execute all its map/reduce tasks. Once all the map and reduce tasks have completed successfully, the job moves to the RUNNING.SUC_WAIT state.

(3) FINISHED phase: in this phase, another special task called the "cleanup task" is scheduled to a TaskTracker to clean up the running environment of the job. After the cleanup task is done, the job finally arrives at the SUCCEEDED state; in other words, the job has finished successfully.

In any state of the PREPARE and RUNNING phases, a job can be killed by the client and end in the KILLED state, or enter the FAILED state due to various failures.

As Fig. 1 shows, when a job is initialized, many map/reduce tasks of the job are created. These tasks wait to be scheduled to the TaskTrackers for execution. Fig. 2 shows the timeline along which a task is processed. Generally, the processing workflow consists of 8 steps:

(1) When the tasks are created, the JobTracker generates a "TaskInProgress" instance for each task. At this point, the tasks are still in the UNASSIGNED state.

(2) Each TaskTracker sends a heartbeat to the JobTracker to request tasks. In response, the JobTracker allocates one or several tasks to each TaskTracker. This information exchange is done through the first round of heartbeat communication. The interval between two heartbeat messages is at least 3 s by default.

(3) After receiving a task, the TaskTracker creates a "TaskTracker.TaskInProgress" instance, launches an independent child JVM to execute the task, and changes the task state on the TaskTracker to RUNNING.

(4) Each TaskTracker reports the information of its tasks to the JobTracker, and the JobTracker updates the task state to RUNNING. This is done through the second round of heartbeat communication.

(5) After a while, the task completes in the child JVM. The TaskTracker then changes the task state to COMMIT_PENDING, a state in which the task waits for the JobTracker's approval to commit.

(6) This state change message is forwarded to the JobTracker by the TaskTracker through the next round of heartbeat communication. In response, the JobTracker changes the task state to COMMIT_PENDING, allowing the TaskTracker to commit the task results.

(7) Upon getting the JobTracker's approval, the TaskTracker submits the task execution results and changes the task state to SUCCEEDED.

(8) After that, the TaskTracker reports the SUCCEEDED state to the JobTracker through the next heartbeat, and the JobTracker changes the task state to SUCCEEDED. By this time, the execution of the task is complete.
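Steps 2, 4, 6 and 8 above each ride on a heartbeat, so even a trivial task pays several heartbeat intervals of pure coordination cost. The following back-of-the-envelope sketch models that timeline; it is our own simplified model using the 3 s default interval (real delays depend on where each heartbeat tick falls relative to the events):

```python
import math

def task_wallclock(work_s, interval_s=3):
    # Pull-model timeline for one task under the eight-step workflow:
    # assignment (step 2), RUNNING report (step 4), COMMIT_PENDING report
    # (step 6) and the final SUCCEEDED report (step 8) each wait for a
    # heartbeat tick; only step 5 is actual computation.
    t = interval_s          # step 2: task assigned on the first heartbeat
    t += interval_s         # step 4: RUNNING state reported
    t += work_s             # step 5: real work in the child JVM
    t = math.ceil(t / interval_s) * interval_s  # step 6: wait for next tick
    t += interval_s         # step 8: SUCCEEDED reported
    return t

# Under this model, 5 s of real work costs 15 s of wall-clock time.
print(task_wallclock(5))  # 15
```

This is exactly the per-task overhead that the instant messaging optimization of Section 4.2 targets.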
Fig. 3. Job execution state transition after optimization.
Fig. 2. The execution process of a task with respect to time. The vertical lines represent time axes, and each arrow stands for one communication between entities. The child JVMs reside on the slave nodes alongside the TaskTrackers, and the tasks are executed in different child JVMs independently. The "TaskInProgress" in the JobTracker and the "TaskTracker.TaskInProgress" in the TaskTracker are two important runtime instances.
4. Optimization of MapReduce job and task execution mechanisms

Based on the above in-depth analysis of the execution mechanisms of a MapReduce job and its tasks, in this section we reveal two critical limitations on job execution performance in the standard Hadoop MapReduce framework, and then present our optimization work to address them in more detail. The optimizations made in SHadoop aim at reducing the internal execution time of individual MapReduce jobs, especially short jobs, by optimizing the job and task execution mechanisms to improve the utilization rate of the TaskTracker slots.

4.1. Optimizing setup/cleanup tasks of a MapReduce job

As the state transition process in Fig. 1 shows, before the map/reduce tasks of a job can be scheduled, a setup task must be scheduled and executed first. In brief, the setup task is processed as follows:

(1) Launch the job setup task: after the job is initialized, the JobTracker waits for a heartbeat message from a TaskTracker indicating that it has a free map/reduce slot ready to receive and execute a new task, and then schedules the setup task to this TaskTracker.

(2) Complete the job setup task: the TaskTracker processes the task and keeps reporting its state to the JobTracker through periodic heartbeat messages until the task is completed.
The two steps described above usually take two rounds of heartbeat communication (at least 6 s, as the default heartbeat interval of Hadoop is 3 s). Similarly, after all map/reduce tasks have completed successfully, a cleanup task must be scheduled to run on a TaskTracker before the job really ends. This takes another two rounds of heartbeat communication, i.e., another 6 s. Thus the setup and cleanup tasks take at least 12 s in total. For a short job that runs for only a couple of minutes, these two special tasks may take around 10% or even more of its total execution time. Eliminating this fixed cost of 4 rounds of heartbeat communication therefore yields a noticeable performance improvement for short jobs.

Taking a closer look at the implementation of the setup and cleanup tasks in the standard Hadoop MapReduce, we observe that the job setup task running on a TaskTracker simply creates a temporary directory for outputting temporary data during job execution, and the only thing the job cleanup task does is delete this temporary directory. These two operations are very lightweight, and their actual time cost is very small. Hence, instead of sending messages to a TaskTracker to launch the job setup/cleanup task via periodic heartbeats, we execute the job setup/cleanup task directly on the JobTracker side. That is, when the JobTracker initializes a job, the job's setup task is immediately executed once on the JobTracker; after all map/reduce tasks of the job have completed, the job's cleanup task is likewise immediately executed once on the JobTracker. With this optimization, the job avoids the 4 heartbeat intervals the standard Hadoop spends on processing the setup and cleanup tasks.
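The "around 10%" estimate above is simple arithmetic, made explicit in this small sketch (assumptions: the default 3 s heartbeat interval and the four eliminated heartbeat rounds described in the text):

```python
def setup_cleanup_overhead(job_runtime_s, rounds=4, interval_s=3):
    # Fraction of total job time spent waiting on the setup/cleanup
    # heartbeat rounds that SHadoop eliminates by running both special
    # tasks directly on the JobTracker.
    return rounds * interval_s / job_runtime_s

# For a 2-minute short job, the fixed 12 s cost is 10% of its runtime;
# for an hour-long job it is negligible, which is why the optimization
# matters mainly for short jobs.
print(f"{setup_cleanup_overhead(120):.0%}")   # 10%
print(f"{setup_cleanup_overhead(3600):.1%}")  # 0.3%
```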
Because the setup and cleanup tasks are simple operations, and each job executes them only once no matter how many map/reduce tasks it has, running many jobs will not overwhelm the master node of a Hadoop cluster, since only a few jobs can be scheduled for execution at a time. Fig. 3 shows the optimized state transition of a job in SHadoop. After this optimization, the PREPARE.SETUP and CLEANUP states of Fig. 1 are incorporated into the PREPARE.INITIALIZED and RUNNING.SUC_WAIT states respectively.

4.2. Optimizing the job/task execution event notification mechanism

The MapReduce framework adopts a periodic heartbeat-based communication mechanism to exchange information and commands between the master node and slave nodes. From the
task execution process described in Fig. 2, we can see that the standard Hadoop MapReduce also uses these heartbeat messages between the JobTracker and TaskTrackers to deliver job/task scheduling and execution event information. Each TaskTracker periodically sends information to the JobTracker and issues a pull-model task request if it has free map/reduce slots, and the JobTracker responds if it has more tasks to execute. We refer to this as the pull-model heartbeat communication mechanism. Through this mechanism, the TaskTrackers also report node information to the JobTracker, and the JobTracker issues control commands to the TaskTrackers. For controlling and managing a Hadoop cluster, an appropriate heartbeat period must be set. Currently, for a cluster with fewer than 100 nodes, the default heartbeat interval in the standard Hadoop is 3 s, with an additional 1 s added per 100 extra nodes. To some extent, the pull-model heartbeat communication mechanism helps prevent the JobTracker from being overwhelmed. However, a heartbeat usually carries various messages, such as the load state of a slave node, whether it is ready to execute tasks, liveness reports, and so on. The transmission efficiency of some of these messages, such as readiness to execute tasks, is very important and performance-sensitive for job execution. We refer to this type of performance-sensitive event message for job and task scheduling and execution as critical event messages.
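The interval scaling rule just quoted can be written down directly. This is a sketch of the policy as described in the text; the exact rounding in Hadoop's source may differ:

```python
def heartbeat_interval_s(num_nodes):
    # 3 s baseline for clusters under 100 nodes, plus 1 s for every
    # additional 100 nodes, per the rule described in the text.
    extra_hundreds = max(0, (num_nodes - 1) // 100)
    return 3 + extra_hundreds

print(heartbeat_interval_s(36))   # the 36-node cluster of Section 5 -> 3
print(heartbeat_interval_s(300))  # a 300-node cluster -> 5
```

The takeaway is that the coordination delay discussed below only grows with cluster size, so the cost of routing critical events through heartbeats worsens at scale.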
Transferring critical event messages through the heartbeat communication mechanism leads to a heavy time cost during job execution for two reasons: (1) the JobTracker has to wait passively for the TaskTrackers to request tasks; as a result, there is a delay between submitting a job and scheduling its tasks, because the TaskTrackers do not contact the JobTracker until the heartbeat interval has passed; (2) critical event messages (task requesting, task start running, task commit pending, and task finishing) cannot be reported promptly from the TaskTrackers to the JobTracker, which delays task scheduling, further increases the time cost of job execution, and decreases the utilization of computing resources, even when the map/reduce slots on a TaskTracker are idle and waiting for tasks. A short job usually has only dozens of tasks and runs for a couple of minutes; if each task is delayed by a few seconds, the total execution time is delayed by a noticeable amount.

The categories of heartbeat messages sent from the TaskTrackers to the JobTracker are summarized in Table 1. There are only four critical events: task requesting, task start running, task commit pending, and task finishing. The messages sent while a task is in the running state are not critical event messages. In other words, we regard the messages that move the task execution workflow to its next state as critical event messages, and the others as non-critical. Accelerating the transmission of these critical event messages shortens the time cost of task scheduling and execution, and in turn the total execution time of a job.

Decreasing the heartbeat interval is not a good solution to this problem. This naive approach could overwhelm the JobTracker and potentially crash the whole Hadoop cluster; it incurs many unnecessary heartbeats and is usually only used in small clusters [23].
To resolve this problem, in SHadoop we separate the critical event messages from the heartbeat messages and add an instant messaging communication mechanism for critical event notifications, as shown in Fig. 4. In this new mechanism, when a critical event such as task completion happens, the message is sent to the JobTracker immediately. In this way, critical event messages are synchronized between the JobTracker and TaskTrackers quickly. For all job/task execution event notifications we use the instant messaging communication, while for cluster management events that are not as performance-sensitive we still adopt the heartbeat communication mechanism. This way we can improve hardware resource utilization without overwhelming the JobTracker.
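The split between the two channels can be pictured as a tiny dispatcher: critical events go out immediately over an RPC-like call, while everything else is buffered until the next heartbeat. This is an illustrative Python sketch, not SHadoop's Java implementation; the class and event names are ours:

```python
# The four critical events named in Table 1; everything else is non-critical.
CRITICAL = {"task_requesting", "task_start_running",
            "task_commit_pending", "task_finishing"}

class TaskTrackerMessenger:
    def __init__(self, rpc_send):
        self.rpc_send = rpc_send   # instant channel to the JobTracker
        self.pending = []          # buffered until the next heartbeat

    def report(self, event):
        if event in CRITICAL:
            self.rpc_send(event)   # instant messaging: no heartbeat delay
        else:
            self.pending.append(event)

    def heartbeat(self):
        # Non-critical state still rides the periodic heartbeat.
        batch, self.pending = self.pending, []
        return batch

sent = []
m = TaskTrackerMessenger(sent.append)
m.report("task_finishing")     # delivered immediately
m.report("task_in_running")    # waits for the heartbeat
assert sent == ["task_finishing"]
assert m.heartbeat() == ["task_in_running"]
```

The design point is that only state-advancing events pay for an extra RPC, so the JobTracker's message load grows with the number of task state transitions, not with the heartbeat rate.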
Table 1
Types of messages sent from a TaskTracker to the JobTracker in SHadoop.

Message type           Message name
Critical messages      Task requesting, task start running, task commit pending, task finishing
Non-critical messages  Task in running
Fig. 4. Optimized task execution process after applying the instant messaging communication mechanism.
To make SHadoop fully compatible with the standard Hadoop, we avoid using any third-party libraries in the implementation of our instant messaging communication mechanism. For network communication, SHadoop still adopts the internal RPC mechanism of Hadoop. In SHadoop, when a critical event happens, the message is immediately transmitted via the Hadoop RPC methods between the JobTracker and TaskTrackers, without waiting for a heartbeat period.

5. Evaluation

To verify the effects of our optimizations, we conducted a series of experiments to evaluate and compare the performance of SHadoop with the standard Hadoop. First, we performed a number of experiments to separately evaluate the effect of each optimization measure. Second, to evaluate how much our optimizations can benefit MapReduce jobs with different workloads, we adopted several Hadoop MapReduce benchmark suites to further evaluate SHadoop. Third, since widely-used big data ad-hoc query and analysis systems such as Hive and Pig are built on top of MapReduce, we also verified how much SHadoop can improve the execution efficiency of Hive with a number of comparative experiments. Fourth, we evaluated the scalability of SHadoop compared to the standard Hadoop through two experiments: (1) scaling the data while fixing the number of machines, and (2) scaling the number of machines while fixing the data. Finally, we evaluated the impact of the instant messaging optimization on system workloads in SHadoop with both formal analysis and experimental verification.
5.1. Environment setup

The experiments were performed with Hadoop 1.0.3 (a stable version) and SHadoop. Our test cluster contains one master node and 36 compute nodes. The master node is equipped with two 6-core 2.8 GHz Xeon processors, 36 GB of memory and two 2 TB 7200 RPM SATA disks. Each compute node has two 4-core 2.4 GHz Xeon processors, 24 GB of memory and also two 2 TB 7200 RPM SATA disks. The nodes are connected by 1 Gb/s Ethernet. They all run RHEL6 (kernel 2.6.32) with the ext3 file system. Each compute node acts as a TaskTracker/DataNode, and the master node acts as the JobTracker/NameNode. The Hadoop configuration uses the default settings and 8 map/reduce slots per node. Both the standard Hadoop and SHadoop run on OpenJDK 1.6 with the same 2 GB JVM heap size.

5.2. Analysis of optimization measures and effects

SHadoop makes two optimizations to the standard Hadoop MapReduce. In this subsection we present a set of experiments demonstrating the effect of each optimization, with job execution time as the performance metric. First, we ran our experiments with the well-known WordCount benchmark. To keep the job short, the input data size was set to 4.5 GB, about 200 data blocks. We ran the benchmark with 16 reduce tasks on the standard Hadoop 1.0 environment and on SHadoop, using a cluster of 20 slave nodes with 160 slots in total. During the execution of a job, we recorded the load of the slots on each TaskTracker every second into a log file on the JobTracker. The results are shown in Fig. 5.

Fig. 5(a) shows how the number of running tasks of the WordCount benchmark varies over time when running on the standard Hadoop. It can be seen clearly that at the beginning of the job, about 7 s are spent executing the setup task before the users' map/reduce tasks run. Similarly, a cleanup task needs to be executed before the job ends. As shown in Fig. 5(b), after applying the optimized job setup/cleanup tasks, the setup and cleanup time costs are noticeably reduced. The total job execution time is shortened from 60 to 46 s, a 23.3% performance improvement. As shown in Fig. 5(c), with the instant messaging communication mechanism applied to the standard Hadoop, the number of running tasks stays higher and changes more smoothly. This indicates that during job execution, the slots on the TaskTrackers are kept maximally scheduled with tasks and rarely stay idle. This makes the execution process more compact and efficient by improving the CPU utilization rate of each slot. For a given MapReduce job, the total computation workload is fixed, so improving the CPU utilization rate of the map/reduce slots leads to a reduction in total execution time. Fig. 5(d) shows job execution with both the setup/cleanup task optimization and the instant messaging optimization applied together. Compared with the setup/cleanup task optimization alone in Fig. 5(b), the total execution time of the job is further shortened from 46 to 39 s, an extra improvement of about 11.7% from the instant messaging optimization. The two optimizations have an additive effect because they work at different phases of job execution: the setup/cleanup task optimization works at the beginning and end of a job, while the instant messaging optimization takes effect in the middle. In summary, both optimization measures make a significant contribution to the performance improvement. Compared with the standard Hadoop, SHadoop reduces the execution time of the short WordCount benchmark job by 35% in total.

Grep and Sort are two other widely-used MapReduce benchmarks, also used in the original MapReduce paper [7].
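The percentages quoted for the WordCount experiment follow directly from the measured times; note that the 11.7% contribution of instant messaging is expressed relative to the 60 s standard-Hadoop baseline, not to the 46 s intermediate time:

```python
def improvement(baseline_s, optimized_s):
    # Relative reduction in execution time versus the baseline.
    return (baseline_s - optimized_s) / baseline_s

print(f"setup/cleanup only:     {improvement(60, 46):.1%}")  # 23.3%
print(f"both optimizations:     {improvement(60, 39):.1%}")  # 35.0%
print(f"messaging contribution: {improvement(60, 39) - improvement(60, 46):.1%}")  # 11.7%
```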
Grep is a typical map-side job: most of the work is done in the map tasks, and the output of the map tasks is usually several orders of magnitude smaller than the input, leaving little work for the reduce tasks. Sort, on the other hand, is a typical reduce-side job: most of the execution time is spent in the reduce phase, including shuffling the intermediate data and running the reduce tasks, and the output of the map tasks is the same size as the job input. To evaluate the effect of the optimizations on different types of MapReduce jobs, we also ran these two benchmarks on the standard Hadoop and SHadoop. The first group of experiments runs the Grep benchmark on 10 GB of input data; for the Sort benchmark, the experiments use 3 GB of input data. Both experiments are conducted on 20 slave nodes, as in the WordCount benchmark experiments. The results of these two groups of experiments are shown in Figs. 6 and 7 respectively. As shown in Figs. 6 and 7, compared to the standard Hadoop, SHadoop shortens the execution time of the Grep benchmark from 47 to 29 s and that of the Sort benchmark from 63 to 41 s, reducing the total execution time by 38% and 34% respectively. This demonstrates that the optimizations enhance the execution efficiency of both map-side and reduce-side MapReduce jobs.

5.3. Impact on performance of comprehensive benchmarks

In this subsection, to better evaluate the general applicability and the stability of the performance improvement of our optimization work, we further test the impact of our optimizations with comprehensive benchmarks: HiBench [10], a widely-used benchmark suite from Intel; MRBench [4], a benchmark included in the standard Hadoop distribution; and a widely-used application benchmark suite, the Hive benchmarks. (1) HiBench evaluation: HiBench is a widely-used benchmark suite for Hadoop [10].
It consists of a set of Hadoop MapReduce benchmarks, including both synthetic micro-benchmarks and real-world Hadoop applications. The running time of each HiBench benchmark is shown in Fig. 8, and the corresponding performance improvement rates are recorded in Table 2. The running times of the standard Hadoop with each individual optimization are also reported. From Table 2, we can see that both optimization measures always take effect, and the performance improvements vary across benchmarks. Commonly-used benchmarks such as WordCount, Sort and Grep gain more than 30% improvement, while the NutchIndexing and HiveBench-aggregator benchmarks gain 6%. Therefore, SHadoop improves the execution efficiency of all benchmarks to some degree, showing that our optimization work achieves general applicability. (2) Hadoop MRBench evaluation: MRBench is one of the benchmarks in the standard Hadoop distribution for benchmarking, stress testing, measuring and comparing the performance of one Hadoop cluster with that of others. MRBench creates a sequence of small MapReduce jobs (the number can be configured), and the job execution efficiency of the underlying Hadoop is evaluated by the total time cost of these MapReduce jobs. We ran three groups of comparative experiments with different numbers of submitted jobs. The results are shown in Table 3. The running times of the standard Hadoop with each individual optimization are also reported. From Table 3, we can see that both optimization measures take effect, and the performance improvement rate of SHadoop is consistently around 30%. (3) Hive benchmark evaluation: in practice, Hive and Pig are used more widely than hand-coded MapReduce programs for big data query and analysis applications. As mentioned above, more than 95% of the Hadoop jobs at Facebook are not hand-coded but generated by Hive [12]. One motivation of our optimization work is to benefit big data query and analysis systems. Therefore, we
R. Gu et al. / J. Parallel Distrib. Comput. 74 (2014) 2166–2179
(a) Running WordCount benchmark on the standard Hadoop. (b) Running WordCount benchmark with the job setup/cleanup task optimization. (c) Running WordCount benchmark with the instant messaging optimization. (d) Running WordCount benchmark with both the job setup/cleanup task optimization and the instant messaging optimization.
Fig. 5. Performance evaluation of the effects of the optimization measures in SHadoop (in seconds; lower running time is better).
(a) Running Grep benchmark on the standard Hadoop.
(b) Running Grep benchmark on SHadoop.
Fig. 6. Performance evaluation of Grep benchmark under standard Hadoop and SHadoop (in seconds, lower Running Time is better).
(a) Running Sort benchmark on the standard Hadoop.
(b) Running Sort benchmark on SHadoop.
Fig. 7. Performance evaluation of Sort benchmark under standard Hadoop and SHadoop (in seconds, lower Running Time is better).
Fig. 8. Execution performance of each HiBench benchmark under Hadoop and SHadoop. (In minutes; lower execution time is better. Note: the running time of the Bayes benchmark is too long to plot directly, at more than 25 min. To better illustrate the whole set of benchmarks, we scale down the running time of the Bayes benchmark by a factor of 10; its exact execution time is noted on its data tag.)
Fig. 9. Execution performance for benchmarks in Hive under Hadoop and SHadoop.

Table 2
Performance evaluation of the effects of the optimization measures in SHadoop using HiBench. (Hadoop represents the standard Hadoop; the first optimization represents Hadoop with only the setup/cleanup task optimization; the second optimization represents Hadoop with only the instant messaging optimization; SHadoop represents Hadoop with both optimizations.)

Benchmark case          Hadoop (s)   First opt. (s)   Second opt. (s)   SHadoop (s)   Improvement in total
WordCount               60           50               51                39            35.00%
Sort                    63           53               52                41            34.90%
Grep                    47           36               40                29            38.27%
PageRank                201          179              167               146           27.36%
Kmeans                  283          224              271               213           24.73%
NutchIndexing           159          153              156               151           6.00%
HiveBench-aggregator    113          110              108               106           6.00%
HiveBench-join          212          199              197               185           12.70%
Bayes                   1697         1588             1640              1518          10.55%
Table 3
Experiment results of MRBench under the standard Hadoop and SHadoop. (Hadoop represents the standard Hadoop; the first optimization represents Hadoop with only the setup/cleanup task optimization; the second optimization represents Hadoop with only the instant messaging optimization; SHadoop represents Hadoop with both optimizations.)

# Jobs   Hadoop (s)   First opt. (s)   Second opt. (s)   SHadoop (s)   Improvement in total
5        122          91               114               85            30.30%
50       1252         943              1178              876           30.03%
500      12504        9020             12117             8754          30.00%
also evaluate the impact of the performance improvement on the Hive benchmarks. In this experiment, we use Hive 0.9 as the big data query and analysis system, and run a number of the Hive benchmarks over Hive running on the standard Hadoop and SHadoop respectively. The experimental results are shown in Fig. 9, and the corresponding performance improvement rates are recorded in Table 4. The
running times of the standard Hadoop with each individual optimization are also reported. From Table 4, we can see that our optimized MapReduce framework noticeably accelerates the execution of Hive applications. Both optimization measures take effect, and the performance improvements vary across benchmarks. The average improvement rate is around 20%, which is significant for many online query and analysis applications.
Table 4
Performance evaluation of the effects of the optimization measures in SHadoop using the Hive benchmarks. (In the table, Hadoop represents the standard Hadoop; the first optimization represents Hadoop with only the setup/cleanup task optimization; the second optimization represents Hadoop with only the instant messaging optimization; SHadoop represents Hadoop with both optimizations. GB_SingleReducer is short for GroupBy_SingleReducer.)

Benchmark name     Hadoop (s)   First opt. (s)   Second opt. (s)   SHadoop (s)   Improvement in total
Join               67           61               56                51            23.9%
Combine            123          106              116               99            19.5%
GroupBy (GB)       49           43               45                39            20.4%
GB_SingleReducer   99           87               93                82            17.2%
Insert_Into        113          94               109               91            18.6%
Order              25           22               24                21            16.0%
Sort               26           22               24                21            19.2%
Union              26           20               24                19            23.1%
Table 5
Execution time of the WordCount benchmark in Hadoop and SHadoop with different sizes of input data per node.

Data per node      256 MB    512 MB    1 GB      2 GB     4 GB     8 GB
Hadoop (s)         79        115       172       291      499      962
SHadoop (s)        55        90        147       263      472      919
Improvement rate   30.38%    21.74%    14.53%    9.62%    5.41%    4.47%
5.4. Scalability

In this subsection, we experimentally evaluate the scalability of SHadoop compared to the standard Hadoop, both by scaling the data while fixing the number of machines and by scaling the number of machines while fixing the data.

(1) Data scalability: Table 5 shows the performance of SHadoop and the standard Hadoop with different sizes of input data. The experimental results come from the WordCount benchmark running on 20 nodes. From Table 5, we see that SHadoop outperforms the standard Hadoop for all sizes of input data per node. The improvement ranges from 30.38% with 256 MB of input data per node to 4.47% with 8 GB per node. This indicates that SHadoop improves the performance of MapReduce jobs of various sizes, but the effect is much more significant for short jobs. In addition, the nearly linear relationship between job execution time and data size indicates that SHadoop retains excellent scalability as the data size varies.

(2) Machine scalability: we also evaluate the job execution performance of SHadoop and the standard Hadoop with different numbers of nodes. All the experiments in this subsection are conducted on the WordCount benchmark with 10 GB of data in total and around 500 input blocks. The experimental results are shown in Fig. 10. The execution times of the job on SHadoop under 4, 8, 16 and 32 nodes are 287 s, 141 s, 85 s and 46 s respectively: each doubling of the node count cuts the execution time roughly in half, which means SHadoop scales well with the number of cluster nodes. Like the standard Hadoop, SHadoop speeds up proportionally as more nodes are added. Furthermore, with the same number of nodes, the job execution efficiency of SHadoop is always higher than that of the standard Hadoop, and the improvement percentages are more noticeable for shorter jobs.
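The machine-scalability numbers above can be turned into speedup figures with a short script (node counts and execution times taken from Fig. 10):

```python
# Speedup of the WordCount job on SHadoop relative to the 4-node run.
nodes = [4, 8, 16, 32]
times_s = [287, 141, 85, 46]  # measured execution times from Fig. 10

for n, t in zip(nodes, times_s):
    ideal = n / nodes[0]      # ideal linear speedup over the 4-node run
    actual = times_s[0] / t   # measured speedup
    print(f"{n:2d} nodes: {actual:.2f}x (ideal {ideal:.0f}x)")
```

The measured speedup stays close to linear (about 6.2x on 8x the nodes), consistent with the near-halving of execution time at each doubling of the node count.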
In conclusion, SHadoop achieves much better execution efficiency than the standard Hadoop under different node counts, without losing scalability in clusters with dozens of nodes.

5.5. Impact on system workloads

In this subsection, we study the impact of the instant messaging optimization on system workloads. First, we model the problem with a formal analysis of the impact on system workloads. To validate the model, we then measure the real workloads in our cluster with the MapReduce job benchmarks before and after the instant messaging optimization. The workloads we study here
Fig. 10. Execution time of wordcount benchmark in SHadoop and the standard Hadoop with different node numbers.
include network traffic, CPU usage and memory usage on both the JobTracker and the TaskTrackers. (1) Quantitative analysis: in our Hadoop cluster, the JobTracker runs on the master node, and there are m slave nodes, each of which runs a TaskTracker. A TaskTracker can run k tasks simultaneously, where k is the number of slots. In the standard Hadoop, the TaskTrackers communicate with the JobTracker through periodic heartbeats. Let the heartbeat interval be T and the size of a heartbeat message be c. All the slots in the same TaskTracker share one heartbeat timer. When the timer counts the interval T down to zero, it triggers the TaskTracker to send a heartbeat message to the JobTracker; whenever a heartbeat message is sent out, the TaskTracker resets the timer to count from the start again. Assume the life span of a task on a slot is covered by a time window t, where t varies across tasks. Then, as shown in Fig. 11(a) and (b), the number of additional messages is no more than 4 × m × k, so the additional message volume transferred during the time window t is no more than 4 × m × k × c. Note that this bound is independent of the task execution time window t: the message overhead introduced by our instant messaging optimization is a fixed cost, no matter how long the task runs. In practice, k, the number of slots in a TaskTracker that run tasks simultaneously, is usually the number of cores of a machine, for example 4, 8 or 16; m is the number
(a) Message transferring model of the TaskTracker in the standard Hadoop.
(b) Message transferring model of the TaskTracker in SHadoop. Fig. 11. The message transferring model of the TaskTracker in the standard Hadoop and SHadoop. IM is an abbreviation for instant message.
of the slave nodes in the Hadoop cluster, usually a few dozen in a moderately-sized cluster. The product m × k can be regarded as the number of cores running tasks in parallel in the cluster, which can reach several hundred in a moderately-sized cluster; c, the size of a message, is around 2 KB. Therefore, the additional network traffic during the time window t (around 40 s for typical MapReduce tasks) amounts to a few megabytes, which poses no problem for a commonly-used cluster network environment such as Fast Ethernet or Gigabit Ethernet. (2) Experimental studies: we also measure the additional workload caused by the instant messaging optimization through experiments, to better verify its impact on system workloads. The
workload we evaluate includes not only the cluster network traffic but also the CPU and memory usage on the JobTracker and TaskTrackers. In our experimental cluster, there are 20 slave machines, each with 8 cores. We configured 8 map and 4 reduce slots per node, so the number of tasks running in parallel is 240. We chose a typical MapReduce job, WordCount, as the benchmark case. To study tasks with different running times, we adopted three datasets with data block sizes of 16 MB, 32 MB and 64 MB respectively, so each task processes 16 MB, 32 MB or 64 MB of data.
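As a concrete check of the 4 × m × k × c bound above, the extra traffic for our 20-node, 12-slots-per-node configuration can be computed directly (the 2 KB message size is the approximate figure from the analysis):

```python
# Upper bound on the extra message traffic introduced by the instant
# messaging optimization during one task time window: at most 4 * m * k
# messages of roughly c bytes each, independent of the window length t.
def extra_traffic_bytes(m, k, c):
    return 4 * m * k * c

m = 20         # slave nodes (TaskTrackers) in the test cluster
k = 12         # slots per node (8 map + 4 reduce)
c = 2 * 1024   # ~2 KB per message

bound = extra_traffic_bytes(m, k, c)
print(f"at most {bound / 2**20:.1f} MB per task window")
```

This evaluates to about 1.9 MB, in line with the "few megabytes" estimate in the quantitative analysis.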
Table 6
Execution time and number of transmitted messages of map/reduce tasks in SHadoop and the standard Hadoop with different data block sizes. The number of messages is the total sent from a TaskTracker.

Data block size/task role   Exec. time (Hadoop) (s)   Num. msgs (Hadoop)   Exec. time (SHadoop) (s)   Num. msgs (SHadoop)   Increased num. msgs
16 MB/map task              21                        7                    17                         37                    30
16 MB/reduce task           48                        16                   41                         45                    29
32 MB/map task              33                        11                   29                         41                    30
32 MB/reduce task           62                        21                   71                         55                    34
64 MB/map task              57                        19                   54                         50                    31
64 MB/reduce task           117                       39                   113                        69                    30
(a) Network workload.
(b) CPU workload.
(c) Memory workload. Fig. 12. Workloads of the JobTracker running the 32 MB/map WordCount task under the standard Hadoop and SHadoop. The solid lines represent the standard Hadoop and the dashed lines represent SHadoop.
Table 6 shows the time cost and the number of transmitted messages for each task in SHadoop and the standard Hadoop. More messages are sent in SHadoop than in the standard Hadoop. This is because each slot of a TaskTracker sends instant messages independently, which reduces the opportunity to coalesce messages from the other slots in the same TaskTracker at the beginning and end of tasks. However, as shown in Table 6, the number of additional messages is only around 30 per task. Since the physical size of each message is around 2 KB, the additional data sent by a TaskTracker amounts only to dozens of kilobytes, which is not large for a moderately-sized cluster. We also recorded the system workloads of the JobTracker and TaskTrackers when running the tasks in Table 6. The workload of a typical task, the 32 MB/map task, is chosen to illustrate the impact on system workloads. Fig. 12 shows the workload of the JobTracker while running the task under the standard Hadoop and SHadoop. From Fig. 12(a), we can see that the increased network traffic in the cluster is only a few megabytes. This increment is quite acceptable for a commonly-used cluster network environment such as Fast Ethernet or the Gigabit Ethernet in our cluster. As shown
in Fig. 12(b) and (c), CPU and memory usage do not increase much, which shows that the optimization does not overwhelm the JobTracker. Similarly, the workload of the TaskTracker during the experiments is shown in Fig. 13. We can see that the workload of the TaskTracker does not change much when the optimizations are applied. To sum up, on one hand, since the number of additional messages is fixed and their total size is small, the optimizations made in SHadoop do not impose much overhead on the system. On the other hand, they improve job execution performance by enabling MapReduce jobs to make better use of the hardware resources in a small to moderately-sized cluster. Some studies [31] have already noted that small to moderately-sized MapReduce clusters, with no more than a few dozen machines and hundreds of cores in total, are very common in most companies and laboratories. On the official Hadoop powered-by report website [18], among about 600 Hadoop clusters from nearly 160 companies and research institutions, around 95% of clusters consist of at most a few dozen nodes. Only
(a) Network workload.
(b) CPU workload.
(c) Memory workload. Fig. 13. Workloads of the TaskTracker running the 32 MB/map WordCount task under the standard Hadoop and SHadoop. The solid lines represent the standard Hadoop and the dashed lines represent SHadoop.
a few clusters from Yahoo!, Facebook and eBay reach the scale of more than 500 nodes. From this point of view, the cluster of 36 nodes and 288 cores in our experiments is close to the scale of most Hadoop clusters used in production environments. Therefore, our optimization work on efficiently utilizing small to moderately-sized clusters is a contribution to improving the performance of Hadoop in real applications.

6. Conclusion and future work

MapReduce is a popular programming model and framework for processing large datasets. It has been widely used and recognized for its simple programming interfaces, fault tolerance and elastic scalability. However, the job execution performance of MapReduce has received relatively little attention. In this paper, we explore an approach to optimize the job and task execution mechanism and present an optimized version of Hadoop, named SHadoop, to improve the execution performance of MapReduce jobs. SHadoop makes two major optimizations: it optimizes the job initialization and termination stages, and it provides an instant messaging communication mechanism for efficient notification of critical events. The first optimization shortens the startup and cleanup time of all jobs and is especially effective for jobs with short running times. The second optimization benefits most short jobs with large deployments or many tasks. One potential side effect of our optimizations is a slightly higher burden on the JobTracker, as it needs to create and delete an empty temporary directory for each job. However, the added burden is very small, as shown in Figs. 12 and 13, unless the machine running the JobTracker has relatively low computational resources. Also, for long-running jobs, our optimizations provide little benefit.
Compared with the standard Hadoop, SHadoop achieves a 25% performance improvement on average over the various tested Hadoop benchmark jobs and Hive applications. It also achieves excellent scalability as the data size varies and excellent speedup as the number of cluster nodes increases. Moreover, SHadoop preserves all the features of the standard Hadoop MapReduce framework without changing any of Hadoop's programming APIs, and is thus fully compatible with existing programs and applications built on top of Hadoop. To the best of our knowledge, SHadoop is the first effort to optimize the execution mechanism inside map/reduce tasks. Experimental results show that SHadoop works well on small to moderately-sized clusters, which represent the most common case in practice, and achieves stable performance improvements across comprehensive benchmarks. Our optimization work is a contribution to the Hadoop MapReduce framework and has been integrated into the Intel Distributed Hadoop [11] after passing a production-level test at Intel. We have also released SHadoop as an open source project on GitHub [19]. In the future, we will explore further optimizations to improve MapReduce performance by better utilizing cluster hardware resources. Currently, the slots of a Hadoop cluster can only be statically configured and used, which limits the dynamic scheduling of slots according to the actual utilization and workload of computing resources. Thus, we plan to study a resource context-aware optimization model and approach for dynamic slot scheduling in the Hadoop MapReduce execution framework. We will also work on a job cost-aware model and approach for optimizing the MapReduce job scheduler for different types of workloads, such as computation-intensive, I/O-intensive or memory-intensive jobs. Finally, we plan to integrate all these new optimizations
with the optimizations proposed in this paper to achieve further performance improvement.

Acknowledgments

This work is funded in part by China NSF Grant 61223003 and the National High Technology Research and Development Program of China (863 Program) under Grant 2011AA01A202.

References

[1] Apache Hadoop. http://hadoop.apache.org.
[2] S. Babu, Towards automatic optimization of MapReduce programs, in: Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC, 2011, pp. 137–142.
[3] Y. Becerra Fontal, V. Beltran Querol, D. Carrera, et al., Speeding up distributed MapReduce applications using hardware accelerators, in: International Conference on Parallel Processing, ICPP, 2009, pp. 42–49.
[4] Benchmarking and stress testing an Hadoop cluster with TeraSort, TestDFSIO & Co. http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/.
[5] Y. Chen, A. Ganapathi, R. Griffith, R. Katz, The case for evaluating MapReduce performance using workload suites, in: 19th International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems, MASCOTS, 2011, pp. 390–399.
[6] Danga Interactive, memcached. http://www.danga.com/memcached/.
[7] J. Dean, S. Ghemawat, MapReduce: simplified data processing on large clusters, Commun. ACM 51 (1) (2008) 107–113.
[8] M. Hammoud, M. Sakr, Locality-aware reduce task scheduling for MapReduce, in: 3rd IEEE International Conference on Cloud Computing Technology and Science, CloudCom, 2011, pp. 570–576.
[9] C. He, Y. Lu, D. Swanson, Matchmaking: a new MapReduce scheduling technique, in: 3rd International Conference on Cloud Computing Technology and Science, CloudCom, 2011, pp. 40–47.
[10] S. Huang, J. Huang, J. Dai, T. Xie, B. Huang, The HiBench benchmark suite: characterization of the MapReduce-based data analysis, in: 26th International Conference on Data Engineering Workshops, ICDEW, 2010, pp. 41–51.
[11] Intel Distributed Hadoop.
http://www.intel.cn/idh.
[12] R. Lee, T. Luo, Y. Huai, F. Wang, Y. He, X. Zhang, YSmart: yet another SQL-to-MapReduce translator, in: 31st International Conference on Distributed Computing Systems, ICDCS, 2011, pp. 25–36.
[13] B. Li, E. Mazur, Y. Diao, A. McGregor, P. Shenoy, A platform for scalable one-pass analytics using MapReduce, in: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, 2011, pp. 985–996.
[14] H. Mao, S. Hu, Z. Zhang, L. Xiao, L. Ruan, A load-driven task scheduler with adaptive DSC for MapReduce, in: 2011 IEEE/ACM International Conference on Green Computing and Communications, GreenCom, 2011, pp. 28–33.
[15] R. Nanduri, N. Maheshwari, A. Reddyraja, V. Varma, Job aware scheduling algorithm for MapReduce framework, in: 3rd IEEE International Conference on Cloud Computing Technology and Science, CloudCom, 2011, pp. 724–729.
[16] C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins, Pig Latin: a not-so-foreign language for data processing, in: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 2008, pp. 1099–1110.
[17] R. Pike, S. Dorward, R. Griesemer, S. Quinlan, Interpreting the data: parallel analysis with Sawzall, Sci. Program. J. 13 (4) (2005) 227–298.
[18] PoweredBy—Hadoop Wiki. http://wiki.apache.org/hadoop/PoweredBy.
[19] RongGu/SHadoop. https://github.com/RongGu/SHadoop.
[20] S. Seo, et al., HPMR: prefetching and pre-shuffling in shared MapReduce computation environment, in: International Conference on Cluster Computing and Workshops, CLUSTER, 2009, pp. 1–8.
[21] M. Stonebraker, D. Abadi, D.J. DeWitt, S. Madden, E. Paulson, A. Pavlo, A. Rasin, MapReduce and parallel DBMSs: friends or foes? Commun. ACM 53 (1) (2010) 64–71.
[22] A. Thusoo, et al., Hive—a warehousing solution over a map-reduce framework, Proc. VLDB Endow. 2 (2) (2009) 1626–1629.
[23] Todd Lipcon, [MAPREDUCE-1906] Lower default minimum heartbeat interval for TaskTracker > JobTracker—ASF JIRA. https://issues.apache.org/jira/browse/MAPREDUCE-1906.
[24] Under the hood: scheduling MapReduce jobs more efficiently with Corona. https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920.
[25] R. Vernica, A. Balmin, K.S. Beyer, V. Ercegovac, Adaptive MapReduce using situation-aware mappers, in: Proceedings of the 15th International Conference on Extending Database Technology, 2012, pp. 420–431.
[26] Y. Wang, X. Que, W. Yu, D. Goldenberg, D. Sehgal, Hadoop acceleration through network levitated merge, in: Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, 2011, pp. 57–67.
[27] J. Xie, et al., Improving MapReduce performance through data placement in heterogeneous Hadoop clusters, in: 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Ph.D. Forum, IPDPSW, 2010, pp. 1–9.
[28] M. Xin, H. Li, An implementation of GPU accelerated MapReduce: using Hadoop with OpenCL for data- and compute-intensive jobs, in: 2012 International Joint Conference on Service Sciences, IJCSS, 2012, pp. 6–11.
[29] H.H. You, C.C. Yang, J.L. Huang, A load-aware scheduler for MapReduce framework in heterogeneous cloud environments, in: Proceedings of the 2011 ACM Symposium on Applied Computing, 2011, pp. 127–132.
[30] M. Zaharia, A. Konwinski, A.D. Joseph, R. Katz, I. Stoica, Improving MapReduce performance in heterogeneous environments, in: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI, 2008, pp. 29–42.
[31] S. Zhang, J. Han, Z. Liu, K. Wang, S. Feng, Accelerating MapReduce with distributed memory cache, in: 15th International Conference on Parallel and Distributed Systems, ICPADS, 2009, pp. 472–478.
Rong Gu received the B.S. degree in computer science from Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 2011. He is currently a Ph.D. candidate in computer science at Nanjing University, Nanjing, China. His research interests include parallel and distributed computing, cloud computing, and big data parallel processing.
Xiaoliang Yang received the B.S. degree in computer science from YanShan University, China, in 2008 and the Master's degree in computer science from Nanjing University, Nanjing, China, in 2012. He currently works at Baidu. His research interests include parallel and distributed computing and bioinformatics.
Jinshuang Yan received the B.S. degree in computer science from Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 2010. He is currently a Master's student in computer science at Nanjing University, Nanjing, China. His research interests include parallel computing and large-scale data analysis.
Yuanhao Sun joined Intel in 2003. He was managing the big data product team in the Datacenter Software Division at Intel Asia-Pacific R&D Ltd., leading the efforts for Intel's distribution of Hadoop and related solutions and services. Yuanhao received his bachelor and master degrees from Nanjing University, both in computer science.
Bing Wang received the B.S. degree in software engineering from Nanjing University. He is currently a Master's degree candidate in software engineering at Nanjing University and is taking an internship at the Intel Asia-Pacific R&D Center. His research interests include distributed computing, large-scale data analysis and data mining.
Chunfeng Yuan is currently a professor in the Computer Science Department of Nanjing University, China. She received her bachelor and master degrees from Nanjing University, both in computer science. Her main research interests include computer system architecture, big data parallel processing and Web information mining.
Yihua Huang is currently a professor in the Computer Science Department of Nanjing University, China. He received his bachelor, master and Ph.D. degrees from Nanjing University, all in computer science. His main research interests include parallel and distributed computing, big data parallel processing and Web information mining.