IERI Procedia 10 (2014) 70–75

2014 International Conference on Future Information Engineering
An Optimal Task Selection Scheme for Hadoop Scheduling

S. Suresh*, N.P. Gopalan

Department of Computer Applications, National Institute of Technology, Tiruchirappalli 620 015, India
Abstract

MapReduce is a popular parallel programming model used to solve a wide range of BigData applications in cloud computing environments. Hadoop is an open source implementation of MapReduce that is widely used by a vast number of users. It provides an abstracted environment for running large scale data intensive applications in a scalable and fault tolerant manner. Several Hadoop scheduling algorithms have been proposed in the literature with various performance goals. In this paper, a new optimal task selection scheme is introduced to assist the scheduler when multiple local tasks are available for a node. To improve the probability of local tasks being launched for a job in the future, the task launched among the available local tasks of a node is the one whose input has the fewest replicas, whose input resides on the least loaded disk attached to the node, and which has the maximum expected time to wait for its next local node. The proposed method was evaluated by extensive experiments, and it has been observed that it improves performance significantly: around 20% improvement is achieved in terms of locality and fairness.

© 2014 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/). Selection and peer review under responsibility of Information Engineering Research Institute.
Keywords: Hadoop, MapReduce, task selection, job scheduling.
* Corresponding author. Tel.: +91-9941506562. E-mail address: sureshtvmalai85@gmail.com

1. Introduction

With recent rapid advancements, computing systems have been applied to solve very complex problems in diverse fields. The data generated and processed by such applications is extremely large in size. Several computing paradigms, such as grid computing and cloud computing, have emerged to handle large scale data intensive applications efficiently. However, traditional programming models, load balancing, debugging and scheduling solutions are not efficient when dealing with BigData applications. Redefining the solutions to these problems for large scale data intensive applications has attracted much research attention, and several frameworks have been proposed in the literature for handling BigData applications efficiently [1-4]. MapReduce [1] is a popular programming model for data intensive applications in cloud environments. It hides all the parallel programming difficulties from the users and provides an abstracted environment. Hadoop [5] is a widely used open source implementation of MapReduce. Being capable of providing fault tolerant, load balanced and scalable service over commodity hardware has made Hadoop a standard for BigData processing. The performance of Hadoop depends mainly on its scheduler, and several scheduling algorithms [6-10] have been proposed for different kinds of workloads, environments and applications.

In this paper, a new optimal task selection scheme is introduced to assist the scheduler when multiple local tasks are available for a node. To improve the probability of local tasks being launched for a job in the future, the task launched among the available local tasks of a node is the one whose input has the fewest replicas, whose input resides on the least loaded disk attached to the node, and which has the maximum expected time to wait for its next local node. The proposed method was evaluated by extensive experiments, and it has been observed that it improves performance significantly: around 20% improvement is achieved in terms of locality and fairness.

The remainder of the paper is organized as follows. Section 2 explores related work on Hadoop scheduling. The proposed method is presented in Section 3. The performance of the proposed method is evaluated in Section 4, and Section 5 concludes the paper.

2. Related Works

Hadoop uses a master-slave architecture for running applications concurrently. The master and slave nodes communicate by exchanging heartbeat messages. Whenever a heartbeat is received from a node with an empty slot, the Hadoop scheduler launches a task on that node. Hadoop uses the FIFO scheduler for job scheduling by default. FIFO lacks performance in terms of response time for small jobs and fair sharing among users. Several scheduling algorithms [6-10] have been proposed for different kinds of workloads, environments and applications. Yahoo proposed the Capacity Scheduler [6] to share a Hadoop cluster among various organizations: a set of queues with minimum guaranteed capacities is created and the cluster is shared accordingly. Facebook proposed the Hadoop Fair Scheduler (HFS) [7] to share the cluster fairly among users/applications: a set of pools is created and a minimum share is assigned to each. Zaharia et al. proposed LATE [8] for efficiently launching speculative tasks and delay scheduling [9] for achieving improved locality by slightly relaxing fairness. The deadline constraints of jobs are considered in [10]. Readers interested in further Hadoop scheduling techniques can refer to the detailed survey in [11]. Although several algorithms have been proposed in the literature, none of them considers the problem of selecting an optimal task from the set of local tasks available for a node.
3. Proposed Method

Whenever a node becomes free, the Hadoop scheduler chooses a job to assign the slot to according to the scheduling policy. After deciding the job, the scheduler searches the job's list of unscheduled tasks for a local task to launch on the free node. If a local task is found, the scheduler launches it immediately. If no local task is available, the scheduler either launches a non local task or gives the chance to the next job in the queue, based on the scheduling policy.
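To make the point of intervention concrete, the following is a minimal sketch (not Hadoop's actual Java scheduler code) of the heartbeat-driven assignment flow described above. The names `pick_job`, `handle_no_local_task` and `select_local_task` are illustrative placeholders; `select_local_task` is the hook where the proposed selection scheme (Eqs. (1)-(5) below) would be applied when more than one local task is available.

```python
# Illustrative sketch of the assignment flow described above
# (hypothetical names, not Hadoop's actual TaskScheduler API).

def on_heartbeat(node, queue, policy):
    """Called when a node reports a free slot via heartbeat."""
    job = policy.pick_job(queue)          # job chosen by the scheduling policy (FIFO, HFS, ...)
    if job is None:
        return None

    # Tasks of the chosen job whose input block is stored on this node.
    candidates = [t for t in job.unscheduled_tasks if node.id in t.input_locations]
    if candidates:
        # Default Hadoop behaviour: launch the first local task found.
        # Proposed scheme: choose among the candidates using Eq. (4)/(5).
        task = select_local_task(candidates, node)
        return node.launch(task)

    # No local task: launch a non-local task or give the slot to the next job,
    # depending on the scheduling policy in use.
    return policy.handle_no_local_task(job, node, queue)
```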
By default, Hadoop launches the first local task found among the available local tasks of the job. When many local tasks are available for the job, the benefit of choosing one of them over another is not considered. Carefully choosing a task from the set of available local tasks based on various factors may improve the locality of the remaining tasks in the future and may achieve improved load balancing.

By default, Hadoop replicates each input block onto 3 different nodes. Several replication algorithms create and delete files/blocks dynamically according to workload characteristics in order to balance the load among the nodes. Hadoop uses the Hadoop Distributed File System (HDFS) for storing input and output data, which divides files into fixed sized blocks. This results in situations where the input of an application (which may consist of many files) is stored with different replica counts. Since the number of nodes is usually smaller than the number of blocks and each block is replicated in many places (either statically or dynamically), many input blocks of a job are stored on the same node. In addition, many files may be supplied as the input of an application, which further increases the number of local tasks a job has on a node.

To choose an optimal task with the goal of the highest possible locality for the remaining tasks, knowledge of the future arrival of free slots on the preferred nodes is needed. In addition, launching the task whose input is stored on the least loaded disk among the disks attached to the node achieves a higher transfer rate and yields better load balancing at the disk level. The future arrival times of nodes are unknown before their arrival, so the progress rates of the nodes are used to predict them. The following factors are considered while selecting a task: the replica count of the input file/block, the predicted arrival time of the task's next free local node, and the current load of the disk where the task's input is stored. The future arrival time of a node with data locality for a task is estimated and used for task selection, similar in spirit to the optimal page replacement algorithm in operating systems.

Let T = {t_1, t_2, ..., t_n} be the set of unscheduled tasks of job J_i that have their input on node n_i (the set of local tasks of node n_i). Let L_i be the average length of the tasks running on the slots where task t_i's input is available, R_t the replication factor of t_i's input, and S_ik the number of slots on the k-th local node of t_i. The task's expected time until its next free slot is calculated as

ET(t_i) = \frac{L_i}{\sum_{k=1}^{R_t} S_{ik}}    (1)

The load of a disk attached to a node is equal to the sum of the I/O requests from the tasks running on the node and some of the I/O requests from non local tasks running on other nodes that access the disk through the network. To distribute the load across the disks attached to a node, it is desirable to choose the task whose input is stored on the disk holding the inputs of the fewest running tasks. For simplicity, non local tasks are not considered when calculating the disk load. The load of the disk on which a task's input is stored is calculated as

DL(t_i) = 1 - \frac{N_i}{N_s}    (2)

where N_s is the number of slots in the node and N_i is the number of running tasks reading their input from the disk where the input of task t_i is stored. Similarly, the replication weightage based on the replication count is calculated as

RW(t_i) = 1 - \frac{R_t - \min(R_t)}{\max(R_t) - \min(R_t)}    (3)
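As an illustration of Eqs. (1)-(3), the sketch below computes the three per-task scores in Python. The function and parameter names (`avg_task_length`, `local_slot_counts`, `num_tasks_on_disk`, and so on) are hypothetical stand-ins for information a scheduler would gather from heartbeats and the HDFS block map, not part of Hadoop's API; the handling of the degenerate case where all candidates have the same replica count is an added assumption.

```python
def expected_wait(avg_task_length, local_slot_counts):
    """Eq. (1): expected time until a free slot appears on one of the task's
    R_t local nodes; local_slot_counts holds S_ik for k = 1..R_t."""
    total_slots = sum(local_slot_counts)
    return avg_task_length / total_slots if total_slots else float("inf")

def disk_load(num_tasks_on_disk, slots_in_node):
    """Eq. (2): DL = 1 - N_i / N_s; larger when the input's disk is less loaded."""
    return 1.0 - num_tasks_on_disk / slots_in_node

def replication_weight(replicas, min_replicas, max_replicas):
    """Eq. (3): RW = 1 - (R_t - min)/(max - min); larger for fewer replicas."""
    if max_replicas == min_replicas:
        return 1.0  # all candidates equally replicated (assumed convention)
    return 1.0 - (replicas - min_replicas) / (max_replicas - min_replicas)
```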
where max(R_t) and min(R_t) are the maximum and minimum numbers of replicas of the candidate tasks' input blocks, respectively. A larger replica count gives a smaller value of RW(t_i), and a smaller replica count gives a higher value of RW(t_i). The values of DL(t_i) and RW(t_i) range from zero to one, and the value of ET(t_i) is normalized to lie between zero and one. Finally, for task selection, a multi objective function is formalized as the linear combination of the three objective functions above:

TS(T) = \max_{1 \le i \le n} \{\alpha \cdot ET(t_i) + \beta \cdot DL(t_i) + \gamma \cdot RW(t_i)\}    (4)
Substituting Eqs. (1)-(3) into Eq. (4) gives

TS(T) = \max_{1 \le i \le n} \left\{ \alpha \cdot \frac{L_i}{\sum_{k=1}^{R_t} S_{ik}} + \beta \cdot \left(1 - \frac{N_i}{N_s}\right) + \gamma \cdot \left[1 - \frac{R_t - \min(R_t)}{\max(R_t) - \min(R_t)}\right] \right\}    (5)

The symbols α, β and γ are proportion variables weighting the corresponding objective functions, and n is the number of local tasks available on the node for the job that could be scheduled next according to the scheduling policy. The performance overhead of the proposed method is negligible, because the number of tasks of a job that are local to a given node is not large even when the cluster size is large, and calculating the value of TS(T) for a small n takes a negligible amount of time. In addition, the proposed method can be used with any Hadoop scheduling algorithm, and with systems beyond Hadoop that offer several scheduling options. Since the objective function is defined as a linear combination of several objectives, objectives can easily be added or removed based on the requirements. The values of the proportion variables α, β and γ are fixed according to the requirements and their sum should be one; a higher value means more weight for that objective function.
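A minimal sketch of the overall selection rule in Eqs. (4)-(5) follows. The `Candidate` container and its fields are hypothetical, the min-max normalization of ET(t_i) over the candidate set is one plausible reading of the statement that ET is normalized to [0, 1], and the default weights match those used in Section 4 (α = 0.4, β = 0.3, γ = 0.3).

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    task_id: str
    et: float   # Eq. (1), before normalization
    dl: float   # Eq. (2)
    rw: float   # Eq. (3)

def select_task(candidates, alpha=0.4, beta=0.3, gamma=0.3):
    """Return the candidate maximizing Eq. (4): alpha*ET + beta*DL + gamma*RW.
    ET is min-max normalized over the candidate set so all terms lie in [0, 1]."""
    et_min = min(c.et for c in candidates)
    et_max = max(c.et for c in candidates)
    span = et_max - et_min

    def score(c):
        et_norm = (c.et - et_min) / span if span else 1.0
        return alpha * et_norm + beta * c.dl + gamma * c.rw

    return max(candidates, key=score)

# Example: three local tasks of the same job competing for a freed slot.
best = select_task([
    Candidate("t1", et=12.0, dl=0.66, rw=1.0),
    Candidate("t2", et=4.0,  dl=0.33, rw=0.0),
    Candidate("t3", et=8.0,  dl=0.66, rw=0.5),
])
```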
4. Performance Evaluation

The performance of the proposed method was evaluated by extensive experiments. The simulation experiments were run on a 20 node cluster. Each node has a 2.4 GHz quad core processor and 4 GB of RAM; all nodes have three 500 GB SATA hard disk drives of 7200 RPM and are connected by 1 Gbps Ethernet. The WordCount job was used for the experiments, and the input data was generated using the RandomTextWriter program of the Hadoop 1.0.4 distribution. 100 jobs were generated with random input sizes and an inter-arrival time of 15 seconds. The values of α, β and γ were fixed at 0.4, 0.3 and 0.3, respectively, throughout the experiments. All experiments were repeated 5 times and the average values of the results are presented. The Hadoop Fair Scheduler (HFS) [7] was used as the scheduler in the experiments.

The performance of the proposed method is measured by the improvement achieved in data locality. Figure 1 shows the percentage of local tasks launched with HFS, and with HFS plus the proposed task selection method, for various job sizes. Small jobs (1 to 2 tasks) achieve a lower percentage of locality under HFS, as their input is available on only a few nodes. Using the proposed method with HFS does not improve locality significantly for small jobs either, because the method is not applied to single task jobs, and for jobs with 2-4 tasks the probability that more than one task's input blocks are stored on the same node is very low. For large jobs (jobs with more than 50 tasks) the proposed method also achieves only negligible improvements over HFS: large jobs usually have their input data spread over a larger number of nodes, which leads to high locality by default. On the other hand, the proposed method improves locality for medium size jobs (5 to 50 map tasks) significantly; around 20% improvement in launching local tasks is achieved by the proposed method over HFS.

Delay scheduling [9], proposed by Zaharia et al., improves data locality by relaxing the fairness of resource allocation; users can achieve a desired locality level by setting the delay threshold accordingly. The performance of the proposed scheme was also tested using HFS with delay scheduling. The results obtained in terms of locality are not presented here because the performance improvement was negligible. On the other hand, a considerable improvement was achieved in the fairness of resource allocation among jobs (Fig. 2). The proposed method increases the chance of task locality, which reduces the fairness relaxation the delay scheduling technique needs in order to achieve the desired locality. Thus, the proposed method improves performance in terms of fairness in the case where delay scheduling is used for higher locality. Improvement in locality and fairness also yields better performance in job response time and load balancing. In the future, analyzing the performance of the proposed algorithm with various scheduling algorithms, cluster sizes and workloads is planned, and several other factors influencing system performance will be considered for task selection.
Fig. 1. Percentage of local tasks launched for various job sizes (y-axis: percent of local map tasks; x-axis: job size in number of map tasks, grouped as 1-4, 5-20, 21-50, 51-150, 151-300 and >350; series: HFS and HFS+Proposed).

Fig. 2. Fairness improvements of the proposed method over delay scheduling.
5. Conclusion

In this paper, an optimal task selection method for Hadoop environments is proposed. A multi objective function based on several objectives, namely the replica count of the input file/block, the estimated arrival time of a task's next free local node, and the current load of the disk, is presented to select a task from the set of available local tasks of a job. The performance of the proposed method was evaluated by extensive experiments, and around 20% improvement in terms of locality over HFS was observed. Furthermore, it also achieves a significant improvement in fairness, without affecting locality, when used with delay scheduling. In the future, several other factors influencing system performance will be considered for task selection.
References

[1] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107-113, 2008.
[2] Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. ACM SIGOPS Operating Systems Review, 41(3):59-72, 2007.
[3] Jaliya Ekanayake, Hui Li, Bingjing Zhang, Thilina Gunarathne, Seung-Hee Bae, Judy Qiu, and Geoffrey Fox. Twister: a runtime for iterative MapReduce. In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 810-818, 2010.
[4] Yunhong Gu and Robert L. Grossman. Sector and Sphere: the design and implementation of a high-performance data cloud. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 367(1897):2429-2445, 2009.
[5] Apache Hadoop. http://hadoop.apache.org/.
[6] Hadoop's Capacity Scheduler. http://hadoop.apache.org/docs/stable/capacity_scheduler.html
[7] Hadoop's Fair Scheduler. http://hadoop.apache.org/docs/stable/fair_scheduler.html
[8] M. Zaharia, A. Konwinski, A. D. Joseph, R. H. Katz, and I. Stoica. Improving MapReduce performance in heterogeneous environments. In Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2008.
[9] M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling. In Proceedings of the 5th European Conference on Computer Systems (EuroSys), 2010.
[10] K. Kc and K. Anyanwu. Scheduling Hadoop jobs to meet deadlines. In Proc. CloudCom, 388-392, 2010.
[11] B. Thirumala Rao and L.S.S. Reddy. Survey on improved scheduling in Hadoop MapReduce in cloud environments. International Journal of Computer Applications, 34(9):29-33, 2011.