Jun 12, 2017 - Procedia Computer Science 108C (2017) 2240â2249 .... In [4] authors proposed static and dynamic scheduling algorithms called offline and online ... from already-launched topology and performs rescheduling for the best.
Available online at www.sciencedirect.com
ScienceDirect Procedia Computer Science 108C (2017) 2240–2249
International Conference on Computational Science, ICCS 2017, 12-14 June 2017, Performance-aware of streaming Zurich,scheduling Switzerland
applications using genetic algorithm Performance-aware scheduling of Performance-aware scheduling of streaming streaming applications using genetic algorithm applications usingMelnik genetic algorithm Pavel Smirnov, Mikhail and Denis Nasonov ITMO University, Saint-Petersburg, Russia
{smirnp, mihail.melnik.ifmo, denis.nasonov}@gmail.com Pavel Smirnov, Mikhail Melnik Melnik and Denis Denis Nasonov Nasonov Pavel Smirnov, Mikhail and ITMO Russia ITMO University, University, Saint-Petersburg, Saint-Petersburg, Russia {smirnp, mihail.melnik.ifmo, denis.nasonov}@gmail.com {smirnp, mihail.melnik.ifmo, denis.nasonov}@gmail.com
Abstract The main objective of Decision Support Systems is detection of critical states and response on them in time. Such systems can be based on constant monitoring of continuously incoming data. Stream Abstract Abstract processingobjective is carried out on the basis of Systems computing detection infrastructurecritical and specialized frameworks them such as The The main main objective of of Decision Decision Support Support Systems is is detection of of critical states states and and response response on on them in in Apache Storm, Flink, Spark Streaming. However, to provide the necessary system performance at high time. time. Such Such systems systems can can be be based based on on constant constant monitoring monitoring of of continuously continuously incoming incoming data. data. Stream Stream load incoming data, additional mechanisms are required. In particular, the efficient processing is out basis of infrastructure and frameworks such processing is carried carried out on on the the data basis processing of computing computing infrastructure and specialized specialized frameworks such as as scheduling of streaming applications plays an important role in the data stream processing. Therefore, Apache Storm, Flink, Spark Streaming. However, to provide the necessary system performance Apache Storm, Flink, Spark Streaming. However, to provide the necessary system performance at at high high this is devoted to investigation of genetic algorithm to improve the performance dataefficient stream load incoming data, data mechanisms are In loadpaper incoming data, additional additional data processing processing mechanisms are required. required. In particular, particular,ofthe the efficient processing system. The proposed genetic algorithm is developed and integrated into Apache Storm scheduling of streaming applications plays an important role in the data stream processing. Therefore, scheduling of streaming applications plays an important role in the data stream processing. Therefore, platform, and its efficiency is compared with heuristic algorithm for the scheduling of Storm streaming this paper is devoted to investigation of genetic algorithm to improve performance of data stream this paper is devoted to investigation of genetic algorithm to improve the performance of data stream applications. processing processing system. system. The The proposed proposed genetic genetic algorithm algorithm is is developed developed and and integrated integrated into into Apache Apache Storm Storm platform, and its efficiency is compared with heuristic algorithm for scheduling of Storm platform, and its efficiency is compared with heuristic algorithm for scheduling of Storm streaming streaming © 2017 The Authors. Published by Elsevier B.V. Keywords: scheduling, data streaming, apache storm, genetic algorithm applications. applications. Peer-review under responsibility of the scientific committee of the International Conference on Computational Science Keywords: Keywords: scheduling, scheduling, data data streaming, streaming, apache apache storm, storm, genetic genetic algorithm algorithm
1 Introduction
Decision Support Systems (DSS) are becoming more and more essential part of the modern world. 1 Introduction 1 TheyIntroduction find a place in our everyday life, as well as in Early Warning Systems (EWS), which are
designed for monitoring, prediction and reaction onmore upcomingmore hazards and disasters, for example: for Decision Decision Support Support Systems Systems (DSS) (DSS) are are becoming becoming more and and more essential essential part part of of the the modern modern world. world. prevention of floods [1], in clinical practice [2] or for evacuation from damaged ships [3]. Integrated They find a place in our everyday life, as well as in Early Warning Systems (EWS), which are They find a place in our everyday life, as well as in Early Warning Systems (EWS), which are in DSS models use data to fit parameters and provide accuratehazards predictions. The computation of such designed for monitoring, monitoring, prediction and reaction reaction on upcoming upcoming hazards and disasters, disasters, for for designed for prediction and on and for example: example: for models must be completed in time to get the possibility to react on possible critical situations. prevention of floods [1], in clinical practice [2] or for evacuation from damaged ships [3]. Integrated prevention of floods [1], in clinical practice [2] or for evacuation from damaged ships [3]. Integrated Therefore, it is necessary to ensure theprovide required computational environment which consist of in use parameters and accurate predictions. The of in DSS DSS models models use data data to toforfit fitDSS parameters and provide accurate predictions. The computation computation of such such heterogeneous computing resources, required software packages andpossible middleware tosituations. manage models must be completed in time to get the possibility to react on critical models must be completed in time to get the possibility to react on possible critical situations. environment. Deployed management system must not only organize the computational process, but Therefore, Therefore, it it is is necessary necessary for for DSS DSS to to ensure ensure the the required required computational computational environment environment which which consist consist of of also should optimize it. Effective scheduling mechanisms may lead to improved performance or heterogeneous computing resources, required software packages and middleware to manage heterogeneous computing resources, required software packages and middleware to manage reduced energy costs. The choice ofsystem optimization depends the on computational application. The data but for environment. Deployed management must only process, environment. Deployed management system must not notcriteria only organize organize the computational process, but analyzing may come with a certain periodicity which must be processed for specified period of time, also should optimize it. Effective scheduling mechanisms may lead to improved performance also should optimize it. Effective scheduling mechanisms may lead to improved performance or or reduced reduced energy energy costs. costs. The The choice choice of of optimization optimization criteria criteria depends depends on on application. application. The The data data for for analyzing analyzing may may come come with with aa certain certain periodicity periodicity which which must must be be processed processed for for specified specified period period of of time, time, 1877-0509 © 2017 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the scientific committee of the International Conference on Computational Science 10.1016/j.procs.2017.05.249
Pavel Smirnov et al. / Procedia Computer Science 108C (2017) 2240–2249
that is understood as the batch processing. On the other hand, data can flow in a continuous stream of incoming data, that should be processed online. In this cases, we can talk about data streaming and corresponding streaming applications. Today the abilities of Big Data technologies allow to process gigabytes of data, receiving from devices in real-time. The stack of data-streaming technologies like Apache Storm, Flink, Heron and others, emerged during last 5 years, demonstrates actual needs in high-performance and scalable realtime data processing. Traditional Big Data tools like Hadoop, Spark, Hive, Pig and others are designed as batch processing engines and oriented to process a fixed amount of data, accumulated and prepared in advance. Stream processing model serves to continuous data flows instead. It means that large amounts of data are arriving dynamically and are impossible to be evaluated in advance. The difference between the two models leads to a difference in scheduling and resource-allocation approaches. Subject of the paper is devoted to scheduling algorithms for real-time data streaming systems. The contribution of the paper is performance-aware scheduler based on genetic algorithm (GA). The algorithm is implemented as a custom scheduler for Apache Storm and its' efficiency compared with other Storm scheduling algorithms and experimentally proved. The remainder of the paper is organized is organized as follows. In the second section, we consider several works, which related to scheduling of streaming applications. The third section provides the background of Apache Storm framework and the definition of optimization problem. Proposed GA with its operators and models for evaluation is presented in section 4 with another stream scheduling algorithm R-Storm, which is used in our experimental study for comparison. Section 5 consist of experiments, which were conducted to investigate the performance of our proposed approach, and the conclusion is presented in section 6.
2 Related works Scheduling approaches for real-time data processing basically differ from traditional batch (workflow) scheduling ones. During a literature overview we outlined number of points (see table 1), which define difference between the two kind of approaches. Schedule moment Scheduling time (s) Scheduling policy Input data
Batch Before start Non critical (up to 30 seconds) Assign computation to the nodes where the required data are stored Static amount, annotated in advance (before start)
Resources Allocated statically Table 1: Difference between batch and streaming data processing
Streaming (realtime) Periodical (no start/end) Critical (seconds, subseconds) Assign most communicating tasks together on one node or rack Continuous flow, variability of data, impossible to get in advance Allocating dynamically
Default Storm's scheduler implements a primitive round-robin strategy, which iterates tasks and assigs them to node’s slots equally. The strategy is far from optimal for heterogeneous cluster and different tasks’ workload. Several papers propose custom scheduling algorithms to improve the performance of streaming applications. The comparison of algorithms is presented in the table 1. In [4] authors proposed static and dynamic scheduling algorithms called offline and online respectively. The key idea of both algorithms is to use communication patters among executors trying to place the most communicating executors to the same slot. The algorithm is effective for topologies, where computation latency is dominated by tuples transfer time and performance improvements
2241
2242
Pavel Smirnov et al. / Procedia Computer Science 108C (2017) 2240–2249
achieved via decrease of communication overheads. The offline scheduling is based only on topology structure, which is static and known before the topology started. The online algorithm gathers communication statistics from already-launched topology and performs rescheduling for the best performance. For benchmarking authors designed a chain (linear) topology composed of a spout and 6 bolts with shuffle/fields grouping in even-odd manner. As a real-world scenario authors use footballplayers topology (designed for Grand Challenge of DEBS ’13) and gain 20-30% better performance against the default scheduler. Better performance achieved due to lighter traffic between nodes and lower latencies as result. Another traffic-aware scheduling algorithm is presented in [5]. Authors propose scheduler based on internode communication statistics, gathered by workload monitors during evaluation phase. The algorithm sorts the executors in descending order of their total traffic and assigns executors to slots with minimum internode traffic load. Every 20 seconds monitors collect workload data about each executor (CPU usage in MHz), workload of each worker node and number of tuples sent between a pair of executors. Using that statistics, a scheduling generator periodically computes a scheduling solution with control of the following constraints: all executors of single topology assigned to a single slot; executors’ total load doesn’t exceed node’s capacity; executors’ assignment not violates consolidation proportion (amount of executors placed on one node). While T-Storm performs hot swapping of scheduling algorithm and optimization of tasks’ re-assignment overheads (not to stop running tasks until the new ones will be deployed and ready to start), it also mitigates node’s overloading if it occurs. Authors demonstrate the efficacy of the algorithm using “WordCount”, “Throughput test” and “Log stream processing” topologies. As a result, rescheduling gives up to 80% reduced average processing time on decreased number of nodes (2/5/7 against 10). Another paper [6] proposes resource-aware scheduler, which operates with nodes’ and tasks’ CPU and Memory rates. User should specify information about component’s resource requirements during a topology design. Specifying that requirements is possible due to author's extension of standard Storm's component's interfaces, which is already implemented in Storm 2.0 (available on Github). The algorithm finds the closest Eucledian distance in 3-d space between two points: task’s requirements and particular node’s available capacities. The dimensions of the space reflect node's performance capabilities (CPU, Memory, Network distance) with weights according to user's preference. Authors provide experimental results comparing CPU-utilization and throughput (tuples per second) of whole system for several commonly used topology structures scheduled by default and resource-aware algorithms. Authors provide comparable statistics of scheduling of two custom topologies used in production in Yahoo. The proposed resource-aware scheduler gives 30-50% and 70-350% better topology throughput against the default one. Better performance achieved due to higher intensity utilization of nodes with respect to knowledge about its capacities and tasks’ requirements. Paper [7] is also devoted to resource-aware scheduling for Storm. Authors pay attention to scheduling on heterogeneous clusters and try to solve two particular problems: (1) find a way to gather an explicit knowledge of performance characteristics for individual types of nodes employed in the cluster for different types of computations; (2) provide an ability to analyze a computation characteristics of incoming tasks and input data. Proposed in this work Scheduling advisor is a mechanism that allocates tasks on nodes. The first problem is solved via user-defined helper-tasks tagging input data at run-time by their expected computation characteristics. Tagging process is considered to be a part of topology design, i.e. during the design of topologies on particulars infrastructures. For the solving of second problem evaluation of tasks' consumption profiles is proposed, which accumulated on a particular cluster during short benchmarking stage before primary processing. Considered in this section works are heuristic algorithms which are based on several rules to choose the optimal placement of executors. Algorithms may use only predefined characteristics of streaming topology (offline) or statistical data, gathered by monitoring systems (online). However, there are a great number of works related to workflow scheduling in batch mode [8][9], where
Pavel Smirnov et al. / Procedia Computer Science 108C (2017) 2240–2249
evolutionary algorithms performed well, including our previous research works [10]. Therefore, in this paper we propose a genetic algorithm for scheduling of streaming applications and investigate its possibility to improve the performance of streaming applications.
3 Background 3.1 Apache Storm Apache Storm is a popular open-source platform for streams processing [11]. Sequences of required processing operations are defined by topologies - acyclic graphs of components. Topologies design is done by application designer on Java, Python [12]. Atomic piece of data transferred between components over TCP is called tuple. Initial tuples emissions are performed by spouts - components, which define logical starting points of a topology. Tuples consumption, processing and emission downside the topology are performed by bolts - components, logical processing units. Total amount of tasks to be scheduled depends on amount of components and a number of parallel workers (parallelism hint) for each of them. From physical point of view, separated nodes of storm cluster are called supervisors. Due to performance characteristics, every supervisor has several slots, which are available to be loaned by executors – JVM applications with internal threads (workers in Storm’s terminology). The common scheduling problem is to assign tasks to the best-fit slots, minimizing inter-node traffic and maximizing CPU utilization of every physical node.
3.2 Problem statement Let 𝑊𝑊𝑊𝑊𝑊𝑊 is a streaming topology and it can be represented by two sets 𝑊𝑊𝑊𝑊𝑊𝑊 =< 𝐴𝐴, 𝐸𝐸 >. The first set 𝐴𝐴 = {𝑡𝑡𝑖𝑖 } is a set of elementary tasks or executors, while the second set 𝐸𝐸 = {(𝑡𝑡𝑘𝑘 , 𝑡𝑡𝑙𝑙 )} is a set of edges between executors and defines data transfers between them. In this work we assume, that each executor requires specified amount of CPU and RAM for execution. The computing environment 𝐸𝐸𝐸𝐸𝐸𝐸 is also defined by two sets 𝐸𝐸𝐸𝐸𝐸𝐸 =< 𝑅𝑅, 𝐵𝐵 >. The set 𝑅𝑅 = {𝑟𝑟𝑖𝑖 } presents physical computing resources. Each resource 𝑟𝑟𝑖𝑖 is composed of different characteristics such as CPU cores and frequency, RAM, GPU, etc. Considering virtualization technologies, each resource from 𝑅𝑅 is completely virtualized and divided into several virtual machines or Docker containers which can be considered as JVMs for the allocation of executors. Thus, all virtual elements construct a set of computing nodes or slots 𝑁𝑁 = {𝑛𝑛𝑖𝑖 |𝑛𝑛𝑖𝑖 ∈ 𝑟𝑟𝑗𝑗 } , where each node 𝑛𝑛𝑖𝑖 is allocated on one of physical resources from 𝑅𝑅. Tasks or executors must be assigned on these virtual nodes instead of physical resource. Each node inherits a part of characteristics from its parent resource. Total required CPU and RAM by assigned on node executors shouldn’t exceed node’s capacity. Beside the resources and nodes, environment 𝐸𝐸𝐸𝐸𝐸𝐸 includes a set of network routes 𝐵𝐵 between resources from 𝑅𝑅. Each network connects a subset of resources 𝐵𝐵 = {𝑏𝑏𝑖𝑖 |{𝑟𝑟𝑘𝑘 } ∈ 𝑏𝑏𝑖𝑖 }. Any schedule 𝑆𝑆 can be represented as an allocation 𝑆𝑆 = {(𝑡𝑡𝑖𝑖 , 𝑛𝑛𝑘𝑘 )} of tasks or executors from 𝐴𝐴 on computing nodes or slots from 𝑁𝑁. A performance of 𝑡𝑡𝑖𝑖 on a certain node 𝑛𝑛𝑘𝑘 is estimated by using performance models, which can be obtained by analytical or statistical ways. In any case, this models 𝑝𝑝(𝑡𝑡𝑖𝑖 , 𝑛𝑛𝑘𝑘 ) are presented as a function of task and node. Let we have a set of different optimization criteria or objectives 𝐶𝐶. Each 𝑐𝑐𝑖𝑖 (𝑆𝑆) ∈ 𝐶𝐶 is a function of a schedule that we would like to maximize (negative 𝑐𝑐𝑖𝑖 for minimization). The main optimization criteria in our work is a maximization of topology’s throughput. Throughput is considered as the amount of tuples, which were successfully processed on the final executors of a topology. Taking into account all criteria, we can write the problem definition. The main goal is to find such optimal schedule 𝑆𝑆𝑜𝑜𝑜𝑜𝑜𝑜 , that ∀𝑆𝑆 ′ ≠ 𝑆𝑆𝑜𝑜𝑜𝑜𝑜𝑜 : ∑𝑐𝑐𝑖𝑖 ∈𝐶𝐶 𝜔𝜔𝑖𝑖 𝑐𝑐𝑖𝑖 (𝑆𝑆𝑜𝑜𝑜𝑜𝑜𝑜 ) ≥ ∑𝑐𝑐𝑖𝑖 ∈𝐶𝐶 𝜔𝜔𝑖𝑖 𝑐𝑐𝑖𝑖 (𝑆𝑆 ′ ). In this expression 𝜔𝜔𝑖𝑖 are weights which define the importance rate of each criteria and normalize them.
2243
Pavel Smirnov et al. / Procedia Computer Science 108C (2017) 2240–2249
2244
4 Algorithms 4.1 R-Storm For the comparison, we used R-Storm algorithm. The main objective, which authors had set for themselves is to maximize the resource utilization and throughput which is also our main optimization criteria. Dense assignment of tasks on computing nodes entails a reduction of inter-node communication and network latency. Thus, the overall throughput might be grown. In their representation, each computing node is considered as a set of three types of resources: CPU, memory and bandwidth. Simultaneously, each task consists of three resource requirements (CPU usage memory usage, bandwidth usage). Moreover, it is assumed, that CPU and bandwidth are soft constraints, which may be not completely satisfied, whereas the memory is a hard constraint, and its used amount cannot exceed the total amount of available memory on computing node. Moreover, they assumed, that each task or executor 𝑎𝑎𝑖𝑖 in topology has specified requirements of cpu 𝑐𝑐(𝑎𝑎𝑖𝑖 ), bandwidth 𝑏𝑏(𝑎𝑎𝑖𝑖 ) and memory 𝑚𝑚(𝑎𝑎𝑖𝑖 ). Set of computing nodes 𝑁𝑁 corresponds to total available cpu 𝑊𝑊1 , bandwidth 𝑊𝑊2 and memory 𝑊𝑊3 budgets respectively. The main goal is to assign all tasks to a subset of nodes 𝑁𝑁′ ⊆ 𝑁𝑁 that increase the total throughput by maximizing resource utilization and the result schedule should not exceed the budgets 𝑊𝑊1 , 𝑊𝑊2 and 𝑊𝑊3 : 𝑀𝑀𝑀𝑀𝑀𝑀𝑀𝑀𝑀𝑀𝑀𝑀𝑀𝑀𝑀𝑀 ∑𝜃𝜃𝑖𝑖∈𝑁𝑁′ 𝑄𝑄𝜃𝜃𝑖𝑖 , subject to ∑𝑟𝑟𝑖𝑖∈𝑁𝑁′ 𝑐𝑐𝑟𝑟𝑖𝑖 ≤ 𝑊𝑊1 ; ∑𝑟𝑟𝑖𝑖∈𝑁𝑁′ 𝑏𝑏𝑟𝑟𝑖𝑖 ≤ 𝑊𝑊2 ; ∑𝑟𝑟𝑖𝑖∈𝑁𝑁′ 𝑚𝑚𝑟𝑟𝑖𝑖 ≤ 𝑊𝑊3 . The algorithm consists of the following steps: -
tasks are ordered by using breadth first search (BFS) traversal algorithm; then, tasks are taken one by one and assigned to nodes; selection of node is performed by compute distances between task's resource requirements and available resources on nodes in 3D resource space; the node with minimal distance is assigned for current task from queue.
This algorithm has been implemented as a custom version of Apache Storm that available at [13]. Therefore, for the experimental study we deployed this extended version of Storm framework.
4.2 Genetic Algorithm Genetic algorithm (GA) is the one of the most popular evolutionary or metaheuristic algorithms. The main idea of algorithm is in a population of individuals or chromosomes, where each individual represents candidate solution. In case of scheduling, each individual or solution represents a schedule. Evolution process of GA consists of a number of evolution steps – generations. At each generation of algorithm, individuals inside the population are updated by using several mechanisms: mutation (random change inside one individual); crossover (generation of child solution from two parent solutions); selection (the extinction of the least adapted individuals). Fitness function is used to evaluate individuals and determine the best and worst solutions. In our proposed GA, we use chromosomes, which are encoded as a set of tasks, allocated on specified nodes {(𝑎𝑎𝑖𝑖 , 𝑛𝑛𝑗𝑗 )} . The example of a topology, computing environment and its possible chromosome are shown on the figure 1. In this example, streaming topology consist of 3 processing levels or bolts (spout, processing, output) with 3 executors at each level. Computing environment include 2 resources with allocated on them 4 slots in summary.
Pavel Smirnov et al. / Procedia Computer Science 108C (2017) 2240–2249
Figure 1: Encoding of chromosomes in genetic algorithm for stream scheduling problem This coding allows to build schedule for streaming topology. Mutation operator in the developed GA is a random change of one of executor’s node in chromosome or swap of nodes for two randomly taken executors. During the crossover operator, child individual receives nodes for a task from two parents randomly. We used tournament mechanism as a selection operator. However, the main part of GA is a fitness function. The aim of fitness function is to build the schedule from chromosome and evaluate the performance of obtained schedule. For this purpose, we require performance models of executors on specified nodes to estimate their throughput. Developed performance models are built on statistical data, which is gathered by developed monitoring system during the run or from history data, if the same applications were already executed. The example of fitness estimation is shown in figure 2.
Figure 2: Estimation of candidate solutions by using statistical data Presented in figure 2 chromosome corresponds to chromosome from previous example. To estimate its fitness function, firstly, the schedule should be constructed. Then, by using performance models 𝑑𝑑𝑖𝑖 = 𝑝𝑝(𝑡𝑡𝑖𝑖 , 𝑛𝑛𝑗𝑗 ) we determine the approximate amount of processed tuples per second. However, performance of executor 𝑡𝑡𝑖𝑖 depends not only on its assigned node, but also on a parent executor. Therefore, the performance of child executors is determined as minimum value between its possible performance and amount of input tuples, taken from parent executor (shown for 𝑡𝑡1 , 𝑡𝑡4 , 𝑡𝑡7 ). Therefore, it is crucial to balance performance of all executors to avoid downtimes and floods on particular executors. The result fitness function is a total throughput of all output executors (𝑡𝑡7 , 𝑡𝑡8 , 𝑡𝑡9 in figure 2).
2245
2246
Pavel Smirnov et al. / Procedia Computer Science 108C (2017) 2240–2249
5 Experimental study For experimental studies, a benchmark topology was designed. By default, the topology has diamond shape composed of the following components: spout, parallel or sequential processing bolts, join-bolt. The amount of executors for every component defined via command-line arguments during topology submission. If join-bolt’s executors are set to zero, then topology obtains linear structure – process-bolts will be subscribed to each other and provide sequential processing. Amount of computational operations 𝐶𝐶𝐶𝐶 (number of sinus evaluations) and sizes of tuples 𝑆𝑆𝑆𝑆 (in bytes) also defined via command-line arguments. Due to configurable parameters, the topology makes possible to emulate behavior of: lightweight and computational-intensive topologies. In these experiments we used linear topology with 1 spout and 3 process bolts with parameters 𝐶𝐶𝐶𝐶 = 50000 and 𝑆𝑆𝑆𝑆 = 1024. The first experiments were dedicated to estimation of executors’ performance on different types of processors. To achieve performance heterogeneity in experiments we deployed cluster on nodes with the following CPU rates: 3.5 GHz (4-cores Intel Core i7-4770K – desktop PC), 2.0GHz (16-cores Intel Xeon E5-2650 - blade server), 900 MHz (4-cores ARM Cortex-A7 - Raspberry Pi 2B). Experiments with Raspberry shown an acceptable launch time for only two Storm executors in parallel – further Raspberry is considered to be 2-cores node. In order to gain equal CPU capacity for all types of nodes two 2-cores virtual machines have deployed on desktop PC and blade server. This makes experiments purer due to equal maximum number of executors for every type of node. Results of these experiments are presented in figure 3 and show that performance processing bolt (throughput) depends on type of processor.
Figure 3: Bolts’ performance (tuples per minute) on different types of processors (Raspberry – red line; Intel Xeon – yellow line; Intel Core i7 – green line) The experiment demonstrates scheduling efficiency in case of heterogeneous node’s capacities. We used four 1-4 cores nodes deployed as virtual machines (VMs) on 2.0GHz blade server (Intel Xeon E5-2650). To evaluate how tasks performance depends on node’s CPU-capacity we have launched a four-tasks linear topology with on every of our nodes and gained the following performance rates: 30 / 60 / 85 / 99 tuples/sec for 1-4 cores respectively. If the number of topologies launched on node is equal amount of node’s cores, then tasks’ performance on the node will be near 30 tuples/sec. It proves the idea that maximum tasks’ performance can be achieved via minimum CPU sharing, consuming the maximum amount of cores. The strategy is fully opposite to resource-aware ones, where profit is gained from higher nodes utilization. While resource-aware scheduling demonstrates better performance than the round-robin one, it does not control the performance when CPU sharing will occurs. In order to avoid any CPU sharing, a topology designer may declare 100% of CPU for
Pavel Smirnov et al. / Procedia Computer Science 108C (2017) 2240–2249
each component in topology, but this automatically require him to provide enough CPU-capacity cluster. After performance estimation of executors and bolts, we conducted comparison experiment for three algorithms: default Round-Robin (red line), R-Storm (yellow line) and proposed GA (green line). The results of experiment are presented in figure 4. This figure shows the throughput of whole topology per minute (tuples per minute) during 60 minutes for all algorithms. Results show up to 40% better topology performance of GA in compare to R-Storm.
Figure 4: Overall topology performance scheduled by different algorithms The other experiment was conducted on the same topology with 𝐶𝐶𝐶𝐶 = 50000 and 𝑆𝑆𝑆𝑆 = 1024. However, in this experiment we used homogeneous nodes’ capacities (1 core per each slot) of the same 2.0GHz blade server (Intel Xeon E5-2650). The results of experiment show throughput per 60 seconds for each bolt of topology for R-Storm (figure 5) and GA (figure 6) algorithms. Throughput of output bolt (processBolt_3) determines the result throughput of the whole topology. For the R-Storm average throughput of processBolt_3 is 1734 against 2018,3 for GA, that gains 16% improvement in performance for GA.
Figure 5: Bolts’ performance per minute for R-Storm
2247
2248
Pavel Smirnov et al. / Procedia Computer Science 108C (2017) 2240–2249
Figure 6: Bolts’ performance per minute for GA At the next experiment we launched three identical topologies and scheduled them with RoundRobin, R-Storm and GA. The results of experiment are shown in figure 7. In this case only GA was able to handle such high workload, whereas the performance of default and R-Storm algorithms remained equal as in the case with one topology.
Figure 7: Throughput (tuples/min) of 3 topologies scheduled by default (red line), R-Storm (yellow line) and GA (green line) In summary, proposed GA allows to find better solutions in compare to default and R-Storm algorithms. This take place due to accurate estimation of executor’s throughput by monitoring system and its independence to such rules as maximization of resource utilization, which is used in R-Storm. However, according to results of experiments on homogeneous and heterogeneous nodes' capacities, we can see, that computing environment is more homogeneous, GA’s improvement in performance is lower. It can be explained by a decrease of the solution space, since evolving solutions become similar to each other.
Pavel Smirnov et al. / Procedia Computer Science 108C (2017) 2240–2249
6 Conclusion In this work, we proposed the genetic algorithm to solve the streaming scheduling problem. Evolutionary algorithms are often used in workflow scheduling field for batch mode, but for stream processing their investigation is still required. Proposed GA is based on performance models, which are built on statistical data, gathered by monitoring system. Developed GA was compared with Storm default algorithm and with one of the most popular stream scheduling algorithms R-Storm. The results of experiments show the improvement of performance in terms of throughput in comparison to other algorithms, thus showing its suitability for scheduling of stream applications. Developed GA and its results are important for urgent computing field and decision support systems, where performance of computing models and applications must be maximized to respond in time on possible critical situations and minimize the risks of upcoming hazards or disasters.
Acknowledgements This paper is financially supported by Ministry of Education and Science of the Russian Federation, Agreement #14.587.21.0024 (18.11.2014). Unique Identification RFMEFI58715X0024.
References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
S. V. Ivanov, S. V. Kovalchuk, A. V. Boukhanovsky, Workflow-based collaborative decision support for flood management systems, Procedia Comput. Sci. 18 (2013) 2213–2222. I.I. Syomov, E. V Bolgova, S. V Kovalchuk, A. V Krikunov, O.M. Moiseeva, M.A. Simakova, Towards Infrastructure for Knowledge-based Decision Support in Clinical Practice, Procedia Comput. Sci. 100 (2016) 907–914. M. Balakhontceva, V. Karbovskii, S. Sutulo, A. Boukhanovsky, Multi-agent simulation of passenger evacuation from a damaged ship under storm conditions, Procedia Comput. Sci. 80 (2016) 2455–2464. L. Aniello, R. Baldoni, L. Querzoni, Adaptive Online Scheduling in Storm, Proc. 7th ACM Int. Conf. Distrib. Event-Based Syst. (2013) 207–218. J. Xu, Z. Chen, J. Tang, S. Su, T-storm: Traffic-aware online scheduling in storm, Proc. - Int. Conf. Distrib. Comput. Syst. (2014) 535–544. B. Peng, M. Hosseini, Z. Hong, R. Farivar, R. Campbell, R-Storm: Resource-Aware Scheduling in Storm, Proc. 16th Annu. Middlew. Conf. - Middlew. ’15. (2015) 149–161. M. Rychly, P. Koda, P. Mr, Scheduling decisions in stream processing on heterogeneous clusters, Proc. - 2014 8th Int. Conf. Complex, Intell. Softw. Intensive Syst. CISIS 2014. (2014) 614–619. L. Liu, M. Zhang, R. Buyya, Q. Fan, Deadline-constrained Co-evolutionary Genetic Algorithm for Scientific Workflow Scheduling in Cloud Computing. L.K. Arya, A. Verma, Workflow scheduling algorithms in cloud environment - A survey, Eng. Comput. Sci. (RAECS), 2014 Recent Adv. 2 (2014) 1–4. A.A. Visheratin, M. Melnik, D. Nasonov, Workflow scheduling algorithms for hard-deadline constrained cloud environments, Procedia Comput. Sci. 80 (2016) 2098–2106. Apache Storm. http://storm.apache.org/. Streamparse. https://github.com/Parsely/streamparse. R-Storm scheduler. https://github.com/jerrypeng/storm-resource-aware-scheduler.
2249