2013 IEEE Ninth World Congress on Services

A Framework for Multi-Cloud Cooperation with Hardware Reconfiguration Support

Khaleel Mershad, Abdul Rahman Kaitoua, Hassan Artail, Mazen A. R. Saghir*, and Hazem Hajj
Electrical and Computer Engineering Department, American University of Beirut, Beirut, Lebanon
* Electrical and Computer Engineering Program, Texas A&M University at Qatar, Doha, Qatar
e-mails: {kwm03, aak69, ha27, hh63}@aub.edu.lb, * [email protected]

Abstract—Cloud computing is increasingly becoming a desirable and foundational element in international enterprise computing. Many companies design, develop, and offer cloud technologies. However, cloud providers still operate like lone islands. While current cloud computing models provide significant benefits by maximizing the use of resources within a cloud, current solutions still face many challenges, including the lack of cross-leverage of available resources across clouds, the need to move data between clouds in some cases, and the lack of globally efficient cooperation between clouds. In [1], we addressed some of these challenges by providing an approach that enables various cloud providers to cooperate in order to execute common requests together. In this paper, we illustrate several enhancements to our work in [1] which focus on integrating hardware acceleration with cloud services. We extend the Hadoop framework by adding provisions for hardware acceleration with Field Programmable Gate Arrays (FPGAs) within the cloud, for multi-cloud interaction, and for global cloud management. Hardware acceleration is used to offload computations when needed, or it is offered as a service within the clouds. It can provide additional sources of revenue, reduced operating costs, and increased resource utilization. We also derive a mathematical model for evaluating the performance of the most important entity in our system under various conditions.

Keywords—cloud computing; hardware acceleration; multi-cloud network; cloud collaboration; Hadoop; FPGA

I. INTRODUCTION

Over the last decade, the digital information world has been experiencing exponential growth of data. This explosion of data has presented itself as a management challenge, but also as an opportunity for new solutions. Emerging high-tech companies, like Google, Amazon, and Facebook, have built large businesses by providing applications to collect, search, and analyze Web data. The datacenters of these companies became sources of additional business revenue by making use of idle computational resources. As a result, datacenters became “clouds”, where customers can submit requests for resource usage and computational services without knowing where the processing is performed, and without having to own or maintain resources. These clouds can combine thousands of servers with added infrastructure for networking, storage, and cooling. The convergence of demand and opportunity has produced the concept of cloud computing, which promises to benefit all parties involved in massive data operations. An important characteristic of cloud platforms is that they offer globally compliant data-intensive functions that help users search, mine, extract, and execute specialized functionalities within huge datasets.

An important research trend in cloud computing focuses on cloud programming models. The MapReduce [2] model, along with the Hadoop framework [3], has emerged as the leading choice for distributed cloud computing. The model provides abstractions that simplify writing applications that access distributed data. Hadoop allocates data and processing across clusters of servers and processes the data in parallel, locally at each node. In a standard Hadoop configuration, the master node contains the control parts, such as the Jobtracker and the NameNode. The NameNode manages reads and writes over the slave nodes and keeps a lookup table in memory for the locations of files in the Hadoop Distributed File System (HDFS). The Jobtracker initiates the MapReduce job and tracks the processing of the job’s tasks. When executing a job, input files are divided into data blocks that are stored on datanodes, which periodically report to the NameNode with lists of the blocks they are storing.
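To make the programming model concrete, the sketch below shows a word-count mapper in the style of Hadoop Streaming, where any executable can act as a mapper by reading records from standard input and emitting tab-separated key/value pairs on standard output. It is a minimal illustration only and is not part of our framework.

/* Minimal word-count mapper in the Hadoop Streaming style:
 * reads lines from stdin and emits "<word>\t1" pairs on stdout.
 * Hadoop groups the pairs by key before the reduce phase. */
#include <stdio.h>
#include <ctype.h>

int main(void) {
    int c;
    char word[256];
    size_t len = 0;

    while ((c = getchar()) != EOF) {
        if (isalnum(c)) {
            if (len < sizeof(word) - 1)
                word[len++] = (char)tolower(c);
        } else if (len > 0) {
            word[len] = '\0';
            printf("%s\t1\n", word);   /* one (key, value) pair per word */
            len = 0;
        }
    }
    if (len > 0) {                      /* flush the last word, if any */
        word[len] = '\0';
        printf("%s\t1\n", word);
    }
    return 0;
}

A corresponding reducer would read the grouped pairs and sum the counts per word; in native Hadoop the same logic is written as Java Mapper and Reducer classes.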



While previous cloud solutions provide major benefits that maximize the use of resources within a cloud, they still face many challenges, including the lack of cross-leverage of available resources across clouds, the need to move data from one cloud to another when the computations and the data source belong to different providers, and the lack of a globally efficient cloud cooperation system. In [1], we provided a detailed description of a framework that enables multiple cloud computing systems to cooperate efficiently to answer various customer queries. In this paper, we describe an approach that efficiently integrates FPGAs with hardware acceleration capabilities into the datacenters of the cloud computing systems in [1]. Our approach in [1] focused on designing a multi-cloud system that can be divided into two layers: an intra-cloud layer (the set of cloud providers) and an inter-cloud layer (which handles communication and management between cloud providers). Our approach benefits both the clients, who are able to run their jobs on more universal datasets and get more accurate and meaningful results, and the cloud providers, since they become able to share resources, cooperate in handling extra loads, and share the corresponding revenues. The enhancements described in this paper further improve the performance of multiple cooperating clouds: our framework allows cloud datacenters to process many operations faster while consuming less energy. For the implementation, we adopt the widely used Hadoop engine and propose important modifications to it.

These modifications are focused on integrating FPGAs that can run hardware acceleration files with Hadoop. Several recent studies have explored the use of FPGAs in MapReduce [4, 5], where FPGAs are used to implement integrated Map/Reduce engines that include multiple, application-specific Map and Reduce processors, and a hardware scheduler for assigning Map/Reduce tasks to idle processors. A modified design in [6] proposed the use of both CPU workers and FPGA workers that can run in parallel to perform Map and/or Reduce operations. In our design, we describe how it is possible to exploit the data flow characteristics of the MapReduce model and employ high-level synthesis tools to automatically generate hardware accelerators that can be implemented and loaded in FPGAs. As a result, we propose a new service model that we call Hardware-Acceleration-as-a-Service (HAaaS). When a hardware accelerator for a given application is available, users can choose to pay a premium for faster data processing. The datacenter operator can also use the accelerators to reduce energy consumption and save costs, thus increasing job throughput and overall utilization.

The rest of this paper is organized as follows: Section II briefly describes the base design we proposed in [1]. Section III discusses the details of implementing hardware acceleration and offering HAaaS. Section IV provides a scalability analysis of the Inter-cloud Master server. Finally, Section V concludes the paper with some ideas for future work.

II. COOPERATION OF CLOUD PROVIDERS

We begin by presenting a general view of the components of the multi-cloud cooperation system that we presented in [1], which is depicted in Fig. 1. In this section we highlight the overall design and give a brief overview of the system’s operations. Our system is divided into two major elements: an Inter-cloud network connected to several Intra-cloud slaves. The Inter-cloud network consists of four main components:

• The Inter-cloud Master: responsible for receiving requests from clients, preparing and distributing jobs among Intra-cloud slaves, monitoring the execution of jobs at the Intra-cloud Jobtrackers, monitoring the communications and data sharing between Intra-clouds, sending reports and results to clients, analyzing the runtime statistics received from Intra-clouds, invoking the generation of new hardware acceleration bitstreams, and managing the utilization of the acceleration files based on the requirements of the client applications and the availability of accelerators.

• The Inter-cloud Namenode: maintains metadata from the Namenodes of all Intra-clouds, and the locations of the data blocks for the data files stored in the Intra-clouds.

• The Hardware Design Center: responsible for generating hardware acceleration files for new Map and Reduce functions and testing them on dedicated hardware before sending them to the Bitstream Library.

• The Bitstream Library: responsible for saving bitstream files, their metadata, documentation, etc., and for providing and updating this information when needed.

Figure 1. Components of the reconfigurable multi-cloud framework.

Within each Intra-cloud system (slave), two networks operate simultaneously: a software-based processing network that is very similar to traditional Hadoop (a namenode and several datanodes), and a hardware network that comprises one or more FPGA stacks. The two networks are connected via an FPGA Driver, which acts as a Tasktracker for all FPGAs. The FPGA Driver programs the FPGAs that are selected to execute a certain bitstream file. The FPGA Driver also sends the bitstreams to the designated FPGAs and schedules and monitors the data transfer to and from the FPGAs by exploiting the FPGAs’ connection to banks of Solid State Drives via a fiber link. The FPGA Driver is managed by the Intra-cloud Jobtracker (which plays the role of an Intra-cloud master node, as compared to traditional Hadoop). The Intra-cloud Jobtracker is one of the main entities of the software network, which are:

• The Intra-cloud Master: responsible for communicating with the Intra-cloud Masters of other clouds to share data among different clouds. The Intra-cloud Master also keeps a reserve copy of all operations that are executing in its cloud. It periodically updates its data and informs the Inter-cloud Master. A major duty of Intra-cloud Masters is to help reconstruct the Inter-cloud Master records when a failure occurs.

• The Intra-cloud Jobtracker: in addition to taking the role of a master node as compared to traditional Hadoop, this component is assigned two main additional tasks: first, it schedules and monitors the execution of bitstreams on the FPGAs; second, it periodically reports the status of the Intra-cloud slave to both the Intra-cloud and Inter-cloud Masters (a sketch of such a report is given at the end of this section).

• The Intra-cloud Namenode: similar to its counterpart in a traditional Hadoop framework, it maintains the file system tree of the Intra-cloud slave and the metadata for all the files and directories in the tree. It also keeps a record of the workstations on which all the file blocks are located.

• The Bitstream node: an Intra-cloud library of the bitstreams that have been previously used in this cloud.

• The Workstation: similar to a traditional datanode in Hadoop, the workstation contains a Tasktracker and a datanode that cooperate to execute a certain MapReduce function on specific data.

For the complete details on the operations of the system, including the handling of received jobs, the communication between the Inter and Intra layers, the partitioning and execution of jobs, etc., please refer to [1]. In this paper, we build on our work in [1] by proposing and discussing a framework that enables Intra-clouds to enhance job execution efficiency by using hardware acceleration.
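For illustration, the periodic status report mentioned above could carry fields such as the following. The exact message format is not specified in [1], so the names and field choices here are assumptions.

/* Hypothetical layout of the periodic status report an Intra-cloud
 * Jobtracker sends to the Intra-cloud and Inter-cloud Masters.
 * Field names and sizes are illustrative assumptions, not a
 * specification from the framework. */
#include <stdint.h>

struct hamr_status {
    uint32_t fpga_id;          /* identifier of the FPGA (HAMR) node     */
    uint32_t bitstream_id;     /* currently loaded accelerator, 0 = none */
    uint8_t  busy;             /* 1 if a Map/Reduce task is running      */
    float    utilization;      /* fraction of reconfigurable area in use */
};

struct intracloud_status_report {
    uint64_t intracloud_id;        /* which Intra-cloud slave reports     */
    uint64_t timestamp_ms;         /* report generation time              */
    uint32_t running_jobs;         /* software MapReduce jobs in flight   */
    uint32_t pending_tasks;        /* queued Map/Reduce tasks             */
    uint32_t num_fpgas;            /* entries used in the fpga[] array    */
    struct hamr_status fpga[32];   /* per-FPGA state (bounded for sketch) */
};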

III. HARDWARE ACCELERATION

In this paper, we propose that cloud providers use FPGAs to execute certain Map and Reduce functions as hardware acceleration modules. In previous works, FPGAs were used with MapReduce on a single node and with no reconfiguration capability [4], where several versions of mappers and reducers would be programmed onto the FPGA. However, the changes in [4] were focused on MapReduce and did not extend to the full Hadoop framework. Also, the implementations were restricted to individual nodes, and the FPGA did not have direct access to storage. In another work [7], a framework called Mars was developed for Graphics Processing Units (GPUs). The framework was evaluated using web applications, and the results showed up to 16 times faster performance compared to a quad-core machine. However, when using GPUs there is a limitation on the number of threads, which is related to: 1) the hardware configuration, such as the number of multiprocessors, and 2) the computation characteristics of the Map and Reduce tasks, e.g., whether they are memory- or computation-intensive. In our system, we focus on efficiently integrating FPGAs as part of an extended multi-cloud Hadoop framework. We propose extensions to Hadoop that support the following: 1) integration of one or more stacks of multi-FPGA systems, 2) communication with FPGA nodes within the cloud and in other clouds, and 3) implementation and integration of accelerated mappers and reducers.

A. FPGA Driver Operations

To integrate FPGAs within Hadoop-based clouds, we rely on the FPGA Driver, which plays the role of a mediator between the software-based components (Jobtracker and workstations) and the stacks of FPGAs. The FPGA Driver plays several roles, which we explain in detail in this section. In order to remain compatible with the Hadoop framework, the FPGA Driver must satisfy several requirements. As a bridge between the software Hadoop-based network and the FPGA-based hardware acceleration network, the FPGA Driver handles the following responsibilities:

• Managing all communications between FPGA nodes within the same Intra-cloud.

• Moving data to and from the FPGAs, which involves two main operations: 1) preprocessing the data that needs to be sent to the FPGA accelerators, i.e., transforming the data into <key, value> pairs; and 2) moving the results from the FPGA to storage when processing is complete.

• Managing the processing on each FPGA, which involves tasks such as: loading an FPGA with the hardware acceleration bitstream, reporting the status of each FPGA to the Intra-cloud Jobtracker, reporting the status of each running bitstream, and managing the number of MapReduce functions that should run on each FPGA based on the FPGA’s available resources.

For the FPGA Driver to handle these tasks, it should contain the following components:

• Proxies for communication with the FPGAs: the proxies manage the connections to the different FPGAs, and the FTP connections for sending bitstreams to the FPGAs.

• Data organizer: this component transforms raw data into a form that the FPGAs can process (e.g., <key, value> pairs).

• FPGA Execution manager: controls the various operations that are managed by the FPGA Driver, such as: execution of hardware accelerators, loading bitstreams onto FPGAs, loading data onto FPGAs, fetching results from FPGAs, monitoring each FPGA’s workload, and sending the FPGAs’ status to the Intra-cloud Jobtracker.

B. FPGA Node

Each FPGA in our system acts as a hardware-accelerated mapper/reducer (HAMR) node that uses a high-speed Fibre Channel interface to stream data from a storage area network (SAN). Our design of the HAMR node, which is based on the capabilities of the Xilinx Zynq-7000 Extensible Processing Platform (EPP), is depicted in Fig. 2. The main components of the HAMR node are:

1. Two ARM Cortex-A9 processors for running the HAMR operating system and hardware accelerators, and for managing the communication between the HAMR node and the FPGA Driver, and between the HAMR node and the SAN.

2. DDR3 SDRAM to buffer the input and output data streams.

3. EMAC and PCIe peripheral controllers for connecting to the FPGA Driver.

4. DMA for high-speed data transfer between networked storage devices and the HAMR DDR3 memory.


5. HWICAP controller for loading hardware acceleration bitstreams into a partial reconfigurable region (PRR) that is used to implement the hardware accelerator logic.

The HAMR node also uses the high-bandwidth AMBA Advanced eXtensible Interface (AXI) to interconnect the various processors and peripheral controllers. Finally, the HAMR node runs an embedded version of the Linux operating system to support the necessary network communication protocols, configure the partial reconfiguration regions (PRRs), and load the necessary hardware acceleration drivers.

Figure 2. Hardware-accelerated mapper/reducer (HAMR) node architecture: ARM Cortex-A9 processors, a partial reconfigurable region, EMAC, PCIe, HWICAP, DMA, and DDR3 memory interconnected over AXI, with a Fibre Channel link to storage.
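As a concrete illustration of how the embedded Linux on a HAMR node might load an accelerator into the PRR, the sketch below writes a partial bitstream through the Zynq device-configuration interface. The device paths and the surrounding control flow are assumptions for illustration; the actual driver stack on a deployed HAMR node may differ.

/* Sketch: load a partial bitstream into the HAMR node's partial
 * reconfigurable region through the Zynq devcfg interface.
 * Paths and error handling are illustrative assumptions. */
#include <stdio.h>

static int write_sysfs(const char *path, const char *value) {
    FILE *f = fopen(path, "w");
    if (!f) return -1;
    int ok = (fputs(value, f) >= 0);
    fclose(f);
    return ok ? 0 : -1;
}

int load_partial_bitstream(const char *bitstream_path) {
    /* Tell the configuration engine that a *partial* bitstream follows
     * (sysfs attribute commonly exposed by the Zynq devcfg driver;
     * the exact path is assumed here). */
    if (write_sysfs("/sys/devices/soc0/amba/f8007000.devcfg/is_partial_bitstream",
                    "1") != 0)
        return -1;

    /* Stream the bitstream file into the configuration port. */
    FILE *in  = fopen(bitstream_path, "rb");
    FILE *out = fopen("/dev/xdevcfg", "wb");
    if (!in || !out) { if (in) fclose(in); if (out) fclose(out); return -1; }

    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) { fclose(in); fclose(out); return -1; }
    }
    fclose(in);
    fclose(out);
    return 0;   /* accelerator logic is now active in the PRR */
}

On the Intra-cloud side, the FPGA Driver would trigger such a routine after transferring the bitstream to the node (e.g., over the EMAC or PCIe link).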

C. Hardware Acceleration as a Service (HAaaS)

Our framework adds to Hadoop the capability of supporting hardware acceleration (HA) via FPGAs. By using HA, we are able to execute jobs that may be very expensive (in terms of delay and energy consumption) if executed as software functions. To illustrate the importance of HA, we consider a MapReduce matrix multiplication application that executes on data files with a total size of 100 terabytes. If each Map instance takes a 64 MB data block, and considering 10,000 parallel Map instances, then according to [8], the software execution of the matrix multiplication on one block takes 12.8 s, whereas the same execution takes 0.332 s with a hardware accelerator. Hence, for our MapReduce application, using HA saves [(100×10^12)/(64×10^6)]/10^4 × 12.468 s ≈ 32.47 minutes. Besides reducing the execution time, FPGAs consume less energy than other devices in many cases [9].
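For clarity, the saving quoted above decomposes as follows: the 100 TB input yields about 1.56 million 64 MB Map tasks, which the 10,000 parallel Map instances process in roughly 156 sequential waves, each wave saving 12.8 − 0.332 = 12.468 s:

\[
\frac{100\times 10^{12}}{64\times 10^{6}} \approx 1.5625\times 10^{6}\ \text{Map tasks},
\qquad
\frac{1.5625\times 10^{6}}{10^{4}} \approx 156.25\ \text{sequential waves},
\]
\[
156.25 \times (12.8 - 0.332)\,\text{s} \approx 1948\ \text{s} \approx 32.47\ \text{min}.
\]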

In this section, we explain how hardware accelerators (or bitstreams) are designed, tested, integrated, and offered to clients.

1) Creating and Testing New Hardware Accelerators

When the Inter-cloud Master receives a new job from a client, it invokes a utility process, searchHA(), that searches the Bitstream Library for hardware accelerators matching the job’s Map and Reduce functions. If searchHA() does not find any suitable bitstream files, it notifies the Inter-cloud Master, which continues the job execution as described in [1]. While the job is executed on the Intra-clouds, searchHA() monitors the statistics that are sent to the Inter-cloud Master about the executed Map and Reduce functions. The choice of implementing a Map (or Reduce) function in hardware depends on two factors. The first is the execution frequency of the function (only high-frequency functions are considered candidates for hardware acceleration), while the second is whether the function contains operations whose execution times would be reduced if implemented as accelerators. If the two conditions are met by a Map or Reduce function, searchHA() sends the corresponding function to the Hardware Design Center, which, upon receiving a software-coded Map or Reduce function, invokes a synthesizerHA() utility process. The latter is a hardware design process that attempts to generate the best hardware implementation for each software-coded operation. First, it translates the software operations to a hardware description language (e.g., VHDL); then it converts the result into a hardware circuit (bitstream) through a process of logic synthesis. For example, Fig. 3 illustrates the main blocks of a sample hardware accelerator that calculates the minimum distance between a certain point P(xp, yp) and a fixed set of points [(x1, y1), (x2, y2), …, (xk, yk)]. The hardware blocks are shown in Fig. 3 as green boxes.

Figure 3. Hardware accelerator for finding the minimum distance between a point in a dataset and k centers.

Upon generating a suitable hardware accelerator for a particular Map or Reduce function, synthesizerHA() uses one or more reserved FPGA stacks to test the new bitstream. synthesizerHA() compares the hardware testing results with the software results obtained from the executed job to make sure that the bitstream is faster than its corresponding software function. If the bitstream is found to be slower than its software function, it is either rejected or goes through a second round of modifications and enhancements.

2) Adding Bitstream Files to the Bitstream Library

When a new hardware accelerator needs to be added to the Bitstream Library, synthesizerHA() invokes an addHA() process that generates the hardware accelerator’s metadata, documentation, and price. These are added, alongside the bitstream, to the Bitstream Library. The metadata is generated according to the tasks that exist in the hardware accelerator file, while the documentation is a detailed description of the file and its execution, and is continuously updated as new performance results of future jobs are reported. The pricing strategy adopted for a hardware accelerator should depend on the resources and features that it uses (for example, the hardware cost needed to execute the bitstream), its popularity (importance), its execution time, and the savings it achieves relative to its corresponding software implementation, mainly in terms of execution time and/or energy consumption.

3) Offering Hardware Accelerators to Clients

Going back to the original searchHA() process in our example, if it finds that one (or both) of the Map and Reduce functions submitted by the user has one or more matching hardware accelerators in the library, it notifies the Inter-cloud Master with the information about each hardware accelerator, its metadata, its price, and the benefits it offers to the client in terms of execution time. The Jobtracker sends this data to the client, who can make use of an interactive tool that helps in choosing suitable hardware accelerators and in calculating the number of Map and Reduce instances that should be ordered as hardware accelerators according to the client’s budget. If a client requests the use of hardware acceleration, the Inter-cloud Master specifies for each cloud slave the amount of resources that it should use (for example, the type and number of hardware accelerators) according to the client’s request and to the current resources that exist in the cloud slave (for example, the number of FPGA stacks). The Inter-cloud Master sends this information, encapsulated in the sub-jobs, to the Intra-cloud Masters. Each cloud slave that needs to execute a bitstream that does not exist in its cloud can request it from the Inter-cloud Master, which in turn fetches it from the Bitstream Library and sends it to the cloud slave’s Jobtracker. In addition to using hardware accelerators to offer HAaaS, a datacenter administrator can take advantage of the FPGA stacks and the possibility of executing bitstreams to enhance the overall performance of the datacenter. For example, when the datacenter becomes overloaded, it might use hardware accelerators to quickly execute some jobs and release computing resources that can be leased to other users.

D. Hardware Design Center

Fig. 4 is a flow diagram that shows the different components of the Hardware Design Center (HDC) and how they interact with each other. In our system, the HDC and the Inter-cloud Master communicate through a shared API. When the Inter-cloud Master submits a request for generating hardware accelerators for candidate Map or Reduce functions, it provides the source code of the candidate functions. Once the corresponding hardware accelerators are generated, the HDC updates the Bitstream Library and notifies the Inter-cloud Master. At the HDC, the candidate Map/Reduce source code is sent to a pre-processing component, which identifies the performance- or energy-critical portions of the source code and converts them to a format that is compatible with high-level synthesis tools. Initially, this can be done by instrumenting the source code with compiler directives or pragmas. However, automatic programming language processing tools might also be used.

The pre-processed Map/Reduce codes are then sent to a high-level synthesis (HLS) driver, which controls the high-level synthesis tool through a Tcl-based command line interface. The driver is responsible for driving the architectural exploration phase for each Map/Reduce function, and for generating a family of corresponding hardware accelerators with different performance and energy consumption characteristics. These characteristics can be used by the Inter-cloud Master to decide on the most suitable hardware accelerators to use to optimize performance and energy consumption under different workloads. In our system, we implement high-level synthesis using the Xilinx Vivado tool [10]. Vivado can convert functions written in the C programming language into RTL-based hardware accelerator cores. Through its Tcl-based command line interface, Vivado can perform a number of source code transformations (e.g., loop unrolling) and RTL optimizations (e.g., pipelining) to improve the performance of the generated hardware accelerator cores. Vivado can be constrained to take into account the limited number of logic resources available in any given device to implement hardware accelerators. Vivado can also simulate C and RTL code to provide estimates of performance improvement. This information is exchanged with the high-level synthesis driver to provide the necessary feedback for architectural exploration. Although Vivado can only be used with Xilinx FPGAs and design tools, this does not detract from the generality of our approach to hardware accelerator design automation.
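As an example of what the pre-processing component might hand to the HLS tool, the sketch below shows a C candidate for the minimum-distance Map operation of Fig. 3, annotated with a directive of the kind Vivado accepts. The function signature, the fixed number of centers, and the pragma choice are illustrative assumptions rather than the exact code used in our flow.

/* Candidate Map kernel for HLS: distance of one input point to k fixed
 * centers, returning the index of the nearest center (cf. Fig. 3).
 * K, the interface, and the pragma choice are illustrative assumptions. */
#define K 8   /* number of centers, fixed at synthesis time */

int min_distance_map(float xp, float yp,
                     const float cx[K], const float cy[K])
{
    float best = 3.4e38f;   /* "infinity" for single precision */
    int best_idx = 0;

    for (int i = 0; i < K; i++) {
#pragma HLS PIPELINE II=1      /* aim for one comparison per clock cycle */
        float dx = xp - cx[i];
        float dy = yp - cy[i];
        float d2 = dx * dx + dy * dy;   /* squared distance: no sqrt needed
                                           for an argmin comparison */
        if (d2 < best) {
            best = d2;
            best_idx = i;
        }
    }
    return best_idx;   /* emitted as the value of the (point, center) pair */
}

The HLS driver would then explore variants of this loop (unrolled, pipelined, with different array partitionings) to produce the family of accelerators mentioned above.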

Figure 4. Different components of the Hardware Design Center.

Once the hardware accelerator RTL code has been generated, it is post-processed and packaged in a format that is compatible with the Xilinx implementation tools. The post-processing component also generates C-language accelerator driver programs that invoke the hardware accelerators once they have been loaded into the FPGA of a HAMR node.
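The generated driver programs are not shown in the paper; the sketch below indicates the general shape such a program could take on the HAMR node, assuming the accelerator is exposed as a memory-mapped AXI peripheral. The base address, register offsets, and control-bit layout are assumptions for illustration.

/* Sketch of an accelerator driver on the HAMR node: map the accelerator's
 * AXI control registers and run one invocation. Addresses, offsets, and
 * bit meanings are illustrative assumptions, not a fixed specification. */
#include <stdint.h>
#include <stddef.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

#define ACC_BASE_ADDR  0x43C00000u   /* assumed AXI base of the accelerator */
#define REG_CTRL       0x00          /* bit0 = start, bit1 = done (assumed) */
#define REG_SRC_ADDR   0x10          /* DDR3 address of input buffer        */
#define REG_DST_ADDR   0x18          /* DDR3 address of output buffer       */
#define REG_LENGTH     0x20          /* number of input records             */

int run_accelerator(uint32_t src, uint32_t dst, uint32_t len)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) return -1;

    volatile uint32_t *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, ACC_BASE_ADDR);
    if (regs == MAP_FAILED) { close(fd); return -1; }

    regs[REG_SRC_ADDR / 4] = src;    /* where the DMA placed the input   */
    regs[REG_DST_ADDR / 4] = dst;    /* where results should be written  */
    regs[REG_LENGTH   / 4] = len;
    regs[REG_CTRL     / 4] = 0x1;    /* start the accelerator            */

    while ((regs[REG_CTRL / 4] & 0x2) == 0)   /* poll the done bit */
        ;                                     /* (busy-wait for brevity)  */

    munmap((void *)regs, 4096);
    close(fd);
    return 0;
}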

The FPGA implementation driver uses an architectural description of the HAMR node, the hardware accelerator RTL code, and the generated accelerator driver programs to generate the FPGA configuration bitstreams using the Xilinx partial dynamic reconfiguration design flow. This enables the HDC to store only the configuration bitstreams of the hardware accelerators and their driver programs, and to use partial dynamic reconfiguration techniques to load the required hardware accelerators onto HAMR nodes deployed in datacenters. Although our HAMR node architecture is based on the Xilinx Zynq-7000 Extensible Processing Platform (EPP), the FPGA implementation driver can be used to target other Xilinx FPGA devices. Once the hardware accelerator bitstreams have been generated, they are sent to a testing and performance characterization component. This component loads the bitstreams onto an instrumented HAMR node prototype, and runs a set of tests to measure the performance and energy consumption characteristics of the hardware accelerators. These results, along with the hardware accelerator bitstreams and the accelerator driver programs, are then stored in the Bitstream Library for later use by the Inter-cloud Master.
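The paper does not fix the format of a Bitstream Library entry; as an illustration, the characterization results and pricing information discussed above could be stored in a record along the following lines (all field names are assumptions).

/* Hypothetical Bitstream Library record combining the metadata generated
 * by addHA() with the measurements produced by the testing and
 * characterization component. Field names and units are assumptions. */
#include <stdint.h>

struct bitstream_entry {
    uint64_t id;                    /* unique accelerator identifier        */
    char     function_name[64];     /* Map/Reduce function it accelerates   */
    char     target_device[32];     /* e.g., "Zynq-7000 PRR"                */
    uint32_t bitstream_bytes;       /* size of the partial bitstream        */

    /* Measured on the instrumented HAMR prototype */
    double   exec_time_per_block_s; /* per 64 MB block, hardware            */
    double   sw_time_per_block_s;   /* per 64 MB block, software baseline   */
    double   energy_per_block_j;    /* energy per block on the accelerator  */

    /* Pricing inputs discussed in Section III.C.2 */
    double   price_per_block;       /* charged to the client                */
    uint32_t popularity;            /* how often the function is requested  */
};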

IV. INTER-CLOUD MASTER SCALABILITY ANALYSIS

Given the critical role of the Inter-cloud Master, we evaluate its ability to provide reliable services as the number of clients and clouds increases. The resources that influence a server’s operation are memory, processor, network, and storage. We ignore storage, as it presents no bottleneck to server operations. It was stated in [11] that for smooth server operation: 1) memory utilization must be below 85% to avoid page faults and swap operations, 2) processor utilization must stay below 75% to leave room for the kernel and other software to operate without affecting the server operations, and 3) network utilization should be kept under 50% to prevent queuing delays at the network interface.

In our analysis, we make certain simplifying assumptions in order to carry out the analysis and arrive at representative scalability measures. First, even though in our system certain jobs consume few resources and finish quickly while others consume huge resources and take a long time to finish, we give all jobs the same priority and assume they take the same (average) time to complete. Second, we model the processor and network performance using queuing theory. Considering processor performance, it is well established that an M/G/1-RR (round robin) queuing model is suitable [12]. It is designed for round-robin systems (like operating systems) and is generic, as it requires only the mean and variance of the service time rather than its full distribution. This model assumes that requests arrive at the processor according to a Poisson process, i.e., the inter-arrival times are exponentially distributed, with rate λ requests/sec. Since requests are assumed to have the same priority and low size variation, the queuing model reduces to M/G/1-PS (processor sharing). We assume that at full utilization the processor can serve µp requests per second. Thus, by queuing theory and Little’s Theorem, the processor utilization is ρP = λ/µp. The memory utilization ρM is the amount of memory used by the server Mu divided by the total memory MT: ρM = Mu/MT. Finally, the network utilization ρN is the rate of arriving requests λ over the rate that can be handled µN: ρN = λ/µN. The requests to the network card can be modeled by a Poisson process, where the service time is constant, basically equal to the transmission delay [13], so an M/D/1 queuing model is appropriate.

Concerning the tasks that the master performs, it runs a Jobtracker process that accepts requests from clients, extracts the Map and Reduce functions from the job, and searches the Bitstream Library for suitable bitstreams. It then calculates for each concerned cloud the amount of resources that should be used according to the client’s preferences, and considers the load updates it has received from each datacenter, along with its configuration, to decide on using bitstreams. The master then decides on using the bitstream files, and invokes the generation of new ones when necessary. The tasks that consume most resources are the ones executed upon receiving the client’s request. These tasks are for the most part sequential, and hence they can be handled by a single thread. Hence, the number of threads in the Inter-cloud Master is equal to the number of users with pending requests. Each request translates to a thread that consumes memory to maintain its stack, utilizes the processor to run resource calculations and lookups, and incurs overhead resulting from thread context switches. We suppose that the tasks performed after the user’s request is received take T1 seconds, and that each thread incurs a context-switch overhead of c seconds. Thread creation and destruction overheads can be ignored, as they are in the order of microseconds [14].

Considering processor utilization, an expression for the number of concurrently served users can be found from the average number of requests in the processor, which for the M/G/1-PS model is NP = ρP/(1 − ρP) = λ/(µp − λ) [15]. It follows that the total memory used by this process is Hm + NP×STm, where STm is the thread stack size and Hm is the memory allocated by the process. The main components of Hm are the resources defined in the submitted MapReduce job and the configurations of the concerned clouds. With an average of k Intra-cloud Masters per request, and an average resource size S plus intermediate results of size R, the memory utilization of this process is approximated by k×NP×(S+R) + NP×STm + Hm. The Inter-cloud Namenode performs string matching operations for each request, while the Bitstream Library provides lookup services and, in some cases, update operations. These two components can be modeled as two threads, with their own stacks (STn and STl) and allocated memories (Hn and Hl). Hence, the memory utilization becomes Ml = k×NP×(S+R) + NP×STm + Hm + STn + Hn + STl + Hl. Finally, we assume that aggregating the intermediate results from the k Intra-clouds takes Ta seconds, while the Namenode and Bitstream Library operations take Tn and Tl seconds, respectively.

Concerning network utilization, we first review the Inter-cloud Master’s processes that involve communication with external entities. If the Jobtracker process does not find matching bitstreams in the library (with probability r),


it examines the Map and Reduce functions that meet certain criteria and sends them to the Hardware Design Center (HDC). Either way, the Jobtracker sends information to the client, giving the client the acceleration options with their corresponding costs. It then distributes the configurations to the Intra-cloud Masters that run the Tasktrackers. Afterwards, the Inter-cloud Master receives updates from the slaves, and accordingly may reassign tasks to other Jobtrackers in case of overloads or failures (with probability f). This enables the master to provide the client with the status of the running job (we assume each user receives a single update per job). Finally, when the job is completed, the master sends the results to the client. It follows that the external network interactions per request are: 1) the incoming request, 2) communication with the HDC with probability r, 3) negotiation with the client, 4) distribution of tasks to the k Intra-cloud Masters, 5) receiving updates from the slaves, 6) reassigning tasks with probability f to other slaves, 7) sending a job update to the client, 8) receiving intermediate results from the slaves, and 9) transmitting the final results to the client. Now, taking the request size (MapReduce job) to be close to S (defined above), the generated bitstream size to be I, and the size of a resource update from an Intra-cloud Master to be C, and ignoring the size of update packets to clients, each job generates SS = S×(1 + r + k(1+f)) + rI + 2k(R+C) bytes. Then, assuming the network interface to the Internet has a bit rate of B kbps, we conclude that it can serve µN = 1000B/(8SS) total user requests per second. Using the M/D/1 queuing model, the average number of requests at the network interface is then NN = ρN(2 − ρN)/(2(1 − ρN)), where ρN = λ/µN [16].

The cloud credentials and resources information are stored in a database (DB) to which the Inter-cloud Master connects. The DB files are organized based on an indexed sequential access method with the Intra-cloud Master ID as the primary index. There are two types of DB operations: updates arriving from Intra-cloud Masters, and read messages relating to user requests. The updates from every cloud arrive at a rate close to the request rate, and are reflected in the DB. Over the network, the Inter-cloud Master can serve up to 1000B/(8C) updates per second. However, given that an update is sent every tx seconds (with tx ≫ 8C/1000B), the external network activities of this process do not constitute a bottleneck. On the other hand, each thread has an allocated stack and dynamic memory of size C. Since the execution code size is relatively negligible, the memory usage of the resource update/read process can be approximated as C×NN. Finally, we assume that the total execution of an update message takes Tu seconds and that of a read message takes Tr seconds.

We now use the above definitions to develop expressions that lead to a measure of the load on the server. To start with, the CPU can serve µp = 1/(T1 + Tu + Tr + Ta + Tn + Tl + c) users per second. From before, the processor utilization is ρP = λ/µp, which must be less than 0.75, or else the processor will be the bottleneck and will limit the

server’s scalability. Next, the total memory usage of the server processes is Mu = Ml + C×NN, which leads to a memory utilization of ρM = Mu/MT that must be below 0.85. Finally, the utilization of the external network interface is given by ρN = 8SS×λ/1000B and should be below 0.5. The expressions for ρP and ρN are linear in λ, and their solutions yield λP < 0.75/(T1 + Tu + Tr + Ta + Tn + Tl + c) and λN < 0.5×1000B/(8SS), respectively. On the other hand, the expression for ρM is cubic in λ, and its solution λM, if it exists, is in the form λM
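To illustrate how the processor and network bounds derived above interact, the short program below evaluates λP and λN for a set of purely illustrative parameter values (they are not measurements from the paper) and reports the tighter of the two; the memory bound λM would be obtained numerically from ρM = 0.85 in the same way.

/* Illustrative evaluation of the Inter-cloud Master admission bounds.
 * All parameter values below are assumed for the sake of the example;
 * none of them are measurements reported in the paper. */
#include <stdio.h>

int main(void) {
    /* Processor-side task times (seconds) and context-switch cost */
    double T1 = 0.020, Tu = 0.002, Tr = 0.002, Ta = 0.010,
           Tn = 0.003, Tl = 0.003, c = 0.001;

    /* Network-side parameters: B in kbps, sizes in bytes */
    double B = 100000.0;          /* 100 Mbps uplink              */
    double S = 2e5, R = 1e6, C = 1e4, I = 5e6;
    double r = 0.2, f = 0.05;     /* HDC and reassignment prob.   */
    int    k = 4;                 /* Intra-cloud Masters per job  */

    double SS = S * (1 + r + k * (1 + f)) + r * I + 2 * k * (R + C);
    double mu_p = 1.0 / (T1 + Tu + Tr + Ta + Tn + Tl + c);
    double mu_N = 1000.0 * B / (8.0 * SS);

    double lambda_P = 0.75 * mu_p;   /* processor bound: rho_P < 0.75 */
    double lambda_N = 0.50 * mu_N;   /* network bound:   rho_N < 0.50 */

    printf("SS = %.0f bytes, mu_p = %.1f req/s, mu_N = %.1f req/s\n",
           SS, mu_p, mu_N);
    printf("lambda_P < %.1f req/s, lambda_N < %.1f req/s -> %s-bound\n",
           lambda_P, lambda_N,
           lambda_P < lambda_N ? "processor" : "network");
    return 0;
}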
