Modelization and Performance Evaluation of the Diet Middleware

Eddy Caron, Benjamin Depardon and Frédéric Desprez
University of Lyon. LIP Laboratory. UMR CNRS - ENS Lyon - INRIA - UCBL 5668, France
Email: {Eddy.Caron,Benjamin.Depardon,Frederic.Desprez}@ens-lyon.fr
1 Introduction
Using distributed resources to solve large problems ranging from numerical simulations to life science is nowadays a common practice [1,2]. Several approaches exist for porting these applications to a distributed environment; examples include classic message-passing, batch processing, web portals, and GridRPC systems [3]. In this last approach, clients submit computation requests to a meta-scheduler (also called an agent) that is in charge of finding suitable servers for executing the requests within the distributed resources. Scheduling is applied to balance the work among the servers. A list of available servers is sent back to the client, which is then able to send the data and the request to one of the suggested servers to solve its problem. There exist several grid middleware systems [4] that tackle the problem of finding services available on distributed resources, choosing a suitable server, executing the requests, and managing the data. Several environments, called Network Enabled Servers (NES) environments, have been proposed. Most of them share a common characteristic: they are built from broadly three main components: clients, which are applications that use the NES infrastructure; agents, which are in charge of handling the clients' requests (scheduling them) and of finding suitable servers; and finally computational servers, which provide the computational power to solve the requests. Some middleware rely only on a basic hierarchy of elements, a star graph, such as Ninf-G [5], NetSolve [6] and GridSolve [7]. Others, in order to divide the load at the agent level, can have a more complex hierarchy shape: WebCom-G [8] and Diet [9]. In this latter case, a problem arises: what is the best shape for the hierarchy?
Modeling the behavior of middleware, and more specifically its needs in terms of computations and communications at the agent and server levels, can be of great help when deploying the middleware on a computing platform. The literature does not provide many papers on the modeling and evaluation of distributed middleware. In [10], Tanaka et al. present a performance evaluation of Ninf-G; however, no theoretical model is given. In [11], P.K. Chouhan presents a model for hierarchical middleware, and algorithms to deploy a hierarchy of schedulers on cluster and grid environments. She also compares the model with the Diet middleware. However, a severe limitation of these latter works is that only one kind of service can be deployed in the hierarchy. Such a constraint is of course not desirable, as nowadays many applications rely on workflows of services. Hence the need to extend the previous models and algorithms to cope with hierarchies supporting several services.
In this paper, we will focus on one particular hierarchical NES: Diet¹ (Distributed Interactive Engineering Toolbox). The Diet component architecture is structured hierarchically as a tree to obtain improved scalability. Diet comprises several components. Clients use the Diet infrastructure to solve problems using a remote procedure call (RPC) approach. SeDs, or server daemons, act as service providers, exporting functionalities via a standardized computational service interface. Finally, agents facilitate the service location and invocation interactions of clients and SeDs. Collectively, a hierarchy of agents provides higher-level services such as scheduling and data management. These services are made scalable by distributing them across a hierarchy of agents composed of a single Master Agent (MA) (the root of the hierarchy) and potentially several Local Agents (LA) (internal nodes).
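To make this structure concrete, here is a minimal Python sketch of the tree just described: an MA at the root, LAs as internal nodes, SeDs as leaves, with requests forwarded down and replies aggregated up. The class and attribute names are ours, chosen for illustration; they do not reflect the actual Diet API.

    # Hypothetical illustration of the Diet hierarchy, not the real API.
    class SeD:
        """Server daemon: estimates the cost of serving a request."""
        def __init__(self, name, load=0.0):
            self.name, self.load = name, load

        def handle(self, request):
            # A real SeD returns a performance estimate; we fake one here.
            return (self.load, self.name)

    class Agent:
        """Master Agent or Local Agent: forwards requests, aggregates replies."""
        def __init__(self, name, children):
            self.name, self.children = name, children

        def handle(self, request):
            replies = [child.handle(request) for child in self.children]
            return min(replies)  # keep the reply with the best (lowest) estimate

    # A toy hierarchy: one MA, two LAs, four SeDs.
    ma = Agent("MA", [
        Agent("LA1", [SeD("sed1", 0.8), SeD("sed2", 0.3)]),
        Agent("LA2", [SeD("sed3", 0.5), SeD("sed4", 0.9)]),
    ])
    print(ma.handle("dgemm 100"))  # -> (0.3, 'sed2'): best server for the client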
We first briefly present in Section 2 our model for hierarchical middleware, and a description of our algorithm to automatically build a hierarchy. Then, experimental results are presented in Section 3, before concluding.
¹ http://graal.ens-lyon.fr/DIET
2 Model for hierarchical middleware
We do not detail the whole model and the algorithm in this paper; instead, we concentrate on the experimental results presented in the next section, and we strongly encourage the reader to refer to our research report [12]. In this section, we give an overview of our model.
2.1 Resource architecture
In this paper we will focus on the simple case of deploying the middleware on a fully homogeneous, fully connected platform G = (V, E, w, B): V is the set of nodes, E is the set of edges, all nodes have the same processing power w in Mflops/s, and all links have the same bandwidth B in Mb/s. We do not take into account contention in the network.
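As an illustration only, this platform model translates directly into a small data structure; the field names follow the notation above, and the numeric values are arbitrary examples, not measurements.

    # Hypothetical transcription of the platform model G = (V, E, w, B).
    from dataclasses import dataclass

    @dataclass
    class Platform:
        nodes: list[str]   # V: the set of nodes
        w: float           # processing power of every node, in Mflops/s
        B: float           # bandwidth of every link, in Mb/s
        # E is implicit: the platform is fully connected and network
        # contention is ignored, so edges need not be stored.

    platform = Platform(nodes=[f"node{i}" for i in range(50)], w=4800.0, B=1000.0)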
2.2 Server and agent model
We aim at deploying a hierarchy of agents and servers that is able to support several services. As we have multiple services in the hierarchy, our goal cannot be to maximize the global throughput of completed requests (number of completed requests per time unit) regardless of the kind of service. Instead, our goal is to obtain for each service a throughput such that all services receive almost the same ratio of obtained throughput to requested throughput (a ratio we of course try to maximize), while having as few agents in the hierarchy as possible, so as not to use more resources than necessary. Our model is divided into two parts: the server model and the agent model. Requests are sent from the root of the hierarchy down to the servers, where each server evaluates the time it would take to execute the service. Replies are then sent up the tree; each agent aggregates the replies and selects the best server. In the end, the client receives the best server to execute the request, and can then directly contact this server to send the relevant data and execute the request. Our model takes into account the communication costs within the hierarchy (request forwarding and replies) as well as the computation costs at the agent level (time to process a request, and time to aggregate the replies) and at the server level (time to evaluate a request, and time to effectively execute a request).
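Although we do not reproduce the full model here, the objective can be summarized as a max-min program; the notation below is ours, condensed from [12]. Writing S for the set of services, rho_s for the throughput obtained for service s and rho*_s for its requested throughput:

    \max \; \min_{s \in \mathcal{S}} \; \frac{\rho_s}{\rho_s^{*}}
    \quad \text{subject to the computation } (w) \text{ and communication } (B)
    \text{ capacity constraints at every agent and server,}

while, among the hierarchies achieving this ratio, preferring those with the fewest agents.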
2.3 Building a Diet hierarchy

Based on the server and agent models, we designed an algorithm to automatically build a Diet hierarchy.
The algorithm is based on a bottom-up approach. We first distribute the available nodes between the different services. From this repartition we can compute the throughput the servers can offer. Then, with the remaining nodes, we try to build a hierarchy of agents able to support this load. The construction of the hierarchy is done level by level: we first try to add enough agents to support the server level, i.e., allocate children to the agents (as the load of an agent is linear in its number of children), and verify that the computations and communications do not exceed the platform capacities w and B (resource constraints). If a single agent is enough to support all the servers, then our hierarchy is built. Otherwise, we need to add at least one new level of agents on top of the previous one. We repeat this scheme as long as we have enough nodes to build new levels, and as long as we do not have a single agent at the top of the hierarchy. If no suitable hierarchy can be found, we reduce the number of nodes allocated to the servers, which incidentally reduces the services' throughput, and repeat the same scheme, i.e., we try to build a hierarchy of agents with the remaining nodes. Note that the load of an agent depends on the number of children of each service type and on the service throughputs. The agent level construction (i.e., node allocation and children repartition) is done using a Mixed Integer Linear Program (MILP). We do not detail the algorithm in this paper; instead we encourage the reader to refer to our research report [12].
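The following Python sketch illustrates the overall bottom-up loop under the homogeneous-platform assumption. It is not the actual algorithm: the MILP that allocates nodes and children at each level is replaced by a simple per-agent children cap (max_children), which stands in for the w and B resource constraints.

    # Simplified sketch of the bottom-up construction, not the MILP of [12].
    def build_hierarchy(n_nodes, n_servers, max_children):
        """Return the number of agents per level, bottom-up, or None on failure.

        n_servers nodes are given to the services; the rest must host agents.
        """
        remaining = n_nodes - n_servers
        level_width = n_servers            # children the next level must support
        levels = []
        while remaining > 0:
            agents = -(-level_width // max_children)   # ceiling division
            if agents > remaining:
                return None                # not enough nodes for this level
            levels.append(agents)
            remaining -= agents
            if agents == 1:                # a single agent on top: done
                return levels
            level_width = agents           # this level must itself be supported
        return None

    # Outer loop: reduce the servers' share until a suitable tree fits.
    for n_servers in range(45, 0, -1):
        levels = build_hierarchy(50, n_servers, max_children=10)
        if levels is not None:
            print(n_servers, "servers; agents per level (bottom-up):", levels)
            break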
3 Experimental Results
In this section, we present comparisons between the theoretical predictions and real experiments with the Diet middleware. We also show that our bottom-up approach based on an MILP to build the hierarchy gives really good results compared to classical deployments. We deployed two kinds of services: a matrix multiplication using dgemm [13] (for various matrix sizes), and the computation of Fibonacci numbers using a naive algorithm. We will denote by dgemm x the call to a dgemm service on matrices of size x × x, and by Fibonacci x the call to a Fibonacci service to compute the Fibonacci number for n = x.

3.1 Theoretical model / experimental results comparison
In order to validate our model, we conducted experiments with Diet on the French experimental testbed Grid'5000 [14]. Using benchmarks made on the Sagittaire cluster and our bottom-up algorithm, we generated hierarchies for various combinations of dgemm and Fibonacci services, for platforms with a number of nodes ranging from 3 to 50. Even though the algorithm is based on an MILP, it took only a few seconds to generate the 35 hierarchies for our experiments. Our goal here is to stress Diet, so we use relatively small services. We compared the measured and predicted throughput for various service combinations: dgemm 100 and Fibonacci 30 (medium size services), dgemm 10 and Fibonacci 20 (small size services), dgemm 10 and Fibonacci 40 (small size dgemm, large size Fibonacci), dgemm 500 and Fibonacci 20 (large size dgemm, small size Fibonacci), and dgemm 500 and Fibonacci 40 (large size dgemm, large size Fibonacci). We tried to obtain the same throughput for each service: our target throughput is 10000 (which is higher than what we can expect on the platform).

The experimental protocol is the following: we first deploy the hierarchy with GoDiet [15], then we run clients for both services and wait 10 seconds so that the load stabilizes. Finally, we measure the throughput during 60 seconds before killing all clients and the Diet hierarchy. Each client runs on a separate node and sends one request at a time: once a request is completed, a new one is sent; this is repeated indefinitely.
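For illustration, this measurement protocol can be sketched as follows; solve() is a placeholder for a real Diet request, and the thread count is arbitrary (in the real experiments each client runs on its own node).

    # Hypothetical sketch of the closed-loop clients and the 10 s / 60 s protocol.
    import threading, time

    WARMUP, WINDOW = 10.0, 60.0            # durations taken from the protocol above
    count, counting, stop = 0, False, False
    lock = threading.Lock()

    def solve():
        time.sleep(0.01)                   # placeholder for a real Diet call

    def client():
        global count
        while not stop:
            solve()                        # one request at a time, indefinitely
            if counting:
                with lock:
                    count += 1

    threads = [threading.Thread(target=client) for _ in range(8)]
    for t in threads: t.start()
    time.sleep(WARMUP)                     # let the load stabilize
    counting = True
    time.sleep(WINDOW)                     # measurement window
    counting = False
    stop = True
    for t in threads: t.join()
    print("throughput:", count / WINDOW, "requests/s")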
We used a 79-node cluster: the Sagittaire cluster from the Lyon site. Each node has an AMD Opteron 250 CPU at 2.4 GHz, with 2 GB of memory. All these nodes are connected to a Gigabit Ethernet network supported by an Extreme Networks BlackDiamond 8810. The operating system deployed with kadeploy3 on the Sagittaire nodes was a Debian 2.6.24.3 x86_64. Finally, the chosen CORBA implementation was omniORB 4.1.2², as Diet relies on CORBA for communications; and we used gcc 4.3.2. For the dgemm service, we relied on the ATLAS library [16].
Table 1 presents the relative error between the theoretical and experimental throughput; a dash in the table means that the algorithm returned the same hierarchy as with fewer nodes, and hence the results are the same. As can be seen, the experimental results closely follow what the model predicted. Higher errors can be observed for really small services (dgemm 10 and Fibonacci 20). This is due to the fact that really small services are harder to benchmark: the benchmarked time for a server to predict the execution time, and the time to execute a service, are higher than what they should be.
3.2 Relevancy of the servers repartition
We also compared the throughputs obtained by our algorithm with the ones obtained by a balanced star graph (i.e., a star graph where all services receive the same number of servers). A balanced star graph is the naive approach that is generally used when the same throughput is requested for all services, which is what we aimed at in our experiments. Figures 1a to 1e present the comparison between the throughputs obtained with both methods. As can be seen, our algorithm gives better results: the throughput is better in all but one experiment, and no more resources than necessary are used (in Figure 1a, no more than 20 nodes are required to obtain the best throughput, and in Figure 1b only 3 nodes).
The only experiment where the balanced star graph produced a higher throughput is for dgemm 500 in the dgemm 500, Fibonacci 40 experiment (see Figure 1e). However, this is at the expense of the second service, which has a lower throughput. The dgemm throughput loss observable in Figure 1d is due to the load at the agent level (this phenomenon is predicted by our model). Using our algorithm, we are able to increase the throughput by up to 700% (see Figure 1b for instance).
² http://omniorb.sourceforge.net/
                                       Number of nodes
Experiment                     3       5       10      20      30      40      50
dgemm 100, Fibonacci 30
  dgemm                      1.87%   1.58%   4.17%   4.39%    -       -       -
  Fibonacci                  1.76%   1.26%   0.26%   2.28%    -       -       -
dgemm 10, Fibonacci 20
  dgemm                      27.4%    -       -       -       -       -       -
  Fibonacci                  27.3%    -       -       -       -       -       -
dgemm 10, Fibonacci 40
  dgemm                      27.2%   29.1%   28.1%   28.7%   28.2%   22.9%   23.9%
  Fibonacci                  14.6%   10.4%    9.7%    8.0%    6.7%    1.4%    2.3%
dgemm 500, Fibonacci 20
  dgemm                       1.1%    0.7%    3.0%    3.0%    4.7%    9.5%   19.1%
  Fibonacci                  31.8%   32.6%   26.4%   25.8%   20.1%    5.8%   10.4%
dgemm 500, Fibonacci 40
  dgemm                       2.2%    7.9%    3.9%    2.8%    0.0%    0.5%    0.3%
  Fibonacci                  10.5%   10.5%    8.2%    8.7%    5.7%    4.6%    0.3%

Table 1: Relative error |theoretical - experimental| / theoretical.
3.3 Relevancy of creating new agent levels
In order to validate the relevancy of our algorithm in creating hierarchies, we compared the throughput obtained with our hierarchies to the one obtained with a star graph having exactly the same repartition of servers as computed by our algorithm. We thus aim at validating the fact that our algorithm creates several levels of agents whenever required. In the previous experiments, this happens only for the following deployments: dgemm 100, Fibonacci 30 on 20 nodes (1 MA and 3 LA), and dgemm 500, Fibonacci 20 on 30, 40 and 50 nodes (1 MA and 2 LA). We present the gains and losses obtained by the star graph deployments compared to the results obtained with the hierarchies our algorithm computed:
- dgemm 100, Fibonacci 30 on 20 nodes: we obtained a throughput of 911.95 requests per second for dgemm and 779.07 requests per second for Fibonacci, compared to respectively 1624.4 and 1699.0 with our algorithm. Hence a loss of 43.8% and 54.1% respectively.
- dgemm 500, Fibonacci 20 on 30 nodes: we obtained a throughput of 211.65 requests per second for dgemm and 3490.58 requests per second for Fibonacci, compared to respectively 224.0 and 4845.85 with our algorithm. Hence a loss of 5.5% and 28% respectively.
- dgemm 500, Fibonacci 20 on 40 nodes: we obtained a throughput of 238.1 requests per second for dgemm and 2476.9 requests per second for Fibonacci, compared to respectively 281.35 and 2391.6 with our algorithm. Hence a loss of 15.4% for dgemm, and a gain of 3.6% for Fibonacci.
- dgemm 500, Fibonacci 20 on 50 nodes: we obtained a throughput of 184.68 requests per second for dgemm and 2494.0 requests per second for Fibonacci, compared to respectively 311.7 and 2973.3 with our algorithm. Hence a loss of 40.7% and 16.1% respectively.

We can see from the above results that our algorithm creates new levels of agents whenever required: without the new agent levels the obtained throughput can be much lower. This is due to the fact that the MA becomes overloaded, and thus does not have enough time to schedule all requests.
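The loss and gain figures above are relative differences with respect to the throughput obtained with our algorithm; for instance, for the Fibonacci service on 20 nodes:

    \frac{1699.0 - 779.07}{1699.0} \approx 0.541 = 54.1\%.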
4 Conclusion and future work
We presented in this paper a model for hierarchical middleware based on a tree structure, and briefly described our algorithm for automatically finding a suitable hierarchy. Our objective, when deploying several services in the hierarchy, is to provide the best ratio of provided to required throughput.
[Figure 1 (five panels) omitted: (a) dgemm 100, Fibonacci 30; (b) dgemm 10, Fibonacci 20; (c) dgemm 10, Fibonacci 40; (d) dgemm 500, Fibonacci 20; (e) dgemm 500, Fibonacci 40. Each panel plots the dgemm and Fibonacci throughputs (requests/s) against the number of nodes, for our algorithm and for the balanced star graph.]
Fig. 1: Comparison between our algorithm and a balanced star graph.

We conducted several experiments on the Grid'5000 platform with the Diet middleware. The results show that the experimental results closely follow what the model predicted, and that our bottom-up algorithm provides excellent performance. The experiments showed that our algorithm adds new levels of agents whenever required, and that it clearly outperforms the classical approach of deploying the middleware as a balanced star graph.

Several points now need to be taken into account. So far we have only studied the case of a fully homogeneous platform. Even though our model and algorithm can easily be extended to computation-heterogeneous / communication-homogeneous platforms, the case of a fully heterogeneous platform is much more complicated. The model also needs to be extended with client/server communications; but for this we need to know where the clients are, whereas in this paper we considered that we knew nothing about the clients. This assumption does not seem to impact our model. Finally, as the MA can become a bottleneck, and as Diet can have several hierarchies interconnected at the MA level, we could also study this latter case.

5 Acknowledgment
Experiments presented in this paper were carried out using the Grid'5000 experimental testbed, being developed under the INRIA ALADDIN development action with support from CNRS, RENATER and several Universities as well as other funding bodies (see https://www.grid5000.fr).
References

1. Berman, F., Fox, G., Hey, A.J.G.: Grid Computing: Making the Global Infrastructure a Reality. John Wiley & Sons, Inc., New York, NY, USA (2003)
2. Foster, I., Kesselman, C.: The Grid 2: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2003)
3. Seymour, K., Lee, C., Desprez, F., Nakada, H., Tanaka, Y.: The end-user and middleware APIs for GridRPC. In: Workshop on Grid Application Programming Interfaces, in conjunction with GGF12, Brussels, Belgium (2004)
4. Caniou, Y., Caron, E., Desprez, F., Nakada, H., Seymour, K., Tanaka, Y.: High performance GridRPC middleware. In Gravvanis, G., Morrison, J., Arabnia, H., Power, D., eds.: Grid Technology and Applications: Recent Developments. Nova Science Publishers (2009)
5. Tanaka, Y., Nakada, H., Sekiguchi, S., Suzumura, T., Matsuoka, S.: Ninf-G: A reference implementation of RPC-based programming middleware for grid computing. Journal of Grid Computing 1(1) (2003) 41–51
6. Casanova, H., Dongarra, J.: NetSolve: a network server for solving computational science problems. In: Supercomputing '96: Proceedings of the 1996 ACM/IEEE Conference on Supercomputing, Washington, DC, USA, IEEE Computer Society (1996)
7. YarKhan, A., Dongarra, J., Seymour, K.: GridSolve: The evolution of a network enabled solver. Grid-Based Problem Solving Environments (2007) 215–224
8. Morrison, J.P., Clayton, B., Power, D.A., Patil, A.: WebCom-G: grid enabled metacomputing. Neural, Parallel & Scientific Computations 12(3) (2004) 419–438
9. Caron, E., Desprez, F.: DIET: A scalable toolbox to build network enabled servers on the grid. International Journal of High Performance Computing Applications 20(3) (2006) 335–352
10. Tanaka, Y., Takemiya, H., Nakada, H., Sekiguchi, S.: Design, implementation and performance evaluation of GridRPC programming middleware for a large-scale computational grid. In: GRID '04: Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing, Washington, DC, USA, IEEE Computer Society (2004) 298–305
11. Chouhan, P.K.: Automatic Deployment for Application Service Provider Environments. PhD thesis, École Normale Supérieure de Lyon (2006)
12. Caron, E., Depardon, B., Desprez, F.: Modelization for the Deployment of a Hierarchical Middleware on a Homogeneous Platform. Technical report, Institut National de Recherche en Informatique et en Automatique (INRIA) (2010)
13. Dongarra, J., et al.: Basic linear algebra subprograms technical forum standard. International Journal of High Performance Applications and Supercomputing 16(1) (2002) 1–111
14. Bolze, R., Cappello, F., Caron, E., Daydé, M., Desprez, F., Jeannot, E., Jégou, Y., Lanteri, S., Leduc, J., Melab, N., Mornet, G., Namyst, R., Primet, P., Quetier, B., Richard, O., Talbi, E.G., Irena, T.: Grid'5000: a large scale and highly reconfigurable experimental grid testbed. International Journal of High Performance Computing Applications 20(4) (November 2006) 481–494
15. Caron, E., Chouhan, P.K., Dail, H.: GoDiet: A deployment tool for distributed middleware on Grid'5000. In: EXPGRID Workshop: Experimental Grid Testbeds for the Assessment of Large-Scale Distributed Applications and Tools, in conjunction with HPDC-15, Paris, France (June 2006) 1–8
16. Whaley, R.C., Petitet, A., Dongarra, J.J.: Automated empirical optimizations of software and the ATLAS project. Parallel Computing 27(1–2) (2001) 3–35