MODELLING OVERHEAD IN JAVASPACES
Frederic Hancke, Gunther Stuer, David Dewolfs, Jan Broeckhove, Frans Arickx, Tom Dhaene
Department of Mathematics and Computer Science, University of Antwerp, 2020 Antwerp, Belgium
[email protected]
KEYWORDS
Distributed Computing, JavaSpaces, Performance Modelling.
ABSTRACT
In this paper a theoretical model is developed to compare the performance of different distributed platforms on computational problems. In order to compare different platforms in a heterogeneous environment on the same basis, the use of tasklists in XML is introduced. Some initial results on the performance and overhead of JavaSpaces are presented. The resulting data set is statistically analyzed using the presented theoretical model, leading to a first conclusion on the use of JavaSpaces for computational problems. In a first stage, the data set is investigated for the presence of outliers; in a second stage some basic statistics are applied.

INTRODUCTION
In the research group CoMP (Computational Modelling and Programming) research is done in the field of computational science, aimed at understanding physics and engineering problems through modern modelling techniques, using new software development paradigms and advanced mathematical techniques. Many problems in the area of quantum physics involve very intense and large computations. Therefore distributed platforms, such as JavaSpaces [FHA99], MPICH [MPI] (an MPI [GES99] implementation), etc. could be very useful. The goal of our project is to find out which platform is most suited for a certain kind and number of calculations, and thus to classify computationally large problems. First of all, we are interested in the capabilities of some existing platforms. In this paper, we consider a Linda Tuple Space [Yal] implementation, JavaSpaces. This includes the platform's architecture, its functionality, and its performance measured against a theoretical model. Next, the platform needs to be tested with a wide variety of fictitious, though representative, problems, each divided into subproblems so that the calculations can be distributed. As we do not want the testcases to be problem-specific, we consider tasks that simulate execution time only. Finally, the results of a first test of JavaSpaces against this theoretical model should provide a basis for the comparison of different distributed platforms, which is the main goal.

DISTRIBUTED SYSTEMS
In distributed systems, three kinds of models can be distinguished: the push model, the pull model and the push-pull model. In the discussion of these models in the rest of this section, X refers to the entity having tasks to be performed and Y to the entity performing the tasks. The push model is used when each X distributes tasks to specifically chosen Ys. The pull model is the opposite: each Y scans the tasks waiting to be performed at its respective X and performs the tasks it can. When an X pushes its tasks into a medium from which they are pulled and performed by a Y, we speak of a push-pull model. A visualization of the three models is shown in figure 1.
Figure 1: X1 has workers Y1 and Y2. X2 has workers Y2 and Y3. (a) push model; (b) pull model; (c) push-pull model
The farmer-worker model is hierarchically situated one level higher than the models discussed above. The rule is simple: there is one master (the farmer) and one or more slaves (the workers) doing the jobs the farmer orders them to do. Orders are given in the form of a task description, and the result of a task is returned in a result description. The farmer-worker model can thus be implemented as a pure push model, as in MPI, or as a push-pull model, as in the Linda Space implementations. The pull model could also be used, but in that case the farmer would no longer be the master.
THEORETICAL MODEL BUNDCHEN
The main reason for constructing a theoretical model, where the aim is to obtain a formula for calculating the total time needed to solve a given problem, is to compare the different distributed platforms with each other quantitatively. JavaSpaces could be compared to, say, MPICH, but the comparison would only be between the two. If a third platform is included in the test, it would have to be compared to both JavaSpaces and MPICH to decide which is better in which situation. The central theoretical model proposed to allow for such a comparison will be referred to as Bundchen.

Building this theoretical model can be done in a number of steps. At each step a number of parameters is added. In the first step, the only model parameters are:
• the number of workers Nwork,
• the number of tasks Ntask, and
• the calculation time (in ms) of one task on an ideal system Ttask.
Thus, supposing every task requires the same calculation time, a first step towards a general formula for the total time needed can be written as

    T1 = ⌈Ntask / Nwork⌉ · Ttask                                   (1)

In the second step we also take into account:
• the communication overhead (in bytes) of one task description Ctask,
• equally, the communication overhead (in bytes) of one result description Cres,
• the speed (in bytes/s) of the network infrastructure Snet, and
• the load of the network infrastructure Lnet (0 ≤ Lnet ≤ 1).
Supposing the first two parameters remain constant for all task and result descriptions, the total time now becomes

    T2 = T1 + Ntask · (Ctask + Cres) / (Snet · (1 − Lnet))         (2)

Of course, this model is still not satisfying. The aim is to have a formula or algorithm that calculates the minimum time spent on the job with the best possible distribution of tasks. Therefore we need to consider more parameters, such as
• the speed (measured against the speed of an ideal machine, Sideal = 1) of the farmer's processor Sproc,0 (0 ≤ Sproc,0 ≤ 1),
• the load of the farmer's processor Lproc,0 (0 ≤ Lproc,0 ≤ 1),
• the speed (also measured against the speed of an ideal machine) of each worker's processor Sproc,w (0 ≤ Sproc,w ≤ 1 and 1 ≤ w ≤ Nwork), and
• the load of each worker's processor Lproc,w (0 ≤ Lproc,w ≤ 1 and 1 ≤ w ≤ Nwork).
The meaning of these parameters is also shown in figure 2.

Figure 2: Snet, Lnet, Sproc,w and Lproc,w (0 ≤ w ≤ Nwork) for the farmer (w = 0), the workers (1 ≤ w ≤ Nwork) and the network connecting them

Finally, the above assumptions have to be cleared out:
• not every task generally requires the same amount of time to be computed, thus Ttask becomes task dependent: Ttask,t (1 ≤ t ≤ Ntask),
• the task description cannot be assumed constant for all tasks, thus Ctask becomes Ctask,t (1 ≤ t ≤ Ntask), and
• equally, the result description cannot be assumed constant for all tasks, thus Cres becomes Cres,t (1 ≤ t ≤ Ntask).
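Before these assumptions are relaxed, the following minimal sketch (in Java, the language of our JavaSpaces tests) makes formulas (1) and (2) concrete by evaluating T1 and T2 for one parameter set. The class and method names are illustrative, and converting the network term to milliseconds is our assumption to keep both terms in the same unit.

    // Minimal sketch: evaluates formulas (1) and (2) of Bundchen for one parameter set.
    // Class and method names are illustrative only.
    public class BundchenEstimate {

        /** Formula (1): T1 = ceil(Ntask / Nwork) * Ttask, with Ttask in ms. */
        static double t1(int nTask, int nWork, double tTask) {
            return Math.ceil((double) nTask / nWork) * tTask;
        }

        /** Formula (2): T2 = T1 + Ntask * (Ctask + Cres) / (Snet * (1 - Lnet)).
         *  Ctask and Cres are in bytes, Snet in bytes/s; the network term is
         *  converted to ms here (our assumption) so both terms share a unit. */
        static double t2(int nTask, int nWork, double tTask,
                         double cTask, double cRes, double sNet, double lNet) {
            double networkSeconds = nTask * (cTask + cRes) / (sNet * (1.0 - lNet));
            return t1(nTask, nWork, tTask) + networkSeconds * 1000.0;
        }

        public static void main(String[] args) {
            // Example: 100 tasks of 500 ms on 4 workers, 1 kB descriptions,
            // a 10 MB/s network at 20% load (all values purely illustrative).
            System.out.println("T1 = " + t1(100, 4, 500.0) + " ms");
            System.out.println("T2 = " + t2(100, 4, 500.0, 1024, 1024, 10e6, 0.2) + " ms");
        }
    }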
To include the latter parameters into our proposed model, Bundchen, and to clear out the above assumptions, something stronger than a formula is needed: an algorithm should do the job of distributing the tasks optimally among the available workers, thereby using the knowledge of the speeds and loads of the available workers and the computational complexity of the tasks. Thus, an optimization problem replaces the simple formula. Although different algorithms exist for scheduling loads [BGMR96], the easiest, but surely the slowest, way to solve the problem is simply to check all possible solutions; the one that returns the smallest value is the one we need. The number of possibilities (with Nwork > 1) is given by
    Nposs = Σ_{i1=0}^{Ntask} Σ_{i2=0}^{Ntask−i1} Σ_{i3=0}^{Ntask−(i1+i2)} ··· Σ_{ij=0}^{Ntask−(i1+···+i(j−1))} ··· Σ_{iNwork=0}^{Ntask−(i1+···+i(Nwork−1))}
            C(Ntask, i1) · C(Ntask−i1, i2) · C(Ntask−(i1+i2), i3) ··· C(Ntask−(i1+···+i(j−1)), ij) ··· C(Ntask−(i1+···+i(Nwork−1)), iNwork)

with 3 < j < Nwork, and where, for all n, j ∈ N0 with n ≥ j,

    C(n, j) = n! / (j! (n − j)!)

is the binomial coefficient. Clearly, this approach is rather simplistic. The algorithm does not distribute the tasks as any existing platform would do; its goal is to distribute the tasks as efficiently as possible using their explicit properties. Distribution platforms that offer resource management for distributing the tasks will of course perform better against this model. A brute-force sketch of this exhaustive search is given below.
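As an illustration of the exhaustive approach, the following sketch enumerates every possible assignment of tasks to workers and keeps the smallest completion time. It considers computation time only and ignores the communication overhead of formula (2); all names are ours, and the code is only usable for very small Ntask and Nwork.

    // Brute-force sketch of the exhaustive search described above: every possible
    // assignment of tasks to workers is enumerated and the smallest makespan is kept.
    public class ExhaustiveScheduler {

        /** taskTime[t]: duration of task t on an ideal machine (ms).
         *  speed[w], load[w]: Sproc,w and Lproc,w of worker w. */
        static double bestMakespan(double[] taskTime, double[] speed, double[] load) {
            return assign(0, taskTime, speed, load, new double[speed.length]);
        }

        private static double assign(int t, double[] taskTime, double[] speed,
                                     double[] load, double[] busy) {
            if (t == taskTime.length) {                 // all tasks placed: makespan
                double makespan = 0.0;
                for (double b : busy) makespan = Math.max(makespan, b);
                return makespan;
            }
            double best = Double.POSITIVE_INFINITY;
            for (int w = 0; w < speed.length; w++) {    // try task t on every worker
                double effective = taskTime[t] / (speed[w] * (1.0 - load[w]));
                busy[w] += effective;
                best = Math.min(best, assign(t + 1, taskTime, speed, load, busy));
                busy[w] -= effective;                   // undo and try the next worker
            }
            return best;
        }

        public static void main(String[] args) {
            double[] tasks = {3, 4, 3, 2};              // four small example tasks
            double[] speed = {1.0, 1.0, 1.0};           // three ideal workers
            double[] load  = {0.0, 0.0, 0.0};
            System.out.println("minimum makespan = " + bestMakespan(tasks, speed, load));
        }
    }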
XML TASKLIST GENERATOR
Instead of testing a platform with real computationally complex problems, worker processes simulate task execution time only; task execution is implemented by a sleep. This way it is easier to compare different platforms with Bundchen, and the pool of workers becomes almost perfectly homogeneous. Each task description thus takes one value that represents the duration for which the worker sleeps. A tasklist consists of a number of such task descriptions, where each task's duration is randomly chosen using a probability distribution. To implement this idea, a program was written to generate these tasklists in XML [XML, McL01] format. This was done for two reasons:
1. as a storage medium for tasklists, and
2. to be able to easily reuse the same tasklists in different settings of different testcases.
The XML tasklist generator currently has two input parameters: the number of task durations to generate and the probability distribution to use. The current version supports two types of probability distributions: the constant and the normal (Gauss) distribution. Depending on this choice the program needs one extra parameter m for the former (the duration m for all tasks) or two extra parameters m (the mean of the normal distribution) and sd (its standard deviation) for the latter. In future versions of the generator more distributions and functionality will be added. The tasklists, as well as logfiles with the results of the tests, will also be stored in an XML database. A sketch of such a generator is given below.
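The following sketch shows what such a generator could look like. The command line parameters mirror the ones described above, but the XML element names are hypothetical, as the actual file format is not fixed in this paper.

    import java.io.PrintWriter;
    import java.util.Random;

    // Sketch of a tasklist generator: n task durations are drawn from either a
    // constant or a normal (Gauss) distribution and written as XML. The element
    // names used here are illustrative only.
    public class TasklistGenerator {

        public static void main(String[] args) throws Exception {
            int n = Integer.parseInt(args[0]);          // number of task durations
            String dist = args[1];                      // "constant" or "normal"
            double m = Double.parseDouble(args[2]);     // duration, or mean of the normal
            double sd = dist.equals("normal")
                    ? Double.parseDouble(args[3]) : 0;  // standard deviation of the normal

            Random rnd = new Random();
            try (PrintWriter out = new PrintWriter("tasklist.xml")) {
                out.println("<?xml version=\"1.0\"?>");
                out.println("<tasklist>");
                for (int i = 0; i < n; i++) {
                    double d = dist.equals("normal")
                            ? m + sd * rnd.nextGaussian() // normally distributed duration
                            : m;                          // constant duration
                    out.printf("  <task duration=\"%d\"/>%n", Math.max(0, Math.round(d)));
                }
                out.println("</tasklist>");
            }
        }
    }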
TESTCASE FOR JAVASPACES
The aim of this testcase is to find out whether JavaSpaces is a good potential candidate for solving high performance calculations, such as the quantum physics problems mentioned in the introduction. Other platforms such as MPICH, TSpaces [TSp, Wyc98] and GigaSpaces [Gig] will be tested in the same way in the near future. Besides the platform's functionality, we are also interested in its performance measured against Bundchen.

JavaSpaces
JavaSpaces is one of many implementations of the so-called Linda Spaces distribution concept. The underlying idea is that objects can be thrown into a virtual space and taken out, or simply read, by any object connected to the space. Many distributed platforms have been built using this idea; other implementations, besides JavaSpaces, are TSpaces and GigaSpaces. JavaSpaces was built on top of Jini [Jin, Edw01] as a service of the Jini technology. Its functionality was kept very simple, but is nevertheless very powerful. In fact there are only three basic operations on the JavaSpace itself:
• write: to write an object into the space,
• read: to read an object from the space, leaving the object in the space, and
• take: to take an object out of the space.
There is a fourth operation, notify, which does not really operate on the space contents: it notifies an object when objects are added to the space. A minimal sketch of a farmer and a worker using these operations is given below.
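The following sketch illustrates how a farmer and a worker could use the write and take operations for the sleep-based tasks described earlier. It assumes the JavaSpace proxy has already been obtained through a Jini lookup; the entry class and the omission of result entries are simplifications of ours, not the actual test implementation.

    import net.jini.core.entry.Entry;
    import net.jini.core.lease.Lease;
    import net.jini.space.JavaSpace;

    // Minimal sketch of the farmer-worker pattern on top of the basic operations
    // listed above. Obtaining the JavaSpace proxy (via a Jini lookup) and the
    // handling of result descriptions are omitted; entry and class names are ours.
    public class FarmerWorkerSketch {

        /** A task description: one duration (in ms) the worker should sleep. */
        public static class TaskEntry implements Entry {
            public Long duration;                       // entry fields must be public objects
            public TaskEntry() { }                      // required public no-arg constructor
            public TaskEntry(long d) { this.duration = d; }
        }

        /** Farmer side: write one task description per duration into the space. */
        static void farmer(JavaSpace space, long[] durations) throws Exception {
            for (long d : durations) {
                space.write(new TaskEntry(d), null, Lease.FOREVER);
            }
        }

        /** Worker side: take task descriptions out of the space and "execute"
         *  them by sleeping, as in the testcases described above. */
        static void worker(JavaSpace space) throws Exception {
            TaskEntry template = new TaskEntry();       // null fields match any task
            while (true) {                              // runs until the process is stopped
                TaskEntry task = (TaskEntry) space.take(template, null, Long.MAX_VALUE);
                Thread.sleep(task.duration);            // simulate the calculation time
                // a result description would be written back into the space here
            }
        }
    }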
Setup
Table 1 shows the machines that were used to perform the tests, as well as their characteristics.

    PC name     Processor       OS               Java    Jini
    smurf       Intel PII 400   SuSE Linux 7.3   1.4.1   1.2.1
    drone1of1   Intel PIV 1.7   SuSE Linux 8.0   1.4.1   1.2.1
    drone2of1   Intel PIV 1.7   SuSE Linux 8.0   1.4.1   1.2.1
    drone3of1   Intel PIV 1.7   SuSE Linux 8.0   1.4.1   1.2.1
    drone4of1   Intel PIV 1.7   SuSE Linux 8.0   1.4.1   1.2.1

    Table 1: Testcase setup
The HTTP server, the RMI activation daemon, the lookup service, the JavaSpace service and the farmer were run on smurf. Each of the four workers was run on a separate drone.

Measures
The tests have covered different values for four parameters (in the near future, tests will be performed with more values for w and t). These are:
• w: the number of workers (w ∈ {1, 2, 3, 4}),
• t: the number of tasks (t ∈ {10, 50, 100}),
• m: the mean (in ms) of the Gauss distribution for the task durations (m ∈ {1, 10, 100, 500, 1000, 2000, 5000, 10000}), and
• v: the standard deviation (as a percentage of m; 0 ≤ v ≤ 100) of the Gauss distribution for the task durations (v ∈ {1, 2, 5, 10, 20, 30}).
Thus, an XML file has been generated for every combination of t, m and v, except for the combinations of m = 1 with all v, and m = 10 with v ∈ {1, 2, 5}. For these combinations the standard deviation (v percent of m) would no longer be an integer number of milliseconds. Instead, XML files for m = 1 and m = 10 were generated with task duration m for all tasks, i.e. a constant distribution was used instead of the normal distribution. This means 123 XML files were generated. Execution of the farmer-worker process with a given XML file resulted in one value representing the duration wct (wallclock time) of the whole process. Every test ran 10 times for each XML file and for each w, yielding 4920 data points.

Statistical Analysis
As the data set for evaluating JavaSpaces is, as mentioned in the previous subsection, not yet complete, we give a first brief analysis of it in this subsection. First, the mean duration of all tasks in one tasklist (XML file) was calculated. This was done instead of using the theoretical mean that was used to generate the tasklist, because the latter would corrupt further calculations. Formula (1) of Bundchen was used to get a first indication of the performance of JavaSpaces. In the rest of this section we work with the overhead of distributing one task in JavaSpaces, which is given by

    (wct − T1) / t

As robustness of data sets [HMT00] is not self-evident, we first took out a number of potential outliers. Boxplots and stem-and-leaf plots are two possible methods to identify potential outliers; they do not necessarily produce the same results. The decision which outliers are finally discarded is up to the user. Each potential outlier must be investigated carefully and may only be discarded if there is a good reason. In our case, occasionally high network or processor load might be such a reason.

The method we used is that of boxplots. Using these, one can distinguish potential mild outliers from potential extreme outliers. A data point is marked as a mild outlier if [Wei02]

    d < Q1 − 1.5·IQR  or  d > Q3 + 1.5·IQR

with d the value of the data point, Q1 and Q3 respectively the first and the third quartile, and IQR = Q3 − Q1 the interquartile range. A data point is marked as an extreme outlier if (note that extreme outliers are also mild outliers)

    d < Q1 − 3·IQR  or  d > Q3 + 3·IQR

A small sketch of this test is given below.
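The following sketch illustrates the boxplot test on a small set of example overhead values. The quartile computation uses linear interpolation, which is one common convention; the values shown are purely illustrative and not taken from our data set.

    import java.util.Arrays;

    // Sketch of the boxplot outlier test described above: computes Q1, Q3 and the
    // IQR of a set of overhead values and flags mild and extreme outliers.
    public class BoxplotOutliers {

        /** Quartile via linear interpolation between ranks (one common convention). */
        static double quartile(double[] sorted, double p) {
            double pos = p * (sorted.length - 1);
            int lo = (int) Math.floor(pos);
            int hi = (int) Math.ceil(pos);
            return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
        }

        public static void main(String[] args) {
            double[] overhead = {12, 15, 14, 16, 13, 95, 14, 15};   // toy example values
            double[] sorted = overhead.clone();
            Arrays.sort(sorted);

            double q1 = quartile(sorted, 0.25);
            double q3 = quartile(sorted, 0.75);
            double iqr = q3 - q1;

            for (double d : overhead) {
                if (d < q1 - 3 * iqr || d > q3 + 3 * iqr)
                    System.out.println(d + " is an extreme outlier");
                else if (d < q1 - 1.5 * iqr || d > q3 + 1.5 * iqr)
                    System.out.println(d + " is a mild outlier");
            }
        }
    }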
This technique was applied to our data set. Table 2 shows a comparison of the resulting statistics. The table shows a significant difference in the mean, standard deviation, minimum and maximum between the complete data set (ALL) and the complete data set with extreme outliers discarded (ALL - EO). The median, Q1 and Q3 shrink only slightly, which means that the complete data set was corrupted by only a few (171) data points. The difference between discarding extreme outliers and discarding mild outliers confirms this. As it is better to discard as few data points as possible, we use the complete data set with only the extreme outliers discarded in the rest of this subsection.
                    ALL         ALL - EO    ALL - MO
    N               4920        4749        4589
    N (%)           100.00      96.52       93.27
    Mean            50.5536     41.1575     39.8202
    Median          40.0600     39.2000     38.7000
    Std. Deviation  184.1168    30.8262     26.5052
    Minimum         -559.60     -75.00      -34.30
    Maximum         11859.77    168.03      113.20
    Q1              20.3550     20.1300     20.2450
    Q3              57.4950     56.1300     55.0000

    Table 2: Statistics on the overhead (in ms) of distributing one task using JavaSpaces, for the complete data set (ALL), the complete data set with extreme outliers discarded (ALL - EO) and the complete data set with mild outliers discarded (ALL - MO)
The graph for the overhead of distributing one task using JavaSpaces is shown in figure 3. The mean is marked with a solid line at 41.1575 ms. The graph is divided into four columns grouped by the number of workers: from left to right we have first the data points based on 1 worker, then 2 workers, etc.
Figure 3: Graph representing the overhead using JavaSpaces for distributing one task. The X axis represents the data points, the Y axis the overhead (in ms)
There are still a few problems with this visualization. As mentioned earlier, the graph does not represent data measured with more than 4 workers. Another, more important, problem is the fact that some data points in the graph have negative values, which distort the mean. The reason for these negative values lies in the proposed theoretical model: the first, simplest formula for Bundchen was used, which supposes all tasks to have the same duration (the mean of the tasklist, not the theoretical mean). In the calculations with 3 or more workers, the situation in figure 4 may occur.
Figure 4: The problem with using formula (1) of Bundchen for 3 or more workers. The upper half represents the way Bundchen distributes the tasks, the lower half the way it could be distributed by JavaSpaces
Suppose 4 tasks with lengths 3, 4, 3 and 2 respectively must be distributed over 3 workers. The mean of this tasklist is clearly 3, so Bundchen will theoretically distribute 4 tasks of length 3. Thus, situations may occur in which JavaSpaces distributes the actual tasks in such a way that it outperforms this simplified version of Bundchen, yielding a negative overhead. Using the complete Bundchen algorithm, which has not been fully implemented yet, should solve this problem.

CONCLUSION
In this paper a theoretical model, called Bundchen, was introduced. It should provide a better way to compare different distributed platforms on their performance for computational problems, ranging from simple to very complex ones. In the near future, the implementation of Bundchen's algorithm will be completed. The XML tasklist generator, which is used to generate tasklists in XML format using a given distribution, will also be extended with more functionality. A second XML file format will be developed to store the complete result set generated during tests. This will allow the use of an XML database for both tasklists and result sets. When more detailed data sets become available, more thorough analysis tools, such as analysis of variance (ANOVA), will be used to obtain better predictions of the overhead.
REFERENCES
[BGMR96] Veeravalli Bharadwaj, Debasish Ghose, Venkataraman Mani, and Thomas G. Robertazzi. Scheduling Divisible Loads in Parallel and Distributed Systems. IEEE Computer Society Press, 1996.
[Edw01] W. Keith Edwards. Core Jini, Second Edition. Prentice Hall, 2001.
[FHA99] E. Freeman, S. Hupfer, and K. Arnold. JavaSpaces Principles, Patterns, and Practice. Addison Wesley, 1999.
[GES99] William Gropp, Ewing Lusk, and Anthony Skjellum. Using MPI. The MIT Press, second edition, 1999.
[Gig] GigaSpaces. URL: http://www.j-spaces.com.
[HMT00] David C. Hoaglin, Frederick Mosteller, and John W. Tukey. Understanding Robust and Exploratory Data Analysis. Wiley, 2000.
[Hup00] Susanne Hupfer. The Nuts and Bolts of Compiling and Running JavaSpaces Programs. Technical report, Sun Microsystems, Inc., 2000.
[Jin] Jini. URL: http://www.jini.org.
[McL01] Brett McLaughlin. Java & XML. O'Reilly & Associates, Inc., 2001.
[MPI] MPICH. URL: http://www-unix.mcs.anl.gov/mpi/mpich/.
[NKNW96] John Neter, Michael H. Kutner, Christopher J. Nachtsheim, and William Wasserman. Applied Linear Statistical Models. WCB/McGraw-Hill, fourth edition, 1996.
[NZ01] Michael S. Noble and Stoyanka Zlateva. Scientific Computation with JavaSpaces. Technical report, Harvard-Smithsonian Center for Astrophysics and Boston University, 2001.
[NZ02] Michael S. Noble and Stoyanka Zlateva. Distributed Scientific Computation with JavaSpaces. Technical report, Boston University, 2002.
[SM99] Sun Microsystems, Inc. JavaSpaces: Innovative Java Technology that Simplifies Distributed Application Development. Technical report, Sun Microsystems, Inc., 1999.
[TSp] TSpaces. URL: http://www.almaden.ibm.com/cs/TSpaces/.
[Wei02] Neil A. Weiss. Introductory Statistics. Addison Wesley, sixth edition, 2002.
[Wyc98] P. Wyckoff. TSpaces. Technical report, IBM Almaden Research Center, 1998.
[XML] XML. URL: http://www.xml.com.
[Yal] Yale Linda Group. URL: http://www.cs.yale.edu/Linda/linda.html.