Load Distribution in a CORBA Environment

T. Barth, G. Flender, B. Freisleben, F. Thilo
University of Siegen, Hölderlinstr. 3, D-57068 Siegen, Germany
{barth, [email protected], {freisleb, [email protected]

Abstract

The design and implementation of a CORBA load distribution service for distributed scientific computing applications running in a network of workstations is described. The proposed approach is based on integrating load distribution into the CORBA naming service, which in turn relies on information provided by the underlying WINNER resource management system developed for typical networked Unix workstation environments. The necessary extensions to the naming service, the WINNER features for collecting load information, and the placement decisions are described. A prototypical implementation of the complete system is presented, and performance results obtained for the parallel optimization of a mathematical test function are discussed.
1. Introduction

Distributed software architectures based on the Common Object Request Broker Architecture (CORBA) [15] have started to offer real-life production solutions to interoperability problems in various business applications, most notably in the banking and financial areas. In contrast, most of today's applications for distributed scientific computing traditionally use message passing as the means of communication between processes residing on the nodes of a dedicated parallel multiprocessor architecture. Message passing is strongly related to the way communication is realized in parallel hardware and is particularly adequate for applications where data is frequently exchanged between nodes. Examples are parallel algorithms for complex numerical computations, such as in computational fluid dynamics where essentially algebraic operations on large matrices are performed. The advent of networks of workstations (NOWs) as a cost-effective means for parallel computing and the advances of object-oriented software engineering methods have fostered
efforts to develop distributed object-oriented software infrastructures for performing scientific computing applications on NOWs and also over the WWW [1, 4, 11, 22]. Other computationally intensive engineering applications with different communication requirements, such as simulations and/or multidisciplinary optimization problems [7] typically arising in the automotive or aerospace industry, have further strengthened the need for a suitable infrastructure for distributed/parallel computing. The common properties of scientific computing applications are: (a) the code is mathematically rather sophisticated, has been developed over a long period of time, and embodies many thousands of man-years of expert knowledge that is almost impossible to transfer into a redesigned and reimplemented version of the software; and (b) the requirements on computation times and storage capacities are usually very high. From the software engineering point of view, these properties lead to several design aspects that must be considered when developing an adequate software infrastructure for dealing with these problems:
- To enable the reuse of "legacy code", a software design has to provide abstractions to wrap these codes and treat them as an integral part of the object-oriented design and implementation.

- The enormous demand for computation time can be met with parallel or distributed implementations running either on dedicated supercomputers or networked high-performance workstations; using networked workstations in a shared environment with other (console) users raises the demand for an optimal management of the available resources.
In Fig. 1, a proposal for a software system architecture for a distributed problem solving environment for engineering applications is presented. The main feature of such an environment is the integration of simulation and optimization software and the distributed solution of the particular computational problems involved.
Figure 1. Software architecture for an integrated problem solving environment for the solution of engineering problems.
The middleware layer provides (platform-independent) functionality to start distributed components of the system and for the communication between them. It can be realized using an object-oriented approach based on CORBA ([15], [13], [17]); alternatively, using an implementation of the non-object-oriented Message Passing Interface standard (MPI) [12] or the Parallel Virtual Machine (PVM) [18] is also possible. The layer above implements the interface management functions providing the basis for application-level communication between objects. This layer encapsulates the application-specific interface, whether it is file-based or a programming interface, and makes a common interface for data exchange available, e.g. by creation and/or transformation of files. Furthermore, synchronization between the components must be handled. The topmost layer provides a common interface for the complete system. Components for pre- and postprocessing have their own (graphical) user interfaces. Additionally, user interfaces, e.g. for the convenient formulation of an optimization problem (by a graphical selection of nodes in a finite element model for constraints or decision variables, or the selection of a predefined objective function), and sophisticated visualization techniques for the results of the optimization are made available. As a whole, this layer should present the components of a coupled optimization and simulation system in a consistent manner. It also initiates and controls the data flow between distributed components: from model generation in preprocessing, through simulation and optimization, to the visualization of optimization results in postprocessing. As already mentioned, the communication patterns between the distributed components of a system are important for selecting the most suitable middleware. If the amount of communication in an application is much less than the amount of computation, then the overhead introduced by
CORBA compared to low-level message passing is reasonable. For example, quite often a single simulation during an optimization run may take several minutes or hours to complete, such that the communication costs for passing problem-specific data (decision variables, constraint values etc.) are negligible. Furthermore, trying to use a message passing library like PVM or MPI in conjunction with a distributed object system is inconvenient and error-prone. Using message passing implies the "simulation" of method calls on remote objects: the sender has to pass a token to identify a method, then pack the arguments and send them to a specified node where the method should be executed. On the receiving node, the token has to be mapped to an object's method and the parameters of the method call must be unpacked. Adding methods to the interface of an object or changing the prototype of an existing one therefore entails reimplementing the whole packing and unpacking functionality on both the sender and the receiver side. In the regular case of using CORBA's static invocation interface (SII), the complete procedure of packing and unpacking parameters for a method call is unnecessary, since it is handled by the ORB. When using the dynamic invocation interface (DII), e.g. for asynchronous method calls without using multiple threads, the user is responsible for packing the parameters on the client side, much as with PVM; but in contrast to PVM, at least the unpacking of parameters on the server side is unnecessary.
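To make this contrast concrete, the following sketch (our illustration, not code from the system described here) juxtaposes hand-written PVM marshalling with the corresponding stub-based CORBA call. The PVM functions are the library's actual API; the method token, the tag, worker_tid and the solve operation are hypothetical:

    #include <pvm3.h>

    // Hypothetical remote "solve" call expressed as PVM message passing:
    // the sender must identify the method by a token and marshal the
    // arguments by hand; the receiver has to unpack them symmetrically.
    const int MSG_SOLVE = 1;            // hand-assigned method token/tag

    void call_via_pvm(int worker_tid, double* x, int n) {
        int token = MSG_SOLVE;
        pvm_initsend(PvmDataDefault);   // fresh send buffer
        pvm_pkint(&token, 1, 1);        // which "method" to run
        pvm_pkdouble(x, n, 1);          // the actual arguments
        pvm_send(worker_tid, MSG_SOLVE);
    }

    // With CORBA's SII, the IDL-generated stub performs all of the
    // above marshalling implicitly:
    //
    //     worker->solve(x_seq);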
Thus, using object-oriented middleware such as CORBA for the implementation of a complex object-oriented software system for distributed scientific computing has several advantages. Furthermore, in contrast to other middleware platforms such as Legion [10], CORBA as the middleware "standard" (a) is likely to be available for future hardware/operating system environments, (b) will be known to an ever growing number of programmers, and (c) is intended to be an integral part of current Internet-based metacomputing efforts such as DATORR ("desktop access to remote resources") [6]. This implies that services that are essential for distributed scientific computing must also be realized as CORBA services. In this paper, the design and implementation of a load distribution service for CORBA in a NOW environment is presented. Load distribution is one of the most important features in NOWs, since the console users of workstations in a network typically do not fully utilize the processing capabilities of their machines (e.g. while editing text, reading mail, browsing the web, or being physically absent), and thus the idle times of workstations are frequently as high as 95% [9]. We discuss different approaches to integrating a load distribution mechanism into an object request broker (ORB). Our approach is based on integrating it into the CORBA naming service [14], which in turn relies on information provided by the underlying WINNER resource management system ([2], [3]) that we have developed for typical Unix NOW environments. The necessary extensions to the naming service and the WINNER features for the collection of load information from workstations and the placement decisions are described. A prototypical implementation of the complete system carried out within our research group is used to distribute the computations of a decomposed mathematical optimization problem in our NOW environment. Finally, performance results are presented. The paper is organized as follows. Section 2 discusses possible approaches to integrating load distribution into CORBA and presents our solution based on a modified naming service and WINNER. In Section 3, the relevant features of WINNER are described. Section 4 presents performance results. Section 5 concludes the paper and discusses areas for future research.
2. Integrating load distribution into CORBA

In general, CORBA applications consist of a set of clients (application objects) requesting a set of services. These services can either be other application objects within a distributed application, or commonly available services (object services) providing e.g. name resolution (naming service) or object persistence (persistence service). There are different approaches to integrating load distribution functionality into a CORBA environment:
- Implementation of an explicit service (e.g. a "trader" [19]) which returns an object reference for the requested service on an available host (centralized load distribution strategy) or references to all available service objects. In the latter case, the client has to evaluate the load information for all of the returned references and make a selection by itself (decentralized load distribution strategy).
- Integration of the load distribution mechanism into the ORB itself, e.g. by replacing the default locator with a locator incorporating a load distribution strategy [8], or using an IDL-level approach [21].
The drawbacks of these approaches are that either the source code of clients has to be changed (as in the first approach) or that load distribution depends on a specific ORB implementation or IDL compiler and thus cannot be used with other ORBs (as in the second approach). To integrate load distribution transparently into a CORBA environment, our proposal is based on integrating it into the naming service. This ensures transparency for the client side and allows the reuse of the load distribution naming service with any other CORBA-compliant ORB implementation. Almost every CORBA-based application makes use of the naming service; for applications that do not, load distribution would have to be implemented as an explicit service instead.

In Fig. 2, the proposed approach is illustrated graphically. The load information collected by the WINNER system manager (described in Section 3) from the node managers is made available to the naming service. This load information represents system load, i.e. data like CPU utilization collected by the host operating system. Thus, the load information reflects CORBA-induced activity as well as load caused by other processes running on the same host. The system manager has an interface to query the status of each node manager and functionality to determine the machine with the currently minimum load. Requests from application objects to the naming service are resolved using this load information for the selection of an appropriate server.

Figure 2. Schema for the integration of load distribution in a naming service.

The implementation of a naming service which provides transparent load distribution for any client application is described in the following. The naming service is not an integral part of a CORBA ORB but is always implemented as a CORBA service. The OMG specifies the interface of a naming service without making assumptions about implementation details of the service. Therefore, every ORB can interoperate with a new naming service as long as it complies with the OMG specification. Using the naming service, a client can obtain a reference to an object that is associated with a name. This object needs to be bound to a name by the server which implements it. Normally, a server creates a binding of an object to a name by calling the bind method of the naming service. As described in the CORBA specification [15], a naming service normally allows only one object to be bound to a particular name in a context at the same time. If the name is already bound in a given context, the operation raises the AlreadyBound exception.

In our approach, the naming service implementation was altered. The bind operation no longer raises an exception, but allows binding several objects to the same name. A call to resolve checks the load status of all machines with an object bound to the given name. It then selects the one with the minimum load and returns it to the client. The following two excerpts from the source code of the altered naming service show the different ways of handling multiple services registered under the same name. The original naming service checks for a name collision and throws the mentioned exception:

    ob = resolve_simple(n);
    if (!rebind)
        throw AlreadyBound();
The check for a name collision is now restricted to the registration of a new context and is not applied when registering a service. Therefore, the type t of the service or context to be registered is checked:

    if (t == ncontext) {
        ob = resolve_simple(n);
        if (!rebind)
            throw AlreadyBound();
    }
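For illustration, a server could register its object with the altered service as follows. This is a minimal sketch using the standard CosNaming C++ mapping (ORB initialization omitted); the name "RosenbrockWorker" as well as the references context and worker_obj are hypothetical, not taken from the prototype:

    // Build a one-component name; the id/kind values are illustrative.
    CosNaming::Name name;
    name.length(1);
    name[0].id   = CORBA::string_dup("RosenbrockWorker");
    name[0].kind = CORBA::string_dup("");

    // With the altered naming service, this call no longer raises
    // AlreadyBound, so every host offering the service can register
    // its own object under the same name.
    context->bind(name, worker_obj);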
If such an altered naming service is used to obtain the reference to an object, the application remains independent of the actual ORB implementation. To distribute the load, the load information provided by WINNER needs to be evaluated. The following pseudo code depicts the selection mechanism of the best host for the requested service inside the naming service:
    requested_name_found = false
    for all objects in context
        if object.name == requested_name
            requested_name_found = true
            add object to objectList
    if requested_name_found
        object_found = false
        repeat
            best_host = best host for objectList according to WINNER
            for all objects in objectList
                if object resides on best_host
                    if object is reachable
                        object_found = true
                    else
                        remove object from objectList
        until object_found or objectList is empty
        if object_found
            return object
        else
            return first object found in context   (ignoring load)
    else
        throw exception NotFound
The first loop collects all objects registered under the name requested_name available through the naming service in a list objectList. If this list is not empty, a second list hostList is created containing the host names of the potential server machines providing the requested service; otherwise, the exception NotFound is raised. Via the interface to WINNER, the best host is selected from hostList, and it is checked whether the service is actually available on the particular machine. If this is not the case (e.g. due to a network error), the object residing on this host is deleted from the list objectList and the search is continued until an available server is found. If this is impossible, the first server found in the context is returned without considering its load status. Otherwise, the object residing on the selected host is returned. This approach supports the persistent activation mode for the server, since this activation strategy is available on every CORBA ORB and is independent of the underlying operating system. Other activation strategies depend on an implementation repository which in most cases is neither platform- nor ORB-independent. In many cases, a client obtains the object reference only once when starting the application; using the new naming service yields static load distribution for these applications. Dynamic load distribution can be achieved by repeated requests from the client to the naming service to obtain the most suitable host before each call to the requested service. With this primitive mechanism it is only possible to use stateless objects which do not need any synchronization: repeated calls to a single service have to be independent of each other. If it is necessary to synchronize the different objects that are registered with the naming service under the same name, an additional mechanism is needed. Thus, a mechanism to save and load the state of an object has been implemented to add fault tolerance to CORBA. With this mechanism it is possible to automatically switch to another object whenever the current object is no longer available.
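Dynamic load distribution as described above amounts to re-resolving the name before every invocation. The following client-side sketch assumes a hypothetical Compute interface with a stateless solve operation (names and types are ours, not taken from the prototype):

    // Standard CosNaming C++ mapping; ORB-specific setup omitted.
    CosNaming::Name name;
    name.length(1);
    name[0].id   = CORBA::string_dup("RosenbrockWorker");
    name[0].kind = CORBA::string_dup("");

    for (int i = 0; i < num_subproblems; ++i) {
        // Each resolve returns the object on the currently least
        // loaded host, so consecutive calls may be served by
        // different machines.
        CORBA::Object_var obj = naming_context->resolve(name);
        Compute_var worker = Compute::_narrow(obj);
        worker->solve(subproblem[i]);   // must be stateless, see above
    }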
3. The WINNER resource management system

As explained in the previous section, the naming service makes use of the capabilities of the WINNER resource management system to provide load distribution to CORBA applications. In this section, WINNER's system structure and its algorithms for choosing the best available workstation are described. WINNER has been designed for typical Unix NOW environments, consisting of a central server and several workstations. For a WINNER NOW, the server is required to provide shared file systems and user accounts for all connected workstations. The various tasks of the WINNER system are performed by three kinds of manager processes: system managers, node managers, and job managers (see Figure 3). Additionally, there are several user-interface tools, e.g. for status reports and for influencing the usage of a user's workstation.
Figure 3. Manager processes in a WINNER cluster.
The system manager is the central server process of a WINNER network. Its duties include (a) collecting the load information of all participating workstations, (b) managing the currently active jobs, and (c) deciding which hosts are assigned to a particular job request. This decision is mainly based on the estimated processing performance which each workstation would be able to provide to a newly started process. This estimate is basically computed by taking each workstation's base speed, its current load and the number of processors (in the case of multiple processors) into account. Furthermore, the presence of the workstation owner or additional resource constraints can reduce the set of hosts which are suitable for a given job. For CORBA load distribution, this set is further reduced to the set of workstations which can actually provide the requested CORBA service. On every host participating in a WINNER network, a node manager performs the tasks related to the machine it runs on. First of all, it periodically measures the host's utilization and reports it to the system manager. Furthermore, node managers are responsible for starting and controlling WINNER processes on their nodes, e.g. reducing process priorities whenever a console user shows up. The CORBA load distribution naming service does not directly use these capabilities. System and node managers run as daemon processes. In contrast, job managers are invoked by a WINNER user in order to execute a sequential or parallel job. Thus, job managers are part of WINNER's user interface. Their duties are (a) acquiring resources from the system manager, (b) starting processes on the acquired nodes via the respective node managers, and (c) controlling and possibly redirecting input and output of the started processes. For example, WINNER's simplest job manager is called wrun. It
works almost like the standard Unix command rsh, except that the job is automatically started on the currently best suited workstation in a network.

In the WINNER terminology, the CORBA naming service acts as a special kind of job manager. Unlike other job managers, the naming service uses WINNER only for querying the fastest available host. It does not contact the node managers itself to actually start the CORBA services on the corresponding workstation; instead, this is achieved via the normal CORBA infrastructure.

In order to perform suitable task placement decisions, the system manager must have an accurate global view of the processor speed and current utilization of the workstations. To achieve this, each node manager provides the system manager with the information related to its own workstation. At startup, each node manager performs a simple benchmark loop, evaluating the machine's speed (of a single processor) in integer operations, floating point calculations, and memory access. This benchmark's result is a single number proportional to the host's sequential performance, relative to every other workstation in a network. This value (called the base speed $S_b$) is reported to the system manager along with the node name, IP address, and other static data such as the amount of main memory and the number of CPUs. Afterwards, the node manager regularly queries several load characteristics of its local host and reports them to the system manager either if they differ significantly from the last set of data sent or after a certain time interval, indicating that the node manager is still "alive".

To measure the current workload of a machine, Unix kernels provide a so-called load average value, averaging the number of processes in the ready queue over certain time intervals, the shortest of which typically averages over the last 60 seconds (depending on which operating system is used). Due to this averaging procedure, these load values follow the real load situation only very slowly. To get more recent data, WINNER computes the current run queue length from two consecutively measured load-average values $a_t$. This results in a more up-to-date load average value $a$ which is comparable between different operating systems. Unfortunately, the load average values $a_t$ as reported by the Unix kernels may be misleadingly high. This can happen when many short-running processes are in the run queue. In that case, the CPU utilization can be observed more accurately using the fraction of time the processor(s) spent in the idle CPU state. In the single-processor case, the fraction of time spent in this state is reported by the operating system to be within the interval $[0, 1]$. In the case of a machine with $p$ processors, an interval of $[0, p]$ is used instead. However, once the CPU idle time is close to zero, it is impossible to distinguish whether only one process is fully utilizing the CPU or whether several processes (per processor) are present in the CPU run queue. Hence, both types of information have to be used by WINNER to determine processor utilization exactly.

Whenever a job manager requests a new node for its job, the system manager has to select the most appropriate machine. Besides checking for the presence of a console user and verifying static properties such as requested memory sizes, the system manager basically takes the currently available speed into account. Based on the workstation's base speed $S_b$, its current speed $S_c$ is calculated as $S_c = S_b / \lambda$, where $1/\lambda$ denotes the fraction of processing power available in the presence of the current workload. Assuming a constant load, $\lambda$ could be calculated as $\lambda = a + 1$ (where $a$ is the load average as computed by the node manager). This reflects the fact that after starting a new process, there are $a + 1$ active processes sharing the CPU. As explained above, computing the available processor speed based on the load average may be inaccurate. Hence, for achieving more precise values, WINNER's system manager instead calculates $\lambda$ using $t_i$ (the fraction of time the processor was idle): $\lambda = 2 - t_i$, yielding values near 1 for high percentages of idle time and near 2 for high CPU usage. For small values of $t_i$, it can be assumed that the workload consists of more than one process. Then, the load average value should be used for calculating $\lambda$ in order to reflect the higher load. An empirically determined threshold of $t_i = 0.15$ is used for switching between the two cases. $\lambda$ is hence computed as follows:
$$\lambda = \begin{cases} a + 1, & t_i < 0.15 \\ 2 - t_i, & t_i \ge 0.15 \end{cases}$$
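The paper does not state how WINNER inverts the kernel's averaging to obtain the run-queue estimate $a$ used above. As an illustration only, assuming the kernel maintains the usual exponentially decaying average with sampling interval $\Delta t$ and decay constant $\tau$, the current run queue length $n$ could be recovered from two consecutive samples:

$$a_t = a_{t-1}\, e^{-\Delta t/\tau} + n \left(1 - e^{-\Delta t/\tau}\right) \quad\Longrightarrow\quad n = \frac{a_t - a_{t-1}\, e^{-\Delta t/\tau}}{1 - e^{-\Delta t/\tau}},$$

which then serves as the up-to-date load value $a$.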
For seamlessly integrating symmetric multiprocessor machines with shared memory (SMP) into WINNER networks, the system manager has to adapt its computation of $S_c$ accordingly. There are two basic differences that have to be taken into account. First, on a machine with $p$ processors, the fraction of CPU idle time is reported in the interval $[0, p]$. Second, the number of running processes (constituting the load average $a$) will be serviced by all $p$ processors. Assuming an ideal scheduler (in the operating system), their load will be equally distributed across all $p$ processors. Nevertheless, a single process can only be served by a single processor. Hence, whenever there are fewer processes than processors, the available speed must be derived from the capacity of only one processor. This situation changes in the case of multithreading. But since it is impossible to predict whether a given program binary will use multiple threads of control, a resource "multiprocessor machine" has to be requested by the user explicitly in this case. Although the scheme presented here does not help to automatically select multiprocessor workstations for multithreaded applications, the workload generated by multiple threads will still be observed correctly.

The computation of $S_c = S_b / \lambda$ is performed analogously to the single-processor case, with the exception that $\lambda$ is computed via an intermediate value $\lambda'$. For $t_i \ge 0.15$, $\lambda'$ is computed as $\lambda' = 1 + p - t_i$. For $t_i < 0.15$, $\lambda' = a + 1$ as with a single processor. Finally, $\lambda$ is computed as $\lambda = \lambda'/p$, while excluding values $\lambda < 1$ whenever there are fewer processes than processors. The computation of $\lambda'$ and $\lambda$ can be summarized as follows:

$$\lambda' = \begin{cases} a + 1, & t_i < 0.15 \\ 1 + p - t_i, & t_i \ge 0.15 \end{cases} \qquad\qquad \lambda = \begin{cases} 1, & \lambda' < p \\ \lambda'/p, & \lambda' \ge p \end{cases}$$

It is easy to see that the computation of $\lambda$ coincides with its single-processor counterpart for $p = 1$. Consequently, WINNER always uses this enhanced scheme for computing $S_c$.
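The speed estimate can be transcribed directly into code. The following function is our own rendering of the above formulas, not WINNER source code; the names and the uniform treatment of $p = 1$ are our choices:

    #include <algorithm>

    // S_c = S_b / lambda, with lambda derived from the load average a,
    // the idle fraction t_i in [0, p], and the processor count p.
    double current_speed(double base_speed,   // S_b from the benchmark
                         double load_avg,     // a, run-queue estimate
                         double idle,         // t_i
                         int    p) {          // number of processors
        // Intermediate value lambda' (for p == 1 this reduces to the
        // single-processor formula).
        double lp = (idle < 0.15) ? load_avg + 1.0
                                  : 1.0 + p - idle;
        // A single process cannot use more than one processor, so the
        // dilution factor lambda never drops below 1.
        double lambda = std::max(1.0, lp / p);
        return base_speed / lambda;
    }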
Figure 4. WINNER status tool, displaying relative speeds of workstations.

Figure 4 shows the output of WINNER's status tool, which illustrates the meaning of the maximum speed $S_b$ and the current speed $S_c$. On the left side of the status window, the names of the machines are listed. Available machines are printed in black, whereas unavailable workstations (for example those with active console users) are printed in light grey. The relative speed of a machine is indicated by the size of its coloured bar. The total size of a bar corresponds to the statically measured machine speed. The black fraction of the bar indicates the currently available fraction of the processor speed ($1/\lambda$); the bar's blue fraction (printed in grey here) is the fraction currently in use by other processes. In this snapshot, almost all available CPU power is usable for WINNER jobs.

A problem arises when the system manager receives several requests for available hosts within a short time interval. Without further measures, the system manager would choose the same node for all these requests, resulting in the fastest host being swamped with new tasks. This is due to the fact that it takes some seconds for the selected workstation's load to change and some additional time for this information to be reported back to the system manager. In order to avoid this problem, a bias procedure was introduced into the system manager's calculations which assigns a temporary penalty to recently allocated nodes. This penalty is withdrawn when the system manager receives new load information from the respective node.
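A minimal sketch of such a bias, under the assumption of a simple per-node record (the data structure and the penalty factor are our illustration; the paper does not specify them):

    struct Node {
        double current_speed;      // S_c as computed above
        bool   penalized = false;
    };

    const double PENALTY = 0.5;    // assumed damping factor

    // Called when the node is handed out to a request: make the node
    // temporarily look slower so that immediate follow-up requests
    // prefer other hosts.
    void on_allocation(Node& n) {
        n.current_speed *= PENALTY;
        n.penalized = true;
    }

    // Called when fresh load data arrives from the node manager: the
    // penalty is withdrawn in favour of measured values.
    void on_load_report(Node& n, double measured_speed) {
        n.current_speed = measured_speed;
        n.penalized = false;
    }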
4. Experimental results

To investigate the benefits of an integrated load distribution mechanism in CORBA, a test case from mathematical optimization was taken. The well-known Rosenbrock test function is widely used for benchmarking optimization algorithms because of its special mathematical properties. For the general n-dimensional case it is defined as follows [20]:
$$\min f(x) = \sum_{i=1}^{n-1} \left[\, 100\, (x_{i+1} - x_i^2)^2 + (x_i - 1)^2 \,\right], \qquad \text{with } -3 \le x_i \le 3,\ \forall i = 1, \ldots, n.$$
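For reference, the test function transcribes directly into C++ (our code, shown only to make the definition concrete):

    #include <cstddef>
    #include <vector>

    // n-dimensional Rosenbrock function as defined above.
    double rosenbrock(const std::vector<double>& x) {
        double f = 0.0;
        for (std::size_t i = 0; i + 1 < x.size(); ++i) {
            const double t1 = x[i + 1] - x[i] * x[i];
            const double t2 = x[i] - 1.0;
            f += 100.0 * t1 * t1 + t2 * t2;
        }
        return f;
    }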
In our experiments, the function is used to demonstrate the benefits of an adequate placement of computationally expensive processes on the nodes of a NOW. To compute the function in parallel, a decomposed formulation of the Rosenbrock function was used. In the decomposed formulation, several (sub-)problems of smaller dimension than the original n-dimensional problem are solved by workers, and the subproblems are then combined for the solution of the original problem in a manager. In the case of the Rosenbrock function, the expression $100 (x_{i+1} - x_i^2)^2$ prevents the independent solution of a subset of the sum, because the indices $i$ and $i + 1$ occur in every term of the sum. Therefore, a manager/worker scheme can be applied, in which the worker processes compute a solution for decision variables which are non-decision variables in the manager process and vice versa. These worker problems can be computed in parallel, alternating with the computation of the manager process. In Fig. 5 the results of the different test scenarios are compared. All test cases were computed using an implementation of the Complex Box algorithm [5]. Both manager and worker problems were solved using the same algorithm
with different parameters: the maximal number of iterations for the worker problems was set to 10000, for the manager problem it was set to 2000. This setting reduces the sequential part of the solution strategy by decreasing the maximal number of iterations, and thereby the computation time, of the manager problem. All other parameters of the Complex Box algorithm were identical. Different scenarios were used to show the benefit of load distribution. The computation times for the test problems using the original naming service were compared with the computation times using WINNER when computing the manager/worker schemes on a network of up to 10 workstations (Pentium II/300 MHz under Linux 2.1). The naming service was implemented in C++. The ORB implementation used was omniORB 2.7.1 [16]. For the comparison of the two implementations of the naming service, a background load (a long-running optimization of a 500-dimensional Rosenbrock function) was generated on 2, 4, 6 or 8 hosts. Additionally, the computations were performed with no background load.
Figure 5. Different test cases of a decomposed 30- and 100-dimensional Rosenbrock function with 3 and 7 worker problems under different load situations. (Axes: runtime in seconds vs. number of hosts with background load; curves: CORBA 100/7, CORBA/WINNER 100/7, CORBA 30/3, CORBA/WINNER 30/3.)
The two lower curves (CORBA 30/3, CORBA/WINNER 30/3) show the computation times for a 30-dimensional Rosenbrock function with 3 worker problems (with problem dimensions 10, 9 and 9) and a 2-dimensional manager problem. In this scenario, 6 workstations were available for the 4 processes. The effect of load distribution is obvious when 2 hosts had background load: the selection of hosts with the new naming service avoided these hosts, and the computation time was the same as in the scenario with no background load. When 4 workstations were busy, WINNER had to select 2 of these hosts and the total computation time is only marginally less than the time achieved by the original naming service. Differences in computation times between the two naming services are caused by the selection of hosts for manager and worker processes. Firstly, the worker processes compute different optimization problems and hence need different computation times; placing a longer running worker process on a host with background load causes a slightly longer total computation time. If the client application can predict the computation times for the different workers, it should allocate a host for the longer running worker first. Secondly, it makes almost no difference in terms of total computation time whether one, two or even three workers are placed on hosts with background load: the total time conforms to the longest running worker process because of the necessary synchronization with the manager process. These effects of process placement depend on the properties of the distributed application. These test cases show the benefit of load distribution even for the numerically expensive processes typical for scientific computing.

The two upper curves (CORBA 100/7, CORBA/WINNER 100/7) compare the computation times for a 100-dimensional Rosenbrock function with 7 worker problems (3 workers with dimension 14, 4 workers with dimension 13) and a 6-dimensional manager problem. The resulting 8 processes had to be distributed among 10 workstations. The benefit of load distribution is again most obvious when WINNER had the possibility to select idle workstations. With increasing background load the advantage diminishes, because both implementations of the naming service are forced to select services on hosts with background load. To summarize, the benefit of load distribution for the test cases mentioned above can be estimated at ca. 40% in the best case. Even in the worst case it yields at least the same results as the unmodified naming service; this is the case if all available hosts are already in use and the load distribution mechanism has no possibility to explicitly select idle hosts. The mathematical properties of the test cases as mentioned above result in an average reduction of computation time of about 15%.

5. Conclusions
In this paper, the design and implementation of a load distribution service for CORBA in a NOW environment suitable for distributed scientific computing was presented. The proposed approach was based on integrating load distribution into the CORBA naming service, which in turn relied on information provided by the underlying WINNER resource management system developed for typical Unix NOW environments. The necessary extensions to the naming service, the WINNER features for the collection of load information and the placement decisions were described. A prototypical implementation of the complete system was described, and performance results obtained for the parallel
optimization of a mathematical test function were presented. There are several areas of future work. Among them are: (a) improving and stabilizing the prototype implementation of the proposed CORBA load distribution service, (b) evaluating its benefits in real-life engineering applications, and (c) extending the WINNER load measurement and process placement features to wide-area networks to enable CORBA-based distributed/parallel metacomputing over the WWW.
References
[1] Anderson, T. E., Culler, D. E., Patterson, D. A. and the NOW Team, A Case for NOW (Networks of Workstations), IEEE Micro, 15(1), pp. 54–64, 1995.

[2] Arndt, O., Freisleben, B., Kielmann, T., Thilo, F., Dynamic Load Distribution with the WINNER System, In: Proceedings of the Workshop Anwendungsbezogene Lastverteilung (ALV'98), Munich, pp. 77–88, SFB 342/01/98, 1998.

[3] Arndt, O., Freisleben, B., Kielmann, T., Thilo, F., Scheduling Parallel Applications in Networks of Mixed Uniprocessor/Multiprocessor Workstations, In: Proceedings of Parallel and Distributed Computing Systems (PDCS'98), Chicago, pp. 190–197, 1998.

[4] Becker, D. J. et al., BEOWULF: A Parallel Workstation for Scientific Computation, In: Proceedings of the 1995 International Conference on Parallel Processing (ICPP), pp. 11–14, 1995.

[5] Box, M., A New Method of Constrained Optimization and a Comparison with Other Methods, The Computer Journal, Vol. 8, pp. 42–52, 1965.

[6] DATORR: Desktop Access to Remote Resources, http://www-fp.mcs.anl.gov/~gregor/datorr/

[7] Giesing, J. P., Barthelemy, J. F., A Summary of Industry MDO Applications and Needs, 7th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, St. Louis, 1998.

[8] Gebauer, C., Load Balancer LB – A CORBA Component for Load Balancing, Diploma Thesis, University of Frankfurt, http://www.vsb.cs.uni-frankfurt.de/, 1997.

[9] Krueger, P., Chawla, R., The Stealth Distributed Scheduler, In: Proceedings of the 11th IEEE International Conference on Distributed Computing Systems, pp. 336–343, 1991.

[10] Grimshaw, A. S., Wulf, W. A., The Legion Vision of a Worldwide Computer, Communications of the ACM, Vol. 40, No. 1, January 1997.

[11] Livny, M., Raman, R., High-Throughput Resource Management, In: Foster, I., Kesselman, C. (eds.), The Grid: Blueprint for a New Computing Infrastructure, pp. 311–337, Morgan Kaufmann, 1998.

[12] Gropp, W., Lusk, E., Skjellum, A., Using MPI: Portable Parallel Programming with the Message-Passing Interface, MIT Press, 1994.

[13] Mowbray, T. J., Zahavi, R., The Essential CORBA: Systems Integration Using Distributed Objects, John Wiley & Sons, 1995.

[14] CORBAservices: Common Object Services Specification, Chapter 3: Naming Service Specification, Object Management Group, ftp://www.omg.org/pub/docs/formal/97-12-10.ps, 1997.

[15] The Common Object Request Broker: Architecture and Specification, Revision 2.2, Object Management Group, ftp://ftp.omg.org/pub/docs/formal/98-07-01.ps, 1998.

[16] omniORB – A Free Lightweight High-Performance CORBA 2 Compliant ORB, AT&T Laboratories Cambridge, http://www.uk.research.att.com/omniORB/omniORB.html, 1998.

[17] Pope, A., The CORBA Reference Guide: Understanding the Common Object Request Broker Architecture, Addison-Wesley, 1998.

[18] Geist, A. et al., PVM: Parallel Virtual Machine – A Users' Guide and Tutorial for Network Parallel Computing, MIT Press, 1994.

[19] Rackl, G., Load Distribution for CORBA Environments, Diploma Thesis, University of Munich, http://wwwbode.informatik.tu-muenchen.de/~rackl/DA/da.html, 1997.

[20] Rosenbrock, H. H., An Automatic Method for Finding the Greatest or Least Value of a Function, The Computer Journal, Vol. 3, pp. 175–184, 1960.

[21] Schiemann, B., Borrmann, L., A New Approach for Load Balancing in High-Performance Decision Support Systems, Future Generation Computer Systems, Vol. 12, Issue 5, pp. 345–355, April 1997.

[22] Warren, M. S. et al., Parallel Supercomputing with Commodity Components, In: Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'97), pp. 1372–1381, 1997.