G2 Remoting: A Cycle Stealing Framework based on .NET Remoting

Wayne Kelly and Lars Frische
Centre for Information Technology Innovation
Queensland University of Technology, Australia

Abstract. This paper presents G2 Remoting, a generic remote-object based framework for creating cycle-stealing parallel applications. The framework is built using the extensibility features of the .NET Remoting framework. The G2 Remoting framework enables programmers to program in a normal .NET Remoting fashion, without being concerned about the changing set of volunteer machines on which the computation is actually performed. A dedicated server machine is used as the physical manifestation of a virtual parallel machine on which remote objects logically reside from the programmer's perspective. The remote objects are not, however, physically created on this server machine; they come into being, and have their methods actually executed, on the various volunteer machines. Nor do the remote objects permanently reside on any given volunteer machine; they transparently move from one volunteer to another as necessary during their lifetime.

1 Introduction

Over the last 10 years much research has been carried out on how to build systems that exploit the idle cycles of networked workstations and personal computers (PCs). Many of these systems [3, 6, 14] have been developed for the Java platform, often making use of Java RMI. The new Microsoft .NET platform offers a mechanism similar to Java RMI called the .NET Remoting framework. Like Java RMI, it allows client applications to create objects on remote machines (client activated objects) and to invoke methods on these objects just as easily as creating and invoking methods on local objects. Because Java RMI was developed after the Java Virtual Machine was designed, it requires programmers to use a tool to generate proxy classes for accessing remote objects. The .NET Remoting framework, being an integral part of the .NET runtime environment, dynamically creates transparent proxy classes at runtime as needed, without the programmer needing to be aware of them.

One advantage of using the .NET Remoting framework is that programmers are free to implement their applications in any of the wide variety of programming languages available for the .NET platform. The corresponding disadvantage is, of course, the loss of the multiplatform nature of Java. For middleware developers, however, the primary advantage of the .NET Remoting framework over Java RMI is that it is extremely extensible. In addition to making use of standard communication channels such as TCP and HTTP, programmers can develop their own custom channels or customize existing channels by inserting additional message sinks that filter outgoing and incoming messages. In this manner facilities such as encryption or message redirection can be easily provided.

We have developed a cycle stealing framework, which we call G2 Remoting, that is based on the .NET Remoting framework. The G2 Remoting framework enables programmers to program in a normal .NET Remoting fashion, without being concerned about the changing set of volunteer machines on which the computation is actually performed. By making use of asynchronous method calls to remote objects, parallel programs can be created using a wide variety of parallel programming paradigms.

The remainder of this paper is structured as follows. Section 2 gives a brief overview of .NET Remoting for those not already familiar with it. Section 3 explains how to use the G2 Remoting framework from a programmer's perspective. Section 4 describes how we implemented the G2 Remoting framework on top of .NET Remoting. Section 5 explains how G2 Remoting applications are dynamically deployed. Section 6 describes an example application, a version of the travelling salesman problem (TSP) based on a genetic algorithm. Section 7 compares our system with related work before concluding in Section 8.

2 .NET Remoting

In the .NET Remoting framework, client activated remote objects must be of a type derived from class MarshalByRefObject. In addition, there must be an entry in the client application's remoting configuration file that specifies the URL of the server machine on which objects of that class should be activated (see Figures 1 & 2).

```csharp
using System;
using System.Runtime.Remoting;

class Foo : MarshalByRefObject
{
    public void Bar() { /*...*/ }

    public static void Main()
    {
        // Load the channel and activation settings from the configuration file (Fig. 2).
        RemotingConfiguration.Configure("Client.exe.config");
        Foo f = new Foo(); // client activated remote object
        f.Bar();           // remote method invocation
    }
}
```

Fig. 1: Sample .NET remoting code

[Figure 2: the configuration file listing was not recovered. Its annotations read: "Protocol of server URL determines channel used", "Adding message sink to customize http channel", "Register client activated remote type".]

Fig. 2: .NET remoting configuration file (Client.exe.config)

The configuration file also includes information about the communication channels that will be used. The protocol prefix of each server URL determines the channel that will be used for that class. Entirely new protocols can be introduced by implementing new channel classes, or existing protocols can be customized by implementing additional message sinks that messages will pass through.

Remote activation and remote method calls cause the transparent proxy class to translate these calls into activation messages and method call messages respectively. These messages pass through a series of message sinks on the client side, the last of which is responsible for sending the message over the physical network to the appropriate server machine using the communication protocol of the specified channel (see Figure 3). Once on the server side, messages again pass through a series of message sinks before arriving at the actual object. There the framework converts a method call message into an actual method call, or an activation message into the construction of the object. The result of the method call (or a reference to the remote object in the case of an activation message) is then passed back in the reverse direction through the chain of message sinks before arriving at the transparent proxy on the client, where it provides the actual result of the original method call. Any of the message sinks in these chains can transform or redirect messages in whatever manner they please.

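[Figure 3 diagram: on the client side, the Proxy passes messages through Message Sink 1, Message Sink 2, ..., Message Sink n and a transport layer sink onto the network; on the server side, a transport layer sink passes them through a corresponding series of message sinks to the Remote Object.]

Fig. 3: Message Sink Chains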
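The sink chains of Figure 3 can be modelled in a few lines. The following is a minimal sketch in Python, not the .NET Remoting API: the class names (`Sink`, `TaggingSink`, `TransportSink`, `DispatchSink`) and the `via` header are assumptions introduced for illustration, and the network hop is replaced by a direct call into the server-side chain.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Message:
    """A method call message as it travels through a sink chain."""
    method: str
    args: tuple
    headers: Dict[str, str] = field(default_factory=dict)


class Sink:
    """One link in a chain: may transform a message, then forwards it."""

    def __init__(self, next_sink: Optional["Sink"] = None):
        self.next_sink = next_sink

    def process(self, msg: Message):
        # Forward the message; the return value is the reply travelling back.
        return self.next_sink.process(msg)


class TaggingSink(Sink):
    """Hypothetical filter sink that records its name on each message."""

    def __init__(self, name: str):
        super().__init__()
        self.name = name

    def process(self, msg: Message):
        msg.headers["via"] = msg.headers.get("via", "") + self.name + ";"
        return self.next_sink.process(msg)


class TransportSink(Sink):
    """Last client-side sink: stands in for the physical network hop."""

    def __init__(self, server_entry: Sink):
        super().__init__()
        self.server_entry = server_entry

    def process(self, msg: Message):
        return self.server_entry.process(msg)


class DispatchSink(Sink):
    """Last server-side sink: converts the message into an actual call."""

    def __init__(self, target):
        super().__init__()
        self.target = target

    def process(self, msg: Message):
        return getattr(self.target, msg.method)(*msg.args)


def build_chain(sinks, terminal):
    """Link the sinks in order and end the chain with `terminal`."""
    head = terminal
    for sink in reversed(sinks):
        sink.next_sink = head
        head = sink
    return head


class Adder:
    def add(self, a, b):
        return a + b


server_chain = build_chain([TaggingSink("server-1")], DispatchSink(Adder()))
client_chain = build_chain([TaggingSink("client-1"), TaggingSink("client-2")],
                           TransportSink(server_chain))

request = Message("add", (2, 3))
result = client_chain.process(request)
```

Because every sink sees both the outgoing message and the returning reply, any one of them could encrypt, log, or redirect traffic without the caller or the target object being aware of it.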

To remotely activate an object, all the client programmer need do is to use the new operator to create an instance of a class defined in the configuration file as a remotely activated class. Similarly, invoking a method on a remote object is as simple as invoking that method using the transparent proxy. Transparent proxy objects can be passed as parameters to other remote objects which can then use them to remotely invoke methods on the corresponding remote object themselves.
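The role of the transparent proxy can be illustrated with a small model. This is a hedged sketch in Python rather than the .NET runtime mechanism: `TransparentProxy`, `InProcessServer`, the dictionary message layout and the `/objects/<n>` URI format are all assumptions made for the example.

```python
class TransparentProxy:
    """Turns ordinary-looking method calls into messages (illustrative only)."""

    def __init__(self, type_name, send):
        # `send` is any callable that delivers a message and returns the reply.
        self._send = send
        # Creating the proxy issues an activation message; the reply is a URI.
        self._uri = send({"kind": "activate", "type": type_name})

    def __getattr__(self, method_name):
        # Called only for names not found on the proxy itself.
        def remote_call(*args):
            return self._send({"kind": "call", "uri": self._uri,
                               "method": method_name, "args": args})
        return remote_call


class Counter:
    def __init__(self):
        self.value = 0

    def increment(self, by):
        self.value += by
        return self.value


class InProcessServer:
    """Stand-in for a remoting server: activates objects and invokes methods."""

    def __init__(self, types):
        self._types = types
        self._objects = {}

    def handle(self, message):
        if message["kind"] == "activate":
            uri = f"/objects/{len(self._objects) + 1}"
            self._objects[uri] = self._types[message["type"]]()
            return uri
        target = self._objects[message["uri"]]
        return getattr(target, message["method"])(*message["args"])


server = InProcessServer({"Counter": Counter})
counter = TransparentProxy("Counter", server.handle)
first = counter.increment(2)
second = counter.increment(5)
```

The client code at the bottom never mentions messages or URIs, which is the property that the `new` operator and the transparent proxy provide in .NET Remoting.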

3 From the Programmer's Perspective

With regular .NET Remoting, parallel applications can be created by remotely activating objects on a number of different servers and invoking methods on those remote objects asynchronously. To do this, however, we need to statically specify the URLs of the machines where these remote objects are to permanently reside. In a cycle-stealing scenario, we wish to make use of a set of volunteered computers to perform the actual computations, but the set of volunteers is not known in advance and may vary dramatically during the course of the computation.

The programming model that we have developed for G2 Remoting is based on a virtual parallel machine that is implemented by the current set of volunteer computers and a single dedicated server machine that acts as a manager for the volunteers. In G2 Remoting, all remotely activated objects reside on this virtual parallel machine, which is referred to physically by the URL of the dedicated server machine. In other words, all activation messages and method call messages are routed via this dedicated server machine. The remote objects are not, however, physically created on this server machine; they come into being, and have their methods actually executed, on the various volunteer machines. Nor do the remote objects permanently reside on any particular volunteer machine. They physically reside on the various volunteer machines for relatively short periods of time during which remote invocations of their methods are performed; but logically, they reside only on the virtual machine. Their physical location on a particular volunteer is never known publicly, other than by the dedicated server, so all clients (including other remote objects on the virtual machine) that wish to invoke methods must do so using the object's logical address (i.e. care of the dedicated server). In this way, remote objects can transparently move between volunteer machines as is required in such a volatile environment.

In general we assume that client and volunteer machines do not run a web server and, even if they do, that they are probably behind a firewall anyway. In other words, we assume that the server is not able to “push” messages to client or volunteer machines. All communication between clients and the server and between the server and the volunteers must be initiated by the clients and volunteers respectively. This is the other reason for requiring all remote method invocation messages to be routed via the server. Not only do other volunteers not always know on which volunteer a particular remote object is physically located, but even if they did, we assume that the volunteers cannot directly communicate with one another due to firewalls. The only machine that we require to be globally accessible is the server machine, as it acts as a gateway between all other machines. Obviously, this may introduce a bottleneck, but that is the price that must be paid to maximize utilization of machines on today's firewall-prevalent Internet. Provided the granularity of parallelism is appropriate, good performance can be achieved.
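The asynchronous style of invocation mentioned at the start of this section can be sketched as follows. This is an illustration in Python under stated assumptions, not G2 Remoting code: a thread pool stands in for the virtual parallel machine, futures stand in for asynchronous result handles, and `Worker` and `run_parallel` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


class Worker:
    """Hypothetical remote class; in G2 Remoting it would be remotely activated."""

    def sum_squares(self, lo, hi):
        return sum(i * i for i in range(lo, hi))


def run_parallel(workers, ranges):
    """Start every call without waiting, then collect the results in order."""
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        handles = [pool.submit(worker.sum_squares, lo, hi)
                   for worker, (lo, hi) in zip(workers, ranges)]
        # Blocking on each handle mirrors waiting on an asynchronous result.
        return [handle.result() for handle in handles]


workers = [Worker() for _ in range(4)]
ranges = [(0, 25), (25, 50), (50, 75), (75, 100)]
partials = run_parallel(workers, ranges)
total = sum(partials)
```

The caller only decides how to split the work and when to wait; where each call physically executes is left to the underlying machinery, which is the separation the virtual parallel machine is meant to provide.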

4 Inside the G2 Remoting Framework

The Client Interface

The G2 Remoting framework is implemented by customizing the standard HTTP channel with a custom server-side message sink. Clients generate activation messages and method call messages exactly as they would in regular .NET Remoting. These messages are sent to the virtual machine's physical manifestation, the dedicated server. Our message sink on the server intercepts these messages and prevents them from being processed on the server as would happen normally. The message sink instead places incoming activation messages (still in the serialized form used by the physical transport layer) in an object repository, currently implemented as a relational database. Here the activation messages (which can be thought of as embryo objects) wait until a volunteer becomes available to gestate them into actual objects (see Figure 4).

[Figure 4 diagram: Clients and Volunteers each connect over a network to the Server, which holds the Object Repository and the Method call Repository; the Server and the Volunteers together form the Virtual Parallel Machine.]

Fig. 4: Basic Architecture
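The two repositories of Figure 4 can be sketched with an in-memory SQLite database standing in for the relational database. The schema, the `Repository` class, the `SERVER_URL` value and the use of `/` to join the server URL and the row id are assumptions made for this illustration; the paper does not specify them.

```python
import sqlite3

SERVER_URL = "http://server.example/g2"  # hypothetical server address


class Repository:
    """Stores embryo objects and queued method calls, keyed by row id."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE objects (id INTEGER PRIMARY KEY, state BLOB, is_embryo INTEGER)")
        self.db.execute(
            "CREATE TABLE method_calls (id INTEGER PRIMARY KEY, object_id INTEGER, "
            "message BLOB, result BLOB)")

    def store_activation(self, serialized_message: bytes) -> str:
        """Store an activation message as an embryo and return its logical URI."""
        cur = self.db.execute(
            "INSERT INTO objects (state, is_embryo) VALUES (?, 1)",
            (serialized_message,))
        return f"{SERVER_URL}/{cur.lastrowid}"

    def store_method_call(self, uri: str, serialized_message: bytes) -> int:
        """Queue a method call against its object and return the invocation id."""
        object_id = int(uri.rsplit("/", 1)[1])
        cur = self.db.execute(
            "INSERT INTO method_calls (object_id, message) VALUES (?, ?)",
            (object_id, serialized_message))
        return cur.lastrowid

    def pending_calls(self, uri: str):
        """Return unanswered calls for an object in the order they arrived."""
        object_id = int(uri.rsplit("/", 1)[1])
        rows = self.db.execute(
            "SELECT id, message FROM method_calls "
            "WHERE object_id = ? AND result IS NULL ORDER BY id",
            (object_id,))
        return rows.fetchall()


repo = Repository()
uri = repo.store_activation(b"activate Island")
call_a = repo.store_method_call(uri, b"Evolve(100)")
call_b = repo.store_method_call(uri, b"GetBest()")
queued = repo.pending_calls(uri)
```

Nothing is executed on the server in this model: both activation and method call messages are only stored, and the ids handed back identify rows rather than live objects.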

Activation messages normally require a reply message that contains a URI (Uniform Resource Identifier) representing the location of the newly created remote object. In G2 Remoting, the arrival of an activation message does not result in the immediate creation of an actual object, so we instead reply with a URI that represents the logical rather than physical location of the object. The URI returned consists of the server URL concatenated with the row id of the table in the database where the embryo is stored. Method call messages arriving at the server are stored in a separate table in the relational database and are correlated to the objects that they belong to via their object's row id.

Method call messages also normally require a reply message that contains the actual results of the method invocation. In G2 Remoting, the method may not actually be executed until much later, so we instead send a reply containing just the method invocation's row id from the database. We use a separate client-side message sink to intercept this reply and prevent it from being returned to the proxy as the actual result of the method call. Our client-side message sink instead creates an entry in a hash table for the method invocation id just received and stores with this id the next message sink in the client-side chain of message sinks, to which it would normally have passed the reply message. This reply chain leads back to an asynchronous result handle corresponding to the current method invocation.¹

As the server is not able to “push” results back to the client when they are eventually computed by a volunteer, a singleton client-side thread is responsible for “pulling” results from the server. If a client is currently waiting for results, this thread sends a request for results to the server. If no results are available at that time, the request blocks on the server until one or more results become available.
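The client-side bookkeeping and the blocking pull can be sketched as below. This is a minimal model in Python under stated assumptions: `ResultServer` replaces the HTTP request that blocks on the server, `ClientSink` plays the part of the client-side message sink, and a `Future` stands in for the asynchronous result handle at the end of each reply chain. None of these names come from the framework itself.

```python
import queue
import threading
from concurrent.futures import Future


class ResultServer:
    """Stand-in for the server side of the result channel (hypothetical)."""

    def __init__(self):
        self._ready = queue.Queue()

    def publish(self, invocation_id, value):
        self._ready.put((invocation_id, value))

    def request_results(self):
        # Block until at least one result exists, then take any others too.
        batch = [self._ready.get()]
        while True:
            try:
                batch.append(self._ready.get_nowait())
            except queue.Empty:
                return batch


class ClientSink:
    """Maps invocation ids to reply chains and pulls results with one thread."""

    def __init__(self, server):
        self._server = server
        self._reply_chains = {}      # invocation id -> callable taking the result
        self._lock = threading.Lock()
        self._puller = None

    def register(self, invocation_id, reply_chain):
        with self._lock:
            self._reply_chains[invocation_id] = reply_chain
            if self._puller is None:  # at most one pulling thread per client
                self._puller = threading.Thread(target=self._pull, daemon=True)
                self._puller.start()

    def _pull(self):
        while True:
            for invocation_id, value in self._server.request_results():
                with self._lock:
                    reply_chain = self._reply_chains.pop(invocation_id)
                reply_chain(value)    # propagate along the recorded reply chain
            with self._lock:
                if not self._reply_chains:   # nothing outstanding: stop pulling
                    self._puller = None
                    return


server = ResultServer()
sink = ClientSink(server)
handles = {7: Future(), 8: Future()}
for invocation_id, handle in handles.items():
    sink.register(invocation_id, handle.set_result)

server.publish(8, "tour B")   # results may arrive in any order
server.publish(7, "tour A")
results = {i: h.result(timeout=5) for i, h in handles.items()}
```

Keying the table by invocation id is what allows results to be matched to their callers even when they come back out of order.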
As soon as a request for results is successfully processed, a subsequent request for results is made, provided there are still results outstanding. In this way results are returned to the client in a timely fashion without the client needing to continually poll the server for results. And by using a singleton thread to retrieve the results for all currently outstanding method invocations, the number of communication channels left open is limited to one per client. When our client-side message sink receives actual method result messages from the server, it propagates them along their corresponding client-side reply chain as recorded earlier, indexed by their method invocation ids.

¹ Each remote method invocation produces a unique asynchronous result handle and a unique client-side chain of message sinks.

The Volunteer Interface

When a new volunteer becomes available, it asks the server for an object that has methods waiting to be executed. The object that the volunteer receives may be in embryo form (i.e. an activation message) if it has not previously existed, or it may be a serialized form of the object produced at the end of its previous life on a volunteer. Objects exist on at most one volunteer at a time, and each volunteer manages at most one remote object at a time. So when a volunteer retrieves an object from the server, the object is locked by the server to prevent other volunteers from attempting to modify it in parallel.

Once a volunteer has obtained an object and brought it to life (in one way or another), it proceeds to invoke methods on it in the order they were queued on the server (thereby achieving at least sequential consistency). It continues to execute such queued method invocations until either no more exist, or it has had its “fair share” of processing under a simple round-robin scheduling mechanism. After every successful method invocation that changes the state of the object, the volunteer must send a new serialized form of the object back to the server, where it replaces the previously stored state of that object.
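The volunteer's cycle of checkout, in-order execution and checkpointing can be sketched as follows. This is a simplified single-process model in Python, not the framework's implementation: `copy.deepcopy` stands in for serialization, every call is treated as state changing, and the `fair_share` quota of two calls per turn is an arbitrary assumption.

```python
import copy
from collections import deque


class Island:
    """Hypothetical remote class with one state-changing method."""

    def __init__(self):
        self.generation = 0

    def evolve(self, steps):
        self.generation += steps
        return self.generation


class Server:
    def __init__(self):
        self.states = {}       # object id -> last checkpointed state
        self.calls = {}        # object id -> queued (method, args) in arrival order
        self.locked = set()    # objects currently held by a volunteer
        self.order = deque()   # round-robin order of object ids
        self.results = []

    def add_object(self, object_id, obj):
        self.states[object_id] = copy.deepcopy(obj)
        self.calls[object_id] = deque()
        self.order.append(object_id)

    def enqueue(self, object_id, method, *args):
        self.calls[object_id].append((method, args))

    def checkout(self):
        """Lock and return the next unlocked object that has queued calls."""
        for _ in range(len(self.order)):
            object_id = self.order[0]
            self.order.rotate(-1)
            if object_id not in self.locked and self.calls[object_id]:
                self.locked.add(object_id)
                return object_id, copy.deepcopy(self.states[object_id])
        return None

    def next_call(self, object_id):
        return self.calls[object_id][0] if self.calls[object_id] else None

    def commit(self, object_id, new_state, result):
        """Checkpoint: replace the stored state and retire the finished call."""
        self.states[object_id] = copy.deepcopy(new_state)
        self.calls[object_id].popleft()
        self.results.append((object_id, result))

    def release(self, object_id):
        self.locked.discard(object_id)


def volunteer_turn(server, fair_share=2):
    """One volunteer visit: run up to `fair_share` queued calls on one object."""
    checked_out = server.checkout()
    if checked_out is None:
        return False
    object_id, obj = checked_out
    for _ in range(fair_share):
        call = server.next_call(object_id)
        if call is None:
            break
        method, args = call
        result = getattr(obj, method)(*args)
        server.commit(object_id, obj, result)
    server.release(object_id)
    return True


server = Server()
server.add_object("a", Island())
server.add_object("b", Island())
for steps in (10, 5, 1):
    server.enqueue("a", "evolve", steps)
server.enqueue("b", "evolve", 3)

while volunteer_turn(server):
    pass
```

A call leaves the queue only in `commit`, together with the new state, so a volunteer that disappears before committing leaves both the previous state and the pending call on the server.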
If a volunteer is reclaimed (so that the currently executing method cannot finish), the previous valid state of the object is still available on the server, and the next available volunteer can simply retrieve this previous state and re-execute the failed method. Unfortunately, this simple rollback mechanism is complicated by the fact that the currently executing method may itself have invoked methods on other remote objects, potentially causing their state to change. In that scenario, simply re-executing the failed method would result in these subsequent remote method invocations being incorrectly repeated. We avoid this situation by delaying the dispatch of state-mutating remote method invocations until the remote method invocation making those calls completes (see Figure 5).

[Figure 5 diagram: interaction between Remote Object 1, Remote Object 2 and Remote Object 3.]

Fig. 5: Remote mutator methods are only issued after the calling method returns

Such remote mutator methods must effectively return void since their result obviously cannot be used by the calling method. Remote invocations of accessor methods can, however, be dispatched immediately and their results can be used by the calling method. This restriction requires programmers to annotate remote mutator methods with a declarative method attribute to allow the G2 Remoting framework to handle them appropriately.
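The deferral of mutator calls can be sketched as follows. This is an illustrative model in Python, not the framework's mechanism: the `mutator` decorator stands in for the declarative method attribute, and `RemoteRef`, `run_with_rollback` and the outbox list are hypothetical names.

```python
def mutator(func):
    """Marks a remote method as state mutating (stand-in for the attribute)."""
    func.is_mutator = True
    return func


class RemoteRef:
    """Reference used inside a running method to call another remote object."""

    def __init__(self, target, outbox):
        self._target = target
        self._outbox = outbox

    def call(self, method_name, *args):
        method = getattr(self._target, method_name)
        if getattr(method, "is_mutator", False):
            # Defer: queue the call; the caller gets no result (effectively void).
            self._outbox.append((self._target, method_name, args))
            return None
        return method(*args)       # accessors are dispatched immediately


class Island:
    def __init__(self, best):
        self.best = best
        self.received = []

    def get_best(self):            # accessor
        return self.best

    @mutator
    def receive_migrants(self, migrants):
        self.received.extend(migrants)


def run_with_rollback(body, outbox):
    """Run `body`; dispatch its queued mutator calls only if it completes."""
    try:
        result = body()
    except Exception:
        outbox.clear()             # method did not finish: nothing is sent
        raise
    for target, method_name, args in outbox:
        getattr(target, method_name)(*args)
    outbox.clear()
    return result


neighbour = Island(best=42)
outbox = []
ref = RemoteRef(neighbour, outbox)


def exchange():
    best = ref.call("get_best")                 # accessor: result usable here
    ref.call("receive_migrants", [best - 1])    # mutator: deferred
    return best, list(neighbour.received)       # neighbour is still unchanged


outcome = run_with_rollback(exchange, outbox)
```

Because the neighbour's state changes only after `exchange` has returned, re-executing a method that failed part-way cannot apply the same mutation twice.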

5 Deployment

In regular .NET Remoting, software components (referred to as assemblies in .NET) need to be pre-deployed on the server machines where they will be used. In our case, the actual computation is performed on a large collection of volunteer machines that are owned by various individuals who may have no particular relationship with the people wishing to run client applications. We therefore adopt a transparent and lazy deployment strategy. All of the code, both the portion to execute on the client machine and the portion ultimately intended to execute on volunteer machines, initially resides only on the client machine. At runtime, it is dynamically uploaded to the server machine (if it is not already present) and from there dynamically downloaded to the volunteer machines as required to process the requests of various clients.

More than one client application can execute on the system at the same time. Each volunteer only ever hosts one remote object at a time, but the remote objects that a volunteer hosts over a period of time may belong to a variety of client applications and therefore require a variety of assemblies to be downloaded. Assemblies are cached on both the server and the volunteers to minimize repeated uploads and downloads.
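The lazy deployment path with caching at both levels can be sketched as follows. This is a minimal Python model under stated assumptions: assemblies are plain byte strings keyed by name, and the `Client`, `Server` and `Volunteer` classes and the file name `Tsp.dll` are hypothetical.

```python
class Client:
    """Holds the only initial copy of the application's assemblies."""

    def __init__(self, assemblies):
        self.assemblies = assemblies   # assembly name -> bytes
        self.uploads = 0

    def upload(self, name):
        self.uploads += 1
        return self.assemblies[name]


class Server:
    def __init__(self):
        self.cache = {}
        self.downloads = 0

    def ensure(self, name, client):
        """Fetch the assembly from the client only if it is not cached yet."""
        if name not in self.cache:
            self.cache[name] = client.upload(name)

    def download(self, name):
        self.downloads += 1
        return self.cache[name]


class Volunteer:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def load(self, name):
        """Download the assembly on first use; reuse the cached copy afterwards."""
        if name not in self.cache:
            self.cache[name] = self.server.download(name)
        return self.cache[name]


client = Client({"Tsp.dll": b"compiled code"})
server = Server()
server.ensure("Tsp.dll", client)
server.ensure("Tsp.dll", client)        # second request hits the server cache

first_volunteer = Volunteer(server)
second_volunteer = Volunteer(server)
first_volunteer.load("Tsp.dll")
first_volunteer.load("Tsp.dll")         # cached on the volunteer
loaded = second_volunteer.load("Tsp.dll")
```

In this model the client uploads once and the server serves one download per volunteer, however many objects of that application each volunteer later hosts.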

6 Sample Application

To show the abilities of our framework, we have implemented an example application: a genetic algorithm applied to the Travelling Salesman Problem (TSP) using the island model. Remote objects are used to model islands of populations that evolve independently. From time to time these islands exchange parts of their populations with one another. The client application that initiates the computation queries the islands from time to time for their shortest path and updates its best known shortest path if an island has evolved a shorter one.

We tested TSP for G2 Remoting on 1, 2, 4, 8 and 16 Pentium III machines (800 MHz each) connected over a 100 Mbit/s network. A shortest tour was sought for 100 randomly positioned cities. We varied the computation to communication ratio by varying the number of generations for which islands evolved between exchanges with other islands. The maximum task size that we tested, 1600 generations, corresponded to approximately 25 seconds. Most of the communication cost in this application is caused by the need to update the state of objects on the server after every remote mutator method invocation. We are currently investigating ideas to reduce the number of such “checkpoints”.

[Figure 6 plot: Speedup (0 to 18) against Number of Volunteers (1 to 16), with series for 100 Generations, 400 Generations and 1600 Generations, and a Linear reference line.]

Fig. 6: TSP Performance
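The island model used in this application can be sketched in a single process. The following Python code is an illustration under stated assumptions, not the authors' implementation: the segment-reversal mutation, the replace-worst selection, the ring migration topology, and the population and problem sizes are all choices made for the example, and the islands evolve sequentially here rather than as remote objects on volunteers.

```python
import math
import random


def tour_length(tour, cities):
    """Length of the closed tour visiting `cities` in the order given by `tour`."""
    total = 0.0
    for i in range(len(tour)):
        (x1, y1), (x2, y2) = cities[tour[i - 1]], cities[tour[i]]
        total += math.hypot(x2 - x1, y2 - y1)
    return total


class Island:
    """One independently evolving population of candidate tours."""

    def __init__(self, cities, size, rng):
        self.cities = cities
        self.rng = rng
        n = len(cities)
        self.population = [rng.sample(range(n), n) for _ in range(size)]

    def _length(self, tour):
        return tour_length(tour, self.cities)

    def evolve(self, generations):
        for _ in range(generations):
            # Tournament of two, then mutate a copy by reversing a segment.
            parent = min(self.rng.sample(self.population, 2), key=self._length)
            child = parent[:]
            i, j = sorted(self.rng.sample(range(len(child)), 2))
            child[i:j + 1] = reversed(child[i:j + 1])
            worst = max(self.population, key=self._length)
            if self._length(child) < self._length(worst):
                self.population[self.population.index(worst)] = child

    def best(self):
        return min(self.population, key=self._length)

    def emigrants(self, count):
        ranked = sorted(self.population, key=self._length)
        return [tour[:] for tour in ranked[:count]]

    def receive(self, migrants):
        if not migrants:
            return
        # Replace the worst tours, keeping this island's own best.
        self.population.sort(key=self._length)
        self.population[-len(migrants):] = migrants


def run(num_islands=4, rounds=5, generations=100, seed=1):
    rng = random.Random(seed)
    cities = [(rng.random(), rng.random()) for _ in range(20)]
    islands = [Island(cities, 8, random.Random(seed + k + 1))
               for k in range(num_islands)]
    history = []
    for _ in range(rounds):
        for island in islands:                 # would run in parallel on volunteers
            island.evolve(generations)
        for k, island in enumerate(islands):   # ring migration between islands
            islands[(k + 1) % num_islands].receive(island.emigrants(2))
        history.append(min(tour_length(i.best(), cities) for i in islands))
    best = min((i.best() for i in islands), key=lambda t: tour_length(t, cities))
    return best, history


best_tour, history = run()
```

The `generations` parameter plays the role of the task size varied in the experiments: larger values mean more computation between exchanges and therefore fewer checkpoints per unit of work.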

7 Related Work

Early cycle stealing systems like NOW [1] and Piranha [5] were aimed at local area networks (LANs), which is reflected in their use of native code and raw TCP/IP communication. These systems increase the utilization of machines in networks of workstations significantly and offer their owners the computing power of supercomputers. With the growing popularity of the Internet in the late 1990s, researchers became aware of the huge computational power of millions of personal computers connected over the Internet. One of the best-known and most impressive volunteer computing systems is SETI@home, which aims to detect signals of extraterrestrial civilizations. Unfortunately, the resources of the Internet are much harder to use than networks of workstations. One of the main problems is the heterogeneity of the Internet. For example, the client software for SETI@home comes in 47 different versions for different combinations of CPU and operating system [8].

Only the success of the programming language Java and its runtime environment made a new generation of volunteer computing systems possible. In particular, its portability and network features allow the development of generic APIs and frameworks for volunteer computing systems. Java based grid systems can be classified into two types, applet-based or application-based, depending on the way jobs are executed on volunteer machines. Java applet-based volunteer computing systems use applets (programs embedded in HTML pages that can be executed in any Java-enabled browser) to execute jobs on volunteer machines. Examples of Java applet-based systems are Bayanihan [12, 14], Javelin [6], Popcorn [11] and Charlotte [3]. Examples of Java application-based systems are Gucha [9], Ninflet [15], Atlas [2] and ParaWeb [4].

Only Bayanihan BSP and Ninflet allow distributed remote objects to hold their states between method calls. In Bayanihan BSP the state of remote objects is sent to a central server at the end of every so-called superstep, allowing them to migrate to another volunteer. Unlike G2 Remoting, remote objects in Bayanihan BSP have only one method, called bsp_run(), that is always invoked by the volunteer host after their creation. Remote objects in Bayanihan BSP therefore only hold state between different calls of bsp_run() and not between arbitrary methods called by the client or other remote objects. Ninflet [15] does not provide automatic check-pointing, and the programmer has to call the method checkpoint() to save the state of the remote object. Using this check-pointing mechanism, remote objects in Ninflet can also migrate from volunteer to volunteer. Clients and remote objects in Ninflet can use Java RMI to invoke methods on other remote objects, and remote objects hold state between method calls. However, only if the user has taken checkpoints properly can he be sure that objects correctly maintain their states between method calls, even if the object has migrated to another volunteer in the meantime.

The latest development in volunteer computing is the use of the new Microsoft .NET Framework. Wilkerson was the first to study the benefits of .NET Web Services for a volunteer computing system [16]. The system, called SharedCycles2, provides a Web Service to render images by using ray tracing. The major problem of this implementation is that the volunteer components are Web Services, which are not easy to set up on a common workstation and need extra software to run. The first generic volunteer computing frameworks on this platform are G2 [7] and Bayanihan .NET [13]. Both systems provide a Web Service that enables clients to submit jobs. Volunteer components are objects embedded in HTML pages, similar to Java applets.

8 Conclusion

We have implemented G2 Remoting, a generic cycle-stealing framework featuring a remote object programming model with transparent migration of objects. The framework guarantees consistent state of all remote objects at any time by using a rollback recovery protocol with very little overhead. The G2 Remoting programming model is made to look identical to that of the .NET Remoting framework and so should be familiar to many .NET programmers. Our framework is, as far as we know, the first general purpose cycle stealing system that utilises the benefits of .NET Remoting and the first one that provides an object-oriented programming model with remote objects being capable of holding state between method calls while automatically maintaining a consistent global system state.

References

[1] Anderson, T.E., et al. (1994) A Case for Networks of Workstations: NoW. In Hot Interconnects II, Symposium Record, pp. 43-58.
[2] Baldeschwieler, E.J., Blumofe, R.D., Brewer, E.A. (1996) Atlas: An Infrastructure for Global Computing. In Proceedings of the seventh workshop on ACM SIGOPS European workshop: Systems support for worldwide applications, Connemara: ACM Press, pp. 165-172.
[3] Baratloo, A., Karaul, M., Kedem, Z., Wyckoff, P. (1996) Charlotte: Metacomputing on the Web. In Proceedings of the ISCA International Conference on Parallel and Distributed Computing Systems, Raleigh: ISCA, pp. 181-188.
[4] Brecht, T., Sandhu, H., Shan, M., Talbot, J. (1996) ParaWeb: Towards World-Wide Supercomputing. In Proceedings of the seventh workshop on ACM SIGOPS European workshop: Systems support for worldwide applications, Connemara: ACM Press, pp. 181-188.
[5] Carriero, N., Freeman, E., Gelernter, D., Kaminsky, D. (1995) ‘Adaptive Parallelism and Piranha’, Computer, vol.28, no.1, pp. 40-49.
[6] Christiansen, B.O., Cappello, P., Ionescu, M.F., Neary, M.O., Schauser, K.E. and Wu, B. (1997) Javelin: Internet-Based Parallel Computing Using Java [Online]. Available: http://www.cs.ucsb.edu/~schauser/papers/97-javelin.pdf [Accessed 11 Aug. 2002].
[7] Kelly, W., Roe, P. and Sumitomo, J. (2002) G2: A Grid Middleware for Cycle Donation using .NET [Online]. Available: http://g2.fit.qut.edu.au/g2/PDPTA02.doc [Accessed 6 Aug. 2002].
[8] Korpela, E., Wertheimer, D., Anderson, D., Cobb, J., Lebofsky, M. (2001) ‘SETI@home-Massively Distributed Computing for SETI’, Computing in Science & Engineering, vol.3, no.1, pp. 78-83.
[9] Lau, L.F., Ananda, A.L., Tan, G. and Wong, W.F. (2000) Gucha: Internet-Based Parallel Computing Using Java [Online]. Available: http://www.comp.nus.edu.sg/~wongwf/papers/gucha.pdf [Accessed 10 Aug. 2002].
[10] Litzkow, M.J., Livny, M., Mutka, M.W. (1988) Condor – a Hunter of Idle Workstations. In 8th International Conference on Distributed Computing Systems, 1988, San Jose, pp. 104-111.
[11] Nisan, N., London, S., Regev, O., Camiel, N. (1998) Globally Distributed Computation Over the Internet – The POPCORN Project. In Proceedings, 18th International Conference on Distributed Computing Systems, 1998, Amsterdam, pp. 592-601.
[12] Sarmenta, L.F.G. (1999) An Adaptive, Fault-tolerant Implementation of BSP for Java-based Volunteer Computing Systems. In IPPS'99 International Workshop on Java for Parallel and Distributed Computation, San Juan: IPPS, pp. 763-780.
[13] Sarmenta, L.F.G., et al. (2002) Bayanihan Computing .NET: Grid Computing with XML Web Services. In Cluster Computing and the Grid 2nd IEEE/ACM International Symposium CCGRID2002, Berlin: IEEE/ACM, pp. 404-405.
[14] Sarmenta, L.F.G., Hirano, S. (1999) ‘Bayanihan: Building and Studying Web-Based Volunteer Computing Systems Using Java’, Future Generation Computer Systems, Special Issue on Metacomputing, vol.15, no.5/6.
[15] Takagi, H., Matsuoka, S., Nakada, H., Sekiguchi, S., Satoh, M. and Nagashima, U. (1998) Ninflet: a Migratable Parallel Objects Framework using Java [Online]. Available: http://www.cs.ucsb.edu/conferences/java98/papers/ninflet.pdf [Accessed 1 Aug. 2002].
[16] Wilkerson, B. (2001) Grid Computing using .NET Web Services and UDDI [Online]. Available: http://cedar.intel.com/media/pdf/wilkerson1.pdf [Accessed 19 Aug. 2002].
