Parallel Processing over Mobile Ad Hoc Networks of Handheld Machines Michael J Jipping
Gary Lewandowski
Department of Computer Science Hope College Holland, MI 49423
Department of Mathematics and Computer Science Xavier University Cincinnati, OH 45207-4441
[email protected]
[email protected]
ABSTRACT In this paper, we describe the formatting guidelines for ACM SIG Proceedings.
1. INTRODUCTION Research in parallel processing and distributed computing views CPU cycles as something that can be shared, organized and contributed to a grander whole. This grander whole becomes one large machine, whose CPU cycles are shared in a protocol over a network that ties them together. While this work has often focused on static machines organized in a fixed arrangement, more opportunities are available if we can expand this work to engage mobile ad hoc networks (MANets) of computers of all sizes and capabilities. CPU cycles are available on platforms from pagers and cell phones to workstations and servers. By combining the concept of MANets with the idea of distributed executable objects, we have developed a parallel computing platform that tolerates nodes entering and leaving the network and effectively implements a parallel processing model. This paper documents our work that adapts a parallel computing model to operate in a diverse CPU environment held together by a mobile ad hoc network. The model tolerates machines of different configurations and capabilities that are dynamically entering and leaving the network. The next section describes background work and our motivation, and the following section describes what we believe are the requirements of a MANet that supports parallel processing. We follow this with a close look at HAND, our network model. We then describe our model of parallel computing on this network. Our conclusion points out where our work is headed.
2. BACKGROUND AND MOTIVATION As a way to enhance system performance, parallel and distributed computing models have been proposed and implemented for a long time. Early work in parallel and
distributed systems focused on specialized equipment in contained environments. Goodyear’s Massively Parallel Processor [1] is a good example of this early type of work. MPI [5] and PVM [3] are both examples of systems that use existing networks of computers to assemble a virtual parallel processor. To truly utilize modern computing platforms, the model of parallel computing must embrace the mobility and dynamic nature of current technology. Projects like Distributed Net [4] take advantage of idle cycles in a dynamic way, but do not allow communication between processing nodes. Handheld computers provide an opportunity to extend the parallel computing model to MANets. The network tying these machines together is more likely to be a mobile ad hoc network, connecting a particular node to a network for a relatively short amount of time with bursty data rates geared toward updating information between a client and a server. Work on dynamic software models like Jini [6] has paved the way into mobile parallel computing environments. Jini is a Java-based model of computing that addresses the nature of service offerings in a dynamic environment. The motivation for our research in this area comes from using a Jini approach to sharing CPU cycles across a dynamic ad hoc network of handheld machines. We extend the Jini model by streamlining the network infrastructure to embrace a server-free and completely dynamic environment and enhancing it to facilitate easy sharing of CPU resources across a network of Jini-based devices. This work thus focuses on a networking infrastructure that enables server-free resource sharing based on the Jini approach and on a parallel computation model that takes advantage of this infrastructure.
3. CHARACTERISTICS Our MANet supporting parallel and distributed applications is called a Handheld Area Network Domain, or HAND. HAND networks, like MANets, make very few assumptions about the type of computer on the network or about how long that computer will stay connected to other nodes. HAND networks are open to many different computing architectures and models. For example, workstations, handheld machines, cell phones, and pagers may all be on the same HAND network. In particular, no assumptions should be made about when a HAND network node enters a network or about how long a HAND network node remains usable on a network. This means
that the network implementation needs to be passive and open regarding access to information about network nodes. No assumption about a parallel processing server is possible. Assigning server duties to a specific network node, given that we cannot assume that any node will remain on the network for long, is not feasible. No assumptions should be made about languages used or programming models imposed. This implies that a HAND network must be based on protocol standards that can be easily implemented, rather than APIs in a particular language. No assumptions should be made about the underlying network transport layers. While we will implement a HAND network using available networks, the specifications need to be given assuming an underlying transport, but not specifying one. The requirements of a HAND network imply some network characteristics. An implementation should provide a server-less environment. In that server-less network, information about nodes and the resources or services they offer must be kept local to each node. Because nodes flow in and out of the network, information about network nodes and their service offerings needs to flow as well, on a demand-driven basis. Access to information should be easy, since the information itself will change as rapidly as the network changes.
4. A HAND Model Overview The HAND model provides for a server-free implementation of parallel processing over a MANet, allowing for implementation on multiple platforms. It takes a layered approach to implementing services across a MANet. Figure 1 shows the big picture, depicting the layers in the HAND model. At the top, the application layer implements a specific application that requires parallel resources. This application assumes the MANet exists, and uses the capabilities of the HAND network via a specified interface. The dispatch layer is next; it receives calls for services from the application layer and dispatches tasks over the HAND network to service those requests. These service requests could be specific requests that take advantage of the parallel nature of the MANet or could be simple operating system service requests implemented by dispatching across a MANet. The infrastructure layer lies underneath the dispatch layer and implements the MANet in three sublayers. The communication layer is at the bottom and is the assumed transport layer that supports the infrastructure of a MANet. Here, we focus on the infrastructure layer. The goals of this layer reflect the desired characteristics of a MANet for parallel processing. (1) Implement partitioning and dispatching using Jini semantics. We use the ideas of service advertisement by server implementations and service request by clients as the model for the implementation of parallel service across a HAND. (2) Implement a lightweight protocol that allows for efficient implementation of a HAND. As no assumptions can be made about the underlying communications layer, the protocol that implements this layer must be as lightweight as possible. (3) Implement simple protocols. We cannot assume large memories or fast CPUs in network nodes, and therefore we must keep protocols and the corresponding implementations minimal. (4) Minimize the size of the “kernel” implementation.
Again, we cannot assume large memories or fast CPUs. Therefore, we must keep the kernel that runs nodes as small as possible.
4.1 Protocol Layers The HAND model draws on the Freenet [2] model and views the infrastructure layer as a set of protocol sublayers. Figure 1 shows the sublayers involved in the infrastructure model. The messaging layer provides message passing capability. At this level, applications can pass messages to each other and to the server. Messages are guaranteed either to arrive or to give notice to the calling routine that they have not arrived. The network layer is the layer where the network is constructed. This layer collects information on the nodes in the network, including the distance of nodes, the latency of message travel, the speed of each CPU, and the type of processing that a node can do. It interfaces with the messaging layer and presents that layer with a picture of the network. The administration layer is the "switchboard" of the infrastructure. It receives and dispatches notification of node entry and exit, and coordinates the leasing of services to and from nodes.
4.2 Network Details We view a HAND network as a set of loosely coupled nodes. Each node has only partial information about the rest of the network, and can gain more information if necessary. The set of nodes is organized as a set of neighbors. Each node has a neighbor set and communicates with the network through this neighbor set. The number of neighbors is configured by each node, but each node must have at least two. Each neighbor of a node has its own neighbor set, which includes the node in question as well as others. The resulting network is a connected network of nodes that only directly know their neighborhood but can access the entire network.
4.2.1 Information About a Network
A HAND network is passive about discovering network information. Nodes do not actively collect information; rather, they pick up information as other operations take place. As part of a successful message sent from node FOO to node BAR, for instance, both nodes would record that the other exists. There is one exception to this rule. The information each node keeps on its neighbors has a time-to-live (TTL) parameter. This parameter is the maximum length of time a node goes without hearing from a neighbor. When the amount of time given by the TTL parameter for a neighbor has elapsed, a node will ping that neighbor and reset that neighbor's counter. This counter is also reset automatically whenever data from the neighbor is received.
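The TTL rule can be illustrated with a small sketch. It is not taken from the HAND implementation: the class name NeighborTable, its method names, and the use of millisecond timestamps are all assumptions made for illustration. The sketch records when each neighbor was last heard from, resets that record on any received data, and reports which neighbors are due for a ping.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical neighbor table; names and units are illustrative, not the HAND API. */
final class NeighborTable {
    private final long ttlMillis;
    // Neighbor id -> time (ms) at which its TTL counter was last reset.
    private final Map<String, Long> lastReset = new HashMap<>();

    NeighborTable(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Passive update: any data received from a neighbor resets its counter. */
    void heardFrom(String neighborId, long nowMillis) {
        lastReset.put(neighborId, nowMillis);
    }

    /**
     * Returns the neighbors whose TTL has elapsed and resets their counters.
     * The caller is expected to send a ping to each returned neighbor.
     */
    List<String> neighborsToPing(long nowMillis) {
        List<String> due = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastReset.entrySet()) {
            if (nowMillis - entry.getValue() >= ttlMillis) {
                due.add(entry.getKey());
                entry.setValue(nowMillis); // reset the counter when the ping goes out
            }
        }
        return due;
    }
}
```

Passing the current time as a parameter, rather than reading a clock inside the class, keeps the sketch deterministic and easy to exercise.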
Figure 1. The HAND Network Structure
4.2.2 Network Joining and Leaving
A node enters a HAND network by broadcasting its entry and looking for neighbors. In a HAND network, there will always be nodes that can accept neighbors, and the response to an entry broadcast is a set of messages from nodes willing to take on neighbors. From this set, the entering node chooses one, accepting that neighbor and rejecting others. In certain cases, a timeout can serve as a decline of a neighbor invitation. Nodes exit a HAND network in two ways. The first is a clean exit in which a message is sent to a node's neighborhood indicating an intention of leaving the network. The message causes the exiting node's neighbors to reconfigure and establish new neighbors. The second method of leaving a network is through noncommunication. Timeout errors are taken very seriously in a HAND network and signify many things. In the context of neighborhood participation, a timeout is a signal of intent to leave a network. Upon the discovery of a timeout, one further query is sent, and a second timeout implies intention to leave the network. Network reconfiguration is an important issue in HAND networks. The design of the network is meant to allow reconfiguration quickly and easily, with little penalty and easy reentrance for nodes that have exited or have been cut off from the network by exiting nodes. Reconfiguration must occur when a neighbor has exited for any reason and is broken into two cases: (1) the exiting neighbor has no other neighbors, or (2) the exiting neighbor has other neighborhood connections. Reconfiguration could occur in different ways, depending on how much information each node has on other nodes. As the amount of information maintained in this model is designed to be minimal, we have chosen to allow reconfiguration to occur in the same way as entering a network for the first time. See the next section for further discussion of network information and its implications for network design.
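The two exit paths described above (an explicit leave message, or a timeout followed by one further query and a second timeout) can be sketched as a small state tracker. The class DepartureDetector, its State enum, and its method names are hypothetical and chosen only for this illustration; the actual HAND messages are not specified here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical tracker for neighbor departure; not the HAND protocol itself. */
final class DepartureDetector {
    enum State { ALIVE, SUSPECT, DEPARTED }

    private final Map<String, State> states = new HashMap<>();

    /** Any reply or data from the neighbor clears suspicion. */
    void onReply(String neighborId) {
        states.put(neighborId, State.ALIVE);
    }

    /** Clean exit: the neighbor announced its intention to leave. */
    void onLeaveMessage(String neighborId) {
        states.put(neighborId, State.DEPARTED);
    }

    /**
     * First timeout: the neighbor becomes SUSPECT and the caller sends one
     * further query. A second consecutive timeout marks it DEPARTED, at which
     * point the caller would start reconfiguration.
     */
    State onTimeout(String neighborId) {
        State current = states.getOrDefault(neighborId, State.ALIVE);
        State next = (current == State.ALIVE) ? State.SUSPECT : State.DEPARTED;
        states.put(neighborId, next);
        return next;
    }
}
```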
4.2.3 Administration and Maintenance
There are several administrative and maintenance tasks that nodes must be responsible for when they are part of a HAND network. Identification queries can take several forms: (1) one emulates a "ping" (simply an "are you there" query), (2) one emulates an ARP (a "who are you" query), and (3) one allows the trace of a route from peer to peer in the network. These are required because of the shifting nature of HAND network neighborhoods. Some administration methods implement Jini behavior. These methods allow (1) the registration of a service implementation, and (2) the request from a client for a service. Nodes can issue requests to be temporarily removed from dispatch but not from a network neighborhood. This type of "opting out" relays information to neighbors about upper layer dispatch operations while maintaining lower level network connections. A rejoin protocol signals when the node is ready to be placed back on the dispatch list. When a node realizes its neighbor is gone, this information is broadcast over the network. Since nodes build their internal network information incrementally, this broadcast helps to prune dead nodes and to eliminate unnecessary communication.
5. USING A HAND NETWORK FOR PARALLEL PROCESSING Parallel processing on a HAND network is a natural application of the service distribution built into the network. As in the Jini model, service providers announce their service and service users request a service. Rather than using a central Jini server, however, the network uses the Freenet data caching model. Service registries are cached around the network at the discretion of the nodes receiving the announcement. Service requests are then handled in a manner similar to a Freenet document request: a node may have a match for the service or may pass the request on to a neighbor; nodes that have cached a matching registry can pass the request directly on to a node capable of providing the service. Unlike Freenet, however, the match is not a string match, but rather involves ensuring that attributes of the request match attributes of the service registry.
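A rough sketch of this request handling follows. It makes several simplifying assumptions that are not part of the HAND description: the class RoutingNode and its fields are hypothetical, the match is reduced to a lookup on a single key (attribute matching is omitted for brevity), and a hop limit and a visited set are added so that the search terminates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical node used to illustrate Freenet-style request forwarding. */
final class RoutingNode {
    final String id;
    final Set<String> localServices = new HashSet<>();             // services this node provides
    final Map<String, String> cachedRegistries = new HashMap<>();  // service key -> provider id
    final List<RoutingNode> neighbors = new ArrayList<>();

    RoutingNode(String id) {
        this.id = id;
    }

    /**
     * Returns the id of a node able to provide the service, or null if none
     * is found within the hop limit (the hop limit is an assumption of this sketch).
     */
    String route(String serviceKey, int hopsToLive, Set<String> visited) {
        if (!visited.add(id)) {
            return null; // this node has already handled the request
        }
        if (localServices.contains(serviceKey)) {
            return id; // this node can serve the request itself
        }
        String cachedProvider = cachedRegistries.get(serviceKey);
        if (cachedProvider != null) {
            return cachedProvider; // a cached registry points directly at a provider
        }
        if (hopsToLive <= 0) {
            return null;
        }
        for (RoutingNode neighbor : neighbors) {
            String found = neighbor.route(serviceKey, hopsToLive - 1, visited);
            if (found != null) {
                return found;
            }
        }
        return null;
    }
}
```

In this sketch a cached registry shortens the path: a node that holds one answers immediately instead of forwarding the request further.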
5.1 Service Registries and Requests: HAND Network Agents HAND network agents are the application-level interface to the node of a HAND network. Agents are used either to distribute a resource, or to gather resources and use them for some application. If, as a part of the application, code needs to be sent to another HAND network node for execution, that code is placed into a HAND network applet. The applets can define particular computations and/or message handlers. Agents advertise services using service registry objects. The registry contains a primary search string to help categorize the service and enhance network routing of requests to a possible match. It also holds a list of class names, attributes and a unique service id. Agents gather services using service request objects. The request includes a primary search string, a list of attributes the service should have, and a list of attributes the service should not have. An agent providing a service registers the service with the network and then waits for requests. When a service request arrives, the agent checks for a match with the service it provides. If a match occurs, the agent sends an offer of service to the requesting node and spawns a listener to handle further communications. Conversely, an agent requesting a service builds a service request object and sends it over the HAND network to the agent providing the service. When it receives a response, the service requester spawns a listener to deal with further communications between the nodes. Depending on the service received, communication may involve sending code or data between the two nodes.
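The registry and request objects described above might be modeled along the following lines. This is a sketch only: the class names ServiceRegistry and ServiceRequest, their fields, and the matches method are hypothetical, and the rule that the primary search strings must be equal is an assumption, since the text specifies only that attributes are compared.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical service advertisement: search string, class names, attributes, unique id. */
final class ServiceRegistry {
    final String searchString;
    final Set<String> classNames;
    final Set<String> attributes;
    final String serviceId;

    ServiceRegistry(String searchString, Set<String> classNames,
                    Set<String> attributes, String serviceId) {
        this.searchString = searchString;
        this.classNames = classNames;
        this.attributes = attributes;
        this.serviceId = serviceId;
    }
}

/** Hypothetical service request: search string plus required and forbidden attributes. */
final class ServiceRequest {
    final String searchString;
    final Set<String> required;
    final Set<String> forbidden;

    ServiceRequest(String searchString, Set<String> required, Set<String> forbidden) {
        this.searchString = searchString;
        this.required = required;
        this.forbidden = forbidden;
    }

    /**
     * A registry matches when it is in the same category (assumed here to mean
     * equal search strings), offers every required attribute, and offers none
     * of the forbidden ones.
     */
    boolean matches(ServiceRegistry registry) {
        return searchString.equals(registry.searchString)
                && registry.attributes.containsAll(required)
                && Collections.disjoint(registry.attributes, forbidden);
    }
}
```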
5.2 A Parallel Computation Example There are three parts to a parallel computation: the agent that serves the CPU resource, the applet that describes the computation on the remote CPUs, and the agent that gathers the CPUs and organizes them into a computation unit. Each agent for a HAND network node providing its CPU as a resource sets up a registry and registers it with the network. The agent then listens for requests and, upon receiving a request for the CPU, spawns a thread to handle communications and sends the requester a response that includes a communication handle to use for future messages to this agent. As the following code shows, distributing the CPU resource is simple because all other operations are handled generically by the HandNetAgent.
The agent for the node requesting CPU resources first builds the request, then sends it to the HAND network. As the requesting agent receives responses, it adds listeners to communicate with the acquired CPUs. Upon receiving a resource, the agent can communicate with it: in this example a RUN_APPLET message is sent instructing the nodes to run MaxApplet. The listeners handle all further communication involved in sending class bytes to the remote CPUs; requests for the code are generated automatically through an overloaded class loader. After ordering the applet to start, the agent sends data and waits for responses. The responses arrive in an APPLET_RESULT message, which is handled by an overloaded message handler for the agent. As results are received, the global max is computed. The full code for this agent follows.
The last piece of the example is a HandNetApplet for computing the maximum. It is loaded by each CPU node when it receives the RUN_APPLET message. Applets include three key functions. The first is a message handler, which enables the design of application-specific messages (the receipt of the data values in this case). The second applet function calculates the maximum. The third applet function is the run() function, called when the applet starts. In this example, it waits for data, calculates the max, and sends the result back to the main node of the computation.
The example above is a master-slave computation typical of distributed computing applications. It is also possible to allow the gathered CPU resources to communicate with each other. The modifications to the example code above are relatively simple. The MaxAgent sends an array of communication handles (node identifiers) to each of the CPUs. The CPUs find their neighbors by moving to the left and right in this array from the index at which their handle is located.
The MaxAgent waits for only one response, which is the final result. After receiving the array of communication handles and the data, the applet code implements a typical tree computation. At stage i (counting from 1), each sending node sends its maximum result to the node indexed by (nodeIndex - nodeIndex % 2^i). Thus, in the first stage, odd-indexed CPUs send their maximum result to the even-indexed CPU immediately below them. The second stage sends to the CPU at twice this distance, and so on. When only one CPU remains, its result is sent to the MaxAgent.
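The tree computation can be simulated in a single process to check the indexing. The sketch below is not the applet code: the class TreeMax and its methods are hypothetical, message passing is replaced by array updates, and it assumes that the modulus doubles at each stage (2^i at stage i, counting from 1), which is what sending to "twice this distance" requires.

```java
/** Hypothetical single-process simulation of the tree reduction; not the applet code. */
final class TreeMax {
    /** Destination index for a sender at the given stage (stages count from 1). */
    static int destination(int nodeIndex, int stage) {
        int modulus = 1 << stage; // 2^stage, assumed to double each stage
        return nodeIndex - nodeIndex % modulus;
    }

    /**
     * Combines per-node local maxima into a global maximum at node 0.
     * Requires at least one node. At each stage the senders are the nodes
     * whose index is an odd multiple of 2^(stage-1).
     */
    static int reduce(int[] localMax) {
        int[] current = localMax.clone();
        int n = current.length;
        for (int stage = 1; (1 << (stage - 1)) < n; stage++) {
            int half = 1 << (stage - 1);
            for (int node = half; node < n; node += 2 * half) {
                int dest = destination(node, stage);
                // Stands in for "send my maximum to dest".
                current[dest] = Math.max(current[dest], current[node]);
            }
        }
        return current[0]; // the last remaining CPU reports to the MaxAgent
    }
}
```

With n CPUs the reduction finishes after about log2(n) stages, and node 0 is the one that reports the final result.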
5.3 Fault Tolerance Issues At the application level, our model makes little obvious effort to handle computational difficulties related to nodes leaving the network. The network infrastructure automatically handles reconfiguring the network as necessary, so the main disruption to computation is the disappearance of a computing node. We address this issue via the service model and the use of message timeouts. The service model we use allows application writers flexibility in addressing node reliability particular to a specific network. The inclusion of attributes in service registries and requests allows nodes to lease resources for a range of time. While not a guarantee of reliability, it does allow the service to speculate on its reliability. Application writers may choose to break their tasks
up into smaller pieces and issue requests for leases at each stage. (Easy re-leasing of a service is not yet included in our model but conceivably could be added to speed up these requests.) When a node disappears, the timeout from messages sent to it is propagated up to the agents sending messages. This allows a variety of options for the application writer ranging from a restart of the entire computation to a request for further resources to a redistribution of tasks.
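One of the options mentioned above, redistributing the tasks of a node that has timed out, might look like the following sketch. The class TaskBoard and its methods are hypothetical, as is the round-robin policy; when no other node remains, the unfinished tasks are handed back so that the application can request further resources or restart.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical bookkeeping for outstanding tasks; the policy here is illustrative only. */
final class TaskBoard {
    // Node id -> ids of tasks assigned to that node and not yet completed.
    private final Map<String, List<Integer>> pending = new LinkedHashMap<>();

    void assign(String node, int taskId) {
        pending.computeIfAbsent(node, k -> new ArrayList<>()).add(taskId);
    }

    void completed(String node, int taskId) {
        List<Integer> tasks = pending.get(node);
        if (tasks != null) {
            tasks.remove(Integer.valueOf(taskId)); // remove by value, not by position
        }
    }

    List<Integer> tasksOf(String node) {
        List<Integer> tasks = pending.get(node);
        return tasks == null ? new ArrayList<Integer>() : new ArrayList<Integer>(tasks);
    }

    /**
     * Called when a message timeout for lostNode reaches the agent. Unfinished
     * tasks are spread round-robin over the remaining nodes. Returns the tasks
     * that could not be reassigned because no other node is left.
     */
    List<Integer> onTimeout(String lostNode) {
        List<Integer> orphaned = pending.remove(lostNode);
        if (orphaned == null) {
            return new ArrayList<>();
        }
        List<String> survivors = new ArrayList<>(pending.keySet());
        if (survivors.isEmpty()) {
            return orphaned; // caller may request further resources or restart
        }
        for (int i = 0; i < orphaned.size(); i++) {
            assign(survivors.get(i % survivors.size()), orphaned.get(i));
        }
        return new ArrayList<>();
    }
}
```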
6. CONCLUSION This paper describes work in progress on a model for constructing a network infrastructure of handheld machines that tolerates the dynamic nature of a mobile ad hoc network and provides the foundation for applications to take advantage of this type of computing network. We have described the requirements and constraints of a HAND network and have described an implementation model for the infrastructure to support such a network. We have discussed the ways we will use this model for parallel processing. There are several open questions concerning HAND networks. One area of further study involves the network performance and the overhead of maintaining the network. In particular, we need to determine whether the protocol we have designed is lightweight enough to leave resources available for sharing on the handheld machines. Another area for further study involves the design, analysis and implementation of parallel algorithms on these machines. Given a HAND network, we are interested in finding a granularity of data and task length that uses the network optimally.
7. REFERENCES
[1] Batcher, K. Design of a Massively Parallel Processor. IEEE Transactions on Computers, Vol. 29, No. 8, September 1980, pp. 836-840.
[2] Clarke, I., Sandberg, O., Wiley, B., and Hong, T.W. “Freenet: A Distributed Anonymous Information Storage and Retrieval System”, in Designing Privacy Enhancing Technologies: International Workshop on Design Issues in Anonymity and Unobservability, LNCS 2009, ed. by H. Federrath. Springer: New York (2001).
[3] Geist, G.A. and Sunderam, V.S. “The PVM System: Supercomputing Level Concurrent Computations on a Heterogeneous Network of Workstations”, Sixth Distributed Memory Computing Conference Proceedings, April 1991, pp. 258-261.
[4] Hayes, B. “Computing Science Collective Wisdom”, American Scientist, March-April 1998.
[5] Snir, M., Otto, S., Huss-Lederman, S., Walker, D., and Dongarra, J. MPI: The Complete Reference, MIT Press, 1998.
[6] Sun Microsystems. “Jini[tm] Technology Architecture Overview”, available online at http://www.sun.com/jini/whitepapers/architecture.html.