Federated grid clusters using service address routed optical networks

Future Generation Computer Systems 23 (2007) 957–967 www.elsevier.com/locate/fgcs

Isaac D. Scherson*, Daniel Valencia, Enrique Cauich, John Duselis, Richert Wang

Department of Computer Science – Systems, Donald Bren School of Information and Computer Sciences, University of California, Irvine, CA 92697, United States

Received 31 October 2006; received in revised form 13 April 2007; accepted 19 April 2007; available online 5 May 2007

Abstract

Clusters of computers have emerged as cost-effective parallel and/or distributed computing systems for computationally intensive tasks. Normally, clusters are composed of high performance computational nodes linked together by low-latency/high-bandwidth interconnection networks. With the advent of modern optical networking technologies, geographically distant clusters can be federated to yield systems considered tightly-coupled. By using Service Address Routed (SAR) optical networks, cluster federations are shown to be effective in dealing with complex scientific computations in a manner that is transparent to the user. The analysis of such federated clusters is carried out using a discrete event simulator. The findings include means to control the tradeoff between user response time and overall completion time, the advantages and disadvantages of exploiting and giving up locality, and how meticulous control over the level of greediness can yield noticeable performance improvements.

© 2007 Elsevier B.V. All rights reserved.

1. Introduction

Computer clusters provide processing and storage resources to achieve high performance computing through concurrency and parallelism. For complex and computationally intensive problems, some research organizations share and organize their computing clusters into federations to increase the availability of their computational power while preserving the Single System Image (SSI). Normally, the SSI is provided by the Distributed Operating System running on the individual nodes [20], though a recent network architecture named Service Address Routing (SAR) provides that functionality within the network [22]. Cluster federations face inherent challenges, including resource discovery, load distribution and high communication latencies, especially in shared networks such as the Internet. Being widely adopted and ubiquitous, the Internet is a convenient means for interconnecting clusters into federations despite its unpredictable performance, lack of security and overhead. However, emerging technologies, such as lambda routing in optical networks [26], provide an alternative solution and eliminate some of the Internet's inconveniences.

* Corresponding author. Tel.: +1 949 824 8144; fax: +1 949 824 4056.

E-mail address: [email protected] (I.D. Scherson).

0167-739X/$ – see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.future.2007.04.012

Optiputers, proposed by Larry Smarr [12,18,26], are computer clusters that use optical interconnects to speed up communication between the clusters. Optical networks can be as much as 100 times faster than a regular 100 Mbps Ethernet network, which is used in some laboratories as the fabric for interconnection networks for tightly coupled distributed systems [26]. An example of an optiputer would be a set of clusters interconnected by dedicated optical fibers [3,14,25,28,31] sharing computing resources. Consider the distance between two sister campuses within the University of California: UC Irvine and UC San Diego. An optical signal will travel that distance in about 1 ms in an optical fiber. By comparison, copper-wired Ethernet networks will typically have a latency of approximately 100 ms (point-to-point link), which is two orders of magnitude slower. In optimal network conditions, a signal traveling through the Internet would take about 5 ms to traverse such a distance.

The purpose of this work is to analyze the use of the Internet and dedicated optical fibers as the interconnection media for producing scalable federations of heterogeneous SAR clusters. It studies the behavior of Network Embedded Operating System (NEOS) based clusters and their interaction. Mechanisms for load distribution and service discovery are presented as part of the architecture for SAR. Finally, experimental information on the effect that varying parameters of the model has on performance and on the degree of migration is obtained through simulation.
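As a rough sanity check on the fiber figure above, the one-way propagation delay can be estimated from the campus-to-campus distance. The values used here (a route of roughly 120 km and a propagation speed in fiber of about 2 × 10^8 m/s, i.e. c/1.5) are illustrative assumptions, not measurements from this work:

t = d / v ≈ (1.2 × 10^5 m) / (2 × 10^8 m/s) = 0.6 ms

Allowing for a real fiber route somewhat longer than the straight-line distance, this is consistent with the approximately 1 ms figure quoted above.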


This paper is organized as follows: Section 2 discusses similarities and differences with related work; Section 3 describes the system architecture and SAR; Section 4 presents the algorithms for resource discovery and load distribution; lastly, Section 5 describes the simulated architecture and discusses the results observed.

2. Related work

There exist various service lookup mechanisms for locating services in a distributed system, most of them using centralized entities, such as Jini [11] and CORBA [30]. Although there can exist several servers in one system, each of them is a centralized information repository for its own group. In SAR, when a hierarchical structure exists, each switching element is a module of a distributed equivalent of such an entity. Furthermore, SAR has several other abilities, including high scalability, load distribution capabilities and process migration. Although there is a plethora of work on overcoming such problems [4,6,8–10,13,17,23,24,27], most of it introduces overheads and latency at the processing elements.

Although systems relying on the SAR paradigm have similarities with Content Addressable Networks (CAN) [19] (one could establish a correspondence between keys and service identifiers, and between resources and values), the basic difference lies in the service identifier used by SAR. This means that the interconnection fabric understands services and requests. Furthermore, CANs were conceived for distributing information, whereas SAR is meant for finding and managing resources of any kind.

Globe [29] is a location-independent object locator aimed at wide-area systems. It uses a binding between the name of an object and one or more addresses where the object can be located. Globe is hierarchical in nature, as it divides the system into regions, each of which has a directory node that can have directory subnodes. This service also makes use of pointer caches at stable locations for contacting records. A search for a contact is done by a standard tree search algorithm. The ability to store different locations for a single handle is a main contribution of this system.

Carmen [16] and DIET [1,2] are two examples of hierarchical distributed resource locators. Their hierarchical architecture may look similar to a SAR LCAN. However, Carmen does not provide efficient resource management, and even though DIET does, it floods the entire network with broadcast messages every time a request is generated. In contrast, SAR provides more intelligent mechanisms by balancing requests locally, which leads to better network utilization. Furthermore, not only does SAR provide service discovery, load distribution and load balancing mechanisms in a hierarchical fashion, it also does so within the network fabric in a completely distributed manner. To the best of our knowledge there is no other interconnection network capable of providing resource management functionality for distributed systems using services as a basis to route, balance and migrate requests within the system.

Fig. 1. SAR clusters connected via the Internet.

Fig. 2. An example of a hierarchical network is this 3-layered 4 × 3 least common ancestor network (LCAN) interconnecting 64 nodes of a cluster.

3. System architecture

Local SAR clusters are interconnected to form a complete system federation as shown in Fig. 1. The purpose of the system is to exploit the scalability inherent in SAR and facilitate collaboration between distant clusters. Clusters may have any internal topology and any means of interconnecting to the long-distance clusters.

3.1. Network topology

The topology considered in this model is a wide-area interconnection network. The Internet or dedicated optical links are used to allow information interchange between distant clusters. Internally, the topology of each cluster is left unspecified; nonetheless, it is assumed to be a Service Address Routed network. In this study, the individual clusters are interconnected by hierarchical networks with support for redundancy [21], as shown in Fig. 2.

State-of-the-art optical networking devices include all-optical switches capable of routing data packets without translating them from optical to electronic form and back. Thus, network latency is considerably reduced by avoiding the electro-optical conversions and hence the much slower electronic processing of packets. Clearly, by working only in the optical domain, power requirements are also reduced drastically and the savings can reach at least a factor of 100 [15].
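To make the hierarchical structure of Fig. 2 concrete, the following sketch derives per-level switch counts for a complete LCAN. The counting rule used here (each level-i switch has `downers` children and `uppers` parents, giving downers^(levels−i) · uppers^(i−1) switches at level i) is our illustrative reading of the LCAN construction in [21], not a formula taken from that paper:

```python
def lcan_switch_counts(levels: int, uppers: int, downers: int):
    """Node count and per-level switch counts for a complete LCAN,
    under the assumed counting rule described in the text above."""
    nodes = downers ** levels
    counts = {i: downers ** (levels - i) * uppers ** (i - 1)
              for i in range(1, levels + 1)}
    return nodes, counts

# The 3-layered 4 x 3 LCAN of Fig. 2: 64 nodes, with 16/12/9 switches
# at levels 1/2/3 under this construction.
nodes, per_level = lcan_switch_counts(levels=3, uppers=3, downers=4)
print(nodes, per_level)  # 64 {1: 16, 2: 12, 3: 9}
```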


Lucent Technologies has introduced such optical switches, which allow a bandwidth of 2 Tbps in a single fiber, based on technologies recently developed at Bell Labs.

Hierarchical architectures have been shown to be efficient, scalable and flexible designs for interconnection networks between a myriad of processing elements. The CM-5 (Connection Machine 5) system showed that fat-trees [5] increase the performance of data-parallel applications as the size of the data and the processing resources increase [7].

Fig. 3. Traditional routing.

3.2. Service address routing (SAR)

In traditional network architectures, source nodes specify the physical addresses of destination nodes according to some protocol (e.g. IP, Ethernet). SAR is a means to select destination nodes by the name of the services they are capable of providing. Thus, resource allocation and replication, together with resource load balancing and scheduling, are implemented in the switching network rather than in the middleware of the cluster nodes. The result is a more efficient system where the network is aware of all the services offered within the nodes and of all the services provided by distant clusters. All a sender is required to know is a Service Address that the network will use to route service requests to a local or distant service provider. Note that this approach is similar to some name services used on the Internet. The key difference is that our switching fabric provides resource management intelligence, hence the name NEOS. This is due to the fact that the nodes do not explicitly communicate with each other, but request services that the switches are free to control and allocate. Moreover, our intelligent network provides inherent support for fault tolerance as well as process migration without overloading the computational nodes of the connected clusters. A network that incorporates SAR is said to be intelligent, and is therefore also referred to as an Intelligent Interconnection Network (I²N).

A Service is an abstraction used to represent capabilities that can be provided by individual nodes to the rest of the system. In the traditional Internet client-server model, when an application requires some functionality, it must either provide it by itself or be able to find the address of the server where the functionality exists. With a SAR switching network, regardless of the physical or logical distribution of the nodes, once a service is requested, the network locates an appropriate serving node. For the sake of comparison, Fig. 3 shows how traditional physical-address routing is done, while Fig. 4 depicts service-addressed requests.

The main advantage of incorporating SAR into the system is that it is relatively easy (and transparent to the applications) to implement load distribution, load balancing, reliability and fault tolerance, among other functions, in a system composed of a multiplicity of geographically distant clusters with heterogeneous capabilities.
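The contrast between Figs. 3 and 4 can be condensed into a few lines of Python. The message fields, the table contents and the switch-side dispatch below are illustrative assumptions; the paper does not define a wire format:

```python
# Traditional routing: the sender must already know the provider's address.
ip_request = {"dst": "192.168.1.7", "payload": "matrix-multiply(args)"}

# Service Address Routing: the sender names only the service; the switch
# picks a provider using its Service Table.
sar_request = {"service_id": "matrix-multiply", "payload": "args"}

service_table = {  # service_id -> best-known provider in this subnetwork
    "matrix-multiply": {"node_id": 12, "workload": 3.5},
    "fft":             {"node_id": 40, "workload": 0.8},
}

def route(request):
    """Deliver a SAR request to the best-known provider, or escalate
    to the parent switch if the service is unknown here."""
    entry = service_table.get(request["service_id"])
    if entry is None:
        return "forward to parent switch"
    return f"deliver to node {entry['node_id']}"

print(route(sar_request))  # deliver to node 12
```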

Fig. 4. Service-address routing.

Fig. 5. Internal architecture of a LCAN switch.

4. Load distribution and service discovery process

The main source of information for the in-network intelligence of SAR is the Service Table. As mentioned in [22], in a distributed (e.g. hierarchical) switch, the table must be distributed amongst its modules, hereafter referred to as switches. The Service Table entries are filled by the service registration process: each node registers the services it is capable of providing to the network. The service registration process is dynamic, which means any node can register and unregister services at any given time [22]. Fig. 5 shows the Service Table for each downer inside a single switch.

4.1. Service Table fields

Each downer port within a switch contains a Service Table. Each Service Table entry has specific data that is used to make routing decisions for resource location and load distribution. Table 1 shows the information found in each Service Table entry and a description of each field.


Fig. 6. Service Table using level-global knowledge.

Table 1. Service Table entries

Field | Description
Service ID | A unique Service ID within the table, used for the search
Node ID | The Node ID of the best processor in the port's subnetwork that can provide the service specified in Service ID
Workload | Current amount of work that the node has
Processor speed | Unspecified metric of the node's processor speed
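Table 1 maps naturally onto a record type. A minimal sketch follows; the field names are taken from Table 1, but the class itself is our illustration:

```python
from dataclasses import dataclass

@dataclass
class ServiceTableEntry:
    """One per-downer Service Table row, mirroring Table 1."""
    service_id: int         # unique service ID used for the search
    node_id: int            # best provider of this service in the port's subnetwork
    workload: float         # current amount of work queued at that node
    processor_speed: float  # unspecified metric of the node's processor speed
```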

4.2. Level-global knowledge

Each downer must store information on every single service for which it can provide a path. The Service Table grows exponentially at each higher level of the hierarchy, but all switches have a complete view of the subnetwork beneath them (Fig. 2). For a switch in level i, the table in each downer will have up to S·u^(i−1) entries, where S is the maximum number of different services a node can provide, and u is the number of uppers per switch.

To provide faster service lookups, the Service Table is replicated at each level of the hierarchy. The number of roots per subnetwork in a level corresponds to the number of times the Service Table is replicated in that level, and depends on the number of uppers per switch. In the first level, the union of all local tables is equivalent to one copy of the Service Table; in the second level, the local tables are replicated in each parent of their corresponding subnetwork, which is the number of uppers per switch; at the topmost level, each switch contains a complete replica of the Service Table.

An advantage of this model is the fact that a service provider can be located in a switch's subnetwork as soon as the request arrives at that switch, as shown in Fig. 6. With appropriate service search and load distribution algorithms, this implementation would be expected to yield optimal performance. The challenge is that the knowledge stored in the switches is only relatively global: the information is complete, but just for the subnetworks under their umbrella. The search and allocation algorithm must be aware of this fact, and make an appropriate assignment of resources without creating unnecessary traffic.
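To make the S·u^(i−1) bound concrete, the small helper below evaluates it with the parameters of the experimental model of Section 5 (S = 5 services per node, u = 2 uppers per switch); the helper itself is only an illustration:

```python
def max_entries(S: int, u: int, level: int) -> int:
    """Upper bound on per-downer Service Table size at a given level:
    S * u**(level - 1), as stated in Section 4.2."""
    return S * u ** (level - 1)

for i in range(1, 5):
    print(f"level {i}: up to {max_entries(5, 2, i)} entries")
# level 1: up to 5, level 2: up to 10, level 3: up to 20, level 4: up to 40
```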

4.3. Level caches

In an effort to alleviate the exponential table growth, a design is proposed where all Service Tables are of the same size. One advantage of this model is that, because of the constant size of the per-port Service Tables, there is no difference among switches, regardless of the level to which they belong. Another advantage is that the number of levels in the network can grow arbitrarily without increasing the complexity of the switches at each level, as shown in Fig. 7.


Fig. 7. Service Table using caches.

Using this design, the union of the tables in the bottommost switches would again be equivalent to a complete, system-wide service table. However, for any downer belonging to a switch in level 2 or higher, the information contained in the table may not be complete. Thus, these tables are considered caches of the complete service table.

The advantages of having constant-sized caches were already discussed, but there are disadvantages and challenges as well. The more levels there are in the hierarchy, the smaller the caches become relative to the number of services in the system. This means that services are more likely not to be found, and the replacement of entries is more likely to happen. A good algorithm will not be deterred by the lack of information, but will manage to provide good system performance and keep only the most relevant entries.

When a service request arrives at a switch and there is a cache hit, the switch forwards the request through the downer with the least recorded workload. However, in the event of a cache miss, a Question message is sent through all available downers asking their children if they can provide a path for that service. When such a message arrives at a switch and any instance of the service is found in the tables, an update is generated and broadcast up the hierarchy; otherwise, the Question messages continue to be broadcast down the subnetworks. The way this algorithm works, a request will not wait for the update to arrive, but will instead keep propagating upwards until a suitable server is found or the request process is stopped and rescheduled. Therefore, the update will only benefit later requests.

The concept of SAR caches is as novel as the concept of SAR itself, and has little relation to Internet caches even though we use the Internet as a means of interconnecting distant clusters. While SAR caches are used to locate resources and are used inherently in the routing process, Internet caches hold pieces of content that users might seek, addresses, and routing patterns. Thus, unlike in SAR, Internet content caching and routing-information caching are two separate and independent concepts.
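The cache behavior just described can be sketched as follows. The paper does not specify a replacement policy, so least-recently-used eviction is assumed here purely for illustration, as are all names:

```python
from collections import OrderedDict

class LevelCache:
    """Constant-size per-downer service cache (Section 4.3), with an
    assumed LRU replacement policy."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # service_id -> (node_id, workload)

    def lookup(self, service_id):
        if service_id in self.entries:            # cache hit: the request
            self.entries.move_to_end(service_id)  # is forwarded immediately
            return self.entries[service_id]
        return None                               # cache miss

    def update(self, service_id, node_id, workload):
        """Install or refresh an entry (e.g. on an Update message),
        evicting the least recently used entry when the cache is full."""
        if service_id not in self.entries and len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)
        self.entries[service_id] = (node_id, workload)
        self.entries.move_to_end(service_id)

def on_request(cache: LevelCache, service_id):
    hit = cache.lookup(service_id)
    if hit is not None:
        return f"forward via downer holding node {hit[0]}"
    # On a miss the request keeps propagating upwards; Question messages
    # are broadcast down, and any resulting Update only helps later requests.
    return "broadcast Question down all downers; escalate request upwards"
```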

Fig. 8. Resource discovery and load distribution algorithm.

4.4. Load distribution

This algorithm, condensed in Fig. 8, is based on finding the most suitable node for each job. The real execution time required by a node for a particular task is not known; therefore, an approximation is needed. A server node deduces its actual workload from the approximations calculated by the requesting nodes. The algorithm can be described in three steps: approximating the service execution time, finding the best-suited node for each service, and routing the request to the serving node.


Every time a node requests a service, it calculates an approximate prediction of the amount of work that the service will require. The node's predicted workload is calculated as

P(ER_n) = 0.9 P(ER_{n−1}) + 0.1 ER_{n−1}    (1)

where P(ER_n) is the predicted execution time for the new service request, P(ER_{n−1}) is the previously predicted execution time, and ER_{n−1} is the actual execution time of the last service completed. The calculated value is sent within the Request message, and is used to estimate the most capable server to perform the service. Using this predicted value, each server calculates the expected time to complete the service and its total estimated workload, P(W_n).

Each time the estimated workload in a server node changes, due to a new service request or a service completion, it sends an Update message to its parent switch announcing its new workload, its Node ID and the ID of the service that it will provide or has just finished providing. When a switch receives an Update message through a downer, it performs two operations: first, it modifies the workload values for all the entries in that specific port's table with the same Node ID, regardless of the service; second, it compares entries from all the port tables with that specific Service ID, and announces the best node for that specific service, along with its workload, to its parent switches using Update messages. This procedure repeats until all root switches are reached.

When a node sends requests to the network for services to be processed, the network switches utilize their Service Table information to route the request towards the most efficient node capable of providing the requested service. Using the predicted value, each switch calculates the expected time to complete the service based on its port's bandwidth latency and the recorded workload of the prospective server. This is calculated (in seconds) as

P(C(R_n)) = P(W_n) + P(R_n)/Node Speed + D(t)    (2)

where P(C(R_n)) is the expected completion time and D(t) is the data transmission time. Each switch's Service Table keeps track of the Node ID, workload, and processor speed of the node that can best process each service in its subnetwork umbrella.

When a service request reaches a switch, the switch checks whether its subnetwork is capable of providing the service by looking for an entry in its Service Table with the specified Service ID. If the service cannot be provided anywhere within the switch's subnetwork umbrella, the service request is forwarded to its parent switch. If the service exists within the switch's subnetwork, the switch forwards the service request message to its parent switch, announcing an estimate of the workload the node would have if it were chosen to fulfill the service. In this case, the parent switch checks whether there exists another node within its bigger subnetwork that can provide the service faster by at least some threshold, which would produce a more balanced system. If the parent switch can do better than its child switch in finding the best server, i.e. a node exists within the parent switch's subnetwork that has less workload, then that switch forwards the service request to its respective parent switch with the better node's workload. This process continues until the service request message reaches a root switch, or until a parent cannot do any better than its child, in which case the parent sends a message to the child telling it to choose its best node for the service. The child switch then chooses the port that can route to the node with the best execution time and forwards the request through this port.

When a root switch obtains a service request, it can decide to process the service locally or forward the service request to a distant cluster. The root will ask the clusters capable of providing the service for their expected completion time for the task, and all clusters will respond with their best nodes' workloads. If the root does not obtain any workload values better than the one it possesses, it forwards the service request message to its best node for local processing. Otherwise, if it considers that a remote cluster can perform better than the best known local node by at least a certain threshold (which will be referred to as the migration threshold), it sends the service request message to that cluster. Once the service request reaches the remote cluster, the request is forwarded down to the best node like a local request.
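A compact sketch of Eqs. (1) and (2) together with the threshold test used during upward forwarding. The function names are ours, and the threshold is treated as a relative margin, which is one plausible reading of "faster by at least some threshold":

```python
def predict_execution(prev_prediction: float, last_measured: float) -> float:
    """Eq. (1): exponentially weighted prediction of a request's work."""
    return 0.9 * prev_prediction + 0.1 * last_measured

def expected_completion(workload: float, predicted_work: float,
                        node_speed: float, transmit_time: float) -> float:
    """Eq. (2): P(C(R_n)) = P(W_n) + P(R_n)/node_speed + D(t), in seconds."""
    return workload + predicted_work / node_speed + transmit_time

def parent_takes_over(child_best: float, parent_best: float,
                      threshold: float) -> bool:
    """A parent keeps escalating only if its best node beats the child's
    best completion estimate by at least the (relative) threshold."""
    return parent_best < child_best * (1.0 - threshold)

# Illustration: with a 5% threshold, a parent whose best node would finish
# in 90 s takes a request away from a child whose best estimate is 100 s.
print(parent_takes_over(child_best=100.0, parent_best=90.0, threshold=0.05))  # True
```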

5. Experiments

A discrete event simulator built for studying SAR clusters with hierarchical networks was used to produce experimental data on the behavior of these federated clusters. The purpose of this experimental study is to obtain information on the performance of NEOS-based clusters when federated together. Furthermore, three degrees of freedom were defined by selecting three independent variables and observing their effect on overall system and individual task performance. These variables are: the intercluster interconnection fabric, the migration thresholds and the cache sizes.

5.1. Experimental model

The modeled system consists of three SAR clusters; each one is a 4-layer complete Least Common Ancestor Network (LCAN) with two uppers and three downers per switch, and 81 nodes connected to its bottommost switches. Service tables can contain the complete list of the services provided or just a fraction, depending on the configuration of the system. Four different service table sizes were modeled: five-, ten-, fifteen- and thirty-entry tables (C5, C10, C15 and C30 in the figures). Architectures with the first three configurations are referred to as Level Caches, given that they can only hold information for a fraction of the services provided, and replacement of the entries becomes necessary as mentioned in Section 4.3. When the service table size is large enough to hold the maximum number of services provided in the subnetwork, in this case C30 since there are only thirty distinct services, it is called Level-global knowledge. The size of the level caches is varied to study how system performance is affected.

The model includes 30 different services uniformly distributed among the nodes. Each node can provide five different services; therefore, a total of 405 service instances are provided by each cluster's nodes.

Table 2. Independent variables varied throughout the experimentation process

Variable | Description | Values
Intercluster interconnection network | The interconnection network that connects the federation clusters | Internet: links with variable bandwidth, ranging between 1 and 10 Mbps. Optical: deterministic behavior, link speed of 10 Gbps
Cache size | Number of entries in each service table | Level caches: 5, 10 and 15 entries. Level-global knowledge: 30 entries
Migration threshold | Factor of workload balance among nodes | Values of 1%, 5% and 10% at all levels

Table 3. Frequency of test case workload

Type of frequency | Frequency of requests (average)
Lowest | 1 request/5 s
Low | 1 request/3 s
Gradual | 1 request/1.5 s
High | 1 request/0.66 s

Another parameter studied is the workload balance factor among nodes, which is controlled through the migration threshold. When a switch finds a better-suited node for a particular service request, it must decide whether that node is better enough to be selected; if the difference is above the migration threshold, the new node is deemed worthy. During the experimentation process, this parameter took values of 1%, 5% and 10% of the best-known node's workload.

For the different experiments, the three clusters were connected with two different interconnection media: the Internet and optical fibers. Dedicated optical interconnections between clusters are assumed to be two to three orders of magnitude faster than the Internet and one order of magnitude faster than local interconnects. Delays and bandwidth while traversing the Internet are varied dynamically to better represent the changing nature of such a network. These variables are summarized in Table 2.

5.2. Simulation

Different workload configurations are used to see how the system reacts to variable amounts of data and network traffic. For each workload, 3000 requests are issued by a single cluster. The workloads are classified as lowest, low, gradual, and high depending on the request frequency. These classifications are shown in Table 3. The service time for each request is uniformly distributed among requests and ranges from 1 to 300 s. All service requests are issued by the nodes of a single cluster, and the requests are uniformly distributed among those nodes. The distant clusters are then used by the load distribution algorithm in case the local cluster becomes overloaded. Every combination of workload frequency, cache size and migration threshold is simulated.
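A sketch of how the simulated request streams could be generated from Table 3 and the parameters above. Exponentially distributed inter-arrival gaps are an assumption (the paper specifies only average frequencies), as are all names:

```python
import random

# Average request inter-arrival times from Table 3 (seconds).
WORKLOADS = {"lowest": 5.0, "low": 3.0, "gradual": 1.5, "high": 0.66}

def generate_requests(kind: str, n: int = 3000, seed: int = 0):
    """Issue n requests at the given average frequency; service times are
    drawn uniformly from 1-300 s as in Section 5.2."""
    rng = random.Random(seed)
    t, requests = 0.0, []
    for _ in range(n):
        t += rng.expovariate(1.0 / WORKLOADS[kind])  # assumed arrival process
        requests.append({"issue_time": t,
                         "service_id": rng.randrange(30),       # 30 services
                         "service_time": rng.uniform(1, 300)})  # seconds
    return requests

reqs = generate_requests("high")
print(len(reqs), reqs[0])
```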


5.3. Results

There are several findings in the experiments; the most important ones are related to the degree of load balance given the variables considered throughout the experimentation process. The results are organized and interpreted in three sections corresponding to the interconnection medium, the migration threshold and the cache size.

5.3.1. Intercluster interconnection media

Heavily loaded clusters show a better balanced load when connected with dedicated optical networks, because it is more often cheaper to fulfill a request in a remote cluster than in the local cluster. When the clusters are connected through the Internet infrastructure, the workload is poorly balanced, with most of the requests served in the local cluster because it is faster to provide the service locally than to send it to any remote cluster, owing to the enormous communication delays. This behavior leads to unbalanced cluster federations, overloaded clusters and underutilized clusters. This can be observed in Fig. 9: the number of nodes actually used when the clusters are connected through the Internet is at most 90, while with optical networks that figure rises to 215 serving nodes. Furthermore, nodes show an average utilization of more than 65% in the clusters connected through the Internet, while the optical interconnects help lower that figure to 37%. There are more serving nodes in the system using optical interconnections because the cost of traveling to a distant cluster is much lower than when using the Internet. That means it is cheaper for more requests to migrate when using the faster network, which distributes workload to remote clusters, producing a more balanced system.

Additionally, service discovery mechanisms can perform poorly due to high network traffic and collisions caused by an overflow of requests being forwarded and rescheduled. This can also be observed in Fig. 9: service discovery latency is more than double for systems interconnected through the Internet compared with those connected with optical networks. It could be inferred that the latency is increased by the Internet connection latencies, but actually only a very small percentage of the services were fulfilled remotely. Therefore, it is not only the huge intercluster latencies that affect the process, but an overloaded local network that results from traffic caused by requests that did not migrate but were served locally.

A faster network with more deterministic behavior allows better advantage to be taken of thresholds and caching, for two reasons: the penalty of a misprediction is smaller, and fewer overheads have to be considered along with the cost of migration. This is also a reason why more concurrency is achieved with the faster optical networks versus the slower Internet. Furthermore, not only do optical interconnections provide a better load distribution among nodes that would otherwise not be used, but they also significantly reduce the individual task roundtrip time in clusters overloaded with requests; performance improvements range between 35% and 43% for Level-global knowledge and between 22% and 31% for all other cache sizes considered. This leads to a faster response time for the user in overloaded systems (Fig. 10).


Fig. 9. Performance comparison using different measurements for heavy workloads, varying between (left) using the Internet and (right) having dedicated optical fabric as intercluster interconnection network.

Fig. 10. Task time reduction using optical interconnections for intercluster communications instead of the Internet.

5.3.2. Migration thresholds

Thresholds provide the degree of migration by choosing whether to forward particular requests to upper levels. When a request reaches a root switch in a cluster, the threshold evaluates the worthiness of sending the request to every other cluster rather than processing it locally. Network congestion, communication latency and node workload can all be alleviated by dynamically adjusting thresholds. Small threshold values can produce unnecessary lookups in upper-level switches, reducing performance and increasing service discovery latency. Moreover, in some cases, such as switches with 15-entry caches (Fig. 11), small thresholds increase task completion times and service discovery overheads. Therefore, a proper threshold value also leads to faster service discovery by avoiding unnecessary lookups and network traffic. Furthermore, thresholds expose only a fraction of the resources in remote clusters (the actual number of underutilized processors in the entire system) rather than the whole set of processing resources available.
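The root-switch decision described in Section 4.4 can be sketched as follows; the migration threshold is again treated as a relative margin, and the function and cluster names are our illustration:

```python
def choose_cluster(local_best: float, remote_bests: dict,
                   migration_threshold: float) -> str:
    """Migrate only if some remote cluster's best completion estimate
    beats the best local option by at least the migration threshold."""
    best_cluster, best_time = "local", local_best
    for cluster, t in remote_bests.items():
        if t < best_time * (1.0 - migration_threshold):
            best_cluster, best_time = cluster, t
    return best_cluster

# With a 10% threshold, a remote estimate of 95 s does not justify leaving
# a 100 s local option, but 85 s does.
print(choose_cluster(100.0, {"remote-A": 95.0}, 0.10))  # local
print(choose_cluster(100.0, {"remote-A": 85.0}, 0.10))  # remote-A
```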

5.3.3. Cache sizes

Level-global knowledge does not necessarily help in achieving a low overall completion time. By allowing random partial knowledge to be used and by giving up partial control of locality, the overall completion time can be reduced. Fig. 9 shows optical networks with small level caches achieving up to 17% performance improvement compared to Level-global knowledge. However, analyzing the individual response time for users (Fig. 12), individual tasks lose performance when using level caches rather than Level-global knowledge. In conclusion, level caches not only provide scalability, but their randomness can also help the system achieve better job distributions, leading to a faster system throughput instead of a faster user throughput, which allows faster system decongestion.

By randomly ignoring nodes that are able to provide some services, the effect of having cache sizes set properly is to procure more nodes from distant clusters. In this manner, the local nodes are less loaded and the local network is less congested. Thus, an appropriate and conscientiously adjusted combination of thresholds and cache sizes can dramatically improve the overall system performance, provided that the advantages are not suffocated by slow network delays. For instance, if the goal is to keep the roundtrip time small, either global knowledge or sufficiently large caches have to be considered.


Fig. 11. Performance comparison of clusters connected by optical interconnection networks using different thresholds.

Fig. 12. Percentage of individual time increase when using different level cache sizes relative to Level-global knowledge.

6. Conclusions

Service Address Routing is a new field of research, and there is much to be explored and experimented with. In this work, several networks were simulated and their behavior was closely examined in the light of different criteria. The concept of SAR with Level Caches was introduced and analyzed. The intelligence distribution architecture proposed along with SAR in [22] has been named Level-global knowledge, because it provides global knowledge about the services provided in a particular subnetwork whose size depends on the level of the switch. SAR also contains a load distribution algorithm that proves to be proficient in allocating services, which contributes to the favorable results. A completely new service discovery and routing algorithm was proposed that considers node heterogeneity and task requirements. Intercommunication using optical fibers between loosely-coupled SAR LCAN clusters showed performance comparable to tightly-coupled systems. Lastly, the concept of a threshold for controlling the degree of greediness was introduced and analyzed.

Fig. 13. Important simulation results.


This work is the first in a vast series of practical and theoretical analyses that need to be done in the new field of SAR. The findings presented herein, and synthesized in Fig. 13, will provide a better understanding and design of distributed systems at any scale, and enable better exploitation of computing and communication resources. Future work includes global balancing of the clusters, process migration algorithm development, and fault tolerance of a SAR system.

References

[1] E. Caron, F. Desprez, DIET: A scalable toolbox to build network enabled servers on the grid, International Journal of High Performance Computing Applications (2005).
[2] E. Caron, A. Chis, F. Desprez, A. Su, Design of plug-in schedulers for a GridRPC environment, Future Generation Computer Systems (2007), doi:10.1016/j.future.2007.02.005.
[3] T. DeFanti, C. de Laat, J. Mambretti, K. Neggers, B. Arnaud, Blueprint for the future of high-performance networking: TransLight: A global-scale LambdaGrid for e-science, Communications of the ACM 46 (11) (2003).
[4] F. De Turck, S. Vanhastel, B. Volckaert, P. Demeester, A generic middleware-based platform for scalable cluster computing, Future Generation Computer Systems 18 (4) (2002) 549–560.
[5] R. Greenberg, C. Leiserson, Randomized routing on fat-trees, in: IEEE 26th Annual Symposium on the Foundations of Computer Science, IEEE, 1985.
[6] Y. Guo, P. Wendel, Developing a distributed scalable Java component server, Future Generation Computer Systems 17 (8) (2001) 1051–1057.
[7] W. Hillis, L. Tucker, The CM-5 Connection Machine: A scalable supercomputer, Communications of the ACM 36 (11) (1993) 31–40.
[8] K. Ho, H. Leong, An extended CORBA event service with support for load balancing and fault-tolerance, in: Proceedings of the International Symposium on Distributed Objects and Applications, DOA 2000, 21–23 September 2000, IEEE Computer Society, Washington, DC, 2000, p. 49.
[9] E. Jeanvoine, Poster session: Distributed operating system for resource discovery and allocation in federated clusters, in: Proceedings of the Twentieth ACM Symposium on Operating Systems Principles, SOSP '05, October 2005, ACM Press.
[10] Z. Juhász, JGrid: Jini as a grid technology, Newsletter of the IEEE Computer Society's Task Force on Cluster Computing 5 (3) (2003).
[11] Z. Juhász, L. Kesmarki, A Jini-based prototype metacomputing framework, in: Proceedings of Euro-Par 2000 – Parallel Processing: 6th International Euro-Par Conference, Munich, Germany, August/September 2000, in: Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 2000, p. 1171.
[12] K.H. Kim, Wide-area real-time distributed computing in a tightly managed optical grid: An OptIPuter vision, in: Proceedings of the 18th International Conference on Advanced Information Networking and Applications, AINA '04, vol. 2, March 2004, IEEE Computer Society.
[13] C. Lap-Sun, K. Yu-Kwong, The design and performance of an intelligent Jini load balancing service, in: International Conference on Parallel Processing Workshops, Valencia, Spain, 2001, pp. 361–366.
[14] J. Leigh, L. Renambot, T. DeFanti, M. Brown, E. He, N. Krishnaprasad, J. Meerasa, A. Nayak, K. Park, R. Singh, S. Venkataraman, C. Zhang, D. Livingston, M. McLaughlin, An experimental OptIPuter architecture for data-intensive collaborative visualization, in: Proceedings of the Workshop on Advanced Collaboration Environments, Seattle, WA, June 22–24, 2003.
[15] Lucent's new all-optical router uses Bell Labs microscopic mirrors. http://www.bell-labs.com/news/1999/november/10/1.html.
[16] S. Marti, V. Krishnan, Carmen: A dynamic service discovery architecture, Mobile and Media Systems Laboratory, HP Laboratories Palo Alto, September 16, 2002.
[17] O. Othman, C. O'Ryan, D.C. Schmidt, The design of an adaptive CORBA load balancing service, IEEE Distributed Systems Online 2 (2001).
[18] P. Papadopoulos, C. Papadopoulos, M. Katz, W. Link, G. Bruno, Configuring large high-performance clusters at lightspeed: A case study, International Journal of High Performance Computing Applications 18 (3) (2004).
[19] S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Schenker, A scalable content-addressable network, in: Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, SIGCOMM '01, San Diego, CA, United States, ACM Press, New York, NY, 2001, pp. 161–172.
[20] C. Sauer, D. Johnson, L. Loucks, A. Shaheen-Gouda, T. Smith, RT PC distributed services overview, ACM SIGOPS Operating Systems Review 21 (3) (1987).
[21] I.D. Scherson, C. Chien, Least common ancestor networks, VLSI Design 2 (4) (1995) 353–364.
[22] I.D. Scherson, D.S. Valencia, Service address routing: A network architecture for tightly coupled distributed computing systems, in: Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (I-SPAN), Las Vegas, Nevada, USA, November 2005.
[23] T. Schnekenburger, Load balancing in CORBA: A survey of concepts, patterns, and techniques, The Journal of Supercomputing 15 (2) (2000) 141–161.
[24] K. Shen, H. Tang, T. Yang, L. Chu, Integrated resource management for cluster-based Internet services, ACM SIGOPS Operating Systems Review 36 (SI) (2002).
[25] R. Singh, N. Schwarz, N. Taesombut, et al., Real-time multi-scale brain data acquisition, assembly, and analysis using an end-to-end OptIPuter, Future Generation Computer Systems 22 (8) (2006) 1032–1039.
[26] L. Smarr, A. Chien, T. DeFanti, J. Leigh, P. Papadopoulos, Blueprint for the future of high-performance networking: The OptIPuter, Communications of the ACM 46 (11) (2003).
[27] T. Suzumura, S. Matsuoka, H. Nakada, A Jini-based computing portal system, in: Proceedings of SC2001, 2001.
[28] N. Taesombut, X. Wu, A. Chien, A. Nayak, B. Smith, D. Kilb, T. Im, D. Samilo, G. Kent, J. Orcutt, Collaborative data visualization for earth sciences with the OptIPuter, Future Generation Computer Systems 22 (8) (2006) 955–963.
[29] M. van Steen, F. Hauck, P. Homburg, A. Tanenbaum, Locating objects in wide-area systems, IEEE Communications Magazine 36 (1) (1998) 104–109.
[30] S. Vinoski, CORBA: Integrating diverse applications within distributed heterogeneous environments, IEEE Communications Magazine 35 (2) (1997) 46–55.
[31] C. Zhang, J. Leigh, T. DeFanti, M. Mazzucco, R. Grossman, TeraScope: Distributed visual data mining of terascale data sets over photonic networks, Future Generation Computer Systems 19 (6) (2003) 935–943.

Isaac D. Scherson was born in Santiago, Chile, on February 12, 1952. He is currently a Professor in the Department of Computer Science of The Donald Bren School of Information and Computer Sciences, and in the Department of Electrical Engineering and Computer Science of the Henry Samueli School of Engineering at the University of California, Irvine. He received BSEE and MSEE degrees from the National University of Mexico (UNAM) and a Ph.D. in Computer Science from the Dept. of Applied Mathematics of the Weizmann Institute of Science, Rehovot, Israel. He held faculty positions in the Dept. of Electrical and Computer Engineering of the University of California at Santa Barbara (1983–1987), and in the Dept. of Electrical Engineering at Princeton University (1987–1991). He is a member of the IEEE Computer Society and of the ACM. Dr Scherson’s research interests fall in the general areas of parallel computer architecture and applications of concurrent computation. His work on interconnection networks for massively parallel systems involves the development of cost-effective high performance networks capable of supporting thousands or millions of processing elements. In addition, Dr Scherson is applying the results of his seminal networks research to Intelligent Networks on Chip for Embedded Systems.


Daniel S. Valencia is a Ph.D. candidate at the University of California, Irvine. He holds a degree in Computer Engineering from Kino University in Hermosillo, Sonora, Mexico, and carried out graduate studies at CICESE, in Ensenada, Baja California, Mexico. After working as a part-time lecturer at the University of Sonora and Kino University, he enrolled as a full-time student at the University of California, Irvine, in 2003. He has worked towards the degree since then, and has been appointed intermittently as a Teaching Assistant. He was also appointed as a Lecturer in the summer of 2006 at the University of California, Irvine.

John Duselis is a graduate student currently enrolled in the Donald Bren School of Information and Computer Sciences at the University of California, Irvine. His research interests are in the areas of Systems and Parallel and Distributed Computing.

Enrique Cauich is a Ph.D. student currently enrolled in the Donald Bren School of Information and Computer Sciences at the University of California, Irvine. He received his B.S. degree in Computer Science from the Instituto Tecnologico y de Estudios Superiores de Monterrey (ITESM) and recently received his M.S. degree at the University of California, Irvine. His research interests are in Parallel and Distributed Computing.

Richert Wang is currently a graduate student at the Donald Bren School of Information and Computer Sciences at the University of California, Irvine. His research interests are in Parallel and Distributed Computing. He recently received his M.S. degree and is currently working on his Ph.D. degree at the University of California, Irvine.
