The Need for Network-Centric Programming Models
Position Paper for the Workshop on Network Centric Operating Systems
Konstantin Popov, Per Brand Swedish Institute of Computer Science (SICS) Kista, Sweden {kost,perbrand}@sics.se
Vlad Vlassov, Seif Haridi Department of Microelectronics and Information Technology Royal Institute of Technology (KTH/IMIT), Kista, Sweden {vlad,seif}@imit.kth.se
March 14, 2005

Abstract

The Grid is envisioned as a global ubiquitous infrastructure that comprises an ever-increasing number of volatile and distributed services. With present-day Grid middleware and software-engineering know-how, building a scalable and self-managing Grid service is a challenging task. The adaptation of peer-to-peer overlay networks for the Grid promises to solve this problem. A large-scale, dynamically reconfigurable and self-managing service comprises many nodes connected by an overlay network. We argue that new programming models, together with appropriate support from the OS and basic middleware, are necessary both for constructing such services and for using them from higher-level services. In particular, OS support for light-weight concurrency and low-overhead overlay networking is essential.
1 Introduction
The Grid is envisioned as a global ubiquitous infrastructure that treats all kinds of computer-related services as commodities that can be described, located, purchased or leased, used, shared, and so on. The future Grid will comprise a large and ever-increasing number of individual services that are volatile and distributed over the network. Present-day Grid services are mostly both centralized and static, even though the Grid itself is distributed and dynamic. Building a large-scale service is a challenging task, as it is not supported by existing middleware or by current state-of-the-art Grid software-engineering practices. For example, the Globus toolkit offers a client-server programming model for building Grid services, which is clearly non-scalable. Today, deployment and management of large-scale services can easily become very expensive, as it requires manual labour.
Distributed applications are supported by middleware that builds on lower-level services provided by host operating systems (see e.g. [Emm00]). Generally, middleware provides high-level abstractions addressing communication, synchronization, reliability, scalability and heterogeneity. In addition, Grid middleware also supports the Grid's openness, dynamism, and coordinated management of diverse resources shared by "virtual organizations" [BdLH+03, CRF+04]. Historically, support for adopted middleware technology has eventually migrated to the OS level, and we believe the same will happen with Grid middleware. While a commodity OS supports the execution of applications confined within single processes, a network-centric OS focuses on the management of networked resources [KCM+00], and would provide support for the Grid's fault-tolerant, scalable, self-healing, and self-managing services [CRF+04].

The middleware and OS services assume a set of abstractions and conventions concerning service composition and inter-service communication, covering both functional and non-functional aspects. These abstractions are commonly referred to as a programming model. In the present-day Grid, the client-server model is used for inter-service communication, and service-oriented models such as OGSA [FKNT02] are used for service composition. Research is active on component-based models (e.g. [BCM03]), and models are also inherited from high-performance parallel computing, such as shared memory, message passing, and several combinations thereof [LT03].

A large-scale, self-* Grid service will necessarily be distributed: it will consist of a large number of nodes interconnected by an overlay network, possibly utilizing a varying set of different underlying services. Overlay networks such as peer-to-peer networks show a potential for handling large-scale systems [SMK+01, RD01, ZHS+04, AEABH03] with autonomic management [KR04, ACMD+03, CBL04, GLS+04].
For the sake of transparent management of service configuration and failure handling with respect to service clients, the service should provide multiple access points with similar functionality. It should be able to monitor changes in component-service availability, networking, and its own utilization load. Building such services requires support from middleware and a network-centric OS. In this paper we focus on programming-model, middleware and OS issues specific to distributed Grid services, where not only individual resources are on the network, but the Grid services themselves are on the network too. General issues of component-based programming for handling the Grid's dynamic networked resources are out of the scope of this document, and are considered elsewhere (see e.g. [BCM03, KMY+05, Laf02, CGB+04, Gan02]). We start by outlining our understanding of distributed Grid services (Section 2), and then analyze requirements for a matching programming model (Section 3) and the support to be provided by the underlying middleware and OS (Section 4).
2 Distributed Grid Services
A distributed Grid service is a Grid service whose implementation spans several physical sites (i.e., networked computers).

Figure 1: A Distributed Grid Service.

In Figure 1, distributed service 3 provides a single interface that is used by services 1 and 2. Internally, the service comprises several nodes that execute service code and encapsulate parts of the service's state. In general, every site contains many different nodes of different distributed services. The number of nodes participating in a service realization changes dynamically, but this is to a large degree transparent to the service's client(s). Nodes are interconnected by an overlay network. Service requests by clients are transparently dispatched to internal service ports, according to, e.g., physical proximity (in our example, service 1 would access node 1) or load balancing. The service implementation uses other services as necessary (service 4 is used by node 3).
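To make the dispatching idea concrete, the following sketch (our own illustration; the node names, load metric, and distance function are hypothetical, not from the paper) picks a service node for each client request by physical proximity first, breaking ties by load:

```python
# Hypothetical sketch of proximity/load-based request dispatch to the
# internal nodes of a distributed service. All names are illustrative.

class Node:
    def __init__(self, name, site, load=0):
        self.name = name
        self.site = site      # physical site hosting this node
        self.load = load      # current utilization, updated by monitoring

def dispatch(nodes, client_site, distance):
    """Pick a node, preferring physical proximity, then low load."""
    return min(nodes, key=lambda n: (distance(client_site, n.site), n.load))

# Toy distance metric: 0 if co-located, 1 otherwise.
same_site = lambda a, b: 0 if a == b else 1

nodes = [Node("node1", "siteA", load=5), Node("node2", "siteB", load=1)]
print(dispatch(nodes, "siteA", same_site).name)  # node1: proximity wins
print(dispatch(nodes, "siteC", same_site).name)  # node2: distances tie, lower load wins
```

A real dispatcher would of course use network measurements rather than a toy site-equality metric, but the two-level ordering (proximity, then load) is the point of the sketch.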
3 Requirements on a Programming Model
Obviously, a programming model for distributed Grid services should address three groups of issues: how a service is defined and its interface specified, how it is programmed (i.e., what happens behind the interface), and how the service is used by other services. For pragmatic reasons, service definition should be a conservative extension of some conventional standard such as the web-service-oriented OGSI/WSRF [Tue03, CFF+04]. Awareness and control facilities of a Grid service as a distributed one can be provided by additional service interfaces, in the style of, e.g., the Fractal component framework [BCS02]. How the distribution-related properties of a distributed service can best be exposed to its client's benefit is clearly a subject of further research. A distributed service is a combination of concurrently running nodes, each of which requires internal concurrency for handling communication with its peer
nodes and the clients it serves. A programming model has to define:

• how service nodes are composed together, including inter-node communication, synchronization, exception propagation and handling, and the performance model, while respecting security demands

• what the state of a service as a whole is, and how it is divided between nodes and maintained by them

• how internal service ports are associated with the service's nodes

Since nodes are volatile, a model should encapsulate nodes as well as possible so that nodes can be interchanged. Yet enough information about a node's state must be made available to peer nodes so that they can take over the node's clients if necessary. The shared-memory programming model seems more appropriate for programming distributed services with shared state than message-passing models, which would force the application programmer to implement suitable shared-memory abstractions on her own. However, the shared-memory programming model needs to be extended with network control and awareness for the sake of controlling data replication and failure handling, as done, e.g., in the Mozart programming system [Con04, RH04]. The shared-memory abstraction should also be extended to support different memory-consistency models. It can be implemented using, e.g., structured p2p overlay networks and distributed hash tables (DHTs), benefiting from their good scalability and self-* properties. Internal service ports need to be connected to interface ports in a flexible way that allows dynamic reconfiguration of service nodes. A distributed service should be consistent across its internal service ports, i.e., internal service ports provide the same service with the same internal state according to a well-specified consistency model. For example, a client of a distributed service that ignores its distributed nature might demand a strict consistency model.
A more flexible client might tolerate a certain degree of inconsistency between service invocations, yielding performance benefits similar to the use of relaxed consistency models in distributed file systems (e.g. [GGL03]) and distributed shared-memory systems (e.g. [Jég00]). Finally, one might wish to compose distributed services. In this case, the same distributed service can be used simultaneously and independently by different nodes of another distributed service. For instance, in Figure 1, service 4 (which can itself be distributed) is used by both nodes 3 and 4. The semantics of such service interactions needs to be clearly defined.
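The idea of realizing the shared-memory abstraction over a DHT can be sketched as follows. This is a minimal, single-process illustration of our own (class and key names are hypothetical, not from the paper): keys are hashed onto an identifier ring and each node owns the keys up to its own identifier, so service state is partitioned across nodes while remaining addressable through one uniform get/put interface:

```python
# Hypothetical sketch: a DHT-style shared store partitioning service state
# across nodes by consistent hashing. Single-process toy, for illustration.

import hashlib
from bisect import bisect_right

def ring_id(s, bits=16):
    """Map a string onto a 2^bits identifier ring."""
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % (1 << bits)

class DhtStore:
    def __init__(self, node_names):
        # (identifier, local storage) per node, sorted around the ring
        self.nodes = sorted((ring_id(n), {}) for n in node_names)

    def _owner(self, key):
        """The node whose identifier is the successor of the key's hash."""
        ids = [i for i, _ in self.nodes]
        idx = bisect_right(ids, ring_id(key)) % len(self.nodes)
        return self.nodes[idx][1]

    def put(self, key, value):
        self._owner(key)[key] = value

    def get(self, key):
        return self._owner(key).get(key)

store = DhtStore(["node1", "node2", "node3"])
store.put("session/42", "state")
print(store.get("session/42"))  # -> state
```

A real DHT would replicate each key to several successors and route lookups over the overlay; the consistency model visible to clients then depends on how those replicas are kept in sync, which is exactly the design space discussed above.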
4 Support by Network Centric OS
A typical application run by a commodity OS handles a limited number of local resources and a few communication channels, and hence requires only a modest amount of concurrency: each resource and communication channel is controlled by a logically concurrent thread of control. A Grid service that is distributed and
communicates with many other dynamically configured services and peer service nodes, and handles many relatively short-lived client transactions, requires considerably more concurrency, and for practical reasons this concurrency must be light-weight. Present-day OSs provide concurrency primitives (such as native threads) for relatively heavy-weight, long-lived activities such as event-handling loops, forcing application and/or middleware developers to implement fine-grained concurrency in the "user space" of the OS process. This incurs additional overhead, as scheduling is conducted on two levels: by the OS for processes, and by processes for their internal threads. Perhaps even more importantly, the OS is usually completely unaware of user-space concurrent threads and therefore cannot perform any kind of scheduling optimization, such as co-scheduling (e.g. [BL99]).

Overlay networking benefits from kernel-level support for the same reason that, e.g., the TCP/IP protocol stack performs IP routing in kernel space: it avoids the copying overhead between the kernel and user space, and, probably even more importantly, it eliminates the latency due to asynchronous message passing between user processes and the I/O subsystem in the OS. Since Grid services can deploy different types of overlay networks, a flexible component(module)-based approach to kernel configuration is required. Furthermore, kernel-level support for efficient marshaling of data structures from user processes (e.g. using marshaling frameworks such as [PVBH03]) can further reduce copying and latency overheads. Kernel-level support for overlay networking, and therefore for communication between nodes in distributed services, might demand kernel-level support for naming and addressing of services and their constituent nodes. In any case, kernel support would speed up the operation of overlay networks, as the kernel would be able to decide on its own how a particular message should be handled (i.e., forwarded to another site, delivered to a local process, etc.).
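The scale of light-weight concurrency at issue can be illustrated with a user-space scheduler. The sketch below (our own, with hypothetical names; it uses Python's asyncio event loop as a stand-in for the user-space threading the text describes) multiplexes thousands of short-lived "client transactions" onto a single OS thread:

```python
# Hypothetical sketch of light-weight concurrency: many short-lived client
# transactions multiplexed onto one OS thread by a user-space scheduler
# (here, Python's asyncio event loop).

import asyncio

async def handle_transaction(i, results):
    await asyncio.sleep(0)        # yield to the scheduler, as real I/O would
    results.append(i)

async def service(n):
    results = []
    # n concurrent tasks cost far less than n native OS threads would
    await asyncio.gather(*(handle_transaction(i, results) for i in range(n)))
    return len(results)

print(asyncio.run(service(10_000)))  # -> 10000
```

This also makes the two-level scheduling problem visible: the OS sees only one thread here and knows nothing about the 10,000 logical activities inside it, which is precisely why it cannot co-schedule them with related activities elsewhere.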
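The per-message decision a kernel-level overlay layer would make can be sketched as follows. This is our own Chord-style illustration (identifier space, node set, and function names are hypothetical, not from the paper): given a ring of node identifiers, a message for a key is either delivered locally or forwarded toward the key's successor on the ring:

```python
# Hypothetical sketch of the deliver-or-forward decision for overlay
# messages, using Chord-style successor routing on a toy identifier ring.

RING = 1 << 6   # toy 6-bit identifier space

def successor(node_ids, key):
    """The first node clockwise from the key on the identifier ring."""
    candidates = sorted(node_ids)
    for n in candidates:
        if n >= key % RING:
            return n
    return candidates[0]          # wrap around the ring

def handle(local_id, node_ids, key):
    """Deliver locally if this node owns the key, otherwise forward."""
    owner = successor(node_ids, key)
    return "deliver" if owner == local_id else f"forward to {owner}"

nodes = [8, 21, 42, 55]
print(handle(42, nodes, 33))   # -> deliver (42 is the successor of 33)
print(handle(8, nodes, 50))    # -> forward to 55
```

If this decision lives in the kernel, a transit message never crosses into user space at intermediate hops, which is the copying- and latency-saving argument made above.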
5 Conclusions
Grid computing, with its number, diversity, and dynamism of available resources, raises new demands on the Grid's middleware, operating systems and programming models. One group of issues that is already well recognized is the handling of volatile resources, which is currently addressed by the adaptation of various forms of component-based technologies. However, building distributed "on-the-network" Grid services that consist of many dynamically reconfigurable nodes interconnected by overlay networks is harder still, even though such services have the potential to solve the scalability and self-* challenges of the present-day Grid. In particular, the programming models for such services have to address issues not present for "single-node" Grid services, yet these models need to be simple and concise to be generally useful. Careful selection of the facilities of underlying network-centric operating systems can speed up such distributed Grid services, in particular by providing support for light-weight concurrency and low-overhead overlay networking.
References

[ACMD+03]
K. Aberer, P. Cudré-Mauroux, A. Datta, Z. Despotovic, M. Hauswirth, M. Punceva, and R. Schmidt. P-Grid: A self-organizing structured p2p system. SIGMOD Record, 32(2), September 2003.

[AEABH03]
L.O. Alima, S. El-Ansary, P. Brand, and S. Haridi. DKS(N,k,f): A family of low communication, scalable and fault-tolerant infrastructures for p2p applications. In 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003), pages 344–350, Tokyo, Japan, May 12–15 2003. IEEE Computer Society.

[BCM03]
F. Baude, D. Caromel, and M. Morel. From distributed objects to hierarchical grid components. In International Symposium on Distributed Objects and Applications (DOA), LNCS, Catania, Sicily, Italy, November 3–7 2003. Springer.

[BCS02]
E. Bruneton, T. Coupaye, and J.B. Stefani. Recursive and dynamic software composition with sharing. In Seventh International Workshop on Component-Oriented Programming (WCOP02) at ECOOP 2002, Malaga, Spain, June 10 2002.
[BdLH+03]
H. Bal, C. de Laat, S. Haridi, K. Jeffery, J. Labarta, D. Laforenza, P. Maccallum, J. Massó, L. Matyska, T. Priol, A. Reinefeld, A. Reuter, M. Riguidel, D. Snelling, and M. van Steen. Next generation grid(s): European grid research 2005–2010. Expert group report, European Commission, June 16 2003.
[BL99]
J. Basney and M. Livny. Improving goodput by co-scheduling CPU and network capacity. International Journal of High Performance Computing Applications, 13(3), Fall 1999.
[CBL04]
A. Chakravarti, G. Baumgartner, and M. Lauria. The organic grid: Selforganizing computation on a peer-to-peer network. In Proceedings of the International Conference on Autonomic Computing (ICAC’04), pages 96–103, New York, NY, May 17–18 2004.
[CFF+04]
K. Czajkowski, D. Ferguson, J. Frey, S. Graham, T. Maguire, D. Snelling, and S. Tuecke. From Open Grid Services Infrastructure to WS-Resource Framework: Refactoring & evolution, May 3 2004.
[CGB+04]
G. Coulson, P. Grace, G.S. Blair, L. Mathy, D. Duce, C. Cooper, W. K. Yeung, and W. Cai. Towards a component-based middleware framework for configurable and reconfigurable Grid computing. In Proceedings of the Workshop on Emerging Technologies for Next Generation Grid (ETNGRID-2004) associated with 13th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises (WETICE-2004), pages 291–296, Modena, Italy, June 2004. IEEE Computer Society.
[Con04]
Mozart Consortium. The Mozart programming system. http://www.mozart-oz.org/, 1998–2004.
[CRF+04]
S. Campadello, D. De Roure, B. Farshchian, M. Fehse, C. Goble, Y. Guo, M. Hermenegildo, K. Jeffery, D. Laforenza, P. Maccallum, J. Massó, T. Priol, A. Reinefeld, M. Riguidel, W. Schröder-Preikschat, D. Snelling, D. Talia, and T. A. Varvarigou. Next generation grids 2: Requirements and options for European grids research 2005–2010 and beyond. Expert group report, European Commission, July 2004.
[Emm00]
W. Emmerich. Software engineering and middleware: a roadmap. In ICSE – Future of Software Engineering Track, pages 117–129, Limerick, Ireland, June 4–11 2000. ACM Press.
[FKNT02]
I. Foster, C. Kesselman, J.M. Nick, and S. Tuecke. The physiology of the grid: An open grid services architecture for distributed systems integration. Open Grid Service Infrastructure WG, Global Grid Forum, June 22 2002.
[Gan02]
Gannon et al. Programming the grid: Distributed software components, p2p and grid web services for scientific applications. Cluster Computing, 5(3):325–336, 2002.
[GGL03]
S. Ghemawat, H. Gobioff, and S.T. Leung. The Google file system. In SOSP '03: Proceedings of the nineteenth ACM symposium on Operating systems principles, pages 29–43. ACM Press, 2003.
[GLS+04]
B. Godfrey, K. Lakshminarayanan, S. Surana, R. Karp, and I. Stoica. Load balancing in dynamic structured p2p systems. In Proceedings of the IEEE INFOCOM’04, Hong Kong, March 2004.
[Jég00]
Y. Jégou. Controlling distributed shared memory consistency from high level programming languages. In IPDPS '00: Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing, pages 293–300. Springer-Verlag, 2000.
[KCM+00]
F. Kon, R.H. Campbell, M.D. Mickunas, K. Nahrstedt, and F.J. Ballesteros. 2K: A distributed operating system for dynamic heterogeneous environments. In 9th IEEE International Symposium on High Performance Distributed Computing (HPDC’00), pages 201–210, Pittsburgh, Pennsylvania, USA, August 1–4 2000.
[KMY+05]
F. Kon, J.R. Marques, T. Yamane, R.H. Campbell, and M.D. Mickunas. Design, implementation, and performance of an automatic configuration service for distributed component systems. Software: Practice and Experience, 35(7), May 2005. To appear.
[KR04]
D. Karger and M. Ruhl. Simple efficient load balancing algorithms for peer-topeer systems. In SPAA ’04: Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures, pages 36–43. ACM Press, 2004.
[Laf02]
Domenico Laforenza. Grid programming: some indications where we are headed. Parallel Computing, 28(12):1733–1752, 2002.
[LT03]
C. Lee and D. Talia. Grid programming models: Current tools, issues, and directions. In F. Berman, A. Hey, and G. Fox, editors, Grid Computing: Making The Global Infrastructure a Reality. John Wiley & Sons, 2003.
[PVBH03]
K. Popov, V. Vlassov, P. Brand, and S. Haridi. An efficient marshaling framework for distributed systems. In Victor E. Malyshkin, editor, Parallel Computing Technologies, 7th International Conference (PaCT 2003), volume 2763 of LNCS, pages 324–331, Nizhni Novgorod, Russia, September 15–19 2003. Springer-Verlag. A revised version to appear in Future Generation Computer Systems, May 2005.
[RD01]
A. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In Middleware 2001: Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms, volume 2218 of LNCS, pages 329–350, Heidelberg, Germany, November 12–16 2001. Springer-Verlag.
[RH04]
Peter Van Roy and Seif Haridi. Concepts, Techniques, and Models of Computer Programming. MIT Press, 2004.
[SMK+01]
I. Stoica, R. Morris, D. Karger, M.F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the ACM SIGCOMM ’01 Conference, pages 149–160, San Diego, California, August 2001.
[Tue03]
Tuecke et al. Open grid services infrastructure (OGSI) version 1.0. Global Grid Forum OGSI Working Group, June 27 2003.
[ZHS+04]
B. Zhao, L. Huang, J. Stribling, S. Rhea, A. Joseph, and J. Kubiatowicz. Tapestry: A resilient global-scale overlay for service deployment. IEEE Journal on Selected Areas in Communications (Special Issue: Recent Advances In Service Overlay Networks), 22(1):41–53, January 2004.