Invited paper for New Generation Computing


Programming Languages for Distributed Applications

Seif Haridi∗, Peter Van Roy†, Per Brand‡, and Christian Schulte§

June 14, 1998

Abstract

Much progress has been made in distributed computing in the areas of distribution structure, open computing, fault tolerance, and security. Yet, writing distributed applications remains difficult because the programmer has to manage models of these areas explicitly. A major challenge is to integrate the four models into a coherent development platform. Such a platform should make it possible to cleanly separate an application’s functionality from the other four concerns. Concurrent constraint programming, an evolution of concurrent logic programming, has both the expressiveness and the formal foundation needed to attempt this integration. As a first step, we have designed and built a platform that separates an application’s functionality from its distribution structure. We have prototyped several collaborative tools with this platform, including a shared graphic editor whose design is presented in detail. The platform efficiently implements Distributed Oz, which extends the Oz language with constructs to express the distribution structure and with basic primitives for open computing, failure detection and handling, and resource control. Oz appears to the programmer as a concurrent object-oriented language with dataflow synchronization. Oz is based on a higher-order, state-aware, concurrent constraint computation model.

1 Introduction

Our society is becoming densely interconnected through computer networks. Transferring information around the world has become trivial. The Internet, built on top of the TCP/IP protocol family, has doubled in number of hosts every year since 1981, giving more than 20 million in 1997. Applications taking advantage of this new global organization are mushrooming. Collaborative work, from its humble beginnings as electronic mail and network newsgroups, is moving into workflow, multimedia, and true distributed environments [25, 12, 6, 5]. Heterogeneous and physically-separated information sources are being linked together. Tasks are being delegated across the network by means of agents [26]. Electronic commerce is possible through secure protocols. Yet, despite this explosive development, distributed computing itself remains a major challenge. Why is this?

A distributed system is a set of autonomous processes, linked together by a network [48, 30, 8]. To emphasize that these processes are not necessarily on the same machine, we call them sites. Such a system is fundamentally different from a single process. The system is inherently concurrent and nondeterministic. There is no global information nor global time. Communication delays between processes are unpredictable. There is a large probability of localized faults. The system is shared, so users must be protected from other users and their computational agents.

∗ [email protected], Swedish Institute of Computer Science, S-164 28 Kista, Sweden
† [email protected], Dép. INGI, Université catholique de Louvain, B-1348 Louvain-la-Neuve, Belgium
‡ [email protected], Swedish Institute of Computer Science, S-164 28 Kista, Sweden
§ [email protected], German Research Center for Artificial Intelligence (DFKI), D-66123 Saarbrücken, Germany

[Figure 1 diagram: two panels. Left, "Single model with added specifications": application functionality is the part of the problem, while distribution structure, open computing, fault tolerance, and resource control and security are added specifications that do not affect functionality. Right, "Multiple interacting models": all of these are parts of the problem and interact with each other.]

Figure 1: The challenge: simplifying distributed programming

1.1 Identifying the issues

A distributed application should have good perceived behavior, despite the vicissitudes of the underlying system. The application should have good performance, be dependable, and be easily interfaceable with other applications. How can we achieve this?

In the current state of the art, developing a distributed application with these properties requires specialist knowledge beyond that needed to develop an application on a single machine. For example, a new client-server application can be written with Java RMI [33, 34]. An existing application can be connected with another through a CORBA implementation (e.g., Orbix) [37]. Yet in both cases the tools are unsatisfactory. Simply reorganizing the distribution structure requires rewriting the application. Because the Java specification does not require time-sliced threads [15], doing such a reorganization in Java may require profound changes to the application. Furthermore, with each new problem that is addressed, e.g., adding a degree of fault tolerance, the complexity of the application increases. To master each new problem, the developer must learn a complex new tool in addition to the environment he or she already knows. A developer experienced only in centralized systems is not prepared.

Some progress has been made in integrating solutions to different problem areas into a single platform. For example, the Ericsson Open Telecom Platform (OTP) [11], based on the Erlang language [4, 54], integrates solutions for both distribution structure and fault tolerance. Erlang is network-transparent at the process level, i.e., messages between processes (a form of active objects) are sent in the same way independently of whether the processes are on the same or different sites. The OTP goes far beyond popular platforms such as Java [33, 34] and is being successfully used in commercial telephony products, where reliability is paramount.

The success of the Erlang approach suggests applying it to the other problem areas of distributed computing. We identify four areas, namely distribution structure, open computing, fault tolerance, and security. If the application functionality is included, this means that the application designer has five concerns:

• Functionality: what the application does if all effects of distribution are disregarded.

• Distribution structure: the partitioning of the application over a set of sites.

• Open computing: the ability for independently-written applications to interact with each other in interesting ways.

• Fault tolerance: the ability for the application to continue providing its service despite partial failures.

• Security: the ability for the application to continue providing its service despite intentional interference. An important part of fault tolerance and security is resource control.

A possible approach is to separate the functionality from the other four concerns (see Figure 1). That is, we would like the bulk of an application’s code to implement its functionality. Models of the four other concerns should be small and orthogonal additions. Can this approach work? This is a hard question and we do not yet have a complete answer. But some things can be said.

The first step is to separate the functionality from the distribution structure. We say that the system should be both network-transparent and network-aware. A system is network-transparent if computations behave in the same way independent of the distribution structure. Applications can be almost entirely programmed without considering the network. A system is network-aware if the programmer maintains full control over the localization of computations and network communication patterns. The programmer decides where a computation is performed and controls the mobility and replication of data and code. This makes it possible to obtain high performance.

1.2 Towards a solution

We have designed and implemented a language that successfully takes this first step: it completely separates the functionality from the distribution structure. The resulting language, Distributed Oz, is a conservative extension to the existing centralized Oz language [10]. Porting existing Oz programs to Distributed Oz requires essentially no effort. Why is Oz a good foundation for distributed programming? Because of three properties [46]:

• Oz has a solid formal foundation that does not sacrifice expressiveness or efficient implementation. Oz is based on a higher-order, state-aware, concurrent constraint computation model. Oz appears to the programmer as a concurrent object-oriented language that is every bit as advanced as modern languages such as Java (see Section 3). The current emulator-based implementation is as good as or better than Java emulators [20, 19]. Standard techniques for concurrent object-oriented design apply to Oz [28]. Furthermore, Oz introduces powerful new techniques that are not supported by Java [16].

• Oz is a state-aware and dataflow language. This helps give the programmer control over network communication patterns in a natural manner (see Section 4). State-awareness means the language distinguishes between stateless data (e.g., procedures or values), which can safely be copied to many sites, and stateful data (e.g., objects), which at any instant must reside on just one site [52]. Dataflow synchronization makes it possible to decouple calculating a value from sending it across the network [17] (a small sketch appears after this list). This is important for latency tolerance.

• Oz provides language security. That is, references to all language entities are created and passed explicitly. An application cannot forge references nor access references that have not been explicitly given to it. The underlying representation of language entities is inaccessible to the programmer. Oz has an abstract store with lexical scoping and first-class procedures (see Section 7). These are essential properties to implement a capability-based security policy within the language [49, 53].

Successfully separating functionality from distribution structure places severe demands on a language. It would be almost impossible in C++, because the semantics are informal and unnecessarily complex and because the programmer has full access to all underlying representations [47]. It is possible in Oz because of the above three properties.
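To make the dataflow point concrete, here is a minimal sketch (not taken from the paper) of decoupling the computation of a value from its transmission. It uses only ports, threads, and a logic variable; the names Prt, Stream, and Answer are illustrative.

   declare Answer Stream Prt in
   Prt = {NewPort Stream}
   % Consumer: reads messages from the port's stream as they arrive.
   thread
      {ForAll Stream
       proc {$ M}
          case M of result(X) then
             {Show X}     % blocks here until X is bound by the producer
          end
       end}
   end
   % Producer: sends the still-unbound variable first, computes it later.
   {Send Prt result(Answer)}
   Answer = 6 * 7          % binding Answer wakes up the blocked consumer

In a distributed setting the consumer can run on another site; the message containing the unbound variable crosses the network immediately, and the binding follows when it becomes available.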


So far, it has not been necessary to update the language semantics more than slightly to accommodate distribution (for example, ports have been changed to model asynchronous communication between sites [52]). This may change in the future. Furthermore, work is in progress to separate the functionality from the other three concerns. Currently, Distributed Oz provides the language semantics of Oz and complements it in four ways:

• It has constructs to express the distribution structure independently of the functionality (see Section 4). The shared graphic editor of Section 2 is designed according to this approach.

• It has primitives for open computing, based on the concept of tickets (see Section 5). This allows independently-running applications to connect and seamlessly exchange data and code.

• It has primitives for orthogonal failure detection and handling, based on the concepts of handlers and watchers (see Section 6). This makes it possible to build a first level of fault tolerance.

• It supports a capability-based security policy and has primitives for resource control based on the concept of virtual site (see Section 7).

In Distributed Oz, developing an application is separated into two independent parts. First, only the logical architecture of the task is considered. The application is written in Oz without explicitly partitioning the computation among sites. One can check the safety and liveness properties (a fortiori, correctness and termination for nonreactive applications) of the application by running it on one site. Second, the application is made efficient by specifying the network behavior of its entities. In particular, the mobility of stateful entities (objects) must be specified. For example, some objects may be placed on certain sites, and other objects may be given a particular mobile behavior (such as state caching).

The Distributed Oz implementation extends the Oz implementation with four non-trivial distributed algorithms. Three are designed for specific language entities, namely logic variables, object-records, and object-state. Logic variables are bound with a variable binding protocol (see Section 4.2). Object-records are duplicated among sites with a lazy replication protocol (see Section 4.3). Object-state moves between sites with a mobile state protocol (see Section 4.4). The fourth protocol is a distributed garbage collection algorithm using a credit mechanism (see Section 4.5). Garbage collection is part of the management of shared entities, and it therefore underlies the other three protocols.

1.3 Outline of the article

The rest of this article consists of six parts. Section 2 gives the design of a shared graphic editor in Distributed Oz. It shows how the separation between functionality and distribution works in practice. Section 3 gives an overview of the Oz language and its execution model. Oz has deep roots in the logic programming and concurrent logic programming communities. It is illuminating to show these connections. Section 4 presents Distributed Oz and its architecture, and explains how it separates functionality from distribution structure. The four protocols are highlighted, namely distributed logic variables, lazy replication of object-records, mobility of object-state, and distributed garbage collection. Finally, Sections 5, 6, and 7 discuss open computing, failure detection and handling, and resource control and security. These three sections are more speculative than the others since they describe parts of the system that are still under development.

2 Shared graphic editor

Writing an efficient distributed application can be much simplified by separating the functionality from the distribution structure. We have substantiated this claim by designing and implementing a prototype shared graphic editor, an application which is useful in a collaborative work environment. The editor is seen by an arbitrary number of users. We wish the editor to behave like a shared virtual environment. This implies the following set of requirements (see Figure 2).


[Figure 2 diagram: users A1 and A2 at Contractor A, B1 and B2 at Contractor B, a consultant, and a study bureau, connected through intranets and the Internet. Informal specification: all users see the same design; users are not bothered by the network.]

Figure 2: A shared graphic editor

[Figure 3 diagram: graphic entities (GE), a user manager (UM), and a display broadcaster (DB), connected to one client manager (CM), window manager (WM), and graphics subsystem (GS) per user. A heavy curved line marks the execution path from user input to display update.]

Figure 3: Logical architecture of the graphic editor

We require that all users be able to make updates to the drawing at any time, that each user sees his or her own updates without any noticeable delay, and that updates be visible to all users in real time. Furthermore, we require that the same graphic entity can be updated by multiple users. This is useful in a collaborative CAD environment when editing complex graphic designs. Finally, we require that all updates are sequentially consistent, i.e., each user has exactly the same view of the drawing. The last two requirements are what make the application interesting. Using IP multicast to update each user’s visual representation, as is done for example in the LBL Whiteboard application (available at http://mice.ed.ac.uk/mice/archive), does not satisfy the last two requirements.

2.1 Logical architecture

Figure 3 gives the logical architecture of our prototype. No assumptions are made about the distribution structure. The drawing state is represented as a set of objects. These objects denote graphic entities such as geometric shapes and freehand drawing pads. When a user updates the drawing, either a new object is created or a message is sent to modify the state of an existing object. The object then posts the update to a display broadcaster. The broadcaster sends the update to all users so they can update their displays. The execution path from user input to display update is shown by the heavy curved line. The users see a shared stream, which guarantees sequential consistency.

New users can connect themselves to the editor at any time using the open computing ability of Distributed Oz. The mechanism is based on “tickets”, which are simply text strings (see Section 5). Any Oz process that knows the ticket can obtain a reference to the language entity. The graphic editor creates a ticket for the User Manager object, which is responsible for adding new users. A new user is added by using the ticket to get a reference to the User Manager.


[Figure 4 diagram: a server site holding the graphic entities (GE), user manager (UM), and display broadcaster (DB); client sites 1 to n each holding a graphics subsystem (GS), window manager (WM), and client manager (CM).]

Figure 4: Editor with client-server structure

[Figure 5 diagram: as Figure 4, but the graphic entities and display broadcaster are cached objects that can migrate between the server site and the client sites.]

Figure 5: Editor with cached graphic state

The two computations then reference the same object. This transparently opens a connection between two sites in the two computations. From that point onward, the computation space is shared. When there are no more references between two sites in a computation, then the connection between them is closed by the garbage collector. Computations can therefore connect and disconnect seamlessly.
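As an illustration, here is a small sketch of how such a connection might be set up, assuming a Connection-style module with Offer and Take operations over tickets; the exact primitives are the subject of Section 5, and the names UserManagerClass, TheTicketString, and MyClientManager are illustrative only.

   % Editor side: create the User Manager and offer a ticket (a text string) for it.
   declare UserManager Ticket in
   UserManager = {New UserManagerClass init}
   Ticket = {Connection.offer UserManager}
   % The ticket can now be published by any means, e.g. e-mail or a web page.

   % New user's side (a different Oz process): take the ticket to get a reference.
   declare RemoteUM in
   RemoteUM = {Connection.take TheTicketString}
   {RemoteUM addUser(MyClientManager)}
   % From this invocation onward the two computation spaces are shared.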

2.2 Client-server structure

To realize the design, we have to specify its distribution structure. Figure 4 shows one possibility: a client-server structure. All objects are stationary. They are partitioned among a server site and one site per user. This satisfies all requirements except performance. It works well on low-latency networks such as LANs, but performance is poor when a user far from the server tries to draw freehand sketches or any other graphic entity that needs continuous feedback. This is because a freehand sketch consists of many small line segments being drawn in a short time. In our implementation, up to 30 motion events per second are sent from the graphics subsystem to the Oz process. Each line segment requires updating the drawing pad state and sending this update to all users. If the state is remote, then the latency for one update is often several hundred milliseconds or more, with a large variance.
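A possible sketch of this structure, assuming the NewStationary abstraction shown later in Figure 8; the class names GraphicEntity and DisplayBroadcaster and their init messages are illustrative, not from the paper.

   % On the server site: every drawing object and the broadcaster stay put.
   declare Entity Broadcaster in
   Entity      = {NewStationary GraphicEntity init(shape:rectangle)}
   Broadcaster = {NewStationary DisplayBroadcaster init}
   % A client invocation such as {Entity moveTo(10 20)} is forwarded to the
   % server site, which is why latency dominates for freehand drawing.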


2.3 Cached graphic state

To solve the latency problem, we change the distribution structure (see Figure 5). We refine the design to represent the graphic state and the display broadcaster as freely mobile (“cached”) objects rather than stationary objects. The effect of this refinement is that parts of the graphic state are cached at sites that modify them. Implementing the refinement requires changing some of the calls that create new objects. In all, less than 10 lines of code out of 500 have to be changed. With these changes, freehand sketches do not need any network operations to update the local display, so performance is satisfactory. Remote users see the sketch being made in real time, with a delay equal to the network latency. How is this magic accomplished? It is simple: whenever an object is invoked on a site, then the mobile state protocol first makes the object’s state pointer local to the site (see Section 4.4). The object invocation is therefore a local operation.
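The following sketch indicates what such a change might look like, under the assumption (consistent with Section 4) that an ordinary object created with New is freely mobile in Distributed Oz; the class names are the same illustrative ones as above.

   % Refinement: create the graphic entities and the broadcaster as ordinary
   % (cached) objects instead of stationary ones.
   declare Entity Broadcaster in
   Entity      = {New GraphicEntity init(shape:rectangle)}   % cached object
   Broadcaster = {New DisplayBroadcaster init}               % cached object
   % Invoking {Entity moveTo(10 20)} on a client first migrates the object's
   % state pointer to that client (Section 4.4); the update is then local.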

2.4 Push objects and transaction objects

More refined editor designs can take advantage of additional distribution behaviors of objects. For example, the design with cached objects suffers from two problems:

• Users who simultaneously modify different graphic entities will interfere with each other through the display broadcaster. The latter will bounce between user sites, causing delays in updating the displays. This problem can be solved by using a push object, which multicasts state updates to all sites that reference the object (a sketch appears at the end of this subsection). One possibility is to make the display broadcaster into a push object, thus maintaining sequential consistency while taking advantage of a multicast network protocol. Another possibility is to make each graphic entity into a push object. In this case, the users may see inconsistent drawings.

• If a user wishes to modify a graphic entity, there is an initial delay while the graphic entity’s state is cached on the user site. This problem can be solved by using a transaction object, which does the state update locally, while requesting a global lock on the object. The state update will eventually be confirmed or rejected.

Both push and transaction objects maintain consistency of object updates: the object is defined by a sequence of states. It follows that there is still one graphic state and updates to it are sequentially consistent. The editor therefore still supports collaborative design. What changes is how the state sequence is seen and how it is created.

Updating the editor to use either or both of these object types may require changing its specification or logical architecture. For example, the specification may have to be relaxed slightly, temporarily allowing incorrect views. This illustrates the limits of network-transparent programming. It is not possible in general to indefinitely improve the performance of a given specification and logical architecture by changing the distribution structure. At some point, one or both of the specification and architecture must be changed.
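The paper gives no API for push objects, so the following is purely illustrative: it assumes a hypothetical NewPushObject abstraction, analogous in spirit to NewStationary in Figure 8, that multicasts state updates to every referencing site.

   % Hypothetical: make the display broadcaster a push object.
   declare Broadcaster in
   Broadcaster = {NewPushObject DisplayBroadcaster init}   % NewPushObject: assumed
   % Each update is multicast to every site holding a reference, so concurrent
   % users no longer cause the broadcaster's state to bounce between sites.
   {Broadcaster post(moveTo(10 20))}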

2.5 Final comments

Designing the shared graphic editor illustrates the two-part approach for building applications in Distributed Oz. First, build and test the application using stationary objects. Second, reduce latency by carefully selecting a few objects and changing their mobility behavior. Because of transparency, this can be done with quite minor changes to the code of the application itself. This can give good results in many cases. To obtain the very best performance, however, it may be necessary to change the application’s specification or architecture.

In both the stationary and mobile designs, fault tolerance is a separate issue that must be taken into account explicitly. It can be done by recording on a reliable site a log of all display events. Crashed users disappear, and new users are sent a compressed version of the log. Primitives for fault tolerance are given in Section 6.
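A minimal sketch of this logging idea, not taken from the paper: a stationary logger on the reliable site records display events and can replay them to a newly connected user. The Logger class, its methods, and the compression step it omits are all illustrative.

   declare Logger Log in
   class Logger
      attr events
      meth init events <- nil end
      meth record(E) events <- E|@events end
      meth replay(?Es) Es = {Reverse @events} end   % compression omitted
   end
   Log = {NewStationary Logger init}   % placed on the reliable site
   % Every display event is also sent to the log:
   %    {Log record(update(entity:E op:moveTo(10 20)))}
   % A new user calls {Log replay(Es)} and applies Es to its local display.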


[Figure 6 diagram: dataflow threads S1, S2, ..., Sn execute statement sequences and block on data availability, observing an abstract store containing variables and bindings such as X=23 and Y=person(age:25). The store is not physical memory: it contains variables and bindings and only allows operations that are legal for the entities involved.]

Figure 6: Computation model of OPM

S ::=   S S                                                   Sequence
    |   X=f(l1:Y1 ... ln:Yn) | X=<number> | X=<atom>          Value
        | {NewName X}
    |   local X1 ... Xn in S end | X=Y                        Variable
    |   proc {X Y1 ... Yn} S end | {X Y1 ... Yn}              Procedure
    |   {NewCell Y X} | {Exchange X Y Z} | {Access X Y}       State
    |   case X==Y then S else S end                           Conditional
    |   thread S end | {GetThreadId X}                        Thread
    |   try S catch X then S end | raise X end                Exception

Figure 7: Kernel language of OPM

In general, mobile objects are useful both for fine-grained mobility (caching of object state) as well as coarse-grained mobility (explicit transfer of groups of objects). The key ability that the system must provide is transparent control of mobility, i.e., control that is independent of the object’s functionality. Sections 3.2 and 4 explain briefly how this is done in Distributed Oz. A full explanation is given in [52].

3 Oz

Oz is a rich language built from a small set of powerful ideas. This section attempts to situate Oz among its peers. We summarize its programming model and we compare it with Prolog and with concurrent logic languages. The roots of Oz are in concurrent and constraint logic programming. The goal of the Oz project is to provide a firm foundation for all facets of computation, not just for a declarative subset. The semantics should be fully defined and bring the operational aspects out into the open. For example, concurrency and stateful execution make it easy to write programs that interact with the external world [19]. True higher-orderness results in compact, modular programs [1]. First-class computation spaces make it possible to program inference engines within the system. For example, it is easy to program multiple concurrent first-class Prolog top levels, each with its own search strategy [41].

Section 3.1 summarizes the Oz programming model, including the kernel language and the abstractions built on top of it. Section 3.2 illustrates Oz by means of a nontrivial example, namely the implementation of remote method invocation. Section 3.3 compares Oz and Prolog. Finally, Section 3.4 gives the history of Oz from a concurrent logic programming viewpoint.


3.1 The Oz programming model

The basic computation model is an abstract store observed by dataflow threads (see Figure 6). A thread executes a sequence of statements and blocks on the availability of data. The store is not physical memory. It only allows operations that are legal for the entities involved, i.e., no type casting or address calculation. The store has three compartments: the constraint store, containing variables and their bindings, the procedure store, containing procedure definitions, and the cell store, containing mutable pointers (“cells”). The constraint and procedure stores are monotonic, i.e., information can only be added to them, not changed or removed. Threads block on availability of data in the constraint store.

The threads execute a kernel language called the Oz Programming Model (OPM) [44]. We briefly describe the OPM constructs as given in Figure 7. Statement sequences are reduced sequentially inside a thread. Values (records, numbers, etc.) are introduced explicitly and can be equated to variables. All variables are logic variables, declared in an explicit scope defined by the local construct. Procedures are defined at run-time with the proc construct and referred to by a variable. Procedure applications block until their first argument refers to a procedure. State is created explicitly by NewCell, which creates a cell, an updatable pointer into the constraint store. Cells are updated by Exchange and read by Access. Conditionals use the keyword case and block until the condition is true or false in the constraint store (the keyword if is reserved for constraint applications). Threads are created explicitly with the thread construct and have their own identifier. Exception handling is dynamically scoped and uses the try and raise constructs.

Full Oz is defined by transforming all its statements into this basic model. Full Oz supports idioms such as objects, classes, reentrant locks, and ports [44, 52]. The system implements them efficiently while respecting their definitions. We define the essence of these idioms as follows. For clarity, we have made small conceptual simplifications. Full definitions may be found in [16].
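Before turning to these idioms, a small fragment written directly in OPM illustrates cells, threads, and dataflow synchronization. It is a sketch, not from the paper, and uses only constructs from Figure 7 (plus Show for output).

   local C Old Done in
      {NewCell 0 C}             % C is a cell whose initial content is 0
      thread
         {Exchange C Old 1}     % atomically read the old content and write 1
         Done=unit              % binding Done wakes up the waiting thread
      end
      thread
         case Done==unit then   % blocks until Done is bound
            {Show Old}          % prints 0, the content before the exchange
         else skip end
      end
   end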

• Object. An object is essentially a one-argument procedure {Obj M} that references a cell, which is hidden by lexical scoping. The cell holds the object’s state. The argument M indexes into the method table. A method is a procedure that is given the message and the object state, and calculates the new state.

• Class. A class is essentially a record that contains the method table and attribute names. When a class is defined, multiple inheritance conflicts are resolved to build its method table. Unlike Java, classes in Oz are pure values, i.e., they are stateless.

• Reentrant lock. A reentrant lock is essentially a one-argument procedure {Lck P} used for explicit mutual exclusion, e.g., of method bodies in objects used concurrently. P is a zero-argument procedure defining the critical section. Reentrant means that the same thread is allowed to reenter the lock. Calls to the lock may therefore be nested. The lock is released automatically if the thread in the body terminates or raises an exception that escapes the lock body.

• Port. A port is an asynchronous channel that supports many-to-one communication. A port P encapsulates a stream S. A stream is a list with unbound tail. The operation {Send P M} adds M to the end of S. Successive sends from the same thread appear in the order they were sent (a small sketch follows this list).
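Here is a minimal sketch of the port idiom, not from the paper: one consumer thread reads the stream while messages are sent from several threads.

   declare S P in
   P = {NewPort S}
   thread {ForAll S Show} end   % consumer: prints messages in arrival order
   {Send P msg(1)}
   {Send P msg(2)}              % msg(1) and msg(2) keep their relative order
   thread {Send P msg(3)} end   % a send from another thread may interleave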

3.2 Oz by example

It is not the purpose of this article to give a complete exposition of Oz. Instead, we present Oz by means of a nontrivial example program that is interesting in its own right. We show how to implement active objects in Oz, and as a corollary, we show that the same program implements remote method invocation in Distributed Oz. An active object is an object with an associated thread.


proc {NewStationary Class Init ?StatObj}
   Obj={New Class Init}
   S
   P={NewPort S}
   N={NewName}
in
   thread
      {ForAll S
       proc {$ M#R}
          thread
             try {Obj M} R=N
             catch E then R=E end
          end
       end}
   end
   proc {StatObj M}
      R in
      {Send P M#R}
      case R==N then skip
      else raise R end end
   end
end

Figure 8: RMI part 1: Create a stationary object from any class
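A possible usage sketch, not from the paper: it assumes that the Counter class begun below has an init method taking an initial value and an inc method.

   declare StatCounter in
   {NewStationary Counter init(0) StatCounter}
   {StatCounter inc(1)}      % executed by the serving thread on the home site
   % An exception raised by the method is re-raised at the caller:
   try {StatCounter bogus} catch E then {Show E} end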

class Counter attr i meth init i