Appears in Proc. of EDOC'99, held in Mannheim (Germany), Sept. 27-30, 1999
CLF/Mekano: a framework for building virtual-enterprise applications
Jean-Marc Andreoli, Damian Arregui, Francois Pacull, Michel Riviere, Jean-Yves Vion-Dury, Jutta Willamowski
Xerox Research Centre Europe, Grenoble, France
Abstract
CLF/Mekano is a distributed object infrastructure oriented towards the high-level coordination of coarse grain components. Unlike other infrastructures of the same class, such as CORBA or DCOM, coordination in CLF/Mekano is built in at the lowest level, namely at the inter-component communication protocol level, and not as a side service (such as the event, transaction or negotiation services of CORBA). Although the CLF protocol is "light-weight" (it relies on very few concepts and only 8 communication "verbs"), it makes the design and implementation of components more complex, but also more valuable if they can be re-used. The Mekano library has been developed to deal with this complexity, targeting re-usability. It provides ready-to-use generic classes of customizable components, as well as useful component parts which can be re-assembled according to application-specific needs. Further layers of domain-specific libraries (so-called business object libraries) can then be developed on top of Mekano, to provide ready-to-use components dedicated to specific business needs (in the line of Enterprise Java Beans).
1 Introduction
The growth of the Internet in the last few years was mainly oriented towards simply making public information accessible through a universal front end (web browsers) to a usually centralized application. The trend nowadays is rather to take full advantage of the rich interconnection capabilities of the Web, so that people and software components spread over the net work together with data (in particular, documents) which may be stored in different formats in distributed and heterogeneous storage spaces. Developing applications in this context makes the problems of global coherence and coordination of the various parts of the applications particularly acute. We have worked for several years on CLF, a middleware layer that deals with this kind of problem. However, it appeared that a middleware platform by itself, however sophisticated, is insufficient, and we had to enrich it with a library of coarse grain components allowing (i) designers to build applications by simply assembling customized instances of these components, and (ii) administrators to monitor applications by looking at their components separately. For example, an application may require that a document, originally in a non-printable format and hidden behind a firewall on a remote site, be printed on an authorized user's local printer. Apart from specifying the coordination which this operation involves, and which a platform like CLF can adequately handle, the application designer has to provide the components which will actually do the job (the document server, the printing service, the software piece that translates the document into the adequate format, etc.). It is not realistic to assume that these components will be re-built from scratch for each new application, for two major reasons: first, they usually share a lot of behavior, and second, they may even share the same physical resources.
For example, suppose an application is interested in the evolution of various pieces of data, e.g. a stock market index and a section of a financial report. If both are stored in ODBC databases, the corresponding components will share a lot of their behavior, involving ODBC-specific operations to monitor the evolution of a record. If the two kinds of information happen to be located in the same database, the components may even share access resources. Our goal with the CLF/Mekano infrastructure is to provide in an integrated way both the coordination capabilities of the CLF middleware and the rich library of re-usable components and component parts offered by Mekano. Notice that other distributed object middlewares, such as CORBA [15] and DCOM [14], have evolved in the same direction, by introducing re-usable, business-specific component libraries and infrastructure layers. The difference in our case is that the CLF infrastructure relies on a much richer protocol, which is adapted to sophisticated,
though still domain independent, coordination schemes. The layers and libraries we add on top are consequently more complex, but also more valuable, because they are intrinsically "coordination-aware". The paper is organized as follows. Section 2 gives a quick introduction to the basic concepts of CLF/Mekano. It is not intended to be a comprehensive description of CLF itself; the interested reader can refer to [5] for this purpose. Section 3 goes into the details of Mekano specifically. Section 4 shows how we have used CLF/Mekano for building different kinds of applications such as document and knowledge sharing environments, or document-centered workflow. CLF/Mekano has also been used for other applications, not described here, such as electronic commerce and distributed workflow; the interested reader can refer to [4, 3, 2].
2 Overview
2.1 Objectives
The aim of the CLF/Mekano infrastructure is to provide a simple way to build enterprise-wide applications, in particular when the different entities of the enterprise (or virtual enterprise) are spread over a large-scale distributed system (namely the Internet) and when the following aspects have to be considered:
- Distributed and coordinated actions are needed to ensure the global consistency of the application. Unlike the centralized case, consistency cannot be guaranteed by the normal execution of imperative programs: some form of controlled non-determinism has to be accepted. This means that alternative choices have to be explored concurrently within the application, possibly involving agreements between distributed actors. On the other hand, finalizing an agreement, i.e. enacting it, in a world-wide application may require performing actions on various components hosted by different sites in an atomic way. For instance, stopping a workflow process instance which is executing across several distributed sites requires a consensus among the different sites, even after they have agreed to stop. Another example is to remove two sections of a document (or more precisely, agreed upon versions of the sections) stored independently by two different databases, in order to merge them and store the result in a unique document. This simple action requires transactional properties for managing on the one hand the concurrency control aspect and on the other hand the update of the mapping between the logical structure of the document and the physical storage of its different parts.
- The target is a wide area network with its classical drawbacks:
  - failure: links and/or hosts may fail and the running application should be able to adapt itself to this situation (namely to work in a degraded mode) while waiting for the restart of faulty components.
  - firewalls: different parts of the enterprise (in particular when the latter consists of different enterprises collaborating on a common project) may be hidden behind firewalls. This requires access control schemes based on the logic of the application and not on the physical location of documents and/or users, as is usually the case without firewalls.
- The Internet remains (and, we hope, for a long time) a heterogeneous world: the different entities of the enterprise probably operate with different operating systems, using various document management systems and/or databases, and several document formats (possibly proprietary).
- Distributed applications are intrinsically complex, not only in their design, but also in their deployment and monitoring. Indeed, when several sites are considered it is essential to be able to rely on a sound model of both the application logic and the underlying distributed system in order to allow easy (static or dynamic) reconfiguration and control.
These aspects have been particularly stressed in the design of CLF/Mekano. Its aim is not to define a universal environment for building distributed applications over a local area network or even the intranet of a single company. On the contrary, the role of CLF/Mekano is to build a bridge between legacy applications spread over the different entities of a world-wide company or virtual enterprise (extranet). These individual applications may belong to the Java/Jini [12], Microsoft/DCOM [14] or CORBA/OMG [15] world, but generally lack the infrastructure for their high-level inter-operation and coordination.
2.2 What is CLF/Mekano?
CLF/Mekano provides CLF (Coordination Language Facility), a framework for specifying and enacting the coordination of distributed components, and Mekano (which stands for Multi-platform Environment for Knowledge Applications in Networked Organizations), a library of predefined components, compliant with the CLF coordination protocol, that can easily be customized and put together to build distributed applications. Mekano also provides libraries of component parts that can be reused and derived to build new components.
2.2.1 CLF components
CLF components are objects managing resources, and CLF applications are modeled as resource manipulations. There are two ways to access the resources held by a CLF object.
- Method invocation: this allows access to the component through the standard method invocation protocol. Method execution requires values for the input parameters of the method, performs any kind of modification on the resources, and returns values as output parameter(s). This form of interaction can be supported by simple synchronous protocols such as HTTP, making it very easy to connect an ad-hoc user interface or a simple web browser to a CLF component.
- Service invocation: this is more original and powerful. A component publishes services, which are partial views of its resources. These services can be accessed through three fundamental interaction schemes adapted to distributed computing: negotiation, atomic performance and notification.
Dedicated CLF components, called Coordinators, fully exploit these interaction schemes to coordinate resource manipulations over distributed component services:
- Negotiation among multiple distributed components asynchronously and non-deterministically generates combinations of offers, each combination being an alternative candidate for completing a compound coordination task. Offers may take very different forms. An offer may be, for example:
  - a physical resource such as documents or flight tickets;
  - a service offer such as a slot in a print-shop schedule or a promise to deliver a product at a certain date;
  - a pointer to another component, itself capable of making offers for some specific type of services.
  In any case, an offer holds an action identifier which the client can use to activate the offer.
- Atomic performance of the actions associated with the offers of a combination obtained in the negotiation phase guarantees that the coordination task is either fully realized by the selected combination, or not at all. A classical two-phase commit is used here to ensure atomicity. This means that in CLF, a coordination is viewed as an extended transaction which encapsulates basic atomic transactions.
- Notification, achieved by inserting new resources into a given service, lets components know that a coordination task has been realized successfully. Components are free to interpret resource insertion as they want: a resource insertion could result in the availability of a free slot in a print-shop schedule, or the appearance of a new goods provider in a yellow page component, or simply trigger an action such as sending a fax or an e-mail.
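The atomic performance scheme can be illustrated with a minimal Python sketch. It is not the CLF runtime: the class ToyService, the function perform and the offer identifiers are hypothetical names chosen for illustration, and everything runs in a single process, whereas CLF performs these steps across distributed components.

```python
class ToyService:
    """Hypothetical in-memory service; offers are plain action identifiers."""

    def __init__(self, name, offers):
        self.name = name
        self.available = set(offers)   # offers that can still be reserved
        self.reserved = set()          # offers locked by a pending coordination

    def reserve(self, offer_id):
        # Phase 1: lock the offer if it is still available.
        if offer_id in self.available:
            self.available.remove(offer_id)
            self.reserved.add(offer_id)
            return True
        return False

    def confirm(self, offer_id):
        # Phase 2, success: the reserved offer is consumed for good.
        self.reserved.remove(offer_id)

    def cancel(self, offer_id):
        # Phase 2, failure: the offer is released for other coordinations.
        self.reserved.remove(offer_id)
        self.available.add(offer_id)


def perform(combination):
    """Perform a combination of (service, offer_id) pairs: all or nothing."""
    granted = []
    for service, offer_id in combination:
        if service.reserve(offer_id):
            granted.append((service, offer_id))
        else:
            # One refusal aborts the whole combination.
            for granted_service, granted_offer in granted:
                granted_service.cancel(granted_offer)
            return False
    for service, offer_id in granted:
        service.confirm(offer_id)
    return True


agency = ToyService("agency", {"gerhard-request", "tony-request"})
hotel = ToyService("hotel", {"room-501"})

first = perform([(agency, "gerhard-request"), (hotel, "room-501")])
# The room is gone, so this attempt fails and Tony's request is released.
second = perform([(agency, "tony-request"), (hotel, "room-501")])
```

After the second attempt, the agency still holds `tony-request` as an available offer: the reservation made in phase 1 has been cancelled, which is the "not at all" half of the guarantee.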
2.2.2 The CLF protocol
The three interaction schemes presented above are captured at the CLF protocol level through 8 verbs. Each invocation of a service of a component must carry a verb. Two verbs allow a client (in general a Coordinator) to subscribe to certain service offers, specified by input parameters (verb Inquire), and then to retrieve these offers one by one (verb Next), each offer giving details about its conditions through output parameters. The stream of offers is potentially infinite: even though, at some point, no offer may be available, new offers may later become available, either as a side effect of the internal behavior of the component or due to an external notification to the service (see verb Insert). Thus a pending inquiry lasts until either the client is no longer interested (verb Kill) or the service can guarantee that no new offers will arise for this specific inquiry. The verb Check verifies whether an offer is still valid. Three verbs (Reserve, Confirm, Cancel) are responsible for unrolling the two-phase commit of the performance phase. Basically, the offers involved in a coordination combination are reserved and, if every service agrees, a confirmation is issued to all of them; otherwise a cancellation is sent to those services which had agreed (in order to release the reserved offers). Finally, the verb Insert requests an extension of the capability of a service by inserting a new resource into it.
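The negotiation verbs can be pictured with a small Python sketch of an offer stream. This is an illustration under simplifying assumptions, not the CLF wire protocol: the class OfferStream, the helper matches and the PENDING and NO_MORE markers are hypothetical, resources are never removed, and a Next call that finds nothing returns PENDING immediately, where the real runtime would block until a new offer arrives.

```python
import itertools

PENDING = object()   # no offer right now, but more may arrive later
NO_MORE = object()   # the inquiry has been closed


def matches(pattern, resource):
    """A None field in the pattern stands for a free (output) parameter."""
    return len(pattern) == len(resource) and all(
        wanted is None or wanted == actual
        for wanted, actual in zip(pattern, resource)
    )


class OfferStream:
    """Hypothetical service supporting Inquire, Next, Kill and Insert."""

    def __init__(self):
        self.resources = []            # append-only in this sketch
        self.inquiries = {}            # inquiry id -> [pattern, cursor]
        self._ids = itertools.count(1)

    def insert(self, resource):
        # Insert: extend the service with a new resource.
        self.resources.append(tuple(resource))

    def inquire(self, pattern):
        # Inquire: subscribe to the offers matching the input parameters.
        inquiry_id = next(self._ids)
        self.inquiries[inquiry_id] = [tuple(pattern), 0]
        return inquiry_id

    def next(self, inquiry_id):
        # Next: return the next matching offer not yet delivered.
        state = self.inquiries.get(inquiry_id)
        if state is None:
            return NO_MORE
        pattern, cursor = state
        while cursor < len(self.resources):
            resource = self.resources[cursor]
            cursor += 1
            if matches(pattern, resource):
                state[1] = cursor
                return resource
        state[1] = cursor
        return PENDING

    def kill(self, inquiry_id):
        # Kill: the client is no longer interested.
        self.inquiries.pop(inquiry_id, None)


vacancy = OfferStream()
vacancy.insert(("02/03/1999", "single", "501"))
inquiry = vacancy.inquire(("02/03/1999", "single", None))

first = vacancy.next(inquiry)      # the offer for room 501
second = vacancy.next(inquiry)     # PENDING: nothing else for now
vacancy.insert(("02/03/1999", "single", "502"))
third = vacancy.next(inquiry)      # the newly inserted room 502
vacancy.kill(inquiry)
fourth = vacancy.next(inquiry)     # NO_MORE: the inquiry was killed
```

The third call shows why the stream of offers is potentially infinite: an Insert that happens after the inquiry was opened still produces an offer for it.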
2.2.3 The CLF scripting language
The second originality of CLF, apart from its rich protocol, is its scripting language, which makes it possible to specify coordination behaviors combining the three interaction schemes listed above in a very compact manner. A CLF script consists of production-like rules which express the coordination of multiple CLF components by specifying cross-component resource manipulations, where the resources are accessed through the services of the components. Scripts are themselves resources of coordinators, and are enacted as soon as they are inserted in a coordinator. The following toy example (Figure 1) shows how scripts search and combine resources, and solve conflicts in their usage. The purpose of this toy coordination is to allocate rooms, publicly available in a set of hotels, to the customers of a set of booking agencies. There are three classes of components to consider here:
Interfaces:
  hotelList(hotel): -> hotel is LOOKUP Broker.hotelList
  c(component,service,name,date): component,service -> name,date is DISPATCH
  ...
Rules:
  hotelList(hotel) @ agencyList(agency) @ c[agency,'customer'](name,date) @
  v[hotel,'vacancy'](date,'single',roomnumber)
  - r[agency,'reservation'](name,roomnumber)
Figure 1: A toy CLF script
[Figure 2 diagram: the Broker (services hotelList and agencyList), the agencies Agency1 and Agency2 (service customer) and the hotels hotel1, hotel2 and hotel3 (service vacancy), with the inquiries (e.g. '02/03/1999','single',?) and the returned offers (e.g. '02/03/1999','single','501') exchanged while enacting hotelList(hotel) @ agencyList(agency) @ c[agency,'customer'](name,date) @ v[hotel,'vacancy'](date,'single',roomnumber)]
Figure 2: The negotiation phase
- The agencies: each of them manages a set of resources which are customer requests (with the customer's name and requested date as output parameters), accessible through the service customer(name,date).
- The hotels: each of them manages a set of resources which are available rooms and schedules, accessible through the service vacancy(date,type,number) with date and type as input parameters.
- The broker: its resources are a catalog of agencies and a catalog of hotels, accessible through the services agencyList(A) and hotelList(H), where the output parameters A and H are pointers to an agency and a hotel respectively. Pointers are here names for look-up in a name service.
Figure 1 presents a simple CLF script consisting of only one rule. The interface part shows the mapping between the tokens (hotelList, agencyList, c, v and r) and remote services. Handles to these services are obtained by the coordinator using a look-up in an application-wide name server in which the components publish their services. Notice that the mapping between the token hotelList and the logical name Broker.hotelList is known statically (keyword LOOKUP), while for c it is defined dynamically (keyword DISPATCH) by retrieving the service 'customer' of the component whose name (in the name server) is given by the instantiation of the variable agency.
Figure 2 shows the data flow during the negotiation phase. The plain arrows symbolize the Inquire invocations and the corresponding replies to the Next invocations. The dashed arrows show how instantiations are propagated along the different tokens of a rule. Let us now describe the enactment of the rule. The two services hotelList and agencyList are inquired, returning all the agencies and hotels known by the broker. Let's assume agencyList (resp. hotelList) returns the following flow of offers: 'Agency1', 'Agency2' (resp. 'Hotel1', 'Hotel2' and 'Hotel3'). Then, for each agency, the service customer is inquired, returning all the customer requests currently pending. Let's assume the following flow of offers is returned:
('Gerhard','02/03/1999'), ('Tony','02/03/1999'), ('Jacques','03/03/1999'), . . .
Each of these offers creates a specialized instance of the rule, and all the instances are executed asynchronously and concurrently. Then, for each returned customer request, the date parameter is propagated to the invocation of the vacancy service of all the hotels known by the broker. As a result, the free single rooms available at the corresponding date are returned by each hotel:
('02/03/1999','single','501'), ('02/03/1999','single','502'), ('03/03/1999','single','501'), ...
('02/03/1999','single','B45'), ('02/03/1999','single','B67'), ('02/03/1999','single','C49'), ...
which produce further specializations of the rule:
1. customer('Gerhard','02/03/1999') @ vacancy('02/03/1999','single','501')
2. customer('Gerhard','02/03/1999') @ vacancy('02/03/1999','single','502')
...
3. customer('Tony','02/03/1999') @ vacancy('02/03/1999','single','501')
4. customer('Tony','02/03/1999') @ vacancy('02/03/1999','single','502')
...
At this point, the instances are complete and ready to enter the performance phase. Notice that each instance denotes a combination of offers which is a potential candidate for the problem we are solving. However, it is easy to see that conflicts exist between the different propositions. Indeed, a person does not need several rooms and the same room cannot be booked by different people at the same time. The performance phase ensures that the actions promised by the services (namely the offers) will be executed atomically. So, if we consider the first proposition, the coordinator reserves the resources held by the offers Hotel3, Agency1, ('Gerhard','02/03/1999') and ('02/03/1999','single','501'). In case of success, the actions associated with these offers are performed as a result of the Confirm operation. In the present case, the resources attached to the offers ('Gerhard','02/03/1999') (customer request) and ('02/03/1999','single','501') (room availability) are removed from their respective services, while no specific actions are performed for the resources attached to Hotel3 and Agency1, since these have to remain available for other inquiries. In the notification phase, a resource corresponding to an offer ('Gerhard','501') is inserted in the service reservation of the involved agency. The atomic removal of resources invalidates propositions 2 and 3, i.e. the execution of the corresponding instances is aborted. Similarly, only proposition 4 will succeed.
Three points have to be noticed:
- First, the list of solutions may evolve in time. On the one hand, new hotels or agencies can be added on the fly to the broker or disabled by the removal of the corresponding resources. On the other hand, the agencies may receive new customer requests at any time, and hotel rooms may be freed (if a confirmed booking is later waived).
- Second, conflicts may occur not only between instances of the same rule, but also between different rules and even between different scripts, possibly run by separate coordinators on different machines. The transactional protocol of CLF is capable of handling these cases.
- Third, the action performed in the Confirm operation may differ from one service to another according to the behavior the service wishes to implement. The vacancy service of a hotel has a "bag" behavior that implements a physical removal of the corresponding vacancy, since it is a bounded resource. The hotelList service of the broker has a "catalog" behavior, since the information it contains can be used several times. If we want to be able to remove resources, for instance when a hotel wants to unregister itself from the broker, then we can use another service sharing the resources of hotelList but implementing a bag behavior. Removing the resource corresponding to a hotel will abort any offer involving this resource.
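The way propositions are generated and then compete for resources can be replayed with a short Python sketch. It makes strong simplifying assumptions: a single hotel and a single agency, the function names propositions and enact are hypothetical, and propositions are tried one after the other in a fixed order, whereas a CLF coordinator enacts them concurrently and the winner of a conflict is not determined in advance.

```python
customers = [
    ("Gerhard", "02/03/1999"),
    ("Tony", "02/03/1999"),
    ("Jacques", "03/03/1999"),
]
vacancies = [
    ("02/03/1999", "single", "501"),
    ("02/03/1999", "single", "502"),
    ("03/03/1999", "single", "501"),
]


def propositions(customers, vacancies):
    """Negotiation: pair every request with every vacancy on the same date."""
    return [
        (request, room)
        for request in customers
        for room in vacancies
        if request[1] == room[0]
    ]


def enact(customers, vacancies):
    """Performance and notification, tried sequentially in this sketch."""
    pending = set(customers)    # bag behavior: a request is consumed once
    free = set(vacancies)       # bag behavior: a room is consumed once
    reservations = []           # what gets inserted into 'reservation'
    for request, room in propositions(customers, vacancies):
        if request in pending and room in free:
            pending.remove(request)
            free.remove(room)
            reservations.append((request[0], room[2]))
        # Otherwise the proposition is aborted: one of its resources is gone.
    return reservations


candidates = propositions(customers, vacancies)
reservations = enact(customers, vacancies)
```

With this ordering, the proposition pairing Gerhard with room 501 succeeds first, which aborts the two propositions that share either his request or that room, and Tony ends up with room 502.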
3 CLF/Mekano Infrastructure
The idea behind CLF/Mekano is to provide a framework based on CLF that eases the definition, the deployment and the monitoring of distributed applications.
1. We use the very simple paradigm of resource-based programming supported by CLF, which models coordination of distributed objects as resource manipulation. In our experience, the learning curve for this paradigm is steep at the beginning for standard programmers, but, given the small number of concepts involved and their intuitiveness, the overhead of learning this paradigm quickly pays off.
2. We have defined a library of basic component parts that allows the programmer to define components with minimal effort. This is described in Section 3.1.
3. We have developed a library of basic components, built from the basic component parts mentioned above, that can be used as such, customized to the user's purpose, or used as examples when new components have to be written. This is described in Section 3.2.
4. We have defined a description language for modeling the functional architecture of the application, the physical distributed system on which it is targeted, and the mapping from components to hosts. It can be used to deploy the application, to monitor it, and possibly to reconfigure it dynamically. This is described in Section 3.3.
3.1 CLF/Mekano component API
3.1.1 Bank Behaviors and Resource Managers
A CLF/Mekano component embeds high-level mechanisms in order to implement the full CLF protocol. Specializing such a component may seem a difficult task since it involves concurrency management, deadlock avoidance, asynchronicity, and event management. In fact, we provide a capsule that takes into account each of these aspects separately, so that the programmer may concentrate on the specific characteristics of the components. The role of a CLF service is to manage the different verbs of the CLF protocol invoked by external clients. To do so, a Mekano component relies on the one hand on a generic CLF runtime integrated into the component (not described here, for lack of space) and on the other hand on two abstractions called respectively the Bank Behavior and the Resource Manager. A programmer needs only provide code for the Bank Behavior and the Resource Manager, compliant with the API of Figure 3 and detailed below.
[Figure 3 diagram. Blocks shown: Direct Methods, User Direct Methods, Service, User Service, Bank Behavior, Resource Manager, CLF runtime, User Object.
Bank Behavior methods: DoInquire, DoNext, DoReserve, DoCancel, DoCommit, DoKill, DoCheck, GetClfResources.
Resource Manager methods: AddResource, RemoveResource, TestResource, ReserveResource, ReleaseResource, InitHandler, RemoveHandler, GetNextResource.]
Figure 3: CLF/Mekano component internal block
Bank Behavior
The bank behavior may be defined as the specific synchronous and sequential behavior of a service. For instance, to process a Next request, a service may have to block until some new matching resource becomes available. The bank behavior specifies only the synchronous part of this operation, namely to check whether a resource satisfying the service request is available, while the CLF runtime included in the component takes responsibility for the asynchronous part of the operation, blocking the response and waking it up when required. Most of the time the programmer may reuse a Bank Behavior class of the Mekano library, either as such or in order to derive new behaviors. We just give here a flavor of the available Bank Behaviors:
- Bag: behaves as a bag where it is possible to insert and remove resources. Basically, the Inquire/Next operations collect resources satisfying the service request. The Reserve/Confirm operations remove a resource and Insert adds new resources.
- Dictionary: derives from Bag and specializes the Insert verb so that, before inserting a resource, it is verified whether a resource with the same key (e.g. the n first fields of a tuple) is already present. If that is the case, the original resource is removed and replaced by the new one; otherwise the new one is simply added. This bank behavior is used for instance by the Mekano Name Server component.
- ClosedBag and ClosedDictionary: derive respectively from Bag and Dictionary and specialize the Next verb to return a no-more-value if no resource matching the inquiry is currently available (or, more precisely, if all the matching resources have already produced offers returned by previous Next operations).
- Catalog: derives from Bag but specializes the Confirm verb so that the resource is not physically removed. In our toy example of the previous section, it is used to handle the services hotelList and agencyList of the Broker component.
- Consult, WebServer: only define the negotiation verbs of the CLF protocol. With a WebServer, for instance, each inquiry is translated into an HTTP form understandable by an actual Web server, and the returned HTML page is parsed and decomposed into the corresponding offers, which are stored and returned normally by each invocation of the Next verb.
- Notify: defines only the Insert verb. It is the super class of a long list of bank behavior classes which trigger various actions when a resource is inserted. For instance, one of the services of a coordinator component encapsulates a compiler that pre-compiles the CLF scripts that are inserted and triggers their enactment. Also, Notify banks are used to encapsulate physical devices such as Printer, Fax, email-sender, etc.
A Bank Behavior class should be compliant with the API sketched in Figure 3, which mainly defines the methods that are synchronously invoked by the service handler (part of the component CLF runtime).
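The derivation relations between bank behaviors can be sketched in Python as follows. These classes are simplified stand-ins and not the Mekano classes: the method names insert, offers and confirm, the key_length parameter and the helper matches are assumptions made for illustration, and the reservation step and the asynchronous part handled by the CLF runtime are left out.

```python
def matches(pattern, resource):
    """A None field in the pattern stands for a free (output) parameter."""
    return len(pattern) == len(resource) and all(
        wanted is None or wanted == actual
        for wanted, actual in zip(pattern, resource)
    )


class Bag:
    """Resources can be inserted, offered, and removed on Confirm."""

    def __init__(self):
        self.resources = []

    def insert(self, resource):
        self.resources.append(tuple(resource))

    def offers(self, pattern):
        # Synchronous part of Inquire/Next: what matches right now.
        return [r for r in self.resources if matches(pattern, r)]

    def confirm(self, resource):
        # A bag consumes the resource.
        self.resources.remove(tuple(resource))


class Dictionary(Bag):
    """Insert replaces any resource sharing the same key fields."""

    def __init__(self, key_length=1):
        super().__init__()
        self.key_length = key_length

    def insert(self, resource):
        resource = tuple(resource)
        key = resource[:self.key_length]
        self.resources = [
            r for r in self.resources if r[:self.key_length] != key
        ]
        self.resources.append(resource)


class Catalog(Bag):
    """Confirm leaves the resource in place so it can be used again."""

    def confirm(self, resource):
        pass


rooms = Bag()
rooms.insert(("02/03/1999", "single", "501"))
rooms.confirm(("02/03/1999", "single", "501"))

hotels = Catalog()
hotels.insert(("hotel1",))
hotels.confirm(("hotel1",))

names = Dictionary(key_length=1)
names.insert(("Broker.hotelList", "first-location"))
names.insert(("Broker.hotelList", "second-location"))
```

After these calls the bag is empty, the catalog still offers `hotel1`, and the dictionary holds only the second location registered under the key `Broker.hotelList`.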
Resource Manager
A bank behavior defines the particular behavior of a service with respect to the CLF verbs, but abstracts away the precise nature of the resources it manipulates. A Resource Manager, on the other hand, defines the way the physical resources (if any) are stored. The Mekano library provides a set of predefined Resource Manager classes, which can be customized for application purposes.
- TupleSpace: offers a simple storage for tuples.
- PersistentTupleSpace: derives from the previous one and offers in addition a persistent storage.
- MySQL, mSQL, JDBC: use classical databases. Resources are database records expressed as tuples.
- LDAP: uses a standard LDAP server as storage. Resources are also tuples.
- Docushare: uses a DocuShare server. Resources are files.
- FileSystem: encapsulates a Unix or NT directory. Resources are files.
- File: uses a simple file as storage. Resources are the lines in the file. It can be used for instance for managing a classical log file as a set of resources.
A Resource Manager should be compliant with the API sketched in Figure 3, which provides basic functionality for adding and removing resources, browsing sequentially a set of resources and managing reservations (invoked by the CLF runtime included in the component).
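A resource manager of the TupleSpace kind might look like the following Python sketch. The method names echo those listed in Figure 3 (AddResource, TestResource, ReserveResource, ReleaseResource, RemoveResource), but their signatures and exact semantics are not given in this paper, so everything below is an assumption made for illustration.

```python
class TupleSpace:
    """Hypothetical in-memory storage of tuples with reservation support."""

    def __init__(self):
        self._free = []        # resources that can be offered
        self._reserved = []    # resources locked by a pending coordination

    def add_resource(self, resource):
        self._free.append(tuple(resource))

    def test_resource(self, resource):
        # Is the resource currently stored and not reserved?
        return tuple(resource) in self._free

    def reserve_resource(self, resource):
        resource = tuple(resource)
        if resource in self._free:
            self._free.remove(resource)
            self._reserved.append(resource)
            return True
        return False

    def release_resource(self, resource):
        # Undo a reservation: the resource becomes available again.
        resource = tuple(resource)
        self._reserved.remove(resource)
        self._free.append(resource)

    def remove_resource(self, resource):
        # Physically delete the resource, whether reserved or not.
        resource = tuple(resource)
        if resource in self._reserved:
            self._reserved.remove(resource)
        else:
            self._free.remove(resource)

    def browse(self):
        # Sequential browsing of the resources that can be offered.
        return iter(list(self._free))


space = TupleSpace()
space.add_resource(("02/03/1999", "single", "501"))
space.add_resource(("02/03/1999", "single", "502"))

locked = space.reserve_resource(("02/03/1999", "single", "501"))
locked_twice = space.reserve_resource(("02/03/1999", "single", "501"))
space.remove_resource(("02/03/1999", "single", "501"))
remaining = list(space.browse())
```

Because a bank behavior only talks to such an interface, the same behavior class could sit on top of a database-backed or file-backed manager without change, which is the point of the separation described above.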
3.1.2 How to write a Mekano component
The first step in writing a Mekano component is to identify clearly what its resources will be. The CLF notion of resource is extremely generic and can adapt to basically anything, but the quality of an application mainly depends on the right choice of resources. Also, the specific views of the resources published by the component, i.e. its services, must be carefully chosen. Second, the type of storage for the resources must be decided. The choice may be imposed by legacy conditions if, for instance, the resources are pieces of information concerning the employees of a company, already present in a corporate LDAP server or database. In that case, the resource manager will have to adapt to the legacy system at hand. If it is covered by one of the predefined resource manager classes provided in the Mekano library, then an instance simply needs to be created and customized by providing specific parameters in order to initialize the link with the actual resource server. Third, the behavior of each of the services must be decided, depending on the semantics of the service. If you just want to read information concerning employees, then a Catalog behavior is enough; if you also want to be able to modify data (e.g. through CLF scripts), then a Bag or a Dictionary is more appropriate. Of course, it is possible for two services using different bank behaviors to share the same set of resources. For instance, the Name Server object offers two services: one implemented with the Dictionary and the other with the ClosedDictionary. The first makes it possible to block until the physical location of a service becomes available, while the second returns, when the location is not known, the no-more-value that stops the enumeration of offers.
By separating the resource management aspect from the bank behavior in the name server example, it is simple to customize it, for example to decide whether the information stored by the name server should be made persistent or not. In the same way, an LDAP server may be used for storing the resources of the Name Server simply by changing the resource manager used by its services. Currently, the Mekano component runtime is available in both Python and JPython (an implementation of Python on a Java virtual machine), so that the bank behavior and resource manager classes can be written in either Python or Java (and hence, C or C++). The Mekano library essentially uses the standard libraries of Python (LDAP, MySQL, Mailbox, ...) and Java (JDBC, Multi-function devices, ...).
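The name server example can be sketched in Python to show both ideas at once: two services with different behaviors sharing one set of resources, and a storage that can be swapped without touching them. All names here (MemoryStore, JsonFileStore, NameServer, lookup, lookup_closed, the PENDING and NO_MORE_VALUE markers, and the location string) are hypothetical; in particular, the open lookup returns PENDING at once, where the Mekano runtime would block until the name is registered.

```python
import json
import os
import tempfile

PENDING = "pending"                # the name may still be registered later
NO_MORE_VALUE = "no-more-value"    # closed service: the enumeration stops


class MemoryStore:
    """Volatile storage: contents vanish with the process."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class JsonFileStore:
    """Persistent storage: contents are kept in a JSON file."""

    def __init__(self, path):
        self.path = path

    def _load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path, encoding="utf-8") as handle:
            return json.load(handle)

    def put(self, key, value):
        data = self._load()
        data[key] = value
        with open(self.path, "w", encoding="utf-8") as handle:
            json.dump(data, handle)

    def get(self, key):
        return self._load().get(key)


class NameServer:
    """Two service views over the same stored resources."""

    def __init__(self, store):
        self.store = store

    def register(self, name, location):
        # Dictionary-style insert: a new location replaces the old one.
        self.store.put(name, location)

    def lookup(self, name):
        # Open view: an unknown name is not an error, just not there yet.
        location = self.store.get(name)
        return PENDING if location is None else location

    def lookup_closed(self, name):
        # Closed view: an unknown name ends the enumeration of offers.
        location = self.store.get(name)
        return NO_MORE_VALUE if location is None else location


volatile = NameServer(MemoryStore())
unknown_open = volatile.lookup("Broker.hotelList")
unknown_closed = volatile.lookup_closed("Broker.hotelList")

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "names.json")
    NameServer(JsonFileStore(path)).register("Broker.hotelList", "site-a")
    # A fresh server over the same file still knows the name.
    survived = NameServer(JsonFileStore(path)).lookup_closed("Broker.hotelList")
```

Switching from the volatile to the persistent variant changes only the constructor argument; the two lookup services are untouched.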
3.2 Reusable components
Mekano provides a library of reusable components that have been used in the development of various applications (see Section 4). We rst go through a quick overview of the existing components and then present two of them in more details. NameServer: A NameServer component maintains a mapping between keys and entities which both can be of any kind. It oers register and lookup facilities based on the Dictionary and ClosedDictionary bank behaviors. Name servers are essential components in most distributed applications. Unlike other infrastructures, the name service of CLF/Mekano is not part of the infrastructure and can be adapted per-application. Coordinator: Coordinator components, which are speci c to the CLF/Mekano infrastructure, coordinate the dierent phases of CLF service invocations according to CLF scripts. The scripts are the resources of a coordinator which has dedicated services that compile, enact and manage these scripts when they are inserted. A CLF coordinator must be initialized with a name server of the kind mentioned above in order to resolve names which occur statically in the interface section of scripts, for tokens of type LOOKUP, or dynamically during the execution of the scripts, for tokens of type DISPATCH (see Section 2). LogAnalyzer: A LogAnalyzer component gives access to the log of a Coordinator component. Indeed, the log les of a CLF application are spread over the net as the components. By using a File resource manager combined with a Consult bank behavior it is possible to de ne services that manipulate each line of the various log les as a resource. Thus it is possible, e.g. using a CLF script, to consult the dierent log sites and to rebuild the global history on the y. A script may even consult its own log and act accordingly! DocumentManagementSystem: A DocumentManagementSystem component roughly correspond to a le storage component. Its resources are entries mainly composed of a lename, a possible title and its content. 
They can be read, written and deleted. Two services, sharing these resources, give access either to the name and title only or, when required, also to the content (e.g. for further processing such as document transformation or moving a document across a firewall). Two kinds of DMS are currently supported:
  FileSystem: A FileSystem component encapsulates a standard UNIX or Windows NT file directory. The path to the root directory must be passed as an initialization parameter. Of course, the component must be running in the domain which hosts the file system. It can be used, for instance, for storing a copy of a shared document on the local file system of a laptop.
  DocuShare: A DocuShare component encapsulates a collection in a DocuShare server (DocuShare is a Xerox Web-based DMS). The server name and port must be passed as parameters. Additional parameters hold the account information, so that the component can log in to the DocuShare server on behalf of users.
SQLDatabase: An SQLDatabase component gives access to a database server supporting SQL queries. A resource of this component is a row from a particular view of a table in a database. Rows may be added, removed and modified. The server name and port and the database name must be passed as parameters at the initialization of the component. Each service is defined by a table name and the selected column names. Depending on the security features of each database engine, a user name and password may also be needed. Three database engines are currently supported:
  mSQL: A shareware database engine.
  MySQL: Another shareware database engine, very popular and showing excellent performance.
  dbAnywhere: A second-tier server accessed using JDBC. Many commercial database engines may then be connected as the third tier in the architecture (Sybase and MS-Access have been successfully connected in this way).
LDAPDirectory: Very similar to an SQLDatabase component. A resource is composed of certain fields of an LDAP [10] object type. Both the field names and the object type are passed as parameters.
Mailbox: A Mailbox component gives access to a UNIX-style mailbox. A resource is composed of the header fields (in fact, some of them) and the body of a message. The path to the mailbox file must be passed as an initialization parameter.
Email: An Email component uses the SMTP server of a Unix machine. A resource is composed of the sender, receivers and body of a message. The information contained in every inserted resource is used to send a new e-mail message. The name of the SMTP server must be passed as an initialization parameter.
PrinterManager: A PrinterManager component handles a set of printers belonging to the same domain. A resource is composed of a printer identifier and the title and content of a text or Postscript file. Every inserted resource launches a printing job on the appropriate printer. The printing command and the names of the printers must be passed as initialization parameters. The format of the file to be printed is assumed to be correct: the component performs no explicit format transformation, and it is often combined with transformation components.
TaskManager: A TaskManager component maintains workflow-related resources such as task and process states.
Transformer: This generic component is aimed at wrapping various document transformations. A resource is composed of a document's name and content and of two empty slots that will eventually hold the document's name and content after the transformation. This provides a uniform interface to any kind of transformation, which then has to be customized for individual purposes. Among those already tested and in use are linguistic tools (text summarization, translation, etc.) and various format transformations from proprietary formats (MS-Word, Powerpoint) to open standards (HTML, Postscript).
We now describe the following two components in more detail.
The SQLDatabase component: This is a case where the CLF model maps quite straightforwardly onto the capabilities of an external service: each table row is a resource that can be inserted, retrieved using a filter, reserved and possibly consumed.
A customized Resource Manager (see Section 3) is in charge of opening and closing connections to the database server and of translating calls to the Resource Manager API into SQL queries. We use a standard Bank Behavior of type Bag. Any number of services can be attached to an SQLDatabase component, corresponding to different views of different tables. Direct methods are also available in order to provide a simple, user-friendly HTML interface to the component.
The Transformer component: This slightly more complex component is provided to let a programmer who is unfamiliar with the CLF model integrate simple transformation tools inside a CLF application. Such components are often used as satellites of a main CLF component to perform pre- and post-transformations on the data processed by that component, without having to fully rewrite it. It is enough to provide a well-behaved transformation function that will be imported and mapped into the CLF model by a custom Bank Behavior. The Bank Behavior invokes this function on an Inquire operation, with input parameters extracted from those of the inquiry. The output result of the function is returned in response to a Next operation. Non-deterministic functions, returning several results (e.g. a translation of an ambiguous piece of text in natural language), can also be handled. A standard TupleSpace Resource Manager is used for this component.
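The mapping of a plain transformation function onto Inquire and Next operations can be illustrated with a minimal sketch. It is written in Python for brevity and is not Mekano code: the class TransformerBank, its inquire and next methods, and the toy transformation to_variants are all hypothetical names, assumed here only to mirror the two steps described above (the function is invoked at Inquire time, and its results are handed out one by one at Next time).

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical names throughout: nothing here is the actual CLF/Mekano API.
Document = Tuple[str, str]  # (name, content)


class TransformerBank:
    """Maps a plain transformation function onto Inquire/Next-style calls."""

    def __init__(self, transform: Callable[[str, str], Iterable[Document]]):
        self._transform = transform
        self._streams: Dict[int, Iterator[Document]] = {}
        self._next_id = 0

    def inquire(self, name: str, content: str) -> int:
        """Run the function on the inquiry's input and store all its results."""
        inquiry_id = self._next_id
        self._next_id += 1
        # list(...) forces the function to run now, at Inquire time.
        self._streams[inquiry_id] = iter(list(self._transform(name, content)))
        return inquiry_id

    def next(self, inquiry_id: int) -> Optional[Document]:
        """Return the next stored result of the inquiry, or None when exhausted."""
        return next(self._streams[inquiry_id], None)


def to_variants(name: str, content: str) -> Iterable[Document]:
    """Toy non-deterministic transformation: yields two candidate results."""
    yield (name + ".upper", content.upper())
    yield (name + ".title", content.title())


bank = TransformerBank(to_variants)
inquiry = bank.inquire("memo", "hello world")

results = []
doc = bank.next(inquiry)
while doc is not None:
    results.append(doc)
    doc = bank.next(inquiry)

print(results)
```

A function returning a single result is simply the special case of an iterable with one element, so deterministic and non-deterministic transformations share the same interface in this sketch.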
3.3 CLF/Mekano configuration and monitoring tools
Deploying, configuring and monitoring a distributed application is not simple. It requires describing the logic of the application in terms of components which interact together, describing the underlying physical distributed system on which the application will run, and giving the mapping between the logical components and the physical distributed system. Once the application is running, monitoring facilities are needed in order to check that the application works smoothly and to act, whenever necessary, at the component level (e.g. restart a faulty component). To ease this task, visualization tools may be used to reduce the difficulty of dealing at the same time with the complexity of the application itself on the one hand and the complexity of the underlying distributed system on the other. They also help capture a global view of the system.
In the CLF/Mekano environment, the goal is to provide tools to deploy, configure and monitor the application that may be easily coupled with internal ad-hoc programming and visualization tools or external user tools, typically web browsers. A specific language has been developed to describe a CLF distributed application to be deployed on a distributed system. The idea is to associate a class with each kind of physical entity of the distributed system (domain, node) and logical entity of the application (application, surrogate and program). Instances of these classes define attributes describing the corresponding CLF/Mekano entity and provide methods for acting on them. This low-level specification, also called a seed file, may be hand-written or automatically generated from higher-level tools such as a visual configuration tool [20].
Figure 4: CLF application service tool (diagram not reproduced; labels: UNIX-DOMAIN, NT-DOMAIN, UNIX-WS, NT-WS, CLF Object, Scripts, Object Surrogate, Monitor, call)
Figure 5: CLF application monitoring (diagram not reproduced; labels: WebBrowser, Consult, Monitor, Design Tool, Seed File, Read, create, Coordinator, ClfObject, CLF Application, NameServer)
The distributed system is described through domains containing nodes. A domain is associated with a local area network sharing a file system (e.g. via NFS). A node object corresponds roughly to a user account defined by a physical workstation plus a login name. Thus, it is possible to remotely launch an application component on a given workstation under the identity of a given user. Nodes contained in the same domain share CLF libraries.
The distributed application is represented by an application object describing the data shared by the different CLF components which compose it. To each CLF component corresponds a surrogate object. The attributes of the surrogate describe on the one hand the role of the CLF object in the application (e.g. coordinator, name server, or user-defined component) and on the other hand its location in the distributed system (i.e. on which node it will run). A surrogate also provides a set of methods for direct interactions with the CLF object (e.g. stop, start, ping, inspect, used for monitoring purposes), as illustrated in Figure 4.
A CLF script manipulated by a CLF Coordinator component is described through a program object, the attributes of which are the source code and a reference to the coordinator surrogate. A program provides a compile method to execute the script on the associated coordinator. The mapping specifies which node hosts which surrogate by setting the node attribute of the surrogate.
The seed file can be used both as a model description of the application and as a program which deploys the application (fully or only partially). In order to ease the creation of this seed file and to facilitate the deployment and the monitoring of the CLF/Mekano application it describes, a dedicated tool has been designed. It is based on a dedicated Mekano component called a Monitor that manages the seed and all its proxy objects as resources.
This component defines a direct method that returns a Java applet that acts as a configuration tool when the application is not running and as a monitoring and dynamic reconfiguration tool when the application (or part of it) is running. Figure 5 illustrates the possibility of starting one part of an application from the monitoring tool and another part directly from a console. It is important to note that the monitor tool can dynamically, while the CLF application is running, take into account any change made through another instance of the monitoring tool and/or an execution of a seed file from a console. Basically, once the monitor has detected a new component registration in the name server, it requests from the latter all the information needed to update the seed held by the monitor. At any time, the seed of a monitor can be downloaded in order to provide a ready-to-start seed file to be run from a console.
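The entities of a seed (domain, node, application, surrogate, program) and the way a seed doubles as a deployment program can be pictured with the following sketch. The paper does not show the syntax of the seed language, so everything below is an assumption made for illustration: the Python classes, their attribute names, and the strings returned by start and compile are hypothetical, and no component is actually launched.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical model: names follow the entities mentioned in the text, but the
# real seed language and its methods may look quite different.


@dataclass
class Node:
    workstation: str  # physical workstation
    login: str        # user account under which components are launched


@dataclass
class Domain:
    name: str  # e.g. a local area network sharing a file system
    nodes: List[Node] = field(default_factory=list)


@dataclass
class Surrogate:
    name: str
    role: str                    # "coordinator", "name server" or user-defined
    node: Optional[Node] = None  # the mapping: which node hosts the component
    running: bool = False

    def start(self) -> str:
        if self.node is None:
            raise ValueError(f"{self.name} is not mapped to a node")
        self.running = True
        # A real tool would launch the component remotely; we only record it.
        return f"start {self.name} on {self.node.workstation} as {self.node.login}"

    def stop(self) -> None:
        self.running = False

    def ping(self) -> bool:
        return self.running


@dataclass
class Program:
    source: str             # source code of the CLF script
    coordinator: Surrogate  # surrogate of the coordinator that runs the script

    def compile(self) -> str:
        if not self.coordinator.ping():
            raise RuntimeError("coordinator is not running")
        return f"script submitted to {self.coordinator.name}"


@dataclass
class Application:
    name: str
    surrogates: Dict[str, Surrogate] = field(default_factory=dict)
    programs: List[Program] = field(default_factory=list)

    def deploy(self, only: Optional[List[str]] = None) -> List[str]:
        """Start every surrogate, or only the named ones (partial deployment)."""
        names = only if only is not None else list(self.surrogates)
        return [self.surrogates[n].start() for n in names]


unix = Domain("UNIX-DOMAIN", [Node("ws1", "alice")])
app = Application("demo")
app.surrogates["ns"] = Surrogate("ns", "name server", unix.nodes[0])
app.surrogates["coord"] = Surrogate("coord", "coordinator", unix.nodes[0])
app.programs.append(Program("(script source)", app.surrogates["coord"]))

log = app.deploy(only=["ns"])      # start one part of the application ...
log += app.deploy(only=["coord"])  # ... and another part later
print(log)
print(app.programs[0].compile())
```

Keeping the node as an attribute of the surrogate, as the text describes, means that moving a component to another machine only changes the mapping and leaves the logical description of the application untouched.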
4 Examples
In this section we describe two applications developed in the Mekano framework:
X-Aware: provides document awareness across heterogeneous document repositories spread over the Internet;
X-Folders: supports document-centered workflow processes across distributed virtual organizations.
Both rely heavily on the library of components provided by Mekano, e.g. wrappers to Document Management Systems, Databases, Printers, Mailers, Document Transformers, etc. Most of each application consists of generic components used as such; only a few specific components had to be designed. For each application we first provide a short description of its goal and functionalities and then describe the Mekano-based implementation.
4.1 X-Aware
The goal of X-Aware is to provide document awareness across heterogeneous document repositories spread over the Internet. Often, within large organizations, people distributed over the network use different tools to manage knowledge and to store data and documents. This diversity, together with the access constraints tied to any heterogeneous large-scale distributed environment, turns document awareness and delivery into a real challenge. X-Aware addresses this problem by integrating existing heterogeneous tools and by providing interested users with document awareness through notification, and with document delivery on demand.
X-Aware uses the existing infrastructure within the organization as far as possible. For instance, it is able to access databases or LDAP repositories to automatically extract the user information needed for user notification, such as e-mail addresses, domains, etc. It also accesses existing e-mail services, printers and heterogeneous document repositories. To do so, X-Aware heavily uses the Mekano library of components: DocumentRepositories to integrate the storage sources of the documents related to the different subjects, Databases to store user-specific information, various components wrapping notification tools (such as printers, e-mail and fax), and also transformers to adapt document formats to the requested delivery means.
The real value of X-Aware resides not so much in its functionality (many user awareness tools already exist on the market), but rather in the diversity of the repositories which can be monitored and in the extensibility of the architecture (new components can easily be added dynamically).
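How these components cooperate on a single notification can be pictured with a small sketch. It is only an illustration under loud assumptions: the dictionaries below stand in for the Database and Transformer components, the returned tuples stand in for resources that would be inserted into an Email component, and all names, addresses and formats are invented for the example; in the real application this chaining is expressed by CLF scripts rather than by a Python function.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for Mekano components; all data is illustrative only.

# "Database" component: user-specific information (address, preferred format).
USERS: Dict[str, Dict[str, str]] = {
    "ann": {"email": "ann@example.org", "format": "html"},
    "bob": {"email": "bob@example.org", "format": "text"},
}

# Which users asked to be notified about which subject.
SUBSCRIPTIONS: Dict[str, List[str]] = {"budget": ["ann", "bob"]}

# "Transformer" components, keyed by the format required by the delivery means.
TRANSFORMERS: Dict[str, Callable[[str], str]] = {
    "text": lambda content: content,
    "html": lambda content: "<p>" + content + "</p>",
}


def notify(subject: str, title: str, content: str) -> List[Tuple[str, str, str]]:
    """Build one (address, title, body) message per subscriber of the subject."""
    messages = []
    for user in SUBSCRIPTIONS.get(subject, []):
        info = USERS[user]
        body = TRANSFORMERS[info["format"]](content)
        # In X-Aware the message would be inserted into an Email component.
        messages.append((info["email"], title, body))
    return messages


outbox = notify("budget", "Q3 plan", "Draft uploaded")
print(outbox)
```

Adding a new delivery means in this picture amounts to adding one transformer entry and one wrapper component, which is the kind of extensibility the text attributes to the architecture.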
4.2 X-Folders
The goal of X-Folders is to support document-centered workflow processes within and across organizations without imposing more constraints than necessary. X-Folders refers to the well-known and well-institutionalized (paper) circulation folder: the brown envelope in which you insert papers and on which you just indicate the names of the recipients, and possibly a post-it with what they are expected to do. This envelope then moves somehow magically from pigeonhole to pigeonhole and the required work, based on the good will of the people involved, is eventually performed. What we provide with the electronic version is the same simplicity and flexibility, but with all the power brought by interconnected computers in terms of speed, distribution and support. X-Folders makes it possible to:
- integrate heterogeneous Document Management Systems (DMS),
- construct documents cooperatively in a distributed, safe and efficient manner (i.e. handle the problems of distribution and access rights),
- work according to a workflow, evolving with the activities of the involved partners,
- work under evolving concurrency control, adapted to the document maturity,
- handle the common problem of unavailable people (holidays, sick leave, etc.),
- support nomadic operation.
Figure 6: Architecture of the Xfolder application (diagram not reproduced; labels: Web Browser, USER 1, USER 2, USER 3, workspace, Tools (editor, spreadsheet etc.), document repository, Document, XFolder Manager, Migrator, Coordinator, task routing script, doc routing script, Wrapper to Legacy application, Set of CLF objects, Direct Access protocol, CLF protocol)
Like X-Aware, X-Folders heavily uses the Mekano library of components:
- Document Repositories (more precisely FileSystem Directory) components storing the documents involved in the processes,
- TaskManager containing the task descriptions associated with the X-Folders, and
- Coordinators routing the documents to the concerned users, making them accessible even across firewalls.
Besides these generic components, X-Folders also makes use of dedicated components built using the CLF/Mekano library of component parts, such as a Relay object used by the Coordinator to migrate documents across firewalls.
5 Conclusion
The goal of the CLF/Mekano project is not to develop "yet another infrastructure for distributed object applications". It does not seek to compete with established industrial-strength infrastructures such as CORBA, Java/Jini or Microsoft/DCOM environments. Our goal instead is to complement these infrastructures with high-level coordination facilities capable of bridging together widely heterogeneous components distributed over a large area network such as the Internet. This led us to a light-weight approach based on a small number of concepts (resources, services, and an 8-verb interaction protocol) which model coordination through three basic interaction schemes: negotiation, atomic performance and notification.
A general trend in more recent developments indicates that our approach corresponds to a real need. For instance, Jini [18, 19] proposes negotiation, transactional capability and tuple spaces as the basic bricks of its framework. The WebDAV [21] initiative enriches the classical HTTP protocol by providing primitives for inquiring about properties of documents and by offering a basic locking mechanism for higher-level concurrency control schemes. Finally, the OMG issued a specification for a negotiation facility [13] based on normative sequences of speech act exchanges between the participants involved in a negotiation, described through state-transition diagrams.
To increase the reusability of CLF/Mekano developments, we have decoupled the coordination facilities (the CLF part) from the basic behavior of the components (the Mekano part). For each of these two aspects, we provide tools that help developers build and monitor their applications. Coordination scripts are described in a high-level language with strong semantics. They are themselves considered as resources, so that they can be manipulated within the applications. As for the basic behaviors of the components, they usually need not be built from scratch. They can be derived from existing component behaviors provided with the CLF/Mekano library, using the CLF/Mekano runtime, or even built by re-assembling stand-alone component parts also provided by the library. This approach has been effectively tested in the development of home-grown prototype applications, such as X-Aware and X-Folders, that are currently used in a truly distributed environment in and across our research centers.
The CLF/Mekano project integrates ideas and results from several research domains, for instance: (i) distributed artificial intelligence and multi-agent systems, especially the work on negotiation protocols between autonomous entities [17, 8], (ii) coordination systems [7, 16], with the stress on heterogeneity of the interacting entities and the notion of encapsulation around legacy applications, (iii) software composition, in particular the derivation of complex object behaviors from simpler ones (e.g. Resource Managers and Bank Behaviors), and of course (iv) wide area distribution, distributed object infrastructures (like Olan [6]) and models (actors [9, 11], adaptors, filters [22, 1], etc.).
References
[1] M. Aksit, K. Wakita, J. Bosch, and L. Bergmans. Abstracting object interactions using composition filters. In R. Guerraoui, O. Nierstrasz, and M. Riveill, editors, Object Based Distributed Processing. Springer Verlag, Berlin, Germany, 1994.
[2] J-M. Andreoli, J-L. Meunier, and D. Pagani. Process enactment and coordination. In Proc. of EWSPT'96, pages 195–216, Nancy, France, 1996.
[3] J-M. Andreoli and F. Pacull. Distributed print on demand systems in the Xpect framework. In Proc. of Int'l Conf. on Trends in Electronic Commerce, pages 141–153, Hamburg, Germany, 1998.
[4] J-M. Andreoli, F. Pacull, and R. Pareschi. Xpect: A framework for electronic commerce. IEEE Internet Computing, 1(4):40–48, 1997.
[5] J-M. Andreoli, D. Pagani, F. Pacull, and R. Pareschi. Multiparty negotiation for dynamic distributed object services. Journal of Science of Computer Programming, 31(2–3):179–203, 1998.
[6] R. Balter, L. Bellissard, F. Boyer, M. Riveill, and J-Y. Vion-Dury. Architecturing and configuring distributed applications with Olan. In Proc. of IFIP Int. Conf. on Distributed Systems Platforms and Open Distributed Processing (Middleware'98), The Lake District, U.K., 1998.
[7] N. Carriero and D. Gelernter. Linda in context. Communications of the ACM, 32(9):444–458, 1989.
[8] T. Finin, Y. Labrou, and J. Mayfield. KQML as an agent communication language. In J. Bradshaw, editor, Software Agents. MIT Press, Cambridge, MA, U.S.A., 1997.
[9] S. Frølund. Coordinating Distributed Objects: An Actor-Based Approach to Synchronization. MIT Press, 1996.
[10] T. Howes and M. Smith. LDAP: Programming Directory-Enabled Applications with Lightweight Directory Access Protocol. Macmillan Technical Publishing, 1997.
[11] N. Jamali, P. Thati, and G. Agha. An actor-based architecture for customizing and controlling agent ensembles. IEEE Intelligent Systems, 14(2), 1998.
[12] JINI. http://www.sun.com/jini/.
[13] S. McConnell. Negotiation Facility (Final Revised Submission). osm.net, Mar. 1999.
[14] Microsoft Corporation. Distributed Component Object Model Protocol DCOM/1.0, draft, Nov. 1996.
[15] OMG/CORBA. http://www.corba.org.
[16] G.A. Papadopoulos and F. Arbab. Coordination models and languages. Advances in Computers, 46, 1998.
[17] R.G. Smith. The contract net protocol: High level communication and control in a distributed problem solver. IEEE Transactions on Computers, 29(12):1104–1113, 1980.
[18] Sun Microsystems. Jini Distributed Leasing Specification, Jan. 1999.
[19] Sun Microsystems. Jini Transaction Specification, Jan. 1999.
[20] J.-Y. Vion-Dury and F. Pacull. A structured workspace for a visual configuration language. In Proc. of the IEEE Symposium on Visual Languages (VL'97), Capri, Italy, 1997.
[21] E. J. Whitehead and M. Wiggins. WebDAV: IETF standard for collaborative authoring on the Web. IEEE Internet Computing, 2(5):34–39, 1998.
[22] D. Yellin and R. Strom. Protocol specifications and component adaptors. ACM Transactions on Programming Languages and Systems, 19(2):292–333, 1997.