Addressing Scalability Issues Using the CLF Middleware

Damián Arregui, François Pacull and Jutta Willamowski
Xerox Research Centre Europe
6, chemin de Maupertuis, 38240 Meylan, France
{Damian.Arregui, Francois.Pacull, [email protected]

Abstract

This article illustrates how to easily address scalability issues within distributed applications built on top of the Coordination Language Facility (CLF) middleware. CLF provides a distributed object computing framework with high expressiveness, flexibility and dynamicity. Our solutions rely on a core set of techniques such as replication, caching and distribution. We apply them along a geographical, a numerical, and an administrative dimension. We illustrate our approach with examples taken from Yaka, an application in the field of knowledge management and document awareness. In the context of this application we show in detail how the coordination model of the CLF and its associated scripting language enable straightforward solutions to various scalability issues.

1 Introduction

Scale, and hence scalability, are overused terms in the field of distributed systems. As a preliminary step, let us try to better understand their meaning. Scale can be characterized along three dimensions [7]:

- geographical: how widely the components of the system are distributed across the nodes of a network;
- numerical: how many users, resources and components belong to the system;
- administrative: how many different organizations host system components.

A system is said to be scalable if it can transparently handle scale in all three dimensions without drastically losing performance or increasing complexity. To achieve this property, a system must solve issues related to reliability, system load and administration. A reliable system should continue to work (at least in a degraded mode) even if some of its components are unreachable or unavailable. The wider a system is distributed,

the more it will depend on the underlying network infrastructure. Similarly, the more components the system has, the higher the chances that some of them will be faulty. Even the failure of a simple administrative task performed on a component might alter the system behavior in an unacceptable way.

A scalable system must be designed to support a growing amount of data, users and processing requests. It must balance the load across its components; otherwise a particular component may become overloaded, and thus unavailable, and compromise the functionality of the whole system.

Administration facilities are mandatory for applications that have to be deployed over several organizations. In order to reduce the overall complexity of a system, on the one hand the administrative burden must be distributed over various local system administrators managing the corresponding local components of the system. On the other hand, the global administrator, responsible for the system as a whole, needs an aggregated view over all components. The local and global administrators thus need appropriate tools for application deployment, monitoring, and administration.

In this article, we claim that the original combination of features in the CLF middleware is particularly appropriate to address the above-mentioned issues. It makes it easy to put in place classical techniques such as replication, caching and distribution [7]. We illustrate this with examples taken from Yaka, a CLF-based application in the field of knowledge management and document awareness. Section 2 describes the Yaka application. Section 3 presents the CLF key features relevant for achieving scalability. Sections 4, 5 and 6 show in more detail how we take advantage of these features to address geographical, numerical, and administrative scalability within Yaka. Section 7 discusses related work. Finally, section 8 concludes and describes some future work.

2 A Sample Application: Yaka

We have chosen Yaka, an application in the field of knowledge management and document awareness, to illustrate scalability problems and the proposed solutions through practical examples.

2.1 Functional Overview

Yaka addresses the problem of document awareness across a worldwide virtual enterprise. Xerox, our current test bed, has all the characteristics of such a setting: a large number of users and information sources hosted by independent organizations and widely distributed across the Internet. The challenge is to keep interested users aware of new documents published at selected information sources. In the following we describe the main aspects of Yaka's functionality. Figure 1 gives an overview of Yaka (for a more detailed description see [4]).

Subject Definition. Information Source components encapsulate different file systems, Web services, document management systems, or any other entity where documents are published. Properties of the contained documents (their parent collection or folder, type, keywords, etc.) make it possible to define partial views on these information sources. Yaka aggregates a set of such views to define subjects managed within the Subject Directory component. It then monitors the corresponding information sources in order to detect new documents.

Notification. Users may subscribe to one or more subjects in order to be notified whenever new documents are published. However, unlike other similar systems, Yaka provides additional meta-information bound to the notification message, which helps the user determine the relevance of the document. This meta-information is not only based on what is directly provided by the Information Source but is also computed remotely on external services; this is the case for the document summary and keywords. Yaka uses the Mailer component to send the corresponding e-mail notification to the users.

Delivery. Each notification e-mail contains a control panel that allows the user to interact with Yaka to request the full content of a document. For document delivery the user can choose among all locally available media, e.g. e-mail, fax, local file systems and printers. Delivery Medium components uniformly encapsulate all these devices.

Document Processing and Transformation. To obtain document meta-information, or whenever needed for document delivery, Yaka performs document processing and transformation. It uses linguistic and transformation services from Document Service Center components. For example, an MS-Word document is transformed into ASCII format before summarization 1.

Personalization. Various aspects of the application behavior can be customized on a per-user basis. For each subject, the user defines personal notification and delivery preferences, including e-mail address, fax number and printers. The user can also select an automatic delivery scheme, where Yaka directly delivers every document published for the particular subject through a preselected medium, skipping the notification phase, e.g. copying the document directly into a local directory on his laptop. The Profile Directory component manages all user-related information and the preferences for the different subscribed subjects.

Document Archive. The notification and delivery scheme described above implements a push model. Yaka also offers features implementing a pull model: through the Web interface of the Profile Directory component, users can access all the documents already processed, along with the available meta-information. From there they can also request the delivery of any archived document.

Administration. Through the Web interface of the Configuration Manager component, the system administrators can dynamically control and update the available set of information sources and delivery services. To the application, the Configuration Manager provides Yellow Pages services.

2.2 Deployment Constraints

To deploy Yaka across Xerox we had to cope with many constraints coming from the existing infrastructure. First, the various Information Source components must often be hosted at the sites where the information source itself is located. This concerns most of the corporate internal information sources (document management systems, databases or repositories), which are distributed worldwide. They tend to have no networked API (e.g. file systems, ad-hoc databases) and thus require local encapsulators. Even worse, they may be hidden behind organizational firewalls and thus require a virtual access channel. The external information sources (public Web sites, content provider portals, digital libraries, and search engines) impose in general fewer constraints. Some are easily accessible through

1 The summarizer service does not handle MS-Word format.

[Figure 1 (diagram): (a) the functional viewpoint, showing the Yaka-specific components (Configuration Manager, Subject Directory, Profile Directory, Private Directory, Coordinator, Mailer, Log Manager, Printer Manager, Document Service Centers and Information Sources) and the flows between them and the user (subscription, notification, profile information, document transfer and transformation, printing, e-mail delivery); (b) a deployment example spanning XRCE Grenoble, XRCE Cambridge, Rochester, PARC (documents.com), the Corporate LDAP, and local printers, databases and document management systems. The legend distinguishes dataflow in the coordination, Yaka-specific components, CLF/Mekano library objects, legacy applications, administration links (launch/stop of information sources or delivery media), local area networks, and interaction with the user.]
Figure 1. (a) Functional viewpoint; (b) deployment example.

the Web; others understand only particular protocols (e.g. Z39.50 for digital libraries or SQL over TCP for databases). Second, the users are spread across the various research centers and business groups. As some of the delivery media rely on the local infrastructure (e.g. printers, local file systems), the system also has to respect their localization constraints. Last, most of the document service centers (e.g. format transformation, summarization) have to run on powerful dedicated servers. This induces other constraints. For instance, we make use of documents.com, a server located at the Xerox Palo Alto Research Center (PARC) that provides facilities for most document transformations. This server is accessible through an HTTP/HTML interface. Some other components have no particular localization constraints and can be hosted anywhere. This concerns in particular the application-specific components and the coordinator.

3 The CLF Middleware

Yaka is built on top of the Coordination Language Facility (CLF), a platform developed at XRCE Grenoble. CLF is a middleware aimed at coordinating distributed active software components over a Wide Area Network (typically the Internet). It makes it easy to combine legacy applications with pure CLF components in sophisticated coordination schemes such as negotiation and distributed transactions [3]. In this article we will not describe the CLF in detail but just recall the basic information required to make this document self-contained. More detailed information can be found in [1]. CLF consists of two parts: the underlying object model and protocol, and the coordination scripting facility. We describe each in turn and then illustrate the CLF scripting facility through an example.

3.1 Main Features

Object Model and Protocol. The CLF object model enriches other common models [8, 2, 6] of software objects by viewing objects as resource managers, thus separating the internal object state (the resources themselves) from their management state. Primitives to interact with objects are introduced to (i) inquire about and negotiate object capabilities in terms of resource availability, (ii) perform basic transaction operations over the resources of several objects (two-phase commit), and (iii) request new resources to be inserted. This enriched interaction model is characterized by a set of 8 interaction verbs together with a protocol describing the correct sequences of invocations of these verbs and their intended meaning in terms of resource manipulations.

The interface of a CLF object distinguishes between “CLF services”, accessed through the CLF interaction protocol, and “regular methods”, accessed through a traditional request/answer protocol (of type RPC, Remote Procedure Call). The former is used by the coordinator when it is required to make atomic manipulations of resources belonging to different CLF objects. The latter is mainly used for Web-based GUIs that trigger a manipulation involving resources from a single object and render the HTML page generated as return value.

Scripting Facility. The CLF coordination scripting facility allows high-level declarative specifications of coordinated invocations of CLF object services. A coordination is viewed here as a complex block of inter-related manipulations (removal, insertion, etc.) of the resources held by a set of objects. Using rules, CLF scripts describe the expected global behavior of such blocks in terms of resulting resource transformations, but abstract away from the detailed sequencing of invocations of the corresponding CLF interaction verbs required to achieve such a behavior. It is this abstraction feature which considerably simplifies the design and verification of coordination scripts and makes them highly platform independent (i.e. portable), very compact and readable (i.e. easy to maintain).

In a CLF application, coordination scripts are enacted by CLF objects called coordinators. Like all CLF objects, coordinators manage resources, which here are the CLF coordination scripts themselves and the rules they contain. When a script is inserted into a coordinator, it is immediately interpreted. This makes it possible, for instance, to generate CLF rules on the fly and to enact them; this feature is used for dynamic load balancing as described in section 5.3.
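To make the resource-manager view concrete, here is a minimal Python sketch. It is not the actual CLF implementation, and the verb names (inquire, reserve, confirm, cancel, insert) are our own illustrative stand-ins for the interaction verbs the protocol defines; the point is only to show the two-phase manipulation of resources.

```python
# Illustrative sketch of a CLF-style object viewed as a resource manager.
# NOT the real CLF API: verb names are hypothetical stand-ins.

import itertools

class ResourceManager:
    def __init__(self):
        self._resources = {}            # resource id -> value (internal state)
        self._reserved = set()          # ids locked by an ongoing transaction
        self._next_id = itertools.count()

    def insert(self, value):
        """Request a new resource to be inserted; returns its identifier."""
        rid = next(self._next_id)
        self._resources[rid] = value
        return rid

    def inquire(self, predicate):
        """Negotiate capabilities: list available resources matching predicate."""
        return [rid for rid, value in self._resources.items()
                if rid not in self._reserved and predicate(value)]

    def reserve(self, rid):
        """Phase one of the two-phase commit: lock a resource, if still free."""
        if rid in self._resources and rid not in self._reserved:
            self._reserved.add(rid)
            return True
        return False

    def confirm(self, rid):
        """Phase two: consume the reserved resource for good."""
        self._reserved.discard(rid)
        return self._resources.pop(rid)

    def cancel(self, rid):
        """Abort: release the reservation, keep the resource available."""
        self._reserved.discard(rid)
```

A coordinator would inquire several such managers, reserve one offer per token, and then either confirm them all or cancel them all; this is what gives CLF rule enactment its transactional character.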

3.2 Processing a Sample Script

Script. Before going into details, let us stress that the presented rule is part of the summarization script of the Yaka application. It manages the summarization of small documents: the document content itself is taken as the summary.

rules:
status(infosourceId,docId,’toSummarize’) @
accessRight(infosourceId,docId,’paid’) @
‘docInfo(infosourceId,’Size’,docId,docSize) @
smallerThan(docSize,’512’) @
‘docInfo(infosourceId,’Content’,docId,content)
-
newDocStatus(infosourceId,docId,’summarized’) @
summary(infosourceId,docId,content)
end

Let us comment on this script in more detail. The interfaces section (shown below) defines the mapping of the tokens used in the rules to actual CLF objects and services. The tokens status, accessRight, newDocStatus and summary refer to CLF services declared within the CLF objects ConfigurationManager and SubjectDirectory. The LOOKUP directive statically defines the logical name of the corresponding service at compilation time. The token docInfo, on the contrary, is not statically bound to a given object and service, but dynamically linked to a set of objects and services. Indeed, the DISPATCH directive considers the first two arguments, known only at run time, respectively as the object and the service name; the remaining arguments are normal parameters for the corresponding service. Tokens can also refer to local processing, e.g. simple computations (COMPUTE, not used here) or verification of constraints (ASSERT), e.g. smallerThan. For each token, the parameters occur between parentheses; the output parameters are underlined. In the rest of the document, we use simplified CLF rules for which we do not provide the corresponding interfaces section. We will always underline the output parameters of the services.

Processing. When this script is inserted (as a resource) in a coordinator, the coordinator processes it as follows:

interfaces:
docStatus(infosourceId,docId,status):
  -> infosourceId,docId,status
  is LOOKUP ConfigurationManager.Status
accessRight(infosourceId,docId,right):
  infosourceId,docId,right ->
  is LOOKUP ConfigurationManager.AccessRight
docInfo(obj,srv,docId,value):
  obj,srv,docId -> value
  is DISPATCH
smallerThan(size1,size2):
  size1,size2 ->
  is ASSERT int size1, int size2
     size1 < size2
newDocStatus(infosourceId,docId,status):
  infosourceId,docId,status ->
  is LOOKUP ConfigurationManager.Status
summary(infosourceId,docId,summary):
  infosourceId,docId,summary ->
  is LOOKUP SubjectDirectory.Summary

1. It first tries to find resources satisfying the tokens on the left-hand side of the rule. The token status returns resources corresponding to documents to be summarized. The parameter ’toSummarize’ given as input makes it possible to identify, for each returned resource, a pair <infosourceId>, <docId> corresponding to a document to summarize and the information source containing it. Such resources are inserted by other Yaka rules for each newly detected document. The token accessRight verifies that a resource granting access to the document is available (e.g. in case a fee has to be paid for accessing the document). The token docInfo is dynamically bound to the specified information source and services: the information

source is provided by the status token, and the services ’Size’ and ’Content’ (later in the rule) are constants 2. DocInfo returns, respectively, the size and the content of the document. The token smallerThan implements an ASSERT which verifies that the document is smaller than 512 characters.

2. The coordinator then fully instantiates the rule for each set of consistent values infosourceId, docId, docSize and content satisfying the tokens specified on its left-hand side. It transactionally extracts and consumes the resources corresponding to the status and accessRight tokens. The back-quote (‘) before the docInfo tokens prevents the corresponding resources from being consumed.

3. The coordinator finally inserts resources corresponding to the newDocStatus and summary tokens specified on the right-hand side of the rule.
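The three enactment steps above can be sketched in Python as follows. This is an illustrative single-threaded model with hypothetical data structures, not the real coordinator engine: resource pools are plain lists, left-hand-side tokens carry a predicate (which may refer to values already matched) and a consume flag modeling the back-quote, and the right-hand side is a list of builders.

```python
# Minimal sketch of rule enactment: match LHS tokens, consume the
# non-back-quoted resources, insert the RHS resources. Hypothetical model.

def enact_once(pools, lhs, rhs):
    """pools: token name -> list of resources.
    lhs: list of (token, predicate(match, value), consume?) triples;
    rhs: list of (token, builder(match)) pairs."""
    match = {}
    # Step 1: find one resource per token, consistent with earlier matches.
    for token, predicate, consume in lhs:
        found = [v for v in pools.get(token, []) if predicate(match, v)]
        if not found:
            return False               # no consistent instantiation: stay pending
        match[token] = found[0]
    # Step 2: commit, consuming only the non-back-quoted tokens.
    for token, predicate, consume in lhs:
        if consume:
            pools[token].remove(match[token])
    # Step 3: insert the right-hand-side resources.
    for token, build in rhs:
        pools.setdefault(token, []).append(build(match))
    return True
```

Run against pools mimicking the summarization rule, the status resource is consumed, the back-quoted docInfo resource survives, and a summary resource appears.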

3.3 Using CLF Features for Scalability

Classical approaches to scalability implement replication, caching and distribution. Using the CLF, this can be done easily and flexibly. The application components can be dynamically deployed on the different sites involved. They can be migrated from one site to another if necessary, or replicated on several sites. Local data can be assigned to local components to improve locality. Yellow Pages services can return different instances of a service depending on the requester's location.

The scripts handling the component interaction can be dynamically modified. They can be distributed to different sites' coordinators to improve locality, or replicated on several coordinators to improve both the global performance and the reliability of the system. This is possible because the transactional properties ensure that all coordinators holding a rule will try to perform the corresponding actions but that only one will eventually succeed. Indeed, the first completed transaction will consume the required resources, preventing the competing transactions from succeeding. Another useful feature is the DISPATCH directive, which allows dynamic name resolution at run time, performed by the coordinator when executing a script.

We used the various CLF features described above to make Yaka scalable. In the following sections we present the solutions we adopted and describe in detail how they have been implemented on top of the CLF.

2 Here the service names are constants, but they can also be variables.
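The first-completed-transaction-wins behavior that makes rule replication safe can be modeled outside CLF as follows. This is an illustrative Python race, not CLF code: several "coordinators" compete for the same resource, and an atomic consume operation guarantees exactly one succeeds.

```python
# Illustrative model: replicated rules racing for one resource; the atomic
# consume lets exactly one competing transaction succeed.

import threading

class ResourcePool:
    """A shared pool of resources with an atomic consume operation."""

    def __init__(self, items):
        self._items = set(items)
        self._lock = threading.Lock()

    def try_consume(self, item):
        """Atomically remove item; returns True for exactly one caller."""
        with self._lock:
            if item in self._items:
                self._items.remove(item)
                return True
            return False

def run_competing_coordinators(pool, item, n):
    """Launch n 'coordinators' racing for the same resource; count winners."""
    wins = []

    def coordinator():
        if pool.try_consume(item):
            wins.append(1)   # only the first completed transaction gets here

    threads = [threading.Thread(target=coordinator) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(wins)
```

Whatever the thread interleaving, exactly one coordinator wins; the others fail at the consume step, mirroring the CLF guarantee that replicated rules produce the effect once.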

4 Geographic Scalability

As described in section 2, Yaka is geographically distributed (in a network sense). This induces potential communication problems such as network failures, congestion and low bandwidth. Ideally, a geographically scalable system has the following three characteristics: maximal locality, fault tolerance, and graceful degradation. In the following we describe them in more detail and show how we achieve them with CLF.

4.1 Maximal Locality

To optimize the service, minimize the network load and reduce the dependency on external hardware components (links, hosts), it is common sense that processing should be done locally whenever possible. For instance, a document stored in an information source within the XRCE Grenoble lab that should be delivered through a printer in the same lab should not travel to PARC to be transformed into Postscript if a local transformer is available.

Yaka is distributed over several sites, and each site hosts various instances of Information Source and Delivery Medium components, possibly a Document Service Center, and users. Some components have to be deployed at the site of their physical location, according to the constraints described in section 2.2. Some application-specific components were initially deployed centrally for the whole application (see figure 1-(b)). For instance, all users went through the same central Profile Directory to access the main Yaka GUI. Similarly, a single central Coordinator enacted all application processes over the different sites. In some cases, documents were shipped over a transatlantic Internet link several times before delivery, even if the user and the information source providing the document were co-located. This raised problems when the network went down and when huge documents had to be processed. Indeed, network failures or low bandwidth between the different sites then rapidly inhibited or degraded the global functionality of the application.

To mitigate these problems, we have to minimize the dependencies on the underlying network infrastructure: we need to maximize locality, i.e. to bring data, services and process enactment as close together as possible, ideally on the same site (i.e. LAN). As we have already seen, this may or may not be possible depending on the existing distribution constraints. Our solution has two main aspects.
First, we replicate the application-specific components that have no location constraints on the different sites. This concerns in particular the Configuration Manager and the Profile Directory. We also create Configuration Manager objects local to each site. Their role is to manage information about the locally available Information Source, Printer and Document Service Center components.

Second, we launch one coordinator per site and modify the scripts accordingly. Scripts that involve only components from one site are enacted by a local coordinator. Scripts that involve components from different sites are either distributed to one of the corresponding coordinators or replicated on several of them. As described in section 3, the final result will be the same.

Figure 2 shows in more detail the distribution of the various components across two sites. The first site, XRCE Grenoble, has a local Document Service Center, while the second site, XRCE Cambridge, does not. Thus, when a user from the XRCE Grenoble site requests a document from a local information source, all required processing can be done locally. For the XRCE Cambridge site, remote processing is necessary.


Figure 2. New Yaka architecture using replication to enhance local document processing.

Let us consider as an example the delivery of a document through a printer. The initial single rule, not considering locality, was as follows:

printReq(infoSourceId,docId,printerId) @
‘content(infoSourceId,’Content’,docId,content) @
toPs(content,postscript)
-
printer(printerId,’Print’,postscript)

The printReq token represents the print request to be satisfied, specifying the document to process, the information source containing it, and the target printer. As soon as a user request generates such a resource, the coordinator triggers this rule, fetching the document content, transforming it into Postscript and finally sending it to the printer.

To bring locality into the picture, we replace this rule with two new rules using the site service of the Configuration Manager objects. This service provides the names of the sites hosting the Information Sources, Printers and Document Service Centers. A local Configuration Manager contains only information about the components local to a site, while the global one contains information about all components within the application. The two new rules implement the two cases described above: one where processing can be done locally, and another where remote processing is necessary.

The “local” rule covers the case where the Printer and the Information Source are co-located, and where a local Document Service Center is available. This rule is replicated and enacted on the site-specific coordinators.

printReq(infoSourceId,docId,printerId) @
‘site(infoSourceId,site1) @
‘site(printerId,site2) @
sameSite(site1,site2) @
‘localYellowPages(’toPs’,obj,srv) @
‘content(infoSourceId,’Content’,docId,content) @
toPs(obj,srv,content,postscript)
-
printer(printerId,’Print’,postscript)

The combination of the site and sameSite tokens verifies that the concerned Information Source and Printer are co-located. The sameSite token is implemented using the ASSERT directive (see section 3). The localYellowPages token (mapped to the local Configuration Manager) returns a local Postscript transformation service if available. This rule will only be enacted if the locality condition is fulfilled and if a local transformation service is found.

The “global” rule covers all other cases of distribution of the Information Source, the Printer and the Document Service Center across the application sites 3. It will be enacted by the central coordinator.

printReq(infoSourceId,docId,printerId) @
‘site(infoSourceId,site1) @
‘site(printerId,site2) @
‘globalYellowPages(’toPs’,site2,site3,obj,srv) @
notSameSite(site1,site2,site3) @
‘content(infoSourceId,’Content’,docId,content) @
toPs(obj,srv,content,postscript)
-
printer(printerId,’Print’,postscript)

3 Either the information source and the printer are on different sites or, if they are co-located, no local transformation service is available.

Here the globalYellowPages token is mapped to a service of the global Configuration Manager. It takes as input the required service and the optimal site on which (or at least close to which) the service should be executed. It simultaneously returns the name of an object providing the required service (obj) and the name of the site where this object is located (site3). Then, the notSameSite token verifies that we are not handling a “local” case that should be handled by the previous rule. Through the conditions we introduced in the two rules, we made sure that only one of them is applicable to each print request. Now we process document transformations locally whenever possible; otherwise we do it as close as possible to the printer site.
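The lookup policy encoded by the two rules can be sketched as an ordinary function. This is a hypothetical Python illustration (the registry shapes and site names are our own, not the CLF Yellow Pages API): prefer a provider registered in the local Configuration Manager, else fall back to the global one, choosing an instance as close as possible to the target site.

```python
# Locality-aware service lookup: local registry first, then global registry
# with a preference for providers co-located with the target site.

def lookup(service, target_site, local_registry, global_registry):
    """local_registry: site -> {service: provider}.
    global_registry: service -> [(site, provider), ...].
    Returns (site, provider) or None."""
    # "Local" rule: a provider registered on the target site itself, if any.
    provider = local_registry.get(target_site, {}).get(service)
    if provider is not None:
        return target_site, provider
    # "Global" rule: any provider, preferring one on the target site.
    candidates = list(global_registry.get(service, []))
    if not candidates:
        return None
    candidates.sort(key=lambda sp: sp[0] != target_site)  # target site first
    return candidates[0]
```

As with the two CLF rules, at most one branch applies to each request, and remote processing happens only when no local transformer exists.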

4.2 Fault Tolerance

When dealing with large-scale distributed applications such as Yaka, one of the main problems is service unavailability: a network congestion, a server crash, an access policy change or any other circumstance can provoke a major failure of the application as a whole. Usually, some well-identified services are critical for the core functionality of an application. In order to ensure their availability, we have to rely on fault tolerance techniques.


Figure 3. New Yaka architecture using active replication to deal with fault tolerance.

In the frame of Yaka, consider for example a user requesting a printed copy of a recently discovered document. The content of this document first has to be converted into a format understandable by the printer, let us say Postscript. The system has to react as fast as possible to the user's request. The service responsible for the format conversion is thus critical.

The solution we propose is based on active replication [11, 9]. We simultaneously address several instances (or replicas) of the same service and use the first result produced. By performing redundant service calls we maximize the chances of getting a correct result in time. In this particular case, the invoked service is stateless. Thus we have no problem complying with an important requirement of active replication, the coherence of the set of replicas after processing the request. Figure 3 shows the resulting Yaka architecture. Our solution is implemented through the following CLF rule:

toPsTicket(infoSourceId,docId) @
‘content(infoSourceId,’Content’,docId,content) @
‘yellowPages(’toPs’,obj,srv) @
toPs(obj,srv,content,postscript)
-
storePS(infoSourceId,docId,postscript)

When a ticket resource becomes available through the toPsTicket token, the document content is fetched, and the yellowPages service looks up all available converter replicas. For each of them the rule is instantiated. The toPs token is dynamically bound to the corresponding object and service (see the DISPATCH mechanism described in section 3). Then the document is converted to Postscript, at least by the subset of services that are up and running. This will lead to one or more identical Postscript versions of the document. But the ticket ensures that only a single conversion service execution is taken into account and stored through the storePS service (to be finally sent to the printer).

Indeed, it is essential to understand that in a first phase the generated rule instances try to obtain an offer for each token on the left-hand side of the rule. Here this concerns in particular the ticket and the Postscript. Once a complete solution is possible, i.e. a first Postscript version of the document is available, the coordinator tries to reserve all resources listed on the left-hand side of the rule, in particular the one returned by the toPsTicket token. If several rule instances compete at this level, only one will succeed. This single instance is enacted, and the resulting Postscript file is stored before being sent to the printer as requested by the user. The coordinator then garbage-collects the remaining rule instances, preventing not-yet-started transformation processes from running, and possibly interrupting still ongoing ones. The built-in transactional rule enactment of the CLF framework comes in particularly handy in this context.
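The active replication pattern above, invoke every replica concurrently and keep the first result, can be sketched as follows. This is an illustrative Python version, not CLF code; replicas are plain callables, and the "ticket" is modeled by returning as soon as the first successful result completes, after which the remaining results are simply discarded.

```python
# First-result-wins active replication over a set of stateless replicas.

from concurrent.futures import ThreadPoolExecutor, as_completed

def first_result(replicas, document):
    """replicas: list of callables taking a document.
    Returns the first successful result; raises if every replica fails."""
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(replica, document) for replica in replicas]
        for fut in as_completed(futures):
            if fut.exception() is None:
                # The first completed conversion "consumes the ticket";
                # later results are discarded.
                return fut.result()
    raise RuntimeError("all replicas failed")
```

Redundant calls cost bandwidth and processing power, but as in Yaka they maximize the chance of a timely correct result even when some replicas are down.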

4.3 Graceful Degradation

The solution described in the previous section replicates the services and then queries them redundantly. This is a quite expensive approach, both in terms of bandwidth and processing power, and it is thus reserved for only a few key services. Graceful degradation makes it possible to handle the failure of other services that are less essential to the overall functionality of the application. For these services it is acceptable to temporarily lower the quality of the results obtained, or even to work without them. In other words, the application can still work properly without some components, or can replace them by other ones providing a degraded Quality of Service (QoS).

Let us consider the notification aspect of Yaka. Each new document is summarized. Yaka sends the summary of the document to the subscribed users along with further meta-information. The summary helps the user decide whether the whole document is of interest to her. But even without a summary she can make a decision relying on the remaining meta-information provided (e.g. title, author, size, etc.). In any case, for Yaka it is crucial to provide document notification, not document summarization. Thus, in case of temporary unavailability of the summarizer, we can decide either to use a lower-quality backup summarizer (e.g. the first ten lines of the document itself if it is encoded in a human-readable format) or even to notify the user without a summary. This is clearly acceptable for the end user as long as the notification e-mail is delivered in time.

The following set of competing CLF rules implements a solution that successively replaces the faulty service by a lower-quality one 4:

summTicket(infoSourceId,docId,obj,srv) @
‘content(infoSourceId,’Content’,docId,content) @
summarize(obj,srv,content,summary)
-
storeSummary(infoSourceId,docId,summary)

summTicket(infoSourceId,docId,obj,srv) @
wait(’600’) @
‘summPolicy(obj,srv,nextObj,nextSrv)
-
summTicket(infoSourceId,docId,nextObj,nextSrv)

The proposed mechanism is again based on tickets (see section 4.2). Each ticket returned by the summTicket token represents the use of a specific summarization service srv, provided by an object obj, to summarize the document identified by docId from the information source infoSourceId. This ticket is generated by the application whenever a document has to be summarized, and consumed once this is done. The first rule takes a ticket from the summTicket token, fetches the document content and performs the summarization. The summarize token is dynamically bound to a particular summarizer service using the obj and srv information contained in the ticket (see the DISPATCH mechanism described in section 3). If this summarizer service is available, it summarizes the document content. The

corresponding rule instance is enacted, the ticket resource consumed, and the summary inserted in the storeSummary token. If the summarizer service turns out to be unavailable, the coordinator keeps the rule instance in its rule pool and tries to enact it again later. The second rule associates a wait token with every ticket. It delays the enactment of the rule instance for the specified amount of time (e.g. 600 seconds). If a ticket resource is still present after this delay, we consider that the specified summarization has not been performed and that the referenced summarizer service is probably faulty. The rule instance is then enacted, the ticket resource consumed, and a new ticket resource generated, specifying a different summarizer instance (object nextObj and service nextSrv). This new ticket is in turn available to both rules. The described solution may be extended in several ways. We can for instance customize the policy service to take contextual information into account when selecting the replacement service, such as the bandwidth available to access a summarization service, the number of users to notify, etc. Another extension would be to add further rules similar to the second one, replacing the wait service by more sophisticated failure detection mechanisms.
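The ticket-and-timeout fallback encoded by these two rules can be sketched in ordinary Python. Everything here is our own illustration, not part of CLF: the function name, the candidate list standing in for the policy service, and the default timeout mirroring the wait(’600’) token are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def summarize_with_fallback(content, summarizers, timeout=600.0):
    """Try each summarizer in turn, degrading on failure or timeout.

    `summarizers` is ordered from preferred to least preferred, like the
    chain produced by the summPolicy token; the last entry should never
    fail (e.g. a dummy summarizer returning an empty string).
    """
    for summarize in summarizers:
        # one single-worker pool per attempt, so a hung service does not
        # block the next, lower-quality candidate
        pool = ThreadPoolExecutor(max_workers=1)
        future = pool.submit(summarize, content)
        try:
            return future.result(timeout=timeout)
        except Exception:
            continue  # service faulty or too slow: fall back
        finally:
            pool.shutdown(wait=False)
    return ""  # no summarizer left: notify the user without a summary

def first_lines(content, n=10):
    # degraded backup summarizer: first n lines of a readable document
    return "\n".join(content.splitlines()[:n])

def dummy(content):
    # least-quality service: empty summary
    return ""
```

Unlike the CLF rules, which keep competing on the shared ticket resource, this sequential sketch hard-codes the fallback order; it only illustrates the degradation policy itself.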

5 Numerical Scalability

Numerical scalability deals with three physical dimensions that may vary during the life of an application: the number of components implementing the system, the volume of data processed by the system, and the number of users. In Yaka the first dimension depends essentially on the number of components representing the underlying infrastructure in terms of information sources and delivery devices. The issues related to their distribution across several administrative sites will be discussed in section 6. Concerning the second and third dimensions, the volume of data processed depends on the number of users and documents managed. Both numbers grow dynamically during the lifetime of the application. In push mode, this increases the number of notification/delivery cycles generated. In pull mode, it increases the number of user requests for consulting the archives and the amount of processing necessary for generating the responses. We discuss these problems in this section. The solutions we propose are data replication, data distribution, and dynamic load-balancing.

5.1 Data Replication

A classical technique for scalability is replication. It increases the availability of the concerned services or data. We have used service replication in section 4.2 to achieve fault tolerance. Here we use component, and essentially

data replication to reduce the load on heavily used components. The problem to face with data replication is data consistency across the replicas. This is an issue when the replicated data evolve and change over time. But if heavily used data are stable, data replication can be easily implemented. This is the case within Yaka: some document-centered information is very stable and yet heavily accessed. For instance, once computed, the document summary, status and printable version do not change. However, these data are frequently accessed. The Subject Directory component storing them is involved in pull mode each time a user searches the document archive. For each search request, the Subject Directory generates a corresponding HTML page on the fly. Therefore, a growing number of users and documents directly impacts this component: a growing number of users increases the number of requests it has to manage, and a growing number of documents increases the processing load per request.
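This kind of replication of stable, write-once data can be made concrete with a minimal Python sketch: a primary directory pushes data to per-site secondary replicas, each of which serves local reads from an LRU-bounded store. The classes and method names are our own invention for illustration, not CLF interfaces.

```python
from collections import OrderedDict

class PrimaryDirectory:
    """Authoritative copy of stable, write-once document data."""
    def __init__(self):
        self._data = {}
        self._secondaries = []  # replicas registered, e.g. via Yellow Pages

    def register(self, secondary):
        self._secondaries.append(secondary)

    def store(self, doc_id, value):
        # stable data is written once and then propagated to every
        # replica, which is what makes consistency trivial here
        self._data[doc_id] = value
        for s in self._secondaries:
            s.push(doc_id, value)

    def lookup(self, doc_id):
        return self._data[doc_id]

class SecondaryDirectory:
    """Serves local user requests from an LRU-bounded local store."""
    def __init__(self, primary, capacity=1000):
        self._primary = primary
        self._cache = OrderedDict()
        self._capacity = capacity

    def push(self, doc_id, value):
        self._cache[doc_id] = value
        self._cache.move_to_end(doc_id)
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)   # evict least recently used

    def lookup(self, doc_id):
        if doc_id in self._cache:
            self._cache.move_to_end(doc_id)   # refresh LRU position
            return self._cache[doc_id]
        value = self._primary.lookup(doc_id)  # miss: fall back to primary
        self.push(doc_id, value)
        return value
```

The fallback path in `SecondaryDirectory.lookup` corresponds to finding the printable version in the central primary when it is absent from the local replica, and the `capacity` bound stands in for the cache policy mentioned in the text.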

As shown in figure 4, we use passive replication of the Subject Directory component, i.e. the primary-backup technique [5, 9], to solve this problem. We consider one of the Subject Directory instances as the primary. It interacts with the other components of the application as if it were the unique instance. In addition, we have populated our system with several secondary instances located at the different user sites. The users now access and search these local instances.

[Figure 4. Passive replication of data: a primary Subject Directory and, at the sites XRCE Grenoble, XRCE Cambridge, Rochester and PARC, secondary Subject Directory and Profile Directory replicas holding document meta-data; a Coordinator and the Configuration Manager also appear.]

The secondary replicas do not interact directly with any component but the primary replica. For responding to user requests they rely only on their local data. They all offer the same interface to the users and register themselves at a Yellow Pages object. A single CLF rule propagates the data stored in the primary to the secondaries:

‘yellowPages(’SubjectDirectorySecondary’,obj) @
‘subjectDirPrimaryStableData(data,...)
- subjectDirSecondary(obj,’StableData’,data,...)

In this way, when a user requests a printed copy of an already transformed document, the system can find the printable version either in the local secondary replica or in the central primary replica of the Subject Directory. The amount of cached data may be managed through any classical cache policy (e.g. LRU).

5.2 Data Distribution

In section 5.1 all the components manage the same set of data, and the load on each component is reduced by distributing user requests over identical replicated components. Here, we instead consider reducing the number of requests to each component by distributing the managed data among several instances of the same component. For instance, the Profile Directory component stores information related to the users. We can replicate it on the different sites and distribute the local user profiles among the replicas. If we also replicate the Mailer component responsible for sending e-mail notifications, we can associate a customized CLF rule to each pair (profile directory, mailer) in order to alleviate the load on each Mailer. All the rules are similar, but they are enacted by the local coordinators with different components and hence different data.

5.3 Dynamic Load-Balancing

The solutions described in the previous sections are essentially static. They require replicating and distributing data and components at design or deployment time. This is not always appropriate or easy to do, especially for applications like Yaka, which cannot be shut down and restarted because they are used around the clock. Fortunately, CLF makes it possible to dynamically balance the load of the system. We give here two examples of how we use this mechanism in Yaka. The first case concerns a cold start: when Yaka is started for the first time with a large set of information sources, a potentially very large number of documents will be detected and summarized. Two types of components are heavily used: the coordinator and the summarizer. Yaka has to

simultaneously process all the documents stored in the different information sources. The proposed solution relies on the instantiation of several coordinators and summarizers enacting a dedicated set of rules. These rules do not compete for resources, either because they enact distinct parts of the process or because they are applied only to a subset of the data. A dedicated CLF rule shuts down these "booster" components once the cold start phase is finished, e.g. when the number of documents waiting for summarization falls below a given threshold.

The second case is more interesting and in a way complementary to the previous one, since it implies the dynamic creation of new components. At some point the number of users registered to a Profile Directory may become too high with respect to what the component can handle. Beyond a given threshold, we dynamically create a new instance of the Profile Directory and migrate half of the original data to it. This process is managed by three rules. The first rule monitors the number of users managed by each local Profile Directory. If the number of users is greater than 100, the rule is enacted. As a result, a new instance of the Profile Directory is launched on the given site and a ticket is created in order to control the transfer of data from the initial Profile Directory instance to the new one:

‘yellowPages(’ProfileDirectory’,src) @
‘site(src,site) @
‘nb(src,nb) @
greaterThan(nb,’100’) @
generateName(src,dest)
- launch(site,’ProfileDirectory’,dest) @ migrationTicket(src,dest)

The following rule then manages the transfer of the user data to the new instance:

‘migrationTicket(src,dest) @
‘profileDirectory(src,’nb’,nb) @
greaterThan(nb,’50’) @
profileDirectoryData(src,’UserData’,data)
- profileDirectoryData(dest,’UserData’,data)

The last rule is triggered when half of the data has migrated:

migrationTicket(src,dest) @
‘profileDirectory(src,’nb’,nb) @
lessThan(nb,’50’)
-

The consumption of the ticket inactivates the second rule for the given migration process. But the three rules stay active and can trigger and manage another split as soon as the number of users in a Profile Directory instance again grows too large.

6 Administrative Scalability

Administrative scalability has two aspects. First, it concerns the complexity of a system distributed over a growing number of organizations. Attributing local administration tasks to local administrators reduces this complexity. Second, it must also be possible to deploy and monitor a distributed application as a whole, to easily get an overview of the components on the different sites involved, or to inspect the state of each component individually. We address these aspects in the following subsections.

6.1 Local Administration

The preferred solution for dealing with administrative scalability in a growing number of organizations is to distribute the administrative tasks to local administrators. Each of them takes the responsibility for the local components. In Yaka the components requiring local administration at the different sites (i.e. LANs) are mainly the locally available information sources (e.g. file systems, document management systems) and the notification and delivery devices (e.g. printers, fax, personal file systems, mailers). This infrastructure is not static; it evolves dynamically and independently from the application: delivery devices become available or disappear, and information sources need to be added or removed. In order to dynamically add or remove infrastructure components while the application is running, Yaka provides a specific administration object, the Configuration Manager, which uses a generic object launcher available along with CLF. The Configuration Manager is replicated at each site and allows each site's administrator to dynamically add or remove components. This keeps the representation of the site's current infrastructure up to date without having to shut down and restart the application, and makes it possible to dynamically adapt the functionality of the application to the capabilities of the infrastructure. Within Yaka the Configuration Manager acts as a local yellow pages server for the information sources and delivery devices of the site. Local coordinators use it to access the locally registered information sources and to process the documents they contain. The corresponding rules rely on the DISPATCH mechanism (described in section 3) and thus automatically apply to all registered information sources. For instance, let's have a look at the rule implementing the regular scanning of information sources:

‘yellowPages(’InformationSource’,infoSourceId) @
scanTicket(infoSourceId) @
‘delay(infoSourceId,’Delay’,delay) @
wait(delay)
- scan(infoSourceId,’TriggerScan’,ticket) @ scanTicket(infoSourceId)

This rule regularly triggers scanning for all (enabled) information sources registered at the Configuration Manager. The yellowPages token is used to look up all information sources. If a corresponding scanTicket enables scanning for a source, the rule waits for a given delay (the Delay service individually defines the delay between two scans for each information source) and triggers the scanning process. It also inserts on the right-hand side a scanTicket replacing the one consumed on the left-hand side, which makes it possible to regularly scan the information sources for new documents. Since the rule described above only exploits the minimal service interface common to all the information source encapsulators, it automatically applies to the set of currently registered sources. Therefore, the administrator may dynamically add or remove a source without having to shut down the application or to modify the script.

Just like for the local information sources, the Configuration Manager allows looking up all locally registered delivery devices. This makes it possible to present to the user all the delivery devices currently available at her site. If a completely new type of delivery device becomes available, its integration might involve updating the rules in the coordinators. This is the case if the interaction with this new component is not already covered by existing rules. It might also be necessary to integrate other complementary components, as in the following example. If we dynamically add a PDA as a new delivery device for Yaka, we have to extend the system to generate a compliant output format. First we encapsulate, start, and register a corresponding transformer component. Then we tackle the processing issue by generalizing the rule describing document delivery through a printer device. This rule was initially defined as follows:

deliveryRequest(infoSourceId,docId,deviceId) @
‘deliveryDevice(deviceId,’Printer’) @
‘docFormat(infoSourceId,docId,docFrmt) @
‘transf(docFrmt,’Postscript’,transObj,transSrv) @
‘content(infoSourceId,’Content’,docId,content) @
exec(transObj,transSrv,content,delContent)
- deliver(deviceId,delContent)

The above rule is specific to document delivery through a printer device (i.e. it requires a transformation of the document content to Postscript format). We extend it to define a more generic rule for any delivery request requiring format transformations, thus covering the case of PDAs:

deliveryRequest(infoSourceId,docId,deviceId) @
‘deliveryDevice(deviceId,devType,devFrmt) @
‘docFormat(infoSourceId,docId,docFrmt) @
‘transf(docFrmt,devFrmt,transObj,transSrv) @
‘content(infoSourceId,’Content’,docId,content) @
exec(transObj,transSrv,content,delContent)
- deliver(deviceId,delContent)

Removing the old rule and replacing it by the new one can be done at run time without shutting down and restarting the application. Moreover, any active rule instances will be killed at transaction time, as the rule is itself a resource required for executing the transaction.
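The generic delivery scheme (look up the format the device accepts, pick a matching transformer, then deliver) can be sketched outside CLF as follows. The registries and function names are our own, purely illustrative stand-ins for the deliveryDevice and transf tokens.

```python
# hypothetical registries standing in for the 'deliveryDevice' and
# 'transf' tokens of the generic rule
transformers = {}     # (src_format, dst_format) -> transformation callable
device_formats = {}   # device_id -> format accepted by the device

def register_transformer(src, dst, func):
    transformers[(src, dst)] = func

def deliver(device_id, doc_format, content, send):
    """Generic delivery: transform `content` into the device's format,
    then hand it to `send` (the actual delivery channel)."""
    dst = device_formats[device_id]
    if doc_format == dst:
        send(device_id, content)  # already in the right format
        return
    transform = transformers[(doc_format, dst)]
    send(device_id, transform(content))
```

Adding a PDA then amounts to registering its accepted format and a matching transformer, with no change to `deliver` itself, just as the generalized rule needs no change when a new device type registers. (Unlike the rule, this sketch skips the transformation step when the formats already match.)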

6.2 System and Component Monitoring

The second aspect of administrative scalability concerns the deployment and monitoring of the distributed application as a whole. CLF provides several tools to address this problem. A distributed application is initially defined and deployed through a seed file defining:

- the involved sites and machines;
- the components (including the coordinators) and their distribution across the sites and machines;
- the rule scripts that describe the interaction among these components and their distribution over the coordinators.

Components and scripts can be either started automatically at deployment time or manually later on. Moreover, the same seed can be used in the restricted context of a single site: then only the components and scripts belonging to that site are started. This allows local administrators to test and administrate their part of the application. At run time, a dedicated Monitor component continually collects and displays information about the different components of the application. The Monitor is itself a CLF object. It has a Web interface that summarizes the current state of the different components and that is regularly updated. From this view the administrator can check whether a component is running, start or shut down "manual" components, and upload or remove "manual" scripts. Initially the Monitor provides information about the set of components defined within the seed file, but later on, by interrogating the name server, it automatically takes into account other dynamically launched components (see section 5). From the monitoring interface the administrator can also access component-specific information. Each component acts as its own internal monitor, allowing the administrator to browse the services it provides and the resources it contains. It is even possible to interact with the components using the CLF protocol and thus to modify their internal state (i.e. their set of resources) while the application is running.
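The actual seed file syntax is not shown here, so the following Python sketch only illustrates the information a seed carries and the per-site restriction used by local administrators; the structure, site names and component names are all assumptions.

```python
# hypothetical in-memory representation of a seed file
seed = {
    "sites": {
        "grenoble": {"machines": ["m1"],
                     "components": ["Coordinator", "SubjectDirectoryPrimary"]},
        "cambridge": {"machines": ["m2"],
                      "components": ["SubjectDirectorySecondary"]},
    },
    # script -> site of the coordinator that enacts it
    "scripts": {"replication.clf": "grenoble"},
}

def deploy(seed, only_site=None, launch=print):
    """Start every component described by the seed.

    With `only_site` set, only that site's components are started, which
    lets a local administrator test her part of the application alone.
    """
    started = []
    for site, conf in seed["sites"].items():
        if only_site and site != only_site:
            continue
        for component in conf["components"]:
            launch(site, component)  # stand-in for the CLF object launcher
            started.append((site, component))
    return started
```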

7 Discussion

In the previous sections we have described how to address scalability issues with the CLF middleware. Our solutions are built around a core set of classical techniques [7]

also employed in other middleware research projects targeting wide-area distributed systems. Globe [13] addresses scalability through partitioned objects. The replication policies are implemented in specific sub-objects associated with each first-class object. The CLF approach is different in the sense that a significant portion of the application's behavior is defined outside the components (i.e. through rule scripts enacted by the coordinators). However, both mechanisms allow adapting the replication strategies to the semantics of a particular application. RaDaR [10] implements a replication service and a multiplexing service within a hierarchical architecture. The problems its authors intend to solve are similar to the ones described in this article. In particular, administrative scalability is acknowledged as an issue. Simulation results have confirmed the efficiency of the system in bringing scalability to a global information-hosting service. Nevertheless, it is unclear how difficult it would be to tailor such a solution to other kinds of applications (e.g. Yaka). The Rent-A-Server [12] application has the same goal as RaDaR: achieving graceful scaling of Web services. It is based on a centralized load-balancing daemon and distributed smart clients. It was designed as a showcase of the WebOS system-level capabilities: naming, persistent shared state, security and process control. With CLF we propose higher-level abstractions: a programming model in which scalable application behavior can be specified.

8 Conclusion

This article revisits a set of classical solutions for dealing with scalability issues. For each of them it proposes a design within the CLF framework, benefiting from the CLF middleware in several ways. First, the CLF scripting language allows expressing complex service interactions in a very compact way. Each rule describes part of the desired behavior while transparently providing (i) the mapping of the tokens to actual service instances, (ii) the management of infinite service offer streams and (iii) the transactional enactment of fully instantiated rules. Second, since the rule scripts are enacted by dedicated components (the coordinators), the implemented behaviors can be modified, interleaved and extended without digging into service-specific code. Finally, CLF allows specifying the application component deployment from a central point using a seed file. Furthermore, rules can be dynamically added, and objects launched and registered to a Yellow Pages server, at run time. This makes it possible to start an application with only its basic functionality and to progressively bring various scalability solutions on board later.

This paper describes only a qualitative approach; as a natural follow-up, we plan to perform extensive performance tests in order to also provide quantitative data. We will take advantage of the Yaka instances currently deployed across Xerox to collect and compare results coming from both simulation and real system usage. This will allow us to investigate how the various solutions presented here can be integrated to provide a good trade-off between the complexity and the scalability of the resulting system.

References

[1] J.-M. Andreoli, D. Arregui, F. Pacull, M. Riviere, J.-Y. Vion-Dury, and J. Willamowski. CLF/Mekano: a framework for building virtual-enterprise applications. In Proc. of EDOC'99, Mannheim, Germany, 1999.
[2] K. Arnold, A. Wollrath, B. O'Sullivan, R. Scheifler, and J. Waldo. The Jini Specification. Addison-Wesley, Reading, MA, USA, 1999.
[3] D. Arregui, F. Pacull, and M. Riviere. Heterogeneous component coordination: the CLF approach. In Proc. of EDOC 2000, Makuhari, Japan, 2000.
[4] D. Arregui, F. Pacull, and J. Willamowski. Yaka: document notification and delivery across heterogeneous document repositories. In Proc. of CRIWG'01, Darmstadt, Germany, 2001.
[5] N. Budhiraja, K. Marzullo, F. Schneider, and S. Toueg. The primary-backup approach. In Distributed Systems, pages 199-216. Addison-Wesley, Wokingham, 2nd edition, 1993.
[6] Microsoft Corporation. Distributed Component Object Model Protocol DCOM/1.0, draft, November 1996.
[7] B. Neuman. Scale in distributed systems. In T. Casavant and M. Singhal, editors, Readings in Distributed Computing Systems, pages 463-489. IEEE Computer Society Press, 1994.
[8] OMG/CORBA. http://www.corba.org.
[9] D. Powell, editor. Delta-4: A Generic Architecture for Dependable Distributed Computing. Springer-Verlag, 1991.
[10] M. Rabinovich and A. Aggarwal. RaDaR: a scalable architecture for a global Web hosting service. WWW8 / Computer Networks, 31(11-16):1545-1561, 1999.
[11] F. B. Schneider. Implementing fault-tolerant services using the state machine approach: a tutorial. ACM Computing Surveys, 22(4):299-319, 1990.
[12] A. Vahdat, T. Anderson, M. Dahlin, D. Culler, E. Belani, P. Eastham, and C. Yoshikawa. WebOS: operating system services for wide area applications. In Proceedings of the Seventh IEEE Symposium on High Performance Distributed Computing, July 1998.
[13] M. van Steen, P. Homburg, and A. S. Tanenbaum. Globe: a wide-area distributed system. IEEE Concurrency, pages 70-78, January 1999.
