Programming Paradigms and Clustering Rules

Thomas Kunz

TI-3/93
Institut für Theoretische Informatik
Fachbereich Informatik
Technische Hochschule Darmstadt

February 15, 1993

Contents

1 Introduction ........................................... 2
2 Distributed Processing ................................. 3
3 Programming Paradigms for Distributed Applications ..... 4
4 The Process Clustering Tool ............................ 6
5 The Clustering Rules ................................... 9
6 Evaluation of Process Clusters ......................... 23
7 Results ................................................ 24
8 Conclusions ............................................ 29

Abstract

Debugging distributed applications is very difficult, due to a number of problems. To manage the inherent complexity of distributed applications, the use of abstractions is proposed. One frequently performed abstraction is to group processes into clusters. We describe an approach to derive clustering rules from well-known programming paradigms for distributed programming. Programming paradigms determine how we think about problems and their implementation. They shape, among other things, the application structure. This paper identifies frequently used programming paradigms for distributed computing. Likely application structures resulting from the use of these paradigms are discussed and captured in process clustering rules. A quantitative measure for process cluster evaluation is presented and applied to clusters derived for Hermes applications. The results provide insight into the relative strength of the process clustering rules.

1 Introduction

Distributed applications have a number of characteristics that pose a special challenge to any debugging effort, see for example [21, 33]. These characteristics make distributed applications simultaneously highly complex and error-prone. To address the inherent complexity, a suitable debugging methodology, organizing the overall debugging activity, has to be chosen. The work presented here is based on the assumption that a top-down debugging methodology is best suited to address the complexity problem. A top-down methodology starts with the component at the top of a hierarchy and successively moves to the next lower level [31]. In the context of distributed debugging, a hierarchy of abstractions can be used to minimize the complexity involved. Typically, a user starts the debugging process by trying to gain a global overview, see [35] or [46]. In the course of the debugging process, the faulty part of the application is isolated and examined at a lower level of abstraction. In this way, more and more detailed information is collected for smaller and smaller parts of the application, keeping the overall amount of information collected manageable. The process continues until the bug is identified and can be corrected. A number of approaches have been published to support the high-level debugging of distributed applications. [8] discusses a debugger that allows its users to formulate, in a bottom-up fashion, abstract events to model the behaviour of an application. [26] describes the use of path expressions to model synchronization requirements. [37] proposes TSL, a task sequencing language for Ada applications, and [25] presents a pattern-oriented parallel debugger. In all these cases, however, the set of available abstractions is either pre-determined and fixed, or higher-level abstractions have to be derived manually. Given the inherent complexity of distributed applications, such a manual derivation of abstractions seems tedious and error-prone.
One abstraction frequently employed is the grouping of application processes into clusters. By grouping clusters into higher-level clusters, a hierarchy of abstractions is built. This paper discusses an approach to the automatic derivation of process cluster hierarchies based upon process clustering rules. The example applications examined are written in Hermes [48], a high-level, process-oriented language for distributed computing. The general approach outlined here, however, is equally applicable to applications written in other distributed-memory, message-passing programming languages. The paper is organized as follows. Section 2 discusses distributed processing and gives general characteristics of a distributed application. Section 3 presents programming paradigms used in the design of distributed applications. Section 4 addresses global issues in process clustering and Section 5 describes the derivation of process clustering rules from the various paradigms. Section 6 presents a quantitative process cluster evaluation measure and Section 7 discusses the results obtained. The last section summarizes our findings, states open problems, and gives an outlook on current and future work.

2 Distributed Processing

The work reported here aims to reduce the complexity of debugging distributed applications, i.e. programs executing on a distributed system. Although these terms are frequently used, no generally agreed-upon definitions exist. This section limits the scope of our work by presenting essential characteristics of both distributed systems and distributed applications.

2.1 Distributed Systems

It is frequently claimed that the 1990s will be the decade of distributed systems, see for example [38, 42]. However, an exact definition of what constitutes a distributed system is hard to give. Typically, a list of symptoms characterizing such a distributed system is provided, such as:

- multiple processing elements,
- interconnection hardware,
- independent failure of processing elements,
- shared state among the processing elements,
- ...

The reasons for the construction and utilization of a distributed system are numerous, such as a better cost/performance ratio, alignment of the data processing with an essentially distributed real world, improved modularity and expandability, better availability and reliability, or the potential for scalability, see also [38].

2.2 Distributed Applications

Distributed applications are the programs executing on a distributed system. They have characteristics matching those of distributed systems. [44], for example, gives a list with the following characteristics:

- cooperation of loosely-coupled, distributed autonomous entities,
- complex structure,
- topological irregularity,
- intensive remote communication,
- dynamically changing communication patterns,
- separation of address spaces, and
- dynamic configuration changes.

It follows from these characteristics that the development and maintenance of distributed applications is difficult. To reduce these difficulties, well-known programming paradigms can be used to guide the design and implementation of a distributed application.

3 Programming Paradigms for Distributed Applications

The clustering rules presented later are derived from programming paradigms for distributed programming. This section justifies this approach and presents a number of such paradigms.

3.1 The Role of Programming Paradigms

Programming paradigms are collections of conceptual patterns that influence the design process and ultimately determine a program's structure [4, 20]. By analyzing programming paradigms used for distributed programming, potential program structures can be identified. Using rules that describe these structures and information about the runtime behaviour of a distributed application, a hierarchy of process clusters can be generated automatically. [4] distinguishes three categories of paradigms: those that support low-level programming techniques, those that support methods of algorithm design, and those that support high-level approaches to programming. The programming techniques category (for example, copying versus sharing of data structures) deals with techniques relevant for the internal design of a single process and is therefore too low-level for our purpose. Among the various high-level approaches to programming, the operational paradigm is supported by the majority of distributed programming languages, see [5, 7]. We therefore concentrate on the second category, paradigms supporting algorithm design, in the context of an operational programming language such as Hermes. A number of paradigms within this second category are described in [11, 19, 39, 47]. Based on [11], these paradigms are grouped in three classes: result parallelism, specialist parallelism, and agenda parallelism. These three classes capture different approaches towards the construction of parallel and distributed algorithms. Result parallelism produces a series of values with predictable organization and interdependencies. Specialist parallelism results in a program structure which is conceived in terms of a logical network. Each node executes a relatively autonomous computation and is the specialist for this specific computation. Agenda parallelism involves a transformation, or a series of transformations, to be applied to all elements of some set in parallel. Even though the distinction between these different approaches is sometimes difficult (for example: agenda parallelism with only one item on the agenda versus specialist parallelism), the following discussion is based on these three paradigm classes. Real applications, however, are frequently designed using paradigms from more than one class.

3.2 Non-applicable Paradigms

To clarify the type of applications examined here, it is worth mentioning paradigms that are considered not applicable to the design of distributed applications. The paradigm of systolic computing is one example, since systolic algorithms typically lack the topological irregularity characteristic of distributed applications. Data parallelism, for example in SIMD applications, requires a strong coupling among the processing elements, which contradicts the idea of loosely-coupled, autonomous entities. And speculative parallelism, sometimes also referred to as or-parallelism, where a collection of parallel activities is undertaken with the understanding that some might be futile, is strongly associated with logic programming. This paper, however, focuses on paradigms supporting algorithm design in the context of an operational high-level approach to distributed programming.


3.3 Result Parallelism Paradigms

Result parallelism is an approach that focuses on the shape of the finished product. Each process contributes one piece of the result and they all work in parallel, up to synchronization restrictions imposed by the nature of the problem. Methods of algorithm design in this class are divide & conquer, blackboard systems, and iterative relaxation. Divide & conquer divides a problem into two or more smaller problems. Each of these problems is solved independently and their results are combined to give the final result, see also [39]. In blackboard systems, a central data structure represents the current state of the computation. A number of independent processes, each responsible for a certain kind of knowledge, check the current state and update it if they can ([47]). Iterative relaxation divides the data space into adjacent regions, which are then assigned to different processes. Each process carries out activities local to its region, communicating with neighbours when necessary ([19]).

3.4 Specialist Parallelism Paradigms

Paradigms following the specialist parallelism approach focus more on the makeup of the processes. Each process is assigned to perform one specific task. Paradigms in this class are pipeline & filter, layered system, and client-server. In a pipeline & filter design, processes accept a stream of inputs and emit a stream of outputs. Typically, the input stream is transformed locally to yield the output stream, see [47]. In a layered system, the processes are organized hierarchically, with each layer providing service to the layer above. Inner layers are usually hidden from outer layers, except for specifically designed interface functions ([47]). Client-server, sometimes referred to as passive data pool, lets a large data space be managed by a collection of processes that support queries and updates on the space. Queries from client processes are directed to the appropriate server process, see also [19].

3.5 Agenda Parallelism Paradigms

The last approach towards the design of distributed applications focuses on the list of tasks or steps to be performed. Each process helps out with the current item on the agenda. Paradigms in this class are the administrator concept, compute-aggregate-broadcast, and master-slave. In the master-slave paradigm, also referred to as generate-and-solve, a problem is subdivided into (different) subordinate problems. A pool of worker or slave processes stands ready to solve the subproblems as they are created and distributed by the master. Contrary to the divide & conquer paradigm, subordinate problems are not necessarily homogeneous. Furthermore, not all subproblems must be known from the start. They can also be created as a consequence of solving a previous subproblem, see also [19]. The administrator concept [22] reverses the roles of send and reply for the worker processes. A worker sends the administrator a report on completion of the last task assigned and a request to be assigned new work. The administrator replies to the worker when there is a new task for it. This organization prevents the administrator from blocking while waiting for replies from a worker, and aims at hiding the worker processes from the outside in the same way as servers hide data pools or resources.

Compute-aggregate-broadcast designs consist of three distinct phases bearing those names. A compute phase performs some basic (local) computation, an aggregate phase combines local data into one or a few global values, and a broadcast phase returns global information back to each process ([39]).

4 The Process Clustering Tool

This section outlines the process clustering tool developed. After a short discussion of the relationship between process clustering and reverse engineering, the architecture of the process clustering tool is described. The section finishes with a presentation of the semantic information collected to characterize application processes.

4.1 Process Clustering and Reverse Engineering

The paradigms discussed above influence the overall application design. Using rules that try to capture the process structures resulting from the application of specific paradigms, process clustering aims to reconstruct the application design. Rediscovering the design of an application is an explicit goal of reverse engineering approaches [13]. To achieve this goal, semantic clues are usually extracted from the source code by a static source code analysis as in [14]. The results are either transformed into data flow diagrams [9] or stored in a relational database [12]. Existing reverse engineering approaches are typically based on sequential programming languages and identify the components (such as procedures) and relationships (such as the static call hierarchy) by using a static source analysis tool. In distributed applications, the components are processes and the relationships are interprocess communication and process creation. However, due to the anonymous IPC facilities of Hermes, called ports, a static source analysis cannot reveal the peer with which a process is communicating. Contrary to sequential programming languages, where procedure names share one global namespace and procedure calls are easily identifiable syntactically, the port concept in Hermes completely hides the peer in an IPC operation.[1] For this reason, a purely static source code analysis does not seem overly promising. The approach pursued here combines static source code analysis with an evaluation of process traces. For example, one might want to cluster parent processes together with all the child processes they spawn. Unfortunately, the process creation primitives, create of and procedure of, can hide the identity of the created process in a string variable. Furthermore, a static code analysis may not reveal how many processes are created, but this information can easily be retrieved from the trace of an execution.

4.2 Architecture of the Process Clustering Tool The tool we developed clusters processes using semantic information characterizing application processes and information about the relationship among processes. Two interprocess relationships exist in distributed applications: process creation and process communication. Preliminary research indicated that clusters based on the process creation relation are inferior to clusters based on interprocess communication. Examining a number of Hermes applications showed that processes may be created by one process on behalf of another process. Clustering the newly created process together with its parent process does not correctly re ect its subsequent role. 1 Anonymous IPC is not unique to Hermes. The BSD Unix IPC facilities are based on a port concept as well [45], and other distributed programming languages and supporting runtime systems provide similar concepts, see for example [23].


As mentioned above, determining the actual communication between processes by a static source analysis is not possible, so this information is collected from trace information provided by a modified Hermes interpreter. This trace contains all information about process creation and interprocess communication during the run of an application. A static source analysis collects semantic information about each application process. An analyzer combines all the information and builds clusters according to the rules described later. The overall architecture of the tool is depicted in Figure 1. Note that this type of event trace can also be obtained by debuggers which instrument the target operating system to record IPC and process creation operations.
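To make the trace evaluation concrete, the following sketch shows how such an event trace could be turned into a process graph. The line format (SEND and CREATE records) and the field names are hypothetical stand-ins for the actual trace format of the modified Hermes interpreter, which is not specified here.

```python
from collections import defaultdict

def build_ipc_graph(trace_lines):
    """Build two relations from a (hypothetical) event trace:
    calls[p]   = set of processes that p sends messages to,
    created[c] = parent process that instantiated c."""
    calls = defaultdict(set)
    created = {}
    for line in trace_lines:
        fields = line.split()
        if fields[0] == "SEND":        # SEND <sender> <receiver>
            calls[fields[1]].add(fields[2])
        elif fields[0] == "CREATE":    # CREATE <parent> <child>
            created[fields[2]] = fields[1]
    return calls, created

# A minimal example trace: root spawns worker1 and they exchange messages.
calls, created = build_ipc_graph([
    "CREATE root worker1",
    "SEND root worker1",
    "SEND worker1 root",
])
```

The `calls` relation is exactly the interprocess communication matrix interpreted as a graph in Section 5; the `created` relation is kept separately because, as noted above, creation-based clusters proved inferior.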

[Figure 1: Architecture of the clustering tool. Source files are translated by the Hermes compiler into object files; a static analyzer derives semantic data from the object files, while the Hermes interpreter executes them, producing the application's output and trace file(s); the analyzer combines semantic data and traces into the cluster file.]

4.3 Collected Semantic Information

To allow for the formulation of very specific process clustering rules, semantic information about each process is collected. To better understand the following discussion, the next few lines shortly outline process creation in Hermes. Processes are created by special language primitives. When a new process starts up, the runtime support provides it with one initialized variable, an input port. The instantiating process obtains an output port connected to this input port as the result of the process creation. Typically, a Hermes process receives a message at this input port as part of its initialization. This message contains, among other things, output ports to other processes and is the only way in which a process obtains initial connections to other processes. Similarly, the newly created process returns output ports to its input ports with the initialization message to enable other processes to communicate with it.

A first characteristic collected for each application process is its type. Several process taxonomies based on interprocess communication have been proposed [10, 22, 36]. These taxonomies vary in the number of categories they include, but they all divide processes into three broad categories: workers, servers, and shells or coordinators. Shells are processes that only create and connect other processes. Server processes offer one or more services in an endless loop. The output ports to invoke these services are part of the returned initialization message. Conceptually, a server has the following structure:

    process is_server;
    begin
      receive initialization_message;
      create service input port p;
      return initialization_message with an output port bound to p;
      while 'true'
        receive service_request on p;
        serve_request;
        return service_request;
      end while;
    end;

All other processes are called workers, advancing the state of the computation. We deliberately do not call them clients because the term client usually implies that the process calls a server. This, however, is not necessarily true for worker processes. In reality, processes do not necessarily belong to strictly one category. For example, not one instance of a pure shell has been discovered. Usually, a server or worker takes over the role of a shell as well, creating other processes as part of its own initialization. Therefore, the number of different process types was reduced to two: workers and servers.

A second characteristic is the process complexity. A process is defined to be complex if it calls other application processes; otherwise, it is simple. This characteristic cannot be determined by a static source analysis. Instead, it is determined by analyzing the interprocess communication during the actual computation.

The third and last characteristic describes the initialization behaviour of a process. Besides process instantiation, process creation in Hermes involves the sending of an initialization message to the new process. Very often, this message is not part of the actual computation but is used to set up connections between processes. Since we want to cluster processes based on their communication interconnections during the actual computation, the analyzer discards initialization messages from the trace file created by the modified Hermes interpreter. However, this adjustment is only made for processes that have a well-defined initialization part:

    process has_init_part;
    begin
      receive initialization_message;
      do initialization;
      return initialization_message;
      do computation;
    end;

Some processes receive only one message throughout their entire lifetime. By returning the initialization message, these processes also return the results of their computational task. In such cases, the initialization message must be considered part of the interprocess communication. By definition, servers always have a separate initialization part. In summary, a process is characterized as follows:

1. type: worker or server,
2. complexity: simple or complex,
3. initialization characteristic: existence of a separate initialization part.

The last characteristic is used only at the beginning of the clustering process to filter out initialization messages. The first two characteristics are used by the clustering rules. These rules describe patterns of processes with certain types, complexities, and interprocess communication structure that can be clustered. Furthermore, these characteristics are determined not only for application processes but also for process clusters. The next section, describing the clustering rules, also explains how the characteristics of a process cluster are determined. By using the same semantic description for both processes and process clusters, the clustering rules can be applied recursively, creating a multi-level hierarchy of process clusters.
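As an illustrative sketch, the first two characteristics can be derived directly from the filtered IPC graph. Identifying servers by a known set of names stands in here for the static source analysis; that shortcut, and the dictionary representation, are assumptions of this example, not the tool's actual interface.

```python
def characterize(name, calls, server_names):
    """Derive the two characteristics used by the clustering rules.
    calls maps each process to the set of processes it sends to;
    server_names is assumed to come from the static analysis."""
    return {
        "type": "server" if name in server_names else "worker",
        # complex = calls at least one other application process
        "complexity": "complex" if calls.get(name) else "simple",
    }

calls = {"root": {"findfile"}}
root_info = characterize("root", calls, server_names={"findfile"})
```

The third characteristic (a separate initialization part) is omitted: it requires the static analysis of the process source and is only used once, to filter initialization messages out of the trace.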

5 The Clustering Rules

This section discusses in detail the clustering rules derived. It starts with a few general comments that are relevant to all rules. Afterwards, the clustering rules are discussed individually.

5.1 General Comments

The clustering rules have names similar or identical to the paradigms upon which they are based. To distinguish between rule and paradigm, the following notational convention will be used. Italic names (such as divide & conquer) denote paradigms, names set in teletype (such as divide & conquer) are used for rules. The following discussion interprets the interprocess communication matrix as a graph. Figures using artificial examples demonstrate the effects of the clustering rules. In these figures, processes are drawn as boxes, containing the name and type of a process. An arrow connects box i to box j if messages are sent from i to j. The complexity can be derived from the IPC arcs. A node with no outgoing arcs has a complexity of simple, otherwise it is complex. The figures illustrate the IPC structure after initialization messages are filtered out. Process clusters resulting from the application of a specific rule are indicated by surrounding the component processes with a dashed line. The name and type of the cluster are written next to this surrounding line, see Figure 2. The process names in these artificial examples have been chosen based on the following conventions. Worker processes are given either the name root or worker. Since these names are not used for any existing Hermes processes, the clustering tool assigns the default semantic interpretation worker to these processes. Server processes, on the other hand, have to be identified as such by the clustering tool. Therefore, process names of existing Hermes server processes were used. The examples, however, do not claim to mirror real or realistic Hermes applications.

Neither the list of programming paradigms discussed above nor the clustering rules described here claim to be complete. There exist other methods of algorithm design for distributed applications, and even for the paradigms listed above, the following rules do not exhaustively cover all possible process structures. The rules describe minimal and positive structures. This is demonstrated in Figure 2. Assume a clustering rule exists that clusters a simple worker process with the calling complex worker process. The resulting cluster is assigned the name and type of the calling process.


[Figure 2: Minimality and positiveness of clustering rules]

After construction of the worker2 cluster, the result is again a simple worker. It therefore could be clustered immediately with the root cluster. However, to control the resulting cluster sizes somewhat, clusters that are formed in one step cannot be clustered again in the same step. To combine both clusters, the rule would have to be applied again. This is referred to as the minimality of rules. In Figure 2, the root worker process is not only calling worker processes, but also a simple server process. Nevertheless, the root cluster is formed because such an additional IPC relation is not explicitly excluded by the rule. This is referred to as the positiveness of rules. Rules typically describe properties necessary for clustering. It does not follow that everything not mentioned by a rule is explicitly forbidden. This approach has been chosen because, as mentioned above, we expect programmers to apply a number of different paradigms while designing an application. Consequently, an individual process might fulfill multiple functions. Depending on the rule employed, it could be clustered with different groups of processes.

5.2 Global Rules

Global rules describe structures that characterize the complete application. All application processes will be analyzed and clustered when such a global rule is applied. Only one such rule has been derived, based upon the layered system paradigm. This layered system rule assigns processes to clusters such that all processes in cluster i only communicate with processes in the same cluster or in cluster i+1. Figure 3 shows an application of this rule. It is, in general, not possible to identify any process within a layer/cluster as the most or more important one, from which the cluster characteristics could be derived. Therefore, the clusters are assigned the generic name LAYER_xy, with xy being a unique, system-assigned numerical identifier, and the type worker.

[Figure 3: Applying the layered system clustering rule]
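The layering condition can be stated compactly as a predicate over a proposed assignment of processes to layers. This is an illustrative sketch of the condition only, not the derivation algorithm used by the tool; the data layout is an assumption of this example.

```python
def valid_layering(layers, calls):
    """layers[i] is the set of processes in cluster i (layer 0 outermost).
    Valid iff every message stays within a layer or goes to layer i+1,
    i.e. there are no upcalls and no layer-skipping calls."""
    index = {p: i for i, layer in enumerate(layers) for p in layer}
    return all(index[dst] in (index[src], index[src] + 1)
               for src, dsts in calls.items() for dst in dsts)

# root calls two workers, which both call a server one layer further in.
calls = {"root": {"worker1", "worker4"}, "worker1": {"cwd"}, "worker4": {"cwd"}}
ok = valid_layering([{"root"}, {"worker1", "worker4"}, {"cwd"}], calls)
# Splitting the workers across layers makes root call two layers deep.
bad = valid_layering([{"root"}, {"worker1", "cwd"}, {"worker4"}], calls)
```

In the situation of Figure 4, where every process is directly or indirectly called from every other, no partition into two or more layers satisfies this predicate, so the rule fails there.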

[Figure 4: Example where the layered system rule fails]

It is not always possible to identify multiple layers that satisfy the above requirement. Figure 4 gives an example. All processes in this figure are directly or indirectly called from all other processes. Any distribution of the processes into two or more clusters would therefore result in upcalls, something not allowed by the clustering rule.


5.3 Local Rules

Local rules describe substructures within a distributed application. Multiple instances of the structure described by a rule might exist in parallel and will be discovered when applying the rule. The rules can be further subdivided into three subcategories: linear substructures, biconnected substructures, and structural identity. The rules in the linear substructure category describe IPC relations between processes where one process is either called by exactly one other process or calls just one other process. In biconnected substructures, a set of processes communicates in such a way that the removal of a single communication link leaves the process set at least weakly connected. Structural identities are identified by the divide & conquer clustering rule, which looks for and clusters multiple process sets with, among other things, identical IPC structure.

5.3.1 Linear Substructures

The first clustering rule describing a linear substructure is the master-slave rule, derived from the master-slave paradigm. It scans all processes for simple workers that are called from exactly one other process and clusters these processes with the calling process. The underlying notion is that the calling process passes on some of its work to the worker process being called. Figure 5 shows an example. The calling process can be of type server or worker. The resulting cluster is assigned the name and type of the calling process.


[Figure 5: Applying the master-slave clustering rule]

Figure 6 depicts a situation where the master-slave rule is not applicable: the only processes with a complexity of simple are of type server.

[Figure 6: Example where the master-slave rule fails]

Two rules similar to the master-slave rule are the client-server rule and the complex server rule. The client-server rule describes one potential IPC structure under the client-server paradigm. A worker process calls one or more simple server processes which, in turn, are only called by this process. Caller and called processes are clustered together; the resulting cluster is assigned the name and type of the calling process.

[Figure 7: Applying the client-server clustering rule]

Figure 8 gives an example where this rule fails. The only simple server process (getfile) is called by two worker processes. This figure also gives a good example of the incompleteness of the clustering rules. Applying the client-server paradigm might well lead to situations where multiple worker processes (such as worker4 and worker5) access one or a common subset of servers such as getfile. However, the current version of the client-server rule requires that a simple server be called from exactly one worker process only, and therefore this substructure is not identified and clustered.
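A minimal sketch of the master-slave rule, using the graph and characterization representations assumed in the earlier examples (both are illustrative data layouts, not the tool's actual ones):

```python
from collections import defaultdict

def master_slave(procs, calls):
    """Cluster every simple worker that has exactly one caller with
    that caller; the cluster keeps the caller's name and type.
    Positiveness: other IPC relations of the caller are ignored."""
    callers = defaultdict(set)
    for src, dsts in calls.items():
        for dst in dsts:
            callers[dst].add(src)
    clusters = defaultdict(set)
    for p, info in procs.items():
        if (info["type"] == "worker" and info["complexity"] == "simple"
                and len(callers[p]) == 1):
            clusters[next(iter(callers[p]))].add(p)
    return dict(clusters)

# In the style of Figure 5: root passes work to worker1, findfile to worker3.
procs = {
    "root":     {"type": "worker", "complexity": "complex"},
    "findfile": {"type": "server", "complexity": "complex"},
    "worker1":  {"type": "worker", "complexity": "simple"},
    "worker3":  {"type": "worker", "complexity": "simple"},
}
calls = {"root": {"findfile", "worker1"}, "findfile": {"worker3"}}
```

The client-server rule is the type-flipped variant of the same test: the callee must be a simple server with exactly one (worker) caller.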


Figure 8: Example where the client-server rule fails The complex server is not derived directly from one of the paradigms discussed previously. It clusters simple servers that are called by exactly one server process. Structures matching the complex server rule are expected to result from a combination of the client{server and the master{slave paradigms, where a complex service is broken down into simpler services while at the same time providing the image of one service to an outside caller. Figure 9 shows a situation where this rule can be applied. The resulting clusters have the same name and type as the calling server. Figure 10 shows a situation where the rule is not applicable. All simple servers are either not called from another server or from more than one server. The rst three rules describe substructures where a process with a complexity of simple was called by exactly one other process. Just as the administrator concept paradigm reverses the calling order between master and slave, the administrator concept rule describes a structure where multiple workers call exactly one server. The type of the process being called is restricted to server because this is the \obvious" design choice to implement this paradigm. Figure 11 shows the application of this rule. Similar to the previous rules, the resulting clusters are assigned the name and type of the process being known to the rest of the application, here the server being called. This rule, by the way, describes equally well likely IPC substructures resulting from the application of the blackboard system paradigm, where the central blackboard is a server process that is accessed by specialized workers. When a worker process calls other processes besides the server, it is no longer possible to decide whether this worker belongs to the server, according to the administrator concept paradigm, and is therefore not clustered. 
Similarly, server processes calling one central server process do not fit the underlying paradigm and are therefore not clustered. Figure 12 shows these situations.
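The administrator concept test can be sketched mechanically over a call graph. The following Python fragment is an illustration only: the call-list representation, the process names, and the "more than one worker" threshold are our assumptions, not part of the actual clustering tool.

```python
def administrator_clusters(calls, ptype):
    """Sketch of the administrator concept rule: a worker process that
    calls exactly one process, and that process a server, is clustered
    with this server; the cluster keeps the server's name and type."""
    callees = {}
    for src, dst in calls:
        callees.setdefault(src, set()).add(dst)
    clusters = {}
    for proc, dsts in callees.items():
        if ptype.get(proc) == "worker" and len(dsts) == 1:
            (server,) = dsts
            if ptype.get(server) == "server":
                clusters.setdefault(server, {server}).add(proc)
    # keep only clusters where multiple workers call the server
    return {s: c for s, c in clusters.items() if len(c) > 2}

# Hypothetical call graph in the style of Figure 11.
calls = [("root", "worker3"), ("root", "worker4"),
         ("worker3", "getfile"), ("worker4", "getfile")]
ptype = {"root": "worker", "worker3": "worker",
         "worker4": "worker", "getfile": "server"}
print(administrator_clusters(calls, ptype))
```

Note how root is excluded automatically: it calls two processes, so it cannot belong to the server according to the paradigm.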

Figure 9: Applying the complex server clustering rule

Figure 10: Example where the complex server rule fails

Figure 11: Applying the administrator concept clustering rule

Figure 12: Example where the administrator concept rule fails

The compute-aggregate-broadcast rule, derived from the compute-aggregate-broadcast paradigm, clusters processes that call each other, forming a tree-like structure. An example is given in Figure 13. Processes X of arbitrary type that call exactly one other process Y are clustered with this process, if and only if more than one such process X calls Y. Requiring the existence of more than one process X ensures that this rule clusters only non-trivial structures. Mutual communication is necessary to implement both the aggregation phase, where all processes X communicate with Y, and the broadcasting phase, where Y communicates with all X. The resulting cluster inherits name and type from the one process communicating with the rest of the application, process Y.

Figure 13: Applying the compute-aggregate-broadcast clustering rule

Figure 14 depicts a situation where only trivial substructures exist, for example between worker2 and worker4. This contradicts the intuitive notions of aggregation and broadcasting, two of the three phases in the underlying paradigm. Therefore, these two processes are not clustered. Because cwd calls another process (worker5), it is not counted as one of the required processes X calling worker1. Consequently, worker1 also communicates with only one suitable process, findfile, and is not clustered with this process.

The last rule describing linear substructures is the pipeline & filter rule, derived from the identically named paradigm. This rule describes a structure where one process (the pipe end) is called from exactly one other process (the pipe head) and calls exactly one third process. Intuitively, the pipe end filters the access from the pipe head to the third process. Figure 15 shows the application of this rule. The resulting cluster is assigned the name and the type of the pipe head. For this rule to be applicable, both conditions have to be met. In Figure 16, for example, rmanager is called only from userrm, but does not call a third process. So while this structure fits the complex server rule, it is not clustered under the pipeline & filter rule. Similarly, userrm is not the pipe end for a pipe starting in worker2, even though userrm calls exactly one third process, rmanager; userrm is also called by worker1. Even under the assumption that userrm still primarily works as an access filter to rmanager, it is, in general, not decidable whether userrm should be clustered with worker1 or worker2. There is also no obvious reason why worker1 and worker2 should belong to the same cluster together with userrm, so no cluster is formed.
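The pipe-head/pipe-end condition can be sketched in the same call-graph style. This is a simplified illustration with hypothetical process names; the full rule in the tool also takes process types into account.

```python
def pipeline_filter_clusters(calls):
    """Sketch of the pipeline & filter rule: a pipe end E is clustered
    with its pipe head H when H is the only caller of E and E itself
    calls exactly one third process."""
    callers, callees = {}, {}
    for src, dst in calls:
        callers.setdefault(dst, set()).add(src)
        callees.setdefault(src, set()).add(dst)
    clusters = []
    for end, srcs in callers.items():
        if len(srcs) == 1 and len(callees.get(end, ())) == 1:
            (head,) = srcs
            clusters.append((head, end))  # cluster keeps the head's name/type
    return clusters

# worker2 -> userrm -> rmanager: userrm is the pipe end, worker2 the head.
print(pipeline_filter_clusters([("worker2", "userrm"), ("userrm", "rmanager")]))
# prints [('worker2', 'userrm')]
```

rmanager is not clustered because it calls no third process, mirroring the failure case of Figure 16.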

5.3.2 Biconnected Substructures

Biconnected substructures describe sets of processes that communicate in such a way that the removal


Figure 14: Example where the compute-aggregate-broadcast rule fails

Figure 15: Applying the pipeline & filter clustering rule

Figure 16: Example where the pipeline & filter rule fails

of a single communication link still leaves the process set at least weakly connected. Two rules describing such structures have been derived from the paradigms discussed earlier: the peer groups rule and the iterative relaxation rule. Both rules have in common that within the set of processes to be clustered no process is more important than the others. Contrary to the linear substructures discussed above, the name and type of the resulting cluster can therefore not be deduced from one of the existing processes. Instead, the clusters are assigned unique names and the default type worker. The biconnectivity property does not follow from the underlying paradigms directly, but experience has shown that biconnected substructures do in general conform better to the paradigms upon which the clustering rules are based. Furthermore, this property prevents many trivial overlaps with clusters detected by rules in the linear substructures category.

The first rule, peer groups, is derived from a generalization of the paradigms in the result parallelism category. Result parallelism produces a series of values with predictable organization and interdependencies. One possible IPC substructure is a group of interconnected worker processes as shown in Figure 17. Such groups of workers are clustered together, forming a new worker cluster with name PEER_CLUSTER_xy, where xy is a unique number for each cluster. Figure 18 demonstrates that the biconnectivity must hold between worker processes only. Removing the server process cwd and all communication links leading to/from this process results in a graph that is not biconnected. Consequently, the processes worker1, worker3, worker6 and worker7 are not clustered. The restriction to consider worker processes only is based on the assumption that only this process type will be used in designing and implementing an application following the result parallelism approach.
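The "remove any single link and stay weakly connected" property can be tested naively by deleting each edge in turn and re-running a reachability search. The graph encoding and the process names below are our own illustration, not the tool's implementation:

```python
from collections import defaultdict, deque

def is_edge_biconnected(nodes, edges):
    """True iff the undirected graph on `nodes` remains connected after
    the removal of any single communication link in `edges`."""
    def connected(skip):
        adj = defaultdict(set)
        for edge in edges:
            if edge != skip:
                a, b = edge
                adj[a].add(b)
                adj[b].add(a)
        start = next(iter(nodes))
        seen, todo = {start}, deque([start])
        while todo:            # breadth-first search from an arbitrary node
            for nb in adj[todo.popleft()]:
                if nb not in seen:
                    seen.add(nb)
                    todo.append(nb)
        return seen == set(nodes)
    return connected(None) and all(connected(e) for e in edges)

# A triangle of workers survives any single link removal, a chain does not.
print(is_edge_biconnected({"w1", "w2", "w3"},
                          [("w1", "w2"), ("w2", "w3"), ("w3", "w1")]))  # True
print(is_edge_biconnected({"w1", "w2", "w3"},
                          [("w1", "w2"), ("w2", "w3")]))                # False
```

A production implementation would instead compute bridges or biconnected components in a single depth-first search; the brute-force version above just makes the definition concrete.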
The second rule, iterative relaxation, describes a special case of the peer groups rule. The iterative relaxation paradigm postulates a design in which adjacent regions are assigned to different processes. Each process carries out activities local to its region, communicating with neighbours when necessary. This communication with the neighbours is, in general, symmetric. Therefore, we impose the additional requirement that, in a set of worker processes forming a peer group, the communication

Figure 17: Applying the peer groups clustering rule

Figure 18: Example where the peer groups rule fails

between any pair of communicating processes is mutual. The resulting worker clusters are assigned generic names of the form RELAX_GROUP_xy, where xy is a unique number for each cluster. Figure 19 shows an example and Figure 20 depicts a situation where this rule cannot be applied successfully.

Figure 19: Applying the iterative relaxation clustering rule

Figure 20: Example where the iterative relaxation rule fails

5.3.3 Structural Identity

This last subcategory of local process clustering rules contains only one rule, divide & conquer. The divide & conquer paradigm postulates a problem solving approach that splits the original problem into identical smaller problems which are then solved in parallel. Given that the subproblems are identical, the application should contain numerous instantiations of the same process set (necessary to allow the processing of the subproblems in parallel). The processes within each set communicate with each other following the same pattern. Furthermore, they only communicate with the same external processes. The divide & conquer clustering rule identifies sets of processes with the following characteristics:

- each process set contains at least three processes (to avoid trivial cases),
- the number of processes is the same for all sets,
- for every process with name N in one set there exists a process with identical name in all other sets,
- for every IPC between processes X and Y in one set there exist IPCs in the same direction between processes X' and Y' in all other sets, where both X and X' as well as Y and Y' have the same process name and type (internal structural identity), and
- for every IPC between a process X in a set and a process Y in the rest of the application there exist IPCs in the same direction between processes X' in all other sets and process Y, where X and X' have the same process name and type (external structural identity).
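The identity checks above can be sketched by reducing each candidate set to a name-level signature and comparing signatures. The set representation, instance names, and signature encoding below are illustrative assumptions:

```python
def structurally_identical(candidate_sets, name, calls):
    """Sketch of the structural-identity checks: all candidate sets must
    have equal name multisets, identical name-level internal IPC, and
    identical name-level IPC with processes outside all sets."""
    members = {p for s in candidate_sets for p in s}
    def signature(s):
        names = tuple(sorted(name[p] for p in s))
        internal = frozenset((name[a], name[b]) for a, b in calls
                             if a in s and b in s)
        # IPC crossing the set boundary, excluding links into sibling sets;
        # external partners are kept by their (unique) instance name.
        external = frozenset(
            (name[a] if a in s else a, name[b] if b in s else b)
            for a, b in calls
            if (a in s) != (b in s) and not ({a, b} & (members - s)))
        return names, internal, external
    signatures = {signature(s) for s in candidate_sets}
    return len(signatures) == 1 and all(len(s) >= 3 for s in candidate_sets)

# Two hypothetical clone sets of {worker1, cwd, worker2} instances.
name = {"w1.1": "worker1", "c.1": "cwd", "w2.1": "worker2",
        "w1.2": "worker1", "c.2": "cwd", "w2.2": "worker2"}
calls = [("root", "w1.1"), ("root", "w1.2"),
         ("w1.1", "c.1"), ("w1.2", "c.2"),
         ("w1.1", "w2.1"), ("w1.2", "w2.2")]
clones = [{"w1.1", "c.1", "w2.1"}, {"w1.2", "c.2", "w2.2"}]
print(structurally_identical(clones, name, calls))  # True
```

Adding one extra IPC to only one set, such as an additional link from one worker2 instance to an outside server, breaks the external structural identity and the check returns False, mirroring Figure 22.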

Figure 21 gives an example with four identical process sets according to the above criteria. The resulting clusters are of type worker and are assigned the generic names CLONE_GROUP_xy, where xy is a unique number for each cluster.

Figure 21: Applying the divide & conquer clustering rule

In Figure 22, which is similar to Figure 21, no identical process sets exist. Due to missing and additional IPC relations, the external structural identity is no longer given.

Figure 22: Example where the divide & conquer rule fails

5.4 Summary

This section discussed the various process clustering rules derived. It both justified the rules and explained in detail the substructures matched by each rule. The following sections discuss these rules under two aspects. First, it has to be shown that the rules describe substructures in real distributed applications. Second, even if all rules do in fact describe existing substructures, it is not clear whether the resulting clusters are good. To evaluate cluster goodness, a quantitative measure has been developed and will be discussed next.

6 Evaluation of Process Clusters

Evaluating process clusters by human inspection is very tedious. To simplify this task, a quantitative measure has been derived, based on the similarity measure defined in [41]. This measure uses a characteristic vector for each entity (process, software module, etc.) that counts the references to particular data types in its components. It has been shown that the original measure works well for software units that work on the same user-defined global data types (such as a group of functions implementing a stack). Our own research, reported in an upcoming technical report, shows that our derived measure is similarly able to discriminate between good and bad process clusters.

To adapt the similarity measure to Hermes, special emphasis was placed on communication-related types. The vector entries count all references to variables of an inport type, outport type, message type, or a compound type that contains at least one component with a communication-related type. These vectors are used to calculate the pairwise similarity between any two processes as:

    Sim(X, Y) = (X · Y) / (||X|| · ||Y||)

where X · Y is the inner product of the characteristic vectors X and Y, and ||X|| and ||Y|| are their magnitudes or lengths. The cohesion of a process cluster P is the average pairwise similarity among the processes within this cluster:

    Cohesion(P) = ( Σ_{i,j ∈ [1,m], i<j} Sim(p_i, p_j) ) / ( Σ_{i=1}^{m-1} i )

where P is a set of processes {p_1, ..., p_m} and the denominator Σ_{i=1}^{m-1} i = m(m-1)/2 is the number of process pairs. Similarly, the coupling of a process cluster with its environment is calculated as:

    Coupling(P) = ( Σ_{i ∈ [1,m], j ∈ [1,n]} Sim(p_i, q_j) ) / (m · n)

where P is a set of processes {p_1, ..., p_m} and {q_j} is the set of user processes not in P, with |{q_j}| = n. Following the well-known modularization criteria of coupling and cohesion, we expect good process clusters to show high cohesion and low coupling, see also [18, 43].

The characteristic vectors used to calculate the pairwise similarity are determined by a static source analysis. Preliminary studies showed that the resulting cohesion and coupling values do not always conform with the human evaluation of the same cluster. In particular, multiple instantiations of the same process source are indistinguishable under the above measure. We therefore modified the measure by taking the actual interprocess communication during the application runtime into account:

    Sim_f(X, Y) = Sim(X, Y)   if X and Y are instantiations of the same source
                  Sim(X, Y)   if both X and Y are unique instantiations of their source
                  Sim(X, Y)   if X and Y communicate with each other
                  0           otherwise

The pairwise similarity between two processes is reduced to zero (filtered) if at least one of them has sibling instantiations, they are instantiations of different source modules, and they do not communicate with each other. Cohesion and coupling values calculated with the filtered similarity measure accurately reflect a human evaluation of the same clusters for all test cases.
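As a small numeric illustration (the vectors below are made up, not taken from a Hermes trace), the cohesion and coupling computations amount to averaging cosine similarities:

```python
from math import sqrt
from itertools import combinations

def sim(x, y):
    """Cosine similarity of two characteristic vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    mag = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return dot / mag if mag else 0.0

def cohesion(cluster):
    """Average pairwise similarity among the processes in a cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(sim(x, y) for x, y in pairs) / len(pairs)

def coupling(cluster, rest):
    """Average similarity between cluster members and outside processes."""
    return (sum(sim(p, q) for p in cluster for q in rest)
            / (len(cluster) * len(rest)))

# Two communication-heavy processes vs. one unrelated outside process.
cluster = [(2, 1, 0), (1, 1, 0)]
rest = [(0, 0, 3)]
print(round(cohesion(cluster), 2), round(coupling(cluster, rest), 2))  # 0.95 0.0
```

High cohesion with zero coupling is exactly the profile the paper expects from a good cluster.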

7 Results

The execution of five Hermes applications has been traced and served as input to our process clustering tool. This section describes the applications and the setup of our experiments. It then reports on the results obtained: the number of clusters detected per rule and the quantitative evaluation of these clusters.

7.1 Description of the Applications

The five Hermes applications traced are helloworld, makehermes, shuttle, dining philosopher, and dcom.

Helloworld is the example program from the Hermes tutorial [49]. On its own, this application is trivial, but the version used here is started from the Hermes shell, which in turn is started from the Hermes cache. All these processes form part of the application too, increasing the number of invoked application processes to 26.

Makehermes is the Hermes version of the Unix make tool. A Hermes application consists of separately compiled process and definition modules which are imported through "linking" and "usage" lists

respectively. Parsing these lists in the source reveals all dependencies; a separate makefile is not necessary. Makehermes builds a graph structure representing the discovered dependencies and checks, starting from the leaves, whether a source module has to be recompiled. Even with all targets up-to-date, executing makehermes generates 175 processes (51 application processes plus 124 system processes).

Shuttle simulates an airport shuttle system, similar to the one described in [6] and [34]. This airport shuttle system transports passengers between four terminals. Passengers signal their travel request by pushing a destination button at the control panel of their station and are serviced on a first-come-first-served basis. Shuttle is a simulation of such a shuttle system, modelling most physical entities with processes, for a total of 16 processes.

The fourth application traced is a Hermes implementation of the dining philosopher problem with five philosophers. The implemented version consists of 16 application processes (5 philosophers, 5 forks, 5 eat processes implementing a fairness policy, and one startup process).

The last application traced is the Hermes definition module compiler dcom. In the execution traced, dcom compiled one small definition module, creating and invoking a total of 55 application processes.

7.2 Setup of Experiments

As discussed previously, the process clustering rules are minimal. Sometimes a rule can only be applied after the application of other rules has modified the previous process/subcluster structure. To take this minimality into account, three clustering levels are distinguished. At level 0, the 10 clustering rules are applied directly to the traced application execution. At level 1, one other rule is applied before the rule of interest, resulting in a total of 100 tested combinations. At level 2, finally, every possible combination of two rules is applied before the clustering rule of interest, totaling 1000 combinations tested. To avoid repeatedly counting identical clusters, duplicates are filtered out. The following discussion presents the results obtained for each level separately as well as the grand total.
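The combination counts (10, 100, 1000) follow directly from prefixing the rule of interest with every possible rule sequence of the given length. A quick sketch of the enumeration (the rule list is taken from the paper; the function name is ours):

```python
from itertools import product

RULES = ["layered system", "master-slave", "client-server", "complex server",
         "administrator concept", "compute-aggregate-broadcast",
         "pipeline & filter", "peer groups", "iterative relaxation",
         "divide & conquer"]

def combinations_at_level(level):
    """All rule sequences tested at a level: every possible prefix of
    `level` rules, followed by the rule of interest."""
    return [prefix + (rule,) for prefix in product(RULES, repeat=level)
            for rule in RULES]

for level in (0, 1, 2):
    print(level, len(combinations_at_level(level)))  # 10, 100, 1000 combinations
```

In general, level k tests 10^(k+1) sequences, which is why the experiments stop at level 2.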

7.3 Number of Clusters Detected

Tables 1 through 4 show the number of clusters detected for each level and clustering rule. These results lead to the following observations. First, not all rules could be applied successfully at level 0, but starting from level 1, all clustering rules match substructures of the applications examined. Second, the number of clusters detected increases with the level. Furthermore, the number of clusters detected at level 2 is identical with the overall count for all rules because a higher level n subsumes all lower levels m: the two rules administrator concept and iterative relaxation do not change the original process structure (see Table 1), so applying them first means that the rules applied afterwards work on the same structure as when applied at the lower level directly. Therefore, all clusters detected at a lower level will also be detected at higher levels. A last observation is that the rules vary widely in the number of clusters detected. Some rules, such as layered system or master-slave, match more substructures and hence result in more clusters. Other rules, such as compute-aggregate-broadcast or iterative relaxation, describe less frequent substructures.


rule                          no. of occurrences
layered system                        16
master-slave                          19
client-server                         05
complex server                        03
administrator concept                  -
compute-aggregate-broadcast           01
pipeline & filter                     08
peer groups                           01
iterative relaxation                   -
divide & conquer                      06

Table 1: Number of clusters formed per rule at level 0

rule                          no. of occurrences
layered system                        46
master-slave                          41
client-server                         11
complex server                        10
administrator concept                 01
compute-aggregate-broadcast           03
pipeline & filter                     22
peer groups                           05
iterative relaxation                  01
divide & conquer                      10

Table 2: Number of clusters formed per rule at level 1

rule                          no. of occurrences
layered system                        80
master-slave                          69
client-server                         14
complex server                        16
administrator concept                 07
compute-aggregate-broadcast           06
pipeline & filter                     38
peer groups                          11
iterative relaxation                  07
divide & conquer                      10

Table 3: Number of clusters formed per rule at level 2

rule                          no. of occurrences
layered system                        80
master-slave                          69
client-server                         14
complex server                        16
administrator concept                 07
compute-aggregate-broadcast           06
pipeline & filter                     38
peer groups                           11
iterative relaxation                  07
divide & conquer                      10

Table 4: Overall number of clusters formed per rule

7.4 Quantitative Cluster Evaluation

Tables 5 through 8 list the average process cluster evaluation per clustering level and rule.

rule                          avg. cohesion   avg. coupling
layered system                     0.51            0.11
master-slave                       0.48            0.10
client-server                      0.37            0.11
complex server                     0.72            0.17
administrator concept                -               -
compute-aggregate-broadcast        0.90            0.22
pipeline & filter                  0.29            0.06
peer groups                        0.57            0.01
iterative relaxation                 -               -
divide & conquer                   0.44            0.21

Table 5: Average cluster evaluation at level 0

As already discussed above, application of the clustering rules at level 2 results in the derivation of all clusters. Therefore, Tables 7 and 8 are identical. In all cases, the average cluster cohesion exceeds the average cluster coupling, indicating that, on average, all rules derive good clusters. A more detailed analysis of the individual results shows that of the 258 clusters derived, only 22 clusters are evaluated negatively (higher coupling than cohesion). The master-slave rule alone accounts for 10 of these 22 cases, out of a total of 69 clusters formed. Those rules that describe less frequent substructures, such as divide & conquer, compute-aggregate-broadcast, iterative relaxation, and administrator concept, never create a bad cluster. The absolute average coupling and cohesion values vary widely for the different rules. Table 9 ranks the clustering rules according to the difference between average cohesion and average coupling. The pipeline & filter, iterative relaxation and administrator concept rules derive clusters with a cohesion only slightly above the coupling. The clusters derived with the compute-aggregate-broadcast and complex server rules are assigned cohesion values exceeding the respective

rule                          avg. cohesion   avg. coupling
layered system                     0.44            0.10
master-slave                       0.42            0.10
client-server                      0.32            0.11
complex server                     0.61            0.13
administrator concept              0.24            0.10
compute-aggregate-broadcast        0.80            0.21
pipeline & filter                  0.27            0.08
peer groups                        0.40            0.13
iterative relaxation               0.41            0.25
divide & conquer                   0.42            0.25

Table 6: Average cluster evaluation at level 1

rule                          avg. cohesion   avg. coupling
layered system                     0.40            0.09
master-slave                       0.36            0.10
client-server                      0.31            0.11
complex server                     0.52            0.12
administrator concept              0.23            0.11
compute-aggregate-broadcast        0.62            0.21
pipeline & filter                  0.24            0.09
peer groups                        0.38            0.21
iterative relaxation               0.38            0.25
divide & conquer                   0.42            0.25

Table 7: Average cluster evaluation at level 2

rule                          avg. cohesion   avg. coupling
layered system                     0.40            0.09
master-slave                       0.36            0.10
client-server                      0.31            0.11
complex server                     0.52            0.12
administrator concept              0.23            0.11
compute-aggregate-broadcast        0.62            0.21
pipeline & filter                  0.24            0.09
peer groups                        0.38            0.21
iterative relaxation               0.38            0.25
divide & conquer                   0.42            0.25

Table 8: Overall average cluster evaluation

rule                          avg. cohesion - avg. coupling   no. of occurrences
compute-aggregate-broadcast              0.41                        06
complex server                           0.40                        16
layered system                           0.31                        80
master-slave                             0.26                        69
client-server                            0.20                        14
peer groups                              0.17                        11
divide & conquer                         0.17                        10
pipeline & filter                        0.15                        38
iterative relaxation                     0.13                        07
administrator concept                    0.12                        07

Table 9: Rule ranking

coupling values by far and are therefore judged to be very good clusters. The rules that describe the most frequent substructures, such as layered system or master-slave, rank in the middle.

8 Conclusions

This paper proposes an approach to the automatic construction of a process cluster hierarchy. Clustering rules are derived from programming paradigms for distributed applications. All clustering rules describe program substructures in existing Hermes applications. The resulting clusters are evaluated by a quantitative measure. On average, all of them are evaluated to be good, but with a varying degree of goodness. The two rules deriving the best clusters are the compute-aggregate-broadcast rule and the complex server rule. The rules identifying the most substructures are the layered system and the master-slave clustering rules. The resulting clusters, however, are evaluated to be inferior to the fewer clusters derived with the first two rules. Sometimes, these rules generated outright bad clusters.

These results show that all clustering rules should be incorporated into an automatic process clustering tool. The order in which these rules are to be applied could be statically determined by the ranking shown in Table 9. However, this ranking is based on the average goodness of a rule. A more sophisticated clustering tool will try to apply multiple rules concurrently and select the best ones. This also helps avoid the derivation of bad clusters with "in general" good rules. One problem that occurred a few times is the large resulting cluster size: the application of a clustering rule might create a process cluster with overly many processes. To enable control of the cluster size, a statistics-based clustering approach will be integrated into the clustering tool as well.

Process abstraction is not the only possible abstract view of an application's execution. Work is under way to study the issues in automatic event abstraction. Our goal is to identify equivalents to both the clustering rules and the semantic information derived from a static source analysis that will guide event abstraction.


References

[1] Proceedings of the ACM SIGPLAN/SIGOPS Workshop on Parallel and Distributed Debugging, Madison, Wisconsin, May 1988. Appeared as ACM SIGPLAN Notices, 24(1), January 1989.
[2] Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications and the European Conference on Object-Oriented Programming, Ottawa, Canada, October 1990.
[3] Proceedings of the 14th International Conference on Software Engineering, Melbourne, Australia, May 1992.
[4] Allen L. Ambler, Margaret M. Burnett, and Betsy A. Zimmermann. Operational Versus Definitional: A Perspective on Programming Paradigms. IEEE Computer, 25(9):28-43, September 1992.
[5] Gregory R. Andrews and Fred B. Schneider. Concepts and Notation for Concurrent Programming. ACM Computing Surveys, 15(1):3-43, March 1983.
[6] Colin Atkinson, Trevor Moreton, and Antonio Natali, editors. Ada for Distributed Systems. Cambridge University Press, Cambridge et al., 1988.
[7] Henri E. Bal, Jennifer G. Steiner, and Andrew S. Tanenbaum. Programming Languages for Distributed Computing Systems. ACM Computing Surveys, 21(3):261-322, September 1989.
[8] Peter Bates. Distributed Debugging Tools for Heterogeneous Distributed Systems. In Proceedings of the 8th International Conference on Distributed Computing Systems, pages 308-315, San Jose, California, June 1988.
[9] P. Benedusi, A. Cimitile, and U. De Carlini. A Reverse Engineering Methodology to Reconstruct Hierarchical Data Flow Diagrams for Software Maintenance. In Proceedings of the Conference on Software Maintenance, pages 180-189, Los Alamitos, CA, 1989.
[10] David Bustard, John Elder, and Jim Welsh. Concurrent Program Structures. Prentice Hall International Ltd, 1988.
[11] Nicholas Carriero and David Gelernter. How to Write Parallel Programs: A Guide to the Perplexed. ACM Computing Surveys, 21(3):323-357, September 1989.
[12] Yih-Farn Chen, Michael Y. Nishimoto, and C. V. Ramamoorthy. The C Information Abstraction System. IEEE Transactions on Software Engineering, 16(3):325-334, March 1990.
[13] Elliot J. Chikofsky and James H. Cross II. Reverse Engineering and Design Recovery: A Taxonomy. IEEE Software, 7(1):13-17, January 1990.
[14] Aniello Cimitile and Ugo de Carlini. Reverse Engineering: Algorithms for Program Graph Reduction. Software - Practice and Experience, 21(5):519-537, May 1991.
[15] Michel Cosnard, Yves Robert, Patrice Quinton, and Michel Raynal, editors. Parallel and Distributed Algorithms. Elsevier Science Publishers B.V. (North-Holland), 1989.
[16] J. W. de Bakker, A. J. Nijman, and P. C. Treleaven, editors. PARLE: Parallel Architectures and Languages Europe, volume II: Parallel Languages. Springer-Verlag, Berlin et al., 1987.
[17] Jack Dongarra, Iain Duff, Patrick Gaffney, and Sean McKee, editors. Vector and Parallel Computing: Issues in Applied Research and Development. Ellis Horwood Limited, Chichester, 1989.
[18] Richard Fairley. Software Engineering Concepts. McGraw-Hill Series in Software Engineering and Technology. McGraw-Hill Book Company, New York et al., 1985.
[19] Raphael A. Finkel. Large-grain parallelism - Three case studies. In Leah H. Jamieson, Dennis Gannon, and Robert J. Douglas, editors, The Characteristics of Parallel Algorithms, pages 21-63. The MIT Press, Cambridge, Massachusetts and London, England, 1987.
[20] Robert W. Floyd. The Paradigms of Programming. Communications of the ACM, 22(8):455-460, August 1979. 1978 ACM Turing Award Lecture.
[21] Hector Garcia-Molina, Frank Germano, Jr, and Walter H. Kohler. Debugging a Distributed Computing System. IEEE Transactions on Software Engineering, 10(3):210-219, March 1984.
[22] W. Morven Gentleman. Message Passing Between Sequential Processes: the Reply Primitive and the Administrator Concept. Software - Practice and Experience, 11:435-466, 1981.
[23] Sabine Habert, Laurence Mosseri, and Vadim Abrossimov. COOL: Kernel Support for Object-Oriented Environments. In Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications and the European Conference on Object-Oriented Programming, pages 269-277, Ottawa, Canada, October 1990.
[24] Ellis Horowitz, editor. Programming Languages: A Grand Tour. Computer Science Press, Inc., Rockville, MD, 3rd edition, 1987.
[25] Alfred A. Hough and Janice E. Cuny. Initial Experiences with a Pattern-Oriented Parallel Debugger. In Proceedings of the ACM SIGPLAN/SIGOPS Workshop on Parallel and Distributed Debugging, pages 195-205, Madison, Wisconsin, May 1988. Appeared as ACM SIGPLAN Notices, 24(1), January 1989.
[26] Wenwey Hseush and Gail E. Kaiser. Data Path Debugging: Data-Oriented Debugging for a Concurrent Programming Language. In Proceedings of the ACM SIGPLAN/SIGOPS Workshop on Parallel and Distributed Debugging, pages 236-247, Madison, Wisconsin, May 1988. Appeared as ACM SIGPLAN Notices, 24(1), January 1989.
[27] Proceedings of the 8th International Conference on Distributed Computing Systems, San Jose, California, June 1988.
[28] Proceedings of the Conference on Software Maintenance, Los Alamitos, CA, 1989.
[29] Proceedings of the 10th International Conference on Distributed Computing Systems, Paris, France, May 1990.
[30] Proceedings of the Second IEEE Workshop on Future Trends of Distributed Computing Systems, Cairo, Egypt, September 1990.
[31] The Institute of Electrical and Electronics Engineers, Inc., New York, NY, USA. IEEE Standard Glossary of Software Engineering Terminology, 1983. ANSI/IEEE Std 729.
[32] Leah H. Jamieson, Dennis Gannon, and Robert J. Douglas, editors. The Characteristics of Parallel Algorithms. The MIT Press, Cambridge, Massachusetts and London, England, 1987.
[33] Jeffrey Joyce, Greg Lomow, Konrad Slind, and Brian Unger. Monitoring Distributed Systems. ACM Transactions on Computer Systems, 5(2):121-150, May 1987.
[34] Jeff Kramer, Jeff Magee, and Anthony Finkelstein. A Constructive Approach to the Design of Distributed Systems. In Proceedings of the 10th International Conference on Distributed Computing Systems, pages 580-587, Paris, France, May 1990.
[35] David W. Krumme, Alva L. Couch, and George Cybenko. Debugging Support for Parallel Programs. In Jack Dongarra, Iain Duff, Patrick Gaffney, and Sean McKee, editors, Vector and Parallel Computing: Issues in Applied Research and Development, pages 205-214. Ellis Horwood Limited, Chichester, 1989.
[36] Robert Chi Tau Lai. Ada Task Taxonomy Support for Concurrent Programming. ACM SIGSOFT Software Engineering Notes, 16(1):73-91, January 1991.
[37] D. C. Luckham, D. P. Helmbold, D. L. Bryan, and M. A. Haberler. Task Sequencing Language for Specifying Distributed Ada Systems: TSL-1. In J. W. de Bakker, A. J. Nijman, and P. C. Treleaven, editors, PARLE: Parallel Architectures and Languages Europe, volume II: Parallel Languages, pages 444-463. Springer-Verlag, Berlin et al., 1987.
[38] Sape Mullender, editor. Distributed Systems. Addison-Wesley Publishing Company, New York, New York, 1989.
[39] Philip A. Nelson and Lawrence Snyder. Programming Paradigms for Nonshared Memory Parallel Computers. In Leah H. Jamieson, Dennis Gannon, and Robert J. Douglas, editors, The Characteristics of Parallel Algorithms, pages 3-20. The MIT Press, Cambridge, Massachusetts and London, England, 1987.
[40] Proceedings of the 3rd Reverse Engineering Forum, Burlington, Massachusetts, September 1992.
[41] Sukesh Patel, William Chu, and Rich Baxter. A Measure for Composite Module Cohesion. In Proceedings of the 14th International Conference on Software Engineering, pages 38-48, Melbourne, Australia, May 1992.
[42] Michel Raynal. Distributed Algorithms: Their Nature & The Problems Encountered. In Michel Cosnard, Yves Robert, Patrice Quinton, and Michel Raynal, editors, Parallel and Distributed Algorithms, pages 179-185. Elsevier Science Publishers B.V. (North-Holland), 1989.
[43] Linda Rising and Frank W. Calliss. Problems with Determining Package Cohesion and Coupling. Software - Practice and Experience, 22(7):553-571, July 1992.
[44] A. Schill, L. Heuser, and M. Muhlhauser. Using the object paradigm for distributed application development. In Kommunikation in verteilten Systemen. Springer-Verlag, Berlin et al., 1989.
[45] Stuart Sechrest. Interprocess Communication. Technical report, PCS GmbH, 1988.
[46] Rudolph E. Seviora. Knowledge-Based Program Debugging Systems. IEEE Software, 4(3):20-32, May 1987.
[47] Mary Shaw. Larger Scale Systems Require Higher-Level Abstractions. 5th International Workshop on Software Specification and Design, 14(2):143-146, May 1989. Appeared as ACM SIGSOFT Software Engineering Notes.
[48] Robert E. Strom, David F. Bacon, Arthur P. Goldberg, Andy Lowry, Bill Silvermann, Daniel Yellin, Jim Russell, and Shaula Yemini. Hermes: Unix User's Guide, Version 0.8alpha. Technical report, IBM T.J. Watson Research Center, Yorktown Heights, New York, USA, March 1992.
[49] Robert E. Strom, David F. Bacon, Arthur P. Goldberg, Andy Lowry, Daniel M. Yellin, and Shaula Alexander Yemini. HERMES: A Language for Distributed Computing. Prentice Hall, Inc., Englewood Cliffs, New Jersey, 1991.