A Structured Approach to Instrumentation System Development and Evaluation

Abdul Waheed† and Diane T. Rover
Department of Electrical Engineering, 260 Engineering Building
Michigan State University, East Lansing, MI 48824
E-mail: {waheed, rover}@egr.msu.edu
Phone: 517-353-7735; FAX: 517-353-1980

Abstract

Software instrumentation is a widely used technique for parallel program performance evaluation, debugging, steering, and visualization. With the increasing sophistication of parallel tool development technologies and the broadening of the application areas where these tools are used, runtime data collection and management activities are growing in importance; we use the term instrumentation system (IS) to refer to the components that support these activities in state-of-the-art parallel tool environments. An IS consists of Local Instrumentation Servers, an Instrumentation System Manager, and a Transfer Protocol. The overheads and perturbation effects attributed to an IS must be accounted for to ensure correct and efficient representation of program behavior, especially for on-line and real-time environments. Moreover, an IS is a key facilitator of tool integration in an environment. In this paper, we define the primary components of an IS and their roles in an integrated environment, and classify ISs according to selected features. We introduce a structured approach to plan, design, model, evaluate, implement, and validate an IS. The approach provides a means to formally address domain-specific requirements. The modeling and evaluation processes are illustrated in the context of three distinctive IS case studies: PICL, Paradyn, and Vista. Valuable feedback on the performance effects of IS parameters and policies can assist developers in making design decisions early in the software development cycle. Additionally, the use of structured software engineering methods can support the mapping of an abstract IS model to an implementation of the IS.

† Student and presenting author

1 Introduction and Motivation

Software instrumentation is a widely used technique for parallel program performance evaluation, debugging, and visualization. Parallel tools rely on execution information regarding the states and behavior of application programs to provide useful feedback to the user. With the increasing sophistication of parallel tool development technologies, runtime data collection and management activities are receiving more attention from tool developers [21]. Parallel tool developers are focusing on integrated parallel tool environments [30] and frameworks [26], performance evaluation of real-time systems [1], and program steering [6]. The overheads and perturbation effects associated with data collection and management are of critical importance in these emerging technologies and, therefore, deserve special attention. We use the term instrumentation system (henceforth, IS) to specifically consider the data collection and management components in state-of-the-art parallel tool environments [29].

This paper presents a structured approach to plan, design, model, evaluate, and implement an IS that addresses the specific requirements imposed by the parallel tool environment that it supports. It represents one of very few documented efforts to formally evaluate IS design early in the development cycle, thus treating the IS as any other complex software system. Moreover, it provides the basis for an automated path toward IS implementation via a mapping of the abstract system model using software engineering techniques. Finally, we cannot overemphasize the role of the IS in supporting integration in next-generation parallel tool environments.

In order to put our IS development and evaluation approach in the proper perspective, we have examined a number of parallel tool development efforts, some of which are reviewed in Section 4. A majority of the ISs in current tool environments have been developed in a manner that can best be described as ad hoc, with insufficient or no evaluation of their overheads. Typical activities of an instrumentation system, such as the arrival of data (collection), competition and contention between application and IS processes for shared system resources, and message passing among various IS modules, are nondeterministic. A system supporting such activities cannot be evaluated reliably unless its modules and activities are appropriately specified. We have specified and modeled some well-known and widely used ISs, including those of PICL [4] and Paradyn [19], to evaluate their overheads to application programs and systems with respect to the specific requirements of their environments. These case studies demonstrate the promise of this methodology and lead to a structured development approach for ISs for the next generation of parallel tool environments.

This paper classifies an IS in terms of (1) the time constraints imposed by analysis tools in the environment, and (2) IS development, management, and evaluation approaches (including any cost models used for evaluation). Such classification is a prerequisite to a structured IS development and evaluation methodology. An IS may support off-line or on-line (or even real-time) performance analysis, visualization, and steering in an environment. The IS may be developed as a software module that is hard-coded into the rest of the environment or as a customizable, application-specific module. It can manage the information collected from the executing concurrent processes in a static, adaptive, or application-specific manner. Evaluating an IS involves metrics and a cost model that accounts for IS overheads. We classify an IS along these dimensions in Section 2.

In order to develop an IS in a structured manner, we propose a rapid-prototyping, two-level approach, as depicted in Figure 1. At the higher level, requirements of the IS are either determined by the developer or specified by the tool users. These requirements are transformed into detailed lower-level system specifications, which are subsequently mapped to a model representing the structure and dynamics of the IS. This model is parameterized and evaluated with respect to chosen performance metrics that reflect the critical IS overheads to the application program as well as the target system. The evaluation results are then translated back to the higher level, so that tool developers and users can draw conclusions regarding IS performance. Feedback from the IS prototyping process is used to modify either the requirements or the system specifications to obtain the desired performance. Finally, the model becomes the blueprint for the actual synthesis of the IS. More specifically, we are applying object modeling techniques in this process with the intent of using object-oriented software engineering methods to translate the abstract system model into the software modules of the actual system.

[Figure 1 diagram: higher-level qualitative considerations (IS Requirements, IS Evaluation, with feedback from the evaluation process) sit above lower-level quantitative considerations (System Specifications, IS Model, Parameterization, Model Calculations, IS Synthesis).]

Figure 1. Two levels of a structured IS development approach.

Realization of a tool in general, and an IS in particular, is a non-trivial process requiring many person-hours of programming effort. Moreover, evaluation of a tool by users upon its release typically leads to requests for corrections, changes, or enhancements in its function. In contrast, rapid prototyping and preliminary evaluation of an IS using the approach presented in this paper can ensure that the specific requirements of a tool environment are met prior to the investment in programming effort. This process is likely to deliver better performance, be less costly, and yield greater user satisfaction. We present three case studies in Section 3 that follow the IS development approach depicted in Figure 1. Section 4 discusses related work that provides an appropriate context for appreciating the relevance of this work. We conclude with a discussion of the significance of this research.

2 Terms and Classifications

This section defines the terminology that we use throughout this paper. Several terms are identical to those used by others in the literature on monitoring systems, which serves to establish some consistency and clarity of discussion in this area.

2.1 Monitoring

A monitor is a tool used to observe the activities of a system. In general, monitors observe the performance of systems, collect performance statistics, analyze the data, and display results. Monitors are used by performance analysts, programmers, system designers, and system managers [11]. Monitors are usually classified with respect to implementation level (software, hardware, or hybrid), trigger mechanism (event-driven or time-driven, i.e., sampling), and data presentation and analysis mode (on-line or off-line). For a parallel or distributed system, a monitor is responsible for the collection and analysis of distributed program information [21, 27].

2.2 Instrumentation System

An instrumentation system (IS) is the part of a monitor that is concerned with the collection and management of performance information. The scope of an IS spans three modules or functions that are components of any software monitor for a parallel or distributed system. These components are defined in the following subsections. We use the term instrumentation data to account for both execution information (messages, memory references, I/O calls, etc.) and program information (variables, arrays, objects, etc.).

We have developed a generic instrumentation system model that represents a majority of the components found in extant ISs and omits unnecessary implementation details. This generic model is depicted in Figure 2. The model defines three components of an IS that supports tool integration: (1) local instrumentation server (LIS), (2) instrumentation system manager (ISM), and (3) transfer protocol (TP). In Section 3, we study the ISs of selected tools with respect to this model.

[Figure 2 diagram: on the target parallel/distributed system, concurrent system nodes host LISs that send raw instrumentation data and control over a local interconnection network; the ISM, with input buffers, an instrumentation data processor, output buffers, and a storage hierarchy, delivers processed instrumentation data via the TP to the set of supported tools, whose front-ends handle control, data integration, and user interactions in the integrated parallel tool environment on the host system.]

Figure 2. Components of a typical instrumentation system supporting an integrated tool environment.
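To make the division of labor among the three components concrete, the sketch below outlines them as minimal Python classes. All class names, the buffer capacity, and the in-memory channel standing in for the transfer protocol are illustrative assumptions of this sketch, not interfaces of any tool discussed in this paper.

```python
from collections import deque

class TransferProtocol:
    """Illustrative TP: an in-memory channel standing in for sockets/pipes/RPC."""
    def __init__(self):
        self.channel = deque()

    def send(self, record):
        self.channel.append(record)

    def receive(self):
        return self.channel.popleft() if self.channel else None

class LocalInstrumentationServer:
    """Captures instrumentation data on one node and forwards it via the TP."""
    def __init__(self, node_id, tp, capacity=64):
        self.node_id, self.tp, self.capacity = node_id, tp, capacity
        self.buffer = []

    def capture(self, event):
        self.buffer.append((self.node_id, event))
        if len(self.buffer) >= self.capacity:  # management policy: flush when full
            self.flush()

    def flush(self):
        for record in self.buffer:
            self.tp.send(record)
        self.buffer.clear()

class InstrumentationSystemManager:
    """Logically centralized manager: receives, pre-processes, dispatches to tools."""
    def __init__(self, tp, tools):
        self.tp, self.tools = tp, tools

    def poll(self):
        record = self.tp.receive()
        while record is not None:
            processed = record             # pre-processing (e.g., ordering) goes here
            for tool in self.tools:
                tool(processed)
            record = self.tp.receive()

tp = TransferProtocol()
lis = LocalInstrumentationServer(node_id=0, tp=tp, capacity=2)
ism = InstrumentationSystemManager(tp, tools=[print])
for event in ("send", "recv", "barrier"):
    lis.capture(event)
lis.flush()   # end-of-run flush of any remaining records
ism.poll()
```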

2.2.1 Local Instrumentation Server

The Local Instrumentation Server (LIS) captures instrumentation data of interest from the concurrent application processes and forwards the data to other IS modules for consumption by appropriate tools. Typically, the LIS uses local buffers and a management policy to accomplish its data capturing and forwarding functions. Instrumentation code is inserted in the application program statically at compile time (for instance, using PICL [4]) or dynamically at runtime (for instance, using Paradyn [19]) to capture instrumentation data for profiling or tracing program execution. Ogle et al. [21] describe the LIS part of the monitor in their Issos environment in terms of sensors, probes, and tracing buffers. As in PICL, an LIS can simply comprise instrumentation library calls responsible for storing data in local buffers or forwarding data to analysis tools. Or, as in Paradyn, it may consist of a separate process for each node of the concurrent system, which handles instrumentation data management independently of the application processes. Existing monitoring systems use varying terminology for an LIS; for example, Paradyn calls it a Paradyn daemon, and Issos, a resident monitor. However, we use the term LIS as an abstraction for specific implementations of data capturing and forwarding functionality.


2.2.2 Instrumentation System Manager

The LIS forwards instrumentation data from the concurrent system nodes to a logically centralized location called the Instrumentation System Manager (ISM), which manages the data in real-time. The functions of the ISM include temporary buffering of data, storing of data on a mass-storage device, and pre-processing of data for analysis and/or visualization tools (e.g., causal ordering). The functional requirements of an ISM that supports on-line tool usage differ in nature from those of one that supports off-line tool usage. Similarly, different requirements are associated with an integrated tool environment versus a stand-alone tool. For instance, on-line tool usage may require the ISM to order data on-the-fly before submission to a tool, whereas an ISM for off-line tool usage may only need to merge data from various application processes, performing event ordering off-line. We reflect this programmability by defining an instrumentation data processor module within the ISM in Figure 2. Tools receive instrumentation data from the ISM output buffers or a mass-storage device, depending on on-line or off-line usage, respectively. The ISM components in the Paradyn [19] and Issos [21] monitoring systems are known as the main Paradyn process and the central monitor, respectively. Some tool developers, such as Ogle, Schwan, and Snodgrass [21], favor a different partitioning of pre-processing functions, implementing data reduction/analysis in the LIS rather than in the ISM. The definitions of the LIS and the ISM do not preclude this.

2.2.3 Transfer Protocol

Instrumentation data are transferred from the LIS to the ISM and onward to various analysis and visualization tools in an integrated tool environment. Data transfer to the tools is typically accompanied by an exchange of control signals between the ISM and a tool (for instance, as in the Vista toolkit [28]). Additionally, control messages may need to be passed between the ISM and concurrent application processes (directly or via the LIS) to control program execution as dictated by debugging and steering tools in the environment [7]. Usually, a consistent instrumentation data and control transfer protocol (TP) is used for IS-related communications. A majority of existing monitors use operating-system-supported interprocess communication abstractions (such as sockets in Pablo [22] and Issos [21], pipes in Paradyn [19], and remote procedure calls in TAM [24]) for this purpose. Some monitors (such as Hewlett-Packard's VIZIR [7]) implement customized high-level protocols, built on top of operating system functions, to enhance the flexibility and portability of the instrumentation data transfer and control messaging mechanisms.
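As a hedged illustration of a socket-based TP, the sketch below frames a fixed-size event record and moves it over a TCP connection. The record layout (node id, event id, timestamp) is an assumed format for illustration only, not the wire format of any of the monitors cited above.

```python
import socket
import struct
import time

RECORD = struct.Struct("!IId")   # node id, event id, timestamp: an assumed layout

def send_event(sock: socket.socket, node_id: int, event_id: int) -> None:
    """LIS side: frame one event record and push it over the TP socket."""
    sock.sendall(RECORD.pack(node_id, event_id, time.time()))

def recv_event(sock: socket.socket):
    """ISM side: read exactly one fixed-size record from the TP socket."""
    data = b""
    while len(data) < RECORD.size:
        chunk = sock.recv(RECORD.size - len(data))
        if not chunk:
            raise ConnectionError("transfer protocol connection closed")
        data += chunk
    return RECORD.unpack(data)   # (node_id, event_id, timestamp)
```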


2.3 Integrated Parallel Tool Environment

A proliferation of parallel program performance analysis, debugging, steering, and visualization tools has led to the development of several integrated parallel tool environments that enhance the usability of individual tools [30]. An integrated parallel tool environment supports the use of multiple, possibly heterogeneous, tools that cooperate in carrying out one or more analyses of the same parallel program. Tools built by different developers are referred to as heterogeneous tools by Hao et al. [7]. An integrated environment may support off-line tool usage, such as TAU [2] and ParaVision [20]; homogeneous on-line tool usage, such as Paradyn [19]; or a combination of the two, such as SPI [1], VIZIR [7], and ParAide [24].

Malony [18] presents a classification of measurement-based tools comprising four classes: profile-based (sampling), trace-based, prediction-based, and automated (dynamic, adaptive, or knowledge-based management). These types of tools, among others, are typically found in integrated environments.

Integrated parallel tool environments rely on particular mechanisms invoked by the IS to capture, process, and consume instrumentation data. Figure 3 shows the basic technologies in use for tool integration. Tools are integrated with the support of debuggers, operating systems, languages and compilers, or runtime libraries to capture execution and program information from the application processes. Operating system interprocess communication abstractions, such as remote procedure calls (RPC), sockets, and pipes, are commonly used for transferring instrumentation data. Graphics libraries and graphical user interfaces, such as OpenGL, Tcl/Tk, and X/Motif, provide the user with a consistent view and control of the environment. An IS provides a subset of the functionality of an integrated environment, as can be seen by correlating Figure 2 with Figure 3. Clearly, the IS plays a central role in integration; it can cause undesirable and unexpected overhead and perturbation to an application program if its design is not properly evaluated.

2.4 IS Classification

To address the issues related to its design and evaluation, we classify an IS in terms of (1) off-line versus on-line tool usage, i.e., the time constraints imposed by analysis tools in the environment, and (2) IS development, management, and evaluation approaches (including any cost models used for evaluation). These dimensions of IS classification are defined in this subsection.

Off-line IS: An IS that supports analysis, debugging, and/or visualization of a parallel application program as a batch process after program execution is called an off-line IS. The LIS and ISM still collect and manage instrumentation data in real-time for such ISs; the ISM simply stores the data for post-processing.


[Figure 3 diagram: concurrent processes exchange instrumentation data and control with a centralized integration technology (debugger based, OS based, compiler based, or library based); control mechanisms (RPC, sockets, pipes) carry instrumentation data and control to the tools (performance evaluation, debugging, steering, visualization), whose presentation mechanisms (X/Motif, Tcl/Tk, OpenGL) handle user interactions.]

Figure 3. Basic components and technologies for a typical integrated parallel tool environment.

On-line IS: An IS that supports analysis, debugging, steering, and/or visualization of a parallel application program in real-time, concurrent with application program execution, is called an on-line IS. In this case, the ISM interacts with the tools and dispatches instrumentation data as soon as on-the-fly pre-processing is finished. When an on-line IS supports an integrated tool environment, it maintains a steady flow of runtime data to the tools.

IS Development: The process of planning, designing, and synthesizing an IS for a particular hardware platform is collectively referred to as IS development. This multi-step process is mandated by the complexity stemming from the system integration issues that need to be addressed.

IS Management: An IS is a combination of the IS components (as defined above) and a set of policies that dictate component behavior. IS management refers to the policies that are used to schedule the various activities of the LIS and ISM parts of the system. These policies may be static, adaptive, or application-specific. Decisions regarding instrumentation data storage in local buffers, communication of data to the ISM or tools, and usage of shared system resources can greatly impact the performance of user programs. Therefore, these decisions are instituted via an IS management policy.

IS Evaluation: An IS perturbs the performance of an application program at the LIS level due to its data capturing and forwarding functions. Additionally, the functions at the ISM level can introduce excessive delay between the time that instrumentation data are received by the ISM and the time they are dispatched to tools. These overheads are the subject of IS evaluation.

3 IS Modeling and Evaluation

Now that the concept of an instrumentation system has been elaborated upon, we proceed with the details of the structured development approach outlined in Section 1. The specification, modeling, and evaluation of three ISs, PICL, Paradyn, and Vista, are considered in the following subsections as case studies to illustrate the approach. The PICL IS is modeled to demonstrate proof-of-concept [4]. The Paradyn and Vista ISs are modeled and evaluated as part of on-going tool development efforts with specific goals and requirements. The primary objective of the techniques presented here is to answer “what-if” questions to provide feedback for making design decisions. Investigating such questions with measurement, experimentation, and analysis of prototype and production systems is often non-trivial.

3.1 PICL IS

The Portable Instrumented Communication Library (PICL), designed at Oak Ridge National Laboratory, provides efficient communication functions that are easily portable to various multicomputer and distributed computing platforms [4]. Instrumentation is an additional feature; when combined with a tool such as ParaGraph, it supports program performance analysis and animation [8]. In order to instrument an application program, PICL library functions are inserted in the program by the user before compilation. During program execution, calls to these functions generate instrumentation data in a particular event record format and log the data in a local buffer at each node. The user specifies the size of the buffer. These buffers are typically flushed at the end of program execution and merged into a single trace file at the host system.

Management of PICL's IS is essential for a long-running program because local buffers will overflow with the immense amount of instrumentation data generated during program execution. By default, data collection stops after a buffer becomes full, so local buffers need to be flushed to allow continued data collection. The objectives of IS management are: (1) to optimize the use of limited resources, such as local memory; and (2) to minimize the adverse effects of excessive perturbation due to instrumentation data buffer flushes. We have identified two management policies for the PICL IS: Flush One buffer when it Fills (FOF) and Flush All the buffers when One Fills (FAOF). Neither is the default policy, and only FOF is actually supported as a PICL option; however, other IS developers have favored FAOF. The objective of modeling and evaluating this IS is to analyze the overhead of each policy and guide the selection of an appropriate policy.

3.1.1 System Specifications

Specifications necessary for creating a model of the PICL IS based on the components of Figure 1 are summarized in Table 1. System requirements place the IS in the class of off-line systems. The platform to be considered is a distributed-memory parallel system consisting of P processors. The LIS is implemented with an instrumentation library, and the ISM and TP provide a means to merge the data into a trace file. The management policies are static.

Table 1. Specifications characterizing the PICL instrumentation system.
  Analysis Requirements: Off-line
  Platform: Multicomputer system (e.g., nCUBE)
  LIS: Instrumentation library with trace data buffers at each node
  ISM: Instrumentation library merging distributed buffers into a trace file
  TP: Parallel I/O
  Management Policy: Static management policy implemented by the programmer

3.1.2 The IS Model

The IS model is diagrammed in Figure 4. The concurrent LIS is modeled as a set of single-server (M/G/1) queues, one at each processor. Instrumentation data arrive at a local buffer in response to the occurrence of an event of interest at the local processor. The capacity of each local M/G/1 buffer is l records, and the inter-arrival times at each of these buffers are assumed independent and exponentially distributed with rate α. The data in local buffers are transferred dynamically to the host system when the buffers become full. A larger buffer in the main memory of the host, called the main instrumentation data buffer, is the next level of the trace data storage hierarchy. At the end of the program, all trace records are transferred to the main buffer. The main buffer, in turn, may be flushed to the next level of the storage hierarchy, for example, a disk. The storage capacity is assumed to increase with each level in the storage hierarchy. The scope of the IS model considered in this paper is restricted to the local buffers, but it extends naturally to higher levels of the storage hierarchy.

[Figure 4 diagram: processors p0 through pP-1 of the concurrent computer system run programs whose trace records enter local buffers of lengths Qi(t), each of capacity l (the distributed service facility of the LIS); the off-line ISM on the front-end host system moves instrumentation data pages into the main instrumentation data buffer and instrumentation data segments onto a disk-based buffer (the host service facility).]

Figure 4. Model for a concurrent LIS and ISM developed from the PICL IS.

With this model, we can perform a wide range of experiments using different parameters and calculate various metrics in order to evaluate IS performance under alternative management policies. Two of the metrics selected for comparing the FOF and FAOF policies are: (1) the length of the time interval after which a local buffer becomes full and needs to be flushed (trace stopping time); and (2) the ratio of the number of flushes to the number of arrivals for a local buffer during a program's execution (frequency of buffer flushes). These metrics represent the lower-level quantitative considerations of Figure 1. Each metric, its method of calculation, and its interpretation are summarized in Table 2. Further evaluation using these metrics yields higher-level feedback to aid in design decisions (see the next section).

Table 2. Metrics for evaluating the PICL IS management policies.
  Trace stopping time: calculated via stochastic analysis of arrivals to local buffers; a higher value is desirable.
  Flushing frequency: calculated via the regenerative nature of the buffer-filling stochastic process; a higher value indicates greater overhead to the user program.
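The following discrete-event sketch mimics this model under stated assumptions: P independent Poisson streams of rate α fill buffers of capacity l, and the flushing frequency (flushes per arrival) is counted under each policy. The flush service time f(l) is ignored here, so the numbers only approximate the analytical results summarized next.

```python
import heapq
import random

def flushing_frequency(policy, P=8, l=50, alpha=0.007, horizon=1_000_000, seed=1):
    """Flushes per arrival for Poisson event streams of rate alpha at P buffers."""
    rng = random.Random(seed)
    counts = [0] * P                                   # records in each local buffer
    events = [(rng.expovariate(alpha), i) for i in range(P)]
    heapq.heapify(events)                              # next arrival time per node
    flushes = arrivals = 0
    while events:
        t, i = heapq.heappop(events)
        if t > horizon:                                # stream past the horizon: stop it
            continue
        arrivals += 1
        counts[i] += 1
        if counts[i] == l:                             # buffer i just filled
            flushes += 1
            if policy == "FAOF":                       # flush all buffers when one fills
                counts = [0] * P
            else:                                      # FOF: flush only the full buffer
                counts[i] = 0
        heapq.heappush(events, (t + rng.expovariate(alpha), i))
    return flushes / arrivals

for policy in ("FOF", "FAOF"):
    print(policy, flushing_frequency(policy))
```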

A summary of the analytical results for the PICL IS model is presented in Table 3 (see [29] for derivations). These results compare the FOF and FAOF policies using the expected trace stopping time and the flushing frequency for a given arrival rate. The analytical results for the FAOF policy are obtained under the assumption that the arrival rates at all nodes are identical; they provide a theoretical lower bound on trace stopping time and hence an upper bound on flushing frequency.

Table 3. Summary of management policies.
  Distribution:
    FOF: $P[\tau_l(i) \le t] = e^{-\alpha t}\,\dfrac{(\alpha t)^l}{\Gamma(l+1)}$
    FAOF: $P[\tau_l > t] = \left[1 - e^{-\alpha t}\,\dfrac{(\alpha t)^l}{l!}\right]^P$
  Expected trace stopping time:
    FOF: $E[\tau_l(i)] = l \cdot \dfrac{1}{\alpha}$
    FAOF: $E[\tau_l] \ge \min E[\tau_l] = \dfrac{l}{P\alpha}$
  Long-term flushing frequency:
    FOF: $\omega_o = \dfrac{1}{l + \alpha f(l)}$
    FAOF: $\omega_a \le \dfrac{1}{l + P\alpha f(l)}$
  Here τl(i) and τl represent the trace stopping times at the i-th buffer under the FOF policy and over all buffers under the FAOF policy, respectively. The capacity of each local buffer is l records, and a total of P processors are allocated to the instrumented program under consideration. Message-passing time is a linear function of l, represented by the function f(l).

3.1.3 IS Evaluation

The analytical results for the PICL IS for off-line visualization are based on the observation that the process of filling and flushing a buffer is regenerative. That is, after one cycle of buffer filling and then flushing at the end of that cycle, the same cycle repeats, independent of the first cycle. This process continues until the end of instrumented program execution. Therefore, the proportion of time spent by the instrumentation system in the "flushing state" throughout program execution is the same as the proportion of time spent in this state during one cycle (Smith's theorem [5, 23]). This result is used to determine the flushing frequency under either of the two policies [29]. Flushing frequencies for three different arrival rates are plotted in Figure 5 for variable buffer capacities; these show that the flushing frequency is lower under the FAOF policy for a given arrival rate. This behavior becomes more pronounced as the arrival rate increases. These results were compared against and validated with simulation and measurement results [29].
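As a worked check of the Table 3 expressions, the snippet below evaluates both policies' expected trace stopping times and flushing-frequency bounds; the linear message-passing cost f(l) and its constants are illustrative assumptions, not values from the paper.

```python
P, alpha = 8, 0.007                 # processors and per-buffer arrival rate

def f(l, a=1.0, b=0.05):            # assumed linear message-passing time f(l)
    return a + b * l

for l in (10, 50, 100):             # local buffer capacities (records)
    E_fof = l / alpha                          # E[tau_l(i)] = l / alpha
    E_faof_lb = l / (P * alpha)                # lower bound on E[tau_l] (FAOF)
    w_fof = 1.0 / (l + alpha * f(l))           # omega_o, FOF flushing frequency
    w_faof_ub = 1.0 / (l + P * alpha * f(l))   # upper bound on omega_a (FAOF)
    print(f"l={l}: E_fof={E_fof:.0f}  E_faof>={E_faof_lb:.0f}  "
          f"w_fof={w_fof:.4f}  w_faof<={w_faof_ub:.4f}")
```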

[Figure 5: three plots of flushing frequency versus buffer capacity l, with legend FAOF policy (o) and FOF policy (+).]

Figure 5. Comparison of buffer flushing frequencies of the FOF and FAOF policies for three arrival rates, (a) α=0.0008, (b) α=0.007, and (c) α=2.

The analytical results depicted in Figure 5 indicate that the FAOF policy is preferable with respect to the frequency of buffer flushes, consequently minimizing the time delays due to flushing operations. Intuitively, if the arrival rate of instrumentation data at all local buffers is nearly identical, the likelihood of filling other buffers soon after one buffer fills is very high. Therefore, in the long run, the overall frequency of flushing will be smaller than under the FOF policy. On the other hand, the FAOF policy is not easily implemented by a programmer, because the trace stopping time is non-deterministic and all processes need to be synchronized to gang-schedule the flushing operation. The programmer may be required to modify the program so that it runs in lock-step (e.g., with regularly scheduled barrier synchronizations) and allows one process to initiate the FAOF policy. This may result in unacceptable overheads to the program.

Interestingly, IS developers for multicomputer systems tend to favor the FAOF policy over the FOF policy. It has been implemented in Pablo on the CM-5 [22] and in ParAide's Tools Application Monitor (TAM) on the Intel Paragon [24]. All processes are context-switched to flush their local buffers; after flushing, the entire set of processes restarts execution with little or no perturbation of program behavior. In contrast, the FOF policy can severely perturb program behavior. In fact, PICL's developers recommend not using the FOF policy for exactly this reason [4]. Quantitative calculation of program perturbation, which can change the actual order of events, remains a challenge [16].

3.2 Paradyn IS

The Paradyn environment, developed at the University of Wisconsin, has been implemented on the CM-5 and on a cluster of Unix workstations. We have modeled the Paradyn IS for the workstation cluster. It provides data collection support for Paradyn's W3 search model [10], which analyzes program performance bottlenecks by measuring system resource utilization with appropriate metrics. When the search algorithm needs to analyze a particular metric, instrumentation is inserted dynamically in the program at runtime to generate samples of that metric's value. The W3 search methodology thus uses a minimal amount of instrumentation to provide a structured and automated way for a programmer to isolate performance bottlenecks.

The Paradyn IS supports an on-the-fly bottleneck search process by continuously providing instrumentation data to the ISM (the main Paradyn process). The required instrumentation data are sampled from the application processes executing on each node of the system. These samples are forwarded to the LIS (the local Paradyn daemon), which forwards them to the ISM. The sampling rate progressively decreases over time during an interval when instrumentation is present in the program.

3.2.1 System Specifications

Specifications necessary for creating a model of the Paradyn IS based on the components of Figure 1 are summarized in Table 4. An overview of the entire IS is depicted in Figure 6. The figure denotes the application processes that are instrumented by the local Paradyn daemon at node i as p_j^i for j = 0, 1, ..., n-1, where the number of application processes n at a given node may differ from that at another node. The IS components are specified in terms of the LIS and ISM for consistency.

Table 4. Specifications characterizing the Paradyn instrumentation system.
  Analysis Requirements: On-line
  Platform: Cluster of workstations
  LIS: Local daemon process for each node that collects samples from application processes and forwards the data
  ISM: Main Paradyn process that accepts data from the daemons and uses the data for analysis
  TP: Unix-based interprocess communication
  Management Policy: Adaptive management policy implemented by the tool developers

[Figure 6 diagram: application processes p_0^0 ... p_{n-1}^0 on Node 0 through p_0^{P-1} ... p_{n-1}^{P-1} on Node P-1 feed their local Paradyn daemons (the LISs), which forward samples to the main Paradyn process (the ISM) on the host workstation.]

Figure 6. An overview of the Paradyn IS [19].

3.2.2 The IS Model

We model the overall dynamics of the Paradyn IS with a queuing network, as depicted graphically in Figure 7. On each node, the LIS (i.e., the local Paradyn daemon) acts as a server that collects data from the local application processes. It forwards those data to the ISM over the network. The ISM is another server that accepts the instrumentation data from all the distributed LISs and analyzes the data. Each LIS (one per node) is modeled as one server and multiple buffers corresponding to the local application processes. Each application process puts its instrumentation data into its buffer (representing a Unix pipe) after the specified sampling interval has expired. The local daemon collects the instrumentation data samples from the head of each buffer, one at a time, and forwards the samples to the ISM. These samples compete for network resources to reach the ISM and undergo random delays before arriving. The ISM receives the samples, one at a time, and is modeled as a single-server queuing system.

[Figure 7 diagram: local application processes p_0^i ... p_{n-1}^i on node i fill kernel-provided instrumentation data buffers (Unix pipes) drained by the local Paradyn daemons Pd0 ... PdP-1, one per node; network delays are represented by arrivals to a single-server buffer, allowing a random sequence of arrivals from different daemons to the main Paradyn process (the ISM).]

Figure 7. Paradyn instrumentation system model in terms of the LIS components and the ISM.

The level of detail of the model shown in Figure 7 is not sufficient to evaluate alternative data collection schemes and select one that incurs minimum overhead to the application processes. In order to accomplish this objective, the LIS part of the model is extended to account for resource sharing and contention between application processes and LIS functions. We have developed a Resource OCCupancy (ROCC) model for isolating the overheads due to non-deterministic sharing of resources between IS and application processes. The model consists of three components:

1. System Resources. These resources are shared among (instrumented) application processes, other user and system processes, and IS processes. They include the CPU, network, and I/O devices.

2. Requests. These are demands from application processes, other users' processes, and IS processes to occupy the system resources during the execution of an instrumented application program. A request to occupy a resource specifies the amount of time needed to complete a particular computation, communication, or I/O step of a process.

3. Management Policies. IS management policies determine the nature of the data capturing and forwarding operations.

Figure 8 depicts the resource occupancy model for the Paradyn IS with two resources, CPU and network, shared by three types of processes: application, IS, and other user processes. These processes generate requests to occupy the resources for certain periods of time, which are determined from workload studies on the target system. (Similar workload characterization studies were developed, for instance, for a shared workstation by Kleinrock et al. [13].) Multiple processes can generate requests concurrently. If a resource is busy, the request waits in the queue of that particular resource. To ensure fair scheduling of processes, the operating system (Unix) can preempt a process that needs to occupy a system resource for longer than the specified quantum. When a request is fully serviced, it signals the process that generated it, which then issues its next request to occupy another resource. This activity continues until the application program terminates.

[Figure 8 diagram: instrumented application processes, the instrumentation system process (daemon), and other user processes running at a particular system node generate CPU and network requests; a time-out (quantum expiry) returns a process to the CPU queue, and completion of a request triggers the subsequent request from the corresponding process.]

Figure 8. The resource occupancy model for the Paradyn IS to evaluate the overhead to the application program due to resource sharing and contention among processes.
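A minimal sketch of the ROCC idea for the CPU resource, under assumed service demands: one daemon and n application processes share a round-robin CPU with a fixed quantum, and the daemon's accumulated CPU time (Pd interference) and its utilization are reported. The quantum and mean burst lengths are invented for illustration and are not parameters from the paper's workload studies.

```python
import random
from collections import deque

def rocc_cpu(n_app=8, quantum=10.0, horizon=100_000.0, seed=2):
    """Round-robin CPU shared by one IS daemon and n_app application processes."""
    rng = random.Random(seed)

    def fresh(name):                       # next CPU request from a process
        mean = 5.0 if name == "daemon" else 40.0   # assumed mean CPU bursts (ms)
        return [name, rng.expovariate(1.0 / mean)]

    ready = deque([fresh("daemon")] + [fresh(f"app{i}") for i in range(n_app)])
    clock = daemon_time = 0.0
    while clock < horizon:
        proc = ready.popleft()
        run = min(quantum, proc[1])        # preempt after one quantum (time out)
        clock += run
        proc[1] -= run
        if proc[0] == "daemon":
            daemon_time += run             # Pd interference accumulates here
        if proc[1] <= 0.0:                 # request served: issue the next one
            proc = fresh(proc[0])
        ready.append(proc)                 # back of the run queue
    return daemon_time, daemon_time / clock

for n in (2, 8, 32):
    interference, utilization = rocc_cpu(n_app=n)
    print(n, round(interference, 1), round(utilization, 3))
```

Even this toy version reproduces the qualitative trend discussed next: as more application processes join the run queue, the daemon's share of the CPU shrinks.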

This model is simulated to determine the overhead due to resource sharing and contention between the application processes and the IS. Of particular interest is a comparison of alternative parameters and policies for LIS management. Two applicable metrics, their calculation method, and their interpretations are summarized in Table 5. Here, we consider the metrics with respect to the CPU. Pd (monitoring) interference, the absolute amount of CPU time required for daemon execution, represents direct overhead to the application processes; lower is better. The relative amount of total CPU time used by the daemon (relative to the application processes), utilization_Pd, has a more complicated interpretation, in which a nominal value is best under high application process loads. That is, both high and low values are undesirable if there is contention from application processes. Relatively high utilization by the daemon reflects low availability and thus low throughput (compared to capacity) for application processes. Conversely, relatively low utilization correlates with high latency in servicing IS requests (monitoring latency [6]) if the system is saturated. Once again, low throughput for application processes may result in this case due to blocking, as described in Section 3.2.3.

Table 5. Metrics for evaluating the Paradyn IS management policies.
  Pd interference: calculated with the resource occupancy model; corresponds to direct perturbation of the program; lower is better.
  Utilization_Pd: calculated with the resource occupancy model; nominal is best.

Simulation experiments were set up to analyze the effects of two parameters (factors) associated with the present LIS management policy, the sampling rate and the number of local application processes, on the two performance metrics of interest. We used a 2^k r factorial design for these experiments, where k is the number of factors of interest and r is the number of repetitions of each experiment [11]. For these experiments, k = 2 factors and r = 50 repetitions, and the mean values of the two metrics are derived within 90% confidence intervals. Results are plotted in Figure 9. As expected, direct perturbation (i.e., Pd interference) of the local application processes decreases as the sampling rate decreases, that is, as the sampling period increases. The dependence is superlinear initially but levels off. CPU utilization by the daemon (i.e., utilization_Pd) decreases as the number of application processes becomes large.
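For reference, a 90% confidence interval over r repetitions can be computed as sketched below; the snippet uses the normal approximation (z = 1.645) rather than an exact t quantile, and the sample values are synthetic stand-ins for measured interference figures.

```python
import math
import random
import statistics

def mean_ci90(samples, z=1.645):
    """Mean and 90% confidence half-width (normal approximation)."""
    m = statistics.mean(samples)
    half = z * statistics.stdev(samples) / math.sqrt(len(samples))
    return m, half

rng = random.Random(0)
reps = [rng.gauss(2200.0, 150.0) for _ in range(50)]   # stand-in for r = 50 runs
m, half = mean_ci90(reps)
print(f"{m:.1f} +/- {half:.1f} msec")
```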

[Figure 9: two plots from the ROCC model, interference (msec) versus sampling period (msec), and CPU utilization by the daemon versus number of application processes.]

Figure 9. Interference and utilization metrics calculated with the ROCC model.

3.2.3 IS Evaluation

The reduction in CPU utilization by the daemon is mainly due to the round-robin CPU scheduling used by the Unix operating system. If more processes are waiting for CPU time in the queue, then within a given period of time, the daemon process receives relatively less CPU time. This means that the daemon becomes a bottleneck as the number of application processes grows. If it cannot collect and forward the instrumentation data samples at a sufficiently high rate, the pipes become full and the application processes block. A similar result is reported by Gu et al., who use measurements from their Falcon IS to show that multiple monitoring processes reduce the monitoring latency when the number of application processes exceeds a threshold [6]. This is particularly true when local nodes have more computation than communication capacity, as in the case of high-performance workstations. Evaluation of the Paradyn IS continues in collaboration with the Paradyn developers, including modeling of new LIS management policies. We expect that feedback to the developers early in the development process will lead to better design decisions.

3.3 Vista IS

Vista is an experimental, integrated parallel tool environment used for testing novel concurrent system instrumentation and visualization technologies [25]. We have previously used this environment to integrate specialized performance analysis tools with generic data analysis and visualization tools to perform off-line performance analysis [28]. Presently, we are using Vista for on-line performance analysis and visualization of programs running on a cluster of workstations. Vista includes a testbed IS, which is being used to study IS management policies that control data collection, forwarding, processing, and dispatching. The IS is configurable, so different management policies can be instituted dynamically. The overall goal of the Vista IS testbed (called PRISM, for PaRallel Instrumentation System Management, and consisting of LIS, ISM, and TP components) is to enable the user to rapidly prototype IS designs and select a policy that meets functional and performance requirements.

The Vista LIS captures instrumentation data from an application process by invoking its instrumentation library functions. Instrumentation is event-driven, and data related to an event of interest are forwarded to the ISM without local buffering. The size of this data structure is kept very small to avoid excessive communication delays. This event record forwarding differs from the context switching and sampling of the Paradyn LIS (which uses a local daemon process); event forwarding involves only one system call per event.

The data are received and ordered by the ISM. To avoid problems due to the lack of a global clock, we use the technique of assigning logical time-stamps, as implemented by VIZIR [7]. If an arriving event is in correct causal order, it is assigned a logical time-stamp and stored in an output buffer. When a tool selected by the user is ready, the processed event information is dispatched to the tool from the output buffer. If the arriving event is not in causal order, it is added to one (or more) input buffer(s) to reconstruct the causal order of the data before dispatch to a tool. For this type of ISM, it is desirable that input buffer management and event ordering be efficient, so that the (monitoring) latency between the arrival of data at the input buffer and the dispatch of data to the output buffer is minimized. Otherwise, the logical time-stamps become less accurate and may even perturb the visualizations presented by the tools.

The evaluation of the Vista IS presented in this subsection focuses on its ISM. We have found that remarkably similar issues are addressed by Vista's ISM and Falcon's ISM, where Vista's is designed for tool integration and Falcon's for program steering.
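The sketch below illustrates one simple way an ISM can hold back out-of-order events and release them with logical time-stamps. The per-process sequence-number scheme is an assumption of this sketch, a simplified stand-in for the VIZIR-style logical time-stamping that Vista uses, not Vista's actual algorithm.

```python
import heapq

class OrderingISM:
    """Release per-process events in order, assigning logical time-stamps.

    Assumes each LIS tags events with a per-process sequence number; this is
    a simplified stand-in for Vista's time-stamping, not its implementation.
    """
    def __init__(self):
        self.expected = {}        # next sequence number expected per process
        self.held = {}            # per-process min-heap of early (out-of-order) events
        self.logical_clock = 0
        self.output = []          # processed events ready for dispatch to a tool

    def arrive(self, pid, seq, payload=None):
        if seq == self.expected.setdefault(pid, 0):
            self._release(pid, seq, payload)
            heap = self.held.get(pid, [])
            while heap and heap[0][0] == self.expected[pid]:  # drain hold-back buffer
                s, p = heapq.heappop(heap)
                self._release(pid, s, p)
        else:                     # arrived early: hold it back in an input buffer
            heapq.heappush(self.held.setdefault(pid, []), (seq, payload))

    def _release(self, pid, seq, payload):
        self.logical_clock += 1   # assign the logical time-stamp
        self.output.append((self.logical_clock, pid, seq, payload))
        self.expected[pid] = seq + 1

ism = OrderingISM()
for pid, seq in [(0, 1), (0, 0), (1, 0), (0, 2)]:  # (0, 1) arrives out of order
    ism.arrive(pid, seq)
print(ism.output)
```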

3.3.1 System Specifications

The requirements of the Vista IS for efficient support of an integrated environment impose the constraints that the IS should incur minimum overheads to the program and the system and should allow the analysis and visualization tools to present accurate program behavior to the users. The tool needs to operate on-line, with optional storing of data to a disk file for off-line analysis. Presently, the LIS does not use a local daemon for data forwarding, which avoids the potential bottleneck suggested by the Paradyn IS evaluation in Section 3.2.3. Customized message-passing functions support data transfer between the ISM and the application processes and between the ISM and the tools. Currently, the IS implements static instrumentation data management policies; in the future, it will be capable of implementing adaptive policies as well. Specifications necessary for creating a model of the Vista IS based on the components of Figure 1 are summarized in Table 6.

Table 6. Specifications characterizing the Vista instrumentation system.
  Analysis Requirements: On-line/Off-line
  Platform: Cluster of workstations
  LIS: Instrumentation library with event forwarding and no local buffers
  ISM: Instrumentation data processing, forwarding to tools, and storing to disk
  TP: Unix-based library functions for interprocess communication
  Management Policy: Static management policy implemented by the developers

3.3.2 The IS Model

The ISM is modeled first because its performance is deemed critical to obtaining correct and efficient presentation of program behavior from tools in an integrated environment. The specific objective of this modeling effort is to guide the developers in selecting one of two possible configurations of the ISM that will guarantee regular receipt of instrumentation data with minimum delays. The two possible configurations are: Single Input buffer, Single Output buffer (SISO) and Multiple Input buffers, Single Output buffer (MISO). As the names suggest, the SISO configuration uses one input buffer to store out-of-order instrumentation data from all the processes, whereas the MISO configuration has one input buffer per application process. These configurations are commonly used in on-line ISs; for example, Falcon uses the MISO approach.

The ISM is modeled as a network of two single-server queues. Queuing models for the SISO and MISO systems are shown in Figure 10. Instrumentation data are assumed to arrive at the input buffer(s) with exponentially distributed inter-arrival times. The data processor of the ISM processes and dispatches these data according to a normal distribution. The processed instrumentation data are consumed by a tool in a first-come, first-served fashion.

[Figure 10 diagram: in the SISO model, arrivals from all application processes enter one M/G/1 input (priority) queue, pass through the data processor, and exit via a G/M/1 output (FIFO) queue for data transfer to the tool; in the MISO model, application processes 0 through P-1 feed per-process MP/G/1 input (priority) queues drained by the data processor into a single G/M/1 output (FIFO) queue for data transfer to the tool.]

Figure 10. Models for the SISO and MISO configurations of the Vista ISM.
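A small simulation sketch of the data processor under these assumptions: Poisson arrivals at the ISM, normally distributed service, and MISO's multiple-buffer maintenance represented as a constant per-event overhead (an assumption of this sketch, not a measured cost). It reports the mean time from arrival at the ISM to arrival at the output buffer, i.e., the latency metric defined next.

```python
import random

def ism_latency(config="SISO", mean_ia=20.0, mean_svc=2.0,
                n_events=20_000, seed=3):
    """Mean data-processing latency: arrival at ISM to arrival at output buffer.

    Both configurations share one data processor; MISO's extra per-event
    buffer-switching work is modeled as a small constant (an assumption).
    """
    rng = random.Random(seed)
    poll_cost = 0.2 if config == "MISO" else 0.0
    clock = free_at = total_latency = 0.0
    for _ in range(n_events):
        clock += rng.expovariate(1.0 / mean_ia)       # Poisson arrivals to the ISM
        service = max(rng.gauss(mean_svc, 0.3), 0.0)  # normally distributed service
        start = max(clock, free_at)                   # wait if the processor is busy
        free_at = start + service + poll_cost
        total_latency += free_at - clock
    return total_latency / n_events

for ia in (40.0, 10.0):   # longer vs. shorter mean inter-arrival times (ms)
    print(ia, ism_latency("SISO", mean_ia=ia), ism_latency("MISO", mean_ia=ia))
```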

We have selected two metrics to compare the performance of the ISM configurations: data processing latency and average length of the buffer(s). Data processing latency is defined as the amount of time between the arrival of instrumentation data at the ISM and their arrival (after processing) at the output buffer. Lower is better, since a high latency may result in inaccurate presentation of program behavior by the tools. Average buffer length is defined as the ratio of the total number of instrumentation data records that arrive out of order (and hence need to be buffered) to the total observation time. A larger average buffer length indicates that many arrivals are out of order due to the management policies implemented by the LIS. A similar metric, called the hold back ratio, has been used by Gu et al. to evaluate the performance of the Falcon ISM [6]. That metric is defined as the ratio of the number of out-of-order arrivals to the total number of arrivals (rather than to total time); however, the two metrics provide the same qualitative measure of ISM performance. Each metric, its calculation, and its interpretation are summarized in Table 7.

Table 7. Metrics for evaluating the Vista IS management policies.
  Data processing latency: calculated via queuing model evaluation and simulation; longer latency may be undesirable for the tools.
  Average buffer length (hold back ratio): calculated via queuing model evaluation and simulation; a higher value indicates a potential bottleneck in the IS.

The simulation experiments were set up to analyze the effects of the SISO or MISO configuration on the two performance metrics. Two factors were varied for these experiments: the ISM configuration (SISO or MISO) and the mean inter-arrival time between successive instrumentation data arrivals at the ISM. We used a 2^k r factorial design for these experiments (as in Section 3.2.2) [11], with k = 2 factors and r = 50 repetitions, and the mean values of the two metrics were derived within 90% confidence intervals. This technique was used to evaluate the relative significance of each of the two factors with respect to a metric.

Data processing latency and average buffer length statistics for the two configurations and various arrival rates are shown in Figure 11. The data processing latency exhibits higher variance at longer inter-arrival times (lower arrival rates) for both the SISO and MISO configurations, making them less distinguishable. For shorter inter-arrival times (higher arrival rates), the SISO ISM has relatively lower latency. Intuitively, maintenance of multiple buffers should incur more overhead, especially in accessing memory (including virtual memory), under high arrival rate conditions. The average buffer length follows a similar pattern. At lower arrival rates, the average buffer lengths are almost the same, but at higher rates, SISO is better than MISO. This behavior agrees with that observed by Gu et al. for the Falcon ISM, where the hold back ratio increases if an LIS has a large local buffer [6]. A large local buffer equates to a high arrival rate at the ISM buffer, since, when sent, it produces a burst of arrivals at the ISM. We analyzed these results using principal component analysis techniques [11] and found that the inter-arrival rate is the dominant factor affecting data processing latency and average buffer length.

[Figure 11: two plots of average data processing latency and average input buffer length versus mean inter-arrival time (milliseconds), with legend MISO system (o) and SISO system (+).]

Figure 11. Comparison between the SISO and MISO ISMs in terms of average data processing latencies and input buffer lengths.

3.3.3 IS Evaluation

The results presented in the previous section do not indicate that one configuration is clearly superior to the other. Some researchers favor the MISO configuration (for instance, [15]), and tools such as Falcon have implemented it. However, the models and evaluation presented here suggest that the SISO configuration performs equally well at moderate arrival rates and marginally better at higher arrival rates. In event-driven monitoring, it is not uncommon for the rate of arrivals to surge during certain intervals, yielding unstable ISM behavior. Since the Vista IS uses an event-driven approach, a design decision was made to incorporate the SISO configuration based on this modeling and evaluation feedback. In general, assessing and validating design decisions with measurements of the operating IS (i.e., with benchmarking) is an essential step of the development process and one that we are currently addressing.

The results of the PICL, Paradyn, and Vista case studies demonstrate the utility of the structured development approach for practical ISs. The major contribution of the IS models is their ability to support "what-if" analyses to investigate various parameters and policies. Considerable research effort remains to develop meaningful benchmarks for validating prototype or production ISs.

4 Related Work

Many parallel programming tools use an IS. Cheng [3] has surveyed most of the well-known parallel program performance analysis and debugging tools. In this section, we introduce the IS development approaches of various representative parallel tools. These are summarized in Table 8 according to the following features: off-line or on-line performance analysis and visualization; nature of the LIS and ISM components; hard-coded or application-specific development of instrumentation software; static, adaptive, or application-specific management of instrumentation data; and any integral evaluation techniques. These features establish the relevant context for the structured IS development approach presented in this paper.

Table 8. Summary of IS features of some representative parallel tools.
  PICL: off-line analysis/visualization; LIS: local buffers using a runtime library; ISM: trace file; synthesis: hard-coded; management: static.
  AIMS: off-line; LIS: library; ISM: trace file; synthesis: hard-coded; management: static.
  Pablo: off-line; LIS: library; ISM: trace file; synthesis: hard-coded; management: adaptive.
  Paradyn: on-line; LIS: local daemon; ISM: main Paradyn process; synthesis: application-specific by using PCL; management: adaptive; evaluation: adaptive cost model.
  Falcon/Issos/ChaosMON: on-/off-line; LIS: resident monitor; ISM: central monitor; synthesis: application-specific; management: application-specific; evaluation: evaluation of the factors that affect perturbation.
  ParAide (TAM): on-/off-line; LIS: library; ISM: event trace server; synthesis: hard-coded; management: static; evaluation: accountable invasiveness.
  SPI: on-/off-line; LIS: library; ISM: event-action machines; synthesis: application-specific; management: application-specific; evaluation: accountable invasiveness.
  VIZIR: on-/off-line; LIS: library; ISM: VIZIR front-end; synthesis: hard-coded; management: static.

PICL and ParaGraph [8] have been used with several environments. PICL [4] is a portable library of efficient communication functions that also supports instrumentation. Refer to Section 3.1 for a discussion of the PICL IS.

AIMS (Automated Instrumentation and Monitoring System [32]) is a toolkit consisting of an instrumentation library and a set of off-line performance analysis and visualization tools. Its IS support is almost identical to that of PICL. A user can specify different buffer sizes or the usage of flushing functions in a configuration file as part of a static management policy.

Pablo [22] is an integrated tool environment that offers three types of performance data capturing functions: (1) event tracing; (2) event counting; and (3) code profiling. If a local buffer is full, all buffers can be flushed synchronously to a file or to an Internet domain socket. Unlike the PICL and AIMS ISs, Pablo's IS supports adaptive levels of tracing to dynamically alter the volume, frequency, and types of event data recorded. Adaptive management policies ensure that the IS overheads remain low, particularly for long-running instrumented programs.

recorded. Adaptive management policies ensure that the IS overheads remain low, particularly for the longrunning instrumented programs. Paradyn [19] is an on-line performance evaluation environment that is based on dynamically updating the cumulative-time statistics of various performance variables. In addition to implementing a dynamic management policy, its IS is equipped with the capability to estimate its cost to the application program [10]. This cost model is continuously updated in response to actual measurements as an instrumented program starts executing, and the model attempts to regulate the amount of IS overhead to the application program. Falcon [6] is an application-specific, on-line monitoring and steering system for parallel programs. The Falcon IS supports dynamic control of monitoring overhead to reduce the latency between the time an event is generated and the time it is acted upon for the purpose of steering. Various modules and functions of the IS are specified by a low-level sensor specification language and a higher level view specification language. Falcon is perhaps the only tool that provides a thorough evaluation of both LIS and ISM parts of its instrumentation system. ParAide [24] is the integrated performance monitoring environment for the Intel Paragon. Commands are


Falcon [6] is an application-specific, on-line monitoring and steering system for parallel programs. The Falcon IS supports dynamic control of monitoring overhead to reduce the latency between the time an event is generated and the time it is acted upon for steering. The modules and functions of the IS are specified with a low-level sensor specification language and a higher-level view specification language. Falcon is perhaps the only tool that provides a thorough evaluation of both the LIS and ISM parts of its instrumentation system.

ParAide [24] is the integrated performance monitoring environment for the Intel Paragon. Commands are sent to the distributed monitoring system, called the Tools Application Monitor (TAM). TAM consists of a network of TAM processes arranged as a broadcast spanning tree, with one TAM process at each node; this configuration allows monitoring requests to be broadcast to all nodes. Instrumentation library calls generate data that are sent to the event trace servers, which perform post-processing tasks and either write the data to a file or send them directly to an analysis tool. To minimize perturbation, trace records are stored locally in a trace buffer that is periodically flushed to the local trace server.

Scalable Parallel Instrumentation (SPI [1]) is Honeywell's real-time instrumentation system for heterogeneous computer systems. SPI supports an application-specific instrumentation development environment based on an event-action model and an event specification language. Hewlett-Packard's VIZIR [7] is an integrated tool environment for debugging and visualizing programs on a workstation cluster. It combines commercially available debuggers and visualization tools, and is an example of IS support being used to integrate heterogeneous tools.

Work has also been done on compensating for the effects of program perturbation due to instrumentation [31]. The goal of perturbation compensation is to reconstruct the actual program behavior from the perturbed behavior recorded by the IS. Malony et al. [16] describe a model for removing the effects of perturbation from traces of parallel program executions.

Presently, it is not standard practice to formally evaluate the performance and functionality of a tool early in its development. Usability and efficiency studies of prototypical tools are emerging to alleviate this situation. However, the underlying IS is removed from the end user and is part of the system infrastructure, necessitating more rigorous evaluation. Moreover, contemporary approaches to evaluating IS overheads and perturbation do not adequately consider the nondeterministic nature of these effects. The approach introduced in this paper addresses these issues.
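As a concrete, simplified instance of trace-based perturbation compensation, the sketch below subtracts an assumed fixed per-event instrumentation cost from recorded timestamps. Real models such as that of Malony et al. [16] must also preserve event orderings and handle communication dependences; this fragment, with its hypothetical names and constant-cost assumption, illustrates only the basic timing adjustment.

```c
/* Simplified perturbation compensation: remove an assumed constant per-event
 * instrumentation cost from a single process's trace. Hypothetical names;
 * real models (e.g., Malony et al. [16]) also handle event dependences. */
#include <stdio.h>

typedef struct {
    double timestamp;  /* perturbed (measured) time of the event */
    int    type;
} TraceEvent;

/* Event i has been delayed by i prior instrumentation points plus its own,
 * so its approximate true time is t_i - (i + 1) * cost_per_event. */
void compensate(TraceEvent *trace, int n, double cost_per_event)
{
    for (int i = 0; i < n; i++)
        trace[i].timestamp -= (i + 1) * cost_per_event;
}

int main(void)
{
    TraceEvent trace[] = {
        {0.00102, 1}, {0.00310, 2}, {0.00525, 1}
    };
    compensate(trace, 3, 0.00001);   /* assume 10 us per recorded event */
    for (int i = 0; i < 3; i++)
        printf("event %d: approx. true time %.5f s\n", i, trace[i].timestamp);
    return 0;
}
```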

5 Discussion and Conclusions

This paper has presented a structured approach to evaluating the configuration and management policies of an instrumentation system, providing valuable feedback to developers on the performance of their designs. Due to the diversity and complexity of computer systems in general, and of concurrent computer systems in particular, many performance analysts agree that there is as yet no "theory of computer system performance evaluation." The performance evaluation of a given computer system is justifiably referred to as an art [11]. The evaluation process is carried out with respect to specific goals and the subtle behavioral aspects of the system under study, applying appropriate results from multiple related disciplines such as statistics, probability theory, queuing theory, operations research, and simulation.


In this paper, we have presented modeling and evaluation approaches for three ISs that serve fundamentally different requirements. For PICL, we used analytical modeling to calculate IS overheads; actual measurements were also possible because that IS already exists. In contrast, we had to use simulation models to evaluate the overheads of the Paradyn and Vista ISs in their specific contexts, because those ISs are still at various stages of development and prototyping. While a "universal" model or evaluation technique that applies to all ISs is not practical, the three case studies share the following commonalities:

• Queuing models are intuitively appropriate for modeling the dynamics of an IS, just as they are for many other computer system components and policies, including processor architectures, networks, I/O subsystems, memory hierarchies, memory management schemes, caching policies, processor scheduling policies, and communication protocols (a small worked example follows this list).

• A model is established according to the high-level requirements of an IS; these requirements aid in specifying the factors and metrics that are important for evaluating performance.

• The primary goal of a model is to support "what-if" analyses regarding the selection of various parameters and policies of an IS. If the IS is in production, modeling results can be used to analyze the system, and measurements can be obtained to test their validity. If an IS is being designed or prototyped, simulation experiments can be used to investigate various design choices. This is standard practice in system design [14].
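As a minimal illustration of the first and third points, the fragment below treats an IS forwarding path as an M/M/1 queue (Poisson event arrivals, exponential forwarding service time) and computes its standard steady-state metrics over a range of event arrival rates, a simple "what-if" sweep over one IS parameter. The rates are invented for illustration; a real study would derive them from measured workloads (see, e.g., [5], [11]).

```c
/* "What-if" sweep over event arrival rates for an IS forwarding path
 * modeled as an M/M/1 queue. All numbers are hypothetical. */
#include <stdio.h>

int main(void)
{
    double mu = 2000.0;          /* service rate: events forwarded per second */
    double lambdas[] = { 500.0, 1000.0, 1500.0, 1900.0 };

    for (int i = 0; i < 4; i++) {
        double lambda = lambdas[i];
        double rho = lambda / mu;            /* utilization */
        double L   = rho / (1.0 - rho);      /* mean events queued or in service */
        double W   = 1.0 / (mu - lambda);    /* mean time an event spends in IS */
        printf("lambda=%6.0f/s: rho=%.2f, mean queue=%5.2f, delay=%.4f s\n",
               lambda, rho, L, W);
    }
    return 0;
}
```

Even this toy model exposes the characteristic nonlinearity of IS behavior: as the event arrival rate approaches the service rate, queue length and delay grow sharply, which is exactly the regime where measurement-only evaluation becomes unreliable.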


These commonalities point toward both the need and the opportunity for applying a structured approach to IS development. Although specifics will differ among ISs, the overall approach is represented by Figure 1 and provides a basis for developers to make design decisions that better serve the requirements. Most extant ISs representing the state of the art (listed in Table 8) address a subset of the issues raised in this paper in order to meet their domain-specific requirements. As concurrent computing becomes popular in a growing number of application areas, IS developers face new challenges. One such challenge is the development of ISs for distributed or embedded real-time systems [1]. Such systems must meet stringent timing and performability constraints to be operational, and their ISs need to incorporate adaptive management and usage of system resources, customizability, and flexibility. The demands on next-generation ISs reinforce the need for a structured approach.

The operation of an IS in a real system is non-deterministic; hence, collecting measurements is not by itself sufficient to evaluate it. The non-deterministic nature of arrivals, resource usage and contention, and computational load on the system may render measurements of limited use. All three case studies presented in this paper used models that do not overlook the random nature of IS activities.

Several important areas are being addressed by our ongoing efforts in IS development: (1) benchmarking ISs to validate that requirements are met; (2) applying structured software engineering methods to map abstract instrumentation system models to implementations; (3) characterizing IS workloads to enhance the power and accuracy of the models; and (4) modeling other ISs that are at various stages of development, to augment our suite of case studies using the structured approach.

References

[1] Bhatt, Devesh, Rakesh Jha, Todd Steeves, Rashmi Bhatt, and David Wills, "SPI: An Instrumentation Development Environment for Parallel/Distributed Systems," to appear in Proc. of Int. Parallel Processing Symposium, April 1995.

[2] Brown, D., S. Hackstadt, A. Malony, and B. Mohr, "Program Analysis Environments for Parallel Language Systems: The TAU Environment," Proc. of the Second Workshop on Environments and Tools For Parallel Scientific Computing, Townsend, Tennessee, May 1994, pp. 162-171.

[3] Cheng, Doreen Y., "A Survey of Parallel Programming Languages and Tools," Report RND-93-005, NASA Ames Research Center, March 1993.

[4] Geist, G., M. Heath, B. Peyton, and P. Worley, "A User's Guide to PICL," Technical Report ORNL/TM-11616, Oak Ridge National Laboratory, March 1991.

[5] Gelenbe, E., G. Pujolle, and J. C. C. Nelson, Introduction to Queuing Networks, John Wiley, 1987.

[6] Gu, Weiming, Greg Eisenhauer, Eileen Kramer, Karsten Schwan, John Stasko, and Jeffrey Vetter, "Falcon: On-line Monitoring and Steering of Large-Scale Parallel Programs," Technical Report GIT-CC-94-21, 1994.

[7] Hao, Ming C., Alan H. Karp, Abdul Waheed, and Mehdi Jazayeri, "VIZIR: An Integrated Environment for Distributed Program Visualization," Proc. of Int. Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS '95) Tools Fair, Durham, North Carolina, Jan. 1995.

[8] Heath, Michael T. and Jennifer A. Etheridge, "Visualizing the Performance of Parallel Programs," IEEE Software, 8(5), September 1991, pp. 29-39.

[9] Hollingsworth, J. K. and B. P. Miller, "Dynamic Control of Performance Monitoring on Large Scale Parallel Systems," Proc. of Int. Conf. on Supercomputing, Tokyo, Japan, July 19-23, 1993.

[10] Hollingsworth, J. K. and B. P. Miller, "An Adaptive Cost Model for Parallel Program Instrumentation," Technical Report, Oct. 1994.

[11] Jain, Raj, The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, John Wiley & Sons, Inc., 1991.

[12] Kilpatrick, Carol and Karsten Schwan, "ChaosMON: Application-Specific Monitoring and Display of Performance Information for Parallel and Distributed Systems," Proceedings of the ACM/ONR Workshop on Parallel and Distributed Debugging, Santa Cruz, California, May 20-21, 1991.

[13] Kleinrock, Leonard and Willard Korfhage, "Collecting Unused Processing Capacity: An Analysis of Transient Distributed Systems," IEEE Transactions on Parallel and Distributed Systems, 4(4), May 1993, pp. 535-546.

[14] Law, Averill M. and W. D. Kelton, Simulation Modeling and Analysis, McGraw-Hill, Inc., 1991.

[15] Lieu, Eric, personal communication, Hewlett-Packard Labs, Palo Alto, California, June 1994.

[16] Malony, A. D., D. A. Reed, and H. A. G. Wijshoff, "Performance Measurement Intrusion and Perturbation Analysis," IEEE Transactions on Parallel and Distributed Systems, 3(4), July 1992.

[17] Malony, A., B. Mohr, P. Beckman, D. Gannon, S. Yang, F. Bodin, and S. Kesavan, "Implementing a Parallel C++ Runtime System for Scalable Parallel Systems," Proceedings of Supercomputing '93, Portland, Oregon, November 15-19, 1993.

[18] Malony, A. D., "Measurement and Monitoring of Parallel Programs," Tutorial, Sigmetrics '94, Nashville, Tennessee, May 16-20, 1994.

[19] Miller, Barton P., Jonathan M. Cargille, R. Bruce Irvin, Krishna Kunchithapadam, Mark D. Callaghan, Jeffrey K. Hollingsworth, Karen L. Karavanic, and Tia Newhall, "The Paradyn Parallel Performance Measurement Tools," Technical Report, 1994.

[20] Nutt, Gary J. and Adam J. Griff, "Extensible Parallel Program Performance Visualization," Proc. of Int. Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS '95), Durham, North Carolina, Jan. 1995.

[21] Ogle, David M., Karsten Schwan, and Richard Snodgrass, "Application-Dependent Dynamic Monitoring of Distributed and Parallel Systems," IEEE Transactions on Parallel and Distributed Systems, 4(7), July 1993, pp. 762-778.

[22] Reed, Daniel A., Ruth A. Aydt, Tara M. Madhyastha, Roger J. Noe, Keith A. Shields, and Bradley W. Schwartz, "The Pablo Performance Analysis Environment," Dept. of Computer Science, Univ. of Illinois, 1992.

[23] Resnick, Sidney I., Adventures in Stochastic Processes, Birkhauser, 1992.

[24] Ries, Bernhard, R. Anderson, D. Breazeal, K. Callaghan, E. Richards, and W. Smith, "The Paragon Performance Monitoring Environment," Proceedings of Supercomputing '93, Portland, Oregon, Nov. 15-19, 1993.

[25] Rover, Diane T., "Vista: Visualization and Instrumentation of Scalable Multicomputer Applications," Project Summary, IEEE Parallel and Distributed Technology, 1(3), August 1993, p. 83.

[26] Rover, Diane T., "Performance Evaluation: Integrating Techniques and Tools into Environments and Frameworks," Roundtable, Supercomputing '94, Washington, DC, November 14-18, 1994.

[27] Simmons, M. and R. Koskela, editors, Performance Instrumentation and Visualization, ACM & Addison-Wesley, 1990.

[28] Waheed, A., B. Kronmuller, Roomi Sinha, and D. T. Rover, "A Toolkit for Advanced Performance Analysis," Proc. of Int. Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS '94) Tools Fair, Durham, North Carolina, Jan. 31-Feb. 2, 1994.

[29] Waheed, A., Vincent Melfi, and Diane T. Rover, "A Model for Instrumentation System Management in Concurrent Systems," Proceedings of the Twenty-Eighth Hawaii International Conference on System Sciences, Maui, Hawaii, Jan. 3-6, 1995.

[30] Workshop on Debugging and Performance Tuning of Parallel Computing Systems, Chatham, Mass., Oct. 3-5, 1994.

[31] Yan, Jerry C. and S. Listgarten, "Intrusion Compensation for Performance Evaluation of Parallel Programs on a Multicomputer," Proceedings of the Sixth International Conference on Parallel and Distributed Systems, Louisville, KY, Oct. 14-16, 1993.

[32] Yan, Jerry, "Performance Tuning with AIMS: An Automated Instrumentation and Monitoring System for Multicomputers," Proc. of the Twenty-Seventh Hawaii Int. Conf. on System Sciences, Hawaii, January 1994.

