A Flexible and Scalable Approach to Navigating Measurement Data in Performance Management Applications

Proceedings of the Second International Conference on Systems Management, June 19-21, 1996, Toronto, Canada

Robert F. Berry
IBM Personal Systems Products, 11400 Burnet Road, Austin, Texas 78758

Joseph L. Hellerstein
IBM Research Division, T.J. Watson Research Center, Yorktown Heights, New York 10598

Abstract


Managing the performance of large, distributed systems requires flexible and scalable approaches to automating measurement navigation. Unfortunately, existing approaches achieve scalability by severely limiting flexibility. Considered here is an approach that infers navigations from a dimensional representation of the measurement name space. Doing so provides flexible navigation and results in dramatic improvements in scalability, as quantified by analytic models herein developed. Indeed, our models indicate that it is inherently unscalable to automate navigation by requiring the specification of relationships between measurement names, as is done in existing approaches. In contrast, the dimensional approach is optimal for the class of data sources considered in our models. Exploiting the dimensional approach requires addressing issues such as: irregularities in the measurement name space; mappings between the name space used for measurement collection and storage and the dimensionally structured name space; and efficient storage of measurement names. Solutions are proposed for all of the foregoing.

1 Introduction

The cost of computer hardware and software has decreased dramatically with the widespread use of distributed computing systems. Unfortunately, distributed systems often have huge operating costs. For example, Infonetics Research reports an annual operating cost of $7,500 per LAN-connected user, an amount that far exceeds hardware and software costs. This paper addresses a key element of these operating costs: managing system performance.

Performance management encompasses activities such as resource monitoring, tuning, and capacity planning. Many applications automate aspects of performance management. Examples include: [6], which makes recommendations on disk tuning; [5], which aids in tuning mainframe computer systems; and [1], which addresses capacity planning. Despite these efforts, the cost of performance management remains high.

What are the impediments to automating performance management? It is our experience that a central difficulty lies with navigating measurement data. For example, identifying performance bottlenecks often requires navigating large volumes of measurement data, as does assessing the impact of configuration changes. The navigation problem can be stated as follows: Given a measurement variable (e.g., response time for a bank-balance transaction), identify related variables that should be examined next (e.g., delays associated with client workstations, networks, and database servers). Making such associations typically requires expert knowledge, especially in distributed systems. However, experts are in short supply.

Automating measurement navigation requires incorporating the knowledge of human experts into the navigation application. This should be done in a manner that is both flexible and scalable. Flexibility is required to ensure that all related variables can be discovered. Without navigational flexibility the conclusions drawn and/or the actions taken may be unduly restrictive or even incorrect. Scalability is essential because of the large number of variables and their complex interrelationships. Indeed, widely used approaches to automating navigation require that human experts specify all possible relationships between measurement variables. Such approaches scale poorly, unless navigational flexibility is severely curtailed.

This paper describes an approach to measurement navigation that provides both flexibility and scalability. Our approach dynamically infers measurement relationships instead of having human experts specify measurement relationships a priori. This is accomplished by employing a dimensional approach in which measurement names are represented as tuples of coordinates. The dimensional approach dramatically improves scalability. For example, we show that to achieve flexible navigation in a modest sized cluster of compute servers, one commonly used approach to navigation requires the manual specification of over $10^{13}$ measurement relationships; the dimensional approach requires specifying only 90 coordinate relationships.

Others have employed the dimensional approach to navigation in areas such as financial analysis and product planning (e.g., [12], [14], and [17]). Further, the dimensional representation of measurement names has been incorporated into existing performance monitors to handle dynamic information, such as workloads and hardware configurations (e.g., [19] and [8]). However, it does not appear that these monitors exploit the dimensional representation for measurement navigation. Our contributions are: (a) applying the dimensional approach to navigation in performance management, especially identifying dimensions and coordinates for commonly used data sources (e.g., in Unix, mainframes, and networks); (b) formalizing the algorithms employed; (c) quantifying the benefits derived; and (d) identifying and addressing design and implementation issues.

The remainder of this paper is organized as follows. Section 2 provides background on measurement navigation in performance management and describes common approaches to its automation. Section 3 develops the dimensional approach. In Section 4, analytic models are constructed to quantify the benefits provided by the dimensional approach. Section 5 discusses our results, including design considerations. Our conclusions are contained in Section 6.

2 Background

This section provides background on measurement navigation in performance management and describes common approaches to automated navigation. We begin with an example that provides the context for two navigation scenarios. The example is based on data available in mainframe computers [10], and is similar to Unix data such as [16].

A measured value contains both a "who" and a "what" component. The "who" component is workload information: applications that are grouped together for reporting or analysis purposes. The "what" component is metric information: quantifiable characteristics of an executing program (e.g., delays). In the example, workloads are organized into a three level hierarchy (as summarized in Fig. 1). At the lowest level are individual address spaces in which applications execute. At the next level are performance groups, which are collections of address spaces that reflect business reporting (e.g., all applications run by the claims-processing department at an automobile insurance company). At the highest level is the entire system.

Workload classifications
• system - all workloads in the system
• performance group (PG) - administrative domain
• address space - individual application

Metrics
• workflow (wfl) - normalized service rate
• using (usg) - receiving service
• processor using (pru) - receiving CPU
• device using (dvu) - doing disk IO
• delay (dly) - waiting for a resource
• processor delay (prd) - waiting for the CPU
• device delay (dvd) - waiting to do disk IO

Figure 1: Notation Used in Data Source for Scenarios

Seven metrics are considered. Workflow (denoted by wfl) provides an indication of system "health." It is a normalized service rate that is computed as the ratio (in percent) of the time spent receiving service to the time spent either receiving service or being delayed for it. Workflow takes on values between 0 and 100: 0 means that the workload is always delayed; 100 means that it is never delayed. Two classes of resources are considered in the example: processor (or CPU) and user I/O devices. A workload may use the CPU (denoted by pru) or perform disk I/O (dvu). In addition, the workload may wait for either the CPU (prd) or for disk I/O (dvd). Total time spent using disk or CPU is denoted by usg, where usg = pru + dvu. Similarly, total delay is denoted by dly, where dly = prd + dvd. For workload aggregations (e.g., performance groups), pru and dvu are summed across the values of group members (e.g., address spaces) to reflect better the capacity consumed; thus, usg = pru + dvu may exceed 100. Delays are averaged so as to indicate the impact on individual members of the workload aggregation; thus, dly may not exceed 100.

Fig. 2 displays the data used in the example. (Values have been rounded to the nearest percent.) Columns of the table are metrics; rows are workloads. There are two performance groups: PG1, which contains address spaces AS1 and AS2; PG2, which contains AS3. A cell within the table corresponds to a measurement name. Thus, the cell in the first row of the second column corresponds to the measurement name system-usg. The number within the cell is the value associated with the measurement name.

                                 Metrics
Workload   wfl      usg      pru       dvu   dly      prd      dvd
system     86 (1)   200      100 (a)   100   33       27       6
PG1        70 (2)   103      10 (b)    93    46       41       8
AS1        89 (3)   89       6         83    11       7        4
AS2        14 (3)   14 (4)   4         10    86 (4)   75 (5)   11 (5)
PG2        97 (2)   97       90 (b)    7     3        0        3
AS3        97       97       90 (c)    7     3        0        3

• AS1 and AS2 are in PG1
• AS3 is in PG2
• Superscripts (shown here in parentheses) indicate measurement navigations

Figure 2: Data Used in Navigation Scenarios

At this point, we introduce several definitions. A measurement variable consists of a measurement name and its associated value(s). The measurement name space (or just name space) for a data source is the set of all names for measurement variables for that data source. For example, the name space for the data source in Fig. 2 has forty-two elements that are constructed by pairing each row label with each column label. The measurement value space (or just value space) of a data source is the set of values associated with measurement names. In Fig. 2, these values begin with the number 86 in the upper left and end with 3 in the lower right.

Measurement navigation can be done in the name space, the value space, or a combination of the two. Herein, the focus is on the name space since it provides a basis for explaining the navigations done based on the variable relationships used. To illustrate name space navigation, we present a navigation scenario for the data in Fig. 2.
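Before turning to the scenario, the definitions of name space and value space can be made concrete with a minimal sketch. The Python fragment below is an illustration only, not part of the original data source: it transcribes the values of Fig. 2, builds the name space by pairing row and column labels, and checks the workflow definition (time using divided by time using or delayed, in percent) against the tabulated wfl values. The identifiers `WORKLOADS`, `METRICS`, `ROWS`, `variables`, and `workflow` are assumed names, and the one-point tolerance is an assumption that allows for the rounding noted for Fig. 2.

```python
from itertools import product

WORKLOADS = ["system", "PG1", "AS1", "AS2", "PG2", "AS3"]
METRICS = ["wfl", "usg", "pru", "dvu", "dly", "prd", "dvd"]

# Values transcribed from Fig. 2, one row per workload, in METRICS order.
ROWS = {
    "system": [86, 200, 100, 100, 33, 27, 6],
    "PG1":    [70, 103, 10, 93, 46, 41, 8],
    "AS1":    [89, 89, 6, 83, 11, 7, 4],
    "AS2":    [14, 14, 4, 10, 86, 75, 11],
    "PG2":    [97, 97, 90, 7, 3, 0, 3],
    "AS3":    [97, 97, 90, 7, 3, 0, 3],
}

# Name space: every row label paired with every column label.
name_space = [f"{w}-{m}" for w, m in product(WORKLOADS, METRICS)]

# Measurement variables: each measurement name mapped to its value.
variables = {
    f"{w}-{m}": value
    for w in WORKLOADS
    for m, value in zip(METRICS, ROWS[w])
}


def workflow(usg, dly):
    """Normalized service rate in percent: using / (using + delayed)."""
    return 100.0 * usg / (usg + dly)


if __name__ == "__main__":
    print(len(name_space), "measurement names")
    for w in WORKLOADS:
        computed = workflow(variables[f"{w}-usg"], variables[f"{w}-dly"])
        print(w, variables[f"{w}-wfl"], round(computed, 1))
```

Because the table entries are rounded, the recomputed workflow can differ slightly from the tabulated value, which is why the comparison is made with a tolerance rather than exact equality.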

Scenario I

(1) The analyst begins by looking at system-wfl, which is an indicator of overall system health. For the purposes of this scenario, the analyst's objective is that system-wfl > 90, which is not the case in Fig. 2.

(2) Having detected a problem, the analyst tries to identify the affected workloads. To this end, the analyst employs knowledge about relationships between two workload classifications: the entire system and performance groups. Thus, the analyst navigates from system-wfl to PG1-wfl and PG2-wfl.

(3) Since PG1-wfl is much smaller than PG2-wfl, the former is pursued in more detail. Doing so demands that the analyst know the address spaces that comprise PG1. With this knowledge, the analyst navigates from PG1-wfl to AS1-wfl and AS2-wfl.

(4) Seeing that AS2-wfl is very low, the analyst seeks more detail. Since the analyst knows that wfl is a function of usg and dly, the analyst navigates from AS2-wfl to AS2-usg and AS2-dly.

(5) Since AS2-dly is quite large, the analyst considers its components. Knowing that dly = prd + dvd, the analyst navigates to AS2-prd and AS2-dvd.

Act: Seeing that AS2-prd is much larger than AS2-dvd, the analyst seeks the cause of AS2's large CPU delays.

These steps are indicated by the numeric superscripts in Fig. 2. Note that in addition to navigation, the above scenario considers analysis issues as well. For example, the navigation in step (3) is motivated by the analysis "PG1-wfl is smaller than PG2-wfl." Such considerations are beyond the scope of this paper but are addressed elsewhere (e.g., [9]).

How can the navigations in Scenario I be automated? A widely used approach is to employ explicit representations of possible navigations. Common representations are rules (e.g., [5], [7]) and graphs (e.g., [3], [11], [18]). While these representations differ in their specifics, their capabilities are fairly similar. For pedagogical purposes, we use a measurement navigation graph (MNG) [3]. In a MNG, measurement names are represented by nodes, and relationships between measurement names are indicated by directed arcs. Fig. 3 displays a MNG for the navigations performed in Scenario I. For example, the navigation from system-wfl to PG1-wfl and PG2-wfl is represented explicitly in Fig. 3 by arcs from system-wfl to PG1-wfl and PG2-wfl. In current practice, such relationships are, for the most part, specified manually.

system-wfl
    PG1-wfl
        AS1-wfl
        AS2-wfl
            AS2-usg
            AS2-dly
                AS2-prd
                AS2-dvd
    PG2-wfl

Figure 3: Measurement Navigation Graph (MNG) for Scenario I

Given a MNG, navigation is readily automated by employing algorithms that traverse arcs to determine the children (or parents) of a measurement variable. Employing such automation means that analysts need not be as concerned with the details of individual data sources. Thus, it may be possible for analysts to be generalists in terms of solving performance problems and rely on automated measurement navigation to cope with the specifics of individual data sources. Doing so has great appeal since it makes expert analysts more productive, and it makes novice analysts more skilled. Further, automating navigation is a prerequisite for automating higher-level tasks, such as diagnosis and problem resolution.

While Fig. 3 provides the basis for automating the navigations in Scenario I, it does not help with many other navigations of interest. For example, the action prescribed in Scenario II requires knowing which address spaces consume the most CPU. These navigations are described in the following scenario.

Scenario II

(a) The analyst looks at system-pru to ensure that there is no unused CPU.

(b) The analyst uses knowledge of the performance groups in the system to isolate the workloads consuming the most CPU. Thus, there is a navigation from system-pru to PG1-pru and PG2-pru.

(c) PG2-pru accounts for 90% of the CPU consumed, which motivates the analyst to seek more detail about this performance group. Since the analyst knows that only AS3 lies within PG2, there is a navigation from PG2-pru to AS3-pru.

Act: The analyst examines the AS3 applications to determine why they consume so much CPU.

(The foregoing navigations are indicated by the superscripts (a), (b), and (c) in Fig. 2.) Automating these navigations requires another MNG.

How many MNGs must be constructed in order to automate all navigation scenarios? To address this question, we consider a more extensive example: a cluster of compute servers (e.g., [15]) in which the same application may be concurrently running on multiple computers. For simplicity, we consider a data source that provides measurement variables for thirty servers (labelled $s_1, \ldots, s_{30}$), thirty workloads (labelled $w_1, \ldots, w_{30}$), and thirty metrics (labelled $m_1, \ldots, m_{30}$). We add to this: $s_0$, which represents all servers (to obtain a cluster-wide view); $w_0$, which represents all workloads (to obtain a system-wide view from the end-user's perspective); and $m_0$, which provides a service-level perspective (e.g., response time). Thus, the counterpart to Fig. 2 would be a three dimensional table with approximately 30,000 cells, each representing a measurement name. We denote measurement names in the same hyphenated manner as before: $s_i$-$w_j$-$m_k$, which refers to the $i$th server, the $j$th workload, and the $k$th metric.

For this data source, navigation typically begins with $s_0$-$w_0$-$m_0$, which quantifies overall performance of the servers and workloads (e.g., application weighted, cluster-wide response times). From here, three possible navigations are:

• To determine if some servers have unusually poor performance, the analyst views the service-levels provided by each server for all of its workloads. That is, there is a navigation to $s_1$-$w_0$-$m_0, \ldots, s_{30}$-$w_0$-$m_0$.

• To determine if some applications are performing poorly, the analyst views the service-levels provided by each workload across all servers on which it executes. That is, there is a navigation to $s_0$-$w_1$-$m_0, \ldots, s_0$-$w_{30}$-$m_0$.

• To determine if there are potential resource bottlenecks in the aggregate, the analyst looks at cluster-wide metrics of resource utilizations (e.g., disk I/Os, CPU utilization, calls to network file system servers) that are aggregated across servers and workloads. That is, there is a navigation to $s_0$-$w_0$-$m_1, \ldots, s_0$-$w_0$-$m_{30}$.

For each measurement name reached, two navigations from it are possible (one for each of the dimensions that has not been navigated along). Thus, the number of MNGs grows rapidly. Fig. 4 depicts portions of one MNG for this data source. In Section 4, we show that the total number of MNGs is huge: approximately $10^{9}$. Constructing these MNGs requires specifying a parent for each measurement name (except the root). Thus, the total number of measurement relationships that must be specified is approximately $30{,}000 \times 10^{9} \approx 10^{13}$.

Figure 4: Measurement Navigation Graph (MNG) for Thirty Server Example

Although the foregoing is consistent with many existing performance management applications, it is somewhat of a worst case in two respects. First, many of the MNGs are quite similar to one another. Thus, instead of constructing separate graphs, it may be more efficient to have pointers to common subtrees (e.g., as in [4]). Such an extended measurement navigation graph (EMNG) is depicted in Fig. 5. Nodes are either measurement names, which are indicated by rectangles, or navigation-types, which are indicated by circles. By the latter, we mean whether navigation is done by server, workload, or metric. Here, only one EMNG is required to represent all possible name-space navigations for the data source. As shown in Section 4, the resulting EMNG contains 162,000 arcs that terminate at name nodes plus another 183 arcs that terminate at navigation-type nodes. In comparison to the MNG approach, using an EMNG greatly reduces the number of relationships that must be specified; but it is still well beyond what humans can cope with. Further, constructing an EMNG in practice is even more demanding since our example considers only a modest number of servers, workloads, and metrics.

Figure 5: Extended Measurement Navigation Graph (EMNG) for Thirty Server Example

Another way in which our analysis is worst case is its assumption that all possible MNGs are of interest. Since only a small fraction of the MNGs are in fact used, providing just these MNGs can result in significant efficiencies. However, a priori selection of MNGs is difficult since the specific navigations required depend on the problem being solved. Indeed, our experience has been that a priori selection greatly limits the utility of the resulting applications.

3 Dimensional Approach

This section describes an approach to navigating measurement data in a manner that provides both flexibility and scalability. Doing so requires: (a) not using explicit representations of measurement names and (b) avoiding explicit representations of relationships between measurement names.

Addressed first is the representation of the measurement names. Returning to the example in Fig. 2, recall that there are forty-two measurement names that are obtained by pairing every row label with every column label. One way to represent the name space is to list all forty-two names. An alternative, and more scalable representation, is to infer measurement names from more fundamental information: the row and column labels. Hereafter, we refer to such labels as coordinates. In the dimensional approach, the name space in Fig. 2 becomes a two dimensional entity consisting of a:

• workload dimension with coordinates: system, PG1, AS1, AS2, PG2, AS3
• metric dimension with coordinates: wfl, usg, pru, dvu, dly, prd, dvd

A measurement name is an element in the cross product of these two sets, such as (system, wfl). More generally, suppose there are $K$ dimensions, and $C_k$ is the set of coordinates for the $k$th dimension. If $c_1 \in C_1, \ldots, c_K \in C_K$, then $(c_1, \ldots, c_K)$ is a measurement name. We note in passing that this is consistent with the relational approach to monitoring described in [20] (although [20] does not exploit the dimensional structure for navigation).

Addressed next is how the dimensional representation of measurement names allows for the dynamic construction of MNGs. Returning to Fig. 2 and to Scenario I (whose navigations are indicated by the numeric superscripts in the table), note that navigations (1), (2), and (3) use the same metric coordinate (i.e., wfl) but different workload coordinates. This is a consequence of exploiting relationships within the workload dimension. That is, navigation (2) uses the fact that system contains PG1 and PG2, and (3) uses the knowledge that PG1 contains AS1 and AS2. Navigations (4) and (5) have the same workload coordinate (i.e., AS2) but different metric coordinates. Here, insight into the metric dimension is employed: navigation (4) exploits the relationship wfl = usg/(usg + dly); (5) uses dly = prd + dvd.

In the dimensional approach, a graphical representation is used to express relationships between coordinates within the same dimension. We refer to such a graph as the coordinate hierarchy for the dimension. Fig. 6 depicts the coordinate hierarchies for the workload and metric dimensions of the data source in Fig. 2.

Fig. 7 and Fig. 8 illustrate the operation of the dimensional approach in Scenario I. The figures are divided into four columns: the navigation step, the workload hierarchy, the metric hierarchy, and the measurement names formed. Navigation takes place by traversing nodes in a coordinate hierarchy and then taking the cross product of the coordinates chosen to produce a new measurement name. In step (1), the coordinate system is chosen from the workload dimension, and wfl is selected from the metric dimension. (Selected coordinates are indicated by the bold and italicized font.) Taking the cross product of the selected coordinates results in the single name (system, wfl). This name has an asterisk next to it, which means that navigation continues from this name to the next step. In step (2), navigation proceeds along the workload dimension in that arcs are traversed from system to its children: PG1 and PG2. Taking the cross product of the coordinates selected results in the measurement names (PG1, wfl) and (PG2, wfl). Navigations along the metric dimension are depicted in Fig. 8.

The Fig. 7 and Fig. 8 navigations are equivalent to constructing dynamically the MNG in Fig. 3. This can be seen by drawing an arc from the name with an asterisk in step $i-1$ to the measurement names in step $i$.

To formalize the foregoing, we describe how the dimensional approach traverses coordinate hierarchies to infer navigations in the measurement name space. Let $K$ be the number of dimensions, with coordinate sets $C_1, \ldots, C_K$. Further, let $n = (c_1, \ldots, c_K)$ be a measurement name. We show how to construct navigate[$n$], the subset of the name space reached by using the dimensional approach. Navigation proceeds by traversing arcs in the coordinate hierarchy of one or more dimensions. By so doing, a new set of coordinates is identified from which measurement names are constructed. We use the term navigation function to specify the mapping from $c_k$ to the subset of $C_k$ that is navigated to. The navigation function for the $k$th dimension is denoted by $f_k$. Examples of navigation functions are: children-Of[$c_k$], the set of child coordinates of $c_k$; parents-Of[$c_k$], the set of parent coordinates of $c_k$; and identity-Of[$c_k$] = $c_k$. Hence, navigate[$n$] = $\{(c'_1, \ldots, c'_K), \text{ where } c'_k \in f_k[c_k]\}$. This is summarized in Fig. 9.

Given:
• $C_1, \ldots, C_K$ - the coordinates for dimensions $1, \ldots, K$
• $n$ - the measurement name from which navigation occurs, where $n = (c_1, \ldots, c_K) \in C_1 \times \cdots \times C_K$
• $f_1, \ldots, f_K$ - the navigation functions, where $f_k$ returns a subset of $C_k$

Navigation computed: navigate[$n$] = $\{(c'_1, \ldots, c'_K), \text{ where } c'_k \in f_k[c_k]\}$.

Figure 9: Navigation in the Dimensional Approach

To illustrate how navigation works in the dimensional approach, consider step (2) in Fig. 7. Here, $n$ = (system, wfl), $f_1$ = children-Of (so, $f_1$[system] = {PG1, PG2}), and $f_2$ = identity-Of. In performance management, this is referred to as a decomposition (or drill-down) in the workload dimension. That is, navigate[(system, wfl)] = {(PG1, wfl), (PG2, wfl)}. Alternatively, consider a navigation back from $n$ = (AS2, prd) along the metric dimension (i.e., from step (5) to step (4) in Fig. 8). Here, $f_1$ = identity-Of, $f_2$ = parents-Of (so, $f_2$[prd] = {dly}). Thus, navigate[(AS2, prd)] = {(AS2, dly)}.

The dimensional approach is not limited to two dimensions. For example, consider the data source in the previous section with servers, workloads, and metrics. Here, a measurement name such as server 1 claim-processing delays can be expressed as the ordered triple (server 1, claim-processing, delays), where: server 1 is the coordinate from a server dimension; claim-processing is the coordinate from a workload dimension; and delays is the coordinate from a metric dimension. Suppose that delays can be partitioned into CPU-delays, IO-delays, and memory-delays. Then, using $f_1$ = identity-Of, $f_2$ = identity-Of, and $f_3$ = children-Of, we navigate to {(server 1, claim-processing, CPU-delays), (server 1, claim-processing, IO-delays), (server 1, claim-processing, memory-delays)}.
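The navigation computation summarized in Fig. 9 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the coordinate hierarchies are encoded as parent-to-children dictionaries inferred from the relationships described in the text (system contains PG1 and PG2; wfl depends on usg and dly; and so on), and the names `children_of`, `parents_of`, `identity_of`, and `navigate` are hypothetical. One deliberate difference: `identity_of` returns a one-element set so that every navigation function yields a subset of its coordinate set.

```python
from itertools import product

# Coordinate hierarchies (parent -> children), inferred from the text.
WORKLOAD_HIERARCHY = {
    "system": ["PG1", "PG2"],
    "PG1": ["AS1", "AS2"],
    "PG2": ["AS3"],
}
METRIC_HIERARCHY = {
    "wfl": ["usg", "dly"],
    "usg": ["pru", "dvu"],
    "dly": ["prd", "dvd"],
}


def children_of(hierarchy):
    """Build a navigation function returning the child coordinates."""
    return lambda c: set(hierarchy.get(c, ()))


def parents_of(hierarchy):
    """Build a navigation function returning the parent coordinates."""
    return lambda c: {p for p, kids in hierarchy.items() if c in kids}


def identity_of(c):
    """Navigation function that stays on the same coordinate."""
    return {c}


def navigate(name, functions):
    """Cross product of f_k[c_k] over all dimensions k."""
    coordinate_sets = [f(c) for f, c in zip(functions, name)]
    return set(product(*coordinate_sets))


if __name__ == "__main__":
    # Step (2) of Scenario I: drill down along the workload dimension.
    print(navigate(("system", "wfl"),
                   [children_of(WORKLOAD_HIERARCHY), identity_of]))
    # From step (5) back to step (4): move up along the metric dimension.
    print(navigate(("AS2", "prd"),
                   [identity_of, parents_of(METRIC_HIERARCHY)]))
```

Only the arcs inside each coordinate hierarchy are stored; the measurement names themselves are generated on demand, which is the source of the scalability argument made for the dimensional approach.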

formation that humans must specify in order to automate navigation. For MNGs, this information consists of relationships between measurement names; for EMNG, it is the relationships between names as well as between names and the navigation-types; for the dimensional approach, it is the relationships expressed in the coordinate hierarchies. The data source analyzed is a generalization of the multiple server example presented in Section 2. Measurement names are structured as K dimensions. Each coordinate hierarchy has a single root and J leaves. The root coordinate in the kth dimension is denoted by ck . We begin with the MNG approach. For the data source considered, each MNG is a tree, and its root is (c ;    ; cK ). A MNG must contain each variable name, of which there are (J +1)K (one of which is the root). Thus, T[K; J], the total number of measurement relationships to specify, is

2

2

0

0 1

T[K; J] = G[K; J]((J + 1)K ? 1); where G[K; J] is the number of MNGs. To calculate G[K; J], we proceed as follows. There are K levels below the root (which correspond to the K navigations that are possible). Further, there are K ways to navigate from the rst to the second level in a MNG. In addition, observe that if name n has children n0 and n00, then there are MNGs for0 each combination of the possible subtrees rooted at n and n00. So,

1

2

0

3

4 Analytic Comparisons

This section constructs analytic models to provide detailed comparisons between the MNG, EMNG, and dimensional approaches to navigating measurement data. The comparisons consider the amount of in-

G[K; J] = K(G[K ? 1; J])J : 7

METRIC HIERARCHY

WORKLOAD HIERARCHY

wfl

system

usg PG1 pru AS1 dvu AS2 dly PG2 prd AS3 dvd

Figure 6: Coordinate Hierarchies in Fig.2 Example between the rst two levels of name nodes in the EMNG. Clearly, there are K navigation-type nodes between the rst two level of name nodes. Thus, there are K arcs from (c ;    ; cK ) to the rst level of navigation-type nodes. From each of these navigationtype nodes, there are J arcs to the second level of name nodes. Thus, T 0 [K; J] = K + KJT 0 [K ? 1; J]: Since the last two levels of name nodes have no intervening navigation-type nodes, T 0[1; J] = J: Solving the recurrence, we have

Solving the recurrence, we have G[K; J] = exp[

X J iln[K ? i]];

K ?1

0 1

i=0

where exp[x] = ex and ln[y] is the natural logarithm of y. Thus, T[K; J] = exp[

X J iln[K ? i]]((J + 1)K ? 1):

K ?1 i=0

(1)

Considered next is the EMNG approach. Only one EMNG is needed to describe all possible navigations. The total number of arcs in the EMNG is denoted by the function T 0 [K; J]. An EMNG contains two kinds of nodes: name nodes and navigation-type nodes. For the data source considered, the EMNG is a tree, and its root is (c ;    ; cK ). Observe that the levels in the tree alternate between name nodes and navigation-type nodes, except for the last two levels (where there is no choice as to the dimension along which navigation is done). Thus, there are K + 1 levels of measurement nodes, and K ? 1 levels of navigation-type nodes. For convenience, we index separately the levels of the name and navigation-type nodes. The recurrence we develop is between levels of name nodes. Thus, T 0 [K; J] is interpreted as the total number of arcs in a subtree with K navigation-type nodes at its second level and J related measurement names for each navigation chosen. Consider the recurrence 0 1

T 0[K; J] = =

0

PiK? J i? PK? J i?

+ (K!)(J K ? )T 0 [1; J] + (K!)(J K ): i (2) Note that the sum ranges from 1 to K ? 1 since it corresponds to the levels of the navigation-type nodes. Now we consider the dimensional approach. Here, we count inter-coordinate arcs in the dimension hierarchies. Let T 00[K; J] denote this total. Since there is one arc for each leaf in a dimension, T 00 [K; J] = KJ: (3) Eq. (1), Eq. (2), and Eq. (3) quantify the relative complexity of using the three approaches to automating navigation. For the MNG approach, the num-

0

8

1

1

1

1

=1

=1

K! K ?i)! K! (K ?i)! (

1

Navigation Step

Workload Hierarchy

system

(1)

wfl

wfl

system

usg pru dvu dly prd dvd

PG1 AS1 AS2

PG2 AS3 (3)

* (system, wfl)

usg pru dvu dly prd dvd

PG1 AS1 AS2 PG2 AS3

(2)

Measurement Names

Metric Hierarchy

* (PG1, wfl)

(PG2, wfl)

wfl

system

usg pru dvu dly prd dvd

PG1

AS1 AS2 PG2 AS3

(AS1, wfl) * (AS2, wfl)

* - name navigated from

bold-italic

- coordinates used to form measurement names

Figure 7: Dimension-Based Navigation for Scenario I: Part 1

ber of measurement relationships that must be specified grows at a staggering rate as either K or J increases. This is indicated by the exponential and factorial terms in Eq. (1). Using an EMNG can greatly reduce the number of relationships that must be specified, as indicated by Eq. (2), although here too there are factorial and exponential terms. The dimensional approach scales much better, as evidenced by the absence of factorial and exponential terms in Eq. (3). Putting this in the context of the server example in Section 2 (K = 3, J = 30):

• MNG, from Eq. (1): $T[3, 30] \approx 10^{13}$

• EMNG, from Eq. (2): $T'[3, 30] = 162{,}183$

• Dimensional, from Eq. (3): $T''[3, 30] = (3)(30) = 90$

Beyond comparing these specific approaches to navigation, our analysis points to a more fundamental issue as well: dealing directly with the name space is inherently unscalable. This is a consequence of the large growth in measurement names when there is an increase in either the number of dimensions (K) or the number of coordinates (J). Specifically, the data source herein analyzed requires specifying at least $J^K$ relationships just to account for leaf measurement names (those from which no further navigation is possible). For this reason, Eq. (1) includes the term $(J + 1)^K$, and Eq. (2) contains the term $J^K$.

A final observation is that for the data source considered the dimensional approach is in some sense optimal. To see this, observe that flexible navigation requires being able to navigate to all measurement names from $(c_1, \ldots, c_K)$. Further, any name that is navigated to must have at least one nonroot coordinate. Thus, a lower bound for the number of inter-coordinate links is the number of leaf coordinates in the dimensional hierarchies: $KJ$. But $T''[K, J] = KJ$. Thus, the number of relationships that must be specified for the dimensional approach equals this lower bound.

[Figure 8 diagram not reproduced. Panels: Navigation Step (4)-(5), Workload Hierarchy (system; PG1, PG2; AS1, AS2, AS3), Metric Hierarchy (wfl; usg, dly; pru, dvu, prd, dvd), and Measurement Names: (AS2, usg), * (AS2, dly), (AS2, prd), (AS2, dvd). Legend: * marks the name navigated from; bold-italic marks the coordinates used to form measurement names.]

Figure 8: Dimension-Based Navigation for Scenario I: Part 2

5 Discussion

This section discusses considerations for employing the dimensional approach in practice. What data sources lend themselves to the dimensional approach? Answering this requires addressing the problem of dimensional design. This has two parts:

1. selecting dimensions and coordinates
2. specifying the coordinate hierarchies

Regarding the first, typical dimensions in distributed systems are node, metric, and time. Indeed, multi-node data collected from Unix sources frequently have this three dimensional structure (e.g., sar, nfsstat, netstat, iostat, and nfstat [13]). Often, a workload dimension is included as well, for example if there is process-level information. Further, the metric dimension may be a composite of two or more other dimensions if it refers to characteristics of distinct resources (e.g., CPU, I/O, and paging). Network data can be more complex still. For example, LAN traffic data (such as that collected by Ethernet sniffers) can be structured as: source node, destination node, protocol, metric (e.g., throughput, response time), and workload (e.g., tagged by TCP or UDP port).

Our experience has been that coordinate hierarchies have varying levels of complexity. For example, metric hierarchies are often flat, consisting of just a service level indicator (e.g., the total call rate in nfsstat or the length of the run queue in sar) with arcs to the remaining metrics, although there are exceptions to this rule (e.g., the RMF Monitor 3 data source, which has a complex, multi-level structure in its metric dimension [10]). The node dimension often has a great deal of structure, especially if there are multiple administrative units (e.g., marketing, accounts receivable) and several classes of machines (e.g., file servers, compute servers, and client workstations). Another example of a complex structure is the protocol dimension for network traffic since this mirrors the protocol stack (e.g., NFS packets are a type of UDP packets). A somewhat less obvious example of structure lies within the time dimension. Here, structure results from data aggregations in that individual samples may be combined into collection intervals (e.g., every minute), which in

turn may be accumulated into parts of a day (e.g., by hour), which are further aggregated into days, and then into months.

To date, the major problem we have faced with dimensional design relates to irregularities in the measurement name space. For example, a transaction processing application may report rate information that is not provided by a database application. We have dealt with name space irregularities in three ways.

The first is to combine two or more dimensions, such as merging the workload and metric dimensions to account for differences in the metrics reported by applications. But there are drawbacks to this. First, unless the dimensions being combined have few coordinates, the number of coordinates in the combined dimension may be large and the coordinate hierarchy may be very complex. Second, combining dimensions complicates their maintenance (e.g., when additional metrics are added) and confuses the semantics of navigation.

A second approach to dimensional design is to partition the name space, such as having a separate name space for transaction classes and database applications. The difficulty here is navigating between the name spaces. However, this appears to be relatively tractable if there is an easy way to recognize when the same dimension and coordinate are used in different name spaces (e.g., by using naming conventions or data models).

A third approach is to cope with fictitious measurement names. These are measurement names that exist in the cross product of coordinates but for which there is no measured value. For example, consider a measurement source that produces the variables TP-rate, TP-util, TP-service, DB-util, and DB-service. Now, suppose we use a two dimensional representation of the name space with $C_1 = \{TP, DB\}$ and $C_2 = \{rate, util, service\}$. Then, $(DB, rate) \in C_1 \times C_2$, even though there is no DB-rate in the original data. Such situations can be approached in several ways. For example, if the number of fictitious names is modest, they could be listed separately and consulted whenever a measurement name is constructed by the navigation algorithm. Alternatively, there could be combining rules attached to the coordinates in each dimension. Still another possibility is to associate nil values with fictitious names. While nil values cause difficulties for analysis applications that use dimensional navigation to select data upon which calculations are performed (e.g., averages), these difficulties can be addressed by using generalized arithmetic operators (which are easily implemented in object-oriented programming languages that support polymorphism).

Another consideration in the dimensional approach is mapping the name space in which measurement values are stored into the dimensionally structured name space. Sometimes this is fairly easy, for example if the measurements are stored in a relational database such as described in [20], since the attributes of the relational table may correspond to the dimensional structure. In other situations, measurement data are stored in variable length records with descriptor fields that specify which variables are present. Here, additional code is required to do the mapping (which may be included in the programs that interpret the data records).

A third consideration is the storage consumed by employing a dimensional representation of measurement names. In principle, with each measurement value must be stored a vector of coordinates that specifies the associated measurement name. If implemented in this manner, the memory requirements could be large. However, the dimensional approach readily lends itself to more efficient storage of measurement names. Central to doing so is the concept of a dense subset of the name space. Let $C_1 \times \cdots \times C_K$ be the name space, and let $S$ be a subset of it. $S$ is dense in the name space if there exist $C_1' \subseteq C_1, \ldots, C_K' \subseteq C_K$ such that $S = C_1' \times \cdots \times C_K'$. If $S$ is dense, then the data can be stored in the same manner as is used to linearize a K-dimensional array that is indexed by $C_1', \ldots, C_K'$. If $S$ is not dense, then multiple dense sets are required, which increases the overhead of the dimensional representation.

Our final remark relates to the fact that several existing navigation tools almost certainly employ elements of the dimensional approach. For example, tools such as [8] and [19] report measurement variables whose existence is dynamic, such as user-related metrics. Thus, these tools must employ something akin to a dimensional representation of measurement names since the name space is not known a priori. However, we are unaware of any tool that supports the flexible navigation offered by the dimensional approach. Our speculation is that while the tools may use (at least in part) a dimensionally structured name space, they do not exploit this structure for navigation.

6 Summary and Conclusions

This paper addresses a key requirement for automating performance management: providing flexible and scalable navigation of measurement data. Flexibility is needed so that analysts (or programs) can discover all related measurement variables, thereby enabling the full range of diagnoses and/or actions. Scalability (in terms of what humans must specify to support automation) is essential to cope with the large number of entities in today's distributed computing systems as well as the interactions between these entities. Existing approaches to measurement navigation achieve scalability by severely limiting flexibility.

Herein, we advocate a dimensional structure to measurement names. That is, measurement names consist of ordered tuples of coordinates from distinct classification criteria (or dimensions). For example, the measurement variable server 1 claim-processing delays can be expressed as the ordered triple (server 1, claim-processing, delays), where: server 1 is the coordinate from a server dimension; claim-processing is the coordinate from a workload dimension; and delays is the coordinate from a metric dimension. Navigation proceeds by inferring related measurement names based on the structure of coordinates within a dimension. Thus, if delays can be partitioned into CPU-delays, IO-delays, and memory-delays, we can navigate in the metric dimension to (server 1, claim-processing, CPU-delays), (server 1, claim-processing, IO-delays), and (server 1, claim-processing, memory-delays).

The dimensional approach offers huge scalability benefits over existing approaches. To demonstrate this, we develop general analytic models to compare the number of relationships that must be specified to achieve flexible navigation for three approaches: measurement navigation graphs (MNG), extended measurement navigation graphs (EMNG), and the dimensional approach. These models are then applied to a data source consisting of thirty servers, thirty workloads, and thirty metrics. The MNG approach requires specifying approximately $10^{13}$ relationships; the EMNG requires 162,000; the dimensional approach requires 90. Such dramatic improvements in scalability result from the dimensional approach using coordinate hierarchies to infer relationships between measurement names instead of requiring explicit representations of the relationships between measurement names. Indeed, our analytic models indicate that the latter approach is inherently unscalable. In contrast, the dimensional approach is optimal for the class of data sources considered in the models.

Implementing the dimensional approach requires addressing several issues. Foremost is specifying the dimensions and coordinates, which is difficult for data sources that have an irregular name space (e.g., metrics that apply to only some of the workloads). Another consideration is mapping the name space used to store (or collect) measurement data into the dimensionally structured name space. Further, there are issues associated with the overheads of maintaining name-space information for large quantities of data. Solutions are proposed for all of the foregoing.
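The ideas summarized above can be illustrated with a minimal Python sketch. It is not the implementation used in our prototypes. It assumes each dimension's coordinate hierarchy is stored as a parent-to-children mapping, and every name in it is hypothetical: `HIERARCHIES`, `navigate`, `links_specified`, the `fictitious` parameter, and the coordinates "all-servers", "server 2", "all-workloads", and "billing". The sketch infers related measurement names by drilling down in one dimension, optionally skips fictitious names, and counts the parent-child links that a person would have to specify.

```python
from typing import Dict, FrozenSet, List, Tuple

# One hierarchy per dimension: parent coordinate -> child coordinates.
# The contents are illustrative assumptions, loosely based on the
# (server 1, claim-processing, delays) example in the text.
Hierarchy = Dict[str, List[str]]
Name = Tuple[str, ...]

HIERARCHIES: List[Hierarchy] = [
    {"all-servers": ["server 1", "server 2"]},                 # server dimension
    {"all-workloads": ["claim-processing", "billing"]},        # workload dimension
    {"delays": ["CPU-delays", "IO-delays", "memory-delays"]},  # metric dimension
]


def navigate(name: Name, dim: int,
             hierarchies: List[Hierarchy] = HIERARCHIES,
             fictitious: FrozenSet[Name] = frozenset()) -> List[Name]:
    """Drill down in dimension `dim`: swap that coordinate for each child."""
    children = hierarchies[dim].get(name[dim], [])
    related = [name[:dim] + (child,) + name[dim + 1:] for child in children]
    # Drop names that exist in the cross product but have no measured value.
    return [n for n in related if n not in fictitious]


def links_specified(hierarchies: List[Hierarchy]) -> int:
    """Count the parent-child links specified across all dimensions."""
    return sum(len(kids) for h in hierarchies for kids in h.values())


start: Name = ("server 1", "claim-processing", "delays")
related_names = navigate(start, dim=2)
for related in related_names:
    print(related)

# Scaling check for K = 3 flat dimensions with J = 30 coordinates each.
K, J = 3, 30
flat = [{f"root{d}": [f"c{d}_{j}" for j in range(J)]} for d in range(K)]
print(links_specified(flat))  # K * J links specified
print(J ** K)                 # leaf measurement names in the cross product
```

Under these assumptions, the three flat hierarchies need only K·J = 90 specified links, whereas the cross product contains J^K = 27,000 leaf measurement names that would otherwise have to be related explicitly.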


Over the last three years, we have developed several prototypes that incorporate the dimensional approach to navigation, including one that operates in realtime. All of the prototypes employ a graphical user interface that allows analysts to interact directly with plots to specify the kinds of navigations desired. The reaction has been uniformly positive. Indeed, within a few minutes, analysts typically discover new insights as a result of the freedom to navigate large quantities of data in a flexible manner.

Much work remains to realize the full benefits of the dimensional approach. For example, what guidelines can be provided for dimensional design, especially when integrating data from multiple sources? How can the dimensional approach be integrated with analysis techniques, such as those used in performance tuning to determine which variables to navigate from? How does dimensional design relate to hardware and software configuration, and how can the dimensional approach be applied to other system management function areas (e.g., fault and security management)? We plan to address these questions in our future work.

Acknowledgements

We wish to thank the conference referees for their helpful comments.

References

[1] B. Arinze, M. Igbaria, and L.F. Young: "A Knowledge Based Decision Support System for Computer Performance Management," Decision Support Systems 8, 501-515, 1992.
[2] R. A. Backman: "Easy to Use Performance Tools With a Consistent User Interface Across HP Operating Systems," Hewlett Packard Journal 42, 65-70, 1991.
[3] Robert F. Berry and Joseph L. Hellerstein: "A Unified Approach to Interpreting Measurement Data in Performance Management Applications," First IEEE Conference on Systems Management, University of California, Los Angeles, May, 1993.
[4] Robert F. Berry and Mark Maccabee: "An Object-Oriented Data Model for Automation of Computer Performance Management," Proceedings of the Computer Measurement Group, 1991, pp. 213-223.
[5] Bernard Domanski: "A PROLOG-based Expert System for Tuning MVS/XA," Proceedings of the Computer Measurement Group, 160-166, 1987.
[6] Boole & Babbage: DASD ADVISOR Technical Backgrounder, Boole & Babbage, Inc., 10BAB34.BGR, December, 1987.
[7] Computer Associates: "CA-MINDOVER," Computer Associates, 711 Stewart Avenue, Garden City, New York, 1993.
[8] Candle Corporation: "OMEGAVIEW Overview," Candle Corporation, Santa Monica, California, 1991.
[9] J.L. Hellerstein: "A Comparison of Techniques for Diagnosing Performance Problems in Information Systems," ACM Sigmetrics, 1994.
[10] IBM: Analyzing Resource Measurement Facility Version 4 Monitor III Reports, (LY28-1008-2) IBM Corporation, 1990.
[11] Adam E. Irgon, Anthony H. Dragoni, Thomas O. Huleatt: "FAST: A Large Scale Expert System for Application and System Software Performance Tuning," ACM Sigmetrics, 151-156, 1988.
[12] Lotus: "Improv for Windows," Lotus, 55 Cambridge Parkway, Cambridge, MA, 1992.
[13] Mike Loukides: "System Performance Tuning," O'Reilly & Associates, Inc., Sebastopol, CA, 1991.
[14] David McGoveran: "Modelling and Analysis for Large Databases," Alternative Technologies, Boulder Creek, CA, 1993.
[15] William G. Pope: "The IBM Watson Research Central Compute Cluster (The Farm)," IBM Research Report, RC 17759, 1992.
[16] Unix International Performance Management Working Group: "Performance Management Activities Within Unix International," Computer Measurement Group Transactions, Summer, 37-44, 1993.
[17] Joe Re : "Dynamic-Viewing Spreadsheets," Byte, November, 1994, pp. 255-258.
[18] Y.C. Shim and C.V. Ramamoorthy: "Monitoring and Control of Distributed Systems," Proc. of First International Conference on Systems Integration, 672-681, April 1990.
[19] Annie W. Shum: "Visualizing the Parallel Universe," Proceedings of the BFGS Annual Users Group, BGS Systems, Inc., Waltham, MA, December 5, 1994.
[20] Richard Snodgrass: "A Relational Approach to Monitoring Complex Systems," ACM Transactions on Computer Systems, 157-196, 1988.