Recognizing End-User Transactions in Performance Management
J.L. Hellerstein, T.S. Jayram, I. Rish
IBM Thomas J. Watson Research Center, Hawthorne, New York
[email protected]
Abstract
Providing good quality of service (e.g., low response times) in distributed computer systems requires measuring end-user perceptions of performance. Unfortunately, in practice such measures are often expensive or impossible to obtain. Herein, we propose a machine learning approach to recognizing end-user transactions consisting of sequences of remote procedure calls (RPCs) received at a server. Two problems are addressed. The first is labeling previously segmented transaction instances with the correct transaction type. This is akin to work done in document classification. The second problem is segmenting RPC sequences into transaction instances. This is a more difficult problem, but it is similar to segmenting sounds into words as in speech understanding. Using Naive Bayes, we tackle the labeling problem with four combinations of feature vectors and probability distributions: RPC occurrences with the Bernoulli distribution and RPC counts with the multinomial, geometric, and shifted geometric distributions. Our approach to segmentation searches for sequences of RPCs that have a sufficiently high probability of being a known transaction type, as determined by one of our classifiers. For both problems, good accuracies are obtained, although the labeling problem achieves higher accuracies (85%) than does segmentation (70%).
1. Introduction
Providing good quality of service (e.g., low response times) to end-users of information systems is essential for eCommerce, among other applications. A first step is to characterize end-user transactions (EUTs). An EUT is a sequence of interactions between the end user and his/her workstation that reflects a logically complete unit of work, such as opening a database, opening a view, reading several records, and closing the database. Characterizing EUTs is needed to (a) better quantify end-user perception of performance, (b) create representative workloads, and (c) provide better resource management.

Copyright © 2000, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

This paper describes a machine
learning approach to recognizing EUTs. EUTs consist of a sequence of commands that end users issue to their workstation. In distributed systems, these commands typically cause remote procedure calls (RPCs) to be sent from the user's workstation to one or more tiers of servers that process the RPCs. Because end-user workstations are so numerous, and since they are often not the responsibility of the administrative staff, there is often little opportunity to collect information about EUTs from the workstation itself. Rather, it is at the servers where EUT information is obtained, in the form of RPC sequences. Unfortunately, little information about end-user transactions is present at the server. In principle, client-server protocols could be instrumented to mark the beginning and end of user interactions. However, this is not sufficient to identify EUTs since users often view a sequence of application interactions as a single unit of work. In current practice, this quandary is addressed either by using surrogates for EUTs (e.g., synthetic transactions generated by probing stations) or by labeling EUTs manually for post-processing. The former often leads to incorrect assessments of service quality. The latter is extremely time consuming. Indeed, it took multiple experts several weeks to segment and label the data we use in this paper. Herein, our objective is to identify RPC sequences that correspond to particular EUTs. This problem has two parts. The first is segmenting the stream of RPCs (from each user) into transaction instances. The second is labeling the segments with the correct transaction type. The first problem is similar to that faced in speech understanding, where an acoustic model is used to partition sounds into words. The second is akin to work in document classification. To illustrate the foregoing, we use the Lotus Notes email system. Common RPCs include OPEN DB, READ ENTRIES, and FIND BY KEY.
Given a time-ordered sequence of such RPCs from the same end user, we want to identify the beginning and end of EUTs and label each with its transaction type. Examples of the EUTs in Lotus Notes include: replication, search for a note, update notes, and re-sort view.

[Figure 1: Illustration of Transaction Recognition. Problem 1 (label segmented data): previously segmented RPC sequences are each assigned the most likely transaction type (e.g., Tx1, Tx2, Tx3). Problem 2 (both segment and label): an unsegmented RPC sequence such as 1, 2, 1, 3, 4, 1, 2, 3, 4 is partitioned into candidate segments; a segment is accepted when max P(Tx | segment) > p.]

Herein, we propose the use of machine learning techniques to recognize EUTs. For classification, we use a Naive Bayes framework. A classifier is specified by the choice of feature vector and by the conditional probabilities of each feature given a class. Our approach to segmentation searches for sequences of RPCs that have a sufficiently high probability of being a known transaction type, as determined by one of our classifiers. The results reported herein are of three types. First, we provide insight into a new problem domain for machine learning: recognizing end-user transactions to aid in performance management. Second, we demonstrate that Naive Bayes works well for labeling EUTs, in that it provides an accuracy of approximately 85% (with over 30 transaction classes). Third, we describe an approach to segmenting transactions that attains an accuracy close to 70%. The remainder of this paper is organized as follows. Section 2 describes the data characteristics and discusses the probabilistic models used for labeling and segmentation. Section 3 details our results for labeling transaction instances, and Section 4 does the same for labeling with segmentation. Section 5 discusses related work. Our conclusions are contained in Section 6.
2. Data Characteristics
This section describes the data characteristics and discusses the probabilistic models that we use in subsequent sections for classification and segmentation. Our data are obtained from a Lotus Notes email server at a large oil company. The data consist of traces of individual RPCs collected during two one-hour measurements of the email interactions of several hundred users. Included in the trace are the type of RPC (e.g., OPEN COLLECTION), the identity of the server connection (which identifies a single user), and the time (in seconds) at which the request is made. In addition, we have the results of the segmentation and labeling done by Lotus Notes experts, a process that took several weeks.

Fig. 1 depicts the two problems we address. In problem 1, the sequence of RPCs has been separated into sessions, which in turn have been segmented into transaction instances. The task here is to label each instance with the correct transaction type based on the RPCs in the instance. For example, in the figure, the third transaction instance, which consists of two RPCs of type 3 and one RPC of type 1, is labeled as Tx3. In problem 2, the task is to both segment RPCs into transaction instances and to label these instances. For example, the RPC sequence (1, 2, 1, 3, 4, 1, 2, 3, 4) is segmented into four transaction instances that are labeled Tx1, Tx2, Tx3, and Tx2.

Our data are organized into two data sets. Each dataset contains approximately 1,500 transaction instances and about 15,000 RPC instances. There are 32 different types of transactions and 92 RPC types. Fig. 2 displays the marginal distributions of transaction and RPC types. Note that while the distribution of RPCs is similar for the two datasets, the distribution of transaction types is quite different. Further, the distributions are highly skewed. We structure these data into transaction instances. Instances have a variable number of RPCs, some of which may be of the same type. The transaction type is used as our class variable. Our feature vector is a transformation of the sequence of RPC instances within a transaction instance. Our choice of feature vectors is based on what has been employed in related literature (e.g., document classification) and ease of computation. Two feature vectors are considered. Both are represented as a vector of length M (the number of RPC types). The first is the occurrence of each RPC type. Referring to Fig. 1, M = 4 and the value of the occurrence feature vector for the third transaction instance is (1, 0, 1, 0). A second feature vector is RPC counts. Again referring to Fig. 1 and the third instance, the value of the count feature vector is (1, 0, 2, 0).
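The two feature vectors can be sketched as follows. This is our illustration, not code from the paper (the function names are ours), using M = 4 and the RPC type numbering of Fig. 1.

```python
from collections import Counter

def count_features(rpcs, M):
    """Count feature vector: the number of RPCs of each type 1..M."""
    c = Counter(rpcs)
    return [c.get(j, 0) for j in range(1, M + 1)]

def occurrence_features(rpcs, M):
    """Occurrence feature vector: 1 if an RPC type appears at all, else 0."""
    return [1 if n > 0 else 0 for n in count_features(rpcs, M)]

# Third transaction instance of Fig. 1: two RPCs of type 3, one of type 1.
print(occurrence_features([3, 3, 1], 4))  # [1, 0, 1, 0]
print(count_features([3, 3, 1], 4))       # [1, 0, 2, 0]
```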
Applying Naive Bayes to these feature vectors requires a way to estimate the conditional probability of an RPC type occurring within an instance of a transaction type. For occurrences, a Bernoulli distribution is used. Thus, for each combination of RPC type and transaction type, we estimate P(o_ij = 1 | T_i), where o_ij = 1 if an RPC of type j occurred in a transaction instance of type i, and 0 otherwise. For counts, we need to estimate P(n_ij | T_i) for each i, j and each value of n_ij, where n_ij is the number of RPCs of type j in a transaction instance of type i. Our approach to these estimation problems is to consider several parametric distributions. The multinomial has been used elsewhere for document classification (Nigam et al. 1998; McCallum & Nigam 1998). While good results have been reported, this distribution is technically not well suited for Naive Bayes since it imposes a dependency on the relationship between the counts of RPCs (since they must sum to the number of RPCs in the
transaction instance).

[Figure 2: Marginal Distributions of RPC Types and Transaction Types. Counts of each transaction type (1-32) and each RPC type (1-92) for Data Set 1 and Data Set 2.]

An alternative is the geometric distribution. This distribution is widely used to describe performance characteristics of queueing systems such as computer systems (Kleinrock 1975). A closer look at the nature of client-server protocols suggests a third distribution that is a variation on the geometric. Specifically, client-server interactions are broadly of two types. The first are fixed overheads, such as opening a database or accessing a collection of objects. Once this has been done, "payload" operations may take place, such as reads and writes. This suggests that we should mix deterministic distributions with distributions that have substantial variability. It turns out that a variant of the geometric distribution can accommodate these needs. The variant, which we call the shifted geometric distribution, includes a shift parameter δ_ij that specifies the minimum count for RPCs of type j in a transaction instance of type i. Namely, P(n_ij | T_i) = p_ij^(n_ij − δ_ij) (1 − p_ij) for n_ij ≥ δ_ij, and P(n_ij | T_i) = 0 if n_ij < δ_ij. Thus, a geometric distribution with a shift parameter of 3 and a probability parameter of 0 is a deterministic distribution. The maximum likelihood estimators for the shifted geometric can be found in a straightforward way.

Which distribution function best fits the RPC counts in our data? Answering this question is complicated by the fact that there are 308 distributions: one for each combination of transaction type and RPC type that occurs in our data. (Actually, for the multinomial, there is only one distribution for each transaction type.) We proceed as follows. For each distribution function relating RPC and transaction types, we calculate its parameters using standard maximum-likelihood estimators. Then, we use Monte Carlo techniques to generate a set of synthetic transaction instances in accordance with the parameters of the distribution function.
                      Data Set 1    Data Set 2
  Multinomial            2238          1928
  Geometric               490           398
  Shifted Geometric       192           178

Figure 3: Chi-Square Statistics for Fits of Parametric Distributions

A large number of synthetic instances is generated in order to achieve a very low variance in the estimation of distribution quantiles. Using a Chi-square goodness of fit test, we compare the empirical distribution of each function's synthetic instances with the empirical distribution of our data. The Chi-square statistics have the interpretation that a lower number indicates that the distribution family better approximates the data. The results are reported in Fig. 3. While no distribution provides a very good fit (primarily due to the frequency of zero counts), it is clear that the shifted geometric provides the best fit and the multinomial the worst.
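As we reconstruct it from the garbled original, the shifted geometric pmf is P(n) = p^(n−δ) (1 − p) for n ≥ δ. The following sketch of the pmf and its maximum-likelihood fit uses our notation (δ for the shift); the closed-form estimators are standard, not quoted from the paper.

```python
def shifted_geometric_pmf(n, p, delta):
    """P(n) = p**(n - delta) * (1 - p) for n >= delta, else 0.
    With p = 0 all mass sits at n = delta (0**0 == 1 in Python),
    i.e., the deterministic case the paper describes."""
    if n < delta:
        return 0.0
    return p ** (n - delta) * (1.0 - p)

def fit_shifted_geometric(samples):
    """Maximum-likelihood fit: delta is the minimum observed count,
    and p/(1 - p) equals the mean excess of the counts over delta."""
    delta = min(samples)
    mean_excess = sum(n - delta for n in samples) / len(samples)
    p = mean_excess / (1.0 + mean_excess)
    return p, delta

p, delta = fit_shifted_geometric([2, 2, 3, 4])  # delta = 2, p = 0.75 / 1.75
```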
3. Labeling
This section describes our approach to assigning transaction types to previously segmented transaction instances. This is a classification task, where C = {T_1, ..., T_n} is the set of possible classification labels (transaction types), and f_k = (R_1, ..., R_M) is the feature vector computed for the k-th transaction instance, with R_j denoting the feature corresponding to RPCs of type j. We consider two feature types: RPC occurrences and RPC counts. We use a Naive Bayes classifier. Given transaction instance k, we seek the T_i that maximizes P(T_i | f_k). Applying Bayes rule gives P(T_i | f_k) = P(f_k | T_i) P(T_i) / P(f_k). Since Naive Bayes assumes conditional independence between features given the class, we get P(f_k | T_i) = ∏_j P(f_k(j) | T_i), where f_k(j) is the value of the j-th feature in the k-th transaction instance.
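The Naive Bayes decision rule can be sketched as follows (in the log domain to avoid underflow). This is our illustrative sketch, not the paper's implementation; the per-class feature distributions P(f(j) | T_i) are assumed to be supplied as callables.

```python
import math

def naive_bayes_label(features, priors, cond_prob):
    """Return the class T maximizing log P(T) + sum_j log P(f(j) | T).

    priors: dict mapping class -> P(T)
    cond_prob: dict mapping class -> list of callables, where
               cond_prob[c][j](v) = P(feature j has value v | class c)
    """
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for j, value in enumerate(features):
            # Floor probabilities to avoid log(0) for unseen feature values.
            score += math.log(max(cond_prob[c][j](value), 1e-300))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy Bernoulli example (two transaction types, two occurrence features):
def bernoulli(q):
    return lambda v: q if v == 1 else 1.0 - q

priors = {"Tx1": 0.5, "Tx2": 0.5}
cond = {"Tx1": [bernoulli(0.9), bernoulli(0.1)],
        "Tx2": [bernoulli(0.2), bernoulli(0.8)]}
print(naive_bayes_label([1, 0], priors, cond))  # Tx1
```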
We consider four classifiers, each specified by a feature type and a feature distribution: RPC occurrences with the Bernoulli distribution, and RPC counts with the multinomial, geometric, and shifted geometric distributions. These choices are based on previous work in document classification and on the data analysis presented in the previous section. We also consider two different parameter estimators for the Bernoulli model: (1) the maximum-likelihood estimator and (2) a nonstandard estimator that coincides with the maximum-likelihood estimator of the parameter of the geometric distribution. As it turns out, the second estimator yields better accuracy on the segmentation task for our data.

The accuracy of the classifiers is gauged in two ways. The first is the fraction of correctly labeled transactions. This is comparable to metrics typically used in text classification. The second approach is to measure the fraction of correctly labeled RPCs. The latter weights longer transaction instances more heavily. It turns out that the two metrics produce results that are within a few percent of each other, so we report results only for the first metric. Our methodology sets aside 10% of the transaction instances as a test set. A subset of the remaining data is used as the training set. We varied the size of this subset to see the effect of training set size. In each run, the same subset was used for all four classifiers, and thirty runs were done for each training set size. The resulting accuracies had a standard deviation of under 1%.

Figure 4 presents the results of experiments on data set 1. The x-axis is training set size, and the y-axis is average labeling accuracy. Note that accuracy generally improves with the size of the training set. All of the classifiers have an accuracy in excess of 75%. This is quite competitive with the literature on document classification (e.g., (Nigam et al. 1998; McCallum & Nigam 1998)) and is much better than a classifier that always chooses the transaction type that occurs with highest probability in the data (which would provide an accuracy of about 10%). Further, observe that almost all classifiers, except for the shifted geometric one, provide very similar accuracy results. The counts-with-shifted-geometric classifier has accuracies that are consistently lower than the others. This is in contrast to Fig. 3, which shows that the shifted geometric provides the best fit to the data. Therefore, the best-fitting distribution may not necessarily provide the best classifier. Another interesting observation is that occurrences-with-Bernoulli, counts-with-multinomial, and counts-with-geometric yield very similar results. This is usually not the case in text classification. For example, others have shown that the counts-with-multinomial classifier is significantly better than occurrences-with-Bernoulli (McCallum & Nigam 1998). That our data do not abide by this principle suggests that their characteristics may in some ways differ from text classification.

[Figure 4: Classification results (data from COC1, labeling task). Transaction labeling accuracy (y-axis, 0 to 1) versus training set size (x-axis, 0 to 1400) for five classifiers: Bernoulli with geometric estimate, Bernoulli with ML estimate, multinomial, geometric, and shifted geometric.]

4. Segmentation and Labeling

Considered next is the more difficult problem of both segmenting RPCs into transaction instances and labeling these instances. Our approach, a heuristic, uses the classifiers presented in the previous section to determine the most probable transaction type for a candidate transaction instance. If this type is not sufficiently probable, then a different segmentation is considered. Our Transaction Segmentation and Labeling (TSL) algorithm takes as input (a) time-ordered RPCs from the same session, (b) a classifier (which is assumed to have been trained previously and computes its own feature vector) that is used to label candidate transaction instances, and (c) a probability threshold ε. As shown in Fig. 5, the algorithm operates as follows. First, it initializes the start pointer to the start of the RPC sequence. It then considers successive subsequences of RPCs. For each, the algorithm invokes the classifier to find the most likely transaction type T_i and its associated probability p_i. If p_i > ε, the subsequence is marked as a transaction instance of type T_i. However, if the entire sequence of RPCs is scanned and no sufficiently probable transaction is found, then the algorithm increases the start pointer by one. Thus, not all RPCs will be placed in transaction instances.

Assessing the accuracy of a segmentation algorithm is complicated by the fact that its transaction instances may overlap, but not necessarily coincide, with instances in the test data. As a result, our accuracy metric here is the fraction of correctly labeled RPCs. We used TSL in combination with all four classifiers. The results are shown in Fig. 6. As expected, for this more complex task, accuracies are lower than for labeling alone: 70% vs. 85%. Also, in contrast to the labeling problem, where 4 of the 5 classifiers performed equivalently, here we see a clear ranking of the classifiers. Counts-with-multinomial is the best, followed by occurrences-with-Bernoulli using the geometric estimator, and counts-with-geometric. Again, counts-with-shifted-geometric provides worse accuracy, but most surprisingly, occurrences-with-Bernoulli using the maximum-likelihood estimator is even worse. These results suggest that the choice of estimator may have a significant impact on the performance of a classifier. Note that the comparison between multinomial and Bernoulli models in the document classification literature (McCallum & Nigam 1998) was done using maximum-likelihood estimators; it favored the multinomial for large vocabulary sizes. An interesting approach would be to apply Bernoulli with the biased estimator to the document classification domain.
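The RPC-level accuracy metric can be computed as below. This is our sketch, assuming each RPC carries a predicted transaction label (None when TSL leaves it outside any instance) and an expert label; the function name is ours.

```python
def rpc_label_accuracy(predicted, truth):
    """Fraction of RPCs whose predicted transaction label matches the
    expert label; RPCs left unassigned (None) count as incorrect."""
    assert len(predicted) == len(truth)
    hits = sum(1 for p, t in zip(predicted, truth) if p is not None and p == t)
    return hits / len(truth)

print(rpc_label_accuracy(["Tx1", "Tx1", None, "Tx2"],
                         ["Tx1", "Tx1", "Tx3", "Tx3"]))  # 0.5
```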
5. Related Work and Discussion
To the best of our knowledge, the problem of recognizing end-user transactions has not yet been ad-
Algorithm TSL(R, Classifier, ε)
Inputs:
  R - an RPC sequence
  Classifier - a transaction instance classifier
  ε - probability threshold
Output:
  Seg - a segmentation

start = 1; Seg = ∅                    % initialize
while start < size(R)
  for end = start+1 to size(R)
    S = {R(start), ..., R(end)}
    [Ti, pi] = Classifier(S)
    if pi ≥ ε
      Seg = Seg ∪ {(Ti, start, end)}
      start = end + 1
      break
    end if
  end for
  if end
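A minimal executable sketch of TSL, as we read the (truncated) pseudocode above: the greedy scan accepts the shortest sufficiently probable subsequence, and the advance-by-one fallback is our reading of the cut-off listing. The classifier is assumed to return a (label, probability) pair.

```python
def tsl(rpcs, classifier, eps):
    """Greedy Transaction Segmentation and Labeling: from each start
    position, accept the first subsequence whose most likely label is
    sufficiently probable; otherwise leave the RPC unassigned."""
    segments = []          # (label, start, end), 0-based inclusive indices
    start = 0
    while start < len(rpcs):
        matched = False
        for end in range(start + 1, len(rpcs)):
            label, prob = classifier(rpcs[start:end + 1])
            if prob >= eps:
                segments.append((label, start, end))
                start = end + 1
                matched = True
                break
        if not matched:
            start += 1     # this RPC falls outside any transaction instance
    return segments

# Toy classifier: confident only on the exact subsequence [1, 2].
def toy(sub):
    return ("TxA", 0.9) if sub == [1, 2] else ("TxB", 0.1)

print(tsl([1, 2, 3, 1, 2], toy, 0.5))  # [('TxA', 0, 1), ('TxA', 3, 4)]
```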