Detecting Imperfect Patterns in Event Streams Using Local Search

Adele E. Howe
Computer Science Department
Colorado State University
Fort Collins, CO 80523 USA
[email protected] ABSTRACT Recurring patterns in event streams may indicate causal in uences. Such patterns have been used as the basis of software debugging. One technique, dependency detection, for nding these patterns requires exhaustive search and perfect patterns, two characteristics that are unrealistic for event streams extracted from software executions. This paper presents an enhanced version of dependency detection that uses a more exible pattern matching scheme to extend the types of patterns detected and local search to reduce the computational demands. The new version was tested on real and arti cial data to determine whether local search is eective for detecting strong patterns.
15.1 Introduction

The problem of inferring causality from empirical observations has been well studied. Several approaches are notable for efficiently constructing complex causal models of the inter-relationships between variables (e.g., [8, 3, 2]). These approaches tend to rely on correlations and co-variances among the variables as the basis for inferring causality. However, for some applications, the available data are categorical observations over time (e.g., event streams or execution traces of programs); for example, patterns in execution traces form the basis of several methods of debugging software (e.g., [1, 4]). These applications are less amenable to solution by methods based on correlation and co-variance. An alternative approach, called Dependency Detection, searches the event streams for frequently recurring sequences [6]. The set of recurring sequences (called dependencies) indicates events that commonly co-occur and forms a weak model of causality. Although promising, dependency detection is limited in several ways. First, the underlying search is exhaustive, looking for all possible dependencies in the data. As a consequence, the computational complexity increases exponentially with the length of the sequences. Second, the sequences are rigid, which means that only exact matches count and that any noise (i.e., insertion of some other unrelated event into the stream) will not count as an example of the sequence. Third, the technique considers only a single stream rather than multiple streams of parallel events; Oates et al. have developed a technique for multi-stream dependency detection [7]. This paper describes how local search with flexible matching has been used to overcome the first two limitations.

© 1996 Springer-Verlag. Learning from Data: AI and Statistics V. Edited by D. Fisher and H.-J. Lenz.
15.2 Finding Significant Sequences

The goal is to find sequences that repeat with unusual frequency in event streams. This section describes the basis for finding these sequences (dependency detection) and enhancements to the basic method that find longer sequences that are insensitive to minor variations (local search dependency detection).

15.2.1 Dependency Detection

Dependency Detection applies contingency table analysis to build a set of simple models of the causes of a particular event's occurrence. It was designed to be the first step in a partially automated debugging process, which was originally developed for the Phoenix planning system [5]. Consequently, the algorithm and the examples refer to failure recovery actions of the Phoenix planner and the failures that occurred during the planner's execution. It has also been used for detecting sources of search inefficiency in another planning system. Dependency detection searches Phoenix execution traces for statistically significant patterns: co-occurrences of particular precursors (events or combinations of events) leading to a target event (i.e., a failure).² The execution traces are sequences of events: alternating failures and the recovery actions that repaired them, e.g.,
Fa → Raa → Fb → Rzz → Fa → Raa → Fb.
The idea is that the appearance of repetitive sequences (which will be called patterns to distinguish potential models from the data), such as Raa → Fb, may designate early warnings of severe failures or indicate partial causes of the failures. Dependency detection exhaustively tested the significance of a well-defined set of patterns using contingency table analysis. The set of patterns is crucial because the patterns form the hypothesized models of causality for the data; a model that is not in the set will not be tested. The pattern set defines the size of the search space. To test each pattern, the algorithm needs a method of matching the pattern to the data and, to reduce redundancy and potentially conflicting models, a method of reconciling similar models. Dependency detection exploits characteristics of the Phoenix data: the set of patterns included all combinations of a precursor followed immediately by a failure; the precursor could be a single failure, a single recovery action or a two-event sequence of a failure and a recovery action. (The precursor-failure patterns are written as [F F], [R F] and [FR F], respectively.) While [R F] and [FR F] match the event streams exactly, the [F F] pattern matches by ignoring the value of the intervening recovery action. Similar patterns are reconciled by exploiting a property of the underlying statistic. The dependency detection algorithm has three steps.

1. Gather execution traces.
2. For each defined pattern of precursor and failure, a contingency table is built and tested for the appearance of the pattern in the execution traces.
3. Each significant pattern is tested for overlap with other similar significant patterns.

² As terminology, a pattern is composed of a precursor and a failure.

First, the algorithm gathers the data, which is alternating failures and recovery actions. Second, it steps through all possible patterns. For each pattern, it arranges four counts from the event streams in a 2x2 contingency table and applies a G-test³ to the table. Table 15.1 shows an example contingency table for the sequence [Raa Fb] (which indicates that the precursor is a recovery action of type aa and the failure is of type b). The four counts are: the number of times the precursor is followed by the target failure (upper left cell) and not followed by the target failure (upper right cell), as well as the number of times other precursors are followed by the failure (lower left cell) and not followed by it (lower right cell).

          Fb    not Fb
Raa        5      10
not Raa   10     100

TABLE 15.1. Sample contingency table from Dependency Detection for sequence [Raa Fb]

A G-test of the example table produces G = 6.79, p < .009, suggesting a dependency between the two events. For each significant pattern (α < .05), the third step prunes out similar (overlapping) patterns using a homogeneity G-test. Two patterns overlap when one pattern contains either a particular failure or a particular recovery action as precursor (i.e., is either [R F] or [F F]) and the other contains both (i.e., [FR F] where one of the events in the precursor is the same as in the singleton). The process exploits the additivity property of the G-test by treating the data for the [FR F] pattern as a partition of the [F F] or [R F] (whichever is being compared) data and results in a list of all significant patterns found in the execution traces. (Details of the pruning process can be found in [6].) Computationally, dependency detection requires a complete sweep of the execution traces to test each pattern.
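The G-test applied at this step can be sketched as follows — a minimal implementation for a 2x2 table of raw counts (function and variable names are illustrative, not the paper's code). The statistic is compared against a chi-square distribution with one degree of freedom, whose critical value at α = .05 is 3.84.

```python
import math

def g_test(table):
    """Log-likelihood ratio (G) statistic for a 2x2 contingency table.

    table: [[a, b], [c, d]] of observed counts.  Compare the result against
    the chi-square distribution with 1 degree of freedom (3.84 at alpha = .05).
    """
    row_sums = [sum(r) for r in table]
    col_sums = [sum(c) for c in zip(*table)]
    n = sum(row_sums)
    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            expected = row_sums[i] * col_sums[j] / n  # count under independence
            if observed > 0:
                g += observed * math.log(observed / expected)
    return 2.0 * g

# Score the sample table for [Raa Fb]; a value above 3.84 suggests a dependency.
print(g_test([[5, 10], [10, 100]]))
```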
The number of patterns equals the number of types of failures squared (for FF) plus the number of failures squared times the number of recovery actions (for FRF) plus the number of recovery actions times the number of failures (for RF). In total, the cost is roughly M·N³, where M is the length of the event traces and N is the number of types of events in the traces (i.e., the total number of types of recovery actions and failures). N is cubed to subsume the FRF case, in which the precursor has both a failure and a recovery action; if we wished to consider longer patterns of events, then the 3 would be replaced by the length of the desired pattern.

³ The G-test is similar to chi-square in its use and composition; its primary difference is that the results are additive, meaning that the summation of G results for partitions of a sample is equal to G for the entire sample.

15.2.2 Local Search Dependency Detection

Dependency detection is limited primarily by its reliance on exhaustive search of possible patterns. This was feasible initially because the search space included only all possible immediately preceding influences on failure; the traces contained at most eleven types of failure and eight recovery actions, producing, at most, 1177 possible combinations. The original algorithm was modified to include more flexible matching and a non-exhaustive, but computationally more manageable, search through local search. More flexible matching of patterns to data means that the event streams may be composed of many types of events, which need not alternate as before.⁴ The events can be of different values for several types. Any set of events with time stamps for ordering is a legal event stream. In this view, the event streams for the Phoenix data are described as having two types of events, failures and recovery actions, which include eleven and eight values respectively. The data is passed to the program in a file with three columns: time stamp, event type and event value.

Improving the Matching

The original dependency detection required an exact match of pattern constituents to events observed in the traces. Many domains may not produce uninterrupted sequences, e.g., ones in which parallel processes are incorporated in a single event trace or in which regularly occurring but uninteresting events disrupt patterns. To make the matching insensitive to the insertion of random events, the contingency table construction was changed to match a relative order pattern within a window. As before, the pattern consists of a precursor and a final target event. For example, we could use a precursor of any four recovery actions and a target of a particular failure. For the construction of a contingency table, we need to determine whether the precursor matches (thus, we add one to a count in the top row of the contingency table) and whether the target event matches (in which case we add one to a count in the left column of the table). In this algorithm, the precursor is a relative order of a specified number of events. A window designates an area over which the algorithm will match the precursor. A pattern matches a given window position when the target event is at the last position and when the elements of the precursor appear in the window in the same order as they do in the pattern.
If the window is exactly the length of the precursor, then the match must be exact, as in the original dependency detection algorithm; if the window length is greater than that of the precursor, then the match checks the relative order of the constituents of the precursor. For example, if the precursor is [A B] and the window is of length three, then the precursor will match a portion of the event streams in which A is followed immediately by B or in which A and B are separated by a single event. The algorithm for constructing the contingency table is:

1. Initially position the match window at the first position in the event streams at which the window's last position (the target) is over an event of the same type and at least W − 1 events precede it.
2. Repeatedly re-position the window as follows until the end of the event stream is reached:
   (a) Position the window with its last position over the next event of the same type as the target event.

⁴ The original dependency detection matched patterns to event streams based on the alternation in the data without needing to check what type of event happened next.
[Figure omitted: four positionings of a sliding window over the event stream A1 → B2 → A3 → B1 → A2 → B1 → A2 → A1 → B2 → A3, showing which contingency table cell each positioning increments for the pattern [A1 B2, A2].]

FIGURE 1. Sliding a window to construct a contingency table.
   (b) If the target event matches the last position, then prepare to add to a cell in the left column; otherwise, prepare to add to the right column.
   (c) If the precursor matches within the window, then add one to the top row in the column indicated by the last step; otherwise, add one to the bottom row in the column indicated as before.

For a given pattern, incidence counts are gathered by sliding an imaginary match window over the event streams. The window defines an area of length W. By default, this length is 2P + 1, which indicates that twice as many elements can be found in the window as are expected to be part of the precursor (2P), with 1 accommodating the target event in the pattern. In other words, we can allow P spurious events to slip between salient events. The window is re-positioned by moving its last position (the target) to the next event of the same type as the target. As in the original formulation, four counts are tallied for the contingency table: precursor with target, precursor without target, not precursor with target and not precursor without target. Each window positioning contributes to one and only one of the four cells in the contingency table. Figure 1 shows four positionings of the sliding window and their contribution to the contingency table for the [A1 B2, A2] pattern (the target is of type A and has the value A2). After sliding the window through all event streams, a G-test is run on the resulting contingency table.

Reducing the Search

The purpose of the search is to test models of co-occurrence and find those that seem to represent significant dependencies between events in time. Because of the computational problems of increasing the number and character of the potential models, the dependency detection algorithm was also modified to apply local instead of exhaustive search.
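The window-based table construction described above can be sketched as follows (a minimal sketch; the (type, value) event representation and all names are illustrative assumptions, not the paper's code):

```python
def count_table(stream, precursor, target, window):
    """Slide a match window over the stream and tally the 2x2 table.

    stream: events in time order, each a (type, value) tuple.
    precursor: list of (type, value) events matched in relative order.
    target: a (type, value) event; the window's last slot is positioned
    over every event of the target's *type*.
    Returns [[a, b], [c, d]]: rows are precursor matched / not matched,
    columns are target matched / not matched.
    """
    table = [[0, 0], [0, 0]]
    for i, event in enumerate(stream):
        # The last slot must cover a target-type event with W-1 events before it.
        if event[0] != target[0] or i < window - 1:
            continue
        body = stream[i - window + 1:i]   # the W-1 slots preceding the target slot
        col = 0 if event == target else 1
        it = iter(body)
        # Relative-order (subsequence) match of the precursor inside the window.
        row = 0 if all(e in it for e in precursor) else 1
        table[row][col] += 1
    return table
```

For a precursor of length P, the default window is W = 2P + 1, which lets P spurious events fall between salient ones.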
In this case, local search refers to a process of starting with a randomly selected initial pattern (a precursor and target event combination) and applying operators to find the best pattern reachable from it; the search continues until no better pattern can be found through operator application. Multiple initial starts of local search are made to collect different dependencies and explore different parts of the search space. Local search has several advantages over exhaustive search for this application. First, we can control how much time is devoted to search through the operators applied and the number of trials. Second, we can be assured of getting the strongest dependencies; local search prunes the number of dependencies to be considered and usually prunes spurious dependencies based on few data. The algorithm incorporates local search as follows:

1. Gather event streams of salient events.
2. Select parameters for the pattern: a specific target event, a length for the precursor (P) and a length for the window (W).
3. For each of T trials:
   (a) Construct an initial pattern by randomly selecting P events in some order.
   (b) Find the best pattern by searching from the initial pattern:
       i. Apply the permutation operator to the best pattern found so far.
       ii. For each permutation, test its significance by constructing a contingency table for the relative order matching and applying the G-test to it.
       iii. Select the best permutation.
       iv. If the best permutation is no better than the previous best, then halt the search and return the best pattern found; otherwise, repeat from step i.
4. Return a list of the best patterns found in each trial.

For the second step, as described in the previous section, patterns may be relative order, of a specified length, with a specific target event. The window length indicates the number of allowed spurious events. For the third step, the search finds the best related pattern: the one with the highest G value for its contingency table. For each of the T trials, the initial pattern is composed of a precursor of length P with the elements chosen randomly, plus the target event at the end. New patterns are generated by applying a local search operator to the current best. Several local search operators were considered that permute the pattern by reordering and replacing pattern elements, but the best operator, in terms of efficiency and efficacy, is 1-OPT. 1-OPT optimizes one element in the pattern at a time. Progressing through the pattern positions in order, this operator considers all available elements as replacements for the element in the current position and, by testing the possibilities, replaces it with the best element for that position.
The algorithm continues to apply 1-OPT to subsequent best patterns until the resulting G value cannot be improved. The G values are obtained by constructing contingency tables and applying the G test as was described in the last subsection. The output is a list of the patterns returned in each of the T trials.
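The multi-restart 1-OPT search can be sketched as follows. This is a self-contained sketch with an inline G computation; as a simplification of the scheme described earlier, the relative-order match here positions the window at every stream index rather than only over target-type events, and all names are illustrative.

```python
import math
import random

def g_score(table):
    """G statistic (log-likelihood ratio) for a 2x2 contingency table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    g = 0.0
    for i in range(2):
        for j in range(2):
            if table[i][j] > 0:
                g += table[i][j] * math.log(table[i][j] * n / (rows[i] * cols[j]))
    return 2.0 * g

def score(stream, precursor, target, window):
    """G value of the relative-order pattern (precursor, target)."""
    table = [[0, 0], [0, 0]]
    for i in range(window - 1, len(stream)):
        body = stream[i - window + 1:i]
        it = iter(body)
        row = 0 if all(e in it for e in precursor) else 1  # subsequence match
        col = 0 if stream[i] == target else 1
        table[row][col] += 1
    return g_score(table)

def local_search(stream, target, alphabet, p_len, window, trials=10, rng=None):
    """Multi-restart 1-OPT hill climbing over precursor patterns."""
    rng = rng or random.Random(0)
    results = []
    for _ in range(trials):
        best = [rng.choice(alphabet) for _ in range(p_len)]  # random initial precursor
        best_g = score(stream, best, target, window)
        improved = True
        while improved:                       # halt when no replacement raises G
            improved = False
            for pos in range(p_len):          # 1-OPT: optimize one position at a time
                for elem in alphabet:
                    cand = best[:pos] + [elem] + best[pos + 1:]
                    g = score(stream, cand, target, window)
                    if g > best_g:
                        best, best_g, improved = cand, g, True
        results.append((best, best_g))
    return results
```

Each trial returns the locally optimal pattern it converged to together with its G value, matching the list-of-best-patterns output described above.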
15.3 Results

Dependency detection, in its original form, has already been shown to be effective at finding interesting patterns in the Phoenix data. The two modifications described here extend the types of patterns found and the search process for identifying significant patterns. The utility of the pattern extension remains to be seen; we are currently exploring it in another project on debugging different planners. The change in search process will only be an improvement if significant dependencies are surrounded by basins of attraction for directing the search. If the topology of the search space is relatively flat, then the number of random starts required to find significant dependencies will need to be large, thus reducing or eliminating any computational advantage over brute force exhaustive search. Tests of the original dependency detection showed that similar patterns tended all to be significant. In other words, it was likely that if one pattern was significant, another pattern that shared all but one of the same elements would also be significant. In a sense, a strongly significant pattern is a basin of attraction in the search space; this characteristic is exploited by local search. To confirm whether such basins of attraction exist and can be exploited by the algorithm, the implementation was tested on execution traces from Phoenix and on synthetic data.

15.3.1 Finding Dependencies in Phoenix Data

The efficacy of local search was tested on several data sets from Phoenix. The results were similar for each data set and are reported for a data set with 946 events (alternating failures and recovery actions) in its event streams. For each of eight target failures (those that appeared in the event streams at least five times), the algorithm started with 100 randomly generated precursors of length three (which makes a pattern of length 4) using a window size of seven and searched for the highest rated neighbor that could be reached from each. Each precursor position could include either failures or recovery actions, for 18 possible values in each position. The searches found a significant dependency in about one-third of the trials, and half of those were highly significant (p < .01). As would be expected, many of the dependencies were found in more than one trial, and significant dependencies tended to be related to other significant dependencies.

Failure       1    2    3    4    5    6    7    8
# unique     14   10    5    6    8    8   10   14
# p < .01     9    2    3    3    8    1    3    7
Avg. depth  5.7  4.8  6.8  5.3  5.8  5.5  5.0  4.8

TABLE 15.2. Results of local search for 8 target failures with precursors of length 3, window of length 7, on a Phoenix data set. The first two rows give the number of trials out of 100 that found unique patterns and highly significant patterns. The last row indicates the average number of applications of the local search operator during a trial.
Table 15.2 summarizes the results for each target failure (designated by number) by the number of unique dependencies found, the number of dependencies found with p < .01, and the average search depth to find the dependencies. The search depth, which indicates the number of times the operator was applied, suggests a strong relationship between significant patterns. The 100 trials on the eight failures required three hours of CPU time on a SPARC IPX with the code written in Lucid Common Lisp. As shown in Table 15.3, searching for length four precursors exhibited similar, if slightly less successful, results. Highly significant dependencies were found in, on average, 18 out of 200 trials. In this case, the search tends to proceed much deeper, but this is balanced by many of the initial starts not appearing in the data, and many of their neighbors not appearing in the data either. Not too surprisingly, the 200 trials on the eight failures required twice as much computation (six hours). As would be expected, the dependencies found exhibited a high degree of overlap.
Failure       1    2    3    4    5    6    7    8
# unique     22   19   18   12   16   20    9   29
# p < .01    14   10   12    7   11    4    4   15
Avg. depth  8.5  7.3  7.6  7.5  8.1  7.0  6.9  6.8

TABLE 15.3. Results for precursors of length 4, window of length 9
As described earlier, a shorter pattern overlaps with a longer pattern when the longer pattern is the shorter pattern with events added to one or the other end. Over 50% of the length three precursors overlapped (i.e., were substrings of) the length four precursors. Additionally, comparisons to exhaustive search of longer dependencies showed overlap as well: nearly all the relative order dependencies were found within the absolute order dependencies.

15.3.2 Finding Dependencies in Synthetic Data

In addition to the Phoenix data, the local search algorithm was tested on finding dependencies in synthetic data. The synthetic data was used to probe a concern: Can local search find dependencies even when there is no relationship between similar patterns? The synthetic data was produced by generating token streams with biases. For this test, the bias was three patterns that were selected in advance. For each pattern, the precursor would appear at each position in the event streams with a likelihood of .05; if the precursor was inserted, then the target event followed it with a likelihood of .80. Twenty-seven synthetic data sets were generated, covering three numbers of tokens (15, 25, 35), three stream lengths (100, 500, 1000), and three pattern lengths (2, 3, 4). Local search found all the patterns of length 2 and 3 with 15 or 25 tokens and at least 500 events in the streams, but none of the patterns of length 4 with 15 or 35 tokens. In addition, few of the patterns were found when there were 35 tokens. These results suggest that local search can detect relative order dependencies even when the neighborhoods have less meaning. However, increasing the number of tokens and the length of the precursor means that local search is less likely to search the space effectively.
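The generation scheme described above can be sketched as follows (a sketch under the stated probabilities; the function and parameter names are illustrative, not those of the generator actually used):

```python
import random

def make_stream(tokens, length, patterns, rng=None, p_insert=0.05, p_target=0.80):
    """Generate a token stream biased toward the given patterns.

    Each pattern is (precursor_list, target).  At each position, with
    probability p_insert a pattern's precursor is emitted, followed by its
    target with probability p_target; otherwise a random token is emitted.
    """
    rng = rng or random.Random(0)
    stream = []
    while len(stream) < length:
        planted = False
        for precursor, target in patterns:
            if rng.random() < p_insert:
                stream.extend(precursor)           # plant the precursor
                if rng.random() < p_target:
                    stream.append(target)          # usually follow with the target
                planted = True
                break
        if not planted:
            stream.append(rng.choice(tokens))      # background noise token
    return stream[:length]
```

With 15 tokens and a 500-event stream, a planted two-element precursor occurs roughly an order of magnitude more often than the same bigram would by chance, which is what gives the planted patterns detectable G values.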
15.4 Conclusions

Although computational problems have been overcome, testing and collecting more dependencies exacts a cost in terms of certainty and model deluge. The algorithm requires serial testing of hypotheses: contingency table analysis is run on similar patterns for the same data repeatedly in the course of local search. Consequently, the probability threshold of .05 is no longer an accurate figure. We are currently investigating methods of adjusting the threshold down to compensate for the serial testing inherent in this algorithm. Given the fairly shallow depth of the search (the number of tests to be run during a search) and the low p values found for the most significant dependencies, it appears that adjusting the threshold will enhance the certainty of the results without unduly dropping the number of dependencies detected in the data.

From the point of view of someone interpreting dependencies, it is easy to be overwhelmed with the number found. Local search addresses this problem in two ways. First, only the most significant dependencies are likely to be found. Although the algorithm is doing multiple hill-climbs, it will likely find repeats, in subsequent trials, of the same pattern at the top of a large hill. Second, the search neighborhood in local search provides a natural clustering: the basins of attraction around most locally optimal patterns. These basins ensure that only one of a set of similar dependencies will be returned, and they can be exploited to cluster dependencies that contain the same subsequences. Subsequences that are repeated across dependencies are probably themselves significant or may indicate cases in which several events exert a similar influence. For example, in the Phoenix data, some of the failures were variants on the same problem (e.g., two of them were failures in calculating paths), and some of the recovery actions were variants on the same method (e.g., two of them replanned, but from different points in the plan). We would like to cluster related dependencies to find commonalities and to reduce the burden of looking over a long list. Clusters can be formed by finding locally optimal patterns and then backing out of the basin at individual positions to find other highly significant related patterns. The algorithm was augmented to "back out" of local maxima and gather patterns from the neighborhood that were also significant. If we examine patterns local to the maxima found in the test with precursors of length three and a window of length seven, then we find that at each position in the pattern at least one and as many as six out of sixteen other patterns are significant. With local search, we can find clusters of somewhat less significant but highly similar composition. Local search dependency detection is data directed, finds highly significant dependencies and provides a convenient method of clustering.
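The downward adjustment of the significance threshold discussed above could, for instance, take the form of a Bonferroni-style correction. This is an illustrative choice, not necessarily the adjustment under investigation, and the test counts below are likewise only illustrative:

```python
def adjusted_alpha(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni-style correction."""
    return alpha / n_tests

# A search of depth 6 that scores 16 alternative patterns per step runs
# roughly 96 G-tests, so the per-test threshold would drop from .05 to
# about .0005 (illustrative numbers).
print(adjusted_alpha(0.05, 6 * 16))
```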
As demonstrated on the Phoenix data, the local search method is effective at detecting long dependencies in data with considerable overlap in the elements of the dependencies: it finds a large number of the most significant dependencies in a manageable amount of computation time. However, not too surprisingly, when tested on synthetic data, it does not perform nearly as well when there is little relationship between the subsequences and the long sequences that form dependencies.

Acknowledgments: This research was supported by a Colorado State University Diversity Career Enhancement grant and ARPA/Rome Laboratory contract F30602-93-C-0100. The US Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright notation hereon. I wish to thank Darrell Whitley for his advice on the application of local search, Aaron Fuegi for his programming, Steve Leathard for running tests, Tim Oates for the synthetic data generator, and Douglas Fisher and anonymous reviewers for their patient and careful reading and suggestions for improvement.

15.5 References
[1] Peter C. Bates and Jack C. Wileden. High-level debugging of distributed systems: The behavioral abstraction approach. Technical Report 83-29, Department of Computer and Information Science, University of Massachusetts, Amherst, MA, March 1983.
[2] Paul R. Cohen, Lisa A. Ballesteros, Dawn E. Gregory, and Robert St. Amant. Regression can build predictive causal models. Technical Report 94-15, Dept. of Computer Science, University of Massachusetts/Amherst, 1994.

[3] C. Glymour, R. Scheines, P. Spirtes, and K. Kelly. Discovering Causal Structure. Academic Press, 1987.

[4] N.K. Gupta and R.E. Seviora. An expert system approach to real time system debugging. In Proceedings of the IEEE Computer Society Conference on AI Applications, pages 336-343, Denver, CO, December 5-7 1984.

[5] Adele E. Howe. Analyzing failure recovery to improve planner design. In Proceedings of the Tenth National Conference on Artificial Intelligence, pages 387-393, July 1992.

[6] Adele E. Howe and Paul R. Cohen. Detecting and explaining dependencies in execution traces. In P. Cheeseman and R.W. Oldford, editors, Selecting Models from Data: Artificial Intelligence and Statistics IV, volume 89 of Lecture Notes in Statistics, chapter 8, pages 71-78. Springer-Verlag, NY, NY, 1994.

[7] Tim Oates, Dawn Gregory, and Paul R. Cohen. Detecting complex dependencies in categorical data. In Preliminary Papers of the Fifth International Workshop on Artificial Intelligence and Statistics, January 1995.

[8] Judea Pearl and T. S. Verma. A theory of inferred causation. In J.A. Allen, R. Fikes, and E. Sandewall, editors, Principles of Knowledge Representation and Reasoning: Proceedings of the Second International Conference, pages 441-452, San Mateo, CA, April 1991. Morgan Kaufmann.