Issues in Event Abstraction
Thomas Kunz
Institut für Theoretische Informatik, Technische Hochschule Darmstadt
Abstract. Debugging distributed applications is very difficult, due to a number of problems. To manage the inherent complexity of distributed applications, for example, the use of abstractions is proposed. Event abstractions group sets of events into one higher-level event. Only event sets with certain properties guarantee proper abstraction. This paper examines two specific event set structures in more depth: complete precedence abstractions and contractions. Its main results are as follows. First, it is shown how the algorithmic detection of complete precedence abstractions can be simplified. Second, an additional structural requirement for contractions is derived to ensure their complete timestamping.
1 Introduction
Debugging distributed applications is commonly thought to be very difficult [3]. One of the problems is that distributed applications are inherently more complex than sequential ones. This problem is usually dealt with by debugging at different levels of abstraction, see [1]. This extended abstract discusses some of the issues relevant to event abstraction, where a set of events is grouped into one higher-level, abstract event. Due to space limitations, only systems with asynchronous interprocess communication will be dealt with. A more detailed treatment, including proofs for all theorems and a discussion of synchronous interprocess communication, can be found in [4].
2 Basic Definitions
The happened before ($\rightarrow$) relation defined in [5] is of particular importance for the analysis of distributed applications. An event $a$ cannot influence another event $b$ if it does not happen before that event. The happened before relation captures the notion of potential causality, and the partial order induced by $\rightarrow$ is sometimes referred to as causality order or causality graph. Each event $e$ is assigned a timestamp $T_e$, usually an integer or a vector of integers. These timestamps are used to determine the $\rightarrow$ relation quickly (instead of tracing a path through the causality graph). Many timestamping techniques have been proposed in the literature. This abstract uses the vector timestamps proposed by [7]. For individual events, the following two timestamp tests can be used to derive the $\rightarrow$ relation:
Timestamp Test 1: $a \rightarrow b$ iff $T_a[i] < T_b[i]$ for all timestamp vector elements $i$.
Timestamp Test 2: $a \rightarrow b$ iff $T_a[p] < T_b[p]$, where event $a$ occurs in process $p$.
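To illustrate why a timestamp test can replace a path search, the following Python sketch assigns vector timestamps to a small trace and compares a single-component test with plain reachability in the causality graph. Everything in it is an assumption made for illustration: the trace, the function names, and in particular the clock convention, which follows the common Fidge/Mattern scheme (each event increments its own process's component, and a receive first takes the component-wise maximum with the send). Under that convention the single-component test is applied to distinct events with a non-strict comparison; the exact form of the comparison depends on how the timestamps of [7] are initialised and incremented, which is not modelled here.

```python
# Hypothetical trace: two processes, one asynchronous message from a2 to b2.
processes = {0: ["a1", "a2", "a3"], 1: ["b1", "b2", "b3"]}
messages = [("a2", "b2")]  # (send event, receive event)


def vector_timestamps(processes, messages):
    """Assign Fidge/Mattern-style vector timestamps (assumed convention)."""
    proc_of = {e: p for p, evs in processes.items() for e in evs}
    sender_of = {recv: send for send, recv in messages}
    n = len(processes)
    stamps = {}
    pos = {p: 0 for p in processes}  # index of next unstamped event per process
    remaining = sum(len(evs) for evs in processes.values())
    while remaining:
        progressed = False
        for p, evs in processes.items():
            if pos[p] == len(evs):
                continue
            e = evs[pos[p]]
            send = sender_of.get(e)
            if send is not None and send not in stamps:
                continue  # a receive waits until its send has been stamped
            t = list(stamps[evs[pos[p] - 1]]) if pos[p] else [0] * n
            if send is not None:
                t = [max(x, y) for x, y in zip(t, stamps[send])]
            t[p] += 1  # the event counts itself in its own component
            stamps[e] = t
            pos[p] += 1
            remaining -= 1
            progressed = True
        if not progressed:
            raise ValueError("trace is not causally consistent")
    return stamps, proc_of


def happened_before(a, b, stamps, proc_of):
    """Single-component test for distinct events a and b (assumed <= form)."""
    p = proc_of[a]
    return a != b and stamps[a][p] <= stamps[b][p]


def reachable(a, b, processes, messages):
    """Reference answer: trace a non-empty path from a to b in the graph."""
    succ = {}
    for evs in processes.values():
        for x, y in zip(evs, evs[1:]):
            succ.setdefault(x, []).append(y)
    for send, recv in messages:
        succ.setdefault(send, []).append(recv)
    stack, seen = list(succ.get(a, [])), set()
    while stack:
        x = stack.pop()
        if x == b:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(succ.get(x, []))
    return False


stamps, proc_of = vector_timestamps(processes, messages)
print(stamps["b2"])                                  # [2, 2]
print(happened_before("a1", "b3", stamps, proc_of))  # True
print(happened_before("a3", "b2", stamps, proc_of))  # False
```

The test inspects one vector component, whereas the reference function may visit every event reachable from a.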
3 Proper Abstraction
In the most general sense, events are atomic entities which cause changes to the state of an application. In distributed debugging, process creation and termination as well as all events related to interprocess communication are of particular interest. Each run of a distributed application produces a stream of these events. This is typically depicted in a diagram similar to Fig. 1. Each process is drawn as a line, and events are drawn as circles. Time progresses from left to right, and the layout of the events mirrors the underlying $\rightarrow$ relation.
Event abstractions are formed by grouping sets of events into one abstract event. There are two difficult problems involved in the abstraction process. First, it is far from obvious what abstractions should be formed to represent the overall program behaviour in a meaningful way [6]. Second, arbitrary abstractions may not reflect the $\rightarrow$ relation between events correctly at higher abstraction levels. Violations of the happened before relation at higher abstraction levels lead to a wrong or misleading representation of the overall program behaviour and will seriously impede the debugging process. Work reported in [2, 7] shows that only abstract events with specific properties guarantee proper abstraction, i.e. do not violate the $\rightarrow$ relation. The two more powerful abstract event structures identified in [7] are complete precedence abstractions and contractions.
Our research aims to analyze the event stream generated by the execution of a distributed application to automatically derive abstractions. Abstract events are essentially sets of more primitive events. Therefore, an automatic event abstraction algorithm will generate event sets and check whether a particular event set is suitable as an abstraction. However, generating all possible event sets is too slow to be practical. In a system with only 20 events, for example, 4845 different event sets of size 4 exist, but only very few will form useful abstractions. Real-life distributed applications, moreover, produce event streams with thousands of events. So an important question we have to address is how to reduce the number of event sets generated without ignoring relevant event sets.
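The figure of 4845 is the binomial coefficient for choosing 4 events out of 20. The short Python sketch below checks it in two ways, by closed form and by brute-force enumeration. The variable names are illustrative only, and the 1000-event case is an assumed stand-in for "thousands of events".

```python
from itertools import combinations
from math import comb

n_events = 20
set_size = 4

# Closed form: number of 4-element subsets of a 20-element set.
count = comb(n_events, set_size)

# Brute-force enumeration yields the same number of candidate sets,
# which is exactly the work a naive abstraction algorithm would do.
enumerated = sum(1 for _ in combinations(range(n_events), set_size))

print(count, enumerated)  # 4845 4845

# Assumed larger stream: the same set size over 1000 events.
large = comb(1000, set_size)
print(large)  # 41417124750
```

Even for sets of only four events, the candidate count grows from thousands to tens of billions, which is why structural properties that prune the search matter.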
4 Complete Precedence Abstractions
Informally, let an event $i^A$ ($o^A$) be an input event (output event) for a set of events $A$ if it has an immediate predecessor (successor) according to the $\rightarrow$ relation outside $A$. A complete precedence abstraction is defined as follows:
Definition 1. A set $E$ of events is called a complete precedence abstraction iff $\forall i^E, o^E : i^E \rightarrow o^E$.
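Definition 1 can be checked mechanically once the input and output events of a candidate set are known. The Python sketch below is one possible reading, not the detection algorithm of this paper. The graph `edges` is a hypothetical causality graph given by immediate-successor lists, all function names are invented for illustration, and an event that is both an input and an output event is assumed to satisfy the condition trivially.

```python
# Hypothetical causality graph: event -> list of immediate successors.
edges = {
    "x1": ["a"], "a": ["b"], "b": ["y1"], "y1": [],
    "x2": ["c"], "c": ["y2"], "y2": [],
}


def predecessors(edges):
    """Invert the immediate-successor lists."""
    preds = {e: [] for e in edges}
    for src, dsts in edges.items():
        for dst in dsts:
            preds[dst].append(src)
    return preds


def reaches(a, b, edges):
    """True if a happened before b, i.e. a non-empty path leads from a to b."""
    stack, seen = list(edges[a]), set()
    while stack:
        x = stack.pop()
        if x == b:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(edges[x])
    return False


def input_events(A, edges):
    """Events of A with an immediate predecessor outside A."""
    preds = predecessors(edges)
    return {e for e in A if any(p not in A for p in preds[e])}


def output_events(A, edges):
    """Events of A with an immediate successor outside A."""
    return {e for e in A if any(s not in A for s in edges[e])}


def is_complete_precedence_abstraction(A, edges):
    """Every input event must precede every output event (Definition 1).

    Assumption: an event that is both input and output counts as satisfied.
    """
    A = set(A)
    return all(
        i == o or reaches(i, o, edges)
        for i in input_events(A, edges)
        for o in output_events(A, edges)
    )


print(is_complete_precedence_abstraction({"a", "b"}, edges))       # True
print(is_complete_precedence_abstraction({"a", "b", "c"}, edges))  # False
```

In the second call the input event a does not precede the output event c, so the set fails the definition.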
Theorem 2. A complete precedence abstraction $E$ is dense: $\forall a, b \in E : a \rightarrow c \rightarrow b \Rightarrow c \in E$.
The density property is a very strong structural property, drastically reducing the number of event sets that have to be generated by an event abstraction algorithm. Consider the example shown in Fig. 1.
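Density can also be tested directly from the condition in Theorem 2. The following Python sketch does so on a small hypothetical causality graph (it is not the execution of Fig. 1, and the function names are invented for illustration). It enumerates all event sets of size 3 that contain a given event and keeps only the dense ones.

```python
from itertools import combinations

# Hypothetical causality graph: event -> list of immediate successors.
edges = {
    "a": ["b", "d"],
    "b": ["c"],
    "c": [],
    "d": ["e"],
    "e": ["f"],
    "f": [],
}


def descendants(a, edges):
    """All events that a happened before."""
    stack, seen = list(edges[a]), set()
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(edges[x])
    return seen


def is_dense(A, edges):
    """For all a, b in A, every c with a -> c -> b must also be in A."""
    A = set(A)
    desc = {e: descendants(e, edges) for e in edges}
    for a in A:
        for b in A:
            if b in desc[a]:
                between = {c for c in desc[a] if b in desc[c]}
                if not between <= A:
                    return False
    return True


def dense_sets(edges, size, must_contain):
    """Return all candidate sets and the dense ones among them."""
    others = sorted(e for e in edges if e != must_contain)
    candidates = [
        (must_contain, *rest) for rest in combinations(others, size - 1)
    ]
    return candidates, [s for s in candidates if is_dense(s, edges)]


candidates, dense = dense_sets(edges, 3, "a")
print(len(candidates))  # 10
print(dense)  # [('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'd', 'e')]
```

In this toy graph only 3 of the 10 candidate sets survive the density filter, which mirrors the kind of reduction the theorem makes possible.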
Fig. 1. A sample application execution
In this example, 36 distinct event sets of size 3 containing the event $a$ exist. However, only the following 7 event sets are dense: $\{a, b, c\}$, $\{a, b, d\}$, $\{a, d, e\}$, $\{a, d, g\}$, $\{a, d, h\}$, $\{a, d, i\}$ and $\{a, g, h\}$. The event set $A = \{a, e, f\}$, for example, is not dense: $a \rightarrow d \rightarrow e \wedge a, e \in A, d \notin A$. A second possibility to reduce the number of event sets generated even further is presented in [4].
5 Contractions
Definition 3. An input point $I^A_n$ (output point $O^A_n$) of a set of events $A$ is a subset of the set of input events $I^A$ (output events $O^A$). The indices are used to differentiate between multiple input/output points. Typically, we assume that all input/output events belong to exactly one input/output point: $I^A_n \cap I^A_m = \emptyset$ for $n \neq m$ and $\bigcup_n I^A_n = I^A$ ($O^A_n \cap O^A_m = \emptyset$ for $n \neq m$ and $\bigcup_n O^A_n = O^A$).
Definition 4. A set $A$ of events is a contraction if, for every possible pair of an input point $I^A_n$ and an output point $O^A_m$, there exists an input event $i^A \in I^A_n$ and an output event $o^A \in O^A_m$ such that $i^A \rightarrow o^A$.
To guarantee proper abstraction, contractions can only be connected according to the rules given in [7]. It can be shown that in systems with only asynchronous interprocess communication, contractions are dense. In contrast to complete precedence abstractions, this property is lost when synchronous interprocess communication is allowed as well, making contractions harder to detect in such a case. Another difference from complete precedence abstractions is that the contraction timestamps calculated with the algorithm given in [7] are incomplete: E1 --* E~ ::~ TEl