It (efficiently) solves a real world problem of practical utility for ... (Throughout this paper we shall assume that the degree of the left ... it will be non-recursive, from a LFP query of the form of (2). Such a ... A few comments on the notation are in order. First, all ..... fied by the statement causing failure in the match- ing process.
Synthesizing Least Fixed Point Queries into Non-recursive Iterative Programs Shamim A. Naqvi Bell Laboratories Murray H i l l , NJ 07974 Lawrence J. Henschen Northwestern University Evanston, IL 60201
Abstract An algorithm is presented which takes a least fixed point query expressed by a basis and inductive step and transforms it into a non-recursive iterative program. By augmenting a relational system with this algorithm, least fixed point queries can be handled automatically, and there is no need to proceduralizc relational algebra for such queries. 1. Motivation We present an algorithm for synthesizing a nonrecursive, iterative program from a Least Fixed Point query (defined later). There are three reasons why such a synthesis will be of interest to researchers in AI in general, and to researchers in program synthesis in particular: 1.
It (efficiently) solves a real world problem of practical utility for database workers.
2.
The method was derived using techniques from logic programming systems. In particular the backward-chaining depth-first problem solving metaphor leads to an intuitive understanding of the algorithm. Thus a link between logic programming and program synthesis is illustrated.
3.
Finally, it illustrates the idea that if programs can be classified into domains (e.g. sort programs, list processing programs etc.) then may be we can derive algorithms for synthesizing programs within a domain.
2. Background
to a group of employees, and determining whether there is an active circuit connecting two points. None of the above can be couched in relational algebra [AHOl. In studying the question of how powerful a data query language should be, Aho and Ullman [ A H O l postulated two general principles and noted that Least Fixed Point (LFP) queries satisfy those principles but cannot be supported by the traditional relational calculus [ C O D D ] . Since in general a LFP of (1) may not exist [ T A R S ] , a relation R can be constructed inductively by specifying the basis and inductive rules:
In [ A H O l it is further shown that if in (2) there is only one occurrence of R in f ( R ) then certain optimizations can be performed in the calculation of the LFP through (3). In this paper we present a method of automatically synthesizing a non-recursive while-loop program, i.e. a program like (3) except that it will be non-recursive, from a L F P query of the form of (2). Such a transformation would have the following points of interest for database researchers:
Consider a relational expression of the form R - f(R)
—(l)
(Throughout this paper we shall assume that the degree of the left and right hand sides of all such equations is the same). A L F P of (1) is a relation R* such that R* - f(R*) and R* C S for any S satisfying (1). Examples of such L F P operations are calculating the number of flights between two cities during a given time period, finding the lowest-level manager common
1.
The non-procedural nature of relational algebra is preserved although it would take a fairly sophisticated user to write a L F P query.
2.
No theorem-proving is required to support L F P queries as in [ W R I G ] though our method is grounded in the resolution principle of Robinson [ROB].
26 S. Naqvi and L. Henschen
3.
The program derived for (2) is non-recursive.
4.
W i t h a slight abuse of notation of (2), link queries (defined later) of QBE [ Z L O O ] can also be supported in the same framework.
3. Intuitive Explanation of the Algorithm
Now, assume we have a backward-chaining problemsolving system and, further, that a problem p(x,y,....) is solvable only if at least one of the variables x,y,... is instantiated to a constant. Finally, it is evident that a LFP query can be replaced by a view. Thus, (4) can be replaced by the query
It is simpler to develop and explain the algorithm by assuming
that
the
LFP query
will
be
written
in
relational calculus. Thus, equation (2) becomes
As an example, consider a database containing the relations father(x,y) and parent(u,v) for particular values of x, y, u and v. Then a query asking for (certain kinds of) ancestors of a person V can be stated as follows:
A few comments on the notation are in order. First, all variables are assumed to be universally quantified. Second, the syntax, on purpose, is like that of P R O L O G [ P E R E ] . Third, any base predicate (i.e., a relation occurring in the database) with at least one argument instantiated to a constant represents a retrieval request. Thus, the expression parent(w,a) can be read as S2-a parent", select from the parent relation those tuples whose second components have the value V. Fourth, we have purposefully chosen a somewhat obscure and incomplete definition of ancestors. This is to highlight a distinction later. Finally, our method of handling L F P queries (as described below) has two advantages over a P R O L O G system. It not only works for left-recursive assertions but is also independent of the order of the assertions. In fact, putting the above query to a P R O L O G system will cause an infinite computation. In order to proceed with an intuitive description of our method, we need to specify how a backward-chaining problem-solving system works. Given an assertion of the form
and a problem, C, such a problem-solving system matches the problem to the head, i.e. left-hand side of the assertion. If the match succeeds then the original problem, C, is replaced by the subproblems g1, g2, i.e. the body of the assertion, with the matching substitution applied to the subproblems. If the head of more than one assertion matches a problem then the system has to make a choice.
We now describe how a backward-chaining problemsolving system would solve (5) using (6a) and (6b). From this description, our method will become obvious. To solve (5), we match it with the head of (6a) and (6b). Since (5) matches both the assertions, let us choose (6a) first. The matching substitution is (u— w,v —a}, and, thus, the original problem is replaced by the body of (6a), namely, father (w,a). This subproblem is solvable and, in fact, yields a first set of answers which we can collect in a relation, say ancestor. From (5) and (6b), the original problem is replaced by the body of (6b) with the matching substitution applied. This yields ancestor(w,y), parent(y,a). Proceeding in a depth-first manner, the subproblem ancestor(w,y) is not solvable, but parent(y,a) is and yields a set of values for y. These values for y are substituted back into the failed subproblem ancestor(w,y) and this subproblem is asked again. Once again we have a choice between (6a) and (6b) and the above process repeats. The above iterative pattern can be captured in the following steps: 1.
Retrieve father(w,a) and insert these tuples into the answer relation, ancestor.
2.
Stack the value V.
3.
Pop a value from the stack and assign it to V.
4.
Retrieve parent(y,z) and use these new values of y to retrieve father(w,y). Also stack these values of y. Collect tuples retrieved for father(w,y) in ancestor.
5.
Go to step 3.
S. Naqvi and L. Henschen 27
Insert-in-queue(Q,a) inserts the value V into queue Q if 'a' has not been inserted into Q before in the current computation. Finally, removc-from-queue(Q) removes the top element of Q. Let us now consider a right-recursive definition of the above L F P query. (
Once again, a similar problem solver would solve the above query as follows: (7) and (8) yield the primitive subproblem father(w,a). From (7) and (9), to solve ancestor(w,a) solve parent(w,y) and ancestor (y,a). Clearly parent (w,y) cannot be effectively solved because we do not know either 'w' or 'y' To solve ancestor(y,a) reuse (8). This yields a value for 'y' which makes parent (w,y) solvable. In this case the program form is as follow:
as the basis residue (BR). Thus from (5), (6a) and (6b) the basis residue is father(u,v) and the inductive residue is parent(y,z). Now notice that the program forms (A) and (B) differ in the order in which the basis and inductive residue expressions are used. If we call the constants specified in the basis step R0 —> R as driver constants, we are lead to the general program form (C) shown in Fig. 1. The only thing left to explain is the order in which BR and IR expressions are to be used. This order can be elucidated as follows. Consider an L F P query stated as (10a) and (10b). Hypothesize solving R in (10a) by using (10b). This generates the subproblems
Now, by examination, if the subproblem R in the above expression is solvable (i.e. at least one of its variables is known) then the order is "eval(IR); R U eval(BR)'', otherwise it is "eval(BR); R U eval(IR)".
Figure 1: General Program Form
1.
Find the Basis Residue (BR) and the Inductive Residue ( I R ) .
2.
Decide on the order of the BR and IR expressions as follows: Match the head of the BR expression (11) to the head of the IR expression (12). Apply the matching substitution to the body of (12) and if the subproblem, R, in the body of (12) is solvable then the order is {eval(IR); R U eval(BR)) else the order is (eval(BR); R U eval(IR)).
3.
Substitute the BR and IR expressions (in the right order) into the program skeleton (C).
A proof that the program form above produces correct answers is given in [ H E N ] . Briefly, correct answers are those values logically implied by the definitions. Our process is essentially supported resolution which is sound and complete, so that all and only answers are generated. Clearly, if the relation R is not cyclic, the stack must become empty at some point. Certain kinds of cyclicity (e.g. reflexivity ) can be easily recognized by slight additions to the algorithm. In any case, a more elaborate termination test for the while-loop, based on remembering the derivation of each value added to R, yields termination in all cases. See [ H E N ] for details.
28
S. Naqvl and L Henschen
The transformation algorithm itself has complexity based on identifying cycles in a (clause connectivity) graph and on unification, both of which admit efficient algorithms. The general case allows for programs representing non-tail recursive definitions as well. In these cases, the eval(IR) part of the loop includes an expanding formula corresponding to the non-tail recursive part of the definition.
REFERENCES [AHO]
Aho, A. and Ullman, J.: Universality of Data Retrieval Languages, Conf. on POPL, 1978, pp. 110-120.
[CODD]
Codd, E.: Relational Completeness of Database Sublanguages, in Data Base Systems (ed. R. Rustin), Prentice Hall, Englewood Cliffs, N J .
[HEN]
Henschen, L. and Naqvi, S.: On Compiling Queries in Recursive FirstOrder Databases, to appear in the Journal of the A C M .
[PERE]
Pereira, L., Pereira, F. and Warren, D.: Users Guide to DEC 10 P R O L O G , Dept. of A I , Univ. of Edinburgh, 1978.
[ROB]
Robinson, J.: A Machine-Oriented Logic Based on the Resolution Principle, Journal of the A C M , 12 1, 1965
[TARS]
Tarski, A.: A Lattice-Theoretical Fixpoint Theorem and its Applications, Pacific Journal of Mathematics 5:2, 1955.
5. Conclusion
[WR1G]
We have presented a method which, without using theorem-proving, transforms a L F P query expressed as a recursive definition into a non-recursive program. The use of this method permits one to process LFP queries posed in a conventional relational calculus.
Wright, D.: P R O L O G as a Relationally Complete Database Query Language which can handle Least Fixed Point Operators, Univ. of Kentucky Tech. Report no. 73-80.
[ZLOO]
Zloof, M.: Query-By-Example: A Database Management Language, I B M Systems Journal 16, no. 4, 1977.
[ZL002]
Zloof, M.: Query-By-Example: Operations on the Transitive Closure, I B M Technical Report no. 5526.
Although, we give preference to select operations over joins, this is not required. However, most of the time it is obviously more efficient. For example, in the expression
it makes no sense to form the join of parent(xl,x2) and parent(x2,x3) to start with. A feature of the method is that the form of the database requests is known beforehand so that optimizing these requests on the basis of the physical organization of the database as well as on the basis of redundancy in the set of requests could be carried out when the program is synthesized.
6. Acknowledgements Dan Fishman pointed out some errors and omissions in an earlier draft of this paper. Earlier drafts were also read and commented upon by Kevin Wilkinson, John Linderman and an annonymous referee at Bell Laboratories. We thank all of them.
DIAGNOSTIC REASONING IN SOFTWARE FAULT LOCALIZATION Robert L. Sedlmeyer* W i l l i a m B. Thompson P a u l E. J o h n s o n
University ABSTRACT We p r e s e n t a d i a g n o s t i c model of s o f t w a r e fault localization. A diagnoatic. approach to f a u l t l o c a l i z a t i o n has p r o v e n e f f e c t i v e I n t h e domains o f m e d i c i n e and d i g i t a l h a r d w a r e . Applyi n g t h i s a p p r o a c h t o t h e s o f t w a r e domain r e q u i r e s two e x t e n s i o n s : a h e u r i s t i c a b s t r a c t i o n mechanism w h i c h i n f e r s program f u n c t i o n from s t r u c t u r e u s i n g r e c o g n i t i o n and t r a n s f o r m a t i o n t a c t i c s ; and a s e a r c h mechanism w h i c h i n t e g r a t e s b o t h p r o t o t y p i c and c a u s a l r e a s o n i n g a b o u t f a u l t s . I.
INTRODUCTION
I n t h i s p a p e r w e p r e s e n t a model o f f a u l t l o c a l i z a t i o n f o r p r o g r a m d e b u g g i n g based o n a t r o u b l e - s h o o t i n g paradigm [ J ] , Within a diagnostic f r a m e w o r k w e d e f i n e the. s o f t w a r e f a u l t l o c a l i z a t i o n task as f o l l o w s : A fault e x i s t s In a program whenever i t s output d i f f e r s from t h a t expected by the user. These d e s c r e p a n c i e s a r e c a l l e d f a u l t manif e s t a t i o n s . The t a s k i s t o i d e n t i f y t h e cause o f each m a n i f e s t a t i o n w h i c h i s s p e c i f i c enough t o e f f e c t program r e p a i r . A p p l i c a t i o n of the t r o u b l e - s h o o t i n g s t r a t e g y has p r o v e n e f f e c t i v e i n c o n s t r u c t i n g i n t e l l i g e n t s y s t e m s f o r h a r d w a r e [ 2 , 3 , 4 ] and m e d i c a l [ 5 , 6 , 7 ] diagnosis. In contrast to verification-based approaches ( c f . [ 8 , 9 , 1 0 , 1 1 ] t h i s s t r a t e g y concent r a t e s c o m p u t a t i o n a l r e s o u r c e s o n s u s p e c t components. A p p l y i n g the d i a g n o s t i c approach to the s o f t w a r e domain r e q u i r e s two e x t e n s i o n s : a h e u r i s t i c a b s t r a c t i o n mechanism t h a t i n f e r s f u n c t i o n f r o m s t r u c t u r e u s i n g r e c o g n i t i o n and t r a n s f o r m a t i o n t a c t i c s ; and a s e a r c h mechanism t h a t i n t e g r a t e s b o t h p r o t o t y p i c and c a u s a l r e a s o n i n g t o localize faults. S e v e r a l program debugging systems evidence t r o u b l e - s h o o t i n g t a c t i c s [ 1 2 , 1 3 , 1 4 ] , b u t none a r e based o n a c o m p r e h e n s i v e d i a g n o s t i c t h e o r y . Sussman [ 1 ] was p r i m a r i l y i n t e r e s t e d i n d e v e l o p i n g a t h e o r y o f s k i l l a c q u i s i t i o n and examined t h e r o l e o f debugging i n t h a t c o n t e x t . W h i l e M i l l e r and G o l d s t e i n [13] addressed the debugging task d i r e c t l y , t h e f a u l t s w h i c h MYCROFT a n a l y z e d were t i g h t l y c o n s t r a i n e d b y t h e s i m p l i c i t y o f t h e programs Involved. S h a p i r o [ 1 4 ] a d d r e s s e s more r e a l i s t i c debugging environments, but formulates a theory of *Current address: D e p a r t m e n t o f Computer T e c h . , Purdue U n i v e r s i t y a t F o r t Wayne, I n d i a n a 46805
of
Minnesota faults
rather
than
of
fault
localization.
Our model p a r t i t i o n s d i a g n o s t i c k n o w l e d g e f o r the s o f t w a r e f a u l t l o c a l i z a t i o n task i n t o knowledge uspd t o l o c a t e known and novel f a u l t s . For d i a g n o s i n g known f a u l t s t h e k n o w l e d g e base c o n t a i n s a h i e r a r c h y o f f a u l t s w h i c h a r e known t o o c c u r i n p r o g r a m s f r o m a p a r t i c u l a r a p p l i c a t i o n s d o m a i n , and a set o f e m p i r i c a l a s s o c i a t i o n s which r e l a t e f a u l t m a n i f e s t a t i o n s to p o s s i b l e causes. For d i a g n o s i n g novel f a u l t s t h e k n o w l e d g e base c o n t a i n s models o f i m p l e m e n t a t i o n a l t e r n a t i v e s and e x e c u t i o n b e h a v i o r of f u n c t i o n s indigenous to the domain. Both knowledge sources are u t i l i z e d by a set of l o c a l i z a t i o n t a c t i c s t o g e n e r a t e , s e l e c t and t e s t f a u l t h y p o theses. II.
KNOWLEDGE OF PROGRAM STRUCTURE AND FUNCTION
Knowledge o f p r o g r a m s t r u c t u r e and f u n c t i o n i s n e c e s s a r y f o r d i a g n o s i s o f b o t h known and n o v e l faults. I n t h e f o r m e r case i t i s used t o c o n f i r m expectations associated w i t h a given f a u l t hypot h e s i s ; i n t h e l a t t e r case i t i s used t o t r a c e v i o l a t i o n s o f expected behavior t o t h e i r source. I n o u r model t h i s k n o w l e d g e i s c a p t u r e d i n functional phototypes. A functional. phototype, c o n s i s t s of f o u r components: a set of r e c o g n i t i o n t r i g g e r s , a l i s t of p r e - and p o s t - c o n d i t i o n s d e s c r i b i n g t h e e x e c u t i o n b e h a v i o r , a d e s c r i p t i o n o f p r o t o t y p e components and t h e i r t o p o l o g y , and a L i s t o f c o n s t r a i n t s among components t h a t must h o l d f o r r e c o g n i t i o n . Prototypes are d e f i n e d a t three l e v e l s o f a b s t r a c t i o n : t h e l a n g u a g e , p r o g r a m m i n g , and a p p l i c a t i o n s l e v e l s ; and a r e h i e r a r c h i c a l l y o r g a n i z e d . A p o r t i o n of the f u n c t i o n a l p r o t o t y p e h i e r a r c h y d e s c r i b i n g t h e component t o p o l o g y o f the "master f i l e p r i m i n g r e a d " (MFPR) f u n c t i o n i s g i v e n i n F i g u r e 1 .
30
R. Sedlmeyer et al.
The f i g u r e shows t h a t t h e MFPR c o n s i s t s o f a s i n g l e component, the f i l e p r i m i n g r e a d . A file_priming r e a d i s o n l y r e c o g n i z e d a s a MFPR i f two c o n s t r a i n t s are met: t h e i n p u t r e c o r d i s used a s a m a s t e r r e c o r d and t h e i n p u t f i l e i s used a s a master f i l e . Recognition t r i g g e r s i n c l u d e the f i l e p r i m i n g _ r e a d and f i l e i n p u t s t m t p r o t o t y p e s as w e l l as the language keyword " r e a d " . III.
KNOWLEDGE OF FAULTS
While f u n c t i o n a l p r o t o t y p e s describe expected p r o g r a m s t r u c t u r e and f u n c t i o n ; f a u l t modals d e s c r i b e e x p e c t e d d e f e c t s i n p r o g r a m s t r u c t u r e and function. A fault model e m b o d i e s k n o w l e d g e of a p a r t i c u l a r implementation error or class of e r r o r s . T h i s knowledge c o n s i s t s o f c o n d i t i o n s under which t h e model i s a p p l i c a b l e , a s e t o f f a u l t h y p o t h e s e s , and a s e t o f r e l a t e d f a u l t m o d e l s . A f a u l t hypot h e s i s is an expected defect in a f u n c t i o n a l p r o totype. Defects are defined as missing p r o t o t y p e components o r v i o l a t e d c o n s t r a i n t s . Like functiona l p r o t o t y p e s f a u l t models are h i e r a r c h i c a l l y organized. F i g u r e 2 d e t a i l s a f a u l t model f o r a master f i l e input e r r o r .
t h e a b s t r a c t i o n mechanism t o t e s t i t . I f the h y p o t h e s i s f a i l s t h e n a r e l a t e d h y p o t h e s i s can b e p r o p o s e d , a d i f f e r e n t f a u l t model can b e a p p l i e d , o r t h e t a c t i c can b e a b a n d o n e d . F u n c t i o n - d r i v e n l o c a l i z a t i o n uses o u t p u t d i s crepancies to suggest t h a t a p a r t i c u l a r f u n c t i o n a l prototype is improperly implemented. The a b s t r a c t i o n mechanism i s t h e n i n v o k e d t o i d e n t i f y t h e code w h i c h i m p l e m e n t s t h e p r o t o t y p e . Tf the p r o t o t y p e c a n n o t b e f o u n d t h e n t h e s o u r c e code p r o d u c i n g t h e most f e a s i b l e match i s c o n s i d e r e d a s the i n t e n d e d i m p l e m e n t a t i o n . The f a u l t i s i d e n t i f i e d by the statement causing f a i l u r e in the matching process. T h i s t a c t i c can a l s o b e d i r e c t e d t o examine a p a r t i c u l a r a b s t r a c t i o n l e v e l f o r a f a u l t . A l l p r o t o t y p e s d e f i n e d a t the chosen l e v e l are c o n sidered suspect. This a p p l i c a t i o n of the f u n c t i o n d r i v e n t a c t i c i s l e s s d i r e c t e d and u s u a l l y l e s s desirable. O n l y when t h e o u t p u t d i s c r e p a n c i e s o f f e r n o b a s i s f o r p r o t o t y p e s e l e c t i o n does t h i s a l t e r n a t i v e become a t t r a c t i v e . C o m p u t a t i o n - d r i v e n l o c a l i z a t i o n t r a c e s the c o m p u t a t i o n o f a p a r t i c u l a r set o f program v a r i ables. The c o m p u t a t i o n i s a n a l y z e d b y e x t r a c t i n g from the program those s t a t e m e n t s t h a t d i r e c t l y a f f e c t the v a l u e s o f v a r i a b l e s i n the s e t . This e x t r a c t i o n p r o c e s s i s known a s s l i c i n g [ 1 5 ] . The c o m p u t a t i o n can b e f o l l o w e d i n e i t h e r a f o r w a r d or reverse order of execution flow. F a u l t s are f o u n d b y u s i n g t h e a b s t r a c t i o n mechanism t o i n t e r p r e t s t a t e m e n t s i n t h e s l i c e a s members o f more a b s t r a c t f u n c t i o n a l p r o t o t y p e s and c o m p a r i n g t h e expected computational r e s u l t s to those d e r i v e d . V.
IV.
LOCALIZATION TACTICS
Localization
is
performed
using
fault-driven, f u n c t i o n - d r i v e n and dntvan. Vault-driven.n localization at
f i n d i n g a p a r t i c u l a r kind of
unexpected end_of
file.
three
fault,
such a s a n
Function-driven
zatA.on a n a l y z e s
a
particular
f o r an e r r o r .
Computation-driven
tactics:
computationd i r e c t s search
functional
localiprototype
localization
c o n c e n t r a t e s s e a r c h on the c o m p u t a t i o n of a set of program v a r i a b l e s . Each makes use o f a n a b s t r a c t i o n mechanism ( S e c t i o n 5 ) t o g e n e r a t e and t e s t f a u l t hypotheses. M u l t i p l e l o c a l i z a t i o n t a c t i c s are d e s i r a b l e for three reasons. F i r s t , t h e y enhance t h e p r o b a b i l i t y of finding f a u l t s since d i f f e r e n t t a c t i c s a n a l y z e t h e s o u r c e code f r o m d i f f e r e n t p e r s p e c t i v e s . Secondly, t a c t i c s which best f i t the a v a i l a b l e i n f o r m a t i o n can b e c h o s e n . T h i r d l y , each t a c t i c i s o p t i m a l , i n t h e sense o f m i n i m i z i n g c o m p u t a t i o n a l resources, for a p a r t i c u l a r l o c a l i z a t i o n task. F a u l t - d r i v e n l o c a l i z a t i o n uses the f a u l t model hierarchy to generate f a u l t hypotheses. Output discrepancies serve as t r i g g e r s f o r s e l e c t i n g a p a r t i c u l a r f a u l t model. The more s a l i e n t t h e t r i g g e r s e t t h e more s p e c i f i c t h e f a u l t m o d e l and the associated f a u l t hypotheses. Given a f a u l t model, l o c a l i z a t i o n continues by f i r s t choosing one o f t h e s e f a u l t h y p o t h e s e s and t h e n i n v o k i n g
PROGRAM ABSTRACTION
Program a b s t r a c t i o n i s p e r f o r m e d b y m a t c h i n g f u n c t i o n a l p r o t o t y p e s to the source code. Abs t r a c t i o n may e i t h e r b e e x p e c t a t i o n - d r i v e n o r d a t a - d r i v e n d e p e n d i n g upon t h e l o c a l i z a t i o n t a c t i c selected. E x p e c t a t i o n - d r i v e n a b s t r a c t i o n matches p r o t o t y p e s t o code i n a t o p - d o w n m a n n e r , r e c u r s i v e l y m a t c h i n g components u n t i l a d i r e c t m a t c h can b e made a g a i n s t t h e s o u r c e code o r p r e v i o u s l y r e c o g nized prototypes. D a t a - d r i v e n a b s t r a c t i o n employs language-level triggers to select prototypes for matching. P r o t o t y p e s matched a t l o w e r l e v e l s o f a b s t r a c t i o n serve as t r i g g e r s f o r matching at higher l e v e l s . Once a f u n c t i o n a l p r o t o t y p e i s i d e n t i f i e d t h e c o r r e s p o n d i n g code i s bound t o i t . Recognition-based abstraction is an e f f i c i e n t t e c h n i q u e , b u t i t has l i m i t a t i o n s . Recognition of a f u n c t i o n a l p r o t o t y p e may f a i l f o r one o f t h r e e reasons: a d e f e c t e x i s t s i n t h e s o u r c e code i m p l e m e n t i n g the p r o t o t y p e , the wrong p r o t o t y p e i s s e l e c t e d , o r t h e s o u r c e code r e p r e s e n t s a n u n f a m i l i a r but correct implementation. The e f f e c t i v e n e s s o f t h e r e c o g n i t i o n mechanism depends o n t h e e x a c t n e s s o f t h e t r i g g e r i n g p r o c e s s and t h e richness of the a l t e r n a t i v e implementation s e t . P u r e l y s t r u c t u r a l m a t c h i n g i s augmented b y f u n c t i o n a l matching. The b e h a v i o r o f a p r o g r a m segment w h i c h i s i n f e r r e d f r o m l a n g u a g e s e m a n t i c s , can a l s o b e compared t o t h e f u n c t i o n a l d e s c r i p t i o n s i n the p r o t o t y p e h i e r a r c h y .
R. Sedlmeyer et al. 31
VI.
AN EXAMPLE
W e have i m p l e m e n t e d a n i n i t i a l v e r s i o n o f t h e model in a p r o g r a m named FALOSY ( F A u l t L o c a l i z a t i o n SYstem). FALOSY a d d r e s s e s f a u l t s i n m a s t e r f i l e update programs [ 1 6 ] . In t h i s section we i l l u s t r a t e FALOSY1s r e a s o n i n g f o r t h e m a s t e r f i l e p r i m i n g read e r r o r . FALOSY i s p r e s e n t e d w i t h a d i s c r e p a n c y l i s t and a l i s t r e p r e s e n t a t i o n o f t h e p r o g r a m ' s s o u r c e code. The d i s c r e p a n c y l i s t f o r m a l l y d e s c r i b e s d i f f e r e n c e s b e t w e e n e x p e c t e d and o b s e r v e d o u t p u t . A n a b r i d g e d t r a c e o f FALOSY's r e a s o n i n g i s g i v e n in the Appendix. Numbers i n p a r e n t h e s e s r e f e r t o l i n e numbers i n t h e A p p e n d i x .
2 4 . Hole f _ i _ s , w h i c h i s bound t o g 0 0 0 1 4 , does n o t p r e c e d e r o l e u p d a t e l o o p , w h i c h i s bound t o g0005 2 5 . Match f o r f p r f a i l s 2 6 . M a t c h i n g m f pr to (s52 read t r a n s f i l e t r a n s b u f ) 2 7. Checking c o n s t r a i n t (before f _ i _ s update l o o p ) 2 8 . R o l e f i s , w h i c h i s bound t o gOOO18, does n o t p r e c e d e r o l e u p d a t e l o o p w h i c h i s bound t o g()0005 2 9 . Match f o r f _ p _ r f a i l s 30. R e c o g n i t i o n of m f pr f a i l s 31. Fault hypothesis v e r i f i e d 32. F a u l t i s : (m f pr m i s s i n g ) BIBLIOGRAPHY [11
A p r o d u c t i o n s y s t e m , whose a n t e c e d e n t s a r e s e t s o f d i s c r e p a n c i e s , i s used t o s e l e c t t h e initial localization tactic. I n t h i s c a s e l.hef a u l t - d r i v e n t a c t i c i s chosen and t h e m a s t e r f i l e p r i m i n g r e a d f a u l t model i s t r i g g e r e d ( 1 ) . FALOSY h y p o t h e s i z e s t h a t the p r i m i n g , read f o r the master f i l e is missing (2). The a b s t r a c t i o n mechanism i s invoked to i d e n t i f y the corresponding p r o t o t y p e (4). A c h e c k i s f i r s t made t o d e t e r m i n e i f i t has been p r e v i o u s l y i d e n t i f i e d . S i n c e i t has n o t , t h e r e c o g n i t i o n mechanism i s i n v o k e d r e c u r s i v e l y t o f i n d t h e components o i t h e m a s t e r f i l e p r i m i n g read p r o t o t y p e . E v e n t u a l l y search is c a r r i e d out at the source l e v e l , and t h r e e r e a d s t a t e m e n t s ( 6 , 26) a r e selected for further matching. C o n s t r a i n t s are now c h e c k e d s t a r t i n g w i t h t h o s e a t t h e l o w e s t l e v e l i n the a b s t r a c t i o n h i e r a r c h y . The f i r s t c a n d i d a t e i s r e j e c t e d s i n c e t h e f i l e i d e n t i f i e r i s n o t used as the master f i l e (20). The second and t h i r d c a n d i d a t e s are rejectee1 because they do not p r e cede t h e u p d a t e l o o p ( 2 4 , 2 8 ) . No candidate s a t i s f i e s a l l c o n s t r a i n t s and r e c o g n i t i o n f a i l s . The o r i g i n a l hypothesis i s thus v e r i f i e d .
[2]
J . D e K l e e r , " L o c a l methods f o r l o c a l i z a t i o n o f f a u l t s i n e l e c t o n i c c i r c u i t s , " M1T-A! Memo 3 9 4 , 1976. M.Genesereth,
"Diagnosis
using hierarchic
design models," Proceeding*
of
the Second AAA]
C o n f e r e n c e - , A u g u s t , 1982. [ 3 ] R. D a v i s , H. S h r o b e , w. H a m s c h e r , K. W i e c k e r t , M . S h i r l e y , and S . Pol i t , " D i a g n o s i s based o n d e s c r i p t i o n o f s t r u c t u r e and f u n c t i o n , " proceeding of the Second AAA! Coference, August, 1982. [ 4 ] R . H a r t l e v , "How e x p e r t s h o u l d a n e x p e r t s y s t e m b e ? " , Proceeding* I J C A I - 8 1 , A u g u s t 1 9 8 1 . [5] R. P a t i l , P. S z o l o v i t s , W. S c h w a r t z , " C a u s a l understanding o f p a t i e n t i l l n e s s i n medical diagnosis," Proceedings of 1JCA1-81, A u g u s t ,
1981. [6]
E.
Short
tions' [71
life,
Computer
Based
Medical
Consulta-
MYCIN, American E l s e v i e r , New York,
New Y o r k , 1 9 7 6 . B . C h a n d r a s e k a r a n and S . M i t t a l , "Deep v e r s u s compiled knowledge approaches to d i a g n o s t i c
p r o b l e m - s o l v i n g , " Proceeding*
of
the Second
AAAI C o n f e r e n c e , A u g u s t , 1 9 8 2 . [ 8 1 H . W e r t z , " U n d e r s t a n d i n g and i m p r o v i n g LISP
programs,"
7 TCAI-77
Proceeding*.
[91
R. R u t h , " I n t e l l i g e n t program a n a l y s i s , " artifical Intelligence, Vol. 7, No. l, 1976 [101 C. Adam and M. "LAURA, a s y s t e m to debug
student
programs,"
Artificial
Intelligence,
November, 1 9 8 0 . [11] F . " U n d e r s t a n d i n g and d e b u g g i n g c o m p u t e r
[12]
programs,"
International
Studies,
12,
G
Vol.
Sussman,
19S0,
Journal of Man-Machine. pp.
189-202.
A Computational Model
of
Skill
Acquisition, American E l s e v i e r , New Y o r k , 1975. [ 1 3 ] M i l l e r , " A s t r u c t u r e d p l a n n i n g and d e b u g g i n g
environment," International Machine Studies, V o l . 1 1 .
Journal 1978.
of
Man-
[14] D. S h a p i r o , " S n i f f e r : a system t h a t understands b u g , " M I T - A I Memo 6 3 8 , J u n e , 1981 [151M. W e i s e r , " P r o g r a m s l i c e s : formal, psychologic a l , and p r a c t i c a l i n v e s t i g a t i o n s o f a n automatic program a b s t r a c t i o n method," U n i v e r s i t y o f M i c h i g a n Ph.D. T h e s i s , 1979. [ 1 6 ] B . D w y e r , "One more t i m e - How to u p d a t e a
master f i l e , " Communications of the ACM, Vol.
2 4 , No.
1, January 1981.