A Case Study in Explanation and Implication

Eugene C. Freuder, Chavalit Likitvivatanavong, Richard J. Wallace
Constraint Computation Center, Department of Computer Science
University of New Hampshire, Durham, NH 03824 USA
ecf-, chavalit-, [email protected]
Abstract. In this work we explore the problem of generating explanations for solutions obtained by consistency methods, either directly or via more sophisticated forms of inference. For this purpose, we employ a type of logic puzzle, called the 9-puzzle, which can be solved by inference alone. We show how one can generate explanations in the form of trees, guided by the sequence of inferences followed in obtaining a solution. Then we show how ordering heuristics and selection strategies allow us to obtain better explanations according to well-defined criteria. (Essentially, this amounts to finding smaller, more compact explanation trees.) Finally, we show how our testbed can be elaborated to support retraction of partial explanations when the user wants to add values back to the present state of the puzzle. This allows the user to explore the implications of a given value selection. Together, these methods suggest some ways in which the process of solving combinatorial problems can be made more perspicuous and more interactive.
1 Introduction

1.1 Overview

Solving a problem is not always sufficient: users like to have an explanation for the result, e.g. “the frammus must be blue because red will clash with the gingus”. This is especially true if the result is “your problem is unsolvable”; users want to know why, and ideally to get advice as to how to modify the problem to make it solvable. If users are incrementally making choices while defining a problem dynamically, or solving a problem interactively, they want to know the implications of their choices, e.g. “if you make the frammus red, then you can’t have a red gingus”. When the problem is unsolvable the user will want to change the problem and view the implications of the changes. Of course, these issues have long been confronted by other technologies moving into real-world use, e.g. by rule-based expert systems. However, they are especially difficult for constraint-based systems because such systems generally rely on combinatorial search. An obvious response to the need for explanation or implication information - tracing the solution process - does not work well for search. “We tried these things and they failed so we backed up and tried something else” (backtrack search) is not a very satisfactory explanation - or even worse, “we tried this and it made matters worse
but we did it anyway” (simulated annealing). Pruning away “dead ends” in a search tree simply results in the solution itself. However, the consistency processing that distinguishes the AI approach to constraint solving is an inference process. [6] demonstrated in the context of logic puzzles that a) specialized inference rules could allow one to completely solve many puzzles by inference alone, and b) these inferences could be used to support natural explanations. In this paper we conduct another case study, using the “9-puzzle” taken from a commercial Dell puzzle booklet [1]. The choice was motivated in part by our interest in configuration problems [4]; the 9-puzzle is a rudimentary configuration problem. The puzzle is also related to the quasi-group problem, which has been used for other case studies [3]. We are able to produce inference rules that solve these puzzles without the need for search. The reader may object at this point that real problems often do require some search as well as inference. Puzzles, in fact, which are often designed to have a unique solution and to be solved by people who are better at inference than at search, may be unfairly well suited to solution by inference. This is a reasonable objection, and we do intend to expand this work to allow for search. However, several arguments can be made for the significance of the study of inference-based explanation:
- Increasingly specialized “global constraints” are being used to shift the burden from search to inference [5].
- Typically, configurators avoid search and confine themselves to computing the implications of user choices.
- In theory, any problem can be solved by “synthesis” methods that employ only inference [2]. Efforts have been made to make this approach more practical [7].
- Inferences may be useful to retrospectively generate inference-based explanations for a solution that has been found using search. (Even if the solution was found by inference, the explanation does not have to correspond to the inferences used to find the solution; another explanation may be “better”. Similarly, we can compute the implications of a user specification using redundant inference rules that are not all needed to solve the problem.)

When we consider how to generate explanations, we face several issues:
- What exactly is an explanation?
- How do we measure the “goodness” of an explanation?
- How do we produce “better” explanations?

There are clearly many possible answers to these questions. In particular, the answers may differ depending on the underlying concerns of the user. In Section 2 we answer these questions as follows. An explanation is a trace of the inference process that solves the puzzle. This responds to the question “Why is this the solution?” or “Why is there no solution?” We measure “goodness” by the number of inference steps, and introduce inference order heuristics that substantially reduce the explanation size. Here, inference order matters. However, even the shortened explanations are quite long. This might be expected for any complex problem. A user might cope with this complexity by asking instead: “Why
does this problem variable have this value?” In Section 3 we introduce the obvious notion of an “explanation tree” to answer this question. Goodness is measured by features of the tree. We introduce another inference rule, and an associated algorithm and heuristic for applying the rule, that improve the goodness of the explanation trees. We conclude that inference rules matter. In Section 4 we demonstrate how our inference methods can be used prospectively to show the implications of user choices. The user can also undo choices - to explore options or cope with an overconstrained problem. In our current implementation, a GUI facilitates visualization of the explanations and implications that the system computes. This is described in Section 5.

1.2 The 9-Puzzle

The 9-puzzle [1] consists of a 9 × 9 array of cells, some of which contain numbers from 1 to 9 (see Figure 1). The goal is to place a number in each empty cell so that each row, each column, and each adjacent 9-cell block within the array contains all of the numbers 1 through 9. Therefore, in a valid solution no number can appear more than once in any row, column, or block. In what follows we refer to cells by their row and column (e.g., cell (3,1) is the cell in the third row, first column, counting from top to bottom and left to right) and blocks are numbered from left to right and top to bottom (e.g., Block 1 is the upper left-hand block of nine cells, set off by double lines in the figure).
Figure 1. The initial puzzle grid, showing the given numbers.

We can represent this type of puzzle as a CSP, where each cell is a variable whose domain consists of the numbers 1 to 9. The constraints are all binary inequality constraints, holding between each pair of cells in each row, column and 9-cell block.
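To make this representation concrete, the following sketch builds the domains and constraint scopes just described. It is an illustrative reconstruction in Python rather than the authors' Java implementation, and all names (build_csp, givens, neighbors) are ours.

```python
def build_csp(givens):
    """Build the CSP for a 9-puzzle.

    givens: dict mapping (row, col) -> number for the cells labeled in the
            original puzzle (rows and columns numbered 1..9, as in the text).
    Returns (domains, neighbors), where domains maps each cell to its set of
    possible numbers and neighbors maps each cell to the cells it shares a
    row, column, or 9-cell block with (the scope of its inequality constraints).
    """
    cells = [(r, c) for r in range(1, 10) for c in range(1, 10)]
    domains = {cell: ({givens[cell]} if cell in givens else set(range(1, 10)))
               for cell in cells}

    def units(cell):
        r, c = cell
        row = [(r, j) for j in range(1, 10)]
        col = [(i, c) for i in range(1, 10)]
        br, bc = 3 * ((r - 1) // 3) + 1, 3 * ((c - 1) // 3) + 1
        block = [(br + i, bc + j) for i in range(3) for j in range(3)]
        return row, col, block

    neighbors = {cell: set().union(*units(cell)) - {cell} for cell in cells}
    return domains, neighbors
```

A binary inequality constraint holds between every cell and each of its neighbors; nothing else is needed to state the problem.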
2 Complete Explanations

The first inference rule we consider (which we will refer to as “rule 1”) follows naturally from this representation and the CSP style of search. This is a simple exclusionary rule based directly on the requirements of the problem: if cell (x,y) contains number n,
then other cells in the same row, column, and block as (x,y) cannot be n. This rule is implemented by taking each nonempty (labeled) cell in turn and deleting its number from the domains of all the other cells in the same row, column and block. If during this process the domain of a cell is reduced to one number, the cell must be given this number. This rule (as well as the other rule we will describe) was sufficient to solve all the 9-puzzles in the booklet. With this inference rule, we must still decide which cell to apply it to at each step in the problem solving procedure. This means that we may be able to find heuristics that order the choice of cells so as to reduce the number of steps in this procedure. As an example, we consider a default ordering that follows the layout of the puzzle and a greedy heuristic tailored to this rule. The default ordering chooses cells with singleton domains as they are given in the puzzle, going from left to right and from top to bottom, i.e. first the cells in the first row, left to right, then in the second row, left to right, etc. The greedy heuristic chooses cells according to the number of items the rule eliminates; that is, we choose a cell where applying the rule eliminates as many values as possible. Both orderings were tested on nine puzzles obtained from the booklet. Efficiency was measured in inference steps, where a “step” consists of choosing a labeled cell (singleton domain) and performing constraint propagation to remove this label from the domains of all the other cells in the same row, column and block where it appears. In this case, the default ordering required an average of 75 steps to produce a full solution, while the greedy ordering required an average of 56 steps, so there was a clear effect of inference order. (A simple analysis of the requirements of the problem gives an upper bound of 80, based on the total number of cells, and a lower bound of 30, but we have reason to believe that the latter is not a tight bound.)
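The following sketch shows how rule 1 and the two orderings might be realized on top of the representation above. It is our own hedged reconstruction, not the authors' code; the step counting follows the definition of a step given in the text.

```python
def apply_rule1(domains, neighbors, cell):
    """One rule-1 step: propagate the singleton label of `cell` by deleting
    it from the domains of all other cells in the same row, column, and block.
    Returns the cells whose domains have just become singletons."""
    (label,) = domains[cell]
    newly_labeled = []
    for other in neighbors[cell]:
        if label in domains[other] and len(domains[other]) > 1:
            domains[other].discard(label)
            if len(domains[other]) == 1:
                newly_labeled.append(other)
    return newly_labeled

def solve_rule1(domains, neighbors, greedy=True):
    """Repeatedly apply rule 1 until no labeled cell remains unprocessed.
    With greedy=True, the next cell is the one whose propagation would
    eliminate the most values; otherwise cells are taken in board order
    (left to right, top to bottom), as in the default ordering."""
    pending = {c for c in domains if len(domains[c]) == 1}
    processed, steps = set(), 0

    def payoff(c):
        (v,) = domains[c]
        return sum(1 for o in neighbors[c]
                   if v in domains[o] and len(domains[o]) > 1)

    while pending:
        cell = max(pending, key=payoff) if greedy else min(pending)
        pending.discard(cell)
        processed.add(cell)
        pending.update(c for c in apply_rule1(domains, neighbors, cell)
                       if c not in processed)
        steps += 1
    return steps
```

Since cells are stored as (row, column) tuples, taking the minimum of the pending set reproduces the row-major default ordering.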
3 Explanation Trees

3.1 Building Explanation Trees

Inference traces can account for the solution obtained, but they do not provide explanations for the specific selections made. Here, we show how it is possible to generate such explanations, guided by the solution process. Specifically, we can explain why each cell was given a certain number. Moreover, we can do this for a given cell as soon as it has been labeled, i.e., even before the final (complete) solution is obtained. In all cases, the explanation produced in this way takes the form of a tree, which we call an explanation tree. Thus, whenever the label for a cell is derived, a set of already labeled cells that form a sufficient basis for assigning that label to it can also be determined. Each of these cells, in turn, has a corresponding sufficient set, unless it is one of the original labels. Although an explanation built up in this fashion follows the solution process, it does not have to stand in direct relation to it. Consider our inference rule from the last section. In the procedure that was actually used in our experiments, whenever a value was eliminated from the domain of a cell, the location of the cell that produced this reduction was stored in association with the reduced domain. Then, when the domain was a singleton, we had an explanation of this fact that did in fact reflect the actual
basis for each domain reduction. However, we could have proceeded differently: in each case, when a label is determined, we could search for a possible explanation, i.e. a set of values that would eliminate all the other values from this domain. For example, for the puzzle given in Figure 1, with the default ordering one of the first cells whose value is specified is (2,3), which must be given a 2. This step is shown in Figure 2, where the added label is enlarged and italicized and the cells responsible for the domain reduction are in bold-face. However, the reader will note that the 3 in cell (8,3) could be used instead of the one in (2,8), to form an adequate explanation.
Figure 2. Deduction of the label 2 for cell (2,3); the cells responsible for the domain reduction are shown in bold-face.
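The bookkeeping described above can be sketched as a variant of the rule-1 propagation given in Section 2: every time a value is eliminated, the labeled cell responsible is recorded, and chaining through these records yields the explanation tree. Again, this is an illustrative reconstruction; the names reasons, originals, and explanation_tree are ours.

```python
def apply_rule1_recording(domains, neighbors, cell, reasons):
    """Rule-1 propagation that records, for every value removed from a domain,
    the labeled cell responsible.  When a domain becomes a singleton,
    reasons[c] holds the immediate explanation for c's label."""
    (label,) = domains[cell]
    newly_labeled = []
    for other in neighbors[cell]:
        if label in domains[other] and len(domains[other]) > 1:
            domains[other].discard(label)
            reasons.setdefault(other, []).append(cell)
            if len(domains[other]) == 1:
                newly_labeled.append(other)
    return newly_labeled

def explanation_tree(cell, reasons, originals):
    """Expand the recorded reasons into a tree (cell, children): the children
    of a node are the cells in its immediate explanation, and cells labeled
    in the original puzzle are the leaves."""
    if cell in originals:
        return (cell, [])
    return (cell, [explanation_tree(c, reasons, originals)
                   for c in reasons.get(cell, [])])
```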
At this point in the discussion we introduce a second rule (“rule 2”): for each empty (unlabeled) cell, determine whether a number in its domain can be excluded from every other empty cell in the same block; if so, then that cell must be given this number. For example, in the puzzle in Figure 1, a 1 must be placed in cell (3,1), as shown by the following argument. Since cells (2,4) and (5,2) contain a 1, no empty cell in row 2 or in column 2 can contain a 1. This means that four of the five empty cells in Block 1 (cells (1,2), (2,1), (2,3) and (3,2)) cannot contain a 1. However, since each block must have all the numbers from 1 to 9, and cell (3,1) is the only remaining empty cell in this block, a 1 must be placed in this cell. By repeatedly discovering cells where we can deduce the labeling in this way, we solve the entire puzzle. For the second rule, ordering of solution steps does not affect the number of steps in the solution process. This is because for this inference rule there are always 45 steps, the number of empty cells in the puzzle. (Note also that the “steps” in this procedure are not comparable to the steps in the procedure based on rule 1. However, a complete explanation based on rule 2 does seem to be more compact and satisfying in ways we hope to further formalize.)

We implement rule 2 as follows. We start with the number 1 and mark all the cells that cannot be 1 because of the original labeling. The results of this procedure using our example puzzle are shown in Figure 3, where cells marked in this way are labeled with the letter “x”. This procedure leaves two blocks (1 and 5) with only one unmarked and unlabeled cell. Since no other cell in these blocks can have the number 1, these cells ((3,1) and (6,5)) must be given this number.
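A sketch of this marking procedure, again on top of the earlier representation (the function and variable names are ours; the handling of one number at a time follows the description above):

```python
def rule2_deductions(domains, neighbors, number):
    """For one number, mark every empty cell that cannot contain it (because a
    cell in the same row, column, or block is already labeled with it), then
    return the cells that must be given the number: those left as the only
    unmarked, unlabeled cell of their block."""
    labeled = {c for c in domains if len(domains[c]) == 1}
    marked = set()
    for cell in domains:
        if cell not in labeled and any(o in labeled and number in domains[o]
                                       for o in neighbors[cell]):
            marked.add(cell)

    deduced = []
    for br in (1, 4, 7):                 # top-left corner of each block
        for bc in (1, 4, 7):
            block = [(br + i, bc + j) for i in range(3) for j in range(3)]
            if any(c in labeled and number in domains[c] for c in block):
                continue                 # the block already has this number
            open_cells = [c for c in block
                          if c not in labeled and c not in marked]
            if len(open_cells) == 1:     # only one possible place left
                deduced.append((open_cells[0], number))
    return deduced
```

Looping over the numbers 1 to 9 and repeating as new cells become labeled would yield one labeling step per empty cell, i.e. the 45 steps mentioned above.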
Figure 3. The example puzzle with every cell that cannot contain a 1 marked with the letter “x”.
At this point we can search for an explanation, i.e. a set of labels that will eliminate the number 1 from all the other cells in these blocks. For example, for cell (3,1) in Block 1, one explanation is {(1,9), (2,4), (5,2), (3,3)}. That is, the 1 in (1,9) prevents any of the cells in the first row of this block from containing the number 1, and, similarly, the 1’s in (2,4) and (5,2) exclude this number from row 2 and column 2. Finally, (3,3) is already labeled with a 9. An alternative explanation is {(1,1), (1,3), (2,2), (3,3), (2,4), (5,2)}, where the first four cells are in Block 1 and have labels different from 1, and the last two cells prevent the remaining empty cells in this block from taking this number. Note that with this rule, as with rule 1, the process of explanation-building can involve cells that are different from those used to deduce the labeling. However, in all cases the full explanation for a given cell-labeling takes the form of a tree, as described at the beginning of this subsection.

3.2 Finding Good Explanations

Since our explanations take a well-defined form in this situation (regardless of which rule we use), we can describe them quantitatively and establish criteria for goodness in this domain based on simple quantitative properties. Two obvious properties are the average number of nodes in a tree and the average number of levels in the tree. As with solution traces, we assume that smaller (or fewer) is better. This may not be an adequate criterion in general, because a very compact explanation may lose in perspicuity. However, since in the present situation the nature of the explanation is always the same for a given rule, it seems reasonable to expect that reducing the number of elements in an explanation will make it more understandable. For rule 1 we compared the two ordering heuristics described in Section 2 as well as a method that uses a cell-selection criterion directly related to our criteria for goodness. In the latter method, at each step a cell is chosen that has the fewest nodes in its explanation tree. The results are shown in Table 1 (the number of solution steps is included for reference). As expected, the smallest-tree heuristic produces a smaller tree on average than the other heuristic. Surprisingly, the default ordering is also effective; this may be because cells are selected over the entire board, which allows more of the original cells to be incorporated into the explanations and thus limits the size of the explanation trees. Times to find a solution are uniformly low.
Table 1. Characteristics of Explanation Trees Built Using Rule 1 with Different Ordering Heuristics

  ordering        solution steps   average nodes   average height   avg. time (s)
  default               75              30              2.8             0.04
  greedy                56              58              3.8             0.18
  smallest tree         74              29              2.8             0.18
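The two tree measures used in Tables 1 and 2 can be computed directly from the (cell, children) trees produced by the sketch in Section 3.1; for concreteness, assuming that representation:

```python
def tree_nodes(tree):
    """Number of nodes in an explanation tree given as (cell, children)."""
    _, children = tree
    return 1 + sum(tree_nodes(child) for child in children)

def tree_height(tree):
    """Number of levels in an explanation tree; an original label is one level."""
    _, children = tree
    return 1 + max((tree_height(child) for child in children), default=0)
```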
Different heuristics can be associated with rule 2, by choosing explanations according to different criteria, after the marking procedure has uncovered a cell that must be given a certain number. In this case, a “default” explanation set was obtained by scanning the puzzle by rows and columns to find the first set of cells that prevented the remaining cells in a block from taking a given label. This was compared with two methods designed to find good explanations. The first chooses the smallest set of cells for an explanation, i.e. the smallest set that eliminates a number from all but one empty cell in a block (cf. the example given in the previous subsection, where the first explanation, with 4 elements, would be chosen over the second, which has 6). The second chooses the set of cells with the smallest average explanation tree size. With these methods we obtain the results shown in Table 2 (as explained before, there are always 45 solution steps). To further evaluate the times required by these heuristics, we tested the rule with the default ordering and without constructing explanation trees; under these conditions the average runtime was 1.4 seconds.

Table 2. Characteristics of Explanation Trees Built Using Rule 2 with Different Selection Criteria

  criterion        average nodes   average height   avg. time (s)
  default               558             7.5              1.9
  smallest set           44             4.6              4.9
  smallest tree          19             2.9              5.1
We see that by combining the second inference rule with more careful selection of a good explanation at each step, we can further reduce the size of the explanation tree, although with some increase in computation cost.
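One way to realize the smallest-set criterion is a brute-force search over candidate explaining cells, trying the smallest subsets first. The sketch below is our own illustration (exponential in the worst case, though the candidate sets here are small); "covering" a cell means showing that it cannot take the number in question, either because it already holds a different label or because a cell labeled with that number shares its row, column, or block.

```python
from itertools import combinations

def smallest_explanation(domains, neighbors, target, number):
    """Smallest set of labeled cells showing that `number` is excluded from
    every cell of `target`'s block except `target` itself (rule 2)."""
    br = 3 * ((target[0] - 1) // 3) + 1
    bc = 3 * ((target[1] - 1) // 3) + 1
    block = [(br + i, bc + j) for i in range(3) for j in range(3)]
    to_cover = [c for c in block if c != target]
    labeled = {c for c in domains if len(domains[c]) == 1}

    def covers(explainer, covered):
        if explainer == covered:          # the cell already holds another label
            return number not in domains[covered]
        return number in domains[explainer] and covered in neighbors[explainer]

    candidates = [c for c in labeled if any(covers(c, x) for x in to_cover)]
    for size in range(1, len(candidates) + 1):    # smallest sets first
        for subset in combinations(candidates, size):
            if all(any(covers(e, x) for e in subset) for x in to_cover):
                return set(subset)
    return None
```

The smallest-tree criterion would instead score each candidate set by the sizes of the explanation trees of its members and keep the set with the smallest average.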
4 Exploring Implications by Dynamically Altering Explanations

In this context, we have also considered the problem of allowing the user to reintroduce numbers during the solving process in order to see the implications of having or not having these values in the problem. In this case, the user can provide one number at a time and the program will adjust the existing explanation trees. Eventually, we hope to expand this feature to allow dynamic problem alteration. This is, of course, important for coping with unsolvable problems and situations in which all the problem features are not available in advance.
To achieve this form of interaction, we must be able to undo effects of inferences from the cell being modified, since explanations of other cells that include this cell are no longer valid. In addition, we must undo inferences that affect the modified cell. Suppose that the domain of cell c does not contain the value k; this means its explanation includes a cell c′ with singleton domain {k}. Therefore, if k is added to the domain of c, then the inference based on cell c′ has to be undone as well. This action is carried out recursively until no level of the explanation tree is affected by the original change. In addition, other explanation trees which use affected cells must also be changed, and this action must be carried out recursively in a similar manner. We demonstrate this process using our original puzzle (Figure 1). The solution to this puzzle is shown in Figure 4.

4 7 1 6 9 3 5 2 8
6 8 3 5 1 2 9 7 4
5 2 9 7 8 4 6 3 1
9 1 2 3 5 8 4 6 7
3 6 5 4 7 1 8 9 2
7 4 8 9 2 6 3 1 5
8 5 7 1 3 9 2 4 6
2 3 4 8 6 7 1 5 9
1 9 6 2 4 5 7 8 3
Figure 4. The complete solution to the puzzle in Figure 1.

Now, consider some partial explanation trees obtained during the solution process. (In this case, the solution was found using rule 1, so a full tree has eight nodes at each level.)
{8}(8,9)                            {2}(5,6)
   ...                                 ...
{4}(8,7)   {9}(6,7)   {6}(5,8)      {6}(5,8)
Suppose the user changes the value of cell (6,7) from {9} to {9,6}. Cell (6,7) is now unlabeled. In addition, since the value of cell (6,7) is no longer a singleton, it cannot be used for inference, and thus, it cannot be in the explanation for cell (8,7). Therefore, the number 9 is added back to the domain of cell (8,7) and cell (6,7) is deleted from its explanation. Cell (8,7) is now unlabeled, and inferences based on this cell must be undone also. This process continues recursively upward and stops when cell (8,9) is unlabeled. However, we are still not done, since we have to undo the inference from cell (5,8) as well. This forces us to unlabel cell (5,6), and its domain becomes {2,6}. This process goes on recursively, and the resulting puzzle is shown in Figure 5.
Figure 5. The state of the puzzle after the change to cell (6,7) and the resulting retraction of inferences.
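The retraction just illustrated can be sketched as a recursive procedure over the recorded explanations. This is our own reconstruction on top of the reasons bookkeeping from Section 3, with dependents assumed to be maintained as its inverse index (for each cell, the cells whose explanations mention it); it is not the authors' Java implementation.

```python
def retract(cell, value, domains, reasons, dependents, originals):
    """Add `value` back to the domain of `cell` and recursively undo the
    inferences that are no longer valid.

    reasons[c]    : labeled cells whose propagation removed values from c's domain
    dependents[c] : cells whose explanations mention c (inverse of reasons)
    originals     : cells labeled in the original puzzle (never retracted)
    """
    if value in domains[cell]:
        return
    was_labeled = len(domains[cell]) == 1

    # Undo the inference that removed `value` from this cell: a cell labeled
    # with `value` can no longer appear in this cell's explanation.
    for src in [s for s in reasons.get(cell, []) if domains[s] == {value}]:
        reasons[cell].remove(src)
        dependents.get(src, set()).discard(cell)

    domains[cell].add(value)
    if not was_labeled or cell in originals:
        return

    # The cell has lost its singleton label, so every inference it supported
    # must be undone: cells whose explanations mention it get back the value
    # that its label had eliminated.
    (old_label,) = domains[cell] - {value}
    for dep in list(dependents.get(cell, ())):
        if cell in reasons.get(dep, []):
            reasons[dep].remove(cell)
        dependents[cell].discard(dep)
        retract(dep, old_label, domains, reasons, dependents, originals)
```

Under these assumptions, the user's change in the example corresponds to calling retract((6,7), 6, ...), which adds 9 back to the domain of (8,7), unlabels it, and continues up to (8,9) and across to (5,6).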
5 Visual Interface

The present system is written in Java 1.2 and runs under Microsoft Windows. A puzzle board is shown at the center of the display. To the left, menus activated from a tool bar allow the user to select a puzzle to solve as well as the rule and heuristics to be used in finding the solution. A series of buttons to the right of the main display allows the user to specify a desired action: find a complete solution, perform one step of the procedure (i.e., one rule application), clear the board in order to start a new puzzle, or show statistics for the explanation trees generated during the search for a solution. Once a solution has been found or a single cell assigned a label, the user can click on any labeled cell to evoke an explanation for that labeling. This appears in the form of a set of highlighted cells on the puzzle board and, simultaneously, in a panel to the left of the board, as the child nodes in the explanation tree. In the latter case, nodes that represent original labels are depicted differently from nodes for labels that in turn depend on other cells. The latter can be ‘opened’ by clicking, to show the next level of the tree. After each application of a rule, the action is described in a separate panel under the display. At the same time, the cells involved in the step are highlighted on the puzzle board. With rule 1, a sample description is: “apply the rule at (1,2) - #step = 15”. With rule 2, a sample description is: “the value of this cell is deduced from the value of the highlighted cells - #step 5”.
This system is available for perusal at: http://pubpages.unh.edu/~chavalit/ninepuzzle.html.
6 Summary and Conclusions

Although preliminary, this work has already shown how explanations can be built automatically for a complex inferential process. At the center of this effort is the concept of the explanation tree, which is created during the course of problem solving and can then be used as a commentary for each step of the process. In addition, we have shown how different inference rules can generate solution traces and explanations with distinctive properties. Because the explanation tree is a well-defined object, its properties can be used to distinguish explanations according to reasonable measures of goodness. We have found that explanation trees produced using different inference rules in association with different selection strategies (with respect to ordering the inference steps or choosing among immediate explanations) can show marked differences in these measures of goodness. This is a “workshop work-in-progress” paper; the conclusions are not especially surprising. However, we believe we have made a contribution in demonstrating the manner in which we can move from plausible general hypotheses (“inference order matters”) to the evaluation of concrete methods.
Acknowledgement. This work was supported by Calico Commerce. We would like to thank Nancy Schluster and Dell Magazine for permission to reproduce material from Dell Logic Puzzles.
7 References

[1] Dell Math Puzzles and Logic Problems, March 2000, p. 23.
[2] E. C. Freuder, “Synthesizing Constraint Expressions”, Comm. of the ACM, Vol. 21, No. 11, November 1978, pp. 958-966.
[3] C. Gomes and B. Selman, “Problem structure in the presence of perturbations”, Proc. 14th National Conference on Artificial Intelligence, AAAI-97, Cambridge, MA: MIT Press, 1997.
[4] “Special Issue on Configuration”, IEEE Intelligent Systems and Their Applications, July/August 1998.
[5] H. Simonis, “Application development with the CHIP system”, in: G. M. Kuper and M. Wallace (Eds.), Constraint Databases and Applications, Lecture Notes in Computer Science No. 1034, Berlin: Springer, 1996, pp. 1-21.
[6] M. H. Sqalli and E. C. Freuder, “Inference-Based Constraint Satisfaction Supports Explanation”, Proc. 13th National Conference on Artificial Intelligence, AAAI-96, Cambridge, MA: MIT Press, pp. 318-325.
[7] E. Tsang, Foundations of Constraint Satisfaction, London: Academic Press, 1993.