Kestrel Institute Technical Report KES.U.87.11
Structure and Design of Global Search Algorithms
Douglas R. Smith
Kestrel Institute
3260 Hillview Avenue
Palo Alto, California 94304-1216

9 November 1987
Revised 9 July 1988; revised October 1992
ABSTRACT

Global search is an enumerative approach to problem solving that generalizes the computational paradigms of binary search, backtracking, branch-and-bound, heuristic search, constraint satisfaction, and others. The structure common to global search algorithms is formalized as a theory that provides the basis for a design tactic that prescribes how to construct a correct global search algorithm from a given formal specification of a problem. Global search theory uses the concept of necessary conditions on the existence of feasible or optimal solutions in order to account for such well-known programming concepts as incorporating a constraint into a generator, pruning of branches in backtracking, and pruning via lower bound functions and dominance relations in branch-and-bound. The theory and tactic are illustrated by application to the problem of 0,1-integer linear programming. The design tactic has been implemented in the KIDS system and used to design dozens of algorithms.
Contents

Glossary                                                          4
1. Introduction                                                   5
2. Basic Concepts And Notation                                    7
   2.1. Language                                                  7
   2.2. Specifications                                            8
   2.3. A Simple Example: Binary Search                           8
   2.4. Inference                                                10
3. Enumerating Feasible Solutions                                12
   3.1. Abstract Structure of Global Search Algorithms           12
   3.2. Specializing a Known GS-Theory                           17
   3.3. Program Optimization                                     21
        3.3.1. Filters                                           21
        3.3.2. Simplification                                    24
        3.3.3. Other program optimizations                       25
   3.4. Summary of the Design Tactic                             25
4. Enumerating Optimal Cost Solutions                            26
   4.1. Optimization Problem Structure                           26
   4.2. Abstract GS-Theory for Optimization Problems             28
   4.3. Optimality Filters                                       29
        4.3.1. Lower bound filters                               30
        4.3.2. Dominance relations                               34
   4.4. Control strategies                                       34
5. Related Work                                                  36
   5.1. Models of Global Search Algorithms                       36
   5.2. Designing Global Search Algorithms                       37
6. Implementation in KIDS                                        38
7. Concluding Remarks                                            39
Appendix                                                         40
References                                                       42
Glossary: Symbols in global search theory

D - input domain
I - input condition
R - output domain
O - input/output predicate
BF - specification for feasibility problem F
OF - specification for optimization problem F
GF - global search theory for problem F
R̂ - type of space descriptors
Î(x) - legal space descriptors
r̂, ŝ, t̂ - space descriptor variables
r̂0(x) - initial space descriptor
Satisfies(z, r̂) - z is in the space r̂
Split(x, r̂, ŝ) - space ŝ is a subspace of r̂ with respect to input x
Extract(z, r̂) - z can be directly extracted from the space r̂
Φ(x, r̂) - necessary filter on the existence of a feasible solution in space r̂
C - cost domain
f(x, z) - cost of solution z with respect to input x
f*(x, r̂) - minimal cost over feasible solutions in r̂ with respect to input x
f*(x) - minimal cost over feasible solutions with respect to input x
lb(x, r̂) - lower bound on the cost of feasible solutions in r̂
1. Introduction
Solving a problem by intelligent enumeration of the space of candidate solutions is a pervasive and well-known paradigm in the Computer Science and Operations Research communities. In this paper we explore one common enumeration method, called global search, which generalizes the computational paradigms of binary search, backtracking, branch-and-bound, constraint satisfaction, heuristic search, and others.

The basic idea of global search is to represent and manipulate sets of candidate solutions. The principal operations are to extract solutions from a set and to split a set into subsets. Derived operations include various filters, which are used to eliminate sets containing no feasible or optimal solutions. Global search algorithms work as follows: starting from an initial set that contains all solutions to the given problem instance, the algorithm repeatedly extracts candidates, splits sets, and eliminates sets via filters until no sets remain to be split. The process can be described as a tree search in which a node represents a set of candidates and an arc represents the subset relationship between child and parent nodes. The filters serve to prune off branches of the tree that cannot lead to solutions. In our model of global search, solutions can be extracted from all nodes of the tree, not just leaves. This allows the enumeration of countably infinite sets where the tree is unbounded in breadth or depth.

The sets of candidate solutions are rarely represented extensionally. Thus global search algorithms are based on an abstract data type of intensional representations called descriptors. In addition to the extraction and splitting operations mentioned above, the type also includes a satisfaction predicate that determines when a candidate solution is in the set denoted by a descriptor. For the sake of simplifying the presentation we will use the term space (or subspace) to denote both the descriptor and the set that it denotes; it should be clear from context which meaning is intended.

The core of this paper is an axiomatic presentation of the abstract structure of global search algorithms. The axiomatic presentation allows us to capture the essence of a class of algorithms while suppressing details of programming language and control strategy. The main results of this paper are design tactics for formally deriving global search algorithms from specifications. Because they are based on the axiomatic representation, the tactics are sound: they either produce correct algorithms or they fail to complete. The tactics are abstract, and thus applicable to a broad variety of problems. They are also formal, and thus amenable to machine support. The tactics incorporate control knowledge about designing global search algorithms and thus provide highly motivated derivations compared with the use of a more general but consequently weaker derivation methodology.

The main emphasis of this paper is on formally designing correct algorithms via design tactics. Revealing the tactics' full potential requires automation and the application of program optimization techniques. The tactic was specifically designed to be automated; it has been implemented in the KIDS system [34] and used to design and optimize over forty global search algorithms. The derived algorithms naturally present opportunities for such high-level optimization techniques as simplification, partial evaluation [5], finite differencing [24], case analysis, and data structure design and refinement. See [34] for details. One goal of this paper is to present the tactics and example derivations at a level of detail that will convey to the reader a sense of how an automated system such as KIDS could derive global search algorithms.

We derive algorithms for problems in which it is desired to obtain all solutions that satisfy some feasibility constraints and, optionally, to minimize a cost function.
The tactic can be specialized to handle cases where only a single solution is desired. Example derivations of 0,1-integer linear programming algorithms are presented. In an extended version of this paper [33], we treat the problems of finding a maximum of a discrete unimodal function, shortest path in a graph, binary search, and traveling salesman. Other published applications of the tactic include k-queens [34], optimal job scheduling on a single processor [38], cyclic difference sets [36], and Costas arrays [35]. Other problems to which we have applied the tactic are listed in the Appendix.

Section 2 introduces some basic concepts and terminology. In Section 3 we formalize the notion of global search as an axiomatic theory and show how to specialize known global search theories to particular problems. Section 4 considers problem specifications involving a cost function and shows how the global search theory and design tactic can be extended to handle the enumeration of optimal cost solutions. The relation of this work to other work is discussed in Section 5. The Appendix lists some standard gs-theories that are used in this paper and which have been implemented in the KIDS system.
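The extract/split/filter loop described in this introduction can be phrased as a small worklist procedure. The following Python sketch is ours, not the paper's (the paper uses a functional specification language); the instance enumerates 0,1-vectors with a fixed sum, in the spirit of the 0,1-integer programming example treated later. All names here are illustrative.

```python
def global_search(r0, split, extract, feasible, filter_ok):
    """Worklist form of global search: maintain a set of space
    descriptors; repeatedly extract candidates, split spaces, and
    discard subspaces rejected by the filter."""
    solutions = []
    worklist = [r0]
    while worklist:
        r = worklist.pop()
        solutions.extend(z for z in extract(r) if feasible(z))
        worklist.extend(s for s in split(r) if filter_ok(s))
    return solutions

# Tiny instance: all 0/1 vectors of length n summing to exactly s.
# A space descriptor is a prefix; splitting appends 0 or 1; the filter
# discards prefixes whose sum can no longer reach exactly s.
n, s = 4, 2
found = global_search(
    r0=(),
    split=lambda r: [r + (0,), r + (1,)] if len(r) < n else [],
    extract=lambda r: [r] if len(r) == n else [],
    feasible=lambda z: sum(z) == s,
    filter_ok=lambda r: sum(r) <= s and sum(r) + (n - len(r)) >= s,
)
# found contains C(4,2) = 6 vectors
```

The filter here is exactly a necessary condition on the existence of a feasible solution in a space: any prefix violating it denotes a set containing no vector of sum s.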
2. Basic Concepts And Notation
2.1. Language
A functional specification/programming language augmented with set-theoretic data types will be used in this paper. The main type constructors and their operations (listed below) are based on those of the REFINE¹ language. The boolean type admits the usual operators and quantifiers of the predicate calculus (∧, ∨, ¬, =⇒, ⇐⇒, ∀, ∃).

Finite Sets.

    S : set(Nat)        example type declaration of a set of natural numbers
    =, ≠, ∈, ∉, ⊆       comparison predicates: (in)equality; (non)membership; improper subset
    { }                 the empty set
    {1, 2, 4, 8}        literal set former
    {f(x̄) | P(x̄)}       general set former; note that any variables that are local to a
                        set former have implicit existential quantification
    ∪, ⊎, ∩             union, disjoint union, intersection
    card(S)             number of elements in S; e.g. card({ }) = 0
    reduce(bop, S)      reduction of the set S by the commutative and associative binary
                        operator bop; e.g. reduce(+, {1, 2, 3}) = 6
    S with x            element addition
    S less x            element deletion
    arb(S)              selection of an arbitrary element from S
    arbsplit(S)         splits the set S into a 2-tuple consisting of an arbitrary element
                        of S and the remainder of S; e.g. arbsplit({1, 2, 3}) could
                        evaluate to ⟨2, {1, 3}⟩

Finite Sequences.

    A : seq(real)       example type declaration of a sequence of reals
    =, ≠, ∈, ∉          comparison predicates: (in)equality; (non)membership, e.g. 3 ∈ [2, 3, 5, 3]
    [ ]                 the empty sequence
    empty(A)            A = [ ]
    [1, 2, 4, 8]        literal sequence former
    A(i)                the ith element of A; e.g. [4, 5, 6](2) = 5
    domain(A)           the set of integers between 1 and length(A) inclusive
    range(A)            same as {A(i) | i ∈ domain(A)}
    length(A)           length of the sequence A
    first(A)            same as A(1)
    rest(A)             all but the first element; e.g. rest([4, 5, 6, 4]) = [5, 6, 4]
    append(A, x)        insert x at the end of A; e.g. append([1, 2], 3) = [1, 2, 3]
    concat(A, B)        concatenate sequences A and B

¹ REFINE is a trademark of Reasoning Systems, Inc., Palo Alto, California.
Finite Maps.

    M : map(Nat, boolean)        example type declaration of a map from natural numbers to booleans
    =, ≠                         (in)equality
    {| |}                        the empty map: the map that is undefined everywhere
    empty(M)                     M = {| |}
    {| 1 ↦ 1, 2 ↦ 4 |}           literal map former
    {| d(x) ↦ r(x) | P(x) |}     general map former
    M(x)                         map application
    domain(M)                    the set of values over which M is defined
    range(M)                     the set of values returned by M over its domain
    λ(x1, …, xn) E(x1, …, xn)    lambda expression
    M ⊕ N                        map composition: (M ⊕ N)(x) = if x ∈ domain(N) then N(x) else M(x)

Type information will be omitted when it is easily inferable from the context. Other notation will be introduced as needed.
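The ⊕ operation is the one map operation with slightly subtle semantics, so a small executable model may help. The following Python sketch is ours, not part of the paper's language; maps are modeled as dictionaries, and `compose` is a hypothetical helper implementing the defining equation for ⊕.

```python
def compose(M, N):
    """(M ⊕ N)(x) = N(x) if x ∈ domain(N), else M(x):
    N's entries take precedence over M's where the domains overlap."""
    result = dict(M)   # start from M ...
    result.update(N)   # ... and let N override on its own domain
    return result

M = {1: True, 2: False}
N = {2: True, 3: True}
MN = compose(M, N)
# domain(M ⊕ N) = domain(M) ∪ domain(N), and N wins at 2
```

In the splitting operation for map enumeration later in the paper, ⊕ is used with a singleton map {| a ↦ b |}, which simply extends a partial map at one new point.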
2.2. Specifications
A formal specification serves to define the problem for which we desire an efficient computational solution. A problem F with problem specification BF = ⟨D, R, I, O⟩ takes inputs of input type D satisfying the input condition I : D → boolean. A feasible solution z : R with respect to input x satisfies the output condition O : D × R → boolean; i.e., O(x, z) holds. Specification and program can be combined in a program specification, written

    function F(x : D) : set(R)
      where I(x)
      returns {z | O(x, z)}
      = Body.

The where clause specifies assumptions on the input data. The returns clause specifies the value returned by the function: here, the set of all feasible solutions. The optional expression Body is (possibly recursive) code that can be executed to compute F. A program specification is consistent if, for all possible inputs satisfying the input condition, the body produces the same set as specified in the returns expression; formally,

    ∀(x : D) (I(x) =⇒ F(x) = {z | O(x, z)}).                                  (1)

Here F(x) denotes the value computed by the program. In the sequel, problem specifications will often be presented in the above format, but without the body.
2.3. A Simple Example: Binary Search
In this section we view binary search as an example of global search. This will also serve to illustrate the functional language used in specifications and program bodies. Consider the problem of searching an ordered mapping for a given key, which can be specified as follows.
    function OrdSearch(n : Nat, A : map(Nat, Nat), key : Nat) : set(Nat)
      where 1 ≤ n ∧ domain(A) = {1..n} ∧ Ordered(A)
      returns {index | A(index) = key ∧ index ∈ {1..n}}

The input is a positive integer n, an ordered map A over the domain {1..n}, and a key. Ordered maps can be defined by a monotonicity law, which for Ordered(A) is: for all a and b,

    a ∈ domain(A) ∧ b ∈ domain(A) =⇒ (a ≤ b =⇒ A(a) ≤ A(b)).

The set of acceptable solutions (the feasible space) for OrdSearch is the set of indices whose image under the map A is the given key. The output domain of the OrdSearch problem is the natural numbers.

The first step in the design tactic for global search is to select a standard global search method for enumerating subranges of integers. Such a method would be available as a global search theory in a library (see Appendix). It represents sets of candidate solutions {i..j} by tuples of the form ⟨i, j⟩, so the problem becomes enriched with a new data type {⟨i, j⟩ | 1 ≤ i ≤ j ≤ n}. The pair ⟨1, n⟩ describes the initial space, which includes all feasible solutions. The operators on this type include Split, which takes ⟨i, j⟩ into ⟨i, (i + j) div 2⟩ and ⟨1 + ((i + j) div 2), j⟩ when i < j; and Extract, which specifies that i is extractable from the space ⟨i, j⟩ when i = j.

This enriched problem structure can be assembled into a concrete program by instantiating a global search program scheme (to simplify notation we use i and j as parameters rather than the tuple ⟨i, j⟩):

    function OrdSearch(n : Nat, A : map(Nat, Nat), key : Nat) : set(Nat)
      where 1 ≤ n ∧ domain(A) = {1..n} ∧ Ordered(A)
      returns {index | A(index) = key ∧ index ∈ {1..n}}
      = OrdSearch_gs(n, A, key, 1, n)

    function OrdSearch_gs(n : Nat, A : map(Nat, Nat), key : Nat, i : Nat, j : Nat) : set(Nat)
      where domain(A) = {1..n} ∧ Ordered(A) ∧ 1 ≤ i ≤ j ≤ n
      returns {index | A(index) = key ∧ index ∈ {i..j}}
      = {index | i = j ∧ key = A(i) ∧ index = i}
        ∪ {index | i < j ∧ index ∈ OrdSearch_gs(n, A, key, i, (i + j) div 2)}
        ∪ {index | i < j ∧ index ∈ OrdSearch_gs(n, A, key, 1 + ((i + j) div 2), j)}

OrdSearch_gs is a linear time algorithm that correctly finds all feasible solutions. To obtain efficiency, however, it is important to prune off spaces that cannot contain feasible solutions. This is done by deriving a necessary condition on the existence of a feasible solution in a given space ⟨i, j⟩. Technically, we derive such a necessary condition using forward inference: assume that 1 ≤ i ≤ j ≤ n and Ordered(A), and reason forward from the assumption that index is a feasible solution in the interval ⟨i, j⟩:

    index ∈ {i..j} ∧ A(index) = key.
Using the definition of Ordered, we can infer from index ∈ {i..j} and 1 ≤ i ≤ j ≤ n that A(i) ≤ A(index) ≤ A(j), and furthermore, from A(index) = key, that

    A(i) ≤ key ≤ A(j).                                                        (2)

Formula (2) is used as a filter on intervals: if it does not hold, then the interval ⟨i, j⟩ cannot contain any feasible solutions. The incorporation of filters can greatly improve the performance of a simple but inefficient algorithm. In this case, after folding in the filter and noting the new invariant, we obtain an O(log n) binary search algorithm:

    function OrdSearch(n, A, key) : set(Nat)
      where 1 ≤ n ∧ domain(A) = {1..n} ∧ Ordered(A)
      returns {index | A(index) = key ∧ index ∈ {1..n}}
      = if A(1) ≤ key ≤ A(n) then OrdSearch_gs(n, A, key, 1, n) else { }

    function OrdSearch_gs(n, A, key, i, j) : set(Nat)
      where domain(A) = {1..n} ∧ Ordered(A) ∧ 1 ≤ i ≤ j ≤ n ∧ A(i) ≤ key ≤ A(j)
      returns {index | A(index) = key ∧ index ∈ {i..j}}
      = {index | i = j ∧ key = A(i) ∧ index = i}
        ∪ {index | i < j ∧ A(i) ≤ key ≤ A((i + j) div 2)
                 ∧ index ∈ OrdSearch_gs(n, A, key, i, (i + j) div 2)}
        ∪ {index | i < j ∧ A(1 + ((i + j) div 2)) ≤ key ≤ A(j)
                 ∧ index ∈ OrdSearch_gs(n, A, key, 1 + ((i + j) div 2), j)}

The reader can find many optimization opportunities in this program: deleting unnecessary variables, pulling out common subexpressions, conversion to a case statement, simplification of program predicates with respect to the input conditions (for example, the expression A(i) ≤ key ≤ A((i + j) div 2) simplifies to key ≤ A((i + j) div 2) because A(i) ≤ key is an input invariant), etc. These optimizations are fairly routine and can be performed with a few interactive steps in KIDS.
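For readers who want to run the derived algorithm, here is a direct Python transcription. It is ours, not the paper's: the map A is modeled as a 1-indexed dictionary, and `ordsearch`/`ordsearch_gs` are our names for the two functions above.

```python
def ordsearch_gs(A, key, i, j):
    """Return all indices in {i..j} with A[idx] == key, assuming A is
    monotone (Ordered) and the filter A[i] <= key <= A[j] holds."""
    if i == j:
        return {i} if A[i] == key else set()
    mid = (i + j) // 2
    result = set()
    # filter (2): descend into a half-interval only if it can contain key
    if A[i] <= key <= A[mid]:
        result |= ordsearch_gs(A, key, i, mid)
    if A[mid + 1] <= key <= A[j]:
        result |= ordsearch_gs(A, key, mid + 1, j)
    return result

def ordsearch(A, key):
    n = len(A)
    # A is a map {1..n} -> Nat, here a dict with 1-based keys
    if n >= 1 and A[1] <= key <= A[n]:
        return ordsearch_gs(A, key, 1, n)
    return set()

A = {i + 1: v for i, v in enumerate([2, 3, 3, 5, 8, 13])}
# ordsearch(A, 3) finds every index whose image is 3
```

Note that because all solutions are returned, duplicate keys cause both half-intervals to be explored where the filter admits them; on distinct keys the recursion follows a single path, giving the familiar O(log n) behavior.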
2.4. Inference
Deductive inference is necessary for applying general knowledge to particular problems. Generally, inference tasks in this paper are specified in the following (slightly simplified) form:

    Infer(Target) (A =⇒ (Source(x1, …, xm) −→ Target(y1, …, yn)))

where A is a conjunction of assumptions, Source is the "source" expression (i.e. a term or formula), and −→ is a reflexive and transitive ordering relation over expressions, called the inference direction. For notational simplicity, all free variables are universally quantified unless otherwise noted. In words, the task is to infer some expression Target over the variables {y1, …, yn} (a subset of the free variables {x1, …, xm} of Source) such that the relationship Source(x1, …, xm) −→ Target(y1, …, yn) holds under the given assumptions. Typical inference directions include:

    forward inference         =⇒
    backward inference        ⇐=
    simplification            =
    deriving a lower bound    ≥
    deriving an upper bound   ≤
For example, the problem of deriving the filter for OrdSearch is specified as follows:

    Infer(Φ) (1 ≤ i ≤ j ≤ n ∧ Ordered(A) =⇒ (index ∈ {i..j} ∧ A(index) = key =⇒ Φ(A, key, i, j)))

Here the inference direction =⇒ means that we want to derive a necessary condition Φ on the source expression index ∈ {i..j} ∧ A(index) = key modulo the assumptions 1 ≤ i ≤ j ≤ n ∧ Ordered(A). Furthermore, the free variables of Φ must be a subset of {A, key, i, j}. The inference process involves applying a sequence of transformations to the source term; the transformations are restricted to those that preserve the specified inference direction. Deductions will be presented in the following format:

        index ∈ {i..j} ∧ A(index) = key
    =⇒      distributing element membership over range
        i ≤ index ≤ j
    =⇒      by definition of Ordered
        A(i) ≤ A(index) ≤ A(j)
    =       using A(index) = key
        A(i) ≤ key ≤ A(j).
In general we will also perform forward inference from the assumptions in order to obtain a richer assumption set: if we can infer B from assumptions A1, A2, …, An, then B can be added as a new assumption. These derived assumptions often help simplify and speed up the inference process.

This style of inference is called directed inference because of the explicit inference direction in an inference specification. Much of the work in algorithm design and optimization can be treated as highly constrained directed inference. A formal system for deriving sufficient and equivalent conditions appears in [30]. We have implemented a directed inference system called RAINBOW II that is invoked as a utility in KIDS. It can be used in either automatic or interactively guided mode.
3. Enumerating Feasible Solutions
Global search algorithms work by splitting a description of a set into descriptions of subsets. Directly enumerable solutions are extracted from each generated set. This extraction and splitting process may continue indefinitely if we are enumerating an infinite set. Global search usually also includes some predicates, called filters, to decide whether to eliminate a set descriptor from further consideration. Again, we will use the terms space and subspace to mean both descriptors and the sets that they denote.

The splitting process leads to a natural view of global search as a tree-searching procedure, where nodes represent spaces and the children of a node represent subspaces of the parent node. For finite trees a simple depth-first control strategy suffices to enumerate all nodes in the tree. In general, when there may be infinite depth and/or infinite branching factors, the search control problem is reducible to enumerating all sequences of natural numbers.

The binary search algorithm in the previous section provides a very simple example of a global search algorithm. The overall set being enumerated is {1..n}, more compactly describable as a tuple consisting of the lower and upper bounds, ⟨1, n⟩. At each stage in the computation the algorithm is working on the descriptor ⟨i, j⟩ (denoting the set {z | i ≤ z ≤ j}); it extracts a candidate solution i from singleton intervals and checks it for feasibility. The algorithm also splits ⟨i, j⟩ into subspaces ⟨i, mid⟩ and ⟨mid + 1, j⟩ where mid = (i + j) div 2. A simple test A(i) ≤ key ≤ A(j) is used to filter out spaces that do not contain indices of the given key.

For another example, suppose that we want to enumerate all total mappings from some finite set U to a finite set V. One approach is to build the desired mappings up an element at a time. A set of maps can be described by a triple ⟨S, T, M⟩ where M is a partial mapping from U to V, domain(M) = S, and U = S ⊎ T. The descriptor denotes the set of all consistent extensions of M defined over the set U. The initial set of all mappings can be represented by the triple ⟨{ }, U, {| |}⟩, and a space described by a partial map M can be split by generating alternative extensions of M. Figure 1 shows part of the tree of descriptors generated for mappings from U = {1, 2, 3} to V = {0, 1}.
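The triple-splitting scheme just described is easy to render as a Python generator (an illustration in our own names, not the paper's code): each state is a triple ⟨S, T, M⟩, splitting picks an arbitrary element of T and extends M in every possible way, and a map is extracted once T is empty.

```python
def all_total_maps(U, V):
    """Enumerate all total maps M : U -> V by splitting descriptors
    (S, T, M) where S = domain(M) and T = U \\ S."""
    def expand(S, T, M):
        if not T:                    # Extract: T = {} means M is total on U
            yield dict(M)
            return
        a = next(iter(T))            # arb(T): pick some undefined element
        for b in V:                  # Split: one subspace per choice of M(a)
            yield from expand(S | {a}, T - {a}, {**M, a: b})
    yield from expand(set(), set(U), {})

maps = list(all_total_maps({1, 2, 3}, {0, 1}))
# 2^3 = 8 total maps from {1,2,3} to {0,1}
```

The recursion tree traced by `expand` is exactly the tree of descriptors sketched in Figure 1.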
3.1. Abstract Structure of Global Search Algorithms
In this section we formalize the structure underlying global search algorithms. The formalism is of value in that (1) it allows us to clarify the nature of global search algorithms and their correctness, and (2) it provides a rigorous foundation for designing global search algorithms for particular problems. Abstractly, global search can be treated as the extension of a problem specification with an abstract data type of descriptors (R̂). The operators on this type allow for creating an initial space (r̂0), splitting spaces (Split), extracting solutions (Extract), and determining element membership (Satisfies). Axioms constrain the possible interpretations of the extension. Formally, a global search theory (or simply gs-theory) is presented as follows:²

Sorts
    D    input domain
    R    output domain
² Note that all variables in the axioms are assumed to be universally quantified unless explicitly specified otherwise.
[Figure 1 appears here in the original. Its tree of descriptors: the root ⟨{ }, {1, 2, 3}, {| |}⟩ splits into ⟨{1}, {2, 3}, {| 1 ↦ 0 |}⟩ and ⟨{1}, {2, 3}, {| 1 ↦ 1 |}⟩; these split further into descriptors such as ⟨{1, 3}, {2}, {| 1 ↦ 0, 3 ↦ 0 |}⟩, …, ⟨{1, 3}, {2}, {| 1 ↦ 1, 3 ↦ 1 |}⟩, and so on down to complete maps such as ⟨{1, 2, 3}, { }, {| 1 ↦ 1, 2 ↦ 0, 3 ↦ 1 |}⟩.]

Figure 1: Partial tree of descriptors representing maps from {1, 2, 3} to {0, 1}
    R̂    subspace descriptors

Operations
    I : D → boolean                input condition
    O : D × R → boolean            input/output condition
    Î : D × R̂ → boolean            subspace descriptor condition
    r̂0 : D → R̂                     initial space
    Satisfies : R × R̂ → boolean    denotation of descriptors
    Split : D × R̂ × R̂ → boolean    split relation
    Extract : R × R̂ → boolean      extractor of solutions from spaces

Axioms
    GS0. I(x) =⇒ Î(x, r̂0(x))
    GS1. I(x) ∧ Î(x, r̂) ∧ Split(x, r̂, ŝ) =⇒ Î(x, ŝ)
    GS2. I(x) ∧ O(x, z) =⇒ Satisfies(z, r̂0(x))
    GS3. I(x) ∧ Î(x, r̂) =⇒ (Satisfies(z, r̂) ⇐⇒ ∃(ŝ) (Split*(x, r̂, ŝ) ∧ Extract(z, ŝ)))

A gs-theory extends the problem specification ⟨D, R, I, O⟩ with the sorts, operators, and axioms underlying global search algorithms. We notate gs-theory counterparts to problem specification symbols with hats (e.g. I vs. Î). Thus, whereas R is the type of solutions, R̂ is the type of space descriptors, which denote sets of solutions. Furthermore, we use the symbols r̂, ŝ, t̂ to denote elements of R̂. Whereas I is a subtype qualifier on D characterizing legal inputs, Î is a subtype qualifier on R̂ characterizing legal space descriptors. r̂0(x) is the descriptor of the initial set of candidate solutions; Satisfies(z, r̂) means that z is in the set denoted by descriptor r̂, or that z satisfies the constraints that r̂ represents; Split(x, r̂, ŝ) means that ŝ denotes a subspace of r̂ with respect to input x; and Extract(z, r̂) means that z is directly extractable from r̂.

Axiom GS0 asserts that the initial descriptor r̂0(x) is a legal descriptor. Axiom GS1 asserts that legal descriptors split into legal descriptors. Axiom GS2 gives the denotation of the initial descriptor: all feasible solutions are contained in the initial space. Axiom GS3 gives the denotation of an arbitrary descriptor r̂: an output object z is in the set denoted by r̂ if and only if z can be extracted after finitely many applications of Split to r̂. Here we define

    Split*(x, r̂, ŝ) ⇐⇒ ∃(k : Nat) (Split^k(x, r̂, ŝ))

where

    Split^0(x, r̂, t̂) ⇐⇒ r̂ = t̂

and, for all natural numbers k,

    Split^(k+1)(x, r̂, t̂) ⇐⇒ ∃(ŝ : R̂) (Split(x, r̂, ŝ) ∧ Split^k(x, ŝ, t̂)).

Axiom GS3 interrelates and constrains the meaning of Satisfies, Split, and Extract. Î is a subtype qualifier on descriptors: only those descriptors r̂ : R̂ satisfying Î(x, r̂) will be used in global search. A descriptor r̂ : R̂ denotes a set of R values {z : R | Satisfies(z, r̂)}. It is an easy consequence of the axioms that if Split(x, r̂, ŝ) then {z | Satisfies(z, ŝ)} ⊆ {z | Satisfies(z, r̂)}.

For compactness we sometimes express a gs-theory as a tuple

    G = ⟨B, R̂, Î, r̂0, Satisfies, Split, Extract⟩

where B = ⟨D, R, I, O⟩.

A gs-theory is well-founded if an arbitrary space in R̂ can be split only finitely many times. To be more precise:

    ∀(r̂ : R̂) ∃(k : Nat) ∀(t̂ : R̂) (¬Split^k(x, r̂, t̂)).

All of the gs-theories listed in the Appendix are well-founded.³

We present a gs-theory via a theory interpretation⁴. For example, a global search theory for enumerating maps, called gs_finite_maps and corresponding to Figure 1, can be presented as follows:

    D          ↦ set(α) × set(β)
    I          ↦ λ(⟨U, V⟩) true
    R          ↦ map(α, β)
    O          ↦ λ(⟨U, V⟩, N) domain(N) = U ∧ range(N) ⊆ V
    R̂          ↦ set(α) × set(α) × map(α, β)
    Î          ↦ λ(⟨U, V⟩, ⟨S, T, M⟩) S ⊎ T = U ∧ domain(M) = S ∧ range(M) ⊆ V
    Satisfies  ↦ λ(N, ⟨S, T, M⟩) ∀(x : S) (N(x) = M(x))
    r̂0         ↦ λ(⟨U, V⟩) ⟨{ }, U, {| |}⟩
    Split      ↦ λ(⟨U, V⟩, ⟨S, T, M⟩, ⟨S′, T′, M′⟩)
                   ∃(a, b) (a = arb(T) ∧ b ∈ V ∧ ⟨S′, T′, M′⟩ = ⟨S + a, T − a, M ⊕ {| a ↦ b |}⟩)
    Extract    ↦ λ(N, ⟨S, T, M⟩) T = { } ∧ N = M

³ Treat as an extra axiom to gs-theory? Worry about finite breadth?

⁴ A theory interpretation from theory T1 to theory T2 comprises a map from the language of T1 to the language of T2 such that theorems of T1 translate to theorems of T2. That is, a theory interpretation is a validity-preserving translation of one theory into another [28].
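As a sanity check on this interpretation, the following Python sketch (our encoding, with hypothetical names `satisfies`, `split`, and `denotation`) tests the consequence noted above, that each subspace produced by Split denotes a subset of its parent space, on the instance of Figure 1.

```python
from itertools import product

# Descriptors are triples (S, T, M) with S = domain(M) and T = U \ S.
def satisfies(N, desc):
    """Satisfies(N, <S, T, M>): N agrees with M everywhere M is defined."""
    S, T, M = desc
    return all(N[x] == M[x] for x in S)

def split(U, V, desc):
    """Split: extend M at an arbitrary undefined element a, one subspace
    per choice of M(a) = b in V."""
    S, T, M = desc
    if not T:
        return []
    a = next(iter(T))                      # a = arb(T)
    return [(S | {a}, T - {a}, {**M, a: b}) for b in V]

def denotation(U, V, desc):
    """The set of total maps U -> V admitted by Satisfies for desc."""
    keys = sorted(U)
    return [dict(zip(keys, vals))
            for vals in product(sorted(V), repeat=len(keys))
            if satisfies(dict(zip(keys, vals)), desc)]

U, V = {1, 2, 3}, {0, 1}
r0 = (set(), set(U), {})                   # initial descriptor
for s in split(U, V, r0):
    # consequence of the axioms: subspace denotation ⊆ parent denotation
    assert all(N in denotation(U, V, r0) for N in denotation(U, V, s))
```

Here `denotation` is computed by brute force over all total maps, which is feasible only for tiny U and V; it plays the role of {z | Satisfies(z, r̂)} in the theory.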
Note the type parameters α and β; they are bound at design time, whereas the input parameters U and V are bound at run time. That is, given α and β we could design a global search algorithm that, when given inputs U and V, returns all maps from U to V.

A recursive algorithm scheme can be derived in abstract gs-theory. The algorithm explores the tree of subspace descriptors defined by Split applied recursively to the initial descriptor r̂0. For a descriptor r̂, the algorithm returns the set of directly extractable solutions {z | Extract(z, r̂) ∧ O(x, z)} together with the solutions obtained recursively for each descriptor ŝ that r̂ splits into. Note that this scheme is naturally parallel.

Theorem 3.1 If GF is a well-founded gs-theory then the following multilinear recursive program specification is consistent.

    function F(x : D) : set(R)
      where I(x)
      returns {z | O(x, z)}
      = F_gs(x, r̂0(x))

    function F_gs(x : D, r̂ : R̂) : set(R)
      where I(x) ∧ Î(x, r̂)
      returns {z | Satisfies(z, r̂) ∧ O(x, z)}
      = {z | Extract(z, r̂) ∧ O(x, z)} ∪ reduce(∪, {F_gs(x, ŝ) | Split(x, r̂, ŝ)})

Proof: The main challenge is to show the consistency of F_gs:

    F_gs(x, r̂) = {z | Satisfies(z, r̂) ∧ O(x, z)}.                             (3)

Assuming (3), the first specification is consistent by the following argument:

    F(x) = F_gs(x, r̂0(x))
         = {z | Satisfies(z, r̂0(x)) ∧ O(x, z)}    by (3)
         = {z | O(x, z)}                          using Axiom GS2.

To establish (3) we start with the right-hand side:

        {z | Satisfies(z, r̂) ∧ O(x, z)}
    =       applying GS3 and dropping the quantifier on ŝ
        {z | Split*(x, r̂, ŝ) ∧ Extract(z, ŝ) ∧ O(x, z)}
    =       case analysis on Split*
        {z | (Split^0(x, r̂, ŝ) ∨ ∃(k) Split^(k+1)(x, r̂, ŝ)) ∧ Extract(z, ŝ) ∧ O(x, z)}
    =       distributing, simplifying, and decomposing Split^(k+1)
        {z | Extract(z, r̂) ∧ O(x, z)}
          ∪ {z | ∃(k)(∃(t̂)(Split(x, r̂, t̂) ∧ Split^k(x, t̂, ŝ)) ∧ Extract(z, ŝ) ∧ O(x, z))}
    =       regrouping
        {z | Extract(z, r̂) ∧ O(x, z)}
          ∪ {z | ∃(t̂)((∃(k) Split^k(x, t̂, ŝ) ∧ Extract(z, ŝ)) ∧ Split(x, r̂, t̂) ∧ O(x, z))}
    =       dropping quantifiers and applying the definition of Split*
        {z | Extract(z, r̂) ∧ O(x, z)}
          ∪ {z | (Split*(x, t̂, ŝ) ∧ Extract(z, ŝ)) ∧ Split(x, r̂, t̂) ∧ O(x, z)}
    =       applying GS3
        {z | Extract(z, r̂) ∧ O(x, z)}
          ∪ {z | Satisfies(z, t̂) ∧ Split(x, r̂, t̂) ∧ O(x, z)}
    =       reformulating
        {z | Extract(z, r̂) ∧ O(x, z)}
          ∪ reduce(∪, {{z | Satisfies(z, ŝ) ∧ O(x, z)} | Split(x, r̂, ŝ)})
    =       by the induction hypothesis
        {z | Extract(z, r̂) ∧ O(x, z)}
          ∪ reduce(∪, {F_gs(x, ŝ) | Split(x, r̂, ŝ)})
    =       by definition
        F_gs(x, r̂)

QED

Theorem 3.1 provides a program scheme that can be instantiated to yield a concrete consistent program for a given problem.

Example: Enumerating maps
A theory interpretation shows how to translate each gs-theory symbol in the abstract program into an expression in map theory, resulting in a correct, concrete program. Applying Theorem 3..1 to the gs-theory gs finite maps we obtain the program specification: function F inite M aps(U : set(α), V : set(β)) : set(map(U, V )) where true returns {N | domain(N ) = U ∧ range(N ) ⊆ V } = F inite M aps gs(U, V, { }, U, {| |}) function F inite M aps gs(U, V, S, T, M ) : set(map(set(α), set(β))) where S ] T = U ∧ domain(M ) = S ∧ range(M ) ⊆ V returns {N | domain(N ) = U ∧ range(N ) ⊆ V } = {N | T = { } ∧ N = M } ∪ reduce(∪, {F inite M aps gs(U, V, S 0 , T 0 , M 0 ) | T 6= { } ∧ a = arb(T ) ∧ b ∈ V ∧ hS 0 , T 0 , M 0 i = hS + a, T − a, M ⊕{| a 7→ b |}i})
In the last line above a is bound to a single value arb(T ) and b varies over all elements of the set V. End of Example.
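To make the scheme concrete, the following Python transcription of this generator may be helpful. It is an illustrative sketch, not part of the formal development: the names `finite_maps` and `finite_maps_gs` are ours, dictionaries stand in for the partial maps M, and arb(T) is implemented by an arbitrary fixed choice (here `min`).

```python
def finite_maps_gs(V, S, T, M):
    """Enumerate all maps N with domain S | T and range within V that
    extend the partial map M (whose domain is S)."""
    if not T:
        yield dict(M)                  # Extract: T = {} and N = M
        return
    a = min(T)                         # arb(T): any fixed element of T works
    for b in V:                        # b varies over all elements of V
        # Split: move a from T to S and extend M with a -> b
        yield from finite_maps_gs(V, S | {a}, T - {a}, {**M, a: b})

def finite_maps(U, V):
    """All maps N with domain(N) = U and range(N) a subset of V."""
    return list(finite_maps_gs(set(V), set(), set(U), {}))
```

For example, `finite_maps({1, 2}, {0, 1})` produces the four maps from {1, 2} to {0, 1}; in general the generator yields |V|^|U| maps.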
3.2. Specializing a Known GS-Theory
Theorem 3.1 provides a way to construct a global search algorithm once a global search theory is at hand; but how can we construct a global search theory for a given problem specification? The key idea is to maintain a library of global search theories and to have a method for specializing a library theory to the given problem. For example, in this section we specialize the library theory gs_finite_maps to obtain a global search theory for 0,1-Integer Linear Programming.

Let B_F = ⟨D_F, R_F, I_F, O_F⟩ and B_G = ⟨D_G, R_G, I_G, O_G⟩ be problem specifications.

Definition. B_F specializes B_G if R_F = R_G and

∀(x : D_F) ∃(y : D_G) ∀(z : R_F) ((I_F(x) ⟹ I_G(y)) ∧ (O_F(x, z) ⟹ O_G(y, z))).
(4)
In words, for any input x in problem F, there is some input y in problem G such that two conditions hold: first, if x is a legal input to F, then y is a legal input to G; second, if z is a feasible solution to input x in problem F, then z is also a feasible solution to input y in problem G. Thus, an algorithm for B_G can be used to enumerate a superset of the feasible solutions of B_F. A constructive proof of (4) yields a witness t(x) to the existential that translates F inputs to G inputs.

Definition. B_F specializes to B_G with translation t(x) if

∀(x : D_F) ∀(z : R_F) ((I_F(x) ⟹ I_G(t(x))) ∧ (O_F(x, z) ⟹ O_G(t(x), z))).
The following theorem allows us to construct a global search theory for an arbitrary problem B_F if we can find a translation that specializes a known global search theory to B_F:
Theorem 3.2. Let G_G = ⟨B_G, R̂, Î, r̂₀, Satisfies, Split, Extract⟩ be a global search theory, and let B_F = ⟨D_F, R_F, I_F, O_F⟩ be a problem theory that specializes B_G = ⟨D_G, R_G, I_G, O_G⟩ with translation t(x). Then the structure

G_F = ⟨B_F, R̂, λ(x, r̂) Î(t(x), r̂), λ(x) r̂₀(t(x)), Satisfies, λ(x, r̂, ŝ) Split(t(x), r̂, ŝ), Extract⟩

is a global search theory.

Proof: Clearly the syntactic components are present and correctly typed. We must show that the axioms of global search are satisfied in G_F. We will show this for two axioms; the others are analogous.

First, for GS0 we must establish that ∀(x : D_F)(I_F(x) ⟹ Î_F(x, r̂₀F(x))). For any x : D_F,

I_F(x)
⟹    by the assumption that B_F specializes B_G
I_G(t(x))
⟹    by axiom GS0
Î(t(x), r̂₀(t(x)))
⟺    by definition of r̂₀F and Î_F
Î_F(x, r̂₀F(x)).

This establishes GS0 for G_F.

Second, assuming that axiom GS3 holds in G_G we need to show that GS3 holds in G_F:

I_F(x) ∧ Î_F(x, r̂) ⟹ (Satisfies_F(z, r̂) ⟺ ∃(ŝ)(Split*_F(x, r̂, ŝ) ∧ Extract_F(z, ŝ))).

First, assume I_F(x) ∧ Î_F(x, r̂) where x : D_F and r̂ : R̂_F. Since B_F specializes B_G we infer I_G(t(x)). From the definition of Î_F(x, r̂) we infer Î(t(x), r̂). We can then reason as follows:
Satisfies_F(z, r̂)
⟺    by construction
Satisfies(z, r̂)
⟺    by GS3 for G_G, using the derived assumptions
∃(ŝ)(Split*(t(x), r̂, ŝ) ∧ Extract(z, ŝ))
⟺    by the definitions of Split*_F and Extract_F
∃(ŝ)(Split*_F(x, r̂, ŝ) ∧ Extract_F(z, ŝ)).
We conclude that GS3 holds for G_F. QED

Example: 0,1-Integer Linear Programming (ILP). Suppose that we want to find all mappings from {1..n} to {0, 1} (i.e., vectors of zeroes and ones of length n) that satisfy the linear inequalities Ax ≤ b, where

Ax ≤ b  ⟺  ∀(i)(i ∈ {1..m} ⟹ Σ_{j=1}^{n} A(i, j)x(j) ≤ b(i)).

The problem can be specified as follows.

ILP(m : Nat, n : Nat, A : map(Nat × Nat, integer), b : map(Nat, integer)) : set(map(Nat, Nat))
  where 1 ≤ n ∧ 1 ≤ m ∧ domain(A) = {1..m} × {1..n} ∧ domain(b) = {1..m}
  returns {x | domain(x) = {1..n} ∧ range(x) ⊆ {0, 1} ∧ Ax ≤ b}
There is a complete reduction from ILP to the gs-theory gs_finite_maps. To see this, we first determine that the output types can be made equal (unification of map(Nat, Nat) and map(α, β) yields the substitution {α ↦ Nat, β ↦ Nat}), and then instantiate and verify (4).

∀⟨m, n, A, b⟩ : Nat × Nat × map(Nat × Nat, integer) × map(Nat, integer)
∃⟨U, V⟩ : set(Nat) × set(Nat)
∀x : map(Nat, Nat)
  (1 ≤ n ∧ 1 ≤ m ∧ domain(A) = {1..m} × {1..n} ∧ domain(b) = {1..m}
   ∧ domain(x) = {1..n} ∧ range(x) ⊆ {0, 1} ∧ Ax ≤ b
   ⟹ domain(x) = U ∧ range(x) ⊆ V)
The proof is straightforward and results in the translation {U ↦ {1..n}, V ↦ {0, 1}}. According to Theorem 3.2, we then construct the following specialized gs-theory for 0,1-integer linear programming.

D         ↦ Nat × Nat × map(Nat × Nat, integer) × map(Nat, integer)
I         ↦ λ(⟨m, n, A, b⟩) 1 ≤ n ∧ 1 ≤ m ∧ domain(A) = {1..m} × {1..n} ∧ domain(b) = {1..m}
R         ↦ map(Nat, Nat)
O         ↦ λ(⟨m, n, A, b⟩, x) domain(x) = {1..n} ∧ range(x) ⊆ {0, 1} ∧ Ax ≤ b
R̂         ↦ set(Nat) × set(Nat) × map(Nat, Nat)
Î         ↦ λ(⟨m, n, A, b⟩, ⟨S, T, M⟩) S ⊎ T = {1..n} ∧ domain(M) = S ∧ range(M) ⊆ {0, 1}
r̂₀        ↦ λ(⟨m, n, A, b⟩) ⟨{ }, {1..n}, {| |}⟩
Satisfies ↦ λ(x, ⟨S, T, M⟩) ∀(i)(i ∈ S ⟹ x(i) = M(i))
Split     ↦ λ(⟨m, n, A, b⟩, ⟨S, T, M⟩, ⟨S′, T′, M′⟩)
              ∃(a, r)(a = arb(T) ∧ r ∈ {0, 1} ∧ ⟨S′, T′, M′⟩ = ⟨S + a, T − a, M ⊕ {| a ↦ r |}⟩)
Extract   ↦ λ(x, ⟨S, T, M⟩) T = { } ∧ x = M
Theorem 3.1 allows us to assemble a program that enumerates the feasible space of ILP:

function ILP(m, n, A, b)
  where 1 ≤ n ∧ 1 ≤ m ∧ domain(A) = {1..m} × {1..n} ∧ domain(b) = {1..m}
  returns {x | domain(x) = {1..n} ∧ range(x) ⊆ {0, 1} ∧ Ax ≤ b}
  = ILP_gs(m, n, A, b, { }, {1..n}, {| |})

function ILP_gs(m, n, A, b, S, T, M)
  where 1 ≤ n ∧ 1 ≤ m ∧ domain(A) = {1..m} × {1..n} ∧ domain(b) = {1..m}
    ∧ S ⊎ T = {1..n} ∧ domain(M) = S ∧ range(M) ⊆ {0, 1}
  returns {x | ∀(i)(i ∈ S ⟹ x(i) = M(i)) ∧ domain(x) = {1..n} ∧ range(x) ⊆ {0, 1} ∧ Ax ≤ b}
  = {x | T = { } ∧ x = M ∧ domain(x) = {1..n} ∧ range(x) ⊆ {0, 1} ∧ Ax ≤ b}
    ∪ reduce(∪, { ILP_gs(m, n, A, b, S′, T′, M′) |
                  ∃(a, r)(a = arb(T) ∧ r ∈ {0, 1}
                  ∧ ⟨S′, T′, M′⟩ = ⟨S + a, T − a, M ⊕ {| a ↦ r |}⟩)}).

This is a correct ILP algorithm, but it enumerates all 2^n mappings and so it is not very efficient. Various ways to improve its performance are described below. End of example ILP.

The result of the specialization process is another gs-theory G_F which can be added to the library if desired, and thus reused. This approach allows the design system to be involved in the incremental acquisition of its own library. Furthermore it allows concentration of the knowledge base in specialized domains. As a simple example, a generator of mappings could be specialized to a generator of permutations, which could then serve as a starting point for developing an enumerator for the k-queens or traveling salesman problems.
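The behavior of this unfiltered algorithm is easy to reproduce. The Python sketch below is our illustration, not the derived program: the name `ilp_naive` and the representation of A and b as dictionaries indexed from 1 are assumptions of the sketch. It enumerates all 2^n maps and tests Ax ≤ b on each, which is exactly the work the program above performs.

```python
from itertools import product

def ilp_naive(m, n, A, b):
    """Enumerate all feasible 0,1-vectors by trying all 2**n candidates.
    A is a dict keyed by (i, j), b a dict keyed by i; indices start at 1."""
    feasible = []
    for bits in product((0, 1), repeat=n):
        x = {j + 1: bits[j] for j in range(n)}   # a map {1..n} -> {0, 1}
        if all(sum(A[i, j] * x[j] for j in range(1, n + 1)) <= b[i]
               for i in range(1, m + 1)):
            feasible.append(x)
    return feasible
```

For instance, with the single constraint x(1) + x(2) ≤ 1 the enumerator tries four candidates and keeps three.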
3.3. Program Optimization
Theorem 3.1 and its analogues allow us to infer a consistent but possibly inefficient algorithm from a gs-theory. While the algorithm will be well-structured, it may perform expensive and unnecessary computation. In the following sections we discuss several methods for improving the performance of global search algorithms: deriving feasibility filters, expression simplification, finite differencing, etc.
3.3.1. Filters
In this section and in Section 4.3 we define several common types of filters – predicates that are used to eliminate spaces from further consideration. In terms of the search tree model, filters can be used to prune off branches of the search tree that cannot yield solutions or to focus search on branches that are known to lead to solutions. To be more precise, a global search program explores the set of spaces

{ŝ | Split*(x, r̂₀(x), ŝ)}.

A filter ψ : D × R̂ → Boolean is used to limit the search to

{ŝ | Split*(x, r̂₀(x), ŝ) ∧ ψ(x, ŝ)}.

Clearly, the stronger the filter the fewer the spaces that are explored by the global search algorithm. That is, if ψ₁(x, r̂) ⟹ ψ₂(x, r̂) (i.e., ψ₁ is stronger than ψ₂) then

{ŝ | Split*(x, r̂₀(x), ŝ) ∧ ψ₁(x, ŝ)} ⊆ {ŝ | Split*(x, r̂₀(x), ŝ) ∧ ψ₂(x, ŝ)}.

Thus, all else being equal, we are motivated to derive as strong a filter as possible. This situation is complicated by two facts, however. First, a stronger filter may eliminate some spaces that contain solutions. For example, the strongest possible filter (false) filters out all spaces, so that the search algorithm returns no solutions. Second, the complexity of computing the filter itself has a strong effect on the performance of the global search algorithm. It is possible that a weaker but cheaper filter will outperform a stronger but expensive filter.

The relationship between the semantic strength of a filter and the quality of the solution set can be clarified by the following classification of filters. The question of interest when seeking feasible solutions is "Does there exist a feasible solution in a given space r̂?". Formally this is

∃(z : R)(Satisfies(z, r̂) ∧ O(x, z))
(5)
We might call (5) the ideal filter since a global search algorithm using it (or an equivalent) would explore exactly the set of spaces needed to find all solutions. However, to use it directly would usually be too expensive, so instead we use various approximations to it. These approximations can be classified as either

(i) necessary feasibility filters, where ∃(z : R)(Satisfies(z, r̂) ∧ O(x, z)) ⟹ ψ(x, r̂);
(ii) sufficient feasibility filters, where ψ(x, r̂) ⟹ ∃(z : R)(Satisfies(z, r̂) ∧ O(x, z));

(iii) heuristic feasibility filters, which are neither necessary nor sufficient conditions on (5), but are nonetheless approximations to it.

Necessary feasibility filters only eliminate spaces that do not contain solutions (if ¬ψ(x, r̂) then by the contrapositive of (i) there do not exist any feasible solutions in the space denoted by r̂), but they may also allow spaces that do not contain solutions to be explored. So they guarantee the ability to find all solutions but may allow unnecessary work. Sufficient filters eliminate all spaces that do not contain solutions, but may filter out some spaces that do contain solutions. Thus they do no unnecessary work, but only return a subset of the solutions. Heuristic filters have the disadvantages of both – they may eliminate spaces containing solutions and allow spaces that do not contain solutions to be explored. However, a fast heuristic approximation to the ideal filter may have the best performance in practice.

If a filter ψ is derived as an approximation to the ideal filter, then it depends on both the particular problem being solved (via the feasibility conditions) and the general notions of global search (via the abstract data type of spaces). The inference of ψ is thus crucial to the construction of an effective global search algorithm. The derivation and use of feasibility filters can be viewed as the incorporation of problem-specific information into the more general-purpose generator supplied by the gs-theory.

For several reasons we will concentrate mainly on necessary feasibility filters in this paper. First, they are almost always worth deriving since they improve performance without affecting the set of solutions that are computed. Second, algorithms for optimization problems (see Section 4) are based on the ability to enumerate all feasible solutions, since any one of them may be optimal.
One feature of necessary feasibility filters is that one is immediately available: the constant true. Stronger filters are obtained with more investment of computational resources at design time. Sufficient filters and examples may be found in [33]. Heuristic filters have been used extensively in artificial intelligence applications, such as job-shop scheduling [8].

Necessary feasibility filters, denoted Φ, are defined by the condition

∀(x : D) ∀(r̂ : R̂) ∀(z : R) (I(x) ∧ Î(x, r̂) ⟹ (Satisfies(z, r̂) ∧ O(x, z) ⟹ Φ(x, r̂))).
(6)
The following proposition simply states that necessary feasibility filters allow the global search scheme to compute all solutions.

Proposition 3.1. If G_F is a well-founded gs-theory and Φ is a necessary feasibility filter, then the following program specification is consistent.

function F(x : D) : set(R)
  where I(x)
  returns {z | O(x, z)}
  = if Φ(x, r̂₀(x)) then F_gs(x, r̂₀(x)) else { }

function F_gs(x : D, r̂ : R̂) : set(R)
  where I(x) ∧ Î(x, r̂) ∧ Φ(x, r̂)
  returns {z | Satisfies(z, r̂) ∧ O(x, z)}
  = {z | Extract(z, r̂) ∧ O(x, z)}
    ∪ reduce(∪, { F_gs(x, ŝ) | Split(x, r̂, ŝ) ∧ Φ(x, ŝ)}).

Note that Φ(x, r̂) is an invariant of F_gs since it is tested prior to each call to F_gs.

Example: Integer Linear Programming. We can obtain a necessary feasibility filter for ILP according to the inference specification

Infer(Φ) (1 ≤ n ∧ 1 ≤ m ∧ domain(A) = {1..m} × {1..n} ∧ domain(b) = {1..m}
  ∧ S ⊎ T = {1..n} ∧ domain(M) = S ∧ range(M) ⊆ {0, 1}
  ⟹ (∀(j)(j ∈ S ⟹ x(j) = M(j)) ∧ domain(x) = {1..n} ∧ range(x) ⊆ {0, 1} ∧ Ax ≤ b
      ⟹ Φ(m, n, A, b, S, T, M))).

One filter can be derived by transforming Ax ≤ b as follows.
Ax ≤ b
⟺    by definition
∀i ∈ {1..m} (Σ_{j∈{1..n}} A(i, j)x(j) ≤ b(i))
⟺    applying S ⊎ T = {1..n}
∀i ∈ {1..m} (Σ_{j∈S} A(i, j)x(j) + Σ_{j∈T} A(i, j)x(j) ≤ b(i))
⟺    using x(j) = M(j) when j ∈ S
∀i ∈ {1..m} (Σ_{j∈S} A(i, j)M(j) + Σ_{j∈T} A(i, j)x(j) ≤ b(i))
⟹    by transitivity and the fact that min(A(i, j)·0, A(i, j)·1) ≤ A(i, j)x(j)
∀i ∈ {1..m} (Σ_{j∈S} A(i, j)M(j) + Σ_{j∈T} min(0, A(i, j)) ≤ b(i)).
The final formula can be used as a feasibility filter. In words, it states that for each i the partial solution map M can be plausibly extended in such a way that its inner product with A(i, ·) falls under the bound b(i). This is not a sufficient condition because there is no constraint that the plausible extensions be the same for different values of i.
function ILP(m, n, A, b)
  where 1 ≤ n ∧ 1 ≤ m ∧ domain(A) = {1..m} × {1..n} ∧ domain(b) = {1..m}
  returns {x | domain(x) = {1..n} ∧ range(x) ⊆ {0, 1} ∧ Ax ≤ b}
  = if ILP_filter(m, n, A, b, { }, {1..n}, {| |})
    then ILP_gs(m, n, A, b, { }, {1..n}, {| |})
    else { }

function ILP_gs(m, n, A, b, S, T, M)
  where 1 ≤ n ∧ 1 ≤ m ∧ domain(A) = {1..m} × {1..n} ∧ domain(b) = {1..m}
    ∧ S ⊎ T = {1..n} ∧ domain(M) = S ∧ range(M) ⊆ {0, 1}
    ∧ ILP_filter(m, n, A, b, S, T, M)
  returns {x | ∀(i)(i ∈ S ⟹ x(i) = M(i)) ∧ domain(x) = {1..n} ∧ range(x) ⊆ {0, 1} ∧ Ax ≤ b}
  = {x | T = { } ∧ x = M ∧ domain(x) = {1..n} ∧ range(x) ⊆ {0, 1} ∧ Ax ≤ b}
    ∪ reduce(∪, { ILP_gs(m, n, A, b, S′, T′, M′) |
                  ∃(a, r)(a = arb(T) ∧ r ∈ {0, 1}
                  ∧ ⟨S′, T′, M′⟩ = ⟨S + a, T − a, M ⊕ {| a ↦ r |}⟩)
                  ∧ ILP_filter(m, n, A, b, S′, T′, M′)}).
function ILP_filter(m, n, A, b, S, T, M)
  = ∀(i)(i ∈ {1..m} ⟹ Σ_{j∈S} A(i, j)M(j) + Σ_{j∈T} min(0, A(i, j)) ≤ b(i))

Figure 2: 0,1-Integer Linear Programming Algorithm with Filter
Incorporating the derived filter according to Proposition 3.1 we obtain the consistent program specification in Figure 2. End of example.

The filter Φ will often dramatically reduce the amount of work needed to enumerate the feasible space. However, it is notoriously difficult to quantify the complexity of enumeration algorithms.
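The pruning effect of the filter can be observed directly. The following Python rendering of the algorithm in Figure 2 is our own sketch: the node counter is added purely to observe how many spaces are explored, and `min(T)` stands in for arb(T). Note that when T = { } the filter test is exactly AM ≤ b, so no separate feasibility test is needed at the leaves (the simplification discussed in Section 3.3.2).

```python
def ilp_filter(m, A, b, S, T, M):
    """Necessary feasibility filter: the committed part of M plus the most
    optimistic completion over T must satisfy every constraint row."""
    return all(sum(A[i, j] * M[j] for j in S)
               + sum(min(0, A[i, j]) for j in T) <= b[i]
               for i in range(1, m + 1))

def ilp_gs(m, A, b, S, T, M, stats):
    stats["nodes"] += 1
    if not T:
        yield dict(M)          # the filter with T = {} is exactly AM <= b
        return
    a = min(T)                 # arb(T)
    for r in (0, 1):
        S2, T2, M2 = S | {a}, T - {a}, {**M, a: r}
        if ilp_filter(m, A, b, S2, T2, M2):    # prune filtered subspaces
            yield from ilp_gs(m, A, b, S2, T2, M2, stats)

def ilp(m, n, A, b):
    stats = {"nodes": 0}
    S, T, M = set(), set(range(1, n + 1)), {}
    if not ilp_filter(m, A, b, S, T, M):
        return [], stats
    return list(ilp_gs(m, A, b, S, T, M, stats)), stats
```

On the constraint x(1) + x(2) + x(3) ≤ 1 the filtered search visits fewer nodes than the 15-node unfiltered search tree while still returning all four feasible solutions.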
3.3.2. Simplification
Various simplifications that exploit context can be identified in the abstract form of the program specifications in Proposition 3.1. The most obvious possibility is to simplify predicates with respect to the input conditions. For example:

1. The predicate Φ(x, r̂₀(x)) can usually be simplified with respect to the input assumption I(x).

2. The expression Extract(z, r̂) ∧ O(x, z) can often be simplified with respect to the context I(x) ∧ Î(x, r̂) ∧ Φ(x, r̂).
3. The expression Φ(x, ŝ) can often be simplified with respect to the context I(x) ∧ Î(x, r̂) ∧ Split(x, r̂, ŝ) ∧ Φ(x, r̂).

Example: Integer Linear Programming. Several simplification opportunities arise in ILP: (1) replace Σ_{j∈{ }} A(i, j)M(j) by 0 and continue simplifying, and (2) eliminate the test AM ≤ b since it follows from the input conditions and T = { }. The filter Φ does not simplify in this case. End of example.
3.3.3. Other program optimizations
There are often computationally expensive expressions remaining in the derived program that can be transformed via finite differencing techniques [24, 34] into less expensive incremental computations. In particular, subexpressions of the filter Φ can often be incrementally maintained rather than computed each time. For example, one opportunity in ILP is to maintain a mapping Sum where

Sum = {| i ↦ Σ_{j∈S} A(i, j)M(j) + Σ_{j∈T} min(0, A(i, j)) | i ∈ {1..m} |}.    (7)

We add Sum as a new input variable to ILP_gs and maintain (7) as an invariant. This involves another application of directed inference, with the result that Sum is initialized to

{| i ↦ Σ_{j∈T} min(0, A(i, j)) | i ∈ {1..m} |}

and is updated in parallel with the split operation to

{| i ↦ Sum(i) + A(i, a)r − min(A(i, a), 0) | i ∈ {1..m} |}.

The use of Sum dramatically speeds up the computation of the filter Φ. There are obvious opportunities for other optimizations here, such as pulling out common subexpressions, converting the set-union in ILP_gs to an if-then-else expression, and so on. See [34] for more detailed examples of applying simplification, partial evaluation, finite differencing, and other optimizations following algorithm design.
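The initialization and update rules can be checked mechanically against recomputation from scratch. The Python sketch below is ours; it assumes the same dictionary representation of A and M used earlier and is meant only to illustrate the finite differencing step:

```python
def sum_from_scratch(m, A, S, T, M):
    """Direct evaluation of the invariant (7)."""
    return {i: sum(A[i, j] * M[j] for j in S)
               + sum(min(0, A[i, j]) for j in T)
            for i in range(1, m + 1)}

def sum_update(Sum, A, m, a, r):
    """Incremental update when Split moves a from T to S with M(a) = r:
    Sum(i) becomes Sum(i) + A(i, a)*r - min(A(i, a), 0)."""
    return {i: Sum[i] + A[i, a] * r - min(A[i, a], 0) for i in range(1, m + 1)}
```

Each split now costs O(m) to keep Sum current, instead of O(m·n) to recompute the filter's sums from scratch.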
3.4. Summary of the Design Tactic
The design steps presented in the previous sections are summarized below. The design tactic depends on the existence of a library of standard gs-theories for common abstract data types (see the Appendix) and a directed inference system. It is assumed that a specification B_F = ⟨D, I, R, O⟩ is given.

0. Build a problem domain theory. The user builds up a domain theory by defining problem-specific concepts via appropriate types and functions. The domain theory provides the language for expressing the problem at hand. The user also provides laws that allow high-level reasoning about the defined concepts. Our experience has been that distributive and monotonicity laws provide most of the laws that are needed to support design and optimization.

1. Select a known gs-theory G_G and specialize it to F. We set up the appropriate instance of formula (4) and verify it. The resulting translation is applied according to Theorem 3.2 to construct a gs-theory for F. Theorem 3.1 can be applied to obtain a consistent concrete program specification.

2. Derive a filter. A necessary feasibility filter is obtained by deriving a formula Φ(x, r̂) satisfying formula (6). Proposition 3.1 or its analogues show how to incorporate the filter.

3. Perform optimizations. Various subexpressions of the recurrence can be simplified with respect to local context. Partial evaluation, finite differencing, case analysis, datatype refinement, and other well-known optimization techniques are often applicable. See [38, 34] for complete derivations involving algorithm design, optimization, and data type refinement.
4. Enumerating Optimal Cost Solutions
In this section we consider the design of global search algorithms for optimization problems in which a cost is assigned to each feasible solution and it is desired to find some or all feasible solutions that have optimal cost.
4.1. Optimization Problem Structure
Given a binary preference relation p over a finite set S, an element of S is optimal if it is preferable to all others in S. Formally,

optimal(y, p, S) ⟺ (y ∈ S ∧ ∀(z)(z ∈ S ⟹ p(y, z)))

optima(p, S) = {y | optimal(y, p, S)} = {y | y ∈ S ∧ ∀(z)(z ∈ S ⟹ p(y, z))}.

A specification for an optimization problem has the form

function F(x : D) : set(R)
  where I(x)
  returns optima(λ(y, z) f(x, y) ≤ f(x, z), {z | O(x, z)})

where f is a cost function defined over D × R and ≤ is a total order on the range of f. We will restrict our attention to finite feasible spaces so that the set of optima is well-defined. For example, the optimization version of 0,1-Integer Linear Programming can be specified as follows.
function ILP(m : Nat, n : Nat, A : map(Nat × Nat, integer), b : map(Nat, integer), c : map(Nat, Nat)) : set(map(Nat, Nat))
  where 1 ≤ n ∧ 1 ≤ m ∧ domain(A) = {1..m} × {1..n} ∧ domain(b) = {1..m} ∧ domain(c) = {1..n}
  returns optima(λ(x, y) c · x ≤ c · y, {x | domain(x) = {1..n} ∧ range(x) ⊆ {0, 1} ∧ Ax ≤ b})

The problem is to find all vectors of zeroes and ones (of length n) that satisfy the linear inequalities Ax ≤ b and minimize the cost function c · x, where

c · x = Σ_{j=1}^{n} c(j)x(j).
An optimization specification O = ⟨D, I, R, O, C, le, f⟩ is a problem specification extended with the cost structure

1. Cost domain: C
2. Cost ordering: le : C × C → Boolean
3. Cost function: f : D × R → C

plus the axioms of a total order on C: reflexivity, antisymmetry, transitivity, and totality. For example, the ILP specification can be presented as

D  ↦ Nat × Nat × map(Nat × Nat, integer) × map(Nat, integer) × map(Nat, Nat)
I  ↦ λ(⟨m, n, A, b, c⟩) 1 ≤ n ∧ 1 ≤ m ∧ domain(A) = {1..m} × {1..n} ∧ domain(b) = {1..m} ∧ domain(c) = {1..n}
R  ↦ map(Nat, Nat)
O  ↦ λ(⟨m, n, A, b, c⟩, x) domain(x) = {1..n} ∧ range(x) ⊆ {0, 1} ∧ Ax ≤ b
C  ↦ Nat
le ↦ λ(u, v) u ≤ v
f  ↦ λ(⟨m, n, A, b, c⟩, x) c · x
We will usually omit reference to the cost domain and the cost ordering when they are clear from the context. As a further abbreviation we will usually write optima(f(x, y), S) or optima(f, S) instead of optima(λ(y, z) f(x, y) ≤ f(x, z), S) when the ordering is clear from the context. Define f*(x, r̂) to denote the cost of a minimal cost object in subspace r̂,⁵ i.e.,

f*(x, r̂) = arb(optima(le, {f(x, z) | Satisfies(z, r̂) ∧ O(x, z)})).

⁵ Note that the total order on the cost domain C does not induce a total order on R. As is usual in optimization problems, several distinct elements of R may have the same cost – in particular there may be several minimal cost solutions.
As a special case let f ∗ (x ) denote the cost of a minimal cost object in ˆr0 (x ), f ∗ (x ) = f ∗ (x , ˆr0 (x )). A feasible output object z : R is called optimal with respect to input instance x : D if f (x , z ) = f ∗ (x ). We will assume that the feasible spaces of our problems contain at least one optimal solution.
4.2. Abstract GS-Theory for Optimization Problems
A gs-theory extended with cost structure will be called a gso-theory. The following analogue of Theorem 3.1 mediates the transition from a gso-theory to a correct well-structured program. This theorem exploits a linear recursive algorithm specification. The algorithm maintains a set of spaces Active which are waiting to be processed. During each iteration a space is selected from Active, its subspaces are added to Active, and its extractable solutions are added to a set Solutions. The advantages of this form include the ability to decide search strategy according to how the arbsplit operation is implemented. Also, it is easy to translate to iterative form. The disadvantages include the apparent commitment to sequential execution.

Theorem 4.1. Let G_F be a well-founded gso-theory and let Φ be a necessary feasibility filter. The following linear recursive program specification is consistent.

function F(x : D) : set(R)
  where I(x)
  returns optima(f, {z | O(x, z)})
  = if Φ(x, r̂₀(x)) then F_gso(x, {r̂₀(x)}, { }, { }) else { }

function F_gso(x : D, Active : set(R̂), Solutions : set(R), Inactive : set(R̂)) : set(R)
  where I(x) ∧ ∀(r̂)(r̂ ∈ Active ⟹ Î(x, r̂) ∧ Φ(x, r̂))
    ∧ Solutions = optima(f, {z | r̂ ∈ Inactive ∧ Extract(z, r̂) ∧ O(x, z)})
  returns optima(f, Solutions ∪ {z | r̂ ∈ Active ∧ Satisfies(z, r̂) ∧ O(x, z)})
  = if empty(Active) then Solutions
    else let ⟨r̂, A1⟩ = arbsplit(Active)
         let New_Solutions = optima(f, Solutions ∪ {z | Extract(z, r̂) ∧ O(x, z)})
             New_Active = {ŝ | ŝ ∈ A1 ∪ {t̂ | Split(x, r̂, t̂)} ∧ Φ(x, ŝ)}
         F_gso(x, New_Active, New_Solutions, Inactive with r̂).

The proof is similar to the proof of Theorem 3.1, although slightly more complicated. A basic design tactic for optimization problems can now be described: select and specialize a gs-theory to a given problem and then apply one of the theorems to obtain a consistent concrete program.
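The shape of F_gso can be sketched as an iterative worklist loop. The Python below is our illustration, not the derived program: the gso-theory operations are passed in as callables, a list stands in for Active, and last-in-first-out selection plays the role of arbsplit (giving depth-first search); Solutions is kept as the set of least-cost solutions seen so far.

```python
def f_gso(r0, split, extract_solutions, cost, phi):
    """Worklist form of the linear recursive scheme of Theorem 4.1."""
    active = [r0] if phi(r0) else []
    solutions, best = set(), None
    while active:
        r = active.pop()                      # arbsplit: LIFO = depth-first
        for z in extract_solutions(r):        # directly extractable solutions
            c = cost(z)
            if best is None or c < best:      # maintain optima(f, Solutions U ...)
                solutions, best = {z}, c
            elif c == best:
                solutions.add(z)
        active.extend(s for s in split(r) if phi(s))   # filtered subspaces
    return solutions
```

A different implementation of the selection step (a FIFO queue, or a priority queue ordered by a lower bound) changes only the search strategy, not the result.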
In the next section we show how to infer filters that exploit the cost function in order to achieve substantial increases in performance over this basic approach. Using a straightforward extension of Theorem 3.2 we can construct a gso-theory for ILP:
D         ↦ Nat × Nat × map(Nat × Nat, integer) × map(Nat, integer) × map(Nat, Nat)
I         ↦ λ(⟨m, n, A, b, c⟩) 1 ≤ n ∧ 1 ≤ m ∧ domain(A) = {1..m} × {1..n} ∧ domain(b) = {1..m} ∧ domain(c) = {1..n}
R         ↦ map(Nat, Nat)
O         ↦ λ(⟨m, n, A, b, c⟩, x) domain(x) = {1..n} ∧ range(x) ⊆ {0, 1} ∧ Ax ≤ b
C         ↦ Nat
le        ↦ λ(u, v) u ≤ v
f         ↦ λ(⟨m, n, A, b, c⟩, x) c · x
R̂         ↦ set(Nat) × set(Nat) × map(Nat, Nat)
Î         ↦ λ(⟨m, n, A, b, c⟩, ⟨S, T, M⟩) S ⊎ T = {1..n} ∧ domain(M) = S ∧ range(M) ⊆ {0, 1}
r̂₀        ↦ λ(⟨m, n, A, b, c⟩) ⟨{ }, {1..n}, {| |}⟩
Satisfies ↦ λ(x, ⟨S, T, M⟩) ∀(s)(s ∈ S ⟹ x(s) = M(s))
Split     ↦ λ(⟨m, n, A, b, c⟩, ⟨S, T, M⟩, ⟨S′, T′, M′⟩)
              ∃(a, r)(a = arb(T) ∧ r ∈ {0, 1} ∧ ⟨S′, T′, M′⟩ = ⟨S + a, T − a, M ⊕ {| a ↦ r |}⟩)
Extract   ↦ λ(x, ⟨S, T, M⟩) T = { } ∧ x = M
Φ         ↦ λ(⟨m, n, A, b, c⟩, ⟨S, T, M⟩)
              ∀(i)(i ∈ {1..m} ⟹ Σ_{j∈S} A(i, j)M(j) + Σ_{j∈T} min(0, A(i, j)) ≤ b(i))⁶
4.3. Optimality Filters
In Section 3.3.1 we discussed how feasibility filters are used to eliminate spaces from consideration and the effect of their strength on the behavior and results of a global search algorithm. Feasibility filters can be viewed as approximate tests of the existence of feasible solutions in a given space. For optimization problems there are several important and commonly occurring optimality filters that eliminate spaces on the basis of cost information. The ideal optimality filter decides the question “Does there exist an optimal solution in a given space ˆr ?”. Formally this is ∃(z : R)( Satisfies(z , ˆr ) ∧ optimal(z , f , {y | O(x , y)}) ).
(8)
If we incorporate this filter into the abstract program of Theorem 4.1, then the resulting behavior is ideal in the sense that only spaces containing solutions are ever explored and all optimal solutions are found. However, since (8) is typically expensive to compute directly, various approximations to it are used in practice. Again we can classify these approximations as necessary, sufficient, or heuristic optimality filters. There are two well-known kinds of necessary optimality filters – lower bound functions and dominance relations. Sufficient optimality filters and an example are presented in [33].⁶

⁶ Φ as extension of gs-theory?
4.3.1. Lower bound filters
Eliminating spaces on the basis of lower bound functions is a characteristic notion of branch-and-bound algorithms. The idea is to eliminate a space if a lower bound on the cost of feasible solutions in the space is greater than the cost of a known solution or, in general, is greater than some upper bound on the cost of an optimal solution. We can derive the notion of a lower bound filter as a necessary condition on formula (8) as follows. Assume that z is an optimal solution and that z is in the space denoted by r̂:

Satisfies(z, r̂) ∧ optimal(z, f, {y | O(x, y)}).

Since z has optimal cost, we have f(x, z) = f*(x); but also since z is in the space denoted by r̂, f(x, z) = f*(x, r̂); consequently f*(x, r̂) = f*(x), which implies that f*(x, r̂) ≤ f*(x). Neither side of this inequality is easily computable, but if we could cheaply compute bounds on them then we could obtain a useful necessary optimality filter:

lb(x, r̂) ≤ f*(x, r̂) ≤ f*(x) ≤ some(ub)(f*(x) ≤ ub)

where lb : D × R̂ → C satisfies

∀(x : D, r̂ : R̂, z : R)(I(x) ∧ Î(x, r̂) ∧ Satisfies(z, r̂) ∧ O(x, z) ⟹ lb(x, r̂) ≤ f(x, z)).
(9)
lb is called a lower bound function since it computes a lower bound on the cost of feasible solutions in space r̂, and ub is an upper bound on the cost of an optimal solution.

Proposition 4.1. If lb satisfies (9), then for all x : D such that I(x), r̂ : R̂ such that Î(x, r̂), and z : R,

Satisfies(z, r̂) ∧ optimal(z, f, {y | O(x, y)}) ⟹ lb(x, r̂) ≤ some(ub)(f*(x) ≤ ub).

Proof: Suppose there is some z : R such that optimal(z, f, {y | O(x, y)}) ∧ Satisfies(z, r̂). Then we have O(x, z) and f(x, z) = f*(x) since z is a feasible and optimal solution. By the definition of lb we infer lb(x, r̂) ≤ f(x, z) = f*(x). Therefore, for arbitrary ub such that f*(x) ≤ ub we have lb(x, r̂) ≤ ub. QED

Two difficulties arise in exploiting Proposition 4.1: (1) how to derive a lower bound function for a particular problem, and (2) how to treat the nondeterministic expression involving ub. A common technique for deriving lower bound functions involves relaxing or finding a superset S of the feasible space {z | O(x, z)} by various means, since

{z | O(x, z)} ⊆ S ⟹ min({f(x, z) | z ∈ S}) ≤ min({f(x, z) | O(x, z)}).
One approach is to generalize or weaken the predicate O(x, z). The rationale is that if O(x, z) ⟹ O′(x, z) then {f(x, z) | O(x, z)} ⊆ {f(x, z) | O′(x, z)}. One way to weaken a conjunction O is to replace some conjunct P in O by a weaker formula Q (if O is P ∧ P′ and P ⟹ Q holds, then O can be generalized or weakened to Q ∧ P′). Note that since P ⟹ true, we can always simply drop the conjunct P. Some common rules for weakening formulas may be found in [12, 19]. An important special case is to relax the feasible space by embedding the output domain R in a larger domain T and to suitably relax the predicate O so that it is naturally defined over D × T (cf. [25]). A common example is the relaxation from an integer-valued linear program to a real-valued linear program. Examples of optimization algorithms that compute lower bounds by relaxation may be found in [31, 26]. A more general technique is called Lagrangian Relaxation [10]. Examples of deriving lower bound functions are given in [38].

Lower bound functions can be derived via the following abstract inference specification. The approaches discussed above suggest tactics that could be used to guide the search for a useful lower bound function.

Infer(lb) (I(x) ∧ Î(x, r̂) ∧ Satisfies(z, r̂) ∧ O(x, z) ⟹ (f(x, z) ≥ lb(x, r̂)))

A general principle borne out by several theoretical studies and empirical evidence is that the closer the lower bound is to the actual value f*(x, r̂), the fewer the spaces that will be searched (see for example [26]). If lb₁(x, r̂) ≤ lb₂(x, r̂) ≤ f*(x, r̂) for all x and r̂, then lb₂(x, r̂) ≤ ub ⟹ lb₁(x, r̂) ≤ ub; thus lb₂ yields the stronger filter and will eliminate a superset of the spaces eliminated by lb₁. This fact has motivated researchers to explore ways to use a computed lower bound as a starting point for a local search (hillclimbing) to drive the bound still higher [13, 10].
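For a concrete instance of such a derivation for ILP with cost c · x, a plausible lower bound follows the relaxation tactic: drop the constraints Ax ≤ b on the undetermined variables and let each j ∈ T independently take its cheaper value. The Python sketch below is our own guess at the shape such a derivation produces, not a bound taken from the text:

```python
def ilp_lb(c, S, T, M):
    """Lower bound on c.x over all 0,1-completions x of the partial
    assignment M (with domain S): the committed part contributes exactly
    sum of c(j)*M(j), and each undetermined j in T contributes at least
    min(c(j)*0, c(j)*1) = min(0, c(j))."""
    return sum(c[j] * M[j] for j in S) + sum(min(0, c[j]) for j in T)
```

Since c(j)x(j) ≥ min(0, c(j)) for x(j) ∈ {0, 1}, every completion x satisfies c · x ≥ ilp_lb(c, S, T, M), which is exactly condition (9).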
The difficulty is that a bound that is closer to the optimum is often more expensive to compute. Consequently there is a delicate tradeoff between the cost of computing a bound and its power to reduce overall search time. Unfortunately, theoretical techniques for analyzing the effectiveness of various lower bound functions are not well developed at present.

Another difficulty in applying Proposition 4.1 arises in implementing the upper bound expression some(ub)(f*(x) ≤ ub). The usual strategy is to maintain a variable for ub which holds the cost of a least cost solution found so far during the search. Whenever a lower cost solution is discovered during the search, the variable is updated. The motivation for this implementation is to dynamically increase the strength of the lower bound filter as much as possible.

The initial value for the upper bound can be derived by various means. One approach is to design an algorithm that quickly finds the cost of an arbitrary feasible solution. However, for some optimization problems the underlying feasibility problem is itself quite difficult (e.g., ILP is
30
NP-complete). Another approach is to compute the maximum of a relaxed version of the problem. That is, if our problem is to compute optima(λ(y, z ) f (x , y) ≤ f (x , z ), {z | O(x , z )})
(10)
and O 0 is a relaxation of O: ∀(x : D, z : R) ( I (x ) ∧ O(x , z ) =⇒ O 0 (x , z ) ) then an upper bound on f ∗ (x ) is arb(optima(≥, {f (x , z ) | O 0 (x , z )})). The following proposition shows how the lower bound filter is incorporated into the linear recursive global search scheme. Proposition 4..2 Let Init ub be a function such that ∀(x : D) ( f ∗ (x ) ≤ Init ub(x ) ) and let lb be a lower bound function satisfying (9). If GF is a well-founded gso-theory and Φ is a necessary feasibility filter, then the following program specification is consistent. function F (x : D) : set(R) where I (x ) returns optima(f , {z | O(x , z )}) = if Φ(x , ˆr0 (x )) ∧ lb(x , ˆr0 (x )) ≤ Init ub(x ) then F gso(x , {ˆr0 (x )}, { }, { }) else {} ˆ Solutions : set(R), Inactive : set(R)) ˆ : set(R) function F gso(x : D, Active : set(R), where I (x ) ∧ ∀(ˆr )(ˆr ∈ Active =⇒ ˆI (x , ˆr ) ∧ Φ(x , ˆr ) ∧ lb(x , ˆr ) ≤ ub(x , Solutions) ∧ Solutions = optima(f , {z | ˆr ∈ Inactive ∧ Extract(z , ˆr ) ∧ O(x , z )}) S returns optima(f , Solutions {z | ˆr ∈ Active ∧ Satisfies(z , ˆr ) ∧ O(x , z )}) = if empty(Active) then Solutions else let hˆr , A1i = arbsplit(Active) S let N ew Solutions = optima(f , Solutions {z | Extract(z , ˆr ) ∧ O(x , z )}) S N ew Active = {ˆs | ˆs ∈ A1 {ˆt | Split(x , ˆr , ˆs)} ∧ Φ(x , ˆs) ∧ lb(x , ˆs) ≤ ub(x , N ew Solutions)} F gso(x , N ew Active, N ew Solutions, Inactive with ˆr ). function ub(x , Solutions) : C = if empty(Solutions) then Init ub(x ) else f (x , arb(Solutions)) A lower bound function can be derived for ILP via the inference specification Infer (lb) (1 ≤ n ∧ 1 ≤ m ∧ domain(A) = {1..m} × {1..n} 31
∧ domain(b) = {1..m} ∧ domain(c) = {1..n} ∧ domain(M ) = S ∧ S ] T = {1..n} ∧ range(M ) ⊆ {0, 1} ∧ ∀(s)(s ∈ S =⇒ x(s) = M (s)) ∧ domain(x) = {1..n} ∧ range(x) ⊆ {0, 1} ∧ Ax ≤ b =⇒ (c · x ≥ lb(m, n, A, b, c, S, T, M ))) which specifies that we want to derive a lower bound on the expression c · x by applying transformations that preserve the ≥ relation. c·x
=
by definition X
c(i)x(i)
i∈{1..n}
using S ] T = {1..n}
= X
c(i)x(i)
i∈S]T
=
distributing X
c(i)x(i) +
X
c(i)x(i)
i∈T
i∈S
since ∀(s)(s ∈ S =⇒ x(s) = M (s))
= X
c(i)M (i) +
c(i)x(i)
i∈T
i∈S
≥
X
since both x and c are nonnegative for all inputs X
c(i)M (i).
i∈S
Thus
P
i∈S
c(i)M (i) is our derived lower bound function.
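A minimal Python rendering of the derived bound (the function names are illustrative, and nonnegative c is assumed, as in the last derivation step):

```python
def ilp_lower_bound(c, S, M):
    """Lower bound on c·x over all completions of the partial assignment M
    (defined on the decided index set S): the undecided terms, the sum over
    T of c(i)x(i), are nonnegative when c and x are nonnegative."""
    return sum(c[i] * M[i] for i in S)

def dot(c, x):
    """c · x for vectors represented as equal-length lists."""
    return sum(ci * xi for ci, xi in zip(c, x))
```

Every completion of the partial assignment then has cost at least the bound, which is what the derivation above establishes.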
An initial upper bound on the cost of a minimal cost solution can be derived by solving for the maximum of a relaxed ILP problem according to (10).

Infer(ub) ( 1 ≤ n ∧ 1 ≤ m
  ∧ domain(A) = {1..m} × {1..n} ∧ domain(b) = {1..m} ∧ domain(c) = {1..n}
  ∧ domain(x) = {1..n} ∧ range(x) ⊆ {0, 1} ∧ Ax ≤ b
  ⟹ c · x ≤ ub(m, n, A, b, c) )

The derivation proceeds as follows.

  c · x
= Σ_{i ∈ {1..n}} c(i)x(i)               by definition
≤ Σ_{i ∈ {1..n}} c(i) max({0, 1})       using range(x) ⊆ {0, 1}
= Σ_{i ∈ {1..n}} c(i)                   simplifying

Thus we define

Init_ub(m, n, A, b, c) = Σ_{i ∈ {1..n}} c(i)

and the complete filter is

Σ_{i ∈ S} c(i)M(i) ≤ (if empty(Solutions) then Σ_{i ∈ {1..n}} c(i) else c · arb(Solutions)).
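The initial bound and the complete filter can be sketched in Python (illustrative names, not from the paper; the incumbent's cost stands in for c · arb(Solutions), and nonnegative c is assumed):

```python
def dot(c, x):
    """c · x for vectors represented as equal-length lists."""
    return sum(ci * xi for ci, xi in zip(c, x))

def init_ub(c):
    """Initial upper bound from the relaxation range(x) ⊆ {0,1}: take
    x(i) = 1 everywhere, which bounds c·x from above when every c(i) >= 0."""
    return sum(c)

def lb_filter(c, S, M, solutions):
    """The complete filter: keep a space ⟨S, T, M⟩ only if its partial cost
    is at most the incumbent's cost. Solutions holds only least-cost
    solutions found so far, so any element supplies the incumbent cost."""
    ub = dot(c, solutions[0]) if solutions else init_ub(c)
    return sum(c[i] * M[i] for i in S) <= ub
```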
Using Proposition 4.2, this filter is incorporated into ILP in Figure 3. Program optimization transformations would pull out the common subexpressions, and finite differencing would be used to compute the lower bound and upper bound incrementally. The result can be refined into an algorithm similar to Balas' additive algorithm [2]. Balas implements the operation arb(T) in a way that heuristically improves performance over a random selection. Subsequent refinements were made by Geoffrion [9] and others.

There are alternative lower bound functions that can be derived for ILP. The most obvious one is to relax the type constraint on x to map(Nat, real), thereby turning the problem into an ordinary linear program. The motivation for this comes from the availability of fast algorithms for solving and incrementally re-solving linear programs.
4.3.2. Dominance relations
Dominance relations [15, 14] provide another necessary optimality filter. The idea is to compare two spaces to determine whether one must contain a solution at least as good as the best in the other; if so, the dominated space can be pruned. Formal treatment of this topic and detailed examples may be found in [33].
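As a minimal illustration (not from the paper), suppose two spaces with the same residual subproblem are comparable; then the one with the higher accumulated cost is dominated and can be pruned:

```python
def prune_dominated(spaces):
    """spaces: list of (state, partial_cost) pairs. If two spaces share the
    same residual problem (here: the same state), the one with the higher
    partial cost cannot contain a strictly better solution, so it is pruned."""
    best = {}
    for state, cost in spaces:
        if state not in best or cost < best[state]:
            best[state] = cost
    return [(s, c) for s, c in best.items()]
```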
4.4. Control strategies
The linear recursive scheme in Theorem 4.1 has a set Active that holds the spaces waiting for processing. It can in general be refined into a priority queue [1]. Its operations include Init, which produces an empty priority queue; Insert, which inserts a set of elements with associated priorities; and Select, which deletes and returns the element in the queue with the highest priority. The priority function determines the search strategy used: depth-first, breadth-first, and heuristic search can all be accommodated with different priority functions. We leave the choice of search strategy as an open design choice to be determined by other design factors such as time and space constraints. Pearl [26] is a rich source of information on search control heuristics and their effect on performance.

function ILP(m, n, A, b, c)
  where 1 ≤ n ∧ 1 ≤ m
  returns optima(λ(x, y) c · x ≤ c · y, {x | domain(x) = {1..n} ∧ range(x) ⊆ {0, 1} ∧ Ax ≤ b})
  = if ∀(i)(i ∈ {1..m} ⟹ Σ_{j ∈ {1..n}} min(0, A(i, j)) ≤ b(i))
    then ILP_gso(m, n, A, b, c, {⟨{ }, {1..n}, {| |}⟩}, { }, { })
    else { }

function ILP_gso(m, n, A, b, c, Active, Solutions, Inactive)
  where 1 ≤ n ∧ 1 ≤ m
    ∧ ∀(⟨S, T, M⟩)(⟨S, T, M⟩ ∈ Active ⟹
        S ⊎ T = {1..n} ∧ domain(M) = S ∧ range(M) ⊆ {0, 1}
        ∧ ∀(i)(i ∈ {1..m} ⟹ Σ_{j ∈ S} A(i, j)M(j) + Σ_{j ∈ T} min(0, A(i, j)) ≤ b(i)))
    ∧ Solutions = optima(c · x, {x | ⟨S, T, M⟩ ∈ Inactive ∧ T = { } ∧ x = M
        ∧ domain(x) = {1..n} ∧ range(x) ⊆ {0, 1} ∧ Ax ≤ b})
  returns optima(c · x, Solutions ∪ {x | ⟨S, T, M⟩ ∈ Active ∧ ∀(i)(i ∈ S ⟹ x(i) = M(i))
        ∧ domain(x) = {1..n} ∧ range(x) ⊆ {0, 1} ∧ Ax ≤ b})
  = if empty(Active) then Solutions
    else let ⟨⟨S, T, M⟩, A1⟩ = arbsplit(Active)
         let New_Solutions = optima(c · x, Solutions ∪ {x | T = { } ∧ x = M
               ∧ domain(x) = {1..n} ∧ range(x) ⊆ {0, 1} ∧ Ax ≤ b})
             New_Active = A1 ∪ {⟨S′, T′, M′⟩ | ∃(a, r)( T ≠ { } ∧ a = arb(T) ∧ r ∈ {0, 1}
               ∧ ⟨S′, T′, M′⟩ = ⟨S + a, T − a, M ⊕ {| a ↦ r |}⟩
               ∧ ILP_filter(m, n, A, b, S′, T′, M′))}
         ILP_gso(m, n, A, b, c, New_Active, New_Solutions, Inactive with ⟨S, T, M⟩)

function ILP_filter(m, n, A, b, S, T, M)
  = ∀(i)(i ∈ {1..m} ⟹ Σ_{j ∈ S} A(i, j)M(j) + Σ_{j ∈ T} min(0, A(i, j)) ≤ b(i))

Figure 3: 0,1-Integer Linear Programming Optimization Algorithm with Lower Bound Pruning
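The priority-queue refinement of Active can be sketched in Python with heapq (illustrative hooks, not KIDS code); only the priority function changes between strategies:

```python
import heapq
import itertools

def search(initial, split, priority):
    """Generic enumeration controlled by a priority queue (Active):
    the priority function alone determines the search strategy."""
    counter = itertools.count()                 # FIFO tie-breaker for equal priorities
    active = [(priority(initial), next(counter), initial)]
    visited = []
    while active:
        _, _, space = heapq.heappop(active)     # Select: best-priority space
        visited.append(space)
        for s in split(space):                  # Insert: subspaces with priorities
            heapq.heappush(active, (priority(s), next(counter), s))
    return visited

# Toy space structure: tuples of bits, split by appending one bit, depth 2.
def split_bits(p):
    return [p + (b,) for b in (0, 1)] if len(p) < 2 else []

depth_first   = search((), split_bits, lambda p: -len(p))  # deepest spaces first
breadth_first = search((), split_bits, lambda p: len(p))   # shallowest spaces first
```

A heuristic (best-first) strategy would simply use a lower bound estimate as the priority.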
5. Related Work

5.1. Models of Global Search Algorithms
Backtracking has been used informally for a long time. An abstract backtrack scheme was first presented in 1960 by Walker [40]. This was an iterative, depth-first algorithm that generated all feasible orderings of subsets of a given set (e.g., permutations of a set or queens positions in the k-queens problem). Balas' algorithm for 0,1 integer linear programming [2] was cast as a backtrack algorithm: solutions are built up incrementally, and filtering is implicit in the step that computes the set of extensions to the current partial solution. The model identifies backtracking as a basic systematic procedure for generating combinations and arrangements of elementary objects.

Branch-and-bound algorithms first arose in the late 1950s [17]. A general model of branch-and-bound was developed by Lawler and Wood [18]. It generalizes the backtrack model to a tree-searching procedure in which nodes represent restricted versions of the initial problem and the branching structure represents the decomposition of a problem into an equivalent set of subproblems. Novel features of this model include (1) generalization to arbitrary combinatorial optimization problems, (2) identification of search control strategy (the order in which the nodes of the tree are explored) as a key design decision, and (3) use of lower bound functions for pruning nonoptimal subproblems.

Axiomatic models of branch-and-bound algorithms were developed by Balas [2] and Mitten [20]. In Balas' model, at each node/space a relaxed problem is solved, and if the resulting solution is infeasible then the space is split so that the solution is excluded. Mitten's model allowed infinite feasible spaces and (with modifications pointed out by Rinnooy Kan [27]) allowed the inference of a number of important properties, such as convergence and correctness.
The concept of implicit enumeration, which involves consideration of sets of solutions and the important concept of "fathoming" a set via a test on its representation, was discussed by Geoffrion [9] as a way to simplify the presentation of Balas' 0,1-ILP algorithm.

Mitten and Warburton's model [21] of implicit enumeration simplifies earlier models of branch-and-bound to just two basic operations: branching (splitting a collection of sets) and elimination (deleting from a collection of sets those that provably cannot contain feasible or optimal solutions). The algorithm iteratively applies branching and elimination to a collection of spaces, converging exactly when the fixpoint collection contains only optimal solutions. Theirs is an axiomatic model (actually a family of models) allowing proofs of correctness and convergence of the abstract algorithm scheme. Mitten and Warburton point out the importance of necessary conditions on feasibility and optimality as the logical basis for constructing elimination operations (filters). They also mention the use of sufficient conditions to guide search when, for example, only a single solution is desired. More recently, Nau, Kumar, and Kanal [16, 23] show that essentially the same model can account for
game-tree search and various forms of problem-solving search (branch-and-bound and dynamic programming).

The model in this paper emphasizes the need to represent spaces and to extract candidates from spaces. The use of Extract allows us to naturally model relaxation-based algorithms and to effectively enumerate infinite feasible spaces. The model abstracts away control mechanisms altogether. Perhaps most importantly, the model is formal and elaborate enough to support the automated design of global search algorithms.

The literature on dominance relations is relatively sparse. Kohler and Steiglitz [15] gave the first formal treatment of them, but in the restricted context of permutation problems. Ibaraki [14] treated dominance relations in a more general setting and proved various properties about them. They do not seem to have found their way into textbooks as a standard design technique, yet it is likely that they occur commonly in practice.
5.2. Designing Global Search Algorithms
There have been several efforts aimed at developing global search-like algorithms from specifications. Wirth [41] informally develops a backtrack algorithm for the k-queens problem by passing a test on the leaves of the search tree up into the internal branching points. In effect this amounts to deriving a necessary condition (Φ) on the existence of feasible extensions to the current path (cf. [34]). Gerhart and Yelowitz [11] provide a collection of schemes and generic correctness proofs for a class of backtrack algorithms that can be viewed as special cases of global search algorithms.

Mostow's FOO system [22] transforms ("operationalizes") a specification plus advice into heuristic search algorithms. These can be viewed as global search algorithms in which spaces are initial sequences (the gs-theory gs_sequences_over_finite_set in the Appendix). A special class of necessary feasibility filters, called "monotonically necessary" predicates, is derivable.

Sintzoff [29] presents a method for eliminating search from nondeterministic programs comprised of condition-action rules. The method specializes rules by regressing goals over the action part of rules. The effect is the derivation of sufficient conditions on the ability of a rule to make progress toward the desired goal.

Broy and Wirsing [6] derive the notion of backtracking from the notion of an enumeration process consisting of a generation phase (which produces a set of candidate solutions) and an exhaustion phase (which consumes the candidates and computes the result solutions). Program transformations then interleave these two phases, opening up opportunities to insert filters and perform program optimizations. Broy and Wirsing characterize a predicate that can be used as a kind of necessary feasibility filter; its intent is to capture the notion that for some problems the feasibility conditions can be symbolically evaluated on a partial solution.
Many researchers interested in program transformation view specifications as understandable but inefficient generate-and-test algorithms. Transformations, called filter promotion or constraint incorporation, are then applied to incorporate information from the test (the output condition and, if one exists, the cost function) into the generator of candidate solutions. See for example Darlington [7], Broy and Wirsing [6], Tappel [39], Balzer [3], and Bird [4]. Promotion and constraint incorporation are abetted by using a global search framework for the generator. Also, the notion of filtering by necessary conditions on feasibility or optimality generalizes the kind of results that would be obtainable by promotion under the same circumstances.
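Filter promotion can be illustrated with the k-queens problem mentioned above (a Python sketch, not from the cited works): the same safety test is applied either only at the leaves (generate-and-test) or at every partial placement (after promotion):

```python
from itertools import permutations

def safe_prefix(p):
    """Necessary condition on a partial placement p (queen in row i sits in
    column p[i]): no two queens placed so far share a diagonal."""
    return all(abs(p[i] - p[j]) != j - i
               for j in range(len(p)) for i in range(j))

def queens_generate_and_test(n):
    """Generate-and-test: enumerate all n! permutations, test at the leaves."""
    return [p for p in permutations(range(n)) if safe_prefix(p)]

def queens_promoted(n, prefix=()):
    """Filter promotion: the same test applied to every partial placement,
    pruning whole subtrees of the search."""
    if len(prefix) == n:
        return [prefix]
    out = []
    for col in range(n):
        if col not in prefix and safe_prefix(prefix + (col,)):
            out.extend(queens_promoted(n, prefix + (col,)))
    return out
```

Both versions compute the same solution set; the promoted version visits far fewer nodes.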
6. Implementation in KIDS
The global search tactic for feasibility problems has been implemented in KIDS (Kestrel Interactive Development System) [34]. We have used it to semiautomatically derive and optimize over forty global search algorithms. Examples include the binary search algorithm and backtracking algorithms for k-queens, job scheduling, graph coloring, knapsack, set covering, and the algorithms listed in the Appendix.

The user constructs a domain theory that includes problem specifications and definitions of concepts and the laws for reasoning about them (typically these are distributive laws). The user may then choose to apply the global search tactic, and further choose which gs-theory from the library to specialize. The tactic automatically specializes the gs-theory to the given problem and automatically derives necessary feasibility filters (generating forward inferences to a fixed depth). The user selects which of the derived formulas to use in the filter (note that the use of any subset will result in a correct algorithm). This selection can be automated if dependency links are kept during inference. In terms of formula scheme (6), there are two conditions on acceptable filters. First, they must depend on both Satisfies(z, r̂) and O(x, z); no others have filtering power. Second, only the strongest necessary conditions are useful: if both p and q are inferred, but q was inferred from p, then using p alone will filter out more spaces than q, although using either or both of them will still give a correct algorithm. Finally, the tactic instantiates a program scheme and a concrete consistent program is produced.

KIDS allows the user to interactively apply algorithm design and program optimization transformations. The user uses a mouse to direct the application of simplification, finite differencing, and partial evaluation transformations to various subexpressions of the program.
These transformations are all automatic and perform a correctness-preserving, high-level change to the program. When the program has been sufficiently optimized, the user directs it to be compiled. All user interaction with KIDS during development is via pointing to expressions in the evolving specification or selecting from a machine-generated menu. The derivations for the algorithms listed in the Appendix take on the order of 5-10 minutes running on a SUN4 workstation and require roughly 10-25 selections via the mouse (this includes algorithm design, optimization, and compilation).

The global search tactics rely on:

1. A library of standard gs-theories for common data types and type constructors. The KIDS library currently includes the gs-theories listed in the Appendix. These seem to account for the kinds of global search algorithms that one finds in current algorithm design texts. It is likely that more elaborate theories would be needed in specialized domains.

2. A directed inference system. We make no commitment to any formal system for performing inference, nor to the existence of a single system that will be generally useful. Special inference needs are often best met by specialized systems and even knowledge bases. For our experimental purposes we have used a relatively simple system, called RAINBOW II, that combines natural deduction and rewrite rules.

3. Domain theories. The inference system depends on a knowledge base of theories about various domains (objects, operators, relations, and laws), including the specification language. Domain theories provide the terminology in which a formal specification can be stated and the laws with which the inference system can do the steps of the tactics: theory matching, deriving filters, performing context-sensitive expression simplification, etc.

When using KIDS, a typical failure is that some knowledge that the system needs is either lacking or not in the right form for the inference rules.
Our experience has been that definitions and distributive laws comprise most of the knowledge needed for global search derivations. An interactive system that helps in acquiring new domain terminology and laws, and helps diagnose incomplete deductions, would make the KIDS approach easier and more systematic to use.
7. Concluding Remarks
We have presented an axiomatic theory of global search algorithms and a formal method for designing concrete consistent programs from problem specifications. The design tactic works by specializing an existing gs-theory to the given problem. Design tactics are intended to provide a highly automated tool for transforming concise problem specifications into efficient programs. The decision to design an algorithm of a certain abstract form, such as global search or divide-and-conquer, provides much structure and motivation to the design process. The creation of a gs-theory for a given problem provides the rough framework for an algorithm; a more complicated but efficient algorithm is then obtained by deriving filters, performing simplifications, finite differencing, data structure design, and so on. Global search theory and design tactics can account for a broad range of known algorithms (see Appendix). Their range of applicability exemplifies an important form of reusability in software: formal knowledge of how to generate algorithms from specifications.
Acknowledgements

I would like to thank Jim Bennett, Alan Biermann, Allen Goldberg, Richard Jüllig, Mike Lowry, Tom Pressburger, David Steier, and the referees for their comments on this paper and for suggesting many improvements. This research was supported in part by the Office of Naval Research under Contracts N00014-84-C-0473, N00014-87-K-0550, and N00014-90-J-1733 and in part by the Air Force Office of Scientific Research under Contract F49620-88-C-0033.
Appendix: A Library of GS-Theories

A collection of standard gs-theories for sets, sequences, mappings, and integers is presented below. These are all represented in the knowledge base of KIDS. Each gs-theory below is accompanied by a list of problems to which we have applied it.

1. gs_bounded_sequences_over_finite_set enumerates all sequences of length at most k over a given finite set S. The generator works by building partial sequences; i.e., a set of sequences is represented by a sequence V that is their greatest common prefix. Splitting is performed by appending an element from S onto the end of the common prefix V. The sequence V itself is directly extractable from the space.

F         ↦ gs_sequences_over_finite_set
D         ↦ set(α) × integer
I         ↦ λ(S, k) 0 ≤ k
R         ↦ seq(α)
O         ↦ λ(⟨S, k⟩, q) range(q) ⊆ S ∧ size(q) ≤ k
R̂         ↦ seq(α) × integer
Î         ↦ λ(⟨S, k⟩, V) size(V) ≤ k ∧ range(V) ⊆ S
Satisfies ↦ λ(q, V) extends(q, V)
r̂₀        ↦ emptyseq
Split     ↦ λ(⟨S, k⟩, V, V′) size(V) < k ∧ ∃(i)(i ∈ S ∧ V′ = append(V, i))
Extract   ↦ λ(q, V) q = V

Derived Examples: sequences of length k, n-bit vectors, permutations of a given set, k-queens [34], optimal job scheduling [38], topological sort [38], cyclic projective planes, Costas arrays [35], and the traveling salesman problem.

2. gs_finite_mappings is a gs-theory for enumerating mappings from a finite set U to a finite set V. The space descriptor ⟨S, T, M⟩ supports the incremental construction of a partial map M whose domain is S ⊆ U and where T holds the uncommitted elements of the domain U.

D         ↦ set(α) × set(β)
I         ↦ true
R         ↦ map(α, β)
O         ↦ λ(⟨U, V⟩, N) domain(N) = U ∧ range(N) ⊆ V
R̂         ↦ set(α) × set(α) × map(α, β)
Î         ↦ λ(⟨U, V⟩, ⟨S, T, M⟩) S ⊎ T = U ∧ domain(M) = S ∧ range(M) ⊆ V
Satisfies ↦ λ(N, ⟨S, T, M⟩) ∀(x)(x ∈ S ⟹ N(x) = M(x))
r̂₀        ↦ λ(⟨U, V⟩) ⟨{ }, U, {| |}⟩
Split     ↦ λ(⟨U, V⟩, ⟨S, T, M⟩, ⟨S′, T′, M′⟩) ∃(a, b)( T ≠ { } ∧ a = arb(T) ∧ b ∈ V ∧ ⟨S′, T′, M′⟩ = ⟨S + a, T − a, M ⊕ {| a ↦ b |}⟩ )
Extract   ↦ λ(N, ⟨S, T, M⟩) T = { } ∧ N = M

Derived Examples: 0,1-integer linear programming, bin packing, satisfiability of a propositional formula, optimal circuit layout, various schedule optimization problems, graph k-coloring, traveling salesman problem, and hamiltonian circuit.

3. gs_subsets_of_a_finite_set enumerates subsets of a given finite set S. A space is represented by a pair ⟨U, V⟩, where U is the partially constructed subset (the greatest common subset of the sets in the space) and V is the set of as yet uncommitted elements. Splitting involves selecting an arbitrary element of V and alternatively adding or not adding it to U (yielding two new space descriptors). This generator has the property that the depth of the search tree is exactly size(S) and that subsets are extracted only from the leaves. The size of the search tree is 2^{n+1} − 1, where n = size(S).

D         ↦ set(α)
R         ↦ set(α)
O         ↦ λ(S, T) T ⊆ S
R̂         ↦ set(α) × set(α)
Î         ↦ λ(S, ⟨U, V⟩) U ⊎ V ⊆ S
Satisfies ↦ λ(T, ⟨U, V⟩) U ⊆ T ∧ T ⊆ V ⊎ U
r̂₀        ↦ λ(S) ⟨emptyset, S⟩
Split     ↦ λ(S, ⟨U, V⟩, ⟨U′, V′⟩) V ≠ { } ∧ a = arb(V) ∧ (⟨U′, V′⟩ = ⟨U, V − a⟩ ∨ ⟨U′, V′⟩ = ⟨U + a, V − a⟩)
Extract   ↦ λ(T, ⟨U, V⟩) empty(V) ∧ T = U

Derived Examples: set cover, knapsack, k-clique, vertex cover (for definitions of these problems see [1]), and cyclic difference sets [37].

4. gs_binary_split_of_integer_subrange encodes the binary split paradigm. It enumerates all elements of a subrange [m..n] of the integers. Spaces correspond to subranges and are split roughly in half. This can be generalized to a nondense total order, and to allow the split point to be arbitrary and thus determinable at either design time or run time.

D         ↦ integer × integer
I         ↦ λ(m, n) m ≤ n
R         ↦ integer
O         ↦ λ(⟨m, n⟩, k) k ∈ [m..n]
R̂         ↦ integer × integer
Î         ↦ λ(⟨m, n⟩, ⟨i, j⟩) m ≤ i ≤ j ≤ n
Satisfies ↦ λ(k, ⟨i, j⟩) i ≤ k ≤ j
r̂₀        ↦ λ(⟨m, n⟩) ⟨m, n⟩
Split     ↦ λ(⟨m, n⟩, ⟨i, j⟩, ⟨i′, j′⟩) i < j ∧ (⟨i′, j′⟩ = ⟨i, (i + j) div 2⟩ ∨ ⟨i′, j′⟩ = ⟨1 + (i + j) div 2, j⟩)
Extract   ↦ λ(k, ⟨i, j⟩) i = j ∧ k = i

Derived Examples: binary search [32], integer square root, and maximum of a unimodal function [33].
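For instance, specializing gs_binary_split_of_integer_subrange with a filter Φ that keeps only the half-subrange that can contain the sought key yields the familiar binary search; a Python sketch under that assumption (A sorted in nondecreasing order):

```python
def binary_search(A, key):
    """Global search over the index subrange [i..j] using the binary Split
    above; the filter discards the half that cannot contain key."""
    i, j = 0, len(A) - 1
    while i < j:
        mid = (i + j) // 2
        # Split [i..j] into [i..mid] and [mid+1..j]; Φ prunes one half.
        if key <= A[mid]:
            j = mid
        else:
            i = mid + 1
    # Extract applies when i = j: the space is a singleton.
    return i if 0 <= i < len(A) and A[i] == key else None
```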
References

[1] Aho, A., Hopcroft, J., and Ullman, J. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, MA, 1974.
[2] Balas, E. An additive algorithm for solving linear programs with zero-one variables. Operations Research 13 (1965), 517–546.
[3] Balzer, R. M. Transformational implementation: An example. IEEE Transactions on Software Engineering SE-7, 1 (1981), 3–14.
[4] Bird, R. S. The promotion and accumulation strategies in transformational programming. ACM Transactions on Programming Languages and Systems 6, 4 (October 1984), 487–504.
[5] Bjørner, D., Ershov, A. P., and Jones, N. D., Eds. Partial Evaluation and Mixed Computation. North-Holland, Amsterdam, 1988.
[6] Broy, M., and Wirsing, M. Program development: From enumeration to backtracking. Information Processing Letters 10, 4 (July 1980), 193–197.
[7] Darlington, J. A synthesis of several sorting algorithms. Acta Informatica 11, 1 (1978), 1–30.
[8] Fox, M. S., Sadeh, N., and Baykan, C. Constrained heuristic search. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (Detroit, MI, August 20–25, 1989), pp. 309–315.
[9] Geoffrion, A. M. Integer programming by implicit enumeration and Balas' method. SIAM Review 9, 2 (April 1967), 178–190.
[10] Geoffrion, A. M. Lagrangian relaxation and its uses in integer programming. Mathematical Programming Studies 2 (1974), 82–114.
[11] Gerhart, S., and Yelowitz, L. Control structure abstractions of the backtrack programming technique. IEEE Transactions on Software Engineering SE-2, 4 (April 1976), 285–292.
[12] Gries, D. The Science of Programming. Springer-Verlag, New York, 1981.
[13] Held, M., and Karp, R. Traveling salesman problems and minimum spanning trees. Operations Research 18 (1970), 1138–1162.
[14] Ibaraki, T. The power of dominance relations in branch-and-bound algorithms. Journal of the ACM 24, 2 (April 1977), 264–279.
[15] Kohler, W. H., and Steiglitz, K. Characterization and theoretical comparison of branch-and-bound algorithms for permutation problems. Journal of the ACM 21, 1 (January 1974), 140–156.
[16] Kumar, V., and Kanal, L. A general branch and bound formulation for understanding and synthesizing and/or tree search procedures. Artificial Intelligence 21, 1-2 (March 1983), 179–198.
[17] Land, A., and Doig, A. An automatic method for solving discrete programming problems. Econometrica 28 (1960), 497–520.
[18] Lawler, E. L., and Wood, D. E. Branch-and-bound methods: A survey. Operations Research 14 (1966), 699–719.
[19] Michalski, R. S. A theory and methodology of machine learning. In Machine Learning: An Artificial Intelligence Approach, R. S. Michalski, Ed. Tioga Press, Palo Alto, CA, 1983, pp. 83–135.
[20] Mitten, L. G. Branch-and-bound methods: General formulation and properties. Operations Research 18, 1 (1970), 24–34.
[21] Mitten, L. G., and Warburton, A. R. Implicit enumeration procedures. Tech. Rep. Working Paper 251, University of British Columbia, 1973.
[22] Mostow, J. D. Machine transformation of advice into a heuristic search procedure. In Machine Learning: An Artificial Intelligence Approach, R. S. Michalski, Ed. Tioga Press, Palo Alto, CA, 1983, pp. 367–404.
[23] Nau, D., Kumar, V., and Kanal, L. General branch and bound and its relation to A* and AO*. Artificial Intelligence 23, 1 (May 1984), 29–58.
[24] Paige, R., and Koenig, S. Finite differencing of computable expressions. ACM Transactions on Programming Languages and Systems 4, 3 (July 1982), 402–454.
[25] Partsch, H. Specification and Transformation of Programs: A Formal Approach to Software Development. Springer-Verlag, New York, 1990.
[26] Pearl, J. Heuristics. Addison-Wesley, Reading, MA, 1984.
[27] Rinnooy Kan, A. H. G. On Mitten's axioms for branch-and-bound. Tech. Rep. Working Paper W/74/45/03, Graduate School of Management, Delft, The Netherlands, 1974.
[28] Shoenfield, J. R. Mathematical Logic. Addison-Wesley, Reading, MA, 1967.
[29] Sintzoff, M. Eliminating blind alleys from backtrack programs. In Proceedings Third ICALP. Edinburgh University Press, Edinburgh, 1976, pp. 531–557.
[30] Smith, D. R. Derived preconditions and their use in program synthesis. In Sixth Conference on Automated Deduction, LNCS 138, D. W. Loveland, Ed. Springer-Verlag, Berlin, 1982, pp. 172–193.
[31] Smith, D. R. Random trees and the analysis of branch and bound procedures. Journal of the ACM 31, 1 (January 1984), 163–188.
[32] Smith, D. R. On the design of generate-and-test algorithms: Subspace generators. In Program Specification and Transformation, L. Meertens, Ed. North-Holland, Amsterdam, 1987, pp. 207–220.
[33] Smith, D. R. Structure and design of global search algorithms. Tech. Rep. KES.U.87.12, Kestrel Institute, November 1987.
[34] Smith, D. R. KIDS – a semi-automatic program development system. IEEE Transactions on Software Engineering Special Issue on Formal Methods in Software Engineering 16, 9 (1990), 1024–1043.
[35] Smith, D. R. KIDS: A knowledge-based software development system. In Automating Software Design, M. Lowry and R. McCartney, Eds. MIT Press, Menlo Park, 1991, pp. 483–514.
[36] Smith, D. R., and Lowry, M. R. Algorithm theories and design tactics. In Proceedings of the International Conference on Mathematics of Program Construction, LNCS 375, L. van de Snepscheut, Ed. Springer-Verlag, Berlin, 1989, pp. 379–398. (Reprinted in Science of Computer Programming, 14(2-3), October 1990, pp. 305–321.)
[37] Smith, D. R., and Lowry, M. R. Algorithm theories and design tactics. Science of Computer Programming 14, 2-3 (October 1990), 305–321.
[38] Smith, D. R., and Pressburger, T. T. Knowledge-based software development tools. In Software Engineering Environments, P. Brereton, Ed. Ellis Horwood Ltd., Chichester, 1988, pp. 79–103. (Also Technical Report KES.U.87.6, Kestrel Institute, May 1987.)
[39] Tappel, S. Some algorithm design methods. AAAI, pp. 64–67.
[40] Walker, R. J. An enumerative technique for a class of combinatorial problems. In Proceedings of Symposia in Applied Mathematics, vol. 10. American Mathematical Society, Providence, Rhode Island, 1960, pp. 91–94.
[41] Wirth, N. Program development by stepwise refinement. Communications of the ACM 14, 4 (April 1971), 221–227.