Program Inversion

Program Inversion*

Rommert J. Casimir
Erasmus University Rotterdam
Faculty of Economics, Dept of Computer Science (Vakgroep AIV)

Abstract: The inversion of a program with a given output is defined as a program that generates the sequence of all input strings that produce the given output if presented to this program. A method for producing the inversion of programs is given. It is shown that inversion establishes a natural relationship between various well known programs, e.g. sorting and permutation generation programs.

Keywords and phrases: program inversion, inversion, sorting, permutation generation.

CR categories: 5.24

* This is a reprint of the original report, which dates from 1980 and has been rejected by a computer science journal.


1 Automata

It is a well known result of automata theory that any finite automaton corresponds to the syntax of a regular language [DE78]. This syntax may be used to generate the sentences that will be accepted by the automaton. The generation procedure may take two forms: it may be a nondeterministic program that can generate any acceptable sentence, or it may be a deterministic program that enumerates all sentences in the language. The latter program terminates only if the number of acceptable sentences in the language is finite, i.e. if the syntax contains no recursive definitions and the automaton may be represented as an acyclic directed graph. The sentences in the language may also be generated directly from the automaton by an algorithm which enumerates all paths through a directed graph. Consider as an example the automaton that accepts all partitions of 4:

         input
state   1  2  3  4
  0     1  2  3  4
  1     2  3  4  5
  2     3  4  5  5
  3     4  5  5  5
  4     4  4  4  4
  5     5  5  5  5

with the accepting state 4. This automaton corresponds to the left regular grammar:

<1> ::= 1
<2> ::= <1> 1 | 2
<3> ::= <2> 1 | <1> 2 | 3
<4> ::= <3> 1 | <2> 2 | <1> 3 | 4

If this syntax is applied depth-first and left-to-right, the partitions are generated in the order: <1,1,1,1>, <2,1,1>, <1,2,1>, <3,1>, <1,1,2>, <2,2>, <1,3>, <4>

The automaton also corresponds to the right regular grammar:

<0> ::= 1 <1> | 2 <2> | 3 <3> | 4
<1> ::= 1 <2> | 2 <3> | 3
<2> ::= 1 <3> | 2
<3> ::= 1

This grammar generates the partitions in the order: <1,1,1,1>, <1,1,2>, <1,2,1>, <1,3>, <2,1,1>, <2,2>, <3,1>, <4>

The algorithm that generates the sentences directly from the automaton in the same order as the left regular grammar starts from the accepting state and searches the transition table for all state/input pairs that produce a transition to this state. This procedure is repeated until the initial state is reached, which means that a new path has been found, or the set of transitions to this state is exhausted. In both cases the program backtracks until all transitions to the final state have been used.

The algorithm that generates the sentences in the same order as the right regular grammar starts from the initial state and runs through all states until either the accepting state or a state from which no path leads to the accepting state is reached. Then the program backtracks until all paths are found.
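The two enumeration strategies described above can be sketched in a few lines of Python (used here instead of the report's Pascal). This is only an illustration under stated assumptions: reading input x in state s is assumed to lead to state s+x as long as the total does not exceed 4, and every other transition, including any input read in state 4 or 5, is assumed to lead to the dead state 5. The names `delta`, `forward` and `backward` are hypothetical and do not appear in the report.

```python
INPUTS = (1, 2, 3, 4)
INITIAL, ACCEPTING, DEAD = 0, 4, 5

def delta(state, x):
    """Assumed transition function: add x to the running total, overshoot goes to DEAD."""
    if state >= ACCEPTING:
        return DEAD
    return state + x if state + x <= ACCEPTING else DEAD

def forward(state=INITIAL, prefix=()):
    """Start at the initial state and follow every transition, depth-first.

    A path is reported when the accepting state is reached and abandoned
    when the dead state is reached; this mirrors the right regular grammar.
    """
    if state == ACCEPTING:
        yield prefix
    elif state != DEAD:
        for x in INPUTS:
            yield from forward(delta(state, x), prefix + (x,))

def backward(state=ACCEPTING, suffix=()):
    """Start at the accepting state and search for all (state, input) pairs
    that lead to the current state; this mirrors the left regular grammar.
    """
    if state == INITIAL:
        yield suffix
        return
    for x in INPUTS:
        for p in range(DEAD):              # candidate predecessor states 0..4
            if delta(p, x) == state:
                yield from backward(p, (x,) + suffix)

print(list(forward()))
print(list(backward()))
```

Both generators produce the same eight sentences; only the order differs, because one search extends a sentence at its right end and the other at its left end.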

2 Programs

A deterministic program which produces a given output may be considered as a deterministic automaton that accepts a string of inputs iff this produces the given output. Unless otherwise stated we will consider all programs to be deterministic. Thus a program with a given result should correspond to a syntax that defines the input strings that produce this result if presented to this program. So from a program and a result we may derive a nondeterministic program that can generate any input string that will lead to the given result in the original program and, if the set of acceptable input strings is finite, a deterministic program that generates all input strings that lead to this result. If a program X generates all input strings that produce a specified result in a program Y we call X the inversion of Y. This is a generalization of the concept of program inversion as used in the literature [DI79,SI79], where it has been applied only to programs with a one-to-one correspondence between input and output. Inversion of a known program may either produce a new algorithm or an algorithm which is already known. In the latter case a possibly unknown relation between algorithms may be established.

As a first example we take the following program to compute the sum of n integers. Throughout this report programs are written in Pascal. In this case we have taken some liberties with the language to describe a general class of programs that cannot be expressed in Pascal proper. The complete Pascal programs and procedures in this report have been mechanically edited from the source texts for Apple-UCSD Pascal.

var s,i,j,n:integer; a:array[1..n] of integer;
begin s:=0;i:=0; while i [...]

a[i] > 0          (by definition)
a[i] ≤ sum-s      (to preserve invariant 2)

The accepting state is reached when s=sum. In this state the condition i=n holds, so the value of i defines the input value of n. The resulting program is:

const sum=4;
var a:array[1..sum] of integer; tl:integer;

procedure partright(s,i:integer);
var j:integer;
begin
  if s=sum then
    begin
      tl:=tl+1; write(tl:3);
      for j:=1 to i-1 do write(a[j]:3);
      writeln
    end
  else
    for j:=1 to sum-s do
      begin a[i]:=j; partright(s+j,i+1) end;
end;

begin
  tl:=0; partright(0,1);
end.

The left inversion constructs for each state the set of inputs that force a transition from a state which can be reached from the initial state. Using invariants (1) and (2) we find the acceptable input values for a[i]:

a[i] > 0      (by definition)
a[i] ≤ s      (to preserve invariant 1)

The initial state is reached when s=0. In this state the condition i=0 should hold. However in the accepting state the value of i is unknown because i is an input value. This means we cannot use the variable i but should use the variable k=n-i instead, so the value of k in the initial state can define the input value of n. The resulting program is:


const sum=4;
var a:array[1..sum] of integer; tl:integer;

procedure partleft(s,k:integer);
var j:integer;
begin
  if s=0 then
    begin
      tl:=tl+1; write(tl);
      for j:=k-1 downto 1 do write(a[j]:3);
      writeln
    end
  else
    for j:=1 to s do
      begin a[k]:=j; partleft(s-j,k+1) end;
end;

begin
  tl:=0; partleft(sum,1);
end.

The left inversion effectively follows the program backwards whereas the right inversion generates the acceptable inputs on its own and only uses the invariants of the original program to check the validity of these inputs. We intend to invert programs statement by statement which will result in left inversions.
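The defining property of an inversion, namely that every generated input reproduces the given result when fed to the original program, can be checked mechanically. The following Python sketch (not the report's Pascal) ports `partright` and `partleft` as generators and runs a forward summation over every generated vector. The function `total_of` is an assumed stand-in for the original summation program, and the parameter names `prefix` and `chosen` are illustrative.

```python
def total_of(a):
    """Assumed stand-in for the original (forward) program: sum the elements of a."""
    s = 0
    for x in a:
        s += x
    return s

def partright(total, s=0, prefix=()):
    """Right inversion: extend the input at its right end while s + j <= total."""
    if s == total:
        yield prefix
    else:
        for j in range(1, total - s + 1):
            yield from partright(total, s + j, prefix + (j,))

def partleft(total, s=None, chosen=()):
    """Left inversion: run the summation backwards from s = total down to s = 0."""
    if s is None:
        s = total
    if s == 0:
        # The first value chosen is the last element of the input vector,
        # so the vector is reported in reverse, as the Pascal program does.
        yield chosen[::-1]
    else:
        for j in range(1, s + 1):
            yield from partleft(total, s - j, chosen + (j,))

for vector in partleft(4):
    assert total_of(vector) == 4   # every generated input reproduces the result
    print(vector)
```

Both inversions yield the same set of input vectors; they differ only in the order of generation.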

3 Functions

Another way to look at a program is to consider it as a function which maps a set of input strings onto a set of output strings. The inverse of a program which computes a function f computes the inverse of this function, which is either a function $f^{-1}$ or a one-to-many relation. As an example we use a program to compute the sum of n integers. The corresponding function maps the set of vectors a[1..i], with i in the range 1..n and containing values in the range 1..m, onto the set of integers 1..n*m. This function may be defined recursively as:

$$f(a) = \begin{cases} a_1 & \text{if } ub(a) = 1 \\ a_{ub} + f(a_{1..ub-1}) & \text{if } ub(a) > 1 \end{cases}$$

where ub denotes the upper bound of the vector a and $a_{i..j}$ denotes the vector containing the elements i through j of a. The inverse of the function f is the relation g, which defines the set of vectors that map onto a specified integer. This may be expressed as:

$$g(s) = \{\langle s \rangle\} \cup \bigcup_{i=1}^{s-1} g(s-i) * \{\langle i \rangle\}$$

The relation g of course defines the partitions of s and the program which was given as the left inversion of the summation program may be directly derived from it.
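The recurrence for g can be transcribed almost literally. The Python sketch below is an illustration, not the report's program: it represents a vector as a tuple and a set of vectors as a Python set, and it assumes that the operator `*` in the definition of g appends the element i to every vector in g(s-i).

```python
def g(s):
    """Set of all vectors of positive integers whose elements sum to s (s >= 1)."""
    result = {(s,)}                                   # the one-element vector <s>
    for i in range(1, s):
        # every vector that sums to s - i, extended with the element i
        result |= {v + (i,) for v in g(s - i)}
    return result

print(sorted(g(4)))
```

The recursive call on s - i followed by appending i corresponds to the call `partleft(s-j,k+1)` after the assignment `a[k]:=j` in the left inversion.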

4 Relations

Relations may be used to define the logic of algorithms [KO79] and to describe the logical structure of a database [CO70]. In the programming language PROLOG [WP77,VE78] the logical component of an algorithm is separated from the control part. The PROLOG program proper defines the relations between objects. A program is activated by a goal statement which defines the values of some objects. This means that the interpreter is instructed to find such values for the objects that are not defined in the goal statement that the goal statement is shown to be unsatisfiable with these values. If the execution successfully terminates, the values defined by the goal statement may be called input values and the values chosen during the execution may be called output values. On the other hand the execution will abort when it is not possible to find such values by exhaustive search. In this case we may say that the input values are incorrect. If one of the domains in the relation that is not defined in the goal statement is infinite, it is possible that the execution of the program does not terminate. In that case it cannot be shown whether the goal statement is satisfiable. If a relation is binary and one-to-one it may be replaced by functional notation, e.g.:

$$R(a,b) \iff f(a) = b \iff f^{-1}(b) = a$$
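The symmetry between a relation and its inverse can be illustrated with a small Python sketch. It is only a finite, exhaustive-search model and not a PROLOG interpreter; the function `query`, the helper `is_one_to_one` and the example relations `successor` and `parity` are hypothetical names introduced for this illustration.

```python
def query(relation, a=None, b=None):
    """Return all pairs of the relation that agree with the values fixed in the goal.

    A value of None marks an object that is not defined in the goal statement.
    """
    return sorted((x, y) for (x, y) in relation
                  if (a is None or x == a) and (b is None or y == b))

def is_one_to_one(relation):
    """True if no first component and no second component occurs twice."""
    firsts = [x for x, _ in relation]
    seconds = [y for _, y in relation]
    return len(set(firsts)) == len(firsts) and len(set(seconds)) == len(seconds)

successor = {(n, n + 1) for n in range(5)}   # one-to-one: f(a) = a + 1
parity = {(n, n % 2) for n in range(6)}      # many-to-one: f(a) = a mod 2

print(query(successor, a=2))   # the relation used as f
print(query(successor, b=3))   # the same relation used as the inverse of f
print(query(parity, b=0))      # the inverse of a many-to-one function has several answers
```

The same `query` serves both directions; only the choice of which object is fixed in the goal changes.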

Thus in a program which contains only the logic of the algorithm in the form of relations, there is no difference between a program and its inversion. However, the effect of executing such a program is not altogether clear when the program contains relations that are not one-to-one. PROLOG assigns such values to objects that the relations in the program are satisfied, so its results are logically nondeterministic, but in effect controlled by the interpreter.

Example

The relation sum(t,x), where t is a possibly empty list containing positive integers and x is an integer, may be defined informally as:

x is the sum of the values of the elements of t

Taking our notation from [VE78] this may be defined in PROLOG as:

sum(nil,0) sum(*a.*x,*a+*y) posint(*a) nonnegint(*a)

[...] n then jhigh:=n+1;
while (i [...] b then begin w:=s[i];s[i]:=s[j];s[j]:=w end;
with stack[stackp+1] do begin lo:=h;hi:=u end;
perqui(b,b,h+1,(h+b)div 2,h,stackp+1,s);
for k:= i+1 to h do
  for l:= j-1 downto h+1 do perqui(b,k,l,h,u,stackp,s);
end;
end;

begin
  for i:= 1 to ub do s[i]:=i;
  tl:=0;
  perqui(0,0,ub+1,ub div 2,ub,0,s);
end.

This generates the permutations in the following order:


 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

 1  1  2  2  4  4  2  2  4  4  3  3  3  3  2  2  1  1  4  4  1  1  3  3
 2  2  1  1  2  2  4  4  3  3  4  4  2  2  3  3  4  4  1  1  3  3  1  1
 3  4  3  4  3  1  3  1  2  1  2  1  1  4  1  4  3  2  3  2  2  4  2  4
 4  3  4  3  1  3  1  3  1  2  1  2  4  1  4  1  2  3  2  3  4  2  4  2

6.8 Treesort

The treesort algorithm places the elements of a vector in a binary tree and then follows this tree to select the elements in the correct order. Thus the inversion of this program generates all binary trees and for each tree generates the permutations that result in this tree. So it has an intermediate result which is interesting in its own right [KN77]. The program which selects the nodes from a binary tree in the correct order can be described as:

procedure tree(node:integer);
begin
  if left[node] > 0 then tree(left[node]);
  { do something with node }
  if right[node] > 0 then tree(right[node]);
end;
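The intermediate result mentioned above, the set of binary trees together with the permutations that produce each tree, can be made concrete with a brute-force Python sketch. This is not the top-down inversion of the report: it simply runs the tree-building phase of treesort on every permutation and groups the permutations by the resulting tree. It assumes distinct keys 1..n, identifies each node with its key, and uses 0 for a missing child, as the test `left[node] > 0` in the procedure `tree` suggests. The names `build_tree`, `inorder` and `permutations_by_tree` are illustrative.

```python
from collections import defaultdict
from itertools import permutations

def build_tree(perm):
    """Insert the keys of perm one by one into a binary search tree."""
    n = len(perm)
    left = [0] * (n + 1)     # left[k] is the left child of node k, 0 if absent
    right = [0] * (n + 1)    # right[k] is the right child of node k, 0 if absent
    root = perm[0]
    for value in perm[1:]:
        node = root
        while True:
            branch = left if value < node else right
            if branch[node] == 0:
                branch[node] = value
                break
            node = branch[node]
    return root, tuple(left), tuple(right)

def inorder(node, left, right):
    """Select the nodes in sorted order, following the procedure tree."""
    result = []
    if left[node] > 0:
        result += inorder(left[node], left, right)
    result.append(node)
    if right[node] > 0:
        result += inorder(right[node], left, right)
    return result

def permutations_by_tree(n):
    """Group all permutations of 1..n by the tree that treesort builds from them."""
    groups = defaultdict(list)
    for perm in permutations(range(1, n + 1)):
        groups[build_tree(perm)].append(perm)
    return groups

for tree, perms in permutations_by_tree(3).items():
    print(tree[0], perms)    # root of the tree and the permutations that yield it
```

Every tree is traversed to the same sorted vector, so the groups partition the set of all permutations.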

The top-down inversion of this procedure successively selects all values in the sorted vector as node and generates all combinations of the left and right trees as they are defined by this node. In this way the trees are generated in the order given by Knott [KN77], which is derived from Knuth [KN72]. Because the original program is recursive its inversion needs two stacks. The generation of all permutations from the trees is straightforward. The resulting program is:


const ub=4;
type stax=record dir,from,toe:integer end;
     direc=(left,right,link);
var stack: array[1..ub] of stax;
    t:array[1..ub,direc] of integer;
    val:array[1..ub] of integer;
    i,tl:integer; dd:direc;

procedure priper(n:integer);
var i,j,k:integer; found:boolean;
begin
  for i:=1 to ub do
    begin
      if (t[i,left]=0) and (t[i,right]=0) then
        begin
          found:=false;k:=n;
          while (k [...] 0 then t[back,d]:=i;
  t[i,link]:=back;
  if i>start then
    begin
      with stack[stackp+1] do begin dir:=i-1;from:=start;toe:=i end;
      tree(right,i,i+1,finish,stackp+1);
    end
  else tree(right,i,i+1,finish,stackp);
  if back>0 then t[back,d]:=0;
  t[i,link]:=0;
end;
end
end;

begin
  for i:=1 to ub do for dd:=left to link do t[i,dd]:=0;
  tl:=0;tree(left,0,1,ub,0);
end.

This produces the permutations in the following order:

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

 1  1  1  1  1  1  2  2  2  2  2  2  3  3  3  3  3  3  4  4  4  4  4  4
 2  2  3  3  4  4  3  3  1  4  4  1  4  1  1  4  2  2  1  1  2  2  3  3
 3  4  4  2  2  3  4  1  3  3  1  4  1  4  2  2  4  1  2  3  3  1  1  2
 4  3  2  4  3  2  1  4  4  1  3  3  2  2  4  1  1  4  3  2  1  3  2  1

6.9 Cyclic sorting

We may now ask whether any sorting program may be inverted to generate the permutations in the order of one of the permutation generation procedures in [SE77], other than the lexicographic or inverse lexicographic order. We will first study the method of permutation generation by nested cycling. We first define the notion of a cycle-sorted vector, i.e. a vector which can be sorted by cycling. A vector a[1..n] is cycle-sorted if:

for at most one j: a[j] > a[(j mod n)+1], and
∀ k ∈ 1..n, k ≠ j: a[k] ≤ a[(k mod n)+1]
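The definition can be checked directly. The Python sketch below is an illustration with 0-based indexing, so that a[(j mod n)+1] becomes `a[(j + 1) % n]`. The names `is_cycle_sorted`, `rotate_until_sorted` and `cycle_sorted_permutations` are hypothetical, and the rotation helper is a naive check that such a vector can indeed be sorted by cycling, not the cyclesort procedure of the report.

```python
from itertools import permutations

def is_cycle_sorted(a):
    """True if at most one position j has a[j] > a[next position, cyclically]."""
    n = len(a)
    descents = sum(1 for j in range(n) if a[j] > a[(j + 1) % n])
    return descents <= 1

def rotate_until_sorted(a):
    """Cycle the vector one position at a time; return the sorted rotation or None."""
    a = list(a)
    for _ in range(len(a) + 1):
        if all(a[k] <= a[k + 1] for k in range(len(a) - 1)):
            return a
        a = a[-1:] + a[:-1]          # move the last element to the front
    return None

def cycle_sorted_permutations(n):
    """All permutations of 1..n that satisfy the definition."""
    return [p for p in permutations(range(1, n + 1)) if is_cycle_sorted(p)]

print(is_cycle_sorted([3, 4, 1, 2]), rotate_until_sorted([3, 4, 1, 2]))
print(is_cycle_sorted([2, 1, 4, 3]), rotate_until_sorted([2, 1, 4, 3]))
print(cycle_sorted_permutations(4))
```

For distinct elements the cycle-sorted vectors are exactly the rotations of the sorted vector, so there are n of them among the n! permutations.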

Now a vector may be sorted by rotating each cycle-sorted subvector a[1..i-1] in such a way that the subvector a[1..i] remains cycle-sorted, and finally rotating the vector a[1..n] until it is sorted. If this algorithm is implemented in a straightforward way on a conventional machine it is very slow, as it runs in time O(n^3). So the program for this algorithm, which we call cyclesort, is given only as a curiosity:


procedure cyclesort(n:integer);
var i,j,k:integer; found:boolean;

  procedure cycle(n:integer);
  var j,w:integer;
  begin
    w:=a[n];
    for j:=n downto 2 do a[j]:=a[j-1];
    a[1]:=w;
  end;

begin
  for i:=2 to n+1 do
    begin
      j:=1;found:=false;k:=i-1;
      while (j [...] a[j mod (i-1)+1] then k:=j;
      if i [...] =a[j]) and (a[i] [...]

[...] a[i+1] → a[i],a[i+1]:=a[i+1],a[i] od

This program exchanges any two successive elements in the vector if they are out of order. It terminates if all elements are in order, so the vector is certainly sorted if the program terminates. Because every exchange diminishes the number of inversions in the vector it is also guaranteed to terminate. If we add a control function for the pointer i, the program will be transformed into a deterministic program, e.g. insertion sort or bubble sort, so its time is not worse than O(n^2). Now a nondeterministic program with a deterministic result may produce a result from a given input by more than one path. The equivalent theorem from automata theory is that the grammar that corresponds to a nondeterministic automaton may be ambiguous. This means that we cannot generate the acceptable inputs for a nondeterministic program by following a tree but only by enumerating all nodes in a graph, for which a breadth-first approach should be preferred. As an example we give the graph which describes the possible execution paths in the nondeterministic sorting program with the result 123:

[Figure: graph with the nodes 123, 213, 231, 132, 312, 321]
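The breadth-first enumeration of this graph can be sketched in Python as follows. This is an illustration rather than the report's program: it starts from the sorted vector, undoes one exchange at a time by swapping an adjacent pair that is in order, and uses a set to reject permutations that were already generated. The name `invert_exchange_sort` and the set-based duplicate check are assumptions of the sketch.

```python
def invert_exchange_sort(n):
    """Enumerate, breadth-first, all inputs that the nondeterministic
    exchange sort can transform into the sorted vector 1..n."""
    start = tuple(range(1, n + 1))
    seen = {start}           # permutations generated so far
    order = [start]          # the same permutations in order of generation
    frontier = [start]       # permutations found in the previous round
    while frontier:
        next_frontier = []
        for perm in frontier:
            for j in range(n - 1):
                if perm[j] < perm[j + 1]:
                    # This pair is in order, so the sort may just have exchanged it;
                    # undoing that exchange gives a possible predecessor.
                    pred = perm[:j] + (perm[j + 1], perm[j]) + perm[j + 2:]
                    if pred not in seen:
                        seen.add(pred)
                        order.append(pred)
                        next_frontier.append(pred)
        frontier = next_frontier
    return order

print(invert_exchange_sort(3))
```

Each round adds exactly one inversion, so the rounds correspond to the levels of the graph, and the permutation 321 is reached along two different paths but recorded only once.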

Because of the necessity to check whether any new permutation has already been generated the resulting program is neither efficient nor elegant:


const n=4;p=24;
var i,j,k,m,t0,t1,t2: integer;
    found,equal: boolean;
    a:array[1..p,1..n] of integer;
    b:array[1..n] of integer;

begin
  for i:=1 to n do a[1,i]:=i;
  while t1>t0 do
    begin
      t2:=t1;
      for i:=t0+1 to t1 do
        begin
          for j:=1 to n-1 do if a[i,j]