Science of Computer Programming
Functional algorithm design ¹
Richard S. Bird, Programming Research Group, Oxford University, Wolfson Building, Parks Road, Oxford, OX1 3QD, UK
Abstract For an adequate account of a functional approach to the principles of algorithm design we need to find new translations of classical algorithms and data structures, translations that do not compromise efficiency. For an adequate formal account of a functional approach to the specification and design of programs we need to include relations in the underlying theory. These and other points are illustrated in the context of sorting algorithms.
1. Introduction
As a subject taught in the core curriculum of most undergraduate computing degrees, Algorithm Design is concerned with explaining basic strategies for efficient computation, strategies such as greedy algorithms, dynamic programming, and divide and conquer, together with the use of appropriate data structures for representing information. These strategies are illustrated with descriptions of famous algorithms from the literature of computing science. A comprehensive treatment is given in the excellent text [5].
Normally the subject is studied using imperative dictions, but one can also attempt a functional approach. Trying to express standard algorithms in functional form is both an exhilarating and challenging experience. It is exhilarating because of the amount of ground that can be covered in a short course, and challenging because many traditional algorithms need to be completely rethought in a functional setting. New algorithms for old are beginning to emerge. For example, one can cite David King's and John Launchbury's elegant treatment [13] of various graph algorithms, de Moor's characterisation of dynamic programming [21], a functional approach to pattern matching [2, 10, 11], and Chris Okasaki's recent work [22, 23] on purely functional queues. But much remains to be done; for instance, as far as we are aware, there is no effective treatment of the Union-Find problem in a functional setting, a point returned to below. It may be the case, as is suggested in [24, 14], that some classes of algorithm are inherently inefficient in any formalism for programming that lacks updatable state, but this point has not yet been settled in a satisfactory manner.
Unlike other formalisms, functional programming offers a unique opportunity to exploit a compositional approach to Algorithm Design, and to demonstrate the effectiveness of the mathematics of program construction in the presentation of many algorithms. However, for an adequate formal account of programming with functions we need to include relations in the underlying theory. With relations our powers of description are increased, and calculations can be unified. But, like the embedding of the real line in the complex plane, the extension to relations should be as seamless as possible, and preserve the shape and simplicity of its functional subset as much as possible. In recent years there has been a growing appreciation of the need for relations in formal program development ([1, 8, 12, 18, 19, 21, 25], to cite just a few references), though not everyone takes the view that relational programming is a generalisation of functional programming.
In the rest of the paper these remarks are amplified, using problems in sorting and searching as illustrations. Our aim is to indicate something of the unique perspective on programming that a functional approach can reveal.
¹ Modified version of an invited talk at MPC, 1995. 0167-6423/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved. SSDI 0167-6423(95)00033-X
R.S. Bird / Science of Computer Programming 26 (1996) 15-31
2. On composition
Consider the following well-known functional version of quicksort:
    sort x = [], if x = []
           = sort y ++ [a] ++ sort z, otherwise
             where (y, a, z) = split x
    split (a : x) = (filter (< a) x, a, filter (≥ a) x).
It is an academic point (i.e. interesting, even intriguing, but probably not of practical consequence) whether this program can legitimately be called quicksort. After all, the heart of quicksort - the partition phase that burns the candle at both ends - is missing, and there is no notion of an in situ algorithm in functional programming. What is more, quicksort is a terrible algorithm in functional form: its expected running time is easily beaten by mergesort, among others, and it contains a space leak. However, that is not the point at issue here. In most texts on Algorithm Design, sorting is quickly followed, in the same chapter or the following one, by a discussion of selection. The standard expected linear time selection algorithm is introduced by phrases such as “can be modelled on quicksort”, or “follows the structure of quicksort”. But suppose we define
    select x k = index (sort x) k,
where index x k is the kth element of x (counting from 0):
    index (a : x) 0 = a
    index (a : x) (k + 1) = index x k.
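As a rough illustration, the definitions above can be transcribed into Haskell. The names `qsort`, `selectSpec` and `selectFast` are ours, not the paper's. `selectSpec` is the specification (sort, then index); `selectFast` is the kind of program one obtains by calculating with that specification, recursing into only one part of the split. This is a sketch that assumes `Ord` elements and an index within range.

```haskell
-- Functional quicksort as in the text: split around the head element.
split :: Ord a => [a] -> ([a], a, [a])
split (a : x) = (filter (< a) x, a, filter (>= a) x)
split []      = error "split: empty list"

qsort :: Ord a => [a] -> [a]
qsort [] = []
qsort x  = qsort y ++ [a] ++ qsort z
  where (y, a, z) = split x

-- index x k is the kth element of x, counting from 0.
index :: [a] -> Int -> a
index (a : _) 0 = a
index (_ : x) k = index x (k - 1)
index []      _ = error "index: out of range"

-- Specification: sort, then index.
selectSpec :: Ord a => [a] -> Int -> a
selectSpec x k = index (qsort x) k

-- Calculated form: only one of the two parts is searched.
selectFast :: Ord a => [a] -> Int -> a
selectFast x k
  | k < n     = selectFast y k
  | k == n    = a
  | otherwise = selectFast z (k - n - 1)
  where
    (y, a, z) = split x
    n         = length y
```

Both functions agree on every valid index; only `selectFast` avoids sorting the part of the list that cannot contain the answer.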
Then we can calculate, for nonempty x:
    select x k
  =   {definition of select}
    index (sort x) k
  =   {definition of sort, with (y, a, z) = split x}
    index (sort y ++ [a] ++ sort z) k
  =   {since index (u ++ v) k = (k < #u → index u k, index v (k − #u))}
    (k < #sort y → index (sort y) k, index ([a] ++ sort z) (k − #sort y))
  =   {since #sort y = #y = n (say)}
    (k < n → index (sort y) k, index ([a] ++ sort z) (k − n))
  =   {last but one step again, and definition of index}
    (k < n → index (sort y) k, k = n → a, index (sort z) (k − n − 1))
  =   {definition of select}
    (k < n → select y k, k = n → a, select z (k − n − 1)).
[…] (x, a, y) […] = a.
The value min (a, b) is the smaller of a and b. It remains to implement the function mktree. The standard algorithm takes a list [a0, a1, ...] and builds a tree with a0 at the top, a1, a2 at the next level, a3, a4, a5, a6 at the next level, and so on until the list is exhausted. The length of the list at the bottom level will not in general be a power of two. Of course, in the array based algorithm no building actually takes place; the array is just viewed as forming such a tree and everything is done by juggling subscripts. We can build the tree from bottom to top by constructing a sequence of trees at each level; the trees at the next level higher up are formed by combining trees in pairs with appropriate elements of the list. At the end of this process we are left with a list containing a single tree. To implement the idea we define
    mktree = head . mktrees . levels.
The function levels : list (list A) ← list A applied to [a0, a1, ...] produces the list
    [[a0], [a1, a2], [a3, a4, a5, a6], ...]
and is defined by an unfold:
    levels = [(isrnil, level)] . start
    start x = (1, x)
    isrnil (k, x) = (x = nil)
    level (k, x) = (take k x, (2 × k, drop k x)).
The curried functions take k and drop k, respectively, take and drop the first k elements of a list. The function mktrees : list (tree A) ← list (list A) is defined by
    mktrees = (|[null], layer|),
where layer : list (tree A) ← (list A × list (tree A)) is defined by an unfold; applied to the pair ([a0, a1, ...], [u0, u1, ...]) this function produces the list
    [fork (u0, a0, u1), fork (u2, a1, u3), ...].
If the list [u0, u1, ...] of trees is not long enough, it is filled with a sufficient number of empty trees. The definition of layer is:
    layer = [(islnil, step)]
    islnil (x, ts) = (x = nil)
    step (cons (a, x), nil) = (fork (null, a, null), (x, nil))
    step (cons (a, x), cons (u, nil)) = (fork (u, a, null), (x, nil))
    step (cons (a, x), cons (u, cons (v, ts))) = (fork (u, a, v), (x, ts)).
This completes the new definition of mkheap. It is an instructive exercise, left to the reader, to show that mkheap takes linear time.
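The bottom-up construction of mktree can be sketched in Haskell as follows. This is an illustration under our own assumptions, not the paper's code: the `Tree` type and its constructor names are ours, the unfolds are expressed with `Data.List.unfoldr` (folding the termination test into `Maybe`), and the fold mktrees becomes a `foldr`. Only the tree-building step is shown; turning the tree into a heap is not covered here.

```haskell
import Data.List (unfoldr)

data Tree a = Null | Fork (Tree a) a (Tree a)
  deriving (Eq, Show)

-- levels [a0, a1, ..] = [[a0], [a1, a2], [a3, a4, a5, a6], ..]
levels :: [a] -> [[a]]
levels = unfoldr level . start
  where
    start x = (1 :: Int, x)
    level (_, []) = Nothing
    level (k, x)  = Just (take k x, (2 * k, drop k x))

-- layer pairs up the trees of the level below with the labels of this level,
-- padding with empty trees when the list of trees runs out.
layer :: ([a], [Tree a]) -> [Tree a]
layer = unfoldr step
  where
    step ([], _)             = Nothing
    step (a : x, [])         = Just (Fork Null a Null, (x, []))
    step (a : x, [u])        = Just (Fork u a Null, (x, []))
    step (a : x, u : v : ts) = Just (Fork u a v, (x, ts))

-- mktrees folds over the levels from the bottom, starting with [Null].
mktrees :: [[a]] -> [Tree a]
mktrees = foldr (curry layer) [Null]

mktree :: [a] -> Tree a
mktree = head . mktrees . levels
```

For the empty list the fold returns its seed `[Null]`, so `mktree` yields the empty tree and `head` is never applied to an empty list.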
The new version of heapsort shows that some standard algorithms can be translated to functional form while preserving the spirit of the original. But there are other algorithms whose functional translations are not obvious. In particular, Kruskal's algorithm for minimum cost spanning trees uses, in addition to a heap, an algorithm for the Union-Find problem. The Union-Find problem concerns the efficient implementation of three operations on disjoint sets, specified as follows:
    units : set A → set (set A)
    units x = {{a} | a ∈ x}
    find : set (set A) → A → set A
    find xs a = "the (unique) set x in xs that contains a"
    union : set (set A) → set A → set A → set (set A)
    union xs x y = (xs − {x} − {y}) ∪ {x ∪ y}.
Various schemes for maintaining partitions are known [26, 27], but currently it is not clear how to achieve comparable efficiency in a purely functional setting.
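The specification of the three Union-Find operations can be written down directly as an executable but deliberately naive Haskell sketch. It assumes, for illustration only, that a set is represented by a duplicate-free list; it mirrors the specification and makes no attempt at the efficiency of the imperative schemes.

```haskell
import qualified Data.List as L

-- Assumption for this sketch: a set is a duplicate-free list.
type Set a = [a]

-- units x = {{a} | a in x}
units :: Set a -> Set (Set a)
units x = [[a] | a <- x]

-- find xs a = the (unique) set in xs that contains a
find :: Eq a => Set (Set a) -> a -> Set a
find xs a = head [x | x <- xs, a `elem` x]

-- union xs x y = (xs - {x} - {y}) united with {x united with y}
union :: Eq a => Set (Set a) -> Set a -> Set a -> Set (Set a)
union xs x y = L.union x y : (xs L.\\ [x, y])
```

Each `find` and `union` here takes time linear in the size of the partition, which is exactly the cost the known imperative schemes avoid.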
5. On relations
Relations have been knocking at the door, demanding entry for some time now, and it is time to let them in. One reason concerns the nature of the relationship between the fold and unfold operations, and another concerns program specification in general. In a purely functional framework one can model relations by set-valued functions, but the mathematics becomes fussy. It becomes even fussier if we have to model set-valued functions by list-valued ones. With relations things are significantly simpler. Moreover, unlike functions every relation has a converse, and the use of converse operations is important both in specification and program development.
Consider for instance the following purely functional specification of sorting a list of elements under a preorder ⊴:
    sort = head . filter uplist . perms
    uplist x = and [a ⊴ b | (a, b) ← zip (x, tail x)].
The function perms returns a list of all permutations of a sequence, and the boolean-valued function uplist determines whether a sequence is ascending under ⊴. While this is an acceptable specification of sort, a better one is to specify sort to be a function satisfying the inclusion
    sort ⊆ uplist? . perm,    (1)
where perm and uplist? are now relations rather than functions. (If ⊴ is a linear order, then uplist? . perm is itself a function, but the expression is not capable of being implemented directly in a standard functional language, so there is still work to do.) Since we want to preserve compatibility with functions, we think of relations as taking arguments on the right and delivering results on the left, so our relational composition takes the same form as functional composition (we want an ordered permutation, not a permutation of an ordered list). As a relation, uplist? ⊆ id, where id is the identity relation on lists, and holds for x just when uplist x is true. A relation R such that R ⊆ id is called a coreflexive (because a relation R satisfying id ⊆ R is a reflexive relation). More generally, p? is the coreflexive that holds for x just when the predicate p x is true. One can define coreflexives by translating the corresponding predicate, but it is usually more satisfactory to define them directly. In particular, one can define uplist? as a relational catamorphism
    uplist? = (|nil, cons . ok?|),
    ok (a, x) = (∀b : b inlist x : a ⊴ b).
The relation inlist : A ← list A is the membership relation for lists. It is not immediately clear how to define the membership relation for an arbitrary datatype, but the matter was finally settled by Hoogendijk and de Moor in [9]. It would take too long to explain how to define inlist in the relational calculus, so we will just accept it. For the same reason, the following formal definition of ok? is given without explanation:
    ok? = id ∩ (outl° . (⊴ / inlist°) . outr).
This “point-free” style is typical in a number of presentations of the relational calculus (see [7]); at first sight it seems arcane, but one soon gets used to it and calculations without variables are significantly simpler. To define the relation perm we need the fundamental operation of taking the converse R° of a relation R, defined by x R° y ≡ y R x. Then we can define
    perm = bagify° . bagify,
where bagify turns a list into a bag of its elements. Thus perm is defined using bags as an intermediate type: turning a list into a bag and then turning it back into a list gives a permutation of the original. The function bagify can be defined as a catamorphism
    bagify = (|nilbag, consbag|),
where nilbag is the empty bag and consbag adds an element to a bag.
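The two specifications of sorting discussed in this section can be compared in a small Haskell sketch. Following the remark that relations can be modelled by list-valued functions, a relation is represented here as a function returning the list of all related results; the type `Rel`, the operator `.:`, and the names `test`, `sortSpec`, `sortF` and `meetsSpec` are our own, and `<=` on an `Ord` type stands in for the preorder ⊴.

```haskell
import Data.List (permutations, sort)

-- A relation modelled as a list-valued function (an assumption of this sketch).
type Rel a b = a -> [b]

-- Relational composition: arguments on the right, results on the left.
(.:) :: Rel b c -> Rel a b -> Rel a c
(r .: s) x = concatMap r (s x)

-- The coreflexive p? holds for x just when p x is true.
test :: (a -> Bool) -> Rel a a
test p x = [x | p x]

uplist :: Ord a => [a] -> Bool
uplist x = and (zipWith (<=) x (drop 1 x))

perm :: Rel [a] [a]
perm = permutations

-- The relational specification uplist? . perm.
sortSpec :: Ord a => Rel [a] [a]
sortSpec = test uplist .: perm

-- The purely functional specification head . filter uplist . perms.
sortF :: Ord a => [a] -> [a]
sortF = head . filter uplist . permutations

-- f meets the specification at x when f x is among the allowed results.
meetsSpec :: Ord a => ([a] -> [a]) -> [a] -> Bool
meetsSpec f x = f x `elem` sortSpec x
```

For example, `meetsSpec sort [3, 1, 2]` checks the inclusion (1) at one argument for the library `sort`; both specifications take exponential time and are meant only as specifications.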
6. On fold and unfold
Now let us turn to the formal definitions of fold and unfold in a relational setting. What follows will be rather brief and incomplete in various ways, but it is hoped enough of the general idea comes across to stimulate further reading. Grant Malcolm's thesis [16] is a good starting point, as is [17] which was written for functional programmers. And, if you can wait long enough, a complete account will appear in the forthcoming text [4] to be published later this year.
As we have seen in the case of lists and trees, whenever one declares a datatype a number of functions are brought into play. In part, declaring a datatype as an equation asserts the existence of an isomorphism between the types on the left and right. In the case of lists this takes the form
    list A ≅ 1 + (A × list A).
The type 1 consists of just one member and serves as the source type for constants. The type constructor × is Cartesian product, and + is disjoint sum. The right-hand side can be rephrased as
    list A ≅ F(A, list A),
where F(A, B) = 1 + (A × B) is a mapping from types to types. We can also use F as a mapping from functions to functions by defining
    F(f, g) = id₁ + (f × g),
where id₁ is the identity function on 1. A function having a dual role both as a mapping between types and a mapping between functions is, provided certain properties are satisfied, called a functor. The functor F defined above takes a pair of types or functions as argument and so is sometimes called a bifunctor. One property we require of a functor F is that if f : A ← B, then Ff : FA ← FB. The other properties are the identity and composition rules:
    F id = id
    F(f . g) = Ff . Fg.
In the case of bifunctors the rules are, firstly, that if f : A ← C and g : B ← D, then F(f, g) : F(A, B) ← F(C, D); and, secondly, that
    F(id, id) = id
    F(f . g, h . k) = F(f, h) . F(g, k).
The Cartesian product constructor × can also be defined as a mapping between functions: if f : A ← C and g : B ← D, then f × g : A × B ← C × D is defined by
    (f × g) (c, d) = (f c, g d).
This mapping satisfies the identity and composition rules for bifunctors, so × is a bifunctor.
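The bifunctor F(A, B) = 1 + (A × B) and its action on functions can be sketched in Haskell, taking `()` for the type 1, `Either` for disjoint sum and pairs for Cartesian product. The names `F`, `mapF` and `cross` are ours. Haskell cannot prove the identity and composition rules, so the sketch only states them as properties that can be checked on sample values.

```haskell
-- F(A, B) = 1 + (A x B)
type F a b = Either () (a, b)

-- Action on functions: F(f, g) = id + (f x g).
mapF :: (a -> c) -> (b -> d) -> F a b -> F c d
mapF _ _ (Left ())      = Left ()
mapF f g (Right (a, b)) = Right (f a, g b)

-- The product of two functions: (f x g) (c, d) = (f c, g d).
cross :: (a -> c) -> (b -> d) -> (a, b) -> (c, d)
cross f g (c, d) = (f c, g d)

-- Identity rule at a sample value: F(id, id) = id.
identityRule :: F Int Int -> Bool
identityRule v = mapF id id v == v

-- Composition rule at a sample value: F(f . g, h . k) = F(f, h) . F(g, k).
compositionRule :: F Int Int -> Bool
compositionRule v =
  mapF (f . g) (h . k) v == (mapF f h . mapF g k) v
  where
    f = (+ 1)
    g = (* 2)
    h = subtract 3
    k = negate
```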
Similarly, the coproduct constructor + can be defined on functions: applied to a left component c, the function f + g : A + B ← C + D returns f c as a left component of the result; dually, applied to a right component d, the value of (f + g) d is the right component g d. Again, the identity and composition rules are satisfied, so + is a bifunctor.
The declaration of list A also introduces two functions
    nil : list A ← 1 and cons : list A ← A × list A
that serve to construct lists. We can parcel these functions together as one function
    [nil, cons] : list A ← F(A, list A).
In general, if f : A ← B and g : A ← C, then [f, g] : A ← B + C applies f to left components and g to right components. The function [nil, cons] has a special property, which captures the fact that we can define functions on lists by pattern-matching: given any function [c, f] : B ← F(A, B) there is a unique function h : B ← list A such that
    h . [nil, cons] = [c, f] . F(id, h).
Unwrapping this compact equation, we get two equations
    h . nil = c
    h . cons = f . (id × h).
Thus, h = (|c, f|). In a general datatype declaration, which we can write in the form
    α : data A ← F(A, data A),
the catamorphism (|f|) : B ← data A, taking an argument f : B ← F(A, B), is the unique function h satisfying
    h . α = f . F(id, h).
As a consequence of the defining property of α we get that (|α|) = id. For example, (|nil, cons|) (which we should have written as (|[nil, cons]|) but will not) is the identity function on lists. Less obviously, it also follows from its defining property that α is an isomorphism, meaning
    α . α° = id and α° . α = id,
where α°, more usually written α⁻¹, denotes the inverse function of α. The first id is the identity relation on data A, and the second is the identity relation on F(A, data A). Since α is an isomorphism, we can move it to the other side of the defining equation for (|f|). Thus, h = (|f|) is the unique solution of the equation
    h = f . F(id, h) . α°.
We will abbreviate this by writing
    (|f|) = (νh : h = f . F(id, h) . α°).
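The equation h = f . F(id, h) . α° can be read directly as a recursive Haskell definition. The sketch below does this for lists, with `alpha` playing the role of [nil, cons] and `alphaInv` its inverse; these names, and `cata`, `mapF` and `sumAlg`, are our own. The Prelude function `either` serves as the combinator [f, g].

```haskell
-- F(A, B) = 1 + (A x B)
type F a b = Either () (a, b)

-- F(id, h): apply h to the second component only.
mapF :: (b -> d) -> F a b -> F a d
mapF _ (Left ())      = Left ()
mapF h (Right (a, b)) = Right (a, h b)

-- alpha = [nil, cons]
alpha :: F a [a] -> [a]
alpha (Left ())      = []
alpha (Right (a, x)) = a : x

-- The inverse of alpha.
alphaInv :: [a] -> F a [a]
alphaInv []      = Left ()
alphaInv (a : x) = Right (a, x)

-- The catamorphism: h = f . F(id, h) . inverse of alpha.
cata :: (F a b -> b) -> [a] -> b
cata f = f . mapF (cata f) . alphaInv

-- Example argument [c, f] with c = 0 and f = plus.
sumAlg :: F Int Int -> Int
sumAlg = either (const 0) (uncurry (+))
```

With these definitions `cata sumAlg` sums a list, and `cata alpha` is the identity function on lists, matching the property stated above.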
Finally, we also obtain the extremely useful fusion rule for combining two functions into one:
    f . (|g|) = (|h|) ⇐ f . g = h . Ff.
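A small instance of the fusion rule, sketched in Haskell with `foldr` standing for the list catamorphism. Here f doubles a number, (|g|) sums a list, and h is chosen so that the fusion condition holds; for lists we read Ff as F(id, f), which is our interpretation. The names below are ours.

```haskell
-- (|g|) with g = [0, plus]
sumList :: [Int] -> Int
sumList = foldr (+) 0

-- (|h|) with h = [0, \(a, b) -> 2 * a + b]
doubleSum :: [Int] -> Int
doubleSum = foldr (\a b -> 2 * a + b) 0

-- The fusion condition f . g = h . F(id, f) with f = (2 *), on the cons case:
-- left side  f (g (a, b))         = 2 * (a + b)
-- right side h (F(id, f) (a, b))  = h (a, 2 * b) = 2 * a + 2 * b
fusionCondition :: Int -> Int -> Bool
fusionCondition a b = 2 * (a + b) == 2 * a + 2 * b

-- The conclusion f . (|g|) = (|h|), checked at one argument.
fusionHolds :: [Int] -> Bool
fusionHolds xs = 2 * sumList xs == doubleSum xs
```

On the nil case the condition reduces to 2 * 0 = 0, so both cases hold and the two-pass program `(2 *) . sumList` can be replaced by the single fold `doubleSum`.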
Now, let us extend all this stuff to relations. Everything we have said above goes through when functions are extended to relations, provided only that we restrict attention to functors that are monotonic; that is, if R ⊆ S then FR ⊆ FS. It can be shown that monotonic functors preserve relational converse, that is, (FR)° = F(R°). It follows that the expression FR° is not ambiguous. In particular, since (R . S)° = S° . R°, we get for a relation R : data A ← F(A, data A) that
    (|R|) = (νX : X = R . F(id, X) . α°)
    (|R|)° = (νX : X = α . F(id, X) . R°).
Given a function f : F(A, B) ← B the unfold operator [(f)] is defined by
    [(f)] = (|f°|)°.
The (now) non-standard form [(p, f)] that we have used previously for unfold on lists stands more properly for [(p → !, f)], where ! : 1 ← B and f : (A × B) ← B. With relations we also get two variants of the fusion rule:
    R . (|S|) ⊆ (|T|) ⇐ R . S ⊆ T . F(id, R)
    R . (|S|) ⊇ (|T|) ⇐ R . S ⊇ T . F(id, R).
Finally, writing (μX : X = φX) for the least fixed point (under relational inclusion) of φ, we get the following formalisation of the remark made in Section 3 about the simplification of the composition of a fold over a parameterised type data A with an unfold:
    (2)
It is this transformation that is behind Wadler's deforestation algorithm [29].
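The idea of simplifying a fold composed with an unfold can be illustrated in the purely functional special case with a Haskell sketch. This is our own example, not the relational statement itself: `factTwoPass` builds an intermediate list with `unfoldr` and then folds it, while `factFused` uses a single recursion (named `hylo` here, a hypothetical helper) that never constructs the list.

```haskell
import Data.List (unfoldr)

-- One unfolding step: count down from n, stopping at 0.
step :: Int -> Maybe (Int, Int)
step n = if n <= 0 then Nothing else Just (n, n - 1)

-- Unfold: downFrom n = [n, n - 1, .., 1].
downFrom :: Int -> [Int]
downFrom = unfoldr step

-- Fold after unfold, with an intermediate list.
factTwoPass :: Int -> Int
factTwoPass = foldr (*) 1 . downFrom

-- The fused recursion: each element is consumed as soon as it is produced.
hylo :: (a -> c -> c) -> c -> (b -> Maybe (a, b)) -> b -> c
hylo f e g = h
  where
    h b = case g b of
      Nothing      -> e
      Just (a, b') -> f a (h b')

factFused :: Int -> Int
factFused = hylo (*) 1 step
```

Both functions compute the factorial; the fused version corresponds to removing the intermediate data structure, which is the effect deforestation aims for.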
7. On the derivation of sorting algorithms
The formal derivation and classification of sorting algorithms is not, of course, new (see e.g. [6, 20]), but let us end with a brief sampler of the kinds of derivations we can accomplish with the above material.
7.1. Insertion sort
Our first sorting algorithm arises as a result of the following three-step development:
    uplist? . perm
  =   {expressing perm in the form (|nil, add|)}
    uplist? . (|nil, add|)
  ⊇   {fusion}
    (|nil, uplist? . add|)
  ⊇   {supposing insert ⊇ uplist? . add}
    (|nil, insert|).
In outline, we can express perm as a relational catamorphism, use fusion with uplist? to obtain a second catamorphism, and finally refine the result to a functional catamorphism. The relation add for which perm = (|nil, add|) can be defined by
    add (a, x ++ y) = x ++ [a] ++ y.
It can also be defined recursively by
    add = cons ∪ cons . (id × add) . swap . (id × cons°),    (3)
where swap (a, (b, x)) = (b, (a, x)). We omit the proof of this fact, as well as most others in this section. The function insert that refines uplist? . add can be defined by
    insert = (ok′ → cons, cons . (id × insert) . swap . (id × cons°)),
where
    ok′ (a, nil) = true
    ok′ (a, cons (b, x)) = (a […]
[…] [nil, uplist? . add] . F(uplist?). But this follows quickly from the monotonicity of the functor F and the fact that uplist? is a coreflexive. The resulting sorting algorithm, namely sort = (|nil, insert|), is, of course, insertion sort.
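The resulting program, sort = (|nil, insert|), can be sketched in Haskell with `foldr` as the list catamorphism. The second clause of the test `ok` is an assumption on our part (the usual comparison with the head of the list), and `<=` on an `Ord` type stands in for the preorder ⊴.

```haskell
-- insert (a, x): place a in front if it is no greater than the head of x,
-- otherwise keep the head and insert into the tail.
insert :: Ord a => (a, [a]) -> [a]
insert (a, x)
  | ok a x    = a : x
  | otherwise = case x of
      b : y -> b : insert (a, y)
      []    -> [a]               -- not reached: ok a [] is True
  where
    ok _ []      = True
    ok c (b : _) = c <= b        -- assumed form of the test on nonempty lists

-- sort = (|nil, insert|)
isort :: Ord a => [a] -> [a]
isort = foldr (curry insert) []
```

When the list argument of `insert` is already ascending, so is the result; this is the invariant the fold maintains.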
7.2. Selection sort
Our second sorting algorithm comes from the following development:
    uplist? . perm
  =   {since perm = perm° and uplist? = uplist?°}
    (perm . uplist?)°
  =   {fusion}
    (|nil, perm . cons . ok?|)°
  ⊇   {supposing select ⊆ ok? . cons° . perm}
    (|nil, select°|)°
  =   {anamorphisms}
    [(isnil, select)].
The result is selection sort.
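The anamorphism [(isnil, select)] can be sketched in Haskell with `Data.List.unfoldr`, folding the test isnil into a `Maybe` result. The particular `select` below, which removes one occurrence of the minimum, is one possible refinement chosen for illustration; `<=` on an `Ord` type stands in for ⊴.

```haskell
import Data.List (delete, unfoldr)

-- select x returns a least element of x together with the remaining elements,
-- so that consing the two gives a permutation of x whose head is minimal.
select :: Ord a => [a] -> Maybe (a, [a])
select [] = Nothing
select x  = Just (a, delete a x)
  where a = minimum x

-- sort = [(isnil, select)]
ssort :: Ord a => [a] -> [a]
ssort = unfoldr select
```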
7.3. Quicksort
Finally, to derive quicksort we need the fact that if f is a function, then f . f° ⊆ id. This inclusion captures the fact that functions map arguments to at most one result. We also need the coreflexive uptree? defined by
    uptree? = (|null, fork . okt?|),
where okt (x, a, y) = (∀b : b intree x : b ⊴ a) ∧ (∀b : b intree y : a ⊴ b). Then we can argue along the same lines as in selection sort:
    uplist? . perm
  ⊇   {since flatten : list A ← tree A is a function}
    flatten . flatten° . uplist? . perm
  =   {claim: uplist? . flatten = flatten . uptree?}
    flatten . uptree? . flatten° . perm
  =   {since perm = perm° and uptree? = uptree?°}
    flatten . (perm . flatten . uptree?)°
  =   {fusion}
    flatten . (|null, perm . join . okt?|)°
  ⊇   {supposing split ⊆ okt? . join° . perm; converses}
    flatten . (|null, split°|)°
  =   {anamorphisms}
    flatten . [(isnull, split)]
  =   {introducing mktree = [(isnull, split)]}
    flatten . mktree.
We omit the proof of the claim, and the detailed justification of the fusion step.
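The final form, flatten . mktree, can be sketched in Haskell. The `Tree` type, the helper `unfoldTree` (a tree anamorphism with the test isnull folded into `Maybe`) and the concrete `split` are our own choices for illustration; `<` and `>=` on an `Ord` type stand in for the ordering.

```haskell
data Tree a = Null | Fork (Tree a) a (Tree a)

-- Anamorphism on trees: grow a tree from a seed.
unfoldTree :: (b -> Maybe (b, a, b)) -> b -> Tree a
unfoldTree g b = case g b of
  Nothing        -> Null
  Just (l, a, r) -> Fork (unfoldTree g l) a (unfoldTree g r)

-- In-order traversal.
flatten :: Tree a -> [a]
flatten Null         = []
flatten (Fork l a r) = flatten l ++ [a] ++ flatten r

-- split picks the head as pivot and partitions the rest around it.
split :: Ord a => [a] -> Maybe ([a], a, [a])
split []      = Nothing
split (a : x) = Just (filter (< a) x, a, filter (>= a) x)

-- mktree = [(isnull, split)]
mktree :: Ord a => [a] -> Tree a
mktree = unfoldTree split

-- quicksort = flatten . mktree
qsort :: Ord a => [a] -> [a]
qsort = flatten . mktree
```

The tree built by `mktree` satisfies the ordering condition okt at every node, which is why flattening it yields an ascending list.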
References
[1] R.C. Backhouse, P.J. de Bruin, G. Malcolm, E. Voermans and J.C.S.P. van der Woude, Relational catamorphisms, in: B. Möller, ed., Proc. of the IFIP TC2/WG2.1 Working Conf. on Constructing Programs from Specifications (1991) 287-318.
[2] R.S. Bird, J. Gibbons and G. Jones, Formal derivation of a pattern matching algorithm, Sci. Comput. Programming 12 (1989) 93-104.
[3] R. Bird and P. Wadler, Introduction to Functional Programming (Prentice Hall, Englewood Cliffs, NJ, 1988).
[4] R. Bird and O. de Moor, The Algebra of Programming (Prentice Hall, Englewood Cliffs, NJ, 1996), to be published.
[5] T.H. Cormen, C.E. Leiserson and R.L. Rivest, Introduction to Algorithms (MIT Press, Cambridge, MA, 1990).
[6] J. Darlington, A synthesis of several sorting algorithms, Acta Inform. 11 (1978) 1-30.
[7] P.J. Freyd and A. Ščedrov, Categories, Allegories, Mathematical Library, Vol. 39 (North-Holland, Amsterdam, 1990).
[8] A.M. Haeberer and P.A.S. Veloso, Partial relations for program development, in: B. Möller, ed., Constructing Programs from Specifications, Proc. IFIP TC2/WG2.1 Conference, Pacific Grove, CA, 1991 (North-Holland, Amsterdam, 1991) 373-397.
[9] P. Hoogendijk and O. de Moor, Membership of datatypes, Unpublished Draft, 1993.
[10] R. Hoogerwoord, The design of functional programs: a calculational approach, Ph.D. Thesis, University of Eindhoven, 1989.
[11] J. Jeuring, Polytypic pattern matching, in: S. Peyton Jones, ed., Conf. Record of FPCA 1995, SIGPLAN-SIGARCH-WG2.8 (1995) 238-248.
[12] G. Jones and M. Sheeran, Circuit design in Ruby, in: Jørgen Staunstrup, ed., Formal Methods for VLSI Design (North-Holland, Amsterdam, 1990) 13-70.
[13] D. King and J. Launchbury, Structuring depth-first search algorithms in Haskell, Proc. ACM Principles of Programming Languages, San Francisco, 1995.
[14] J. Launchbury and S.P. Jones, State in Haskell, University of Glasgow, Preprint, 1995.
[15] G. Malcolm, Homomorphisms and promotability, in: J. Snepscheut, ed., 1989 Groningen Mathematics of Program Construction Conf. (Springer, Berlin, Lecture Notes in Computer Science, Vol. 375, 1989) 335-347.
[16] G. Malcolm, Algebraic types and program transformation, Ph.D. Thesis, University of Groningen, The Netherlands, 1990.
[17] E. Meijer, M. Fokkinga and R. Paterson, Functional programming with bananas, lenses, envelopes and barbed wire, in: J. Hughes, ed., Proc. 1991 ACM Conf. on Functional Programming and Computer Architecture, Lecture Notes in Computer Science, Vol. 523 (Springer, Berlin, 1991).
[18] A. Mili, A relational approach to the design of deterministic programs, Acta Inform. 20 (1983) 315-328.
[19] B. Möller, Relations as a program development language, in: B. Möller, ed., Constructing Programs from Specifications, Proc. IFIP TC2/WG2.1 Conf., Pacific Grove, CA, 1991 (North-Holland, Amsterdam, 1991) 373-397.
[20] B. Möller, Algebraic calculation of graph and sorting algorithms, in: D. Bjørner, M. Broy, I.V. Pottosin, eds., Formal Methods in Programming and their Applications, Lecture Notes in Computer Science, Vol. 735 (Springer, Berlin, 1993) 394-413.
[21] O. de Moor, Categories, relations and dynamic programming, D.Phil. Thesis, Technical Monograph PRG-98, Computing Laboratory, Oxford, 1992; also in Math. Struct. in Comput. Sci. 4 (1994) 33-70.
[22] C. Okasaki, Simple and efficient purely functional queues and deques, J. Functional Programming 5, to appear.
[23] C. Okasaki and G. Brodal, Optimal purely functional priority queues, J. Functional Programming, to appear.
[24] G.C. Ponder, P.C. McGeer and A.P-C. Ng, Are applicative languages inefficient? SIGPLAN Notices 23 (1988) 135-139.
[25] G. Schmidt and T. Ströhlein, Relations and Graphs, EATCS Monographs on Theoretical Computer Science (Springer, Berlin, 1991).
[26] R.E. Tarjan, Efficiency of a good but not linear set union algorithm, J. ACM 22 (1975) 215-225.
[27] R.E. Tarjan and J. van Leeuwen, Worst-case analysis of set union algorithms, J. ACM 31 (1984) 245-281.
[28] J.W.J. Williams, Algorithm 232 (heapsort), Commun. ACM 7 (1964) 347-348.
[29] P.L. Wadler, Deforestation: transforming programs to eliminate trees, Theoret. Comput. Sci. 2 (1990) 461-493.