The Versatile List: A Pathway to Abstraction

John Hamer and Adriana Ferraro
Department of Computer Science
University of Auckland
Auckland, New Zealand
{J.Hamer, adriana}@cs.auckland.ac.nz
Abstract

The humble "list" is usually presented early in a first course on data structures and algorithms: one topic among many, generally considered less interesting than trees and graphs. We believe the list deserves better, and show how the list can be used to bring together a wide variety of Computer Science topics, including: algebraic proof by induction, abstract data types, recursion, and generic programming. The emphasis is on developing abstraction and design skills, applying both theory and engineering considerations, ultimately arriving at an unexpectedly powerful framework.

1 Introduction

The "list" data model is a popular topic for starting a second course in programming. Linked lists provide an excellent introduction to the use of pointers, and lists form the basis for the obligatory sorting and searching algorithms. As the simplest "self-similar" data type, the list is also a natural candidate for introducing topics such as algebraic data types and recursion.

The list can be further used to motivate concepts from generic programming. The principle is simply stated: rather than working directly with a list implementation, an interface is used to hide the details. List algorithms written to use this interface can be shown to work uniformly on a variety of list implementations. All looks to be well until array-based lists are considered. Suddenly, the principle breaks down; the interface that worked so well for linked lists simply does not fit array-based lists. This realisation motivates a re-design of the interface, and the resulting abstraction proves to be unexpectedly powerful.
ACE 2000 12/00 Melbourne, Australia © 2000 ACM 1-58113-271-9/00/0012 ... $5.00

The topics presented in this paper are designed to give students the opportunity to discover certain truths about lists (and, by inference, other data types), and the interaction between theory and practice, including:

• Lists can be understood in mathematical terms, and subjected to algebraic reasoning and proof by induction.
• The mathematical theory of lists can guide the design of an abstract data type (ADT).
• The ADT suggested by the theory has a natural implementation; in this case, the linked list.
• One successful implementation does not an ADT make. However, some alternative implementations change the complexity of the ADT operations, and this makes them useless in practice.
• Complexity is understood as a necessary property of an ADT, and is not just an implementation concern.
• Observation of code similarities can lead to a different, more powerful ADT, unrelated to any of the mathematical theory.
• The new ADT proves to be a powerful metaphor, giving rise to truly generic code.

2 Introducing the List through Theory

We advocate presenting a mathematical formalisation of the list data model before any code is written. This approach has several benefits. In particular, a key prerequisite to the development of abstraction skills is the ability to relate different notations for the same concept. The mathematical formalism provides a suitably different notation from program code. Later, by carefully deriving programming code from the mathematical formalism, the equivalence of the two notations is made evident.
A List has a simple, elegant algebraic definition:

1. an empty list is a list;
2. an element followed by a list is also a list;
3. (nothing else is a list).

We use $\varepsilon$ to denote an empty list, and $\mathrm{cons}(h, t)$ to denote an element $h$ followed by a list $t$. With the addition of a few axioms, properties of the list data model can be explored with mathematical precision. For example,

Axiom 1: The empty list ($\varepsilon$) is a zero (that is, an identity element) for concatenation (denoted $\oplus$):

$$\forall l:\quad \varepsilon \oplus l = l = l \oplus \varepsilon$$

Axiom 2: Non-empty lists distribute over concatenation:

$$\forall h, t, l:\quad \mathrm{cons}(h, t) \oplus l = \mathrm{cons}(h,\ t \oplus l)$$

These axioms can be used to prove theorems involving lists. Figure 1 illustrates the mathematics required.

Theorem 1: Concatenation is an associative operator. I.e.,

$$\forall k, l, m:\quad k \oplus (l \oplus m) = (k \oplus l) \oplus m$$

The proof is by structural induction on $k$:

Basis: If $k$ is $\varepsilon$, then:

$$\begin{aligned}
\varepsilon \oplus (l \oplus m) &= l \oplus m && \text{(by Axiom 1)} \\
&= (\varepsilon \oplus l) \oplus m && \text{(by Axiom 1)}
\end{aligned}$$

Inductive step: $k$ is $\mathrm{cons}(h, t)$ for some list $t$ such that $\forall l, m:\ t \oplus (l \oplus m) = (t \oplus l) \oplus m$. Then

$$\begin{aligned}
\mathrm{cons}(h, t) \oplus (l \oplus m) &= \mathrm{cons}(h,\ t \oplus (l \oplus m)) && \text{(by Axiom 2)} \\
&= \mathrm{cons}(h,\ (t \oplus l) \oplus m) && \text{(by induction hyp.)} \\
&= \mathrm{cons}(h,\ t \oplus l) \oplus m && \text{(by Axiom 2)} \\
&= (\mathrm{cons}(h, t) \oplus l) \oplus m && \text{(by Axiom 2)}
\end{aligned}$$

Figure 1: A sample inductive proof

Axioms for the length of a list can similarly be defined:

$$\mathrm{length}(\varepsilon) = 0$$

$$\mathrm{length}(\mathrm{cons}(h, t)) = 1 + \mathrm{length}(t)$$

Theorem 2: $\mathrm{length}(l \oplus m) = \mathrm{length}(l) + \mathrm{length}(m)$

Again, the proof is by structural induction, this time on $l$.

A connection can also be made between a proof by structural induction and recursion. Students traditionally have difficulty understanding these topics, but we have found that teaching them from several perspectives works well.

3 From Algebraic Definition to ADT

The algebraic definition of a list can be used as the specification of an abstract data type. In Java or C++, this can take the form of a class with two List constructors:

    class List {
        List()                          //-- construct an empty list
        List( Object head, List tail )  //-- construct a non-empty list

A "type test" can be added to the ADT to recognise which list constructor was used to create a list:

        boolean isEmpty()               //-- test which constructor built this list

Two "accessor" functions are motivated by the need to provide access to the head and tail of a non-empty list.

        Object head()                   //-- return the head of a non-empty list
        List tail()                     //-- return the tail of a non-empty list
    }

The "axioms" of concatenation and length presented earlier can now be encoded as functions:

    public static List concat( List l, List m ) {
        if( l.isEmpty() )
            return m;
        else
            return new List( l.head(), concat( l.tail(), m ) );
    }

    public static int length( List l ) {
        if( l.isEmpty() )
            return 0;
        else
            return 1 + length( l.tail() );
    }
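As a minimal sketch of how these two functions might be exercised, the program below pairs them with a bare-bones linked representation and spot-checks Theorems 1 and 2 on sample lists. The wrapper class ListDemo and the helpers of and render are hypothetical names introduced only for this illustration, and the nested List class is an assumed stand-in for whatever implementation the ADT is eventually given. A spot check of this kind illustrates the theorems; it does not replace the inductive proofs.

```java
// Sketch only: ListDemo, of and render are hypothetical names, not from the paper.
class ListDemo {
    /** Assumed minimal representation: either empty, or a head followed by a tail. */
    static final class List {
        private final Object head;
        private final List tail;
        private final boolean empty;

        List() { this.head = null; this.tail = null; this.empty = true; }
        List(Object head, List tail) { this.head = head; this.tail = tail; this.empty = false; }

        boolean isEmpty() { return empty; }
        Object head() { return head; }
        List tail() { return tail; }
    }

    // Axioms 1 and 2, read left to right as a recursive definition.
    static List concat(List l, List m) {
        if (l.isEmpty()) return m;
        return new List(l.head(), concat(l.tail(), m));
    }

    // The two length axioms, encoded the same way.
    static int length(List l) {
        if (l.isEmpty()) return 0;
        return 1 + length(l.tail());
    }

    // Hypothetical helper: build a list holding the given elements in order.
    static List of(Object... items) {
        List result = new List();
        for (int i = items.length - 1; i >= 0; i--) result = new List(items[i], result);
        return result;
    }

    // Hypothetical helper: render a list as text so two lists can be compared.
    static String render(List l) {
        StringBuilder sb = new StringBuilder("[");
        for (; !l.isEmpty(); l = l.tail()) {
            sb.append(l.head());
            if (!l.tail().isEmpty()) sb.append(", ");
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        List k = of(1, 2), l = of(3), m = of(4, 5, 6);
        // Theorem 1: both groupings give the same list.
        System.out.println(render(concat(k, concat(l, m))));   // [1, 2, 3, 4, 5, 6]
        System.out.println(render(concat(concat(k, l), m)));   // [1, 2, 3, 4, 5, 6]
        // Theorem 2: the length of a concatenation is the sum of the lengths.
        System.out.println(length(concat(l, m)) == length(l) + length(m)); // true
    }
}
```

Because concat recurses on its first argument only, the second list is shared rather than copied, which is safe here because the cells are immutable.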
4 From ADT to Implementation

Of course, for these functions to be of any use, the ADT must be given an implementation. The first implementation is kept deliberately simple; more sophisticated implementations (such as a doubly linked list with a sentinel) are set as exercises.

    class List {
        private Object _head;
        private List _tail;
        private boolean _empty;

        public List() { _empty = true; }
        public List( Object head, List tail ) {
            _head = head;
            _tail = tail;
            _empty = false;
        }

        boolean isEmpty() { return _empty; }
        Object head() { return _head; }
        List tail() { return _tail; }
    }

The preceding code is a fully functional linked list data type in Java, presented as a natural progression from the formal definition of a list.

5 Efficiency Considerations

It is usual to discuss the concepts of asymptotic and constant factor performance improvements at some stage, and the list provides ample opportunities to do so. For reasons that will become apparent shortly, we start by turning recursive list functions into loops. For example, the length function is re-written using an accumulator in the usual way.

    public static int length( List l ) {
        int len = 0;
        for( ; !l.isEmpty(); l = l.tail() )
            len++;
        return len;
    }

Benchmarking exercises should reveal a constant-factor speed improvement in this optimised version of length.

An attempt at reducing the memory requirements of the list is undertaken at this stage. An array is suggested as a more compact representation, as it eliminates the need to store the tail pointer with each element. The code is a little more complex, and most students are able to arrive at a variant of Figure 2.

    class AList {
        private Object _arr[];

        public AList() {
            _arr = new Object[0];
        }
        public AList( Object head, AList tail ) {
            // Copy the whole tail into a new array, one slot longer.
            _arr = new Object[ tail._arr.length + 1 ];
            _arr[0] = head;
            System.arraycopy( tail._arr, 0, _arr, 1, tail._arr.length );
        }
        public boolean isEmpty() {
            return _arr.length == 0;
        }
        public Object head() {
            return _arr[0];
        }
        public AList tail() {
            // Copy everything except the first element into a new list.
            AList newa = new AList();
            newa._arr = new Object[ _arr.length - 1 ];
            System.arraycopy( _arr, 1, newa._arr, 0, newa._arr.length );
            return newa;
        }
    }

Figure 2: Sample code for an array-based list

The code could scarcely be less efficient. Benchmarking the list functions using AList reveals a large, asymptotically worsening performance. Try as we might, there is no good way of following the List interface with arrays. This exercise is presented as a "successful failure", demonstrating that complexity is associated with an abstraction, and not simply an implementation. The sentiment echoes Alex Stepanov, designer of the C++ Standard Template Library:

"It was commonly assumed that the complexity of an operation is part of implementation and that abstraction ignores complexity. One of the things that is central to generic programming as I understand it now, is that complexity, or at least some general notion of complexity, has to be associated with an operation." [2]

6 Observing Patterns Leads to an Abstraction

The next step in our development of the List is based on an observation of the form of the code used when working with an array-based list. The code for sequentially processing an array takes the following prototypical form:

    for( int i = 0; i < arr.length; ++i )
        process( arr[ i ] );

This form bears a strong similarity with the (non-recursive) linked list code:

    for( List i = first; !i.isEmpty(); i = i.tail() )
        process( i.head() );

Comparison gives the following correspondences:

    int i = 0          List i = first
    i < arr.length     !i.isEmpty()
    ++i                i = i.tail()
    arr[ i ]           i.head()
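To make the correspondence concrete, the sketch below runs the two loop forms side by side over the same three values and records what each one visits. It assumes a minimal linked List with isEmpty, head and tail; the class name LoopShapes and the methods viaArray and viaList are hypothetical names used only for this illustration.

```java
// Sketch only: LoopShapes, viaArray and viaList are hypothetical names.
class LoopShapes {
    /** Assumed minimal linked list offering isEmpty, head and tail. */
    static final class List {
        private final Object head;
        private final List tail;
        private final boolean empty;

        List() { this.head = null; this.tail = null; this.empty = true; }
        List(Object head, List tail) { this.head = head; this.tail = tail; this.empty = false; }

        boolean isEmpty() { return empty; }
        Object head() { return head; }
        List tail() { return tail; }
    }

    // Array form: start "int i = 0", test "i < arr.length", step "++i", access "arr[i]".
    static String viaArray(Object[] arr) {
        StringBuilder seen = new StringBuilder();
        for (int i = 0; i < arr.length; ++i)
            seen.append(arr[i]).append(' ');
        return seen.toString().trim();
    }

    // List form: start "List i = first", test "!i.isEmpty()", step "i = i.tail()", access "i.head()".
    static String viaList(List first) {
        StringBuilder seen = new StringBuilder();
        for (List i = first; !i.isEmpty(); i = i.tail())
            seen.append(i.head()).append(' ');
        return seen.toString().trim();
    }

    public static void main(String[] args) {
        Object[] arr = { "a", "b", "c" };
        List first = new List("a", new List("b", new List("c", new List())));
        System.out.println(viaArray(arr));   // a b c
        System.out.println(viaList(first));  // a b c
    }
}
```

The two method bodies differ only in the four loop parts listed in the correspondences; everything else is identical.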
While capturing this abstraction would have been difficult in the days of C and Pascal, most modern programming languages provide good support. For example, Java has interfaces, C++ templates, and SML signatures. Adopting the C++ Standard Template Library term "iterator", we arrive at this Java interface:

    interface Iterator {
        public boolean hasMore();
        public void advance();
        public Object get();
    }

Java devotees will recognise this interface as "Enumeration". We note that the term "iterator" has been adopted in the Java 1.2 collections framework.

Using this interface, we can write a completely "generic" length function:

    public static int length( Iterator i ) {
        int sofar = 0;
        for( ; i.hasMore(); i.advance() )
            sofar++;
        return sofar;
    }

Iterator is easy to implement for all kinds of lists. For a linked list, we have

    class ListIterator implements Iterator {
        private List _l;
        public ListIterator( List l ) { _l = l; }
        public boolean hasMore() { return !_l.isEmpty(); }
        public Object get() { return _l.head(); }
        public void advance() { _l = _l.tail(); }
    }

And for an array,

    class ArrayIterator implements Iterator {
        private Object _a[];
        private int _i;
        public ArrayIterator( Object a[] ) { _a = a; }
        public boolean hasMore() { return _i < _a.length; }
        public Object get() { return _a[ _i ]; }
        public void advance() { ++_i; }
    }

Putting it all together:

    public static void main( String args[] ) {
        Integer arr[] = new Integer[ 3 ];
        List ll = new List( 1, new List( 2, new List( 3, new List() ) ) );
        System.out.println( length( new ArrayIterator( arr ) ) );
        System.out.println( length( new ListIterator( ll ) ) );
    }

7 Furthering the Abstraction

The Iterator abstraction is open to a surprising variety of uses. For example, a list of the integers between two bounds can be constructed by simply storing the bounds:

    class IntegerRange implements Iterator {
        private int _min, _max;
        public IntegerRange( int min, int max ) {
            _min = min;
            _max = max;
        }
        public boolean hasMore() { return _min <= _max; }
        public Object get() { return Integer.valueOf( _min ); }
        public void advance() { _min++; }
    }

Considerable scope exists for setting exercises that invite creative implementations of the Iterator interface, such as generating unbounded sequences, modifying existing iterators, etc.

8 Experiences and Reflections

Our approach to teaching lists at the University of Auckland has evolved over the past four years, starting (ambitiously!) with Pascal, shifting to Java in 1998. A modified version of the course was also taught at Uppsala University in the spring 2000 semester, using Standard ML. The material is taught in approximately one third of a semester (half-year) long course.

It would be fair to say that most students find the material challenging. There is little opportunity for "fun" examples in this section of the course, although we do try to make up for this lapse elsewhere (course evaluations are emphatic that the students do have fun). We have persevered with our belief in the importance of abstraction, and over the years have successfully set assignment and exam questions that test increasingly sophisticated abstraction skills.

We find it interesting to note that nothing in the presentation of the material is inherently "object-oriented" [3]. The fact that the material has been presented in Standard ML testifies to this, but even in C++ the "template" mechanism is often preferable to using virtual classes. We have found it possible to delay introducing any discussion of object oriented programming, a topic we believe to be best covered in the context of software engineering.

9 Summary

"Computer Science is the science of abstraction" [1]. When those words were written, the ideals of generic programming were yet to reach widespread adoption. Even now, the majority of textbooks on data structures and algorithms are content to repeat basic algorithms time and again with each implementation. This is no longer sufficient. Abstractions of the kind presented here now form the core of standard software frameworks, such as STL [5] and the Java collections library [4]. The challenge to Computer Science educators is to instill a deeper understanding of programming to enable students to contribute effectively in this changing environment.
We hope this paper has contributed to this goal by offering specific suggestions on how a diverse set of topics can be presented in a coherent and satisfying way.

References

[1] Aho, A. and Ullman, J. Foundations of Computer Science (1995), Computer Science Press
[2] Al Stevens interviews Alex Stepanov, Dr. Dobb's Journal (March 1995)
[3] Decker, R. and Hirshfield, S. The top 10 reasons why object-oriented programming can't be taught in CS1, Proc. 25th SIGCSE Technical Symposium on CS Education (1994), 51-55
[4] MageLang Institute, Introduction to the collections framework, http://developer.java.sun.com/developer/onlineTraining/collections/Collection.html
[5] Musser, D., Saini, A., and Stepanov, A. STL Tutorial & Reference Guide (1996), Addison-Wesley