KNOWLEDGE ENGINEERING: PRINCIPLES AND TECHNIQUES Proceedings of the International Conference on Knowledge Engineering, Principles and Techniques, KEPT2011 Cluj-Napoca (Romania), July 4–6, 2011, pp. 1–12
PATTERNS FOR DECOUPLING DATA STRUCTURES IMPLEMENTATIONS
VIRGINIA NICULESCU
Abstract. Design patterns may introduce new perspectives on the traditional subject of data structures. They introduce more flexibility and reusability in the implementation and use of data structures. In this paper we analyze some design patterns that can be used for data structures implementation, and their advantages. This analysis emphasizes how design patterns can be used in order to obtain implementations of data structures based on storage independence.
1. Introduction
Data structures [4, 11] are a long-standing subject in Computer Science. The introduction of the concept of abstract data type allowed data structures to be defined in a more accurate and formal way. Object-oriented programming [13] brought a further step forward, since it allows us to think about data structures in a more abstract way. Based on OOP we may not only define generic data structures by using polymorphism or templates, but also separate the definitions of data structures from their implementations by using interfaces.
We may start from the definition of a data structure: a data structure is a group of data elements whose organization is defined by a structure and by a specific set of operations. Each data structure can be defined as a concrete implementation of an Abstract Data Type, based on a specific representation of the elements of the domain. Interfaces that describe the abstract data types can be defined: they describe only the operations, but the possible representations are dependent on them.
Received by the editors: March 2011.
2010 Mathematics Subject Classification. 68P05.
1998 CR Categories and Descriptors. E.1 [Data]: Data Structures; E.2 [Data]: Data Storage Representation.
Key words and phrases. data structures, design patterns, genericity, representation.
© 2011 Babeş-Bolyai University, Cluj-Napoca
Design patterns can take this further and bring more flexibility and reusability to data structures. In order to build a data structure that is extensible and reusable, it is necessary to decouple the intrinsic and primitive behavior of the structure from the application-specific behavior that manipulates it. Design patterns can be used to achieve this [6, 10]. For generic data structures, templates have been used intensively, the STL library [12] being the best known and most widely used example. Several design patterns are used in this library, such as the very well known Iterator pattern, but also the Adapter pattern, which is going to be used here, too. Modern libraries of data structures are based on design patterns, so understanding them is now very important.
Any data structure of container type is formed by a number of elements, which are usually of the same type. A specific container has properties and behavior that do not depend on the type of its constituent elements. A very common solution is based on templates or parametric data types: the parameter is the type of the elements stored in the container. The drawback is that this restricts us to languages that include a template mechanism. C++ has such a mechanism, which allows us to specify only once the pattern for creating a family of classes that differ only in the type of some parameters, instead of writing each class separately [4, 12]. The mechanism included in Java since JDK 1.5 [14] could be considered more efficient, since just one class is created for each container. Also, the mechanism of parameterized Java classes allows the specification of a certain behavior of the parameters (bounded polymorphism) [1, 5, 7]. In what follows we will denote by Element the interface implemented by the type of the generic elements of a data structure.
This could be either a parameter of the classes (parametric polymorphism) or the superclass of all types of elements that could be used (type polymorphism). The interface Element defines operations for assignment, equality testing, and copy creation (clone).
2. First Level and Second Level Data Structures
The study of the different data structures suggests the following classification:
• first level or fundamental data structures;
• second level data structures, which are characterized by the fact that their implementations use first level data structures.
Arrays and linked representations for lists and trees are considered first level data structures. In order to implement a set or a map we can use an array, a linked list, or a tree; so sets and maps are examples of second level data structures. Figure 1 presents the UML class diagram for some first level data structures.
Figure 1. Storage interface, and some fundamental data structures: arrays, lists and trees.
We have considered dynamic arrays, lists that are implementations of an interface RefList for which the positions of the elements in the list are of a reference type (singly or doubly linked, with dynamic or static allocation), and binary trees with a linked representation using nodes [9]. (A reference is considered to be any value that can be used in order to obtain another value; examples of references are memory addresses (pointers), indices in a table, etc.) The fact that these structures can be used as storage for the implementation of other structures is expressed by the fact that they all implement the interface Storage. The postcondition of the method add(e:Element) ensures only that the element e is in the storage; the postcondition of the method remove(e:Element) ensures that one instance of the element e has been removed from the storage. The number of elements in the storage is returned by the method getSize(), and we may obtain an iterator over the elements of the storage using the method getIterator().
3. Patterns for Storage Independence
In order to implement a second level data structure we have to start from the corresponding abstract data type, for which we may define an interface, and then implement concrete classes based on the possible representations. The process can be simplified by using the Bridge design pattern together with the Abstract Factory and Singleton design patterns.
3.1. Bridge. The Bridge design pattern decouples an abstraction from its implementation so that the two can vary independently. The classes and/or objects participating in this pattern are:
• Abstraction defines the abstraction's interface, and maintains a reference to an object of type Implementor.
• RefinedAbstraction extends the interface defined by Abstraction.
• Implementor defines the interface for implementation classes. This interface does not have to correspond exactly to Abstraction's interface; in fact the two interfaces can be quite different. Typically the Implementor interface provides only primitive operations, and Abstraction defines higher-level operations based on these primitives.
• ConcreteImplementor implements the Implementor interface and defines its concrete implementation.
Figure 2. Bridge design pattern.
Generally, if a data structure has different possible representations or storages, we may separate the storage from the data structure by using the Bridge design pattern.
Example 1 (Set). We may consider the case of the Set data structure. The advantage of this separation is that we have only one class Set, and when we instantiate this class we may specify what kind of storage we want for a particular situation.
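The separation described in Example 1 can be sketched in Java as follows. This is only an illustrative sketch, not the paper's implementation: the classes ArrayStorage, ListStorage and BridgeDemo are hypothetical names, the element type is taken as a type parameter E instead of the Element interface, and the boolean result of remove is an assumption, since only the postconditions of the Storage methods are specified.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Objects;

// Hypothetical demo class: the same Set class runs on two different storages.
class BridgeDemo {
    public static void main(String[] args) {
        Set<String> onArray = new Set<>(new ArrayStorage<>());
        Set<String> onList = new Set<>(new ListStorage<>());
        for (String s : new String[] {"a", "b", "a"}) {
            onArray.add(s);
            onList.add(s);
        }
        // The duplicate "a" is rejected by Set, whatever the storage is.
        System.out.println(onArray.getSize() + " " + onList.getSize()); // prints "2 2"
    }
}

// Implementor: the primitive operations of a storage.
interface Storage<E> {
    void add(E e);              // postcondition: e is in the storage
    boolean remove(E e);        // removes one instance of e (boolean result is an assumption)
    int getSize();
    Iterator<E> getIterator();
}

// ConcreteImplementor backed by a dynamic array (hypothetical name).
class ArrayStorage<E> implements Storage<E> {
    private final List<E> items = new ArrayList<>();
    @Override public void add(E e) { items.add(e); }
    @Override public boolean remove(E e) { return items.remove(e); }
    @Override public int getSize() { return items.size(); }
    @Override public Iterator<E> getIterator() { return items.iterator(); }
}

// ConcreteImplementor backed by a linked list (hypothetical name).
class ListStorage<E> implements Storage<E> {
    private final LinkedList<E> items = new LinkedList<>();
    @Override public void add(E e) { items.addFirst(e); }
    @Override public boolean remove(E e) { return items.remove(e); }
    @Override public int getSize() { return items.size(); }
    @Override public Iterator<E> getIterator() { return items.iterator(); }
}

// Abstraction: a single Set class for every storage; it is itself a Storage.
class Set<E> implements Storage<E> {
    private final Storage<E> storage; // the bridge to the implementor

    Set(Storage<E> storage) { this.storage = storage; }

    // Higher-level operation built on the iterator; linear in the storage size.
    boolean belongs(E e) {
        for (Iterator<E> it = storage.getIterator(); it.hasNext(); ) {
            if (Objects.equals(it.next(), e)) return true;
        }
        return false;
    }

    @Override public void add(E e) { if (!belongs(e)) storage.add(e); }
    @Override public boolean remove(E e) { return storage.remove(e); }
    @Override public int getSize() { return storage.getSize(); }
    @Override public Iterator<E> getIterator() { return storage.getIterator(); }
}
```

In this sketch the constructor trusts the caller to pass an empty storage that nobody else references; nothing in the code enforces that.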
Figure 3. Building sets using generic storage for representation.
This solution for implementing sets uses a reference to a general storage. The diagram in Figure 3 presents the details of this solution. Set, the newly created data structure, can also be seen as a storage that can be used in other contexts, and because of this the class Set implements the interface Storage, too. Since the class Set uses a storage in order to store its elements, the constructor of the class Set has to be able to initialize this storage. A direct and simple solution, but not the best one, would be to give this constructor a parameter (of type Storage) that initializes the storage. The specific operations of the Set data structure are implemented based on the operations of the storage. Other examples may be considered for multi-sets, maps, dictionaries, etc.
Restrictions. There are two important restrictions related to any storage:
• initially it has to be empty, and
• it has to be an unshared storage.
Passing an existing storage object to the constructor cannot guarantee either of them, since the caller may supply a storage that is already populated or that is referenced elsewhere. Generally, data structures are used in many different contexts, and because of this it becomes important to allow their creation in a flexible and dynamic way. These requirements can be met by using creational design patterns. More precisely, we will use Abstract Factory to create each special storage dynamically. The Singleton design pattern ensures that only one instance of a certain type is created, and provides a global access point to it. Since we do not need more than one instance of a specific factory class, the Singleton design pattern will be used for each of them.
3.2. Abstract Factory. The Abstract Factory design pattern provides an interface for creating families of related or dependent objects without specifying their concrete classes [3]. Among the cases in which it can be used, we emphasize the following:
• a system should be independent of how its products are created, composed, and represented;
• we want to provide a class library of products, and we want to reveal just their interfaces, not their implementations.
Figure 4. Abstract Factory design pattern.
The classes and/or objects participating in this pattern are:
• AbstractFactory declares an interface for operations that create abstract products;
• ConcreteFactory implements the operations to create concrete product objects;
• AbstractProduct declares an interface for a type of product object;
• Product defines a product object to be created by the corresponding concrete factory, and implements the AbstractProduct interface;
• Client uses the interfaces declared by the AbstractFactory and AbstractProduct classes.
The concrete products in which we are interested are the fundamental data structures. This means that we define factories for each type of these products; the method createStorage() returns an empty storage of a specialized type, as Figure 5 shows. For each concrete structure a corresponding factory is defined.
Figure 5. Factories for creating arrays, lists, and binary trees.
The solution for implementing sets using Bridge together with Abstract Factory and Singleton is presented in Figure 6. The constructor of the class Set receives a concrete instance of a factory type, and this instance is used for creating the storage.
Figure 6. Building sets using factories for creating different storages for representation.
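A minimal Java sketch of this combination is given below, under stated assumptions: the names StorageFactory, ArrayStorageFactory, ListStorageFactory, getInstance and FactoryDemo are hypothetical, createStorage() is written as a generic method so that each factory can be a non-generic singleton, and the class Set is reduced to the operations needed for the illustration. Because the set asks the factory for its storage, the storage is necessarily empty and unshared.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Objects;

// Hypothetical demo class: two sets built from the same singleton factory.
class FactoryDemo {
    public static void main(String[] args) {
        StorageFactory factory = ListStorageFactory.getInstance();
        Set<Integer> a = new Set<>(factory);
        Set<Integer> b = new Set<>(factory);
        a.add(1);
        a.add(1);
        a.add(2);
        // Each set received its own fresh storage, so b is unaffected by a.
        System.out.println(a.getSize() + " " + b.getSize()); // prints "2 0"
    }
}

// AbstractProduct
interface Storage<E> {
    void add(E e);
    boolean remove(E e); // boolean result is an assumption
    int getSize();
    Iterator<E> getIterator();
}

// Products (hypothetical names)
class ArrayStorage<E> implements Storage<E> {
    private final List<E> items = new ArrayList<>();
    @Override public void add(E e) { items.add(e); }
    @Override public boolean remove(E e) { return items.remove(e); }
    @Override public int getSize() { return items.size(); }
    @Override public Iterator<E> getIterator() { return items.iterator(); }
}

class ListStorage<E> implements Storage<E> {
    private final LinkedList<E> items = new LinkedList<>();
    @Override public void add(E e) { items.addFirst(e); }
    @Override public boolean remove(E e) { return items.remove(e); }
    @Override public int getSize() { return items.size(); }
    @Override public Iterator<E> getIterator() { return items.iterator(); }
}

// AbstractFactory: createStorage() always returns a new, empty storage.
interface StorageFactory {
    <E> Storage<E> createStorage();
}

// ConcreteFactory as a Singleton: private constructor, one shared instance.
final class ArrayStorageFactory implements StorageFactory {
    private static final ArrayStorageFactory INSTANCE = new ArrayStorageFactory();
    private ArrayStorageFactory() { }
    static ArrayStorageFactory getInstance() { return INSTANCE; }
    @Override public <E> Storage<E> createStorage() { return new ArrayStorage<>(); }
}

final class ListStorageFactory implements StorageFactory {
    private static final ListStorageFactory INSTANCE = new ListStorageFactory();
    private ListStorageFactory() { }
    static ListStorageFactory getInstance() { return INSTANCE; }
    @Override public <E> Storage<E> createStorage() { return new ListStorage<>(); }
}

// Client: the set never sees a storage created by somebody else.
class Set<E> {
    private final Storage<E> storage;

    Set(StorageFactory factory) { this.storage = factory.createStorage(); }

    boolean belongs(E e) {
        for (Iterator<E> it = storage.getIterator(); it.hasNext(); ) {
            if (Objects.equals(it.next(), e)) return true;
        }
        return false;
    }

    void add(E e) { if (!belongs(e)) storage.add(e); }
    boolean remove(E e) { return storage.remove(e); }
    int getSize() { return storage.getSize(); }
}
```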
3.3. Adapter. Another problem arises when we already have a library of fundamental data structures that is not based on this framework. In this case the Adapter [3] design pattern can be used. The Adapter design pattern converts the interface of a class into another interface that clients expect. Adapter lets classes work together that otherwise could not because of incompatible interfaces. The classes and/or objects participating in this pattern are:
• Target defines the domain-specific interface that Client uses.
• Adapter adapts the interface Adaptee to the Target interface.
• Adaptee defines an existing interface that needs adapting.
• Client collaborates with objects conforming to the Target interface.
Figure 7. Adapter-object design pattern.
In order to adapt already defined structures so that they can be used as storages, the Adapter pattern is applied with the following correspondences:
• Target is the interface Storage;
• Adaptee is the existing class that corresponds to the existing data structure;
• Adapter is the adapted data structure;
• Client is the new data structure that is going to be built using a general storage.
Minimization of the obtained time-complexities. The adaptation has to be done in a way that minimizes the time complexities of the implementations of the adapted methods. For example, in order to adapt a linked list to be used as a simple storage, we may define the method add of the Storage interface by using the method addFirst of the list implementation, which has a time complexity of Θ(1).
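As an illustration of the linked list example, the following Java sketch adapts java.util.LinkedList (playing the role of an existing library class, the Adaptee) to the Storage interface by composition, as in the object variant of the pattern. The names LinkedListStorage and AdapterDemo are hypothetical, and the boolean result of remove is an assumption.

```java
import java.util.Iterator;
import java.util.LinkedList;

// Hypothetical demo class playing the role of the Client.
class AdapterDemo {
    public static void main(String[] args) {
        Storage<String> storage = new LinkedListStorage<>();
        storage.add("x");
        storage.add("y");
        storage.add("x");
        storage.remove("x"); // only one instance of "x" is removed
        System.out.println(storage.getSize()); // prints "2"
    }
}

// Target: the interface expected by second level data structures.
interface Storage<E> {
    void add(E e);
    boolean remove(E e); // boolean result is an assumption
    int getSize();
    Iterator<E> getIterator();
}

// Adapter: wraps the adaptee and translates Storage calls into list calls.
class LinkedListStorage<E> implements Storage<E> {
    private final LinkedList<E> adaptee = new LinkedList<>();

    // addFirst is a constant-time operation on a linked list, so add stays Θ(1);
    // a storage does not promise any particular position for the new element.
    @Override public void add(E e) { adaptee.addFirst(e); }

    // LinkedList.remove(Object) removes the first occurrence only (linear time).
    @Override public boolean remove(E e) { return adaptee.remove(e); }

    @Override public int getSize() { return adaptee.size(); }

    @Override public Iterator<E> getIterator() { return adaptee.iterator(); }
}
```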
Figure 8. Different types of Storages and the relations between them.
4. Specialized Storages
The efficiency of the implementation of the operations add and remove has to be evaluated. For a general storage, the postcondition of the operation add specifies only that the parameter (of type Element) exists in the storage. In a similar way, the remove operation ensures that one instance equal to the parameter has been removed from the storage. The implementation of the operation belongs of the class Set is based on the iterator, which implies a time complexity linear in the size of the storage. In order to improve this, a specialized storage has to be defined: SearchableStorage, which adds a new method search. A searchable storage is a storage that is able to implement a searching operation with a time complexity better than that of a sequential search. Examples of searchable storages are hash tables and different types of sorted data structures. If we want to create a set based on a searchable storage, then in order to have an efficient implementation of the method belongs we need to allow the creation of this new kind of storage. The resulting set will also be a kind of searchable storage, and in order to make clear the difference between a simple set and a searchable set, a new class SearchableSet is defined, as presented in Figure 9. Both classes Set and SearchableSet implement the general interface ISet.
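The idea can be sketched in Java as follows. The sketch makes several assumptions that are not fixed by the text: search and remove return a boolean, ISet contains only the four operations shown, and the hypothetical class HashStorage relies on java.util.HashSet, so it keeps at most one instance of each element (which is sufficient for building sets). The storage is passed to the constructor only to keep the sketch short.

```java
import java.util.HashSet;
import java.util.Iterator;

// Hypothetical demo class.
class SearchableDemo {
    public static void main(String[] args) {
        SearchableSet<String> set = new SearchableSet<>(new HashStorage<>());
        set.add("a");
        set.add("b");
        set.add("a");
        System.out.println(set.belongs("a") + " " + set.belongs("z") + " " + set.getSize());
        // prints "true false 2"
    }
}

interface Storage<E> {
    void add(E e);
    boolean remove(E e); // boolean result is an assumption
    int getSize();
    Iterator<E> getIterator();
}

// Specialized storage: search must be faster than a sequential scan.
interface SearchableStorage<E> extends Storage<E> {
    boolean search(E e); // signature is an assumption
}

// General set interface; the exact list of operations is an assumption.
interface ISet<E> {
    void add(E e);
    boolean remove(E e);
    boolean belongs(E e);
    int getSize();
}

// A searchable storage based on hashing (expected constant-time search).
class HashStorage<E> implements SearchableStorage<E> {
    private final HashSet<E> items = new HashSet<>();
    @Override public void add(E e) { items.add(e); }
    @Override public boolean remove(E e) { return items.remove(e); }
    @Override public int getSize() { return items.size(); }
    @Override public Iterator<E> getIterator() { return items.iterator(); }
    @Override public boolean search(E e) { return items.contains(e); }
}

// The set is a searchable storage itself: belongs delegates to search
// instead of iterating over all the elements.
class SearchableSet<E> implements ISet<E>, SearchableStorage<E> {
    private final SearchableStorage<E> storage;

    SearchableSet(SearchableStorage<E> storage) { this.storage = storage; }

    @Override public boolean belongs(E e) { return storage.search(e); }
    @Override public boolean search(E e) { return storage.search(e); }
    @Override public void add(E e) { if (!storage.search(e)) storage.add(e); }
    @Override public boolean remove(E e) { return storage.remove(e); }
    @Override public int getSize() { return storage.getSize(); }
    @Override public Iterator<E> getIterator() { return storage.getIterator(); }
}
```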
Figure 9. The details of the implementation of searchable sets.
Another useful specialization of a storage is SortedStorage. This kind of storage is useful for the implementation of sorted data structures, where elements are compared using an instance of type Comparator. Comparator is another design pattern that is widely used in relation to data structure implementation. It is a specialization of the Strategy pattern [3], because each comparison criterion can be seen as a strategy for comparing two elements. A SortedStorage is a specialization of a SearchableStorage, since for sorted data structures we can define efficient searching operations. Figure 8 presents the relations between these types of storages. The specification of the interface SortedStorage strengthens the postcondition of the method getIterator by imposing the condition that the order in which the elements are iterated is based on the comparison criterion specified by the comparator. Based on this condition, the BinarySearchTree is a sorted storage iff the iterator returned by the method getIterator() is the inorder iterator. A sorted set is implemented using an approach similar to that used for searchable sets.
A simple storage does not specify anything about the position of an element in relation to the other elements. If the positions of the elements are in a linear dependency relation (a sequence), then we may consider another specialization: SequentialStorage. A single modification is necessary: to impose the creation of a "read-write" kind of iterator by the method getIterator(). This method is overridden in any sequential storage and respects a strengthened specification, namely the postcondition ensures that the returned iterator has the type RWIterator. Figure 10 shows the UML diagram for the interfaces Iterator and RWIterator.
Figure 10. The interfaces Iterator and RWIterator.
The insertion operation of the RWIterator can be used to insert a new element after the current element pointed to by the iterator, and the method add of the storage will be used only for insertion at a specified position (usually the first). Lists are examples of such storages.
5. Conclusion and Further Work
By separating the concrete representation of a data structure from the behavior of its type, we introduce a new level of indirection and so a new level of abstraction. Using this, we are able to implement data structures based on different fundamental data structures without creating more than one class. So, a new level of genericity is introduced, too. Storage interfaces also introduce a classification of data structures. We emphasize the fact that each data structure can be used as storage for another data structure, or as a generic storage in a generic program. A further analysis has to be done regarding the completeness of this classification. As is already known, design patterns are a very important mechanism for increasing the level of abstraction in programming. In order to achieve storage independence we have used design patterns such as Abstract Factory, Singleton, Bridge, Comparator and Adapter. Modern programming uses environments that greatly simplify the work of a programmer. These environments offer a very high level of abstraction, so that a program is built by specifying how some basic components are composed. In this context, a high level of abstraction of the data manipulated by these programs is also expected.
We may think of a scenario in which the programmer only specifies the type of the needed storage: simple, sequential, searchable, or sorted. For choosing the efficient concrete types some automated mechanisms may be used, very probably based on artificial intelligence techniques.
Acknowledgement: This work was supported by CNCSIS - UEFISCDI, project number PNII - IDEI 2286/2008.
References
[1] L. Cardelli, P. Wegner. On understanding types, data abstraction, and polymorphism. ACM Computing Surveys, 1985.
[2] H.E. Eriksson, M. Penker. UML Toolkit. Wiley Computer Publishing, 1997.
[3] E. Gamma, R. Helm, R. Johnson, J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.
[4] E. Horowitz. Fundamentals of Data Structures in C++. Computer Science Press, 1995.
[5] B. Meyer. Genericity versus Inheritance. Proceedings of OOPSLA 1986, pp. 391-405.
[6] D. Nguyen. Design Patterns for Data Structures. SIGCSE Bulletin, 30, 1, March 1998, pp. 336-340.
[7] V. Niculescu. Teaching about Creational Design Patterns. Workshop on Pedagogies and Tools for Learning Object-Oriented Concepts, ECOOP'2003, Germany, July 21-25, 2003.
[8] V. Niculescu. On Choosing Between Templates and Polymorphic Types. Case-study. Proceedings of "Zilele Academice Clujene", Cluj-Napoca, June 2003, pp. 71-78.
[9] V. Niculescu. A Uniform Analysis of Lists Based on a General Non-recursive Definition. Studia Universitatis "Babeş-Bolyai", Informatica, Vol. LI, No. 1, pp. 91-98 (2006).
[10] V. Niculescu, G. Czibula. Fundamental Data Structures and Algorithms. An Object-Oriented Perspective. Casa Cărţii de Ştiinţă, 2009 (230 pg.) (in Romanian).
[11] D.M. Mount. Data Structures. University of Maryland, 1993.
[12] D.R. Musser, A. Saini. STL Tutorial and Reference Guide: C++ Programming with Standard Template Library. Addison-Wesley, 1995.
[13] Bruno R. Preiss. Data Structures and Algorithms with Object-Oriented Design Patterns in Java. Wiley Computer Publishing, 1999.
[14] Generic Java, http://download.oracle.com/javase/1.5.0/docs/guide/language/generics.html
Department of Computer Science, Babeş-Bolyai University, Cluj-Napoca
E-mail address:
[email protected]