Porting and Comparing NIHCL and LIBG++
Thomas Kunz
Institut für Theoretische Informatik
Fachbereich Informatik
Technische Hochschule Darmstadt
TI-2/91, August 8, 1991
Contents
1 Introduction
2 The LIBG++ class library
  2.1 Overview
  2.2 Provided classes
  2.3 Modifications
  2.4 The efficiency of different container implementations
3 The NIHCL class library
  3.1 Overview
  3.2 Provided classes
  3.3 Modifications
  3.4 The vector classes
  3.5 Performance penalty for multiple inheritance support
4 Comparing the two class libraries
  4.1 Methodological differences
  4.2 Functional differences
  4.3 Some experimental results and experiences
5 Conclusions
1 Introduction

One goal of object-oriented programming is to achieve greater programmer productivity through software reuse. The idea is that applications will be built by combining and modifying well-designed reusable components instead of starting from scratch. One means of achieving this goal is the development and distribution of class libraries. A number of such libraries have been published recently. Booch and Vilot[3], for example, present work on the C++ Booch Components, a commercially available object-oriented library providing a collection of useful data structures. Faust and Levy[4] developed an object-oriented library to support user-level threads. These libraries differ with respect to the functionality they provide, so in general an application programmer will have to combine several libraries to satisfy all his needs. This raises some interesting questions concerning the compatibility between independently designed libraries, which are discussed in Berlin[2].

Another interesting application of class libraries is the promotion of portable software development. If the same core set of class libraries is available on different machines, applications that use only these libraries can easily be ported. But the currently available class libraries all share the same deficiency with regard to this goal: none of them interfaces a major part of the runtime system.

Two class libraries for C++ are available from the Free Software Foundation: NIHCL (National Institutes of Health Class Library)¹ and LIBG++.² We ported both libraries to our Unix workstations, using the GNU C++ compiler g++, version 1.39.1. An interesting aspect of these two libraries is the difference in their design, and according to Koenig[7], library design is essentially language design. LIBG++ follows a forest approach, in which a number of independent classes are provided. NIHCL, on the other hand, follows a hierarchical approach, deriving all classes from a common ancestor.³

This paper reports our experiences with porting and using these two class libraries. The rest of the paper is organized as follows. The next two sections give a short overview of the two libraries, starting with LIBG++. Discussing the complete functionality of both libraries in depth is beyond the scope of this paper.⁴ Rather, we concentrate on describing the problems encountered while porting the libraries and the solutions (or patches) we adopted. The fourth section compares the two libraries, both on a more theoretical basis and by actually writing some simple test programs and measuring their execution times. Section five summarizes our findings.
2 The LIBG++ class library

This section briefly describes the LIBG++ class library. After a general overview, the provided functionality is discussed. As already mentioned, no in-depth description of individual classes or methods will be given. Rather, a high-level classification scheme has been developed and will guide the discussion of the provided functionality. The third subsection describes the modifications necessary to port LIBG++ to our machines, PCS Cadmus workstations with MC68020 processors, running a Unix System V derivative. The section ends with some performance results, comparing different implementations of the same LIBG++ classes.

¹ See Gorlen[5].
² See Lea[8].
³ See Lea[8, p. 11].
⁴ LIBG++ is documented in Lea[8]; NIHCL is described in Gorlen[5] and Gorlen et al.[6].
2.1 Overview
LIBG++ is the GNU C++ library, providing basic support for C++ programmers. The library is similar to, but not identical with, AT&T's libC.a. The main deviations between libC.a and LIBG++ stem from differences between the AT&T C++ translator and the GNU C++ compiler. Nevertheless, it is claimed that the vast majority of C++ programs compile and run under both libraries with no visible difference.⁵

LIBG++ was designed along the forest approach of object-oriented programming: it provides a collection of free-standing classes that can be mixed freely. This forest approach is somewhat restricted to simplify the handling of LIBG++ by enforcing some uniformity across all classes. Lea lists a number of stylistic conventions which are adhered to by all classes, see Lea[8, p. 13]. Among these are, for example, naming conventions (class names begin with capital letters, except for istream and ostream, for AT&T C++ compatibility). Another common convention is that all classes use the same simple exception handling strategy.

Other restrictions of the underlying forest approach, not explicitly mentioned by Lea, appear in the use of method names. Deleting an element from a container class (sets, bags, stacks, queues, etc.) is always done by a method called del, which takes the element to be deleted as a parameter. The same name is used for methods with similar functionality across different classes throughout LIBG++; cf. the methods empty, length, or clear. Nearly all classes support a method OK(), which checks the representation invariant of a class and calls the class's error method in case of failure. A third attempt to enforce some uniformity across different classes is the common traversal scheme for elements in container classes. In LIBG++, access methods have been standardized via the use of pseudo-indexes called Pixes. Every class that supports the use of Pixes contains the methods first and next to iterate over all elements.
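The conventions above can be illustrated with a small container. The sketch below is a hypothetical IntList class, not a LIBG++ class. It assumes, following the idiom described in Lea[8], that a Pix is an opaque void* pseudo-index, that first() returns 0 for an empty container, and that next() advances a Pix in place and sets it to 0 after the last element. The member names add and error and all exact signatures are assumptions made for this sketch.

```cpp
#include <cstdio>
#include <cstdlib>

// Opaque pseudo-index; in this sketch it is simply a node pointer as void*.
typedef void* Pix;

// Hypothetical container (not part of LIBG++): an unordered list of ints
// that follows the uniform naming conventions described in the text.
class IntList {
    struct Node {
        int item;
        Node* next;
    };
    Node* head;
    int count;  // cached length, verified by OK()

    // Common error handling: report and abort.
    static void error(const char* msg) {
        std::fprintf(stderr, "IntList: %s\n", msg);
        std::abort();
    }

public:
    IntList() : head(nullptr), count(0) {}
    ~IntList() { clear(); }
    IntList(const IntList&) = delete;
    IntList& operator=(const IntList&) = delete;

    void add(int x) {
        head = new Node{x, head};
        ++count;
    }

    // del takes the element to be deleted as its parameter.
    void del(int x) {
        for (Node** p = &head; *p != nullptr; p = &(*p)->next) {
            if ((*p)->item == x) {
                Node* dead = *p;
                *p = dead->next;
                delete dead;
                --count;
                return;
            }
        }
    }

    int length() const { return count; }
    bool empty() const { return count == 0; }

    void clear() {
        while (head != nullptr) {
            Node* rest = head->next;
            delete head;
            head = rest;
        }
        count = 0;
    }

    // Pix-style traversal: first() yields 0 if there are no elements,
    // next() moves i to the following element (or 0 at the end).
    Pix first() const { return head; }
    void next(Pix& i) const { i = static_cast<Node*>(i)->next; }
    int& operator()(Pix i) { return static_cast<Node*>(i)->item; }

    // Representation invariant: the cached count equals the node count.
    bool OK() const {
        int n = 0;
        for (Node* p = head; p != nullptr; p = p->next) ++n;
        bool ok = (n == count);
        if (!ok) error("representation invariant violated");
        return ok;
    }
};
```

A traversal then reads `for (Pix i = l.first(); i != 0; l.next(i)) use(l(i));`. Because every container offers the same first/next/operator() triple, such loops look the same regardless of the container class being traversed.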
2.2 Provided classes
To gain an overview of the provided functionality, the high-level classification scheme shown in figure 1 will be used. The functionality provided by a library is divided into five categories. Internal functions/classes are only used within the library and cannot be accessed/used by an application program. A second group of library classes deals with resources. A first subgroup in this category provides an interface to the I/O system (files, streams, terminals). A second subgroup handles processes. The third category is an example of support for a specific application area: statistics. Classes in this category are mostly, but not exclusively, random number generators. Classes helping with the collection, analysis and presentation of statistical information also fall into this category. The fourth category, simple types, adds basic data types like String, Time, or Rational numbers to the builtin data types of a programming language. The fifth and last category consists of the already mentioned containers. Classes in this category handle collections of elements like Stack, Bag, Set, Vector, or Queue. This category is sometimes viewed as the most important support for application programmers. Libraries like the already mentioned, commercially available C++ Booch Components⁶ consist entirely of classes in this fifth category.

⁵ See Lea[8, p. 11].
⁶ See Booch and Vilot[3].

    internal        used within lib only
    resources       interface to system resources (I/O, processes)
    statistics      random number generators, etc.
    simple types    non-builtin types like rational, date, etc.
    containers      collections of entities like stacks, queues, sets

Figure 1: Classification scheme

The classification scheme of figure 1 deviates from the one used by Lea to give an overview of LIBG++. The scheme presented here has the advantage that it can easily be reused to provide an overview of the NIHCL class library, the second class library of interest in this paper. This facilitates the comparison between the two libraries in the following sections. Figure 2 fills in this classification scheme with the functionality provided by LIBG++. For each of the five categories, some classes of LIBG++ that belong to this category are listed.

    internal        dynamic memory allocation
    resources       files, streams, curses
    statistics      different random number generators
                    data collection and analysis
    simple types    integer, rational, complex
                    pix (iterator over containers)
    containers      obstack, allocring, bitset, bitstring, vectors
                    plex, stack, list, queue, set, bag, map

Figure 2: Classifying LIBG++

Internal functions manage dynamic memory allocation. Resource handling is restricted to I/O interfacing. LIBG++ provides the standard stream classes described in Stroustrup[10] or Weiskamp and Flamig[11] as well as Files and Curses. The statistics category includes a large number of classes. There are a number of random number generators with different underlying distributions (cf. Binomial, Geometric, Normal, DiscreteUniform, Poisson, etc.). Furthermore, LIBG++ provides two classes for the collection and analysis of statistical data, SampleStatistic and SampleHistogram. In addition to the builtin types, LIBG++ defines some more simple types: Integer, Rational, Complex, String, Pix. The Rational class, for example, supports rational numbers and algebra; Pixes are the already mentioned pseudo-indexes to iterate over containers. The fifth category, containers, consists of three distinct groups of classes. The first group contains classes like Obstack, Bitset, or Bitstring. These classes have a fixed internal representation and work on elements of a defined type. The class Obstack, for example, works on Strings only. The second group consists of pseudo-generic classes that have a fixed internal representation, like Vector or List. These container classes are homogeneous, but it is possible, for example, to use a vector of integers in parallel with a vector of reals. To support this pseudo-genericity, LIBG++ provides a number of files as the basis for generating container classes for a certain element type. These files are used to generate the desired header and source files via a supplied shell script that performs simple textual substitution, see Lea[8, p. 17]. As a consequence, these classes are not part of the generated archive libg++.a but have to be created for every specific application. The third group of classes in containers consists of classes like Stack, Queue, Bag, or Set. These classes are not only pseudo-generic like the ones in the second group, but are also provided with different underlying implementations.
So it is up to the application programmer to select the implementation of a specific container class. Like the classes in the second group, these classes are not part of the archive file libg++.a but are provided as a number of text files from which the specific classes have to be generated for each application by textual substitution.

In summary, LIBG++ primarily supports interfaces to the I/O system, statistical applications, and container classes. The container classes are, for the most part, pseudo-generic and rely on textual substitution to generate the appropriate container classes for each application. Furthermore, a number of container classes have different underlying implementations to choose from.
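LIBG++ generates its pseudo-generic classes outside the compiler, with a shell script that substitutes the element type into prototype header and source files (Lea[8, p. 17]). The sketch below imitates the same idea of textual substitution inside a single C++ file, using the preprocessor instead of a shell script. The macro DECLARE_STACK and the generated names intStack and doubleStack are hypothetical, not part of LIBG++, and the fixed-capacity array representation is an assumption chosen to keep the example short.

```cpp
#include <cstddef>

// Textual substitution with the C preprocessor: every T in the macro body is
// replaced by the element type, and T##Stack pastes the type name into the
// class name. Works for single-token type names such as int or double.
#define DECLARE_STACK(T, CAPACITY)                                      \
    class T##Stack {                                                    \
        T items[CAPACITY];                                              \
        std::size_t top_;                                               \
    public:                                                             \
        T##Stack() : top_(0) {}                                         \
        bool empty() const { return top_ == 0; }                        \
        bool full() const { return top_ == (CAPACITY); }                \
        std::size_t length() const { return top_; }                     \
        /* no bounds checks: callers test full() and empty() first */   \
        void push(const T& x) { items[top_++] = x; }                    \
        T pop() { return items[--top_]; }                               \
        T& top() { return items[top_ - 1]; }                            \
        void clear() { top_ = 0; }                                      \
    };

// "Generate" two independent, homogeneous stack classes.
DECLARE_STACK(int, 100)
DECLARE_STACK(double, 100)
```

Offering a different underlying implementation would amount to a second macro (for instance a linked-list version) with the same member functions, so that an application switches implementations by changing only the generating line. In today's C++, templates perform this substitution inside the compiler.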
2.3 Modifications
We ported version 1.39.0 of LIBG++ to our Unix workstations, using the GNU C++ compiler g++. Not all of the source in the distribution from MIT has been ported, only the core part of the library, consisting of the subdirectories src (the library source), tests (some test programs), and g++-include, which contains C++-compatible system header files, the header files for the LIBG++ classes, and the basic set of files for the pseudo-generic classes.
Since LIBG++ is intended to be used as the basic library for g++, it was relatively easy to port the library to our machines once g++ was up and running. Only a few modifications to the header files and the sources were necessary. These modifications are marked with comments starting with "Cadmus change:", so they can easily be retrieved with the grep command. Furthermore, there are files named Cadmus.change in the subdirectories src and g++-include, listing all modifications to files in the respective subdirectory.
2.3.1 Modifications to the source
Three modifications to the source were necessary. In file timer.cc, the integer variable hz is defined and initialized with the value returned by the system call hertz(). This is necessary because the system call times(), used later, expects an integer variable hz to be defined and initialized as described.⁷ The second modification occurred in file delete.cc, which redefines the delete operator. The library function free(ptr) dumps core when invoked with a NULL pointer, so we check the value of ptr for NULL before invoking free(). The third and last modification concerns file bcopy.c. This file implements a special bcopy version for Unix System V operating systems. When trying to get the address of the dest pointer indexed by count (dest = &dest[count]), g++ complained about "dereferencing pointer to incomplete type", since dest is declared as void*. A simple fix for this problem was to add count to the value of dest: dest = dest+count. The same modification is necessary for an assignment to the pointer source.
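The three source fixes can be sketched in portable C++ as follows. This is not the LIBG++ or Cadmus code: the function names user_cpu_seconds and copy_bytes are hypothetical, and sysconf(_SC_CLK_TCK) is the POSIX way of obtaining the clock-tick rate that the Cadmus obtains from hertz(). The NULL check in operator delete mirrors the delete.cc patch (ISO C itself defines free(NULL) as a no-op). Note also that dest + count on a void* is accepted by GNU compilers only as an extension; the sketch casts to char* instead, which is the standard way to advance a pointer by a byte count.

```cpp
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <new>
#include <sys/times.h>
#include <unistd.h>

// (1) timer.cc: times() reports CPU time in clock ticks, so the tick rate
// is needed to convert to seconds. POSIX provides it through sysconf().
double user_cpu_seconds() {
    struct tms t;
    times(&t);
    long hz = sysconf(_SC_CLK_TCK);
    return static_cast<double>(t.tms_utime) / static_cast<double>(hz);
}

// (2) delete.cc: never pass a null pointer on to free(). operator new is
// replaced as well, so that both allocation functions use malloc/free.
void* operator new(std::size_t size) {
    if (size == 0) size = 1;                  // zero-size requests still need a unique pointer
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc();
}

void operator delete(void* ptr) noexcept {
    if (ptr != nullptr) std::free(ptr);       // the NULL check from the patch
}

// (3) bcopy.c: advance pointers by a byte count through char*, since
// &dest[count] would dereference a void*. Argument order follows bcopy:
// source, dest, count.
void copy_bytes(const void* source, void* dest, std::size_t count) {
    const char* s = static_cast<const char*>(source);
    char* d = static_cast<char*>(dest);
    if (std::less<const char*>()(d, s)) {     // dest below source: copy forward
        while (count--) *d++ = *s++;
    } else {                                  // otherwise copy backward, which
        s += count;                           // keeps overlapping regions intact
        d += count;
        while (count--) *--d = *--s;
    }
}
```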
2.3.2 Modifications to the header files

The modifications to the header files are mostly necessary to define the correct machine- and operating-system-specific constants. The files math.h and values.h, for example, contain constants describing the maximal and minimal representable floating-point numbers and related functions. In file stdio.h, the system-dependent structure of iobuf is defined. File std.h contains C++-compatible prototypes of system calls and standard library routines. Here the prototype for the system call flock was dropped, because this function does not exist on our machines. And finally, file ctype.h redefines the preprocessor macro ctype to ctype on Unix System V machines, which does not apply to our Unix System V derivative. We therefore commented out this redefinition.
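The floating-point limits that had to be edited by hand in math.h and values.h are nowadays supplied by the standard headers <cfloat> and <limits>, which the compiler vendor fills in for the target machine. The sketch below shows the two equivalent standard spellings; the variable names max_double and min_double are only illustrative.

```cpp
#include <cfloat>
#include <limits>

// Largest finite double and smallest positive normalized double, taken from
// the standard headers rather than from a hand-edited values.h.
const double max_double = std::numeric_limits<double>::max();
const double min_double = std::numeric_limits<double>::min();

// The C macros in <cfloat> and the C++ traits in <limits> describe the same
// machine-dependent values, so the two spellings agree at compile time.
static_assert(std::numeric_limits<double>::max() == DBL_MAX, "DBL_MAX mismatch");
static_assert(std::numeric_limits<double>::min() == DBL_MIN, "DBL_MIN mismatch");
```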
⁷ This implementation of times() on the PCS Cadmus is very unusual.

2.4 The efficiency of different container implementations

As described above, a group of container classes is provided with different underlying implementations. Some small programs were written to measure and compare the performance of these implementations. The following classes were examined: Stack, Bag, and Set. All example programs operate on elements of the following class:
    #include <cstring>

    class address {
        char name[40];
        int  zip_code;
        char city[20];
        int  house_number;
        char street[20];
    public:
        // Default element: a fixed address with zip code 6100.
        address() {
            std::strcpy(name, "Thomas Kunz");
            std::strcpy(city, "Darmstadt");
            std::strcpy(street, "Alexanderstr.");
            zip_code = 6100;
            house_number = 10;
        }
        // Same address, but with zip code i, so that elements can differ.
        address(int i) {
            std::strcpy(name, "Thomas Kunz");
            std::strcpy(city, "Darmstadt");
            std::strcpy(street, "Alexanderstr.");
            zip_code = i;
            house_number = 10;
        }
        address(const address& n) {
            std::strcpy(name, n.name);
            std::strcpy(city, n.city);
            std::strcpy(street, n.street);
            zip_code = n.zip_code;
            house_number = n.house_number;
        }
        address& operator=(const address& n) {
            if (this != &n) {  // strcpy must not copy a buffer onto itself
                std::strcpy(name, n.name);
                std::strcpy(city, n.city);
                std::strcpy(street, n.street);
                zip_code = n.zip_code;
                house_number = n.house_number;
            }
            return *this;
        }
        // Equal only if all five fields are equal.
        bool operator==(const address& n) const {
            return !std::strcmp(name, n.name) && !std::strcmp(city, n.city) &&
                   !std::strcmp(street, n.street) &&
                   zip_code == n.zip_code && house_number == n.house_number;
        }
        // Orders by zip code, then by house number.
        bool operator<=(const address& n) const {
            if (zip_code > n.zip_code) return false;
            if (zip_code < n.zip_code) return true;
            if (house_number > n.house_number) return false;
            if (house_number < n.house_number) return true;
            return true;  // all compared entries are equal
        }
        void set_zip_code(int i) { zip_code = i; }
        int get_zip_code() const { return zip_code; }
        int hash() const { return zip_code * house_number; }
    };

    typedef address* test_object;
This class redefines the equality operator == and the less-or-equal operator