A lightweight class library for extended persistent ...

A lightweight class library for extended persistent object management in C++

Jens-Uwe Dolinsky, Thorsten Pawletta

University of Wismar, Department of Mechanical Engineering / Process and Environmental Engineering, Philipp-Müller-Str., P.O. Box 1210, D-23952 Wismar, Germany

Email: [email protected], [email protected]

Keywords: Modular-hierarchical, Persistence abstraction, Persistent modelling - Data type orthogonality

Abstract

When applications need to keep their data structures persistent, additional effort beyond transient modelling is necessary to realize a suitable persistent storage. The most convenient approach is to use a persistent programming language (PPL), which offers internal mechanisms for storing and loading data transparently for the application programmer. An alternative to a PPL is a universal programming language that is extended by persistence concepts for any data type. This paper introduces an extended generic mechanism which abstracts the functionality necessary for realizing persistence of arbitrary C++ object structures. First, general problems of persistent storage and the motivations for this approach are discussed. The design aspects and the reasons for special features are then described in detail, together with the data and implementation internals. Finally, the persistent data structure is sketched and the integration of the library into three different example programs is presented.

Introduction

The lifetime of data values, from their creation until their deletion, is a measure of their persistence. This period can be very short, e.g. for temporary results of calculations, or very long if data outlives versions of persistent support systems (see also [2]). In this context computer data are classified into short-term data (transient main memory data) and long-term data (data independent of the program lifetime, or persistent data). One of the major problems is the different semantics of short-term and long-term data, which requires different access methods. Long-term data exist outside of and independently from the creating program, in external files or databases. For processing within an application, long-term data has to be converted into directly accessible transient formats and, after the evaluation, back into persistent formats. In the worst case data has to be converted between different data models (e.g. relational ↔ object oriented), which is often inconvenient and a source of inconsistencies and information loss. For the application programmer this means consistently maintaining two formats (transient ↔ persistent) of the same data, which can be hard work.

One solution to the problem is the integration of concepts supporting data persistence into a programming language. In that way the processing of data can be independent of their persistence. Mechanisms invisible to the programmer automatically load and save the data and convert it between persistent and transient structures [7]. In the object-oriented world, persistence of data means the persistent storage of complex, polymorphic, modular-hierarchical object structures (the linkage of the methods is not discussed here; see [4]). Some object-oriented programming languages (e.g. Smalltalk) already support persistence within the language standard. In contrast, C++ leaves it entirely to the programmer to implement persistence mechanisms for data objects. Many different solutions have been developed for this purpose in the past. Because of the special requirements of a C++ based modular-hierarchical simulation runtime system for variable structure systems [9] (e.g. separate persistence of inherited data), an approach is proposed here that can also be considered an extended general solution for persistent object management.

Problem Analysis

On the way to persistent objects, many different physical and logical problems have to be solved. The main task of a persistence mechanism is the transformation of main memory object structures into flat sequential byte streams. The persistent storage of a complex object structure must include the values of the individual objects and the relations (references) among them. References in the transient structure (pointers) have to be translated into object numbers, because persistent objects in the stream cannot be identified by main memory addresses. Access to persistent objects via object numbers presupposes a predefined order of the objects in the stream. Another important problem is the storage of cyclic structures. The persistence mechanism has to traverse the entire structure with a fixed strategy (e.g. depth-first order) to find all objects, and it must be prevented from running into an endless loop during this search. The program-independent existence of persistent data causes a further general problem: the type bindings of persistent data in the stream cannot be protected by the type system of the programming language. To keep the semantics of the persistent data, the persistence mechanism has to prevent incorrect interpretation of the data, e.g. by using redundant information in the persistent storage.
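The translation of pointers into object numbers and the protection against endless loops can be illustrated with a minimal sketch. It is not the library's code: the types Node and ObjectTable and the functions write_node and flatten are hypothetical names chosen for this illustration, and a readable text stream stands in for the byte stream. Objects are numbered in depth-first visiting order, and an object that is reached a second time is written as a reference to its number only.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical node type with two references (not the library's class).
struct Node {
    std::string name;
    Node* a = nullptr;
    Node* b = nullptr;
};

// Registry of objects already written: address -> object number.
class ObjectTable {
public:
    // Returns the object number, or 0 if the object is not registered yet.
    int lookup(const Node* p) const {
        auto it = numbers_.find(p);
        return it == numbers_.end() ? 0 : it->second;
    }
    // Registers a new object and assigns the next number (1, 2, 3, ...).
    int add(const Node* p) {
        assert(p != nullptr && lookup(p) == 0);
        int n = static_cast<int>(numbers_.size()) + 1;
        numbers_[p] = n;
        return n;
    }
private:
    std::map<const Node*, int> numbers_;
};

// Depth-first traversal: a registered object is written as "ref#n" only,
// which terminates the recursion on cyclic structures.
void write_node(const Node* p, ObjectTable& table, std::ostream& out) {
    if (p == nullptr) { out << "null "; return; }
    if (int n = table.lookup(p)) { out << "ref#" << n << ' '; return; }
    table.add(p);                       // register before descending
    out << "obj(" << p->name << ") ";
    write_node(p->a, table, out);
    write_node(p->b, table, out);
}

// Flattens the structure reachable from root into one sequential string.
std::string flatten(const Node* root) {
    ObjectTable table;                  // local to this storing process
    std::ostringstream out;
    write_node(root, table, out);
    return out.str();
}
```

The important detail is that an object is registered before its references are followed; otherwise a reference back to it would start the traversal again.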

[Figure 1 shows Objects 1 to 7: Object 1 references Object 3 and Object 5, Object 2 references Object 4 and Object 5, and Object 5 references Objects 6 and 7.]

Figure 1 : Commonly used substructure

One of the special requirements of persistent storage mentioned above is a solution to the problem of commonly used substructures, which is shown in Figure 1. If Object 1 is to be stored, its referenced objects (Object 3) and substructures (Object 5 with its referenced Objects 6 and 7) have to be stored as well because of their logical correlation (persistence by reachability). If Object 2 is to be stored, Object 4 and the substructure Object 5, which it shares with Object 1, have to be saved too. The result is two independent persistent copies of the same substructure Object 5. After the reconstruction, Object 1 and Object 2 no longer share the same substructure; each object has its own local copy of Object 5 and its components (Figure 2), which causes an unacceptable information loss:

[Figure 2 shows the reconstructed structure: Object 1 (with Object 3) and Object 2 (with Object 4) each reference a separate copy of Object 5 with its own Objects 6 and 7.]

Figure 2 : Loss of commonly used substructure after the reconstruction

Another special requirement (see [9]) is the selective storage of single objects or substructures out of the context of a wrapper structure:

[Figure 3 shows a structure of Objects 1 to 5 and a file.]

Figure 3 : Selective Storage

In the example illustrated in Figure 3, the single Object 2 is to be stored out of (or loaded into) the context of the entire structure. Therefore it must be possible to disable the search of the persistence mechanism for objects that have not yet been stored/loaded (e.g. by parameterizing). Another special feature required in [9] is a solution for storing/loading inherited data separately. That means an object must be able to explicitly store/load only inherited components (specified by the name of the inherited class; see also the implementation examples below). The general technical problems are described in the following sections.

Design Principle

The persistence mechanism presented here is encapsulated in a generic base class, which can be inherited by any user-defined class (Data Type Orthogonality). The major aims were portability of the entire solution, mechanisms that are transparent for the user, and a minimum of complexity (no global data). The transparency of the algorithms is achieved by their total encapsulation, based on the functional abstraction of the persistence mechanism. The components of the generic class can be divided into completely encapsulated components (not visible to and therefore not adjustable by derived classes), interfaces and persistence methods. The set of encapsulated components contains the File Management, Object Management, Garbage Collection and Error Handling (see Figure 4).

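[Figure 4 shows the Generic Persistence Class with its three groups: the encapsulated components File Management, Object Management, Garbage Collection and Error Handling; the Persistence Methods for Simple Data and Class Instances (Static Data, Static Instances, Dynamic Data, Dynamic Instances); and the Interfaces Global Object Generation, Object Communication and User Interface (Save, Load).]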

Figure 4: Functional abstraction in the generic class

The File Management is responsible for creating, opening and closing files as well as for reading and writing the data. Because this functionality is encapsulated, the persistence mechanism of the generic class, and thereby implicitly of all derived classes, is easy to adapt to other reading/writing devices, e.g. sockets or pipes.

The task of the Object Management is the administration of objects during storing/loading processes. It registers all objects already stored/loaded and supplies the persistence mechanism with that information, e.g. to detect cyclic structures. Furthermore, the Object Management is responsible for the translation between transient object addresses and persistent object numbers. Because it works locally (each loading/saving process has its own management), the persistence mechanism can, in contrast to other approaches such as [3], operate consistently in multi-process applications.

The Garbage Collection in this approach is the solution to a technical problem. Because the persistence mechanism may have to create dynamic variables which are not handled by methods of the class (e.g. destructors), a concept for collecting all this data is necessary in order to free its memory after object finalization (keyword: avoiding memory leaks). An example:

class bsp
{
    char* name;
public:
    bsp(char* s) {name = s;}
};

An instance of this example class is initialized by its constructor with a pointer to a string. When such an object is stored persistently, the string is saved too. When the object is rebuilt from the persistent storage, dynamic memory must be allocated for the reconstruction of the string. When the object is deleted, a memory leak occurs, because there is no method that frees the memory of the string again. The problem grows with multiple reinitializations of the object: each loading process allocates memory for the string, so memory leaks already occur after the second reinitialization. Therefore, in this approach the Garbage Collection of the generic class registers those dynamic memory allocations of the persistence mechanism and automatically deallocates the memory when the object is deleted. This feature is not supported by some other persistence solutions (e.g. streams++ [3]).

The last encapsulated component is the Error Handling functionality. It catches all errors occurring during loading/saving processes. These can be semantic errors (e.g. a negative string length) or physical errors such as reading or writing errors as well as memory allocation errors.
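The idea behind the Garbage Collection component can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the library: AllocationList, Bsp, load_name and the counter g_live_buffers are hypothetical names, and load_name merely simulates the reconstruction of the string during loading. Every buffer allocated on behalf of the object is registered in a list that is owned by the object and released when the object is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Counts live buffers so that the effect can be observed (illustration only).
static int g_live_buffers = 0;

// Hypothetical stand-in for the Garbage Collection list of the generic class.
class AllocationList {
public:
    AllocationList() = default;
    AllocationList(const AllocationList&) = delete;
    AllocationList& operator=(const AllocationList&) = delete;
    ~AllocationList() { release_all(); }

    // Allocates a buffer and registers it for later release.
    char* allocate(std::size_t bytes) {
        assert(bytes > 0);
        char* p = new char[bytes];
        blocks_.push_back(p);
        ++g_live_buffers;
        return p;
    }
    // Frees every registered buffer.
    void release_all() {
        for (char* p : blocks_) { delete[] p; --g_live_buffers; }
        blocks_.clear();
    }
private:
    std::vector<char*> blocks_;
};

// Counterpart of the example class bsp: it holds only a pointer to a string.
class Bsp {
public:
    explicit Bsp(const char* s) : name(s) {}

    // Simulates loading: the string is rebuilt in newly allocated memory,
    // which is registered instead of being left to the class author.
    void load_name(const char* stored) {
        char* copy = owned_.allocate(std::strlen(stored) + 1);
        std::strcpy(copy, stored);
        name = copy;
    }
    const char* name;
private:
    AllocationList owned_;   // released automatically with the object
};
```

In this sketch repeated loading accumulates buffers until the object is deleted; releasing earlier buffers on each reinitialization would be a possible refinement, but the text does not say whether the library does so.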

The Interfaces

To avoid complexity, the user interface consists only of the two methods load() and save() (see the implementation examples). Another important interface (Object Communication) serves the communication between objects during the loading/saving phase. Via this interface the objects send each other messages, which initiate the respective action (load/save) in the receiving object. This interface is totally encapsulated too and is therefore only for internal use by the persistence mechanism, which increases transparency. The third interface serves the Global Object Generation. The persistence mechanism must know all classes used in a project that are derived from the persistence class, in order to be able to reconstruct them within dynamic object structures. This interface is a simple method which has to be adapted once in the entire project (see implementation examples).

The third category of elements of the generic class consists of the persistence methods. These methods are used for the storage/reconstruction of the member data of an object. They send messages to the member data to activate the storing or loading process. Because of the different types and properties of the member data, a number of methods were implemented. A distinction is made between simple data and class instances. Simple data can be written/read directly. Class instances must be informed via messages to activate their individual load/store process. In this context, static members are components which are part of the instance. Methods for dynamic members must be able to allocate memory. Dynamic instances are created via the interface of the Global Object Generation.

Implementation

This section describes the translation of the designed model into concrete details of the programming language C++. Only standard constructs of C++ were used, which guarantees the portability of the result.

The implementation consists of three classes. The Garbage Collection and the Object Management are realized as separate classes; both are simple linear list implementations. Instances of them are fully encapsulated by the general persistence mechanism, which is implemented in the class persistent.

[Figure 5 shows the program flow for loading and saving in four stages: Initialisation by load()/save() (File Management), Communication by p_message(...) (message to start the persistent activity), Persistence by move_data() (organisation of the member data; the implementation interface) and Data Moving by static_data(), data() etc. (physical writing/reading).]

Figure 5 : Persistence mechanism of the generic class persistent

Arbitrary classes can inherit this base class to obtain the mechanisms for storing/loading their member data. To do so, the persistence mechanism must be slightly adjusted by redefining the method move_data(). Figure 5 shows the program flow of the persistence mechanism during the loading and saving processes, which are activated by the interface methods load() and save() respectively. These methods are responsible for opening the file, initializing the Object Management (empty list) and packing a message record with all information about the current activity (mode flag, reference to the Object Management, file pointer). This record is sent as a parameter of the method p_message(...) to all objects in a structure, first of all to the object itself (Figure 6). The following illustration shows the message order of the loading/storing process for a hierarchical three-object structure.

[Figure 6 shows Object 1, Object 2 and Object 3 with the numbered calls 1. load(...) / save(...), 2. p_message(...), 3. move_data(), 4. p_message(...), 5. move_data(), 6. p_message(...), 7. move_data().]

Figure 6 : Message order while saving/loading

It is worth mentioning that the entire message exchange between the objects within a structure is fully encapsulated and therefore transparent for the derived classes. If an object receives such a message, it registers itself in the Object Management list and calls its own virtual method move_data(), which is the implementation interface of the class. In the body of this method, the suitable persistence method must be called for every data element of the respective class. Each of these methods recognizes the current mode (load/store) of the persistence mechanism and performs the physical reading/writing of the data. If the class has class instances (derived from persistent) as member elements, the persistence methods (dynamic_object(...) must be chosen for dynamic, referenced objects and static_object(...) for static objects) activate the persistence mechanism of these member objects by calling their p_message(...) method.

Figure 5 and Figure 6 show that the program flows for the loading and saving modes are nearly identical. The persistence mechanism is only parametrized by the interface methods. At the end of the calling chain, the persistence methods read the parameter and perform the respective physical activity. The implementation interface between them, which abstracts both loading and saving as one data movement process, is illustrated in Figure 5. So only one method (move_data()) has to be redefined. In other implementations of persistence libraries (e.g. streams++) one method for the loading mode and one for the saving mode have to be redefined, which requires much effort if the number of member elements is high. This also adds complexity to the system, because the two methods must agree with each other in the storage order of the member elements and thereby in the file structure. The maintenance of two dependent methods is therefore a source of programming errors. This disadvantage is avoided in the proposed approach (see implementation examples).

The Error Handling mechanism is implemented using the exception handling feature of C++. Physical and semantic errors that occur are caught and assigned to error numbers, which are the return values of the two interface methods.
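How a single move_data() method can serve both directions is shown in the following minimal sketch. It makes several simplifying assumptions and is not the class persistent itself: the names Persistent, Mode and Point are hypothetical, a string stream replaces the file, the mode flag is kept in the base class instead of a message record, numbers are written in the byte order of the host, and there is no object management or error handling.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <type_traits>

enum class Mode { Save, Load };

// Hypothetical, strongly reduced stand-in for a generic persistence base class.
class Persistent {
public:
    virtual ~Persistent() = default;

    // Interface method: sets the mode to "save" and starts the data movement.
    std::string save() {
        std::stringstream s;
        run(Mode::Save, s);
        return s.str();
    }
    // Interface method: sets the mode to "load" and starts the data movement.
    void load(const std::string& bytes) {
        std::stringstream s(bytes);
        run(Mode::Load, s);
    }

protected:
    // Implementation interface: redefined once per derived class.
    virtual void move_data() = 0;

    // Persistence method for simple data: writes or reads depending on the mode.
    template <typename T>
    void static_data(T& value) {
        static_assert(std::is_trivially_copyable<T>::value, "simple data only");
        assert(stream_ != nullptr);
        if (mode_ == Mode::Save)
            stream_->write(reinterpret_cast<const char*>(&value), sizeof value);
        else
            stream_->read(reinterpret_cast<char*>(&value), sizeof value);
    }

private:
    void run(Mode m, std::stringstream& s) {
        mode_ = m;
        stream_ = &s;
        move_data();
        stream_ = nullptr;
    }
    Mode mode_ = Mode::Save;
    std::stringstream* stream_ = nullptr;
};

// A derived class lists its members once; the order defines the layout.
class Point : public Persistent {
public:
    int x = 0;
    long y = 0;
protected:
    void move_data() override {
        static_data(x);
        static_data(y);
    }
};
```

Because the member list exists only once, the storage order for loading and saving cannot drift apart, which is the maintenance advantage described above.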

The Global Object Generation feature is implemented as one private method (private to avoid name conflicts with derived classes) of the class persistent, which is able to generate an instance of the class whose name was passed as a string parameter. In this way the object generation (which needs global information about all classes of a project) is completely decoupled from the encapsulated persistence mechanism including the local object management. An example of this method for a two-class project:

persistent *persistent::p_create_Instance(char *name)
{
    if (strcmp(name,"class_1")==0)
        return new class_1((persistent*)NULL);
    else if (strcmp(name,"class_2")==0)
        return new class_2((persistent*)NULL);
    else
        return NULL;
}

The definition of this method can be simplified further by using macros:

DEF_CREATE_INSTANCE(
    REGISTER(class_1)
    REGISTER(class_2))
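The paper does not show the macro definitions. One way such macros could be written is sketched below, purely as an assumption: the macro bodies, the free function create_instance and the base class Base with its class_name() method are invented for this illustration, and the constructor argument used in the method above is omitted. Each REGISTER entry expands to one string comparison, and the stringizing operator # turns the class name into the compared string.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical base class and two derived classes for the illustration.
struct Base {
    virtual ~Base() = default;
    virtual const char* class_name() const = 0;
};
struct class_1 : Base {
    const char* class_name() const override { return "class_1"; }
};
struct class_2 : Base {
    const char* class_name() const override { return "class_2"; }
};

// Assumed macro definitions (the real ones may differ).
// REGISTER(cls): one comparison of the requested name with the text of cls.
#define REGISTER(cls) \
    if (std::strcmp(name, #cls) == 0) return new cls;

// DEF_CREATE_INSTANCE(entries): wraps all comparisons into one function
// that returns a null pointer for unknown class names.
#define DEF_CREATE_INSTANCE(entries)              \
    Base* create_instance(const char* name) {     \
        assert(name != nullptr);                  \
        entries                                   \
        return nullptr;                           \
    }

DEF_CREATE_INSTANCE(
    REGISTER(class_1)
    REGISTER(class_2))
```

The caller owns the returned object. A table mapping names to creation functions would be an alternative that avoids the preprocessor.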

As mentioned above, this definition must be made once in the entire project, and it covers all classes derived from persistent whose instances occur within dynamically loadable structures. The following example shows the integration of the persistent class into an arbitrary class:

class example_class : public persistent
{
    int var,field[20];
    char stat_string[10]; /*instance bounded data*/
    char *dstring; /*Pointer to a dynamic string*/
    class abc inst; /*instance bounded class instance (persistent derived)*/
    class abc *inst2; /*Pointer to a dynamic object (persistent derived)*/
    MOVE_DATA(example_class,persistent) /*Macro*/
    /*persistence methods */
    {
        dynamic_string(dstring,LOAD_AFTER_CREATING);
        static_object(inst);
        static_string(stat_string);
        static_data(var);
        static_field(field);
        dynamic_object(inst2,LOAD_AFTER_CREATING);
    }
    /* other class element definitions */
};

The example_class inherits the generic class persistent. It contains several members of different types, e.g. instances of basic types, fields, pointers to dynamic strings and class instances.

For every member element the adequate persistence method must be chosen within the move_data() method definition as shown above. The persistence methods dynamic_string() and dynamic_object() have an additional flag parameter, which indicates whether dynamic memory must be allocated before loading the data (default case) or whether it is a pure reinitialization (without creation) of the respective dynamic member. This approach does not use the RTTI feature of the C++ programming language for identifying the type of transient objects. To realize the separate persistence of inherited data (see also Example 3), a mechanism was needed which is able to search backwards in the class hierarchy for the specified class. Because this problem cannot be solved entirely using RTTI (except for class name detection), a custom solution was implemented. To avoid the RTTI mechanism altogether (additional overhead), the class name detection is provided by a simple method, which is defined implicitly (transparently for the user) by the MOVE_DATA macro (see implementation examples).

File Structure

To illustrate the physical representation of the persistent objects in this approach, the file structure of an arbitrary object is shown in Figure 7. It shows an object with three members: a string field, a long and a short int number. The numbers are stored in little-endian format. The move_data() method definition is only an example.

Example object: char[9] name = "my_name", long long_number = 45, short int int_number = 25

MOVE_DATA(Eclass,persistent)
{
    static_string(name);
    static_data(long_number);
    static_data(int_number);
}

Bytes in the file: 0 0 7 0 'E' 'c' 'l' 'a' 's' 's' 0 8 0 'm' 'y' '_' 'n' 'a' 'm' 'e' 0 45 0 0 0 25 0 0

Labels in the figure: class name (type: string), string length + 1 (short int), string terminator, string, long

Figure 7 : File Structure

The order of the persistence method calls is arbitrary (an advantage of having only one method), but it determines the file structure. So if many persistent objects are already in use, the move_data() definition must not be changed. The first element in the persistent store is the counter of special parameters which were passed when calling the interface method save() (see also the implementation examples). The subsequent element is the class name of the object as a string, which consists of the string length, the string data and the terminator. This header is followed by the data of the member elements, whose order is defined in the move_data() method definition.
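The layout described above can be reproduced with a short sketch. The helper names put_le, put_string and encode_example are hypothetical, and the field widths are assumptions read from Figure 7: a 2-byte counter, 2-byte string lengths, a 4-byte long and a 2-byte short int, all in little-endian order. The sketch covers only the fields named in the text; the last byte of the sequence shown in the figure is not accounted for here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Appends a value in little-endian order using the given number of bytes.
void put_le(Bytes& out, std::uint32_t value, int width) {
    for (int i = 0; i < width; ++i)
        out.push_back(static_cast<unsigned char>((value >> (8 * i)) & 0xFF));
}

// Appends a string as: length + 1 (2 bytes), characters, terminator.
void put_string(Bytes& out, const std::string& s) {
    assert(s.size() + 1 <= 0xFFFF);
    put_le(out, static_cast<std::uint32_t>(s.size() + 1), 2);
    out.insert(out.end(), s.begin(), s.end());
    out.push_back(0);
}

// Encodes the example object in the order given by its move_data() definition.
Bytes encode_example() {
    Bytes out;
    put_le(out, 0, 2);            // counter of special parameters of save()
    put_string(out, "Eclass");    // class name
    put_string(out, "my_name");   // static_string(name)
    put_le(out, 45, 4);           // static_data(long_number), 4 bytes assumed
    put_le(out, 25, 2);           // static_data(int_number), 2 bytes assumed
    return out;
}
```

Swapping two calls in encode_example changes the byte sequence, which is why the text warns against changing move_data() once persistent objects exist.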

Implementation Examples

In this section some examples of the use of the persistence library are presented.

Example 1: A cyclic structure

The first example implements a hierarchical structure with three elements. There are a root node and two son nodes; all nodes are instances of the same class. The root node has references to both sons. The sons refer to the root and to each other. The entire structure is a doubly linked, closed list.

#include "persis.h" /*Header of persistent*/
class Node : public persistent
{
public:
    char name[20]; //space for object name
    Node *Partner1,*Partner2; //references to partner objects
    Node(char* n) { Partner1=Partner2=NULL; strcpy(name,n);}
    virtual ~Node(){} //Destructor
    void Identification() { cout
    [...]
    Partner1 = h->Partner2->Partner1 = h;
    h->Partner1->Partner2 = h->Partner2;
    h->Partner2->Partner2 = h->Partner1;
    // Identification; output to stdout
    h->Identification();
    h->Partner1->Identification();
    h->Partner2->Identification();
    //Storage of h and its members in file test.ttt
    h->save("test.ttt",NULL,NULL);
    //Generation of a new root
    h = new Node("");
    //Load the structure from test.ttt
    h->load("test.ttt",NULL,NULL);
    //verification
    cout [...] h->Identification();
    h->Partner1->Identification();
    h->Partner2->Identification();
}

After the translation the program produces the following output which verifies the correctness:

Name: Root, Partner: Son 1, Son 2
Name: Son 1, Partner: Root, Son 2
Name: Son 2, Partner: Root, Son 1
******After Loading:******
Name: Root, Partner: Son 1, Son 2
Name: Son 1, Partner: Root, Son 2
Name: Son 2, Partner: Root, Son 1

The transient structure and the corresponding persistent representation of this example are shown in the following illustration.

MOVE_DATA(Node,persistent)
{
    static_string(name);
    dynamic_object(partner1,...);
    dynamic_object(partner2,...);
}

[Figure 8 shows the transient structure (Root, Son 1 and Son 2, connected by their Partner1 and Partner2 references, with the numbers 1 to 6) and the file content 0 5 'Node' 0 5 'Root' 0 0 5 'Node' 0 6 'Son 1' 0 1 0 5 'Node' 0 6 'Son 2' 0 1 2 3, with the label "Number of already stored objects".]

Figure 8 : Persistent and transient representation of a cyclic structure

The move_data() method definition influences the order of the messages and thereby the file structure. The storing process starts by calling the interface method save() of the object Root. Root writes its class name into the file, followed by its data. First the instance name is written; then the instance referenced by Partner1 (Son 1) is requested, by calling p_message(), to write its data into the file. Because Son 1 is an instance of the same class as Root, it does the same: it writes the class name and the instance name. Then it sends Root the message to store itself. Because Root has already been stored, it writes not its data but its object number into the file. The persistence mechanism obtains this object number from the transparent, locally operating Object Management. It is clearly recognizable how the indirect recursive calls of the move_data() methods cause an interlaced, nested persistent storage.

Example 2: Selective Storage of a Substructure

This example structure consists of four nodes: a root node and a substructure consisting of the objects Node 1, Son 1 and Son 2. This substructure is to be stored out of the context of the entire structure. The problem with this configuration is that both sons have references to the root node. The persistence mechanism would normally follow these references and store all reachable objects and thereby the entire structure. The important location in the source is therefore the call of the interface methods. The third parameter of both interface methods is an ellipsis, which can take a parameter list of variable length. These parameters are references to objects within the structure which are not to be stored or, when loading, which are to be referenced by objects of the loaded structure. The Object Management registers these referenced objects and assigns them object numbers. The persistence mechanism considers these objects as already loaded or saved and continues.
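The effect of passing object references to save(...) for selective storage can be imitated in a small sketch. It is an illustration only: Node, Table, write_node and save_selective are hypothetical names, an initializer list takes the place of the variable parameter list, and a text stream replaces the file. The externally referenced objects are entered into the object table before the traversal starts, so the traversal treats them as already stored and writes only their object numbers.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <sstream>
#include <string>

// Hypothetical node type with two references (not the library's class).
struct Node {
    std::string name;
    Node* a = nullptr;
    Node* b = nullptr;
};

using Table = std::map<const Node*, int>;   // address -> object number

// Depth-first write; registered objects are written as references only.
void write_node(const Node* p, Table& table, std::ostream& out) {
    if (p == nullptr) { out << "null "; return; }
    auto it = table.find(p);
    if (it != table.end()) { out << "ref#" << it->second << ' '; return; }
    int n = static_cast<int>(table.size()) + 1;
    table[p] = n;
    out << "obj(" << p->name << ") ";
    write_node(p->a, table, out);
    write_node(p->b, table, out);
}

// Stores the structure reachable from start, except the objects in external,
// which are pre-registered and therefore count as already stored.
std::string save_selective(const Node* start,
                           std::initializer_list<const Node*> external) {
    Table table;
    for (const Node* e : external) {
        assert(e != nullptr);
        table.emplace(e, static_cast<int>(table.size()) + 1);
    }
    std::ostringstream out;
    write_node(start, table, out);
    return out.str();
}
```

Loading would mirror this: the caller supplies the addresses of the objects that take the place of the pre-registered numbers, which is how a stored substructure can be attached to a new root.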

This program uses the class definition of the first example program. Therefore only the main routine is shown.

int main()
{
    // Generation of the objects
    Node *h = new Node("Root 1");
    h->Partner2 = new Node("Node 1");
    h->Partner2->Partner1 = new Node("Son 2");
    h->Partner2->Partner2 = new Node("Son 1");
    // Initialization of the references
    h->Partner2->Partner1->Partner1 = h;
    h->Partner2->Partner2->Partner1 = h;
    h->Partner2->Partner1->Partner2 = h->Partner2->Partner2;
    h->Partner2->Partner2->Partner2 = h->Partner2->Partner1;
    // Identification;
    h->Identification();
    h->Partner2->Identification();
    h->Partner2->Partner1->Identification();
    h->Partner2->Partner2->Identification();
    //Saves Node 1 in test.ttt
    // 3. parameter = reference to the root
    h->Partner2->save("test.ttt",NULL,h,NULL);
    //Generation of a new root
    h = new Node("Root 2");
    h->Partner2 = new Node("");
    //Loading the structure from file test.ttt
    //with 3. parameter = reference to the new root
    h->Partner2->load("test.ttt",NULL,h,NULL);
    //Identification
    cout [...] h->Identification();
    h->Partner2->Identification();
    h->Partner2->Partner1->Identification();
    h->Partner2->Partner2->Identification();
}

After compilation the program produces the following output, which verifies the correct structure change:

Name: Root 1, Partner: ,Node 1
Name: Node 1, Partner: Son 2,Son 1
Name: Son 2, Partner: Root 1,Son 1
Name: Son 1, Partner: Root 1,Son 2
****** After Loading ******
Name: Root 2, Partner: ,Node 1
Name: Node 1, Partner: Son 2,Son 1
Name: Son 2, Partner: Root 2,Son 1
Name: Son 1, Partner: Root 2,Son 2

Note: The stored substructure can also be loaded as an autonomous structure if the third parameter of the load(...) method is NULL (which stands for the end of the parameter list).

In that way a solution to the problem of commonly used substructures (see Problem Analysis) can be found. Object 1 from Figure 1 can, for example, store itself and the commonly used substructure, while the save(...) method of Object 2 is called with the reference to that substructure as the third parameter. To reconstruct the entire structure, Object 1 (including the substructure) must be loaded first. Object 2 then has to be generated. It loads itself by calling load(...) with the new address of the substructure of Object 1 as the third parameter. After that the original structure is reconstructed.

The activities of this program are illustrated in the following figure:

[Figure 9 shows the transient structure before saving (Root 1 with Node 1, Son 2 and Son 1) and after loading (Root 2 with Node 1, Son 2 and Son 1), each with its Partner1 and Partner2 references, and the file.]

Figure 9 : Persistence in different contexts

Example 3: Separate persistence of inherited data

This program shows how an object can save only its inherited data.

#include "persis.h"
class Basedata:public persistent
{
public:
    int Number1;
    long Number2;
    //Constructor
    Basedata(int i,long f) { Number1=i; Number2=f;}
    MOVE_DATA(Basedata,persistent)
    {
        static_data(Number2);
        static_data(Number1);
    }
};
class Node : public Basedata
{
    char Name[30];
public:
    Node(char *s,int i,long f):Basedata(i,f){strcpy(Name,s);}
    void Identification() { cout [...]
