Languages for System Components: A Case Study

W. M. Waite
Department of Electrical and Computer Engineering
University of Colorado
[email protected]

B. M. Kadhim
Department of Computer Science
University of Colorado
[email protected]

Abstract

This paper describes a successful example of the use of a special purpose language to implement a reusable design for a system component: a general property storage module that provides unique representations for an arbitrary number of entities and allows an arbitrary set of properties to be associated with each entity. All of the parameters of this design can be distilled into a simple language that allows users to describe the requirements imposed by their application. This linguistic description can then be compiled, producing a C-coded module tailored to the application. Modern compiler technology allows us to generate the compiler for the description language quickly from a simple specification, thus making the approach cost effective.
1 Introduction

Many computer applications deal with entities that have properties. For example, a program that analyzes a directed graph deals with node and edge entities. Each edge is directed from a starting node to an ending node, and those nodes might be properties of the edge entity. Both node and edge entities might also have label properties, and so on.

The object-oriented programming paradigm is one approach to such applications: Each entity is represented by an object of a specific type appropriate to that entity, and each property by a member value of that type. An alternative approach uses one type of object to represent all entities. These entities can be predefined or may be created dynamically. While the names and types of properties are predetermined, the association of properties to entities happens dynamically. The association is established by applying an access method for the property to the object representing the entity. The association can also be done statically for predefined entities. Type safety can be guaranteed by careful design of the module implementing the access methods [7].
#define NoKey (DefTableKey)0
/* Distinguished entity */

extern DefTableKey NewKey();
/* Establish a new definition
 *   On exit-
 *     NewKey=Unique definition table key
 ***/
Figure 1: Representation of an Entity

We have debated the relative merits of these two approaches to the general property storage problem elsewhere [8]; our purpose here is to provide an example of design reuse [6] by linguistic means. The parameters of the property storage module design can be distilled into a simple special purpose language. A short description in this language can then be compiled into an application-specific module which, in turn, can be compiled by either a C or a C++ compiler and linked with the remainder of the application. The use of a special purpose language allows users to concentrate on the characteristics of their problem and saves them the cost of a labor-intensive implementation.

Creating a special purpose language entails a cost that must be weighed against the cost of creating the implementations by hand. Advances in compiler technology, especially the availability of compiler construction environments like Eli [3], have significantly sped up the development of compilers for such languages: It is now possible to build a robust, maintainable implementation of a language like the one described in this paper in a day. Thus the linguistic approach provides tremendous leverage for labor-intensive applications like the property storage module [1,2].

The remainder of the paper is structured as follows: Section 2 gives a brief explanation of our approach to property storage and access, showing how flexibility and type safety can be maintained. In Section 3 we show how a simple special purpose language can be used to describe the interface for instances of the module. Section 4 shows how a compiler can produce an application-specific property storage module from such a description.
2 The Property Storage Module

A general solution to the property storage problem must allow for an arbitrary number of entities, each having any number of properties. Properties must also be allowed to have any type, and they may be accessed in arbitrary ways. Our approach is to provide a design that encompasses this flexibility, but then to implement that design in an application-specific way. Thus we can solve any particular problem, but the solution does not pay for unnecessary generality: the design is reusable but the code is not [6].

Each entity is represented by a unique value of type DefTableKey. The module exports a distinguished value to represent "no entity", and a function for creating new entities (Figure 1). NewKey is not limited in the number of entities it can provide, so the module is capable of dealing with an arbitrary number of entities.

Each property has at least two access methods, whose interface specifications are given in Figure 2.

extern void ResetPROP(DefTableKey k, TYPE v);
/* Establish a property value
 *   On entry-
 *     k represents the entity whose PROP property is being established
 *     v=desired value of k's PROP property
 ***/

extern TYPE GetPROP(DefTableKey k, TYPE d);
/* Obtain the current value of a property
 *   On entry-
 *     k represents the entity whose PROP property value is desired
 *     d represents a default value for the PROP property
 *   If k is NoKey or k's PROP property has not been set then on exit-
 *     GetPROP=d
 *   Otherwise on exit-
 *     GetPROP=current value of k's PROP property
 ***/

Figure 2: Standard Access Methods

Thus if the problem at hand required an int-valued property named Count, the module would export at least two operations having the prototypes void ResetCount(DefTableKey, int) and int GetCount(DefTableKey, int). Clearly there is no limit on either the number of properties or their types, so a specific module may deal with any desired number of properties of any desired types.

A critical point about the interface for the Get routines in Figure 2 is that they return a valid value in all situations. The programmer must provide a default value, which is used when either the given key represents no entity or the entity does not have the desired property. This requirement guarantees an appropriate result in the specific context of the call [7,8].

Additional access methods, obeying similar interface conditions, can be written when needed by a particular problem. For example, in a desk calculator program stored variables might be represented by entities with two properties: the name of the variable and its value. An access method int FetchValue(DefTableKey) could be defined to return the value property if it has been set, and otherwise to print the variable name and ask the user to supply a value interactively. Since there is no limit on the number or complexity of such procedures, a specific module may access properties in any desired way.

Although the design is robust and reusable, any instantiation of it requires a great deal of code. The implementation of the underlying associative memory that realizes the definition table keys and the basic lookup mechanism [7] can be reused, but the code of Figure 2 must be rewritten for each property.
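To make the interface of Figure 2 concrete, the following is a minimal sketch of what a hand-written instance for an int-valued Count property might look like. It is not the implementation described in [7]: the linked-list representation, the helper find, the selector COUNT_PROP, and the decision to make ResetCount a no-op for NoKey are all assumptions made for illustration.

```c
#include <stdlib.h>

/* Hypothetical representation: the paper specifies only the interface. */
typedef struct PropNode {
    int selector;              /* identifies which property this node holds */
    int value;                 /* this sketch stores int values only */
    struct PropNode *next;
} PropNode;

typedef struct Entity { PropNode *props; } *DefTableKey;

#define NoKey ((DefTableKey)0)
#define COUNT_PROP 1           /* illustrative selector for Count */

/* Create a new entity with no properties (NoKey if allocation fails). */
DefTableKey NewKey(void)
{
    return calloc(1, sizeof(struct Entity));
}

/* Locate the node for a property, or NULL if it has not been set. */
static PropNode *find(DefTableKey k, int selector)
{
    PropNode *p;
    if (k == NoKey) return NULL;
    for (p = k->props; p != NULL; p = p->next)
        if (p->selector == selector) return p;
    return NULL;
}

/* Establish the Count property of k (assumed to do nothing for NoKey). */
void ResetCount(DefTableKey k, int v)
{
    PropNode *p;
    if (k == NoKey) return;
    p = find(k, COUNT_PROP);
    if (p == NULL) {
        p = malloc(sizeof *p);
        if (p == NULL) return;
        p->selector = COUNT_PROP;
        p->next = k->props;
        k->props = p;
    }
    p->value = v;
}

/* Return k's Count, or the default d if k is NoKey or Count is unset. */
int GetCount(DefTableKey k, int d)
{
    PropNode *p = find(k, COUNT_PROP);
    return p != NULL ? p->value : d;
}
```

Because GetCount always receives a default, a caller never has to test whether the property exists before using the result. Every further property would need its own copy of ResetCount and GetCount, which is the repetition the rest of the paper sets out to eliminate.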
3 The Property Definition Language

Counter: int [Inc];
Storage: StorageRequired;
"storage.h"
Type: DefTableKey;

int Inc(DefTableKey key)
{ if (key == NoKey) return 0;
  if (ACCESS) ++VALUE;
  else VALUE = 1;
  return VALUE;
}

Figure 3: Specifying a Property Storage Module

A surprisingly small amount of information is needed to define a specific property storage module:

1. A set of named properties
2. A type for each property
3. A set of access mechanisms
4. An associative memory implementation
Once this information is available, the property storage module it defines can be created mechanically. Tremendous leverage can therefore be obtained by using a special purpose language to express the necessary information, and having the compiler for that language produce the desired property storage module.

While the named properties and their types are going to be specific to each application, the bulk of the applications will use a small set of access mechanisms. Some of those (like Reset and Get) are so common that they should simply be provided automatically for every property. The remainder should be available as a library, and associated with particular properties by an additional specification. Finally, language facilities should be available for creating application-specific access mechanisms.

The associative memory implementation can be provided by a library module that the user can select at link time, since a very simple mechanism suffices in most cases. Thus no language facilities for associative memory specification are needed.

Figure 3 shows an example property storage module specification. It is written in PDL (the Property Definition Language), and defines three properties. Counter is an integer-valued property that has, in addition to the standard Get and Reset access methods, an access method called Inc. Inc is an application-specific method, whose definition appears later in the specification. The specification also defines the Storage property, which is of type StorageRequired. The definition of StorageRequired can be found in a header file called storage.h, and that interface is made available to the property definition module by placing the name of the header file in double quotes in the specification. The last property defined by the specification in Figure 3 is called Type and is of type DefTableKey. No header file need be included for the definition of DefTableKey, since that is a type that must always be included in the generated property storage module.
The access method Inc is defined in C notation in the PDL specification. It is designed to take a key representing an entity, establish the value 1 if the property has not been set, and increment the value otherwise (as one would do with a counter). In this example, Inc can be applied only to the type-int property Counter, and therefore int appears as the result type. Definitions of access mechanisms that can be applied to properties with a variety of types use the symbol TYPE instead of a specific type identifier.
Zero -> Value={0};
MaxInt -> Value={32767};
Figure 4: Initializing Properties

ACCESS, PRESENT, and VALUE are macros that can be used in the body of an access method definition. These macros are defined by PDL, and provide a simple interface that hides the details of using the associative memory and addressing the property value. Their semantics are described in the PDL Manual [4].
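One way to read the body of Inc in Figure 3 is sketched here with hand-written stand-ins for ACCESS and VALUE, specialized to the Counter property. The authoritative semantics of these macros are those of the PDL Manual [4]; this sketch only assumes what the figure itself suggests, namely that ACCESS reports whether the property was already set and leaves it addressable through VALUE either way. The Entity layout and the helper access_counter are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical entity layout holding only the Counter property. */
typedef struct Entity {
    int counter_set;   /* nonzero once Counter has been given a value */
    int counter;       /* storage for the Counter property */
} *DefTableKey;

#define NoKey ((DefTableKey)0)

DefTableKey NewKey(void)
{
    return calloc(1, sizeof(struct Entity));
}

/* Report whether Counter was already set, then mark it as set so that
 * the caller may assign to it (assumed behaviour, see lead-in). */
static int access_counter(DefTableKey key)
{
    int was_set = key->counter_set;
    key->counter_set = 1;
    return was_set;
}

/* Stand-ins for the PDL macros; both refer to the parameter named key. */
#define ACCESS access_counter(key)
#define VALUE  (key->counter)

/* The body is copied from Figure 3; only the name is property-specific. */
int IncCounter(DefTableKey key)
{
    if (key == NoKey) return 0;
    if (ACCESS) ++VALUE;
    else VALUE = 1;
    return VALUE;
}
```

With these definitions the first call on a fresh entity yields 1 and each later call yields the next integer, while a call with NoKey yields 0 without touching any storage.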
Note that the definition of Inc does not specify a property. Inc could be specified as an access method for any integer-valued property using the bracket notation illustrated on the first line of Figure 3. Given Figure 3, the Inc access method is embodied only in the operation int IncCounter(DefTableKey); if it were also associated with another integer-valued property P then the operation int IncP(DefTableKey) would also be available.

Sometimes it is useful to establish some entities at compile time, with particular property values. Figure 4 shows the notation PDL provides. Here the text in braces must be a valid C initializer for the type of the named property. The identifiers Zero and MaxInt represent entities; they are values of type DefTableKey that can be used as arguments to access methods to query and update their property values.
4 The PDL Compiler

The net result of the specification given in Figure 3 is a set of seven operations: one Get operation and one Reset operation for each of the three properties, plus the IncCounter operation. The PDL compiler implements each of these operations by a C macro. For example, here is the implementation of GetCounter:

#define GetCounter(key,d) Getint(1,(key),(d))

Getint is a routine produced by the compiler to implement the Get operation on properties of type int. Its first argument is an integer generated by the compiler to represent the specific property (Counter in this case). There will be a single routine named Getint, regardless of the number of properties of type int.
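The division of labor between per-property macros and a single type-specific routine can be sketched as follows. Only the GetCounter macro and the name Getint come from the text; the array-based storage, the bound MAX_PROPS, the routine name Resetint, and the second int property Limit with selector 2 are assumptions introduced for illustration.

```c
#include <stdlib.h>

#define MAX_PROPS 8    /* illustrative bound on property selectors */

/* Hypothetical entity layout: one slot per selector, int values only. */
typedef struct Entity {
    unsigned char is_set[MAX_PROPS];
    int value[MAX_PROPS];
} *DefTableKey;

#define NoKey ((DefTableKey)0)

DefTableKey NewKey(void)
{
    return calloc(1, sizeof(struct Entity));
}

/* One Get routine serves every int property; the selector picks which. */
int Getint(int selector, DefTableKey key, int deflt)
{
    if (key == NoKey || selector < 0 || selector >= MAX_PROPS) return deflt;
    return key->is_set[selector] ? key->value[selector] : deflt;
}

/* Matching Reset routine (the name Resetint is assumed). */
void Resetint(int selector, DefTableKey key, int v)
{
    if (key == NoKey || selector < 0 || selector >= MAX_PROPS) return;
    key->is_set[selector] = 1;
    key->value[selector] = v;
}

/* Per-property operations are thin macros over the shared routines. */
#define GetCounter(key, d)   Getint(1, (key), (d))
#define ResetCounter(key, v) Resetint(1, (key), (v))
#define GetLimit(key, d)     Getint(2, (key), (d))
#define ResetLimit(key, v)   Resetint(2, (key), (v))
```

Adding another int property costs only two more macro definitions, so the amount of generated code grows with the number of distinct property types rather than with the number of properties.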
Figure 4 causes the compiler to generate two global variables named Zero and MaxInt, with appropriate initializers. Each initializer describes a C struct with one field containing the compiler-generated integer corresponding to the property and another containing the initializer supplied in the PDL code.

The compiler's output consists of two files, pdl_gen.c and pdl_gen.h. File pdl_gen.c contains the code of the type-specific access routines, plus declarations for all of the global variables. All of these are declared extern in pdl_gen.h, which also contains the macro definitions and the interface specification for the associative memory. To use the generated module, include pdl_gen.h in any C or C++ file that refers to an entity, compile pdl_gen.c with the appropriate compiler, and link it with the application and the module implementing the associative memory.
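The idea of establishing entities at compile time can be illustrated with statically initialized C data. The layout used here is an assumption and is not the code emitted by the PDL compiler: PropNode, the selector VALUE_PROP, the helper variables, and GetValue are hypothetical names chosen so that the two entities of Figure 4 can be queried.

```c
#include <stddef.h>

/* A property record: selector plus value, chained per entity. */
typedef struct PropNode {
    int selector;
    int value;
    struct PropNode *next;
} PropNode;

typedef struct Entity { PropNode *props; } *DefTableKey;

#define NoKey ((DefTableKey)0)
#define VALUE_PROP 1   /* illustrative selector for the Value property */

/* Records and entities with static storage, initialized at compile time. */
static PropNode zero_value   = { VALUE_PROP, 0,     NULL };
static PropNode maxint_value = { VALUE_PROP, 32767, NULL };
static struct Entity zero_entity   = { &zero_value };
static struct Entity maxint_entity = { &maxint_value };

/* The keys visible to the application, usable like any other key. */
DefTableKey Zero   = &zero_entity;
DefTableKey MaxInt = &maxint_entity;

/* Standard Get operation for the Value property. */
int GetValue(DefTableKey k, int d)
{
    PropNode *p;
    if (k == NoKey) return d;
    for (p = k->props; p != NULL; p = p->next)
        if (p->selector == VALUE_PROP) return p->value;
    return d;
}
```

No code runs at program start to set up Zero and MaxInt; their property values are part of the initialized data of the module, yet access methods treat them exactly like keys obtained from NewKey.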
In addition to verifying the phrase structure of the input, the generated compiler reports violations of any of the following context conditions imposed by the language:
- Property names may not appear as types and vice versa.
- Specified operations must be defined, but not multiply defined.
- Initial property values of predefined entities may only be defined for declared properties and may not be assigned more than once.
- A property may be declared more than once, but all of its declarations must agree on its type.
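The last of these conditions can be illustrated in plain C. The PDL compiler itself verifies its context conditions with an attribute grammar, so the following is not its implementation; PropTable, declare_property, and the status codes are hypothetical names, and type names are compared as strings for simplicity.

```c
#include <string.h>

#define MAX_DECLS 64   /* illustrative capacity of the table */

typedef struct { const char *name; const char *type; } PropDecl;

typedef struct {
    PropDecl decls[MAX_DECLS];
    int count;
} PropTable;

typedef enum { DECL_OK, DECL_TYPE_MISMATCH, DECL_TABLE_FULL } DeclStatus;

/* Record a declaration "name: type;". A repeated declaration is accepted
 * only if its type matches the first one. The strings are not copied, so
 * they must outlive the table. */
DeclStatus declare_property(PropTable *t, const char *name, const char *type)
{
    int i;
    for (i = 0; i < t->count; i++) {
        if (strcmp(t->decls[i].name, name) == 0)
            return strcmp(t->decls[i].type, type) == 0
                       ? DECL_OK : DECL_TYPE_MISMATCH;
    }
    if (t->count == MAX_DECLS) return DECL_TABLE_FULL;
    t->decls[t->count].name = name;
    t->decls[t->count].type = type;
    t->count++;
    return DECL_OK;
}
```

A second declaration of Counter as int is accepted and adds no entry, whereas declaring Counter as StorageRequired after it has been declared as int is reported as a mismatch.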
Allowing properties to be defined in more than one place affords greater flexibility in building modular specifications. It means that any number of PDL specifications can be written, each describing a different (but possibly overlapping) property storage aspect of the system, and then simply concatenated for input to the PDL compiler.

Modern compiler construction environments greatly reduce the cost of producing compilers for languages like PDL. Using Eli, the entire syntax of PDL can be described in fewer than 30 lines by an EBNF grammar; the token structure needs a 5-line specification. The context conditions are verified by an attribute grammar specification about 150 lines long. (A large number of these lines are dedicated to the description of the abstract syntax and not to the computations themselves. The compiler was also written before Eli's module library was extended to greatly simplify the task of name analysis [5].)

Eli supports output via a special purpose language, called PTG, for pattern text generation. A specification combining PTG patterns with attribute grammar rules describes the construction and composition of output text. PTG manages the storage for the constructed text, which is then output by invoking a single function. This distinction between the time at which the output is constructed and the time at which it is written is very important when collecting text fragments that are most naturally generated at widely separated positions in the input.

Many people attempt to use scripting languages to construct translators for little languages like PDL. Use of a scripting language tends to produce rigid restrictions on the phrase structure of the language and often makes adequate verification of context conditions quite difficult. A scripting language is also not likely to reduce the implementation effort below that needed when using a mature compiler generation environment. Finally, scripting languages are notorious for leading to specifications that are very difficult to understand and maintain.
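The separation between constructing output text and writing it, noted above for PTG, can be illustrated with a small text tree in C. This is not PTG's interface; Text, text_leaf, text_seq, text_write, and text_flatten are hypothetical names, and the nodes are never freed in this sketch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A text is either a leaf holding a string or a sequence of two texts. */
typedef struct Text {
    const char *leaf;            /* non-NULL for a leaf */
    struct Text *left, *right;   /* children of a sequence node */
} Text;

/* Make a leaf; the string is not copied and must outlive the tree. */
Text *text_leaf(const char *s)
{
    Text *t = calloc(1, sizeof *t);
    if (t != NULL) t->leaf = s;
    return t;
}

/* Concatenate two texts; a NULL argument stands for the empty text. */
Text *text_seq(Text *a, Text *b)
{
    Text *t;
    if (a == NULL) return b;
    if (b == NULL) return a;
    t = calloc(1, sizeof *t);
    if (t != NULL) { t->left = a; t->right = b; }
    return t;
}

/* Write the whole text in order; this is the only place output occurs. */
void text_write(const Text *t, FILE *out)
{
    if (t == NULL) return;
    if (t->leaf != NULL) { fputs(t->leaf, out); return; }
    text_write(t->left, out);
    text_write(t->right, out);
}

static size_t text_length(const Text *t)
{
    if (t == NULL) return 0;
    if (t->leaf != NULL) return strlen(t->leaf);
    return text_length(t->left) + text_length(t->right);
}

static char *copy_into(const Text *t, char *dst)
{
    if (t == NULL) return dst;
    if (t->leaf != NULL) {
        size_t n = strlen(t->leaf);
        memcpy(dst, t->leaf, n);
        return dst + n;
    }
    return copy_into(t->right, copy_into(t->left, dst));
}

/* Render the text into a freshly allocated string (NULL on failure). */
char *text_flatten(const Text *t)
{
    char *buf = malloc(text_length(t) + 1);
    if (buf != NULL) *copy_into(t, buf) = '\0';
    return buf;
}
```

A translator can append to several such texts while making a single pass over its input, for instance one for declarations and one for macro definitions, and combine them with text_seq only when the output is finally written. Fragments that arise far apart in the input thereby end up adjacent in the output without any intermediate files.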
5 Conclusion

We have presented a strategy for creating a general property definition module from a simple linguistic description. The system described here has been used for several years to provide the definition table component for translators generated by the Eli system [3]. Our small investment in the creation of a robust and maintainable compiler for PDL has paid off handsomely in simplifying specifications and providing a mechanism for supporting other system components.

Both the associative memory module and the PDL compiler are available via anonymous ftp as part of the Eli system. We are currently establishing procedures for obtaining this software independent of the remainder of Eli.
Acknowledgments

This work was partially supported by the US Army Research Office under grant DAAL03-92-G-0158.
6 References

1. Bentley, J., "Little Languages," Communications of the ACM 29 (August 1986), 711-721.
2. Cleaveland, J. C., "Building Application Generators," IEEE Software 5 (July 1988), 25-33.
3. Gray, R. W., Heuring, V. P., Levi, S. P., Sloane, A. M. & Waite, W. M., "Eli: A Complete, Flexible Compiler Construction System," Communications of the ACM 35 (February 1992), 121-131.
4. Kadhim, B. M., "Property Definition Language Manual," Department of Computer Science, University of Colorado, CU-CS-95-776, Boulder, CO, July 1995.
5. Kastens, U. & Waite, W. M., "Modularity and Reusability in Attribute Grammars," Acta Informatica 31 (1994), 601-627.
6. Prieto-Diaz, Rubén, "Status Report: Software Reusability," IEEE Software 10 (May 1993), 61-66.
7. Waite, W. M. & Carter, L. R., An Introduction to Compiler Construction, HarperCollins, New York, 1993.
8. Waite, W. M. & Kadhim, B. M., "A General Property Storage Module," Department of Computer Science, University of Colorado, Boulder, CU-CS-786-95, September 1995.