Integrating Code Generators into the C# Language

Dirk Draheim
Institute of Computer Science, Freie Universität Berlin
Takustr. 9, 14195 Berlin, Germany

Christof Lutteroth, Gerald Weber
Department of Computer Science, The University of Auckland
38 Princes Street, Auckland 1020, New Zealand

Abstract
In this paper we show how the concept of code generators can be safely integrated into an object oriented language. Modern languages like Java and C# are beginning to offer advanced features for generative programming, such as generic types. Our own extension of C# generalizes the concept of generic types by combining it with reflection. With reflection, many code generation tasks can be accomplished for which generic types are insufficient. By balancing the availability of code generation features with their safety, we are able to detect potential generation errors statically.

1 Introduction
In today’s software engineering, the generation of components is a common task; usually it is supported by proprietary generators, or it is still even done manually. Many programs potentially have parts in common, and rewriting variations of these parts over and over again is time-consuming and tedious. Identifying and isolating such parts has led to the notion of components, and the fact that a component cannot always be used in the same static form but needs to be adapted has led to the notion of code generation. Over the years code generation has become an important part of software development. There exists, for example, a large variety of tools for generating database interfaces, GUIs and compilers. Besides these very specialized examples of code generation technology, many systems have been developed that take a more generic approach to code generation. Some of these systems allow the user to extend a programming language with new constructs, which trigger the generation of customized code. In many cases it is not easy for users to develop their own code generators, even when using systems that explicitly support this. The user has to know how a generator receives its parameters, how code is represented and processed, how code is emitted and how a generator is deployed. The answers to these questions vary greatly from technology to technology.

Code generation is a sensitive area because generators depend on parameters and because the data structure usually involved, a syntax tree, is not trivial. A generator can work well most of the time but fail for some rare actual parameters, and the error may not be obvious but only manifest itself in slightly malformed parts of the generated code. Using generators always bears the risk of introducing hard-to-find bugs, while a good generator has the potential to provide an economical and solid solution to a common problem. The more complex it is to develop code generators, the more error-prone the resulting generators become.

In this paper we show how the concept of code generators can be made directly accessible to the user in object oriented languages. The aim is to make generators part of a program and not of the compiler. No internal knowledge of the compiler should be required, and the generation process should be transparent to the user. Placing generators into the language itself instead of into a compiler affects the language syntax as well as its semantics and safety. The challenge lies in integrating the new constructs syntactically without interfering with existing semantics. Typed languages usually offer a high degree of safety through their type systems: type checkers are able to detect many potential execution errors statically. With the new concept of generators, however, new kinds of potential execution errors are introduced, namely those that occur when code generation produces ill-typed code. Consequently, code generation poses new challenges to type systems.

In Sect. 2 we look at two relatively widespread generative programming technologies, and in Sect. 3 we present our own solution. Section 4 gives an example of how this solution can be applied, while Sect. 5 describes how certain safety issues are dealt with. The paper concludes with Sect. 6.

Proceedings of the Third International Conference on Information Technology and Applications (ICITA’05) 0-7695-2316-1/05 $20.00 © 2005 IEEE

2 Examples of Parameterized Code Reuse

2.1 Generic Types
A good example of how a generative concept can be suitably integrated into object oriented languages is that of generic types. A generic type is usually a class parameterized with one or more type variables, which can be used in the class body in place of types. In a wider sense, generic types are a generative feature: a generic type cannot be used by itself, but when given appropriate type arguments it can evaluate to an arbitrary number of different types. Usually the generation of these types is homogeneous, with the same piece of code used for all the generated types, although there have also been heterogeneous solutions in which code is created for each set of type arguments individually. When using non-generic standard types, e.g., a stack class for all objects, we have to perform certain upcasts and downcasts that would not be necessary with generic types. Without generics, the only way to avoid these casts, and the loss of static type safety that comes along with them, is to create a specialized version of a non-generic type for each type argument. However, this can be laborious. What we can learn from generic types is that it is possible to make generative features part of an object oriented programming language in a way that is still relatively compatible with our understanding of program design. Furthermore, this does not necessarily impair type safety; by extending the type system appropriately, it might even improve it.
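The casts mentioned above can be seen in a minimal C# sketch, assuming a .NET runtime that provides both the non-generic System.Collections.Stack and the generic System.Collections.Generic.Stack<T>. The class and method names are hypothetical and only serve to contrast the two stacks.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class StackComparison
{
    // Non-generic stack: elements are stored as object, so every Pop needs a downcast.
    public static int SumNonGeneric(int[] values)
    {
        var stack = new Stack();
        foreach (int v in values)
            stack.Push(v);                 // implicit upcast (boxing) to object
        int sum = 0;
        while (stack.Count > 0)
            sum += (int)stack.Pop();       // explicit downcast, checked only at run time
        return sum;
    }

    // Generic stack: Stack<int> is obtained from Stack<T>, so no casts are needed.
    public static int SumGeneric(int[] values)
    {
        var stack = new Stack<int>();
        foreach (int v in values)
            stack.Push(v);
        int sum = 0;
        while (stack.Count > 0)
            sum += stack.Pop();
        return sum;
    }

    // With the non-generic stack a wrong downcast still compiles; it fails only when executed.
    public static bool WrongDowncastFailsAtRunTime()
    {
        var stack = new Stack();
        stack.Push("not a number");
        try
        {
            _ = (int)stack.Pop();
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

With Stack<int>, pushing a string would already be rejected by the compiler, which is the gain in static type safety described above.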

2.2 C++ Template Metaprogramming
C++ templates [3] can be used to make the compiler perform computations and generate new code. This practice, which is also referred to as template metaprogramming, exploits certain properties of templates that make them a Turing-complete compile-time sublanguage. First of all, we note that template metaprogramming uses the functional paradigm: instead of iteration we have to use templates recursively, and case distinctions are made by a form of pattern matching (template specialization). Another notable point is that template metaprograms are intertwined with the C++ type system: in order to calculate a value and give it an accessible name, we have to wrap it, for example, in an enumeration type. All this seems odd and is usually counterintuitive for a programmer who is used to imperative, object oriented programming. C++ template metaprogramming uses the C++ type system to perform compile-time computation; however, the type system does not make this computation any safer. It detects only errors in the C++ code generated for particular arguments, but not potentially unsafe parts in the template code itself. Template metaprogramming is often used for optimization in libraries, e.g., for unrolling loops and for the partial evaluation of constant expressions. Like all code generation technology, it is not a necessity but rather an instrument for gaining productivity in development and performance in execution.

3 Parameterized Generators: Factories
Our C# extension, which we have named Factory, introduces a syntax that is reminiscent of that of generic types but is not limited to classes (or interfaces). As with generic types, the template paradigm is used, but in contrast to C++ we do not use the type system to perform compile-time computation. The compile-time language is new and distinct from the runtime one. It is kept in an imperative style along the lines of the C# language itself, which in most cases eliminates the need for recursion and pattern matching. The type system, too, is analogous to the runtime one, but simpler for ordinary types and more sophisticated for generated types. It is able to detect parts of a generator that can potentially produce malformed code. The generators in the Factory language are called factories and can be embedded into the source code like ordinary types. Each time a factory is applied to new arguments, new types or methods with unique names are created. If a factory is applied more than once with the same arguments in a compilation run, the corresponding code is generated only once. The Factory compiler is implemented and can be downloaded from the Factory project’s web site [5]. The current version of the Factory language supports generators for classes, interfaces, methods and sets of types. A factory can be thought of as a function whose domain is given by a parameter list, analogous to that of a C# method, and whose range comprises all the code it can possibly generate. The generated code, just like, e.g., a type argument, is represented by a corresponding metaobject. For this, Factory uses the standard C# metaobject protocol. When discussing the syntax of the Factory language we will use context-free grammars. In these grammars the nonterminals are set in italics. Subrules are enclosed in italic parentheses.
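The rule that a factory applied repeatedly to the same arguments generates its code only once amounts to memoizing generation per argument. The following C# sketch models this at run time with a dictionary; FactoryCache and the naming scheme in the usage note are hypothetical illustrations, not part of the Factory compiler, which performs this bookkeeping at compile time.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical run-time model of "generate once per argument"; the Factory compiler
// does this bookkeeping at compile time.
public sealed class FactoryCache
{
    private readonly Func<Type, string> generate;                  // the factory body: argument -> code
    private readonly Dictionary<Type, string> cache = new Dictionary<Type, string>();

    public FactoryCache(Func<Type, string> generate) => this.generate = generate;

    // Number of times the generator has actually run.
    public int GenerationCount { get; private set; }

    // Returns the code for arg, running the generator only the first time arg is seen.
    public string Apply(Type arg)
    {
        if (!cache.TryGetValue(arg, out string code))
        {
            code = generate(arg);
            cache[arg] = code;
            GenerationCount++;
        }
        return code;
    }
}
```

For example, with a generator t => "class Wrapped_" + t.Name + " { }", applying the cache twice to typeof(int) and once to typeof(string) runs the generator only twice; embedding the argument's name is one simple way to obtain the unique names mentioned above.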
The class factory looks like an ordinary C# class definition, except that there is a list of parameter declarations enclosed in parentheses after the class identifier. This makes it easy to transform an ordinary class into a factory. The result of a class factory is a Type object. A method factory is constructed analogously to a class factory. There are two lists of parameter declarations after the method identifier: one for the generator arguments, which are used at generation time, and one for the generated method’s arguments, which are used at runtime. The type set factory is used to generate a whole set of type definitions; it can be thought of as a parameterized namespace.

The parameter declarations of a factory are mostly similar to those of ordinary C# methods, but it is possible to give an explicit bound to parameters of type Type. A factory parameter declaration is either an ordinary C# parameter declaration (csParamDecl in the grammar below) or an identifier followed by the keyword istype and a type, signifying that the parameter is of type Type and represents a subtype of the given type.

classFactory:
    modifiers class ID ( params ) ( : type ( , type )* )? { classBody }
methodFactory:
    modifiers method ID ( params ) ( methodParams ) { methodBody }
typesetFactory:
    typeset ID ( params ) { namespaceBody }
params:
    ( paramDecl ( , paramDecl )* )?
paramDecl:
    csParamDecl | ID istype type
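The bound expressed by an istype parameter corresponds to the standard reflection method Type.IsAssignableFrom. The sketch below, with hypothetical names, performs at run time the kind of check that an istype bound expresses; it does not reproduce how the Factory compiler checks bounds. Note that IsAssignableFrom is reflexive and also accepts classes implementing an interface bound, so "subtype" is meant in that broad sense here.

```csharp
using System;

// Hypothetical run-time counterpart of a factory parameter declared as "T istype Bound".
public static class IsTypeBound
{
    // True if argument is bound itself, derives from it, or (for interface bounds) implements it.
    public static bool Satisfies(Type argument, Type bound) => bound.IsAssignableFrom(argument);

    // Rejects an argument that violates the bound before any code would be generated from it.
    public static Type Require(Type argument, Type bound)
    {
        if (!Satisfies(argument, bound))
            throw new ArgumentException(argument.FullName + " is not a subtype of " + bound.FullName);
        return argument;
    }
}
```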

3.1 Factory Expressions
Factory expressions describe values that are computed at generation time. Besides some C# operators, they provide access to C# types and their constructors, fields and methods. Furthermore, generator variables, i.e., variables that are accessible at generation time, can be accessed. These expressions should behave like deterministic functions, in the sense that structurally equivalent expressions with the same variable assignment should produce the same value. Since expressions in common generators naturally have that property, this requirement is not a disadvantage, but as we will see in Sect. 5 it is important for the type system. The syntax of Factory expressions is similar to that of C# expressions; the main differences are that there are fewer operators and no type casts. In Factory expressions, factories can be invoked like methods, the result being a metaobject, e.g., of type Type, that describes the generated entity.

3.2 Intercession with Factory Expressions
Usually Factory expressions are used to introspect type parameters and to extract or construct the information that is needed for intercession, i.e., information that represents parts of the generated entity. In order to make the value of a Factory expression part of the generated code, the expression is enclosed in @ characters and placed into the code template at a position where the entity represented by the expression’s value can occur. At generation time, all Factory expressions are evaluated and replaced by the code represented by their values. Since we use the standard C# metaobject protocol, it is already clear which type represents which language entity. For example, a type can be generated with a Type object, a method with a MethodInfo object, and an identifier with a String.
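The substitution of @-enclosed expressions can be approximated by a small template expander. The following C# sketch is a hypothetical simplification: placeholders may only name generator variables rather than contain arbitrary Factory expressions, @x@ splices the code that a metaobject stands for, and @=x@ splices it as a string literal, mirroring the @=...@ form used in Sect. 4. The mapping from metaobjects to source text (a Type to its full name, any other MemberInfo to its name, a String to an identifier) is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Text.RegularExpressions;

// Hypothetical mini-expander: @x@ splices code, @=x@ splices a string literal.
public static class TemplateExpander
{
    private static readonly Regex Placeholder = new Regex(@"@(=?)(\w+)@");
    private static readonly Regex Identifier = new Regex(@"^[A-Za-z_][A-Za-z0-9_]*$");

    public static string Expand(string template, IDictionary<string, object> vars) =>
        Placeholder.Replace(template, m =>
        {
            object value = vars[m.Groups[2].Value];   // unknown variables raise KeyNotFoundException
            string code = ToCode(value);
            return m.Groups[1].Value == "=" ? ToLiteral(code) : code;
        });

    // Maps a metaobject to the source text of the entity it represents.
    private static string ToCode(object value)
    {
        switch (value)
        {
            case Type t: return t.FullName;           // must precede MemberInfo: Type derives from it;
                                                      // non-generic types only in this sketch
            case MemberInfo mi: return mi.Name;       // fields, methods, properties
            case string s when Identifier.IsMatch(s): return s;
            default: throw new ArgumentException("cannot splice " + value);
        }
    }

    private static string ToLiteral(string text) =>
        "\"" + text.Replace("\\", "\\\\").Replace("\"", "\\\"") + "\"";
}
```

For instance, with T bound to typeof(int) and name bound to "count", the template "@T@ @name@ = 0; string label = @=name@;" expands to System.Int32 count = 0; string label = "count";.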

3.3 Control Constructs
In addition to expressions, Factory provides imperative control constructs for code generation. They generate syntax elements conditionally or iteratively, and they create new generator variables. They are used in different places, depending on the entities that are generated. In the following grammar rules we denote the respective generated element with the nonterminal ge; the following table sums up which kind of element can be generated where:

Place in the source code        Generated element (ge)
Type set / namespace body       Type
Class / interface body          Field / method / constructor
Method body                     Statement
Parameter/argument list         Parameter/argument list

@if allows conditional generation; its factory expression must evaluate to a Boolean object. @foreach allows iteration over the elements of any ICollection object; consequently, its factoryExpr must evaluate to an object whose class implements this interface. The body is generated once for each element of the collection; in each iteration the respective element can be accessed from within the body through the generator variable ID. const allows the creation of a new generator constant, which is initialized with an arbitrary Factory expression. The construct declares the constant and assigns it the value of the expression, which can then be accessed subsequently.

if:
    @if ( factoryExpr ) ge ( else ge )?
foreach:
    @foreach ( ID in factoryExpr ) ge
const:
    ID = factoryExpr ;
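The interplay of @foreach, @if and generator constants can be mimicked by an ordinary C# method that iterates over a Type’s fields and emits statement text. This is a hypothetical sketch, not Factory code: Customer, ControlConstructs and the emitted Trim statements are illustrative assumptions, and the Factory fragment in the comment only paraphrases the grammar rules of this section.

```csharp
using System;
using System.Collections;
using System.Reflection;
using System.Text;

// Hypothetical sample input type for the generator below.
public class Customer
{
    public string Name;
    public int Age;
}

// Hypothetical emulation, as an ordinary C# method, of a generator fragment such as
//   @foreach (F in T.GetFields())
//     @if (F.FieldType == typeof(string)) this.@F.Name@ = this.@F.Name@.Trim();
// The Factory syntax in this comment is illustrative only.
public static class ControlConstructs
{
    public static string GenerateTrimStatements(Type t)
    {
        var code = new StringBuilder();
        ICollection fields = t.GetFields();      // @foreach accepts any ICollection; arrays implement it
        foreach (FieldInfo f in fields)          // one iteration per element, so the loop always ends
        {
            string name = f.Name;                // plays the role of a generator constant
            if (f.FieldType == typeof(string))   // @if: a Boolean-valued generation-time condition
                code.Append("this.").Append(name).Append(" = this.").Append(name).Append(".Trim();\n");
        }
        return code.ToString();
    }
}
```

Because Type.GetFields does not guarantee any particular order, a generator whose output depends on field order should not rely on it.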

4 Code Generation Example
Factory can be used for parametric polymorphism, but it can also be used to generate much more sophisticated extensions. In contrast to ordinary inheritance mechanisms, which also extend classes, a factory can adapt the extension it generates to the class that is extended. This makes it possible to address crosscutting concerns, which are also the target of aspect oriented programming, e.g., [6]. The following example is a class factory that takes a class and generates a corresponding subclass with an adapted ToString method. This method is used, for example, when an object is printed on the console. A ToString method that simply returns the names of the class’s fields together with the string representations of their values can be very useful for debugging. In line 1 we declare a single type parameter for this factory. In line 4 this parameter is introspected, and we iterate over all its fields. For each field F we generate code that appends the name and the value of F to a string s. Note that the expression @=F.Name@ in line 5 generates a string literal for the field name, while in the expression this.@F.Name@ the identifier of the same field is generated, so that the field is accessed in the generated code. In the same way, more complex generation tasks can be done, such as generating functionality for persistence, proxies or wrapper classes.

 1  class WithToString(Type T) {
 2    override public String ToString() {
 3      String s = "";
 4      @foreach(F in T.GetFields()) {
 5        s += @=F.Name@ + ":\t" + this.@F.Name@ + "\n";
 6      }
 7      return s;
 8    }
 9  }
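To make the expansion concrete, the following C# code shows by hand what applying the WithToString factory to a simple class with two public int fields could plausibly yield. Point and the generated class name WithToString_Point are hypothetical; the paper only states that generated types receive unique names, not how they are chosen. Since Type.GetFields does not guarantee any particular field order, the order of the two generated statements may differ in practice.

```csharp
using System;

// Hypothetical input class for the factory.
public class Point
{
    public int X;
    public int Y;
}

// Hand-written sketch of what applying WithToString to Point could yield.
// The name WithToString_Point is an assumption; Factory only promises a unique name.
public class WithToString_Point : Point
{
    public override String ToString()
    {
        String s = "";
        // One statement per public field of Point: a string literal for the field name,
        // followed by an access to the field itself.
        s += "X" + ":\t" + this.X + "\n";
        s += "Y" + ":\t" + this.Y + "\n";
        return s;
    }
}
```

Assigning new WithToString_Point { X = 1, Y = 2 } to a Point variable and calling ToString on it dispatches to the generated override, which is what happens when such an object is printed on the console.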

5 Safety
It is desirable, but not necessarily guaranteed, that a generator always terminates. C++ templates, for example, can potentially recurse endlessly, and usually a recursion-depth limit is used to stop them [3]. In other technologies that use a Turing-complete language for metaobject manipulation, like CLOS [8], OpenC++ [2] or Jasper [7], generators may fail to terminate as well. We have designed the Factory language so that nontermination cannot easily occur. The @foreach loop, for example, iterates over a collection that cannot be modified, so the number of iterations is always bounded by the size of the collection. Only the use of a method that does not terminate, or cyclic recursion in factory calls, can keep a generator running endlessly. Cyclic recursive use of factories, however, can be detected statically by means of call graphs.

As a further advancement in safety, we want to detect a class of type errors which we call generator type errors. A generator type error means that a generator has the potential to generate ill-typed code, although the generator itself may be well typed. If a generator is generator type safe, then it is guaranteed that all code it generates is well typed. This can be seen as the ordinary notion of type safety transferred to the field of metaprogramming, i.e., programs that manipulate and deal with other programs or with source code in general (see also [3]). Many other approaches, like the ones using C++ template metaprogramming [1] or OpenC++ [2], detect type errors in the generated code at generation time. Type errors are then only detected in a particular generated program, and no statement about the type safety of other generations is made. Languages with runtime reflection [8, 4] usually detect type errors in the generated code only when it is executed. Some approaches [2, 7] compile the generators before using them and thereby use the type checker of the compiler to ensure a certain level of structural type safety. This helps to determine whether the generator handles metaobjects in a type-correct way, so that the code they represent is not syntactically malformed. But it does not detect all type errors, such as the incorrect use of identifiers, which remain to be detected at generation time in each individual generator run. When talking about type safety, it is important to distinguish between the types of Factory expressions and the types generated by them: we have to clearly separate the level of the metaprogram and the level of the program that it generates. A Factory expression that generates a type is of type Type, but it may generate any C# type, depending on the object it evaluates to. In order to detect generator type errors, we developed a new type system that is compatible with, and extends, the type system of the host language, i.e., C#.
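The static detection of cyclic factory use mentioned above can be sketched as a depth-first search for a back edge in the factory call graph. In this hypothetical C# sketch the call graph is given as a dictionary from each factory’s name to the factories its body invokes; how the Factory compiler actually builds and checks its call graph is not described here. The check is conservative: it flags every cycle, regardless of the arguments passed along it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical static check on a factory call graph: factory name -> factories it invokes.
public static class FactoryCallGraph
{
    public static bool HasCycle(IDictionary<string, List<string>> calls)
    {
        var state = new Dictionary<string, int>();   // absent/0 = unvisited, 1 = on DFS path, 2 = done
        foreach (string f in calls.Keys)
            if (Visit(f, calls, state)) return true;
        return false;
    }

    private static bool Visit(string f, IDictionary<string, List<string>> calls, Dictionary<string, int> state)
    {
        state.TryGetValue(f, out int s);
        if (s == 1) return true;     // back edge: f is already on the current path, so there is a cycle
        if (s == 2) return false;    // f was fully explored earlier without finding a cycle
        state[f] = 1;
        if (calls.TryGetValue(f, out List<string> callees))
            foreach (string g in callees)
                if (Visit(g, calls, state)) return true;
        state[f] = 2;
        return false;
    }
}
```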

6 Conclusion
Factory implements a concept for generative programming that integrates reflection into genericity by means of a metalanguage. It can be integrated into object oriented languages and can be used to solve common problems of generative programming. Compared to other languages, it offers advantages with respect to safety and to the degree of integration between the runtime language and the metalanguage. The type system of the host language is extended so that generator type errors can be detected.

References
[1] G. Attardi and A. Cisternino. Reflection support by means of template metaprogramming. Lecture Notes in Computer Science, 2186:118–??, 2001.
[2] S. Chiba. A metaobject protocol for C++. In Proceedings of the 10th Conference on Object-Oriented Programming Systems, Languages and Applications, pages 285–299, 1995.
[3] K. Czarnecki and U. Eisenecker. Generative Programming: Methods, Tools, and Applications. Addison-Wesley, 2000.
[4] R. Douence and M. Südholt. A generic reification technique for object-oriented reflective languages. Higher Order and Symbolic Computation, 14(1), 2001.
[5] D. Draheim, C. Lutteroth, and G. Weber. Factory, 2003. http://www.factory.formcharts.org.
[6] G. Kiczales. An overview of AspectJ. In Proceedings of the European Conference on Object-Oriented Programming, pages 18–22. Budapest, Hungary, June 2001.
[7] D. Nizhegorodov. Jasper: Type-safe MOP-based language extensions and reflective template processing in Java. In Proceedings of ECOOP’2000 Workshop on Reflection and Metalevel Architectures, 2000.
[8] J. L. W. R. G. Gabriel, D. G. Bobrow. Object Oriented Programming - The CLOS perspective. The MIT Press, Cambridge, MA, 1993.
