Integrating Domain Speci c Language Design in the Software Life Cycle Philipp W. Kutter1 , Daniel Schweizer2, and Lothar Thiele1 1 Federal Institute of Technology, CH-8092 Zurich
fkutter,
g
thiele @tik.ee.ethz.ch
2 United Bank of Switzerland, CH-8021 Zurich
[email protected]
Abstract. Domain Speci c Languages help to split the software live cycle in dierent independent cycles. While the use of the newly created language is just an additional tool in the established cycle, the language live cycle is independent and opens the doors for the application of formal methods. We report on an industrial case study, where a driver speci cation language has been designed, formally speci ed, and nally an implementation has been generated from the speci cation. Using Abstract State Machines and Montages for the language speci cation, it was possible that the industrial partners learned how to maintain and extend the language speci cation. On the other hand the formal semantics of the method allows to apply dierent veri cation-oriented methods to the artifacts.
1 Introduction Today's software and hardware systems are characterized by a composition of complex subsystems and complex interactions between them. These subsystems are very often based on dierent models of computation and data. Interfaces among these heterogeneous subsystems and other problems of inter-operability are major sources of complexity. For example, in a data-warehouse scenario there is the need of connecting dierent information sources, such as dierent data bases and dierent data sources (e.g. Internet, les, data bases, user input/output). A similar situation can be found in the speci cation and implementation of heterogeneous electronic systems. No single model of computation is appropriate for all application areas. The correctness of interfaces between components modeled with dierent paradigms are of major importance. Traditionally there have been two major approaches to solve these problems: Problem{speci c: The interfaces (wrappers) between the heterogeneous components are implemented by hand in a tedious process. In addition the interfaces are written without reference to the underlying model of computation which is error prone.
General purpose: Frameworks have been provided for the speci cation of all
system components and their interaction. This approach has proven to be not feasible in several application domains as these general purpose frameworks are too complex to manage and to use by domain experts. Consequently there is the need for new methods and tools to design heterogeneous components and their interaction. Pretty early came up the idea, that little languages, tailored towards the speci c needs of a particular domain, can signi cantly ease building software systems for that domain [Ben86]. This domain-speci c approach, situated between the traditional problem-speci c and general purpose solutions, enables an interaction between the dierent domains on the language level, and not on the implementation level. Already [HB88] notes that if a domain is rich enough and program tasks within the domain are common enough, a language supporting the primitive concepts of the domain is called for, and such a language may allow a description of few lines long to replace many thousand lines of code in other languages. This idea is elaborated in the FAST process, a software{engineering methodology advocating the use of domain{speci c languages [Wei97]. Central to FAST is the process to identify a suitable domain of problems, and to nd abstractions common to the domain. These abstractions are used to produce family members in an orthogonal way. As a next step, a language is designed for specifying family members. The syntax of the language is based on the terminology already used by the domain experts, and the semantics is developed in tight collaboration with them. The goals are to bring the domain experts in the production loop, to respond rapidly to changes in the requirements, to separate the concerns of requirement determination from design and coding, and nally to rapidly generated deliverable code and documentation. Central for the industry is the possibility to capture and leverage the expertise of the involved domain{engineers. For the de nition and implementation of DSLs, powerful tool support is needed. Both the compiler construction community and the language speci cation community are now advocating to use their tools for DSLs. Advantages of compiler construction tools (e.g. Cocktail [GE90], Eli [GHL+ 92], or PCCTS [PDC92]) are ecient implementations and scalability to large languages, while advantages of language speci cation tools (e.g. Centaur [BCD+ 87], ASF+SDF [Kli93], or Gem-Mex [AKP97a]) are very abstract speci cations, that are easy to maintain and reuse. In an industrial setting, it has been shown that the advantages of both approaches can be combined by using operational language speci cations as basis for DSL de nition and implementation. In the following the work directly related to this approach is summarized. In particular, the results present in the paper can be summarized as follows: { A complex data warehouse problem at Union Bank of Switzerland is analyzed and the corresponding requirements for a solution are derived. { Strictly based on these requirements, dierent possible solutions are compared and the use of domain-speci c languages is selected as the appropriate approach.
{ Using formal methods and corresponding software tools, diculties in the
application of DSLs have been resolved. In particular, the approach is based on abstract state machines, see [Gur95], and on a domain-speci c structuring mechanism Montages, see [KP97b]. { The above formal methods have been integrated into the software life cycle.
2 Application Union Bank of Switzerland (UBS) is the world second largest bank. Numerous large scale databases are used to provide information to cope with the challenges of daily business as well as for long term strategic decisions. Data relevant to both elds reside on many dierent data bases from dierent manufacturers distributed all over the world. Most of them are relational data bases up to size of 1 Tbyte. They are grown historically and thus expose very heterogeneous structures and data models, respectively. There is nothing like a company wide data model and after the merger with Swiss Bank Corporation (SBC), more then ever, hardly anyone believes that there will be one in the future. One centrally located database is technically and politically not feasible and also from the viewpoint of accessibility and availability not desirable. Until recently, data access was very cumbersome, especially in cases where information from dierent sources has to be combined in a single report. Due to the lack of suitable tools on the market, the CUBIX [Mar97] project was started two years ago. Since September 1997 the system is up and running, serving a rapidly growing community of several dozens users, mainly in controlling departments. CUBIX now provides access to information including pro t centers, products, customers and accounts. In the following we sketch the CUBIX architecture, and summarize the requirements for rapid connection of new data-bases, that led to our application.
2.1 CUBIX CUBIX provides an infrastructure to build and maintain so called \virtual" data warehouses, needed to do \Online Analytical Processing" \Virtual" means, that it is not necessary to copy the relevant information to a single huge database. OLAP has evolved as users' needs for data analysis have grown. It provides executives, analysts and managers with valuable information via a \slice, dice and rotate" method of end user data access, augmenting or replacing the more complicated relational query. This slice and dice method gives the user consistently fast access to a wide variety of views of data organized by key selection criteria that match the real dimensions of the modern enterprise. Although CUBIX provides a multidimensional view [CCS93] (see the data cube with three dimensions on top of Fig. 1), the data still reside on existing relational databases as they did before. A CUBIX data warehouse may consist of several data-marts, each containing information about distinct aspects of the company. Data-marts are represented as multi-dimensional cubes. Several cubes may share some dimensions. The user doesn't have to care about the location or kind of data bases he
uses. Reports spanning over multiple existing data bases are as easy to get as such from a single one.
2.2 Cube Drivers One important component of the CUBIX system are drivers connecting the new multi-dimensional data model to the existing relational data bases. The core task of such a driver is to correlate the dimensions in the cubes to the tables and attributes in a relational data base. This is illustrated in Fig.1 showing how a tuple of coordinates is translated to a SQL-query retrieving the corresponding data in the relational data base. content of C at position K0 is V0 z
K0 = (x0, y0, z0)
C
coordinates
CUBIX Cube
V0
z0
value
x
x0
y0
y
driver Unix-process
table1
table2
table3
Q0 SQL-query
S SQL-data-base
Fig. 1. Cubes and tables connected by a driver.
2.3 Physical Representation in Relational Databases The basic components of the multidimensional model (MDM) are facts, dimensions and attributes. The rst two components, facts and dimensions, are represented physically in a relational database as tables. In the simplest MDM, the basic star schema, they are the only tables. In the star schema, each dimension is described by its own table, and the facts are arranged in a single large table, indexed by a multi-part key that is made up of the individual keys of each dimension. There are many variations on the simple star schema, but they all share the basic concept of fact tables and dimension tables. In many cases, not
all facts share the same dimensionality, and multiple fact tables may be used. An example is selling price, which may not vary between markets or customers, or the Federal Income Tax rate. When dimensions are large (high cardinality), it often makes sense to split the dimension tables at the level of the attributes, also known as the \snow ake schema". Storing aggregations and derivations yields even more exotic combinations, sometimes referred to as \decomposed stars" or \constellation schema".
2.4 Requirements for Connecting New Databases As mentioned earlier, our data bases grew historically, hardly any of them conforms to the star or snow{ ake scheme. Thus the connection of a new data base requires a distinct cube driver to be implemented (or at least con gured). So far, the implementation of a new cube driver took several days and had to be done by a highly trained computer scientist. Driven by the fast-growing needs for information within UBS and further accelerated by the recent merger with SBC, the number of data bases to be connected grows rapidly. Thus the rapid connection of new databases is crucial for the success of CUBIX and the requirements for connecting data-bases can be summarized as follows:
(1) The Connection of existing data bases must be possible in less than 1
week. Only a signi cant gain in time to market justi es the use of a new technology. (2) Data base administration sta should be able to connect their data bases to the CUBIX data warehouse. Training to do this should be less than two days. (3) Although CUBIX is already in use in daily business, development continues. Thus, if some changes of the CUBIX implementation are required, all drivers should be automatically adaptable. (4) To bene t from new versions of underlying data base management Systems, an adaption of the respective cube drivers may be required (e.g. to employ new query optimizations mechanisms). (5) The Cubix developers must be able to understand all aspects of the solution chosen for connecting data bases. To be able to incorporate future optimizations, they need full control over the used technique. The solution must be easy to document, easy to extend to new problems, and easy to learn to new people in the team.
3 Methodology Three possible approaches to meet the requirements as described in section 2.4 have been investigated. { Currently, the interfaces between the heterogeneous data sources are implemented by hand. Besides the fact that this problem-speci c process is error prone, it violates the requirements (1), (2), (3) and (4).
{ A typical academic approach is to build a general-purpose framework for the
speci cation of all system components and their interaction. With such a system, both multi-dimensional and relational data-models can be represented including their interaction. Consequently, the requirements (1), (3) and (4) can be satis ed. On the other hand, these frameworks are too complex to manage and to use by domain experts, see (2) and (5). { In a domain-speci c approach, a small language speci es the interfaces between the heterogeneous data sources. Obviously, handling (3) and (4) is particularly simple as only the semantics of the interface speci cation language must be adopted if CUBIX or a data-base management system changes. The speci cation of the DSL should be easy to understand, extend, and maintain such that domain experts are able not only to use them, but also to change them, see (1), (2) and (5). To this end, formal methods and tool support has been developed and integrated in the software life cycle at Union Bank of Switzerland. In the given application, we decided to develop a driver-speci cation language, called Correlation Modeling Language (CML). Each program in CML describes a driver. Having decided to follow this approach, additional requirements came up, mainly connected with the old requirement (5). (5.1) The syntax and semantics of the language must be unambiguously de ned. (5.2) The language de nition must be easy to read and understand. (5.3) The language should be easy to extend. (5.4) Changes in the de nition should be implementable without speci c knowledge, e.g. in compiler{construction techniques or formal methods. Requirements (5.4) and (5.2) rule out traditional implementation techniques. As alternative remain sophisticated compiler construction tools and programming language semantics formalisms. Although at a rst glance, compiler construction tools seem to be much more appropriate for an industrial context, they are not suited for our purpose. They allow to develop very ecient compilers for new languages. The involved techniques require a deep understanding of compiler construction, which contradicts requirements (5.2), and (5.3). Using a programming languages formalism that allows to generate an implementation from the speci cation, the driver generation process can be split in three phases as depicted in Fig. 2. In the rst phase, a speci cation of CML (written for instance in the structuring formalism Montages, see section 3.1) is fed to a rapid prototyping software tool (here for instance Gem-Mex, see section 4). The rapid prototyping tool generates an interpreter of CML. Using this interpreter a driver speci cation can be executed. Figure 1 is a more detailed presentation of phase three in Fig. 2.
3.1 Formal Semantics for Domain Speci c Languages
A number of semantics formalisms have been proposed, and they have been successfully used to investigate single constructs of languages and to discover aws
CML specification Montages
C-to-S driver specification CML
Gem-Mex
CML interpreter
driver
C-code
C-code
Unix-process
Phase 1
Phase 2
Phase 3
Fig. 2. The DSL process in the design of languages. They had a considerable in uence on the design of functional and logic programming languages. Recently most of them have been applied to DSLs as well. The approaches can be split in operational [Hen91], denotational [Sto77], and axiomatic [Dij76,Hoa69] ones. In operational semantics, the meaning of a well-formed program is the trace of computation steps that results from processing the program's input. In denotational semantics the is a mathematical function from input data to output data. The states of a computation must be encoded in the output data. Finally in axiomatic semantics the meaning is a logical proposition, that states some property about the input and outputs, respectively the state before or after a computation step. Unfortunately the mission to provide a semantical analogue to EBNF was never accomplished. Although Scott-Strachey-style denotational semantics [SS71] introduced 25 years ago the revolutionary idea to extend BNF to semantics, it has never been accepted by programmers, since denotational semantics of a realistic language is often too dense to read and too low level to modify [Sch97]. In general the disadvantage of the proposed approaches is that they involve a relatively heavy mathematical machinery. As discussed above, for the present application of DSLs there was a strict requirement (5) for a formal semantics, involving a minimum of mathematical machinery. Montages [KP97b,AKP97c] use Gurevitchs Abstract State Machines(ASMs) [Gur95] to extend BNF to semantics. Syntax, static analysis and semantics, and dynamic semantics are given in an unambiguous and coherent way by means of semi{visual descriptions. The speci cations are similar in structure, length, and complexity to those found in common language manuals. Montages are designed to provide a theoretical basis for a number of activities from initial language
design to prototyping. In order to meet requirement (5) Montages are based only on concepts well know to industrial programmers:
{ { { {
EBNF for syntax de nition control/data- ow graphs for static analysis simple predicate logic for static semantics imperative pseudo code for dynamic semantics
Formally, the imperative pseudo code are ASM rules, and the static aspects of Montages are formalized with ASMs too. Thus intuition of Montages is based on the above listened concepts, while formal semantics is given with ASMs. Such a formalization is in turn understandable as imperative pseudo code, and can thus be used as documentation for the industrial programmers. In the following we introduce ASMs and Montages. Abstract State Machines ASMs [Gur95,Hug] (formally called Evolving Algebras) are an operational speci cation formalism. In short, ASMs are a state{based formalism in which a state is updated in discrete time steps. Unlike most state based systems, the state is given by an algebra, that is, a collection of functions and a set of objects. The state transitions are given by rules that are red repeatedly. Objects: in each ASM the set of existing objects O 1 plays a fundamental r^ole. Each object has its identity, independent of the value of its attributes. For example we may express that all objects satisfy predicate P
8 :O: o
()
P o
Functions: Constants, Operations, Sets, Variables, Attributes, Relations, Records, Arrays, Data Structures correspond all to n-ary functions. As higher programming languages abstract from registers, ASMs abstract from the dierences of the listed concepts. Constants and Operations are static functions without respectively with arguments. Static means that they do not change their de nition over time. For convenience the constants true and false are prede ned denoting two dierent elements of O and the logical operations are prede ned. In addition a third object is denoted by the prede ned constant undef. Sets 2 or Types in the sense of collections of objects are modeled by functions from O to f g. Such a function delivers true for all members of a set (instances of a type), and false otherwise. The set or type consisting of true and false is called Boolean. A type corresponds thus to a function true; f alse
T
T
: O ! Boolean
1 in traditional ASM terminology objects are called elements and O is called the super-
universe
2 traditionally ASM literature speaks about universes
Types are used to describe domain and codomain of functions. But there is no xed type system as for instance in many-sorted or order-sorted algebras. In [Kut98] simple macros allowing to use set operations are de ned. Variables are 0-are functions, that can be rede ned. Attributes are unary functions, that can be point-wise rede ned. Relations are functions with more than one argument and Boolean as codomain. Records are unary functions from eld{names to eld{values, arrays are unary functions with a range of positive integers as domain. In general data structures are n-ary functions that can be rede ned point wise. The n-ary function is thus a unique concept, used to represent all sorts of functional dependencies, relations, storage, and data structures. Dynamics: Given a state of an ASM consisting of O and de nitions of the declared functions3 the system evolves by repeated execution of calculation steps. Calculation steps are de ned with transition rules. The fundamental construct of transition rules is the update or rede nition of a n-ary function at one point. (1
f t ;::: ;t
rede nes
n ) := t0
at point 1 n to 0 , where 0 1 n are the values of in the current state. The update may be guarded, and the simple guard construct can be generalized to if-then-elsif-else and case constructs. Abstracting Calculation Steps Unlike sequential, imperative languages ASMs allow to specify in a declarative way, what changes from one state to another rather than how this is done sequentially. This is achieved by means of parallel composition of updates. Swapping the values of variables and is done by executing the rule x := y y := x which naturally corresponds to the equation system f = = g relating as usual next-state (primed versions and ) with the current state. Arbitrary sequentialisation can be achieved by means of abstract program counters. ASMs where used to successfully describe the dynamic semantics of programming languages like Prolog [BR94], C [GH93], C++ [Wal95], Occam [BDR94b], Oberon [Kut97], Java [BS98], and SDF [GK97]. These speci cations have been used to verify the correctness of compilation, e.g from Prolog to WAM [BDR94a], Occam to Transputer [BD96], and C to DEC{Alpha processors [ZG97]. An up{to{date overview on case{studies with ASMs can be found on [Hug]. An important dierences of the dynamic semantics speci cations with ASMs to those using classical approaches, is the way how the parse tree is formalized. The classical way is to consider the parse tree as an element of the term{algebra generated by the context free grammar. In newer approaches like 3 O (the super-universe) together with interpretations of the function symbols builds 0 1
f
t ;t ;::: ;t
v ;::: ;v
v
v ;v ;::: ;v
n
x
x
x
0
y
0
an algebra, thus the states of ASMs are algebras.
0
y; y
y
0
x
[Gan85,Gur88,Ode89,PH91] and the mentioned ASM dynamic-semantics descriptions, each parse tree is formalized as an algebra, whose carrier sets contain the occurrences of the nodes. Based on this approach, control/data- ow graphs, can be directly speci ed and need not to be encoded with tables or continuations. In [PH97] functional speci cations of static analysis and semantics are combined with ASMs for dynamic semantics. The case-studies show, that the method scales to languages like C, and that pretty ecient interpreters can be generated. Montages , as described above can be seen as an application speci c structuring mechanism for ASMs. A language description is given by a collection of Montages, which is hierarchically structured according to the rules of the corresponding context-free grammar given in EBNF. Each Montage is a \BNF-extension-tosemantics", that is a self contained description in which the semantic properties of a given construct are formally de ned. In Fig. 3 the Montage speci cation of the select snow{ ake construct of the developed driver speci cation language CML is given. Syntactically the construct is built by the keyword \SNOW-FLAKE" and a list of value-type (given by a number $Number), attribute-name ($Ident) pairs. pairs. The purpose of the construct is to choose the corresponding attribute-name, for the currently requested value type (RequestedValueType ) and to assign it to the attribute Code of the SelectSnowFlake-node. For completeness, the Montage of the TypeAttrPair is given as well. The topmost part of the Montage is the production rule de ning the context{ free syntax. Below is a graphical representation of a pattern in the parse tree, and of the control and data ow graph. Inner nodes of the parse tree are represented with boxes and leaves with ovals. Nested boxes are used to represent nodes on lower levels of the parse tree. The solid and dotted arrows denote the data and control ow, respectively. Control ow arrows are labeled by means of boolean predicates which determine through which edges the control ows from one state to the next, e.g. in our example the predicate RequestedValueType = Self.SNumber indicates that the control is passed from a TypeAttrPair-node to a SelectSnowFlake-node whenever the number-value is equal to the requested value type. As in object-oriented languages, the Self is default, and can be omitted. For the sake of simplicity, we allow as well to omit the label in certain situations: { If the unique outgoing control arrow of a node has the label true, this label may be omitted. { If two control arrows c1 and c2 leave a node, and c1 is labeled with the negation of c2 's label, one of the labels may be omitted. The control ow arrows I (initial) and T (terminal) are special arrows which serve to plug together the local ow-information to the global one. For illustration, the global ow graph for the following example program
TypeAttrPair ::= Number" : "Ident TypeAttrPair
I
S-Ident
Attribute
T if S-Number = RequestedValueType then SnowFlake.Code := Attribute endif
SelectSnowFlake ::= "SNOW-FLAKE" fTypeAttrPair ";"g LIST S-TypeAttrPair
I
S-Number = RequestedValueType
SnowFlake
SelectSnowFlake
condition
T
for all p1, p2 in list S-TypeAttrPair (p1 # p2) = (p1.Attribute # p1.Attribute
>
Fig. 3. The select snow- ake construct
SNOW-FLAKE 1: Revenue; 2: Amount; 3: Risk;
is given in Fig. 4. To simplify the picture, the SnowFlake arrows are not drawn. I
C C
SelectSnowFlake T
1
C
Attribute not C
2 not C
3
Attribute Attribute
Revenue Amount Risk
Fig. 4. A global ow graph, where C , (number = RequestedValueType) The plugging mechanism is de ned as follows. { The functions Initial and Terminal are initialized as identity. The I and T arrows are used to rede ne them as follows: Self.Initial := Self.S-I.Initial Self.Terminal := Self.S-T.Terminal where S-I is the selector to the descendant pointed by the I arrow, and S-T is the selector to the descendant being source of the T arrow. { A solid data{ ow edge links the source with the terminal task of the target. This is useful, since the result of a construct is typically available at the end of its evaluation, e.g. at its terminal task. { A dotted control ow edge links the terminal task of its source with the initial task of its target. The third part of a Montage contains the static semantics (condition), for instance in the SelectSnowFlake (Fig. 3, the TypeAttrPair-nodes must have different number-values. The designer may make use of full rst-order logic to express context sensitive constraints. The last part contains the dynamic semantics rules, in the example the SelectSnowFlake has no dynamic semantics rule, and the dynamic semantics rule of the TypeAttrPair sets the Code attribute of the SnowFlake to the corresponding attribute.
Gem Program::= "begin" ...
HTML
Expression::= X "+" E Evaluation::= "?" E Assignment::= I ":=" E condition DefTab[I]=undef condition DefTab[I]=undef @":=": CT:=CT.NT condition DefTab[I]=undef @":=": CT:=CT.NT condition DefTab[I]=undef @":=": CT:=CT.NT @":=": CT:=CT.NT
global .asm
LaTeX
Mex Generated Montages executable
Mex Debugger
Fig. 5. The General Structure of the Gem-Mex System
4 Tool Support The development environment for Montages is given by the Gem-Mex tool [AKP97b], whose architecture is depicted in gure 5. It is a complex system which assists the designer in a number of activities related with the language life cycle, from early design to routine programmer usage. It consists of a number of interconnected components { the Graphical Editor for Montages (Gem) is a sophisticated graphical editor in which Montages can be entered; furthermore high-quality documentation can be generated; { the Montages executable generator (Mex) which automatically generates correct and ecient implementations of the language; { the generic animation and debugger tool visualizes the static and dynamical behavior of the speci ed language at a symbolic level; source programs written in the speci ed language can be animated and inspected in a visual environment. The whole development of a programming language can be supported with an eective impact on the productivity and robustness of the design. The designer can enter the speci cation, browse it and especially maintain it. Speci cations may evolve in time even in a non-monotonic way since modi cations can be localized within very neat boundaries. By doing so, dierent experimentation can take place with dierent versions of the syntax and semantics of the speci ed language in a very short time. Besides the pure editing functionality, Gem can be used to generate documents suitable for speci cation presentation. Experience suggests how lack in documentation is a dangerous bottleneck for the consistency and coherence of a project. Both, paper and online presentation of the language speci cation are automatically generated by Gem:
{ LATEX documents illustrate the Montages and the grammar; such documents are easily customizable for the non-specialist user; { HTML versions of the language speci cation allows to browse the speci ca-
tion and retrieve pieces of speci cation. Moreover, intelligibility is enhanced by means of \literate speci cation" techniques directly supported by Gem. Formal parts of the speci cation can be substituted with textual elements by means of a \literate programming" tool integrated in the system. \Literate speci cation" means that the Montages text elds may contain references to other parts of the formalization speci ed outside of the Montages modules. For the collaboration with industry, it was important, that Gem-Mex is integrated with the programming language C, e.g. the generated interpreter can be called directly from a C-program, and C-programs can be called inside Montages descriptions. In Fig. 2 we see how Gem-Mex is integrated in the driver development process. For the application to CML, the by Gem-Mex generated interpreter has shown to been ecient enough and is now directly running in the operational version of CUBIX at UBS. To provide more ecient implementations we continue to work on the integration of Montages with the Veri x approach for correct compiler construction [ZG97].
5 Integration in Software Life Cycle Before the introduction of CML the development of driver has been tangled with both the current state of CUBIX and the evolution of the underlying DBMS. To apply formal methods directly in such a frame work would not be possible. Through the introduction of a DSL, the life cycles of the driver software and the other components are less intertwined and the introduction of formal methods for the development of CML was possible even guaranteeing that the nal user needs only handle a very simple language. Such a user can concentrate on the correlation of his data-base and a cube, without worrying about CUBIX or the underlying DBMS. The CUBIX team, maintaining the CML speci cation, can easily adapt the speci cation to changes in their system or to changes in the DBMS. Finally improvements of Gem-Mex are directly bene cial for the already speci ed drivers, and for the maintenance process of CML. As well, if the drivers are ported to another system, only Gem-Mex has to be adapted.
6 Conclusions The application of the afore mentioned methods and tools in the CUBIX project led to more universal and more exible solutions for connecting relational data bases to the CUBIX data warehouse. The domain speci c language CML together with the tools supporting its application reduced the eort and skills necessary to rapidly establish and maintain data base connections signi cantly. The advances of using Gem-Mex to design, specify, implement, and document CML are:
{ Simplicity Only simple imperative updates and drawings of control and
data ow graphs are used to specify the language. Only 28 dierent symbols (for variables, macros, e.t.c.) are used in addition to the symbols introduced by the EBNF syntax-rules. The whole textual speci cation of static and dynamic semantics is 92 lines long. { Modularity and Extendibility The speci cation is split in 22 Montages. Each Montage speci es one language construct. Most information is local, and changes of one Montage do not aect the other Montages. { Self documenting The simplicity, shortness, and modularity of the speci cation make it usable as documentation as well. Using again the Montages of CML, Gem-Mex generates HTML and paper versions of the language description. The Montages approach could be learned within one day by an industrial programmer, and allows the CUBIX developer to extend the language if new problems appear. item Eciency From the Montages speci cations of CML, Gem-Mex generates an interpreter. This interpreter is ecient enough to be used in the operational version of CUBIX. Calculating the SQL queries for a report with 100 result values, using a complicate driver speci cation takes about 1.5 seconds on an Ultra30, for a simple speci cation about 0.6 seconds. { Portability Gem-Mex generates C-code, that can be compiled on arbitrary architectures and operating systems. Gem-Mex itself is only based on C and TclTk, which exist on all commercial platforms, like Windows, Windows NT, Unix. The tools can thus easily be integrated in a developer version of CUBIX. It has been shown that abstract state machines [Gur95] structed with Montages [KP97b] are well suited to de ne domain speci c languages. The formal speci cation of the language CML has been a starting point to successfully introduce formal methods into an industrial setting. In particular, formal methods specialists can read Montage models since the semantics of abstract state machines is a mathematical object. Domain experts need not learn other formal methods in order to validate the developed formalizations. Therefore, formal methods far above the complexity understandable by domain experts can be applied, if a Montage/ASM speci cation exists. Acknowledgments We would like to thank to Matthias Anlau, Alfonso Pierantonio, and Christoph Denzler for their help with Gem-Mex and their collaboration in the Montages project, and to Werner Brunner, Reto Marti, Stefan Pascheda and Matthias Seitz at UBS for their support and con dence.
References [AKP97a] M. Anlau, P. Kutter, and A. Pierantonio. Formal aspects of and development environments for montages. In ASF+SDF'97, Workshops in Computing. Springer, 1997.
[AKP97b] M. Anlau, P.W. Kutter, and A. Pierantonio. GemMex-Homepage, 1997. http://www. rst.gmd.de/ma/gemmex/. [AKP97c] M. Anlau, P.W. Kutter, and A. Pierantonio. Montages-Homepage, 1997. http://univaq.it/ alfonso/montages/Mhome.html. [BCD+ 87] P. Borra, D. Clement, T. Despeyroux, J. Incerpi, G. Kahn, B. Lang, and V. Pascual. Centaur: the system. Technical Report 777, INRIA, Sophia Antipolis, 1987. [BD96] E. Borger and I. Durdanovic. Correctness of compiling Occam to Transputer code. Computer Journal, 39(1):52{92, 1996. [BDR94a] E. Borger and D. D. Rosenzweig. The WAM { De nition and Compiler Correctness. In C. Beierle and L. Plumer, editors, Logic Programming: Formal Methods and Practical Applications, Studies in Computer Science and Arti cial Intelligence, chapter 2, pages 20{90. North-Holland, 1994. [BDR94b] E. Borger, I. Durdanovic, and D. Rosenzweig. Occam: Speci cation and Compiler Correctness. Part I: The Primary Model. In IFIP 13th World Computer Congress, Volume I: Technology/Foundations, pages 489 { 508. Elsevier, Amsterdam, 1994. [Ben86] J.L. Bentley. Programming pearls: Little languages. Communications of the ACM, 29(8):711{721, 1986. [BGM95] E. Borger, U. Glasser, and W. Muller. Formal De nition of an Abstract VHDL'93 Simulator by EA-Machines. In C. Delgado Kloos and P. T. Breuer, editors, Formal Semantics for VHDL, pages 107{139. Kluwer Academic Publishers, 1995. [BR94] E. Borger and D. Rosenzweig. A Mathematical De nition of Full Prolog. In Science of Computer Programming, volume 24, pages 249{286. NorthHolland, 1994. [BS98] E. Borger and W. Schulte. Programmer friendly modular de nition of the semantics of java. In Formal Syntax and Semantics of Java, LNCS. Springer, 1998. to appear. [CCS93] E.F. Codd, S.B. Codd, and C.T. Salley. Beyond decision support. Computerworld, 27(30), July 1993. [Dij76] E.W. Dijkstra. A Discipline of Programming. Prentice{Hall, 1976. [Gan85] H. Ganzinger. Modular rst-order speci cations of operational semantics. In H. Ganzinger and N.D. Jones, editors, Programs as Data Objects, number 217 in LNCS. Springer, 1985. [GE90] J. Grosch and H. Emmelmann. A tool box for compiler construction. Technical Report 20, GMD, Universita Karlsruhe, January 1990. [GH93] Y. Gurevich and J.K. Huggins. The Semantics of the C Programming Language, volume 702 of LNCS, pages 274 { 308. Springer Verlag, 1993. [GHL+ 92] R.W. Gray, V.P. Heuring, S.P. Levi, A.M. Sloane, and W.M. Waite. Eli: A complete, exible compiler construction system. Communications of the ACM, (35):121 { 131, February 1992. [GK97] U. Glasser and R. Karges. Abstract State Machine Semantics of SDL. Journal of Universal Computer Science, 3(12):1382{1414, 1997. [Gur88] Y. Gurevich. Logic and the challenge of computer science. In E. Boerger, editor, Current Trends in Theoretical Computer Science, pages 1 { 57. CS Press, 1988. [Gur95] Y. Gurevich. Evolving Algebras 1993: Lipari Guide. In E. Borger, editor, Speci cation and Validation Methods. Oxford University Press, 1995.
[HB88] [Hen91] [Hoa69] [Hug] [Kli93] [KP97a] [KP97b] [Kut97] [Kut98] [Mar97] [Ode89] [PDC92] [PH91] [PH97] [Sch86] [Sch97] [SS71] [Sto77] [Wal95] [Wei97] [ZG97]
R.M. Herndon and V.A. Berzins. The realizable bene ts of a language prototyping language. IEEE Transactions on Software Engineering, 14:803{ 809, 1988. M. Hennessy. The Semantics of Programming Languages: An Elementary Introduction Using Structured Operational Semantics. Wiley, New York, 1991. C.A.R. Hoare. An axiomatic basis for computer programming. Communications of the ACM, 12(10):576 { 581, 1969. J. Huggins. Abstract state machines web page. http://www.eecs.umich.edu/gasm. P. Klint. A meta{environment for generating programming environments. ACM Transactions on Software Engineering and Methodology, 2(2):176{201, April 1993. P.W. Kutter and A. Pierantonio. The formal speci cation of Oberon. J.UCS, 3(5):443 { 503, 1997. P.W. Kutter and A. Pierantonio. Montages: Speci cations of realistic programming languages. J.UCS, 3(5):416 { 442, 1997. P.W. Kutter. Dynamic semantics of the programming language Oberon. Technical Report 27, Computer Engineering and Networks Laboratory, ETH Zurich, 1997. P.W. Kutter. An ASM Macro Language for Sets. Technical Report 34, Computer Engineering and Networks Laboratory, ETH Zurich, 1998. R. Marti. Das cubix-metamodell. SBG{ORIM, 1997. Internal publication of Union Bank of Switzerland. M. Odersky. A New Approach to Formal Language De nition and its Application to Oberon. PhD thesis, ETH Zurich, 1989. T.J. Parr, H.G. Dietz, and W.E. Cohen. Pccts reference manual (version 1.00). ACM SIGPLAN Notices, pages 88 { 165, February 1992. A. Poetzsch-Heter. Formale Spezi kation der kontextabhangigen Syntax von Programmiersprachen. PhD thesis, Technische Uni. Munchen, 1991. A. Poetzsch-Heter. Prototyping realistic programming languages based on formal speci cations. Acta Informatica, 34:737{772, 1997. 1997. D.A. Schmidt. Denotational Semantics: A Methodology for Language Development. Allyn and Bacon, Inc., 1986. D.A. Schmid. On the need for a popular formal semantics. Sigplan Notices, 1997. D.S. Scott and C. Strachey. Toward a mathematical semantics for computer languages. Computers and Automata, (21):14 { 46, 1971. Microwave Research Institute Symposia. J.E. Stoy. Denotational Semantics: The Scott-Strachey Approach to Programming Language Theory. MIT Press, 1977. C. Wallace. The Semantics of the C++ Programming Language. In E. Borger, editor, Speci cation and Validation Methods, pages 131{164. Oxford University Press, 1995. D.M. Weiss. Creating domain{speci c languages: The fast process. In S. Kamin, editor, First ACM-SIGPLAN Workshop on Domain{Speci c Languages DSL'97, 1997. W. Zimmermann and T. Gaul. On the construction of correct compiler backends: An asm approach. Journal of Universal Computer Science, 3(5):504 { 567, 1997.