Automata Minimization: A case study in combining formal verification and model-driven engineering

Selma Djeddai¹, Mohamed Mezghiche², Martin Strecker¹, and Rémy Wyss³

¹ IRIT (Institut de Recherche en Informatique de Toulouse), Université de Toulouse, http://www.irit.fr/~Selma.Djeddai/
² LIMOSE, Université de Boumerdès, Faculté des Sciences, Boumerdès, Algeria
³ ONERA, Toulouse, France

Abstract. Formal methods are increasingly used for safety critical developments, not only on the application level, but also on the tools level. It thus becomes interesting to formally specify the core components of a tool, to implement verifiably correct transformations, and to manipulate these components in a graphical editor. This paper takes first steps in exploring two sides of the interplay of three essential ingredients: formal verification (with the aid of interactive proof assistants), provably correct code (in the Scala programming language), and Model Driven Engineering (with the EMF framework). Taking as example a simple form of finite automata, we show how we can formally verify the correctness of a minimization algorithm, generate executable code and animate the algorithm in a graphical editor. We provide a mapping of functional data structures, commonly used in proof assistants, to EMF, thus paving the way for eventually using automated editor generators.

Keywords: Model Driven Engineering; Model Transformation; Formal Methods; Verification

1 Introduction

Formal methods are increasingly used not only during the development and verification of application code, but also for development tools and environments. For example, there is a growing body of literature dealing with compiler verification, provably correct static analyses and program transformations. These methods are not limited to mainstream programming languages, but can also be applied to Domain Specific Languages (DSLs) that are at the heart of Model Driven Engineering (MDE) in diverse applicative areas such as embedded system design. In these contexts, MDE approaches [2,13] are exposed to requirements that are difficult to reconcile: they have to produce highly reliable software, even though the domain experts often do not have a computer science background and are used to their own, mostly graphical, notation and development methodology. The provider of an MDE environment, such as Topcased [8], is in turn confronted with the dilemma of choosing the right platform: arguably the most convincing verification environments for manipulating formal languages, specifying their semantics and carrying out correctness proofs of transformations are based on a functional programming paradigm. On the other hand, object-oriented (OO) languages are the choice when it comes to developing graphical user interfaces. And a third factor comes into play: meta-modeling, which allows specifying the class structure of an application and, increasingly, generating code stubs for syntax analyzers or graphical editors.

Fig. 1: Meta-modeling (MM), verification environments and OO languages

This article studies the interplay of these three formalisms, as illustrated in Figure 1, and thus constitutes a first step towards integrating formal verification into MDE. The development methodology we propose is further spelled out in Section 2 and compared to other approaches. Section 3 constitutes the technical core of the article; it describes a translation from data models in the functional programming world, used in verification environments, to meta-models in the EMF formalism, and a translation in the inverse direction. We illustrate the methodology in Section 4 with a toy example: automata minimization. In Section 5, we conclude with perspectives of further work.

2 Methodology: Combination of MDE and Verification

Let us now give some more details of our choices concerning the formalisms sketched in Figure 1:

– The Eclipse⁴ environment has become a quasi-standard as a software development platform and meta-modeling environment. It comes equipped with the Eclipse Modeling Framework (EMF) [3], a modeling language similar to a subset of UML, used for describing the data models of an object-oriented application. These can further be processed by tools for creating development environments for programming languages, including graphical editors (via GMF⁵).

⁴ http://www.eclipse.org
⁵ http://www.eclipse.org/gmf/

– For formal verification, we use the Isabelle system [12], which is, roughly speaking, the combination of a functional, ML-like programming language and a higher-order logic. This choice is not a necessity; instead, we could without great additional effort provide an interface to similar proof assistants such as Coq or HOL.
– The OO programming languages are Scala⁶ and Java. This choice is motivated by the fact that a code generator exporting formal developments from Isabelle to Scala already exists. Providing a code extractor for other OO languages would require considerable effort, such as re-implementing higher-order features and pattern matching.

Our formal developments (i.e. the programs we develop and verify in Isabelle) are primarily purely functional programs operating on inductively defined datatypes. A more recent extension allows us to deal with data structures containing pointers [9,10]. Using this extension, we can handle a type of references in our translation; see Section 3.2.

Due to the similarity of context-free languages and inductive datatypes, our translation from datatypes to meta-models bears a resemblance to transformations from models to text [11]. Graph transformation tools [5,7] offer some verification functionality, which is however often limited to syntactic aspects (such as confluence of transformation rules) and does not allow modeling deeper semantic properties (such as an operational semantics of a programming language and proofs by bisimulation). For the Focal specification environment, a mapping to UML [6] has been defined, which however differs essentially from our datatype-centric approach.

3 From Datatypes to Meta-Models and Back again

3.1 Context

In this part, we present in detail how we performed the automatic translation from functional datatypes to meta-models, and back. MDE offers several formalisms for meta-modeling, including the Unified Modeling Language (UML) and Ecore, the modeling language provided by the Eclipse Modeling Framework (EMF) [3]. Although the two formalisms have the same descriptive power, their components and graphical interfaces differ. We chose to describe meta-models in Ecore for reasons of integrability: we aim to integrate our work into Eclipse. The translator is developed in Java, on Eclipse, which provides the means to create and exploit an Ecore meta-model. The first direction of translation is from functional datatypes to Ecore meta-models: the input is a sequence of datatypes, which are translated to compositions of Ecore components. This is further developed in Section 3.2. The reverse direction of translation is presented in Section 3.3.

⁶ http://www.scala-lang.org/

3.2 Datatypes to MM

A type declaration in functional programming consists of a sequence of datatype definitions. We are not yet able to handle all the syntactic features provided by functional programming; we base our work on the subset defined by the grammar presented in Figure 2. Each datatype is composed of one or more constructor declarations. A declaration consists of a constructor that takes as arguments some (optional) type expressions. This subset does not include mutable data structures such as arrays, higher-order functions, or type constraints.

type-definition ::= datatype [type-prm] typeconstr = constr-decl { | constr-decl }*
constr-decl     ::= constr-name [{typexpr}*]
typexpr         ::= type | type list | type option | type ref
type            ::= [type-prm] typeconstr | primitive-type | type-prm
primitive-type  ::= nat | bool | string

Fig. 2: Functional Grammar
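This subset of datatype definitions can itself be represented as a small abstract syntax. The following Java sketch mirrors the grammar of Figure 2 under our own (hypothetical) naming — the translator's actual internal classes are not shown in the paper, so this is only an illustration; the typexprs are kept as plain strings for brevity:

```java
import java.util.List;
import java.util.Optional;

public class GrammarSketch {
    // constr-decl ::= constr-name [{typexpr}*]  (typexprs kept as strings here)
    record ConstrDecl(String name, List<String> typexprs) {}

    // type-definition ::= datatype [type-prm] typeconstr = constr-decl { | constr-decl }*
    record TypeDefinition(Optional<String> typePrm, String typeConstr,
                          List<ConstrDecl> constrs) {}

    // The expr datatype of Figure 3, encoded in this representation:
    static final TypeDefinition EXPR_DEF = new TypeDefinition(
        Optional.empty(), "expr",
        List.of(new ConstrDecl("Add", List.of("expr", "expr")),
                new ConstrDecl("Vars", List.of("string")),
                new ConstrDecl("Consts", List.of("nat"))));
}
```

A datatype with a type parameter, such as 's state later in the paper, would carry Optional.of("'s") in the typePrm field.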

Datatype definitions do not provide all the information needed. Among others, there are no named accessors to the components of datatypes. We therefore define a syntax introducing accessors through the annotation (*@ accessor *), as follows:

    (*@ accessor *)
    fun acc-name_ij :: typeconstr => typexpr_ij where
      acc-name_ij (constr-name_i (x_i1, ..., x_iq)) = x_ij

Parsing supplies us with a data structure of datatypes and, for each constructor, a list of accessor names acs-list(constr-name) that is directly usable in the translation function. It is represented as a context Γ(constr-name) in the translation function Tr_decl.

The transformation function Tr() takes datatype structures as input and produces the corresponding meta-model:

    Tr : DataTypes → Ecore meta-model

Depending on the form of the datatype, Tr() behaves differently. For datatypes composed only of constr-names (without typexprs), the datatype is translated to an EEnum, which is usually employed to model enumerated types:

    Tr(tp-constr = cn_1 | ... | cn_p) = createEEnum(); setName(tp-constr);
                                        Tr_c-name(cn_i)        / 1 ≤ i ≤ p
    Tr_c-name(cn_i)                   = EEnumLiteral(cn_i)     / 1 ≤ i ≤ p
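The EEnum case of Tr() can be sketched concretely. In the following Java illustration, EEnum and EEnumLiteral are plain stand-ins of our own — the real translator manipulates the EMF classes of org.eclipse.emf.ecore (via EcoreFactory) rather than these records:

```java
import java.util.List;

public class EnumTranslation {
    // Stand-ins for the EMF classes; names and shapes are our assumption.
    record EEnumLiteral(String name) {}
    record EEnum(String name, List<EEnumLiteral> literals) {}

    // Tr for a datatype made of constructor names only:
    // each constructor cn_i becomes one EEnumLiteral of the EEnum.
    static EEnum trEnum(String typeConstr, List<String> constrNames) {
        return new EEnum(typeConstr,
                         constrNames.stream().map(EEnumLiteral::new).toList());
    }
}
```

For instance, a datatype `color = Red | Green | Blue` would map to an EEnum named color with three literals.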

When constructor declarations are composed of constructor names and type expressions, the translation proceeds differently. First, an EClass is created to represent the type constructor (typeconstr). Then, for each constructor, a further EClass is created, which inherits from the typeconstr one:

    Tr(tp-constr = cd_1 | ... | cd_n)  = createEClass(); setName(tp-constr);
                                         Tr_decl(Γ(cd_i), cd_i)     / 1 ≤ i ≤ n
    Tr_decl(Γ(cn_i), cn_i t_1 ... t_m) = createEClass(); setName(cn_i);
                                         setSuperType(EClass(tp-constr));
                                         Tr_type(γ(t_j), t_j)       / 1 ≤ j ≤ m

    where Γ(cn_i t_1 ... t_m) = {acs_1, ..., acs_m} =⇒ γ(t_j) = acs_j, 1 ≤ j ≤ m

In case the datatype definition is polymorphic, i.e. composed of a type parameter and a datatype constructor, it is translated to a generic type in the meta-model: an EClass is created and set as an EGenericType, and the type parameter becomes an ETypeParameter. The constructor declarations are processed exactly as in the previous case. We noticed during development that the EGenericType is not explicitly represented in the Ecore diagram: although it is correctly present in the Ecore file, the graphical interface does not display it.

    Tr(tp-prm tp-constr = cd_1 | ... | cd_n) = createEClass(); setName(tp-constr);
                                               Tr_decl(Γ(cd_i), cd_i)   / 1 ≤ i ≤ n
                                               createEGenericType(); setEClassifier(EClass);
                                               createETypeParameter(); setName(tp-prm);
                                               setETypeParameter(tp-prm)

To translate type expressions, several cases have to be distinguished. For a primitive type, the translation function generates a new EAttribute in the corresponding EClass. For a previously defined datatype, the program creates a containment link between the current EClass and the EClass referring to the datatype, and sets the cardinality to 1:

    Tr_type(asc, primitiveType)    = createEAttribute(); setName(asc);
                                     setType(primitiveType_EMF)
    Tr_type(asc, tp-constr)        = createEReference(); setName(asc); setType(tp-constr);
                                     setContainment(true); setLowerBound(1); setUpperBound(1)
    Tr_type(asc, tp-prm tp-constr) = createEReference(); setName(asc); setType(tp-constr)
    Tr_type(asc, tp-prm)           = createEReference(); setName(asc); setType(tp-prm)

The type expressions can also appear in the form of a type list; in this case the cardinality changes to 0..*.

The type expression type option is used in functional programming to express whether a value is present: it yields None if the value is absent and Some value if it is present. Accordingly, we change the cardinality to 0..1. The last case we deal with is type ref, which is used to represent pointers; it is translated to references without containment:

    Tr_type(asc, t ref)    = Tr_type(asc, t); setContainment(false)
    Tr_type(asc, t list)   = Tr_type(asc, t); setLowerBound(0); setUpperBound(*)
    Tr_type(asc, t option) = Tr_type(asc, t); setLowerBound(0); setUpperBound(1)

In Figure 4, we present the result of applying our translation function Tr() to the functional datatype definition of a simple arithmetic expression (see Figure 3).

datatype expr = Add expr expr | Vars string | Consts nat

(*@ accessor *)
fun name :: expr ⇒ string where
  name (Vars v) = v

(*@ accessor *)
fun val :: expr ⇒ nat where
  val (Consts c) = c

(*@ accessor *)
fun expr1 :: expr ⇒ expr where
  expr1 (Add e1 _) = e1

(*@ accessor *)
fun expr2 :: expr ⇒ expr where
  expr2 (Add _ e2) = e2

Fig. 3: Datatype Expression

Fig. 4: The generated Ecore-diagram file
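The wrapper cases of Tr_type (plain datatype, list, option, ref) differ only in the containment flag and the multiplicity bounds they set on the generated reference. The following Java sketch condenses this mapping; the record and method names are our own, not the translator's, and upper == -1 stands for *:

```java
public class TypeWrapperSketch {
    // Condensed view of the reference produced for an accessor.
    record RefSketch(String name, boolean containment, int lower, int upper) {}

    static RefSketch trType(String accessor, String wrapper) {
        return switch (wrapper) {
            case "plain"  -> new RefSketch(accessor, true, 1, 1);   // datatype: containment, 1..1
            case "list"   -> new RefSketch(accessor, true, 0, -1);  // t list:   0..*
            case "option" -> new RefSketch(accessor, true, 0, 1);   // t option: 0..1
            case "ref"    -> new RefSketch(accessor, false, 1, 1);  // t ref: non-containment
            default       -> throw new IllegalArgumentException(wrapper);
        };
    }
}
```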

3.3 MM to Datatypes

In the opposite direction, we developed in Java a function that generates datatype definitions from Ecore meta-models. To browse these meta-models, we use the package org.eclipse.emf.ecore.xmi, which provides a parser giving access to the different Ecore components such as EClassifiers, EStructuralFeatures, etc. The translation function is almost the inverse of the one presented in Section 3.2: we reverse the transformation rules to obtain the corresponding datatype components. For example, to translate an EEnum, we first get all the EClassifiers, check their instances, and transform the EEnum into a datatype composed only of constructor names, which are the translations of its EEnumLiterals.
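The EEnum case of this reverse direction amounts to simple text generation. A hedged sketch (the method name is ours; the paper's translator works on the actual EMF objects rather than on strings):

```java
import java.util.List;

public class EnumToDatatype {
    // An EEnum named tp with literals cn_1 .. cn_p is printed back as
    //   datatype tp = cn_1 | ... | cn_p
    static String enumToDatatype(String enumName, List<String> literals) {
        return "datatype " + enumName + " = " + String.join(" | ", literals);
    }
}
```

Applied to an EEnum color with literals Red, Green, Blue, this yields `datatype color = Red | Green | Blue`, recovering the input shape of Section 3.2.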

4 Case study: Automata Minimization

4.1 Informal presentation

In this part, we present a case study that is small enough to illustrate the essential concepts, while facing the same kinds of challenges as larger formal verifications. We have implemented and verified essential parts of Moore's automata minimization algorithm (Section 4.2). Using Isabelle's code extraction facilities, we obtain a verified algorithm in Scala that can be combined with a graphical user interface. We can similarly apply our meta-model conversion facilities to derive a meta-model of the data structures used in our algorithm (Section 4.3).

Given a deterministic finite automaton (DFA), minimization (see [1] for a survey) consists in finding a DFA having fewer states but accepting the same language. We use here a cut-down version where the transitions are not indexed by the letters of an alphabet; thus, our automata essentially count the number of characters of a string. For example, the automaton in Figure 5a accepts exactly the strings of odd length. So does the smaller automaton in Figure 5b, resulting from merging the states S1 and S3, and S2 and S4, respectively.

4.2 Verification

To give a flavour of our formalization, we present here the essential steps. The only other formalization we are aware of has been carried out in the NuPrl system [4]; it differs considerably from our development, since the algorithm is not given explicitly but extracted from a constructive proof. The curious reader can consult the details in an accompanying document on the authors' web pages⁷.

There is one essential point we want to illustrate: verification often involves auxiliary notions (in our case: the language accepted by an automaton) that are usually not included in a meta-model. Verification is

⁷ http://www.irit.fr/~Martin.Strecker/Publications/medi2011.html

essentially based on these auxiliary notions and cannot be carried out by an a posteriori check with OCL or similar mechanisms.

We define automata classically (also refer to the meta-model in Figure 6) as consisting of a list of states and a list of transitions. Each state has a state id (needed for our treatment of references), Boolean flags indicating whether the state is initial or final, and a generic contents field. Transitions have two references to states (the source and the destination of the transition). For reasons of space, we only display some of the accessors:

    datatype 's state = State nat bool bool 's

    (*@ accessor *)
    fun contents :: 's state ⇒ 's where
      contents (State _ _ _ c) = c

    datatype 's transition = Transition ('s state ref) ('s state ref)

    (*@ accessor *)
    fun src :: 's transition ⇒ 's state where
      src (Transition (Ref s) _) = s

    datatype 's automaton = Automaton ('s state list) ('s transition list)

The format that we use internally for carrying out the verification, called pre-automaton, is slightly different, but the two formats are easily interconvertible. A notion that is important for defining the correctness of our transformation is the language accepted by an automaton. We obtain it by first defining recursively that a word of a certain length is accepted in a state (accepts-at). From this, we define the language accepted in a state (language-at) and the language of the automaton:

    fun accepts-at :: 's pre-automaton ⇒ 's ⇒ nat ⇒ bool where
      accepts-at aut s 0 = (s ∈ final-states aut)
    | accepts-at aut s (Suc n) = (∃ s' ∈ ((transitions aut) `` {s}). accepts-at aut s' n)

    definition language-at :: 's pre-automaton ⇒ 's ⇒ nat set where
      language-at aut s = {n. accepts-at aut s n}

    definition language :: 's pre-automaton ⇒ nat set where
      language aut = (⋃ s ∈ (initial-states aut). language-at aut s)
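The recursive structure of accepts-at survives code extraction essentially unchanged. As a hand-written illustration over int state ids (our own coding and naming, not the code generated by Isabelle):

```java
import java.util.Set;

public class AcceptsAt {
    // Sketch of a pre-automaton; field names mirror the Isabelle
    // definitions but the representation is our assumption.
    record Trans(int src, int dst) {}
    record PreAutomaton(Set<Integer> states, Set<Integer> initials,
                        Set<Integer> finals, Set<Trans> transitions) {}

    // accepts-at: a word of length n is accepted starting in state s.
    static boolean acceptsAt(PreAutomaton aut, int s, int n) {
        if (n == 0) return aut.finals().contains(s);
        return aut.transitions().stream()
                  .anyMatch(t -> t.src() == s && acceptsAt(aut, t.dst(), n - 1));
    }

    // The odd-length automaton of Figure 5a: S1 -> S2 -> S3 -> S4 -> S1,
    // initial state S1, final states S2 and S4 (states encoded as 1..4).
    static final PreAutomaton ODD = new PreAutomaton(
        Set.of(1, 2, 3, 4), Set.of(1), Set.of(2, 4),
        Set.of(new Trans(1, 2), new Trans(2, 3), new Trans(3, 4), new Trans(4, 1)));
}
```

On the ODD automaton, acceptsAt holds exactly for odd lengths, matching the informal description in Section 4.1.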

The algorithm presented below calculates progressively finer partitions of "equivalent" states. It first distinguishes only final and non-final states (thus, states that behave the same for words of length 0). It then recursively splits partitions equivalent for words of length up to h by traversing the transitions backwards, obtaining partitions equivalent for words of length up to h + 1. This notion of equivalence is called Moore equivalence; the related notion of Nerode equivalence identifies states accepting the same language:

    definition moore-eq :: 's pre-automaton ⇒ nat ⇒ ('s ∗ 's) set where
      moore-eq aut h = {(p, q). p ∈ states aut ∧ q ∈ states aut ∧
                        language-at-bounded aut p h = language-at-bounded aut q h}

    definition nerode-eq :: 's pre-automaton ⇒ ('s ∗ 's) set where
      nerode-eq aut = {(p, q). p ∈ states aut ∧ q ∈ states aut ∧
                       language-at aut p = language-at aut q}

And indeed, the two concepts are related, in the sense that if the function moore-rec below reaches a fixed point for a value h, then moore-eq coincides with nerode-eq. This result is subject to some preconditions, among others that the automaton is deterministic and has an outgoing transition for each state:

    theorem nerode-eq-moore-eq-fixp:
      Field (transitions aut) ⊆ states aut ∧ deterministic aut ∧
      rel-complete (states aut) (transitions aut) =⇒
      moore-eq aut h = moore-eq aut (Suc h) =⇒
      nerode-eq aut = moore-eq aut h

Even without spelling out all the details, the definition of a quotient automaton with respect to an equivalence relation r should be clear: for a given pre-automaton, we form the automaton whose state set and initial and final state sets are factorized by r, and analogously the transitions. The following theorem then shows that the language accepted by the automaton factorized by nerode-eq is the same as the language of the original automaton:

    fun quotient-aut :: 's pre-automaton ⇒ ('s ∗ 's) set ⇒ 's set pre-automaton where
      quotient-aut (PreAutomaton sts ists fsts trs) r =
        (PreAutomaton (sts // r) (ists // r) (fsts // r) (quotient-rel trs r))

    theorem language-quotient:
      deterministic aut ∧ wf-pre-automaton aut =⇒
      language (quotient-aut aut (nerode-eq aut)) = language aut

It remains to effectively compute the Nerode equality, by computing its characteristic partition (function minim-partition). We do this by computing the above-mentioned fixed point, starting from an initial partition. The recursion moore-rec itself is straightforward (tr is the set of transitions, PP the partition at level h as described above, PP' the partition at level h + 1). Most of the complexity is hidden behind the function split-partition, which performs the non-trivial computations that justify a formal treatment:

    type-synonym 'a partition = 'a set set

    definition split-partition :: ('a ∗ 'a) set ⇒ 'a partition ⇒ 'a partition where
      split-partition tr PP =
        (⋃ P ∈ PP. ⋃ x ∈ P.
           ⋃ P' ∈ {if x ∈ tr⁻¹ `` Q then tr⁻¹ `` Q else {} | Q. Q ∈ PP}. {P ∩ P'})

    function moore-rec :: ('a ∗ 'a) set ⇒ 'a partition ⇒ 'a partition where
      moore-rec tr PP = (let PP' = split-partition tr PP in
                         if PP = PP' then PP else moore-rec tr PP')

    definition initial-partition :: 'a pre-automaton ⇒ 'a partition where
      initial-partition aut = {final-states aut, states aut − final-states aut} − {{}}

    definition minim-partition :: 'a pre-automaton ⇒ 'a partition where
      minim-partition aut = moore-rec (transitions aut) (initial-partition aut)
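The refinement loop can be sketched in Java as follows. This is our own hand-written coding, not the extracted code, and it assumes — like the preconditions of theorem nerode-eq-moore-eq-fixp — a deterministic automaton with an outgoing transition for every state:

```java
import java.util.HashSet;
import java.util.Set;

public class Minimization {
    record Trans(int src, int dst) {}

    // tr^-1 `` block: all sources of transitions ending in the block.
    static Set<Integer> preImage(Set<Trans> tr, Set<Integer> block) {
        Set<Integer> pre = new HashSet<>();
        for (Trans t : tr) if (block.contains(t.dst())) pre.add(t.src());
        return pre;
    }

    // split-partition: cut every block P by the pre-images of all blocks.
    static Set<Set<Integer>> splitPartition(Set<Trans> tr, Set<Set<Integer>> pp) {
        Set<Set<Integer>> result = new HashSet<>();
        for (Set<Integer> p : pp)
            for (Set<Integer> q : pp) {
                Set<Integer> refined = new HashSet<>(p);
                refined.retainAll(preImage(tr, q));
                if (!refined.isEmpty()) result.add(refined);
            }
        return result;
    }

    // moore-rec: iterate split-partition up to the fixed point.
    static Set<Set<Integer>> mooreRec(Set<Trans> tr, Set<Set<Integer>> pp) {
        Set<Set<Integer>> pp2 = splitPartition(tr, pp);
        return pp2.equals(pp) ? pp : mooreRec(tr, pp2);
    }

    // minim-partition: start from {final states, non-final states} - {{}}.
    static Set<Set<Integer>> minimPartition(Set<Integer> states,
                                            Set<Integer> finals, Set<Trans> tr) {
        Set<Integer> nonFinals = new HashSet<>(states);
        nonFinals.removeAll(finals);
        Set<Set<Integer>> init = new HashSet<>();
        if (!finals.isEmpty()) init.add(finals);
        if (!nonFinals.isEmpty()) init.add(nonFinals);
        return mooreRec(tr, init);
    }
}
```

On the odd-length automaton of Figure 5a (states 1..4, finals {2, 4}, transitions 1→2→3→4→1), the fixed point is reached immediately and yields the two blocks {1, 3} and {2, 4} — exactly the merges shown in Figure 5b.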

Our presentation, which necessarily had to stay sketchy, should convey a feeling for the interaction between the notions defined in a meta-model and the notions required for verification.

4.3 Graphical display

Once the code is verified with Isabelle, we automatically generate the corresponding Scala code. For displaying the automata, we use the Java graph library JGraphX⁸. The fact that Scala compiles to bytecode, like Java, provides strong interoperability between the two languages: Java libraries can be called from Scala code simply by adding jgraphx.jar to the classpath. The library enables, among other things, displaying graphs: it offers methods to define the vertices and edges of a graph, and also allows specific styles to be attached to graph components. For our example, we define one style for final states and another for non-final ones. We call the display function before and after the minimization: Figure 5a presents the graphical display of the automaton before the minimization, and Figure 5b the minimized one.

(a) Source automaton

(b) Minimized automaton

Fig. 5: Displayed automata before and after minimization

Figure 6 represents the result of applying our translation algorithm, described in Section 3.2, to the datatypes used for verifying the minimization algorithm (presented in Section 4.2). As mentioned earlier, generic types do not appear in the diagram (Figure 6b); the EGenericType is presented there as a simple class. We therefore show in Figure 6a the Ecore file displayed in Eclipse, where the generic types are clearly represented: State is parametrized by the type parameter S.

(a) Ecore Model        (b) Ecore Diagram

Fig. 6: Ecore Model and Diagram

⁸ http://www.jgraph.com/jgraph.html

5 Conclusions

Our work constitutes a first step towards a combination of formal verification and model-driven engineering. This article has essentially been concerned with two sides of the triangle of Figure 1: starting from a formal verification, we have on the one hand been able to derive an executable algorithm that can be smoothly integrated into the object-oriented world and coupled with graphical user interfaces. On the other hand, we generate an EMF meta-model that can be further manipulated by the Eclipse workbench. So far, the meta-model is useful for purposes of documentation, but we hope to draw further benefit from it by including the third side of the triangle of Figure 1, for example by generating editors with the aid of Eclipse GMF, which allows avoiding, as far as possible, the manual coding of interfaces as in Section 4.3. Making this round trip is not entirely straightforward, because one has to ensure that the code generated from the meta-model is compatible with the code extracted by the verification engine. We will investigate some options, such as adapting the code extractor, or in addition generating glue code for bridging possibly incompatible data models.

We intend to apply our method to larger developments, such as creating environments for manipulating Domain Specific Languages. Here, we face challenges of the same nature as in Section 4.2: the languages have a complex operational or denotational semantics; some procedures, such as type checking with non-standard type systems, require complex inferences that have to be justified with respect to the semantics; and refinement proofs (for example, for compiler correctness) can barely be carried out on the level of instances.

Currently, our approach is verification-centric: the starting point is the verification environment, from which we derive executable code and the meta-model. When dealing with more substantial applications, we will see whether this method scales up to larger developments, or whether the focus shifts to the meta-model.

Acknowledgments We are grateful to Mathieu Giorgino for fruitful discussions about modeling of imperative algorithms and code extraction in Isabelle.

References

1. Jean Berstel, Luc Boasson, Olivier Carton, and Isabelle Fagnot. Minimization of automata. CoRR, abs/1010.5318, 2010.
2. Jean Bézivin. Model driven engineering: An emerging technical space. In Ralf Lämmel, João Saraiva, and Joost Visser, editors, Generative and Transformational Techniques in Software Engineering, volume 4143 of Lecture Notes in Computer Science, pages 36-64. Springer Berlin / Heidelberg, 2006.
3. Frank Budinsky, Stephen A. Brodsky, and Ed Merks. Eclipse Modeling Framework. Pearson Education, 2003.
4. Robert L. Constable, Paul B. Jackson, Pavel Naumov, and Juan Uribe. Formalizing automata theory I: Finite automata. Unpublished manuscript, Cornell University, 1996.
5. Juan de Lara and Hans Vangheluwe. Using AToM3 as a meta-CASE tool. In ICEIS, pages 642-649, 2002.
6. David Delahaye, Jean-Frédéric Étienne, and Véronique Viguié Donzeau-Gouge. A formal and sound transformation from Focal to UML: An application to airport security regulations. In UML and Formal Methods (UML&FM), Innovations in Systems and Software Engineering (ISSE) NASA Journal, Kitakyushu-City (Japan), October 2008. Springer.
7. Karsten Ehrig, Claudia Ermel, Stefan Hänsgen, and Gabriele Taentzer. Generation of visual editors as Eclipse plug-ins. In Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering, ASE '05, pages 134-143, New York, NY, USA, 2005. ACM.
8. Patrick Farail, Pierre Gaufillet, Agusti Canals, Christophe Le Camus, David Sciamma, Pierre Michel, Xavier Crégut, and Marc Pantel. The TOPCASED project: a Toolkit in Open source for Critical Aeronautic SystEms Design. In European Congress on Embedded Real-Time Software (ERTS), Toulouse, 25/01/2006 to 27/01/2006. Société des Ingénieurs de l'Automobile, January 2006.
9. Mathieu Giorgino and Martin Strecker. BDDs verified in a proof assistant (preliminary report). In Proceedings TAAPSD'2010, Univ. Taras Shevchenko, Kiev, 2010.
10. Mathieu Giorgino, Martin Strecker, Ralph Matthes, and Marc Pantel. Verification of the Schorr-Waite algorithm: from trees to graphs. In 20th International Symposium on Logic-Based Program Synthesis and Transformation (LOPSTR), LNCS. Springer Verlag, July 2010.
11. Pierre-Alain Muller, Frédéric Fondement, Franck Fleurey, Michel Hassenforder, Rémi Schneckenburger, Sébastien Gérard, and Jean-Marc Jézéquel. Model-driven analysis and synthesis of textual concrete syntax. Software and System Modeling, 7(4):423-441, 2008.
12. Tobias Nipkow, Lawrence Paulson, and Markus Wenzel. Isabelle/HOL. A Proof Assistant for Higher-Order Logic. LNCS 2283. Springer Verlag, 2002.
13. Bran Selic. The pragmatics of model-driven development. IEEE Software, 20(5):19-25, 2003.
