UNU/IIST International Institute for Software Technology
2nd International Workshop on Formal Aspects of Component Software (FACS'05)
UNU-IIST, Macao, October 24-25, 2005
Preliminary Proceedings
Editors: Luís Barbosa and Zhiming Liu
October 2005
UNU-IIST Report No. 333
UNU-IIST and UNU-IIST Reports

UNU-IIST (United Nations University International Institute for Software Technology) is a Research and Training Centre of the United Nations University (UNU). It is based in Macau, and was founded in 1991. It started operations in July 1992. UNU-IIST is jointly funded by the Governor of Macau and the governments of the People's Republic of China and Portugal through a contribution to the UNU Endowment Fund. As well as providing two-thirds of the endowment fund, the Macau authorities also supply UNU-IIST with its office premises and furniture and subsidise fellow accommodation.

The mission of UNU-IIST is to assist developing countries in the application and development of software technology. UNU-IIST contributes through its programmatic activities:

1. Advanced development projects, in which software techniques supported by tools are applied,
2. Research projects, in which new techniques for software development are investigated,
3. Curriculum development projects, in which courses of software technology for universities in developing countries are developed,
4. University development projects, which complement the curriculum development projects by aiming to strengthen all aspects of computer science teaching in universities in developing countries,
5. Schools and Courses, which typically teach advanced software development techniques,
6. Events, in which conferences and workshops are organised or supported by UNU-IIST, and
7. Dissemination, in which UNU-IIST regularly distributes to developing countries information on international progress of software technology.

Fellows, who are young scientists and engineers from developing countries, are invited to actively participate in all these projects. By doing the projects they are trained.

At present, the technical focus of UNU-IIST is on formal methods for software development. UNU-IIST is an internationally recognised centre in the area of formal methods. However, no software technique is universally applicable. We are prepared to choose complementary techniques for our projects, if necessary.

UNU-IIST produces a report series. Reports are either Research (R), Technical (T), Compendia (C) or Administrative (A). They are records of UNU-IIST activities and research and development achievements. Many of the reports are also published in conference proceedings and journals. Please write to UNU-IIST at P.O. Box 3058, Macau or visit UNU-IIST's home page: http://www.iist.unu.edu, if you would like to know more about UNU-IIST and its report series.
G. M. Reed, Director
UNU/IIST International Institute for Software Technology
P.O. Box 3058 Macau
2nd International Workshop on
Formal Aspects of Component Software (FACS'05)
UNU-IIST, Macao, October 24-25, 2005
Preliminary Proceedings
Editors: Luís Barbosa and Zhiming Liu

Abstract
This compendium consists of the papers presented at the FACS'05 workshop. It serves as the preliminary proceedings, produced for the convenience of the workshop participants; the official proceedings will be published after the workshop in Electronic Notes in Theoretical Computer Science.
Copyright © 2005 by UNU-IIST, Editors: Luís Barbosa and Zhiming Liu
Electronic Notes in Theoretical Computer Science
2nd International Workshop on
Formal Aspects of Component Software FACS’05
UNU-IIST, Macao October 24-25, 2005
Preliminary Proceedings
Guest Editors: Luís Barbosa and Zhiming Liu
Contents

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Programme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Committees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Remi Bastide, Eric Barboni
Software Components: A Formal Semantics Based on Coloured Petri Nets . . . 1

Grant Malcolm
Component-based Specification of Distributed Systems . . . 21

Silvia Amaro, Ernesto Pimentel, Ana M. Roldan
Reo Based Interaction Model . . . 37

Tomas Barros, Ludovic Henrio, Eric Madelaine
Verification of Distributed Hierarchical Components . . . 51

Samir Chouali, Maritta Heisel, Jeanine Souquieres
Proving Component Interoperability with B Refinement . . . 67

Ronald Middelkoop, Cornelis Huizing, Ruurd Kuiper, Erik Luit
Cooperation-Based Invariants for OO Languages . . . 85

Jan Carlsson, John Håkansson, Paul Pettersson
SaveCCM: An Analysable Component Model for Real-Time Systems . . . 101

Françoise Bellegarde, Jacques Julliand, Hassan Mountassir, Emilie Oudot
On the Contribution of a tau-Simulation in the Incremental Modeling of Timed Systems . . . 117

Ivanilton Polato, Antonio Silva Filho
A Component-based Approach to Embedded Software Design . . . 133

M. Victoria Cengarle, Peter Graubmann, Stefan Wagner
Semantics of UML 2.0 Interactions with Variabilities . . . 153

Yan Zhang, Jun Hu, Xiaofeng Yu, Tian Zhang, Xuandong Li, Guoliang Zheng
Deriving Available Behaviour All Out from Incompatible Component Compositions . . . 169

David Streader, Steve Reeves
Stepwise Refinement of Processes . . . 183

J.J.M.M. Rutten
Algebraic Specification and Coalgebraic Synthesis of Mealy Automata . . . 199

Bernhard Schätz
Building Components from Functions . . . 215

Nuno Rodrigues, Luís Barbosa
Component Identification Through Program Slicing . . . 231

Lian Wen, Geoff Dromey
Architecture Normalization for Component-Based System . . . 247

Pavel Jezek, Jan Kofron, Frantisek Plasil
Model Checking of Component Behavior Specifications: A Real Life Experience . . . 263

Abbas Heydar Noori, Farhad Mavaddat, Farhad Arbab
Towards an Automated Deployment Planner for Composition of Web Services as Software Components . . . 279
Preface

By the end of the last century, component-based software had emerged as a promising paradigm for dealing with the ever-increasing need to master systems' complexity, evolution and reuse, and to drive software engineering towards sound production and engineering standards. It soon became a popular technology, however, long before consensual definitions and principles, let alone formal foundations, had been put forward. The quest for suitable mathematical models for components, their interaction and composition, as well as for rigorous approaches to verification, deployment, testing and certification, remains an open research question and a challenging opportunity for formal methods. Moreover, new challenges are raised by applications to non-conventional areas, such as safety-critical, mobile, and/or embedded systems.

In this context, the objective of FACS'05 was to bring together researchers in the areas of component software and formal methods to promote a deeper understanding of this paradigm and its applications. In particular, the workshop was interested in defining the common aspects of components and component-based development. The collection of papers in this volume addresses a large number of crucial issues in this effort: from component models, their analysis and verification, to the semantics of interaction, architectural issues, non-conventional application areas and case studies.

Organised by the International Institute for Software Technology of the United Nations University, FACS'05 was held in Macao on 24-25 October 2005. The scientific programme included 18 regular papers and 4 invited talks, by Farhad Arbab, Paolo Ciancarini, Jifeng He and Rolf Hennicker. FACS'05 was the second in a series of workshops founded by UNU-IIST, whose first edition was held in Pisa, Italy, in September 2003, co-located with FM'03.
We would like to thank all the researchers who submitted their work to the workshop, and all colleagues who served on the Programme Committee and worked very hard, to a tight deadline, on the reviews. Our gratitude extends, of course, to the invited speakers for their willingness to come and present their research and share their own perspectives on formal methods for component software. The FACS'05 proceedings will be published in the series Electronic Notes in Theoretical Computer Science. We are grateful to Mike Mislove, Managing Editor of the series, for his continuing support. Without the support of UNU-IIST, the workshop could not have happened. We would like to thank the director, Mike Reed, and the members of staff for their great help. In particular, we would like to thank Wendy Hoi, who put a lot of effort into the arrangement of accommodation and meals for the participants, Kitty Chan and Raymond Hoi, who have been maintaining the workshop management system, and Michelle Ho, for dealing with the online registration.
Zhiming Liu and Luís Barbosa
Programme

Monday, 24th October

09.00  Welcome: Mike Reed (UNU-IIST Director)
09.30  Invited Talk: Farhad Arbab (CWI, Leiden University and University of Waterloo)
       Reo: A Coordination Model for Component Composition
10.30  Coffee Break
11.00  Session 1 [Component Models]
       • Software Components: A Formal Semantics Based on Coloured Petri Nets (Remi Bastide, Eric Barboni, Université Toulouse 1, France)
       • Component-based Specification of Distributed Systems (Grant Malcolm, University of Liverpool, UK)
       • Reo Based Interaction Model (Silvia Amaro, Ernesto Pimentel, Ana Roldan, Comahue University, Argentina)
12.30  Lunch
14.00  Invited Talk: He Jifeng (UNU-IIST)
       Towards a Theory of Components and their Composition
15.00  Session 2 [Verification]
       • Verification of Distributed Hierarchical Components (Tomas Barros, Ludovic Henrio, Eric Madelaine, INRIA Sophia Antipolis, France)
       • Proving Component Interoperability with B Refinement (Samir Chouali, Maritta Heisel, Jeanine Souquieres, University of Nancy 2, France, and University of Duisburg-Essen, Germany)
       • Cooperation-Based Invariants for OO Languages (Ronald Middelkoop, Cornelis Huizing, Ruurd Kuiper, Erik Luit, TU Eindhoven, The Netherlands)
16.30  Coffee Break
17.00  Session 3 [Components for Timed and Embedded Systems]
       • SaveCCM: An Analysable Component Model for Real-Time Systems (Jan Carlsson, John Håkansson, Paul Pettersson, Uppsala University, Sweden)
       • On the Contribution of a tau-Simulation in the Incremental Modeling of Timed Systems (Françoise Bellegarde, Jacques Julliand, Hassan Mountassir, Emilie Oudot, LIFC, France)
       • A Component-based Approach to Embedded Software Design (Ivanilton Polato, Antonio Silva Filho, Recife Center for Advanced Studies and Systems, Brazil)
18.30  End of Day 1
20.00  Workshop Dinner

Tuesday, 25th October

09.00  Invited Talk: Paolo Ciancarini (Università di Bologna)
       On the Education of Future Software Engineers
10.00  Coffee Break
10.30  Session 4 [Behaviour and Interaction]
       • Semantics of UML 2.0 Interactions with Variabilities (M. Victoria Cengarle, Peter Graubmann, Stefan Wagner, TU München, Germany)
       • Deriving Available Behaviour All Out from Incompatible Component Compositions (Yan Zhang, Jun Hu, Xiaofeng Yu, Tian Zhang, Xuandong Li, Guoliang Zheng, Nanjing University, China)
       • Stepwise Refinement of Processes (David Streader, Steve Reeves, University of Waikato, New Zealand)
       • Algebraic Specification and Coalgebraic Synthesis of Mealy Automata (J.J.M.M. Rutten, CWI, Amsterdam, The Netherlands)
12.30  Lunch
14.00  Invited Talk: Rolf Hennicker (Ludwig-Maximilians-Universität München)
       A Component Model for Architectural Programming
15.00  Session 5 [Architectural Issues]
       • Building Components from Functions (Bernhard Schätz, TU München, Germany)
       • Component Identification Through Program Slicing (Nuno Rodrigues, Luís Barbosa, Universidade do Minho, Portugal)
       • Architecture Normalization for Component-Based System (Lian Wen, Geoff Dromey, Griffith University, Australia)
16.30  Coffee Break
17.00  Session 6 [Case-studies]
       • Model Checking of Component Behavior Specification: A Real Life Experience (Pavel Jezek, Jan Kofron, Frantisek Plasil, Charles University, Czech Republic)
       • Towards an Automated Deployment Planner for Composition of Web Services as Software Components (Abbas Heydar Noori, Farhad Mavaddat, Farhad Arbab, University of Waterloo, Canada)
18.00  Closing Session
18.30  End of Day 2
Invited Talks (Abstracts)

Monday, 24th October, 09.30
Farhad Arbab
(CWI, Leiden University and University of Waterloo)
Reo: A Coordination Model for Component Composition

Abstract: Reo is a channel-based exogenous coordination model wherein complex coordinators, called connectors, are compositionally built out of simpler ones. The simplest connectors in Reo are a set of mobile channels with well-defined behavior supplied by users. Reo can be used as a language for coordination of concurrent processes, or as a "glue language" for compositional construction of connectors that orchestrate component instances in a component-based system. The emphasis in Reo is on connectors and their composition only, not on the entities that connect, communicate, and cooperate through these connectors. Each connector in Reo imposes a specific coordination pattern on the (distributed and mobile) entities (e.g., components) that perform I/O operations through that connector, without the knowledge of those entities. Reo inherently supports dynamic reconfiguration of its connectors. Channel composition in Reo is a surprisingly powerful mechanism for construction of connectors. We demonstrate the expressive power of connector composition in Reo through a number of examples. We show that exogenous coordination patterns that can be expressed as (meta-level) regular expressions over I/O operations can be composed in Reo out of a small set of only five primitive channel types.
14.00
He Jifeng (UNU-IIST)

Towards a Theory of Components and their Composition

Abstract: We present a theory of reactive components. We identify a component by the services that it provides, and specify the individual services by a guarded-design, which enables one to separate the responsibility of clients from the commitment made by the component, and model the behaviour of a component by a set of failures and divergences. Protocols are introduced to coordinate the interactions between a component and its external environment. We adopt the notion of process refinement to formalise the substitutivity of components, and provide a complete proof method based on the notion of simulations. We also study the algebraic properties of component combinators.
Tuesday, 25th October, 09.00
Paolo Ciancarini
(Università di Bologna)
On the Education of Future Software Engineers

Abstract: Service-oriented software engineering research addresses both technical and organizational issues, like for instance modeling the agents, the infrastructure, the business or even the community which will adopt a software system. The advent of component-based architectures and their widespread use for service-oriented systems offers the occasion to reflect on the current education of future software engineers. The traditional approaches based on life-cycle phases and transformations are in crisis; however, there is no clear consensus on what methods are more adequate for the novel design and managerial problems posed by service-oriented, software-intensive systems. In this talk we intend to discuss these topics, putting them in a research perspective.
14.00
Rolf Hennicker (Ludwig-Maximilians-Universität München)

A Component Model for Architectural Programming

Abstract: High-level architectures and modular composition help in constructing large-scale software systems. Current programming languages support software architecture insufficiently. We propose an architectural programming language, called Java/A, which integrates notions like components, connectors and configurations into Java. Java/A realizes an abstract component model based on algebras and transition systems.
Programme Committee

• Farhad Arbab, CWI and Leiden University, The Netherlands, University of Waterloo, Canada
• Luís Barbosa (PC Co-Chair), Universidade do Minho, Portugal
• Marcello Bonsangue, Leiden University, The Netherlands
• Christiano Braga, Universidade Federal Fluminense, Brazil
• Manfred Broy, Technische Universität München, Germany
• Carlos Canal, Universidad de Málaga, Spain
• João Faria, Universidade do Porto, Portugal
• José Fiadeiro, University of Leicester, United Kingdom
• Susanne Graf, VERIMAG, France
• Mathai Joseph, Tata Consultancy Services Limited, India
• Atsushi Igarashi, Kyoto University, Japan
• Kung-Kiu Lau, University of Manchester, United Kingdom
• Zhiming Liu (PC Co-Chair), UNU-IIST, United Nations University
• Antónia Lopes, Universidade de Lisboa, Portugal
• Markus Lumpe, Iowa State University, USA
• Tom Maibaum, McMaster University, Canada
• Sun Meng, National University of Singapore, Singapore
• Ugo Montanari, Università di Pisa, Italy
• David Naumann, Stevens Institute of Technology, USA
• Bernhard Schaetz, Technische Universität München, Germany
• Anders Ravn, Aalborg University, Denmark
• Carolyn Talcott, SRI International, USA
Organising Committee

• Bernhard Aichernig, UNU-IIST, Macao
• Antonio Cerone, UNU-IIST, Macao
• He Jifeng, UNU-IIST (OC Chair)
• Xiaoshan Li, University of Macau
• Chan Iok Sam, UNU-IIST
Software Components: a Formal Semantics Based on Coloured Petri Nets

Remi Bastide and Eric Barboni
LIIHS-IRIT, Université Toulouse 1, France
Abstract This paper proposes a component model compliant with the current practice of Software Engineering, yet provided with a sound formal semantics based on Coloured Petri nets. Our proposal is structured as follows: 1) Define a component model. We have chosen a component model inspired by the CORBA Component Model (CCM), yet simpler and more precise. 2) Propose a notation to formally specify the internal behaviour of a software component. Our formal approach is based on Coloured Petri nets which makes it well suited to the modelling of concurrent, distributed or event-driven systems, and amenable to formal verification. 3) Define a mapping from the constructs of the component model (facets, receptacles, event sources and sinks) to the constructs of the Petri-net based behavioural specification (e.g. places, transitions, etc.). 4) Provide a formal definition of inter-components communication primitives, (invocation of methods, event-based communication). This definition is also given in terms of Petri nets. 5) Provide a denotational semantics of an assembly of components, in order to define the behaviour of such a system in terms of the individual behaviour of each component and of the formal definition of inter-component communication primitives. The expected benefits of such an approach are threefold: 1) Offer a convenient notation for describing the internal behaviour of concurrent and distributed software components, 2) Provide a formal, unambiguous semantics of component features such as event multicast or service invocation, 3) And, with the previous two being necessary conditions, offer some means to reason about assemblies of components designed with this approach, in particular to mathematically verify properties on them. Key words: Component models, Coloured Petri Net, Signal Nets
Email: [email protected]
Email: [email protected]

This paper is electronically published in Electronic Notes in Theoretical Computer Science
URL: www.elsevier.nl/locate/entcs
1 Introduction
The main contenders in the domain of industrial software component models currently appear to be the Microsoft .Net framework [14], Java-based components such as JavaBeans [4] and the CORBA Component Model (CCM) [16]. Although they differ widely in the details, they have settled on a core of common concepts, which indicates that this domain has reached some maturity. These common concepts are:

• Considering a component as a black box that is accessed through exposed software interfaces. These interfaces define the contract offered by the component.
• Providing for multicast, event-based communication as well as for unicast method invocation,
• Providing for the design-time assembly and configuration of components, and in particular,
• Design-time configuration of components through exposed properties.
Component-based programming emerged mainly thanks to the software industry, which was concerned with improving the reusability of software artefacts. It is now firmly established as a commercial market. A lot of work has been devoted to practical usability concerns in an industrial setting, such as deployment facilities for example. Much less work has been devoted ahead of time to the formal and theoretical foundations of component-based programming. The research community, witnessing the industrial success of component-based programming, has started in the last few years devoting a lot of activity to laying out such foundations [11,5]. This paper presents a formal model of components, and in particular aims at providing a formal semantics framework for the main concepts of software components as stated above. To this aim, we define five steps:

• Define a component model. We have chosen a component model somewhat inspired by the CORBA Component Model (CCM) (in particular, we reuse the CCM vocabulary for the features of components), yet simpler and more precise. The focus of our work is on a behavioural semantics of component activity, and we leave out many practical aspects of CCM (such as deployment) that are fundamental in an industrial setting, but mainly amount to plumbing and convey little theoretical interest.
• Propose a notation to formally specify the internal behaviour of a software component. Our formal approach is based on Coloured Petri nets [7], which is a well-known, widely used and tool-supported notation, and builds upon our previous work on formal specification of CORBA objects [1,2]. It may be considered as an extension of this previous, object-oriented work to component-oriented programming. Its Petri net foundations make it particularly well suited to the modelling of concurrent, distributed or event-driven systems [12], and amenable to formal verification.
• Define a mapping from the constructs of the component model (e.g. facets, receptacles, event sources and sinks) to the constructs of our Petri-net based behavioural formalism (e.g. places, transitions, etc.).
• Provide a formal definition of inter-component communication primitives (invocation of methods, event-based communication). This definition is also given in terms of Petri nets.
• Provide a denotational semantics of an assembly of components, in order to define the behaviour of such a system in terms of the individual behaviour of each component and of the formal definition of inter-component communication primitives. Although the notation used to specify the components' behaviour is plain Coloured Petri nets as defined by Jensen, the denotational semantics is given in terms of Signal Nets, an extension of Coloured Petri nets defined and extensively studied by Starke and Roch [15].

The expected benefits of such an approach are threefold:

• Offer a convenient notation for describing the internal behaviour of concurrent and distributed components,
• Provide a formal, unambiguous semantics of component features such as event multicast or properties, especially inter-component communication,
• And, with the previous two being necessary conditions, offer some means to reason about assemblies of components designed with this approach, in particular to mathematically verify properties on them.
The paper is structured as follows: We first describe our component model, along with its visual representation. The notion of component assembly is also described. We then present a simple case study that will be used throughout the paper to exemplify our approach. The case is first described informally in terms of components assembly. We proceed to describe the mapping between the component model and Coloured Petri nets, our chosen behavioural notation. For each component of the assembly, we provide a formal behavioural specification. Finally we describe the denotational semantics that allows us to automatically construct a single, unstructured Signal Net from the behaviours of all the components and their interconnections. This net describes the behaviour of the assembly as a whole.
2 CompoNets, our component model
Our component model is called CompoNets (a contraction of Component and Petri nets). Its features are somewhat inspired by CORBA CCM. In particular, we follow the CCM philosophy to treat the features required by a
component on par with the features it offers to other components. Since we use Coloured Petri Nets (CPN) to specify the behaviour of components, the type system of our component model will be given in terms of CPN-ML, a variant of standard ML. CPN-ML is the inscription language of CPNs, supported by the Design/CPN tool which we used to design the models in this paper. A CompoNet presents to the external world an Envelope made of several Ports, through which it will communicate with other components (figure 1). Ports may be of different categories (Facet, Receptacle, Event Source, and Event Sink).

• Facets: a facet represents a set of functional features offered to other components. Each facet is described by a name and an interface, i.e. a set of CPN-ML function signatures. The ML language does not support the notion of interface, but we use this term in reference to other languages such as Java or CORBA IDL, to conveniently give a name to a set of ML function signatures.
• Receptacles: a receptacle represents a set of functional features that are required by the CompoNet to fulfil its function. A receptacle is described, like a facet, by a name and a CPN-ML interface.
• Event Sources: an event source describes an event that may be emitted by the CompoNet. The fact that a CompoNet offers an event source implies that it will be able to multicast this event to a set of receivers who have manifested their interest in the event. Syntactically, an event source is described as a parameterless CPN-ML function signature. The result type of the function can be any CPN colourType. In particular, the colourType E will often be used: in CPN, E is the empty colour type, used to describe tokens that carry no value. The E colourType will be used when only the occurrence of the event is of interest. When an event carries some information, a dedicated CPN-ML colourType (such as STRING) will be used as the return type of the ML function describing the event source.
• Event Sinks: an event sink describes an event that the CompoNet is willing to receive. An event sink is described like an event source.

Fig. 1. Graphic syntax of CompoNets: ports are labelled FacetName : FacetInterface, ReceptacleName : ReceptacleInterface, EventSourceName : ResultType and EventSinkName : ResultType.
Figure 1 illustrates the graphic syntax of a CompoNet, giving the graphic notation used for facets, receptacles, event sources and event sinks. The graphic representation for facets (resp. event sources) and receptacles (resp. event sinks) is designed so that they can conveniently plug into each other. To design a system, the designer will assemble several components into an assembly, and connect their facets and receptacles (resp. their event sources and event sinks).

• A facet can be connected to a receptacle if they have the same ML interface. In an assembly, a receptacle needs to be connected to exactly one facet: the connection describes which component provides the features described by the receptacle's interface. On the contrary, a facet can be connected to zero, one or several receptacles: the features described by the facet's interface may be used by several other components.
• An event source can be connected to an event sink if their associated ML function signatures have the same result type. An event source can be connected to any number of event sinks (thus modelling the multicasting of events to several destinations), and an event sink can be connected to any number of event sources (thus modelling the fact that a component may receive the same event from more than one source).
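To make the second rule concrete, here is a small illustration in the paper's CPN-ML interface notation. keyPressed is the text fields' event sink used later in the case study, while keyTyped is a hypothetical event source of our own, introduced only for this example; since both signatures have result type STRING, the source may be connected to the sink, whereas a sink with result type E could not be.

    (* hypothetical event source of some keyboard-like component *)
    fun keyTyped() : STRING ;
    (* event sink of the text-field components; compatible, because the
       result types coincide *)
    fun keyPressed() : STRING ;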
3 Case Study
To illustrate our approach to component-based modeling, we present a simple case study from the user interface application domain.

Informal specification

The proposed application allows the user to enter a string of text into a text box, and put the typed text into a buffer of unknown size by clicking on a button. Conversely, another button allows the user to get a string from the buffer, and display the retrieved text in another textbox. The buttons must be enabled or disabled according to the state of the buffer: if the buffer is full, the Put button must be disabled. Conversely, if the buffer is empty, the Get button must be disabled.

Component assembly

The user interface of our running example is illustrated in figure 2, while figure 3 gives an assembly of components modeling a possible software solution. The assembly in figure 3 (left) shows 4 visual components that are also visible in the user interface: the two text fields (PutTextField and GetTextField) and the two buttons (PutButton and GetButton). It also contains non-visual components that provide the rest of the application's behaviour: MyBuffer, which models the buffer of unknown capacity, and PutAdapter and GetAdapter, which provide the logic necessary to assemble the various pieces. The detail of
the envelope and behaviour of each component in the assembly is given in the following section.

Fig. 2. User interface of the case study.

Fig. 3. A component assembly for the case study (left) and its hierarchical abstract view (right). The assembly connects MyBuffer (facet buffer : Buffer; event sources full, notFull, empty, notEmpty), PutButton and GetButton (event sinks enable, disable and click; event source click), PutAdapter and GetAdapter (event sink perform; receptacles buffer : Buffer and text : Text), and PutTextField and GetTextField (facet text : Text; event sink keyPressed() : STRING).

The component model we propose is hierarchical: the behaviour of one component may be given either in terms of Coloured Petri nets, or as an assembly of other components. The right part of figure 3 represents the hierarchical view of the left part. The assembly may be considered and used like an atomic component, taking part in higher-level assemblies. It exposes features promoted from its inner components (in this case, the keyPressed event sink from the PutTextField component, and the click event sinks from both buttons). The focus of the present paper is not on this hierarchical feature, so we will not give more details on the technical aspects of hierarchical assemblies.
4 Detailed specification of each component
We now proceed by giving the detailed specification of each component in the assembly in figure 3. Doing this, we will also explain the mapping between the component model described in section 2 and our chosen behavioural notation, Coloured Petri nets.

GetButton and PutButton components

Fig. 4. Envelope (left) and behaviour (right) of the GetButton component.
The envelope of the GetButton component (figure 4 left) shows three event sinks (enable, disable, and click) and one event source (click). Since the return type of these events is not specified, the default empty return type E is assumed (in all rigour we should have written enable : E, disable : E, click : E). This models the fact that these events are only value-less signals coming from or sent to the component's environment. The behavioural specification of GetButton is given in figure 4 (right), as a coloured Petri net with several syntactic extensions whose semantics will be formally given in section 5. Informally, we could say that a button receives enable, disable and click signals from its environment, and forwards click signals only when it is enabled. The Petri net allows us to state this behaviour formally, provided a suitable mapping from event sources and event sinks to transitions in the Petri net is properly defined.

Mapping for event sinks

For each event sink in the envelope, we associate a set of transitions in the behavioural net. This set of transitions is called the event-handler of the event sink. We have chosen a visual syntax to represent the event-handlers: all transitions in the event-handler for event sink X are named X, and represented in bold. In figure 4 (right) we can see that the event-handler for event sink
enable is made of the single transition with the same name, and likewise for the other two event sinks (disable and click). This visual syntax is consistent with the fact that all event handlers are disjoint (a transition is part of 0 or 1 event handler). The fact that a transition belongs to an event-handler changes its usual firing rule: it becomes a synchronized transition. The concept of synchronized transitions was developed by Moalla et al. [10] to enable modeling non-autonomous systems with Petri nets. A synchronized transition does not fire autonomously when it is enabled, like other transitions do. In order to fire, it needs to receive a signal from its environment when enabled. The concept of non-autonomous system is very relevant to component-based modeling: each component is potentially a non-autonomous system that needs input from its environment in order to function. For instance, our GetButton component is initially enabled (as described by the initial marking in figure 4 right). If an enable event is received in this state, it will be ignored and lost: the enable transition is not fireable in this marking. However, the reception of a disable event will move it into the disabled state, where subsequent disable and click events will be ignored. Note that the notion of synchronized transition and its associated firing rule are only meant to give the designer an intuitive comprehension of the component's dynamic behaviour: this notion will no longer be used in the denotational semantics detailed in section 5, where it is superseded by the more general notion of signal arcs.

Mapping for event sources

The GetButton component features only one event source, click, stating that it is able to send click events to other components. The mapping from event sources to the Petri net is simple: for each event source S, an event dispatching function of the same name can be used as the action of any transition T in the behavioural net. The intuitive meaning is that, when the transition T is fired (and thus when T's action is executed), the event will be sent to each event sink connected to the source. Event dispatching functions are meant to model multicast, asynchronous message sending. This intuitive meaning will be given a formal semantics in section 5. In our GetButton component, when the click transition fires, its action executes and sends the click event to all connected components. Note that the transition can only fire when the enabled place is marked, and when an external click event is received through the click event sink. We therefore have the expected behaviour for the button: forward click events only when enabled.

PutButton and GetButton have identical envelopes. The behaviour of PutButton differs from GetButton's only by its initial state: it is initially disabled. PutButton's behaviour is given in figure 5; the layout is also different, in order to facilitate the reading of the complete semantics of the model in figure 14. (In all rigour, PutButton and GetButton should have been described as two instances of the same component class, but at the price of some extra logic in MyBuffer to signal its initial state. We have chosen not to do so to keep the models simpler.)

Fig. 5. Behaviour of the PutButton component.

MyBuffer component

MyBuffer is meant to model a message buffer in a component-oriented way. Its envelope (figure 6 right) shows 4 event sources (notFull, full, empty, notEmpty) and one facet (buffer).
interface Buffer {
  fun put(s : STRING) : E;
  fun get() : STRING;
}
Fig. 6. The Buffer interface (left) and MyBuffer component’s envelope (right).
The Buffer interface (figure 6 left) is a collection of ML function signatures that describe the functional features one can expect from a buffer. Two services are provided: put(s : STRING) to insert a message in the buffer, and get() : STRING to retrieve a message from the buffer. The behavioural specification associated with this component is meant to describe when these services are available, and their effect on the component's state. For the sake of simplicity, we have chosen to model a very simple, one-slot buffer. This buffer may contain at most one message, and is therefore either empty or full. Simple as it is, it is however fully compliant with its envelope illustrated in figure 6 right. This component will allow us to illustrate the mapping from facets to the Petri net.
Mapping for facets

A facet F in a component's envelope is a set of ML function signatures. For each signature S in this set, two places of the same name are defined in the behavioural net:

• A place called the Service Input Port (SIP) for S. Graphically, SIPs are depicted by an [In] annotation in the net. SIPs are meant to model the arrival of service invocations in the component.
• A place called the Service Output Port (SOP) for S. Graphically, SOPs are depicted by an [Out] annotation in the net. SOPs are meant to model the results of a service invocation in the component.
The token-type of SIPs and SOPs is deduced from the parameters of signature S: for instance, the token-type of place [In] put is STRING, and the token-type of place [Out] put is E (the empty token-type) because the signature of the put service is fun put(s : STRING) : E. SIPs can only have output arcs in the behavioural net. Conversely, SOPs can only have input arcs.

Fig. 7. Behavioural specification of the MyBuffer component.
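As an illustration of this token-type deduction, the colour sets induced by the two Buffer signatures might be declared as follows in CPN-ML. This sketch is ours, not taken from the paper, and the colset names are invented for clarity.

    (* empty colour type: tokens carry no value *)
    colset E = with e;
    colset STRING = string;
    (* fun put(s : STRING) : E  --  argument in, value-less result out *)
    colset PUT_SIP = STRING;    (* token type of place [In] put  *)
    colset PUT_SOP = E;         (* token type of place [Out] put *)
    (* fun get() : STRING  --  no argument, string result *)
    colset GET_SIP = E;         (* token type of place [In] get  *)
    colset GET_SOP = STRING;    (* token type of place [Out] get *)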
Provided with the mapping for event sources and facets, we may specify the behaviour of MyBuffer in terms of Coloured Petri nets (figure 7). The initial marking states that the buffer initially has a message in its message place, and is therefore full. The component can receive put and get invocations at any time, which will result in tokens received in the corresponding SIPs. If a put invocation is received in this initial state, the insert transition is not enabled, and therefore the token will stay in the [In] put place until an invocation for get is received, processed, and results in a token being deposited in the roomLeft place. From this marking, the insert transition can fire, depositing a result token in the [Out] put place.
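As a cross-check of this informal reading, the availability of the put and get services in the one-slot buffer can be mirrored by a small Standard ML state model. This is our own sketch, given only as an aid to intuition; it is not part of the paper's formal development.

    (* one-slot buffer: either empty or holding a single string *)
    datatype buffer = Empty | Full of string
    (* put succeeds only on an empty buffer; otherwise the request waits *)
    fun put (Empty, s)  = SOME (Full s)
      | put (Full _, _) = NONE
    (* get succeeds only on a full buffer, returning the new state
       together with the retrieved message *)
    fun get (Full s) = SOME (Empty, s)
      | get Empty    = NONE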
Figure 7 also makes use of the mapping for event sources described above. For instance, when the extract transition fires, it signals both the event notFull (since the buffer is no longer full) and the event empty (since it is now empty).

PutTextField and GetTextField components

PutTextField and GetTextField are two instances of the same component class: they have exactly the same envelope and behaviour.
interface Text {
  fun setText(s : STRING);
  fun getText() : STRING;
}
Fig. 8. The Text interface (left) and PutTextField component’s envelope (right).
The Text interface (figure 8 left) lists the functional features expected from a text component: a service (setText) to set the contents of the text area, and a service (getText) to retrieve its content. A text area such as PutTextField (figure 8 right) supports this facet, and has an event sink keyPressed that allows interactively changing the text content through some form of user-interface interaction.

Fig. 9. Behaviour of both PutTextField and GetTextField components.
The behaviour of both text fields is specified in figure 9. The initial content of the text field is the empty string, and can be changed either by invoking the setText service, or by interactively editing the field's content through keyPressed events. We postulate the existence of a combine(text : STRING, char : STRING) : STRING ML function that allows appending characters to the field's content, taking care of special characters such as backspace, etc.
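A minimal Standard ML sketch of such a combine function is shown below. It is our own illustration; in particular, treating backspace as the only special character is an assumption, not something the paper specifies.

    (* append a one-character string to the field contents, interpreting
       backspace as "delete the last character" *)
    fun combine (text : string, ch : string) : string =
      if ch = "\b" then
        if size text = 0 then text
        else String.substring (text, 0, size text - 1)
      else text ^ ch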
GetAdapter and PutAdapter

We complete the formal specification of the case study by giving the behaviour of the GetAdapter and PutAdapter components, which will lead us to detail the mapping for receptacles, which we have not yet encountered. These two adapter components are meant to adapt the events fired by the buttons to the services provided by MyBuffer and both text fields. Such adapter classes are quite frequently required in event-driven programming libraries, such as the Swing library for Java-based user interfaces.

Fig. 10. Envelope (left) and behaviour (right) of the GetAdapter component.
Mapping for receptacles

The GetAdapter component (figure 10 left) has two receptacles:

• buffer, associated to the interface Buffer given in figure 6 (left). This models the fact that GetAdapter will make use of the services provided by another component featuring a Buffer facet.
• text, associated to the interface Text given in figure 8 (left). This models the fact that GetAdapter will make use of the services provided by another component featuring a Text facet.

The existence of a receptacle R in a component's envelope enables the definition of special invocation transitions in the component's behavioural net. An invocation transition is a transition whose action has the special form receptacleName.serviceName(parameters), where serviceName is one of the functions in the receptacle's interface. Invocation transitions model the unicast synchronous invocation of a method on another component, as opposed to the asynchronous multicast event dispatching modeled by event sources. Once again, the formal semantics for such invocation transitions will be detailed later (section 5).
In the case of the GetAdapter component, we have two such invocation transitions: the perform transition, which invokes the get() service on the buffer receptacle, and the setText transition, which invokes the setText() service on the text receptacle. The former is also synchronized on the perform event sink. In natural language, we could say that when the GetAdapter component receives a perform event, it first gets a message from the buffer by calling buffer.get(), then displays the resulting message in the text field by calling text.setText(). This behaviour is precisely what is formally described in the behaviour net of figure 10 (right).
Fig. 11. Envelope (left) and behaviour (right) of the PutAdapter component.
PutAdapter's envelope is identical to GetAdapter's, and its behaviour is very similar: the only differences are the methods called in the invocation transitions' actions. When PutAdapter receives a perform signal, it retrieves the content of its text field by calling text.getText(), and inserts this string into the buffer by calling buffer.put().
5 Denotational semantics
So far we have shown how coloured Petri nets can be used to model the behaviour of software components featuring synchronous unicast method invocations and asynchronous multicast event dispatching. This description requires several syntactic extensions to conventional coloured PN (Service Input/Output Ports, invocation transitions, synchronized event-handler transitions and event dispatching functions). What remains to be done is to give a formal semantics to these syntactic extensions. This formal semantics will be given in a denotational manner. According to B. Meyer [9], "The denotational semantics of a (source) language expresses the meaning of a program by a translation scheme which, for each program in the (source) language, produces a program in a simpler (target) language". For us, the source language is the assembly of components together with their behavioural nets, and the target language is Signal nets, a variant of
coloured Petri nets developed and extensively studied by Starke and Roch [15]. In the following, we will first present the Petri net patterns used to provide a denotational semantics to facet/receptacle connectors and to event-source/event-sink connectors, and as an illustration we will give the Signal net which is the formal denotational semantics of the assembly in the annex (figure 14).

Semantics for facet/receptacle connectors

Facet/receptacle connectors are meant to model method invocation between two components. As usual for object-oriented languages, method invocation in our component model is unicast (the client or emitter of the invocation must know a single server or receiver for it) and synchronous: the client waits for a result from the server. Such a mode of communication is very easily modeled in terms of Petri nets: a typical client/server communication Petri net pattern was described by Ramamoorthy [13] as early as 1980. We have described in greater detail the use of this pattern for distributed object systems modeling in [1]. We reuse this pattern and extend it to deal properly with high-level Petri nets and concurrent invocations as follows:

• Invocation transitions exist only in components that have at least one receptacle. Each invocation transition in a behavioural net is considered as a macro-transition composed of a request transition, a wait place, and a result transition connected in sequence. The request transition is connected by an output arc to the Service Input Port (SIP) of the corresponding service in the behavioural net of the component that possesses the connected facet. Conversely, the Service Output Port is connected by an output arc to the result transition.
• Although this does not come up in our simple case study, we must properly deal with potentially concurrent invocations. The same service can be called concurrently by several components, possibly with different parameters. We need a proper construction to ensure that the results of a service are returned to the proper client. To this end, we define a new colour called INVOCATION, meant to identify each invocation. The request transitions generate a new INVOCATION value by calling the ML function gensym(), which returns a different value at each call. This value is appended to the tokens in the SIP, SOP, and wait place. The unification mechanism of coloured Petri nets thus ensures that the proper client gets the result of the invocation back.
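The fresh-identifier mechanism described in the last item could be realised, for instance, as sketched below. The representation of INVOCATION as an integer and the counter-based gensym are our own assumptions, made only to illustrate the idea; the paper does not fix a concrete implementation.

    (* invocation identifiers: here simply integers *)
    colset INVOCATION = int;
    (* gensym() returns a different INVOCATION value at each call *)
    local
      val counter = ref 0
    in
      fun gensym () : int = (counter := !counter + 1; !counter)
    end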
Figure 12 exemplifies this Petri net pattern for client/server communication. The figure shows (partially) the behavioural net of PutTextField (left), the expanded net of PutAdapter (center) and (partially) the net of MyBuffer (right). The invocation transitions in PutAdapter have been expanded according to the pattern described above: we see for instance that the perform transition has been expanded into perform request, perform wait and perform result. Likewise, the put transition has been expanded into put request, put wait and put result. The [In] and [Out] annotations for SIP and SOP places have been kept only for easy reference to the original nets: they serve no purpose anymore; the places are just conventional Petri net places.

Fig. 12. Facet/receptacle connector denotational pattern.

Semantics for event-source/event-sink connectors

As shown above, synchronous client-server communication between components is easily described in terms of Petri nets. The situation is not as good for asynchronous, multicast communication. This type of communication may even prove to be particularly difficult to model in terms of Petri nets, which has led Starke and Roch to define an extension to coloured PN called Signal Nets [15]. This is a genuine Petri net extension: it has been demonstrated that this extension extends the modeling power of Petri nets to that of a Turing machine, and therefore several properties become undecidable. However, several analysis techniques are available for Signal Nets and interesting results are still available, in particular using model checking. The extension brought by Signal Nets is signal arcs, i.e. arcs that connect two transitions. The semantics of such signal arcs is as follows: a transition which is the target of a signal arc fires when it is enabled and when it receives a signal from the source transition of the arc. The detailed firing rule for signal arcs is provided in [15]. In our case, event dispatching transitions will be the source of signal arcs, and transitions in an event-handler will be their target.

Figure 13 exemplifies the use of signal arcs for defining the semantics of event-source/event-sink connectors. The figure illustrates the communication between the MyBuffer and PutButton components. The buffer signals when it is empty by firing the extract transition, and when it is full by firing the insert transition. These signals will enable or disable the button by forcing the occurrence of the target enable or disable transitions. The signal arcs are shown in bold.

Fig. 13. Event-source / event sink connector denotational pattern.
6 Conclusion
We have shown that coloured Petri nets are a formal notation suitable for specifying the behaviour of software components featuring unicast method invocation as well as multicast event dispatching. Although a number of other researchers have used Petri nets for modeling object- or component-based systems ([6,3,8]), we believe our work is original in that it defines a component model supporting both methods and events. In order to properly specify such components, several syntactic extensions are needed to provide a mapping from constructs of the component model to Petri net primitives. We have provided a denotational semantics for these syntactic extensions, given in terms of Signal Nets, a variant of Coloured Petri nets itself provided with a very precise operational semantics. The goal of our approach is to bring the benefit of Petri net based specification to component-based modeling. Further work remains to be done in order to really provide this benefit. In particular, since analysis will be performed on the global net resulting from the merging of all the behaviour nets in an assembly, we need to investigate ways to provide the analysis results in terms of the original nets, which are known by the designer.
References

[1] Bastide, R., P. Palanque, O. Sy, D.-H. Le and D. Navarre, "Petri Net Based Behavioural Specification of CORBA Systems," Proceedings of the 20th International Conference on Application and Theory of Petri Nets, Springer-Verlag, 1999, pp. 66-85.

[2] Bastide, R., O. Sy, P. Palanque and D. Navarre, "A formal specification of the CORBA event service," Fourth International Conference on Formal Methods for Open Object-Based Distributed Systems, 2000, pp. 371-395.

[3] Battiston, E., "Modeling a cooperative development environment with CLOWN," OOPMC workshop, 1995.

[4] Englander, R., "Developing Java beans," O'Reilly & Associates, Inc., Sebastopol, CA, USA, 1997.

[5] FMCO, "Workshop on Modelling of Objects, Components, and Agents," URL http://www.daimi.au.dk/CPnets/workshop01/

[6] Guelfi, N., O. Biberstein, D. Buchs, E. Canver, M.-C. Gaudel, F. von Henke and D. Schwier, "Comparison of Object-Oriented Formal Methods," Technical Report of the Esprit Long Term Research Project 20072, 1997.

[7] Jensen, K., "Coloured Petri Nets. Basic Concepts, Analysis Methods and Practical Use," Springer-Verlag, 1997.

[8] Lakos, C. and C. Keen, "LOOPN++: A New Language for Object-Oriented Petri Nets," Proceedings of Modelling and Simulation (European Simulation Multiconference), 1994, pp. 369-374.

[9] Meyer, B., "Introduction à la théorie des langages de programmation," InterEditions, 1992.

[10] Moalla, M., J. Pulou and J. Sifakis, "Synchronized Petri Nets: A Model for the Description of Non-Autonomous Systems," Lecture Notes in Computer Science: Mathematical Foundations of Computer Science, Springer-Verlag, 1978, pp. 374-384.

[11] MOCA, "Second International Symposium on Formal Methods for Components and Objects," URL http://fmco.liacs.nl/fmco03.html

[12] Peterson, J. L., "Petri Net Theory and the Modeling of Systems," Prentice Hall PTR, 1981.

[13] Ramamoorthy, C. V. and G. S. Ho, "Performance Evaluation of Asynchronous Concurrent Systems Using Petri Nets," IEEE Trans. Software Eng., 1980, pp. 440-449.

[14] Ritcher, J., "Programming the .NET Platform," Microsoft Press, 1998.

[15] Starke, P. and S. Roch, "Analysing Signal-Net Systems," Technical Report 162, Informatik Berichte, 2002.

[16] Vinoski, S., "New features for CORBA 3.0," Commun. ACM, ACM Press, 1998, pp. 40-52.
Annex: the denotational semantics of the case study as an unstructured Signal net

Fig. 14. Denotational semantics of the case study.
Component-Based Specification of Distributed Systems

Grant Malcolm
Department of Computer Science, University of Liverpool, UK
Abstract

We suggest that hidden algebra can provide a setting for component specification and composition that has the advantages of algebraic specification, without the disadvantages of object-oriented approaches where communication between components is mediated solely by method invocation. We propose a basic composition mechanism for hidden algebraic component specifications that is based on communication through shared subcomponents, and show that this composition mechanism on specifications extends naturally to allow models (or implementations) of the component specifications to be amalgamated into a model of the composite system.
As part of a general trend towards decentralisation [16], computer systems tend more and more to be constructed from distributed, self-contained, and possibly autonomous units. The challenges that this poses to computer science are reflected in the growth of new paradigms such as component-based, service-oriented, and aspect-oriented software, and of new languages for modelling, specifying, composing, and co-ordinating these units. The object paradigm certainly helped set these developments moving: code could be organised at the level of classes, architectures at the level of class instances, and both these levels could be seen as comprising self-contained, even autonomous units. Distributedness is an essential part of the object paradigm, with interaction between instances being mediated by (possibly remote) method invocation, and type systems that include interfaces and abstract classes allow systems of interacting objects to be built from subsystems in a robust and flexible way. Yet it is widely agreed that the basic mechanism of interaction through method invocation does not meet the challenges posed by decentralised software systems. For example, Andrade and Fiadeiro [1], concerned with service-oriented software, describe (op. cit., p. 380) ‘a gap between the high-level
specification of interactions and their implementation in any particular technology,’ such as object-oriented languages that have a rigid coupling between methods and the (identities of the) objects that provide them. In a similar vein, Arbab [2], on component-based software, points to the ‘tight coupling inherent in the method call semantics (sic) [which is] more appropriate for intra-component communication’ (op. cit., p. 8). Since object-orientation can be seen as based on the notion of abstract data type (ADT) by a correspondence between the operations of an ADT and the methods of a class (cf. Meyer [15]), Arbab writes: ‘[i]f a component, like an ADT, provides a set of operations, then the only way to communicate with a component is by invoking its operations, and inter-component communication becomes the same as inter-object communication’ (op. cit., p. 14). Moreover, ‘[c]omposition of two components, in such models, does not by itself yield another component’ (op. cit., p. 48). We agree with these arguments that the compositional techniques of the object paradigm are too inflexible and ‘brittle’ to meet the challenges of decentralised software. For us these arguments raised the question: to what extent are algebraic specifications tied to the notion of interaction solely through method invocation? In this paper we argue that it is possible to use some of the methodologies of algebraic specification in formulating component composition where interaction takes place through shared subcomponents, thus relaxing the tight coupling of method-call semantics. In Section 2, we use hidden algebra as a language for specifying components. Hidden algebra was introduced by Goguen [7] to capture the notion of behaviour in systems that have state. In viewing components as ‘black boxes’ with state, our approach is similar to that of Arbab [2]; it is also similar to coalgebraic approaches such as that of Barbosa [3,4], since hidden algebra is closely related to a restricted form of coalgebra [13,5]. The approach also has the advantage that the possible implementations of a hidden specification are the models of the specification. In hidden algebra, composite (distributed) systems can be built up from components by concurrent connection; this was introduced in [8] as a purely syntactic construct. One of the contributions of the present work is an examination of the semantics of this construct in terms of its model theory. In Section 3, we introduce the notion of meromorphism to capture the inclusion of one component within another, and show that meromorphisms make concurrent connection into a (co)limit construction: i.e., concurrent connection of component specifications is the ‘least’ way of combining components with meromorphisms from the components to the composite system. The other key contribution of the present work is to suggest that concurrent connection could usefully form the basis of a component composition language, by showing that the construction extends in a natural way to models. If we build a composite specification by means of concurrent connection, we should also require that models — i.e., implementations — of the individual components should be composable to give a model of the composite system. In Section 4 22
we give ‘amalgamation lemmas’ (in the terminology of Ehrig and Mahr [6]) which show that this is indeed the case. A pleasant consequence of these results is that systems can be refined by refining individual components. Before we set out the main results, Section 1 gives some background on algebraic specification and hidden algebra. Some of the technical development in this paper is couched in the language of category theory, because this allows for a very concise statement of key results. However, we have tried to make this as transparent as possible by paraphrasing and explaining the intuitions behind the categorical terminology. In particular, although many of these results would be most naturally expressed in terms of indexed categories, we have adopted a more elementary approach, particularly in the proofs, in order that the actual constructions be given explicitly.
1 Preliminaries
This section fixes terminology and notation that is used in the following sections. We assume familiarity with algebraic specification, and so do not motivate the notions described below. For an introduction to algebraic specification, see Meinke and Tucker [14]; for more background on hidden algebra, see Goguen and Malcolm [11]. A signature is a pair (S, Σ) consisting of a set S of sorts and an (S ∗ ×S)indexed set Σ of operations. We usually denote such a signature as Σ if the set S can be left implicit, and we sometimes write σ : w → s for σ ∈ Σw,s . A Σ-algebra A interprets sorts as sets, and operations as appropriately typed functions; we write A(s) for the carrier set of sort s ∈ S, and A(σ) : Aw → As for the function interpreting σ : w → s. A signature morphism (S, Σ) → (S ′ , Σ′ ) is a pair (f, g), with f : S → S ′ , and g : Σ → Σ′f ∗ ,f an (S ∗ ×S)sorted function, i.e., for w ∈ S ∗ and s ∈ S, gw,s : Σw,s → Σ′f ∗ (w),f (s) . Usually, we denote a signature morphism by ϕ : Σ → Σ′ , and ignore the distinction between f and g and drop all subscripts, writing ϕ(s) for f (s), and ϕ(σ) for gw,s(σ). We write TΣ (X) for the algebra of Σ-terms with variables from the Ssorted set X. A theory is a pair (Σ, E), where Σ is a signature, and E a set of Σ-equations of the form (∀X) l = r, where X is an S-sorted set of variables, and l, r ∈ TΣ (X)s for some s ∈ S. A (Σ, E)-model is a Σ-algebra A that satisfies all the equations in E; we write A |= E to indicate that A is a (Σ, E)-model. Hidden algebra distinguishes hidden and visible sorts; hidden sorts are intended to represent states that can change, while visible sorts represent immutable data values, such as numbers, Booleans, etc. A hidden theory specifies behaviour of states, and uses a fixed representation of the visible sorts: a visible data universe is a triple (V, Ψ, D), where (V, Ψ) is a signature, and D is a Ψ-algebra. The examples in this paper use the visible data universe given by: 23
obj DATA is
  sorts Msg MsgList .
  op nil : -> MsgList .
  op cons : Msg MsgList -> MsgList .
  op isEmpty : MsgList -> Bool .
  var M : Msg .   var MS : MsgList .
  eq isEmpty(nil) = true .
  eq isEmpty(cons(M,MS)) = false .
end
(we use the notation of BOBJ [9], which we hope requires no explanation), and we assume some model D that satisfies the given equations. Given a fixed visible data universe as above, a hidden signature is a pair (H, Σ), where the elements of H are called hidden sorts and are disjoint from V, and (V ∪ H, Σ) is a signature such that for σ : w → s, there is at most one hidden sort in w (i.e., operations work locally on one state). Hidden Σ-algebras (or ‘models’) are Σ-algebras that agree with D on visible sorts and operations.
Example 1.1 Two simple examples of hidden signatures, both of which extend the DATA signature, are:

bth STATE is
  pr DATA .
  sort State .
end

bth ENLIST is
  pr STATE .
  op addToList : State Msg -> State .
end
A STATE-algebra simply has some carrier set for states, in addition to the fixed interpretation of DATA. An ENLIST-algebra A has one function A(addToList) : A(State) × D(Msg) → A(State). Behaviour of operations is given by equations: a hidden theory (Σ, E) consists of a hidden signature Σ, and a set of (Ψ ∪ Σ)-equations E, where each equation contains at most one variable of hidden sort. A context is a term with exactly one occurrence of a place-holder variable (say, ‘_’); visible contexts are contexts of visible sort. We write CΣ(s, t) for the set of contexts of sort t that contain a place-holder variable of sort s; we write c[t] for the term obtained by substituting term t for the place-holder variable in context c. A hidden algebra behaviourally satisfies an equation (∀X) l = r iff it satisfies all equations of the form (∀X) c[l] = c[r], where c is a visible context.
Example 1.2 A model of the following stores a list of messages:

bth SEELIST is
  pr ENLIST .
  op list : State -> MsgList .
  op empty : State -> State .
  var S : State .   var M : Msg .
  eq list(addToList(S,M)) = cons(M, list(S)) .
  eq list(empty(S)) = nil .
end
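To make the reading of such hidden specifications concrete, the following is a minimal Haskell sketch (our own illustration, not part of the original development) of one SEELIST model: the hidden sort State is interpreted by a type of our choosing, methods become state-transforming functions, attributes become observations, and the SEELIST equations hold by construction.

  -- A possible SEELIST model, sketched in Haskell (illustration only).
  module SeeList where

  type Msg = String        -- assumption: messages are represented as strings

  -- A SEELIST model packages the operations over some state carrier s.
  data SeeList s = SeeList
    { addToList :: s -> Msg -> s   -- method: hidden-sorted result
    , empty     :: s -> s          -- method: hidden-sorted result
    , list      :: s -> [Msg]      -- attribute: visible-sorted result
    }

  -- One concrete model: a state is simply the list of stored messages.
  listModel :: SeeList [Msg]
  listModel = SeeList
    { addToList = \s m -> m : s    -- list(addToList(S,M)) = cons(M, list(S))
    , empty     = const []         -- list(empty(S)) = nil
    , list      = id
    }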
We will not be concerned in this paper with behavioural satisfaction, but we note that behavioural satisfaction of visible-sorted equations such as those above is the same as standard satisfaction of equations [11]. A hidden homomorphism h : A → B of Σ-algebras is a homomorphism which is the identity on visible sorts. In this paper we will consider only hidden signatures with no generalised constants, i.e., operations σ : w → s where w ∈ V* and s ∈ H. Thus there are no ‘constructors’ for states. This restriction gives us final algebras: for a hidden signature Σ, define FΣ by FΣ(h) = ∏v∈V [CΣ(h, v) → D(v)], so that FΣ(h) consists of ‘abstract states’ that map any visible context c ∈ CΣ(h, v) to a ‘result’ in D(v). The interpretation of operations on this Σ-algebra is the obvious one. This algebra is final in that there is exactly one hidden homomorphism A → FΣ for any Σ-algebra A; this homomorphism maps a state of A to its abstract behaviour. More generally, Cîrstea [5] has shown that any theory morphism ϕ : (Σ, E) → (Σ′, E′) allows (Σ, E)-models to be extended to (Σ′, E′)-models. Technically, the reduct functor A′ ↦ A′|`ϕ : Mod(Σ′, E′) → Mod(Σ, E) has a right adjoint A ↦ Aϕ : Mod(Σ, E) → Mod(Σ′, E′). Thus for every A |≡ (Σ, E), there is Aϕ |≡ (Σ′, E′), with εA : Aϕ|`ϕ → A such that for every f : A′|`ϕ → A there is a unique f/εA : A′ → Aϕ such that (f/εA)|`ϕ ; εA = f. To simplify only slightly, the carrier of Aϕ consists of elements p in FΣ′ which are ‘decorated’ with elements of A which agree with p with respect to Σ-contexts (and we restrict to only those p that satisfy the equations E′). For operations σ ∈ Σ′/Σ, an element p is mapped to, say, p′, and there is no restriction on the elements of A that decorate p and p′.
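The final algebra of abstract states can also be pictured concretely. The following Haskell sketch (again only an illustration, for an assumed toy signature with one method next and one integer-valued attribute obs) represents an abstract state as the function sending each visible context obs(next^k(_)) to its result, and shows the unique hidden homomorphism from an arbitrary model into it.

  -- Final-model sketch for a toy hidden signature (illustration only).
  newtype Abstract = Abstract { runAbstract :: Int -> Int }
    -- runAbstract s k = the result of the visible context obs(next^k(_)) in s

  nextA :: Abstract -> Abstract      -- interpretation of the method next
  nextA (Abstract f) = Abstract (f . (+ 1))

  obsA :: Abstract -> Int            -- interpretation of the attribute obs
  obsA (Abstract f) = f 0

  -- The unique hidden homomorphism from any model (next, obs) maps a state
  -- to its abstract behaviour: the stream of observations it will produce.
  behaviour :: (s -> s) -> (s -> Int) -> s -> Abstract
  behaviour next obs s = Abstract (\k -> obs (iterate next s !! k))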
2 Component Specifications
Our notion of a component is a ‘black box’ with hidden state:
Definition 2.1 A component specification is a hidden theory with exactly one hidden sort.
We will usually use ‘C’, with subscripts or other decorations, as a variable ranging over component specifications, with CΣ representing its signature and CE its equations, and write A |≡ C to indicate that A is a C-model. We will typically denote the unique hidden sort of a component specification by ‘h’. In specific examples in BOBJ notation, we use State as the name of the unique hidden sort, as in Example 1.1. Although we do not suggest that components are classes or objects, it is standard terminology in hidden algebra, which we shall follow here, to refer to operations that return a visible sort as ‘attributes’, and operations that return a hidden sort as ‘methods’. In general, attributes and methods take one hidden-sorted argument and some number of visible-sorted arguments; henceforth, we shall ignore the visible-sorted arguments, without loss of generality, as we could assume we have one operation for each of its possible fixed arguments, for example, an operation addToListm : h → h for each m ∈ D(Msg).
Example 2.2 A channel that carries values of sort Msg, and a component that keeps a running total of the messages sent on a channel, are specified as follows:

bth CHANNEL is
  pr STATE .
  op read : State -> State .
  op val : State -> Msg .
end

bth ADDER is
  pr CHANNEL .
  op total : State -> MsgList .
  var S : State .
  eq total(read(S)) = cons(val(S), total(S)) .
end

Note that ‘pr CHANNEL’ imports the CHANNEL specification, and makes the channel a subcomponent of ADDER.
3 Concurrent Connection
In this section we describe the construction of complex components by concurrent connection [8], and give a formal definition of subcomponent which makes this a closure operation: i.e., the connected components are subcomponents of the composite system. We begin with the simplest case from [8]:
Definition 3.1 Given component specifications Ci for i ∈ I, the independent sum ||i∈I Ci is the component specification whose signature is the union ∪i∈I CiΣ (which we assume to be disjoint, with the exception of the hidden sort h), and whose equations are the union of the equations in each Ci, together with independence axioms of the form (∀S : h) α(µ(S)) = α(S) for each attribute α in Ci and method µ in Cj with i ≠ j.
This represents two components with no communication. For example, the independent sum of two SEELIST components (cf. Example 1.2) would represent two separate lists, with operations

ops list1 list2 : State -> MsgList .
ops addToList1 addToList2 : State Msg -> State .

and the independence axioms would include:

eq list1(addToList2(S,M)) = list1(S) .

The following generalises this to allow for components that communicate via a shared subcomponent (again, from [8]).
Definition 3.2 Given component specifications Ci for i = 0, 1, 2, and hidden theory morphisms ϕi : C0 → Ci for i = 1, 2, the concurrent connection is given by the component specification ϕ1 ⇓ ϕ2 together with ψi : Ci → ϕ1 ⇓ ϕ2, where the signature (ϕ1 ⇓ ϕ2)Σ (and the morphisms ψi) is given by the pushout of the morphisms ϕi, and the equations (ϕ1 ⇓ ϕ2)E contain all the equations ψi(Ei) for i = 1, 2, as well as the independence axioms:
• for each i ≠ j ∈ {1, 2} and operations αi : h → v in Ci and not in the range of ϕi, and µj : h → h in Cj and not in the range of ϕj, an equation (∀S : h) ψi(αi)(ψj(µj)(S)) = ψi(αi)(S) .
Example 3.3 A component that writes messages to a channel is specified in the following theory. This component stores messages (in an attribute store), which are written to the channel itself by the channel’s read method.

bth SENDER is
  pr CHANNEL .
  op put : State Msg -> State .
  op store : State -> Msg .
  var S : State .   var M : Msg .
  eq store(put(S,M)) = M .
  eq store(read(S)) = store(S) .
  eq val(put(S,M)) = val(S) .
  eq val(read(S)) = store(S) .
end

The concurrent connection of this component with the ADDER component of Example 2.2, for which we use the notation SENDER/CHANNEL\ADDER, has all the operations from ADDER and SENDER (without duplicating the CHANNEL operations), all the equations from these two components, plus the following independence axiom:

eq total(put(S,M)) = total(S) .

stating that the method put, which is local to the sender, has no effect on the attribute total, which is local to the adder. The signature of the concurrent connection is formed by taking all the operations from the signatures of the components; no duplicates are made of any operations that come from shared subcomponents, but if two operations ‘accidentally’ have the same name (i.e., they don’t come from a shared subcomponent) then they are named apart. This is an example of a colimit construction: the signature of the concurrent connection is a maximal way of combining the signatures of components without duplicating operations from shared subcomponents. This maximality gives a universal property that we will make frequent use of: given another signature, say Σ, that combines the component signatures without duplication, there is a unique signature morphism from the signature of the concurrent connection to Σ that includes the operations in the same way they are combined in Σ. Our main results below extend this sort of ‘maximality’ property to component specifications. Now we capture the notion of subcomponent as follows:
Definition 3.4 A meromorphism ϕ : (Σ, E) → (Σ′ , E ′ ) is an inclusion of component specifications such that for all attributes α in Σ and all methods µ in Σ′ /Σ, we have E ′ |≡ (∀S : h) α(µ(S)) = α(S). We say that σ is local to (Σ′ , E ′ ) iff σ ∈ Σ′ /Σ. Note that since ϕ is an inclusion, we simply write α for ϕ(α) in this definition, and in the rest of this paper. Proposition 3.5 Let ϕi : C0 → Ci (i = 1, 2) be meromorphisms; then the concurrent connection injections ψi : Ci → ϕ1 ⇓ ϕ2 are meromorphisms. Proof. Let α be an attribute in C1 and µ be a local method in C2 . Now if α is local in C1 , i.e., if α is not in C0 , then (ϕ1 ⇓ ϕ2 )E |≡ (∀S : h) α(µ(S)) = α(S) because that is an independence axiom in (ϕ1 ⇓ ϕ2 )E . If α is in C, then C2E |≡ (∀S : h) α(µ(S)) = α(S) by the condition that ϕ2 is a meromorphism, and so it follows that E ′ |= α(µ(S)) = α(S). This shows that ϕ1 is a meromorphism; symmetry between i = 1 and i = 2 shows that ϕ2 is also a meromorphism. Let HMer be the category of component specifications and meromorphisms. Proposition 3.6 The category HMer is cocomplete. Proof. The theory STATE is an initial object. It has no attributes, so the defining property of meromorphisms is vacuously satisfied by its inclusion in any hidden component specification. Proposition 3.7 below shows that HMer has coproducts, and Proposition 3.8 shows that it has pushouts. Proposition 3.7 Independent sum gives coproducts in HMer. Proof. Let Ci be hidden component specifications (for i in some index set I), and let C be their independent sum. The independence axioms of the independent sum are exactly what is needed to make the injections ψi : Ci → C meromorphisms. Now suppose χi : Ci → C ′ is a cocone of meromorphisms. The universal property of C Σ gives a unique ξ : C Σ → C Σ such that ψi ; ξ = χi . To see that this is a meromorphism, let α be an attribute in C, and let µ be local to C ′ . Since α is in C, it must be in Ci for some i ∈ I, and so C ′E |≡ α(µ(S)) = α(S) because χi is a meromorphism; but this is exactly what is required for ξ to be a meromorphism. Proposition 3.8 Concurrent connection is a pushout in HMer. Proof. Let ψi : Ci → C be the concurrent connection of ϕi : C0 → Ci (i = 1, 2), and let χi : Ci → C ′ be a cocone of meromorphisms. By cocompleteness of the category of signatures, there is a unique morphism of cocones ξ : C → C ′ ; the proof that this is a meromorphism is similar to that in Proposition 3.7 above. The point of these latter two propositions is that all colimits can be built from repeated construction of independent sums and concurrent connections. For example, consider a chatroom application, where a chatter can send messages to a chatroom, and the chatroom broadcasts messages to the chatter. If we concentrate only on the ‘in-boxes’ of both the chatter and the chatroom, 28
we can specify the chatter as:

bth CHATTER is
  pr ENLIST *(op addToList to sendMsg) .
  pr SEELIST *(op addToList to getMsg , op list to history , op empty to clear) .
end

This component has two lists (with appropriately renamed operations), representing the (‘write-only’) list of messages the chatter sends to the chatroom, and the (‘read-write’) list of messages received from the chatroom. Similarly, the chatroom can be specified as:

bth CHATROOM is
  pr ENLIST *(op addToList to sendMsg) .
  pr SEELIST *(op addToList to receiveMsg , op list to inbox , op empty to broadcast) .
  op sendAll : State MsgList -> State .
  var S : State .   var M : Msg .   var MS : MsgList .
  eq broadcast(S) = sendAll(S, inbox(S)) .
  eq sendAll(S, empty) = S .
  eq sendAll(S, cons(M,MS)) = sendAll(sendMsg(S,M), MS) .
end
Now this is all we need specify; these two components share two lists with differing ‘read-permissions’, and the component representing their composition can be built up incrementally as in the following diagram.
(Diagram: the base components ENLIST1 and ENLIST2 and their independent sum ENLIST1||ENLIST2; above them SEELIST1, SEELIST2, CHATTER = SEELIST1/ENLIST1\(ENLIST1||ENLIST2), (ENLIST1||ENLIST2)/ENLIST2\SEELIST2, and CHATROOM; and at the top the composite CHATTER/(ENLIST1||ENLIST2)\CHATROOM.)
This process could be repeated to form a component comprising two or more chatters sharing a chatroom.
4 Composing Implementations
We turn now to the question of composing models, i.e., implementations, of component specifications.
Definition 4.1 Let CMod be the category whose objects are pairs (C, M),
where C is a component specification, and M a hidden C-model, and whose arrows are pairs (ϕ, h) : (C, M) → (C′, M′) with ϕ : C → C′ a meromorphism and h : M′|`ϕ → M a hidden homomorphism. Let CModSL be the subcategory of CMod where the morphisms satisfy the strong localisation property: that h is such that h(M′(µ)(x)) = h(x) for all local methods µ in C′.
The following ‘cocompleteness’ results are amalgamation properties: implementations of components give rise to implementations of specifications built from independent sum and concurrent connection. We begin with the simplest case, where the strong localisation property states that local methods have no effect at all on subcomponents. Note that this property holds for systems built exclusively by independent sums.
Proposition 4.2 CModSL is cocomplete.
Proof. We note that an initial object is given by (State, 1), where 1(h) = {0}. The following lemmas show the existence of coproducts and pushouts.
Lemma 4.3 CModSL has coproducts.
Proof. Given a collection (Ci, Mi) for i ∈ I, let ϕi : Ci → ||i∈I Ci be the independent sum, and define the model M |≡ ||i∈I Ci by

  M(h) = ∏i∈I Mi(h)
For σ local to CjΣ, M(σ) takes a family (mi)i∈I to (m′i)i∈I, where m′i = mi for all i ≠ j, and m′j = Mj(σ)(mj). It is straightforward to see that M satisfies the equations of ||i∈I Ci. Then (ϕi, πi) : (Ci, Mi) → (||i∈I Ci, M) for i ∈ I, where πi : M|`ϕi → Mi is the obvious projection. It is clear that the projections satisfy the strong localisation property. Given a cocone (ψi, hi) : (Ci, Mi) → (C′, M′), the universal property of ||i∈I Ci gives χ : ||i∈I Ci → C′, and since M(h) is a product, we have h/π : M′|`χ → M defined by h/π ; πi = hi for all i ∈ I. It follows almost immediately from this definition that h/π has the strong localisation property if each hi does. To see that this is a homomorphism for the signature of the independent sum, let σ be local to Cj; then

  M′|`χ(σ) ; h/π ; πj
    = M′(σ) ; hj
    = hj ; Mj(σ)
    = h/π ; πj ; Mj(σ)
    = h/π ; M(σ) ; πj

and for i ≠ j,

  M′|`χ(σ) ; h/π ; πi
    = M′(σ) ; hi
    = hi                          { hi has the strong localisation property }
    = h/π ; πi
    = h/π ; M(σ) ; πi

and so M′|`χ(σ) ; h/π = h/π ; M(σ).
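The construction in Lemma 4.3 can be sketched very directly in code. The following Haskell fragment (an illustration under the simplifying assumption that operations are just named functions on a state) builds the amalgamated model of a binary independent sum: states are pairs, a method local to one component acts only on its own half, and the independence axioms hold by construction.

  -- Amalgamation for a binary independent sum (illustration only).
  data Component s = Component
    { methods    :: [(String, s -> s)]       -- state-changing operations
    , attributes :: [(String, s -> String)]  -- visible observations
    }

  independentSum :: Component s1 -> Component s2 -> Component (s1, s2)
  independentSum c1 c2 = Component
    { methods =
        [ (n, \(x, y) -> (f x, y)) | (n, f) <- methods c1 ] ++
        [ (n, \(x, y) -> (x, g y)) | (n, g) <- methods c2 ]
    , attributes =
        [ (n, \(x, _) -> a x) | (n, a) <- attributes c1 ] ++
        [ (n, \(_, y) -> b y) | (n, b) <- attributes c2 ]
    }
  -- A method of the first component leaves the second half untouched, so
  -- every attribute of the second component is unaffected by it, which is
  -- exactly the independence axiom of the independent sum.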
Lemma 4.4 CModSL has pushouts.
Proof. Given a span (ϕi, hi) : (C0, M0) → (Ci, Mi) (i = 1, 2) in CModSL, define the model M of the concurrent connection ϕ1 ⇓ ϕ2 by
• M(h) = { (m1, m2) ∈ M1(h) × M2(h) | h1(m1) = h2(m2) }
• M(µ)(m1, m2) = (M1(µ)(m1), M2(µ)(m2)) for µ in C0
• M(µ)(m1, m2) = (M1(µ)(m1), m2) for µ local to C1
• M(µ)(m1, m2) = (m1, M2(µ)(m2)) for µ local to C2
• M(α)(m1, m2) = M1(α)(m1) = M2(α)(m2) for α in C0
• M(α)(m1, m2) = Mi(α)(mi) for α local to Ci (i = 1, 2).
Note that the second-last bullet point is well defined since each hi is the identity on visible sorts, hence M1(α)(m1) = h1(M1(α)(m1)) = M0(α)(h1(m1)) = M0(α)(h2(m2)) = h2(M2(α)(m2)) = M2(α)(m2). Letting ψi : Ci → ϕ1 ⇓ ϕ2 denote the inclusions into the concurrent connection, we have the obvious projections πi : M|`ψi → Mi, and it is quite straightforward to see that these are indeed homomorphisms. It is straightforward to show that M |≡ Ei, so it only remains to show that M satisfies the independence axioms. Let α be local to C1 and µ be local to C2; for any (m1, m2) ∈ M(h),

  M(α)(M(µ)(m1, m2))
    = M(α)(m1, M2(µ)(m2))
    = M1(α)(m1)
    = M(α)(m1, m2)
whence (and by symmetry between subscripts 1 and 2), M |≡ (ϕ1 ⇓ ϕ2)E. Now, if (χi, gi) : (Ci, Mi) → (C′, M′) is a cocone for (ϕi, hi), then we have ξ : (ϕ1 ⇓ ϕ2)Σ → C′Σ with ψi ; ξ = χi. Define f : M′|`ξ → M by f(m′) = (g1(m′), g2(m′)), which is well defined since g1|`ψ1 ; h1 = g2|`ψ2 ; h2. To see that this is a homomorphism, consider an operation σ in (ϕ1 ⇓ ϕ2)Σ. If σ is in C0Σ, then M(σ)(f(m′)) = M(σ)(g1(m′), g2(m′)) = (M1(σ)(g1(m′)), M2(σ)(g2(m′))) = (g1(M′(σ)(m′)), g2(M′(σ)(m′))) = f(M′(σ)(m′)). Otherwise, σ is local, say to C1Σ, in which case

  M(σ)(f(m′))
    = M(σ)(g1(m′), g2(m′))
    = (M1(σ)(g1(m′)), g2(m′))
    = (g1(M′(σ)(m′)), g2(m′))
    = (g1(M′(σ)(m′)), g2(M′(σ)(m′)))   { strong localisation property of g2 }
    = f(M′(σ)(m′))
The case where σ is local to C2Σ is symmetric, so f is indeed a homomorphism, and it is clear that it is the unique f such that f|`ψi ; πi = gi.
The strong localisation property of CModSL is indeed a strong restriction on possible communication between subcomponents: it states that local methods can have no effect whatsoever on shared subcomponents, and so it does not apply to the SENDER component of Example 3.3. So we turn our attention now to CMod, where we will see that the cofree extensions of Cîrstea [5] described in the Preliminaries give a more powerful amalgamation result.
Lemma 4.5 CMod has pushouts.
Proof. Suppose Ci (i = 0, 1, 2) are component specifications, with meromorphisms ϕi : C0 → Ci (i = 1, 2), and models Mi |≡ Ci (i = 0, 1, 2) with homomorphisms hi : Mi|`ϕi → M0 (i = 1, 2). By cofree extension along ϕi ; ψi (which we simply write as ϕ ; ψ), we have ĥi = (εMi|`ϕi ; hi)/εM0 : Miψi → M0ϕ;ψ. Now let gi : M → Miψi be the pullback of these ĥi, so we have gi|`ψi ; εMi : M|`ψi → Mi. Note that

  M(h) = { (m1, m2) ∈ M1ψ1(h) × M2ψ2(h) | ĥ1(m1) = ĥ2(m2) }
By construction, M |≡ ϕ1 ⇓ ϕ2, and the gi are (ϕ1 ⇓ ϕ2)-homomorphisms, so the gi|`ψi ; εMi are Ci-homomorphisms. Now suppose (χi, fi) : (Ci, Mi) → (C′, M′) is a cocone. We have fi/εMi : M′|`ξ → Miψi. Now

  (fi/εMi ; (εMi|`ϕi ; hi)/εM0)|`ϕ;ψ ; εM0
    = (fi/εMi)|`ϕ;ψ ; ((εMi|`ϕi ; hi)/εM0)|`ϕ;ψ ; εM0
    = (fi/εMi)|`ϕ;ψ ; εMi|`ϕi ; hi
    = ((fi/εMi)|`ψi ; εMi)|`ϕi ; hi
    = fi|`ϕi ; hi

which shows that fi/εMi ; ĥi = (fi|`ϕi ; hi)/εM0, and since f1|`ϕ1 ; h1 = f2|`ϕ2 ; h2, we have f1/εM1 ; ĥ1 = f2/εM2 ; ĥ2, and so, by the universal property of M as a pullback, we have (f/εM)/g : M′|`ξ → M, i.e.,

  (ξ, (f/εM)/g) : (ϕ1 ⇓ ϕ2, M) → (C′, M′) .

For any e : M′|`ξ → M, we have

       e|`ψi ; gi|`ψi ; εMi = fi
  iff  (e ; gi)|`ψi ; εMi = fi
  iff  e ; gi = fi/εMi
  iff  e = (f/εM)/g
which gives the desired universal property. It is easily seen that (State, 1) is also initial in CMod; thus, taking (C0 , M0 ) in the lemma above to be (State, 1) gives a way of constructing an amalgam M1 ||M2 of the independent sum C1 ||C2 , which gives binary coproducts in CMod, and this can be generalised very straightforwardly to arbitrary coproducts, so we have our main Theorem 4.6 (Amalgamation) CMod is cocomplete.
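For the concurrent connection, the carrier used in Lemma 4.4 (and, via cofree extensions, in Lemma 4.5) is a pullback of state spaces. The sketch below (our own illustration, restricted to finite carriers with the two homomorphisms into the shared model given as functions) simply enumerates that pullback.

  -- The carrier of the amalgam of Lemma 4.4, for finite carriers (sketch).
  amalgamatedStates
    :: Eq m0
    => [m1]          -- carrier of the first component model
    -> [m2]          -- carrier of the second component model
    -> (m1 -> m0)    -- h1 : restriction of M1 to the shared part, into M0
    -> (m2 -> m0)    -- h2 : restriction of M2 to the shared part, into M0
    -> [(m1, m2)]    -- M(h): pairs that agree on the shared subcomponent
  amalgamatedStates m1s m2s h1 h2 =
    [ (x, y) | x <- m1s, y <- m2s, h1 x == h2 y ]

  -- A method local to the first component then acts as \(x, y) -> (f x, y);
  -- a shared method acts on both halves, and the pair stays in the carrier
  -- because h1 and h2 are homomorphisms on the shared signature.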
5 Conclusions
We have shown that hidden algebraic component specifications support component composition by concurrent connection, and that this composition extends naturally to composition of models, which we think of as concrete implementations of component specifications. We would argue that this suggests that hidden algebra could form a useful basis for component composition languages, though more technical work needs to be done, and, in particular, more examples and case studies would be needed to make this argument more persuasive. Returning to the issues raised in the introduction, our main results show that it is possible to have a component model that is based on algebraic specifications and ADTs, and in which the result of composition is in itself a component. However, it is an essentially algebraic approach, and therefore, like the coalgebraic approaches to which hidden algebra is related, it is based on operations. The objection could be raised that the approach is essentially object-oriented, and that our components are no more than the ‘fortified collections of objects’ referred to by Arbab [2]. But note that the model we propose incorporates communication between components through shared subcomponents, not merely through method invocation. We would argue that the use of operations in specification simply represents the possibilities for components to change state, and these do not represent methods, though of course the operations could be realised by actual methods in an object-oriented implementation. This more general view of operations can be illustrated by pointing out that they simply represent possible changes of state at particular levels of abstraction, and that the very flexibility of an algebraic approach allows one to move readily between different levels of abstraction. One consequence of our model-theoretic study of concurrent connection is that composite systems can be refined by refining individual components. Indeed, standard algebraic techniques for proving correctness of refinements, as well as techniques specific to hidden algebra [10], combine elegantly with the colimit constructions given above. As a specific example, the ‘adding-to-a-list’ 33
component in Example 1.1 could be refined (moving down to a lower level of abstraction) by the composite presented in Example 3.3, which in turn could be refined by a more concrete specification of streams. In order to develop our proposed approach, we need to extend the amalgamation results presented here to provide a uniform treatment of ‘gluing’ code, along the lines currently achieved by the language Reo [2]. One possibility would be to examine amalgamation results that combine individual states of models of components. Another avenue for future research is suggested by the chatroom example described at the end of Section 3: how to allow for dynamic reconfiguration of components, e.g., chatters joining and leaving the chatroom. The amalgamation properties presented in this paper could be useful here, perhaps in combination with rewriting logic and tile logic [12]).
References
[1] Andrade, L. F. and J. L. Fiadeiro, Composition contracts for service interaction, Journal of Universal Computer Science 10 (2004), pp. 375–390.
[2] Arbab, F., Abstract Behavior Types: a foundation model for components and their composition, Science of Computer Programming 55 (2005), pp. 3–52.
[3] Barbosa, L., Components as processes: An exercise in coalgebraic modeling, in: S. Smith and C. Talcott, editors, Proc. FMOODS 2000 (2000), pp. 397–417.
[4] Barbosa, L., Towards a calculus of software components, Journal of Universal Computer Science 9 (2003), pp. 891–909.
[5] Cîrstea, C., Coalgebra semantics for hidden algebra: parameterized objects and inheritance, in: F. Parisi-Presicce, editor, Proc. 12th Workshop on Algebraic Development Techniques (1998), pp. 174–189.
[6] Ehrig, H. and B. Mahr, “Fundamentals of Algebraic Specification 1: Equations and Initial Semantics,” Springer, 1985.
[7] Goguen, J. A., Types as theories, in: G. M. Reed, A. W. Roscoe and R. F. Wachter, editors, Topology and Category Theory in Computer Science, Oxford University Press, 1991, pp. 357–390.
[8] Goguen, J. A. and R. Diaconescu, Towards an algebraic semantics for the object paradigm, in: H. Ehrig and F. Orejas, editors, Recent Trends in Data Type Specification (1994), pp. 1–29.
[9] Goguen, J. A., K. Lin and G. Roşu, Circular coinductive rewriting, in: Proceedings, Automated Software Engineering ’00 (2000), pp. 123–131.
[10] Goguen, J. A. and G. Malcolm, Proof of correctness of object representations, in: A. W. Roscoe, editor, A Classical Mind: essays dedicated to C.A.R. Hoare, Prentice-Hall International, 1994, pp. 119–142.
[11] Goguen, J. A. and G. Malcolm, A hidden agenda, Theoretical Computer Science 245 (2000), pp. 55–101.
[12] Hirsch, D. and U. Montanari, Consistent transformations for software architecture styles of distributed systems, Electronic Notes in Theoretical Computer Science 28 (1999).
[13] Malcolm, G., Behavioural equivalence, bisimulation, and minimal realisation, in: M. Haveraaen, O. Owe and O.-J. Dahl, editors, Recent Trends in Data Type Specifications (1996), pp. 359–378.
[14] Meinke, K. and J. V. Tucker, Universal algebra, in: S. Abramsky, D. Gabbay and T. Maibaum, editors, Handbook of Logic in Computer Science, Vol. 1, Oxford University Press, 1993, pp. 189–411.
[15] Meyer, B., “Object-Oriented Software Construction,” Prentice Hall, 1997, 2nd edition.
[16] Resnick, M., “Turtles, Termites and Traffic Jams: explorations in massively parallel microworlds,” MIT Press, 1994.
Reo Based Interaction Model

Silvia Amaro 1,4
Dpto. de C. de la Computación, National University of Comahue, Argentina

Ernesto Pimentel 2,5
Dpto. de Lenguajes y Ciencias de la Computación, University of Málaga, Spain

Ana M. Roldán 3,5
Dpto. de Ing. Electrónica y Sist. Informáticos, University of Huelva, Spain

1 Email: [email protected]
2 Email: [email protected]
3 Email: [email protected]
4 The work of Silvia Amaro has been partially supported by CYTED, project VII-JRITOS2
5 The work of Ana M. Roldán and E. Pimentel has been partially supported by the Project TINC2004-07943-C04-01 funded by the Spanish Ministry of Education and Science
Abstract
In Component-based Software Development, the integration of possibly heterogeneous and distributed components to form a single application requires mechanisms for controlling and managing the interactions among the active entities. Coordination models and languages offer a solution to this problem. In this context we propose the use of Reo, a channel-based coordination model, to specify the interactive behavior of software components. In particular, we define a way to complement interface description languages for describing components, in such a way that the information about which services a component provides is extended by giving details on how these services should be used. Our aim is to define an interaction description language based on Reo for component coordination.
Key Words: components, formal methods, component specification, case study, coordination, process algebra, connector.
1 Introduction
In Component-based Software Development (CBSD) the integration of possibly heterogeneous and distributed components together to form a single ap-
plication require mechanisms for controlling and managing the interactions among the active entities. In spite of its relatively recent birth, a lot of activities are being devoted to CBSD both in the academic and in the industrial world. The reason of this growing interest is the need of systematically developing open systems and “plug-and-play” reusable applications, which has led to the concept of “commercial off-the-shelf” (COTS) components. With the increasing use of distributed systems and COTS, interoperability is a major issue to consider. Commercial component models and platforms (CORBA, DCOM, EJB, .NET) attends interoperability from a syntactic point of view using Interface Description Languages (IDL). They allow the interoperation of heterogeneous components based on syntactic agreements. However the sort of interoperability attended by using IDL’s is not enough in large systems, where the information given by interfaces, that is the knowledge of the services offered by components, and in some cases the services required from other components in run-time is not enough to guarantee that they will suitably interoperate. Indeed, at the protocol level, mismatches may also occur because of blocking conditions and the ordering of exchanged messages, that is, because of differences in the component behaviors. In fact, compatibility checkings at protocol level require the solution of coordination and synchronization problems, to ensure that the restrictions imposed on components interactions when communicating are preserved and their communication is deadlock free. In general, the use of IDL descriptions during run-time is quite limited. They are mainly used to discover services and to dynamically build service calls. However, there are no mechanisms currently in place to deal with automatic compatibility checks or dynamic component adaption which are among the most commonly required facilities for building component-based applications in open and independently extensive systems. To overcome such a limitation, several proposals have been put forward in order to enhance component interfaces. In [10] Doug Lea proposes the use of a protocol specification language (PSL) to describe the protocols associated to component’s methods. It is a very expressive extension of CORBA IDL based on logical and temporal rules, but does not take into account the services a component may need from other components, neither it is supported by proving tools. The approach formalized by Yellin and Strom [15] for describing component service protocols using finite state machines, although considering both services offered and required by components, does not support multi-party interactions. Moreover the simplicity that allows the easy checking also makes it too rigid and unexpressive for general usage in open and distributed environments. Bastide et al. [4] use Petri nets to describe the behavior of components in CORBA, but this approach inherits some of the limitations imposed by the Petri nets notation: the lack of the modularity and scalability of the specifications. This paper addresses the problem of interoperability at protocol level, exploring the capability of the coordination model Reo, for specifying the in38
teraction behavior of software components. Reo [1] is a channel-based coordination model which enforces the use of connectors for the coordination of concurrent processes or component instances in a component-based system. Indeed our aim is to propose this model to enhance components interfaces with a description of an abstract component interaction protocol in a similar way as behavioral types [12] or role-based representations [7,8]. Intuitively, when using this model compatibility checkings will depend on the connector considered for the composition. Moreover, as the connector adds its own behavior to the resulting application, we are interested in analyzing how the composition of the same set of components is affected by selecting different connectors. In Reo complex connectors are constructed compositionally, out of simpler ones, using its join operator, and hiding the internal topology of the resulting connector. This yields a connector with a number of input and output ports which can be used by other entities to interact with and through the connector. As our model is not concerned with the internal topology of connectors, but in the connection ends a connector offers to the environment and its observable behavior, we consider a connector defined by a set of input and output ends, the possible configurations in which it can be, and a labelled transition relation defining its behavior. With this in mind, in this paper we address the problem of generating the labelled transition relation indicated for a given connector. We present an algorithm that takes as input a coordination protocol given by a set of input and output ends and a constraint automata, and produces the labelled transition relation defining the behavior of a connector. The rest of the paper is organized as follows. In section 2 we give an introduction to Reo. Section 3 is devoted to the interaction model, its semantics and the corresponding calculi to encapsulate the model. We also give an algorithm for the generation of the transitions giving the behavior of the connector, from its constraint automata. In section 4 an illustrative example is presented, showing the application of the model and the algorithm. Finally, we give some concluding remarks and future work.
2 An introduction to Reo
Reo [1] is a channel-based coordination model defined in terms of communication primitives acting on connectors which are constructed as a combination of different kinds of channels. The channel composition mechanism, in addition to the great diversity of channel types with semantics different from the traditional ones, allows the construction of many different connectors imposing very interesting coordination patterns. For example, the connector Exclusive Router shown in Figure 1a) enables the flow of data items from its input end a to one of its output ends b or c (when both b and c are willing to accept a data item, a non-deterministic decision takes place). This connector is the result of composing five synchronous channels, two Lossy Synchronous channels and a
Fig. 1. Connector Exclusive Router and constraint automata
synchronous Drain. A Lossy Synchronous channel is a synchronous channel with a losing policy for items written on its input end when the output end is not waiting for it; and a Synchronous Drain channel is a synchronous channel with two input ends which has an important synchronization function.
The formal semantics for Reo is based on relations on timed data streams [2]. Indeed timed data streams, which are pairs consisting of a data stream and a time stream, model the potential behavior of connector ends, and relations over them express which combinations of timed data streams are mutually consistent. A timed data stream is a pair ⟨α, a⟩ consisting of a data stream α and a time stream a, in which the time stream a specifies for each n ≥ 0 the time moment a(n) at which the nth data element α(n) is being input or output. Its derivative ⟨α′, a′⟩ is used to indicate changes in time. In this context it is possible to define the language induced by a Reo connector in terms of the TDSs representing its input and output ends. The coinduction reasoning principle applied to the TDS calculus is sufficiently powerful for the proof of certain formal properties over connectors, such as expressiveness and connector equivalence.
The operational model for the behavior of Reo connectors is based on Constraint Automata and was introduced by Arbab et al. in [3]. A constraint automaton describes the TDS language induced by Reo connector networks. A constraint automaton (over Data, a finite set of data that can be sent and received via channels) is a tuple A = (Q, N, −→, Q0) where Q is a finite set of states, N a finite set of nodes, −→ is a finite subset of Q × (2^N × DC) × Q called the transition relation (DC denotes the set of data constraints over N), and Q0 ⊆ Q a nonempty set of initial states. A transition q —N,g→ p in −→ requires N ≠ ∅ and g ∈ DC(N) to be satisfiable. Figure 1b) shows the constraint automaton corresponding to the connector Exclusive Router introduced before. The join operator and hiding operation used in Reo for the construction of complex connectors have their counterparts in the model of constraint automata. Thus the constraint automaton for a complex connector is obtained from the constraint automata of its constituents through product and hiding of internal nodes.
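To fix the notation, the following Haskell sketch (our own encoding, with data constraints represented extensionally as predicates on data assignments) shows one possible representation of constraint automata, instantiated to the Exclusive Router of Figure 1.

  -- A possible encoding of constraint automata (illustration only).
  import qualified Data.Map as Map

  type Node = String
  type DataItem = Int                          -- assumed data domain
  type Assignment = Map.Map Node DataItem      -- data observed at firing nodes

  data ConstraintAutomaton q = ConstraintAutomaton
    { states      :: [q]
    , nodes       :: [Node]
    , transitions :: [(q, [Node], Assignment -> Bool, q)]
    , initial     :: [q]
    }

  -- The Exclusive Router of Figure 1: one state, two transitions, each
  -- requiring the data at the input end to equal the data at the output end.
  exRouter :: ConstraintAutomaton Int
  exRouter = ConstraintAutomaton
    { states      = [0]
    , nodes       = ["a", "b", "c"]
    , transitions =
        [ (0, ["a", "b"], \d -> Map.lookup "a" d == Map.lookup "b" d, 0)
        , (0, ["a", "c"], \d -> Map.lookup "a" d == Map.lookup "c" d, 0)
        ]
    , initial     = [0]
    }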
3 The interaction model
When using channel-based coordination models the framework evolves by means of performing communication actions over input or output ends of channels to which the coordinated components are connected. In the case of Reo, the communication actions are performed over the input/output ends of a connector, so the interaction model will be parametrized with respect to the connector being considered.

3.1 The Calculus
For the specification of components' interaction protocols we define a process algebra R based on the communication primitives of Reo. We consider a set I of input ends, a set O of output ends, and the basic actions to insert an item in a connector (write), to remove an item from the connector (take) and to capture an item without removing it (read). Agents in R are constructed by means of the prefix operator, nondeterministic choice and parallel composition. Formally, the syntax of R is defined as follows:

  P ::= 0 | A.P | P + P | P ∥ P | recX.P
  A ::= wr(c, v) | tk(c, [v]) | rd(c, [v])

where 0 denotes the empty process and c ∈ I ∪ O denotes an input or output end of a connector. The prefixes wr, tk and rd are shorthand for the basic operations write, take and read respectively. Note that in output operations the variable is optional; if it is not specified, the operation succeeds when any data item is available for taking (or reading), and it is removed through the specified connector end.
As in Reo communication is possible only in the presence of a connector, in order to define the operational semantics of R we must consider the semantics of the selected connector. We consider a connector C defined by a tuple ⟨IC, OC, ΣC, ↦→C⟩, where IC, OC represent the set of input ends and the set of output ends of connector C respectively, ΣC is the set of states, that is, the possible configurations of the connector, and ↦→C ⊆ (ΣC × MAct) × MAct × (ΣC × MAct) represents the labelled transition relation defining the connector behavior. MAct denotes the multiset of communication actions.
(1)R  act.P —act→ P
(2)R  P1 —act→ P1′  ⟹  P1 + P2 —act→ P1′
(3)R  P1 —act→ P1′  ⟹  P1 ∥ P2 —act→ P1′ ∥ P2
(4)R  P1 —act1→ P1′,  P2 —act2→ P2′  ⟹  P1 ∥ P2 —act1 ⊎ act2→ P1′ ∥ P2′
(5)R  P —act→ P′,  ⟨C, act⟩ —act→C ⟨C′, ∅⟩  ⟹  ⟨P, C⟩ −→ ⟨P′, C′⟩
(6)R  P1 —act1→ P1′,  P2 —act2→ P2′,  ⟨C, act⟩ —act1→C ⟨C′, act2⟩  ⟹  ⟨P1 ∥ P2, C⟩ −→ ⟨P1′ ∥ P2, C′⟩

Table 1: Transition System for R
When (⟨C, act⟩, act1, ⟨C′, act2⟩) ∈ ↦→C we will write ⟨C, act⟩ —act1→C ⟨C′, act2⟩, with the following intuitive interpretation: act denotes the multiset of actions which, applied in parallel over the ends of the connector, may produce progress over it, eventually producing a state change. The multiset act1 denotes the actions actually applied, and act2 represents the actions left pending on ends of the connector. Pending actions are write or take actions that, in the presence of a synchronous behavior, remain pending when applied in parallel with read actions. These multisets must satisfy the relation act = act1 ⊎ act2. When clear from the context, we will omit the subindex C when referring to the sets I, O, and Σ. The rules giving the connector behavior will be generated from its corresponding constraint automaton, using the algorithm introduced in the next subsection.
The operational semantics of R depends on the connector considered. Formally, given a connector C with behavior defined via a labelled transition relation ↦→C, we define the transition system ⟨R, C, −→C⟩, where R is the set of programs described in the process algebra, C is the considered connector, and −→C ⊆ (R × C) × (R × C) is the transition relation defined by rules (5)R and (6)R of Table 1. Note that the definition of −→C depends on the auxiliary labelled transition system ⟨R, Act, −→⟩, where −→ ⊆ R × Act × R is the transition relation defined by rules (1)R to (4)R. There are no rules for recursion; its semantics is defined by the structural axiom recX.P ≡ P[recX.P/X]. We also consider both systems closed with respect to the structural axioms for the choice and parallel operators.
In the process of composing components specified in R, the connector constrains the behavior of the overall system, imposing its own behavior. This leads to a level of composition flexibility which is highly desirable in component-based systems. Due to the great diversity of communication patterns possible in Reo, this model, in contrast with other models [6], makes possible the production of different systems composed out of the same set of components, by the use of different connectors with a well-defined semantics.
3.2 From Constraint Automata to ↦→C transitions
Now we present the algorithm to generate the transition rules for the transition relation ↦→C of a connector C from its constraint automaton representation. Let C be a connector defined by the sets I and O of input ends and output ends respectively, and its constraint automaton CAC given by CAC ≡ (QC, NC, →C, Q0C), where NC = I ∪ O. We associate a name Cq with every q ∈ QC to indicate that the connector C is in state q. As the automaton transitions are labelled with the maximal sets of nodes over which data can flow simultaneously, we can identify from them the input ends and output ends of the connector over which input or output operations occurring synchronously produce a state change. The symbol |= represents the satisfaction relation resulting from interpreting data constraints over data assignments. Considering the issues mentioned, we propose the following algorithm:

(i) for each transition (q —N,g→C p) ∈ →C, a transition

      ⟨Cq, actδ⟩ —actδ→C ⟨Cp, ∅⟩

    is generated, where δ is any data assignment function such that δ |= g, and actδ is defined as (actwr)δ ∪ (acttk)δ, where:

      (actwr)δ = { wr(I, δ(I)) : I ∈ I ∩ N }
      (acttk)δ = { tk(O, δ(O)) : O ∈ O ∩ N }

(ii) for each transition rule ⟨Cq, act⟩ —act→C ⟨Cp, ∅⟩ generated in (i), suppose act = acttk ∪̇ actwr, the disjoint union of the tk actions and the wr actions that can be applied over the connector ends. For each act′ ⊆ acttk, we construct actrd = { rd(O, t) : tk(O, t) ∈ act′ } and generate a rule

      ⟨Cq, (act − act′) ∪ actrd⟩ —actrd→C ⟨Cq, act − act′⟩

We need to consider the rd operation separately because of its non-destructive nature. The last rule considers the situation in which at least one rd operation is applied synchronously with other communication operations. In this case only the rd operations succeed; the other communication actions (wr and tk) remain pending over the corresponding ends until the environment provides the necessary conditions for them to proceed, by the application of some other rule.
Consider the constraint automaton CAExR corresponding to the connector Exclusive Router (Figure 1), given by the tuple CAExR = (Q, {a, b, c}, →ExR, Q), where Q = {0} and (0 —{a,b}, da=db→ 0), (0 —{a,c}, da=dc→ 0) ∈ →ExR. Applying the algorithm to the constraint automaton CAExR we obtain the transition rules
hExR 0 , {wr(a, t), tk(b, t)}i
{wr(a,t),tk(b,t} 7−→ExR
hExR 0 , ∅i
hExR 0 , {wr(a, t), tk(c, t)}i
{wr(a,t),tk(c,t} 7−→ExR
hExR 0 , ∅i
hExR 0 , {wr(a, t), rd(b, t)}i
{tk(b,t} 7−→ExR
hExR 0 , {wr(a, t)}i
{rd(c,t}
hExR 0 , {wr(a, t), rd(c, t)}i 7−→ExR hExR 0 , {wr(a, t)}i Table 2 Transition Rules for the Exclusive Router
given in table 2. Finally we represent the connector as follows: ExR = h{a}, {b, c}, {ExR0 }, 7−→ExR i
4
Specifying components protocols in Reo
As we have already mentioned, intending to solve the interoperability problems at protocol level we propose the use of the model introduced in the previous section, based on Reo, to enhance components interfaces with a description of an abstract component interaction protocol. We ilustrate our proposal by means of an example. We describe a simplified version of a real patient monitoring system that was first introduced by Papadopoulos and Arbab [13] to show the potential of control driving coordination languages for expressing dynamically reconfigurable software architectures. The basic scenario involves a number of monitors and nurses. There is a monitor, one for each patient, recording readings of the patient’s health state in response to a received request. Besides a monitor can also send data in case of exceptional situations. A nurse is responsible for periodically checking the patient’s health state by asking the corresponding monitor for readings; further more a nurse should respond to receiving exceptional data readings. As we can see in the interface below, a monitor offers one method that allows the user to request the periodical readings. The nurse interface defines two methods to be invoked by the environment. Method normal implements the main service offered by the process, it receives readings of the patient’s health state on the parameter normalState and processes them. On the other hand the method signal allows the nurse to treat emergency cases, which are captured on the emergencyState parameter. interface Monitor { void request(); } interface Nurse { void signal ([in]Data emergencyState); void normal ([in] Data normalState); } 44
´n Amaro, Pimentel and Rolda
4.1
Interaction protocols
From the interfaces above is very difficult to discern the way in which a monitor and a nurse will behave if they are integrated in a software application. Nothing is said concerning to their interactions and the rules governing them. In fact, it is not manifested neither the possibility for a monitor to send emergency signals nor that emergency situations have priority for being attended. Now we give the specification for both agents, a monitor and a nurse, oriented to overcome this situation. A monitor receives a request for its data registers on the patient health state readings. Eventually the monitor may detect abnormal situations and in this case it has to send an emergency state. Emergency situations have priority for being attended. Note that the monitor only needs to receive a piece of data in the connection point requestIn, and this action is interpreted as a request for information. On the other hand a nurse is responsible for checking the patients health state. He or she requests a monitor for its data registers writing a token in the connection point associated to this action. A nurse must also attend the reception of emergency states, which must be attended first. The behavior of both agents is defined bellow MONITOR = tk(requestIn).(MONITOR1 + wr(signalOut,). MONITOR1 ) + wr(signalOut,). MONITOR MONITOR1 = wr(normalOut,). MONITOR
NURSE = wr(requetOut,token). NURSE1 + tk(signalIn,). NURSE NURSE1 = tk(normalIn,). NURSE + tk(signalIn,). tk(normalIn,). NURSE Component interaction protocols are specified to describe the behavior of given component interfaces. In general, there are no precise guidelines about what should and should not be included in a protocol specification. It will depend, of course, on the level of abstraction or details required. Because of in Reo communication only is possible by means of input and output operations over connector ends (connection points) we must take them into account 45
´n Amaro, Pimentel and Rolda
syncS
ROut
syncS
syncS
SOut
Valve
syncS
RIn
SIn
Valve
NIn
NOut
Fig. 2. CMN Connector
when specifying protocols. Thus, we associate an input end with each method representing a service offered by the component, and we considered an output end for each service required by the component. In case the method has no arguments we do not consider any object in the input operation. However, for the output operation a token is needed, just as a signal for the requested service. 4.2
Selecting the connector
At this point we address the selection of an adequate connector for the composition of a monitor and a nurse. Consider the situation in which in the resulting application the monitor must serve first the emergency signals and the nurse has the obligation to firstly deal with emergency situation. In the specification neither the monitor, nor the nurse ensure the priority in attending emergency cases, because of the non deterministic choice among attending the periodical readings and attending the emergency readings. In this scenario it seems clear that the expected behavior of the composition of a nurse and a monitor is achieved only when selecting a connector which enforces the required priorities. With this aim in mind we selected the connector CMN shown in Figure 2. The connector is defined by the tuple h{RIn, SIn, N In}, {ROut, SOut, N Out}, ΣCM N , 7−→CM N i In table 3 we give the transitions in the labelled transition relation which defines its behavior, which have been generated following the algorithm previously introduced. Since this connector presents many possible states and so 46
(1)  ⟨CMN0, {wr(RIn, t)}⟩  −{wr(RIn,t)}→CMN  ⟨CMN1, ∅⟩
(2)  ⟨CMN0, {wr(SIn, t)}⟩  −{wr(SIn,t)}→CMN  ⟨CMN3, ∅⟩
(3)  ⟨CMN1, {wr(SIn, t)}⟩  −{wr(SIn,t)}→CMN  ⟨CMN2, ∅⟩
(4)  ⟨CMN6, {wr(SIn, t)}⟩  −{wr(SIn,t)}→CMN  ⟨CMN4, ∅⟩
(5)  ⟨CMN2, {wr(NIn, t)}⟩  −{wr(NIn,t)}→CMN  ⟨CMN4, ∅⟩
(6)  ⟨CMN7, {wr(SIn, t)}⟩  −{wr(SIn,t)}→CMN  ⟨CMN5, ∅⟩
(7)  ⟨CMN2, {tk(SOut, t)}⟩  −{tk(SOut,t)}→CMN  ⟨CMN1, ∅⟩
(8)  ⟨CMN2, {tk(ROut, t)}⟩  −{tk(ROut,t)}→CMN  ⟨CMN3, ∅⟩
(9)  ⟨CMN3, {tk(SOut, t)}⟩  −{tk(SOut,t)}→CMN  ⟨CMN0, ∅⟩
(10) ⟨CMN4, {tk(ROut, t)}⟩  −{tk(ROut,t)}→CMN  ⟨CMN5, ∅⟩
(11) ⟨CMN4, {tk(SOut, t)}⟩  −{tk(SOut,t)}→CMN  ⟨CMN6, ∅⟩
(12) ⟨CMN5, {tk(SOut, t)}⟩  −{tk(SOut,t)}→CMN  ⟨CMN7, ∅⟩

Table 3. Behavioral Transitions for CMN
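To make the transition relation easier to experiment with, here is a small sketch (ours, not part of the paper) that encodes the transitions of Table 3 as a Python dictionary; step and enabled are hypothetical helper names.

# The blocking-related transitions of Table 3 as an explicit transition function.
TRANSITIONS = {
    ("CMN0", "wr(RIn,t)"): "CMN1",   # (1)
    ("CMN0", "wr(SIn,t)"): "CMN3",   # (2)
    ("CMN1", "wr(SIn,t)"): "CMN2",   # (3)
    ("CMN6", "wr(SIn,t)"): "CMN4",   # (4)
    ("CMN2", "wr(NIn,t)"): "CMN4",   # (5)
    ("CMN7", "wr(SIn,t)"): "CMN5",   # (6)
    ("CMN2", "tk(SOut,t)"): "CMN1",  # (7)
    ("CMN2", "tk(ROut,t)"): "CMN3",  # (8)
    ("CMN3", "tk(SOut,t)"): "CMN0",  # (9)
    ("CMN4", "tk(ROut,t)"): "CMN5",  # (10)
    ("CMN4", "tk(SOut,t)"): "CMN6",  # (11)
    ("CMN5", "tk(SOut,t)"): "CMN7",  # (12)
}

def step(state, operation):
    """Return the successor state, or None if the operation blocks in this state."""
    return TRANSITIONS.get((state, operation))

def enabled(state):
    """Operations that do not block in the given state (w.r.t. the listed transitions)."""
    return sorted(op for (s, op) in TRANSITIONS if s == state)

# After a write on SIn from the initial state (rule 2) the connector reaches CMN3,
# where only a take on SOut is enabled: the blocking behaviour discussed below.
print(step("CMN0", "wr(SIn,t)"))  # CMN3
print(enabled("CMN3"))            # ['tk(SOut,t)']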
This connector imposes certain restrictions over its connection points, which seem appropriate for solving our priority problem. The expected behavior is imposed by the effect of the two valve connectors (see [1]) and the four syncSignal connectors (which are connected to the valve control connection points) present in the configuration of the connector. The syncSignal connector is the result of the composition of a syncDrain channel with a syncSpout channel; it has almost the same behavior as a sync channel, except that it does not matter which item is written over its input end. From the analysis of its transition relation, we conclude that it presents the needed behavior. In fact, it shows an asynchronous behavior, which in some cases is disabled by means of an input operation over the input end SIn (rules (3), (5) and (9)), leaving the connector in a state in which it remains blocked until an output operation is applied over the output end SOut (rules (11), (13), (15) and (16)). The blocking effect is the result of the propagation of the input operation over SIn forward to the open input end of both valve connectors. Indeed, even when the connector is blocked it is possible to apply input and output operations, for example in state CMN2 (blocked but with a data item present in the buffer associated with ROut), or in state CMN4 (blocked but with a data item present in the buffer associated with ROut, and a data item present in the buffer associated with NIn). When composing a nurse and a monitor via the connector CMN, the expected effect is achieved, provided that the
input and output ports for emergency signals are connected to the connection points SIn and SOut of CMN, respectively.
5 Conclusions
Although Reo was defined with a different purpose (i.e. coordination), by applying the previously explained approach it can also be used to specify the interaction behavior of software components. The information provided by this kind of protocol may be useful for analyzing a number of properties like compatibility [6] (when two components can interact without deadlocking) or substitutability (when a component can be substituted by another one, preserving its "safe" behavior in the system). Reo's capability of expressing components' protocols was manifested by the example. If we analyze the specifications we can observe that, without increasing the complexity embedded in the interaction protocol, it is possible to ensure the priority on the treatment of emergency signals by selecting an adequate connector. Moreover, the composition of a Monitor and a Nurse in the presence of another connector (for example one without blocking behavior) results in a completely different application, in which the priority in attending emergency signals is not ensured. Because of the possibility of merging synchronous and asynchronous behaviors in connectors and the special semantics of some channels (for example Drain and Spout types), connectors may offer many different communication patterns, and may respond to any constraint needed by an application. In our example the complexity added by the treatment of the emergency signal is transferred to the connector, maintaining a simple and elegant specification. The main objective of this paper was to define a framework for describing the behavior of components in terms of coordination models. In this sense, the basic idea is based on extending interface description languages with an explicit description of the interactive behavior of a component, in a similar way as behavioral types [12] or role-based representations [7,8]. To do this, we consider the coordination model Reo, which by means of its channel composition mechanism and the great diversity of channel types (with a well defined behavior) allows the construction of many different connectors, imposing specific coordination patterns. In contrast with other models [6], the model based on Reo makes possible the production of different systems composed out of the same set of components, by the use of different connectors. We argue that this model of coordination, Reo, is mature enough to be used in the design and validation of components of large distributed systems, and the use of such methods will lead to the better design of components and component-based applications in open systems. Our future work will be devoted to formally defining compatibility and substitutability relations for the model presented in this work, oriented to their semiautomated evaluation. We are also interested in the semiautomated
selection of connectors for the coordination.
References
[1] F. Arbab. A Channel-based Coordination Model for Component Composition. Electronic Notes in Theoretical Computer Science, 68(3), 2003.
[2] F. Arbab, J. Rutten. A coinductive calculus of component connectors. Technical Report SEN-R0216, CWI, ISSN 1386-369X, 2002.
[3] F. Arbab, C. Baier, J. Rutten, M. Sirjani. Modeling Component Connectors in Reo by Constraint Automata. Technical Report SEN-R0304, CWI, ISSN 1386-3711, 2003.
[4] R. Bastide, O. Sy, P. Palanque. Formal specification and prototyping of CORBA systems. Lecture Notes in Computer Science, 1628, 474-494, 1999.
[5] A. Braccali, A. Brogi, F. Turini. Coordinating interaction patterns. Proc. 16th ACM Sym., 2001.
[6] A. Brogi, E. Pimentel, and A. Roldán. Compatibility of Linda-based Component Interfaces. Electronic Notes in Theoretical Computer Science, 66(4), 2002.
[7] C. Canal. Un Lenguaje para la Especificación y Validación de Arquitecturas de Software. PhD thesis, Department of Languages and Computer Science, University of Málaga, 2001.
[8] C. Canal, L. Fuentes, E. Pimentel, J.M. Troya, A. Vallecillo. Extending CORBA Interfaces with Protocols. The Computer Journal, 44(5):448–462, 2001.
[9] H. Han. Semantic and usage packing for software components. Proceedings of WOI'99, 25-34, 1999.
[10] D. Lea. Interface-based protocol specification of open systems using PSL (ECOOP'95). Lecture Notes in Computer Science, 25-34, Springer, 1995.
[11] G. Leavens, M. Sitaraman. Foundations of Component-Based Systems. Cambridge University Press, 2000.
[12] J. Magee, J. Kramer, D. Giannakopoulou. Behaviour analysis of software architectures. Kluwer Acad. Publ., 1999.
[13] G. Papadopoulos, F. Arbab. Dynamic Reconfiguration in Coordination Languages. Advances in Computers 46, Acad. Press, 329-400, 1998.
[14] A. Vallecillo, J. Hernández, J.M. Troya. Component Interoperability. Technical Report ITI-2000-37, Department of Languages and Computer Science, Univ. of Málaga, July 2000.
[15] D.M. Yellin, R.E. Strom. Protocol Specifications and Component Adaptors. ACM Transactions on Programming Languages and Systems, 19:1, 292-333, 1997.
FACS 2005
Verification of Distributed Hierarchical Components
Tomás Barros
1,2
INRIA Sophia Antipolis, CNRS - I3S - Univ. Nice Sophia Antipolis 2004, Route des Lucioles, BP 93, F-06902 Sophia-Antipolis Cedex - France
Ludovic Henrio
1,3
Univ. of Westminster, Watford Rd, Northwick Park, Harrow, HA1 3TP, UK
Eric Madelaine
1,4
INRIA Sophia Antipolis, CNRS - I3S - Univ. Nice Sophia Antipolis 2004, Route des Lucioles, BP 93, F-06902 Sophia-Antipolis Cedex - France
Abstract
Components allow applications to be designed in a modular way by enforcing a strong separation of concerns. In distributed systems this separation of concerns has to be combined with the distribution of control due to asynchrony. This article relies on Fractive, an implementation of the Fractal component model that unifies the notion of component with the notion of activity. This article shows how to automatically build the behaviour of a distributed component system. Starting from the functional specification of primitive components, we generate a specification of a system of components, their asynchronous communications, and their control. We then show how to use such a specification to verify properties specific to components, reconfigurations, or asynchrony.
Key words: Hierarchical components, behavioural specification, distribution, asynchrony.
1 This research work is carried out under the ACI Securité FIACRE funded by the French government, under the FP6 Network of Excellence CoreGRID funded by the European Commission (Contract IST-2002-004265), and under the associated team OSCAR funded by INRIA and University of Chile.
2 Email: [email protected]
3 Email: [email protected]
4 Email: [email protected]
This paper is electronically published in Electronic Notes in Theoretical Computer Science. URL: www.elsevier.nl/locate/entcs
1 Introduction
Component programming has emerged as a programming methodology ensuring both re-usability and composability. Components inherit from a long experience with modules, objects and interfaces. The Fractal component model [5] provides hierarchical composition for a better structure, and specification of control interfaces for dynamic management. The various control interfaces allow the execution control of a component and its dynamic evolution: plugging and unplugging components dynamically provides adaptability and maintenance. In particular, distributed component systems have to feature dynamic reconfiguration. This article aims at a framework for the behavioural specification and verification of distributed, hierarchical, asynchronous, and dynamically reconfigurable components built based on the Fractal specification. The challenge that is addressed is to build a formal framework ensuring both correct composition at deployment (design and implementation), and safe dynamic changes or reconfigurations (maintenance and adaptation). Therefore the intended user of our framework is the application developer in charge of those tasks. This framework should hide as much as possible the complexity of the verification process, and be as automatic as possible. Some early work on behaviour specification of components, such as Wright or Darwin, is based on process algebras. In Sofa [12] components have a frame (specification) and architecture (implementation) protocols, and verification is done through a trace language inclusion of the architecture within the target frame. In a different flavor, the work of Carrez et al on behavioural typing of components [6] gives a sound assembly and compatibility definition which ensures correctness of the composition. But most of the recent developments on correct components, e.g. in the Mobj and Eureka projects, aim at reactive systems and do not consider asynchronous models. To our knowledge, no other work considers the interplay of component management with the user-defined functional behaviour. Our approach is to give behavioural specifications of the components in the form of hierarchical synchronised transition systems. The models for the functional behaviour of basic components may be derived, as described in [1], from automatic analysis of source code, or expressed in a dedicated specification language. Control (or non-functional) behaviour is automatically incorporated within a controller built from the component's description. The semantics of a component is then computed as a product of the LTSs of its sub-components with the controller of the composite. This system can be checked against requirements expressed as a set of temporal logic formulas, or again as an LTS. The next section reviews Fractal and its distributed implementation Fractive [4]. In section 3 we show how to generate the behavioural model of primitive and composite components. In section 4 we explain how the user specifies
both the functional behaviour of primitive components and the control features of their composition. Finally Section 5 shows how our tools can be used to prove behavioural properties on a system of distributed components.
2 Context
We focus on component based systems built using Fractive. Fractive is a Fractal implementation using the ProActive middle-ware [8]. Thus, it provides a component model having the same features as ProActive, the most important being asynchronous method calls, absence of shared memory, user-definable service policy, and transparency versus distribution and migration.
2.1 Fractal
A Fractal component is formed out of two parts: a controller (or membrane) and a content. Fig. 1 shows an example of a Fractal component system.
Fig. 1. A Fractal component
The controller of a component can have external interfaces (e.g., E in Fig. 1) and internal ones (e.g., I in Fig. 1). A component can interact with its environment through operations at its external interfaces, while internal interfaces are accessible only from the component's sub-components. Interfaces can be of two sorts: client and server. A server interface receives method invocations while a client interface emits method calls. A functional interface provides or requires functionalities of a component, while a control interface corresponds to a management feature over the component architecture. Fractal defines four types of control interfaces: binding control, to bind/unbind the client interfaces (e.g. Ebc in Fig. 1); life cycle control, to stop and start the component (e.g. Elf in Fig. 1); content management, to add/remove/update sub-components; and attribute control, to get/set internal attributes. This paper focuses on the first two.
A component can perform content and binding operations only when stopped and can emit invocations only when started.
2.2 ProActive
ProActive is a pure Java implementation of distributed active objects with asynchronous remote method calls and replies by means of future references. A distributed application built using ProActive is composed of several activities, each one having a distinguished entry point, the active object, accessible from anywhere. All the other objects of an activity (called passive objects) cannot be referenced directly from outside. Each activity owns its own and unique service thread, and the programmer decides the order in which requests are served by overloading the runActive method (entry point of the activity). Method calls to active objects behave as follows: (i) When an object performs a method call to an active object (e.g., y = OB.m(x)), the call is stored in the request queue of the called object and a future reference is created and returned (y references f). A future reference encodes the promised return of an asynchronous method call. (ii) At some point, the called activity decides to serve the request. The request is taken from the queue and the method executed. (iii) Once the method finishes, its result is updated, i.e. the future reference (f) is replaced with the concrete method result (value of y). When a thread tries to access a future before it has been updated, it is blocked until the update takes place (wait-by-necessity). The ASP calculus [7] has been defined to provide a computation model for ProActive.
2.3 Fractive
Fractive is the Fractal implementation using ProActive. Some features are left unspecified in the Fractal definition, and may be set by a particular Fractal implementation, or left to be specified at user level. Fractive makes the choice that the start/stop operations are recursive, i.e. they affect the component and each one of its sub-components, in a top-down order.
2.3.1 Primitive Components
A primitive component in Fractive is made from one activity whose active object implements the provided interfaces. Both functional and control requests are dropped in the request queue of the active object. A Fractive primitive behaves as follows: (i) When stopped, only control requests are served. (ii) Starting a primitive component means running the RunActive method of its active object.
(iii) Stopping a primitive component means exiting from the RunActive method. Since active objects are non-preemptive, the exit from the RunActive method cannot be forced: stop requests are signalled by setting the local variable isActive to false; then, the RunActive method should eventually end its execution.
2.3.2 Composites
Fractive implements the membrane of a composite as an active object; thus it contains a unique request queue and a single service thread. The requests to its external server interfaces (including control requests) and from its internal client interfaces are dropped to its request queue. A graphical view of any Fractive composite is shown in Figure 2.
The service thread serves the requests in FIFO order but only serves the control requests when the composite is stopped. As a consequence, a stopped composite will not emit functional calls on its required interfaces, even if its sub-components are active and send requests to its internal interfaces. Serving a functional request on an internal provided interface means forwarding the call to the corresponding external required interface of the composite. Serving a functional request on an external provided interface consists in forwarding the call to the corresponding internal required interface of the composite.
3 Behavioural Models
The core of our work consists in synthesizing a behavioural model of each component, in the form of a set of synchronised labelled transition systems (LTSs). The formal model has been defined in [1], where we have shown how to build the behaviour of ProActive activities; this corresponds exactly to the functional part of the behaviour of primitive components in Fractive. Using the same formal model, [2] shows how to generate the control part of Fractal components. This article uses a similar approach for supporting Fractive. Due to size limits, we cannot recall in detail the construction of
the LTSs corresponding to the Fractal control operations; only the elements required for an independent reading of the paper are presented below. Given the functional behaviour of a primitive component, or of the sub-components of a composite, we extract from its architectural description the information required to generate LTSs encoding its control features (life-cycle and binding). The semantics of a component is then computed as the synchronised product of all those parts, and is named the component's controller automaton. The construction is done bottom-up through the hierarchy. At each level, i.e. for each composite, a deployment phase is applied. The deployment is a sequence of control operations, expressed by an automaton, ending with a distinguished successful action √. A successful deployment is verified by the reachability analysis of the √ action on the automaton obtained by the synchronisation product of the component's controller and its deployment. As in [2], we define the static automaton of a component as being the synchronisation product of the controller automaton with the deployment automaton, hiding control actions, forbidding any further reconfiguration, and minimised modulo weak bisimulation. When one is not interested in reconfigurations, the static automaton becomes the LTS encoding the behaviour of this sub-component at the next level of the hierarchy. Fig. 3 shows the controller for a Fractive component at any level of the hierarchy.
Fig. 3. Component behaviour model
In the figure, the behaviour of sub-components (i.e. their static LTS) is represented by the box named SubCk. For each interface defined in the component's ADL description, a box encoding the behaviour of its internal
Fig. 4. Internal interface box detail
(I P I and I RI) and external (E P I and E RI) views are incorporated. The treatment of Fractive method calls is encoded in the box named Interceptor, which we detail later. The dotted edges inside the boxes indicate a causality relation induced by the data flow through the box. The behaviour of the interfaces includes functional (method calls M(~x)) and non-functional (control) aspects, as well as the detection of errors (E1 and E2) such as the use of an unbound interface. These errors are made visible at the higher level of the hierarchy. For instance, Fig. 4 shows the details of I RI np, which includes the creation of an error event when a method is called on an unbound interface. Note that we put the external interface automaton of a component in the next level of the hierarchy. This enables us to calculate the controller automaton of a component before knowing its environment. Thus, all the properties not involving external interfaces can be verified in a fully compositional manner.
3.1 Modelling the Primitives
Figure 5 shows the principle of asynchronous communication between two Fractive primitive components.
Fig. 5. Communication between two Activities
In the model (Fig. 5), a method call to a remote activity goes through a proxy that locally creates a "future" object, while the request goes to the
remote request queue. The request arguments include a reference to the future, together with a deep copy of the method's arguments, because there is no sharing between remote activities. Later, the request may eventually be served, and its result value will be sent back to the future reference. The Body box in the figure is itself a synchronisation network made from the synchronisation product of the RunActive method's LTS with the behaviour of each method, as described in [1]. The Queue box, in addition to request reception, encodes the different primitives (used in the body code) provided in the ProActive API for serving the methods in the queue. In the model of a Fractive primitive component we enrich the controller of the active object by adding two extra boxes, LF and NewServe (which correspond to the Interceptor in Fig. 3), as shown in Fig. 6. The body box is the only part that is not generated automatically.
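The following sketch (ours, not the Fractive/ProActive implementation; class and method names are hypothetical, and distribution and the deep copy of arguments are elided) gives an operational intuition of this mechanism: a proxy-side future, a request queue served by a single thread, and wait-by-necessity on access.

import queue
import threading

class Future:
    """Placeholder for the result of an asynchronous request (wait-by-necessity)."""
    def __init__(self):
        self._ready = threading.Event()
        self._value = None

    def update(self, value):
        self._value = value
        self._ready.set()

    def get(self):
        self._ready.wait()   # block until the future has been updated
        return self._value

class ActiveObject:
    """Serves requests from its own queue in its single service thread."""
    def __init__(self, servant):
        self.servant = servant
        self.requests = queue.Queue()
        threading.Thread(target=self._run_active, daemon=True).start()

    def call(self, method, *args):
        fut = Future()                          # proxy side: create the future...
        self.requests.put((method, args, fut))  # ...and drop the request in the queue
        return fut                              # the caller continues immediately

    def _run_active(self):
        while True:
            method, args, fut = self.requests.get()             # FIFO service policy
            fut.update(getattr(self.servant, method)(*args))    # send back the result

# Hypothetical usage:
class Monitor:
    def read(self):
        return 42

mon = ActiveObject(Monitor())
f = mon.call("read")   # asynchronous call, returns a future
print(f.get())         # wait-by-necessity: blocks until the result is available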
Fig. 6. Behaviour model for a Fractive primitive
NewServe implements the treatment of control requests. "start" fires the RunActive method (transition) in the body. "stop" triggers the !stop synchronisation with the body (Fig. 6). This synchronisation should eventually lead to the termination of the RunActive method (!return synchronisation). In the Fractive implementation, this is done by setting the state variable isActive to false, which should eventually cause the RunActive method to finish; only then is the component considered to be stopped. The Queue box can perform three actions: (1) serve the first functional method corresponding to the Serve API primitive used in the body code, (2) serve a control method only at the head of the queue, and (3) serve only control methods in FIFO order, bypassing the functional ones.
3.2 Modelling the Composites
A composite membrane in Fractive is an active object. When started, it serves functional or control methods in FIFO order, forwarding method calls between internal and external functional interfaces. When stopped it serves
only control requests.
Fig. 7. Behaviour of a composite membrane
The membrane active object is created based on the composite description (given by the ADL). This membrane corresponds to the Interceptor box in Fig. 3. Note that the future references (proxy box in Fig. 7) are updated in a chain following the membranes from the primitive serving the method to the caller primitive. Since the method calls include the reference of the future in the arguments, future updates can be addressed directly to the caller immediately before in the chain. Consequently, like in the implementation, the future update is not affected because of rebinding or the life-cycle status of the components.
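As a schematic summary of this service policy (our sketch, not the Fractive code, which relies on ProActive's request queue and its serve primitives; all names are hypothetical), a composite membrane can be pictured as follows: FIFO service when started, control requests only when stopped, and forwarding of functional calls along the current bindings.

from collections import deque

class Membrane:
    """Request-serving policy of a composite membrane: FIFO when started,
    control requests only when stopped; functional calls are forwarded."""

    CONTROL = {"start", "stop", "bind", "unbind"}

    def __init__(self):
        self.queue = deque()
        self.started = False
        self.bindings = {}          # interface name -> target callable

    def request(self, name, *args):
        self.queue.append((name, args))

    def serve_one(self):
        if not self.queue:
            return None
        if self.started:
            name, args = self.queue.popleft()          # plain FIFO service
        else:
            # when stopped, serve only control requests, bypassing functional ones
            for i, (name, args) in enumerate(self.queue):
                if name in self.CONTROL:
                    del self.queue[i]
                    break
            else:
                return None                            # only functional requests pending
        return self._serve(name, args)

    def _serve(self, name, args):
        if name == "start":
            self.started = True
        elif name == "stop":
            self.started = False
        elif name == "bind":
            itf, target = args
            self.bindings[itf] = target
        elif name == "unbind":
            self.bindings.pop(args[0], None)
        else:
            # functional request: forward to the interface it is bound to
            return self.bindings[name](*args)
        return name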
3.3 Building the Global Behaviour
The next step is to build a global model for the component. This "global" behaviour construction is compositional in the sense that each level of hierarchy can be studied independently, relying on some abstraction of the sub-components' behaviours. In practice, the abstract model of a subcomponent can be defined by its formal specification, or computed recursively from analysis of its ADL and its code. As in our previous work [1,2], we build finite abstractions of our models using finite instantiations of the data values of parameters, before computing any synchronous product. Whenever the checking tools allow it, this instantiation and the corresponding state space generation is done on the fly during the proof. This data instantiation is interpreted as a partition of the data domains and induces an abstract interpretation of the parameterized LTS. The instantiation will also be chosen with respect to the values occurring in the properties we are interested in.
4 The User View
The models for the non-functional aspects described in this paper are built automatically. The user only has to provide the architecture through the Fractal ADL and the functional behaviour of the primitive components.
4.1 Looking at one Example
We come back to our example from Figure 1. It shows, as a hierarchical component system, the classical problem of a bounded buffer with one consumer and one producer. The consumer consumes one element at a time while the producer may feed the buffer with an arbitrary quantity of elements in one action. Additionally, the buffer emits an alarm through its interface Ialarm when the buffer is full. The user may describe the system topology using the Fractal Architecture Definition Language (ADL). Fractive uses the default concrete syntax for this ADL, based on XML. The XML file describing System (System.fractal) is shown in Fig. 8.
Buffer.lotos
process BUFFER[NOACTIVE, SERVE_GET, GET_REP, PUT, ALARM](stock:Nat, bound:Nat): exit :=
  PUT ?X:Nat [X + stock <= bound];
    BUFFER[NOACTIVE, SERVE_GET, GET_REP, PUT, ALARM](stock + X, bound)
  [] [stock > 0] -> SERVE_GET ?C:Cons; GET_REP !C;
    BUFFER[NOACTIVE, SERVE_GET, GET_REP, PUT, ALARM](stock - 1, bound)
  [] [stock == bound] -> ALARM;
    BUFFER[NOACTIVE, SERVE_GET, GET_REP, PUT, ALARM](stock, bound)
  [] NOACTIVE; exit
endproc
Fig. 9. Buffer behaviour
Fig. 8. System ADL
The XML description shown in Fig. 8 specifies that the system is composed of the composite BufferSystem (line 6), itself described in a separate file (components/BufferSystem.fractal), and the primitive Alarm, whose implementation is the Java class components.Alarm (line 15). BufferSystem receives as construction parameter the maximal size of the buffer (3 in our example, line 7) and requires an interface named alarm of type components.AlarmInterface (lines 8, 9). Alarm provides an interface alarm of type components.AlarmInterface (lines 13, 14). The behaviour tag (line 16) points to a file containing the behaviour of alarm in LTS form. Finally, at lines 21, 22, the ADL defines that upon deployment, the interface alarm of BufferSystem should be bound to the interface alarm of Alarm.
4.2 Automatic Construction
Our set of tools includes:
• A tool, described in [2], that hierarchically builds the behaviour model of a component system. At each level of the component hierarchy, it builds the automata describing life-cycle and binding behaviour. We are now working to add the new Fractive elements to this tool, namely the automata encoding the request queue, the proxies for future responses, the NewServe policy for primitives and the RunActive policy for composites, as described in Section 3. This tool produces networks of parameterized automata in Parameterized FC2 format.
• A tool named Fc2Parameterized, described in [1], producing a finite instantiation of the system from a finite abstract domain for each parameter. These values may in some cases be taken from the system description, as the buffer capacity set to 3 in the ADL, or deduced from the significant values occurring in the properties. For parameters whose types are simple (see [1]) these abstractions are abstract interpretations in the sense of [9].
• A tool analysing the ADL. It generates the structure of the component hierarchy and the synchronisation networks for combining the various parts at each level of the system.
• Interface tools with the CADP tool-set [10], at the level of LTSs and of synchronisation networks. We then make heavy use of the CADP tools (distributed state space generator, bisimulation minimiser, on-the-fly model-checker).
The lengths of Fractive request queues are unbounded, and their abstraction must be chosen carefully. The choice of the queue depth is critical w.r.t. the size of the generated state space: considering request queues of size 3, we were only able to generate the state space of BufferSystem (approx. 191M states and 1,498M transitions) on a cluster composed of 24 bi-processor nodes using the distributed model generation tool distributor from the CADP tool-set. For the complete system we did not even try to generate the complete automaton. Staying in the context of explicit-state tools, we use a better approach: we define the set of control actions (whether in deployment or reconfigurations) involved in each specific property we want to prove. Then we forbid any other control actions for the model, and we also use this set to determine an approximation of the length of the queues. Given those parameters, we build all basic automata, hide any action not involved in the properties, and reduce the basic automata w.r.t. weak bisimulation. Last, we compute the products of the reduced automata, using the on-the-fly verification feature of CADP. This approach has enabled us to verify all properties listed in the next section on a simple desktop machine (CPU Pentium 3GHz, RAM 1.5 GB). A potential gain would be to use partial orders or symmetry based state-space representation, especially for the request queue structures, and
depending on the commutativity properties of the service policy.
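To fix intuitions about the products these tools compute, here is a minimal sketch (ours; real FC2/CADP synchronisation networks use explicit synchronisation vectors, hiding and data parameters, which are not modelled here) of a product in which two LTSs synchronise on their shared labels and interleave on the others.

def product(lts1, lts2, init1, init2):
    """Each LTS is a dict: state -> set of (label, next_state) pairs."""
    labels1 = {l for trs in lts1.values() for l, _ in trs}
    labels2 = {l for trs in lts2.values() for l, _ in trs}
    shared = labels1 & labels2

    prod, todo = {}, [(init1, init2)]
    while todo:
        s1, s2 = state = todo.pop()
        if state in prod:
            continue
        succs = set()
        for l, t1 in lts1.get(s1, ()):
            if l in shared:
                # shared labels must be performed jointly
                succs |= {(l, (t1, t2)) for m, t2 in lts2.get(s2, ()) if m == l}
            else:
                succs.add((l, (t1, s2)))               # independent move of lts1
        for l, t2 in lts2.get(s2, ()):
            if l not in shared:
                succs.add((l, (s1, t2)))               # independent move of lts2
        prod[state] = succs
        todo.extend(t for _, t in succs)
    return prod

# Hypothetical toy use: a one-slot buffer composed with a consumer synchronising on "get".
buffer = {"empty": {("put", "full")}, "full": {("get", "empty")}}
consumer = {"idle": {("get", "busy")}, "busy": {("work", "idle")}}
print(len(product(buffer, consumer, "empty", "idle")))  # number of reachable product states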
5 Properties
The preceding sections focused on building the correct models and not on expressing properties. This section presents some properties to illustrate the verification power of our approach. We use regular alternation-free µ-calculus [11] because of its rich expressiveness and because it is the default way to express properties in the model-checker we use (from the CADP tool-set). Regular alternation-free µ-calculus is an extension of the alternation-free fragment of the modal µ-calculus with action predicates and regular expressions over action sequences. It allows direct encodings of "pure" branching-time logics like CTL or ACTL, as well as of regular logics like PDL. Moreover, it has an efficient model checking algorithm, linear in the size of the formula and the size of the LTS model.
5.1 Deployment
In Section 3 we defined the deployment automaton, which describes the control steps required for setting the system elements and bindings, and starting all components. For synchronous components [2], the static automaton represents the normal behaviour of the component after deployment. In Fractive, however, method calls are asynchronous, and there may be delays between the request for a control method and its treatment. So checking the execution of a control operation must be based on the observation of its application on the component, rather than the arrival of the request:
• The actions Sig(bind(intf1,intf2)) and Sig(unbind(intf1,intf2)) encode when a binding between the interfaces intf1 and intf2 is effective. They correspond for instance to the synchronisations !bind/unbind(E RI nr, Iext) or !bind/unbind(I RI np, SubC k.E P I scnp) in Fig. 3.
• The actions Sig(start(name)) and Sig(stop(name)) encode when the component name is effectively started/stopped. They correspond to the synchronisations !start/stop in Fig. 3.
One of the interesting properties is that the hierarchical start operation effectively occurs during the deployment; i.e. that the component and all its sub-components are at some point started. This property can be expressed as the (inevitable) reachability of Sig(start(name)) in the static automaton of System, for all the possible executions, where name = {System, BufferSystem, Alarm, Buffer, Consumer, Producer}. We leave the actions Sig(start(name)) observable in the static automaton and we express this reachability property as the following regular µ-calculus formula, verified in our example:

[ true*.Sig(start(System)) ] true ∧ [ true*.Sig(start(BufferSystem)) ] true ∧
[ true*.Sig(start(Alarm)) ] true ∧ [ true*.Sig(start(Buffer)) ] true ∧
[ true*.Sig(start(Consumer)) ] true ∧ [ true*.Sig(start(Producer)) ] true    (1)
5.2 Pure-Functional Properties
Most of the interesting properties concern the behaviour of the system after its deployment, at least while there are no reconfigurations. For instance, in the example, we would like to prove that a request for an element from the queue is eventually served, i.e. that the element is eventually obtained. If the action of requesting an element is labelled as get req() and the answer to this request as get rep(), then this inevitability property is expressed as the following µ-calculus formula, which is also verified by the static automaton of the example:

[ true*.get req() ] µ X. ( < true > true ∧ [ ¬ get rep() ] X )    (3)
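For readers less used to the regular µ-calculus, formula (3) can be read (our paraphrase, not a formula taken from the paper) as the classical response pattern: after every occurrence of get req(), every execution must eventually perform get rep(), with no possibility of deadlocking before the response. In branching-time style this is roughly:

\mathrm{AG}\,\big(\mathit{get\_req} \;\Rightarrow\; \mathrm{AF}\;\mathit{get\_rep}\big)
\;\;\approx\;\;
[\,\mathrm{true}^{*}.\,\mathit{get\_req}\,]\;\mu X.\,\big(\langle\mathrm{true}\rangle\,\mathrm{true}\;\wedge\;[\,\neg\,\mathit{get\_rep}\,]\,X\big)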
5.3 Functional Properties Under Reconfigurations
The approach described in this paper makes it possible to verify properties not only after a correct deployment, but also after and during reconfigurations. For instance, property (3) becomes false if we stop the producer, since at some point the buffer will be empty, and the consumer will be blocked waiting for an element. However, if the producer is restarted, the consumer will eventually receive an element and the property should become true again. In other words, we can check that, if the consumer requests an element, and then the producer is stopped, then if eventually the producer is started again, the consumer will get the element requested. For proving this kind of property the static automaton is not sufficient; we need a behavioural model containing the required reconfiguration operations. We add to the component network a reconfiguration controller (Fig. 10): its start state corresponds to the deployment phase, and the next state corresponds to the rest of the life of the component, where reconfiguration operations are enabled but are no longer synchronised with the deployment. This state change is fired by the successful termination of the deployment (√).
Fig. 10. Synchronisation product supporting further reconfigurations
For the property stated above, the reconfigurations ?stop(Producer) and ?start(Producer) are left visible, and this property is expressed by the µ-calculus formula below, which is also ensured in our example:

(* If a request from the consumer is done before reconfiguration *)
[ (¬ (?stop(Producer) ∨ ?start(Producer)))*.get req() ] (
  (* a response is given before stopping the producer *)
  µ X . ( < ¬ ?stop(Producer) > true ∧ [ ¬ (get rep() ∨ ?stop(Producer)) ] X )
  ∨ (* or given after restarting the producer and without stopping it again *)
  [ true* . ?start(Producer) ] µ X . ( < true > true ∧ [ ¬ (get rep() ∨ ?stop(Producer)) ] X ) )    (4)
5.4 Asynchronous Behaviour Properties
Let us now focus on a property specific to the asynchronous aspect of the component model. The communication mechanism in Fractive allows any future, once obtained, to be updated with the associated value, provided that the corresponding method is served and terminates correctly; bind, unbind or stop operations cannot prevent this. For example, if the consumer is unbound after a request, it gets the response anyway, even if the link is then unbound or the component stopped. Using the approach for reconfigurations described above, enabling ?unbind(buffer,Buffer.get) and ?stop(Consumer), the property can be expressed as follows. This property is verified in the example:

[ true*.get req() ] µ X. ( < true > true ∧ [ ¬ get rep() ] X )    (5)

6 Conclusion
This paper provides methods and tools to build the specification of distributed hierarchical components, in a hierarchical bottom-up fashion. Our approach relies on the definition of a synchronisation network of LTSs, each LTS expressing a different aspect of the component behaviour. The functional behaviour of primitive components is given by the user, either with a specification language or obtained by source code analysis. The non-functional behaviours are automatically incorporated based on the component description. The main contributions of this paper are:
• We define a general synchronisation network modelling the functional and control behaviour at any hierarchical level of a component system.
• We incorporate the Fractive component features by automatically adding automata encoding the queues, future responses and serving policies depending on the life-cycle status.
• Finally we prove a set of properties on our example, some of them very generic, others written as temporal logic requirements by the user. Those properties concern various life phases of the component, and involve the asynchronous aspects of Fractive components.
The model is automatically built from the functional behaviour of primitives and the component system description (in an Architecture Description Language). We have illustrated our approach with a guided example. A detailed description is available in [3]. Finally, many approaches are being developed to cover the right composition of components considering their functional aspects. From the user point
of view, one of the strongest advantages of components is the separation of concerns. However, when it comes to behavioural verification, one still needs to take into account the interplay between functional and non-functional aspects, at least for existing component models. The main originality of this paper is to encode the deployment and reconfigurations as part of the behaviour, and thus verify the behaviour of the whole system of components. Moreover, our method allows one to reason about reconfiguration not only at design time, but also dynamically during the execution of the system, as a safety proof before changing a sub-component. This paper provides a big step towards a concrete and strongly usable tool-set. This tool-set builds the models automatically and gives feedback about generic properties and error detection. The user may also define and verify further properties, or use the generated models to check against a specification. Further developments are necessary to take into account other important features of distributed systems, in particular the management of exceptions, and the mechanisms for group communication, very important in computing Grids. We also want to adopt more efficient techniques for the analysis and model-checking tools, improving our use of on-the-fly methods, and looking at specific, more compact representations of state-spaces.
References
[1] T. Barros, R. Boulifa, and E. Madelaine. Parameterized models for distributed Java objects. In Forte'04 conference, Madrid, 2004. LNCS 3235, Springer Verlag.
[2] T. Barros, L. Henrio, and E. Madelaine. Behavioural models for hierarchical components. In SPIN'05 Workshop, San Francisco. LNCS 3639, Springer Verlag.
[3] Tomás Barros. Formal Specification and Verification of Distributed Component Systems. PhD thesis, INRIA/I3S/CNRS, November 2005, http://www-sop.inria.fr/oasis/Vercors
[4] F. Baude, D. Caromel, and M. Morel. From distributed objects to hierarchical grid components. In International Symposium on Distributed Objects and Applications (DOA), Catania, Italy, 2003. LNCS, Springer.
[5] E. Bruneton, T. Coupaye, and J. Stefani. Recursive and dynamic software composition with sharing. 7th ECOOP Int. Workshop on Component-Oriented Programming (WCOP'02), June 2002.
[6] C. Carrez, A. Fantechi and E. Najm. Behavioural contracts for a sound assembly of components. In proceedings of FORTE'03, volume LNCS 2767. Springer-Verlag, 2003.
[7] D. Caromel, L. Henrio, and B. Serpette. Asynchronous and deterministic objects. In 31st ACM Symp. on Principles of Programming Languages. ACM Press, 2004.
[8] D. Caromel, W. Klauser, and J. Vayssi`ere. Towards seamless computing and metacomputing in Java. Concurrency Practice and Experience, 10(11– 13):1043–1061, Nov. 1998. [9] R. Cleaveland and J. Riely. Testing-based abstractions for value-passing systems. In CONCUR’94. LNCS 836, Springer, 1994. [10] H. Garavel, F. Lang, and R. Mateescu. An overview of CADP 2001. European Association for Software Science and Technology Newsletter, 4:13–24, aug 2002. [11] R. Mateescu and M. Sighireanu. Efficient on-the-fly model-checking for regular alternation-free mu-calculus. In S. Gnesi et al, editor, Proceedings of FMICS’2000, GMD Report 91, pages 65–86, Berlin, April 2000. [12] F. Plasil and S. Visnovsky. Behavior protocols for software components. IEEE Transactions on Software Engineering, 28(11), nov 2002.
FACS 2005
Proving Component Interoperability with B Refinement
Samir Chouali
1
LORIA - University Nancy 2, Campus Scientifique BP 239, F 54506 Vandoeuvre-lès-Nancy Cedex, France
Maritta Heisel
2
Universität Duisburg-Essen, Fachbereich Ingenieurwissenschaften, Institut für Medientechnik und Software Engineering, D-47048 Duisburg, Germany
Jeanine Souquières
3
LORIA - University Nancy 2, Campus Scientifique BP 239, F 54506 Vandoeuvre-lès-Nancy Cedex, France
Abstract
We use the formal method B for specifying interfaces of software components. Each component interface is equipped with a suitable data model defining all types occurring in the signature of interface operations. Moreover, pre- and postconditions have to be given for all interface operations. The interoperability between two components is proved by using a refinement relation between an adaptation of the interface specifications.
Key words: Component interoperability, B method.
1 Introduction
In recent years, the paradigm of component orientation [9,19] has become more and more important in software engineering. Its underlying idea is to develop software systems not from scratch but by assembling pre-fabricated parts, as is common in other engineering disciplines. Component orientation has emerged from object orientation, but the units of deployment are usually more complex than simple objects. As in object orientation, components are 1 2 3
Email:
[email protected] Email:
[email protected] Email:
[email protected] This paper is electronically published in Electronic Notes in Theoretical Computer Science URL: www.elsevier.nl/locate/entcs
Chouali, Heisel and Souqui` eres
encapsulated, and their services are accessible only via interfaces and their operations. In order to really exploit the idea of component orientation, it must be possible to acquire components developed by third parties and assemble them in such a way that the desired behavior of the software system to be implemented is achieved. This approach leads to the following requirements: (i) The description (i.e., specification) of a component must contain sufficient information to decide whether or not to acquire it for integration in a new software system. First, this requirement concerns the access to the component’s source code that may not be granted in order to protect the component producer’s interests. Moreover, component consumers should not be obliged to read the source code of a component to decide if it is useful for their purposes or not. Hence, the source code should not be considered to belong to the component specification. Second, it does not suffice to describe the interfaces offered by a component (called provided interfaces in the following). Often, components need other components to provide their full functionality. Hence, also the required interfaces must be part of a component specification. (ii) For different components to interoperate, they must agree on the format of the data to be exchanged between them. Hence, each interface of a component must be equipped with a data model that describes the format of the data accepted and produced by the component. It does not suffice to give only the signature of interface operations (e.g., operation foo takes two integers and yields an integer as its result) as is common in current interface description languages. It is also necessary to describe what effect an interface operation has (e.g., operation foo takes two integers and yields their sum as a result). In order to fulfill the above requirements, a component interface specification must contain the following information: •
•
a data model associated with each required and provided interface of a component (interface data model ), pre- and postconditions for each interface operation, such that design by contract [13] becomes possible.
We use UML class diagrams [4] to express the interface data model and the formal notation B [1]. Based on these ingredients, we prove the interoperability between two components by using a refinement relation between an adaptation of their interface specifications. Part of this notion of interoperability between component interfaces is based on a specification matching approach [24]. We chose to use the B method because its underlying concepts of machine and refinement fit well with components and their interoperability, and because the method is equipped with powerful tool support. Thus, we can exploit existing technology for proving component interoperability. Using for example the object constraint language OCL and generating verification conditions from scratch would be much more tedious. Note that our approach takes into account only the functional aspects of components. Non-functional aspects such as security and performance are of course also important, and we aim to treat these issues in future work. 68
The rest of the paper is organized as follows: in Section 2, we discuss related work. Then, we present an overview of the B method in Section 3. We introduce the specification of component interfaces in Section 4. The notion of interoperability between two components is defined in Section 5 with its verification using the notion of refinement as it is defined for B. The case study of a hotel reservation system serves to illustrate our approach. The paper finishes with some concluding remarks in Section 6.
2 Related Work
In an earlier paper, we have investigated the role of component models in component specification [10]. The specification of a component model makes it possible to obtain more concise specifications of individual components, because these may refer to the specification of the component model. The component model specification need not be repeated for each individual component adhering to the component model in question. In this paper, we investigate the necessary ingredients a component specification must have in order to be useful for assembly of a software system out of components. These ingredients are independent of concrete component models. Several proposals for component specification have already been made. They have in common that they have no counterpart of our interface data model and that they do not consider interoperability issues, but only the specification of single components. A working group of the German “Gesellschaft f¨ ur Informatik” (GI) has defined a specification structure for business components [20]. That structure comprises seven levels, namely marketing, task, terminology, quality, coordination, behavioral, and interface. Our specification structure covers the layers terminology, coordination, behavioral, and interface by proposing concrete ways of specifying each of those levels. The other layers of the GI proposal have to do with non-functional aspects of components. Beugnard et al. [3] propose to define contracts for components. They distinguish four levels of contracts: syntactic, behavioral, synchronization, and quality of service. The syntactic level specifies only the operation signatures, the behavioral level contains pre- and postconditions, the synchronization level corresponds to usage protocols, and the quality of service level deals with nonfunctional aspects. Beugnard et al. do not introduce data models for their interfaces. It cannot easily be checked if two components can be combined. The component specification approach of Lau and Ornaghi [11] is closer to ours, because there, each component has a context that corresponds to our interface data model. A context is an algebraic specification, consisting of a signature, axioms, and constraints. In contrast, we deem it more appropriate to allow for an object-oriented specification of the data model of a component interface. This makes it possible to take side effects of operations into account and to use inheritance, concepts that are frequently used in practice. Cheesman and Daniels [6] propose a process to specify component-based software. This process starts with an informal requirements description and produces an architecture showing the components to be developed or reused, 69
their interfaces and their dependencies. For each interface operation, a specification is developed, consisting of a precondition, a postcondition and possibly an invariant. This approach follows the principle of design by contract [13]. Our specification of component interfaces is inspired by Cheesman and Daniels' work because that work clearly shows that a data model is necessary for each interface. However, Cheesman and Daniels do not consider the case that already existing components with possibly different data models have to be combined, and hence they do not define a notion of interoperability.

Canal et al. [5] use a subset of the polyadic π-calculus to deal with component interoperability only at the protocol level. The π-calculus is well suited for describing component interactions. The limitation of this approach is the low-level description style of the language used and its minimalistic semantics. Bastide et al. [2] use Petri nets to specify the behavior of CORBA objects, including operation semantics and protocols. The difference from our approach is that we also take the invariants of the interface specifications into account.

Zaremski and Wing [24] propose an interesting approach to compare two software components. It is determined whether one component can be substituted for another. They use formal specifications to model the behavior of components and the Larch prover to prove the specification matching of components.

Others [8,21] have also proposed to enrich component interface specifications by providing information at the signature, semantic and protocol levels. Despite these enhancements, we believe that, in addition, a data model is necessary to perform a formal verification of interface compatibility. The idea to define component interfaces using B has been introduced in an earlier paper [7].
3 The B method
The B method [1] is a formal software development approach for developing software for critical systems. It covers the entire development process from an abstract specification to an implementation. Its basis is set theory. The basic building block is the abstract machine, which is similar to a module or a class in an object-oriented development. A B specification consists of one or several abstract machines (examples of B machines are given in Section 4). Each of them describes a set of variables, invariance properties (also called safety properties) referring to these variables, an initialization, which is a predicate initializing the variables, and a list of operations. The specification of an operation consists of a precondition part and a body part. The precondition expresses the requirement that must be met whenever the operation is called. The body expresses the effect of the operation. The states of a specified system can only be modified by operations, which must preserve its invariant. A B operation OP is defined as OP =def PRE P THEN S END, where P is a precondition and S is the body part, expressed as a generalized substitution. S may for example take the following shapes:
• assignment statement: S =def x := E, where x is a variable and E is an expression,
• multiple assignment: S =def x, . . . , y := E, . . . , F,
• IF statement: S =def IF P′ THEN S′ ELSE T′ END, where P′ is a predicate, and S′ and T′ are substitutions.
The formula [S]Post (where S is a substitution and Post is a predicate) is called the weakest precondition for S to achieve Post. It denotes the predicate which is true for any initial state from which the execution of S is guaranteed to achieve Post. The B method provides structuring primitives that allow one to compose machines in various ways. Large systems can be specified in a modular way and in an object-based manner [14,12]. A system is developed by refinement, which transforms an abstract specification step by step into more concrete ones. For each refinement step, we have to prove that the refined specification is correct with respect to the more abstract specification. In the end, we arrive at an implementation that refines its abstract specification. Verification can be done with the B theorem prover, Atelier B [18].
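To make the weakest-precondition notation concrete, the standard rules for the substitution shapes listed above are as follows (these are the usual B rules, recalled here for convenience rather than taken from this paper):

  [x := E]Post = Post[E/x]   (Post with every free occurrence of x replaced by E)
  [IF P′ THEN S′ ELSE T′ END]Post = (P′ ⇒ [S′]Post) ∧ (¬P′ ⇒ [T′]Post)
  [PRE P THEN S END]Post = P ∧ [S]Post

For instance, [timer := 0](timer ≥ 0) reduces to 0 ≥ 0, which holds in every state.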
4 Specification of component interfaces
Our goal is to propose a way of specifying components as black boxes, so that component consumers can deploy them without knowing their internal details. Hence, component interface specifications play an important role, as interfaces are the only access points to a component.

4.1 Definition

A component specification must contain all information necessary to decide whether the component can be used in a given context or not. This concerns the data used by the component as well as its behavior visible to its environment. This behavior is realized by services which can be used by other components or software systems. These services are collected in provided interfaces. However, in many cases, a component depends on services offered by other components. In this case, the component can work correctly only in the presence of other components offering the required services. The services required by a component are collected in required interfaces. Required interfaces are an important part of a component specification, because without knowing which other components must be acquired in addition, it is impossible to use the component in a component-based system.

An interface specification consists of the following parts:

(1) The specification of its interface data model, which specifies (i) the types used in the interface, (ii) a data state as far as necessary to express the effects of operations, and (iii) invariants on that data state. In the following, we use UML class diagrams [4] to express the data model for reasons of readability.
Fig. 1. Component architecture of the hotel reservation system (components ReservationSystem and HotelMgr; interfaces IMakeReservation, ITakeUpReservation, IHotelHandling, ICustomerHandling, IHotelMgt)
Fig. 2. Interface data model of IHotelHandling (classes ReservationDetails, HotelDetails, Customer, Reservation, Hotel and Room, with an optional allocation association between Reservation and Room)
This class diagram is then automatically transformed into a B specification [14]. Other languages, such as Object-Z [17], are also suitable for specifying the interface data model (see [10]).

(2) A set of operation specifications. An operation specification consists of its signature (i.e., the types of its input and output parameters), its precondition expressing under which circumstances the operation may be invoked, and its postcondition expressing the effect of the operation. Both pre- and postcondition refer to the interface data model.

For each component interface, a B machine is defined that contains specifications of the interface data model and of the operations.

4.2 Case study

We illustrate our approach by considering a hotel reservation system, a variant of the case study used by Cheesman and Daniels [6]. The architecture of the global reservation system using components is described in Fig. 1 using UML 2 notation [15]. It has two provided interfaces, IMakeReservation and ITakeUpReservation, and two required interfaces, IHotelHandling and ICustomerHandling. One of the used components is HotelMgr with its export interface IHotelMgt. In the following, we will consider the interfaces IHotelHandling and IHotelMgt in more detail in order to prove that the component HotelMgr with its interface IHotelMgt satisfies the needs of the interface IHotelHandling.

4.2.1 Specification of the interface IHotelHandling

Figure 2 shows the interface data model, expressed as a class diagram. The corresponding B specification is obtained by systematic transformation
rules applied to the UML class diagram in the following way. Since in B all variables must have different names, we use the naming convention that all variable names are prefixed by an abbreviation of the name of the class they belong to. For example, the attribute hotel of the class ReservationDetails becomes the variable RD_hotel in the B machine IHotelHandling.

Classes. As we can see in Fig. 3, the classes of the interface data model and the types of their attributes are represented as sets. Attributes are defined as variables which are functions. The sets of objects that exist in the system, such as cust, res, hotels and rooms, are also defined as variables. For example, cust is declared to be a subset of the set Customer.

Associations between classes. They are specified as variables whose type is a function or relation (depending on the multiplicities of the association) between the sets that model the associated classes. Figure 4 shows the B specification of the associations between the classes Reservation and Customer, Reservation and Hotel, and Reservation and Room.

Integrity constraints. They are specified as predicates in the INVARIANT clause of the B machine. For example, the constraint which expresses that a reservation is claimed if and only if a room is allocated to it is expressed as:

  ∀(re).((re ∈ res) ⇒ ((RES_claimed(re) = TRUE) ⇔ (re ∈ dom(assoc_Allocation))))
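As a concrete illustration of the association rule (the variable names below are ours and need not coincide with those produced by the transformation), the *-to-1 association between Reservation and Hotel and the optional allocation between Reservation and Room could be rendered as

  assoc_ResHotel ∈ res → hotels      (a total function: each existing reservation refers to exactly one existing hotel)
  assoc_Allocation ∈ res ⇸ rooms     (a partial function: not every reservation has a room allocated)

where ⇸ denotes B's partial function arrow (written +-> in ASCII B syntax).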
MACHINE IHotelHandling
SETS
  ReservationDetails; HotelId; DateRange; HotelDetails; Customer; CustID;
  Reservation; Hotel; Room; Currency
VARIABLES
  RD_hotel, RD_dates, HD_id, HD_name, C_id, cust, RES_resRef, RES_dates,
  RES_claimed, RES_number, Hotel_id, Hotel_name, hotels, res, R_number,
  R_available, R_stayPrice, rooms
INVARIANT
  /* class ReservationDetails */
  RD_hotel ∈ ReservationDetails → HotelId ∧
  RD_dates ∈ ReservationDetails → DateRange ∧
  /* class HotelDetails */
  HD_id ∈ HotelDetails → HotelId ∧
  HD_name ∈ HotelDetails → STRING ∧
  /* class Reservation */
  RES_resRef ∈ Reservation → INTEGER ∧
  RES_dates ∈ Reservation → DateRange ∧
  RES_claimed ∈ Reservation → BOOL ∧
  RES_number ∈ INTEGER ∧
  /* class Hotel */
  Hotel_id ∈ Hotel → HotelId ∧
  Hotel_name ∈ Hotel → STRING ∧
  /* state of the system */
  cust ⊆ Customer ∧ ...
Fig. 1. Basic Functions Stop and Abort
location. To describe a basic function, we use the notation described in [5]. Figure 1 shows such a basic function Stop with input port But, output port Mot, variable timer, entry location start, and exit location end. Its behavior is described by a labeled transition from start to end with the structured label timer > 0 : But?Stp : Mot!Zr : timer := 0. The first part of the label, its pre-part, states that whenever the data condition timer > 0 is true and signal Stp is received via port But, the transition is enabled. The second part of the label, its post-part, states that whenever the transition is triggered, in the next state signal Zr is sent via output port Mot and the data condition timer = 0 is established. These parts correspond to the terms timer > 0 ∧ But = Stp and Mot′ = Zr ∧ timer′ = 0, with unprimed variables from Var for values prior to execution of the transition and primed variables from Var′ for values after its execution. The interface of Stop is defined by In = {But}, Out = {Mot}, its variables by Var = {timer} ∪ In ∪ Out, and its locations by Loc = {start, end}. Abstracting from a concrete graphical representation, a basic function is described as the structure (a, pre, post, b) with entry location a, exit location b, precondition pre over S, and postcondition post over S × S. 5 The behavior of Stop is the set consisting of all observations (start, t, end) such that t = before ◦ after is a sequence of two states before and after with before(timer) > 0 ∧ before(But) = Stp as well as after(timer) = 0 ∧ after(Mot) = Zr. Note that, as shown in the case of function Abort in Figure 1, a transition label may be underspecified, e.g., by leaving out the input condition and the post-condition.

2.2.3 Alternative Combination

Similar, e.g., to the Or-combination used in Statecharts [3], we use alternative combination to describe sequential behavior. The behavior of an alternative combination of two functions corresponds to the behavior of either function. Figure 2 shows the alternative combination Halt of functions Stop and Abort. It shares all the structural aspects of either function, and thus uses input port But, output port Mot, and variable timer. Furthermore, by means of the common entry location start, either Stop or Abort can be executed. Due to disjoint exit locations, Halt is either terminated through exit location end of Stop or through exit location exit of Abort.
pre and post are obtained from the corresponding terms by interpretation over V ar, and (V ar, V ar′ ), resp.
Fig. 2. Alternative Combination Halt and Simplified Representation
Fig. 3. Stop and Abort and their Simultaneous Combination
Formally, the alternative combination of two functions A and B results in a function described by A + B that

• uses the input and output ports as well as the variables of each function: In(A + B) = In(A) ∪ In(B), Out(A + B) = Out(A) ∪ Out(B), Var(A + B) = Var(A) ∪ Var(B),
• accesses their control locations: Loc(A + B) = Loc(A) ∪ Loc(B),
• exhibits the behavior of either function: (a, t, b) ∈ Obs(A + B) if (a, t ↑ Var(A), b) ∈ Obs(A) or (a, t ↑ Var(B), b) ∈ Obs(B); (a, t) ∈ Obs(A + B) if (a, t ↑ Var(A)) ∈ Obs(A) or (a, t ↑ Var(B)) ∈ Obs(B) (see the sketch below).
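As a purely illustrative aside (not part of the paper's formalism), the following minimal Haskell sketch models a function as a record of ports, variables, locations and an observation predicate, with t ↑ Var(A) rendered as a pointwise restriction of the state sequence; only terminating observations are modelled and the composability side conditions are not checked:

  import qualified Data.Map as Map
  import qualified Data.Set as Set

  type Var = String; type Loc = String; type Value = String
  type State = Map.Map Var Value

  -- A terminating observation: entry location, state sequence, exit location.
  type Observation = (Loc, [State], Loc)

  data Function = Function
    { inPorts  :: Set.Set Var
    , outPorts :: Set.Set Var
    , vars     :: Set.Set Var              -- includes inPorts and outPorts
    , locs     :: Set.Set Loc
    , obs      :: Observation -> Bool      -- characteristic predicate of Obs(A)
    }

  -- t restricted to a variable set V (the projection t ↑ V).
  restrict :: Set.Set Var -> [State] -> [State]
  restrict v = map (Map.filterWithKey (\k _ -> k `Set.member` v))

  -- Alternative combination A + B.
  alt :: Function -> Function -> Function
  alt a b = Function
    { inPorts  = inPorts a  `Set.union` inPorts b
    , outPorts = outPorts a `Set.union` outPorts b
    , vars     = vars a     `Set.union` vars b
    , locs     = locs a     `Set.union` locs b
    , obs      = \(l, t, l') -> obs a (l, restrict (vars a) t, l')
                             || obs b (l, restrict (vars b) t, l')
    }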
Intuitively, the combined function offers observations that can be entered and exited via one of its sub-functions. If the sub-functions share a common entry location, observations of either function starting at that entry location are possible; similarly, if they share a common exit location, observations ending at that common exit location are possible. To ensure a well-defined function, we require that two functions A and B satisfy the conditions In(A) ∩ Out(B) = ∅ and In(B) ∩ Out(A) = ∅ in order to be alternatively composable. Obviously, the functions A + B and B + A, A + A and A, as well as A + nil and A are each equivalent in the sense of having the same interface and behavior.

2.2.4 Simultaneous Composition

Besides alternative combination, functions can be combined using simultaneous combination, similar, e.g., to And-composition in Statecharts, to describe parallel execution. The behavior of a simultaneous combination of two functions
corresponds to the joint behavior of both functions. Figure 3 shows the simultaneous combination Hold of functions Stop and Abort. Its interface consists of the input ports In = {But, PosD} of Stop and Abort as well as the output port Out = {Mot}; its locations Loc = {start, end} are the shared locations of these functions; its variable Var = {timer} is the corresponding variable of Stop. Formally, the simultaneous combination of two functions A and B results in a function described by A | B that

• uses the input and output ports as well as the variables of each function: In(A | B) = (In(A) ∪ In(B))\Out(A | B), Out(A | B) = Out(A) ∪ Out(B), Var(A | B) = Var(A) ∪ Var(B),
• accesses their shared control locations: Loc(A | B) = Loc(A) = Loc(B),
• exhibits the combined behavior of both functions: (a, t, b) ∈ Obs(A | B) if (a, t ↑ Var(A), b) ∈ Obs(A) and (a, t ↑ Var(B), b) ∈ Obs(B); (a, t) ∈ Obs(A | B) if (a, t ↑ Var(A)) ∈ Obs(A) and (a, t ↑ Var(B)) ∈ Obs(B).
Intuitively, the combined function offers exactly those observations that are offered by both functions. To ensure a well-defined function, we require the condition Loc(A) = Loc(B) for functions A and B to be simultaneously composable. Note that, unless we require the standard interface constraint (Var(A)\In(A)) ∩ (Var(B)\In(B)) = ∅ imposed for the composition of components, the simultaneous combination of functions may result in output or variable conflicts, leading to the introduction of (additional) partiality in the behavior of the combined functions. Obviously, A | B and B | A as well as A | A and A are each equivalent in the sense of exhibiting the same interface and behavior.
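Continuing the illustrative sketch above, simultaneous combination differs only in conjoining the observation predicates and in removing the combined outputs from the inputs; the well-definedness condition Loc(A) = Loc(B) is again left unchecked:

  -- Simultaneous combination A | B (types as in the previous sketch).
  simul :: Function -> Function -> Function
  simul a b = Function
    { inPorts  = (inPorts a `Set.union` inPorts b) `Set.difference` outs
    , outPorts = outs
    , vars     = vars a `Set.union` vars b
    , locs     = locs a                    -- assumes Loc(A) = Loc(B)
    , obs      = \(l, t, l') -> obs a (l, restrict (vars a) t, l')
                             && obs b (l, restrict (vars b) t, l')
    }
    where outs = outPorts a `Set.union` outPorts b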
2.2.5 Hiding Locations

Hiding a location of a function renders the location inaccessible from the outside. At the same time, when reaching a hidden location, the function immediately continues its execution along an enabled transition linked to the hidden location. Formally, by hiding a location l from a function A we obtain a function described by A\l that

• uses the input and output ports and variables of A: In(A\l) = In(A), Out(A\l) = Out(A), Var(A\l) = Var(A),
• accesses the control locations of A excluding l: Loc(A\l) = Loc(A)\{l},
• exhibits the behavior of A if entered/exited through locations excluding l, continuing execution at l: (a, t1 ◦ . . . ◦ tn, b) ∈ Obs(A\l) if (a, t1, l), (l, tn, b) ∈ Obs(A) as well as (l, ti, l) ∈ Obs(A) for i = 2, . . . , n − 1; (a, t1 ◦ t2 ◦ . . .) ∈ Obs(A\l) if (a, t1, l) ∈ Obs(A) and (l, ti, l) ∈ Obs(A) for i > 1.
Obviously, (A\a)\b and (A\b)\a are equivalent in the sense of exhibiting the same interface and behavior. We write A\{a, b} for (A\a)\b.
2.2.6 Hiding Variables

Hiding a variable of a function renders the variable inaccessible from the outside. Formally, by hiding a variable v from a function A we obtain a function described by A\v that

• uses the input and output ports and variables of A excluding v: In(A\v) = In(A)\{v}, Out(A\v) = Out(A)\{v}, Var(A\v) = Var(A)\{v},
• accesses the control locations of A: Loc(A\v) = Loc(A),
• exhibits the behavior of A for arbitrary v: (a, t ↑ Var(A\v), b) ∈ Obs(A\v) if (a, t, b) ∈ Obs(A); (a, t ↑ Var(A\v)) ∈ Obs(A\v) if (a, t) ∈ Obs(A).
Obviously, (A\v)\w and (A\w)\v are equivalent in the sense of exhibiting the same interface and behavior. We write A\{v, w} for (A\v)\w.
3 Structure and Behavior
As mentioned in Section 1, the presented approach supports the constructive composition of complex functionality from simpler functions. Thus, it is necessary to support the composition on the descriptional as well as on the behavioral level. In the following we define criteria that ensure that a composition of functions maintains both the structure and the behavior as much as possible. For that purpose, we use a structural mapping between function descriptions, taking function (sub-)terms out of Fun to function (sub-)terms.

3.1 Maintaining Structure

Especially in the description of embedded functions, hierarchic descriptions play an important role: sub-functions are often identified with modes of operation; entering and leaving those modes corresponds to activating and terminating the associated functions. Thus, for the practical application during the explicit composition of functions, the hierarchical structure of these functions should be maintained. Therefore, we use the existence of a structural mapping between the description of the resulting composition and the descriptions of the composed functions as an additional constraint for the creation of such an explicit composition. While this approach also carries over to functions using simultaneous composition as a structural element, for reasons of brevity we focus here on alternative composition. To define such a structural constraint, we use the concept of structural integration of one description of a function into another.

Definition 3.1 (Structural Integration) The description of a function A is called structurally integrated within the description of a function C if a mapping f : Fun ∪ Loc → Fun ∪ Loc exists with

• f(B + D) = f(B) + f(D) for all function terms B and D,
• f(B\l) = f(B)\f(l) for all function terms B and all locations l,
• f(B\v) = f(B)\v for all function terms B and all variables v,
• f(a, pre, post, b) = (f(a), pre′, post′, f(b)) for basic functions (a, pre, post, b) and some pre′ and post′,

with A = f(C) using the equivalences of Subsection 2.2.3 and Loc(A) = f(Loc(C)). ◦

Intuitively, a description of a function A is structurally integrated within the description of a function C if the structure of A, in the form of hierarchy and composition, can be obtained by structural abstraction from the structure of C, i.e., by removing elements from C and reordering the remainder according to the equivalences. The Manual control function with the graphical representation shown in Figure 4 is structurally integrated into the description of function Window in Figure 6. The corresponding mapping is obtained by projecting all locations to their second part, e.g., hi × up to up; furthermore, the labels of basic functions are obtained by removing the parts related to PosU or PosD, e.g., :PosU?On,But?Stp:Mot!Zr is cut down to :But?Stp:Mot!Zr.
3.2 Maintaining Behavior

Obviously, maintaining the structure of its constituting functions is only one aspect when constructing an explicit description of the combination of the descriptions of two functions; furthermore, the behavior of each of the functions integrated in that combined description must be maintained in the behavior associated with the combined description. Here, we use a notion similar to the one introduced in [11]; it can also be checked mechanically, but in contrast to the former it offers better scalability for non-toy-size systems of functions, since it does not require an explicit construction of the complete state space of the system. To that end, we use a stronger version based on an abstracted version of the state space.

Definition 3.2 (Full Integration) The description of a function A is called fully integrated within the description of a function C if there exists a mapping f structurally integrating A into C and, additionally, for all pre1, . . . , pren as well as all post1, . . . , postn with f(ai, prei, posti, bi) = (a, pre, post, b) it holds that pre ∧ post ⇔ (pre1 ∧ post1) ∨ . . . ∨ (pren ∧ postn). ◦

Intuitively, full integration ensures that the elements of C removed during structural integration do not influence the behavior of A. Thus, e.g., the basic functions leading from hi × up to stop × idle with labels (PosD?On ∧ But?Stp, Mot!Zr) as well as (PosD?Of ∧ But?Stp, Mot!Zr) of function Window of Figure 6 are equivalent to the basic function leading from up to idle with label (But?Stp, Mot!Zr) in function Manual of Figure 4, since the corresponding signal is either On or Of.
4 Integrating Functions
As functions describe modules of behavior, their combination is the essential part of the design of a functional architecture; while alternative combination is used to model the activation/deactivation of functions, simultaneous combination is used to model concurrently active functions. As mentioned in Section 2, the simultaneous combination of functions does not correspond to the composition of components since

• functions may share variables including output ports, while components may only share input ports,
• and, as a result, combined functions may exhibit undefined behavior where their constituting sub-functions do not, e.g., due to output conflicts.
As introduced in [11], functions are consistent if no new partiality is introduced by their (simultaneous) combination. In the semantic setting introduced in Section 2, consistency can be defined as follows.

Definition 4.1 (Consistency) Functions A and B are called consistent if {(a, t ↑ In(X), b) | (a, t, b) ∈ Obs(X)} ⊆ {(a, t ↑ In(X), b) | (a, t, b) ∈ Obs(A | B)} and {(a, t ↑ In(X)) | (a, t) ∈ Obs(X)} ⊆ {(a, t ↑ In(X)) | (a, t) ∈ Obs(A | B)}, for X ∈ {A, B}. ◦

Due to the structural constraints imposed for the composition of components, components are consistent by construction. Therefore, when moving from the functional design phase to the architectural design phase,

• the synchronous combinations of functions not corresponding to architectural compositions must be substituted, and
• undefined behavior introduced by conflicts in the combination must be identified.
In the following subsection, we introduce a constructive approach to resolve a description constructed using synchronous combination while maintaining as much structure and behavior as possible, basically using the product construction for automata. Furthermore, we show how to identify possible conflicts that may cause additional undefined behavior.

4.1 Unfolding a Combination

Subsection 2.2.4 basically defines simultaneous composition as the product of the behaviors of the combined functions; similarly, approaches from [2] or [11] use the product construction on a (state-based) semantical model to support mechanical analysis of functional descriptions. In contrast, here we are rather interested in using a mechanism on the notational level to integrate descriptions of functions. Nevertheless, we use the product construction to construct an integrated, 'unfolded' version of simultaneously combined functions.
Fig. 4. Manual Control Function
Fig. 5. Position Control Function
To demonstrate the basic principles of function integration, we use a simple example from automotive chassis electronics; window control often depends on the class of car or on national regulations, and therefore its final functionality is often constructed from basic functions.

Figure 4 shows the control function Manual for manual control with sub-functions Up, Idle, and Down. Initially in Idle with stopped window movement Mot!Zr, upward and downward movement is initiated via the direction buttons But?Up and But?Dn, resulting in a corresponding Mot!Hi or Mot!Lo signal. The movement is maintained in functions Up and Down with pressed buttons, until deactivated via But?Stp. Figure 5 shows the control function Position for position control with sub-functions Hi, Stop, and Low. Initially in sub-function Stop with stopped window movement Mot!Zr, upward or downward movement causes a change to functions Hi or Low, with window movement checked for the absence of the end position via the signals PosU?Of and PosD?Of. Termination of the functions results in stopping the movement Mot!Zr.

While both functions Manual and Position cover one part of the functionality of controlling the window, we are interested in defining a common functionality for both aspects, corresponding to their simultaneous combination. As, however, both functions share Mot as a common output port, this combination must be adapted when moving from a function-based to a component-based architecture. Thus, we unfold these functions into a control function Window. As mentioned above, when unfolding a simultaneous combination of functions, we want to maintain both structure and behavior. In the following we show how the hierarchic product of two functions can be used to construct an unfolded description of those functions. For reasons of brevity, we use n-ary variants of the operators introduced in Subsection 2.2.
Fig. 6. Simplified Window Control Function
Definition 4.2 (Structural Unfolding) A description of a function F is called the structural unfolding of the descriptions of two functions F1 and F2 if there exists a mapping U : Fun × Fun → Fun with F = U(F1, F2) and 6

(i) U(F1,1 + . . . + F1,m, F2,1 + . . . + F2,n) = U(F1,1, F2,1) + . . . + U(F1,m, F2,n),
(ii) U(F1\{a1, . . . , am}, F2\{b1, . . . , bn}) = U(F1, F2)\{a1 × b1, . . . , am × bn} for locations a1, . . . , am and b1, . . . , bn,
(iii) U(F1\{v1, . . . , vm}, F2\{w1, . . . , wn}) = U(F1, F2)\{v1, . . . , vm, w1, . . . , wn} for variables v1, . . . , vm and w1, . . . , wn,
(iv) U((a1, pre1, post1, b1), (a2, pre2, post2, b2)) = (a1 × a2, pre1 ∧ pre2, post1 ∧ post2, b1 × b2),
(v) U(F1, F2) = nil, otherwise. ◦

Intuitively, the structural unfolding is obtained by constructing the product of the functions on each level of hierarchy (i, ii, iii), introducing product locations a × b for internal locations; basic functions are integrated by conjunction of their pre- and post-conditions (iv); incompatible levels are ignored (v).

As shown in the description of the unfolded function in Figure 6, we obtain products of sub-functions Hi×Up, Stop×Idle, and Low×Dwn, including their locations hi×up, stop×idle, and low×dwn. Additionally, we obtain products of basic functions, e.g., :PosU?Of,But?Up:Mot!Hi looping from hi×up as the product of :But?Up:Mot!Hi and :PosU?Of:Mot!Hi looping from up and hi, respectively.

By constructing the unfolded function Window, the hierarchic structure of Manual and Position was maintained according to Definition 2. However, besides maintaining the structure, the unfolded function must also correspond to the simultaneous combination of Manual and Position. By construction, if F1 and F2 are structurally integrated into the structural unfolding F, the overall behavior remains unchanged, i.e., Obs(F) = Obs(F1 | F2). As mentioned in Subsection 3.1, the corresponding mapping from the unfolded function to its constituting functions is obtained by projecting all product locations to their corresponding part, and furthermore removing those parts of the labels of basic functions that are added by the other function.

Thus, from a development point of view, by unfolding a simultaneous combination we can adapt functional descriptions that do not respect the
For sake of brevity, we assume a homogeneous form of hierarchies as well as uniqueness of hidden variables, leading to a shorter definition than in the more general case.
Fig. 7. Simplified Window Control Function with Resolved Conflicts
(structural) restrictions for component composition, without changing the overall behavior. As a result, unfolding helps to simplify the transition from the functional design to the component-based design in the development process.

Obviously, the construction of F leads to a functional description that can be considerably simplified: due to clause (iv) of Definition 2, the basic sub-functions of F are obtained by conjunction of the corresponding basic sub-functions of F1 and F2. Thus, e.g., combining the basic functions (dwn, (But?Dn, Mot!Lo), dwn) of function Manual and (hi, (PosU?Of, Mot!Hi), hi) of function Position results in (hi × dwn, (PosU?Of ∧ But?Dn, Mot!Lo ∧ Mot!Hi), hi × dwn). As Mot!Lo ∧ Mot!Hi requires that at port Mot the signals Hi and Lo are sent simultaneously, the combined basic function is not satisfiable, since Hi ≠ Lo. Therefore, this basic function does not contribute to the overall behavior of Window. To simplify the description of the unfolded function, sub-functions that do not contribute to the behavior of the system are removed from the description. By iteratively removing basic functions with unsatisfiable pre ∧ post, unreachable functions, or empty functions, a simplified version is obtained without changing the behavior, as already shown in Figure 6.

Note that here we only use a local criterion for the detection of conflicts: we analyze the satisfiability of a transition without considering the actual state space of the combined functions. Obviously, local satisfiability is a necessary prerequisite of global satisfiability; thus local unsatisfiability is a sufficient criterion for global unsatisfiability, and the simplification strategy is safe, but may miss unsatisfiable transitions.

4.2 Detecting Conflicts

As shown in the previous subsection, the construction of the product automaton in general leads to the introduction of non-executable transitions, which were removed from the description of the combined functions without changing the behavior. However, these conflicts may also be the cause of a lack of consistency of two combined functions, as described in Definition 1. Therefore, we are interested in the detection of those conflicts that do change the behavior of the combined system. To detect those conflicts, we make use of the mapping used in the previous
subsection to establish the structural integration of the constituting functions into the unfolded function, ensuring that the behavior of the unfolded function does indeed correspond to the behavior of the simultaneously combined functions. By checking that this mapping additionally establishes a full integration as defined in Definition 2, the absence of conflicts can be ensured. To that end, as illustrated in Subsection 3.2, we check whether the labels of the basic functions of Manual and Position are equivalent to their counterparts in Window defined by the mapping of the structural integration. When, e.g., relating the basic function leading from up to up with label (But?Up, Mot!Hi) in function Manual of Figure 4 to its counterpart leading from hi×up to hi×up with label (PosD?On ∧ But?Up, Mot!Hi) of the simplified function Window of Figure 6, a non-equivalence is detected. This is due to the conflict between (But?Up, Mot!Hi) in function Manual and (PosU?On, Mot!Zr) in function Position, leading to the elimination of the corresponding product function. By changing the design through adding corresponding new basic functions for these conflicts, a complete description of the Window control function can be obtained, as shown in Figure 7 (using new location names).
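To make the local conflict check tangible, here is a small standalone Haskell sketch of our own (a deliberate simplification of the paper's construction): a transition label is reduced to one required input signal and one produced output signal per port, the product of two basic functions conjoins the labels and forms product locations, and a label is locally unsatisfiable exactly when the same port is required to carry two different signals.

  import qualified Data.Map as Map

  type Port = String; type Signal = String; type Loc = String

  -- A basic function (a, pre, post, b), with pre and post reduced to a
  -- port-to-signal map (e.g. pre = [("But","Up")], post = [("Mot","Hi")]).
  data Basic = Basic { entry :: Loc
                     , pre   :: Map.Map Port Signal
                     , post  :: Map.Map Port Signal
                     , exit  :: Loc }
    deriving Show

  -- Conjunction of two port constraints; Nothing signals a conflict
  -- (the same port forced to carry two different signals).
  conj :: Map.Map Port Signal -> Map.Map Port Signal -> Maybe (Map.Map Port Signal)
  conj m1 m2
    | and [ v == v' | (p, v) <- Map.toList m1, Just v' <- [Map.lookup p m2] ]
                = Just (Map.union m1 m2)
    | otherwise = Nothing

  -- Product of two basic functions; Nothing means the result is locally
  -- unsatisfiable and would be removed during simplification.
  productBasic :: Basic -> Basic -> Maybe Basic
  productBasic f g = do
    p <- conj (pre f)  (pre g)
    q <- conj (post f) (post g)
    return (Basic (entry f ++ "x" ++ entry g) p q (exit f ++ "x" ++ exit g))

  -- Example: combining (dwn, But?Dn, Mot!Lo, dwn) with (hi, PosU?Of, Mot!Hi, hi)
  -- yields Nothing, because Mot!Lo and Mot!Hi conflict on port Mot.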
4.3 Establishing Completeness

As mentioned in Section 2.1, component behavior is generally expected to be completely defined. Thus, supporting the detection of partiality additionally eases the transition to the component-based architecture. To that end, we use an adaptation of the completeness check in [11]: a function description F is considered locally complete if

  ∀s ∈ S. ∃s′ ∈ S. (pre1(s) ∧ post1(s, s′)) ∨ . . . ∨ (pren(s) ∧ postn(s, s′))

for all basic sub-functions (a, prei, posti, bi) of F with a common entry location a in F. Similar to the approach used in [8], this establishes a sufficient condition for global completeness, enabling a safe and scalable check.
5 Conclusion and Related Work
The main contributions of the approach presented here target the constructive transition from function-based to component-based descriptions of systems; especially, the presented approach

• illustrates a mechanism for an integration on the descriptional level, and
• introduces a corresponding mechanism to detect possible conflicts of simultaneously combined functions,

with a focus on scalability.
5.1 Tool Support

To ease the transition from the function-based to the component-based architecture of a system using the approach presented here, tool support is needed, both for the unfolding of a combined description, including the construction of the mapping used in the structural integration, and for the detection of conflicts. Using the framework of AutoFocus, a user-guided merging of state-based function descriptions has been developed, currently limited to non-hierarchic descriptions [10]. This merging includes checking for conflicts when merging the labels of basic functions, however restricted to a limited set of simplification strategies when checking the equivalence of conditions. As those weak simplifications lead to less compact versions of the unfolded descriptions as well as to more undecided conflicts, a stronger validity checker must be applied. Currently, CVCL [1] is applied to check unsatisfiability for the simplification, the validity of the equivalence condition for full integration, and the local completeness of a description. Due to the expressiveness of the description formalism for transition labels, the satisfiability of a transition is generally not decidable. In the context of simplifying a product automaton this does not pose an essential problem: by keeping undecided cases we only obtain a less compact but semantically equivalent description. Similarly, during conflict detection, undecided cases are treated as possible conflicts, leading to more falsely identified conflicts.

5.2 Related Work

The combination of functions has traditionally been studied in the context of feature integration, e.g., [2]. However, those and approaches like [8] focus mainly on the semantical level and on analytical techniques. Here, in contrast, we are rather interested in supporting the modular development of control functions on the descriptional level; furthermore, we introduce a constructive approach that supports the developer in building component descriptions from a collection of functions. Other notationally oriented approaches like [9] focus on the support of non-simultaneous composition. Finally, approaches like [2] and [11] perform a precise analysis of the system under development, leading to non-scalability; in contrast, here we use a limited technique that ensures correct development while providing sufficient scalability in practical applications.
References

[1] Barrett, C. and S. Berezin, CVC Lite: A New Implementation of the Cooperating Validity Checker, in: Proceedings of the 16th International Conference on Computer Aided Verification (CAV), 2004.

[2] Calder, M. and E. H. Magill, "Feature Interaction in Telecommunication and Software Systems IV," IOS Press, 2000.
[3] Harel, D. and M. Politi, "Modeling Reactive Systems with Statecharts," McGraw-Hill, 1998.

[4] Henzinger, T. A., Masaccio: A Formal Model for Embedded Components, in: Proceedings of the First International IFIP Conference on Theoretical Computer Science (2000), LNCS 1872.

[5] Huber, F., B. Schätz and G. Einert, Consistent Graphical Specification of Distributed Systems, in: Industrial Applications and Strengthened Foundations of Formal Methods (FME'97), LNCS 1313 (1997).

[6] Lynch, N. and M. Tuttle, An Introduction to Input/Output Automata, CWI Quarterly 2 (1989).

[7] Mutz, M., M. Huhn, U. Goltz and C. Krömke, Model Based System Development in Automotive, in: Proceedings of the SAE 2002 World Congress, Detroit, 2002.

[8] Park, D. Y. W., J. U. Skakkebæk, M. P. E. Heimdahl, B. J. Czerny and D. L. Dill, Checking properties of safety critical specifications using efficient decision procedures, in: Proc. Workshop on Formal Methods in Software Practice, 1998.

[9] Prehofer, C., Feature-oriented programming: A new way of object composition, Concurrency and Computation: Practice and Experience 13 (2001).

[10] Schätz, B., P. Braun, F. Huber and A. Wisspeintner, Checking and Transforming Models with AutoFOCUS, in: Engineering of Computer-Based Systems ECBS'05 (2005).

[11] Schätz, B. and C. Salzmann, Service-Based Systems Engineering: Consistent Combination of Services, in: Proceedings of ICFEM 2003, Fifth International Conference on Formal Engineering Methods (2003), LNCS 2885.

[12] Thurner, T., J. Eisenmann, U. Freund, R. Geiger, M. Haneberg, U. Virnich and S. Voget, The EAST-EEA Project - A Middleware Based Software Architecture for Networked Electronic Control Units in Vehicles, VDI Berichte 1 (2003), (In German).
Component Identification Through Program Slicing

Nuno F. Rodrigues 1,2 and Luís S. Barbosa 1,3
Departamento de Informática, Universidade do Minho, Braga, Portugal
Abstract

This paper reports on the development of specific slicing techniques for functional programs and their use for the identification of possible coherent components from monolithic code. An associated tool is also introduced. This piece of research is part of a broader project on program understanding and re-engineering of legacy code supported by formal methods.

Key words: Program Slicing, Static Analysis, Component Identification.
1 Introduction
A fundamental problem in system re-engineering is the identification of coherent units of code providing recurrently used services. Such units, which are typically organised around a collection of data structures or inter-related functions, can be wrapped around an interface and made available as software components in a modular architectural reconstruction of the original system. Moreover, they can then be made available for reuse in different contexts. This paper proposes the use of software slicing techniques to support such a component identification process. Introduced by Weiser [16,14,15] in the late Seventies, program slicing is a family of techniques for isolating parts of a program which depend on, or are depended upon by, a specific computational entity referred to as the slicing criterion. Its potential for service or component identification is therefore quite obvious.
1 The research reported in this paper is supported by FCT, under contract POSI/ICHS/44304/2002, in the context of the PURe project.
2 Email: [email protected]
3 Email: [email protected]
In practice, however, this requires:

• A flexible definition of what is understood by a slicing criterion. In fact, Weiser's original definition has been re-worked and expanded several times, leading to the emergence of different methods for defining and computing program slices. Despite this diversity, most of the methods and corresponding tools target either the imperative or the object-oriented paradigms, where program slices are computed with respect to a variable or a program statement.
• The ability to extract actual (executable) code fragments.
• And, of course, suitable tool support.
All these issues are addressed in this paper. Our attention, however, is restricted to functional programs [2]. Such a focus is explained not only by the research context mentioned below, but also because we deliberately want to take an alternative path to mainstream research on slicing, where functional programming has been largely neglected. Therefore our research questions include the definition of what a slice is for a functional program, how program data can be extracted and represented, and what would be the most suitable criteria for component identification from functional monolithic code. There is another justification for the qualifier functional in our title: the tool that supports the envisaged approach was entirely developed in Haskell [2].

The context for this research is a broader project on program understanding and re-engineering of legacy code supported by formal methods. A number of case studies in the project deal with functional code, even in the form of executable specifications 4. Actually, if forward software engineering can today be regarded as a lost opportunity for formal methods (with notable exceptions in areas such as safety-critical and dependable computing), reverse engineering looks more and more like a promising area for their application, due to the engineering complexity and exponential costs involved. In a situation in which the only quality certificate of the running software artefact still is life-cycle endurance, customers and software producers are little prepared to modify or improve running code. However, faced with so risky a dependence on legacy software, managers are more and more prepared to spend resources to increase confidence in, i.e., the level of understanding of, their code.

The paper is organised as follows. Section 2 reviews basic concepts in program slicing and introduces functional slicing, specifying a new representation structure, the FDG (Functional Dependence Graph), and the slicing operations over it. The corresponding prototype tool (HaSlicer) is described in Section 3. Section 4 discusses how these techniques and the tool can be used for 'component discovery' and identification. A small example is included to illustrate the approach. The paper ends with a small section on conclusions and future work.
Specification understanding is not so weird as it may look at first sight. Actually, the authors became aware of the amount and relevance of legacy specifications in the context of an industrial partnership on software documentation.
2 Functional Program Slicing
2.1 Program Slicing
Weiser, in [15], defines a program slice S as a reduced executable program obtained from a program P by removing statements, such that S replicates part of the behaviour of P. A complementary definition characterizes program slices as fragments of a program that influence a specific computational result inside that program [13]. The computation of a program slice is called program slicing. This process is driven by what is referred to as a slicing criterion, which is, in most approaches, a pair containing a line number and a variable identifier. From the user's point of view, this represents a point in the code whose impact she/he wants to inspect in the overall program. From the program slicer's view, the slicing criterion is regarded as the seed from which a program slice is computed.

According to Weiser's original definition, a slice consists of an executable sub-program including all statements with some direct or indirect consequence on the value of the entity selected as the slicing criterion. The concern is to find only the pieces of code that affect a particular entity in the program. Weiser's approach corresponds to what would now be classified as a backward, static slicing method. A dual concept is that of forward slicing, introduced by Horwitz et al. [5]. In forward slicing one is interested in what depends on or is affected by the entity selected as the slicing criterion. Note that combining the two methods also gives interesting results: in particular, the union of a backward and a forward slice for the same criterion n provides a sort of selective window over the code, highlighting the region relevant for entity n.

Another duality pops up between static and dynamic slicing. In the first case only static program information is used, while the second one also considers input values [6,7], frequently leading, due to the extra information used, to smaller and easier to analyse slices, although with a restricted validity.

Slicing techniques are always based on some form of abstract, graph-based representation of the program under scrutiny, from which dependence relations between the entities it manipulates can be identified and extracted. Therefore, in general, the slicing problem reduces to sub-graph identification with respect to a particular node. Note, however, that in general slicing can become a highly complex process (e.g., when acting over unstructured control flow structures or distributed primitives), and is even, in some cases, undecidable [12].
2.2 Functional Program Slicing
As mentioned above, mainstream research on program slicing targets imperative languages and, therefore, is oriented towards particular, well-characterised notions of computational variable, program statement and control
flow behaviour. Slicing functional programs requires a different perspective. Functions, rather than program statements, are the basic computational units, and functional composition replaces statement sequencing. Moreover, there is no notion of assignable variable or global state whatsoever. Besides, in modern functional languages, encapsulation constructs such as Haskell [2] modules or Ml [4] abstract data types provide powerful structuring mechanisms which cannot be ignored in program understanding. What, then, are suitable notions of slicing for functional programs? Suitable, of course, with respect to the component identification process. Such is the question addressed below.
2.3 Functional Dependence Graphs
As mentioned above, slicing techniques are always based on some kind of dependence graph. Typical such structures are control flow graphs (CFG) and program dependence graphs (PDG). For a program P, a CFG is an oriented graph in which each node is associated with a statement from P and edges represent the corresponding flow of control between statements. This kind of graph relies entirely on a precise notion of program statement and on the order of execution of statements inside the program. Since functional languages are based on expressions rather than statements, CFGs are not immediately useful for performing static analysis over functional languages.

A PDG is an oriented graph where the nodes represent different kinds of entities in the source code, and edges represent different kinds of dependencies. The entities populating the nodes can represent functions, modules, datatypes, program statements, and other kinds of program structures that may be found in the code. In a PDG there are different sorts of edges (e.g., loop-carried flow edges, loop-independent flow edges, control dependence edges, etc.), each representing a different kind of dependency between the nodes involved.

Adapting the definition of PDGs to the functional paradigm, one may obtain a structure capturing a variety of information that, once combined, can form the basis of meaningful slicing criteria. This leads to the following definition.

Definition 1 (Functional Dependence Graph) A Functional Dependence Graph (FDG) is a directed graph G = (E, N), where N is a set of nodes and E ⊆ N × N is a set of edges represented as a binary relation between nodes. A node n = (t, s, d) consists of a node type t, of type NType, a source code location s, of type SrcLoc, and a description d of type Descr.

A source code location is simply an index of the node contents in the actual source code.

Definition 2 (SrcLoc) The type SrcLoc is a product composed of the source file name and the line-column code coordinates of a particular program
element, i.e., SrcLoc = SrcFileName × SrcBgnLine × SrcBgnColumn × SrcEndLine × SrcEndColumn.

More interesting is the definition of a node type, which captures the information diversity mentioned above and is the cornerstone of the FDG's flexibility.

Definition 3 (NType) An FDG node is typed as follows:

  NType = Nm (module) | Nf (function) | Ndt (data type) | Nc (constructor) | Nd (destructor)

Let us explain in some detail the intuition behind these types. Nodes typed as Nm represent software modules, which, from the program analysis point of view, correspond to the highest level of abstraction over source code. Note that Haskell has a concrete definition of module, which makes the identification of Nm nodes straightforward. Modules encapsulate several program entities, in particular code fragments that give rise to other FDG nodes. Thus, an Nm node depends on every other node representing entities defined inside the module as well as on nodes corresponding to modules it may import.

Nodes of type Nf represent functions, i.e., abstractions of processes which transform some kind of input information (possibly void) into an output (possibly void too). Functions are the building blocks of functional programs which, in most cases, decorate them with suitable type information, making extraction simpler. More complex is the task of relating a function node to the nodes corresponding to computational entities in its body: data type references, other functions, or what we shall call below functional statements.

Constructor nodes (Nc) are specially targeted at functional languages with a precise notion of explicit type constructors (such as the ones associated with datatype declarations in Haskell). Destructor nodes (Nd) store datatype selectors, which are dual to constructors, and again specific to the functional paradigm 5.

This diversity of nodes in the FDG is interconnected by arcs. In all cases an edge from a node n1 to a node n2 witnesses a dependence relation of n2 on n1. The semantics of such a relation, however, depends on the types of both nodes. For example, an edge from an Nf (function) node n1 to an Nm (module) node n2 means that the module represented by n2 depends on the function associated with n1, that is, in particular, that the function in n1 is defined inside the module in n2. On the other hand, an edge from a node n3 to a node n4, both of type Nf, witnesses a dependence of the function in n4 on the one in n3.
A similar notion may, however, be found in other contexts — e.g., the C selector operator “.” which retrieves specific fields from a struct construction. Object oriented languages also have equivalent selector operators.
Target | Possible Sources   | Edge Meaning
Nm     | {Nm}               | Target node imports source node
Nm     | {Nf, Nc, Nd, Ndt}  | Source node contains target node definition
Nf     | {Nst}              | Statements belong to function definition
Nf     | {Nc, Nd, Ndt, Nf}  | Function is using target node functionality
Ndt    | {Ndt}              | Source data-type is using target data-type
Ndt    | {Nc}               | Data-type is constructed by target node
Ndt    | {Nd}               | Data-type is destructed by target node

Table 1: FDG Edge Description
This means, in particular, that the latter is called by the former. Notice the difference from the Nm, Nf case, where dependence means definition inside the module. Table 1 introduces the intended semantics of edges with respect to the types of the nodes they connect. Also note that an FDG represents only direct dependencies. For example, there is no edge in an FDG to witness the fact that a module uses a function defined elsewhere. What is represented in such a case is a relationship between the external function and the internal one which calls it. From there the indirect dependence can be retrieved by a particular slicing criterion.
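A direct Haskell rendering of Definitions 1–3 could look as follows; this is a sketch of our own, and the data types actually used inside HaSlicer may well differ:

  import qualified Data.Set as Set

  -- Definition 3: node types.
  data NType = Nm | Nf | Ndt | Nc | Nd
    deriving (Eq, Ord, Show)

  -- Definition 2: a source code location.
  data SrcLoc = SrcLoc { srcFileName  :: FilePath
                       , srcBgnLine   :: Int
                       , srcBgnColumn :: Int
                       , srcEndLine   :: Int
                       , srcEndColumn :: Int }
    deriving (Eq, Ord, Show)

  type Descr = String

  -- Definition 1: nodes and the graph itself.
  data Node = Node { nType :: NType, nLoc :: SrcLoc, nDescr :: Descr }
    deriving (Eq, Ord, Show)

  data FDG = FDG { nodes :: Set.Set Node
                 , edges :: Set.Set (Node, Node) }  -- (n1,n2): n2 depends on n1
    deriving (Eq, Show)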
2.4 The Slicing Process
Program slicing based on Functional Dependence Graphs is a five-phase process, as illustrated in Figure 1. As expected, the first phase corresponds to source code parsing to produce an abstract syntax tree (AST) instance t. This is followed by an abstraction process that extracts the relevant information from t, constructing an FDG instance g according to the different types of nodes found. The third phase is where the actual slicing takes place. Here, given a slicing criterion, composed of a node from t and a specific slicing algorithm, the original FDG g is sliced, originating a subgraph g′ of g. Note that slicing takes place over the FDG, and that the result is always a subgraph of the original graph. The fourth phase is responsible for pruning the AST t, based on the sliced graph g′. At this point, each program entity that is not present in graph g′ is used to prune the corresponding syntactic entity in t, originating a subtree t′ of t. Finally, code reconstruction takes place: the pruned tree t′ is consumed to generate the sliced program (a process which is somehow dual to the one followed in phase 1).
Fig. 1. The slicing process
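The graph slicing step (phase 3) can be pictured as plain reachability over the FDG. The following sketch, reusing the FDG and Node types from the previous sketch, collects every node reachable from the criterion node together with the edges between collected nodes; whether this realises a forward or a backward slice depends on the orientation chosen for the dependence edges, and it is our own illustration rather than HaSlicer's actual algorithm:

  -- Sub-graph reachable from the criterion node n.
  sliceFrom :: Node -> FDG -> FDG
  sliceFrom n g = FDG kept keptEdges
    where
      kept      = go (Set.singleton n) [n]
      keptEdges = Set.filter (\(x, y) -> x `Set.member` kept && y `Set.member` kept)
                             (edges g)
      succs x   = [ y | (x', y) <- Set.toList (edges g), x' == x ]
      go seen []       = seen
      go seen (x : xs) =
        let new = [ y | y <- succs x, not (y `Set.member` seen) ]
        in  go (foldr Set.insert seen new) (new ++ xs)

  -- The dual slice is obtained by reversing the edges first.
  reverseEdges :: FDG -> FDG
  reverseEdges g = g { edges = Set.map (\(x, y) -> (y, x)) (edges g) }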
In [9] a number of what we have called slicing combinators were formally defined, as operators in the relational calculus [1], on top of which the actual slicing algorithms, underlying phases three and four above, are implemented. This provides a basis for an algebra of program slicing, which is, however, out of the scope of this paper.
3 The HaSlicer Prototype
HaSlicer 6 is a prototype of a slicer for functional programs, entirely written in Haskell and built as a proof-of-concept for the ideas discussed in the previous section. Forward, backward and forward dependency slicing are covered. In general, the prototype implements the above-mentioned slicing combinators [9] and addresses two other issues fundamental to component identification: the definition of the extraction process from source code and the incorporation of a visual interface over the generated FDG to support user interaction. Although its current version accepts only Haskell code, plug-ins for other functional languages as well as for the Vdm-Sl metalanguage [3] are currently under development. Figure 2 shows two snapshots of the prototype working over a small Haskell program. Screenshot 2(a) shows the visualization of the entire FDG loaded in the tool.
Available from wiki.di.uminho.pt/wiki/bin/view/Nuno.
Table 2. FDG Edge Codes (node colours vs. node types Nm, Nf, Ndt, Nc, Nd; the colours themselves are not reproduced here)
Fig. 2. Slicing with HaSlicer (panels (a) and (b))
Figure 2(b) reproduces the subgraph generated by slicing over one of the nodes of the graph represented in 2(a). Once a slice is computed, the corresponding code can be automatically recovered. Note also that the slicing process can be undone or launched again with different criteria or object files.
4 Component Discovery and Identification
4.1 Two Approaches
There are basically two ways in which slicing techniques, and the HaSlicer tool, can be used to identify software components from arbitrary functional code: either as a support for manual component identification or as a 'discovery' procedure in which the whole system is searched for possible loci of services, and therefore potential components. In this section both approaches are briefly discussed.

In the first approach, manual component identification is guided by slicing performed over the FDG representation of the target legacy code. In practice, the heterogeneous graph structure underlying a FDG seems to provide a suitable representation model. In particular, it enables the software architect to easily identify dependencies between code entities and to look for certain architectural patterns and/or undesired dependencies in the graph.

One of the most interesting operations in this category is component identification by service. The idea is to isolate a component that implements a specific service of the overall system. The process starts in a top-down way, looking for the top-level functions that characterise the desired service. Once these functions are found, forward dependency slicing is applied starting from the corresponding FDG nodes. This produces a series of sliced files (one per top-level function), which have to be merged together in order to build the desired component. Note that a forward dependency slice collects all the program entities required by each top-level function to operate correctly. Thus, by merging all the forward dependency slices corresponding to a particular service, one gets the least (derived) program that implements it.

This process leads to the identification of a new component which, besides being reusable in other contexts, will typically be part of the (modular) reconstruction of the original legacy system. But in what direction should such a system be reorganized to use the identified service as an independent component? This would require an operation upon the FDG which is, in a sense, dual to slicing. It consists of extracting every program entity from the system except for the ones already collected in the computed slices. Such an operation, which is at present only partially supported by HaSlicer, typically produces a program which cannot be immediately executed, but may be transformed in that direction. This basically amounts to identifying potentially broken function calls in the original code and re-directing them to the new component's services.

A second approach was mentioned in the beginning of this section under the designation of component discovery. It relies on the systematic application of slicing for the automatic isolation of possible components. In our experience this is particularly useful at early stages of component identification. Some care needs to be taken, however, as in a number of contexts this process may lead to the identification of both false positives and false negatives. This means that there might be good candidates for components which are not found, as well as situations in which several possible components are identified which turn out to lack any practical or operational interest.

To use an automatic component 'discovery' procedure, one must first understand what to look for, since there is no universal way of stating which characteristics correspond to a potential software component. Therefore, slicing techniques over program elements have to be combined with suitable metrics corresponding to empirical criteria for component identification.
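As a concrete illustration of component identification by service, the merge of forward dependency slices can be sketched as follows. The graph representation and function names are ours, not HaSlicer's, and the real operation is built from the slicing combinators of [9].

-- Sketch of 'component identification by service'.
module ServiceExtraction where

import Data.List (nub)

type Node = String                      -- a program entity in the FDG
type FDG  = [(Node, Node)]              -- directed dependency edges

-- Forward dependency slice: every entity reachable from the given node,
-- i.e. everything the node needs in order to operate.
forwardSlice :: FDG -> Node -> [Node]
forwardSlice g n = go [n] []
  where
    go []       visited = visited
    go (x:rest) visited
      | x `elem` visited = go rest visited
      | otherwise        = go (successors x ++ rest) (x : visited)
    successors x = [ t | (s, t) <- g, s == x ]

-- Identifying a component for a service: slice from each top-level function
-- of the service and merge the results into one entity set.
componentFor :: FDG -> [Node] -> [Node]
componentFor g tops = nub (concatMap (forwardSlice g) tops)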
A typical criterion worth looking at concerns the organization of a group of functions around a common data type structure. This focuses component 'discovery' on the data types defined in the original code. The idea is to take each data type and isolate both the data type and every program entity in the system that depends on it. Such an operation can be accomplished by performing a backward slice starting from each data type node in the FDG.

A second well-known criterion, first identified by the object-orientation community, is based on the empirical observation that 'interesting' components typically present a low level of coupling and a high level of cohesion [18]. Briefly, coupling is a metric that assesses how mutually dependent two components are; it acts as a measure of how much a change in one component affects the others. Cohesion, on the other hand, measures the degree of internal interrelation among the functions collected in a specific component. Generally, in a component with a low degree of cohesion, errors and undesirable behaviour are difficult to detect: if its functions are weakly related, errors may 'hide' themselves in seldom-used areas and remain invisible to testing for quite a long time.

The conjunction of these two metrics leads to a 'discovery' criterion which uses the FDG to look for specific clusters of functions, i.e., sets of strongly related functions with reduced dependencies on any program entity outside the set. Such function clusters cannot be identified by program slicing techniques alone, but the FDG is still very useful in determining them, since this kind of metric can be computed on top of the information represented in the FDG. In the HaSlicer tool, in particular, their combined value is computed through

Coupling(G, f) ≜ ♯{ (x, y) | y G x ∧ x ∈ f ∧ y ∉ f }        (1)

Cohesion(G, f) ≜ ♯{ (x, y) | y G x ∧ x ∈ f ∧ y ∈ f }        (2)

CCAnalysis(G) ≜ { (Coupling(G, f), Cohesion(G, f)) | f ∈ P F }        (3)
where G is a FDG, F the set of functions under scrutiny and P F its powerset. Depending on how liberal or strict one wants the component discovery criterion to be, different acceptance limits for coupling and cohesion can be used. This will define which clusters will be considered as loci of potential components. Once such clusters are identified, the process continues by applying forward dependency slicing to every function in the cluster and merging the resulting code.
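A naive, list-based transcription of formulas (1)-(3), together with the kind of acceptance limits just mentioned, could look as follows. All names and threshold values are illustrative, and the enumeration of the powerset is exponential, unlike the lazy-evaluation-based implementation used in the tool itself.

-- Naive transcription of formulas (1)-(3); not the HaSlicer implementation.
module CouplingCohesion where

type Node  = String
type FDG   = [(Node, Node)]      -- the pair (y, x) represents  y G x
type Group = [Node]              -- a candidate cluster f of functions

coupling :: FDG -> Group -> Int
coupling g f = length [ (x, y) | (y, x) <- g, x `elem` f, y `notElem` f ]

cohesion :: FDG -> Group -> Int
cohesion g f = length [ (x, y) | (y, x) <- g, x `elem` f, y `elem` f ]

-- All subsets of the function set F (the powerset), as in formula (3).
powerset :: [a] -> [[a]]
powerset []     = [[]]
powerset (x:xs) = [ x:s | s <- powerset xs ] ++ powerset xs

ccAnalysis :: FDG -> [Node] -> [(Group, Int, Int)]
ccAnalysis g fs = [ (f, coupling g f, cohesion g f) | f <- powerset fs ]

-- Acceptance limits: keep small clusters with high cohesion and low coupling.
-- The concrete thresholds below are arbitrary examples.
candidates :: FDG -> [Node] -> [(Group, Int, Int)]
candidates g fs =
  [ c | c@(f, cou, coh) <- ccAnalysis g fs
      , not (null f), length f <= 8, coh >= 5, cou <= 1 ]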
4.2 A Toy Example
To illustrate the use of slicing for component identification, consider the Haskell code for a toy bank account system, shown in Appendix A. The corresponding FDG, as computed by HaSlicer, is depicted in Figure 3. If one tries to apply an automatic component 'discovery' method to this code, based, for example, on the combined cohesion-coupling metric, the number of cases to consider soon becomes very large. This occurs because the
Fig. 3. FDG for the Toy Bank Account System
Functions' Clusters                                                   Coh   Cou
getAccAmount, findAcc, existsAcc, insertAcc, updateAcc, removeAcc      7     0
getCltName, findClt, existsClt, insertClt, updateClt, removeClt        7     0

Table 3. Cohesion and Coupling Metric for Example 3
algorithm iterates the powerset constructor over the set of functions. Nevertheless, a simple filter based on coupling, cohesion and the cardinality of the sets under analysis largely decreases the number of cases to consider. The idea is to tune the 'discovery' engine to look for high cohesion values combined with a low value of coupling and, most importantly, a low cardinality of the sets under analysis. The results of applying such a filter to the example at hand are reproduced in Table 3. Clearly, two components were identified (corresponding to the gray area of the FDG in Figure 3): a component for handling Client information and another one for managing Accounts data. As mentioned above, the process would continue by applying forward dependency slicing over the nodes corresponding to the functions in the identified sets, followed by slice merging.
5 Conclusions and Future Work

5.1 Conclusions
Under the overall motto of functional slicing, the aim of this paper was twofold. On the one hand, a specific dependence graph structure, the FDG, was introduced as the core graph structure for functional slicing, and a corresponding prototype was developed. On the other hand, it was shown how slicing techniques can be used to identify software components from (functional) legacy code, either as a support tool for the working software architect or in an automatic way, in a process of component 'discovery'. The latter is particularly useful as an architecture understanding technique in the earlier phases of the re-engineering process.

What makes the FDG a suitable structure for our purpose is the introduction of an ontology of node types and differentiated edge semantics. This makes it possible to capture in a single structure the different levels of abstraction a program may possess. This way, a FDG captures not only high-level views of a software project (e.g., how modules or data-types are related), but also low-level views (down to relations between functional statements inside function bodies, not discussed here; see [9]). Moreover, as different program abstraction levels are stored in a single structure, it becomes easy to jump across views according to the analyst's needs. Finally, notice that the FDG structure is flexible enough to be easily adapted to other programming languages and paradigms.
5.2 Related Work
Our definition of a FDG is closely related to the notion of Program Dependence Graph defined by Ottenstein and Ottenstein in [8]. The introduction of distinct node types and the associated edge semantics is, however, new. The method for component identification discussed above is indebted to previous work of Schwanke et al. [11,10], where metrics like coupling and cohesion are used to identify highly cohesive modules. We have implemented similar techniques in HaSlicer, resorting to Haskell's lazy evaluation to obtain answers in a reasonable time.

Another difference between our approach to component identification and other techniques studied in the literature, which are usually considered part of a broader discipline of software clustering [17], is our focus on functional languages, which have no aggregation units other than the module itself. In contrast, most software clustering algorithms are designed for the object-oriented paradigm and are therefore often based on the notion of a class, which is itself an aggregation construct. On the functional side, however, we have to consider explicitly several program elements which are typically smaller in both size and function.
5.3 Future Research
As mentioned in the Introduction, this research is part of a broader agenda. In such a context, current work includes:

• The generalization of slicing techniques to the software architecture level, in order to make them applicable not only to architectural specifications (as in [19]), but also to the source code level of large heterogeneous software systems, i.e. systems that have been programmed in multiple languages and consist of many thousands of lines of code.

• Research on the interplay between component identification based on slicing, as discussed in this paper, and other analysis techniques (such as, e.g., type reconstruction) also based on graph analysis.
References

[1] R. C. Backhouse and P. F. Hoogendijk. Elements of a relational theory of datatypes. In B. Möller, H. Partsch, and S. Schuman, editors, Formal Program Development, pages 7–42. Springer Lect. Notes Comp. Sci. (755), 1993.

[2] R. Bird. Functional Programming Using Haskell. Series in Computer Science. Prentice-Hall International, 1998.

[3] J. Fitzgerald and P. G. Larsen. Modelling Systems: Practical Tools and Techniques in Software Development. Cambridge University Press, 1998.

[4] R. Harper and K. Mitchell. Introduction to Standard ML. Technical report, University of Edinburgh, 1986.

[5] S. Horwitz, T. Reps, and D. Binkley. Interprocedural slicing using dependence graphs. In PLDI '88: Proceedings of the ACM SIGPLAN 1988 Conference on Programming Language Design and Implementation, pages 35–46. ACM Press, 1988.

[6] B. Korel and J. Laski. Dynamic program slicing. Inf. Process. Lett., 29(3):155–163, 1988.

[7] B. Korel and J. Laski. Dynamic slicing of computer programs. J. Syst. Softw., 13(3):187–195, 1990.

[8] K. J. Ottenstein and L. M. Ottenstein. The program dependence graph in a software development environment. In SDE 1: Proceedings of the first ACM SIGSOFT/SIGPLAN software engineering symposium on Practical software development environments, pages 177–184. ACM Press, 1984.

[9] N. Rodrigues. A basis for slicing functional programs. Technical report, PURe Project Report, DI-CCTC, U. Minho, 2005.

[10] R. W. Schwanke. An intelligent tool for re-engineering software modularity. In ICSE '91: Proceedings of the 13th International Conference on Software Engineering, pages 83–92, Los Alamitos, CA, USA, 1991. IEEE Computer Society Press.

[11] R. W. Schwanke and S. J. Hanson. Using neural networks to modularize software. Mach. Learn., 15(2):137–168, 1994.
[12] A. M. Sloane and J. Holdsworth. Beyond traditional program slicing. In Proceedings of the International Symposium on Software Testing and Analysis, pages 180–186, San Diego, CA, 1996. ACM Press.

[13] F. Tip. A survey of program slicing techniques. Journal of Programming Languages, 3:121–189, 1995.

[14] M. Weiser. Program Slices: Formal, Psychological and Practical Investigations of an Automatic Program Abstraction Method. PhD thesis, University of Michigan, Ann Arbor, 1979.

[15] M. Weiser. Programmers use slices when debugging. Commun. ACM, 25(7):446–452, 1982.

[16] M. Weiser. Program slicing. IEEE Trans. Software Eng., 10(4):352–357, 1984.

[17] T. A. Wiggerts. Using clustering algorithms in legacy systems remodularization. In WCRE '97: Proceedings of the Fourth Working Conference on Reverse Engineering, page 33, Washington, DC, USA, 1997. IEEE Computer Society.

[18] E. Yourdon and L. Constantine. Structured Design: Fundamentals of a Discipline of Computer Program and Systems Design. Prentice-Hall, 1979.

[19] J. Zhao. Applying slicing technique to software architectures. In Proc. of the 4th IEEE International Conference on Engineering of Complex Computer Systems, pages 87–98, August 1998.
Appendix A. Toy Bank Account System
module Slicing where

import Mpi

data System  = Sys { clients :: [Client], accounts :: [Account] } deriving Show
data Client  = Clt { cltid :: CltId, name :: CltName } deriving Show
data Account = Acc { accid :: AccId, amount :: Amount } deriving Show

type CltId   = Int
type CltName = String
type AccId   = Int
type Amount  = Double

initClts :: [((CltId, CltName), (AccId, Amount))] -> System
initClts = (uncurry Sys) . split (map ((uncurry Clt) . fst)) (map ((uncurry Acc) . snd))

findClt :: CltId -> System -> Maybe Client
findClt cid sys = if (existsClt cid sys)
                  then Just . head . filter ((cid ==) . cltid) . clients $ sys
                  else Nothing
findAcc :: AccId -> System -> Maybe Account
findAcc acid sys = if (existsAcc acid sys)
                   then Just . head . filter ((acid ==) . accid) . accounts $ sys
                   else Nothing

existsClt :: CltId -> System -> Bool
existsClt cid = elem cid . map cltid . clients

existsAcc :: AccId -> System -> Bool
existsAcc acid = elem acid . map accid . accounts

insertClt :: (CltId, CltName) -> System -> System
insertClt (cid, cname) (Sys clts accs) =
  if (existsClt cid (Sys clts accs))
  then error "Client ID already exists!"
  else Sys ((Clt cid cname) : clts) accs

insertAcc :: (AccId, Amount) -> System -> System
insertAcc (acid, amount) (Sys clts accs) =
  if (existsAcc acid (Sys clts accs))
  then error "Account ID already exists!"
  else Sys clts ((Acc acid amount) : accs)

removeClt :: CltId -> System -> System
removeClt cid (Sys clts accs) =
  if (existsClt cid (Sys clts accs))
  then Sys (filter ((cid /=) . cltid) clts) accs
  else Sys clts accs

removeAcc :: AccId -> System -> System
removeAcc acid (Sys clts accs) =
  if (existsAcc acid (Sys clts accs))
  then Sys clts (filter ((acid /=) . accid) accs)
  else Sys clts accs

updateClt :: (CltId, CltName) -> System -> System
updateClt (cid, cname) sys =
  if (existsClt cid sys)
  then insertClt (cid, cname) . removeClt cid $ sys
  else insertClt (cid, cname) sys

updateAcc :: (AccId, Amount) -> System -> System
updateAcc (acid, amount) sys =
  if (existsAcc acid sys)
  then insertAcc (acid, amount) . removeAcc acid $ sys
  else insertAcc (acid, amount) sys

getCltName :: CltId -> System -> Maybe CltName
getCltName cid sys = case findClt cid sys of
  Just clt -> Just . name $ clt
  Nothing  -> Nothing

getAccAmount :: AccId -> System -> Maybe Amount
getAccAmount acid sys = case findAcc acid sys of
  Just acc -> Just . amount $ acc
  Nothing  -> Nothing
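A small usage example of the toy system above (our addition, not part of the original appendix; the client and account data are hypothetical):

demo :: (Maybe CltName, Maybe Amount)
demo =
  let sys0 = Sys { clients = [], accounts = [] }
      sys1 = insertClt (1, "Alice") sys0     -- hypothetical client
      sys2 = insertAcc (10, 250.0) sys1      -- hypothetical account
  in  (getCltName 1 sys2, getAccAmount 10 sys2)
-- demo == (Just "Alice", Just 250.0)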
FACS 2005
Architecture Normalization for Component-Based Systems

Lian Wen and Geoff R. Dromey

Software Quality Institute, Griffith University, Nathan, Brisbane, Qld. 4111, Australia
Abstract

Being able to systematically change the original architecture of a component-based system to a desired target architecture without changing the set of functional requirements of the system is a useful capability. It opens up the possibility of making the architecture of any system conform to a particular form or shape of our choosing. The Behavior Tree notation makes it possible to realize this capability by inserting action-inert bridge component-states. For example, we can convert typical network component architectures into normalized tree-like architectures, which have significant advantages. We can also use this "architecture change" capability to keep the architecture of a system stable when changes are made in the functional requirements. The results in this paper build on earlier work on formalizing the process of building a system out of its requirements and formalizing the impact of requirements change on the design of a system.

Key words: Components, software architecture, formal methods, behavior trees, genetic software engineering.
1 Introduction
Software architecture is one of the critical issues in software engineering. In this paper, we will use the concept of component interaction network (CIN)
[1,2] as our chosen architectural construct. A CIN is a graph that shows a software system's components and the dependencies or interactions among them. Generally, a less coupled system is more portable and easier to maintain. In this paper, we propose a tree-like hierarchical structure as an optimized component architecture because of the scalability and simplicity of trees. A tree is a connected graph with the least amount of coupling. Many architectural styles, such as "Pipe & Filter", "Shared Repository", "Layered Abstract Machine", "Bus", "Client-Server" [6,7] and "C2" [12], can be abstracted as trees under special conditions. We call a software system with a tree-structured CIN a normalized system; the procedure for transforming a non-normalized system into a normalized system is called architecture normalization.

It is usually argued that software architecture is determined, or at least strongly influenced, by the functional requirements of the system. A complex system may inevitably produce a complex architecture. However, our research shows that the topological structure of a CIN can be made independent of the functional requirements that the system satisfies. To prove this point, we use the Genetic Software Engineering (GSE) design process [1]. GSE provides a formal approach for designing component-based software systems. The underlying procedure of GSE includes three steps. Firstly, each individual functional requirement is translated (manually) into a corresponding tree-structured graph called a requirement behavior tree (RBT); then these trees are integrated into one large tree called a design behavior tree (DBT); finally, from the DBT, other design diagrams, including the component architecture (CIN), are retrieved. In GSE, because the procedure for the last two steps is clearly defined, once the set of RBTs is fixed, the corresponding CIN is also fixed. Therefore, the focus of this problem is how we can have different sets of RBTs for the same set of functional requirements. To achieve this, the first method is to adjust the order of nodes in RBTs if the order has not been specified by the functional requirements; the second method is to insert bridge component-states, which are similar to hidden events in CSP [8]. The second method is more systematic and can transform the CIN into any pre-defined form without affecting the functional requirements. In other words, the component architecture can be independent of the functional requirements.

Based on our previous work, GSE not only provides a systematic approach to constructing component-based software designs, it also provides a formal method for change impact analysis [2]. When a software system has been adjusted due to changes in the functional requirements, a traceability model has been proposed to show the change impacts on the component architecture as well as on other design documents. Sometimes, changes in a system's functional requirements will affect the architecture. Repeated changes of a system may eventually ruin the system's architecture. However, based on the results of this paper, it is possible for the designers to preserve the architecture
or minimize the change impact when the functional requirements have been changed. If the component architecture of a large system can be kept stable during the system's lifetime, it will undoubtedly reduce the maintenance costs of that system. The paper is organized as follows: Section 2 briefly introduces the concept of GSE. Section 3 introduces the architecture transformation theory. In Section 4, we propose the concept of software normalization, and a microwave oven case study is presented to illustrate the architecture transformation theory and the simplicity of a normalized system. Finally, the last section gives a brief conclusion.
2 Genetic Software Engineering
2.1 Behavior Trees
The Behavior Tree notation, which has been given a formal semantics [1], captures, in a simple tree-like form, compositions of component-states. It provides a direct and clearly traceable relationship between what is expressed in the natural language representation and its formal specification. For example, the sentence "whenever the door is open the light turns on" is translated to the behavior tree below:
The principal conventions of the notation for component-states are the graphical forms for associating with a component a [State], an ??Event?? or a ?Decision?. Exactly what can be an event, a decision or a state is built on the formal foundations of expressions (for general discussion, everything may be abstracted as a state, irrespective of whether it is an event or a decision). To assist with traceability to the original requirements, a simple convention is followed: tags (e.g. R1 and R2, etc., see below) are used to refer to the original requirement in the document that is being translated. System states are used to model high-level (abstract) behavior; they are represented by rectangles with a double-line border. For details of the latest GSE notation, please browse the SQI paper site [3].

2.2 GSE Design Process
There are three major steps to construct a component-based architecture using the GSE design process. The first step is to translate each individual functional requirement into one or more corresponding requirement behavior trees (RBTs). The second step is to integrate all the RBTs into a single design
behavior tree (DBT), and the third step is to project the component interaction network (CIN) and many other design documents. Further details of the GSE procedures are given elsewhere [1,3]. To maximize communication, our intent is to introduce the main ideas of the design method in a relatively informal way. The whole design process is best understood in the first instance by observing its application to a simple example. Later, the same example will be normalized to explain how the proposed method manipulates the DBT so that the corresponding component architecture can be transformed into a tree structure. We use a design example for a Microwave Oven which has already been published in the literature [1,2] and [4]. The seven stated functional requirements for the Microwave Oven problem are given in Table 1.

Table 1. Functional Requirements for the Microwave Oven
• R1. There is a single control button available for the user of the oven. If the oven is idle with the door closed and you push the button, the oven will start cooking (that is, energize the power-tube for one minute).
• R2. If the button is pushed while the oven is cooking it will cause the oven to cook for an extra minute.
• R3. Pushing the button when the door is open has no effect (because it is disabled).
• R4. Whenever the oven is cooking or the door is open the light in the oven will be on.
• R5. Opening the door stops the cooking.
• R6. Closing the door turns off the light. This is the normal idle state, prior to cooking when the user has placed food in the oven.
• R7. If the oven times out, the light and the power-tube are turned off and then a beeper emits a sound to indicate that the cooking is finished.
The translation for requirement R7 is shown in Fig. 1. From Fig. 1 we can see that, initially, the OVEN is in the "Cooking" state. When the OVEN times out, the LIGHT is turned off, the POWER-TUBE is turned off, the BEEPER sounds, etc. The "+" sign in the root state "OVEN [Cooking]" indicates that these states are only implied in the original requirement. The behavior trees translated for the complete set of requirements can be found in [1].

When requirements translation has been completed, each individual functional requirement has been translated into one or more corresponding requirement behavior trees (RBTs). We can then systematically and incrementally construct a design behavior tree (DBT) that will satisfy all its requirements. The process of integrating two behavior trees is guided by the precondition and interaction axioms [1]. If an RBT's root node exists in another RBT, the RBT can be integrated into the second tree at that point. For example, for the behavior trees of R3 and R6 shown in Fig. 2, it is found that the root node DOOR[Closed] of R3 exists in the tree for R6, so the RBT of R3 can be integrated with the tree for R6 to create a new tree, as shown in Fig. 3.
Fig. 1. Behavior tree for requirement R7
Fig. 2. Behavior trees for requirements R3 and R6
Fig. 3. Result of Integrating R6 and R3 (the second part)
Using this same behavior-tree grafting process, a complete design is constructed (it evolves) incrementally by integrating RBTs and/or DBTs pairwise until we are left with the single final DBT shown in Fig. 4 (R8 is a requirement missing from the original functional requirements, but it can easily be identified through common domain knowledge of a microwave oven). This is the ideal for design construction that is realizable when all requirements are consistent, complete, composable and do not contain redundancies. Once the design behavior tree (DBT) has been constructed, the next task is to retrieve the component interaction network (CIN) and other design
Fig. 4. Integration of all functional requirements
Fig. 5. Component Interaction Network - (CIN)
diagrams. In the DBT representation, a given component may appear in different parts of the tree in different states (e.g., the OVEN component may appear in the Open state in one part of the tree and in the Cooking state in another part). Interpreting what we said earlier in a different way, we need to convert a design behavior tree into a component-based design in which each distinct component is represented only once. Informally, the process starts at the root of the design behavior tree and moves systematically down the tree towards the leaf nodes, adding each component and each component interaction (e.g. arrow) that is not already present. When this is done systematically, the tree is transformed into a component-based design in which each distinct component is represented only once. We call this a Component Interaction Network (CIN); it shows the interaction relationships between components and presents the component architecture. The CIN derived from the Microwave Oven design behavior tree is shown in Fig. 5. The algorithms that project the other types of design diagrams are not relevant to the topic of this paper, so they will not be pursued here.
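As an illustration only (the types and names below are ours and do not reflect the GSE tool set), the projection just described can be sketched in Haskell as a traversal that records each component, and each parent/child component interaction, exactly once.

-- Sketch of projecting a CIN from a DBT: walk the tree from the root and
-- collect the components and the parent/child component links, deduplicated.
module CINProjection where

import Data.List (nub)

type Component = String
data DBT = DBT { component :: Component       -- the component of this node
               , state     :: String          -- its state / event / decision
               , children  :: [DBT] }

type CIN = ([Component], [(Component, Component)])

projectCIN :: DBT -> CIN
projectCIN t = (nub (comps t), nub (links t))
  where
    comps n = component n : concatMap comps (children n)
    links n = [ (component n, component c) | c <- children n
                                           , component c /= component n ]
              ++ concatMap links (children n)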
3 Architecture Transformation Theory
3.1 Definitions
In the original definition of a CIN, a link is directional. If there are two links La and Lb that connect a pair of components Ci and Cj in different directions, La and Lb are treated as two separate links. In this section, in order to simplify the discussion, we merge La and Lb into one single link; unless stated otherwise, any link is supposed to be bi-directional, and a one-way link is only a special case of a two-way link (this difference is unobservable if we abstract a CIN as a bidirectional graph).

Definition 3.1 A network is a graph that consists of links and components; each component appears only once in the network, and between two different components there exists at most one link. A link is identified by the two components it connects, written (Ci, Cj), where Ci and Cj are two components in the network.

Definition 3.2 In a network N, if there exists a link between two components, we say that these two components are directly connected. Suppose C1, C2, ..., Cm are m different components in N; if for all 1 ≤ i ≤ (m − 1), Ci and Ci+1 are directly connected, we say C1, C2, ..., Cm form a path, and the length of this path is m − 1.

Definition 3.3 A network is called a connected network if, for every pair of components Ci, Cj belonging to the network, there exists a path starting from Ci and ending at Cj in the network.
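For readers who prefer executable definitions, the notions just introduced can be transcribed as the following Haskell sketch; the list-based representation of networks is an assumption made only for this illustration.

-- Connectivity of a network (Definitions 3.1-3.3), treating every link as
-- bi-directional as agreed above.
module Connectivity where

type Component = String
type Link      = (Component, Component)
type Network   = ([Component], [Link])

neighbours :: [Link] -> Component -> [Component]
neighbours ls c = [ b | (a, b) <- ls, a == c ] ++ [ a | (a, b) <- ls, b == c ]

-- All components reachable from a given one.
reachable :: Network -> Component -> [Component]
reachable (_, ls) start = go [start] []
  where
    go []       seen = seen
    go (x:rest) seen
      | x `elem` seen = go rest seen
      | otherwise     = go (neighbours ls x ++ rest) (x : seen)

-- A network is connected iff every component is reachable from any
-- (e.g. the first) component.
isConnected :: Network -> Bool
isConnected ([],   _)     = True
isConnected net@(c:cs, _) = all (`elem` reachable net c) cs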
Fig. 6. A simple DBT of 4 components and 4 states
Fig. 7. The CIN N of T shown in Fig. 6
Definition 3.4 From a DBT T, we can project a CIN N through the algorithm defined in GSE; this CIN is called the DBT's associated CIN and is denoted N = M(T).

Proposition 3.5 A CIN is a connected network.

Proof. Let T be a DBT and N be the associated CIN, so that N = M(T), and let Ci, Cj be two components belonging to N. Suppose Cr is the component associated with the root node in T. According to the algorithm that projects N from T, it is easy to prove that there is a path between Ci and Cr in N. Similarly, there is a path between Cj and Cr. Merging the two paths together, we have a path linking Ci to Cj, so N is a connected network.

The fact that a CIN must be a connected network is important for proving the paper's main theorem, which shows that the structure of a CIN can be manipulated into any preferred form by inserting nodes in the associated DBT. Before we prove this theorem, in the next subsection we will use a simple example to illustrate the basic ideas.
3.2 A Simple Example
Fig. 6 shows a simple DBT T, and the associated CIN N of T is shown in Fig. 7. We have removed the arrows in N to simplify the discussion. Now suppose that the CIN Ñ shown in Fig. 8 is more desirable. The problem is how we could insert bridge component-states in T to make the new tree's associated CIN become Ñ.

Fig. 8. The desired CIN Ñ

Fig. 9. Two bridge component-states are added into tree T to create tree T′
The link set of N is LN = {(C1, C2), (C1, C3), (C3, C4)}, and the link set of Ñ is LÑ = {(C1, C4), (C1, C3), (C3, C2)}. Because the links (C1, C4) and (C3, C2) exist in LÑ but not in LN, we can add two nodes in T to create a new tree T′, shown in Fig. 9. Let N′ be the associated CIN of T′; then it is obvious that the link set for N′ is LN′ = {(C1, C2), (C1, C3), (C3, C4), (C1, C4), (C3, C2)}. Comparing LÑ with LN′, it is found that the links (C1, C2) and (C3, C4) exist in LN′ but not in LÑ. To get rid of the extra links, we need to insert bridge component-states between the unwanted direct connections. In Fig. 9, there is a direct connection from C1[Foo1] to C2[Foo2]. Because C1 and C2 are not supposed to be directly connected, we need to insert bridge state(s) between the two nodes. Checking Ñ, we find that the path linking C1 and C2 is C1, C3, C2, so we should insert a bridge component-state of C3 between C1[Foo1] and C2[Foo2]; by similar analysis, we know that a bridge component-state of C1 should be inserted between C3[Foo3] and C4[Foo4]. The resulting new tree is shown in Fig. 10. Inspecting this tree, we find that if we remove C4[Brg1] and C2[Brg2], the associated CIN will not be affected. We therefore remove these two nodes to get the final T̃ shown in Fig. 11. It is easy to prove that Ñ = M(T̃). If we ignore the bridge component-states in T̃, the behavior of T̃ is exactly the same as the behavior of T. This simple example clearly illustrates how we can transform a component architecture into a new form by inserting bridge component-states into the DBT.
Fig. 10. Two more bridge component-states are inserted to get rid of the unwanted direct connections
Fig. 11. Prune the unnecessary bridge component-states and get the final T̃
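The bookkeeping behind this example, deciding which links must be added and which direct connections must be broken with bridge states, amounts to two link-set differences; finding the detour path for an unwanted link is then an ordinary path search on Ñ. The sketch below is our own illustration, under our own representation of links, and is not part of the GSE tooling.

-- Which links must be introduced by adding child nodes, and which direct
-- connections must be broken by inserting bridge states (Sect. 3.2 example).
module BridgePlanning where

type Component = String
type Link      = (Component, Component)

-- Links are undirected here, so compare them up to orientation.
sameLink :: Link -> Link -> Bool
sameLink (a, b) (c, d) = (a, b) == (c, d) || (a, b) == (d, c)

diffLinks :: [Link] -> [Link] -> [Link]
diffLinks xs ys = [ x | x <- xs, not (any (sameLink x) ys) ]

-- lN = current CIN links, lN' = desired CIN links.
toAdd, toBreak :: [Link] -> [Link] -> [Link]
toAdd   lN lN' = diffLinks lN' lN   -- add a child node for each of these links
toBreak lN lN' = diffLinks lN  lN'  -- insert bridge states along a path in N~ for each

-- The link sets of the example (Fig. 7 and Fig. 8):
exampleToAdd, exampleToBreak :: [Link]
exampleToAdd   = toAdd   [("C1","C2"),("C1","C3"),("C3","C4")]
                         [("C1","C4"),("C1","C3"),("C3","C2")]
exampleToBreak = toBreak [("C1","C2"),("C1","C3"),("C3","C4")]
                         [("C1","C4"),("C1","C3"),("C3","C2")]
-- exampleToAdd   == [("C1","C4"),("C3","C2")]
-- exampleToBreak == [("C1","C2"),("C3","C4")]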
3.3 Behavior Invariance Theorem
Definition 3.6 A bridge component-state, also called a bridge state for short, is a special state in a behavior tree. It is visible when the tree is observed from the solution domain, but it becomes invisible when we observe the tree in the problem domain. It is similar to the concept of a hidden event in CSP [8]: when we observe a system from a higher level, some low-level details become unobservable.

Generally, a design behavior tree (DBT) is a bridge connecting the two domains of a system: the problem domain and the solution domain. In the problem domain, a DBT should capture all the functional requirements, and in the solution domain, many design decisions are properties that directly emerge from a DBT.

Proposition 3.7 When we insert bridge states in a DBT, the bridge states will not change the functional requirements captured by the behavior tree.

Theorem 3.8 Let T be a DBT and N be its associated CIN, where N = M(T). Suppose there are a total of s components C1, C2, ..., Cs in N and Ñ is an arbitrary connected network that includes, and only includes, those s components. Then, by adding extra nodes to T, we can produce a new DBT T̃ with Ñ as the associated CIN, where Ñ = M(T̃).

Proof. Let us compare N and Ñ: because they have the same component set, if they are different, they must have different link sets. If there is a link (Ci, Cj) that only exists in Ñ, we can simply add a node of Cj under a node of Ci in tree T to make the associated CIN have the link (Ci, Cj). So the problem is how we can remove links which are not in Ñ from N by inserting nodes in T. If a link (Cl, Ck) only belongs to N, then in tree T there must be nodes of Cl that are directly connected to nodes of Ck. Because Ñ is a connected network, there must exist a path between Cl and Ck in Ñ. Excluding Cl and Ck, and supposing the rest of the path is Cn1, Cn2, ..., Cnt, at each occurrence of a direct connection between a node of Cl and a node of Ck in T we add a series of nodes of Cn1, Cn2, ..., Cnt. Then the modified behavior tree's associated CIN will not have the direct link (Cl, Ck). Because the inserted nodes are ordered according to an existing path in Ñ, the insertion of the new states will not introduce extra links that are not in Ñ.

Theorem 3.9 Let T be a DBT and N be its associated CIN. N has s components C1, C2, ..., Cs and Ñ is an arbitrary connected network that only includes those s components. Then we can create a new DBT T̃ that captures the same set of functional requirements as T and has Ñ = M(T̃).

This theorem is the direct result of Theorem 3.8 and Proposition 3.7. It states that the component architecture can be independent of the functional requirements. Therefore, it is possible for us to investigate a universal optimized software architecture regardless of the functional requirements of a particular system. In the next section, we propose a tree-structured architecture as a possible universal optimized form for software architecture, due to some unique features of trees.
4 Software Normalization
4.1 Trees and Normalized DBTs
There are a number of equivalent definitions of trees and a number of mathematical properties that imply this equivalence [9]. Since most of the properties are obvious, we will not repeat some of the proofs.

Proposition 4.1 A connected graph is a tree when and only when, for each pair of nodes in the graph, there is only one unique path between them [9]. A connected graph is a tree when and only when there is no circular path.

Proposition 4.2 A connected graph with n nodes has at least (n − 1) links. It is a tree when and only when there are (n − 1) links. In other words, a tree is a connected graph with the least possible number of links [10].
Definition 4.3 A DBT is called a normalized DBT if the associated CIN is a tree. A software system with a normalized DBT is called a normalized software system, with a normalized architecture.

Theorem 4.4 Any DBT can be normalized (transformed into a normalized DBT) without changing the functional requirements. (Direct result of Theorem 3.9.)

Proposition 4.5 For a CIN N with n components, the number of links must be greater than or equal to (n − 1). The number of links equals (n − 1) if and only if the system is normalized.

If we use the number of links among components as a measure of the complexity of the architecture of software systems, Proposition 4.5 indicates that a normalized software system has the simplest architecture.

Proposition 4.6 Let T be a DBT and N be its associated CIN. T is normalized when and only when, for all pairs of components Ci and Cj in N, there exists only one path between the two components in N, provided no node in the DBT is included twice in a path.

This proposition is a direct result of Proposition 4.1 and the definition of a normalized system. (For a pair of components, there may be multiple types of information exchanged between them, for example data flows or controls; in this paper, however, we assume that we can apply one type of abstract connection that can pass all the different types of information.) It indicates a very important feature of a normalized software system. For large software systems, we frequently face the problem of passing references, messages or attributes between different components. Because we cannot make each pair of components directly connected, we have to use some components as bridges to pass messages or references. If there are multiple paths between two components, we may not know which paths are used and which are not, and this will make change impact analysis [2] more difficult.

Proposition 4.7 If there are no mutual components in two tree-structured CINs, then, when the two CINs are connected by a link, the new CIN is also tree-structured.

Proposition 4.8 Consider two tree-structured CINs N1 and N2. If there is only one mutual component C in both CINs, the two CINs can be merged through the mutual component C; the merged CIN is also tree-structured.

Theorem 4.9 If a normalized DBT T is broken into two DBTs T1 and T2 by cutting off a link, then T1 and T2 are also normalized DBTs.

Proof. If T1 is not normalized, let N1 be the associated CIN of T1; N1 is not tree-structured. According to Proposition 4.6, there exists at least one pair of components Ci, Cj in N1 that are connected by more than one path. When T1 and T2 are merged into the original T, because no link in T1 is lost in T, the associated CIN of T has all the links in N1. So the multiple paths linking Ci and Cj are also in T's associated CIN, but this is contrary to the condition that T is normalized. Therefore, we know T1 is normalized, and similarly T2 must be normalized.

Proposition 4.7, Proposition 4.8 and Theorem 4.9 specify an important feature of trees: if a tree is broken into two parts, each part is still a tree; if two trees are integrated into one graph, the graph is also a tree if the integration is based on some specified rules. This feature is important for building large-scale systems because the normalization property can hold at different levels.
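Combining Proposition 4.2 with Definition 4.3 gives a simple executable test of whether a CIN is normalized. The sketch below is again only an illustration; it repeats the reachability helper of the Sect. 3.1 sketch so that it stands alone.

-- Executable reading of Proposition 4.2 / Definition 4.3: a connected CIN
-- with n components is normalized (tree-structured) iff it has n - 1 links.
module TreeCheck where

import Data.List (nub)

type Component = String
type Network   = ([Component], [(Component, Component)])

reachable :: Network -> Component -> [Component]
reachable (_, ls) start = go [start] []
  where
    nbrs c = [ b | (a, b) <- ls, a == c ] ++ [ a | (a, b) <- ls, b == c ]
    go []       seen = seen
    go (x:rest) seen
      | x `elem` seen = go rest seen
      | otherwise     = go (nbrs x ++ rest) (x : seen)

isConnected :: Network -> Bool
isConnected ([],   _)     = True
isConnected net@(c:cs, _) = all (`elem` reachable net c) cs

isNormalized :: Network -> Bool
isNormalized net@(cs, ls) =
  isConnected net && length (nub ls) == length (nub cs) - 1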
4.2 Case Study
In the second section we used the Microwave Oven example to explain the fundamental concepts of GSE. Here we normalize it to demonstrate how the component architecture can be simplified through normalization. Fig. 12 shows a normalized DBT. The normalization process is a mixture of inserting bridge states and adjusting the order of some states; the bridge component-states are filled with grey. The associated CIN of the DBT is shown in Fig. 13. Comparing the normalized DBT with the original DBT in Fig. 4, we find that the differences between the two behavior trees are trivial and that both DBTs capture all the functional requirements in Table 1. However, the differences between the two CINs are significant: the CIN shown in Fig. 13 is much simpler than the original CIN in Fig. 5. Even though the Microwave Oven case study is a small system with only 7 components, architecture normalization has dramatically simplified the component architecture. If the same process is applied to large systems, we expect the impact of simplification on the component architecture to be even more significant. The tree shown in Fig. 13 has only two levels. This does not mean that a normalization process can only produce a CIN of two levels. Theoretically, we can have the CIN in any preferred form, but due to the limitation of space, no further examples can be given in this paper.
5 Conclusion
This paper has addressed two things: the relationship between the functional requirements and the component architecture of a system, and the control of changes to the architecture of a system. A consequence of this work shows the advantages of using a tree-like architecture as an optimized form, due to its simplicity and scalability. The component architecture of a system must support the implementation of all the integrated behaviors of the system; these are in turn implied by the set of functional requirements for the system. Current software engineering practice suggests that, for a given problem, there exist many different
Fig. 12. A normalized DBT for the Microwave Oven case study
Fig. 13. The tree-structured CIN associated with the DBT in Fig. 12
approaches to designing a solution to the problem [11], each of which may lead to a system with a different component architecture. What we have sought to do is establish the relationship between a set of functional requirements and the component architecture of a system, and then show how systematic change of the architecture can be achieved without affecting the set of functional requirements that the system satisfies.
Once we have the means to systematically change the component architecture of a system, we can equally effectively use this power to resist the consequences of changes on the architecture of a system. It is a well-known observation of software engineering practice that repeated change to the functional requirements of a software system tends to gradually degrade the original component architecture and increase the cost of maintenance. The results in this paper prove that we can usually keep the component architecture stable when a system is changed. This has significant implications for reducing the cost of software maintenance.
References

[1] Dromey, R. G., "From Requirements to Design: Formalizing the Key Steps," (Invited Keynote Address), SEFM 2003, pp. 2–11, Brisbane, September 2003.

[2] Wen, L., Dromey, R. G., "From Requirements Change to Design Change: A Formal Path," SEFM 2004, pp. 104–113, 2004.

[3] SQI Paper Site, http://www.sqi.gu.edu.au/gse/papers.

[4] Shlaer, S., Mellor, S. J., "Structured Development for Real-Time Systems," Vols. 1–3, Yourdon Press, 1985.

[5] Bass, L., Clements, P. and Kazman, R., "Software Architecture in Practice," Addison Wesley Longman, Inc., 1998.

[6] Stafford, J. A., Wolf, A. L., "Software Architecture," Component-Based Software Engineering: Putting the Pieces Together, Chapter 20, 2001.

[7] Perry, D., Wolf, A., "Foundations for the Study of Software Architecture," SIGSOFT Software Engineering Notes, Vol. 17, No. 4, Oct. 1992.

[8] Hoare, C. A. R., "Communicating Sequential Processes," Prentice-Hall, 1985.

[9] Knuth, D. E., "The Art of Computer Programming, Vol. 1: Fundamental Algorithms," 3rd edition, Addison Wesley Longman, 1992.

[10] Sedgewick, R., "Algorithms," Addison-Wesley Publishing Company, Inc., 1989.

[11] Glass, R. L., "Facts and Fallacies of Software Engineering," Pearson Education, Inc., 2003.

[12] Medvidovic, N., "On the Role of Middleware in Architecture-Based Software Development," SEKE '02, 2002.
FACS 2005
Model Checking of Component Behavior Specification: A Real Life Experience

Pavel Jezek, Jan Kofron and Frantisek Plasil

Charles University in Prague, Department of Software Engineering, Czech Republic, and Academy of Sciences of the Czech Republic, Institute of Computer Science, Czech Republic

This work was partially supported by the Czech Academy of Sciences project 1ET400300504, the Grant Agency of the Czech Republic project GACR 102/03/0672 and France Telecom under the external research contract number 46127110.
Abstract

This paper is based on a real-life experience with behavior specification of a non-trivial component-based application. The experience is that model checking of such a specification yields very long error traces (providing counterexamples), in the order of magnitude of hundreds of states. Analyzing and interpreting such an error trace to localize and debug the actual specification is tedious work. We present two techniques designed to address the problem, state space visualization and protocol annotation, and share the positive experience of applying them in terms of making the debugging process more efficient.
1 Introduction
1.1 Software Component Behavior and Model Checking

Model checking is one of the formal verification methods. Checking for important properties of a system (e.g. absence of deadlocks, array element indices within limits) assumes that a model describing the system behavior is available. The model defines a state space, and the desired property is verified via its exhaustive traversal. In the case of software model checking, a model can be obtained either from a system specification such as an ADL (e.g. Wright [15], FSP
[5], behavior protocols [1]) or via source code analysis (the Bandera [10] and SLAM [7] projects, and Java PathFinder [11]).

Model checking faces two key inherent problems: state space explosion, and error trace complexity and interpretation. An error trace is the path through the state space representing the particular computation in which the desired property is violated. The main problem regarding error traces is that a very long trace, in the order of magnitude of hundreds of states, may be very hard to analyze and interpret [21,22,23,24].

There are two widely used tactics for exhaustive traversal of the state space: Depth First Search (DFS) and Breadth First Search (BFS). The specification of a software unit (e.g. a software component) usually generates a huge state space. This is caused by the need to model large data type domains and parallelism (threads/processes). Therefore, BFS-based tactics cannot practically be used because of their high memory requirements; instead, a DFS-based tactic has to be chosen. Unfortunately, in comparison with BFS, DFS has a drawback: the error trace it finds is not, in general, the shortest one.
1.2 Goals and Structure of the Paper

Behavior protocols [1] are a method of software component behavior specification. They are used for behavior specification in the SOFA [16] and Fractal [4] component models. We have employed behavior protocols in several non-trivial case studies of component behavior specification, comprising a high number of components. This includes a non-trivial component-based test bed application in a project funded by France Telecom aiming at the integration of behavior protocols into the Fractal component model. One of the key lessons learned has been that the error trace length problem is severe and has to be addressed seriously.

The goals of this paper are (i) to share with the reader the experience gained during specifying the behavior of a non-trivial component-based application and to show that the error trace length problem is really serious, and (ii) to describe the techniques we designed to address this problem. These goals are reflected in the rest of the paper as follows: Sect. 2.1 and 2.2 briefly describe behavior protocols, and Sect. 2.3 illustrates how to use them for component behavior specification and demonstrates the problem with the error trace length on a fragment of a non-trivial application that will be used as a running example. In Sect. 3, as the key contribution, the proposed techniques for addressing the error trace length and interpretation problems are described. Sect. 4 contains an evaluation of the proposed techniques, while Sect. 5 discusses related work. Sect. 6 concludes the paper and suggests future research directions.
2 Behavior Protocol Checking
2.1 Behavior Protocols and Software Components

Software components are the building blocks of software and communicate through interface bindings [4,15,16]. A component may provide some functionality via its provides (server) interfaces and may require other functionality from its environment (other components) through its requires (client) interfaces. As an example, consider the DhcpServer component in Fig. 3. It is a composite component built of two other components, ClientManager and DhcpListener, that are bound via their Listener interfaces. The DhcpServer has a provides interface (Mgmt) and two requires interfaces (PermanentDb and Callback).

A behavior protocol [1] is an expression describing the behavior of a component; the behavior means the activity on the component's interfaces, viewed as sequences (traces) of accepted and emitted method call events. A behavior protocol is syntactically composed of event denotations (tokens), the operators (Fig. 1), and parentheses. For a method m on an interface i, there are four event token variants:

Emitting an invocation:  !i.m↑        Accepting an invocation:  ?i.m↑
Emitting a response:     !i.m↓        Accepting a response:     ?i.m↓
Furthermore, three syntactic abbreviations of method calls are defined:

Issuing a method call: !i.m is an abbreviation for !i.m↑;?i.m↓
Accepting a method call: ?i.m is an abbreviation for ?i.m↑;!i.m↓
Processing of a method: ?i.m{expr} stands for ?i.m↑;expr;!i.m↓, meaning that expr defines m's reaction to the call in terms of issuing and accepting other events.

Operator   Meaning
;          Sequence: a;b means that after a is performed, b is performed
+          Alternative: a+b means that either a or b is performed
*          Repetition: a* means that a is performed zero to a finite number of times
|          And-parallel: a|b generates all arbitrary interleavings of the sequences defined by a and b
||         Or-parallel: a||b stands for (a|b) + a + b

Fig. 1. Basic protocol operators
As an example, consider the fragment of behavior protocol in Fig. 2. (In principle, behavior protocols are similar to CSP; however, they are not defined via recursive equations but by expressions only, and the generated traces are finite. Also, the parallel operators | and || are in principle syntactic abbreviations, i.e. they can be replaced by + and ;. Parallel composition in the sense of CSP is covered by the consent operator (Sect. 2.2). Since a full-fledged definition of behavior protocols requires much more space than is available in this paper, we refer the reader to [1,2] for details.)
According to it, the ClientManager component is able to accept RequestNew, Update and Return method calls on the interface Listener, in parallel, any finite number of times. If a Return method call is accepted, the component reacts by performing a Disconnected method call on its Callback interface. Furthermore, a Disconnected method call can be emitted at any time.

(
  ?Listener.RequestNew
  || ?Listener.Update
  || ?Listener.Return { !Callback.Disconnected }
)*
| !Callback.Disconnected*
Fig. 2. Fragment of the ClientManager frame protocol
Although a behavior protocol may define an infinite set of traces, each trace is finite: the repetition operator denotes an arbitrary finite number of repetitions of its argument. Each behavior protocol defines a finite automaton with transitions labeled by the protocol's events.
Fig. 3. DhcpServer composite component architecture
A frame (behavior) protocol of a component describes its "black-box" behavior (only the events on provides and requires interfaces are visible), while an architecture protocol of a (composite) component describes its behavior as defined by the composition of its first-level subcomponents, i.e. the communication events of these subcomponents appear in the behavior. Using the DhcpServer composite component in Fig. 3 as an example, its frame protocol contains only the events of the Mgmt, PermanentDb and Callback interfaces; the architecture protocol of the DhcpServer component is created by a parallel composition of the frame protocols of the DhcpListener and ClientManager components.
2.2 Protocol Compliance and Composition

The key benefit of using behavior protocols to describe the behavior of components lies at the design stage of an application. The developer can check whether the components he/she composes have compatible behavior: it enables checking component compatibility both horizontally (e.g. between the ClientManager and DhcpListener components) and vertically (between the DhcpServer frame protocol and the architecture protocol created by parallel composition of the ClientManager and DhcpListener frame protocols) [1].

Horizontal protocol compatibility is defined via the consent operator [2], which is basically a parallel composition converting the subcomponents' communication events to internal (τ) events. This is similar to CSP; in addition, however, the consent composition detects three kinds of composition errors: bad activity, no activity, and infinite activity. Bad activity occurs when a component emits a call on an interface and the component providing that interface is not able to accept (according to its behavior protocol) such a call. No activity is a deadlock, and infinite activity means that there is "no agreement" between two composed repetitions on a joint exit (there is a loop that cannot be exited due to the nature of the communication). The consent operator and the composition errors are thoroughly described in [2].

Vertical compatibility is captured via protocol compliance [1]. Protocol compliance is defined between the frame protocol of a component and its architecture protocol, i.e. the protocol created from its subcomponents' frame protocols composed via the consent operator.

2.3 Example: A Fragment of the Test Bed Application

In this section we describe a fragment of the test bed application ("Wireless Internet Access") mentioned in Sect. 1.2. The application is a quite complex system allowing clients of various air-carriers to access the Internet from airport lounges via local Wi-Fi networks. The whole Wireless Internet Access application is composed of about 20 Fractal components. One of the key components is the DhcpServer composite component (Fig. 3). It communicates with the system's clients at the lowest level, i.e. it is responsible for managing clients' IP addresses, monitoring the overall state of the local wireless network and providing this information to the rest of the system. A simplified version is presented in this section.

2.3.1 DhcpServer Architecture

In principle, the DhcpServer composite component works in two functionality modes, which can be swapped via the Mgmt interface:

(i) DhcpServer generates IP addresses dynamically for new clients (this is the default functionality, which can also be set by calling the UseTransientIPs method on the Mgmt interface).
(ii) DhcpServer assigns IP addresses statically based on mappings between clients' MAC and IP addresses in an external database accessible via the PermanentDb interface (this functionality is set by calling the UsePermanentIPs method on the Mgmt interface).
When a client disconnects from the network, the DhcpServer calls the Disconnected method on its Callback interface to notify its environment about this event. As already mentioned, the DhcpServer functionality is implemented by its subcomponents: ClientManager and DhcpListener. The architecture of the DhcpServer and the bindings between the subcomponents are shown in Fig. 3.

( !Listener.RequestNew || !Listener.Update || !Listener.Return )*
Fig. 4. Frame protocol of DhcpListener
The DhcpListener component is responsible for the "real" communication with network clients and the network infrastructure. Internally it uses the existing system infrastructure to manage client nodes. Events that occur at the network level are unified by DhcpListener, which converts them to method calls. As they can arrive at any time, the corresponding frame protocol has to express the inherent parallelism (Fig. 4). ClientManager accepts notifications on network events from the DhcpListener and processes them either internally (RequestNew and Update) or forwards them to DhcpServer's environment (via Callback.Disconnected) as part of Return processing.

(
  (
    ( ( ( ?Listener.RequestNew
          || ?Listener.Update                                         A.1
          || ?Listener.Return { !Callback.Disconnected } )*
        | !Callback.Disconnected*                                     A.2
      )
      | ?Mgmt.UsePermanentIPs↑                                        A.3        A
    ) ; !Mgmt.UsePermanentIPs↓ ;
    ( ( ( ?Listener.RequestNew { !PermanentDb.GetIP }
          || ?Listener.Update                                         B.1
          || ?Listener.Return { !Callback.Disconnected } )*
        | !Callback.Disconnected*                                     B.2
      )
      | ?Mgmt.UseTransientIPs↑                                        B.3        B
    ) ; !Mgmt.UseTransientIPs↓
  )*
)
Fig. 5. Frame protocol of ClientManager (The highlighted lines denote the events forming the composition error described in Sect. 2.3.3)
ClientManager's behavior is expressed by its frame protocol in Fig. 5. The part A of the protocol represents the "generate IP addresses dynamically" functionality of ClientManager, while the part B represents the "assign IP addresses statically" functionality. The parts A.1 and B.1 express the ClientManager's ability to process DhcpListener's notifications and also describe the reactions to them. The parts A.2 and B.2 capture ClientManager's ability to detect client disconnections internally, resulting in a call of Disconnected. The ClientManager's functionality mode swapping mechanism is reflected in the parts A.3 and B.3: At any time, ClientManager can accept a method call requesting a mode change (?Mgmt.UsePermanentIPs↑ or ?Mgmt.UseTransientIPs↑), but it does not respond to it immediately. Instead, it waits until the processing of all pending method calls on the Listener interface is finished and then it issues the !Mgmt.UsePermanentIPs↓ or the !Mgmt.UseTransientIPs↓ response. Then ClientManager is again ready to accept further calls on the Listener interface and respond to them according to its newly set functionality mode.

2.3.2 DhcpServer Frame Protocol

The frame protocol of DhcpServer is shown in Fig. 6. The interactions between DhcpServer's subcomponents are not visible in it. However, their communication can trigger interaction with the environment of DhcpServer that is therefore visible in its frame protocol. This is illustrated by the part C of the frame protocol in Fig. 6: the !Callback.Disconnected call can be invoked by the ClientManager subcomponent either as a reaction to an accepted ?Listener.Return call or due to its internal detection of client disconnection (Sect. 2.3.1); however, these two causes are indistinguishable in the DhcpServer frame protocol. The part D of the protocol expresses the DhcpServer's ability to swap between its two modes (Sect. 2.3.1).

(
  !Callback.Disconnected*                                             C
  | !Callback.Disconnected*
  | ( ?Mgmt.UsePermanentIPs↑ ;
      ( !PermanentDb.GetIP*                                           D.1  (wrong operator selected)
        + ( !Mgmt.UsePermanentIPs↓ ;                                  D.2
            ?Mgmt.UseTransientIPs↑ ) ) ;
      !Mgmt.UseTransientIPs↓ )*                                       D
)
Fig. 6. First version of the frame protocol of DhcpServer (Instead of +, the | operator should have been used here as demonstrated by the error trace in Sect. 2.3.3)
2.3.3 Checking for Composition Errors and Compliance

The application developer who sets up a composite component (such as DhcpServer) also creates its frame protocol, whereas the frame protocols of its
subcomponents (ClientManager and DhcpListener) are created by their respective authors. It is the developer's responsibility to check first for composition errors (horizontal compatibility) between the subcomponents (Sect. 2.2). The frame protocols of ClientManager and DhcpListener (Sect. 2.3.1) as presented above are compatible in this sense. It should be emphasized that behavior incompatibility may occur even though the components are connected via type-compatible interfaces. The next step in a composite component's development is to check for compliance (vertical compatibility, Sect. 2.2) of its frame protocol with its architecture protocol. During the development of the first version of the DhcpServer component, the + operator was used in its frame protocol (Fig. 6). However, such a protocol was not compliant with its architecture protocol (Sect. 2.3.1). Using the behavior protocol checker, the error was found and reported by an error trace (Fig. 7).
(S0)  τListener.Return↑               (S117) τListener.RequestNew↓
(S1)  τListener.Update↑               (S118) τListener.Update↑
(S2)  τListener.Update↓               (S127) τListener.Update↓
(S3)  τListener.RequestNew↑           (S128) τListener.Return↑
(S4)  τListener.RequestNew↓           (S129) τCallback.Disconnected↑
(S5)  τMgmt.UsePermanentIPs↑          (S130) τCallback.Disconnected↑
(S6)  τCallback.Disconnected↑         (S171) τCallback.Disconnected↓
(S7)  τCallback.Disconnected↑         (S188) τListener.Return↓
(S46) τCallback.Disconnected↓         (S189) τListener.Update↑
(S47) τListener.Return↓               (S190) τListener.Update↓
(S48) τListener.Return↑               (S191) τListener.RequestNew↑
(S49) τListener.Update↑               (S192) τListener.RequestNew↓
(S50) τListener.Update↓               (S193) τCallback.Disconnected↓
(S51) τListener.RequestNew↑           (S226) τMgmt.UsePermanentIPs↓
(S52) τListener.RequestNew↓           (S227) τListener.Return↑
(S53) τCallback.Disconnected↓         (S228) τListener.Update↑
(S54) τCallback.Disconnected↑         (S229) τListener.Update↓
(S55) τCallback.Disconnected↓         (S230) τListener.RequestNew↑
(S56) τListener.Return↓               (S231) !PermanentDb.GetIP↑
(S57) τListener.RequestNew↑
Fig. 7. Error trace representing a compliance error
However, identifying the actual error only from such a plain error trace is not a trivial task. The key problem is that error traces of real components tend to be rather cryptic; in particular, several method calls of the frame protocol can occur in parallel. This leads to interleaving of the error-related events with other events processed in the "background". For example, only the highlighted events in Fig. 7 lead to the conclusion that the parts D.1 and D.2 of DhcpServer's frame protocol (Fig. 6) need to be processed in parallel, because the ClientManager can issue the !PermanentDb.GetIP call (in B.1) in parallel with accepting the ?Mgmt.UseTransientIPs↓ call (in B.3).
3 Approaches to Error Trace Analysis and Interpretation
In behavior protocols, an error trace's end is reflected in the state space (defined by the protocol) as a state F. It is a specific feature of behavior protocols that each trace reaching F is an error trace; hence, F is an error state. In consequence, an error state represents a set of error traces SF. (Note that the existence of error states is not a general feature of an LTS.) Finding all elements of SF requires a complete traversal of the state space. Sometimes, however, knowledge of the whole set of error traces corresponding to an error state may be very beneficial for identifying the error cause. As the set of error traces may be huge (or even infinite), providing it as a list of traces would not be of much help. Therefore, additional forms of representing SF are needed.

3.1 Plain Error Trace

As demonstrated in Sect. 2.3.3, an error trace identifying a compliance or composition error may be quite long and hard to interpret. Moreover, due to the DFS tactic used, the error trace may contain states not capturing "the essence" of the error. For example, the state subsequence S5, S226, S230, S231 of the error trace in Fig. 7 also forms an error trace, but the longer one was found first. In this respect, the other states are "not-important" ones. It is a challenge to filter out these "not-important" states (to find a canonical representation of the error trace set associated with an error state). One can imagine a filtering technique based on iteratively re-searching the state space, which would take advantage of the knowledge of the depth at which the error was found.

3.2 State Space Visualization

One of the checking outputs we propose in order to make error interpretation easier is state space visualization. Visualization is a graphical representation of the state space associated with the protocol (Sect. 2.2). For the state space related to Sect. 2.3.1, this is illustrated in Fig. 8 (only a fragment of the state space is captured here for brevity). This helps find out what the problem cause is by tracking the error trace in the state space. Obviously, state space size might be a problem here — a state space having more than 1,000 states is hard to visualize. Thus, visualizing only a part of the state space becomes a practical necessity. From this perspective, capturing only the part containing the error state and its "neighborhood" is a straightforward thought. We employed this idea with very positive experience. Such a result still provides useful information, detailed enough to identify where the essence of an error is. Technically, our visualization outputs all the transitions leading from a state on the error trace — this helps with
finding correspondence with the original protocol.
Fig. 8. State space visualization — dashed lines represent longer paths omitted due to the limited space of this paper. The state S231 is the error state F.
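Since the checker already relies on the dot tool of the GraphViz package for graph output (see Sect. 5), a visualization of this kind can be emitted as a dot file. The following is only a hypothetical sketch under an assumed, simplified representation of states and transitions; it is not the checker's actual code, and the state and event names in main are illustrative.

```java
import java.util.List;

// Hypothetical sketch: emit the states on an error trace, the transitions
// leaving them, and a mark on the error state, as a GraphViz dot graph.
// Uses Java 16+ records; the Transition type is an assumed representation.
public class TraceToDot {

    record Transition(String from, String event, String to) { }

    static String toDot(List<String> traceStates, List<Transition> outgoing, String errorState) {
        StringBuilder sb = new StringBuilder("digraph errorTrace {\n");
        for (String s : traceStates) {
            // Draw the error state F with a double border, ordinary states as circles.
            String shape = s.equals(errorState) ? "doublecircle" : "circle";
            sb.append(String.format("  \"%s\" [shape=%s];%n", s, shape));
        }
        for (Transition t : outgoing) {
            sb.append(String.format("  \"%s\" -> \"%s\" [label=\"%s\"];%n",
                    t.from(), t.to(), t.event()));
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        // Illustrative names only; a real neighborhood would come from the checker.
        List<String> states = List.of("S0", "S1", "F");
        List<Transition> out = List.of(
                new Transition("S0", "tau Listener.RequestNew", "S1"),
                new Transition("S1", "!PermanentDb.GetIP", "F"));
        System.out.print(toDot(states, out, "F"));
    }
}
```

Feeding the printed text to dot (e.g., dot -Tpng) then yields a picture in the spirit of Fig. 8.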
3.3 Protocol Annotation

Another way of representing an error state is annotated protocols. Consider a composition of protocols P and Q via the consent operator. If the composition yields a composition error in an error state S, the state S is represented by marks put into P and Q, forming the annotated protocols PS and QS. For illustration, consider Fig. 9, where a fragment of the annotated frame protocol of DhcpServer corresponding to the error trace in Sect. 2.3.3 is depicted. Advantageously, there is no need to construct the entire state space; it suffices to annotate only the protocols featuring as operands in a composition. For example, the set of error traces specified by the annotated protocol in Fig. 9, together with the annotated architecture protocol of DhcpServer internals, yields the error traces:

τCallback.Disconnected↑; τCallback.Disconnected↓; τMgmt.UsePermanentIps↑; τMgmt.UsePermanentIps↓

and

τMgmt.UsePermanentIps↑; τMgmt.UsePermanentIps↓; τCallback.Disconnected↑; τCallback.Disconnected↓

There are two issues to be addressed with this technique:
(i) Identical prefixes in alternatives. For example, consider the following frame protocol: (?i.m1; ?i.m2) + (?i.m1; ?i.m3). If an error state is to be indicated after ?i.m1, the corresponding annotated protocol takes the form: (?i.m1; ?i.m2) + (?i.m1; ?i.m3). Even though one of the alternatives could be eliminated, we prefer to keep
them both to provide more context for the error.
(ii) Transformations performed on input protocols. In the protocol checker, the protocols are modified during the parsing process (e.g., ?i.m is decomposed into ?i.m↑; !i.m↓ and the formatting information is lost). Therefore, exact mapping of an error state back to the source protocols may be difficult. Fortunately, the transformations typically still yield a reasonably readable behavior protocol, which, annotated, provides useful information for specification debugging.

( ( ?Callback.Disconnected↑; !Callback.Disconnected↓ )* )
|
( ( !Mgmt.UsePermanentIps↑ ;
    ( (?PermanentDb.GetIp↑; !PermanentDb.GetIp↓)* )
    + ( ?Mgmt.UsePermanentIps↓; !Mgmt.UseTransientIps↑ ) ;
    ?Mgmt.UseTransientIps↓* ) )
Fig. 9. DhcpServer annotated frame protocol - simplified.
4 Evaluation
During the work on the case study mentioned in Sect. 2.3, it has turned out that combining all three forms of checking output is the most promising approach. Even though protocol annotation (Sect. 3.3) appears to be a very generic technique, in complex cases the other checking outputs have to be provided as well, since tracking all the path alternatives in a complex annotated protocol may be error-prone. The most complex components of the case study have behavior protocols with up to 60 events; such behavior protocols generate a state space with hundreds of thousands of states. The typical errors encountered during the development of such components then generate error traces of about 100 states in length. However, there were also some error states that generated error traces with several hundreds of states. It then took the developer about an hour (often even more) to identify the actual error when only a plain error trace was available. The checking output techniques presented in Sect. 3 have been developed to improve debugging efficiency. During the further development of our case study application, the developers used a combination of these techniques, and the average time to resolve a typical error was reduced to one third or one fourth of the original time. As for the plain error trace checking output, a problem is the existence of "local loops" in the behavior of a component. Typically, with respect to the other parts of the system, the actual number of local loop traversals is of no significance in terms of error localization. These loops lengthen the error trace, making it more complex and hard to analyze. Obviously, if loops
are nested, the situation is even worse. It is desirable to eliminate those loops that have "no influence" on the rest of the system. This is a challenging problem; currently, only the highest-level loops are identified and eliminated in an automated way. Annotated protocols are very similar to the approach used in the Bandera Toolset [10] and PREfast [3], since they are based on emphasizing the positions in the input protocols where a composition error has been found. Unlike in Bandera and PREfast, in behavior protocols the positions between two operations are highlighted to denote an error state.
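One simple, conservative way to shorten a plain error trace (a sketch of the general idea, not the checker's loop-elimination algorithm) is to cut the segment between two occurrences of the same state: the shortened sequence is still a valid path of the underlying LTS and still ends in the error state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: remove "local loops" from a recorded error trace by cutting the
// segment between the first and the last occurrence of every repeated state.
// The result is still a valid path of the LTS and still ends in the error state.
public class LoopCutter {

    static List<String> cutLoops(List<String> traceStates) {
        Map<String, Integer> lastIndex = new HashMap<>();
        for (int i = 0; i < traceStates.size(); i++) {
            lastIndex.put(traceStates.get(i), i);   // remember the last occurrence
        }
        List<String> shortened = new ArrayList<>();
        for (int i = 0; i < traceStates.size(); i++) {
            String s = traceStates.get(i);
            shortened.add(s);
            i = lastIndex.get(s);                   // jump over the loop, if any
        }
        return shortened;
    }

    public static void main(String[] args) {
        // Hypothetical trace: S1..S3 form a local loop around S1.
        List<String> trace = List.of("S0", "S1", "S2", "S3", "S1", "S4", "F");
        System.out.println(cutLoops(trace));        // prints [S0, S1, S4, F]
    }
}
```

Note that this cuts every loop, not only the "highest-level" ones, so it may also remove repetitions that a developer would have found informative.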
5 Related Work
In [23], the authors address the counterexample complexity and interpretation problem by proposing a method for finding "positives" and "negatives" as sets of related correct traces and error traces. An interesting approach is chosen in [21], where the authors analyze the complexity of error explanation via constructing the "closest" correct trace to a specific error trace. In [24], the authors describe an algorithm ("delta debugging") for finding a minimal test case identifying an error in a program. This idea could be used to modify an error trace in order to find a "close enough" correct one. An optimization of the checking process is described in [22], where multiple error traces are generated in a single checking run. Static Driver Verifier (SDV) [6] is a tool used to verify correct behavior of WDM (Windows Driver Model) [8] drivers. The driver's source code in C and the model written in SLIC (a part of the SLAM project [7]) are combined into a "boolean" program that is maximally simplified, and selected rules are checked. If a rule is violated, an error trace of the program is generated and mapped back to the driver's C source code. Because WDM drivers are very complex, to make checking feasible, both the Windows kernel model and the rules used in SDV have to be simplified. Thus the error traces generated by SDV are relatively short and easy to interpret. Moreover, since they also contain the states corresponding to traversing the kernel model, such parts are optionally hidden in the checking output. This solution might also be applicable to our plain error traces (Sect. 3.1): the events generated inside a method call could be grouped into the "background" (Sect. 2.3.3). However, because it is not easy to identify the beginning and the end of a single method call in an error trace (especially when the i.m{...} shortcuts are not used), employing this idea in the behavior protocol checker is not a trivial task. As to the classical model checker SPIN [9], in case of violation of a checked property specified in LTL, SPIN allows traversing the trace to the error state while watching the variable values, the process communication graph, and highlighted source code. Sometimes the error trace length makes this approach very hard to use, and identification of the actual problem may be quite challenging. Although the approaches to ease the interpretation of an error trace
in SPIN work well in most cases, its modelling language Promela [9] is not well suited to specifying software components, since such a specification in Promela typically yields a large state space that is impossible to traverse in a reasonable time. As for other tools, Java PathFinder (JPF) [11], Bogor [17], BLAST [18], SMV [12], Moped [19], and MAGIC [20] cope with counterexamples and all provide them as error traces. Specifically, JPF, Bogor, BLAST, Moped, and MAGIC print the sequence of steps leading to an error state annotated by the corresponding line of the source code, while the SMV tool provides an error trace consisting of the input file lines written in the SMV specification language. Moped is similar to SDV in the sense that it first translates the input program (in Java) into the language of LTL in which the counterexamples are generated; they are then translated back to the input language. The MAGIC tool checks the behavior of a C program against a specification described via an LTS. Besides an error trace, it can also generate control flow graphs and LTSs using the dot tool of the GraphViz package [13] (also used by the behavior protocol checker). In all cases, but especially in the case of JPF, the error trace may get quite complex and not easy to interpret.
6 Conclusion and Future Work
During the work on the project (Sect. 2.3.1) it has turned out that, besides the plain error trace, additional checking outputs are needed to speed up the error detection and debugging process. Therefore, we introduced two more approaches: (i) state space visualization, and (ii) annotated protocols. Using all three methods in combination was found most beneficial (locating an error was then more efficient (Sect. 4)). Problems arise when checking the composition/compliance of several components described by really complex behavior protocols. The large state space generated by such a protocol means that an error trace is typically very long and hard to interpret. Still, in our view, this is worth pursuing, since we believe that the components' compatibility problem cannot be restricted to the syntactic/type compatibility of their (bounded) interfaces [1], even though this could be checked with much smaller effort and would avoid the problems discussed in this paper; in fact, we can hardly imagine putting together a non-trivial component-based application of the size mentioned in Sect. 2.3.1 if the compliance checks were based only on syntactic/type compatibility of individual interfaces. Our future work is therefore focused on improving the methods currently used by the behavior protocol checker; in particular, a method for automated removal of unnecessary "local loops" (Sect. 4) would further simplify the plain error trace checking output. As for state space visualization, an automated method for detecting the "important" part of the state space (currently done by hand) is needed to simplify the resulting graphical representation of an error trace.
As in Bandera [10] and PREfast [3], the possibility to dynamically indicate the correspondence between a particular position in an error trace and the associated part of the protocol would perhaps further ease and speed up the debugging process.
References
[1] F. Plasil, S. Visnovsky, "Behavior Protocols for Software Components," IEEE Transactions on Software Engineering, vol. 28, no. 11, Nov 2002
[2] J. Adamek, F. Plasil, "Component Composition Errors and Update Atomicity: Static Analysis," Journal of Software Maintenance and Evolution: Research and Practice, vol. 17, no. 4, John Wiley, 2005
[3] PREfast — http://www.microsoft.com/whdc/devtools/tools/PREfast.mspx
[4] E. Bruneton, T. Coupaye, M. Leclerc, V. Quema, J-B. Stefani, "An Open Component Model and Its Support in Java," 7th SIGSOFT International Symposium on Component-Based Software Engineering (CBSE7), LNCS 3054, Edinburgh, Scotland, May 2004
[5] J. Magee, J. Kramer, "Concurrency: State Models & Java Programs," John Wiley & Sons Ltd, ISBN 0-471-98710-7, 1999
[6] SDV — http://www.microsoft.com/whdc/devtools/tools/SDV.mspx
[7] T. Ball, S. K. Rajamani, "The SLAM Project: Debugging System Software via Static Analysis," POPL 2002, ACM, Jan 2002
[8] WDM — http://www.microsoft.com/whdc/archive/wdm.mspx
[9] Spin, Promela — http://spinroot.com/spin
[10] Bandera — http://bandera.projects.cis.ksu.edu
[11] Java PathFinder — http://javapathfinder.sourceforge.net
[12] SMV — http://www-2.cs.cmu.edu/~modelcheck/smv.html
[13] GraphViz — http://www.research.att.com/sw/tools/graphviz
[14] M. Mach, F. Plasil, J. Kofron, "Behavior Protocol Verification: Fighting State Explosion," International Journal of Computer and Information Science, ACIS vol. 6, no. 1, Mar 2005
[15] Wright — http://www-2.cs.cmu.edu/~able/wright
[16] SOFA — http://sofa.objectweb.org
[17] Bogor — http://bogor.projects.cis.ksu.edu
[18] BLAST — http://www-cad.eecs.berkeley.edu/~blast
[19] Moped — http://www.fmi.uni-stuttgart.de/szs/tools/moped
[20] MAGIC — http://www-2.cs.cmu.edu/~chaki/magic
[21] N. Kumar, V. Kumar, M. Viswanathan, "On the Complexity of Error Explanation," VMCAI'05, ACM, 2005
[22] T. Ball, M. Naik, S. Rajamani, "From symptom to cause: Localizing errors in counterexample traces," Proceedings of POPL 2003, ACM, 2003
[23] A. Groce, W. Visser, "What went wrong: Explaining counterexamples," Proceedings of the SPIN Workshop on Model Checking of Software, LNCS 2648, Springer, 2003
[24] A. Zeller, "Isolating cause-effect chains for computer programs," Proceedings of FSE 2002, ACM, 2002
FACS 2005
Towards an Automated Deployment Planner for Composition of Web Services as Software Components Abbas Heydar Noori 1 School of Computer Science University of Waterloo, Canada
Farhad Mavaddat 2 School of Computer Science University of Waterloo, Canada
Farhad Arbab 3 Department of Software Engineering Centrum voor Wiskunde en Informatica (CWI), The Netherlands
Abstract

In this paper, we present our work-in-progress on developing an automated deployment planner for the composition of Web services as software components using the Reo coordination middleware in a distributed environment. Web services refer to accessing services over the Web. Reo is an exogenous coordination model for compositional construction of component connectors, based on a calculus of mobile channels, that has been developed at CWI (the Netherlands). Reo has a strong theoretical underpinning, which makes it a good candidate model for coordinating the work of Web services participating in a composition. Suppose a new Web application has been developed by composing a number of Web services with different requirements and constraints. To run the application, it is required to deploy it on a number of hosts with different computational capabilities available to the application in the distributed environment (e.g., the Internet) so that all constraints and requirements are satisfied. Because of the many parameters and constraints in such a deployment problem, this is difficult to do manually. Thus, an automated deployment planner is required for this purpose.

Key words: software components, software deployment, Reo coordination model.
1 Introduction
The Internet is rapidly changing from a set of wires and switches that carry packets into a sophisticated infrastructure that delivers a set of complex value-added services to end users [1]. The term "Web service" came into being to represent a unit of business logic that an organization exposes to other organizations on the World Wide Web. Web services are gradually becoming the most popular distributed computing paradigm for the Internet [2]. Advances in Internet infrastructure and the rapid evolution of the WWW are major enablers of Web services. Web services can be stand-alone or linked together to provide enhanced functionality. In other words, Web services are inter-operable building blocks for constructing applications. Therefore, the composition of Web services is an important issue and a general coordination model for composing Web services is required. The Reo coordination model is a good candidate for serving this purpose [3]. Reo presents a paradigm for composition and exogenous coordination of software components based on the notion of mobile channels. In the Reo model, complex coordinators, called connectors, are compositionally built out of simpler ones. Reo has a strong theoretical underpinning and its logic is mathematically modeled. In [4] one can find a coalgebraic formal semantics for Reo. Reo promotes loose coupling, distribution, mobility, exogenous coordination, and dynamic reconfiguration. These properties make Reo a suitable candidate model for composing Web services in a distributed environment. Suppose a distributed application has been developed which utilizes a number of Web services with different requirements and constraints. In addition, the Reo coordination middleware is used to coordinate the work of these Web services together. In this method, Web services are viewed as black box software components, i.e., there are no concerns regarding their development and internals. To run the application, it is required to instantiate different components of the application on different hosts with different computational capabilities available to the application in the distributed environment. Furthermore, this should be done in such a way that all requirements and constraints are met. This process is called software deployment. For large applications that consist of many components with many constraints and must be distributed over a large number of hosts with different characteristics, manual deployment is impractical. Furthermore, users may have specific requirements in such a deployment. For example, they may want a specific component to be instantiated on a specific host. This complicates the problem even more. Thus, an automated deployment planner is required to effectively specify where different components should be instantiated in the distributed environment. In this paper, our work-in-progress on developing
such an automated deployment planner is described. This paper is organized as follows. In section 2, the required background of this research is presented. In this section, software components, Web services, and the Reo coordination model are briefly described. In section 3, developing a deployment planner for Web services compositions using the Reo coordination middleware is discussed. Finally, in section 4, concluding remarks are provided.
2 Background
The following topics form the background of our work: software components, Web services, and the Reo coordination model. In this section, we describe them.

2.1 Software Components

Making a system out of existing components is a common approach used in many engineering disciplines. The success of this approach in other engineering disciplines encouraged software engineers to use this idea in software design too, resulting in component-based software development (CBSD) methodologies. Two main reasons can be considered for this: (1) many software systems include similar or identical components and there is no need to redevelop them from scratch, and (2) because of the increasing complexity of software systems, it is becoming too expensive to develop them from scratch. Unfortunately, there is no unified definition of software components, and one can find many different definitions in the literature [5,6,7]. For this reason, we refrain from providing an extensive definition of software components. Instead, some of their properties are mentioned here. In most of the existing definitions of software components, a software component exhibits the following properties [8]:
• It is a unit of software implementation that can be reused in different applications;
• It has one or more predefined interfaces;
• The internal details of the software component are hidden;
• It does a specific function.
2.2 Web Services

As the term "Web service" indicates, it refers to accessing services over the World Wide Web. According to the definition of Web services by IBM [9], "Web services are a new breed of Web application. They are self-contained, self-describing, modular applications that can be published, located, and invoked across the Web. Web services perform functions, which can be anything from simple requests to complicated business processes".
Fig. 1. Web services architecture
As this definition shows, Web services can be seen as the building blocks for constructing distributed Web applications. A significant difference between the Web services model and other existing models such as CORBA/IIOP, COM/DCOM, and Java/RMI is that Web services can be written in any language and can be accessed using the HTTP protocol. In other words, in the Web services computing model, distributed software components are interfaced via non-object-specific protocols [10]. Fig. 1 shows the layered architecture of a Web service. These layers together form a standard mechanism for describing, discovering, and invoking the service provided by a stand-alone Web service. In the following, these layers are briefly described:
• Transport Layer: At the bottom of the layered architecture model, any of the standard Internet protocols may be used to transport invocations of Web services over the network.
• Communication Layer: The exchange of information happens at this layer, using the Simple Object Access Protocol (SOAP). SOAP provides a number of conventions about how to structure an XML message [11], so that both sides have a common understanding of the structure of a message.
• Service Interface Description Layer: This layer provides all of the necessary information for an application to access the specified service. For this purpose, the Web Services Description Language (WSDL) is used. WSDL is an XML-based language.
• Service Discovery Layer: This layer provides a way for publishing information about Web services, as well as a mechanism for discovering which Web services are available. The Universal Description, Discovery, and Integration (UDDI) specification is used in this layer for these purposes.
2.2.1 Viewing Web Services as Software Components

As mentioned in section 2.1, the aim of component-based software development is to make a new system by composing and integrating existing software components together. This method of software development has many advantages, and it has been proposed as a paradigm for developing more reliable and higher-quality software systems within shorter development time and with lower cost and effort [12]. Because of the many advantages of CBSD, there is a widespread belief that one day developers could easily assemble applications from prebuilt components instead of writing them from scratch [13]. As mentioned earlier, Web services are self-contained, self-describing modular units providing location-independent business or technical services that can be published, located, and invoked across the Web. Thus, one can view them as a natural extension of software component thinking. Web services, as software components, represent black box functionalities that can be reused without worrying about how those services are implemented, or where they are situated [14]. In other words, they can be used as the building blocks for the development of complex distributed applications. In [15], a Web service component infrastructure can be found. In that infrastructure, SOAP enables communication between different application parts, which are components belonging to different component models. WSDL focuses on the definition of component interfaces. Finally, UDDI serves as the component registry.
2.3 Reo Coordination Model

Reo is a channel-based coordination model that exogenously coordinates the cooperative behavior of component instances in a component-based application [3]. From the point of view of Reo, an application consists of a number of component instances communicating through connectors that coordinate their activities. The emphasis of Reo is on connectors, their composition and their behavior. Reo does not say much about the components whose activities it coordinates. In Reo, connectors are compositionally constructed out of a set of simple channels. Thus, channels represent atomic connectors. A channel is a communication medium which has exactly two channel ends. A channel end is either a source channel end or a sink channel end. A source channel end accepts data into its channel. A sink channel end dispenses data out of its channel. Although every channel has exactly two ends, these ends can be of the same or different types (two sources, two sinks, or one source and one sink). Reo assumes the availability of an arbitrary set of channel types, each with well-defined behavior provided by the user. However, a set of examples in [3] shows that exogenous coordination protocols that can be expressed as regular expressions over I/O operations correspond to Reo connectors which are composed out of a small set of only five primitive channel types:
Fig. 2. Barrier synchronization connector in Reo

• Sync: It has a source and a sink. Writing a value succeeds on the source of a Sync channel if and only if taking of that value succeeds at the same time on its sink.
• LossySync: It has a source and a sink. The source always accepts all data items. If the sink does not have a pending read or take operation, the LossySync loses the data item; otherwise the channel behaves as a Sync channel.
• SyncDrain: It has two sources. Writing a value succeeds on one of the sources of a SyncDrain channel if and only if writing a value succeeds on the other source. All data items written to this channel are lost.
• AsyncDrain: This channel type is analogous to SyncDrain except that the two operations on its two source ends never succeed simultaneously. All data items written to this channel are lost.
• FIFO1: It has a source and a sink and a channel buffer capacity of one data item. If the buffer is empty, the source channel end accepts a data item and its write operation succeeds. The accepted data item is kept in the internal buffer. The appropriate operation on the sink channel end (read or take) obtains the content of the buffer.
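Of these primitives, only FIFO1 is asynchronous, and it is the only one whose behavior can be captured by looking at a single channel in isolation; the sketch below models it as a plain one-place buffer (a non-blocking simplification of the real channel, whose operations suspend rather than fail), whereas the synchronous channels constrain both of their ends at once and cannot be described this locally.

```java
import java.util.Optional;

// Sketch of the FIFO1 channel in isolation: a one-place buffer. A write on the
// source succeeds only when the buffer is empty; a take on the sink succeeds
// only when the buffer is full. This is a simplification: in Reo the pending
// operation would suspend instead of returning a failure result.
public class Fifo1<T> {
    private T buffer;                 // null means the buffer is empty

    public synchronized boolean write(T item) {
        if (buffer != null) {
            return false;             // buffer full: the write is not accepted
        }
        buffer = item;
        return true;
    }

    public synchronized Optional<T> take() {
        if (buffer == null) {
            return Optional.empty();  // buffer empty: nothing to dispense
        }
        T item = buffer;
        buffer = null;
        return Optional.of(item);
    }

    public static void main(String[] args) {
        Fifo1<String> ch = new Fifo1<>();
        System.out.println(ch.write("reserve"));   // true  - accepted
        System.out.println(ch.write("again"));     // false - buffer already full
        System.out.println(ch.take());             // Optional[reserve]
    }
}
```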
In Reo, a connector is represented as a graph of nodes and edges such that: zero or more channel ends coincide on every node; every channel end coincides on exactly one node; and an edge exists between two (not necessarily distinct) nodes if and only if there exists a channel whose channel ends coincide on those nodes. As an example of Reo connectors, Fig. 2 shows a barrier synchronization connector in Reo. In this connector, a data item passes from a to d only simultaneously with the passing of a data item from g to j, and vice versa. This is because of the "replication on write" property in Reo and the different characteristics of the different channel types.

2.3.1 An Example of Composing Web Services Using Reo

In the following, we provide a simple example of how a Reo connector such as barrier synchronization can be used to compose a number of Web services together. Suppose a travel agency wants to offer a Flight Reservation Service (FRS).
Fig. 3. Modeling the flight reservation system with Reo
For some destinations, a connecting flight might be required. For example, if you want to travel from Toronto to Glasgow, you need to travel from Toronto to London and then from London to Glasgow. Suppose some other agencies offer services for International Flight Reservation (IFRS) and Domestic Flight Reservation (DFRS). Thus, FRS commits successfully whenever both the IFRS and DFRS services commit successfully. This behavior can easily be modeled by a barrier synchronization connector in Reo (Fig. 3). The FRS service makes commit requests on channel ends A and B. These commits will succeed if and only if the reservations at the IFRS and DFRS services succeed at the same time. This example shows how Reo succeeds in modeling complex behaviors. In Reo, it is easily possible to construct different connectors, by a set of simple composition rules, out of a very small set of primitive channel types. One can find a more elaborate introduction to Reo in [16], and a detailed description of the language and its model in [3].
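Purely as a data-structure illustration of the "connector as a graph of nodes and typed channels" view used above, the sketch below builds a barrier-synchronization connector assuming the usual construction from four Sync channels and one SyncDrain; the internal node names (b and h) are hypothetical, and no channel semantics is modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Structural sketch of a Reo connector as a graph of nodes and typed channels.
// Assumed construction of barrier synchronization: four Sync channels and one
// SyncDrain whose two source ends coincide on the two internal nodes.
// Node names are illustrative only; no channel semantics is modeled.
public class ConnectorGraph {

    enum ChannelType { SYNC, LOSSY_SYNC, SYNC_DRAIN, ASYNC_DRAIN, FIFO1 }

    record Channel(ChannelType type, String end1, String end2) { }

    record Connector(List<String> nodes, List<Channel> channels) { }

    static Connector barrierSynchronization() {
        List<String> nodes = List.of("a", "b", "d", "g", "h", "j");
        List<Channel> channels = new ArrayList<>();
        channels.add(new Channel(ChannelType.SYNC, "a", "b"));        // a -> b
        channels.add(new Channel(ChannelType.SYNC, "b", "d"));        // b -> d
        channels.add(new Channel(ChannelType.SYNC, "g", "h"));        // g -> h
        channels.add(new Channel(ChannelType.SYNC, "h", "j"));        // h -> j
        channels.add(new Channel(ChannelType.SYNC_DRAIN, "b", "h"));  // couples b and h
        return new Connector(nodes, channels);
    }

    public static void main(String[] args) {
        Connector barrier = barrierSynchronization();
        barrier.channels().forEach(c ->
                System.out.printf("%s(%s, %s)%n", c.type(), c.end1(), c.end2()));
    }
}
```

A representation of this kind is also what the deployment planner described in the next section would need as part of the application specification.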
3 Developing A Deployment Planner
In the previous section, we described the Reo coordination model and introduced it as a good candidate model for composing and coordinating Web services. For more information on the compositional construction of Web services using the Reo coordination model, you can refer to [17], where the requirements of Reo-enabled Web services are discussed. That paper shows the layers between Web services and the Reo coordination middleware that are necessary for composing them together using Reo connectors. In this section, another aspect of this problem is considered and our ongoing work on developing a deployment planner for the composition of Reo-enabled Web services is introduced. We begin with a description of the software deployment process.

3.1 Software Deployment Process

Software deployment is a sequence of related activities for placing a developed application into its target environment and making the application available for use. Though this definition of software deployment is reasonable and clear, for developing an automated deployment planner, the characteristics and nature of the deployment activities must be described more clearly. Different sequences of activities are mentioned in the literature for the software
deployment process [18,19]. However, in our view, the software deployment process should include at least the following activities: Acquiring, Planning, Installation, Configuration, and Execution. Below are brief descriptions of these activities:
• Acquiring: In this activity, the components of the application being deployed and the metadata specifying the application are acquired from the software producer and are put in a repository to be used by other activities of the deployment process.
• Planning: Given the specifications of the component-based application, a target environment, and user-defined constraints, this activity determines where different components of the application will be executed in the target environment, resulting in a deployment plan.
• Installation: This activity uses the deployment plan generated in the previous activity to install the application into the target environment. More specifically, this activity transfers the components of the application from the repository to the hosts in the target environment.
• Configuration: After installing the application components into the target environment, it might be necessary to modify its settings and configurations. For example, after installing an application, one may want to set different welcome messages for different users.
• Execution: Following the installation and configuration of the software application, it can be run. More specifically, the installed application components are launched on the hosts, the interconnections among them are instantiated, the components are connected to the interconnections, and the software application actually starts to work.
In this paper, our focus is on the planning stage of this process. For large, complex applications such as the Web applications considered in this paper, users should not be required to manually deploy a large number of different components with different properties on different hosts with different constraints in a distributed environment. Therefore, this should be as automated as possible. In the following section, we talk about this automated deployment planner in more detail.

3.2 Deployment Planner

In the previous section, we defined the process of software deployment in general and motivated the need to develop an automated deployment planner for deploying large, complex, component-based applications into distributed environments. In this section, we describe an automated deployment planner in the context of Web services compositions using the Reo coordination middleware. Suppose the specification of the Web application to be deployed is given. In this application, a number of Web services are composed together by
using a Reo circuit. Thus, this specification describes these Web services, their requirements and constraints, and the Reo circuit used among them. The implementations of these Web services and their internals are not important; they are viewed as black box software components. Furthermore, this specification describes the Reo circuit by specifying the nodes of the Reo circuit, the channels among these nodes and their types, and the node to which each Web service is connected. In addition to this specification, the specification of the available resources in the distributed environment is given. This specification describes a number of hosts and their computational capabilities. The computational capabilities of these hosts are the different implementations of Reo channels they can support. At this level of abstraction, low-level hardware parameters such as CPU speed, memory, disk, etc. are not important. The reason is that we wish to focus on software abstraction and not hardware abstraction. As an example of computational capabilities, suppose host A can support three different implementations of Reo's Sync channel. Logically, they are all implementations of the Sync channel, but their requirements, costs, and speeds are different. Similarly, different channel types have different implementations on different hosts. Furthermore, users should be able to specify their constraints and requirements regarding the deployment of the application. Some examples of such requirements and constraints are placing certain Web services on certain hosts, certain quality of service (QoS) requirements like cost, speed, and so on. The deployment planner uses these specifications as input and generates the specification of a deployment plan. In this deployment plan, different pieces of the application (Web services and Reo nodes) are mapped to the available resources subject to the given constraints. In other words, this deployment plan specifies where each of the Web services and nodes of the Reo circuit should run in the target environment. In the following section, different issues that should be considered in developing such a deployment planner are discussed.

3.3 Challenges in Developing a Deployment Planner

In the previous sections, we provided the problem definition of our ongoing work on developing an automated deployment planner for Web applications. In this section, we present different aspects of this problem and provide a list of the sub-problems we have to cope with in order to solve the whole problem. One of the important sub-problems that should be considered is related to resource allocation. The deployment planner is supposed to optimally allocate resources available at different hosts to accommodate the requirements and constraints of the application. Generating such a deployment plan thus becomes a constraint satisfaction problem, and it should be possible to develop a mathematical representation of that problem and then solve it. Generally,
finding the best solution for such problems, which have many parameters, is impossible. So, we should try to find the best possible solutions for them. For this purpose, a set of heuristics should be developed and applied to effectively solve such constraint satisfaction problems. Another important issue in a deployment is its quality. For any large, complex Web application, multiple deployments in a distributed environment are typically possible. Obviously, some of those deployments are more effective than others in terms of quality of service (QoS) requirements such as cost, reliability, speed, efficiency, and so on. Maximizing the QoS of a given system may require the system to be redeployed [20]. Thus, considering the issues related to QoS represents another important aspect of this project. Other issues relate to specification languages. As mentioned earlier, the specification of the Web application to be deployed and the specification of the distributed environment should be provided as inputs to the deployment planner. Thus, specification languages are required for this purpose. We name these languages the Application Specification Language (ASL) and the Resource Specification Language (RSL), respectively. ASL will be used to specify the Web services utilized in the application, the Reo circuit used to compose them, and the requirements of the application. RSL will be used to specify the different hosts in the distributed environment available to the application, their computational capabilities, and their constraints. Furthermore, for generating deployment plans, a Deployment Specification Language (DSL) should be devised. The deployment planner will use this language to generate deployment plans.

3.4 A Graph-based Approach for Deployment Planning

At the time of writing this paper, we have used a graph-based approach to solve the software deployment problem. For this purpose, two graphs are constructed in this approach: the Application Graph and the Target Environment Graph. The application graph models a component-based application as a graph of components connected by different channel types. The target environment graph models the distributed environment as a graph of hosts connected by the different channel types that can exist between every two hosts. In other words, before starting the deployment planning, the channel types that can exist between every two hosts in the target environment are specified. Then, the deployment planning of an application is defined as the mapping of its application graph to its target environment graph, subject to maximization of the desired QoS parameter. As an example of how efficient algorithms and techniques can be applied to effectively solve the deployment problem, in the following we talk about finding the most cost-effective deployment configuration. Suppose different hosts in the target environment have different costs, and whenever they are being used, their costs should be paid to their administrator(s). In this situation, the most cost-effective deployment should be
found. For this purpose, a collection of available hosts in the distributed environment must be selected so that the total cost of the deployment is minimal and all components are assigned to a host. It is easy to show that this problem is equivalent to the Minimum Set Cover problem [21].

Definition 3.1 (Minimum Set Cover Problem) Given a finite set U of n elements, a collection of subsets of U, S = {s1, s2, ..., sk}, such that every element of U belongs to at least one si, and a cost function c : S → R, the problem is to find a minimum-cost sub-collection of S that covers all elements of U.

The cost-effective deployment problem can be converted to the minimum set cover problem in the following way (a sketch of the greedy approximation follows the list):
• Set U = {C1, C2, ..., Cn}, i.e., the components of the application are set as the elements of the universe;
• Set S = {CSH1, CSH2, ..., CSHm}, in which each CSHi corresponds to host Hi and represents the set of components of the application that can be run on host Hi;
• Define c : S → R to return the cost of each host.
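The sketch below pairs this conversion with the classical greedy approximation for weighted set cover (pick, at each step, the host with the smallest cost per newly covered component); it only selects the set of hosts to use, and all component and host data in main are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Greedy weighted set-cover sketch for the cost-effective deployment problem:
// U are the application components, each host Hi contributes the set CS_Hi of
// components it can run, and c(Hi) is the cost of using host Hi. At each step
// the host with the lowest cost per newly covered component is selected.
public class GreedyDeployment {

    static Set<String> selectHosts(Set<String> components,
                                   Map<String, Set<String>> canRun,   // host -> components it can run
                                   Map<String, Double> cost) {        // host -> cost of using it
        Set<String> uncovered = new HashSet<>(components);
        Set<String> chosen = new HashSet<>();
        while (!uncovered.isEmpty()) {
            String best = null;
            double bestRatio = Double.POSITIVE_INFINITY;
            for (String host : canRun.keySet()) {
                Set<String> newlyCovered = new HashSet<>(canRun.get(host));
                newlyCovered.retainAll(uncovered);
                if (newlyCovered.isEmpty()) continue;
                double ratio = cost.get(host) / newlyCovered.size();
                if (ratio < bestRatio) {
                    bestRatio = ratio;
                    best = host;
                }
            }
            if (best == null) {
                throw new IllegalStateException("some components cannot be placed on any host");
            }
            chosen.add(best);
            uncovered.removeAll(canRun.get(best));
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Hypothetical data: three components, three hosts with different costs.
        Set<String> components = Set.of("FRS", "IFRS", "DFRS");
        Map<String, Set<String>> canRun = new HashMap<>();
        canRun.put("H1", Set.of("FRS", "IFRS"));
        canRun.put("H2", Set.of("IFRS", "DFRS"));
        canRun.put("H3", Set.of("FRS", "IFRS", "DFRS"));
        Map<String, Double> cost = Map.of("H1", 2.0, "H2", 2.0, "H3", 5.0);
        System.out.println(selectHosts(components, canRun, cost)); // e.g. [H1, H2]
    }
}
```

Any chosen host covering a component can then be picked to actually run it; mapping the Reo nodes and honoring user constraints would be layered on top of this selection.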
However, the minimum set cover problem is NP-hard, so it cannot be expected to be solved exactly in polynomial time [21]. There exist, however, greedy approximation algorithms that can find reasonably good answers in polynomial time [21]. Thus, to solve the cost-effective deployment problem, it can first be converted to the minimum set cover problem as described above. Then, by using existing algorithms for solving the minimum set cover problem, all components of the application will be assigned to at least one host and the cost of the deployment will be close to minimal too.

3.5 An Example of Software Deployment

Consider again the example presented in section 2.3.1 and suppose the following required specifications for generating a deployment plan are given:
• Specification of the Web Application: This specifies the Web services and the Reo circuit being used in this application to compose these Web services together.
• Specification of the Target Environment: As mentioned in section 3.2, the target environment is a set of hosts with computational capabilities connected by a computer network. Also, we mentioned that these computational capabilities are defined as the different channel implementations that a host can support. In this example, we assume that the available resources consist of five hosts. Hosts H1, H2, and H3 are the hosts on which the Web services are running, but there are two other hosts, H4 and H5, available to the application that can be used for different purposes. All these hosts can support two different implementations of the Sync channel: (1) Encrypted peer-to-peer connection, and (2) Simple peer-to-peer connection. However, only H4 and H5 can support SyncDrain channels.

Fig. 4. A sample deployment for the flight reservation system
• User-defined Constraints and Requirements: Users may have special requirements and constraints that should be taken into account during the deployment process. In this example, we assume that users only want the transfer of data between FRS and IFRS to be encrypted.
Now that the required inputs are specified, it is time to use these inputs for deployment. In the following, the actions that must be done in each deployment activity are described:
• Acquiring: In this activity, the Web application to be deployed is acquired from the software developer and is kept inside a repository.
• Planning: The deployment planner uses the above-mentioned inputs and generates a deployment plan. Fig. 4 shows one possible deployment for our example. As we see in this figure, the deployment plan maps different application components (Web services and Reo nodes) to the available hosts, subject to the given constraints, to provide the desired functionality. As mentioned earlier, it is possible to consider some QoS issues in generating deployment plans. For example, suppose the deployment is done in the way shown in Fig. 4. Now, one can see that the network traffic between H4 and H5 would be too high, and it might be beneficial to move both the N1 and N2 nodes to a single host (e.g., H4) to gain a higher QoS.
• Installation: After generating the deployment plan, it can be used to do the actual installation. In this activity, different application components are loaded onto different hosts according to the deployment plan. In the case of this example, the following actions should be done during the installation activity: (i) the tools and libraries of the Reo middleware are loaded onto hosts H1-H5; (ii) the FRS, IFRS, and DFRS Web services are instantiated on hosts H1, H2, and H3, respectively; (iii) the different channels of the barrier synchronization connector are created and the identities of their ends are moved to their respective hosts according to the deployment plan; (iv) the different Web services are connected to the respective Reo nodes.
• Configuration: After installing the different application components in the target environment, it might be necessary to change some configurations and settings of the application to provide the desired functionality. In this example, only travel agency operators can work with the application. So, some user accounts should be defined for this purpose and the application should be configured so that unauthorized users cannot work with the application.
• Execution: After configuring the application, it is time to execute it. In addition, it might be possible to monitor the QoS of the application to see whether or not it performs well. If not, it might be required to redeploy it.
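At its simplest, the resulting deployment plan can be viewed as a mapping from application elements (Web services and Reo nodes) to hosts. The assignments below merely illustrate a plan consistent with the description of Fig. 4 above; the node names N1 and N2 and their placement are assumptions, and a real plan would be expressed in the DSL mentioned in Sect. 3.3.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A deployment plan as a simple mapping from application elements (Web services
// and Reo nodes) to hosts. The assignments only illustrate the kind of plan
// described in the example; the actual plan would be expressed in the DSL.
public class DeploymentPlan {
    public static void main(String[] args) {
        Map<String, String> plan = new LinkedHashMap<>();
        plan.put("FRS", "H1");   // Web services stay on the hosts they already run on
        plan.put("IFRS", "H2");
        plan.put("DFRS", "H3");
        plan.put("N1", "H4");    // Reo nodes of the barrier synchronization connector
        plan.put("N2", "H5");    // (could be co-located on H4 to reduce network traffic)
        plan.forEach((element, host) -> System.out.println(element + " -> " + host));
    }
}
```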
4 Conclusions
The aim of the software deployment process is to bring a developed application into its target environment and make it available for use. For large, complex, component-based applications that include many components with different requirements and must be distributed over many hosts with different computational capabilities, manual deployment is not easy, and automated tools are required for this purpose. In this paper, we presented our ongoing work on developing an automated deployment planner for Web services applications using the Reo coordination middleware. The strong formal basis of Reo and its easy-to-use composition rules encouraged us to choose Reo as the coordination model for Web services compositions. In this method, Web services are treated as black box software components. Given the specifications of a Web application to be deployed, the available resources in the distributed environment, and user-defined constraints, a deployment plan specifies on which of the hosts each of the components of the application should run in order to instantiate a running Web application so that all requirements and constraints are met.
Acknowledgment The authors would like to acknowledge Dr. Nikolay Diakov for his helpful comments in developing the ideas expressed in this paper.
References
[1] Chandra, P., Fisher, A., Kosak, C., Ng, T. S. E., Steenkiste, P., Takahashi, E. and Zhang, H., "Darwin: Customizable resource management for value-added network services," In Proceedings of the 6th IEEE International Conference on Network Protocols, Oct. 1998.
[2] Gergic, J., Kleindienst, J., Despotopoulos, Y., Soldatos, J., Patikis, G., Anagnostou, A. and Polymenakos, L., "An Approach to lightweight deployment of web services," In Proceedings of the 14th International Conference on Software Engineering and Knowledge Engineering 2002 (SEKE 2002), ACM Press, 635-640.
[3] Arbab, F., "Reo: A Channel-based Coordination Model for Component Composition," Mathematical Structures in Computer Science, 14, 3 (June 2004), 329-366.
[4] Arbab, F. and Rutten, J.J.M.M., "A Coinductive Calculus of Component Connectors," In Proceedings of the 16th International Workshop on Algebraic Development Techniques (WADT 2002), LNCS 2755, Springer-Verlag, 2003, 35-56.
[5] Reekie, H. J. and Lee, E. A., "Lightweight Component Models for Embedded Systems," Technical report, Electronics Research Laboratory, University of California at Berkeley, UCB ERL M02/30, October 2002.
[6] Szyperski, C., "Component Software - Beyond Object-Oriented Programming," Addison-Wesley/ACM Press, 1999.
[7] Kozaczynski, W., "Composite Nature of Component," In Proceedings of the 1999 International Workshop on Component-based Software Engineering, May 1999, 73-77.
[8] Orso, A., Harrold, M.J., and Rosenblum, D. S., "Component Metadata for Software Engineering Tasks," In Proceedings of the 2nd International Workshop on Engineering Distributed Objects (EDO 2000) (Davis, CA, USA, November 2000), LNCS 1999, Springer-Verlag, 126-140.
[9] "Web services: the Web's next revolution," https://www6.software.ibm.com/developerworks/education/wsbasics/wsbasicsa4.pdf, Last visited: Sep. 30, 2005.
[10] "Introduction to Web Services," http://www.embedded.com/story/OEG20020125S0103, Last visited: Sep. 30, 2005.
[11] "Extensible Markup Language (XML)," http://www.w3.org/XML/, Last visited: Sep. 30, 2005.
[12] Heydar Noori, A., and Mavaddat, F., "On Software Components Characterization and Specification," In Proceedings of the 9th Annual International CSI Computer Conference, Tehran, Iran, 2004.
[13] "WEB SERVICES: FOOD FOR THOUGHT," http://www.cutter.com/webservices/wss0209.html, Last visited: Sep. 30, 2005.
[14] Stojanovic, Z., Dahanayake, A. and Sol, H., "Agile Modeling and Design of Service-Oriented Component Architecture," In Proceedings of the 1st European Workshop on Object-Orientation and Web Services at ECOOP 2003, Darmstadt, Germany, July 21-25, 2003.
[15] Curbera, F., Mukhi, N., and Weerawarana, S., "On the Emergence of a Web Services Component Model," In Proceedings of the WCOP 2001 Workshop at ECOOP 2001, Budapest, Hungary, June 2001.
[16] Arbab, F. and Mavaddat, F., "Coordination through channel composition," In Proceedings of the 5th International Conference on Coordination Models and Languages (Coordination 2002), LNCS 2315, Springer-Verlag, 21-38.
[17] Diakov, N. K. and Arbab, F., "Compositional Construction of Web Services Using Reo," In Proceedings of the 2nd International Workshop on Web Services: Modeling, Architecture and Infrastructure (WSMAI'2004) (Porto, Portugal, April 13-14, 2004), INSTICC Press, 2004, 49-58.
[18] Carzaniga, A., Fuggetta, A., Hall, R. S., Hoek, A. V. D., Heimbigner, D., Wolf, A. L., "A Characterization Framework for Software Deployment Technologies," Technical Report CU-CS-857-98, Dept. of Computer Science, University of Colorado, April 1998.
[19] Object Management Group, "Deployment and Configuration of Component-based Distributed Applications Specification," http://www.omg.org/docs/ptc/04-05-15.pdf, Last visited: Sep. 30, 2005.
[20] Mikic-Rakic, M., Malek, S., Beckman, N. and Medvidovic, N., "A Tailorable Environment for Assessing the Quality of Deployment Architectures in Highly Distributed Settings," In Proceedings of the 2nd International Working Conference on Component Deployment (CD 2004), Edinburgh, UK, May 2004.
[21] Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C., "Introduction to Algorithms," Second edition, MIT Press, 2001.