Artificial Intelligence and Knowledge Engineering

KNOWLEDGE SUBSYSTEM'S INTEGRATION INTO MDA BASED FORWARD AND REVERSE IS ENGINEERING

Audrius Lopata 1,2, Martas Ambraziunas 3

1 Kaunas University of Technology, Department of Information Systems, Studentu st. 50, Kaunas, Lithuania, [email protected]
2 Vilnius University, Kaunas Faculty of Humanities, Muitines st. 8, Kaunas, Lithuania, [email protected]
3 Valdoware Inc., [email protected]

Abstract. In 2001 the OMG presented the MDA (Model Driven Architecture) approach, which specifies the application of system models in the software development life cycle. Improving MDA with an Enterprise Knowledge subsystem, whose composition is based on the best practices of enterprise modeling standards, will reduce the risk of project failures caused by inconsistent user requirements and by insufficient verification of problem domain knowledge against the Enterprise Meta-Model's internal structure. Such an improvement of MDA by a Knowledge-Based subsystem is discussed in this article.

Keywords: Enterprise Knowledge-Based Information System Engineering, Model Driven Architecture, Enterprise Model, Enterprise Meta-Model.

1 Introduction

The majority of IT project failures (about 68% [2]) are caused by inconsistent user requirements and insufficient problem domain analysis. Although new methods of information systems engineering (ISE) are being researched and developed, they are empirical in nature: the project models repository of a CASE system is composed on the basis of the enterprise problem domain. The problem domain knowledge acquisition process relies heavily on the analyst and the user; therefore it is not clear whether the acquired knowledge of the problem domain is adequate. The expert plays a pivotal role in the problem domain knowledge acquisition process, and few formalized methods of knowledge acquisition control are taken into consideration. The knowledge stored in the repository of a CASE tool is not verified through formalized criteria, thus it is necessary to use advanced data capture techniques that ensure an iterative knowledge acquisition process during which missing or incorrect data elements are obtained and fixed according to the Enterprise Meta-Model. Despite existing tools and CASE systems, requirement analysis still largely depends on the expertise of the system analyst and the user. OMG provides the Model Driven Architecture (MDA) approach to information systems engineering, which focuses on functional requirements and system architecture, not on technical details only [4]. Model Driven Architecture allows long-term flexibility of implementation, integration, maintenance, testing and simulation. However, the enterprise modeling and user requirements engineering stages of the information system engineering life cycle are not covered enough yet. There is a lack of formalized problem domain knowledge management and user requirements acquisition techniques for the composition and verification of the computation independent model (CIM) specified in the MDA approach. The approach can be enhanced with a knowledge subsystem, which will ensure CIM verification against formal criteria defined by the Enterprise Meta-Model (EMM). Various standards exist, such as CEN ENV 12204 [5], CEN ENV 40003 (CIMOSA) [6], UEML [7] and PSM, which specify requirements for the EMM internal structure. The EMM provides components for the construction of the Enterprise Model (EM), such as function, activity, process, resource, actor, goal, business rules etc. Improving MDA with a knowledge subsystem whose composition is based on the best practices of the standards mentioned above will reduce the risk of project failures caused by inconsistent user requirements and insufficient verification of problem domain knowledge against the EMM internal structure. Such an improvement of MDA by the Knowledge Subsystem's integration into MDA based forward and reverse IS engineering is discussed in this article.

2 Knowledge-Based MDA approach

Most MDA related techniques are based on empirically collected problem domain knowledge, which negatively influences the validation of user requirements specifications against actual customer needs. In some cases, user requirements do not correspond to formal business process definition criteria, which has a negative impact on the next stage of the information system engineering process. This problem can be solved by integrating a Control theory [8] based Knowledge subsystem (which includes the EMM and EM) into particular MDA techniques.


2.1 Knowledge-Based information systems engineering

The Enterprise Knowledge-Based subsystem consists of two parts: the Enterprise Meta-Model (EMM) and the Enterprise Model (EM). The EMM regulates the formation order of the EM. The EMM defines the composition of computerized problem domain knowledge, which is necessary for creating project models and generating the programming code. The problem domain EM is formed by the user and the analyst according to EMM constraints. In order to solve this problem, a particular method [9] has been developed at Kaunas University of Technology, Department of Information Systems. It is based on Control theory and the best practices of the UEML, ENV 12204, ENV 40003 and WFMC TC00-1003 standards. The conceptual scheme of Knowledge-Based subsystem integration in the ISE life cycle is presented in Figure 1.

Figure 1. Role of the Knowledge-Based subsystem in the ISE life cycle
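To make the EMM-constrained formation of the EM concrete, the sketch below shows how such a constraint check could look. It is a minimal illustration, not the method of [9]: the element kinds (process, actor, resource) come from the paper, but the concrete constraint rules and all names (EMM_CONSTRAINTS, validate_em) are hypothetical.

```python
# Hypothetical sketch: verifying an Enterprise Model (EM) against a very
# simplified Enterprise Meta-Model (EMM). The EMM is reduced to one kind
# of formal criterion: required attributes per element kind.

EMM_CONSTRAINTS = {
    # element kind -> attributes every instance must define
    "process":  {"name", "inputs", "outputs", "actor"},
    "actor":    {"name", "goal"},
    "resource": {"name", "type"},
}

def validate_em(em_elements):
    """Return a list of EMM violations; an empty list means the EM is consistent."""
    report = []
    for element in em_elements:
        kind = element.get("kind")
        required = EMM_CONSTRAINTS.get(kind)
        if required is None:
            report.append(f"unknown element kind: {kind!r}")
            continue
        missing = required - element.keys()
        if missing:
            report.append(f"{kind} {element.get('name', '?')!r} lacks {sorted(missing)}")
    return report

# The analyst fixes reported elements and re-runs the check until the
# report is empty, which is the iterative acquisition described above.
em = [{"kind": "process", "name": "order handling", "inputs": [], "outputs": []}]
print(validate_em(em))  # -> ["process 'order handling' lacks ['actor']"]
```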

2.2 MDA Approach

In 2001 the OMG presented the MDA (Model Driven Architecture) approach, which specifies the application of system models in the software development life cycle. A model of a system is a description or specification of that system and its environment for a certain purpose. A model is often presented as a combination of drawings and text [4]. The main concept of MDA is to separate the specification of system functionality from the specification of the implementation of that functionality on a specific technology platform [4] ("what" to do from "how" to do it). The conceptual MDA structure is presented in Figure 2.

Figure 2. Conceptual MDA structure

OMG defines the following key points of MDA:
• Definition of the Computation Independent Model (CIM), which specifies the system requirements of a particular problem domain (it can also be named the Business Model);
• Transformation of the CIM to the Platform Independent Model (PIM). User requirements specifications are converted to system architecture components and functionality methods during this process;
• Transformation of the PIM to the Platform Specific Model (PSM), where the abstract system model ("what" to do) is upgraded with targeted platform specific information ("how" to do it). The PIM provides the system's architecture and functionality without platform specific information and technical details. The PSM is constructed on the basis of the PIM, enhancing it with platform specific details, i.e. implementation and deployment information;
• Transformation of the PSM to a particular platform's programming code (for example Java, C# etc.) as well as to other artifacts, such as executable files, dynamic link libraries, user documentation etc.

Furthermore, the above described transformations can be performed backwards using reverse engineering. Three different techniques can be applied to perform these transformations:
• Manual: the system analyst creates and studies the composition of all types of the defined MDA models and manually performs all the necessary transformations.
• Semi-automatic: the system analyst uses analysis and design tools that allow performing the model creation and transformation process more efficiently.
• Automatic: the transformation tool completes the transformation process without the system analyst's intervention.
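The chain of transformations can be pictured as a pipeline of model-to-model steps. The following is a deliberately toy sketch under our own assumptions: real MDA tools operate on XMI/UML models, whereas here each model is a plain dictionary and the function names (cim_to_pim, pim_to_psm, psm_to_code) are invented for illustration.

```python
# Hypothetical sketch of the MDA transformation chain (CIM -> PIM -> PSM -> code).

def cim_to_pim(cim):
    # requirements become architecture components (semi-automatic in practice)
    return {"components": [f"{req}_service" for req in cim["requirements"]]}

def pim_to_psm(pim, platform):
    # the platform-independent model is enriched with platform detail
    return {"platform": platform, "classes": pim["components"]}

def psm_to_code(psm):
    # code generation for the chosen platform (Java is just an example target)
    return "\n".join(f"public class {c.title().replace('_', '')} {{ }}"
                     for c in psm["classes"])

cim = {"requirements": ["register_order", "ship_order"]}
print(psm_to_code(pim_to_psm(cim_to_pim(cim), platform="Java")))
```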

2.3 Knowledge-Based MDA IS Engineering

According to the survey [1], leading MDA-based ISE methodologies need improvement in the following areas: requirements engineering, CIM construction, and the validation and verification of system models against problem domain processes; moreover, most of the methodologies [1] do not provide sufficient information about which MDA tools are most efficient with a particular methodology. These problems can be solved by enhancing the MDA approach with the Knowledge-Based subsystem, which is currently not covered by MDA. This subsystem is able to handle the validation of the EM against the EMM. The EMM ensures completeness and consistency of the EM, which is created on the basis of the CIM (during forward engineering) or the PIM (during reverse engineering). The conceptual MDA structure enhanced with the Knowledge-Based subsystem is presented in Figure 3.

Figure 3. Conceptual MDA structure enhanced with the Knowledge-Based subsystem

The main steps of forward and reverse MDA IS engineering enhanced by the EM and EMM are discussed below.

2.3.1 Principles of Knowledge-Based MDA Forward IS Engineering

EM construction requires a formal CIM structure. Although numerous existing techniques [3] describe CIM construction procedures, most of them are not formalized enough, which has a negative impact on the EM constructs and composition. The use of modified workflow diagrams [10] can solve such shortcomings and properly support the suggested method. XMI compatible third party tools are able to use the Knowledge-Based subsystem's data for transformations between particular MDA models. This ensures the availability of a wide range of development alternatives for MDA model transformations. The set of modified workflow models [11] can be used for CIM construction. When this model is constructed, an iterative CIM based EM verification against the EMM is started and repeated until all incorrect or missing EM knowledge is updated and corresponds to the internal structure of the EMM. The process leads to the creation of a consistent EM, which will be realized as a relational or object oriented database. The next step is the transformation of the EM to the PIM. The result of this transformation conforms to the XMI standard, so that third party tools can use this model for the next stages of the MDA ISE life cycle (PSM and code generation). The detailed workflow of forward engineering is presented in Fig. 4 as steps 4-10, which are described in Table 1.

2.3.2 Principles of Knowledge-Based MDA Reverse IS Engineering

Reverse engineering starts, as usual, from the transformation of code (working software) to the PSM. This process is performed by a transformation tool. A particular MDA compatible tool then performs the PSM to PIM transformation, removing or transforming platform related constructs to the higher abstraction (PIM) level. The Knowledge-Based subsystem handles the PIM to EM transformation. The final reverse engineering result is an EM which is consistent with the analyzed IS. At this point the EM can be used for two main purposes: specification and analysis of the information system architecture from the Control Theory [8] view, or improvement of the existing IS by updating problem domain knowledge, which starts the forward engineering process. The detailed workflow of reverse engineering (steps 1-3) including forward engineering (steps 4-10) is presented in Fig. 4, with descriptions in Table 1.
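The verify-and-fix cycle between the EM and the EMM is the core of the forward path. The fragment below sketches only that control loop; validate_em is the hypothetical checker from the Section 2.1 sketch, and analyst_fixes stands in for the manual correction step, so none of these names come from the paper itself.

```python
# Hypothetical sketch of the iterative EM verification loop of Section 2.3.1:
# verify the EM against the EMM, let the analyst repair reported elements,
# and repeat until the verification report is empty.

def refine_until_consistent(em, validate_em, analyst_fixes, max_rounds=10):
    for _ in range(max_rounds):
        report = validate_em(em)          # EM verification against the EMM
        if not report:
            return em                     # consistent EM: ready for EM -> PIM
        em = analyst_fixes(em, report)    # missing/incorrect knowledge updated
    raise RuntimeError("EM still inconsistent after %d rounds" % max_rounds)
```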

Figure 4. Main steps of Knowledge-Based MDA approach

The following types of actors are specified in the Knowledge-Based MDA approach: the system analyst, the Knowledge-Based subsystem and the transformation tool. By default, model transformations are performed automatically without the system analyst's intervention. A detailed description of the main steps of the Knowledge-Based MDA approach is presented in Table 1.


Table 1. Detailed description of the main steps of the Knowledge-Based MDA approach

STEP NAME | ACTOR | STEP DESCRIPTION | RESULT
1. Code to PSM transformation | Transformation tool | A particular MDA tool performs the transformation from programming code (as well as from other artifacts such as executable files and dynamic link libraries) to the PSM. | PSM
2. PSM to PIM transformation | Transformation tool | A particular MDA tool performs the PSM to PIM transformation, removing or transforming platform related constructs to the higher abstraction (PIM) level. | PIM
3. PIM to EM transformation | Knowledge-Based subsystem | The Knowledge-Based subsystem transforms the PIM (which basically consists of UML based models) to the EM of the particular problem domain. | EM
4. EM verification against EMM | Knowledge-Based subsystem | The EM is verified against the Control Theory based EMM internal structure. Missing or incorrect EM data elements that do not correspond to the EMM internal structure are determined during this step. | Verification report
5. Analysis of Verification report | System analyst | The system analyst evaluates the Verification report and approves the EM to PIM transformation in case of success, or defines the actions necessary to resolve the EM's inconsistency against the EMM internal structure in case of failure. | Identification of insufficient problem domain knowledge
6. CIM construction for particular problem domain | System analyst | Problem domain knowledge acquisition and CIM composition are performed at this step. This step is empirical in nature and thus heavily depends on the system analyst's experience and qualification. | CIM
7. EM construction from CIM | System analyst | The CIM is transformed to the EM using a semi-automatic technique. | EM
8. Transformation from EM to PIM | Knowledge-Based subsystem | An XMI standard compatible PIM is constructed according to the EM knowledge. This ensures the PIM's conformance to the formal constraints defined by the EMM. | PIM
9. Transformation from PIM to PSM | Transformation tool | A particular MDA tool performs the transformation from PIM to PSM, adding platform specific information to the PIM. | PSM
10. Transformation from PSM to CODE | Transformation tool | A particular MDA tool performs the transformation from the PSM to programming code as well as to other artifacts such as executable files, dynamic link libraries, user documentation etc. | CODE

3 Conclusions

A Knowledge-Based subsystem, which improves the traditional MDA conception with best practices of problem domain knowledge and user requirements acquisition methods, is presented in this article. It ensures the verification of problem domain knowledge against the EMM internal structure. The EMM is intended to be a formal structure and a set of business rules aimed at integrating the domain knowledge for IS engineering needs. The EMM is used as the "normalized" knowledge architecture to control the process of constructing an EM for the particular business domain. Some work in this area has already been done [12], [13], [14]. The EM is used as the main source of enterprise knowledge for the discussed MDA approach. Improving MDA with the Knowledge-Based subsystem will reduce the risk of project failures caused by inconsistent user requirements and insufficient problem domain knowledge, and it also allows the enhancement of existing systems by using reverse engineering principles.

References

[1] Asadi, M., Ramsin, R. MDA-based Methodologies: An Analytic Survey. Proceedings of the 4th European Conference on Model Driven Architecture: Foundations and Applications, pp. 419-431, Berlin (2008).
[2] Ellis, K. The Impact of Business Requirements on the Success of Technology Projects. Benchmark, IAG Consulting (2008).
[3] Ambler, S.W. Agile Modeling, http://www.agilemodeling.com/essays/inclusiveModels.htm
[4] OMG. MDA Guide Version 1.0.1, www.omg.com
[5] ENV 12 204. Advanced Manufacturing Technology Systems Architecture - Constructs for Enterprise Modelling. CEN TC 310/WG1 (1996).
[6] ENV 40 003. Computer Integrated Manufacturing Systems Architecture - Framework for Enterprise Modelling. CEN/CENELEC (1990).
[7] Vernadat, F. UEML: Towards a Unified Enterprise Modelling Language. Proceedings of the International Conference on Industrial Systems Design, Analysis and Management (MOSIM'01), Troyes, France, 2001-04-25/27, http://www.univ-troyes.fr/mosim01.
[8] Gupta, M.M., Sinha, N.K. Intelligent Control Systems: Theory and Applications. The Institute of Electrical and Electronic Engineers Inc., New York (1996).
[9] Gudas, S., Lopata, A., Skersys, T. Approach to Enterprise Modelling for Information Systems Engineering. INFORMATICA, Vol. 16, No. 2, Institute of Mathematics and Informatics, Vilnius, 2005, pp. 175-192.
[10] Lopata, A., Gudas, S. Enterprise Model Based Computerized Specification Method of User Functional Requirements. 20th EURO Mini Conference "Continuous Optimization and Knowledge-Based Technologies" (EurOPT-2008), May 20-23, 2008, Neringa, Lithuania, pp. 456-461. ISBN 978-9955-28-283-9.
[11] Lopata, A., Gudas, S. Workflow-Based Acquisition and Specification of Functional Requirements. Proceedings of the 15th International Conference on Information and Software Technologies IT2009, Kaunas, Lithuania, April 23-24, pp. 417-426. ISSN 2029-0020.
[12] Kapocius, K., Butleris, R. Repository for Business Rules Based IS Requirements. Informatica, Vol. 17, No. 4, 2006, pp. 503-518. ISSN 0868-4952.
[13] Silingas, D., Butleris, R. UML-Intensive Framework for Modeling Software Requirements. Information Technologies' 2008: Proceedings of the 14th International Conference on Information and Software Technologies, IT 2008, Kaunas, Lithuania, April 24-25, 2008, pp. 334-342. ISSN 2029-0020.
[14] Gudas, S., Pakalnickas, E. Enterprise Management View Based Specification of Business Components. Proceedings of the 15th International Conference on Information and Software Technologies, IT'2009, Kaunas, Technologija, 2009, pp. 417-426. ISSN 2029-0020.


IMPLEMENTATION OF EXTENSIBLE FLOWCHARTING SOFTWARE USING MICROSOFT DSL TOOLS

Mikas Binkis, Tomas Blazauskas

Kaunas University of Technology, Department of Software Engineering, Studentu str. 50-101A, Kaunas, Lithuania, [email protected], [email protected]

Abstract. Currently there are commercial and freeware tools that allow users to specify algorithms using visual flowcharting software. These tools are widely used for many purposes, from visual programming to the specification of common actions, yet they usually can't be extended with additional elements and have limited execution capabilities. Our goal is to choose a flexible implementation platform and create an extensible, executable flowchart system that can be used to teach students programming. In this article we present a flowchart metamodel created with Microsoft DSL Tools, a graphical designer for domain models with a set of code generators.

Keywords: flowcharting, domain specific languages, Microsoft DSL Tools, visual programming, MDA, extensible flowcharts.

1 Introduction

A flowchart is a graphical representation of a process or of the step-by-step solution of a problem, using suitably annotated geometric figures connected by flowlines, for the purpose of designing or documenting a process or program [1]. Because of its rather simple and comprehensible notation, the flowchart is widely adopted as one of the means of teaching algorithms and programming. Although computer science uses a variety of flowcharts, new software dedicated to flowcharting has begun to offer extensibility. Some programs have combined flowcharting capabilities with programming and allow users to transform their diagrams into specific programming code. The Microsoft Visual Studio add-on Microsoft DSL Tools is an interesting case, since it allows graphically specifying the metamodel of a diagram (or in this case, a flowchart) and creating a model editor. This means that it is possible to modify flowchart models by adding custom elements, which extends the usability of flowcharts in almost every imaginable field. Another advantage of the DSL approach is that the modelling environment can constrain and validate a created model for the domain's semantics, something that is not possible with UML profiles [2]. The model editor created with DSL Tools can be used to transform the graphical notation into any type of programming code, from simple XML notation to a working program, thus creating an executable flowchart solution. This gives an advantage over commonly used flowcharting software, which usually offers only limited code transformation options and does not provide an executable environment other than the default one. In this article we will briefly review some of the existing flowcharting software, analyse the main properties of domain specific languages and propose our prototype flowchart engine, based on Microsoft DSL Tools for Visual Studio 2008.

2 Background

2.1 Flowcharts in learning process

The human ability to comprehend graphic representations faster than textual ones led to the idea of using graphical artifacts to describe the behaviour of algorithms to learners, an approach that has been identified as algorithm visualization [3]. Visual programs based on flowcharts allow students to visualize how programs work and to develop algorithms in a more intuitive fashion. The flow model greatly reduces syntactic complexity, allowing students to focus on solving the problem instead of finding missing semicolons [4]. Flowcharts complement other learning methodologies. One example is Algorithm Visualization using Serious Games (AVuSG), an algorithm learning and visualization approach that uses serious computer games to teach algorithms [5]. Open source, simply structured flowcharts, presented in widely accepted standard formats, may also contribute to the expansion of collective creativity. This is an approach to creative activity that emerges from the collaboration and contribution of many individuals, so that new forms of innovative and expressive art are produced collectively by individuals connected by the network [6]. In this way flowcharts may be used as a learning medium to transfer and exchange both simple and complex algorithms or scenarios for various frameworks that utilize flowcharts.

2.2 Existing flowcharting solutions

RAPTOR is an open source iconic programming environment, designed specifically to help students visualize classes and methods and to limit syntactic complexity. RAPTOR programs are created visually using a combination of UML and flowcharts. The resulting programs can be executed visually within the environment and converted to Java [7]. The Iconic Programmer is an interactive tool that allows programs to be developed in the form of flowcharts through a graphical and menu-based interface. When complete (or at any point during development), the flowchart programs can be executed by stepping through the flowchart components one at a time. Each of these components represents a sequence, a branch, or a loop, so their execution is a completely accurate depiction of how a structured program operates. To solidify the concept that flowcharts are real programs, the developed flowcharts can also be converted into Java or Turing (present capability) or other high-level languages (easily extendable) [8]. The Iconic Programmer supports input/output, selection, looping, and code generation, but does not support subprograms. The SFC (Structured Flow Chart) Editor is a graphical algorithm development tool for both beginning and advanced programmers. The SFC Editor differs from other flowchart creation software because its focus is on the design of flowcharts for structured programs; using a building block approach, the graphical components of a flowchart are automatically connected, and structured pseudo-code is simultaneously generated for each flowchart. While SFC was originally designed as a tool for beginning to intermediate programmers, it has been used by students in upper level classes and by professional system designers [9]. Visual Logic [10] provides a minimal-syntax introduction to essential programming concepts including variables, input, assignment, output, conditions, loops, procedures, arrays and files. Like other flowchart editors, Visual Logic supports visual execution and stepping through the elements of a diagram. The language contains some built-in functions from Visual Basic, yet it does not support the creation of classes.

2.3 Domain specific languages

A domain specific language (DSL) is a language designed to be useful for a delimited set of tasks, in contrast to general-purpose languages that are supposed to be useful for much more generic tasks, crossing multiple application domains [11]. A key benefit of using a DSL is the isolation of the accidental complexities typically required in the implementation phase (i.e., the solution space), such that a programmer can focus on the key abstractions of the problem space [12]. Other benefits of DSLs include [13]:
• DSLs allow solutions to be expressed in the idiom and at the level of abstraction of the problem domain. Consequently, domain experts themselves can understand, validate, modify, and often even develop DSL programs.
• DSL programs are concise, self-documenting to a large extent, and can be reused for different purposes [14].
• DSLs enhance productivity, reliability, maintainability [15, 16], and portability [17].
• DSLs embody domain knowledge, and thus enable the conservation and reuse of this knowledge.
• DSLs allow validation and optimization at the domain level [18, 19, 20].
• DSLs improve testability following approaches such as [21].

A graphical domain-specific language must include the following features [22]:
• Notation – a domain-specific language must have a reasonably small set of elements that can be easily defined and extended to represent domain-specific constructs.
• Domain Model – a domain-specific language must combine the set of elements and the relationships between them into a coherent grammar. It must also define whether combinations of elements and relationships are valid.
• Artifact Generation – one of the main purposes of a domain-specific language is to generate an artifact, for example source code, an XML file, or some other usable data.
• Serialization – a domain-specific language must be persisted in some form that can be edited, saved, closed, and reloaded.

A domain-specific language is defined by its domain model. The domain model includes the domain classes and domain relationships that form the basis of the domain-specific language. The domain model is not the same as a model: the domain model is the design-time representation of the domain-specific language, while the model is the run-time instantiation of the domain-specific language. Domain classes are used to create the various elements in the domain, and domain relationships are the links between the elements. They are the design-time representation of the elements and links that will be instantiated by the users of the domain-specific language when they create their models [23].
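The design-time versus run-time distinction can be illustrated with a short sketch. This is a language-neutral analogy in Python rather than the DSL Tools API: the class layout and names below are our own assumptions.

```python
# Hypothetical sketch of the design-time / run-time distinction: the domain
# model (the classes below) is defined once, while a model is what a user
# instantiates from it in the editor.

from dataclasses import dataclass, field

@dataclass
class DomainClass:                 # design-time: part of the domain model
    name: str
    properties: list = field(default_factory=list)

@dataclass
class ModelElement:                # run-time: an element in a user's model
    domain_class: DomainClass
    values: dict = field(default_factory=dict)

decision = DomainClass("Decision", properties=["Name", "Value"])
element = ModelElement(decision, values={"Name": "a < 3"})  # run-time instance
```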


Despite the benefits offered by DSLs, there are several limitations that hamper widespread adoption. Many DSLs are missing even basic tools such as debuggers, testing engines, and profilers. The lack of tool support can lead to leaky abstractions and frustration on the part of the DSL user [24]. We have chosen Domain Specific Language Tools for Microsoft Visual Studio 2008 for our research, since it provides a robust development environment, adequate debugging mechanisms, good extensibility options (it is possible to add custom elements to the flowchart model), .NET framework support and sufficient documentation. There is also an approach to domain specific language creation based on UML profiles [25]. Instead of heavyweight metamodeling, the developer can create a full-featured DSL based on a UML profile and its customization. One of the tools implementing this approach is MagicDraw. Although it is possible to validate models and generate code from them, surveys show [25] that the tool lacks customization flexibility (e.g. the creation of a new symbol which has nothing in common with existing UML symbols, the inability to change default UML metamodel values, etc.).

3 Prototype flowcharting tool

3.1 Proposed solution

The main goal of our work was to create a customizable visual flowcharting tool with the capability of generating flowchart definition code that can be utilized by other software (Figure 1). We chose a simple flowchart version with the most common flow diagram symbols and created a metamodel (detailed in Section 3.2) using Microsoft DSL Tools. The metamodel was compiled into an IDE that can be used to draw flowcharts and convert them into customizable XML code. The XML code is used by an Adobe Flex application, which visually represents the flow algorithm.

Figure 1. From metamodel to code

It is important to note that Microsoft DSL Tools automatically creates XML code from both metamodels and models (via implemented serialization functions), yet this generation lacks flexibility: usually even small changes in the XML structure require considerable modifications of the generator. That is why we chose a much faster and easier solution and created our own XML generator (detailed in Section 3.3).

3.2 The metamodel of the flowchart

Microsoft DSL Tools for the Visual Studio IDE can be used to create virtually any type of diagram metamodel by specifying its object hierarchy, the relationships between classes and the representation of model objects. The main elements of the metamodel are domain classes (with optional domain properties), which can be connected among themselves with three types of relationships [26]:
• Inheritance – a relationship between a base class and a derived class. Displayed as a line with a hollow arrow that points from the derived class to the base class.
• Embedding – a containment relationship. Displayed as a solid line in the diagram.
• Reference – a relationship between two domain classes that do not have an embedding relationship. Displayed as a dashed line in the diagram.

Every domain relationship has roles (source/target) and a multiplicity (which specifies how many elements can have the same role in a domain relationship). As shown in Figure 2, our flowchart model consists of the most common flowchart symbols: "Start", "End", "Decision", "Action", "Input", "Output" and "Subdiagram". Every symbol has an identification number (provided by the IDE) and a name. Symbols that require user input in a model have the property "Value", which, if necessary, can be validated by custom criteria. A connection between diagram elements is called a "FlowConnection" and has a property "Condition", used in conjunction with the "Decision" symbol.


Figure 2. Domain specific language definition for flowchart model
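To give a feel for the domain model of Figure 2, the sketch below mirrors it in plain Python. The symbol names, the Value property and the FlowConnection Condition come from the paper; the class layout and the validation rule are our own assumptions, not the DSL Tools implementation.

```python
# Hypothetical stand-in for the flowchart domain model of Figure 2.

class Symbol:
    def __init__(self, id_, name, value=None):
        self.id, self.name, self.value = id_, name, value

class Decision(Symbol):
    pass

class Action(Symbol):
    pass

class FlowConnection:
    def __init__(self, source, target, condition=None):
        # Condition is only meaningful on connections leaving a Decision
        if condition is not None and not isinstance(source, Decision):
            raise ValueError("Condition requires a Decision source")
        self.source, self.target, self.condition = source, target, condition

check = Decision(1, "check", value="a < 3")
step = Action(2, "increment", value="a = a + 1")
edge = FlowConnection(check, step, condition="false")
```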

3.3 Code generator

The code generator is based on the text template mechanism provided by the IDE and therefore makes it possible to transform the model into any type of programming code. We have chosen XML, as it is the most versatile format and is best suited to achieving our goals. It is also important to note that the code generator is not limited to any particular output language, since by using Microsoft Visual Studio text templates it is possible to transform the model into any type of programming code or formal notation, such as GraphML. While the generated IDE with the mentioned code translator might seem like an all-round platform independent solution, the IDE itself is mainly restricted to operation on Windows family operating systems. Some sources indicate that this may be only a temporary inconvenience, as Microsoft has recently acquired Teamprise, a division of SourceGear that built tools to give developers access to Visual Studio 2008 Team Foundation Server from systems running Linux, Mac OS X and Unix [27]. To illustrate our solution, we present a simple example of a program (Figure 3) that asks for a numerical input, checks whether the given number is less than 3, increases the number if it is not (a simple illustration of a cycle) and outputs the number. Because of space limitations, we present only the generated XML code. Every element is specified by a name (caption) and an identification code. Additional attributes, such as Type and Color, have been added to adapt the code for use by an animated model representation application (detailed in Section 3.4). We are planning to dedicate our upcoming articles to more detailed examples that will include more elaborate cases.


Figure 3. Example of a simple program in our flowcharting environment
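The generated XML listing itself did not survive extraction; only the element captions "a" and "a = a+1" remain. As a stand-in, the sketch below regenerates a plausible listing for the Figure 3 example with Python's standard library. The tag and attribute names (Element, Id, Caption, Type, Color) are hypothetical; the paper states only that each element carries a name, an identification code, and additional Type and Color attributes.

```python
# Hypothetical reconstruction of the generated XML for the Figure 3 example.
# Tag/attribute names are assumptions; the real generator is a Visual Studio
# text template whose exact output was lost in extraction.
import xml.etree.ElementTree as ET

flowchart = ET.Element("Flowchart")
for id_, caption, type_ in [
    ("1", "Start",     "Start"),
    ("2", "a",         "Input"),     # ask for a numerical input
    ("3", "a < 3",     "Decision"),  # check if the number is less than 3
    ("4", "a = a + 1", "Action"),    # increase it if not (the cycle)
    ("5", "a",         "Output"),
    ("6", "End",       "End"),
]:
    ET.SubElement(flowchart, "Element",
                  Id=id_, Caption=caption, Type=type_, Color="Default")

print(ET.tostring(flowchart, encoding="unicode"))
```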


4 Evolution forecasting model

4.1 General assumptions

The model proposed in this article aims at forecasting mobile malware evolution tendencies and thereby differs from other malware models, which concentrate on modelling the epidemiologic or economic consequences of malware outbreaks. Simulation environments serve many purposes, but they are only as good as their content [1]. While designing the model it is necessary to select the main factors out of many and to reject those that are not important or may cause result distortion. In the case of GA modeling the main task consists of three parts: appropriate selection of the chromosome structure, which represents the solution; definition of the fitness function; and the GA operating conditions, such as population size, mutation rates, parent selection, etc. The model proposed in this article is based on the model previously proposed in [11], with some modifications adapting it for mobile malware evolution forecasting. Although the proposed model is adapted to propagation strategy evolution forecasting, with some modifications (a change of the fitness function) it can be used for forecasting the evolution of other characteristics. Here we define the propagation strategy as a combination of methods and techniques used by malware to ensure malware population increase. In the current study, we have chosen to model strategies for a theoretical mobile virus which aims at infecting the largest number of mobile devices during a fixed, relatively short period of time.

4.2 Experiment conditions

The GA consists of initialization, selection and evolution stages. During the initialization stage the initial population of strategies is generated. Each strategy is represented as a chromosome. At the selection stage strategies are selected through a fitness-based process; if the termination condition is reached, algorithm execution ends, otherwise the evolutionary mechanisms are activated. The initial population is generated on a random basis, i.e. each individual representing a separate strategy is combined of random gene values. The population size N is equal to 50 and remains constant after each new generation. The algorithm stops producing new generations once the number of generations has reached 100. Fitness proportionate selection was used. The mutation operator is applied to each newly generated individual with a probability of 0.05. The MATLAB platform was used for the model implementation.
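Stated in code, the loop above looks roughly as follows. This is a generic sketch in Python rather than the authors' MATLAB implementation; the population size, generation limit, mutation rate and fitness-proportionate selection are taken from the text, while random_strategy, crossover, mutate and fitness are placeholders for the operators defined in Sections 4.3-4.4.

```python
# Generic GA skeleton matching the stated conditions (not the original
# MATLAB code): N = 50, 100 generations, fitness-proportionate selection,
# per-individual mutation probability 0.05.
import random

def evolve(random_strategy, fitness, crossover, mutate,
           n=50, generations=100, p_mut=0.05):
    population = [random_strategy() for _ in range(n)]
    for _ in range(generations):
        scores = [fitness(s) for s in population]
        offspring = []
        for _ in range(n):
            # fitness-proportionate (roulette wheel) parent selection
            mom, dad = random.choices(population, weights=scores, k=2)
            child = crossover(mom, dad)
            if random.random() < p_mut:
                child = mutate(child)
            offspring.append(child)
        population = offspring                 # population size stays constant
    return max(population, key=fitness)
```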

4.3 Strategy representation

Each strategy is represented as a chromosome (Table 1), which is composed of genes, i.e. a combination of techniques and methods. Genes are divided into AA (always active: compulsory or activating genes) and AE (active if enabled by an AA gene). Such a division ensures representation flexibility and a fixed chromosome length.


Table 1. Chromosome structure.

Gene number | Name | Type | Description / Comments | Value range or sample values
1 | TRANSF1 | AA* | Defines the 1st supported propagation type; enables NR | MMS
2 | TRANSF2 | AA | Defines the 2nd supported propagation type; enables NR | SMS
3 | TRANSF3 | AA | Defines the 3rd supported propagation type; enables BT | Bluetooth
4 | TRANSF4 | AA | Defines the 4th supported propagation type; enables EMAIL | e-mail
5 | TRANSF5 | AA | Defines the 5th supported propagation type; enables WIFI | Wi-Fi
6 | NR | AE | Telephone number search or generation module; effective if SMS or MMS transfer methods are used | Address book; Accepted/Dialed numbers; Random; …
7 | BT | AE | Scanner module that searches for mobile devices with Bluetooth support | Scan
8 | EMAIL | AE | E-mail sending module | Address book; e-mail address DB
9 | WIFI | AE | Scanner module that searches for mobile devices with WIFI support | Scan
10 | OS_PLATF | AA | OS platform affected by malware | Linux; WIN MOBILE; SYMBIAN; …
11 | TEL | AA | Telephone models affected by malware | NOKIA; SAMSUNG; Apple; RIM; …
12, 13, 14 | EN_EXPL_N | AA | EXPL_N (N=1-3) activation gene | ON=ExploitRef / OFF
15, 16, 17 | EXPL_N (N=1-3) | AE | Defines the exploit used for propagation | Random exploit out of a suitable exploit array
18 | NR_TIME | AA | Defines the NR gene's activity hours | Always; 10:00-20:00; 20:00-10:00
19 | BT_TIME | AA | Defines the BT gene's activity hours | Always; 10:00-20:00; 20:00-10:00
20 | WIFI_TIME | AA | Defines the WIFI gene's activity hours | Always; 10:00-20:00; 20:00-10:00
21 | EXEC | AA | Defines additional malware functionality; activates EXEC_CHAN | None; Manage; Update; Manage+Update
22 | EXEC_CHAN | AE | Defines the malware update channel | e-mail; WI-FI; web-update

4.4 Fitness function

Following [30], the efficiency of a propagation strategy can be evaluated by the value K: the number of computers the first malware individual in the wild can infect in a fixed time period. The higher K is, the higher the fitness of a propagation strategy. Our calculation of K by the fitness function (Eq. 1) is based on a combined statistical and empirical evaluation of the time expenditures of the strategy's functionality and a probabilistic evaluation of the strategy's efficiency. Probabilities and time consumption values for activation genes and for genes that are not enabled are equal to 0 and may be excluded from the calculations.

 (1 − (1 − p6 (NR _ TIME )) ⋅ (1 − p7 (BT _ TIME )) ⋅ (1 − p8 ) ⋅ (1 − p9 (WIFI _ TIME ))) ⋅ p10 ⋅ p11 ⋅    17   F (S ) = k ⋅   (1 − pi )  ⋅ 1 −       i =15 



(1)

where: S is the evaluated strategy; p_6–p_9 are the probabilities that the exploits will be successfully transferred to the target device (p_6, p_7 and p_9 are time dependent); p_10 is the probability that the target device runs a supported OS; p_11 is the probability that the device hardware is compatible; p_15–p_17 are the probabilities that an exploit will result in infection; k is the number of cycles that a virus using the evaluated strategy can perform in a one second time interval (Eq. 2):

$$k = \frac{1}{\sum_{j=1}^{22} t_j} \qquad (2)$$

where t_j is the time expenditure needed for the j-th gene's functionality. The fitness function can be read as: "The evaluated strategy S can perform k cycles per second. During each cycle the virus using this strategy will infect a target host in case at least one of the transfer methods successfully transfers the exploits to the target, the target runs the supported OS on the supported platform, and at least one of the exploits results in target infection." Compared to our previous model for Internet worms described in [11], the limitations on the probabilities' size were removed. The correctness of the proposed fitness function was tested on historical data, by applying it for the fitness evaluation of some malware samples with known, experimentally observed fitness.
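Read directly off Eqs. (1)-(2), the fitness evaluation is straightforward to implement. The sketch below is our own Python transcription (the paper's implementation is in MATLAB), with gene probabilities and per-gene time expenditures supplied as plain lists indexed by gene number.

```python
# Python transcription of Eqs. (1)-(2); p[i] and t[i] are the probability
# and time expenditure of gene i (index 0 is unused so genes are 1..22).
# Disabled genes and activation genes carry p = 0 and t = 0, and the time
# dependence of p[6], p[7], p[9] is assumed folded into their values here.

def fitness(p, t):
    k = 1.0 / sum(t[1:23])                       # Eq. (2): cycles per second
    transfer = 1.0 - ((1 - p[6]) * (1 - p[7])    # Eq. (1): at least one
                      * (1 - p[8]) * (1 - p[9])) # transfer method succeeds
    exploit_fail = 1.0
    for i in range(15, 18):                      # at least one exploit works
        exploit_fail *= (1 - p[i])
    return k * transfer * p[10] * p[11] * (1 - exploit_fail)
```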

4.5 Experiment results

The best fitness result achieved during the algorithm test was F(S_d) = 0.023. Compared to the fitness F(S_p) = 0.017 of a sample strategy of current mobile malware (transfer method: MMS only; OS platform: Symbian; telephone platform: NOKIA; activity hours: always; numbers used: address book; one exploit), the fitness of the predicted mobile virus increased almost 1.687 times. The fitness change of the best individual during evolution is shown in Fig. 1, and the average population fitness change in Fig. 2. It should be noted that the general population fitness also increases in time, and that the number of individuals with "better" strategies increases even though the evolution of the best individual stops after the 42nd generation.

Figure 1. Best strategy fitness change graph

Figure 2. Average population fitness change graph

Compared to the sample strategy, the following functionality (genes) was enabled in the best strategy during evolution: Windows Mobile support and Wi-Fi transfer method support. We can assume that these methods were included since they provide rather high infection efficiency (an additional popular OS, and Wi-Fi with relatively high network coverage). Other potentially efficient methods were not included since their added value to propagation efficiency was outweighed by time consumption; further methods do not result in infection at all (additional functionality) or even reduce the propagation rate (e.g. limitation by hours).

5 Conclusions

In this article a genetic algorithm modeling approach for mobile malware evolution forecasting was proposed. This is a new modeling approach for this malware type, since it forecasts mobile malware evolution trends, in contrast to traditional models that concentrate on modeling epidemic consequences. Model tests were performed for mobile malware propagation strategy forecasting. The proposed model includes the genetic algorithm description, the operating conditions, a chromosome that describes mobile malware characteristics, and the fitness function for evaluating propagation strategy evolution. The model was implemented and tested on the MATLAB platform. The model test results have shown that, in case malware creators intend to optimize the propagation strategy, mobile malware evolution will tend towards the inclusion of an additional OS platform and propagation over Wi-Fi networks. The forecasted propagation strategy tends not to be overloaded with functionality, due to the increase in time consumption this would cause. The main application area of the model is countermeasure planning, since the model predicts propagation strategy trends. The current study shows that special attention should be paid to wireless security on mobile devices. The model can also be used as a framework (a fitness function modification would be needed) for modeling the evolution of other mobile malware parameters, such as stealth, functionality or their complexes.

References

[1] Banks S.B., Stytz M.R. Challenges of Modeling BotNets for Military and Security. Proceedings of SimTecT 2008. 2008.
[2] Barford P., Yegneswaran V. An Inside Look at Botnets. Advances in Information Security, Springer US. 2007, volume 27, 171-191.
[3] Birchenhall C., Kastrinos N., Metcalfe S. Genetic algorithms in evolutionary modeling. Journal of Evolutionary Economics. 1997, volume 7, 375-393.
[4] Bulygin Y. Epidemics of Mobile Worms. Performance, Computing, and Communications Conference (IPCCC 2007), IEEE International. 2007, 475-478.
[5] Chen Z., Gao L., Kwiat K. Modeling the Spread of Active Worms. Proceedings of INFOCOM 2003, Twenty-Second Annual Joint Conference of the IEEE Computer and Communications Societies. 2003, volume 3, 1890-1900.
[6] Defense Acquisition University. Systems Engineering Fundamentals: January 2001. Defense Acquisition University Press. 2001.
[7] Faraoun K.M., Boukelif A. Genetic Programming Approach for Multi-Category Pattern Classification Applied to Network Intrusions Detection. International Journal of Computational Intelligence. 2007, volume 3(1), 79-90.
[8] F-Secure. Worm:SymbOS/Commwarrior. F-Secure Corporation, Interactive: http://www.f-secure.com/ 2006.
[9] Fultz N. Distributed attacks as security games. Master thesis, UC Berkeley School of Information. 2008.
[10] Garetto M., Gong W., Towsley D. Modeling Malware Spreading Dynamics. Proceedings of INFOCOM. 2003.
[11] Goranin N., Cenys A. Genetic Algorithm Based Internet Worm Propagation Strategy Modeling. Information Technology and Control. 2008, volume 37, 133-140.
[12] Goranin N., Cenys A. Genetic algorithm based Internet worm propagation strategy modeling under pressure of countermeasures. Journal of Engineering Science and Technology Review. 2009, volume 2, 43-47.
[13] Goranin N., Cenys A. Malware Propagation Modeling by the Means of Genetic Algorithms. Electronics and Electrical Engineering. 2008, volume 86, 23-26.
[14] Hill R.R., McIntyre G.A., Narayanan S. Genetic Algorithms for Model Optimization. Proceedings of the Simulation Technology and Training Conference (SimTechT). 2001.
[15] Holland J. Adaptation in Natural and Artificial Systems. The MIT Press. 1975.
[16] Jarno U. Disinfection tool for SymbOS/Locknut.A (Gavno.A and Gavno.B). F-Secure Corporation, Interactive: http://www.f-secure.com/ 2005.
[17] Kaspersky Lab. Kaspersky Lab reports. Interactive: http://www.kaspersky.com 2009.
[18] Kephart J.O., White S.R. Directed-graph epidemiological models of computer viruses. Proceedings of the IEEE Computer Society Symposium. 1991, 343-359.
[19] Lelarge M. Economics of Malware: Epidemic Risks Model, Network Externalities and Incentives. Proceedings of the Fifth Biannual Conference on the Economics of the Software and Internet Industries. 2009.
[20] Li Z., Liao Q., Striegel A. Botnet Economics: Uncertainty Matters. Managing Information Risk and the Economics of Security, Springer US. 2009, 1-23.
[21] Monga R. MASFMMS: Multi Agent Systems Framework for Malware Modeling and Simulation. Lecture Notes in Computer Science, Springer Berlin/Heidelberg. 2009, volume 5269/2009, 97-109.
[22] Naraine R. Cell Phone Security: New Skulls Mutant Comes with Virus Extras. Interactive: http://www.eweek.com/ 2004.
[23] Nazario J. Defense and Detection Strategies against Internet Worms. Artech House Publishers. 2003.
[24] Niemela J. F-Secure Virus Descriptions: Skulls.D. F-Secure Corporation, Interactive: http://www.f-secure.com 2005.
[25] Noreen S., Murtaza S., Shafiq M.Z., Farooq M. Evolvable malware. GECCO '09: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, ACM. 2009, 1569-1576.
[26] Ramachandran K., Sikdar B. Modeling malware propagation in Gnutella type peer-to-peer networks. Proceedings of the Parallel and Distributed Processing Symposium (IPDPS). 2006, volume 20, 8 pp.
[27] Ruitenbeek E.V., Courtney T., Sanders W.H., Stevens F. Quantifying the Effectiveness of Mobile Phone Virus Response Mechanisms. IEEE/IFIP International Conference on Dependable Systems and Networks. 2007, 790-800.
[28] Serazzi G., Zanero S. Computer Virus Propagation Models. Lecture Notes in Computer Science, Springer-Verlag. 2004, 26-50.
[29] Shah A. IDC: 1 Billion Mobile Devices Will Go Online by 2013. IDG News Service, Interactive: http://www.pcworld.com/ 2009.
[30] Staniford S., Paxson V., Weaver N. How to 0wn the Internet in Your Spare Time. Proceedings of the 11th USENIX Security Symposium, USENIX Association. 2002, 149-167.
[31] Stender J., Hillebrand E., Kingdon J. Genetic Algorithms in Optimization, Simulation and Modeling. IOS Press. 1994.
[32] Sundgot J. First Symbian OS virus to replicate over MMS appears. Interactive: http://www.infosyncworld.com/ 2005.
[33] Turner D. Symantec Global Internet Security Threat Report. Symantec Corporation. 2008.
[34] Zou C.C., Gong W., Towsley D. Code Red Worm Propagation Modeling and Analysis. CCS '02: Proceedings of the 9th ACM Conference on Computer and Communications Security, ACM. 2002, 138-147.
[35] Zou C.C., Gong W., Towsley D. On the performance of Internet worm scanning strategies. Performance Evaluation, Elsevier Science Publishers B.V. 2005, volume 63, 700-723.
[36] Zou C.C., Gong W., Towsley D. Worm Propagation Modeling and Analysis under Dynamic Quarantine Defense. WORM '03: Proceedings of the 2003 ACM Workshop on Rapid Malcode, ACM. 2003, 51-60.
[37] Zou C.C., Towsley D., Gong W. Email Virus Propagation Modeling and Analysis. Technical report TR-CSE-03-04, University of Massachusetts. 2004.

BRINGING MODELS INTO PRACTICE: DESIGN AND USAGE OF UML PROFILES AND OCL QUERIES IN A SHOWCASE

Joanna Chimiak-Opoka 1, Berthold Agreiter 1,2, Ruth Breu 1

1 University of Innsbruck, Institute of Computer Science, ICT Building, Technikerstrasse 21a, 6020 Innsbruck, Austria, [email protected], [email protected], [email protected]
2 arctis Softwaretechnologie GmbH, Jaegerweg 2, A-6401 Inzing, Austria

Abstract. The introduction of systematic modelling practices in an enterprise is a demanding task. Mainly, the challenges are related to ensuring a sustainable modelling culture, especially in smaller IT departments. In this paper, we analyse experiences from a modelling project in an industrial setting. The major goal was to improve the documentation quality of the existing, widely informal process model and to establish a commonly accepted modelling culture. During the project, a UML profile was iteratively developed and applied to a model. Furthermore, OCL was used for automated quality assessment by model querying. The major benefits observed by the industry partner were improved knowledge sharing among the project participants, supported by an intuitive modelling notation and automatic information retrieval from the model. Moreover, we describe our adaptations of the applied methodologies and the quality improvements achieved in the project.

Keywords: experience story, UML profile, domain specific modelling (DSML), model querying with OCL, model quality, DSL

1 Introduction

This paper presents a field report of modelling the internal business processes and data flows in a company. The objective of the project was to document a software-assisted business process of a major retail store chain and, further, to investigate the quality of the process and of its model in an automated way. Additionally, we wanted to investigate the applicability of the applied methods and tools in an industrial context. The primary goal of the project was to document the business process and information flow in a clearly defined and unambiguous way, with readily available tools and within a short time, to make this knowledge accessible to all developers within the company. A subsequent goal was to improve the understanding of complex processes by engineers and to be able to spot inconsistencies or weaknesses. To meet these requirements we decided to use an existing UML tool and to make use of the UML profiling mechanism. As a consequence of this solution, costs were kept low, as no implementation work was required, and the domain specific modelling language (defined as a UML profile) could be designed using a refinement approach instead of being designed from scratch. Additionally, the approach enabled easy adaptation of the visual representation of model elements, so that models are easier to understand for domain experts through the use of intuitive element shapes. Another aspect we explored in this project was automated quality assessment by querying the model using the Object Constraint Language (OCL). We systematically analysed the quality improvements achieved by the modelling and querying methodology we introduced. Querying is especially interesting because it can be used to assist the modelling process so that the created model maintains high quality. This contribution describes the development process of the UML profile and its application to seven use cases. Both the development of the UML profile and of the model were conducted iteratively and in close cooperation with domain experts. Furthermore, an analysis framework consisting of a number of OCL queries was developed to assess the quality of the model. The objectives are (1) to describe an iterative, expert-supported, agile adaptation of existing methods to develop UML profiles, (2) to share our experience of using this method in an industrial context, (3) to present a practical method for model analysis, and (4) to share our experience of using it to ensure technical correctness, adherence to conventions, and quality assessment of a model. To increase the clarity of the detailed project description in the following, we provide definitions of the most important concepts in Table 1. The structure of the paper is as follows: Section 2 describes the setting and motivates systematic modelling and model querying. Section 3 describes the used methods, frameworks and tools and motivates their usage in the context of the project. Next, in Section 4, we demonstrate the development process of the model and queries within the project and describe the obtained quality improvements. Finally, Section 5 concludes and gives an outlook on future work.

2 Project Context

This project was initiated by a major retail store chain with over 4000 employees and 150 stores. The company is developing large parts of its inventory and warehouse management software on its own, while integrating third party solutions for some specific areas. Its IT landscape is constantly evolving, hence the organisation wanted to improve the documentation of the whole system and the included processes. They were aware of the fact that a common modelling language, which can be unambiguously interpreted and used by all its developers, would further improve the usefulness of the documentation.

Status quo ante. As certain parts of the system landscape have evolved significantly over the years, and their development was partitioned among several groups, it became constantly harder to overlook the whole system. Hence, deciding on the best strategy for adhering to new requirements became more challenging over time. Mainly because of the following two points, the company started capturing their business processes in models:

Table 1. Definitions of basic concepts related to modelling and model analysis.

CONCEPT | DEFINITION
metamodel | a model that defines the language for expressing a model [12]. In our context, the defined language is a Domain Specific Language (DSL).
model | a simplification of something so we can view, manipulate, and reason about it, and so help us understand the complexity inherent in the subject under study [10]. A model is a description written in a well-defined language (metamodel) [7].
diagram | a graphical presentation of a collection of model elements, most often rendered as a connected graph of arcs (relationships) and vertices (other model elements) [12].
quality model | a framework defining and relating relevant quality aspects.
quality assurance | a process for establishing stakeholder confidence that a model fulfills certain expectations.
quality assessment | a general term that embraces all methods used to judge the quality. The judgement is based on a quality model.
model query | a means to retrieve information from a model to reason about the subject under study or to assess model quality.

1.

Overview on all components: when developers had to modify a part of the system which could possibly affect subsystems developed by other people, they needed to crosscheck whether their modification would have any unintentional effects on these subsystems. 2. Finding suitable interfaces: when a new third party solution was about to be introduced in the company, it could be cumbersome to find out to which existing components this solution needs interfaces and how this data can be provided. Consequently, some first diagrams capturing information about interfaces among different applications were created by the company. The purpose of these diagrams was mainly documentation. As a result, the communication among developers should be organised in an efficient way without having to consult the source code of other components. Some developers started modeling which data is used by certain applications and how this data is passed on to other applications. Essentially, these diagrams were created with drawing tools and showed the flow of information in different processes, e.g. which applications are involved in a process, from which source data is read and where is it written to. After initiating this activity, the industry partner identified the following problems: • The semantics of different model elements was not clearly defined. Some developers interpreted and modelled certain facts different than others. Sometimes developers felt that it was not possible to express a specific process with existing elements and introduced new shapes/model elements. • In most cases the diagrams were created as the state–of–the art from the most current process. However, it could happen that the implementation of a process changed, but the corresponding model was left unchanged. Consequently, the model and the actual code implementing the process were not synchronised. Hence, on the one side a modelling language that is intuitive to every domain expert should be found. On the other side, the model created this way needs to provide a way of checking certain properties. Such checks can be either on the modelled process, like “To which databases is application A connected?”, but also about the quality of the model itself, like “Are there any unnamed applications in the model?”. Thus, the organisation contacted arctis Softwaretechnologie GmbH. Arctis was in charge of developing an appropriate modelling technique in this context and responsible for modelling the initial use cases together with staff from the IT department of the enterprise. Use Cases Important and large examples were selected by the industry partner with the expectation of covering as many different constructs as possible. The selected use cases were modelled in comparable level of detail. It was desired that the use cases are complex enough so that also submodels, i.e. a subprocess of the use case in a higher level of detail, need to be created. Furthermore, the industry partner selected use cases exhibiting - 266 -

interactions with each other. This means, for instance, that an application used in one use case produces data which is sent to applications of a different use case. The main purposes of modelling were documentation, improved communication among developers, and a centralised point of information. Every developer should be able to view and edit the model if changes become necessary. The processes should be described in such a way that, among other information, all involved software applications, the locations where they run, the types and protocols with which they communicate with each other, and the databases they use are displayed.

To get a better understanding of what these examples look like, we briefly describe the Customer Order case. As the name suggests, this process covers orders by customers. In this case the term customer does not identify a person buying something in a shop but a subsidiary store ordering goods at the headquarters. This process involves different stakeholders: employees in a store and major customers. These stakeholders start the process by different means of ordering goods, like fax or other ways of invoking a software call at the server in the headquarters. The server runs several applications to process these orders and to read or write data in the corresponding database tables. In the case of the Customer Order process, there are several communication channels to other use cases. For example, after the order data is written to a database, it is fetched by a different process, which represents the Order Processing use case.

Quality Assessment

The objective of this project was not only to document software-assisted business processes but also to analyse different quality aspects of the model in an automated way. We distinguish between two different aspects of quality assessment of the model. One aspect targets the domain-specific quality, the other targets the linguistic quality of the model. Domain-specific aspects perform information retrieval to increase understanding of the modelled process and thus provide input for process improvements. Linguistic aspects consider the quality of the model as an artefact. These are mostly related to technical issues, such as the internal model representation by the modelling tool, or issues related to the modelling process, like covering user-defined quality aspects, i.e. completeness, consistency, and adherence to conventions. Both quality aspects need to be assessed so that the model does not contain unused elements or even contradicting information. For example, to assure that all communications between different use cases are modelled correctly, the conditions shown in Figure 1 should be satisfied. The technical realisation of this check is described in Sections 3.2 and 4.2. The selection of methods and tools for project realisation is described in the following section.

completeness of the definition of the inter-use-case communication triples:
  each incoming communication for a use case must have
    - a corresponding outgoing communication in a different use case and
    - a corresponding communication in the global view

Figure 1. Informal description of a quality aspect checking communications between use cases.
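As a first impression of how such checks are later realised as model queries (Sections 3.2 and 4.2), consider the linguistic question raised above, "Are there any unnamed applications in the model?". The following is a minimal OCL sketch, not the query used in the project; it assumes that applications are modelled as actions carrying an «Application» stereotype, and the operation getAppliedStereotypes() follows common UML-tool conventions rather than pure standard OCL:

    -- helper: does this action represent an application?
    -- (the stereotype name 'Application' is an illustrative assumption)
    context Action
    def: isApplication : Boolean =
      self.getAppliedStereotypes()->exists(s | s.name = 'Application')

    -- linguistic quality query: all applications without a proper name
    context Action
    def: unnamedApplications : Set(Action) =
      Action.allInstances()->select(a |
        a.isApplication and (a.name.oclIsUndefined() or a.name = ''))

An empty result set indicates that this particular quality aspect is satisfied.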

3 Concepts, Methods and Tools

In this section, we focus on the methods and tools applied to metamodel and model development (Section 3.1), the development of model queries (Section 3.2), and the quality model used to discuss quality improvements (Section 3.3).

3.1 Model Development

The requirement from the industry partner was to model the interplay of different applications and software components as a business process in a graphical and comprehensible manner. Unnecessary complexity should be avoided, and model understandability was a key goal. Our industry partner already had numerous diagrams and a clear picture of what to capture. However, not all of the existing diagrams were created using the same technique, as it was still evolving. Nevertheless, the existing diagrams helped us considerably in choosing the modelling language.

Different process-modelling languages were considered, among others the Business Process Modelling Notation (BPMN) [16] and Event-driven Process Chains (EPC) [11]. They offer very usable concepts, but their representation of data and datastores was not sophisticated enough. The industry partner wanted a clean separation between different types of communication and a structured representation of datastores and data. Furthermore, these diagrams were not customisable enough in the sense that custom shapes could be assigned to model elements. After reviewing the existing diagrams and taking the aforementioned needs into account, the choice fell on UML activity diagrams. They can be used for process modelling as they are similar to BPMN [15] and EPC [6]. However, UML activity diagrams by themselves were still lacking domain specificity and offered a modelling spectrum which was too wide and in which the semantics of model elements was not immediately visually recognisable. We decided to use the UML profiling mechanism to adapt activity diagrams. A standard UML profile for modelling business processes has been proposed [13]; however, for our project it was too generic, too large, and would have needed further adaptation to be recognised as an appropriate DSL. As a lightweight, easy-to-learn, and intuitive language was required by our project partner, we created our own UML profile tailored to the project requirements.

We used the commercial UML modelling tool MagicDraw (http://www.magicdraw.com/), which fulfilled all requirements for the modelling tool. To our partner it was important that an existing, user-friendly, and stable tool is used, to avoid development and maintenance costs and to have a preferably flat learning curve. An additional important point covered by MagicDraw was the possibility to highly customise diagrams, e.g. by adapting shapes and icons for different model elements or by using different line styles for data flows and control flows. To be able to analyse the model, it was important to have a tool that strictly adheres to existing standards such as UML and XMI. This allowed us to import the model into our analysis tool.

Figure 2. The development process of the DSL and the model.

For the development of the DSL we defined an iterative development method based on existing methods [1, 14]. We followed the approach described in [1] to design a DSL: identify fundamental language constructs, relationships, constraints, concrete syntax, and semantics. We combined this approach with a pragmatic method proposed in [14] and supported by MagicDraw, where DSL samples are created and the DSL environment is tested. Moreover, we defined an iterative process with alternating interviews with domain experts and modelling steps. The development process (Figure 2) consists of an initial step (upper swimlane) and several macro iterations (lower swimlane). In every phase, domain and modelling experts cooperate. In the initial step, domain concepts and relations are identified and documented after an initial interview with domain experts. Based on this information, an initial version of the UML profile and a sample diagram are created by the model designers. Those artefacts are the basis of discussion between domain experts and model designers on the expressiveness and understandability of the proposed DSL. The feedback from the domain experts is included in the next step, i.e. the modelling of the first use case. Each use case is modelled in one macro iteration, in which domain and modelling experts work together to assure high appropriateness of the model. First, domain experts prepare an informal description of the use case. Next, the model designers start micro iterations to model the use case and to update the UML profile and customisations if necessary. Afterwards, domain experts evaluate and refine the model and profile, i.e. the accuracy of the model, its understandability, and the intuitiveness of its representation. The feedback is integrated in the current and following use cases.

3.2 Model Queries

As indicated at the beginning of this section, queries can be used to reason about the subject under study or to assess model quality. Thus, we consider domain-specific and linguistic aspects (compare Section 2). As we decided to use UML, the first-choice query language was OCL. It was formally shown that OCL 2.0 is expressive enough to be used as a query language [2]. Another reason to select this language was our positive experience from previous projects [5, 4]. For developing and managing the collection of queries, we decided to use our library extension to OCL [3] and the OCLEditor developed in our research group (see http://squam.info/ocleditor/ for further information on the tool, the underlying theory, examples, and documentation). The library extension to OCL enables collecting and managing OCL expressions and models. Within libraries, standard OCL definitions and our additional extensions for queries and tests are collected. Queries are expressions used to assess model quality or to retrieve specific information from a model. To increase the semantic correctness of the expressions we use tests; the mechanism is similar to unit testing, including the definition of test cases and test data. For the purpose of our project we defined a model analysis and library development process (see Figure 3). The upper swimlane corresponds to the manual model analysis, the lower swimlane to the library development process. First, a common requirement for model analysis and library development is specified, i.e. a quality aspect is selected, e.g. the definition of the inter-use-case communication triples (Figure 1).


For this aspect, OCL definitions and queries are specified in the development step. The next step is quality assessment, where the results of manual and automatic analysis are crosschecked. For the selected aspect, manual inspection is used to determine the result for the model; simultaneously, the appropriate queries are evaluated on the model. If the results of the model inspection and the query evaluation differ, the reason has to be determined, and either the OCL definition specification or the manual inspection needs to be repeated.

Figure 3. The model analysis and library development process.

The manual inspection of the model is conducted until the correctness of a query reaches a defined confidence level; afterwards the query can be used for automatic model analysis. If the results are equal, the last step can be executed, i.e. quality assurance. The aim of this step is to assure the semantic correctness of the OCL expressions during the future development of the library. For this purpose, tests are specified and evaluated regularly. In the test evaluation step, tests and test models are required to assess the desired semantics of definitions, just as test cases and test data are required to assess the semantic correctness of software. As a consequence, the model is used as a test model and thus must be frozen.
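To illustrate the three kinds of artefacts collected in a library (definitions, queries, and tests), the following plain-OCL sketch may help. It does not reproduce the concrete OCLLib/OCLUnit syntax of [3], and the modelling assumption that produced data appears as output pins of actions is illustrative only:

    -- definition: reusable, names the data an action produces
    context Action
    def: outgoingData : Set(OutputPin) = self.output->asSet()

    -- query: uses the definition to retrieve all data-producing actions
    context Activity
    def: actionsProducingData : Set(Action) =
      self.node->select(n | n.oclIsKindOf(Action))
               ->collect(n | n.oclAsType(Action))
               ->select(a | a.outgoingData->notEmpty())->asSet()

    -- test: evaluated against the frozen test model; here we expect that
    -- every data-producing action in the test model carries a name
    Activity.allInstances().actionsProducingData
      ->forAll(a | not a.name.oclIsUndefined() and a.name <> '')

The test is re-evaluated on the frozen model whenever the library evolves, in the same way unit tests guard the semantics of software.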

3.3 Quality Model

As the size of our case study was too small to obtain statistically significant data, we decided to select a qualitative model to request and structure feedback from our project partner. We selected the SEmiotic QUALity (SEQUAL) framework [9, 8], which provides a holistic view on model quality with seven quality dimensions in two layers (Figure 4). In the technical layer, three quality dimensions are considered, i.e. physical, empirical, and syntactical quality; in the social layer, the four remaining dimensions are considered, i.e. semantic, pragmatic, social, and organisational quality. Below we give brief definitions of the seven quality dimensions.

The physical quality dimension relates the externalised model and participant knowledge. The externalised model is the set of all explicit or implicit statements in the model; in the case of this project there are informal and formal descriptions of the use cases. The participants are the people involved in the development or usage of the model. Two aspects are considered in this quality dimension: what is modelled, and how it is protected and shared. The first aspect, externalisation, is represented as the ratio of known statements represented in the externalised model to all known statements about the domain. The second aspect, internalisation, includes model persistence and availability.

The empirical quality dimension considers the readability of a model, including its complexity and aesthetics.

The syntactical quality dimension relates model externalisation and language extension. The language extension is the set of all statements that can be expressed in the language; in the presented case the language is the DSL. This dimension considers syntactical correctness with respect to the language.

The semantical quality dimension relates model externalisation and the modelling domain. It covers two aspects: validity, which assures that all statements made in the model are regarded as correct and relevant to the domain, and completeness, which assures that the model actually contains all correct and relevant statements about the domain.

The pragmatic quality dimension relates model externalisation and its interpretation by technical and social actors. It requires that the model has been understood by the targeted participants.

The social quality dimension considers the degree of agreement among participants. Each participant has subjective knowledge about the domain and a different mental view (model) of the domain; thus, by reading an externalised model, each participant may interpret it differently. This quality dimension considers the agreement with respect to different objects: knowledge, model, and interpretation. Two degrees of agreement are considered: relative agreement, where the various objects are consistent but may be incomplete, and absolute agreement, where all objects are the same (equal).

The organisational quality dimension analyses whether the model fulfils the goals of modelling in the first place.

Figure 4. An overview of the SEQUAL framework [9, 8].

4 Development and Analysis of Project Results

To give better insight into how the development of the various artefacts was conducted, we show some statistics and give examples. The size of the UML profile and of the model during the different iterations of the project is discussed first (Section 4.1). After that, we show how the size of the OCL libraries developed over time and give an example of an OCL expression (Section 4.2). Finally, we discuss the quality improvements discovered (Section 4.3).

4.1 UML Profile and Models

As mentioned earlier, development was conducted iteratively, and in each iteration the UML profile evolved. The changes to stereotypes and tagged values during the macro and micro iterations are illustrated on the left side of Figure 5. After creating the initial UML profile (0.0 in Figure 5), we started modelling the first use case. During this expert interview only the name of one single stereotype changed. However, after the domain experts reviewed the model, it was discovered that a relatively large number of stereotypes and tagged values was missing; they were added in the next step. As expected, the first use case needed the largest number of micro iterations (1.0–1.2). During the second macro iteration another set of model elements was added, some were renamed, and one stereotype was considered obsolete (2.0). During the third macro iteration only one stereotype was introduced, and after review the UML profile remained unchanged (3.0–3.1). For the remaining macro iterations (4.0–7.1) no changes were required. The resulting DSL contains 31 stereotypes, tagged values, and enumerations.

Statistics about the size of the model are illustrated in the right part of Figure 5. The figure visualises the strong connection between Classes and CentralBufferNodes, especially in the first phase, as both were used to represent databases: class diagrams were used for modelling the different tables, whereas CentralBufferNodes represent these tables on activity diagrams. The look and feel of the diagrams was customised at the request of the industry partner. This customisation was considered very important because, on the one hand, different elements can be distinguished much more easily and, on the other hand, appropriate icons for elements considerably increase the readability and understandability of complex diagrams (see Figure 6). A simple variation of colours for different stereotypes was not sufficient, because it does not render a model more intuitive to understand and because this information would largely be lost when diagrams are printed or photocopied.

Figure 5. Statistics of changes to UML profile elements: stereotypes and tagged values (left) and to the model (right). Iteration numbers are denoted with the number of the macro iteration followed by a dot and the number of the micro iteration. Iteration 0.0 represents the initial step of the UML profile development (see Figure 2).


Figure 6. Selected icons for various stereotypes with intuitive and easy-to-distinguish symbols. Note that symbols for activities executed by internal or external users only slightly differ but the difference is easy to see.

4.2 OCL Queries

The defined OCL expressions were evaluated in the quality assessment and quality assurance phases (Figure 3). Therefore, the role of the model was twofold: it served as an object of model analysis and as test data for query development. Figure 8 shows a query implementing the check for the correct modelling of communications (Figure 1). For every incoming communication object, it looks up whether there is a corresponding outgoing communication object in a different process. It furthermore searches for the corresponding communication object in the global diagram; the global diagram should contain all modelled processes and the communications between them. The right side of Figure 8 shows a result of this query: two tuples are returned, and for both of them all three elements (incomingCommunication, outgoingCommunication, and globalCommunication) are available. This means that these two communication objects are correct.

For the project we defined 47 queries and, in total, 134 OCL expressions of different types and with different scopes (Figure 7). On average, each definition was used 1.21 times in queries (a definition usage of 121%), i.e. some definitions were used more than once. Similarly, the average test coverage was 123%, i.e. for some definitions we had more than one test. Moreover, almost half (46%) of the expressions may be reused in other projects, as they are not project specific and can also be applied to general-purpose UML.
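Since the concrete query of Figure 8 is only shown as a figure, the following simplified OCL sketch conveys its idea. It is not the project's query: the assumptions that communications are «Communication»-stereotyped object nodes, that corresponding communications are matched by name, and that the global view is an activity named 'Global' are all illustrative:

    -- helpers (assumed): communications owned by an activity, split by direction
    context Activity
    def: communications : Set(ObjectNode) =
      self.node->select(n | n.oclIsKindOf(ObjectNode))
               ->collect(n | n.oclAsType(ObjectNode))
               ->select(o | o.getAppliedStereotypes()
                             ->exists(s | s.name = 'Communication'))->asSet()
    context Activity
    def: incomingCommunications : Set(ObjectNode) =
      self.communications->select(o | o.incoming->isEmpty())  -- produced elsewhere
    context Activity
    def: outgoingCommunications : Set(ObjectNode) =
      self.communications->select(o | o.outgoing->isEmpty())  -- consumed elsewhere

    -- the triples of Figure 1: for each incoming communication, the matching
    -- outgoing communication in another process and the matching global element
    context Activity
    def: communicationTriples : Bag(Tuple(incoming : ObjectNode,
                                          outgoing : ObjectNode,
                                          global : ObjectNode)) =
      self.incomingCommunications->collect(i | Tuple {
        incoming = i,
        outgoing = Activity.allInstances()->excluding(self)
                     ->collect(a | a.outgoingCommunications)->flatten()
                     ->any(o | o.name = i.name),
        global   = Activity.allInstances()->any(a | a.name = 'Global')
                     .communications->any(g | g.name = i.name) })

A tuple with an undefined outgoing or global element then points to an incompletely modelled communication.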

Figure 7. The statistics for OCL library development: diversity of element types (definition, queries, tests) over the libraries (UML, UML activity diagram, and UML profile specific).

Figure 8. Example artefacts for model queries: a definition retrieving which data is transferred among different processes (left) and a possible query evaluation result for a query using the definition (right).

4.3 Quality Improvements

Below we discuss the quality improvements according to the SEQUAL framework introduced in Section 3.3. For each quality dimension we explain the status before (pre) and after (post) modelling, as well as the means used to improve this dimension.

The physical quality (completeness ratio, persistence, and availability). Pre: Weak externalisation and internalisation. Not all known statements were expressed in the diagrams, i.e. they contained only a partial description of the business processes and information flows, because they did not always reflect the current status of a system. Moreover, the diagrams were only partially available, and there was no organisation-wide modelling environment. Post: Improved externalisation and internalisation. During each macro iteration a subsequent use case was modelled, and all known and relevant statements were externalised in the model. Moreover, the model was stored electronically and available to all participants. Additionally, internalisation increased, especially the availability of the information stored in the model by means of model queries. Queries allow access to different views on the model from the domain perspective, e.g. creating a list of all applications producing documents; a sketch of such a query is given below. Means: Systematic modelling with a UML profile in MagicDraw and model querying with OCL.
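The mentioned list of all applications producing documents could, for instance, be retrieved with a query along the following lines. This is again an illustrative sketch, assuming the «Application» stereotype from the sketch in Section 2 and a hypothetical «Document» stereotype on output pins:

    -- does this application produce at least one document?
    context Action
    def: producesDocument : Boolean =
      self.output->exists(p | p.getAppliedStereotypes()
                               ->exists(s | s.name = 'Document'))

    -- domain-specific query: all document-producing applications
    Action.allInstances()->select(a | a.isApplication and a.producesDocument)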


The empirical quality (readability, complexity, aesthetics). Pre: Views that were complex to read and navigate. Diagrams were linked to each other via identifiers; the hierarchy and the links among diagrams were difficult to identify. Post: An easier to read and navigate model. The nesting feature of activity diagrams, which enables browsing through the model and navigating along its hierarchy, and the customisation of the shapes used in the diagrams improved the readability of the diagrams. Means: A modelling tool supporting nesting and customisation of diagrams.

The syntactical quality (conformance to the DSL). Pre: Not relevant. There was no uniform and formally defined modelling language, so this dimension cannot be evaluated for the previous state. Post: A syntactically correct model, whose syntactical correctness was assured by the modelling tool and additionally checked for secondary flaws with model queries. Queries detected unnamed elements left behind after the interrupted creation of a model element (as a consequence, undesired behaviour of the modelling tool was detected); this can be seen as the detection of technical flaws. Moreover, an unused sample diagram was detected, which had been created in the initial step for demonstration purposes and was not needed anymore; this can be interpreted as the detection of modelling flaws. Additionally, some unused elements were detected: they were present in the model but not used in any diagram and indeed not required. This can be seen as the detection of a modelling flaw or of too weak technical support. Means: The modelling tool ensured the syntactical correctness of activity diagrams. Moreover, the customised diagram type was designed to permit only the usage of allowed elements (which can be interpreted as an error-prevention technique). For double checking, model queries were used.

The semantical quality (validity, completeness). Pre: An unsatisfactory level of completeness and validity. The status of the diagrams was often not the same as the status of the actual system, because it was hard to identify and update the appropriate diagrams when the implementation changed. Post: Feasible completeness and validity. The model currently reflects the most up-to-date status, and developers can access the model via a central repository and thus maintain it regularly. Means: Interviews and instant feedback from domain experts to achieve a high level of perceived, feasible completeness and validity, and modifications of the model to include the obtained feedback.

The pragmatic quality (understandability). Pre: Possible misunderstandings. As current diagrams were not available in all cases, misunderstandings could happen because of different interpretations of the diagrams. Moreover, no written and uniform guideline was available, and thus different symbols were used by model developers; this was a cause of additional misunderstandings and knowledge exchange problems. Post: Unambiguous interpretations. The UML profile and the modelling conventions assure uniform modelling and representation of concepts. This allows for an unambiguous interpretation of the model by the tools and all human actors. Means: Discussions with domain experts to increase individual comprehension, including the understanding of the UML profile. Additionally, a manual on the UML profile was provided as a means for educating new employees.

The social quality (agreement on knowledge, model, and interpretation). Pre: Unsatisfactory knowledge agreement. The knowledge of the participants was in some cases neither consistent nor complete regarding details of the processes, information flows, and supporting infrastructure. Post: Increased knowledge agreement. The knowledge of the participants was equal regarding the modelled use cases; they agreed upon all details of the processes and the supporting infrastructure. It is worth mentioning that the development of the OCL queries increased the OCL developers' linguistic knowledge (of the UML metamodel), especially with respect to the hierarchical dependencies of model elements and their methods and attributes. Means: Interviews and discussions with domain experts to reach feasible agreement, including the comparison of views to find correspondences and conflicts.

The organisational quality (fulfillment of goals). Pre: Not satisfactory. This was the state recognised by our project partner, who identified the following goals of modelling: (1) intuitive, unambiguously interpretable diagrams that can be used by all participants, improve the value of the documentation and the communication between developers, and are easier to update; (2) the use cases modelled in the project should provide enough information for developers to start model development on their own. Post: Fulfillment of all goals. (1) As can be seen from the descriptions of the previous dimensions, the first goal was satisfied. (2) At the end of the project our partner confirmed satisfaction regarding the second goal and continues modelling successive use cases to complete the documentation. Means: All means used in the remaining quality dimensions.

5 Conclusion and Future Work

This contribution presented our adaptation of existing methods for the development of a UML profile and our method for model analysis. Additionally, we presented our experience in using these methods for modelling and analysing the business processes of a major retail store chain. After analysing the target of modelling, we started to develop a UML profile tailored towards the requirements. The profile was developed and simultaneously applied to the set of representative use cases. In the iterative, expert-supported, agile modelling process we cooperated closely with domain experts, which allowed us to get immediate feedback. The advantages of having a systematic approach and company-wide guidelines for modelling, as well as the benefits of highly customised diagrams, have been discussed. Significant improvements in the quality of the business process and infrastructure

description have been achieved. A collection of OCL queries for further quality improvement has been developed and applied to the model. The automated model quality checks will be used for monitoring the quality of the model as it is extended with subsequent use cases modelled by our partner. In this way, we expect to detect inconsistencies and deviations from conventions sooner and thus to improve the overall quality of the existing model. Future activities will also comprise a more comprehensive use of queries to generate reports and statistics on the model.

6 Acknowledgement

The research herein was partially conducted within the competence network Softnet Austria (www.softnet.at) and funded by the Austrian Federal Ministry of Economics (bm:wa), the province of Styria, the Steirische Wirtschaftsfoerderungsgesellschaft mbH. (SFG), and the city of Vienna through the Center for Innovation and Technology (ZIT).

References

[1] Selic, B. A systematic approach to domain-specific language design using UML. In: ISORC '07: Proceedings of the 10th IEEE International Symposium on Object and Component-Oriented Real-Time Distributed Computing, Washington, DC, USA. IEEE Computer Society (2007) 2–9.
[2] Cengarle, M.V. and Knapp, A. OCL 1.4/5 vs. 2.0 expressions: formal semantics and expressiveness. Software and Systems Modeling 3(1) (2004) 9–30.
[3] Chimiak-Opoka, J. OCLLib, OCLUnit, OCLDoc: pragmatic extensions for the Object Constraint Language. In: Schuerr, A. and Selic, B. (eds.): Model Driven Engineering Languages and Systems, 12th International Conference, MODELS 2009, Denver, Colorado, USA, October 4–9, 2009, Proceedings. LNCS 5795, Springer Verlag (2009) 665–669.
[4] Chimiak-Opoka, J., Felderer, M., Lenz, Ch. and Lange, Ch. Querying UML models using OCL and Prolog: a performance study. In: Model Driven Engineering, Verification, and Validation (MoDeVVa), Lillehammer, Norway (April 2008).
[5] Chimiak-Opoka, J. and Lenz, Ch. Use of OCL in a model assessment framework: an experience report. Electronic Communications of the EASST 5 (2006).
[6] Ferdian. A comparison of event-driven process chains and UML activity diagrams for denoting business processes. Studienarbeit, Technische Universitaet Hamburg-Harburg (April 2001).
[7] Kleppe, A., Warmer, J.B. and Bast, W. MDA Explained: The Model Driven Architecture: Practice and Promise. Addison-Wesley Professional (2003).
[8] Krogstie, J., Sindre, G. and Jorgensen, H. Process models representing knowledge for action: a revised quality framework. European Journal of Information Systems 15(1) (2006) 91–102.
[9] Krogstie, J. and Solvberg, A. Quality of conceptual models. Trondheim, Norway (2000) 91–120 (available at http://www.idi.ntnu.no/~krogstie/, last checked 2010-01-14).
[10] Mellor, S.J., Galiano, F.B. and Ebert, Ch. UML distilled: from difficulties to assets. IEEE Software 22(3) (2005) 106–109.
[11] Mendling, J. Event-driven process chains (EPC). In: Metrics for Process Models. Springer Berlin Heidelberg (2009) 17–57.
[12] OMG. Unified Modeling Language 2.0 Infrastructure Specification (September 2003).
[13] OMG. UML Profile and Interchange Models for Enterprise Application Integration (EAI) Specification, version 1.0 (2004).
[14] Silingas, D., Vitiutinas, R., Armonas, A. and Nemuraite, L. Domain-specific modeling environment based on UML profiles. In: Information Technologies' 2009: Proceedings of the 15th International Conference on Information and Software Technologies, IT 2009, Kaunas, Lithuania. Kaunas University of Technology, Technologija (April 23–24, 2009) 167–177.
[15] White, S. Process modeling notations and workflow patterns. In: Fischer, L. (ed.): Workflow Handbook 2004. Future Strategies Inc., Lighthouse Point, FL, USA (2004) 265–294.
[16] White, S.A. and Miers, D. BPMN Modeling and Reference Guide. Future Strategies Inc., Lighthouse Point, FL, USA (2008).
