MIPRO 2011, May 23-27, 2011, Opatija, Croatia
From Archetypes Based Domain Model via Requirements to Software: Exemplified by LIMS Software Factory

Gunnar Piho (a, b), Jaak Tepandi (a), Mart Roost (a), Marko Parman (a) and Viljam Puusep (a)

(a) Department of Informatics, Tallinn University of Technology, Raja St. 15, Tallinn 12617, Estonia
(b) Clinical and Biomedical Proteomics Group, CRUK, LIMM, Univ. of Leeds, Beckett St, Leeds LS9 7TF, UK
[email protected]
Abstract - Archetypes Based Development (ABD) proceeds from an archetypes based domain model via requirements to software. We give an overview of ABD and exemplify its application in the development of a Laboratory Information Management Systems (LIMS) Software Factory. ABD is guided by the Zachman Framework and utilizes the software engineering triptych together with archetypes and archetype patterns. Domains are modelled with Test Driven Modelling (TDM), which applies test driven development techniques to domain engineering. The resulting domain models serve as the Domain Specific Language for prescribing requirements. Implementation and testing of the LIMS Software Factory demonstrates the feasibility of archetypes based techniques in real life systems. ABD helps developers to better understand business requirements, to design cost effective enterprise applications through systematic reuse of archetypal components, and to validate and verify requirements, resulting in higher quality software.

Keywords - Archetypes, archetype patterns, domain analysis, domain model, domain modelling, software engineering, software factory, laboratory information management system (LIMS), laboratory domain model.
I. INTRODUCTION
To meet the demand for cheap, flexible, reliable and customizable software, there is a variety of ideas and initiatives (such as software product lines [1], software factories [2] and generative programming [3]) on how to develop software from archetypal and reusable components.

Software development is a complex and difficult task whose main challenges are complexity and change. Software developers have tried to embrace complexity by applying object oriented development techniques and formal methodologies, and to embrace change by applying different software process methodologies, including agile software development. Unfortunately, many software projects still fail [4].

Most, if not all, software development methodologies are “how to do” processes (“waterfall”, “spiral”, “iterative”, “evolutionary”, “extreme programming”, etc.) and not “what to do” processes (“... software developer not having a thorough understanding of the domain in which the software is to be inserted ...” [5 p. X]). A lack of understanding of the user requirements is one of the main reasons why software fails [4].

Moreover, requirements change. Both Extreme Programming [6] and the Capability Maturity Model Integration [7] framework treat the change of requirements as an axiom. When the requirements change, the software needs to be changed too, and changing software is risky. We are looking for ways to develop software so that it does not have to be changed every time the requirements change.

This is achieved by using archetypes and archetype patterns based techniques for modelling domains, requirements and software (ABD – Archetypes Based Development). These techniques combine business archetypes and archetype patterns, the software engineering triptych and software factories.

Business archetypes and archetype patterns were originally designed and introduced by Jim Arlow and Ila Neustadt [8]. Business archetype patterns (product, party, order, inventory, quantity and rule), composed of business archetypes (person's name, address, phone number, etc.), describe the universe of discourse of businesses as it is, without regard to either software requirements or software design.

According to the software engineering triptych, in order to develop software we must first informally or formally describe the domain ($\mathcal{D}$); then we must somehow derive the requirements ($\mathcal{R}$) from these domain descriptions; and finally from these requirements we must determine software design specifications and implement the software ($\mathcal{S}$), so that $\mathcal{D}, \mathcal{S} \models \mathcal{R}$ (meaning the software is correct) holds [9].

According to Greenfield et al. [2], a software factory (SF) is domain specific RAD (Rapid Application Development). Whereas general-purpose RAD uses “... logical information about the software captured by general-purpose development artefacts ...”, a software factory uses “... conceptual information captured by domain specific models ...” [2 p. 564]. Unlike model driven architecture (MDA), the aim of a software factory is not to create a UML-like universal modelling language for generating PSMs (platform-specific models) from PIMs (platform-independent models) [2 p. 567]. A software factory defines family-based development artefacts (DSLs – domain-specific languages, patterns, frameworks, tools, micro processes, and others) that can be used to actually build the software for the members of a software family. So the focus in software factories is on developing product families and not on developing one-off software. Models in software factories are source artefacts and not only documentation artefacts.

We use the ABD techniques to develop real life laboratory information management system (LIMS) [10] software for the Clinical and Biomedical Proteomics Group at the Leeds Institute of Molecular Medicine, Cancer Research UK, University of Leeds.

In what follows, we present ABD (Section 2), explain the archetypes and archetype patterns based meta-model for business domains that we use as the core of the software factory and as the DSL for specifying software requirements (Section 3), exemplify how we use ABD in the development of the real life LIMS software factory and LIMS software (Section 4), continue with related work and a discussion of validation and verification (Section 5), and finish with conclusions and future work (Section 6).

II. OVERVIEW OF ABD
The LIMS Software Factory architecture follows the business archetypes and archetype patterns based laboratory domain model and comprises LIMS Domain Specific Language (DSL), LIMS Engine and Tests Engine (Figure 1).
Figure 1. Architecture of LIMS Software Factory

The requirements for a particular LIMS software will be described with the LIMS DSL. According to these requirements the LIMS Engine must generate the LIMS software. The Tests Engine must validate these requirements according to the laboratory domain model.

The idea behind Figure 1 is the software engineering triptych. We have formulated this idea as the ABD [11; 12; 13; 14; 15]. The main components of ABD are the following:

• Analysis of the business domains using a domain analysis methodology similar to the one suggested by Bjørner [16]. We use a Zachman Framework [17] based approach in combination with archetypes and archetype patterns [8].
• In ABD, as is common for software factories, all models are source artefacts and not only documentation artefacts. This means that domain models are source artefacts developed according to Test Driven Modelling [18].
• We use these domain models as the “ubiquitous language” [19] for prescribing and formalizing requirements from customers.
• We validate these customer requirements and verify generated software according to the domain models.
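The last two components, prescribing requirements in domain-model vocabulary and validating them against the domain model, can be illustrated with a small sketch. It is written in Python purely for illustration and is not the LIMS Software Factory implementation; `DomainModel`, `Requirement` and `validate` are hypothetical names, and the sketch assumes that a requirement counts as expressible when every term and property it refers to is defined by the domain model.

```python
from dataclasses import dataclass


@dataclass
class DomainModel:
    """Hypothetical stand-in for a domain model API: term -> known properties."""
    terms: dict

    def knows(self, term, prop):
        # A property is known only if its term exists and lists that property.
        return prop in self.terms.get(term, set())


@dataclass
class Requirement:
    """A customer requirement phrased purely in domain-model vocabulary."""
    text: str
    uses: list  # (term, property) pairs the requirement refers to


def validate(requirement, model):
    """Return the (term, property) pairs the domain model cannot express."""
    return [(t, p) for t, p in requirement.uses if not model.knows(t, p)]


# Illustrative laboratory vocabulary (assumed, not taken from MyLIS).
laboratory = DomainModel(terms={
    "Patient": {"name", "birth_date"},
    "Sample": {"barcode", "collected_at"},
})

expressible = Requirement(
    "List samples by patient name",
    [("Patient", "name"), ("Sample", "barcode")],
)
not_expressible = Requirement(
    "Filter samples by storage temperature",
    [("Sample", "storage_temperature")],
)

print(validate(expressible, laboratory))
print(validate(not_expressible, laboratory))
```

An empty result means the requirement can be stated in terms of the domain model; a non-empty result names the vocabulary that the model is missing, which is a signal either to rephrase the requirement or to extend the model.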
In ABD we use the Zachman Framework (ZF) rows as methodological guidance for engineering (implementing) domain models and software. The ZF is a two dimensional matrix consisting of 6 rows and 6 columns. Table 1 summarizes the methodology we use.

Row 1 (Contextual scope model) is the glossary (list of things, objects, assets, etc.) that defines the scope or boundary of the domain or requirements. For example, the clinical laboratory domain includes terms like patient, clinician, medical technical assistant, and so on. Row 2 (Conceptual/Business/Semantic model; CIM, computation independent model, in MDA terms) defines the terms listed in the contextual scope model. Row 3 (Logical model; PIM, platform independent model, in MDA terms) is the blueprint in terms of classes, properties, methods and events satisfying the semantics described by Row 2. Row 4 (Physical model; PSM, platform specific model, in MDA terms) is the actual source code in a programming language or in a DSL embedded into this programming language, e.g. the API (application programming interface) for archetypes and archetype patterns. Row 5 (Detailed definition) is the code ready to run (byte code, e.g. CIL in .NET). Row 6 (Product) is the APIs or the applications in use. We use at least two APIs: the archetypes and archetype patterns (A&AP) API and the domain model API.

Test Driven Modelling (TDM, [18]) techniques are used for developing APIs and software. As is common for software factories, our models are not just documentation artefacts but source artefacts (our domain models are APIs), so we start coding in very early development phases. This means that as soon as the synopsis (a summary, a textual description of a domain) is written, we name the classes, implement the skeletons for each class and sketch the first class diagram. All of this is test driven [20], so at any stage of development we have at least some preliminary:

1. list of terms (Row 1, contextual scope model) under automated verification, so we can be sure that at least the types corresponding to the terms are defined in the system and that it is possible to create these types;
2. semantic (conceptual/business) model (Row 2) formally specified as unit tests;
3. logical model (Row 3), blueprints of classes, presented for example in the form of class diagrams as exemplified in Figure 2;
4. physical model (Row 4, source code); in contemporary development environments, the class diagrams are just views of code with full reverse engineering features.
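The first two items of this list can be sketched as ordinary unit tests. The example below uses Python's standard `unittest` module only as an illustration; the classes `Patient` and `Clinician`, their properties and the age rule are assumptions made for this sketch and are not taken from the laboratory domain model.

```python
import unittest
from dataclasses import dataclass
from datetime import date


# Skeleton classes named straight from a synopsis (hypothetical terms).
@dataclass
class Patient:
    name: str
    birth_date: date

    def age_on(self, day):
        """Completed years between birth_date and `day`."""
        had_birthday = (day.month, day.day) >= (
            self.birth_date.month, self.birth_date.day)
        return day.year - self.birth_date.year - (0 if had_birthday else 1)


@dataclass
class Clinician:
    name: str


class ContextualScopeTest(unittest.TestCase):
    """Row 1: every glossary term exists as a type and can be created."""

    def test_terms_can_be_created(self):
        self.assertIsInstance(Patient("Ann", date(1980, 5, 1)), Patient)
        self.assertIsInstance(Clinician("Bob"), Clinician)


class SemanticModelTest(unittest.TestCase):
    """Row 2: the meaning of a term pinned down as an executable example."""

    def test_age_counts_completed_years(self):
        patient = Patient("Ann", date(1980, 5, 1))
        self.assertEqual(patient.age_on(date(2011, 4, 30)), 30)
        self.assertEqual(patient.age_on(date(2011, 5, 1)), 31)
```

Saved as a module, the tests can be run with `python -m unittest`. The point of the sketch is the division of labour: the first test class only guards the glossary, while the second one fixes what a term means, so a later change to `Patient` that breaks the agreed meaning is reported automatically.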
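Table 1. Zachman Framework (ZF) rows as methodological guidance for engineering and development in ABD

| Abstraction: ZF row | Abstraction: MDA | Concretization: A&AP | Concretization: Domain (D) | Concretization: Requirements (R) |
|---|---|---|---|---|
| 1 Contextual (Scope) | | Terms; glossary specified as unit tests | Terms; glossary specified as unit tests | Terms; glossary specified as unit tests |
| 2 Conceptual (Business, Semantic) | CIM | Specs specified as unit tests | Specs specified as unit tests | Specs specified as acceptance tests |
| 3 Logical (System) | PIM | Design of A&AP | Design in terms of A&AP | Design in terms of DM |
| 4 Physical (Technology) | PSM | C# source code satisfying specifications for A&AP | A&AP based DSL source code satisfying specifications for D, also A&AP | DM based DSL source code satisfying specifications for R, D and A&AP |
| 5 Detailed | Code | Byte-code (CIL) ready to run | Byte-code (CIL) ready to run | Byte-code (CIL) ready to run |
| 6 Product | | API used as DSL in concretization | API used as DSL in concretization | Application |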
Since the contextual and semantic models are formalized as unit tests, it is relatively safe to change and improve the models, because the unit tests automatically reveal potential inconsistencies between the work already done and later developments.
With such gradual, step by step upgrading of the models and software we come closer and closer to the implemented and tested detailed models (Row 5) and products (Row 6).

We use TDM (from contextual model via semantic, logical, physical and detailed models to product) for developing APIs (A&AP, domain) and software. First we develop APIs that implement A&AP. We use these APIs as a DSL when we develop domain models. The domain models (also in the form of APIs, for instance for a clinical laboratory) we use as a DSL for specifying software requirements from a customer, for example for a specific LIMS for a particular laboratory.

With domain models developed according to TDM there are at least partial possibilities to validate requirements as well as to verify software. If it is possible to formally prescribe user software requirements with the domain model based DSL, then we regard these user requirements as valid (compatible) with respect to this particular domain model. If both the domain descriptions specified as unit tests and the software requirements specified as acceptance tests are satisfied by the software, then the software has been partially verified by the domain model.

An important question is how to validate the domain models engineered by TDM. We propose that domain models can be falsified rather than verified. We can falsify the domain models by using requirements from real life. If the domain model satisfies some of the real life requirements, then we can only say that these requirements have not falsified the domain model. But if we cannot satisfy a particular requirement from real life, then this requirement has falsified the model.

Figure 2. Logical model for laboratory patient.

III. ARCHETYPE PATTERNS BASED META-MODEL
Each column of the ZF describes a single, independent phenomenon. These independent phenomena are things (what), processes (how), locations (where), people (who), events (when) and strategies (why). In ABD these independent phenomena are analyzed and developed by using the product (what), business process (how), organization structure (where), person (who), order and inventory (when), as well as rule (why) archetype patterns. Table 2 illustrates how we use the product, party, party relationship, order, inventory, rule, quantity and money archetype patterns (AP) for modelling the independent phenomena of enterprises described by the columns of the Zachman Framework (ZF). Column 1 (what, things) describes what the products (either goods or services) are and how products are related to each other.
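Table 2. ZF columns with archetype patterns.

Business requirements:

| What | How | Where | Who | When | Why |
|---|---|---|---|---|---|
| Things | Processes | Locations | Persons | Events | Strategies |
| Products and services | Reporting (feedback) | Organization and organization structure | Persons | Business events | Business rules |

Archetype patterns: Product AP, Party AP, Party relationship AP, Order AP, Inventory AP
Rule AP
Quantity and money AP
Common infrastructure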
For modelling products and product relationships we use the product AP. In addition, two other archetype patterns (quantity and rule) are needed when describing products.

Column 2 (how, processes) describes business processes. For modelling business processes we use the reporting AP [21]. This AP actively manages the progress of business processes by using feedback from the managers of particular business processes. Each report is a party relationship in which a subordinate (a role of a person) reports to a supervisor (a role of a person). We have designed the reporting AP as a special case of the party relationship AP [8].

Column 3 (where, locations) describes the structure of the organization in terms of organization units and the roles of these organization units. We strictly separate roles from the parties (persons, organizations) “playing” these roles. For modelling locations (organization structure, business environment) we use the party and party relationship APs.

Column 4 (who, persons) describes the persons employed by the organization, or parties (persons, organizations) playing some other roles (customers, suppliers, etc.) related to the business processes of the organization. For modelling persons and related parties we use the party and party relationship APs.

Column 5 (when, events) describes all the business events which are related to the business processes of the organization. We model business events by using the order and inventory APs. With the order AP, any request (not only buying and selling) to change something in the enterprise inventory or in some other list (employees, for example) can be recorded.

Column 6 (why, strategies) describes the strategies in terms of business rules. We use the simple propositional calculus based rule AP as the basic model for strategies.
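The separation of roles from parties, and reporting as a special case of a party relationship, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the A&AP API: the class names `Party`, `PartyRole`, `PartyRelationship` and `Report`, their fields and the `submit` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    """A person or an organization, independent of any role it plays."""
    name: str


@dataclass(frozen=True)
class PartyRole:
    """A role (e.g. 'clinician') played by a party; a role is not a party."""
    party: Party
    role: str


@dataclass(frozen=True)
class PartyRelationship:
    """A named link between two roles."""
    kind: str
    source: PartyRole
    target: PartyRole


@dataclass(frozen=True)
class Report(PartyRelationship):
    """Reporting as a special case: a subordinate reports to a supervisor."""
    message: str = ""

    @classmethod
    def submit(cls, subordinate, supervisor, message):
        # The relationship kind is fixed; only the roles and feedback vary.
        return cls("reports to", subordinate, supervisor, message)


ann = Party("Ann")

# The same party can play several roles without changing the Party type.
ann_as_assistant = PartyRole(ann, "medical technical assistant")
ann_as_patient = PartyRole(ann, "patient")
head = PartyRole(Party("Bob"), "clinician")

report = Report.submit(ann_as_assistant, head, "Sample analysed")
print(report.kind, "->", report.target.role)
```

Because `Report` inherits from `PartyRelationship`, any code written against party relationships also handles reports, and new roles are added as data rather than as new party types.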
IV. CASE STUDY

We use Archetypes Based Development in developing real life Laboratory Information Management System (LIMS) software and the LIMS SF in the CBPG (Clinical and Biomedical Proteomics Group) at the University of Leeds (UK), in a project codenamed MyLIS. The laboratory business architecture is designed according to the ASTM (ASTM International, formerly known as the American Society for Testing and Materials) standard guidelines for laboratories [10] and is realized as laboratory archetypes and archetype patterns (the laboratory domain model). All laboratory archetypes and archetype patterns are derived from the A&AP based meta-model.

A&AP are also used in designing interoperability and data persistence features. The A&AP based, database independent design of MyLIS should theoretically allow different commercial databases (e.g. Oracle, MySQL, MS SQL) to work with the MyLIS software, and should make it possible to upgrade user and even domain requirements with no or only minor changes in the database layout, and therefore without the need to map data from one database layout to another. The customizable MS Excel interface allows data import from (and export to) MS Excel tables through A&AP based conversion. Similarly, the A&AP based XML interface allows exchange of data, enabling interoperability with other systems. For example, if the medical laboratory has such terms as the patient, the physician and the MTA (medical technical assistant), the database and the XML interface are designed using only the archetypal concept of the role of the party.

The user interface (UI) (Figure 3) is designed in such a way that the UI does not know anything about the requirements, the domain or even A&AP. The UI is designed by using reflection [22]. Reflection is the process by which a computer program can observe and modify its own structure and behaviour at runtime. This means that MyLIS observes, at runtime, the structure (type and public properties) of the entity to be shown and generates the UI according to this structure. Each entity can be shown in three different views: general, detail, and edit. Each view can show different information about the entity, and the user can change the information displayed even at runtime.
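The reflection idea can be illustrated with a short sketch. It uses Python's built-in introspection (`dir`, `getattr`, `type`) rather than .NET reflection, and it renders plain text instead of a real UI; the functions `public_properties` and `render`, the `Sample` entity and the way the three views differ are assumptions made only for this example.

```python
from dataclasses import dataclass
from datetime import date


def public_properties(entity):
    """Discover (name, value) pairs of public, non-callable attributes at runtime."""
    names = [n for n in dir(entity) if not n.startswith("_")]
    pairs = [(n, getattr(entity, n)) for n in names]
    return [(n, v) for n, v in pairs if not callable(v)]


def render(entity, view="detail", visible=None):
    """Build a text form from the entity's structure, not from its class."""
    props = public_properties(entity)
    if visible is not None:
        # Fields chosen by the user at runtime.
        props = [(n, v) for n, v in props if n in visible]
    if view == "general":
        # One-line summary of the entity.
        values = ", ".join(str(v) for _, v in props)
        return f"{type(entity).__name__}: {values}"
    marker = "[edit] " if view == "edit" else ""
    return "\n".join(f"{marker}{n}: {v}" for n, v in props)


@dataclass
class Sample:
    """Illustrative entity; render() has no knowledge of this class."""
    barcode: str
    collected_on: date


sample = Sample("S-0042", date(2011, 5, 23))
print(render(sample, "general"))
print(render(sample, "detail"))
print(render(sample, "edit", visible={"barcode"}))
```

Nothing in `render` mentions `Sample`, so adding a property to the entity, or passing a completely different entity, changes the generated view without any change to the UI code.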
Being written in .NET, MyLIS can theoretically be deployed via MONO [23] technology on all major operating systems (including Linux, Unix, Mac and MS Windows). The use of the Test Driven Modelling methodology helps ensure adequate dependability and makes it possible to use MyLIS in laboratory routines even while the software is still a prototype.
Figure 3. Screenshot of MyLIS user interface with general, detail, and edit views

The prototype of MyLIS has been in use in the CBPG since the end of 2009 and is currently in its third version, used by three different CBPG research groups with different requirements.

V. DISCUSSIONS AND RELATED WORK
The contribution of our research is the unification of various software engineering aspects and their use in real life software factory development in order to obtain (to generate automatically) quality software.

We have compared different archetype [8], analysis [24] and data [25; 26; 27; 28] patterns, and found that these patterns describe either exactly the same universe of discourse or its special cases. We chose the Arlow and Neustadt archetype patterns [8] because they are compatible with both the triadic model of activity [29] and the Zachman Framework [17]. We redesigned these archetype patterns by separating the operational and knowledge levels, as suggested by Fowler [30]. The archetype patterns obtained in this way (party, party relationship, product, order, inventory, quantity and rule) contain only the knowledge level. We use these APs as the meta-model for analyzing and designing business domains as well as requirements.

Unlike the domain facets methodology [16], we use a domain analysis methodology based on common questions (what, how, where, who, when, and why) together with an archetype patterns based meta-model describing products (what), business processes (how), organization structures (where), persons (who), orders and inventory (when), as well as rules (why). Instead of using specification languages (RAISE, Z, VDM or similar), we use test driven modelling [18]. TDM together with A&APs integrates modelling and programming activities and results in source artefacts (APIs) which can be used as a DSL when specifying requirements.

While the implementation and testing of the LIMS Software and LIMS Software Factory demonstrates the feasibility of archetypes based techniques in real life systems, these techniques are also in agreement with, and complement, important software development processes and methodologies, such as Bjørner's domain modelling, MDA (Model Driven Architecture), XP (Extreme Programming) and CMMI (Capability Maturity Model Integration) for Development, as we show in [13]. Archetypes based techniques complement XP [6] by focusing on understanding the domain and requirements (what to do). This is achieved by analyzing and modelling domains (selecting a solution that meets the multiple demands of relevant stakeholders), by describing customer requirements in terms of the designed domain model, and by validating and verifying requirements against this domain model. As also shown in [13], by using these A&AP techniques it is possible to cover some institutional practices that CMMI for Development [7] identifies as key elements of good engineering and management.

VI. CONCLUSION AND FUTURE WORK
We exemplified the main idea (from archetypes based domain model via requirements to software) of the Archetypes Based Development we use in developing the LIMS Software Factory. ABD is guided by the Zachman Framework (ZF) and utilizes the software engineering triptych together with archetypes and archetype patterns. The independent phenomena of the ZF columns (things - what, processes - how, locations - where, people - who, events - when and strategies - why) are analyzed and developed by using the product (what), business process (how), organization structure (where), person (who), order and inventory (when), as well as rule (why) archetype patterns. The ZF rows are used as methodological guidance for engineering (implementing) domain models and software by using Test Driven Modelling techniques (from contextual model via semantic, logical, physical and detailed models to product). This means that at any stage of development we have the contextual scope as well as the logical (Row 3) and physical (Row 4) models under automated verification by the semantic model (Row 2) specified as unit tests.

Future work can include comparing and analyzing the possibilities and features of test driven modelling with, for example, those of the RAISE specification language. The goal of this investigation would be to raise the degree of formality of TDM. Other topics are related to improving and maturing the models and to implementing the LIMS DSL, LIMS Engine and Tests Engine (Figure 1), with the wider research goal of developing A&AP based information systems that evolve together with the business processes through collaborative system development by their users and developers.
ACKNOWLEDGMENT

This work is supported by the Estonian Ministry of Education and Research (SF0140013s10); by Tallinn University of Technology (Estonia); by the University of Leeds (United Kingdom); and by Cancer Research UK.

REFERENCES

[1] Clements, P. and Northrop, L., Software Product Lines: Practices and Patterns. Addison-Wesley, 2001.
[2] Greenfield, J., et al., Software Factories: Assembling Applications with Patterns, Models, Frameworks, and Tools. Wiley, 2004.
[3] Czarnecki, K. and Eisenecker, U., Generative Programming: Methods, Tools, and Applications. Addison-Wesley, 2000.
[4] Charette, R.N., "Why Software Fails." IEEE Spectrum. Sept. 2005.
[5] Bjørner, D., Software Engineering, Vol. 1: Abstraction and Modelling. Texts in Theoretical Computer Science, the EATCS Series. Springer, 2006.
[6] Beck, K., Extreme Programming Explained: Embrace Change. Addison-Wesley, 2000.
[7] CMMI Product Team, CMMI for Development, Version 1.2, CMU/SEI-2006-TR-008. Software Engineering Institute, 2007.
[8] Arlow, J. and Neustadt, I., Enterprise Patterns and MDA: Building Better Software With Archetype Patterns and UML. Addison-Wesley, 2003.
[9] Bjørner, D., "Domain Theory: Practice and Theories (A Discussion of Possible Research Topics)." Macau SAR, China: The 4th International Colloquium on Theoretical Aspects of Computing - ICTAC, 2007.
[10] ASTM, E1578-06 Standard Guide for Laboratory Information Management Systems (LIMS). ASTM International, 2006.
[11] Piho, G., Tepandi, J. and Roost, M., "Domain analysis with archetype pattern based Zachman framework for enterprise architecture." [ed.] A. K. Mahmood, et al. Kuala Lumpur, Malaysia, 15th - 17th June 2010: IEEE, 2010. Proceedings of the 4th International Symposium on Information Technology 2010. Vol. 3 - Knowledge Society and System Development and Application, pp. 1351-1356.
[12] Piho, G., Tepandi, J. and Roost, M., "Towards archetypes-based software development." [ed.] T. Sobh and K. Elleithy. s.l.: Springer, 2010. Innovations in Computing Sciences and Software Engineering: Proceedings of the CISSE 2009. pp. 561-566.
[13] Piho, G., Tepandi, J. and Roost, M., "Evaluation of Archetypes Based Development." [ed.] J. Barzdins and M. Kirikova. Frontiers in Artificial Intelligence and Applications. Databases and Information Systems VI - Selected Papers from the Ninth International Baltic Conference, DB&IS 2010, 2011, Vol. 224, pp. 283-295.
[14] Piho, G., Tepandi, J. and Roost, M., "The Zachman Framework with Archetypes and Archetype Patterns." [ed.] J. Barzdins and M. Kirikova. Riga, Latvia, Baltic DB&IS, July 5-7, 2010: University of Latvia Press, 2010. Databases and Information Systems: Proceedings of the Ninth International Baltic Conference. pp. 455-570.
[15] Piho, G., et al., "From archetypes-based domain model of clinical laboratory to LIMS software." Opatija, Croatia, 24-28 May 2010: IEEE, 2010. MIPRO, 2010 Proceedings of the 33rd International Convention. Vol. Digital Economy, pp. 1179-1184.
[16] Bjørner, D., Software Engineering, Vol. 3: Domains, Requirements, and Software Design. Texts in Theoretical Computer Science, the EATCS Series. Springer, 2006.
[17] Zachman, J. A., "A Framework for Information Systems Architecture." IBM Systems Journal. 1987, Vol. 26, 3.
[18] Piho, G., et al., "Test Driven Modelling." Opatija, 2011. Accepted by MIPRO 2011.
[19] Evans, E., Domain-Driven Design: Tackling Complexity in the Heart of Software. Boston, MA: Addison-Wesley, 2004.
[20] Beck, K., Test-Driven Development: By Example. Boston, MA: Addison-Wesley, 2003.
[21] Tepandi, J., Piho, G. and Liiv, I., "Domain Engineering for Cyber Defence Visual Analytics: a Case Study and Implications." Tallinn, Estonia: CCD COE Publications, 2010. CCDCOE Conference on Cyber Conflict. pp. 59-77.
[22] Wikipedia, Reflection (computer programming). [Online].
[23] MONO, Cross platform, open source .NET development framework. [Online].
[24] Fowler, M., Analysis Patterns: Reusable Object Models. Addison-Wesley, 2005.
[25] Hay, D. C., Data Model Patterns: Conventions of Thought. Dorset House Publishing, 1996.
[26] Hay, D. C., Data Model Patterns, First Edition: A Metadata Map (The Morgan Kaufmann Series in Data Management Systems). Morgan Kaufmann, 2006.
[27] Silverston, L., The Data Model Resource Book 1: A Library of Universal Data Models for All Enterprises. Wiley, 2001. Vol. 1.
[28] Silverston, L., The Data Model Resource Book 2: A Library of Universal Data Models by Industry Types. s.l.: Wiley, 2001.
[29] Bedny, G.Z. and Harris, S.R., "The Systemic-Structural Theory of Activity: Applications to the Study of Human Work." Mind, Culture, and Activity. 2005, Vol. 12, 2, pp. 128-147.
[30] Fowler, M., Patterns of Enterprise Application Architecture. Boston, MA: Addison-Wesley, 2003.