Generating UML Class Models from SBVR Software Requirements Specifications

Hina Afreen (a), Imran Sarwar Bajwa (b)

(a) Dept. of CS & IT, The Islamia University of Bahawalpur, 63100, Pakistan
(b) School of Computer Science, University of Birmingham, B15 2TT, UK

Abstract

SBVR is a recent standard introduced by the OMG that can be used to capture software requirements in a natural language (NL) such as English. In this paper, we present a novel approach that translates an SBVR specification of software requirements into a UML class model. We generate UML class models from SBVR specifications rather than from NL specifications because NL-to-UML translation exhibits lower accuracy owing to the informal nature of natural languages. SBVR specifications are helpful because SBVR is based on higher-order logic, which makes it easy to machine process, while remaining easy for human beings to understand. In the presented approach, the user inputs the SBVR specification of software requirements, the input is syntactically and semantically analyzed to extract object-oriented (OO) information, and finally the OO information is mapped to a class model. The approach is implemented in a prototype tool, SBVR2UML, an Eclipse plugin that serves as a proof of concept. A case study has also been solved to show that the use of SBVR in the automated generation of class models provides better accuracy and consistency compared with other available approaches.

1 Introduction

In automated software modelling, natural language (such as English) software requirements specifications are translated to formal specifications [1], [2] such as UML models. In the last two decades, the major tools that can automatically analyze an NL requirement specification and generate UML class models are NL-OOPS [3], D-H [2], RCR [5], LIDA [6], GOOAL [7], CM-Builder [8], ReBuilder [9], NL-OOML [10], UML-Generator [11], etc. However, none of these tools can be used in real-time software development, as they achieve fairly low accuracy (65% to 70%) in generating UML models. The key reason for this lower accuracy, as identified by various researchers, is the ambiguous and informal nature of natural languages. Moreover, the inherent semantic inconsistencies in a natural language make it complex to machine process. A natural language such as English is ambiguous [14] due to its informal sentence structure. English is also inconsistent, as the majority of English words have multiple senses and a single sense can be expressed by multiple words.

A formal representation such as the Semantic Business Vocabulary and Rules (SBVR) language [12] can be a better solution, as SBVR is not only easy to machine process due to its mathematical foundation but also easy to understand for software developers and other stakeholders. SBVR was originally proposed for business people to formally represent business process specifications in natural languages. We propose the use of SBVR for capturing software requirements specifications and the machine processing of SBVR to automatically generate UML class models.

The contribution of this paper is threefold. Firstly, a novel approach is presented that performs syntactic and semantic analysis of an SBVR specification of software requirements to extract object-oriented elements such as classes, attributes, operations, associations, generalizations, etc. Secondly, we report the structure of the implemented tool SBVR2UML, which is able to automatically generate UML class models from SBVR software requirements specifications. Thirdly, we have solved a case study with our tool and compared the results with other tools (used for automated OOA) to evaluate the tool's performance.

The remainder of the paper is structured as follows: Section 2 explains preliminaries and related work on SBVR. Section 3 illustrates the architecture and workflow of the presented tool, SBVR2UML. Section 4 presents a solved case study from the domain of library information systems, and the evaluation is shown in Section 5. Finally, the paper concludes with a discussion of future work.

2 Semantic Business Vocabulary and Rules

2.1 SBVR Overview

OMG introduced Semantic Business Vocabulary and Rules (SBVR) [12] in 2008 for software and business people. For a software model, a software analyst defines the SBVR vocabulary and SBVR rules in order to document software-oriented vocabularies, facts, and rules. Moreover, an XML schema is also generated to make the SBVR representation easy to interchange among organizations and between software tools. In software modeling, SBVR is a modern and improved way of capturing requirement specifications in natural languages that is not only easy for human beings to read but also simple to machine process [22]. A typical SBVR representation, such as a set of SBVR business rules, is simple to machine process due to the higher-order logic foundation of SBVR. Using SBVR, one can generate a shared domain model (based on business vocabularies and rules) with the same expressive power as standard ontological languages [23]. Both constituents of a standardized SBVR representation are explained below.

2.1.1 SBVR Business Vocabulary

A business vocabulary [12] (section 8.1) consists of all the specific terms and definitions of concepts used by an organization or community in the course of business. In SBVR, a concept can be a noun concept or a fact type. Figure 1 shows an overview of the SBVR metamodel.

Figure 1: SBVR Vocabulary - A subset of SBVR Metamodel (by Eclipse)

The following five types of SBVR vocabulary, which we have used in our approach, are explained below:
• Object Type: A general concept that exhibits a set of characteristics that distinguishes that object type from all other object types, e.g. library, student, etc.
• Individual Noun: A qualified noun that corresponds to only one object [12] (section 8.1), e.g. Bahawalpur in "Bahawalpur is a famous city in Pakistan".
• Verb Concept: A verb concept [12] (section 8.1) specifies the relationships among noun concepts, e.g. library has books.
• Characteristic: An abstraction of a property of an object [12] (section 8.1), e.g. in "name of student is Ahmad", name is the characteristic.
• Fact Type: A fact type can be a binary fact type, e.g. "student borrows book". Other possible forms of fact types are associative fact types, partitive fact types, categorization fact types, etc. We refer the reader to [12] (section 8.3.4) for further detail.

2.1.2 SBVR Business Rules

In SBVR 1.0, the formal representation of a business entity's structure or behaviour under a business jurisdiction [12] is called an SBVR business rule. A business rule typically expresses the structure or operation of a particular business entity in a specified business domain. Each SBVR business rule is based on at least one fact type. SBVR rules can be of two types:
• SBVR Structural Rule: Such rules are used to define an organization's setup [12] (section 12.1).
• SBVR Behavioural Rule: Such rules are employed to express the conduct of a business entity [12] (section 12.1).

2.2 SBVR and Software Modelling

Since its introduction, SBVR has gained considerable attention from software scientists. Cabot [8] used model transformation to automatically translate formal (UML/OCL) specifications to an SBVR representation. The motivation behind Cabot's work was to paraphrase formal specifications so that novice users can understand them. Another direction of research in the area of SBVR was proposed by Amit [24], who presented an approach to generate a formal representation from an SBVR business design. In the business modelling domain, business process specifications are typically represented in the form of SBVR business rules. However, Amit's work on generating class diagrams is in its early stages and does not provide a complete object-oriented analysis of an SBVR business design. The major limitations of Amit's work [24] are as follows:
• There is no support for extracting objects.
• There is no support for extracting class attributes.
• There is no support for extracting class associations, multiplicity, and association ends.
• Class aggregations and generalizations are also not supported.
Due to these limitations, there is a need for an improved approach that can automatically perform a complete object-oriented analysis of an SBVR specification of software requirements and generate a complete UML class model. The presented approach is able to extract not only the UML classes, objects, and their attributes and operations but also the associations, aggregations, and generalizations.

3 The SBVR2UML

This section explains the approach used to automatically map an SBVR representation, i.e. SBVR business rules, to a UML class model. To map SBVR to a UML class model, we have to extract the SBVR vocabulary from the given SBVR rules, then map the SBVR vocabulary to the basic elements of a UML class model (such as classes, associations, etc.), and finally generate a graphical representation of the class model. The approach works in five phases (see Figure 1). These phases are explained in detail in the remaining part of the section.

3.1 Analysis of SBVR Specification

The first phase of SBVR2UML is the analysis of the input SBVR specification. The analysis is performed in three steps: lexical, syntactic, and semantic analysis. The three steps work as follows:
• The preprocessing starts with the lexical processing of a plain text file containing the SBVR specification of software requirements. Lexical processing begins by splitting the SBVR rules and storing each SBVR rule as a separate string in an ArrayList. Each SBVR rule is tokenized using the Java tokenizer class. For example, the SBVR rule expression "It is permitted that a library can issue books to each member." is tokenized as [It] [is] [permitted] [that] [a] [library] [can] [issue] [books] [to] [each] [member] [.]. The output is stored in an ArrayList.
• The preprocessed text is then passed to the Stanford part-of-speech (POS) tagger [13] v3.0 to identify the basic POS tags, e.g. It/PRP is/VBZ permitted/VBN that/IN a/DT library/NN can/MD issue/VB books/NNS to/TO each/DT member/NN ./. The Stanford POS tagger v3.0 can identify 44 POS tags. For the syntactic analysis of the input text, we have used an enhanced version of the rule-based bottom-up parser used in [11]. The parser is based on English grammar rules. The text is syntactically analyzed and a parse tree is generated for further semantic processing.
• In the semantic interpretation step, role labeling is performed. The desired role labels are actor (nouns used in the subject part), co-actor (additional actors conjoined with 'and'), action (action verb), thematic object (nouns used in the object part), and beneficiary (nouns used in the adverb part), if one exists. All roles are used to identify the various SBVR vocabulary items, which are finally stored in an ArrayList.

[Figure 1. The SBVR2UML Approach. Stages shown: SBVR Specification of Software Requirements; Performing Analysis of SBVR Specification; Extracting Business Vocabulary; Extracting UML Class Model Elements; Generating Class Model Diagram]
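
The lexical step can be illustrated with a minimal sketch. It is written in Python rather than the Java used by the tool, and the function names `split_rules` and `tokenize`, as well as the regular expressions, are assumptions made for illustration; they are not taken from the SBVR2UML implementation.

```python
import re


def split_rules(text):
    """Split an SBVR specification into individual rule strings.

    Assumption of this sketch: a rule ends at a full stop that is
    followed by whitespace (or by the end of the text).
    """
    parts = re.split(r"(?<=\.)\s+", text.strip())
    return [part for part in parts if part]


def tokenize(rule):
    """Split one rule into word tokens and punctuation tokens.

    Hyphenated terms such as 'classification-mark' are kept as one token.
    """
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", rule)


spec = ("It is permitted that a library can issue books to each member. "
        "Each section is denoted by a classification-mark.")
rules = split_rules(spec)      # one string per SBVR rule
tokens = tokenize(rules[0])    # tokens of the first rule
print(tokens)
```

For the first rule, the sketch yields the same thirteen tokens as the example given in the text, including the final full stop as a separate token.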

3.2 SBVR Vocabulary Extraction

In this phase, the basic SBVR elements, e.g. noun concepts, individual concepts, object types, verb concepts, etc., are identified from the English input that was preprocessed in the previous phase. The extraction of the various SBVR elements is described below:
• All common nouns (actors, co-actors, thematic objects, or beneficiaries) are represented as object types or general concepts, e.g. belt, user, cup, etc. In conceptual modelling, object types are mapped to classes.
• All proper nouns (actors, co-actors, thematic objects, or beneficiaries) are represented as individual concepts.
• The auxiliary and action verbs are represented as verb concepts. To construct a fact type, the combination of an object type/individual concept + verb forms a unary fact type, e.g. "vision system senses". Similarly, the combination of an object type/individual concept + verb + object type forms a binary fact type, e.g. "belt conveys part".
• In English, characteristics [12] (section 11.1.2.2) or attributes are typically represented using the is-property-of fact type, e.g. "name is-property-of customer". Moreover, possessive constructions (a noun preceded by a possessive 's form or followed by of), e.g. student's age or age of student, also indicate a characteristic.
• All indefinite articles (a and an), plural nouns (suffixed with s), and cardinal numbers (2 or two) represent quantifications.
• The associative fact types [12] (section 11.1.5.1) are identified by associative or pragmatic relations in the English text. In English, binary fact types are typical examples of associative fact types, e.g. "The belt conveys the parts". In this example, there is a binary association between the belt and parts concepts. This association is one-to-many, as the 'parts' concept is plural. In conceptual modeling of SBVR, associative fact types are mapped to associations.
• The partitive fact types [12] (section 11.1.5.1) are identified by extracting structures such as "is-part-of", "included-in", or "belong-to". In conceptual modeling of SBVR, partitive fact types are mapped to aggregations.
• The categorization fact types [12] (section 11.1.5.2) are identified by extracting structures such as "is-category-of", "is-type-of", or "is-kind-of", e.g. "The user puts two-kinds-of parts, dish and cup". Here 'parts' is the generalized form of 'dish' and 'cup'. In conceptual modeling, categorization fact types are mapped to generalizations.
All the extracted information is stored in an ArrayList.
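
As a rough illustration of the noun, verb, and quantification rules above, the following Python sketch classifies POS-tagged tokens. The function names, the dictionary layout, the naive singularisation, and the step that skips a leading "It is ... that" keyword phrase are all assumptions of this sketch, not details of the SBVR2UML tool; role labels and fact types are not modelled here.

```python
def parse_tagged(tagged):
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    return [tuple(item.rsplit("/", 1)) for item in tagged.split()]


def strip_modal_prefix(pairs):
    """Drop a leading 'It is ... that' phrase (an assumption of this sketch)."""
    words = [word.lower() for word, _ in pairs]
    if words and words[0] == "it" and "that" in words:
        return pairs[words.index("that") + 1:]
    return pairs


def extract_vocabulary(pairs):
    """Classify tagged tokens into SBVR vocabulary items."""
    vocab = {"object_types": [], "individual_concepts": [],
             "verb_concepts": [], "quantifications": {}}
    previous = None  # the (word, tag) pair seen just before the current one
    for word, tag in pairs:
        if tag in ("NN", "NNS"):
            # Common nouns become object types.
            # Naive singularisation: strip one trailing "s" from plurals.
            noun = word[:-1] if tag == "NNS" and word.endswith("s") else word
            if noun not in vocab["object_types"]:
                vocab["object_types"].append(noun)
            if tag == "NNS":
                vocab["quantifications"][noun] = "many"
            elif previous and previous[0].lower() in ("a", "an"):
                vocab["quantifications"][noun] = "one"
            elif previous and previous[1] == "CD":
                vocab["quantifications"][noun] = previous[0]
        elif tag in ("NNP", "NNPS"):
            # Proper nouns become individual concepts.
            vocab["individual_concepts"].append(word)
        elif tag.startswith("VB") or tag == "MD":
            # Auxiliary and action verbs become verb concepts.
            vocab["verb_concepts"].append(word)
        previous = (word, tag)
    return vocab


tagged = ("It/PRP is/VBZ permitted/VBN that/IN a/DT library/NN can/MD "
          "issue/VB books/NNS to/TO each/DT member/NN ./.")
vocab = extract_vocabulary(strip_modal_prefix(parse_tagged(tagged)))
print(vocab)
```

On the tagged rule from Section 3.1, this gives library, book, and member as object types, can and issue as verb concepts, and the quantifications "one" for library and "many" for book.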

3.3 Object-Oriented Analysis

In this phase, the SBVR rule is further processed to extract the OO information. The extraction of each OO element from the SBVR representation is described below:
• All SBVR object types are mapped to UML classes, e.g. library, book, etc.
• The SBVR individual concepts are mapped to UML instances.
• All the SBVR characteristics or unary fact types (without action verbs) associated with an object type are mapped to attributes of a class.
• All the SBVR verb concepts (action verbs) associated with a noun concept are mapped to methods of a class, e.g. issue() is a method of the library class.
• A unary fact type with an action verb is mapped to a unary relationship, and all associative fact types are mapped to binary relationships. The quantification used with the respective noun concept is employed to identify multiplicity, e.g. library and book(s) will have a one-to-many [20] association. The associated verb concept is used as the caption of the association.
• The partitive fact types are mapped to aggregations. The subject part of the fact type is considered the main class in the aggregation, while the object part is considered the sub class.
• The categorization fact types are mapped to generalizations. The subject part of the fact type is considered the main class in the generalization, while the object part is considered the sub class.
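
The mapping from fact types to class model elements can be sketched as follows. This is a simplified Python illustration and not the tool's code: the `FactType` record, the `to_class_model` function, and the result layout are hypothetical. The sketch assumes that partitive fact types become aggregations and categorization fact types become generalizations, and it labels an association with a singular object as "one-to-one", which is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class FactType:
    kind: str                  # "associative", "partitive" or "categorization"
    subject: str               # subject part of the fact type
    verb: str                  # verb concept, used as the association caption
    obj: str                   # object part of the fact type
    obj_plural: bool = False   # True when the object noun was plural


def to_class_model(fact_types):
    """Map fact types to classes, associations, aggregations, generalizations."""
    model = {"classes": [], "associations": [],
             "aggregations": [], "generalizations": []}
    for ft in fact_types:
        # Every object type taking part in a fact type becomes a class.
        for name in (ft.subject, ft.obj):
            if name not in model["classes"]:
                model["classes"].append(name)
        if ft.kind == "associative":
            multiplicity = "one-to-many" if ft.obj_plural else "one-to-one"
            model["associations"].append(
                (ft.subject, ft.verb, ft.obj, multiplicity))
        elif ft.kind == "partitive":
            # (main class, sub class) of the aggregation
            model["aggregations"].append((ft.subject, ft.obj))
        elif ft.kind == "categorization":
            # (main class, sub class) of the generalization
            model["generalizations"].append((ft.subject, ft.obj))
        else:
            raise ValueError(f"unknown fact type kind: {ft.kind}")
    return model


facts = [
    FactType("associative", "Library", "issues", "Loan_Item", obj_plural=True),
    FactType("categorization", "Loan_Item", "is type-of", "Book"),
    FactType("categorization", "Loan_Item", "is type-of", "Language_Tape"),
    FactType("partitive", "Library", "made up of", "Subject_Section", True),
]
model = to_class_model(facts)
print(model)
```

The example fact types are written by hand for illustration; in the approach described here they would come from the vocabulary extraction phase.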

3.4 Drawing UML Class Model

This phase draws a UML class model by combining class diagram symbols according to the information extracted in the previous phase. In this phase, the Java graphics functions (drawLine(), drawRect(), etc.) are used to draw the class diagram symbols.

4 A Case Study

A case study is discussed from the domain of library information systems that was originally presented by Callan [15] and later solved by Harmain [8]. The problem statement of the case study is as follows.

A library issues loan items to customers. Each customer is known as a member and is issued a membership card that shows a unique member number. Along with the membership number other details on a customer must be kept such as a name, address, and date of birth. The library is made up of a number of subject sections. Each section is denoted by a classification mark. A loan item is uniquely identified by a bar code. There are two types of loan items, language tapes, and books. A language tape has a title language (e.g. French), and level (e.g. beginner). A book has a title, and author(s). A customer may borrow up to a maximum of 8 items. An item can be borrowed, reserved or renewed to extend a current loan. When an item is issued the customer's membership number is scanned via a bar code reader or entered manually. If the membership is still valid and the number of items on loan less than 8, the book bar code is read, either via the bar code reader or entered manually. If the item can be issued (e.g. not reserved) the item is stamped and then issued. The library must support the facility for an item to be searched and for a daily update of records.

We generated the SBVR version of the problem statement of the case study by using the NL2SBVR tool [21]. The SBVR specification (output of the NL2SBVR tool) was given as input to the SBVR2UML tool, which is an Eclipse plugin implemented in Java as a proof of concept. The SBVR specification after extracting the SBVR vocabulary is as follows:

A library issues loan items to each customer. Each customer is known as a member and is issued a membership card that shows a unique member number. It is necessary that the membership number and other details on a customer must be kept such as a name, address, and date-of-birth. The library is made up of a number of subject sections. Each section is denoted by a classification-mark. A loan item is identified by a bar-code. There are exactly two types of loan items, language tapes, and books. A language tape has a titlelanguage, and level. A book has a title, and author(s). It is possibility that each customer may borrow up to at most 8 items. It is possibility that each item can be borrowed, reserved or renewed to extend a current loan. When an item is issued the customer's membership-number is scanned via a bar code reader or entered manually. If the membership is valid and the number of items on loan at most 8, the book's bar-code is read, either via the bar code reader or entered manually. It is possibility that if the item can be issued the item is stamped and then issued. It is necessary that the library must support the facility for an item to be searched and for a daily update of records.

Afterwards, the extracted SBVR vocabulary was mapped to the UML class elements. The following information was extracted in the OO analysis phase:

Table I. Object Oriented Analysis Results of the Case Study

Example         | Count | Details
----------------|-------|--------
Classes         | 10    | Library, Loan_Items, Member_Number, Customer, Book, Language_Tape, Member, Bar_Code_Reader, Subject_Section, Membership_Card
Attributes      | 10    | name, address, date-of-birth, bar_code, classification_mark, title, author, Level, membership-number, valid
Methods         | 11    | issue(), show(), denote(), identify(), extend(), scan(), enter(), read_barcode(), stamp(), search(), update()
Associations    | 07    | Library issues Loan_Items; Member_Card issued to Member; Library made up of Subject_sections; Customer borrow Loan_items; customer renew Loan_item; customer reserve_Loan_item; Library support facility
Generalizations | 02    | Loan Items is type-of Language_tapes, Loan Items is type-of Books
Aggregations    | 00    | -
Instances       | 00    | -

Figure 2. A class model of case study generated by SBVR2UML

A screenshot of the class model generated from the extracted object-oriented information of the input case study is shown in Figure 2. There were some synonyms for the used classes, such as Item and Loan_Item, and Section and Subject_Section. Our system keeps only one of the similar classes. Here, customer and member are also synonyms, but our system is not able to handle such similarities. There is only one wrong class, Member_Number, as it is an attribute. There are two incorrect associations: "Library support facility" is not an association, and "Library made up of Subject_sections" is an aggregation but was classified as an association.

5 Evaluation

We have carried out a performance evaluation to assess the accuracy of the SBVR2UML tool. We used an evaluation methodology for the performance evaluation of NLP tools proposed by Hirschman and Thompson [16]. To evaluate the results of SBVR2UML, each outcome (class names, attribute names, method names, associations, multiplicity, generalizations, aggregations, and instance names) of SBVR2UML's output was matched against the expert's opinion (Nsample), i.e. the sample solution. An outcome that was accurately classified into its respective category was declared correct (Ncorrect), otherwise incorrect (Nincorrect). Additionally, information that was not extracted (missed) by the tool but was given in the human expert's opinion (Nsample) was categorized as missing information (Nmissing). The calculated recall and precision values of the solved case study are shown in Table II.

Table II - SBVR2UML Evaluation Results

Example | Nsample | Ncorrect | Nincorrect | Nmissing | Rec%  | Prec%
--------|---------|----------|------------|----------|-------|------
Results | 40      | 37       | 2          | 1        | 92.50 | 94.87
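
The text does not spell out the formulas, so the following Python sketch assumes the usual definitions: recall as Ncorrect divided by Nsample, and precision as Ncorrect divided by the sum of Ncorrect and Nincorrect. Under that assumption the counts in Table II reproduce the reported percentages. The function names are chosen for illustration.

```python
def recall(n_correct, n_sample):
    """Recall in percent: share of the sample solution that was found."""
    return 100.0 * n_correct / n_sample


def precision(n_correct, n_incorrect):
    """Precision in percent: share of the extracted items that were right."""
    return 100.0 * n_correct / (n_correct + n_incorrect)


# Counts taken from Table II.
print(round(recall(37, 40), 2))     # 92.5
print(round(precision(37, 2), 2))   # 94.87
```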

The average recall for the English requirement specification is 92.5%, while the average precision is 94.87%. These results are very encouraging for future enhancements. We have also compared the results of SBVR2UML with other available tools that can perform automated analysis of NL requirement specifications. The recall value was not available for some of the tools. We have used the available recall and precision values of the tools for comparison (see Table III).

Table III - A Comparison of Performance Evaluation: SBVR2UML vs other tools

NL Tools for Class Modelling  | Recall | Precision
------------------------------|--------|----------
CM-Builder (Harmain, 2003)    | 73.00% | 66.00%
GOOAL (Perez-Gonzalez, 2002)  | -      | 78.00%
NL-OOML (Anandha, 2006)       | -      | 82.00%
LIDA (Overmyer, 2001)         | 71.32% | 63.17%
SBVR2UML                      | 92.50% | 94.87%

Here, we can note that the accuracy of the other NL tools used for information extraction and object-oriented analysis is well below that of SBVR2UML. Moreover, the functionalities of the various tools (whether each is available and, if so, whether it is automated or user-driven) are also compared with SBVR2UML, as shown in Table IV.

Table IV: Comparison of SBVR2UML with other tools

Support        | CM-Builder | LIDA | GOOAL   | NL-OOML | SBVR2UML
---------------|------------|------|---------|---------|---------
Classes        | Yes        | User | Yes     | Yes     | Yes
Attributes     | Yes        | User | Yes     | Yes     | Yes
Methods        | No         | User | Yes     | Yes     | Yes
Associations   | Yes        | User | Semi-NL | No      | Yes
Multiplicity   | Yes        | User | No      | No      | Yes
Aggregation    | No         | No   | No      | No      | Yes
Generalization | No         | No   | No      | No      | Yes
Instances      | No         | No   | No      | No      | Yes

Table IV shows that, besides SBVR2UML, there are very few tools that can extract information such as multiplicity, aggregations, generalizations, and instances from NL requirements. Thus, the results of this initial performance evaluation are very encouraging and support both the approach adopted in this paper and the potential of this technology in general.

6 Conclusions

The primary objective of the paper was to address the ambiguous nature of natural languages (such as English) by generating a controlled representation of English so that the accuracy of machine processing can be improved. To address this challenge, we have presented an NL-based automated approach that parses English software requirements specifications and generates a controlled representation using SBVR. Automated object-oriented analysis of SBVR specifications of software requirements using SBVR2UML provides higher accuracy than other available NL-based tools. Besides better accuracy, SBVR has also made it possible to extract OO information such as association multiplicity, aggregations, generalizations, and instances, which other NL-based tools cannot process and extract. Some non-functional requirements in the case study, such as "If the membership is still valid and the number of items on loan less than 8, the book bar code is read" and "If the item can be issued (e.g. not reserved) the item is stamped and then issued.", are not part of the output class model. These are essentially constraints, and our future work is to also generate Object Constraint Language (OCL) for these natural language constraints.

7 References

[1] Bryant, B.R., Lee, B.S., et al. 2008. From Natural Language Requirements to Executable Models of Software Components. In Workshop on S. E. for Embedded Systems: 51-58.
[2] Ilieva, M.G., Ormandjieva, O. 2005. Automatic Transition of Natural Language Software Requirements Specification into Formal Presentation. In Proc. of Natural Language Processing and Information Systems, LNCS 3513/2005: 427-434.
[3] Mich, L. 1996. NL-OOPS: from natural language to object oriented requirements using the natural language processing system LOLITA. Natural Language Engineering, 2(2): 167-181.
[4] Delisle, S., Barker, K., Biskri, I. 1998. Object-Oriented Analysis: Getting Help from Robust Computational Linguistic Tools. 4th International Conference on Applications of Natural Language to Information Systems, Klagenfurt, Austria: 167-172.
[5] Börstler, J. 1999. User-Centered Requirements Engineering in RECORD - An Overview. Nordic Workshop on Programming Environment Research NWPER'96, Aalborg, Denmark: 149-156.
[6] Overmyer, S.V., Rambow, O. 2001. Conceptual Modeling through Linguistics Analysis Using LIDA. 23rd International Conference on Software Engineering, July 2001.
[7] Perez-Gonzalez, H.G., Kalita, J.K. 2002. GOOAL: A Graphic Object Oriented Analysis Laboratory. 17th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA '02), NY, USA: 38-39.
[8] Harmain, H.M., Gaizauskas, R. 2003. CM-Builder: A Natural Language-Based CASE Tool for Object-Oriented Analysis. Automated Software Engineering, 10(2): 157-181.
[9] Oliveira, A., Seco, N., Gomes, P. 2006. A CBR Approach to Text to Class Diagram Translation. TCBR Workshop at the 8th European Conference on Case-Based Reasoning, Turkey.
[10] Anandha, G.S., Uma, G.V. 2006. Automatic Construction of Object Oriented Design Models [UML Diagrams] from Natural Language Requirements Specification. PRICAI 2006: Trends in Artificial Intelligence, LNCS 4099/2006: 1155-1159.
[11] Bajwa, I.S., Samad, A., Mumtaz, S. 2009. Object Oriented Software Modeling Using NLP Based Knowledge Extraction. European Journal of Scientific Research, 35(01): 22-33.
[12] OMG. 2008. Semantics of Business Vocabulary and Rules (SBVR) Standard v.1.0. Object Management Group. Available: http://www.omg.org/spec/SBVR/1.0/
[13] Toutanova, K., Manning, C.D. 2000. Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger. In Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora: 63-70.
[14] Li, K., Dewar, R.G., Pooley, R.J. 2005. Object-Oriented Analysis Using Natural Language Processing. Linguistic Analysis (2005).
[15] Callan, R.E. 1994. Building Object-Oriented Systems: An Introduction from Concepts to Implementation in C++. Computational Mechanics Publications, 1994.
[16] Hirschman, L., Thompson, H.S. 1995. Chapter 13 Evaluation: Overview of Evaluation in Speech and Natural Language Processing. In Survey of the State of the Art in Human Language Technology.
[17] Berry, M.D. 2008. Ambiguity in Natural Language Requirements Documents. In Innovations for Requirement Analysis. From Stakeholders' Needs to Formal Designs, LNCS 5320/2008: 1-7.
[18] Ormandjieva, O., Hussain, I., Kosseim, L. 2007. Toward a Text Classification System for the Quality Assessment of Software Requirements Written in Natural Language. In 4th International Workshop on Software Quality Assurance (SOQUA '07): 39-45.
[19] Denger, C., Berry, D.M., Kamsties, E. 2003. Higher Quality Requirements Specifications through Natural Language Patterns. In Proceedings of IEEE International Conference on Software-Science, Technology & Engineering (SWSTE '03): 80-85.
[20] OMG. 2007. Unified Modelling Language (UML) Standard version 2.1.2. Object Management Group. Available at: http://www.omg.org/
[21] Bajwa, I.S., Lee, M.G., Bordbar, B. 2011. SBVR Business Rules Generation from Natural Language Specification. In Proceedings of AAAI 2011 Spring Symposium - AI4BA, San Francisco, USA, Mar 2011: 2-8.
[22] Bajwa, I.S., Naeem, M.A. 2011. On Specifying Requirements Using a Semantically Controlled Representation. In 16th International Conference on Applications of Natural Languages to Information Systems (NLDB 2011), Alicante, Spain, Springer Verlag: 217-220.
[23] Bajwa, I.S., Naeem, M.A., Ali, A., Ali, S. 2011. A Controlled Natural Language Interface to Class Models. In 13th International Conference on Enterprise Information Systems (ICEIS 2011): 102-110.
[24] Raj, A., Prabhakar, T.V., Hendryx, S. 2008. Transformation of SBVR Business Design to UML Models. Proceedings of the 1st India Software Engineering Conference, February 19-22, 2008.