Restructuring Natural Language Text to Elicit Software Requirements

Proceedings of the International Conference on Cognition and Recognition

Restructuring Natural Language Text to Elicit Software Requirements G.S. Anandha Mala 1, J. Jayaradika2 and G.V. Uma3 Department of Computer Science and Engineering College of Engineering, Anna University, Guindy Chennai, Tamil Nadu, India-600025. 1 [email protected], 2 [email protected], 3 [email protected] Abstract Application of natural language understanding to requirements gathering remains a field that has only limited explorations so far. This paper presents an approach to extract the elements of the required system by subjecting its problem statement to object oriented analysis. This approach starts with assigning the parts of speech tags to each word in the given input document. The text thus tagged is restructured into a normalised subject-verb -object form. Further, to resolve the ambiguity posed by the pronouns, the pronoun resolutions are performed before normalising the text. Finally the elements of the object-oriented system namely the classes, the attributes, methods and relations hips between the classes, the use-cases and actors are identified by mapping the ‘parts of speech- tagged’ words of the natural language text onto the Object Oriented Modelling Language elements using mapping rules. 1. INTRODUCTION Requirements capture is the initial phase in any software lifecycle that is expected to provide a holistic understanding of the systems scope, which describes, at a very high level what needs to be accomplished. This phase has its own challenges namely the evolving requirements, conflicting requirements, inability to track the changes to requirements and translation of the implied requirements etc. It is often the case that users find it difficult to articulate their requirements. In addition to the human-centered nature of this phase, the challenges mentioned above also prevent the requirements gathering from being automated. As already several attempts have been made to semi automate this process, this is yet another approach that does automatic pronoun resolution additionally. The paper begins with a review of the advances in the field of requirements engineering followed by the natural language understanding techniques that are applied in this approach in section 2. The proposed methodology is explained in section 3. Our implementation and results are explained in section 4. The conclusion and future work are contained in Section 5. 2. RELATED WORK Although it has been proven that Natural Language processing with holistic objectives is a very complex, it is possible to extract sufficient meaning from NL sentences to produce reliable models. The first relevant published technique attempting to produce a systematic procedure to produce design models from NL requirements was Abbot [11]. Abbot suggested a non automatic methodology that only produces static analysis and design products obtained by an informal technique requiring high participation with that of users’ for decisions where as the technique proposed in this paper is formal in the sense that it uses proven algorithms and does not require human support. Methods to bring out a justified relationship between the naturallanguage structures and 00 concept is proposed by Sylvain [3] who show that computational linguistic tools, are appropriate for preliminary computer assisted OO analysis. The first draft obtained from such a text analysis system can then be used as the foundation of a more elaborate OO model. But these are all the formal approaches that have duly to be assisted by humans. Leite [1] presents 5 models for the Natural language requirements modelling. Using the appropriate modelling techniques, information can be obtained and analysed to discover the requirements; these are decomposed into system building blocks to model the static and dynamic aspects of the developed system. Paul Rayson, Roger Garside and Pete Sawyer in their Revere [2] make use of a lexicon to disambiguate the word senses thus obtaining a summary of requirements from a natural language text but do not attempt to model the system as it is done in our methodology. Liwu Li [8] also presents a semi-automatic approach to translate a use case to a sequence diagram, which can be easily used in software design. It needs to normalize a use case manually. Overmyer [5], also present only a complete interactive 521

Restructuring Natural Language Text to Elicit Software Requirements

methodology and prototype. It is a usable tool to provide linguistic assistance to produce a subset of UML results. However, the text analysis remains in good part a manual process. Moreno [13] proposes an approach to define a justified relationship between the natural language structures and 00 concepts. They translated the utility language’s linguistic structures into mathematical representations and the 00 models into mathematical structures, and established equivalences between the mathematical representations of these translations. Liu [9] present an approach with a set of artifacts and methodologies, and to automate the transition from requirement to detail designs. Use cases are applied as the method to capture and record requirements. All the use cas es are formalized by a use case template. They apply robustness analysis to bridge the gap between a use case and its realization. And though this method eventually ends up with the identification of the system building blocks, it starts the processing from the ‘use-case specifications’-for, which they have proposed a specification language. Unlike this, the approach presented in this paper analyses the ad-hoc natural language specification. 3. THE PROPOSED SYSTEM In all the earlier work mentioned, the requirement elicitations are not fully automatic. User assistance is required in pronoun resolution. The proposed methodology includes the automatic reference resolution, which eliminates the user intervention as in the previous works. The processes that are performed by the system are shown in the figure 1. 3.1 Sentence Splitting The given input problem statement is split into sentences to ease the further processing. 3.2. Tagging Each sentence of the input text is subjected to tagging in order to get the parts of speech marker for every word. Tagging of the words is necessary to chunk the words. Also the words that are candidates for classes, attributes, methods, use cases and actors have to be chosen depending upon their tags. 3.3 Phrase Chunking The noun and the verb phrases are identified based on simple phrasal grammars. 3.4 Reference Resolution When normalising the sentences into S-V-O pattern, the subjects and objects sometimes happen to be pronouns. In that case, they have to be resolved to their res pective noun phrases. Input problem statement

REQUIREMENTS ELICITOR Split Sentences Sentence Splitter

Tagged Text Tagger

Chunker Phrase chunked text

NL-OOML Mapper

Normaliser S-V-O triplets

Reference Resolver Resolved References

Actors, Use cases & Class names, Methods, Attributes, Relationships Fig. 1: System Architecture

522


3.5 Normalisation of Nl Structure The text has to be simplified into the Subject-Verb-Object to ease the task of mapping the words onto the OO system constituents. This is achieved by applying handcrafted rules as given below. 1. When the Subject precedes an equal parallel structure, split into two (or more) simpler sentences. i.e., convert ‘‘S-V1-O1V2-O2” , to “ S-V1-O1” “ S-V2- O2” . 2. When both Subject and Verb precede an equal parallel structure, split into two (or more) simpler sentences. i.e., convert ‘‘S-V-O1-, /and-O2-, /and,-O3” to ‘‘S-V-O1” “ S-V-O2” “ S-V-O3” . 3. When Verb lead equal parallel structure, split using ‘‘(Subject)-V-O 1-, /and-O2-, /and,- O3…’’ to ‘‘(Subject)-V-O1” “VO2” “V-O3” 3.6 Mapping the Natural Language Text Onto the Object Oriented Modelling Language Simple rule based approach is followed for the identification of object oriented system elements. 3.6.1 Rules to Generate Class Diagram Constituents List of rules are given below: 1: Translating Nouns to Classes. 2: Translating Noun -Noun to Class-Property according to position. 3: Translating Subject (S) – Verb (V) – Object (O) structure to a class diagram with the Subject and Object as classes both sharing the verb as a candidate method. 4: Translating the lexical Verb of a non-personal noun to a Method of this noun. 5: Translating the lexical Verb of a personal noun to a usecase (or part of a use case) linked with an actor defined by this noun. 6: Matching a Noun to a Personal Pronoun as the nouns of previous sentence. 7: For ‘‘S-V-O-O” structure, convert to ‘‘S-V-OO” , i.e. treat compound nouns as one entity. 3.6.2 Rules for Mapping to Relationships After identifying the nouns and verbs and creating a simplified, canonical noun-verb-noun structure we are in a position to map this information onto classes and their relationships with each other. Step 1 Step 2

: If the sentence uses the passive voice, convert into active voice : Translating the lexical Verb to a Method of the classes formed from the two Nouns either side of this Verb, so that a relationship can be created between the classes.

4. IMPLEMENTATION AND RESULTS The system was named as ‘Requirements Elicitor’. It was implemented using JAVA programming language. The ‘Requirements Elicitor’ was validated using 10 problem samples each of around 500 lines. The result produced by the system was compared with that of the human output. The human outputs were the results that were obtained by conducting the noun verb analysis on the text. It was considered as the baseline and taken as expert judgment. The requirement elicitor does not miss to identify any of the classes and methods since all candidate nouns and verbs are identified to be classes and methods respectively. But approximately 12.4 % of additional classes and 7.4 % of additional methods are identified in all the samples taken. The additionally identified candidates are those that will usually be removed by human by intuition. Since the system lacks this knowledge, they were also listed as classes. The missed out methods occur only if the tagger assigns a wrong tag to the word. Also the system perfectly identifies all the attributes, usecases and actors with out any additional, missed or miss assignments. 5. CONCLUSIONS AND FUTURE WORK The project presents an approach to restructure the natural language text into a modelling language in order to elicit the stated requirements of a system. Further the work can be extended for identifying the different modules present in the requirement specification by properly segmenting the input text, which will help us to identify the packages. The segmented text can be processed as said earlier, so that the usecases and classes involved in each package can be identified and interface between the 523

Restructuring Natural Language Text to Elicit Software Requirements

packages also can be identified. All these above activities are the initial steps taken by the software development groups with the aid of existing tools, which are automated by our ‘Requirements Elicitor’. The deficiencies in the tagger and the reference resolver can be overcome by building a knowledge base which can also improve the effectiveness of generation of the system elements. 6. REFERENCES [1] Leite, J.C.S, Hadad, G., Doorn, J., Kaplan, G., (2000) ‘A Scenario Construction Process’, Requirements Engineering Journal, 5(1), Springer Verlag, pp. 38 -61. [2] Sawyer P. Rayson P. Garside R. (2002) ‘REVERE: Support for requirements synthesis from documents’. [3] Sylvain Delisle, Ken Barker, Ismaïl Biskri. (1999) ’Object oriented analysis: Getting help from robust computational linguistic tools ‘., [4] Niba, L.C.: (2002) ‘The NIBA workflow: From textual requirements specifications to UML schemata’. Proceedings of International Conference on Software & Systems Engineering and their Applications, ICSSEA. [5] Overmyer ScottP., Lavoie.B.Rambow, (2001) ’Conceptual modelling through linguistic analysis using LIDA’. Proceedings of the 23rd International Conference on Software Engineering, ICSE 2001, Toronto. [6] Alistair A.R. Cockburn.(1992) ’ Using Natural Language as a Metaphoric Base for Object -Oriented Modelling and Programming.’ IBM Technical Report TR-36.0002. [7] Boyd, N. (1999) ‘Using Natural Language in Software Development’, Journal of Object Oriented Programming-JOOP, 11(9), pp 45-55. [8] Liwu Li, (1999) ‘A semi-automatic approach to translating use cases to sequence diagrams,’ Proceedings of Technology of Object -Oriented Languages and Systems.. [9] Liu, D Kalaivani Subramaniam, Behrouz H. Far, Armin Eberlein.(2003) ‘Autom ating transition from use cases to class model’. MSc Thesis, University of Calgary, Calgary. [10] Dong Liu, Kalaivani Subramaniam, Armin Eberlein, and Behrouz H. Far, (2004) ‘Natural Language Requirements Analysis and Class Model Generation Using UCDA’. [11] Abbot.R.J.(1983) ’ Program Design by informal English descriptions’. Communications of the ACM, November 1983. [12] Robertson S.(2001) ‘Requirements trawling: techniques for discovering requirements’. International Journal on HumanComputer Studies Vol. 55, Pp. 405-421. [13] Juristo, N., Moreno, A., López, M., (2000) ‘How to Use Linguistic Instruments for Object- Oriented Analysis’, IEEE Software, 17(3), pp. 80-89. [14] Brill E.’ (1992) ‘A simple rule-based part -of-speech tagger’. Proceedings of Third Conference on Applied Natural Language Processing. Trento, Italy. 1992. [15] Daniel Hardt, (2004) Dynamic Centering, Proceedings of the Workshop on Reference Resolution and its Applications: ACL. [16] Mitkov, R. (1998) ‘Robust pronoun resolution with limited knowledge, In: Proceedings of the 18.th International Conference on Computational Linguistics (COLING'98)/ACL'98 Conference Montreal, Canada 869-875. [17] Mitkov, Ruslan; Evans, Richard and Orasan, Constantin..(2002) ‘A new, fully automatic version of Mitkov’s knowledgepoor pronoun resolution system’. In Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing, Mexico-City, Mexico. APPENDIX: SAMPLE TEXT [ TICKET RESERVATION MODULE OF AIRLINE TICKET RES ERVATION SYSTEM ] & USECASE AND CLASS DIAGRAM GENERATED The passenger has the ssn, the name, street, city, state, telephone, details and age. Whenever the passenger wants ticket, the receptionist meets the passenger. The receptionist gets the passenger-details. The passenger gives details and requests ticket. The aircraft has number, name, no-of-coaches and details. Every route has the route number, the source, the destination and the intermediate stops. A ticket includes ticket number, the aircraft number, the journey date, the departure time, the passenger age, the coach number, the seat-berth number, the total amount and the reservation charges. The seat has coach, seat -berth, class, rate and status. The receptionist receives the request. The receptionis t checks the aircraft. He checks the route then. If the source and the destination of the request fall on the same route, the receptionist checks the seat. If the seat is available, the

524


passenger gets the ticket. The receptionist issues the ticket. He stores the ticket in the database. He blocks the seat. If the passenger requests ticket and he cannot get one, he can reserve. He can cancel the ticket also. If the ticket is cancelled, the seat may be freed. If there is a reservation, the seat is allotted to that passenger. The seat is retained ‘blocked’. If there is no reservation of the seat, the receptionist frees the seat. Once the passenger gets a ticket, a history is created in the database. The history is a report on the passenger. The history includes the ssn and preferences. As the history is important, it is stored in the database. A receptionist observes the history. Every passenger possesses a history. If the passenger does not contact for a long time, the history is deleted from the database. Otherwise the history is retained in database. On observing the history, the receptionist provides the preferences to the passenger.

525