20th International Symposium on Software Reliability Engineering

Automated Identification of LTL Patterns in Natural Language Requirements

Allen P. Nikora
Jet Propulsion Laboratory, California Institute of Technology
Pasadena, CA 91109-8099
[email protected]

Galen Balcom
California State University
Fresno, CA 93740
[email protected]

Abstract—Analyzing requirements for consistency and checking them for correctness can require significant effort, particularly if they have not been maintained with a requirements management tool (e.g., DOORS) or specified in a machine-readable notation. By restricting the number of requirements being analyzed, fewer opportunities exist for introducing errors into the analysis. This can be accomplished by subsetting the requirements and analyzing one subset at a time. Previous work showed that simple natural language processing and machine learning techniques can be used to identify temporal requirements within a set of natural language requirements. This paper builds on that work, detailing our results in applying these techniques to a set of natural-language temporal requirements taken from a current JPL mission to determine whether each requirement is one of the most frequently occurring types of temporal requirement. The ability to distinguish between different LTL patterns in natural-language requirements raises the possibility of automating the transformation of natural-language temporal requirements into LTL expressions. This would allow automated consistency checking and tracing of natural-language temporal requirements. Since correctness properties are often specified as LTL expressions, this would also provide a set of correctness properties against which abstract models of the system could be verified.

Keywords - requirements analysis; temporal requirements; machine learning; natural language processing

I. INTRODUCTION

Ensuring that higher-level requirements are accurately reflected in lower-level requirements, design elements, and implemented components is a key aspect of producing reliable software. A number of techniques and tools focusing on different aspects of this problem have been developed. For example, a model checker such as SPIN [1] can be used to assure that a specification satisfies the constraints imposed by a higher-level set of requirements. However, a system's requirements and/or design must be represented in formal specification languages to use these techniques. Although these techniques can be quite effective in identifying requirements defects, they still have a low level of acceptance in software development organizations. A significant amount of effort can be required to learn:
- a formal specification language,
- effective use of the analysis tools, and
- the skill of abstracting away unnecessary detail when specifying requirements in a formal language.
At the same time, there is increasing pressure to minimize the cost and schedule time for software development efforts, giving software developers even less time to learn effective use of formal techniques. Consequently, the great majority of software requirements we have seen continue to be specified in natural language. Any effective technique for analyzing requirements, then, must be able to deal with natural language requirements text.
This paper describes work in one specific area, that of developing subsets of related requirements. If requirements can be classified into distinct types (e.g., different types of temporal requirements), developers and assurance personnel would be able to analyze the different categories separately, reducing the amount of effort required to conduct an analysis and potentially increasing its accuracy. Results obtained to date indicate that accurate classifiers for categorizing requirements may be developed using simple representations of the text and structure of natural language requirements. Once a collection of requirements has been categorized, each category may be analyzed separately, potentially reducing the effort required and increasing the accuracy of the analysis.
The remainder of the paper is organized as follows: Section 2 describes work related to identifying specific types of requirements (e.g., temporal) specified in natural-language text. Section 3 describes an approach to identifying different types of temporal requirements in natural language text, and presents the results of our work. Section 4 discusses the results of this work and lays out directions for future research in this area.


II. RELATED WORK

Hayes et al. have developed several requirements tracing methods and tools for NASA, such as the keyword-matching-based SFEP for the Software Automated Verification and Validation and Analysis System (SAVVAS) [2] and an Information Retrieval (IR) based approach [3, 4]. To determine whether a higher-level requirement is related to a lower-level requirement, the IR-based requirements tracing method counts the number of common terms and their frequency in both requirements, then determines their similarity based on the relative frequencies of common terms. It has been applied to a small project and has shown improvement over manual tracing. Although this technique can be used to identify related requirements within a subset of requirements of a specific type (e.g., temporal requirements), its intended use is not to identify requirements of a specific type.
Cobleigh et al. have developed Propel, a technique and toolset for elucidating rigorous system properties in natural language [5, 6, 7, 8]. The Propel website [8] states that "The Propel approach provides templates that explicitly capture these [subtle but important, and often unconsidered] details for property patterns that commonly occur in the properties that are created for model checking and other types of analysis. With Propel, users are shown the evolving property specification in both 'disciplined' English [a restricted subset of natural language] sentences and graphical finite-state automata (FSA), allowing the specifier to easily move between these two views as they develop their properties." Although Propel specifications are not currently translated to LTL [1], Propel's developers plan this as a future capability.
Although the following tool does not use natural language, it is included here because of its intent to simplify the task of developing properties for system verification. The timeline editor [1], developed by Margaret Smith, allows users to draw system properties as timelines, eliminating the need to write LTL expressions. The properties can be visualized as finite state automata, and are converted to "never claim" automata (finite state automata specifying behavior that the system should never exhibit) used by the SPIN model checker in its verifications.
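To make the notion of a never claim concrete, the following is an illustrative example of ours, not taken from [1]: a response property expressed in LTL, together with its negation, which is what a never-claim automaton encodes.

% A response property and its negation (illustrative example).
% SPIN verifies \varphi by searching for an accepting run of an
% automaton for \neg\varphi; finding one yields a counterexample.
\[
  \varphi \;=\; \Box\left(\mathit{request} \rightarrow \Diamond\,\mathit{response}\right)
\]
\[
  \neg\varphi \;=\; \Diamond\left(\mathit{request} \wedge \Box\,\neg\mathit{response}\right)
\]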

III. IDENTIFYING LTL PATTERNS IN NATURAL LANGUAGE REQUIREMENTS

A. Problem Description

Previous work noted that the author's experience in implementing space mission software systems indicates that temporal requirements are the most problematic type of requirement to verify, and that accurately identifying temporal requirements during the specification phase may significantly reduce the effort of analyzing them for consistency, since engineers conducting the analysis will not have to examine the complete body of requirements [9]. If the analysis of these requirements is to be automated, the natural language text must be transformed into a machine-readable notation. The goal we have been following is to transform natural language temporal requirements into LTL expressions. The first step, discriminating between temporal and non-temporal natural language requirements, was demonstrated in previous work [9]. Progress on the second step, assigning the appropriate LTL pattern to each temporal requirement, is reported in the remainder of this paper. Applying semantic techniques to populate the LTL patterns identified in the second step with information from the natural language text of the requirement remains future work.

B. LTL Patterns in Temporal Requirements

The first step in assigning an LTL expression type to a natural language temporal requirement was to select a set of LTL patterns likely to include each different type of temporal requirement that would be encountered in the requirements documents being analyzed. The LTL patterns developed by Dwyer et al. [10, 11] were selected to categorize the temporal requirements being analyzed. To develop a training set to which machine learning techniques might be applied, the temporal requirements identified in previous work [9] were assigned to one of the LTL patterns defined by Dwyer et al. Since there are 55 possible LTL patterns, reducing the number of possible choices while still retaining a useful number of categories was a practical consideration. The cumulative frequencies for the most frequently observed LTL patterns in the requirements that were analyzed are shown in Fig. 1. As a cut-off point, we chose only those LTL patterns accounting for at least 80% of the total number of LTL patterns observed in the requirements. In Fig. 1, each "LTL Pattern Name" is described in Table I, which in turn refers to one of the 55 LTL patterns documented on the SAnToS Laboratory website [11]. It is interesting to note that only a small fraction of the 55 LTL patterns account for over 80% of the temporal requirements that were analyzed. This is at least consistent with anecdotal evidence that only a small part of the complete set of 55 LTL patterns described in [11] is used in specifying real systems. In Table I, "U" is the strong until operator and "W" is the weak until operator; the relationship between "U" and "W" is shown in the sketch at the end of this subsection. The observation that a relatively small number of patterns accounts for a significant majority of the temporal requirements analyzed indicates that it may be possible to use machine learning techniques to develop relatively simple and stable learning models that will discriminate between the most frequently occurring types of temporal requirement. The next sections discuss experiments conducted with a number of machine learning techniques.
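The following brief sketch states the standard LTL identities relating the two until operators, and, as an example of the catalog entries Table I refers to, the mapping for the ER pattern ("Existence - P becomes true before R") as documented on the SAnToS site [11]:

% Weak until does not obligate its right operand to ever hold:
\[
  p \,\mathcal{W}\, q \;\equiv\; (p \,\mathcal{U}\, q) \vee \Box\, p,
  \qquad
  p \,\mathcal{U}\, q \;\equiv\; (p \,\mathcal{W}\, q) \wedge \Diamond\, q
\]
% Example pattern mapping (ER: "Existence - P becomes true before R"):
\[
  \Diamond\, R \;\rightarrow\; \big(\neg R \,\mathcal{U}\, (P \wedge \neg R)\big)
\]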

[Figure 1 here: bar chart titled "LTL Patterns: Cumulative Frequency for Most Frequently Used"; x-axis "LTL Pattern Name" (ER, UQR, AQR, EQR, UQ, RR, EQ, PG), y-axis "Cumulative Frequency" (0.00-1.00), with plotted values rising from 0.232 through 0.450, 0.553, 0.623, 0.682, 0.738, and 0.791 to 0.834.]

Figure 1 - Cumulative Frequencies for Individual LTL Patterns

TABLE I. EIGHT MOST FREQUENTLY OCCURRING LTL PATTERNS IN TEMPORAL REQUIREMENTS

ER   Existence – P becomes true before R
UQR  Universality – P is true between Q and R
AQR  Absence – P is false between Q and R
EQR  Existence – P becomes true between Q and R
UQ   Universality – P is true after Q
RR   Response – S responds to P before R
EQ   Existence – P becomes true after Q
PG   Precedence – S precedes P globally

C. Data Representation

For the work reported in [9], two representations of the natural language requirements presented to the machine learners were developed. The first representation used only the text of a natural language requirement. For the second representation, each requirement was represented by the text of the natural language requirement as well as information about the part of speech (PoS) for each word in that requirement. For this work, we developed the same two types of representations. The temporal requirements were drawn from a set of approximately 7500 requirements for a current space mission managed by the Jet Propulsion Laboratory (JPL). We identified temporal requirements by reading the complete body of requirements, which yielded 526 requirements specifically concerned with the timing and ordering of events. Of these temporal requirements, 195 were used to develop a training set to which machine learning techniques would be applied. The number of requirements representing each of the LTL patterns identified in Table I is given below:

AQR: 5    PG: 8    EQ: 14    RR: 12
EQR: 15   UQ: 14   ER: 65    UQR: 38

The remaining 14 temporal requirements in the training set were associated with other LTL patterns. The representations using PoS information were developed as follows:
1. The Trigrams 'n' Tags (TnT) parts-of-speech tagger [12, 13] was applied to each word in each requirement, as described in [9].
2. Frequently used words were removed from each requirement. The stop list found at http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words was modified by removing words specifying event ordering and timing; words remaining in the modified list were removed from each requirement. The words that were removed from the original stop list are: after, before, beforehand, never, next, not, until, when, whenever.
3. For each remaining word in a requirement, the PoS tag produced for that word by TnT was appended to that word. For example, applying TnT to the text "The star scanner shall be calibrated no earlier than 30 days after launch and no later than 40 days after launch" produces a string of PoS tags which, when concatenated to the corresponding words of the original string, produces the symbol string "The_DT star_NN scanner_NN shall_MD be_VB calibrated_VBN no_RB earlier_RBR than_IN 30_CD days_NNS after_IN launch_NN and_CC no_RB later_RBR than_IN 40_CD days_NNS after_IN launch_NN". The meanings of the tags in this string are as follows [14]. CC: coordinating conjunction; CD: cardinal number; DT: determiner; IN: subordinating conjunction; MD: modal verb; NN: noun, singular or mass; NNS: noun, plural; RB: adverb; RBR: adverb, comparative; VB: verb, base form; VBN: past participle.
4. Each temporal requirement was formatted as a text string in the Attribute-Relation File Format (ARFF) [15, 16] for the WEKA data-mining tool [17]. An example ARFF file is shown below.

@RELATION RequirementType
@ATTRIBUTE SYMBOL_STR STRING
@DATA
"electrical interfaces passing cable cutter separation devices shall deadfaced prior actuation device signal power interfaces shall unpowered"
"orbiter design shall not preclude tcm hours prior moi"
"orbiter shall capability propulsively establish primary science orbit after aerobraking apoapsis altitude km"
"orbiter shall achieve primary science orbit days prior start solar conjunction"
… …

5. WEKA's capabilities were used to transform the text string into three different representations: symbol counts, symbol frequencies, and TF (Term Frequency) x IDF (Inverse Document Frequency) values [15]. A partial representation of one of the resulting ARFF files is shown below.

@RELATION RequirementType
@attribute Freq_actuation_nn numeric
@attribute Freq_cable_nn numeric
@attribute Freq_cutter_jj numeric
@attribute Freq_deadfaced_vbn numeric
… …
@attribute Freq_dscc_jj numeric
@attribute Freq_meter_nn numeric
@attribute Freq_outage_nn numeric
@attribute Freq_telemetry_jj numeric
@attribute class {AQR,OTHR_LTL}
@data
5.273,5.273,5.273,5.273,5.273,5.273,5.273,5.273,5.273,3.48124,2.788093,3.886705,0.041891,3.48124,5.273,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,…,0,0,0,0,0,0,0,0,0,0,0,0,0,OTHR_LTL
… …

Each record in the "@data" section is a vectorized representation of the text for one requirement. In the example above, the first item in the first data record is the value of TF x IDF for the symbol "actuation_nn", the second item is the value of TF x IDF for the symbol "cable_nn", and so forth. For the other two representations, these values are symbol counts and symbol frequencies. The class of the requirement is appended to the end of the vector. The example above distinguishes between the "AQR" LTL pattern and all other LTL patterns ("OTHR_LTL").
6. We also created representations of the temporal requirements text without the PoS information by omitting steps 1 and 3 above.
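As an illustration of steps 1-5, the following is a minimal sketch in Python. It is not the tooling used in this work: NLTK's tagger stands in for TnT, scikit-learn's TfidfVectorizer stands in for WEKA's string-to-vector transformation, and the stop list is abbreviated.

import nltk
from sklearn.feature_extraction.text import TfidfVectorizer

# One-time tagger model download; the resource name varies slightly
# across NLTK versions.
nltk.download("averaged_perceptron_tagger", quiet=True)

# Abbreviated stand-in for the modified stop list; timing and ordering
# words such as "after" and "before" are deliberately NOT stopped (step 2).
STOP_WORDS = {"the", "shall", "be", "no", "than", "and"}

def to_symbols(requirement):
    """Steps 1-3: tag each word, drop stop words, append the PoS tag."""
    tagged = nltk.pos_tag(requirement.split())
    return " ".join("%s_%s" % (word, tag) for word, tag in tagged
                    if word.lower() not in STOP_WORDS)

requirements = [
    "The star scanner shall be calibrated no earlier than 30 days after launch",
    "The orbiter shall achieve the primary science orbit before solar conjunction",
]
symbols = [to_symbols(r) for r in requirements]

# Step 5: vectorize each symbol string as TF x IDF values; the token
# pattern keeps the word_TAG symbols intact.
vectorizer = TfidfVectorizer(lowercase=False, token_pattern=r"\S+")
matrix = vectorizer.fit_transform(symbols)
print(vectorizer.get_feature_names_out())
print(matrix.toarray())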

D. Approach

We took a two-phase approach to the investigation of machine learning systems for classifying natural language temporal requirements. First, we compared the performance of classifiers applied to text plus parts-of-speech tags with that of classifiers applied to text only. Text plus parts-of-speech information is referred to as "symbols"; text only is referred to as "words". We constructed the three following representations for symbols and words:
- word/symbol count,
- word/symbol frequency, and
- TF x IDF.
We constructed 8 training data sets for each of these representations, one for each of the LTL patterns identified in Table I. We chose to develop a learning model for each of the 8 classes rather than develop models that could differentiate between more than two classes. The resulting models are simpler and may have less noise associated with them than more complex models. We then developed six variants of each training set by varying the number of included attributes. In this case, an attribute is the count, frequency, or TF x IDF value for one of the unique symbols or words in the set of requirements. The attributes for each variant were chosen by applying WEKA's implementation of the Information Gain (InfoGain) attribute evaluator in conjunction with the Ranker attribute ranking technique [15]. The first variant of the training data included the attributes accounting for the first 70% of the classification merit according to InfoGain, the second included the attributes accounting for the first 75% of the classification merit, and so on to the fifth and sixth variants, which included 90% and 100% of the attributes respectively. 31 classifiers implemented in WEKA were applied to these training sets: AD Tree, Bayes Net, Complement Naïve Bayes, Conjunctive Rule, Decision Stump, Decision Table, Hyper Pipes, IB1, IBk, J48, JRip, Kstar, LMT, LWL, Multilayer Perceptron, Naïve Bayes, Naïve Bayes Multinomial, Naïve Bayes Updateable, NB Tree, Nnge, OneR, PART, Random Forest, Random Tree, RBF Network, Ridor, Simple Logistic, SMO, VFI, Voted Perceptron, and Zero R.
The classifiers were applied to each variant using 10-fold cross-validation, the results of which were used to plot Receiver Operating Characteristic (ROC) curves [15] for each classifier applied to each variant. After identifying the representations of the requirements yielding the "best" results, we performed a more detailed analysis of classifier performance by applying the same 31 classifiers to more detailed representations. Again, we applied the classifiers using 10-fold cross-validation. Specifically, we created more variants for each representation to obtain a more detailed ROC curve for each classifier. We used the InfoGain attribute evaluator and Ranker attribute ranking technique again to select the attributes included in each variant (a sketch of this selection step follows below). The number of attributes included in each variant is shown in Table II. Suppose, for example, that the data set being analyzed is a symbol frequency representation. In that case, the first variant would include only the frequencies for the 70 symbols having the highest cumulative classification merit according to InfoGain, the second variant would include only the frequencies for the 80 symbols having the highest cumulative classification merit, and so forth.
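The attribute-selection step can be approximated outside WEKA. The following sketch uses scikit-learn's mutual_info_classif as a stand-in for the InfoGain merit score (mutual information and information gain coincide for a discrete class attribute), with the cumulative-merit cutoff used to build the first variant; the data here is synthetic.

import numpy as np
from sklearn.feature_selection import mutual_info_classif

def variant_attributes(X, y, merit_fraction=0.70):
    """Rank attributes by merit (the Ranker step) and keep the top-ranked
    ones accounting for `merit_fraction` of the total classification merit."""
    merit = mutual_info_classif(X, y, random_state=0)
    order = np.argsort(merit)[::-1]                 # best attribute first
    cumulative = np.cumsum(merit[order]) / max(merit.sum(), 1e-12)
    n_keep = int(np.searchsorted(cumulative, merit_fraction)) + 1
    return order[:n_keep]

# Synthetic data standing in for a TF x IDF matrix and AQR/OTHR_LTL labels.
rng = np.random.default_rng(0)
X = rng.random((195, 400))
y = rng.integers(0, 2, size=195)
print(variant_attributes(X, y, merit_fraction=0.70))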

TABLE II. NUMBER OF ATTRIBUTES PER TRAINING DATA VARIANT

Variant   Number of Attributes   Variant   Number of Attributes
1         70                     15        225
2         80                     16        250
3         90                     17        300
4         100                    18        350
5         110                    19        400
6         120                    20        450
7         130                    21        500
8         140                    22        550
9         150                    23        600
10        160                    24        650
11        170                    25        700
12        180                    26        750
13        190                    27        800
14        200                    28        850

We also investigated whether the Porter stemming algorithm would have an effect on classifier performance. Prior to computing word or symbol counts, frequencies, or TF x IDF values, the Porter stemming algorithm [18] was applied to the words in the text, after applying the TnT PoS tagger but before concatenating PoS tags to the words. This algorithm replaces variants of the same word with their root form (e.g., "earlier" and "earliest" become "earli").
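A sketch of the stemming step using NLTK's Porter implementation; exact outputs may differ slightly across Porter-stemmer variants.

from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()
for word in ["earlier", "earliest", "calibrated", "interfaces"]:
    # Stemming is applied per word, after tagging but before the PoS
    # tag is concatenated to the word.
    print(word, "->", stemmer.stem(word))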

E. Results

Classifiers are evaluated according to four criteria: pd (probability of detection), pf (probability of false detection, or "false positives"), accuracy, and precision. These criteria are defined with respect to a confusion matrix, as follows:
- probability of detection (pd): a/(a+b)
- probability of false detection (pf): c/(c+d)
- precision: a/(a+c)
- accuracy: (a+d)/(a+b+c+d)
where a, b, c, and d are entries in a confusion matrix:

                       Detected as type "x"   Detected as type "not x"
Really type "x"        a                      b
Really type "not x"    c                      d
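Expressed directly in code, the four criteria follow from the matrix entries as below; this is a small sketch of ours, and the counts shown are hypothetical.

def evaluate(a, b, c, d):
    """pd, pf, precision, and accuracy from confusion-matrix entries."""
    return {
        "pd": a / (a + b),                # probability of detection
        "pf": c / (c + d),                # probability of false detection
        "precision": a / (a + c),         # detections that are really type "x"
        "accuracy": (a + d) / (a + b + c + d),
    }

print(evaluate(a=30, b=8, c=12, d=145))   # hypothetical counts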

In this case, we chose as the best classifier the one whose ROC curve comes closest to the performance ideal: a pd value of 1 and a pf value of 0. Fig. 2 through Fig. 5 below show the ROC curves for the "best" classifiers obtained for each type of representation during the first phase. Fig. 2 and Fig. 3 show the performance of the best classifiers for counts, frequencies, and TF x IDF values using text information only; Fig. 4 and Fig. 5 show this same information for representations using PoS information. It should be noted that in Fig. 4 and Fig. 5, the ROC curves for the symbol counts representations are coincident with the ROC curves for the symbol frequency representations. We see in Fig. 2 through Fig. 5 that the classifiers appear to provide the best performance for the TF x IDF representation of the text: 14 of the 16 ROC curves shown come closer to the upper left-hand corner of the ROC plot than the ROC curves for counts and frequencies. Table III compares the pd and pf values for the best classifier for each representation; for each class, the representation producing the best-performing classifier is the one whose (pf, pd) pair lies closest to the point of ideal performance. For example, the first entry in Table III indicates that the best classifier for the word counts (No PoS Info) representation of the training set for the AQR class has a pd of 0.815 and a pf of 0.287. For AQR, the classifier developed from the TF x IDF representation of the requirements text without PoS information performs better than those developed from word counts and frequencies. We see again that in 14 of 16 cases, the representation for which the ROC curve comes closest to the point of ideal classifier performance is TF x IDF. We also see that representations that include PoS information produce classifiers that perform better than those that do not: in 7 out of 8 cases, the best classifier for an LTL pattern type developed with text plus PoS information performs better than the best classifier for that same LTL pattern developed without PoS information, as measured by the distance of the (pf, pd) pair from the point of ideal performance, (0, 1).
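The "closest to ideal" selection amounts to minimizing the Euclidean distance from each (pf, pd) point to (0, 1), as in this small sketch; the curve values are illustrative, not taken from the paper's data.

import math

def best_operating_point(points):
    """points: (pf, pd) pairs from one classifier's ROC curve."""
    return min(points, key=lambda p: math.hypot(p[0], 1.0 - p[1]))

curve = [(0.00, 0.45), (0.05, 0.70), (0.10, 0.78), (0.25, 0.90)]
print(best_operating_point(curve))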

TABLE III. PD AND PF FOR BEST CLASSIFIERS – WITH AND WITHOUT POS INFORMATION

                       No PoS Info           PoS Info
Representation         pd        pf          pd        pf
AQR
  Count                0.81579   0.28662     1.00000   0.07895
  Frequency            0.71053   0.24841     1.00000   0.07895
  TF x IDF             0.76300   0.04500     1.00000   0.07895
EQ
  Count                0.78571   0.32044     0.71429   0.10497
  Frequency            0.57143   0.11050     0.71429   0.10497
  TF x IDF             0.85714   0.02210     0.92857   0.01657
EQR
  Count                0.66667   0.13333     0.60000   0.05556
  Frequency            0.53333   0.03333     0.60000   0.05556
  TF x IDF             0.73333   0.08889     0.73333   0.11111
ER
  Count                0.69231   0.05385     0.75385   0.16154
  Frequency            0.81538   0.21538     0.75385   0.16154
  TF x IDF             0.81538   0.08462     0.84615   0.07692
PG
  Count                0.37500   0.14439     0.37500   0.12834
  Frequency            0.25000   0.00535     0.37500   0.12834
  TF x IDF             0.75000   0.00535     0.87500   0.00535
RR
  Count                0.83333   0.00546     0.83333   0.04372
  Frequency            0.83333   0.00000     0.83333   0.04372
  TF x IDF             0.83333   0.02186     0.83333   0.02186
UQ
  Count                0.42857   0.09945     0.57143   0.02762
  Frequency            0.50000   0.15470     0.57143   0.02762
  TF x IDF             0.64286   0.00552     0.78571   0.00552
UQR
  Count                0.81579   0.28662     0.71053   0.21019
  Frequency            0.71053   0.24841     0.71053   0.21019
  TF x IDF             0.76316   0.04459     0.78947   0.06369

The results of the first phase indicate that including PoS information in the requirements representation improves classifier performance, and that vectorizing the text and PoS information by computing TF x IDF provides superior performance to representations based on counts and frequencies. For the second phase, then, we investigated the TF x IDF representation for text and PoS information in more detail. We also investigated the effect of the Porter stemmer on classifier performance. Fig. 6 and Fig. 7 show the ROC curves for the same 31 classifiers that were used in the first phase, this time applied to stemmed and unstemmed variants of the representations. On those plots, the values of pd and pf are shown for the point on the ROC curve that is closest to the point of ideal performance, (0, 1). For all but two of the classes (EQR and PG), Fig. 6 and Fig. 7 show that pd for both stemmed and unstemmed text is at least 0.667 and has a maximum value of 0.800. For all but the two classes EQR and PG, both stemmed and unstemmed text provide typical pf values in the range of 0.15 to 0.25, although for some of the smaller classes, such as AQR and RR, pf is less than 0.05. The results shown in Fig. 6 and Fig. 7 indicate that there appears to be no significant difference between the performance of the learners using stemmed text and that of those using unstemmed text. However, since stemming tends to reduce the number of attributes in a data set, stemming may be advantageous when the amount of computation time required to develop a learning model is a concern. Although not shown in Fig. 6 and Fig. 7, the best classifier performance was often seen when only a small number of attributes were included in the training set (e.g., 70-100 attributes). Although the computation time required to develop a learning model can be significant, these results indicate that the model itself may be relatively simple, requiring only a small amount of computation time to apply to a new set of data.

IV. DISCUSSION AND FUTURE WORK

Our results indicate that relatively simple machine learning and natural language processing techniques can be used to identify specific types of natural-language requirements within a set of specifications: in this case, the most frequently occurring types of LTL expression within a set of temporal requirements. These subsets can be analyzed separately (e.g., for ambiguity, consistency, and completeness) from the other requirements in the specification from which they have been extracted; their smaller size makes it likelier that fewer errors will be made in conducting the analyses. For most of the eight classes of requirement types analyzed, relatively simple learning techniques provide classifiers with a detection rate high enough that they might be used in real development efforts, although the number of false positives may still make their use somewhat impractical. Future work will investigate more sophisticated learning techniques (e.g., voting, bagging, boosting) and additional data representations (e.g., inclusion of additional syntactic information) to reduce the false positive rate. In addition to discriminating between different types of temporal requirements, we will also determine whether classifying requirements in other ways is feasible and useful (e.g., discriminating between requirements for ground-based and on-board systems).
For two of the eight classes, EQR and PG, pd for the best-performing classifiers is significantly lower than that for the other classes. Future work plans include an investigation of the requirements in these two classes to determine the reason for this difference in performance. As we examine additional requirements in future work, there may be variations in the requirements across missions due to differences in the development teams producing the requirements. For this investigation, a single team was responsible for the requirements that were analyzed. However, as we analyze requirements across multiple missions, it will be necessary to determine whether the composition of different missions' development teams has an effect on the learning models produced. Finally, we plan to investigate techniques for transforming natural-language temporal requirements to LTL expressions. This will simplify the application of more rigorous techniques for checking requirements consistency and identifying requirements that may be incomplete.

ACKNOWLEDGMENT

The work described in this paper was carried out at the Jet Propulsion Laboratory, California Institute of Technology. This work was sponsored by the National Aeronautics and Space Administration's IV&V Facility. Galen Balcom's effort was supported by JPL's Minority Initiatives Internship (MII) Program. The authors thank Prof. Tim Menzies of the Lane Department of Computer Science and Electrical Engineering at West Virginia University for many helpful discussions in the application of text mining and machine learning techniques.

REFERENCES

[1] G. Holzmann, The SPIN Model Checker: Primer and Reference Manual. Addison-Wesley, 2003, ISBN 0-321-22862-6.
[2] J. H. Hayes, "Risk Reduction Through Requirements Tracing," Proc. of 1990 Software Quality Week, San Francisco, CA, 1990.
[3] J. H. Hayes, A. Dekhtyar, J. Osbourne, "Improving Requirements Tracing via Information Retrieval," Proc. of 2003 IEEE International Conference on Requirements Engineering, IEEE Press, Sep. 2003, pp. 151-161, doi:10.1109/ICRE.2003.1232745.
[4] J. H. Hayes, A. Dekhtyar, S. Sundaram, S. Howard, "Helping Analysts Trace Requirements: An Objective Look," Proc. of IEEE International Conference on Requirements Engineering, Sep. 2004, pp. 249-261, doi:10.1109/ICRE.2004.1335682.
[5] R. L. Cobleigh, G. S. Avrunin, L. A. Clarke, "User Guidance for Creating Precise and Accessible Property Specifications," Proc. of ACM SIGSOFT 14th International Symposium on Foundations of Software Engineering, Nov. 2006, pp. 208-218, doi:10.1145/1181775.1181801.
[6] R. L. Smith, G. S. Avrunin, L. A. Clarke, "From Natural Language Requirements to Rigorous Property Specifications," Proc. of Workshop on Software Engineering for Embedded Systems - From Requirements to Implementation, Sep. 2003, pp. 40-4.
[7] R. L. Smith, G. S. Avrunin, L. A. Clarke, L. J. Osterweil, "PROPEL: An Approach Supporting Property Elucidation," Proc. of 24th International Conference on Software Engineering, IEEE Press, May 2002, pp. 11-21, doi:10.1145/581339.581345.
[8] University of Massachusetts, Laboratory for Advanced Software Engineering Research, PROPEL web page, http://laser.cs.umass.edu/tools/propel.shtml, viewed Aug. 2009.
[9] A. Nikora, "Classifying Requirements: Towards a More Rigorous Analysis of Natural-Language Specifications," Proc. 16th International Symposium on Software Reliability Engineering, IEEE Press, Nov. 2005, pp. 291-300, doi:10.1109/ISSRE.2005.14.
[10] M. Dwyer, G. Avrunin, J. Corbett, "Patterns In Property Specifications For Finite-State Verification," Proc. 21st International Conference on Software Engineering, IEEE Press, May 1999, pp. 411-420, doi:10.1145/302405.302672.
[11] Kansas State University CIS Department, Laboratory for Specification, Analysis, and Transformation of Software (SAnToS Laboratory), Property Pattern Mappings for LTL, http://patterns.projects.cis.ksu.edu/documentation/patterns/ltl.shtml, viewed Aug. 2009.
[12] T. Brants, "TnT – A Statistical Part-of-Speech Tagger," Proc. Sixth Applied Natural Language Processing Conference, Apr. 2000, pp. 224-231, doi:10.3115/974147.974178.
[13] T. Brants, "TnT – Statistical Part-of-Speech Tagging," Universität des Saarlandes, Department of Computational Linguistics, TnT software download application form, http://www.coli.uni-sb.de/~thorsten/tnt/, viewed Aug. 2009.
[14] "Part of Speech Tagging Guidelines for the Penn Treebank Project," The Penn Treebank Project, University of Pennsylvania, http://www.cis.upenn.edu/~treebank/, viewed Aug. 2009.
[15] I. H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Second Edition. Morgan Kaufmann, June 2005, ISBN 0-12088-407-0.
[16] The University of Waikato Computer Science Department Machine Learning Group, WEKA software download, http://www.cs.waikato.ac.nz/~ml/weka/index.html, viewed Aug. 2009.
[17] The University of Waikato Computer Science Department Machine Learning Group, Attribute-Relation File Format, http://www.cs.waikato.ac.nz/~ml/weka/arff.html, viewed Aug. 2009.
[18] M. F. Porter, "An Algorithm For Suffix Stripping," Program, 14(3):130-137, 1980.

[Figure 2 here: ROC curves plotting pd (0.0-1.0) against pf (0.0-0.5), one curve per representation (Counts, Freqs, TFxIDF) for each of the AQR, EQ, EQR, and ER patterns.]

Figure 2 – ROC Curves for AQR, EQ, EQR, and ER LTL Patterns Using Word Counts, Frequencies, and TF x IDF Representations

[Figure 3 here: ROC curves plotting pd (0.0-1.0) against pf (0.0-0.5), one curve per representation (Counts, Freqs, TFxIDF) for each of the PG, RR, UQ, and UQR patterns.]

Figure 3 – ROC Curves for PG, RR, UQ, and UQR LTL Patterns Using Word Counts, Frequencies, and TF x IDF Representations

[Figure 4 here: ROC curves plotting pd (0.0-1.0) against pf (0.0-0.5), one curve per symbol representation (Counts, Freqs, TFxIDF) for each of the AQR, EQ, EQR, and ER patterns.]

Figure 4 – ROC Curves for AQR, EQ, EQR, and ER LTL Patterns Using Symbol Counts, Frequencies, and TF x IDF Representations

[Figure 5 here: ROC curves plotting pd (0.0-1.0) against pf (0.0-0.5), one curve per symbol representation (Counts, Freqs, TFxIDF) for each of the PG, RR, UQ, and UQR patterns.]

Figure 5 – ROC Curves for PG, RR, UQ, and UQR LTL Patterns Using Symbol Counts, Frequencies, and TF x IDF Representations

[Figure 6 here: ROC curves plotting pd (0.0-1.0) against pf (0.0-0.5) for stemmed and unstemmed symbol TF x IDF representations of the AQR, EQ, EQR, and ER patterns.]

Figure 6 – ROC Curves for AQR, EQ, EQR, and ER LTL Patterns Using Stemmed and Unstemmed Symbol TF x IDF Representations

[Figure 7 here: ROC curves plotting pd (0.0-1.0) against pf (0.0-0.5) for stemmed and unstemmed symbol TF x IDF representations of the PG, RR, UQ, and UQR patterns.]

Figure 7 – ROC Curves for PG, RR, UQ, and UQR LTL Patterns Using Stemmed and Unstemmed Symbol TF x IDF Representations