Scenario-based assessment of non-functional requirements

Andreas Gregoriades and Alistair Sutcliffe, Member, IEEE

Abstract—This paper describes a method and a tool for validating non-functional requirements in complex socio-technical systems. The System Requirements Analyser (SRA) tool validates system reliability and operational performance requirements using scenario-based testing. Scenarios are transformed into sequences of task steps, and the reliability of human agents performing tasks with computerised technology is assessed using Bayesian Belief Network (BN) models. The tool tests system performance within an envelope of environmental variations and reports the number of tests that pass a benchmark threshold. It diagnoses problematic areas in scenarios representing pathways through system models, assists in the identification of their causes, and supports comparison of alternative requirements specifications and system designs. It is suitable for testing socio-technical systems where operational scenarios are sequential and deterministic, in domains where designs are incrementally modified so that the set-up costs of the BNs can be defrayed over multiple tests.

Index Terms—Non-Functional Requirements Validation, Scenario-Based Testing, Bayesian Belief Networks, Systems Engineering


1 INTRODUCTION

Scenarios have attracted considerable interest as a means of validating requirements specifications [4, 5, 9, 54]. Foundations of scenario-based approaches were laid by Hsia and Davis [8, 33], and by the influential work of Potts [48], who created the Inquiry Cycle and later the ScenIC [46] method for scenario-based requirements validation [46, 47, 48, 49]. The potential of scenario-based requirements validation has also been recognised by Anderson and Durley [1], Zhu and Jin [71], and Haumer [28].

Scenarios have been applied to the analysis of non-functional requirements (NFRs) using dependency tables to assess the relationships between different NFRs [43], and by modelling the dependencies between goals (representing functional requirements and non-functional requirements, also called soft goals) and the agents and tasks that achieve them in the i* language [70]. The “satisficing” or fulfilment of soft goals (i.e. NFRs) by functional requirements is assessed by inspecting strategic dependency and rationale models that show goals, agents, tasks and dependency relationships [40, 41, 70]. Although i* support tools do provide limited reasoning support for assessing dependencies, most validation still requires human expertise. The TROPOS [20] language supports more formal reasoning about i* models; however, it does not explicitly assess non-functional requirements.

Unlike functional requirements, which can be deterministically validated, NFRs are soft variables that cannot be implemented directly; instead, they are satisfied [40] by a combination of functional requirements. Since many NFRs are influenced by human properties, they inherit the diverse nature of human characteristics: for example, assessment of NFRs such as system reliability is influenced by human characteristics such as ability, stress and concentration. Software engineering and systems engineering requirements validation methods do not take human factors into account, even though they are a critical cause of systems failure [30, 31, 52].

In our previous work [61] we developed a method and software tool for scenario-based requirements validation that prompted designers with questions about potential problems in a scenario event sequence. The tool used a psychology-based taxonomy of failure causes [32] with a pathway expansion algorithm that generated alternative paths from a single seed scenario. It supported an inspection-based process with probe questions about possible problems, and proposed generic requirements as cures for the problems it identified. However, evaluation of this approach showed that too many scenario variations were generated and the software developers drowned in excessive detail.

To address this problem, we developed a semi-automated approach to requirements validation [59] by transforming the taxonomy of human and system failure causes into a model to predict potential errors in a system design. Bayesian Belief Nets (BNs) provided a probabilistic reasoning mechanism to predict reliabilities from models composed of descriptions of system components and attributes of human operators [21]. However, the output from the BN model was fed into a paper-based walkthrough for validating scenarios, which was still time-consuming. This motivated the research we report in this paper: to create a software tool for scenario-based requirements validation that automates as much of the process as possible.


The paper is organised in seven further sections. BN and uncertainty modelling are briefly described; this is followed by the methodology and the tool’s architecture, and then the NFR assessment. A case study analysis of NFR compliance and validation of system-level components in a military command and control domain is presented, in which the tool is applied; the BN evaluation is explained; and the paper concludes with a discussion and proposals for future development of our approach.

2 RELATED WORK

The SRA – System Requirements Analyser – tool described in this paper can be regarded as a form of model checking which takes place early in the system development life cycle, and uses BNs to reason about properties of system components rather than more detailed models of system behaviour.

Model-checking techniques have been used extensively to verify and validate requirements. However, despite their advantages, formal models suffer from a communication problem between user-stakeholders and model developers [7, 11], since formal models are difficult to communicate to the stakeholders who set the requirements in the first place. The software cost reduction (SCR) system used a tabular notation for specifying requirements dependencies which is relatively easy for software developers and end users to understand [34]. Tabular representation based on the underlying SCR state-transition formal model provided a precise, unambiguous basis for communication among developers, coupled with automated analysis of specifications. The approach hides the logic associated with most formal methods and adopts a notation that developers find easier to use.

While tabular representations can improve the communication of requirements, a combination of visualisations, examples and simulation is necessary to explain complex requirements to end users [6]. Scenario-based representations and animated simulations help users see the implications of system behaviour and thereby improve requirements validation [22]. Lalioti [36, 37] suggested potential benefits of animating requirements validation, including an interactive and user-friendly validation environment for stakeholders.


Animation tools integrated with formal model checkers have been developed by Dubois for the ALBERT II language [10] and its associated requirements validation animator. The language preserves the structure of the informal requirements expressed by stakeholders and maintains traceability links to the formalised software requirements document. The animator validates the requirements based on scenarios proposed by the stakeholders, allowing them to cooperatively explore different possible behaviours of the future system. A similar approach has been adopted in the KAOS language and its supporting GRAIL tool, which enable formal reasoning about dependencies between the goal model, required system behaviour and obstacles or constraints [66, 67]. Another similar animator-validator tool, TROLL [23], uses a formal object-oriented language for modelling information systems, with syntax and consistency checkers as well as an animator that generates executable prototypes which can be used for requirements validation. As with the SCR and ALBERT II animators, our approach employs a tabular and graphical representation of results [29] and runs test scenarios against the system model to identify problems with the requirements specifications.

Scenario-based requirements analysis methods, pioneered by Potts [46, 47, 48], proposed that obstacles or difficulties which might prevent a goal being achieved should challenge requirements and hence promote refinement of the requirements specification to deal with such obstacles. This approach was developed by van Lamsweerde [65, 67], who applied formal reasoning to requirements specifications to infer whether goals could or could not be achieved given constraints imposed by obstacles. Hierarchical goal decomposition produced specifications of the states to be achieved and the system behaviour required to reach those states, so considerable problem refinement is necessary before automated reasoning can be applied. These approaches also assumed that a limited number of scenarios and their inherent obstacles are tested. This raises the question of test data coverage, i.e. just what is a sufficient set of scenarios to enable validation to be completed with confidence? While we believe there is no quick answer to this vexing problem, one approach is to automate the process as far as possible so more scenarios can be tested.

Methods for requirements validation in safety-critical systems have adopted hierarchical fault trees to represent the space of possible normal and abnormal system behaviours and their causal conditions (e.g. THERP [64]). While fault trees can be formalised as state machines with temporal logic to reason about potential failures in deterministic systems [27], the influence of human operators and the system environment is generally not modelled. When they are represented, as performance shaping factors [35], probabilistic modelling has to be used to reason about the likelihood of failure of system components based on a description of their properties and factors such as operator stress and fatigue [68].

Intent specifications provide a hierarchical model to facilitate reasoning about system goals and requirements in safety-critical systems [38]. Goals are decomposed in a means-ends hierarchy, widely practised in requirements engineering [54, 67]. Intent specification requirements are assessed by inspecting dependencies between constraints, design principles and system goals to discover conflicts. Automated support for reasoning about conflicting system states and behaviour is provided by the SpecTRM-RL tool, which uses a tabular format to represent relationships between threat events and system states, based on design assumptions and constraints. However, intent specifications do not support assessment of human error in systems or of dependencies between human operators and user interfaces.

Assessment of non-functional system requirements, such as system reliability, has to use probabilistic reasoning since the range of potential system behaviours is either unknown, in the early requirements phase, or too large to specify. Bayesian Nets (BNs) have been developed to assess software quality from properties of the code and the software engineering process [13, 15, 16, 18, 19], and for system risk analysis and management [17]. Fenton and Littlewood’s [16] approach predicts the number of defects in a system. They estimate software reliability using BNs to reason about quality probabilities based on information gathered during the software development process, such as the difficulty of the problem, the complexity of the designed solution, the programmer’s skill, and the design methods employed. Fenton [17, 42] has developed large BN models to assess risk at the system level, such as the reliability of systems engineering processes for developing ships and vehicles, or the operational reliability of air traffic control systems. This work has also produced methods and tools for building large BN models to solve complex real-world problems, and improved support for the use of BN tools by end users. BNs have also been applied to evaluating the confidence which might be assigned to different combinations of test strategies in assuring reliable software [72].

In summary, BNs have been widely applied as a probabilistic reasoning technique in software engineering and other domains; however, previous work used single nets to evaluate a set of discrete states pertaining to a software product or development process. In our earlier work we extended the application of BNs for safety analysis in systems engineering domains using a semi-automated scenario-based approach [21]. We then developed more automated tools for scenario analysis of NFR conformance for requirements specifications with multiple BN tests [60]. This paper extends that work by describing a more comprehensive tool architecture which can be configured with different types of BNs to analyse other non-functional requirements; the scenario-based NFR evaluation method, with different modes of using BNs in scenario analysis; and validation studies of the BNs. An extensive case study is reported in which the tool is used to analyse a requirements specification for an aircraft weapons loading system for a future aircraft carrier.

3 MODELLING UNCERTAINTY

Because of the uncertain nature of NFRs, it is necessary to model them using techniques such as Bayesian probability, Dempster-Shafer theory, fuzzy sets or possibility theory. Following Wright and Cai’s [69] review of the advantages and disadvantages of stochastic reasoning methods, we adopted Bayesian probability. They argued that Bayesian probability offered easier combination of multiple influences on probability than Dempster-Shafer theory, and a sounder reasoning mechanism than fuzzy sets. Bayesian probability provides a decision theory of how to act on the world in an optimal fashion under circumstances of uncertainty. It also offers a language and calculus for reasoning about the beliefs that can reasonably be held, in the presence of uncertainty, about future events on the basis of available evidence [45]. BNs are useful for inferring the probabilities of future events on the basis of observations or other evidence that may have a causal relationship to the event in question [12, 19].


BNs are directed acyclic graphs of causal influences, where the nodes represent variables and the arcs represent (usually causal) relationships between variables [12]. The example in figure 1 shows two influences on agent stress loading: workload and duty time. Variables can have any number of states in a BN, so the choice of measurement scale is left to the analyst’s discretion. For this illustration each variable is assigned one of two possible states: high or low.

Fig. 1: Fragment of the proposed BN model.

In the above example, when the duty time is high (bad) and the workload is high (bad), the probability of the agent’s stress loading being high (i.e. a bad influence on the human agent) will be greater. In the BN we model this with a network probability table (NPT), as shown in table 1.

Table 1: A network probability table for the BN in figure 1.

                    Duty Time:        High            Low
  Stress-loading    Workload:     High    Low     High    Low
  High                            1       0.4     0.6     0
  Low                             0       0.6     0.4     1

The first column of probabilities asserts that if the duty time of a human agent is high (bad) and his/her workload is high, then the probability of stress loading being high (bad) is 1, with zero probability of it being low. NPTs are configured by estimating the probabilities of the output variable for an exhaustive pairwise combination of the input variables. BNs can accommodate both probabilities based on subjective judgements (elicited from domain experts) and objective data [17]. When the net and NPTs have been completed, Bayes’ theorem is used to calculate the probability of each state of each node in the net. The theorem is shown in equation 1:

P(a|b) = P(b|a) · P(a) / P(b)   [1]

Where:
P(a|b) = posterior (unknown) probability of a being true given b is true
P(b|a) = prediction term for b given a is true (from the NPT)
P(a) = prior (input) probability of a
P(b) = input probability of b

or, less formally:

Posterior_Probability = (Likelihood · Prior_Probability) / Evidence

Substituting data from the above example, the calculation is as follows. We want to calculate the probability that duty_time is high, given that we have observed that the agent’s stress loading is high. The likelihood P(loading = high | duty_time = high) is 0.6, derived from the network probability table. We also need the prior P(duty_time = high), which is 0.5, and the probability of the evidence, P(loading = high), which is 0.42. This produces the following calculation (equation 2):

P(duty = high | load = high) = P(load = high | duty = high) · P(duty = high) / P(load = high)
                             = (0.6 · 0.5) / 0.42 = 0.71   [2]
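To make the arithmetic concrete, here is a minimal Java sketch (ours, purely illustrative and not part of the SRA code) that reproduces the posterior calculation of equation 2 from the likelihood, prior and evidence values quoted above:

public class BayesExample {
    // Bayes' theorem (equation 1): P(a|b) = P(b|a) * P(a) / P(b)
    static double posterior(double likelihood, double prior, double evidence) {
        return likelihood * prior / evidence;
    }

    public static void main(String[] args) {
        double likelihood = 0.6;  // P(loading = high | duty_time = high), from the NPT
        double prior = 0.5;       // P(duty_time = high)
        double evidence = 0.42;   // P(loading = high)
        System.out.printf("P(duty = high | load = high) = %.2f%n",
                          posterior(likelihood, prior, evidence)); // prints 0.71
    }
}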

Input evidence values are propagated through the network, updating the values of other nodes. The network predicts the probability of certain variable(s) being in particular state(s), given the combination(s) of evidence entered. BN models are extremely computation-intensive; however, recent propagation algorithms exploit the topological properties of graphical models to reduce computational complexity [45]. These are used in several commercial inference engines, such as HUGIN, which we used. BNs have to conform to a strict hierarchy, since cycles lead to recursive and non-terminating propagation of probabilities by the algorithm. This imposes some compromises in modelling influences, which can be partially overcome by introducing additional input nodes to model cyclic influences, although this increases the complexity of the network and of the control process for the algorithm.

BNs are currently used in many applications to reason about probabilities of properties given a set of existing (prior) states; however, they do not naturally lend themselves to time series analysis. We examined three possibilities. The first was serial evaluation using an extended net which contained an input node that accepted the result from the previous run; hence the output reliability from step 1 became an input prior state for step 2. This approach had the advantage of being able to explicitly model the interaction between events; for instance, a high probability of failure at step 1 may make completion of step 2 much more difficult. However, input of a posterior probability into a BN as a prior observation over many events has doubtful validity, and we were advised by an expert statistician to avoid this approach. The expert’s argument was that each run should be assumed to be independent, which would not be the case if we propagated results between runs. The second approach was to combine the output probabilities from a sequential run: assuming a BN has been used to assess the probability of failure in a multi-step scenario, how should N probability judgements be combined into a single value? One possibility was to use the output probabilities as input into a “summariser net” that combined all the inputs as prior observations into a single probability, with the net structure organised to group events into episodes in a scenario. However, this option faced the same criticism as the first, namely converting multiple posterior probabilities into input observations. Our expert advised that sampling runs, assuming they were independent, was possible, but this required the probabilities of sampling particular runs to be set. This introduced a subjective sampling bias; accordingly we rejected this option as well.

The third option avoided the net combination problem by converting the output probability into a Boolean variable by judging each step to have succeeded or failed. The calculated output probability for each event was compared with a user-defined target value; if it surpassed the target it was counted as a “survivor”, otherwise it was counted as a failure and discounted. This option had the advantage of being able to pinpoint particular steps in scenarios that were posing reliability problems. Furthermore, sensitivity analyses could be carried out with multiple BN runs for each step by varying the environmental conditions, thus producing frequencies of survivors for a set number of tests at each scenario event. This enabled investigation of the effect of environmental conditions on a design (x) with a set of scenarios (a, b, c) by counting the number of surviving BN runs per step, having systematically varied all combinations of the environmental variables from worst case to best case.
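As an illustration of how this survivor counting might be orchestrated, the following Java sketch enumerates all best/worst-case combinations of the environmental variables and counts the runs that pass the threshold; the BnModel interface and all names are our assumptions for illustration, not the SRA implementation.

interface BnModel {
    // Runs the net for one task step under the given environmental settings
    // and returns the predicted probability of reliable completion.
    double assess(int taskStep, boolean[] envBestCase);
}

public class SurvivorCounter {
    // Counts, for each task step, the BN runs whose output passes the
    // user-defined threshold, over all combinations of binary environmental
    // variables (false = worst case, true = best case).
    static int[] countSurvivors(BnModel bn, int steps, int envVars, double threshold) {
        int[] survivors = new int[steps];
        int combinations = 1 << envVars;  // 2^envVars environment settings
        for (int step = 0; step < steps; step++) {
            for (int c = 0; c < combinations; c++) {
                boolean[] env = new boolean[envVars];
                for (int v = 0; v < envVars; v++) {
                    env[v] = ((c >> v) & 1) == 1;
                }
                if (bn.assess(step, env) >= threshold) {
                    survivors[step]++;  // this run is a "survivor"
                }
            }
        }
        return survivors;
    }
}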

The SRA tool currently has two BNs: one to evaluate reliability and one to evaluate performance time. Each BN model has variants with different probability distributions in the NPTs to deal with variations in the degree of automation between tasks. New BNs can be added to the tool to evaluate a wide range of NFRs.

3.1 BN MODEL OF SYSTEM RELIABILITY

The BN model of system reliability is based on a taxonomy of influencing factors by Sutcliffe and Rugg [62] and the slips/mistakes distinction from Reason [52], who drew on earlier work by Norman [44]. Slips are attention-based lapses and omissions in skilled behaviour, whereas mistakes are failures in plans and hence fit into Rasmussen’s [51] rule and knowledge levels of processing. The BN model distinguishes between tasks that involve highly trained skills and are more prone to slips (e.g. monitoring tasks) and knowledge-intensive tasks, such as analysis and planning, that are more prone to mistakes.

According to human error theory [41], system environmental variables have an indirect influence on an individual’s ability by increasing fatigue and stress levels, as reflected in the BN model in figure 2. An individual’s ability, however, has a direct effect on mistakes. Organisational factors (management culture, incentives) have a direct effect on individuals’ motivation [39]. Finally, individual characteristics, such as domain and task knowledge, have a direct effect on mistake-type errors [53]. Slips are mainly influenced by the user interface, constraints (time constraints, interruptions) and the individual’s dedication [52]. Tasks of high cognitive complexity are considered to be more prone to mistake errors, while tasks of physical complexity, such as complex manipulations involving precise movements and detailed co-ordination, are more prone to slip errors [59].


Fig. 2: BN model for system reliability. Inputs 1-2 relate to the task, 3-6 are technology attributes, 7-10 are human attributes and 11-22 are environmental variables. Appendix A describes the nodes and summarises the NPT influences from parent to child nodes.

The first two inputs represent judgements of task complexity; for instance, operating radar is cognitively and physically easy, whereas interpreting an object on the radar is cognitively more complex (hence set to high). Inputs 3 to 6 describe technical component properties, which can be taken from historic data on similar equipment, or estimated. Inputs 7 to 10 are properties of human agents, taken from training data and job descriptions; input values for the agent’s task knowledge, domain knowledge, motivation and so forth can be measured using aptitude and psychometric tests. The next six variables model influences on the human operational environment, ranging from the short-term effects of time pressure, distractions and workload, which can be estimated from narrative scenario descriptions, to the longer-term influences of management culture and incentives. The final six inputs describe aspects of the system’s operational environment (noise, lighting, comfort, sea state, visibility and war/peace status). All the inputs are held in databases containing attributes of human agents, technology components and tasks. The environmental variables, sub-divided into human and system operational aspects, can be entered manually to reflect a particular scenario, or systematically varied.

Some agent attributes can also be measured objectively using psychological questionnaires: general ability and accuracy/concentration can be measured by intelligence aptitude scales, and decision making and judgement by locus of control scales, whilst domain and task knowledge can be measured by creating simple tests for a specific task/domain. The longer-term influences of management culture and incentives are judged from contextual scenarios. The input variables are all discrete states (best/worst case) which are derived from the measures detailed in appendix A.

The BN is run with a range of scenarios that stress-test the system design against operational variables. Scenarios can be taken from domain-specific operational procedures, elicited by interviewing users, or postulated to cover a variety of organisational and work situations that may occur in the domain. The BN produces two outputs: slip-type errors that apply to skilled tasks (recognise, interpret and act), and mistake errors pertinent to judgement-style tasks (analyse, plan and decide).

3.2 BN FOR OPERATIONAL PERFORMANCE TIME

The topology and components of the BN for performance time assessment are similar to those of the Reliability BN, since many of the influences on performance and error are the same. The Operational Performance Time model has a similar causal network to the Reliability BN, apart from having one output node (operational performance) rather than two. As with the Reliability BN illustrated in figure 2, the likelihood influences expressed in the BN model and its NPTs are based on the human factors performance literature [64] (see also appendix A). For example, a poor physical and operational environment (time on duty and workload) has an adverse influence on the agent’s stress and fatigue levels, which in turn adversely influence the agent’s concentration [3]. Input variables reflect an expert’s assessment of a quality: for example, the information provided for decision support in a prototype, or a functionally rich and more expensive design, would have a higher rating for functionality, situation awareness support, etc. Different levels of automation are reflected in variations of the BNs. For example, highly automated tasks tend to be quicker and more reliable, but only if the equipment is well designed and maintained; hence maintenance has more influence in highly automated tasks than in minimally automated ones, and this is reflected in different NPTs based on the equipment types. Similarly, the type of task (manual, semi-automated) determines the degree of influence of the technology.

Whereas the Reliability BN produces probabilities of reliable completion for each task step, output from the Operational Performance BN is used to increase a best case task completion time to reflect the less than ideal properties of human and machine agents. Each task is assigned a best and worst case completion time, obtained from domain experts. The estimated task completion time is calculated using the following formula (equation 3):

ET = (Phigh · BT) + (Plow · WT)   [3]

Where:
ET = estimated time
Phigh = probability of operational performance being high
Plow = probability of operational performance being low
BT = best task-completion time
WT = worst task-completion time

Hence, if the probability of high operational performance is equal to 1, then the probability of low operational performance will be 0 (best case), and this will result in the best case completion time. On the other hand, if the probability of high operational performance is 0.57 and the best and worst times are 3 and 10 sec respectively, then the estimated time is (0.57 · 3) + (0.43 · 10) = 6.01 sec. If the threshold value is set at 75% in the best-worst case range, then this is converted into time with the following formula (equation 4):

Thsec = (Th%/100) · BT + (1 − Th%/100) · WT   [4]

Where:
Thsec = threshold in seconds
Th% = threshold as a percentage value
BT = best task-completion time
WT = worst task-completion time

Therefore, according to the above example (equation 5):

Thsec = 10 − (75/100) · (10 − 3) = 4.75 sec   [5]

Hence any task completion time of less than 4.75 sec is acceptable. For each task-step the system counts the BN runs with task completion times below the threshold.
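A short Java sketch of equations 3 and 4, reproducing the worked figures above (the method and variable names are ours, for illustration only):

public class PerformanceTime {
    // Equation 3: estimated completion time from the BN output belief.
    static double estimatedTime(double pHigh, double bestTime, double worstTime) {
        return pHigh * bestTime + (1.0 - pHigh) * worstTime;
    }

    // Equation 4: converts a percentage threshold into seconds.
    static double thresholdSeconds(double thresholdPercent, double bestTime, double worstTime) {
        double f = thresholdPercent / 100.0;
        return f * bestTime + (1.0 - f) * worstTime;
    }

    public static void main(String[] args) {
        System.out.println(estimatedTime(0.57, 3, 10));   // 6.01 sec
        System.out.println(thresholdSeconds(75, 3, 10));  // 4.75 sec
    }
}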

To reflect the case of reverting to manual operation when an automated technology fails, the worst completion times of highly automated tasks are generally set much higher than those of the manual tasks. This is because the human operator has to diagnose the reason for failure and then substitute the manual version of the task, which will not be familiar; hence the worst-case time will be longer than for the manual task alone. For instance, the task “Manually load weapons on trolley” requires 120 sec to complete in best-case situations and 180 sec in the worst case. On the other hand, the same task with automated technology could be completed ideally in 70 sec, but in 320 sec in the worst case: if the automated technology fails to load the weapons correctly, then intervention of a human agent is required to discover the reason for the failure and then correct the misplacement or manually load the weapons.

4 SRA SYSTEM ARCHITECTURE

Analysis starts with the selection of the i* model to be evaluated and the creation of the test scenarios. Scenarios are narratives taken from real-life experience describing the operation of similar systems, from which event sequences are extracted. This process is explained in more depth in sections 5 and 6. A scenario editor tool is provided [24] which allows the analyst to point to task nodes on the i* diagram; the tool then presents a list of the technology and human agents which may be associated with the task. The analyst picks the agents from the list to form a task tuple consisting of <task, human agent, technology>. Scenarios are built up in this manner by following task pathways through the i* model, which is illustrated in figure 3. The analyst specifies the NFR threshold values, then selects the scenarios and system database. The SRA loads the required information (for the task and agents in the scenario) from the domain database. Because of differences between semi- and highly automated tasks, the system evaluates operational performance for each type of task using slightly different BN models (see the sketch after figure 3). Nodes that do not apply to the equipment used are left undefined and therefore have a neutral influence on operational performance. For instance, tasks that are highly automated are more dependent on maintenance compared with semi-automated tasks, whereas highly automated equipment is generally more reliable as long as it is well designed and maintained. These influences are reflected in the network probability tables of the BN models.

Fig. 3: System model for a navy command and control combat system represented in the i* notation. To simplify the model only human agents are shown. Scenarios trace pathways through the model from the radar operator to PWO and then to weapons directors – EWD, WDB or WDV – for a response to the threat.
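To illustrate the task tuples and the selection of BN variants by degree of automation, here is a minimal Java sketch; the type names, enum values and model file names are hypothetical, not those of the SRA code.

enum Automation { MANUAL, SEMI_AUTOMATED, HIGHLY_AUTOMATED }

// A scenario task step: the task plus the human and technology agents bound to it.
record TaskTuple(String task, String humanAgent, String technology, Automation level) {}

class ModelControllerSketch {
    // Selects the BN variant (same topology, different NPTs) appropriate to
    // the task's degree of automation, as described in the text.
    String selectModelFile(TaskTuple t) {
        switch (t.level()) {
            case HIGHLY_AUTOMATED: return "reliability_highly_automated.net"; // hypothetical file names
            case SEMI_AUTOMATED:   return "reliability_semi_automated.net";
            default:               return "reliability_manual.net";
        }
    }
}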

Furthermore, depending on the task type, the SRA assesses system reliability based on two types of errors, slips and mistakes. Slips are more common in tasks that are highly skilled or physical in nature, while mistakes occur in tasks that are cognitively complex or knowledge-intensive, such as planning [50, 51]. For each BN run the tool assesses the system reliability and compares it against the pre-defined threshold. Throughout this process the system keeps track of the number of BN runs that pass the threshold.

In its current form the tool assesses two NFRs, system reliability and operational performance time. The BN models are used in a plug-and-play architecture that binds BN models’ input nodes with the System Requirements Analyser (SRA), enabling a range of NFRs to be tested using the same set of scenarios.

The SRA tool is composed of the following software components (see figure 4):

• The Session Controller implements the user command interface for selecting designs and scenarios, and executes the algorithm that assesses a set of scenarios with the BNs. It calls the system reliability or operational performance BN assessors to execute the BN runs with all possible environmental combinations.

• The i* model editor allows interactive construction of i* models, with typical CASE tool-type functions.

• The Interactive Scenario Constructor produces test scenarios from the system model based on user directions. Scenarios are stored in a database as arrays of <task, human agent, technology> tuples.

• The Model Controller controls the BN models. It selects the appropriate BN model for each task step, then populates the input nodes, runs the model and receives the belief distributions of the output nodes. The Model Controller also manages the back-propagation of the BN model to identify required technology and agent characteristics.

• The BN assessor modules run the net by calling the HUGIN algorithm for each task step and for each set of environmental variable combinations. The output from each run is compared with the desired NFR threshold and the survivor runs are passed to the results visualiser.

• The Visualiser provides a visual summary of all qualified BN runs for a set of scenarios for one or more system designs. This enables different designs to be compared, and problem areas in the requirements to be identified, i.e. task/technical component combinations which show low potential NFR assessments. The Visualiser displays results at three levels: System, Scenario and Phase views, based on our previous visualisation model [24].

Fig. 4: System Requirements Analyser – conceptual architecture and functional components.

The system can be configured with new BNs by creating a new net and NPTs using the HUGIN tool. The new BN is then added to the Model and Session Controllers by editing menus to allow selection of the new NFR analysis, and by adding any rules to the Model Controller to select between different model sub-types and NPTs according to task or agent/equipment types. Currently only one NFR can be analysed in a session; however, several designs and scenarios can be analysed sequentially. The system automatically aggregates results from lower-level phase views to the scenario and then the system design level, allowing two or more designs to be compared using the same set of scenarios. The system was developed in Java using JBuilder 9 (J2EE). The user interface was implemented using Swing components, the Model Controller interfaces with the HUGIN Decision Engine via its Java API, and the connection to the database uses JDBC.

5 NFR ANALYSIS METHOD

The process, illustrated in figure 5, starts by creating the system model, using the i* modelling language, to describe the characteristics of agents, tasks, resources and soft goals. Soft goals in this case constitute the NFRs under investigation, while resources are the equipment used by the agent to perform the task. The domain knowledge necessary for the development of the i* model is elicited from domain experts. NFRs and their validation criteria are specified in the requirements specification (e.g. system reliability should be >= 95% for a system design with a set of operational scenarios 1..n).

The next step converts scenarios, which are narrative stories, into a format that can be executed by the system. This is achieved by extracting from the narrative the task sequences undertaken by agents. For example, in the naval domain a missile attack scenario narrative is: “The enemy aircraft launches a missile, which is detected by the ship’s radar. The Radar Operator (RO) reports a hostile contact, speed and bearing to the Tactical Picture Compiler (TPC), who estimates the direction and closing time of the threat and reports the incoming missile threat to the Principal Weapons Officer (PWO). The PWO decides to jam the missile’s radar using electronic counter-measures and issues the command to the Electronic Weapons Director (EWD) … [continues]”. Scenario narratives can contain implicit tasks which are not articulated because they are tacit or assumed knowledge; therefore we apply generic task patterns [58] to define the task sequence. In the above example the generic pattern for command and control consists of five tasks: Monitor (events), Interpret (threat), Analyse (situation), Plan (response), and Act.

Using the scenario editor with the i* system model, test scenarios are constructed by selecting the tasks that are explicit and implicit in the scenario narrative; so for the above example the task sequence from Monitor (RO) to Plan (PWO), followed by Act (EWD), would be selected. Scenarios are composed of a number of phases, and each phase is composed of a number of task-steps, each one modelled as a <task, human agent, technology> tuple.


Fig. 5: NFR analysis method, processes and tool support. Ellipses denote method steps, ordered in a dependency sequence; boxes show tool components that support the method step they are connected to.

In the above missile attack example, the narrative has four phases, each one representing a command and control sequence: first, electronic counter-measures are tried; in the next phase the ship manoeuvres to avoid the threat; then decoys are fired; and finally the hostile missile is destroyed with a defensive missile. Phases are used to structure task sequences that fulfil a higher-order goal. Scenarios can be interactively constructed by pointing to tasks on the system model editor display. The tool then automatically creates a scenario task sequence by tracing the human and machine agents involved with each task.

The Compare Designs step finds the best system design using the system view bar chart (see figure 6) to investigate the number of surviving runs for each task step. Trade-offs between NFRs can be assessed by selecting different BN models (e.g. reliability, performance time) from the Session Controller menu, while designs can be compared by changing the database, which loads different technology and human agents representing a new design, and repeating the process. NFR thresholds can be set at the user’s discretion, so the tool allows the analyst to compare designs and desired performance more flexibly than if the variables had been hard-coded.

The best design will generally have more surviving BN runs (as defined in section 3); however, it is also desirable that the design succeeds in all scenario steps. Each bar in the system view (see figure 6) corresponds to the cumulative number of surviving runs for each task-step in a scenario phase. The analyst can easily identify the best design and pinpoint task steps with low NFR satisfaction rates by focusing on low scores on the bar chart. Moving the cursor over any bar reveals the total number of surviving runs for the task-step. The bar chart identifies poorly performing task steps, which can be cross-referenced to the human and machine agents involved; right-clicking on any bar reveals the components involved. The domain database can then be queried to find the input variables. The domain database has an annotation field so the analyst can record reasons for settings, and refer to these when improvements may have to be made. The BN models have a limited explanation facility of pop-up tool tips that summarise the NPT influences (see appendix A) for each parent-child node combination. This information is then used in the Identify Improvements step. Further advice on generic requirements for technology to support particular tasks, and on improving human operation, is given in a related part of the toolset which we have described elsewhere [57].

The best design also needs to be resilient to environmental conditions. This analysis is supported by the results visualiser in the Assess Environment step. The results visualiser uses colour coding to identify variables which adversely affect system risks over a range of scenario steps. In the phase view the influences of environmental variables on survivor runs are collated into a matrix (figure 6). Columns correspond to the twelve environmental variables, and rows report the percentage scores that passed the threshold. The impact of an environmental variable x is calculated as in equation 6:

IEP(x) = (QbEP(x) / QAllEP(x)) · 100   [6]

Where:
QbEP(x) = number of survivor runs with environmental variable x set to best case
QAllEP(x) = total number of survivor runs over all settings of x

The matrix’s colour coding denotes the level of importance of each parameter: “green” designates a low-risk parameter, since it was assigned to “worst case” most of the time, whereas “red” denotes high risk, due to the high percentage of surviving runs with “best case” settings. Since the environmental variables which were set to worst case did not degrade the NFR level below the threshold, setting them to best case can only have a positive effect on the NFR. Conversely, variables that were set to best case during the NFR assessment would, if set to worst case, decrease the NFR so that it fails to pass the threshold level; they are therefore indicated as a risk.
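Equation 6 amounts to a percentage; a one-method Java sketch (our naming, for illustration) is:

public class EnvironmentalImpact {
    // Equation 6: percentage of survivor runs in which environmental
    // variable x was set to best case; high values flag x as a risk.
    static double impact(int survivorsWithBestCase, int totalSurvivors) {
        if (totalSurvivors == 0) return 0.0;  // avoid division by zero
        return 100.0 * survivorsWithBestCase / totalSurvivors;
    }
}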


Fig. 6: System visualisation showing the system and phase view of the operational performance assessment. The Incentives column (1) is worst case (coloured red in display), whereas the Light column (2) is better than average (yellow) and other columns are average (orange). In this run no best-case (green) runs survived.


In the Identify Improvements step, if an overall design or a particular task step fails to meet the desired NFR threshold, then back-propagation analysis is used: the desired NFR value is set on the output node and the BN is back-propagated to discover the necessary settings of agent or environmental variables to achieve that NFR value. Back-propagation can be used in two modes: with all input nodes unconstrained, in which case the BN calculates the input values required to achieve the user-determined output NFR; or with one or a few input nodes unconstrained, in which case the BN calculates the values for these nodes given the settings of the constrained nodes. Back-propagation is usually hypothesis-driven, to focus on where design improvement could be made, so most variables are left with their original settings and a few nodes are left unconstrained.

The results from the back-propagation are compared with the properties of the original component in order to identify the level of improvement required. For instance, if the usability of the radar is set to 0.65 (actual) in the database and the assessed usability from the back-propagation is 0.83 (estimated) to achieve the desired NFR for reliability of 0.85, then the required level of improvement is 0.18, i.e. 0.83 minus 0.65.
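The two back-propagation modes can be pictured with the following Java sketch; the BackPropagatingBn interface abstracts the inference engine (we make no claims about the actual HUGIN API), and the node names and figures come from the usability example above.

interface BackPropagatingBn {
    void setEvidence(String node, String state);  // constrain a node to a state
    void clearEvidence(String node);              // leave a node unconstrained
    void propagate();                             // run inference over the net
    double belief(String node, String state);     // read a posterior belief
}

class ImprovementAnalysis {
    // Required improvement for one unconstrained input node: the estimated
    // (back-propagated) setting minus the actual value in the domain database.
    static double requiredImprovement(BackPropagatingBn bn, String inputNode, double actual) {
        bn.setEvidence("Reliability", "high");  // desired NFR value entered as evidence
        bn.clearEvidence(inputNode);            // e.g. "Usability" left unconstrained
        bn.propagate();
        double estimated = bn.belief(inputNode, "high");
        return estimated - actual;              // e.g. 0.83 - 0.65 = 0.18
    }
}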

Figure 7 depicts the back-propagation of the Operational Performance model using an input set of environmental variables, the agent properties and the required NFR values defined by the requirements specification. The monitor windows over the system environment, human agent and NFR nodes show the input variables; the monitor windows over the technology influences depict the distribution of the output nodes.


Fig. 7: Back-propagating the BN to identify the cause of the NFR effect in terms of technology characteristics (the influence of each one). A sub-set of the Operational Performance Time net is illustrated.

6 CASE STUDY

This case study describes the application of the SRA tool in validating the operational performance and system reliability of a complex socio-technical system. The requirements question is to assess the impact of new automated technology on the task of loading weapons on to aircraft in an aircraft carrier. A description of the human roles used in the following scenario is provided in table 2 and the technology components are listed in appendix B.

A request for an air mission arrives in the control room from Carrier Group Strategic Command. The mix of weapons/fuel tanks/electronic counter-measures pods, etc. is planned according to the mission type and the aircraft assigned to the mission. The Air Planning Officer (APO) plans the required weapons load and schedules the loading with the Deputy Air Planning Officer (DAPO). The load plan is communicated to the Magazine Weapons Supervisor (MWS). The MWS plans the retrieval of weapons from the magazine and the Magazine Artificer (MA) retrieves the weapons and places them on a trolley. The trolley is placed on the hoist, which lifts it to the flight deck. The trolley is then moved by the Weapons Artificer (WA) to the specified aircraft. The Weapons Team Supervisor (WTS) is responsible for organising the WA teams. A number of checks are performed by the Weapons Loading Controller (WLC) prior to the loading of the weapons, e.g. check that the aircraft is properly grounded and engine power is set to off; visually inspect the wing rack to ensure safety pins are placed and the rack is locked; verify that all cockpit armament selectors are in the off or safe position. On completion of the safety checks the WA positions the trolley under the aircraft wing, orients the trolley under the desired rack, lifts the weapons into position and attaches them. The trolley has a pneumatic pump to hoist the weapon up to the wing; however, the final load-and-secure is manual and requires two or more WAs, depending on weapon weight. The process is repeated for the rest of the weapons. On completion of the loading process the WLC tests the connections between the weapons and the rack, then the WA removes the trolley. Finally the WLC inspects the weapons before arming them and reporting completion to the Flight Deck Supervisor. The process is usually carried out concurrently by two teams, one per aircraft wing.

Table 2. Description of the agent roles.

Role    Description
APO     Air Planning Officer: responsible for planning the weapons load according to mission requirements
DAPO    Deputy Air Planning Officer: accountable to the APO; responsible for planning the weapons load and communicating the plan to the magazine
MWS     Magazine Weapons Supervisor: responsible for the effective management of the MAs and for planning the weapons retrieval
MA      Magazine Artificer: responsible for retrieving weapons from the magazine and loading them onto the transportation equipment
WTS     Weapons Team Supervisor: responsible for the effective management of the weapons loading team
WA      Weapons Artificer: responsible for handling weapon systems on the flight deck and elsewhere
WLC     Weapons Loading Controller: manages the flight deck weapons loading process

The scenario task-steps and components used for the two prospective designs are shown in appendix B. Tasks in Design 1 are manual or semi-automated, while in Design 2 they are semi- or fully automated; for instance, the task “Transfer weapons to aircraft” becomes specialised into “Move trolley to aircraft” and “Drive autoload palette to aircraft”. The autoload palette has image sensors to detect the correct position on the aircraft wing and knowledge of the aircraft and weapon type, so it can automatically hoist and connect the weapons. The second design saves manpower, since it can be operated by one WA, and is potentially more rapid to operate, but it is more expensive. The systems engineer needs to compare the two designs with a sensitivity analysis to test different assumptions. The analyst can easily pinpoint the more reliable design by focusing on the comparison in the system view. Overall, most of the tasks were more reliable in Design 2 (advanced technology), at the rear of the bar chart in figure 6; however, the tasks “Schedule load” and “Report task completion” had more survivors, and hence better reliability, in Design 1. Both designs also had poor reliability for “Move trolley to aircraft” and the following checking tasks, so these are critical tasks that warrant further attention. The two designs have equal and acceptable reliability for the Load Planning task, even though it was automated in Design 2. Inspection of the agents’ properties and the BN tables shows that the information accuracy and maintenance technology properties were set to poor because the planning system was a new prototype; hence the improvement from automation was small. The poor reliability of “Move trolley to aircraft” in both designs is a consequence of the effect of environmental variables on human operation. This can be seen in the phase view in figure 6, which shows that this task and load planning both suffer from adverse environmental influences. Moving the trolley is primarily a manual task, so the system selects the NPTs which minimise the influence of the technology component; in the Design 2 autoload palette, poor maintenance settings for new technology reduce the advantage of automation. The adverse environmental influences on human and machine agents are present for both designs, reflecting the experience that manoeuvring equipment on a pitching aircraft carrier deck (sea state variable) is prone to error. Similarly, the subsequent four checking tasks are all manual and exposed to reliability influences from motivation (slips when not paying attention) and interruptions in a busy flight deck environment (concurrency variable). Solutions require human factors knowledge, which might suggest double checks to improve reliability, or improved design to support checking by augmented reality display of reminders, location of objects to check, etc.


Fig. 8: Task completion time for each task in both designs. The lower part of the bar is the best case time; the upper part is the estimated time taking agent and environment variables into account.

When the operational performance times are compared (see lower bars at the rear of figure 8), Design 2 is quicker for nearly all tasks, which is not surprising since it has more automated tasks. The projected increase from the best case task completion times for Design 1 reflects the effect of the same variables that also caused poor reliability.

Completion times for Plan and Schedule load tasks are long for both designs, which might seem strange since Design 2 partially automated both tasks. However, best case time even after automation is still long, since human checking is necessary to verify automated decision making. The projected actual times reflect the poor reliability of both designs, which can be traced to poor rating of information provided by the technology, reflecting uncertainty under operational conditions. Most tasks have more rapid best-case and estimated times in Design 2 because automated processes are quicker and the time advantage is not changed by the effect of poor reliability in some tasks, e.g. Planning, Scheduling, and Move trolley to aircraft.

The next step is to consider the critical environmental variables for both designs, illustrated in figure 9. Figure 9(a) shows that incentives, motivation, duty time, concurrency and time constraints were all marked as vulnerable for Design 1. Design 2 (figure 9(b)), in contrast, fares better, with only motivation, concurrency and maintenance marked as vulnerable. Maintenance becomes a concern for the second, more highly automated design, reflecting the NPTs selected for different levels of automation. Cures, as before, require human factors knowledge; however, some suggestions which can be found in the system database are to improve crew morale to increase motivation, or to provide incentives for these roles. Concurrency is difficult to cure, since so many tasks are prone to interruptions, while the effect of maintenance depends on the systems engineer’s judgement about the effectiveness of planned maintenance. The tool’s role is to point out problems which can be cured by changed procedures and management decisions, such as increasing investment in low-maintenance equipment.

Fig. 9(a): Environmental influences for Design 1. The arrow points to critical task. Red (darker shading) indicates adverse environmental variables.

Fig. 9(b): Environmental influences for Design 2. The arrow points to the critical task.

After identifying the most appropriate design, the problematic tasks and the critical environmental variables, the analyst investigates the improvements required for the Autoload palette component, which was the weakest link in Design 2. Using the back-propagation facility, the minimum acceptable reliability is set in the output node, and the nodes where design or operational environmental changes can be made are left unconstrained.


Fig. 10: Suggested improvements to tuple components for Design 2. The circled cells correspond to the required improvements for the generic task “Drive autoload palette to aircraft”. Dark-filled cells represent properties that are not applicable to the component.

In this case, equipment maintenance (already identified as a vulnerability) and the human operator’s experience (the only way to overcome difficult carrier deck operations) are selected. The BN shows that maintenance needs to be improved by 50%, and operator’s experience by 26% (see figure 10). Translating these into specific needs requires domain expertise; however, the tool does quantify the degree of improvement and this can be empirically tested by setting targets in a prototype system.

7 VALIDATING THE BN MODELS

We used data mining techniques to test whether the assumptions embedded in the BN models mapped the expected influences elicited from domain experts and theory. We simulated all possible permutations of the input model variables and created a database of reliability and performance time predictions for these runs. This produced an extensive set of test data; for example, for one scenario composed of four phases with six task steps in each phase, the tool generated 4 × 6 × 3^12 records. The BN models’ NPTs and causal influences were analysed with the following data mining techniques: relevance analysis, association rules and classification [25]. Relevance analysis ranks the input parameters of the model based on their relevance to one of the model’s output parameters (e.g. reliability in our BN). Association rules describe how often two or more facts co-occur in a data set, and were employed to check the causal associations in our model. Classification partitions large quantities of data into sets with common characteristics and properties, and was used to provide a further check on the structure of the BN models.

The initial assumptions made about influences on system reliability and operational performance were largely borne out. However, the relevance analysis revealed that sea state had only a minor influence on system error, although according to domain experts it is a major influence on human error. Several intermediate nodes had diluted the influence of sea state on the system error nodes, so it was necessary to alter the BN causal diagram. The two BN models for assessing operational performance with different levels of automation showed a similar influence of maintenance on operational performance, which should not be the case. These inaccuracies were addressed by altering the BNs' NPTs to increase the prior probability influence of poor maintenance on automated tasks. Association analysis identified two rules with high significance levels that were not explicitly defined in the model: IF (DutyTime = High) THEN (Survived = Fail), and IF (Workload = High) THEN (Survived = Fail). These rules indicated that the causal influences of “Duty Time” and “Workload” in the BN were stronger than the domain experts had specified. To overcome this problem we altered the NPT settings to reduce the weighting of these nodes and to increase the influence of the “Distractions” node, which appeared weak. Finally, classification analysis pinpointed problems with the crew motivation and agent ability nodes, which suggested changes to the BN model structure.
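The kind of association check that surfaced these rules can be approximated as below, using the permutation database sketched earlier; the column names and the benchmark threshold are illustrative assumptions.

  # Estimate P(Survived = Fail | X = High) from the simulated records and
  # compare it with the weighting the domain experts assigned in the NPTs.
  import pandas as pd

  runs = pd.read_csv("test_runs.csv")
  runs["Survived"] = runs["reliability"] >= 0.7   # pass/fail against a benchmark

  for var in ["DutyTime", "Workload"]:
      fail_rate = 1 - runs.loc[runs[var] == "high", "Survived"].mean()
      print(f"P(Survived=Fail | {var}=high) = {fail_rate:.2f}")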

8 DISCUSSION AND CONCLUSIONS

The main contribution of this research has been to develop automated testing of requirements specifications and designs for conformance to non-functional requirements, using a set of scenarios and variations in the system environment. This is a considerable advance over existing tools, which support validation of NFRs only by inspection of models [41]. Our automated scenario-based testing tool explicitly considers environmental influences and provides visualisations for pinpointing problematic tasks and components within a design and scenario sequence. The technology is applicable to problems where requirements are expressed as properties of components, such as the human and machine agents in our systems engineering domain. However, the configuration costs of the BNs will limit the cost-effectiveness of the technology for new green-field requirements engineering problems; on the other hand, it should pay back in brown-field domains where designs are incrementally refined and the set-up costs can be amortised over many generations of testing.


More generally, the SRA could be applied to any class of component-based problem where the selection of components needs to be optimised against non-functional criteria. The architecture is modular and scalable, allowing new NFRs to be investigated by “plugging in” the appropriate BN, as sketched below. Our work presents a new view of component-based model-checking using BNs which could, in principle, be applied to model-checking requirements at lower levels of granularity, such as black-box software component configuration. The BN approach could apply to any domain where requirements attributes can be synthesised into a predictive model of performance, effectiveness or other non-functional requirements. It can be applied to problems that can be described as a set of sequential tasks, for instance checking workflow systems expressed as sequential tasks/functions undertaken collaboratively by human and software agents.
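The following fragment suggests, under assumed names, what such a plug-in interface might look like: each NFR analyser exposes a common assessment operation, so a new NFR is investigated by registering another BN-backed assessor.

  # Hypothetical plug-in interface for NFR-specific BN assessors.
  from typing import Protocol

  class NfrAssessor(Protocol):
      name: str
      def assess(self, task_step: str, environment: dict) -> float:
          """Return the predicted NFR measure (e.g. reliability) for one step."""
          ...

  registry: dict[str, NfrAssessor] = {}

  def register(assessor: NfrAssessor) -> None:
      # "Plugging in" a new NFR means registering its BN-backed assessor.
      registry[assessor.name] = assessor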

The SRA tool was a development from our previous BN requirements analyser [26] and has partially addressed the difficult problem of scenario-based testing [4, 63]. Although there is no substitute for domain expertise in generating or acquiring scenarios, our approach can amplify scenario-based validation by systematically testing a set of assumptions that are implicit within scenarios. This enables areas of concern to be pinpointed, as well as enabling trade-off analysis between alternative designs. However, the fidelity of testing depends on the accuracy and sophistication of the BN models. There is no quick solution to validating complex models of human error and environmental influences on system failure, since exhaustive experiments on complex systems can never be complete; incorporating human factors into the assessment of systems or user interfaces has to rely on models constructed from theory and domain expertise [30, 35, 53]. We have followed both approaches in constructing our BN models.

The SRA tool is aimed at requirements investigation in complex socio-technical systems, and hence it complements model-checking tools which are more appropriate to later stages in development, when specifications of agent behaviour are available, e.g. SpecTM-RL [38] and KAOS-GRAIL [66, 67]. Other scenario-based requirements analysis tools, such as ARTSCENE [56], help to generate scenario variations automatically by pathway expansion algorithms that trace normal and alternative/exception paths through use cases, but no validation support is provided beyond suggestions for generic requirements which may be applicable to different scenario events.

The use of BNs by Fenton et al. in their work on software metrics and risk analysis [12, 15, 18] is closely related to our approach. However, they employed BNs to assess the quality of software systems based on the properties of system specifications, the development process and code. Their use of BNs assumes a static view, whereas we have extended Bayesian tests to a dynamic view of operational scenarios by introducing the notion of test survivors, to avoid the problems of Bayesian reasoning over multiple sequence states. They do not consider operational testing with scenarios. In the JSIMP tool, Fenton and Cates [14] provide predictions of project failures based on BN analysis of project management practices. Users enter scenario information via a questionnaire interface and obtain probability distributions of unknown variables using the back-propagation facilities of BNs, which are also incorporated within our tool. Although the JSIMP tool has an end-user interface that hides the complexities of the BN from the user, it does not include visualisation facilities as sophisticated as those of our SRA tool, which allows the analyst to compare multiple model assessments over a variety of scenario sequences and environmental conditions.
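A rough sketch of the test-survivor idea, with hypothetical step names and simple stand-ins for BN reliability queries, is:

  # A run survives a scenario only if its predicted reliability stays above
  # the benchmark at every task step, so no Bayesian reasoning is needed over
  # the combined multi-step state space.
  from typing import Callable, Sequence

  def survives(steps: Sequence[str],
               reliability: Callable[[str], float],
               threshold: float = 0.7) -> bool:
      return all(reliability(step) >= threshold for step in steps)

  scenario = ["detect aircraft", "drive autoload palette to aircraft", "load"]
  variations = [lambda step: 0.9, lambda step: 0.65]  # one per environmental setting
  print(sum(survives(scenario, r) for r in variations), "of", len(variations), "survived")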

There is no shortage of scenario-based tools for requirements validation and verification; however, all of these tools use more detailed specifications of system behaviour, which will not exist in the early stages of the requirements process or in domains with black-box component-based design. For instance, Ryser and Glinz [55] convert natural language scenarios into statecharts which are in turn used to generate test cases for system validation. In common with our tool, the scenario conversion process is manual and labour-intensive, so one future direction in our work will be to investigate information extraction tools [57] which may be able to partially automate the generation of scenario event sequences from text-based narratives. Like the ARTSCENE environment, the SCENT method [55] only provides automated derivation of possible test cases, with no assistance in validating requirements specifications. Zhu and Jin [71] also used formalised scenarios for validating requirements, based on the principles of activity lists [2], but did not provide any validation of non-functional requirements.


Although our approach has delivered an analysis tool for investigating system requirements, there are some limitations to its applicability. First, we assume single-threaded tasks. While this is true in highly trained military domains with event-driven scenarios, it will not be the case in domains where opportunistic behaviour is the norm. Another simplification is that we do not model concurrency and communication in our scenarios. Since our scenarios are single-threaded, concurrency is not a severe problem; furthermore, we argue that because the SRA tool uses approximate models, its value lies not in the diagnosis of a completely realistic task model but in the comparative assessment of two (or more) designs using the same set of scenarios and analysis approach. Given these limitations, the SRA provides a reasonable trade-off between modelling effort and diagnostic power. However, in our ongoing research we are investigating concurrent scenarios and communication within the BN analysis.

REFERENCES

[1] J. S. Anderson and B. Durley, “Using scenarios in deficiency-driven requirements engineering,” presented at Requirements Engineering RE'93, 1993.
[2] J. S. Annett and K. D. Duncan, “Task analysis and training design,” Occupational Psychology, vol. 41, pp. 211-221, 1967.
[3] R. W. Bailey, Human Performance Engineering: A Guide for System Designers. Englewood Cliffs, NJ: Prentice Hall, 1982.
[4] J. M. Carroll, Scenario-Based Design: Envisioning Work and Technology in System Development. New York: Wiley, 1995.
[5] J. M. Carroll, M. B. Rosson, G. Chin, and J. Koenemann, “Requirements development in scenario-based design,” IEEE Transactions on Software Engineering, vol. 24, pp. 1156-1170, 1998.
[6] K. Casey and C. Exton, “A Java 3D implementation of a geon-based visualization tool for UML,” presented at PPPJ, Kilkenny, Ireland, 2003.
[7] S. J. Cunning, “Test scenario generation from structural requirements specification,” presented at Symposium on Engineering of Computer-Based Systems (ECBS '99), Nashville, TN, USA, 1999.
[8] A. Davis and P. Hsia, “Giving voice to requirements engineering,” IEEE Software, vol. 11, pp. 12-16, 1994.
[9] J. C. S. do Prado Leite and L. M. Cysneiros, “Nonfunctional requirements: From elicitation to conceptual models,” IEEE Transactions on Software Engineering, vol. 30, pp. 328-350, 2004.
[10] P. Dubois, E. Dubois, and J. Zeippen, “On the use of a formal representation,” presented at 3rd IEEE International Symposium on Requirements Engineering, Los Alamitos, CA, 1997.
[11] G. Engels, “Model-based verification and validation of properties,” Electronic Notes in Theoretical Computer Science, vol. 82, 2003.
[12] N. Fenton, “Applying Bayesian belief networks to critical systems assessment,” Critical Systems, vol. 8, pp. 10-13, 1999.
[13] N. Fenton, “A critique of software defect prediction models,” IEEE Transactions on Software Engineering, vol. 25, pp. 675-689, 1999.
[14] N. Fenton and P. Cates, “JSIMP: BN model and tool for the SIMP project,” Queen Mary (University of London), London, 30 July 2003.
[15] N. Fenton, P. Krause, and M. Neil, “Software measurement: Uncertainty and causal modeling,” IEEE Software, vol. 10, pp. 116-122, 2002.
[16] N. Fenton and B. Littlewood, Software Reliability and Metrics. Elsevier, 1991.
[17] N. Fenton and N. Maiden, “Making decisions: Using BNs and MCDA.” London: Computer Science Dept., Queen Mary and Westfield College, 2000.
[18] N. Fenton and M. Neil, “Software metrics: successes, failures and new directions,” Journal of Systems and Software, 2000.
[19] N. Fenton and S. L. Pfleeger, Software Metrics: A Rigorous Approach. London: International Thomson Computer Press, 1997.
[20] A. Fuxman, M. Pistore, J. Mylopoulos, and P. Traverso, “Model checking early requirements specifications in Tropos,” presented at International Symposium on Requirements Engineering (RE'01), Toronto, Canada, 2001.
[21] J. Galliers, S. Sutcliffe, and S. Minocha, “An impact analysis method for safety-critical user interface design,” IEEE Transactions on Software Engineering, vol. 6, pp. 341-369, 1999.
[22] A. Gemino, “Empirical comparison of animation and narration in requirements validation,” Requirements Engineering, vol. 9, pp. 153-168, 2003.
[23] A. Grau and M. Kowsari, “A validation system for object-oriented specifications of information systems,” presented at 1st East European Symposium on Advances in Databases and Information Systems (ADBIS '97), St. Petersburg, 1997.
[24] A. Gregoriades, J. E. Shin, and A. G. Sutcliffe, “Human-centred requirements engineering,” in Proceedings of RE'04, Kyoto, Japan. Los Alamitos, CA: IEEE Computer Society Press, pp. 154-164, 2004.
[25] A. Gregoriades, A. G. Sutcliffe, and H. Karanikas, “Evaluation of the SRA tool using data mining techniques,” presented at CAiSE 2003, Klagenfurt/Velden, Austria, 2003.
[26] A. Gregoriades, A. G. Sutcliffe, and J. E. Shin, “Assessing the reliability of socio-technical systems,” presented at 12th Annual Symposium INCOSE, Las Vegas, USA, 2002.
[27] K. M. Hansen, A. P. Ravn, and V. Stavridou, “From safety analysis to software requirements,” IEEE Transactions on Software Engineering, vol. 24, pp. 573-584, 1998.
[28] P. Haumer, K. Pohl, and K. Weidenhaupt, “Requirements elicitation and validation with real world scenes,” IEEE Transactions on Software Engineering, vol. 24, pp. 1036-1054, 1998.
[29] C. Heitmeyer, J. Kirby, and B. Labaw, “Applying the SCR requirements method to a weapons control panel: An experience report,” presented at FMSP '98, Clearwater Beach, FL, USA, 1998.
[30] E. Hollnagel, Cognitive Reliability and Error Analysis Method. Elsevier Science, 1998.
[31] E. Hollnagel, Human Reliability Analysis: Context and Control. New York: Academic Press, 1993.
[32] E. Hollnagel, “The phenotype of erroneous actions: Implications for HCI design,” in Human-Computer Interaction and Complex Systems, G. Weir and J. Alty, Eds. London: Academic Press, 1990.
[33] P. Hsia, A. Davis, and D. Kung, “Status report: Requirements engineering,” IEEE Software, vol. 10, pp. 75-79, 1993.
[34] R. Jeffords and C. Heitmeyer, “A strategy for efficiently verifying requirements specifications using composition and invariants,” presented at ESEC/FSE '03, Helsinki, Finland, 2003.
[35] B. I. Kirwan, A Guide to Practical Human Reliability Assessment. London: Taylor and Francis, 1994.
[36] V. Lalioti, “Animation for validation of business system specifications,” presented at 30th Hawaii International Conference on System Sciences, The dynamics of business systems engineering, Wailea, Hawaii, pp. 7-10, January 1997.
[37] V. Lalioti and P. Loucopoulos, “Visualisation of conceptual specifications,” Information Systems, vol. 19, pp. 291-309, 1994.
[38] N. G. Leveson, “Intent specifications: An approach to building human-centered specifications,” IEEE Transactions on Software Engineering, vol. 26, pp. 15-35, 2000.
[39] N. G. Leveson, Safeware: System Safety and Computers. Reading, MA: Addison Wesley, 1995.
[40] J. Mylopoulos, L. Chung, and B. Nixon, “Representing and using non-functional requirements: A process-oriented approach,” IEEE Transactions on Software Engineering, vol. 18, pp. 483-497, 1992.
[41] J. Mylopoulos, L. Chung, and E. Yu, “From object-oriented to goal-oriented requirements analysis,” Communications of the ACM, vol. 42, pp. 1-7, 1999.
[42] M. Neil, N. Fenton, and L. Nielsen, “Building large-scale Bayesian networks,” The Knowledge Engineering Review, vol. 15, pp. 257-284, 2000.
[43] B. Nixon, “Management of performance requirements for information systems,” IEEE Transactions on Software Engineering, vol. 26, pp. 1122-1146, 2000.
[44] D. Norman, The Psychology of Everyday Things. New York: MIT Press, 1988.
[45] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Francisco: Morgan Kaufmann, 1988.
[46] C. Potts, “ScenIC: A strategy for inquiry-driven requirements determination,” presented at RE'99: International Symposium on Requirements Engineering, Limerick, Ireland, 1999.
[47] C. Potts and A. Anton, “A representational framework for scenarios of system use,” Requirements Engineering, vol. 3, pp. 219-241, 1998.
[48] C. Potts, K. Takahashi, and A. Anton, “Inquiry-based requirements analysis,” IEEE Software, vol. 11, pp. 21-32, 1994.
[49] C. Potts, K. Takahashi, J. Smith, and K. Ota, “An evaluation of inquiry-based requirements analysis for an Internet service,” presented at Second International Symposium on Requirements Engineering, York, UK, 1995.
[50] J. Rasmussen, “Human error and the problem of causality in analysis of accidents,” Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, vol. 327, pp. 449-462, 1990.
[51] J. Rasmussen, “Skills, rules, knowledge; signals, signs, and symbols; and other distinctions in human performance models,” IEEE Transactions on Systems, Man and Cybernetics, vol. 13, pp. 257-266, 1983.
[52] J. Reason, Human Error. New York: Cambridge University Press, 1990.
[53] J. Reason, Managing the Risks of Organizational Accidents. Aldershot: Ashgate, 2000.
[54] C. Rolland, C. Souveyet, and C. B. Achour, “Guiding goal modeling using scenarios,” IEEE Transactions on Software Engineering, vol. 24, pp. 1055-1071, 1998.
[55] J. Ryser and M. Glinz, “A scenario-based approach to validating and testing software systems using statecharts,” presented at 12th International Conference on Software and Systems Engineering and their Applications (ICSSEA '99), Paris, France, 1999.
[56] N. Seyff, P. Grunbacher, N. Maiden, and A. Toscar, “Requirements engineering tools go mobile,” presented at International Conference on Software Engineering (ICSE '04), Scotland, 2004.
[57] J. E. Shin, A. Sutcliffe, and A. Gregoriades, “Scenario advisor tool for requirements engineering,” Requirements Engineering, published online: http://www.springerlink.com/app/home/journal.asp?wasp=m3tlwhruwl4u54qmhqvl&referrer=parent&backto=linkingpublicationresults,1:102830,1, 2004.
[58] A. G. Sutcliffe, The Domain Theory: Patterns for Knowledge and Software Reuse. Mahwah, NJ: Lawrence Erlbaum Associates, 2002.
[59] A. G. Sutcliffe, J. Galliers, and S. Minocha, “Human errors and system requirements,” presented at 4th IEEE International Symposium on Requirements Engineering, Los Alamitos, 1999.
[60] A. G. Sutcliffe and A. Gregoriades, “Validating functional system requirements with scenarios,” in Proceedings of 1st IEEE Joint International Conference on Requirements Engineering (RE'02), Essen, Germany, Sept. 2002, S. Greenspan, J. Siddiqi, E. Dubois, and K. Pohl, Eds. Los Alamitos, CA: IEEE Computer Society Press, pp. 181-190, 2002.
[61] A. Sutcliffe, N. Maiden, S. Minocha, and M. Darrel, “Supporting scenario-based requirements engineering,” IEEE Transactions on Software Engineering, vol. 24, pp. 1072-1088, 1998.
[62] A. G. Sutcliffe and G. Rugg, “A taxonomy of error types for failure analysis and risk assessment,” International Journal of Human-Computer Interaction, vol. 10, pp. 381-406, 1998.
[63] A. G. Sutcliffe and M. Ryan, “Assessing the usability and efficiency of design rationale,” presented at Human-Computer Interaction: INTERACT '97, IFIP/Chapman and Hall, 1997.
[64] A. D. Swain and H. Guttmann, “Handbook of human reliability analysis with emphasis on nuclear power plant applications,” Nuclear Regulatory Commission, Washington, DC, 1983.
[65] A. van Lamsweerde, “Goal-oriented requirements engineering: A guided tour,” presented at Fifth IEEE International Symposium on Requirements Engineering (RE '01), 2001.
[66] A. van Lamsweerde, “Goal-oriented requirements engineering: A roundtrip from research to practice,” presented at Requirements Engineering Conference, Kyoto, Japan, 2004.
[67] A. van Lamsweerde and E. Letier, “Handling obstacles in goal-oriented requirements engineering,” IEEE Transactions on Software Engineering, vol. 26, pp. 978-1005, 2000.
[68] M. Visser and P. A. Wieringa, “PREHEP: Human error probability based process unit selection,” IEEE Transactions on Software Engineering, vol. 31, pp. 1-15, 2001.
[69] D. Wright and K. Cai, “Representing uncertainty for safety critical systems,” City University, London, 1994.
[70] E. Yu and J. Mylopoulos, “Towards modelling strategic actor relationships for information systems development, with examples from business process reengineering,” presented at 4th Workshop on Information Technologies and Systems, Vancouver, BC, Canada, 1994.
[71] H. Zhu and L. Jin, “Scenario analysis in an automated tool for requirements engineering,” Requirements Engineering, vol. 5, pp. 2-22, 2000.
[72] H. Ziv and D. J. Richardson, “Constructing Bayesian-network models of software testing and maintenance uncertainties,” presented at International Conference on Software Maintenance, Bari, Italy, September 1997.


Appendix A: BN models: summary of input nodes and measurements

Node               Description + measure                        Worst-case settings
Noise              Ambient noise: decibels (dB)                 >100 dB
Visibility         Visibility from vessel in nautical miles     —
Concurrency        Number of concurrent tasks                   >3 concurrent tasks
Duty time          Agent's time on duty and at sea              >3 months continuously at sea
Fatigue            Time on watch, weighted by war/peace         >7 hours on duty at high alert
Time constraints   Time available to complete a task            Response necessary
