
StakeRare: Using Social Networks and Collaborative Filtering for Large-Scale Requirements Elicitation

Soo Ling Lim and Anthony Finkelstein, Member, IEEE

Abstract—Requirements elicitation is the software engineering activity in which stakeholder needs are understood. It involves identifying and prioritising requirements – a process difficult to scale to large software projects with many stakeholders. This paper proposes StakeRare, a novel method that uses social networks and collaborative filtering to identify and prioritise requirements in large software projects. StakeRare identifies stakeholders and asks them to recommend other stakeholders and stakeholder roles, builds a social network with stakeholders as nodes and their recommendations as links, and prioritises stakeholders using a variety of social network measures to determine their project influence. It then asks the stakeholders to rate an initial list of requirements, recommends other relevant requirements to them using collaborative filtering, and prioritises their requirements using their ratings weighted by their project influence. StakeRare was evaluated by applying it to a software project for a 30,000-user system, as part of a substantial empirical study of requirements elicitation. Using the data collected from surveying and interviewing 87 stakeholders, the study demonstrated that StakeRare predicts stakeholder needs accurately and arrives at a more complete and more accurately prioritised list of requirements than the existing method used in the project, taking only a fraction of the time.

Index Terms—Requirements/Specifications, Elicitation methods, Requirements prioritisation, Experimentation, Human factors, Recommender systems, Social network analysis, Stakeholder analysis


1 INTRODUCTION
Software systems are growing. The increase in size extends beyond mere lines of code or number of modules. Today, projects to build large software systems involve vast numbers of stakeholders – the individuals or groups that can influence or be influenced by the success or failure of a software project [1]. These stakeholders include customers who pay for the system, users who interact with the system to get their work done, developers who design, build, and maintain the system, and legislators who impose rules on the development and operation of the system [1, 2]. In large projects, these stakeholders cut across divisions and organisations. They have diverse needs, which may conflict.

Requirements elicitation is the software engineering activity in which stakeholder needs are understood [1]. It aims to identify the purpose for which the software system is intended [3]. It involves identifying stakeholders and prioritising them based on their influence in the project. It also involves identifying requirements from these stakeholders and prioritising their requirements.

StakeRare is a method to identify and prioritise requirements using social networks and collaborative filtering. It aims to address three problems that beset large-scale requirements elicitation: information overload, inadequate stakeholder input, and biased prioritisation of requirements.

• S.L. Lim is with the Department of Computer Science, University College London, Gower Street, London, WC1E 6BT. E-mail: [email protected].
• A. Finkelstein is with the Department of Computer Science, University College London, Gower Street, London, WC1E 6BT. E-mail: [email protected].

Information overload is inevitable in big projects. These projects tend to have many stakeholders and requirements. Existing methods for requirements elicitation require intensive interactions with the stakeholders, for example through face-to-face meetings, interviews, brainstorming sessions, and focus groups [1]. These methods lack the means to manage the information elicited from stakeholders. As such, the methods fail to scale to big projects with hundreds, thousands, or even hundreds of thousands of stakeholders [4]. Practitioners struggle to use these methods in large projects. Inevitably, stakeholders are omitted and their requirements overlooked. Users become frustrated when the software fails to meet their needs. Customers who pay for the project pay for the mistakes [5].

Inadequate stakeholder input is caused by inadequate stakeholder selection. Omitting stakeholders is one of the most common mistakes in software engineering [6]. Existing stakeholder analysis methods are likely to overlook stakeholders [7]. In addition, stakeholders are often sampled during requirements elicitation [8]. As requirements are elicited from stakeholders, omitting stakeholders results in missing requirements, which in turn leads to the wrong product being built.

Biased prioritisation of requirements occurs because current prioritisation practices depend on individuals, who may not have a global perspective in large projects [4, 9]. Although the literature suggests that prioritising from multiple stakeholders' viewpoints can reveal important


requirements [10], the task is almost impossible to perform with many stakeholders and many requirements. As a result, important requirements known to only a few stakeholders can be lost in the sea of information. Those who attempt to get multiple viewpoints find it difficult to combine information from different sources [9]. Many practitioners avoid prioritising requirements or resort to rough guesses when they prioritise requirements [9].

Above all, the existing requirements elicitation literature is largely qualitative [1, 11]. Without empirical evaluations using real projects, no-one can be certain how well one method performs against another, or indeed whether the methods work at all!

To address these problems, this work proposes a method that uses social networks and collaborative filtering for requirements elicitation. In doing so, the work makes the following contributions:

• The development of StakeRare, a novel method that uses social networks and collaborative filtering to support requirements elicitation in large-scale software projects.
  o StakeRare stands for Stakeholder- and Recommender-assisted method for requirements elicitation. StakeRare supports requirements elicitation in projects where there are many stakeholders who must be heard but who are unable to meet, possibly due to sheer numbers, dispersed locations, or lack of time. It aims to be open and inclusive so that the stakeholders can participate in requirements elicitation.
  o StakeRare uses social networks to identify and prioritise stakeholders and their roles in the project. It then asks the stakeholders to rate an initial list of requirements, recommends other relevant requirements to them using collaborative filtering, and prioritises their requirements using their ratings weighted by their project influence derived from their position on the social network.
  o StakeRare addresses information overload by using collaborative filtering to recommend relevant requirements to stakeholders, and by prioritising the stakeholders and requirements. It addresses inadequate stakeholder input by asking stakeholders to recommend other stakeholders, and by asking all stakeholders to provide requirements. It addresses biased prioritisation of requirements by prioritising requirements using the stakeholders' ratings of the requirements and their position on the social network.
• The evaluation of StakeRare using a real large-scale software project.
  o The evaluation is empirical and appears to be one of the first in requirements elicitation (as reviewed in [12]). It is substantial, using post-project knowledge to establish the ground truth of requirements. It uses measurements from the information retrieval literature, such as precision, recall, and mean absolute error, to determine the quality of the requirements returned by StakeRare. It also compares StakeRare to the existing methods used in the project.
  o The evaluation provides clear evidence that StakeRare can identify a highly complete set of requirements and prioritise them accurately. In addition, it is straightforward to use, and requires less time from the requirements engineers and stakeholders compared to the existing methods used in the project.

The rest of the paper is organised as follows. The next section reviews the existing literature. Section 3 describes the StakeRare method and Section 4 evaluates StakeRare. Section 5 identifies the limitations of the study, Section 6 describes future work, and Section 7 concludes.

2 BACKGROUND

2.1 Large-Scale Software Projects

In this work, the definition of a large-scale software project is derived from the existing measures of project size and definitions of large-scale software projects. As requirements elicitation is the focus of this work, the definition measures the size of the requirements engineering task, rather than the size of the software system.

There are a number of existing measures to size a project, leading to different views on what constitutes large-scale. Popular measures of project size include lines of code (LOC), function points (FP), number of developers, and man-hours [13-19]. LOC counts the number of non-blank, non-comment lines in the text of a software program's source code [20-24]. FP determines size by identifying the components of the system as seen by the end-user, such as the inputs, outputs, interfaces to other systems, and logical internal files. Number of developers counts the number of developers involved in the project. A man-hour or person-hour is the amount of work performed by an average worker in an hour [19].

These measures have been used to indicate the relative size of projects (Table 1) [25-28]. But the numbers that indicate size are not absolute and vary across different work. For example, McConnell [23] considered small projects to have 2,500 LOC, but Kruchten [29] considered them to have 10,000 LOC. McConnell [24] considered projects with 500,000 LOC as very large, but Kruchten [29] considered projects with 700,000 LOC as large.

These measures are more suitable for development [23, 24] and less so for elicitation. For example, a software project to solve a complicated set of differential equations may be very large in terms of LOC or man-hours, but may only have a small number of stakeholders [30]. Although such a project is considered large in terms of development effort, it is small in terms of elicitation effort [30].

TABLE 1
PROJECT SIZE AND MEASURES

| Project Size | Lines of Code  | Function Points | Number of Developers |
|--------------|----------------|-----------------|----------------------|
| Small        | < 2,000^       | < 100*          | < 5†                 |
| Large        | > 500,000*!‡   | > 5,000*        | > 50‡"               |
| Ultra-large  | 1,000,000,000~ | > 100,000$      | > 1,000#             |

(Source: [15]*, [25]^, [26]†, [23]‡, [24]~, [27]!, [32]", [28]#, [18]$)

In requirements elicitation, the number of stakeholders is often used to size a project. Cleland-Huang and Mobasher define an ultra-large-scale project to have thousands or even hundreds of thousands of stakeholders [4]. Burstin and Ben-Bassat define a large software system as

"a software system that has a large and diversified community of users, and entails a variety of human, organisational, and automated activities, and various, sometimes conflicting, aspects of different parts of its environment" [30]. Large, complex projects have multiple stakeholder groups that cut across many different agencies, divisions, and even organisations [31]. According to Northrop et al. [32] and Cheng and Atlee [11], the human interaction element makes requirements elicitation the most difficult activity to scale in software engineering.

For example, the FBI Virtual Case File project is widely cited in the existing literature as a large-scale software project [4, 33-36]. It had 12,400 users (agents who would use the software) and more than 50 stakeholder groups (the FBI consisted of 23 divisions which previously had their own IT budgets and systems, and the agents worked out of 56 field offices) [37].

This work defines a large-scale software project as a software project with dozens of stakeholder groups and tens of thousands of users, where users are members of the stakeholder groups, and a stakeholder group contains one or more stakeholder roles. These stakeholders have differing and sometimes conflicting requirements. This definition measures the size of the project in terms of the requirements engineering task, and is based on the requirements engineering literature discussed in the previous paragraphs.

2.2 Requirements Elicitation

Elicitation Techniques

In requirements elicitation, traditional techniques, such as interviews and focus groups, form the basis of existing practice [1, 38-40]. In interviews, the requirements engineers approach stakeholders with questions to gain information about their needs [41]. Focus groups bring stakeholders together in a discussion group setting, where they are free to interact with one another. These techniques are effective but require direct interaction between the requirements engineers and stakeholders. As such, they are difficult to scale to a large number of stakeholders.

More advanced elicitation techniques improve the completeness and variety of the identified requirements by catalysing discussions and exploring the stakeholders' needs. These techniques include prototyping [42], metaphors [43], storyboards [44, 45], and model-driven techniques such as use cases [38, 46], scenarios [47], and goal models [48-50]. Nevertheless, similar to traditional techniques, they require face-to-face meetings, hence do not scale well to large projects [4, 3].

Prioritisation Techniques

Projects often have more requirements than time, resources, and budget allow for. As such, requirements should be prioritised and managed so that those that are critical and most likely to achieve customer satisfaction can be selected for implementation [36, 51-53].

A prioritisation technique commonly used in practice is the numeral assignment technique [36, 53, 54]. In this technique, each requirement is assigned a value representing its perceived importance. For example, requirements can be classified as mandatory, desirable, or inessential [53]. Numeral assignment is straightforward, but a study by Karlsson [53] found that the participants' opinions about the numbers in the numeral assignment technique differ, and the scoring system is often inconsistent as different people use different personal scales. Nevertheless, this technique is widely used due to its simplicity.

Another popular technique is the pairwise comparison approach [53]. In this approach, requirements engineers compare two requirements to determine the more important one, which is then entered in the corresponding cell in a matrix [53, 55]. The comparison is repeated for all requirement pairs such that the top half of the matrix is filled. If both requirements are equally important, then they both appear in the cell. Then, each requirement is ranked by the number of cells in the matrix that contain the requirement. Pairwise comparison is simple. However, since all unique pairs of requirements need to be compared, the effort is substantial when there are many requirements [55]. Prioritising n requirements needs n × (n − 1)/2 comparisons [55, 56]. Hence, a project with 100 requirements would require 4,950 comparisons.

Many existing approaches, including those previously mentioned, prioritise requirements from an individual's perspective. Other similar approaches include the cost-value approach, which prioritises requirements based on their relative value and implementation cost [53, 55, 56], and the value-oriented prioritisation method, which prioritises requirements based on their contribution to core business values and their perceived risks [57]. As these prioritisations involve a small subset of stakeholders, the results are biased towards the perspective of those involved in the process [4].

More sophisticated methods combine prioritisations from multiple stakeholders. In the 100-point test, each stakeholder is given 100 points that they can distribute as they desire among the requirements [58]. Requirements that are more important to a stakeholder are given more points. Requirements are then prioritised based on the total points allocated to them. The 100-point test incorporates the concept of constraint in the stakeholders' prioritisation by giving each of them a limited number of points. One criticism of this approach is that it can be easily manipulated by stakeholders seeking to accomplish their own objectives [36, 59]. For example, stakeholders may


distribute their points based on how they think others will do it [60]. In addition, it is difficult for stakeholders to keep an overview of a large number of requirements [54].

In the requirements triage method, Davis [61] proposed that stakeholders should be gathered in one location and group voting mechanisms used to prioritise requirements. One method to collect group votes is to use a show of fingers to indicate the stakeholders' enthusiasm for a requirement. A disadvantage is that the relative priorities of requirements depend on the stakeholders who attended the prioritisation meeting, and dominant participants may influence the prioritisation [36].

In the win-win approach proposed by Boehm, stakeholders negotiate to resolve disagreements about candidate requirements [62, 63]. Using this approach, each stakeholder ranks the requirements privately before negotiations start. They also consider the requirements they are willing to give up on. Stakeholders then work collaboratively to forge an agreement by identifying conflicts and negotiating a solution. Win-win negotiations encourage stakeholders to focus on their interests rather than positions, negotiate towards achieving mutual gain, and use objective criteria to prioritise requirements. Nevertheless, the approach is labour intensive, particularly in large projects [59].

Another method that involves multiple stakeholders is the value, cost, and risk method proposed by Wiegers [64]. In Wiegers' method, the customer representatives estimate the value of each requirement, which is the relative benefit each requirement provides to them and the relative penalty they suffer if the requirement is not included. The project team estimates the relative cost of implementing each requirement and the relative degree of risk associated with each requirement. The priority of each requirement is calculated from its value, cost, and risk such that requirements at the top of the list have the most favourable balance of the three elements. This method is limited by the individual's ability to determine the value, cost, and risk for each requirement [64].

Many existing prioritisation methods consider requirements to have a flat structure and to be independent of one another [65]. However, requirements are often defined at different levels of abstraction. For example, a high-level requirement can be refined into several specific requirements [48, 66]. Hierarchical cumulative voting (HCV), proposed by Berander and Jönsson [54], enables prioritisations to be performed at different levels of a hierarchy. Stakeholders perform prioritisation using the 100-point test within each prioritisation block. The intermediate priorities for the requirements are calculated based on the characteristics of the requirements hierarchy. Final priorities are calculated for all requirements at the level of interest through normalisation. If several stakeholders have prioritised the requirements, their individual results are then weighted and combined. When doing so, different stakeholders may have different weights. Although the hierarchical prioritisation in HCV makes it easier for the stakeholders to keep an overview of all the requirements, the prioritisations need to be interpreted in a rational way as stakeholders can easily play around with


the numbers [54].

There is a plethora of methods to prioritise requirements, such as multi-attribute utility theory [67], top 10 requirements [41], outranking [68], minimal spanning tree [55], cost benefit analysis [69], and Quality Function Deployment [70]. Many of these methods have similar shortcomings: significant effort is required when there are many requirements, and the requirements' priorities are easy to manipulate [60]. For example, Quality Function Deployment suggests a limit of 30 requirements [71]. Cost benefit analysis depends on the types of costs the decision-makers include in the analysis, which may be biased by their vested interests [69].

One of the few methods that can scale to a large number of requirements is the binary search tree (BST) [72]. In BST, a requirement from the set of requirements is selected as the root node. Then, a binary tree is constructed by inserting less important requirements to the left and more important ones to the right of the tree. A prioritised list of requirements is generated by traversing the BST in order. The output is a prioritised list of requirements with the most important requirements at the start of the list and the least important ones at the end. This method is simple to implement but provides only a simple ranking of requirements, as no priority values are assigned to the requirements [36].

For projects with many requirements, recent work by Laurent et al. [34] and Duan et al. [36] proposes Pirogov, which uses data mining and machine learning techniques to support requirements prioritisation. Pirogov uses various clustering techniques to organise requirements into different categories. The requirements engineers then prioritise the clusters and determine the importance of each clustering technique. Using this information, Pirogov generates a list of prioritised requirements. By automatically clustering the requirements into different categories, Pirogov reduces the number of manual prioritisations. It is a significant step towards large-scale requirements elicitation. But at the moment, the results of prioritisation depend on the requirements engineers' subjective prioritisation of the clusters and clustering techniques [36].
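To make the BST approach concrete, the following is a minimal sketch of binary search tree prioritisation, assuming a `more_important` comparison supplied by the requirements engineer; in practice each comparison is a human pairwise judgement, and the requirement names and numeric importances below are hypothetical stand-ins.

```python
# Minimal sketch of binary search tree (BST) requirements prioritisation.
# `more_important(a, b)` stands in for the engineer's pairwise judgement.

class Node:
    def __init__(self, requirement):
        self.requirement = requirement
        self.left = None   # less important requirements
        self.right = None  # more important requirements

def insert(root, requirement, more_important):
    """Insert a requirement into the BST by repeated pairwise comparison."""
    if root is None:
        return Node(requirement)
    if more_important(requirement, root.requirement):
        root.right = insert(root.right, requirement, more_important)
    else:
        root.left = insert(root.left, requirement, more_important)
    return root

def prioritised(root):
    """In-order traversal from the most to the least important requirement."""
    if root is None:
        return []
    return prioritised(root.right) + [root.requirement] + prioritised(root.left)

# Hypothetical usage: numeric importances stand in for human judgements.
importance = {"single sign-on": 3, "all in one card": 5, "photo on card": 1}
cmp = lambda a, b: importance[a] > importance[b]
root = None
for req in importance:
    root = insert(root, req, cmp)
print(prioritised(root))  # ['all in one card', 'single sign-on', 'photo on card']
```

Each insertion costs a number of comparisons proportional to the depth of the tree, which is why BST scales better than filling a full pairwise comparison matrix.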

2.3 Social Network Analysis

Social network analysis is the application of methods to understand the relationships among actors, and the patterns and implications of those relationships [73]. In social network analysis, actors are discrete individuals, corporate, or collective social units, such as employees within a department, departments within a corporation, and private companies in a city [73]. These actors are linked to one another by relational or social ties, such as the evaluation of one person by another (e.g., friendship or respect), transfers of material resources (e.g., business transactions), and formal relations (e.g., authority) [73].

In social network analysis, the snowballing method proposed by Goodman [74] is used to sample social network data for large networks where the boundary is unknown [73, 75]. It is also used to track down "special" or "hidden" populations, such as business contact networks, community elites, and deviant sub-cultures [76]. Snowball


sampling begins with a set of actors [73, 75, 76]. Each of these actors is asked to nominate other actors. Then, new actors who are not part of the original list are similarly asked to nominate other actors. As the process continues, the group of actors builds up like a snowball rolled down a hill [75]. The process continues until no new actors are identified, time or resources have run out, or the new actors being named are very marginal to the actor set under study [76].

A social network is a structure that consists of actors and the relation(s) defined on them [73, 75]. It is often depicted as a graph in which the actors are represented as nodes and the relationships among pairs of actors are represented by lines linking the corresponding nodes [73, 75]. The graph can be binary or valued, directed or undirected, depending on the relations between the actors. If the relations are directed, then the links have direction, and if the relations are valued, the links have weights attached to them. Using graph structures to represent social networks enables large sets of social network data to be visualised.

The centrality of actors in their social networks is of great interest to social network analysts [75]. Actors that are more central have a more favourable position in the network [76]. For example, in a friendship network, an actor who is connected to many actors in the network is popular. In a business contact network, an actor that sits between clusters of networks has high influence on the information that passes between the clusters. A number of social network measures have been developed to measure the centrality of actors, such as betweenness centrality, load centrality, degree centrality, in-degree centrality, and out-degree centrality [73, 75, 76].

In requirements engineering, Damian et al. used social network analysis to study collaboration, communication, and awareness among project team members [77, 78]. The nodes were members of the development team who were working on, assigned to, or communicating about the requirements in the project. Social network measures, such as degree centrality and betweenness centrality, were used to analyse collaboration behaviour. For example, degree centrality indicated active members and betweenness centrality indicated members who control interactions between other members.
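The centrality measures named above are straightforward to compute with an off-the-shelf graph library. The sketch below uses networkx on a toy recommendation network; the library choice and the example actors and edges are assumptions of this sketch, not tooling described in the paper.

```python
# Illustrative sketch: computing the social network measures named above
# with networkx on a toy recommendation network.
import networkx as nx

# Directed graph: an edge A -> B means "A recommends B".
G = nx.DiGraph()
G.add_edges_from([
    ("Alice", "Bob"), ("Alice", "Carl"),
    ("Bob", "Alice"), ("Carl", "Dave"), ("Dave", "Alice"),
])

betweenness = nx.betweenness_centrality(G)  # brokerage between clusters
in_degree = nx.in_degree_centrality(G)      # how often an actor is named
out_degree = nx.out_degree_centrality(G)    # how many others an actor names
load = nx.load_centrality(G)                # a betweenness variant

# Rank actors by betweenness, highest influence first.
for actor, score in sorted(betweenness.items(), key=lambda kv: -kv[1]):
    print(f"{actor}: {score:.3f}")
```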

2.4 Collaborative Filtering

Collaborative filtering is a technique to filter large sets of data for information and patterns [79]. This technique is used in recommender systems to forecast a user's preference for an item by collecting preference information from many users [80]. For example, Amazon (http://www.amazon.com/) uses collaborative filtering to recommend books to its customers and MovieLens (http://www.movielens.org/) uses it to recommend movies [80, 81]. The underlying assumption is that users who have had similar taste in the past will share similar taste in the future [82].

In collaborative filtering, users are the individuals who provide ratings to a system and receive recommendations from the system. Items can consist of anything for which ratings can be provided, such as art, books, songs, movies, vacation destinations, and jokes [83]. A rating is a numerical representation of a user's preference for an item. A profile is the set of ratings that a particular user has provided to the system. Collaborative filtering systems take a set of ratings from the user community as input, use this set of ratings to predict missing ratings, and use the predictions to create a list of items personalised to each user. This list of items is then presented to the user as recommendations [82].

To produce predictions, collaborative filtering systems use a variety of algorithms. One of the most well-known is the k-Nearest Neighbour (kNN) algorithm [80, 84, 85]. kNN identifies like-minded users with similar rating histories in order to predict ratings for unobserved user-item pairs [85]. kNN first finds a unique subset of the community for each user by identifying those with similar interests. To do so, every pair of user profiles is compared to measure the degree of similarity. A popular similarity measure is Pearson's correlation coefficient, which measures the degree of linear relationship between the ratings in the intersection of the pair of users' profiles [80]. Then, a neighbourhood is created for each user by selecting the k most similar users. The similarity between each pair of user profiles for users in the neighbourhood is used to compute predicted ratings. Finally, the predicted ratings for the items are sorted according to the predicted value, and the top-N items are proposed to the user as recommendations, where N is the number of items recommended to the user [80].

In requirements engineering, Castro-Herrera et al. use collaborative filtering to facilitate online discussions for requirements identification [86, 87]. Their method, named Organiser and Promoter of Collaborative Ideas (OPCI), uses clustering to group the stakeholders' ideas into an initial set of discussion forums and construct a stakeholder profile for each stakeholder. These profiles are used by the kNN algorithm to identify stakeholders with similar interests and suggest additional forums that might be of interest to the stakeholders. By recommending suitable forums to stakeholders, OPCI aims to encourage stakeholders to contribute to relevant forums and increase the quality of the elicited requirements.

OPCI uses collaborative filtering to recommend forums of interest to stakeholders. It has inspired the work described in this paper to use collaborative filtering to recommend requirements of interest to stakeholders, in order to support large-scale requirements elicitation. Recommending relevant requirements to stakeholders can reduce the number of requirements each stakeholder has to identify and prioritise, while still ensuring they are aware of the requirements they may be interested in.
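As an illustration of the kNN scheme just described – Pearson correlation over co-rated items, a neighbourhood of the k most similar users, and a similarity-weighted average as the prediction – here is a minimal self-contained sketch. The user profiles are hypothetical, and the helper names are this sketch's own.

```python
# Minimal sketch of kNN collaborative filtering with Pearson similarity.
from math import sqrt

profiles = {  # hypothetical user -> {item: rating} profiles
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2, "c": 5, "d": 4},
    "u3": {"a": 1, "b": 5, "d": 2},
}

def pearson(p, q):
    """Pearson correlation over the intersection of two profiles."""
    common = set(p) & set(q)
    n = len(common)
    if n < 2:
        return 0.0
    mp = sum(p[i] for i in common) / n
    mq = sum(q[i] for i in common) / n
    cov = sum((p[i] - mp) * (q[i] - mq) for i in common)
    sp = sqrt(sum((p[i] - mp) ** 2 for i in common))
    sq = sqrt(sum((q[i] - mq) ** 2 for i in common))
    return cov / (sp * sq) if sp and sq else 0.0

def predict(user, item, k=2):
    """Predict `user`'s rating of `item` from the k most similar raters."""
    neighbours = [(pearson(profiles[user], p), p[item])
                  for u, p in profiles.items()
                  if u != user and item in p]
    top = sorted(neighbours, reverse=True)[:k]
    num = sum(sim * r for sim, r in top if sim > 0)
    den = sum(sim for sim, r in top if sim > 0)
    return num / den if den else None

print(predict("u1", "d"))  # prediction for an item u1 has not rated
```

Sorting all predicted ratings for a user and keeping the first N of them yields the top-N recommendation list described above.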


Fig. 1. StakeRare steps.

Step 1: Identify and prioritise stakeholders
• Input: none.
• Substeps: (1) The requirements engineer provides initial stakeholders. (2) Initial and newly identified stakeholders provide recommendations. (3) StakeRare builds the social network. (4) StakeRare prioritises stakeholders and roles using social network measures.
• Output: prioritised list of stakeholders and roles.

Step 2: Collect profile
• Input: stakeholder list from Step 1.
• Substeps: (1) The requirements engineer provides initial requirements. (2) Stakeholders in the stakeholder list rate requirements and provide other requirements. (3) StakeRare propagates ratings.
• Output: profiles of stakeholders who responded.

Step 3: Predict requirements
• Input: profiles from Step 2.
• Substeps: (1) StakeRare applies collaborative filtering algorithms to the profiles to predict unrated requirements. (2) StakeRare recommends requirements that may be relevant to the stakeholders. (3) Stakeholders rate recommended requirements. Substeps 1-3 can be repeated using the updated profiles.
• Output: updated profiles.

Step 4: Prioritise requirements
• Input: stakeholder list from Step 1, and profiles from Step 3.
• Substeps: (1) StakeRare calculates the project influence of each stakeholder. (2) StakeRare calculates a score for each requirement. (3) StakeRare prioritises requirements based on score.
• Output: prioritised list of requirements.

2.5 Summary

In this paper, a large-scale software project is defined as a software project with dozens of stakeholder groups and tens of thousands of users, where users are members of the stakeholder groups. These stakeholders have differing and sometimes conflicting requirements.

Existing methods to identify and prioritise requirements do not scale well to large projects. Most elicitation methods require face-to-face meetings with the stakeholders, hence are time-consuming when there are many stakeholders. Existing requirements prioritisation methods require substantial effort from the requirements engineers when there are many requirements. Furthermore, requirements prioritisation from an individual's perspective is likely to be biased, especially in large projects where no individual has a global perspective.

An ideal requirements elicitation method should identify and prioritise stakeholders and their requirements from a global perspective. It should be independent of the individual doing the analysis, and scalable to large projects. In doing so, it should not overload stakeholders with information or burden the requirements engineers. The aim of this work, described in the next section, is to develop such a method using the existing techniques in social networks and collaborative filtering described in this section.

3 STAKERARE

Large projects tend to be beset by three problems: information overload, inadequate stakeholder input, and biased prioritisation of requirements. StakeRare is a method that uses social networks and collaborative filtering to elicit requirements in large projects.

To address the problem of inadequate stakeholder input, StakeRare aims to be open and inclusive, so that a representative sample of stakeholders participates in the requirements elicitation process. As stakeholders are socially related to one another, they can be identified and prioritised using their relations.

TABLE 2
STAKERARE CONCEPTS

| Concept          | Definition |
|------------------|------------|
| Salience         | The level of influence a stakeholder has on the project [88]. Stakeholders with high salience are crucial to project success; stakeholders with low salience have marginal impact. |
| Scope            | The work required for completing the project successfully [39]. |
| Stakeholder      | An individual or a group who can influence or be influenced by the success or failure of a project [1]. |
| Stakeholder role | The stakeholder's position or customary function in the project [2]. |
| Requirement      | The real-world goals for, functions of, and constraints on software systems [3]. |
| Rating           | Numerical importance of a requirement to the stakeholder [80]. |
| Profile          | The set of requirements and their ratings provided by a stakeholder [80]. |

StakeRare exploits previous work [89] to do this. The previous work asks stakeholders to recommend other stakeholders, builds a social network with stakeholders as nodes and their recommendations as links, and prioritises stakeholders from a global perspective using social network measures [89].

To avoid overloading stakeholders with information, StakeRare uses collaborative filtering to present only the requirements that are relevant to them. StakeRare asks each stakeholder to rate an initial list of requirements, and based on the list, identifies a neighbourhood of similar stakeholders for each stakeholder. Then, it predicts other relevant requirements for the stakeholder based on the requirements provided by similar stakeholders. These predictions are presented to the stakeholder to be approved and added into their set of ratings. To avoid overloading the requirements engineers with information, StakeRare prioritises stakeholders and their requirements.


Finally, to avoid biased prioritisation of requirements, StakeRare produces a prioritised list of requirements based on each stakeholder’s ratings and their influence in the project. The stakeholders’ influence in the project is produced from a global perspective, by running the social network measures on the stakeholder network. StakeRare has four steps (Fig. 1) and uses the concepts in Table 2.

Step 1: Identify and Prioritise Stakeholders

Step 1 identifies and prioritises the stakeholders based on their influence in the project (Fig. 1). Stakeholders have to be identified, as they are the source of requirements. They have to be prioritised, as their level of influence in the project affects the priority of their requirements. The output is a prioritised list of stakeholder roles and, for each role, a prioritised list of stakeholders.

StakeRare uses StakeNet for Step 1. StakeNet is a previously published stakeholder analysis method that produces such an output [89]. StakeNet identifies an initial set of stakeholders and asks them to recommend other stakeholders and stakeholder roles. A recommendation is a triple ⟨stakeholder, stakeholder role, salience⟩, where salience is a number on an ordinal scale (e.g., 1–5). For example, in a software project to implement a university access control system, Alice, a stakeholder representing the role of Estates that manages the university's physical estate, can make a recommendation naming Bob, Bob's role in the project, and Bob's salience. StakeNet then asks Bob to recommend other stakeholders. Based on the stakeholders' recommendations, StakeNet builds a social network with stakeholders as nodes and their recommendations as links [89]. An example stakeholder network is illustrated in Fig. 2.

Fig. 2. Example stakeholder network.

Finally, StakeNet applies various social network measures, such as betweenness centrality, degree centrality, and closeness centrality, to prioritise the stakeholders in the network [89]. The social network measures produce a score for each stakeholder. The stakeholder roles are prioritised by the highest score of their stakeholders. An example output is illustrated in Table 3. Fractional ranking, or "1 2.5 2.5 4" ranking [90], is used such that if a tie in ranks occurs, the mean of the ranks involved is assigned to each of the tied items. For example, if Estates and Students have the same level of influence, then the ranks become Estates: Rank 1.5, Students: Rank 1.5, Library: Rank 3. (A small sketch of this ranking step follows Table 3.)

TABLE 3
EXAMPLE PRIORITISED LIST OF STAKEHOLDERS

| Prioritised Stakeholder Roles | Prioritised Stakeholders |
|-------------------------------|--------------------------|
| (1) Estates                   | Alice                    |
| (2) Students                  | (1) Dave (2) Carl        |
| (3) Library                   | Bob                      |
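The fractional ranking used in Step 1 can be reproduced directly from social network scores. A minimal sketch, assuming scipy and hypothetical centrality scores (scipy's rankdata averages tied ranks by default, and negating the scores makes the highest-scoring stakeholder rank 1):

```python
# Sketch of fractional ("1 2.5 2.5 4") ranking from centrality scores.
from scipy.stats import rankdata

scores = {"Alice": 0.9, "Dave": 0.7, "Carl": 0.7, "Bob": 0.4}  # hypothetical
names = list(scores)
ranks = rankdata([-scores[n] for n in names])  # 'average' method by default
for name, rank in zip(names, ranks):
    print(name, rank)
# Alice 1.0, Dave 2.5, Carl 2.5, Bob 4.0
```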

Step 2: Collect Profile

Step 2 collects a profile from each stakeholder identified in Step 1 (Fig. 1). Existing elicitation methods from the background section, such as interviews with a subset of stakeholders or focus groups, can be used to identify an initial list of requirements. Using the university access control project in Step 1 as an example, an interview with Alice from Estates revealed that one of the project objectives is to provide "better user experience." Bob, representing the library, revealed that his requirement is "to combine library card with access card," student Dave's requirement is "to combine access card with bank card," and Alice, representing the Estates, requested "all in one card."

As mentioned in the background section, requirements can be defined at different levels of abstraction and a high-level requirement can be refined into several specific requirements [48, 66]. In this example, the requirements are organised into a hierarchy of three levels: project objective, requirement, and specific requirement³. Achieving all the specific requirements means that the parent requirement is achieved, and achieving all the parent requirements means that the project objective is achieved. For example, the requirement "all in one card" falls under the project objective "better user experience," as it is easier to carry one card for all purposes (Fig. 3). The various card combinations are then specific requirements under "all in one card."

Fig. 3. The hierarchy of requirements⁴.


³ Project objectives describe specific and measurable goals for the project, requirements describe what must be delivered to meet the project objectives, and specific requirements describe what must be delivered to meet the requirements. The hierarchy of requirements and their classification are determined by the requirements engineer, and may be modified during the elicitation process (e.g., stakeholders may provide more details for a specific requirement, resulting in an additional level to the hierarchy).
⁴ A requirement such as "to combine library card with access card, unless access card is also bank card" should be placed under "to combine library card with access card" as it is more specific.


The stakeholders identified in Step 1 are asked to provide their preferences on the initial requirements. A preference is a triple ⟨stakeholder, requirement, rating⟩, where rating is a number on an ordinal scale (e.g., 0–5) reflecting the importance of the requirement to the stakeholder (e.g., 0 is unimportant and 5 is very important). For example, Alice provides a preference ⟨Alice, to combine library card with access card, 5⟩.

Stakeholders can also indicate requirements that they actively do not want (e.g., by rating the requirement an X). For example, Bob can provide a preference whose rating is X for a requirement he actively does not want implemented.

TABLE 4
STAKERARE ASSUMPTIONS

| ID | Assumption |
|----|------------|
| A1 | Specific requirements when unrated by the stakeholder have the same rating as their parent requirement, and the stakeholder agrees with the decomposition of the requirement into the specific requirements. (This assumption does not always hold. For example, the specific requirements may be in conflict, the stakeholder may have unequal preference among the specific requirements, and the rating for specific requirements may be higher than the requirements.) |
| A2 | If a stakeholder cares about a specific requirement, they would care equally about the parent requirement. (This assumption does not always hold. For example, a stakeholder who actively does not want a specific requirement may still rate the higher-level requirement positively.) |
| A3 | A stakeholder's project influence is determined by their role in the project. |
| A4 | The stakeholder represents only the role that gives him the highest weight. |

Stakeholders can also rate requirements not in the list by adding their own requirements. The requirements added are then available to be rated by other stakeholders. If a requirement provided by a stakeholder does not have any specific requirements, specific requirements can be identified using existing elicitation methods (e.g., interviews) and added to the list to be rated.

Finally, StakeRare propagates the ratings of requirements to avoid missing values. Rating propagation enables StakeRare to make prioritisations and predictions at different levels of detail. If a stakeholder rates a high-level requirement but does not rate the lower-level requirements, then his rating propagates down to the lower-level requirements. For example, Carl provides a preference ⟨Carl, all in one card, 4⟩. Since Bob and Dave provided specific requirements for this requirement, Carl then implicitly provides two other preferences, ⟨Carl, to combine library card with access card, 4⟩ and ⟨Carl, to combine access card with bank card, 4⟩. This propagation assumes that specific requirements when unrated by the stakeholder have the same rating as their parent requirement, and the stakeholder agrees with the decomposition of the requirement into the specific requirements (Table 4 A1). Similarly, if a stakeholder rates a lower-level requirement but does not rate the high-level requirement, then his rating propagates up to the high-level requirement. This propagation assumes that if a stakeholder cares about a specific requirement, they would care equally about the parent requirement (Table 4 A2). If more than one specific requirement is rated, then the maximum rating is propagated.
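The propagation rules above (A1 downwards, A2 upwards with the maximum rating) can be sketched as follows. The hierarchy and ratings reuse the running card example; the data structure and function names are this sketch's own, not StakeRare's implementation.

```python
# Sketch of StakeRare's rating propagation over a requirements hierarchy:
# unrated children inherit the parent's rating (A1), and an unrated parent
# receives the maximum of its children's ratings (A2).

children = {  # parent requirement -> specific requirements
    "all in one card": [
        "to combine library card with access card",
        "to combine access card with bank card",
    ],
}

def propagate(ratings):
    """Return a copy of `ratings` with values propagated down, then up."""
    full = dict(ratings)
    for parent, kids in children.items():
        if parent in full:                      # propagate down (A1)
            for kid in kids:
                full.setdefault(kid, full[parent])
        rated_kids = [full[k] for k in kids if k in full]
        if parent not in full and rated_kids:   # propagate up (A2), max rating
            full[parent] = max(rated_kids)
    return full

# Carl rated only the parent requirement, so both specific requirements
# implicitly receive his rating of 4.
print(propagate({"all in one card": 4}))
```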

Step 3: Predict Requirements

Based on the stakeholders' profiles, Step 3 uses collaborative filtering to predict other requirements that each stakeholder needs or actively does not want (Fig. 1). StakeRare uses the k-Nearest Neighbour (kNN) algorithm described in the background section. Cross-validation is used to find the optimal value for k. kNN finds similar stakeholders by measuring the similarity between the stakeholders' profiles. Then, it generates the predicted level of interest that a stakeholder will have in a requirement that he has not yet rated. StakeRare returns requirements that may be relevant to the stakeholder (i.e., requirements with the highest predicted level of interest) as recommendations at all three levels (e.g., Fig. 4).

Fig. 4. StakeRare's output for Alice at the specific requirements level: "StakeRare recommends the following requirements to you: 1. Card to have features to prevent sharing. 2. To combine access card with fitness centre card. The recommendations are based on the requirements you have rated: To combine library card with access card; To combine ID card with session card."

Stakeholders can then rate the requirements that are recommended to them, provide new requirements, or rate other requirements. The new ratings by the stakeholders are then added to their profiles, and Step 3 is repeated with the updated profiles. Step 3 can be repeated until stakeholders provide no new ratings or requirements after a round of recommendations.
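One way to realise the cross-validation for k mentioned above is sketched below using the scikit-surprise library; the library choice, the tiny preference table, and the candidate values of k are all assumptions of this sketch, since the paper does not prescribe a particular implementation.

```python
# Sketch of picking k for the kNN predictor by cross-validation on MAE,
# using the scikit-surprise library (an assumption of this sketch).
import pandas as pd
from surprise import Dataset, KNNBasic, Reader
from surprise.model_selection import cross_validate

# Hypothetical (stakeholder, requirement, rating) preferences.
df = pd.DataFrame({
    "stakeholder": ["Alice", "Alice", "Carl", "Carl", "Dave"],
    "requirement": ["r1", "r2", "r1", "r3", "r2"],
    "rating": [5, 3, 4, 4, 2],
})
data = Dataset.load_from_df(df, Reader(rating_scale=(0, 5)))

for k in (1, 2, 5, 10):
    algo = KNNBasic(k=k, sim_options={"name": "pearson", "user_based": True},
                    verbose=False)
    results = cross_validate(algo, data, measures=["MAE"], cv=2, verbose=False)
    print(k, results["test_mae"].mean())  # pick the k with the lowest MAE
```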

Step 4: Prioritise Requirements

For the final step, StakeRare aggregates all the stakeholders' profiles into a prioritised list of requirements (Fig. 1). The ratings from the stakeholders' profiles, and the priority of the stakeholders and their roles from Step 1, are used to prioritise requirements. Negative ratings (from a stakeholder actively not wanting a requirement) are excluded from the calculation, as their purpose is to highlight conflicts to the requirements engineers, rather than to prioritise the requirements.

To calculate the importance of a requirement in a project, the influence of the stakeholder's role in the project is determined, and then the influence of the stakeholders in their roles is determined, as follows. The influence of stakeholder i's role in the project is calculated using Equation 3-1:

$$Influence_{role(i)} = \frac{RR_{max} + 1 - rank(role(i))}{\sum_{j=1}^{n} \left( RR_{max} + 1 - rank(role(j)) \right)} \qquad (3\text{-}1)$$

where role(i) is stakeholder i's role in the project, RR_max is the maximum rank of the roles in the list, rank(role(j)) is the fractional rank of role j, and n is the total number of roles in the list. Roles where none of the stakeholders provide ratings are excluded. As lower rank values correspond to higher influence, this calculation inverts the rank value by subtracting it from the upper bound of RR_max + 1. The calculation also normalises the influence of a role by dividing it by the sum of all role influences. An example prioritised list of stakeholder roles is Estates: Rank 1.5, Students: Rank 1.5, and Library: Rank 3 (fractional ranking is used where Estates and Students have the same rank). Using this example, Estates' influence is (3 + 1 − 1.5)/6 ≈ 0.42, Students' influence is the same as Estates' influence, and Library's influence is (3 + 1 − 3)/6 ≈ 0.17.

The influence of stakeholder i in the role is calculated the same way using Equation 3-2:

$$Influence_{i} = \frac{RS_{max} + 1 - rank(i)}{\sum_{k=1}^{n} \left( RS_{max} + 1 - rank(k) \right)} \qquad (3\text{-}2)$$

where RS_max is the maximum rank of all stakeholders with the same role, rank(i) is the fractional rank of stakeholder i, and n is the total number of stakeholders with the same role. Stakeholders who do not provide any ratings are excluded. Again, as lower rank values correspond to higher influence, this calculation inverts the rank value by subtracting it from the upper bound of RS_max + 1, then it normalises the influence by dividing it by the sum of all the influences of stakeholders with the same role. For roles with one stakeholder, the stakeholder's influence is its role's influence. For example, Alice's influence is 1 as she is the only stakeholder for the Estates role. The Student role has two stakeholders, Dave and Carl. Dave's influence is (2 + 1 − 1)/3 ≈ 0.67 and Carl's influence in his role is (2 + 1 − 2)/3 ≈ 0.33.

The influence of stakeholder i in a project is calculated using Equation 3-3:

$$ProjectInfluence_{i} = Influence_{role(i)} \times Influence_{i} \qquad (3\text{-}3)$$

where Influence_role(i) is the influence of the stakeholder's role in the project (Equation 3-1), and Influence_i is the influence of the stakeholder in the role (Equation 3-2). The sum of all the stakeholders' project influence is equal to 1. From the previous example, Carl's influence in the Student role is 0.33, and the Student role's influence in the project is 0.42. Hence, Carl's influence in the project is 0.33 × 0.42 = 0.1386. This calculation of project influence assumes that a stakeholder's project influence is determined by their role in the project (Table 4 A3).

The importance of a requirement R is calculated using Equation 3-4:

$$Importance_{R} = \sum_{i=1}^{n} ProjectInfluence_{i} \times r_{i} \qquad (3\text{-}4)$$

where ProjectInfluence_i is stakeholder i's influence in the project (Equation 3-3), r_i is the rating provided by stakeholder i on requirement R, and n is the total number of stakeholders who rated requirement R. Following the previous example, the requirement "To combine library card with access card" is rated 5 by Alice and 4 by Carl. Alice's influence in the project is 0.42, and Carl's influence in the project is 0.1386, hence the requirement's importance is (0.42 × 5) + (0.1386 × 4) = 2.6544. If a stakeholder has more than one role, only the position in the role that gives him the highest weight is considered. By doing so, StakeRare assumes that the stakeholder represents only the role that gives him the highest weight (Table 4 A4).

Finally, the requirements are prioritised based on their importance, where requirements with higher importance values are ranked higher. The requirements are prioritised within their hierarchy, so that the output is a ranked list of project objectives, for each project objective, a ranked list of requirements, and for each requirement, a ranked list of specific requirements. This list is StakeRare's output for the requirements engineers.
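Equations 3-1 to 3-4 can be checked numerically. The sketch below reproduces the worked example; note that it carries full precision, so the final importance is 2.6389 rather than the 2.6544 obtained above from the rounded influences 0.42 and 0.1386. The helper name is this sketch's own.

```python
# Sketch of Equations 3-1 to 3-4, reproducing the worked example above
# (Estates/Students/Library roles; Alice, Dave, Carl, Bob).

def inverted_rank_weights(ranks):
    """Map fractional ranks to normalised weights (Equations 3-1 and 3-2)."""
    rmax = max(ranks.values())
    raw = {k: rmax + 1 - r for k, r in ranks.items()}
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}

role_influence = inverted_rank_weights({"Estates": 1.5, "Students": 1.5, "Library": 3})
student_influence = inverted_rank_weights({"Dave": 1, "Carl": 2})

# Equation 3-3: project influence = role influence x influence within the role.
project_influence = {
    "Alice": role_influence["Estates"] * 1.0,   # sole Estates stakeholder
    "Dave": role_influence["Students"] * student_influence["Dave"],
    "Carl": role_influence["Students"] * student_influence["Carl"],
    "Bob": role_influence["Library"] * 1.0,     # sole Library stakeholder
}
assert abs(sum(project_influence.values()) - 1.0) < 1e-9

# Equation 3-4: importance of "to combine library card with access card",
# rated 5 by Alice and 4 by Carl.
importance = project_influence["Alice"] * 5 + project_influence["Carl"] * 4
print(round(importance, 4))  # 2.6389 at full precision
```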

4 EVALUATION

StakeRare was evaluated by applying it to a real-world large-scale software project. The stakeholders were surveyed for their requirements. The resulting lists of requirements were empirically evaluated in terms of the quality of the requirements and the accuracy of the prioritisation, by comparing them with the ground truth – the actual complete and prioritised lists of requirements in the project. StakeRare was also compared to the existing methods used in the project in terms of quality of the requirements, accuracy of the prioritisation, and the time spent using the methods. Finally, the stakeholders were interviewed and surveyed on the level of difficulty and effort in using StakeRare.

The rest of this section is organised as follows. Section 4.1 details the research questions. Section 4.2 describes RALIC, the large-scale software project used to evaluate this work. Section 4.3 describes the application of StakeRare to RALIC. Section 4.4 describes the construction of the ground truth list of requirements for RALIC and its validation. Section 4.5 describes the list of requirements produced by the existing method. Finally, Section 4.6 reveals the results. Fig. 5 summarises the evaluation described in this section.


Fig. 5. StakeRare evaluation (overview):
• Select the RALIC project; the input to all subsequent activities is the RALIC documentation.
• Apply StakeRare to RALIC (StakeRare steps: 1. identify and prioritise stakeholders; 2. survey stakeholders to collect profiles; 3. predict requirements; 4. prioritise requirements), producing the RankP, RateP, and PointP lists.
• Build the ground truth (steps: 1. identify requirements; 2. pairwise comparison; 3. prioritise requirements; 4. validate with stakeholders), producing the ground truth list.
• Get the existing method list (steps: 1. identify requirements elicited by the project team; 2. compile the list), producing the existing method list.
• Compare the lists.

4.1 Research Questions

Step 1 of StakeRare identifies and prioritises the stakeholders and their roles for RALIC. The evaluation of this step has been reported in [89] and [12]. The evaluation shows that social networks can be used to effectively identify and prioritise stakeholders. The method identified a highly complete set of stakeholders and prioritised them accurately, using less time compared to the existing method used in the project. The rest of this section describes the evaluation of the other StakeRare steps.

For Steps 2 and 4, the requirements lists produced by StakeRare were compared with the existing method and the ground truth lists of requirements. RQ1 to RQ4 below are the research questions for Steps 2 and 4.

RQ1. Identifying requirements. The existing requirements elicitation methods described in the background section involve a subset of stakeholders. In contrast, StakeRare involves all the identified stakeholders. This research question assesses how well StakeRare identifies requirements as compared to the existing method used in the project by asking:
• How many requirements identified by StakeRare and the existing method used in the project are actual requirements as compared to the ground truth?
• How many of all the actual requirements in the ground truth are identified by StakeRare and the existing method used in the project?

RQ2. Prioritising requirements. StakeRare prioritises stakeholders using social network measures, and then uses the output to prioritise requirements. This research question asks:
• How accurately does StakeRare prioritise requirements as compared to the ground truth?

RQ3. Survey response and time spent. The quality of the requirements returned by StakeRare depends on the stakeholders' motivation to participate. Also, to provide effective support in requirements elicitation, StakeRare should take less time than existing methods. This research question asks:
• Are stakeholders motivated to provide requirements for StakeRare?
• How much time did stakeholders spend in identifying and prioritising requirements as compared to the existing method in the project?

RQ4. Effective support for requirements elicitation. StakeRare aims to provide effective support for requirements elicitation by providing a predefined list of requirements for the stakeholders to rate (RateP). During the survey, two other elicitation methods were administered to explore the effectiveness of different methods. RankP asks stakeholders to enter their requirements without providing an initial list of requirements, and PointP asks stakeholders to allocate 100 points to the requirements they want in the same predefined list. In RateP and PointP, stakeholders can suggest additional requirements. This research question explores what kinds of support are effective for the requirements engineer and stakeholders by asking the following questions.
• Between the three elicitation methods RankP, RateP, and PointP, which produces the most complete list of requirements and the most accurate prioritisation for the requirements engineer? Are the results consistent regardless of the elicitation method used?
• Between the three elicitation methods RankP, RateP, and PointP, which do the stakeholders prefer?
• If stakeholders are provided with a list of all the requirements in the project, how prepared are they to rate them all?

In Step 3, collaborative filtering is used to predict other requirements a stakeholder may need based on the profile


they provide in Step 2. This step was evaluated using the standard evaluation method in the collaborative filtering literature [80, 82, 91]. The evaluation partitioned the stakeholders’ profiles into two subsets. The first subset was the training set, which the collaborative filtering algorithm learnt from. The second subset was the test set, with rating values that were hidden from the algorithm. For the evaluation, the algorithm’s task was to make predictions on all the items in the test set. The predictions were then compared to the actual hidden rating values. Using this method of evaluation, no additional input was required from the stakeholders. An alternative method to evaluate Step 3 was to make predictions based on the stakeholders’ complete profiles and ask the stakeholders to rate the recommended requirements. This option was not selected as some stakeholders were not available to be interviewed more than once. RQ5 to RQ7 as follows are the research questions for Step 3. RQ5. Predicting requirements. To recommend requirements that may be of interest to the stakeholders, StakeRare uses the kNN algorithm in collaborative filtering to identify similar stakeholders and predict their requirements. This research question asks: • How accurately can collaborative filtering predict stakeholder requirements? • Are the results consistent regardless of the elicita-

tion method used? RQ6. Predicting requirements: enhanced profiles. In Castro-Herrera et al.’s work in recommending forums to stakeholders, the stakeholders’ profiles are enhanced with stakeholder information, such as their roles in the project and their interest in different aspects of the system, to produce more accurate predictions of the stakeholders’ interest in forums [86]. This research question asks: • Does enhancing stakeholder profile by adding stakeholder information improve the accuracy of predicting stakeholder interest in requirements? • Are the results consistent regardless of the elicita-

tion method used? RQ7. Predicting requirements: other algorithms. As mentioned in the background section, kNN is a simple machine learning algorithm. Other machine learning algorithms can also be used to predict stakeholders’ interest [92]. This research question asks: • Does using other algorithms and combinations of algorithms improve the prediction accuracy? • Are the results consistent regardless of the elicita-

tion method used?
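To make the evaluation procedure above concrete, the following is a minimal sketch, assuming a user-based kNN with cosine similarity and a similarity-weighted average for prediction, evaluated by mean absolute error (MAE) on held-out ratings. The toy profiles, the hold-out fraction, and the value of k are invented for illustration; this is not the paper's implementation.

    import math
    import random

    def cosine_similarity(a, b):
        """Similarity of two rating profiles (dicts of requirement -> rating)."""
        common = set(a) & set(b)
        if not common:
            return 0.0
        dot = sum(a[r] * b[r] for r in common)
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def predict(stakeholder, requirement, train, k=3):
        """Similarity-weighted mean rating over the k most similar raters."""
        neighbours = sorted(
            ((cosine_similarity(train[stakeholder], prof), prof[requirement])
             for other, prof in train.items()
             if other != stakeholder and requirement in prof),
            reverse=True)
        top = [(sim, r) for sim, r in neighbours[:k] if sim > 0]
        if not top:
            return None
        return sum(sim * r for sim, r in top) / sum(sim for sim, _ in top)

    # Toy RateP-style profiles (ratings 0-5), invented for illustration.
    profiles = {
        "s1": {"access": 5, "library": 4, "card design": 1, "cashless": 2},
        "s2": {"access": 4, "library": 5, "cashless": 2},
        "s3": {"access": 5, "library": 4, "card design": 2, "cashless": 1},
        "s4": {"access": 3, "card design": 4, "cashless": 4},
    }

    # Hide roughly a quarter of the ratings as the test set.
    random.seed(1)
    train, test = {s: {} for s in profiles}, []
    for s, ratings in profiles.items():
        for req, rating in ratings.items():
            if random.random() < 0.25:
                test.append((s, req, rating))
            else:
                train[s][req] = rating

    # Predict the hidden ratings and report the mean absolute error.
    errors = []
    for s, req, actual in test:
        pred = predict(s, req, train)
        if pred is not None:
            errors.append(abs(pred - actual))
    print("MAE:", sum(errors) / len(errors) if errors else "no predictions made")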

4.2 The RALIC Project
The RALIC project was a software project in University College London (UCL), initiated to replace the existing access control systems at UCL and consolidate the new system with library access and borrowing. RALIC stands for Replacement Access, Library and ID Card. It was a combination of development and customisation of an off-the-shelf system. The project duration was 2.5 years and the system has been in deployment for over two years.

RALIC was selected to evaluate this work from a list of approximately 40 software projects in UCL. The selection criteria were as follows.
• Large-scale. The software project must be a large-scale software project following the definition of large-scale provided in the background section.
• Well-documented. The project must be very well documented in order to build the ground truth and existing method lists of requirements to evaluate the work.
• Available stakeholders. The stakeholders should be available for interviews.
• Completed and deployed. The project should be completed and the system should have been deployed in UCL for more than a year. This is necessary to allow sufficient time for missing stakeholders and requirements to surface in order to build the ground truth of requirements.
  o Requirements elicitation and analysis activities at the start of the project often produce a "complete enough" set of requirements [1]. Stakeholders and requirements that are omitted during the requirements phase are uncovered in later phases, such as design, development, and deployment. For example, one study described a project where all the change requests received during the first year the software was deployed were from stakeholder needs that were overlooked during the project [38].
  o Ideally, for the least biased evaluation, the proposed method should be applied to a project when it is initiated and evaluated after the system is deployed, so that post-project knowledge does not influence the results. But it is impractical to do so because big projects often take longer than the three-year duration allocated for the work5. Also, studies suggest that software projects are more likely to fail than complete successfully [94, 95], hence evaluating StakeRare using a project that has just started is risky because the project may fail before the ground truth can be built. Evaluation using a completed project comes with threats to validity, which will be discussed in Section 5.

Most of the projects in the list of available projects were either not large-scale or lacked documentation. RALIC was selected because it met the selection criteria.
• Large-scale. RALIC had a large and complex stakeholder base with more than 60 stakeholder groups. Approximately 30,000 students, staff, and visitors use the system to enter buildings, borrow library resources, use the fitness centre, and gain IT access. Besides all UCL faculties and academic departments, RALIC also involved other supporting departments such as the Estates and Facilities Division that manages UCL's physical estate, the Human Resource Division that manages staff information, the Information Services Division, Library Services, Security Services, and so on. These stakeholders had differing and sometimes conflicting requirements.
• Well-documented. RALIC was a very well documented project and the project documentation was available to build the ground truth and existing method lists of requirements.
• Available stakeholders. The majority of RALIC stakeholders were available for interviews. For stakeholders who were unavailable, other staff members were available to take their roles.
• Completed and deployed. RALIC was completed and the system had already been deployed in UCL for more than a year.

5 A study of 214 software projects in Europe found the average project duration to be over 2 years [93]. The study investigated projects of all sizes, and the duration for large-scale projects is likely to be above the average.

4.3 Applying StakeRare to RALIC
Using Step 1 of StakeRare, the stakeholders and roles for the RALIC project were identified and prioritised. The application of this step to RALIC is described in previous work [12, 89]. A total of 127 stakeholders and 70 roles were identified6. This list of stakeholders and their roles served as input to Step 2 of StakeRare to collect the stakeholders' profiles.

6 The 127 stakeholders and 70 roles were identified by applying StakeNet to RALIC. Refer to [12] and [89] for the precision and recall of this list. The 127 stakeholders include all kinds of stakeholders such as users, legislators, decision-makers, and developers.

Step 2 of StakeRare uses existing elicitation methods to identify an initial list of requirements. To reflect the actual initial list of requirements in the project, the initial list was taken from the earliest draft requirements produced by the project team. This initial list consists of 3 project objectives, 12 requirements, and 11 specific requirements.

Once the initial list of requirements was prepared, a survey was conducted to collect the profiles of RALIC stakeholders. To do so, all the stakeholders identified in Step 1 were contacted separately via email for a survey. Each survey took 30 minutes on average. In general, each stakeholder spent 10 minutes learning about StakeRare and RALIC, and 20 minutes completing the questionnaire. At the start of each survey, the respondent was provided with a cover sheet describing the survey's purpose of eliciting RALIC requirements. Then, StakeRare was introduced using a set of slides. After that, the respondent was provided with a description of RALIC and its project scope to familiarise the respondent with the project. The respondent was also asked to put themselves in the situation before RALIC was initiated when providing requirements. To prompt the respondent for recommendations, the respondent was provided with the definition of requirements, the different types of requirements, examples of each type, and a template to guide the free text provided by the respondent (Fig. 6).

Fig. 6. Requirements: examples and template.

Step 2 of StakeRare collects stakeholders' profiles by asking them to rate a predefined list of requirements (the initial requirements) and provide other requirements not in the list. In addition to this elicitation method, the work also tested two other methods: (1) without a predefined list, stakeholders provide a list of requirements and assign numeric ranks to the requirements based on their perceived importance [53], and (2) the 100-point test, where each stakeholder is given 100 points that they can distribute as they desire among the requirements [65]. In order to gather these different forms of information, a questionnaire comprising the following parts was used to gather the stakeholders' profiles. The complete questionnaire is available in [12].
(a) Stakeholder details. Respondents provide their name, position, department, and role in the project (Fig. 7(a)).
(b) Ranked profile (RankP). Respondents provide their requirements with numeric priorities (1 for most important) and X for requirements they actively do not want (Fig. 7(b)). Then, respondents provide feedback on the elicitation method in terms of three criteria: (1) level of difficulty, (2) effort required, and (3) time spent, by rating each criterion High, Medium, or Low (Fig. 7(c)). Respondents are required to complete this question before proceeding to the next, to prevent the predefined list of requirements in the next question from influencing their answers.
(c) Rated profile (RateP). Respondents rate a predefined list of requirements, from 0 (not important) to 5 (very important), with –1 for requirements they actively do not want (Fig. 7(d)). The predefined list consists of requirements extracted from the earliest draft requirements produced by the project team, to reflect the actual initial requirements in RALIC. One extra requirement was added to the list: combining the Santander Bank Card with the UCL access card (Fig. 7(d), Item 1.3.8). This requirement was being considered at the time of the survey, and the survey was an opportunity to elicit the stakeholders' views on it. Respondents are also asked to add requirements not in the predefined list and rate those requirements (Fig. 7(e)). Once they start on RateP, they cannot return to RankP. As before, respondents provide feedback on the elicitation method after they have completed the question.
(d) Point test profile (PointP). Respondents are allocated 100 points each to distribute among the requirements they want from RateP (Fig. 7(f)). The requirements include both the predefined ones and the additional ones they provide. Respondents are asked to allocate more points to the requirements that are more important to them. Again, respondents provide feedback on the elicitation method.
(e) Interest and comments. Finally, respondents reveal their interest in the RALIC project in terms of "not at all", "a little", "so so", or "a lot" (Fig. 7(g)). Respondents also provide any other comments they have on the study (Fig. 7(h)). This part aims to learn more about the respondents and collect extra information to support the analysis of the results.
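Purely as an illustration of the data that parts (a) to (d) of the questionnaire yield, the following sketch shows one possible in-memory representation of a single respondent's answers; the class and field names are assumptions, not part of StakeRare or the survey instrument.

    from dataclasses import dataclass, field

    @dataclass
    class StakeholderResponse:
        """One respondent's answers to parts (a)-(d); names are illustrative."""
        name: str
        role: str
        ranks: dict = field(default_factory=dict)    # RankP: requirement -> rank, 1 = most important
        unwanted: set = field(default_factory=set)   # RankP: requirements marked X
        ratings: dict = field(default_factory=dict)  # RateP: requirement -> -1 (unwanted) or 0..5
        points: dict = field(default_factory=dict)   # PointP: requirement -> points, summing to ~100

    r = StakeholderResponse(name="A. Respondent", role="Library Systems")
    r.ranks["import barcode from Access Systems"] = 1
    r.ratings["all in one card"] = 4
    r.ratings["Santander bank card"] = -1
    r.points["all in one card"] = 60
    r.points["import barcode from Access Systems"] = 40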


When the respondents had completed their questionnaire, they were interviewed about their survey experience and the rationale behind their answers. As with the interviews in the StakeNet survey, these interviews were semi-structured, allowing the questions to be modified and new questions to be brought up depending on what the respondents said [96]. Some questions included:
• What do you think about the StakeRare method?
• Why is requirement R bad? (If the respondent actively does not want a requirement.)
• Why is requirement R important to you? (If the respondent allocated many points to a particular requirement.)
• Which elicitation method do you prefer and why?


A total of 87 stakeholders (out of the 127 stakeholders identified in Step 1) were surveyed. Table 5 summarises the amount of data collected from the survey. RateP received the highest number of ratings, followed by PointP, and then RankP. The predefined list in RateP has the advantage of suggesting requirements that the respondents are unaware of, but has the disadvantage of enabling respondents to rate as many requirements as they like. PointP has fewer ratings than RateP, as the limitation of points encouraged the stakeholders to rate only the requirements that they needed most.

Fig. 7. StakeRare survey questionnaire.

TABLE 5
DATA COLLECTED FROM 87 RESPONDENTS

  Data                                Amount
  Stakeholder details                 87 sets of details
  RankP
    Ratings                           415 ratings
    Requirements                      51 items
    Specific requirements             132 items
  RateP
    Ratings                           2,396 ratings
    Requirements                      48 items
    Specific requirements             104 items
  PointP
    Ratings                           699 ratings
    Requirements                      45 items
    Specific requirements             83 items
  Feedback on elicitation method      783 ratings
  Interest in RALIC                   79 ratings

The data collected from the survey was processed and cleaned. The survey revealed that although all stakeholders provided short statements of their requirements (e.g., short clauses, or one to two sentences per requirement), very few stakeholders adhered to the requirements template provided to them at the start of the survey (Fig. 6). A respondent experienced in business analysis advised that, when requesting input from stakeholders, restrictions and templates should be kept to a minimum to encourage response, and the requirements engineers should process the data after collection. As RateP and PointP used a predefined list of requirements, less cleaning was required for them.


Data cleaning focused on RankP as follows.
• Same requirement, different wording. Different statements describing the same requirement were merged. For example, in RALIC, "all in one card" was used interchangeably with "one card with multiple functionality," hence the two requirements were merged.
• Same requirement, different perspective. Stakeholders with different perspectives may express the same requirement in different ways, hence these requirements were merged. For example, Library Systems had the requirement "to import barcode from Access Systems," but Access Systems had the requirement "to export barcode to Library Systems."
• One statement, many requirements. Statements containing more than one requirement were split into their respective requirements.
• Classification into the requirements hierarchy. The requirements were grouped under their respective project objectives. Specific requirements were classified into their respective requirements.
• Duplicate entries. If a stakeholder provided a requirement more than once, only the requirement with the highest rank (i.e., assigned the smallest number) was kept.
• Missing fields. A valid preference is a requirement together with its rank. Three stakeholders provided only a requirement and did not rank it. The rank of 1 was assigned to such requirements, because if a stakeholder provided only one requirement, then that requirement was assumed to be the most important to the stakeholder.
• Tied ranks. Some respondents provided tied ranks for their requirements (e.g., two requirements with the same rank). Fractional ranking was used to handle tied ranks.

In RankP, the range of the ranking depends on the number of requirements provided by the stakeholder, and this variability affects the prediction and prioritisation. Hence, normalisation was done such that the sum of all the ranks from a stakeholder adds up to 1, in order to ensure that all rankings were within the same range of 0 to 1 for each stakeholder. The rating for "actively do not want" was converted to 0.

In RateP, if the respondents entered additional requirements, then the requirements were cleaned the same way as in RankP for the items "same requirement, different wording", "same requirement, different perspective", and "one statement, many requirements". When providing additional requirements in RateP, some respondents indicated which project objective in the predefined list the requirements belonged to, hence reducing the need for classification into project objectives. For duplicate entries in RateP, the requirement with the highest rating was kept.

For PointP, the ratings were normalised such that each stakeholder's allocated points added up to 100, to remove arithmetic errors made during the survey. For duplicate entries in PointP, the requirement with the most points was kept.

In addition to the data cleaning, Step 2 of StakeRare involves propagating the ratings up and down the hierarchy. If a stakeholder rates a lower-level requirement but does not rate the high-level requirement, then StakeRare assumes the stakeholder provides the same rating to the high-level requirement. If a stakeholder rates a requirement but not its specific requirements, then StakeRare assumes the stakeholder provides the same rating to all the specific requirements. This propagation expanded the requirements' ratings into the ratings of their corresponding specific requirements, resulting in a total of 1109 ratings on specific requirements for RankP, 3113 for RateP, and 1219 for PointP7.

When plotting the specific requirements in RankP against the number of positive ratings, the result resembles a power-law distribution (Fig. 8), with a few dominating requirements to the left receiving many ratings and a long tail to the right with many requirements receiving few ratings [97]. A power-law graph is expected to emerge when there is a large population of stakeholders, a large number of ratings, and a high freedom of choice [97]. The graph for RateP has a large number of dominating requirements due to the predefined list (Fig. 9), and the graph for PointP has a long tail but no clear dominating requirements because of its point restriction (Fig. 10).
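The rank normalisation and rating propagation described above are mechanical enough to sketch. The following is a minimal illustration, not the authors' code: it assumes fractional ranking for ties, scales each stakeholder's ranks to sum to 1, and copies ratings up and down the hierarchy where a parent or child rating is missing. When several rated specific requirements compete for an unrated parent, which rating is copied is unspecified in the paper, so the sketch arbitrarily takes the first.

    def fractional_ranks(raw):
        """raw: {requirement: rank given by a stakeholder, 1 = most important}.
        Tied requirements receive the mean of the positions they occupy;
        the result is scaled so that one stakeholder's ranks sum to 1."""
        ordered = sorted(raw, key=raw.get)
        ranks, i = {}, 0
        while i < len(ordered):
            j = i
            while j + 1 < len(ordered) and raw[ordered[j + 1]] == raw[ordered[i]]:
                j += 1
            mean_position = (i + 1 + j + 1) / 2  # average of positions i+1 .. j+1
            for req in ordered[i:j + 1]:
                ranks[req] = mean_position
            i = j + 1
        total = sum(ranks.values())
        return {req: r / total for req, r in ranks.items()}

    def propagate(profile, children):
        """children: requirement -> its specific requirements. Copies ratings
        down to unrated children and up to an unrated parent."""
        out = dict(profile)
        for parent, kids in children.items():
            rated = [out[k] for k in kids if k in out]
            if parent not in out and rated:
                out[parent] = rated[0]  # which child's rating wins is an assumption
            if parent in out:
                for k in kids:
                    out.setdefault(k, out[parent])
        return out

    print(fractional_ranks({"a": 1, "b": 2, "c": 2}))
    # ~ {'a': 0.167, 'b': 0.417, 'c': 0.417} -- sums to 1
    print(propagate({"access": 5}, {"access": ["swipe card", "barcode"]}))
    # {'access': 5, 'swipe card': 5, 'barcode': 5}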

Fig. 8. RankP: Specific requirement vs. number of ratings.

7 The complete RankP, RateP, and PointP datasets are available at http://www.cs.ucl.ac.uk/staff/S.Lim/phd/dataset.html.


Fig. 9. RateP: Specific requirement vs. number of ratings.

Fig. 10. PointP: Specific requirement vs. number of ratings.

In Step 3, collaborative filtering is used to predict other requirements a stakeholder may need based on the profile they provide in Step 2. As discussed in Section 4.1, no additional input is required from the respondents.

The fourth and final step of StakeRare gathers all the stakeholders' initial ratings from Step 2 and their approved ratings from Step 3 to prioritise all the requirements for the project. This step uses the priority of stakeholders and their roles from Step 1. For this priority, the StakeNet list produced by the betweenness centrality measure was used, because the evaluation in previous work found this list to be the most accurate compared to the lists produced by the other social network measures [12, 89]. As three datasets, RankP, RateP, and PointP, were collected during the survey, three prioritised lists of requirements were produced.

4.4 Ground Truth
The ground truth of requirements in RALIC is the actual list of requirements implemented in the project, prioritised based on their importance to the project. The ground truth consists of 10 project objectives, 43 requirements, and 80 specific requirements.

Construction
The ground truth was built by the author in three stages.


Stage 1: Identify requirements. The ground truth project objectives, requirements, and specific requirements were identified by analysing project documentation such as the project plan, functional specification, meeting minutes, and post implementation report. For example, the project plan revealed that a project objective was "to improve security." A requirement to achieve the project objective was "to enable security/reception staff to validate the cardholder's identity." Two specific requirements were "to enable security/reception staff to check that the appearance of the cardholder matches the digitally stored photo" and "to enable security/reception staff to check the cardholder's role."

RALIC requirements were organised into three levels of hierarchy: project objectives, requirements, and specific requirements. A requirement that contributed towards a project objective was placed under that project objective, and a specific requirement that contributed towards a requirement was placed under that requirement. When placing requirements and specific requirements in the hierarchy, a requirement was placed under the objective that it influenced most, and similarly, a specific requirement was placed under the requirement that it influenced most.

Stage 2: Pairwise comparison. This stage uses pairwise comparison to prioritise the project objectives, requirements, and specific requirements from Stage 1. The pairwise comparison method discussed in the background section was used as it includes much redundancy and is thus less sensitive to the judgmental errors common to techniques using absolute assignments [56]. The requirements from each level were considered separately.
• Project objectives. The project objectives were arranged in an N×N matrix, where N is the total number of project objectives. For each row, the objective in the row was compared with each other objective in terms of their importance to the project. The comparison is transitive. The project objective that contributes more to the success of the project is more important. For example, in RALIC, the objective "to improve security and access control in UCL" (labelled as Security in Fig. 11) was more important than the objective "to design the access card" (labelled as Card design in Fig. 11). The objective considered to be more important in each pairwise comparison was placed in the corresponding cell of the matrix (Fig. 11). If two project objectives were equally important, they were both placed in the cell. For example, the objective "to design the access card" was equally important as the objective "to reduce cost" (labelled as Cost). Hence, in Fig. 11, they both appear in the corresponding cell of the matrix.
• Requirements. Requirements were prioritised the same way as project objectives. A requirement was more important if it contributed more towards the project objectives. For example, in RALIC, between the requirements "granting access rights" and "cashless vending," "granting access rights" was more important as it contributed highly towards the project objectives "to improve security" and "to improve processes," but "cashless vending" only contributed towards "extensible future features."
• Specific requirements. Specific requirements were prioritised the same way as requirements. A specific requirement was more important if it contributed more towards the requirements.

Fig. 11. 3×3 matrix for project objectives:

                 Security   Card design   Cost
  Security          –        Security     Security
  Card design       –           –         Cost, Card design
  Cost              –           –            –

Stage 3: Prioritise requirements by their project objectives. This stage produced the ground truth of requirements, which consists of a prioritised list of project objectives; within each project objective, a prioritised list of requirements; and within each requirement, a prioritised list of specific requirements. To produce the ground truth, project objectives were ranked by the number of cells in the matrix containing the project objective, from the most to the least. For each project objective, the requirements were prioritised from those that appeared in the most cells to the least, and for each requirement, the specific requirements were prioritised from those that appeared in the most cells to the least.
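Since Stages 2 and 3 together amount to recording winners in matrix cells and counting cell appearances, a toy sketch may help. The importance scores below stand in for the analyst's judgment and are invented; the objectives are those of Fig. 11.

    from itertools import combinations

    # Hypothetical importance scores standing in for the analyst's judgment.
    importance = {"Security": 3, "Card design": 1, "Cost": 1}

    cells = []
    for a, b in combinations(importance, 2):
        if importance[a] > importance[b]:
            cells.append({a})
        elif importance[b] > importance[a]:
            cells.append({b})
        else:
            cells.append({a, b})  # equally important: both go in the cell

    # Rank objectives by the number of cells they appear in (Stage 3).
    counts = {obj: sum(obj in cell for cell in cells) for obj in importance}
    print(sorted(counts, key=counts.get, reverse=True))
    # ['Security', 'Card design', 'Cost'] -- Card design and Cost are tied

This reproduces the outcome of Fig. 11: Security appears in two cells, while Card design and Cost appear in one each.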

Validation
The ground truth was validated by management-level stakeholders and stakeholders who were involved in a major part of the project. These stakeholders were interviewed to validate its completeness and accuracy of prioritisation. The interviews were conducted after StakeRare was applied to RALIC. As StakeRare was evaluated by surveying the stakeholders, conducting the interviews after the surveys prevented the survey answers from being influenced by the interviews. The interviews increased the confidence that the ground truth is objective and accurate, and is representative of the actual requirements and their prioritisation in RALIC.

During the interviews, the stakeholders were provided with the description of the RALIC project and reminded about the purpose of the study, which was to identify and prioritise requirements. The stakeholders were also provided with the ground truth, and its purpose of evaluating StakeRare's output was explained to them. The stakeholders were asked to inspect the ground truth and were presented with the following questions:
• Are there any project objectives, requirements, or specific requirements that are missing from the ground truth?
• Are there any project objectives, requirements, or specific requirements that should not be included in the ground truth?
• Are any of the requirements incorrectly prioritised? If so, why?

The feedback based on these questions was used to amend the ground truth. Missing requirements that were pointed out were confirmed with project documentation, as this work only considered documented requirements. The interviews revealed that the ground truth of requirements was complete. However, some stakeholders pointed out that as the project was completed a while ago, they may not have remembered all the requirements, and the best way to check for completeness would be to consult the project documentation. Disagreements in the prioritisations were confirmed with the other stakeholders and justified before amendments were made to the ground truth.

4.5 Existing Method List
The existing method list of requirements is an unprioritised list of requirements identified by the project team at the start of the project using existing methods. The project team used traditional elicitation techniques, which included meetings and interviews with key stakeholders (approximately 20 stakeholders representing 30 roles) determined by the project board. The existing method list consists of 10 project objectives, 43 requirements, and 56 specific requirements. These requirements were identified from the start of the project until the date the requirements were signed off by the project board. The project team spent about 127 hours to produce the existing method list. This number is an approximation calculated from the total number of hours the stakeholders spent in meetings until the requirements were signed off.

4.6 Method and Results
This section describes the method used to evaluate each research question in Section 4.1 and the results.

RQ1: Identifying Requirements
The first research question asks:
• How many requirements identified by StakeRare and the existing method used in the project are actual requirements as compared to the ground truth?
• How many of all the actual requirements in the ground truth are identified by StakeRare and the existing method used in the project?

Method
The lists of requirements identified by StakeRare and the existing method were compared against the ground truth in terms of precision and recall, two metrics widely used in the information retrieval literature [91]. The precision of identified requirements is the number of actual requirements in the set of identified requirements divided by the total number of identified requirements (Equation 4-1):

precision = |X ∩ GroundTruth| / |X| ,    (4-1)

where X is the set of requirements identified by StakeRare or the existing method, and GroundTruth is the set of requirements in the ground truth. The recall of identified requirements is the number of actual requirements in the set of identified requirements divided by the total number of actual requirements (Equation 4-2):

recall = |X ∩ GroundTruth| / |GroundTruth| ,    (4-2)

with X and GroundTruth as defined for precision. Both precision and recall range from 0 to 1. A precision of 1 means all the identified requirements are actual requirements. A recall of 1 means that all the actual requirements are identified.

As explained in the StakeRare method, the requirements in StakeRare are organised into a hierarchy of project objectives, requirements, and specific requirements. To measure the precision and recall of the identified requirements, both the requirements and specific requirements were considered. Project objectives were not considered because the different methods share the same project objectives. The requirements returned by the methods can be at a finer, equal, or coarser grain compared to the ground truth. If the returned requirements were at a finer grain, the results were considered a match. For example, if the ground truth contained "control access to departments," and StakeRare returned "control access to the Department of Computer Science, Department of Engineering, etc.," then StakeRare was considered to have returned a requirement that matched the ground truth. If the returned requirements were at a coarser grain than the ground truth, the results were considered not a match. If they were of equal grain, the descriptions of the requirements were compared. A match was independent of exact details such as numeric parameters. For example, if StakeRare returned the requirement "a minimum of 5 years support from the software vendor," but the actual requirement in the ground truth was "a minimum of 10 years support from the software vendor," then StakeRare was considered to have returned a matching requirement. This is because in an actual project, specific details can be modified during the project as long as the requirement has been identified.

Results
StakeRare identified the requirements in RALIC with a high level of completeness, with a 10% higher recall compared to the existing method used in the project (Fig. 12). As the existing method mainly involved decision-makers, its list omitted process-related requirements such as "enable visual checking of cardholders' roles" and "ease of installing new card readers." In the StakeRare list, these requirements were identified by stakeholders who were involved in the process. Hence, StakeRare's approach of asking stakeholders with different perspectives increased the completeness of the elicited requirements, which is critical to build a system that meets the stakeholders' needs.

The majority of the requirements missing from the StakeRare list were technical constraints such as "the database platform must support Microsoft SQL Server and/or Oracle Server" and "the system manufacturer must be a Microsoft Certified Partner," although technical stakeholders were involved in the survey. One reason could be the survey method of asking stakeholders to provide requirements on the spot, which may bias the types of requirements that can be identified using this approach. Future work should investigate this further.
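Equations 4-1 and 4-2 translate directly into set operations, as the following sketch shows. The requirement sets here are invented, and the grain-matching rules described above are assumed to have already been applied when constructing the sets.

    def precision(identified, ground_truth):
        return len(identified & ground_truth) / len(identified)

    def recall(identified, ground_truth):
        return len(identified & ground_truth) / len(ground_truth)

    ground_truth = {"access control", "photo validation", "barcode import"}
    stakerare = {"access control", "photo validation", "thumb reader"}
    print(precision(stakerare, ground_truth))  # 2/3: one identified requirement is not actual
    print(recall(stakerare, ground_truth))     # 2/3: one actual requirement is missed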

Fig. 12. Identifying requirements8.

8 Comparison between RankP, RateP, and PointP can be found in RQ4, Fig. 14.

StakeRare had a lower precision compared to the existing method. This is because in StakeRare, stakeholders are free to suggest requirements, which may not always be implemented. For example, some RALIC stakeholders wanted to replace the existing access control system with thumb readers, but this requirement was not implemented, hence lowering the precision. Other such requirements include enabling the tracking of UCL alumni, a cheap card (10 pence per card), and combining the access card with a travel card. These requirements do not appear in the existing method and ground truth lists. Nevertheless, in requirements elicitation, it is better to be more complete but less precise (identify extra requirements that are not implemented) than to be precise (identify only the requirements that are implemented) but miss out requirements [1, 39]. Due to the wide coverage of requirements in StakeRare, the majority of missing requirements are of low priority.

Not all stakeholders had requirements. For example, some developers who had not provided any requirements explained that their job was to implement the requirements given to them; others highlighted prerequisites for their job. For example, when completing the survey, the maintenance team entered technical documentation as their requirement. The StakeRare lists were less precise partly due to these requirements, as they were not in the ground truth. These requirements do not appear in the existing method list.

By asking stakeholders to indicate requirements they actively did not want, StakeRare uncovered conflicts. For example, security guards preferred not to have UCL's logo on the ID card, for security reasons should the card be lost. Fitness centre users preferred not to have too many features on the ID card, as they had to exchange their cards for their locker keys for the duration they used the gym lockers. If the ID card also served as a cash or bank card, the fitness centre users would prefer using a separate gym card to exchange for locker keys, which meant that the requirement "all in one card" would not be achieved.

Stakeholders recommended for a role may provide requirements for another role, as it is common for a stakeholder to have more than one role in a project. For example, a RALIC developer was also a user of the access cards. According to this stakeholder, "as a developer I only care about system requirements. But as a member of staff, I want to be able to access required buildings, and by rating these [access related] requirements, I am answering the questionnaire as a member of staff."

During the surveys, some respondents recommended other stakeholders to be surveyed regarding a specific requirement (e.g., the comment in Fig. 7(h)). This suggests that the stakeholder analysis step should overlap with the requirements elicitation step to improve the quality of the stakeholder list and requirements list. This finding is consistent with the existing requirements engineering literature. According to Robertson and Robertson [39], the identification of scope, stakeholders, and requirements are dependent on one another, and should be iterated until the deliverables have stabilised.

Stakeholders supported StakeRare's idea of being open and inclusive in requirements elicitation. When asked to represent all UCL students in RALIC, the student representative clarified that "I do not represent all 20 thousand students, even though they voted for me. Their views on data management, for example, can be very different from mine," and suggested that a wider body of students be surveyed using StakeRare to ensure a more representative view. A management-level stakeholder commented that StakeRare provides the opportunity to elicit different views from a large number of stakeholders, which can increase stakeholder buy-in, a vital element for project success.

RQ2: Prioritising Requirements
The second research question asks: How accurately does StakeRare prioritise requirements as compared to the ground truth?

Method
StakeRare's output for the requirements engineers was measured against the ground truth of requirements in Section 4.4 in terms of accuracy of prioritisation. The accuracy of prioritisation for project objectives is the similarity between the prioritisation of the project objectives by StakeRare and the prioritisation in the ground truth. Pearson's correlation coefficient, ρ, was used to determine the similarity [91]. Values of ρ range from +1 (perfect correlation), through 0 (no correlation), to –1 (perfect negative correlation). A positive ρ means that high priorities in the ground truth list are associated with high priorities in the list of identified requirements. The closer the values are to –1 or +1, the stronger the correlation. The statistics software by Wessa9 was used to calculate ρ in this work.

9 Wessa, P. (2009) Free Statistics Software, Office for Research Development and Education, version 1.1.23-r4, URL http://www.wessa.net/.

The computation of ρ requires the lists to be of the same size. Therefore, the measurement of ρ takes the intersection between the lists: each list is intersected with the other, and fractional ranking [90] is reapplied to the remaining elements in that list. Missing requirements and additional requirements in the StakeRare list were accounted for when answering RQ1 on the completeness of the returned requirements. For requirements, ρ was measured for each list of requirements per project objective and the results were averaged, and the standard deviation was calculated for each average. The same was done for specific requirements, by measuring ρ for each list of specific requirements per requirement, averaging the results, and calculating the standard deviation for each average. As a control, prioritised lists were produced using unweighted stakeholders (referred to as Unweighted in the results), i.e., each stakeholder's weight is 1. The existing method list was not compared as it is unprioritised.

Results
StakeRare prioritised requirements accurately compared to the ground truth (Fig. 13). In prioritising project objectives and requirements, StakeRare had a high correlation with the ground truth (ρ = 0.8 and ρ = 0.7 respectively). It was less accurate in prioritising specific requirements (ρ = 0.5). Weighting stakeholders by their influence increased the accuracy of prioritising project objectives and requirements by over 10%, but not for specific requirements. This influence is produced by applying social network measures to the stakeholder network. As such, the results show that using social networks generally improves the accuracy of prioritising requirements. Nevertheless, the low accuracy in prioritising specific requirements should be investigated further, as these requirements are crucial to system development.

StakeRare's output was presented to interested stakeholders after this work was completed. The director of Information Systems, who was experienced in options rankings for major decisions, commented that StakeRare's prioritisation is "surprisingly accurate."

Interestingly, analysis of the meeting minutes and post implementation report revealed that the project team spent a disproportionate amount of time discussing less important requirements during project meetings. According to the minutes and interviews with the stakeholders, a significant amount of time was spent discussing card design. However, the project objectives "better user experience" and "improve processes" have a higher priority than "card design" in the ground truth. The post implementation report identified a key area relating to user experience and processes that was not given adequate attention. According to the report, "A student's first experience of UCL should not be spent all day in a queue. …the queuing arrangements should be revised and a more joined up approach with Registry taken. Last year the queue for the RALIC card meant students were queuing outside – we were fortunate with the weather."
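The accuracy measurement (intersect the two lists, re-apply fractional ranking, then correlate) can be sketched as follows, assuming SciPy is available; rankdata's default "average" method is fractional ranking, and the two lists are invented for illustration.

    from scipy.stats import pearsonr, rankdata

    def prioritisation_accuracy(method_list, ground_truth_list):
        """Correlate two prioritised lists after intersecting and re-ranking."""
        common = [r for r in ground_truth_list if r in method_list]
        gt_ranks = rankdata([ground_truth_list.index(r) for r in common])
        m_ranks = rankdata([method_list.index(r) for r in common])
        rho, _ = pearsonr(gt_ranks, m_ranks)
        return rho

    gt = ["improve processes", "user experience", "card design", "cost"]
    out = ["user experience", "improve processes", "cost", "card design"]
    print(prioritisation_accuracy(out, gt))  # 0.6: positive but imperfect agreement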

Fig. 13. Prioritising requirements (standard deviation in parentheses).

StakeRare accurately prioritised the project objectives "better user experience" and "improve processes" as having higher priority than "card design." Had StakeRare been used, the project team would have realised that "card design" had a lower priority compared to the other two project objectives, and might have spent less time discussing card design and more time discussing ways to improve processes and user experience.

RQ3: Survey Response and Time Spent
The third research question asks:
• Are stakeholders motivated to provide requirements for StakeRare?
• How much time did stakeholders spend in identifying and prioritising requirements as compared to the existing method in the project?

Method
To determine the stakeholders' motivation to provide ratings, the response rate of the survey was calculated as the number of stakeholders who responded over the total number of stakeholders who were contacted, expressed as a percentage. In the calculation, "non-stakeholders" and stakeholders who had left but were yet to be replaced were excluded. For stakeholders who responded, their level of interest in the project, which was collected during the survey, was also investigated. The time spent using StakeRare was calculated using Equation 4-3:

timeStakeRare = timepredefined_list + timequestionnaire / 3 ,    (4-3)

where timepredefined_list is the time spent building the predefined list of requirements. As the predefined list of requirements was taken from the first draft requirements, timepredefined_list is the number of hours the stakeholders spent in meetings until the draft requirements were produced on 8 August 2005, which was 61 hours. timequestionnaire is the total time spent answering the questionnaire, which is the total survey time minus the time spent introducing StakeRare and interviewing the respondents after the survey, which was approximately 10 minutes per respondent. timequestionnaire is divided by 3 to consider only one elicitation method out of the three. timeStakeRare is the time spent using StakeRare from the stakeholders' perspective. Only the respondents' time was counted, as the researcher's presence while they completed the questionnaires was only to observe them for research purposes. The calculation is an approximation that assumes the three elicitation methods take an equal amount of time. The time spent using StakeRare was compared with the time spent using the existing method in the project, which was 127 hours as reported in Section 4.5.

Results
Stakeholders were motivated to provide ratings. The survey response rate was 79%, about 30% higher than the weighted average response rate without regard to technique10 [98]. Most of the stakeholders who responded were very interested in RALIC; only 13% indicated that they had little interest and 3% no interest. Those with little or no interest may not have responded in tool-based implementations of the survey.

10 In the study by Yu and Cooper [98], the sample sizes for 497 response rates from various survey methods (e.g., mail surveys, telephone surveys, personal interviews) varied from 12 to 14,785. As such, the response rate averages are weighted by the number of contacts underlying the response rate.

The time spent using StakeRare was 71 hours, 56 hours less than the time spent using the existing method in the project. While the existing method in the project returned an unprioritised list of requirements, StakeRare's list of requirements was accurately prioritised and highly complete, as shown in the previous research questions. These findings indicate that a more adequate involvement of stakeholders in this case produced better requirements.

Finally, many stakeholders preferred using StakeRare to provide requirements rather than attending lengthy meetings. In line with the existing literature, elicitation meetings used in existing methods can be time consuming and ineffective [6]. One stakeholder commented, "I was only interested in one issue, but had to sit through hours of meetings, where unrelated items were discussed. What a waste of time. With this method, I could just write it down and get on with my work!"
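As a rough consistency check of Equation 4-3, assuming the reported 20 minutes of questionnaire time applies to each of the 87 respondents: timequestionnaire ≈ 87 × 20 min ≈ 29 hours, so timeStakeRare ≈ 61 + 29/3 ≈ 71 hours, in line with the total reported above.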

RQ4: Effective Support for Requirements Elicitation
StakeRare's default elicitation method is RateP, but the evaluation administered two other elicitation methods, RankP and PointP, to explore the effectiveness of different methods. The fourth research question asks:
• Best elicitation method. Between the three elicitation methods RankP, RateP, and PointP, which produces the most complete list of requirements and the most accurate prioritisation for the requirements engineer? Are the results consistent regardless of the elicitation method used?
• Stakeholders' preference. Between the three elicitation methods RankP, RateP, and PointP, which do the stakeholders prefer?
• Rating all requirements. If stakeholders are provided with a list of all the requirements in the project, how prepared are they to rate them all?

Method
Best elicitation method. Three elicitation methods were administered to explore the effectiveness of different methods. In RankP, stakeholders were asked to enter their requirements without being provided an initial list of requirements. In RateP, stakeholders were asked to rate a predefined list of requirements and provide additional requirements. In PointP, stakeholders were asked to allocate 100 points among the requirements they want in the same predefined list as RateP. To evaluate the different elicitation methods, the lists produced from RankP and PointP were compared against the ground truth, and RQ1 and RQ2 were answered using precision, recall, and accuracy as described previously. Then, the results from RankP and PointP were compared with the results from RateP.

Stakeholders' preference. To determine the elicitation method preferred by the stakeholders, the respondents' feedback on the elicitation methods was investigated. During the survey, each respondent rated RankP, RateP, and PointP in terms of three criteria: level of difficulty, effort required, and time spent. Each criterion was rated High, Medium, or Low. The ratings were converted into numeric values (High = 3, Medium = 2, Low = 1) to be averaged. The average rating on each criterion for each elicitation method and the standard deviation for each average were calculated for RankP, RateP, and PointP. The interviews conducted with stakeholders after the surveys provided the rationale behind the stakeholders' preferences.

Rating all requirements. To determine how prepared stakeholders were to rate all the requirements, an alternative RateP questionnaire was prepared, consisting of the predefined list of requirements as well as the requirements provided by other stakeholders. An experiment was conducted with four stakeholders who rated this alternative questionnaire. The initial plan to involve more stakeholders in the experiment fell through, as the stakeholders were reluctant to rate a long list of requirements.

Results
Best elicitation method. In identifying requirements (RQ1), RateP had the best overall results, RankP had the highest recall, and PointP had the highest precision (Fig. 14). RankP had the lowest precision because stakeholders were free to express what they wanted. PointP had the highest precision as limited points encouraged stakeholders to suggest only the requirements they really needed. Although RateP collected the highest number of ratings, it had a lower recall compared to RankP. This is because in RateP, some stakeholders did not provide additional requirements after they had rated the requirements in the predefined list. Requirements missing from RateP that are in RankP are mostly specific requirements. For example, RateP has the requirement "import data from other systems" but omits the specific systems to import the data from.

In prioritising requirements (RQ2), RateP produced the most accurate prioritisation for project objectives and requirements (Fig. 15). The results in all three datasets showed that weighting stakeholders generally increased the accuracy of prioritisation. The most significant improvement was for RateP requirements, with an increase of 16 percentage points. Only for RankP requirements and RateP specific requirements was there no improvement.

The elicitation method influenced the prioritisation. For example, the project objective "extensible for future features" was prioritised disproportionately high in RateP, and disproportionately low in PointP and RankP. As RateP allowed stakeholders to rate as many requirements as they wanted, "nice to have" features were rated high. In PointP, stakeholders were given limited points, hence they allocated the points to requirements they really needed rather than future features. In RankP, developer-related project objectives such as "compatibility with existing UCL systems" and "project deliverables11" were prioritised disproportionately high. This was because developers who participated in the survey listed development requirements (e.g., the maintenance team needed the requirements documentation for their work) rather than system requirements (e.g., easier to use access cards), and the other stakeholders provided relatively fewer requirements in RankP compared to RateP and PointP.

Fig. 14. Identifying requirements: RankP, RateP, and PointP.

11 This project objective contains development related requirements such as requirements documentation, technical documentation, and change management.


Fig. 15. Prioritising requirements: RankP, RateP, and PointP (standard deviation for requirements and specific requirements in parentheses).

RateP had the advantage of suggesting requirements that the respondents were unaware of, which improved the accuracy of prioritisation. Upon looking at the predefined requirements in RateP, some stakeholders commented that they were reminded about requirements which had not crossed their minds while they were completing RankP. For example, the requirement "centralised management of access and identification information" had high priority in the ground truth. But in RankP, only one respondent provided this requirement, resulting in a biased overall prioritisation. The accuracy of prioritisation for the list of requirements under the same project objective was ρ = –0.1, indicating that the list was negatively correlated with the ground truth. In contrast, this requirement, which was provided in the predefined list in RateP, received positive ratings from 68 respondents, resulting in a prioritisation that was highly correlated with the ground truth (ρ = 0.7).

Stakeholders found it easy to point out requirements they actively did not want from the predefined list of requirements in RateP. However, they found it more difficult to do so in RankP, where no requirements were provided. Although RateP suggested requirements to stakeholders, stakeholders did not blindly follow the suggestions. For example, although the "Santander bank card" requirement was in the predefined list, the majority of stakeholders were against it. The requirement received a rating of zero from 25 respondents and a rating of –1 (actively do not want) from 20 respondents. Only 23 respondents rated it positively, suggesting that if UCL were to implement it, the card would not be well received.

Stakeholders' preference. The majority of stakeholders preferred RateP as they found it to require the least effort. Nevertheless, as they had to go through a predefined list of requirements, they found it more time consuming than RankP and PointP, where they only entered the requirements they wanted. For example, a security guard found the RateP list too student focused, and commented that it was "tedious to go through a list of requirements that are mostly unrelated." In general, stakeholders found all three elicitation methods easy to complete, requiring little time and effort (Fig. 16). The average responses sat between Low and Medium for all three elicitation methods on all criteria.

Different types of stakeholders preferred different elicitation methods. Many decision-makers preferred PointP, as they were used to making decisions under constraints. System users such as students and gym users preferred RateP, where options were provided. RankP was challenging to some, as they found it difficult to articulate their needs without prompts. Nevertheless, stakeholders with specific requirements in mind preferred RankP, where they could freely articulate their requirements. Some stakeholders found the predefined list of requirements constraining. For example, a respondent had trouble rating the requirement "enable the gathering and retrieval of the time which individuals enter and leave buildings." He explained that the requirement should be worded as two requirements, because he would provide a negative rating for gathering the time individuals enter buildings (he did not want the time he arrived at work to be recorded), but a high positive rating for gathering the time individuals leave buildings, for security reasons.

Fig. 16. Stakeholder feedback on elicitation method.

Finally, many stakeholders found the arithmetic exercise in PointP distracting, especially those who allocated points at a fine level of granularity. As such, PointP was rated the highest in terms of effort and difficulty. Future implementations of PointP should provide automatic point calculations.

Rating all requirements. Although stakeholders rated most of the initial requirements, they were not prepared to rate the full list of requirements. The four stakeholders involved in rating the alternative questionnaire found the task tedious and time consuming. They preferred to rate only a subset of requirements that were relevant to them. One complained that she was bored and wanted to stop halfway. This suggests that it is useful to recommend relevant requirements to stakeholders when the list of requirements is long.

RQ5: Predicting Requirements
The fifth research question asks:


TABLE 6
DATA CHARACTERISTICS

                                                   RankP   RateP   PointP
  Project objectives
    Number of stakeholders providing > 1 rating      66      75      71
    Number of items                                  10      10      10
    Number of ratings                               249     438     270
  Requirements
    Number of stakeholders providing > 1 rating      71      75      71
    Number of items                                  51      48      45
    Number of ratings                               461    1513     664
  Specific requirements
    Number of stakeholders providing > 1 rating      76      75      75
    Number of items                                 132     104      83
    Number of ratings                              1106    3112    1217
  Rating range12                                   0 ≤ x