Tool Support for Planning Global Software Development Projects

2014 IEEE International Conference on Computer and Information Technology

Sriharsha Vathsavayi, Outi Sievi-Korte and Kari Systä
Department of Pervasive Computing
Tampere University of Technology
Tampere, Finland

Abstract—Planning global software development (GSD) projects is a challenging task, as it involves balancing both technical and business-related issues. While planning GSD projects, project managers face decision-making situations such as choosing the right site to distribute work to and finding an optimal work distribution considering both the cost and duration of the project. Finding an optimal solution for these decision-making situations is difficult without some kind of automated support, as there are many possible alternative work allocation solutions and each solution affects the cost and duration of the project differently. To assist project managers in these situations, we propose a tool for planning GSD projects. The tool uses multi-objective genetic algorithms for finding optimal work allocation solutions in a tradeoff between cost and time. This article discusses the implementation of the tool and its application using two scenarios.

Keywords-global software development; planning; genetic algorithms; tool support


possibility of having face-to-face and real-time communication between different teams. Cultural misunderstandings between the teams may also affect the communication. All these challenges caused by the nature of GSD must be taken into account in the planning of a GSD project. Moreover, the number of possible work divisions and schedules is huge, which also makes planning GSD projects a difficult activity. For example, if a software company has multiple sites across which to distribute the work, the number of potential team-to-work-package combinations grows exponentially. In this case, evaluating all the possible combinations and their impact on the cost and duration of the project is very difficult without some kind of automated support. Moreover, automated support can propose better solutions to the project manager, as it may have access to, and consider without prejudice, a much larger solution and knowledge base than a human decision maker, who often suffers from the Golden Hammer syndrome [4]: once a person has found a solution that works nicely in one context, he or she tends to apply it over and over again, even in inappropriate contexts. Project planning is carried out in the early stages of the software engineering life cycle, where most of the information is based on estimates. In these situations, automatically finding the optimal solution is unrealistic, but a tool for guiding the project manager in seeking the (near-)optimal solution and exploring the effects of different management selections and estimates that are most likely to affect the solutions would be very helpful [5]. Thus, it would be beneficial to have automated support that can also be used to analyze the criticality of different estimates, i.e., how the accuracy of different estimates would affect the solutions.
In the context of GSD projects, guidance on how different work allocations influence the cost and duration of the project will be especially helpful, as cost and duration are the main reasons for moving work to remote teams [6]. In our proposal, the automated support would provide a palette of solutions in cost and time space, allowing the project manager to explore the solutions and find one that has the most appropriate balance between cost and time. With this kind of automated support, the project manager can examine a set of (near-)optimal work allocations and choose a work allocation that does not overrun the cost of the project and also assures the completion time of the project. However, since the number of possible combinations can grow exponentially, using a deterministic search method to


I. INTRODUCTION

Global software development (GSD) is becoming an increasingly common practice in the software and IT industries. Despite the widespread success and adoption of GSD, team communication, coordination and control are difficult in GSD projects, due to geographic dispersion and the time-zone and socio-cultural distance between the teams [1]. Due to these difficulties, realizing the cost and time advantage of sending work to remote teams [2] is often hindered. To overcome this, GSD projects have to be planned effectively. Otherwise, the cost and development time of the project may increase and the quality of the product may also be degraded.

Planning is an important activity of software engineering. By planning we mean splitting the work into work packages, assigning work packages to teams and scheduling the order of the work packages. Planning a GSD project is challenging, as the project manager has to distribute and schedule work to a heterogeneous and distributed organization. In a GSD project, the project manager has to assign work to teams in such a way that the teams (resources) are utilized as effectively as possible and the overhead caused by the organization is minimized. Since the overhead is mainly caused by communication problems [3], the need for inter-team communication should be minimized. Geographical and time-zone distances limit the

978-1-4799-6239-6/14 $31.00 © 2014 IEEE DOI 10.1109/CIT.2014.75


corresponding experience levels. Each team is associated with multiple slots for supporting ordered development, as a team can develop its allocated work in multiple sequences. Moreover, multiple teams can be working on a software project, and each team can be at a different distance from the other teams based on their geographical locations, time zones and socio-cultural distance. In this work, we assume that each team has a certain communication distance to other teams, which specifies how effectively it communicates with the other teams in the project. The communication distance is expected to be estimated by the project manager, and the estimate should be based on factors such as physical distance, time-zone shift and cultural characteristics of the teams. For example, let us consider a case where teams are located in New York and Bangalore. The physical distance between New York and Bangalore is very high, which limits the possibility of having frequent visits and face-to-face meetings. Moreover, the time-zone distance and cultural distance between these teams are very high. This makes real-time communication between the teams difficult and increases misunderstandings between the teams. The above factors indicate that the communication distance between the New York and Bangalore teams is very high.

In our work, the system to be implemented is described in terms of logical components that describe the functionality of the system. A logical component (hereafter called a component), which is also considered a work package, consists of a set of responsibilities, which will be implemented once the component is realized. The major components required to fulfill the system functionality are obtained from the initial functional decomposition of the system. As each responsibility requires a different skill, each component is characterized in terms of the set of skills required for implementing the component.
Moreover, each component is characterized in terms of estimated effort, i.e., the person-hours required for developing the component. We assume that the effort of each component can be estimated using any known software cost estimation method [9]. Each component has two types of relationships with other components: dependency and precedence relationships. Dependency relationships are obtained from the initial functional decomposition of the system: if component c1 needs the services of component c2, then c1 depends on c2, denoted $c_1 \xrightarrow{\text{dep}} c_2$. This implies that when developing component c1, some information is needed about component c2 (e.g., service interfaces, service protocols, and the meaning of services), implying a need for communication between the teams developing the components. On the other hand, precedence is a relationship expressing the preferred development order of the components, because of a strong dependency between the components: if component c1 precedes component c2, denoted $c_1 \xrightarrow{\text{pre}} c_2$, then developing c1 is a prerequisite for developing c2. Otherwise, the team developing c2 has to wait until c1 is developed. The precedence relationships are assumed to be determined by the architect based on various facts known about the components. For example, if component c2 can be sensibly tested only if component c1 exists, or if component c1 produces large data entities used by component c2, or if the chosen architectural approaches for component c2 are affected by the architectural approaches for component c1, then in all these cases $c_1 \xrightarrow{\text{pre}} c_2$. Using this relationship, the architect or project

find a (near-)optimal solution is difficult. Thus, it is more reasonable to base the automated support described above on meta-heuristic search algorithms, such as simulated annealing [7], genetic algorithms (GA) [8] or hill climbing [7]. In this work, we apply a multi-objective GA for developing automated support (i.e., a tool) for planning GSD projects. The tool takes an initial work distribution containing information about the available teams and work packages as input and uses GAs for finding the optimal way to schedule and distribute work to teams in the cost and time space. The project manager can then browse through the results and choose a suitable work allocation and schedule for the project at hand. Moreover, the project manager can use the tool for studying different GSD project planning scenarios. The main contributions of this work are 1) an approach to provide the project manager with a tool to search for optimal work allocations and explore the cost and time characteristics of the different work allocations, 2) the application of multi-objective genetic algorithms to GSD project planning problems and 3) a prototype of the tool that is used in example scenarios. The paper is structured as follows. In the following section we discuss the model given as input to the tool. Section 3 presents the multi-objective GA used in the tool and the implementation of the tool. The approach is studied by applying it to different project planning scenarios in Section 4. Section 5 discusses the related work. Finally, we conclude with some remarks on future work in Section 6.

II. GSD WORK DISTRIBUTION MODEL

The GSD work distribution model consists of information about available resources and the system that has to be developed by the available resources. The overview of the model is presented in Fig. 1.

Figure 1. Overview of GSD work distribution model.

In this work, we consider a software team as an individual resource available for performing the work. A team is represented as a combination of characteristics that differentiate it from other teams. These characteristics are the cost of the team per day, the capacity (i.e., the number of hours the team can spend per day) and the performance of the team. As a team can consist of several developers, each of whom can have multiple skills with different experience levels, each team is characterized in terms of skills and their


developing an individual component i can be obtained by subtracting the starting time of component i from the ending time of the component (say, j) that precedes component i. If the starting time of component i is greater than or equal to the ending time of component j, then the team developing component i does not need to wait for the development of component j to be finished. Otherwise the team has to wait until component j is developed. The position of the component in the development order of the teams is used in calculating the starting time and ending time of a component. For a given work distribution with a set of components allocated to teams, the estimated duration is obtained by the following equation:

manager can express various types of additional information concerning the development order, which cannot be deduced by any automatic means. For simplicity, we do not allow cyclic precedence relationships. The initial work distribution is created by randomly assigning components to available resources, i.e., available team slots. A component can be allocated to a team only if the team has the necessary skills to develop the component. The tool takes the initial work distribution as input and optimizes it with respect to both cost and duration. As cost and duration may be partially conflicting objectives a balance must be found. The duration and cost of a work distribution can be obtained using the team and system characteristics. First, given a set of components C assigned to team T and assuming that a team is working on only one component at a given time, the total duration spent by a team T can be obtained by the following equation:

$$\mathrm{Duration} = \max\bigl(\mathrm{Duration}(t_1), \mathrm{Duration}(t_2), \ldots, \mathrm{Duration}(t_n)\bigr) \tag{4}$$

where t1, t2, …, tn are the teams used in the project. In calculating the duration of a work distribution it is assumed that all teams will be working in parallel. The cost for developing the planned software can be calculated by summing the cost incurred by using each team:

$$\mathrm{Cost} = \sum_{i=1}^{N} \mathrm{Duration}(t_i) \cdot \mathrm{cost}(t_i) \tag{5}$$

where N is the total number of teams used and cost(ti) represents the cost of team ti per day.
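The duration and cost model of equations (1) to (5) can be sketched in a few lines of Python. This is only an illustrative reading of the model, not the tool's implementation. Several details are assumptions: the numeric values of the {low, medium, high} scale, the value of the scaling factor α, a zero communication effort for dependencies inside one team, summing cef(i) over the components that i depends on, and all class and function names.

```python
from dataclasses import dataclass, field

# Assumed numeric values for the {low, medium, high} scale (not given in the paper).
CDIST = {"low": 1.0, "medium": 2.0, "high": 3.0}
ALPHA = 1.0  # assumed value of the scaling factor alpha in equation (3)


@dataclass
class Team:
    name: str
    cost_per_day: float
    capacity: float     # hours the team can spend per day
    performance: float  # relative performance factor


@dataclass
class Component:
    name: str
    effort: float                                    # person-hours
    depends_on: list = field(default_factory=list)   # dependency relationships
    preceded_by: list = field(default_factory=list)  # precedence relationships


def cef(comp, team_of, cdist):
    """Communication effort, equation (3); same-team dependencies cost 0 (assumption)."""
    mine = team_of[comp.name]
    return sum(ALPHA * CDIST[cdist[frozenset((mine, team_of[d]))]]
               for d in comp.depends_on if team_of[d] != mine)


def df(comp, team, team_of, cdist):
    """Development duration of one component in days, equation (2)."""
    return (comp.effort + cef(comp, team_of, cdist)) / (team.capacity * team.performance)


def evaluate(teams, comps, order, cdist):
    """Return (duration, cost) of a work distribution, equations (1), (4) and (5).

    `order` maps each team name to its component names in development order.
    """
    team_of = {c: t for t, names in order.items() for c in names}
    end, clock = {}, {t: 0.0 for t in order}
    pending = {t: list(names) for t, names in order.items()}
    progressed = True
    while progressed and any(pending.values()):
        progressed = False
        for t, queue in pending.items():
            # Schedule the next component once all of its predecessors have ended.
            while queue and all(p in end for p in comps[queue[0]].preceded_by):
                comp = comps[queue.pop(0)]
                start = clock[t]
                latest = max((end[p] for p in comp.preceded_by), default=0.0)
                wait = max(0.0, latest - start)  # wf(i): wait for the predecessor
                clock[t] = start + wait + df(comp, teams[t], team_of, cdist)
                end[comp.name] = clock[t]
                progressed = True
    if any(pending.values()):
        raise ValueError("development order conflicts with precedence relationships")
    duration = max(clock.values())                               # equation (4)
    cost = sum(clock[t] * teams[t].cost_per_day for t in clock)  # equation (5)
    return duration, cost
```

The cost follows equation (5) literally, so any waiting time included in Duration(ti) is also charged. A variant that charges only development time would accumulate df(i) separately per team.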

$$\mathrm{Duration}(T) = \sum_{i=1}^{C} \bigl(df(i) + wf(i)\bigr) \tag{1}$$

where function df(i) specifies the duration required to implement the component i, and function wf(i) measures whether the team has to wait for some time due to the precedence relationship of component i. The duration for developing each component i can be calculated as follows:

$$df(i) = \frac{\mathrm{effort}(i) + cef(i)}{\mathrm{capacity}(T) \cdot \mathrm{performance}(T)} \tag{2}$$

III. TOOL SUPPORT

In this section we describe the implemented algorithm for finding the optimal work distribution and the prototype tool for planning GSD projects.

where effort(i) represents the effort taken by component i, function cef(i) specifies the communication effort required for developing component i and T is the team to which component i is assigned. The functions capacity(T) and performance(T) denote the capacity of team T per day and performance of team T. The communication effort required for developing component i is calculated based on the dependencies the component has with other teams and their associated communication distance:

A. Multi-objective genetic algorithm

Genetic algorithms are inspired by Darwinian evolution [8], and draw on the concepts of biology and natural selection. GAs are generally used to find a good solution in a large search space. The goal is to find a solution that is as good as possible in reasonable time. Each solution to the problem is represented as a chromosome. The chromosome can be further divided into genes. An evolution cycle is composed of generations. The collection of chromosomes at a given point in time (i.e., at a certain generation) is called the population. A population reproduces through crossover, which enables the creation of new individuals for the next generation. Crossover creates new chromosomes (offspring) by exchanging genes of parent chromosomes, and thus increases the size of the population. The population is then subjected to mutation, where values of genes are modified. Both mutation and crossover have predetermined probabilities of occurrence: not all individuals produce offspring or mutate in each generation. The quality of a solution is represented by its fitness. Fitness indicates the likelihood that the solution survives to the next generation when the ideas of natural selection are applied. Typically some probability-based method is used, where the probability of survival is proportional to the solution’s fitness value. Selection weeds out the poorest individuals, so the size of the population is the same at the start of each evolution cycle.
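The generic evolution cycle described above (crossover, mutation, then fitness-proportional selection back to the original population size) can be illustrated with a minimal single-objective sketch. The toy fitness function, the bit-string chromosome and all parameter values are assumptions chosen for brevity; they are not taken from the tool.

```python
import random


def one_max(chromosome):
    """Toy fitness: 1 + number of 1-genes (a stand-in for a real objective)."""
    return 1 + sum(chromosome)


def evolve(pop_size=20, genes=16, generations=30, p_cross=0.6, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genes)] for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        # Crossover: some parent pairs exchange genes at a random cut point.
        for _ in range(pop_size // 2):
            a, b = rng.sample(population, 2)
            if rng.random() < p_cross:
                cut = rng.randrange(1, genes)
                offspring += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
        pool = population + offspring  # the population has grown
        # Mutation: each gene flips with a small probability.
        pool = [[1 - g if rng.random() < p_mut else g for g in c] for c in pool]
        # Selection: survival is proportional to fitness; the size is restored.
        population = rng.choices(pool, weights=[one_max(c) for c in pool], k=pop_size)
    return population
```

Calling `evolve()` returns a population of the original size. A multi-objective GA keeps the same cycle but replaces the selection step.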

$$cef(i) = \sum_{j=1}^{M} \alpha \cdot cdist\bigl(t(i), t(j)\bigr) \tag{3}$$

where j = 1, …, M and M is the total number of components that have relationship R with component i. The constant α is a scaling factor used to estimate each inter-team dependency. The function t(i) specifies the team to which component i is assigned and cdist(t(i),t(j)) is the communication distance between the teams that have been assigned components i and j. The communication distance is estimated using the scale {low, medium, high}. To make calculation easy, each value on the nominal scale is converted into a corresponding relative numerical value. As we have precedence relationships between components, a team has to wait for some time if the component it is building has a precedence relationship with a component that has not yet been developed. In many cases, the waiting time does not increase the cost of the project, because the team can use the waiting time to do some other work. However, the waiting time increases the total duration required by the team to develop the component. The waiting time spent by a team before


Fitness Function and Selection

The fitness function evaluates the generated GSD work distribution in terms of duration and cost. Here, we have used two fitness functions, f1(x) and f2(x), to evaluate each work distribution. The function f1(x) calculates the duration and the function f2(x) calculates the cost of the work distribution. They are modeled after the duration and cost equations specified in (4) and (5). In this work, we have used the Pareto optimality approach [12] for selecting the work distributions. The Pareto optimality approach evaluates each GSD work distribution individually for both objectives and produces a set of nondominated* solutions in the cost and time space. Individuals for each generation are selected as follows: the actual Pareto front pf1 of the population (i.e., collection of chromosomes) in the i-th generation, pi, is first collected and stored for the population of the next generation, pi+1. However, as each Pareto front usually contains fewer than ten individuals and the population size is typically more than 100, just one Pareto front is not enough to make a sufficient population. Thus, the Pareto front of the remaining individuals in pi, i.e., the Pareto front pf2 of the set pi \ pf1, is selected and moved to pi+1. This process is repeated until pi+1 has at least the required minimum number of individuals.
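The front-by-front selection described above can be sketched as follows. Individuals are reduced to hypothetical (duration, cost) tuples, both of which are minimized; the function names are illustrative and not taken from the Darwin tool.

```python
def dominates(a, b):
    """True if a is no worse than b in both objectives and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Nondominated subset of `points`."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def select(population, minimum):
    """Move successive Pareto fronts to the next population until it is large enough."""
    remaining, selected = list(population), []
    while remaining and len(selected) < minimum:
        front = pareto_front(remaining)  # pf1, then pf2 of the rest, and so on
        selected.extend(front)
        remaining = [p for p in remaining if p not in front]
    return selected
```

Because whole fronts are moved at once, the result may contain more than `minimum` individuals, which matches the "at least the required minimum" wording.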

Problem representation

In order to apply a GA, the initial work distribution needs to be represented as a chromosome, and two kinds of data about each component are included in the encoding. Firstly, the basic information about each component is given as input. This contains the responsibilities associated with the component, other components depending on the component, the components with which it has a precedence relationship, the component’s name, the skills required for developing the component and the estimated effort. Secondly, there is the information about the team to which the component is assigned and the position of the component in the team’s development order (i.e., slots of the team). The chromosome handled by the GA is obtained by collecting all data regarding all components. The initial population is created by first creating the desired number of individuals with the given initial work distribution and then randomly applying a mutation to each individual (for details see [11]).

Mutation and Crossover Operations

In this work we have applied two mutations, which modify the work distribution: (i) change team and (ii) change development order. The change team mutation replaces the team assigned to the component with a randomly chosen team from the set of teams available for development. The order of development is defined by the order of assignments, i.e., the implementation of a component starts when the implementation of the components already assigned before it has been completed. The change development order mutation exchanges the position of the component in the team’s development order with the position of a randomly chosen component from the development order of the team assigned to it, thus changing the order in which the components are developed. The crossover operation is implemented as a traditional one-point crossover. In a traditional one-point crossover the crossover point is selected randomly.
Two individuals are selected as parents, and the genes of the parents are swapped, so that the crossover point is the cutoff point. This creates two offspring with genes from both parents. In addition to the specified mutations and crossover, we also used a null mutation. We have also used a corrective function, which ensures that the development order of components in a team is rational, as crossover may result in components with identical positions in the development order of a team. We have chosen to use preconditions and a corrective function, as opposed to handling malformed solutions in the fitness function, to ensure that all solutions are always valid. The preconditions for applying mutations are: (i) the newly chosen team must have the necessary skills for developing the component; otherwise, the change team mutation is not applied. (ii) If two components i and j, with component i preceding component j, are assigned to the same team, then the position of component i in the team’s development order has to be less than the position of component j. Otherwise, the solution is invalid, as it is not possible to estimate the duration of a component j whose predecessor is not yet developed. A probability-based method, roulette wheel selection [8], is used for selecting the mutation.
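A minimal sketch of three of the pieces described above: roulette wheel selection of the operator, the change team mutation guarded by precondition (i), and a corrective step for duplicate positions. The operator probabilities are the ones reported for the example scenarios in Section IV. The data layout, the function names and the tie-breaking rule in `repair_order` are assumptions, since the paper does not specify them.

```python
import random

# Operator probabilities reported for the example scenarios in Section IV.
OPERATORS = {"change_team": 0.4, "change_order": 0.3, "crossover": 0.2, "null": 0.1}


def roulette_wheel(weights, r):
    """Return the key whose slice of the wheel contains r (0 <= r < 1)."""
    threshold, cumulative = r * sum(weights.values()), 0.0
    for name, weight in weights.items():
        cumulative += weight
        if threshold < cumulative:
            return name
    return name  # floating-point round-off guard for r very close to 1


def change_team(assignment, component, required_skills, team_skills, rng):
    """Move `component` to a random team unless precondition (i) fails."""
    team = rng.choice(sorted(team_skills))
    if required_skills[component] <= team_skills[team]:  # team has every skill
        return {**assignment, component: team}
    return assignment  # mutation not applied


def repair_order(positions):
    """Corrective step: renumber one team's components to distinct positions 1..n.

    Ties are broken alphabetically, which is an assumption of this sketch.
    """
    ranked = sorted(positions, key=lambda c: (positions[c], c))
    return {c: i for i, c in enumerate(ranked, start=1)}


rng = random.Random(1)
operator = roulette_wheel(OPERATORS, rng.random())  # operator applied to an individual
```

The corrective step here only removes duplicate positions. Precondition (ii) on precedence within a team would have to be checked separately before a change of development order is accepted.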

B. Tool support for planning GSD projects

The tool support for planning GSD projects is developed by extending the Darwin tool [13], which has been implemented as an Eclipse plugin. The Darwin tool provides the necessary user interface controls for facilitating the use of genetic architecture synthesis [11]. The Darwin tool has been fully integrated with a UML-based CASE tool called UML2Tool [14] for specifying the functional decomposition of the system. It also includes mutations and settings views. The mutations view allows the user to view and modify the mutation and crossover probabilities. The settings view contains user interface controls for modifying the population size and the total number of generations. After each generation the fitness of the individuals is shown as a graph, called the fitness graph. The user interface also contains buttons for starting, stopping, pausing and resuming the GA.

In this work, we have extended the Darwin tool to provide support for planning GSD projects. We have extended Darwin with a new multi-objective GA using Pareto optimality as described above. The settings view is updated to accommodate the option for viewing the Pareto front. To visualize the fitness of the multi-objective genetic algorithm, we have modified the fitness graph view of Darwin. The fitness graph is modified in such a way that whenever the specified interval for viewing the Pareto front is reached, the fitness, i.e., the cost and duration of the nondominated work allocations, is shown on the fitness graph. For entering the information about the teams and their characteristics, an organization view is added to Darwin. The organization view is a graphical editor implemented using Eclipse’s Zest visualization toolkit [15]. It can be used to specify, view and modify team characteristics. For more information about Darwin, please refer to [13].

* A solution that cannot be further optimized to improve both criteria.

for development are in different sites (i.e., New York, Frankfurt and Bangalore). The team resources used in both work distributions are shown in Fig. 3. We assume that all the teams have the same experience level. In both work distributions, the system to be developed is the ehome system. Both tests are executed 50 times to reduce randomness bias. All the non-dominated solutions of the Pareto fronts of the 50 runs are shown in Fig. 4. The smaller circles denote the results of test 1 (i.e., teams in the same site) and the larger circles denote the results of test 2 (i.e., teams in different sites). Analyzing the Pareto fronts of test 1 and test 2 can provide information about the cost and time tradeoffs of using remote teams over same-site teams. As can be seen from Fig. 4, the duration required to finish the project is shorter when the work is performed by same-site teams (i.e., test 1) than when remote teams (i.e., test 2) perform the work. When work is distributed to remote teams, it takes approximately 50% more time to finish the project than with same-site teams. This is because the communication overhead among local teams is lower, as it is easier to communicate among local teams than among remote teams.

IV. APPLICATION SCENARIOS

In this section we describe the application of the tool using two scenarios that are faced by a project manager in planning GSD projects:
a) How to distribute work to remote teams, and what are the advantages and disadvantages of using low-cost remote teams as compared to same-site work?
b) How to choose a suitable development site in a tradeoff between developing the project quickly versus developing the project cheaply?

These scenarios correspond to situations the project manager faces while planning a GSD project. The two scenarios are illustrated using an example project. The example project we have used is an electronic home control system (called ehome) that allows the user to change the temperature, make coffee, play music and move drapes. The functional decomposition of ehome contains 12 logical components and 15 dependencies between them. The analysis of the system has revealed 8 precedence relationships between the components. Each component is given an estimated effort and the skills required for its development. The functional decomposition of ehome with estimated effort and required skills is presented in Fig. 2. For example, the skill requirements for the CoffeeMachine component are experience in user interfaces and database development, and there should also be programming specialists. Moreover, the WaterControl component has a precedence relationship with the CoffeeMachine component. For presentation purposes we have depicted only the components, but not the responsibilities of each component.

[Figure 2 (diagram): the 12 logical components of ehome (UserInterface, MusicSystem, CoffeeMachine, DrapeRegulation, TemperatureRegulation, DrapeDriver, SpeakerDriver, MainController, HeaterDriver, MusicFiles, WaterControl, UserManagement), each annotated with its estimated effort and required skills (e.g., DrapeRegulation: effort 80, skills UI and DB; TemperatureRegulation: effort 50, skills UI and DB; DrapeDriver: effort 30, skill DB) and connected by dependency and precedence relationships.]

Figure 2. Functional decomposition of ehome.

Figure 3. Resources used for test 1 and test 2.

For both scenarios, we have applied the same parameter settings for the GA. The GA is executed with a population of 100 individuals for 250 generations. The applied probabilities are 0.4 for the change team mutation, 0.3 for the change development order mutation, 0.2 for the crossover operation and 0.1 for the null mutation; these values were found through experimentation.

A. How to distribute work to remote teams and what are the advantages and disadvantages of using low-cost remote teams, as compared to same-site work?

In this scenario, we experimented with two different work distributions. In the first work distribution, the resources available to develop the work are located in a single site (i.e., in the USA) and in the second work distribution the teams chosen

Figure 4. Non-dominated solutions of test 1 and test 2.


The advantage of distributing work to remote teams over same-site teams is cost, which is obtained at the expense of duration. However, sometimes it is expensive to distribute work to remote teams. For instance, there is a sudden increase in the cost in test 2, i.e., the cost of the second data point from the left is very high compared to the third data point, but the reduction in duration is much smaller. To understand what factors caused this behavior we analyzed those work distributions. The work distributions of the second and third data points are shown in Fig. 5 and Fig. 6. The number after each component represents the position of the component in the team’s development order. As can be seen from the work distributions, to reduce the inter-team communication, the GA has assigned components with inter-dependencies to the same team. For example, in Fig. 5, the inter-dependent components CoffeeMachine and WaterControl, TemperatureRegulation and HeaterDriver, and DrapeDriver and DrapeRegulation are assigned to the same teams. In addition, the components are scheduled according to their precedence constraints, i.e., no team needs to wait because of precedence constraints. However, the overall effort spent per day by the teams is higher in Fig. 5 than in Fig. 6. One would have imagined that the time consumed in Fig. 5 would be much less than the time consumed in Fig. 6, as the effort spent per day is greater. However, the communication required between teams is much higher in Fig. 5 than in Fig. 6, which hinders the development speed of the work distribution shown in Fig. 5. The results show that when more teams are used, more time is spent on communicating with other teams than on development work. This shows that when distributing work to remote teams, it is not always possible to gain a cost advantage from low-cost teams. Moreover, the amount of communication between the teams and the number of teams used for development have to be considered.

B. How to choose a suitable development site in a tradeoff between developing the project quickly versus developing the project cheaply?

Let us take a situation where the development team at the customer site (say, the New York team) is not able to develop the ehome project within the agreed duration and cost. In this case, the project manager needs to find additional teams to perform the work, and may have multiple teams from multiple sites to choose from. Let us assume that the project manager has the option to choose either the team at the Bangalore site (the Bangalore team) or the team at the London site (the London team). The characteristics of the Bangalore and London teams are shown in Fig. 7. The two teams differ in their geographical, time-zone and socio-cultural distance from the New York team. Both teams have the same skills and experience level, but the wages and performance of the team in London are higher than those of the team in Bangalore. Now, which team would the project manager choose in a tradeoff between developing the project quickly and developing the project cheaply? While the London team has a performance, language and socio-cultural advantage over the Bangalore team, the Bangalore team has a cost and time-zone difference advantage over the London team. In this case, our tool can be used to improve the understanding of the advantages and disadvantages of choosing each site. To study this scenario, we performed two different tests using two different sites. As in scenario 1 we used the ehome project, but we experimented with two alternative sites (i.e., London and Bangalore) to provide the required additional resources. Similarly to scenario 1, both tests are run 50 times to reduce the randomness bias. The nondominated solutions of the Pareto fronts of the two tests are shown in Fig. 8. The smaller circles denote the results of test 1 (i.e., teams in the New York and London sites) and the larger circles denote the results of test 2 (i.e., teams in the New York and Bangalore sites).
As can be seen from Fig. 8, the project can be developed quickly if the London team is used and cheaply if the Bangalore team is used. However, when the Bangalore team is used, the duration required to complete the project is much longer than when the London team is used. The communication distance between the Bangalore and New York teams and the performance of the Bangalore team hinder the development speed when the Bangalore team is used. As can be seen, it is not self-evident which team is more suitable, as both have advantages and disadvantages. The advantage of using the Bangalore team over the London team is low cost, but the drawback is the longer duration required to finish the project.
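One way to read two such fronts against a project constraint is to ask, for a given deadline, which site offers the cheapest solution that still finishes in time. The sketch below is only illustrative: the functions `cheapest_within` and `best_site` are hypothetical helpers, and all numbers are made up for the example; they are not the values plotted in Fig. 8.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (duration, cost)


def cheapest_within(front: List[Point], deadline: float) -> Optional[Point]:
    """Cheapest point of a front whose duration does not exceed the deadline."""
    feasible = [p for p in front if p[0] <= deadline]
    return min(feasible, key=lambda p: p[1]) if feasible else None


def best_site(fronts: Dict[str, List[Point]],
              deadline: float) -> Optional[Tuple[str, Point]]:
    """Site (and solution) with the lowest cost among those meeting the deadline."""
    options = {}
    for site, front in fronts.items():
        point = cheapest_within(front, deadline)
        if point is not None:
            options[site] = point
    if not options:
        return None  # no site can meet the deadline
    site = min(options, key=lambda s: options[s][1])
    return site, options[site]


# Made-up Pareto fronts for the two candidate sites (assumed values).
fronts = {
    "London": [(20.0, 26000.0), (25.0, 22000.0), (30.0, 20000.0)],
    "Bangalore": [(28.0, 18000.0), (34.0, 14000.0), (40.0, 12000.0)],
}

print(best_site(fronts, deadline=26.0))  # tight deadline
print(best_site(fronts, deadline=35.0))  # relaxed deadline
```

With these assumed values, a tight deadline can only be met by the faster site, while a relaxed deadline lets the cheaper site win, which mirrors the tradeoff discussed above.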

Figure 5. Work distribution of second data point of test 2.

Figure 7. Characteristics of New York – London teams and New York – Bangalore teams.

Figure 6. Work distribution of third data point of test 2.


automated project management and considering also business aspects. Almeida et al. [19], in turn, have studied multi-criteria decision analysis on GSD with Scrum. They present a model for project managers to make decisions using cognitive mapping relying on user input. This kind of multi-criteria decision analysis for all aspects of GSD projects could be done automatically, and the produced suggestions could be optimized with the use of heuristics such as GA, which we are applying.

Applying meta-heuristic algorithms to software engineering problems is a rapidly growing area of research, and search-based approaches have also been applied in the area of project management. One of the most studied problems is the assignment of tasks while taking into account the skills of the employees and simultaneously optimizing the cost and duration of the project [20, 21, 22, 23]. Similar to our approach, there are also studies on optimizing the estimated effort required to complete a project [24, 25], on defining the cost function of software projects [26], and on scheduling and staffing a project [27]. However, none of these approaches consider projects in a globally distributed setup, as we do.

Multi-objective approaches using Pareto optimality have recently been surveyed [28], and they include applications in all areas of software engineering, with most approaches concentrating on software design and software testing. There are also several approaches to project management using Pareto optimality, such as solving the agile team allocation problem [29], scheduling [30], and overtime planning [31]. Again, none of these approaches consider the distributed setup, which differentiates them from our approach.

In our earlier work [32] we have shown how genetic algorithms can be used to automate work allocation in projects, also in the context of GSD. That research showed our initial results, but it tried to optimize time and cost at the same time and did not address the fact that project managers need to make a tradeoff between time and cost. In this paper we use multiobjective optimization and the Pareto approach to make this tradeoff visible to the project manager. Thus, we have made a significant step towards practical use of the idea.

Figure 8. Non-dominated solutions of test 1 and test 2.

Thus, the project manager has to choose a team based on the project constraints at hand. Moreover, as can be seen from the graph, a small increase in duration, i.e., from the third data point (from left) to the sixth data point, decreases the cost of the work distribution from around $18,000 to around $12,000. If the duration of the project is not very important, the project manager can choose this solution. The project manager can use the Pareto fronts of the tests to analyze different work distributions and choose a suitable work distribution for a given project.

These small examples showed how the set of optimal solutions can be presented as a Pareto front computed by the tool. From those Pareto fronts the manager can see the tradeoffs and select a suitable work distribution based on the present business priorities. Furthermore, if the manager is uncertain about some parameter, such as communication distance, she can try the effects of changes in that parameter, and if the optimization seems sensitive to that parameter, she can consider it a source of risk and investigate it further.

V. RELATED WORK

In this section we cover some general studies in GSD, and relevant studies in search-based software engineering, particularly applications of Pareto optimality.

Smite et al. [6] have reviewed empirical studies in GSD. They conclude that the most popular topics are requirements engineering, coordination and communication, and the application of agile processes. Our work concentrates on work allocation and the tradeoffs that must be made (duration versus cost). The former has been studied by, e.g., [16] and the difficulty of the latter has been demonstrated by Ramasubbu et al. [17]. These studies, however, do not include any automation.

Some tool support has been proposed for aiding in GSD project management. Yildiz et al. [18] present a framework including a metamodel for GSD project planning.
By gathering input from the project manager through a set of questions on the project (employees, their skills, sites, work cultures, etc.) the tool puts together a model of the project. However, this approach does not consider cost or duration, which are essential for planning, and though from the model perspective their study is close to ours, we take a step further to the field of

VI. CONCLUSIONS AND FUTURE WORK

In this paper we have presented a tool that can be used to guide project managers in finding optimal work distributions and in studying different project planning scenarios in the context of GSD projects. As time and cost are inter-related and partly conflicting criteria, we propose that managers are shown a Pareto front that makes the time-cost compromise explicit. Since the search space is huge and can grow exponentially, we propose the use of genetic algorithms instead of deterministic calculation. This paper presents the idea and the results of initial experiments. Obviously, applying the approach to real industrial data and evaluating the results together with practitioners are needed to validate the approach. However, even with the small example demonstrated here, the results are encouraging. The results show that it is not straightforward for a project manager to distribute work in GSD projects and to choose a suitable site for developing the project in a tradeoff between cost and time.


[13] Hadaytullah, S. Vathsavayi, O. Räihä, and K. Koskimies, “Tool Support for Software Architecture Design with Genetic Algorithms,” in Proc. ICSEA2010, IEEE CS Press, pp. 359-366, August 2010.
[14] Eclipse’s Model Development Tools WWW site. At URL http://www.eclipse.org/modeling/mdt
[15] Zest WWW site. At URL http://www.eclipse.org/gef/zest
[16] V. Clerc, P. Lago, and H. van Vliet, “Global Software Development: Are Architectural Rules the Answer?” in Proceedings of ICGSE’07, IEEE CS Press, 2007, pp. 225-234.
[17] N. Ramasubbu, M. Cataldo, R.K. Balan, and J.D. Herbsleb, “Configuring Global Software Development Teams: A Multi-Company Analysis of Project Productivity, Quality, and Profits,” in Proceedings of ICSE’11, ACM Press, 2011, pp. 261-270.
[18] B. M. Yildiz, B. Tekinerdogan, and S. Cetin, “A Tool Framework for Deriving the Application Architecture for Global Software Development Projects,” in Proceedings of ICGSE’12, IEEE CS Press, 2012, pp. 94-103.
[19] L.H. Almeida, P.R. Pinheiro, and A.B. Albuquerque, “Applying Multi-Criteria Decision Analysis to Global Software Development with Scrum Project Planning,” in Proceedings of RSKT’11, LNCS 6954, Springer, 2011, pp. 311-320.
[20] C.K. Chang, M.J. Christensen, and T. Zhang, “Genetic Algorithms for Project Management,” Annals of Software Engineering 11, pp. 107-139, 2001.

In the next steps, the model should be applied in a real industrial context and complemented with realistic cost factors for elements such as communication distances. In addition, the current model considers skills as binary values and assumes that all teams have the same productivity. These limitations need to be addressed. The initial feedback from our industry peers also suggests that motivational aspects should be taken into account. Moreover, the architectural decisions that can ease or hamper distributed development could also be considered. Planning and managing projects requires learning the characteristics of the organization. An experienced manager knows the strengths and weaknesses of her organization from past experience. Similarly, the various parameters of the tool should be tuned based on experiences and measurements from previous projects. This kind of tool will be especially helpful for project managers who are new to distributed software development.

ACKNOWLEDGMENT

We would like to thank Professor Kai Koskimies for the helpful discussions in various phases of this research.

[21] E. Alba and F. Chicano, “Software Project Management with GAs,” Information Sciences 177, pp. 2380-2401, 2007.
[22] L.L. Minku, D. Sudholt, and X. Yao, “Evolutionary Algorithms for the Project Scheduling Problem: Runtime Analysis and Improved Design,” in Proceedings of GECCO’12, ACM, 2012, pp. 1221-1228.
[23] D. Debels and M. Vanhoucke, “A Bi-population Based Genetic Algorithm for the Resource-Constrained Project Scheduling Problem,” in Proceedings of ICCSA’05, LNCS 3483, Springer, 2005, pp. 378-387.
[24] J.J. Dolado and L. Fernandez, “Genetic Programming, Neural Networks and Linear Regression in Software Project Estimation,” in Proceedings of INSPIRE III, 1998, pp. 157-171.
[25] K.K. Shukla, “Neuro-Genetic Prediction of Software Development Effort,” Information and Software Technology 42, 10, pp. 701-713, 2000.
[26] J.J. Dolado, “On the Problem of the Software Cost Function,” Information and Software Technology, 43, pp. 61-72, 2001.
[27] M. Di Penta, M. Harman, and G. Antoniol, “The use of search-based optimization techniques to schedule and staff software projects: an approach and an empirical study,” Software: Practice and Experience, 2011, pp. 495–519.
[28] A. S. Sayyad and H. Ammar, “Pareto-Optimal Search-Based Software Engineering (POSBSE): A Literature Survey,” in Proceedings of RAISE’13, IEEE CS Press, pp. 21-27, 2013.
[29] R. Britto, P. S. Neto, R. Rabelo, W. Ayala, and T. Soares, “A Hybrid Approach to Solve the Agile Team Allocation Problem,” in Proc. CEC, Brisbane, Australia, 2012, pp. 1-8.
[30] F. Chicano, F. Luna, A. J. Nebro, and E. Alba, “Using Multiobjective Metaheuristics to Solve the Software Project Scheduling Problem,” in Proc. GECCO, Dublin, Ireland, 2011, pp. 1915-1922.
[31] F. Ferrucci, M. Harman, J. Ren, and F. Sarro, “Not Going to Take this Anymore: Multi-Objective Overtime Planning for Software Engineering Projects,” in Proc. ICSE, San Francisco, USA, 2013.
[32] S. Vathsavayi, O. Sievi-Korte, K. Koskimies, and K. Systä, “Planning Global Software Development Projects Using Genetic Algorithms,” in Search Based Software Engineering, vol. 8084, G. Ruhe and Y. Zhang, Eds. Springer Berlin Heidelberg, 2013, pp. 269–274.

REFERENCES

[1] E. Hossain, P. L. Bannerman, and D. R. Jeffery, “Scrum practices in global software development: a research framework,” in 12th international conference on Product-focused software process improvement, Berlin, Heidelberg, 2011, pp. 88–102.
[2] G. Seshagiri, “Point/Counterpoint: GSD: Not a Business Necessity, but a March of Folly,” IEEE Software, vol. 23, no. 5, 2006, pp. 62–65.
[3] P. J. Ågerfalk, B. Fitzgerald, H. Holmstrom, B. Lings, B. Lundell, and E. Ó. Conchuir, “A framework for considering opportunities and threats in distributed software development,” in International Workshop on Distributed Software Development, Paris, France, Austrian Computer Society, 2005, pp. 47-61.
[4] W.J. Brown, R.C. Malveau, H.W. McCormick III, and T.J. Mowbray, Antipatterns - Refactoring Software, Architectures, and Projects in Crisis, John Wiley and Sons, Inc., 1998.
[5] M. Harman, P. McMinn, J. Souza, and S. Yoo, “Search based software engineering: Techniques, taxonomy, tutorial,” in Empirical software engineering and verification: LASER 2009-2010, B. Meyer and M. Nordio, Eds. Springer, 2012, pp. 1-59.
[6] D. Smite, C. Wohlin, T. Gorschek, and R. Feldt, “Empirical Evidence in Global Software Engineering: A Systematic Review,” Journal of Empirical Software Engineering 15(1), 2010, pp. 91-118.
[7] F.W. Glover and G.A. Kochenberger, Handbook of Metaheuristics, International Series in Operations Research & Management Science, 57th ed., Springer, 2003.
[8] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolutionary Programs, Springer-Verlag, 1992.
[9] H. Leung and Z. Fan, “Software cost estimation,” in Handbook of Software Engineering and Knowledge Engineering, 2002.
[10] M. Amoui, S. Mirarab, S. Ansari, and C. Lucas, “A genetic algorithm approach to design evolution using design pattern transformation,” International Journal of Information Technology and Intelligent Computing, vol. 1, no. 2, pp. 235-244, June 2006.
[11] O. Räihä, K. Koskimies, and E. Mäkinen, “Genetic synthesis of software architecture,” in Proceedings of the 7th International Conference on Simulated Evolution and Learning (SEAL ’08). Melbourne, Australia: Springer, 2008, pp. 565-574.
[12] C.C. Coello Coello, “An updated survey of GA-based multiobjective optimization techniques,” ACM Computing Surveys 32, 2, pp. 109-143, 2000.
