Abou Bakar Nauman, et al International Journal of Computer and Electronics Research [Volume 1, Issue 2, August 2012] ISSN: 2778-5795
PRODUCTIVITY INFERENCE WITH DYNAMIC BAYESIAN MODELS IN SOFTWARE DEVELOPMENT PROJECTS

Abou Bakar Nauman¹, M. Ikram Lali²
¹ Sarhad University of Science and Information Technology, Peshawar, Pakistan
² University of Education Lahore, Attock Campus, Pakistan

Abstract--This article presents a literature review of existing and proposed Bayesian networks for software project management and effort estimation. Each reviewed model was proposed for a specific issue in software project management and involved a variety of research methods for factor selection, data population and validation. The purpose of the review is, first, to identify the scope and utility of the models and, second, to reveal how different methods are used to develop BBN-based models. Dynamic Bayesian networks are a rapidly adopted technique, and they are the technique used in this research. There is a strong need to explore the capabilities of BNs in effort estimation. This article not only briefly discusses the available models but also highlights their application in software effort estimation and their suitability for building larger models.
Keywords--Bayesian Networks; Software Project Management; Literature review.

1. INTRODUCTION
A software development project is a collection of efforts and resources applied over a defined time period to realize a software product that satisfies the requirements set by, or agreed upon with, a client [1,2]. Project management focuses on the suitable application of effort and resources to meet the constraints of cost, time and quality. From the very first day, effort and resources are planned on the basis of estimates. Estimation is key to planning and is carried out not only at the beginning of a project but also at every milestone. Current research in estimation focuses on issues such as the development of new models, metrics conversion, uncertainty, missing data, intelligent decision support and models for new life cycles [3,4,5]. Decision support methods such as Bayesian networks (BN), also termed Bayesian belief networks (BBN), have recently established their effectiveness in managing uncertainty and missing data [6-9]. BNs are based on probability theory [10,11], and since estimation is itself probabilistic in nature, BNs appear well suited to estimation [12,13]. BBNs are based on causal networks, which represent relationships more logically than regression-based models [11,12]. The application of the Bayesian approach in project management, decision making and risk analysis has encouraged many researchers to enhance the capabilities of BBNs in these areas [12-22].

A Bayesian network contains two components [10,11]:

Structure or graph: The graph represents the relationships between the factors or variables. One of the most important requirements is therefore the correctness of the graph, since it largely determines the accuracy of the final probability distribution.

Conditional probability table: The second step is to populate the nodes with probability information for each variable. A node probability table (NPT) gives the probability of each state of a node, conditioned on the states of its parents. The NPT data are either collected from a sample or given by the problem. The researcher can also change the NPT data to simulate a particular scenario, which provides flexibility for risk management. A review of Bayesian networks therefore requires a study of the network structure and of the research methodology used to populate and validate the conditional probability tables.
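The two components can be illustrated with a minimal sketch. It assumes a hypothetical three-node network (complexity and experience as parents of effort) with made-up NPT values and answers queries by enumerating the full joint distribution, which is exact but only practical for very small networks. The node names and numbers are illustrative and are not taken from any reviewed model.

```python
from itertools import product

# Structure (graph): each node lists its parents. Hypothetical example.
parents = {
    "complexity": [],
    "experience": [],
    "effort": ["complexity", "experience"],
}
states = {
    "complexity": ["low", "high"],
    "experience": ["low", "high"],
    "effort": ["low", "high"],
}

# NPTs: a tuple of parent states maps to a distribution over the node's states.
# All probabilities are illustrative placeholders.
npt = {
    "complexity": {(): {"low": 0.6, "high": 0.4}},
    "experience": {(): {"low": 0.3, "high": 0.7}},
    "effort": {
        ("low", "low"): {"low": 0.5, "high": 0.5},
        ("low", "high"): {"low": 0.8, "high": 0.2},
        ("high", "low"): {"low": 0.1, "high": 0.9},
        ("high", "high"): {"low": 0.4, "high": 0.6},
    },
}


def joint(assignment):
    """Chain rule: product of each node's NPT entry given its parents' states."""
    p = 1.0
    for node, pars in parents.items():
        key = tuple(assignment[q] for q in pars)
        p *= npt[node][key][assignment[node]]
    return p


def query(target, evidence):
    """P(target | evidence) by summing the joint over all consistent assignments."""
    nodes = list(parents)
    scores = {s: 0.0 for s in states[target]}
    for values in product(*(states[n] for n in nodes)):
        a = dict(zip(nodes, values))
        if all(a[k] == v for k, v in evidence.items()):
            scores[a[target]] += joint(a)
    total = sum(scores.values())
    return {s: p / total for s, p in scores.items()}
```

With these illustrative numbers, `query("effort", {})` gives P(effort = high) = 0.45, and observing high effort raises the belief in high complexity from 0.4 to about 0.61, an example of reasoning from an effect back to its cause.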
2. BAYESIAN NETWORKS IN SOFTWARE ENGINEERING

2.1 Project Management Models
The research by Norman Fenton et al. [18] deals with the development of a prediction model for software engineering resources. The article introduces the concept of causal modeling to represent the relationships among the different factors and variables in project development. It introduces two models, a simple one and an extended one. The simple model includes factors such as process and people quality, project duration, functionality delivered, effort and quality delivered. The model was then extended with the same basic relationships, but with more factors in the respective subnets. The factors and their relationships were identified empirically: the researchers used existing knowledge about practices and about the significance of different factors. This research is one of the initial attempts to develop Bayesian models for software project management. The same authors have since presented an improved model; here, however, we review only what was presented in this article. The initial model includes some basic factors and relationships. It suggests that effective effort depends on process and people quality as well as on adjusted total effort (the result of project duration and number of people). The total effort affects the functionality delivered. The difference between effort and functionality delivered is also calculated and affects the quality delivered, which also depends on process and people quality.
Fig 1: Initial model for project management.
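One possible reading of the initial model's causal links, written as a parent map, is sketched below. The node names are our own labels, and mapping "total effort" onto the effective-effort node is an assumption; Fig 1 in [18] remains the authoritative structure. Python's standard `graphlib` module orders the nodes so that every parent precedes its children, which is the order in which NPTs are typically specified or sampled, and it raises `graphlib.CycleError` if the links are not acyclic.

```python
from graphlib import TopologicalSorter

# Our reading of the initial model (node -> set of parent nodes); labels are hypothetical.
initial_model = {
    "adjusted_total_effort": {"project_duration", "number_of_people"},
    "effective_effort": {"process_people_quality", "adjusted_total_effort"},
    "functionality_delivered": {"effective_effort"},
    "effort_functionality_gap": {"effective_effort", "functionality_delivered"},
    "quality_delivered": {"effort_functionality_gap", "process_people_quality"},
}

# static_order() yields every node after all of its parents and raises
# CycleError on a cycle, so it doubles as a check that the graph is a DAG.
order = list(TopologicalSorter(initial_model).static_order())


def roots(graph):
    """Nodes that only appear as parents: they need unconditional (prior) NPTs."""
    children = set(graph)
    all_parents = set().union(*graph.values())
    return sorted(all_parents - children)
```

Here `roots(initial_model)` returns the project duration, number of people and process and people quality nodes, the three factors that would need prior tables rather than conditional ones.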
Fig 2: Extended model for project management.

The extended model is largely similar to the simple model but includes some new factor groups. Its prime focus is resource planning; however, it is not suitable for implementation in dynamic Bayesian networks because of its large number of nodes, and it provides neither effort estimation nor productivity. Another study by the same group [22] enhances the article reviewed above, "Making Resource Decisions for Software Projects" [18]. This model is more capable of producing effective estimates on the basis of previous organizational data (the prior) and user input. It also incorporates the significant factors of project development. The model can be used within a particular organization, where it provides estimates according to that organization's previous data. It first takes the existing data of previous projects; the user is then asked to enter the percentage difference of the current project's scale, complexity and novelty from previous projects. Data about other quality attributes are also included. This allows the model to predict productivity as well as the error rate, which is used to estimate the number of defects in the project. The model thus helps the project manager make resource decisions on the basis of the organization's previous data. However, the model has been neither implemented nor tested for iterative and incremental development (IID) projects, and its large number of nodes makes it hard to implement successfully in a DBN.

2.2 Effort estimation models
The authors of another study [13] concentrate on showing how a small Bayesian model is developed and how the probabilistic calculations are made. The model is relatively small, with only three variables; however, the article elaborates each step of probabilistic reasoning and provides an algorithm for calculating the probability distribution. The results of the new probabilistic model were compared with other estimation models, e.g. CART, and the probabilistic model was observed to provide more accurate estimates. Its small number of nodes makes the model suitable for implementation in a DBN; however, it has not been implemented or tested for IID effort estimation.
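Where NPTs are populated from project data, as in the data-driven models reviewed here, the simplest approach is to count how often each child state occurs for every parent configuration. The sketch below does this with additive (Laplace) smoothing. The function name, the records and the smoothing choice are illustrative assumptions, not the method of [13], [21] or [22].

```python
from collections import Counter
from itertools import product


def learn_npt(records, child, child_states, parents, parent_states, alpha=1.0):
    """Estimate P(child | parents) from data records by counting.

    alpha is an additive (Laplace) pseudo-count: it keeps probabilities away
    from zero and gives a uniform row to parent configurations with no records.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    counts = Counter((tuple(r[p] for p in parents), r[child]) for r in records)
    npt = {}
    for config in product(*(parent_states[p] for p in parents)):
        n = sum(counts[(config, s)] for s in child_states)
        denom = n + alpha * len(child_states)
        npt[config] = {s: (counts[(config, s)] + alpha) / denom for s in child_states}
    return npt


# Hypothetical historical projects: module size and observed effort level.
history = [
    {"size": "small", "effort": "low"},
    {"size": "small", "effort": "low"},
    {"size": "small", "effort": "high"},
    {"size": "large", "effort": "high"},
]
table = learn_npt(history, "effort", ["low", "high"], ["size"], {"size": ["small", "large"]})
```

The three small projects give P(effort = low | size = small) = 0.6 rather than the raw 2/3, and the single large project gives P(effort = high | size = large) = 2/3 rather than 1; the pseudo-count matters most exactly where data are scarce.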
The model proposed in [21] is another example of how a Bayesian network can be used to estimate effort in software development. This article deals with the development effort of web projects. The model was developed with the Hugin tool, and a data repository of 150 web development projects was used to develop the NPTs. The model was verified and validated on a data set of 30 projects as well as by a web development expert. The results showed that the Bayesian network model performed better than some other effort estimation models. However, there is no evidence that the model can be calibrated with the latest data, and it has not been implemented or tested for iterative projects.

In another interesting study [24], the authors proposed that BBNs can be used to estimate productivity in software development projects. The authors argue that, because BBNs can support expert judgment, their use in software project estimation can address the problem of uncertainty in software development. In particular, the authors modeled the COCOMO81 estimation model as a BBN and demonstrated how productivity can be estimated with the proposed Bayesian model. Its large set of nodes makes it unsuitable for implementation in a DBN, and it has not been implemented for IID projects.

2.3 Iterative project models
Bibi and Stamelos [15] proposed modeling software processes with Bayesian and dynamic Bayesian networks. The RUP process is used as an example to demonstrate how metrics collected from the RUP workflows can be recorded and used to estimate the required effort. However, the model has not been tested or validated in application.
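Dynamic Bayesian networks such as those in this subsection repeat one time slice per iteration. The sketch below assumes a hypothetical single-node slice (team productivity with three states) and a made-up transition table P(state at t | state at t-1) shared by every slice, so unrolling the network over more iterations adds nodes but no new parameters. Because each slice depends only on the previous one, every prediction step costs the same fixed amount of work no matter how many iterations have passed.

```python
STATES = ["low", "nominal", "high"]

# Hypothetical transition NPT P(productivity_t | productivity_{t-1}),
# reused in every slice. Rows: previous state; columns: next state.
TRANSITION = [
    [0.6, 0.3, 0.1],
    [0.1, 0.7, 0.2],
    [0.0, 0.3, 0.7],
]


def predict(prior, n_iterations):
    """Marginal belief for slices 0..n_iterations when no new evidence arrives."""
    beliefs = [list(prior)]
    for _ in range(n_iterations):
        prev = beliefs[-1]
        beliefs.append([
            sum(prev[i] * TRANSITION[i][j] for i in range(len(STATES)))
            for j in range(len(STATES))
        ])
    return beliefs


steps = predict([0.0, 1.0, 0.0], 2)
```

Starting from certainty in "nominal", the belief is [0.1, 0.7, 0.2] after one iteration and approximately [0.13, 0.58, 0.29] after two, showing how uncertainty spreads across slices when no observations are entered.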
Fig 3: Process model for RUP.

The second research article in this subsection [17] demonstrates how Bayesian networks can be built for an agile development methodology. The BBN is developed to support phase-based development and deals in particular with learning the project velocity in extreme programming (XP). Although the model is considered a successful implementation in terms of its capabilities, there are a few gaps in its application to an IID process. First, it is focused on extreme programming. Second, it does not provide any help with initial project estimation. Third, the model has computational limitations: although the researchers themselves raised the need for a smaller, repeatable network, the model as built establishes links across multiple time slices, which increases its computational complexity.
Fig 4: Project velocity prediction model.
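The idea of learning project velocity from completed iterations can be illustrated with a grid-based Bayesian update. This is not Hearty et al.'s model [17]: it assumes, hypothetically, a fixed true velocity taken from a small set of candidate values and Poisson-distributed story points per iteration, and it updates the belief after each completed iteration.

```python
import math


def poisson_pmf(k, lam):
    """P(K = k) for a Poisson distribution with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def learn_velocity(candidates, prior, observed_points):
    """Return the posterior over candidate velocities after each iteration.

    Hypothetical model: a fixed true velocity, with the story points completed
    in each iteration drawn from Poisson(velocity).
    """
    belief = list(prior)
    history = []
    for k in observed_points:
        # Bayes' rule: weight by this iteration's likelihood, then renormalize.
        belief = [b * poisson_pmf(k, v) for b, v in zip(belief, candidates)]
        total = sum(belief)
        belief = [b / total for b in belief]
        history.append(belief)
    return history


posteriors = learn_velocity([10, 20, 30], [1 / 3, 1 / 3, 1 / 3], [21, 19, 22])
```

With candidates of 10, 20 and 30 points, a uniform prior and observed iterations of 21, 19 and 22 points, the posterior mass on 20 exceeds 0.95 after the third iteration. A fuller DBN would also allow velocity to change between iterations through a transition model.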
2.4 Defect Prediction models
The use of BBNs in software defect prediction is one of the most recognized topics in recent software engineering research. Fenton et al. presented the concept of causal modeling for predicting uncertain artifacts such as software defects. In the earliest paper [25], the authors reviewed the existing traditional defect prediction models and then proposed a simple BBN for defect prediction. In a further article [12], they showed how the model can be used in different life cycles. The research was then extended with dynamic Bayesian networks [19] to demonstrate how defects in one phase can affect the next phase; in this most recent article, the authors demonstrated the use of a defect prediction BBN in varying life cycles. A short review of these articles is presented here, again with the aim of showing the capability of BBNs to produce accurate and effective models. A further point is that models can be enhanced by incorporating different BBN development methodologies. This trend shows that developing and enhancing a BBN is an ongoing process to which different researchers can contribute. We therefore review the following three articles by Norman Fenton's group:
• A Critique of Software Defect Prediction Models [25].
• A Probabilistic Model for Software Defect Prediction [12].
• Predicting software defects in varying development lifecycles using Bayesian nets [19].
In the first article, the authors proposed a causal model for predicting defects in software development. It was a very basic model with very few factors, typically related to the testing process. The model [25] states that the number of defects detected in a particular module depends on design size, testing effort and defects introduced. Size depends on problem complexity and design effort, and the defects introduced also depend on these two factors. Defect density can then be estimated from defects detected and size. This was therefore a small model with a limited number of factors. The data used for the model's NPTs were fictitious, although determined from published literature.
Fig.5: Defect prediction model.
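A causal defect model of the kind described for [25] can be explored by forward (ancestral) sampling, a generic approximate-inference technique: each node is drawn after its parents, and defect density is computed as detected defects divided by size. All ranges and probabilities below are hypothetical placeholders rather than values from [25]; discrete factors are simplified to "low"/"high" with equal priors, and for simplicity detection depends only on defects introduced and testing effort, although [25] also links it to size.

```python
import random
import statistics

# Hypothetical parameters, indexed by (problem complexity, design effort).
SIZE_KLOC = {("low", "low"): (2, 6), ("low", "high"): (3, 8),
             ("high", "low"): (5, 12), ("high", "high"): (8, 20)}
DEFECTS_INTRODUCED = {("low", "low"): (10, 40), ("low", "high"): (5, 25),
                      ("high", "low"): (60, 150), ("high", "high"): (30, 90)}
# Probability that testing finds any one introduced defect, by testing effort.
DETECTION_PROB = {"low": 0.5, "high": 0.8}


def sample_module(rng):
    """Draw one module by ancestral sampling: every parent before its children."""
    complexity = rng.choice(["low", "high"])
    design_effort = rng.choice(["low", "high"])
    testing_effort = rng.choice(["low", "high"])
    key = (complexity, design_effort)
    size = rng.uniform(*SIZE_KLOC[key])                 # parents: complexity, design effort
    introduced = rng.randint(*DEFECTS_INTRODUCED[key])  # same two parents
    # Each introduced defect is found independently (a binomial draw).
    p = DETECTION_PROB[testing_effort]
    detected = sum(rng.random() < p for _ in range(introduced))
    return {"size": size, "introduced": introduced, "detected": detected,
            "defect_density": detected / size}


def estimate_density(n, seed=0):
    """Monte Carlo mean of detected defects per KLOC over n sampled modules."""
    rng = random.Random(seed)
    samples = [sample_module(rng) for _ in range(n)]
    return statistics.mean(s["defect_density"] for s in samples), samples
```

`estimate_density(1000)` returns the Monte Carlo mean density together with the individual samples; fixing the seed makes runs reproducible, and adding evidence would require rejecting or weighting samples that disagree with it.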
In the next article [12], the model was enhanced by introducing phases. The whole model was described as a set of subnets, each representing a phase of software development and each a separate BBN of different factors. The main subnets in the high-level structure correspond to key software life-cycle phases in the development of a software module. Thus there are subnets representing the specification phase, the specification review phase, the design and coding phase and the various testing phases. Two further subnets cover the influence of requirements management on defect levels and of operational usage on defect discovery. The final defect density subnet simply computes the industry-standard defect density metric as residual defects delivered divided by module size. This structure was modeled on the software development processes of a number of Philips development units, so the model can be said to have been developed for a specific organization. The NPTs of the nodes were populated with data available at the organization, and the validation process was also carried out at Philips.

In the third article [19], the BBN for defect prediction was further enhanced by introducing a dynamic Bayesian network. A generalized life-cycle phase was developed for defect detection, and this phase was repeated with the help of the DBN to represent multiphase or iterative development.

3. ANALYSIS AND DISCUSSION
The literature is rich in books, journal articles and conference papers on Bayesian networks [23]. Interestingly, the number of these references increases every year. It is also notable that most articles discuss the application of BNs in different areas such as environmental sciences, medical sciences, biological sciences and software engineering. Models for different aspects of software engineering have been developed with Bayesian networks, including defect prediction, software reliability and process control. Some of these models address the issue of effort estimation. The table below summarizes these models.

Table 1. Review of existing models.
Model | Focus | Initial estimate | Calibration | Iterative projects | Learning method | Suitability for dynamic networks
Hearty, 2007 [17] | Project velocity in extreme programming | No (no empirical base) | Yes | Yes (project velocity) | DBN | The derived model is complex
Emilia Mendes, 2007 [21] | Web effort estimation | Yes | No | No | Static Bayesian | Somewhat suitable
I. Stamelos et al., 2003 [24] | Productivity inference based on COCOMO81 | Initial productivity | N/A | No (not tested) | Manual NPT development | Large number of nodes
Norman Fenton et al., 2004 [18] | Resource planning | Partial; based on Brooks' factor | Not known | Not tested | Static Bayesian network | Large number of nodes
Pendharkar, P.C. et al., 2005 [13] | Effort estimates based on use case | Yes | Yes | No | Static | Suitable
Łukasz Radliński et al., 2004 [22] | Resource planning | Productivity | Yes | Not tested | %age difference from previous processes | Large number of nodes
The main focus of each model is listed under the 'Focus' column. The three effort estimation columns list the availability of an initial estimate, the availability of calibration on the basis of the latest data, and support for iterative projects. The 'Learning method' column gives the technique the researchers adopted to let the model learn from the latest data. The last column presents our assessment of each model's suitability for incorporation into dynamic Bayesian networks; a model with a large set of nodes is considered hard to incorporate into a DBN.
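The link between node count and tractability can be made concrete with a standard property of discrete BNs: a node with s states whose parents have s_1, ..., s_k states needs an NPT of s × s_1 × ... × s_k entries, so table size grows exponentially with the number of parents, and a DBN repeats its tables in every time slice. The helper names below are our own, and the unrolled count is a simplification that ignores the separate prior tables a first slice usually has.

```python
from math import prod


def npt_size(node_states, parent_states):
    """Entries in one NPT: node states times every combination of parent states."""
    return node_states * prod(parent_states)


def network_npt_size(nodes):
    """Total NPT entries; nodes maps a name to (its state count, parents' state counts)."""
    return sum(npt_size(k, parent_counts) for k, parent_counts in nodes.values())


def unrolled_npt_size(slice_nodes, n_slices):
    """Simplified: a DBN unrolled over n_slices copies one slice's tables n_slices times."""
    return n_slices * network_npt_size(slice_nodes)


# A three-state node with one three-state parent versus four such parents.
print(npt_size(3, [3]))      # 9
print(npt_size(3, [3] * 4))  # 243
```

Adding three parents to a single three-state node multiplies its table by 27, which is why models with many interconnected nodes become expensive once they are repeated across the slices of a dynamic network.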
4. CONCLUSION
The models identified in this review are focused on software engineering and project management. Different approaches have been used to identify the causal structure of the networks, including process modeling, data exploration and causality identification. The models have been developed both as static Bayesian networks and as dynamic Bayesian networks; however, each model has a large number of nodes representing variables or factors. It is also observed that chains of models exist, with researchers improving their own proposed models over time. However, many of the models are not focused on effort estimation in IID. It is also observed that the large number of nodes makes it hard to use these existing models as a reusable set of nodes for effort estimation in IID projects.

5. ACKNOWLEDGMENTS
I gratefully acknowledge the guidance of Dr William Marsh at RADAR, QMUL, UK, and of Dr. Romana Aziz at CIIT Islamabad, Pakistan.

6. REFERENCES
[1]. Larman C., 2003, "Agile and Iterative Development: A Manager's Guide", Addison Wesley.
[2]. Martin R.C., 1999, "Iterative and Incremental Development", Engineering Notebook Column, April.
[3]. Boehm B. et al., 1995, "Cost Models for Future Life Cycle Processes: COCOMO 2.0", Annals of Software Engineering, Vol. 1.
[4]. Li J., Ruhe G., "Decision Support Analysis for Software Effort Estimation by Analogy", Third International Workshop on Predictor Models in Software Engineering (PROMISE'07).
[5]. Royce W., 2000, "Software Project Management: A Unified Framework", Pearson Education.
[6]. Azzeh M. et al., 2009, "Software Effort Estimation Based on Weighted Fuzzy Grey Relational Analysis", ACM.
[7]. McConnell S., 2006, "Software Estimation: Demystifying the Black Art", Microsoft Press.
[8]. Anda B., Dreiem H., Sjøberg D.I.K., Jørgensen M., "Estimating Software Development Effort Based on Use Cases: Experiences from Industry".
[9]. Gray A.R., MacDonell S.G., 1997, "A Comparison of Techniques for Developing Predictive Models of Software Metrics", Information and Software Technology, Vol. 39, pp. 425-437.
[10]. Heckerman D., 1997, "Bayesian Networks for Data Mining", Data Mining and Knowledge Discovery, Vol. 1, pp. 79-119.
[11]. Jensen F.V., 1996, "An Introduction to Bayesian Networks", UCL Press.
[12]. Fenton N.E., Krause P., Neil M., 2001, "A Probabilistic Model for Software Defect Prediction", manuscript available from the authors (CiteSeer).
[13]. Pendharkar P.C., Subramanian G.H., Rodger J.A., 2005, "A Probabilistic Model for Predicting Software Development Effort", IEEE Transactions on Software Engineering, Vol. 31, Issue 7, pp. 615-624.
[14]. Neil M., Fenton N.E., Nielsen L., 2000, "Building Large-Scale Bayesian Networks", Knowledge Engineering Review, Vol. 15, Issue 3.
[15]. Bibi S., Stamelos I., 2004, "Software Process Modeling with Bayesian Belief Networks", IEEE Software Metrics 2004, online proceedings.
[16]. Shamsaei A., 2005, M.Sc. project report, Advanced Method in Computer Science, University of London.
[17]. Hearty P., Fenton N.E., Marquez D., Neil M., 2009, "Predicting Project Velocity in XP Using a Learning Dynamic Bayesian Network Model", IEEE Transactions on Software Engineering, Vol. 35, Issue 1, January.
[18]. Fenton N.E., Marsh W., Neil M., Cates P., Forey S., Tailor M., 2004, "Making Resource Decisions for Software Projects", Proceedings of the 26th International Conference on Software Engineering (ICSE'04).
[19]. Fenton N.E., Neil M., Marsh W., Hearty P., Marquez D., Krause P., Mishra R., 2007, "Predicting Software Defects in Varying Development Lifecycles Using Bayesian Nets", Information and Software Technology, Vol. 49, Issue 1.
[20]. Khodakarami V., Fenton N., Neil M., 2009, "Project Scheduling: Improved Approach Incorporating Uncertainty Using Bayesian Networks", Project Management Journal.
[21]. Mendes E., 2007, "Predicting Web Development Effort Using a Bayesian Network", Proceedings of the 11th International Conference on Evaluation and Assessment in Software Engineering (EASE'07), 2-3 April, pp. 83-93.
[22]. Radliński Ł., Fenton N., Neil M., Marquez D., 2004, "Improved Decision-Making for Software Managers Using Bayesian Networks", Proceedings of the 11th IASTED International Conference on Software Engineering and Applications.
[23]. Narayanan V.K., Armstrong D.J., 2005, "Causal Mapping for Research in Information Technology", Idea Group Publishing.
[24]. Stamelos I., Dimou P., Angelis L., 2003, "On the Use of Bayesian Belief Networks for the Prediction of Software Development Productivity", Information & Software Technology, Elsevier, Vol. 45, pp. 51-60.
[25]. Fenton N., Neil M., 1999, "A Critique of Software Defect Prediction Models", IEEE Transactions on Software Engineering, Vol. 25, Issue 5, pp. 675-689.