Size Estimation of Web Applications through Web CMF Object
Giulio Barabino, Daniele Grechi
Dept. of Biophysical and Electronic Eng., University of Genova, Genova, ITALY
{giulio.barabino,daniele.grechi}@unige.it
Erika Corona, Michele Marchesi
Dept. of Electrical and Electronic Eng., University of Cagliari, Cagliari, ITALY
{erika.corona, michele}@diee.unica.it
Laura Piccinno
Datasiel s.p.a., Genova, ITALY
Abstract—This work outlines a new methodology for estimating the size of Web applications developed with a Content Management Framework (CMF). We propose it because the RWO method, which we recently developed, proved inadequate for estimating the effort of the latest Web applications: its size metric is not well suited to Web applications developed through a CMF. We present the new key elements for analysis and planning needed to define every important step in developing a Web application through a CMF; from these elements, the size of such an application can be obtained. We also present an experimental validation performed on a dataset of 7 projects provided by an Italian software company.
Keywords–Web application size estimation; Content Management Framework; Web Objects; experimental validation
WETSoM 2012, Zurich, Switzerland
© 2012 IEEE 978-1-4673-1762-7/12/$31.00
I. INTRODUCTION
This work outlines a new methodology for estimating the size of Web applications developed with a Content Management Framework (CMF). We have addressed effort estimation for Web applications twice before, in 2009 [1] and in 2011 [2]. In our 2009 paper [1], we compared the effectiveness of Albrecht's classic Function Points (FP) metric [3] and Reifer's Web Objects (WO) metric [4] in estimating development effort for Web applications, in the context of an Italian software company, Datasiel s.p.a. We tested those metrics on a dataset of 10 projects, dating from 2003 to 2009, provided by the company. We calculated the size of each project in both FP and WO and fed the resulting values into the empirical cost model routinely used by the company, obtaining the corresponding effort estimates. Lastly, we compared the estimates with the actual effort of each completed project, using the MRE (Magnitude of Relative Error). The experimental results showed large estimation errors with the WO metric, which outperformed the FP metric in only two cases. Neither metric satisfied Conte's criterion [5], although FP came closest.
This first study made it evident that effort estimation does not depend only on functional size measures. Although size is the main input and strongly influences the final result, other factors must be considered, such as model accuracy and challenges specific to Web applications. For this reason, in 2011 we revised the WO methodology, creating the RWO model [2] and providing a related empirical study. The RWO model belongs to the category of mixed-approach effort estimation models: it estimates the effort required to develop a Web project in man-days, using a standard size metric combined with an empirical cost model. The size metric used in the model combines two metrics: Albrecht's classic FP metric and Reifer's WO metric. The empirical cost model was derived by the software company from its previous experience in estimating and developing similar projects, and from certain characteristics of its development team and of the technology it used. We applied the RWO method to a dataset of 24 projects provided by the company and compared the results with those obtained by applying the FP and WO methods. The RWO method achieved effort predictions comparable to those of the FP method, and better in 4 cases.
We now propose a new size estimation methodology to counteract the inadequacy of the RWO method when used with the latest Web application development tools: its size metric is not well suited to Web applications developed through a CMF. In particular, the operands and operators of Reifer's metric refer to elements that modern programming technologies can create quickly and easily, and whose weight appears to be irrelevant when calculating the size of a Web project.
We identified new key elements for analysis and planning, which allow every important step in the development of a Web application using a CMF to be defined. Each element can have one of four complexity degrees: low, medium-low, medium-high, or high. Following the RWO approach, the estimated size of a Web project is the sum of all elements, each weighted by its own complexity. We tested the new methodology on 7 projects provided by the Italian software company mentioned above, comparing the size estimate of each project with its final size, with favourable results.
This paper is organized as follows: Section II presents an overview of the main current cost models; Section III describes our approach; Section IV discusses the results of the experiments performed to validate the method. Finally, Section V presents the conclusions and plans for future work.
II. COST MODEL OVERVIEW
Research on software development effort and cost estimation has been abundant and diversified since the end of the 1970s [3][6][5], and the topic is still very much alive, as the many works in the literature show. Researchers have investigated it extensively, with respect to both the estimation approach (regression, analogy, expert judgment, function points, simulation, etc.) and the research approach (theory, survey, experiment, case study, simulation, etc.), in both industrial and academic contexts. The most frequently used estimation approach is regression-based, and COCOMO is the best-known model of this kind [6]. Estimation methods are mostly validated on historical data, and most research is conducted in an industrial context [7].
Narrowing the topic to Web applications, one of the first researchers to introduce size metrics for Web applications was Tim Bray [8], who performed statistical analyses of the basic characteristics of the main Web sites in 1995. Some years later, Cowderoy et al. [9] proposed using size metrics to predict development effort for Web applications. At first, Web effort estimation relied on the same models used for general software applications. One of the first scholars to introduce a method specifically devised for the Web was Reifer, with the WO metric and the WEBMO model [4]. This model was later used by other researchers to compare different estimation models, with varying and sometimes conflicting results [1][10][11][12]. Mendes and collaborators also carried out many research works on Web effort estimation [13][14]. Works devoted to estimating development effort in CMS projects are fewer; one example is the paper by Aggarwal et al. [15], which proposes the linear regression estimation model CMSEEM.
To conclude this section, we underline that project effort estimation models are generally based on cost models that take as input a set of parameters, called cost drivers, of which size is the predominant one [6][16], and give as output an effort measure. However, there is currently no model able to adequately measure the size of a Web application. The existing models for classic software applications are not well suited to this context, as many authors have stated, also with regard to CMS-based projects [12][13][14][15][17].
III. PROPOSED MODEL: WEB CMF OBJECTS
Thanks to our long-standing partnership with the same software company, we could directly observe in the field the evolution of development technologies and methodologies on projects developed over a span of almost 10 years. This allowed us to experiment with the effort prediction methodologies in the literature and to adapt them to changes in technology over time. The latest trend¹ in Web application development is the now prevailing use of Content Management Frameworks (CMF). We therefore focused our research in this direction, developing a method specifically built for size estimation of projects that use a CMF.
A. Content Management Frameworks
The size estimation methodology outlined below was devised from a thorough observation of the development cycle of Web applications built with Content Management Frameworks available under an Open Source license, such as Joomla! and Drupal [19][20]. A Content Management Framework (CMF) is a high-level software application that allows for the creation of a customized Web Content Management System (WCMS). A WCMS is a software tool, usable by both technical and general staff, that allows for the creation, editing, management and publishing of a wide range of multimedia content on a website. CMFs greatly help to organize and plan a WCMS, freeing the site administrator from all aspects related to Web programming (knowledge of scripting languages, server installation, database creation, etc.). Besides the basic functionalities for creating and managing dynamic pages, every CMF offers libraries of modules and add-on components readily available to users. By using such libraries, even the most knowledgeable programmer is freed from writing code for simple, recurring functionalities and can focus on the functionalities specific to her or his own application. A Web developer using a CMF has many options: using only ready-made modules (and components) for the entire application, editing and customizing the available modules to her or his liking (an option specific to open source CMFs), or designing and programming new, completely original modules². Clearly, in the final estimation of development effort, using ready-made modules and components will have a different impact from programming new ones from scratch, and editing modules and components to customize them will have yet another impact.
¹ According to a recent, unpublished survey [18], conducted by the Department of Electrical and Electronic Engineering (DIEE) of the University of Cagliari and the Department of Biophysical and Electronic Engineering (DIBE) of the University of Genoa on a sample group of software developers, 90% of respondents usually adopt an Open Source CMF when developing Web applications.
B. Size Estimation of a Web Application
The proposed methodology is intended for Web applications developed with CMFs, regardless of the technology used to implement the frameworks. Analysing a sample of Web applications and their development cycle, we found distinctive, recurring elements, which we divided into two sets: general elements and specific functionalities. Each element is assigned a complexity degree that depends on various factors: context of application, presence or absence in the library of the CMF in use, customization, reuse, etc. The sum of all elements, each weighted by its complexity, gives the size estimate of the Web application. In this way, size is estimated in terms of the functionalities the application offers to the user, as in Albrecht's classic FP metric [3], but adapted to present-day development practice.
C. General Elements
General elements comprise all the preliminary analysis and planning activities, as well as the essential elements for creating the main structure of an application, such as basic image elements and some information content, usually static and with little or no interaction with the user. The basic elements necessary for interaction in an application also belong to this class. Some elements are single, while others may have several instances; every element has a complexity that can be low, medium-low, medium-high or high.
1) Single-instance general elements: Below is a list of the 15 single-instance elements, grouped by category, each with its own definition. These elements may or may not be present, but if present there is exactly one of each.
Context and External Environmental Analysis
• Context and user-base analysis: critical issues and opportunities of the informative space where the Web application is to run.
• Analysis of on-line demand-and-offer: critical summary and review of gathered materials (market analysis, interviews, focus groups, etc.).
• Newsletter: policies on spreading and publishing content (how frequently, to whom, etc.).
• Customizations by editorial staff: feasibility of updates to the site from outside; options (software side) to edit the template, in case external staff is planned to be in charge.
• Site findability and positioning verification: operations related to the positioning of the site on search engines.
Site Structure
• Content architecture: content management planning: document types and management types (e.g., listing texts by expiry date or by type/topic, by priority/deadlines, user type, etc.).
• Management and re-aggregation of tags and keys: categorization and classification of content and information on the Website.
• System infrastructure: arrangements for the required infrastructure at system level.
• General search engine on site: a basic (standard) or customized search engine present in the application.
• Preparation of bare mockup, requirements and navigation: decisions on how navigation should work, what is to be highlighted, content management solutions.
• Content management system: creation of components for content management.
Graphic and Maps
• Production of logo and corporate image: thorough study of design and meanings.
• Graphic layout production: layout elaborated by graphic artists, starting from the bare mockup (title, footer, static elements in the interface).
• Creation of ad hoc texts, pictures and/or videos: development of original multimedia content for the Web, on specific topics requested by the customer.
• Map (or background): management of the backgrounds needed to include georeferenced information in the application.
² According to a recent unpublished survey [18] of a sample group of CMF users in Web application development: 68% of respondents frequently use components from the library, and state that they use them with a frequency between 61% and 100% of the total development of an application; 5% of respondents use modules from the library, with a frequency between 0% and 20%; 64% of respondents edit modules from the library, usually changing about 40% of the code with respect to the original.
2) Multiple-instance general elements: Below is a list of the 4 multiple-instance elements, each with its own definition.
• Community and social management: managing the presence of the Web application on the main social networks, either statically (simply sharing content) or dynamically (a smarter, more complex management style). One instance per social network.
• Templates and navigation system: planning of the main templates (home page, content pages, search pages, etc.), menus and cross-section views (view by user, view by life events, etc.). One instance per template.
• User role management: front-end user registration and customization of the type of access to the site depending on user type. One instance per user type.
• Multilingualism: simple translation of the site and redesign of some parts depending on the language. One instance per language.
D. Specific Functionalities
This category includes all elements needed for interaction between application and user that concern the specific features of the application. These functionalities are expressly created, and thus have a high level of customization and of database interaction (authentication, profiling, data input forms, etc.). As for general elements, functionalities are evaluated by number of instances and by complexity level, which can be low, medium-low, medium-high or high. For instance, for the number of DB tables to be created, we count separately the low complexity tables, the medium-low complexity tables, and so on, multiply each count by a weight depending on its complexity, and sum the four products. Below is a list of the 11 multiple-instance specific functionalities, grouped by category, each with its own definition.
Query and Reporting
• DB and internal Query creation: number of tables in the DB.
• Report system design: number of reports.
• External Query: number of queries to external DBs.
External accessibility
• Management of reserved areas: definition of access levels (management of the content approval workflow: e.g. none, reading, writing, adding/deleting documents, adding new pages, etc.) and functionalities of each reserved area (page or site section); number of different areas.
• External system access: number of accesses to different external applications.
• Services available outside of the application: number of Web services the system provides and/or uses.
• Data input models: number of modules specific to the application.
Cartographic and Multimedia
• File types managed by the application: number of different file types the application needs to manage.
• Cartographic data base: use and management of pre-existing databases needed to include georeferenced information in the application (e.g. databases on hospitals/hotels/companies, etc.). Ad hoc cartographic databases belong to the "DB and internal Query creation" category.
• Creation and inclusion of customized maps: creation and inclusion of maps with placeholder icons, lines, selection tools, videos or pictures, through the Google Maps JavaScript API or similar APIs; number of different maps.
• Clickable maps: number of pictures/graphs with hypertext links to other sites or other sections of the same site.
E. Complexity Degree
Determining the complexity degree of each element is one of the most critical steps of the methodology, because it is left to the project manager's own experience and knowledge of her or his development team. Four degrees can be associated with each element: low, medium-low, medium-high or high complexity. We chose a 4-degree ordinal scale to prevent the user of the method from choosing a "fully balanced" judgment, that is, from avoiding a choice: the user must always lean towards either "low" or "high", albeit to different extents. The complexity degree assigned to analysis and planning is strongly related to the context and size of the application, and must therefore be assessed empirically. As far as the development of CMF modules or plugins is concerned, we can generally consider:
• Low complexity when the element is present in the CMF library, or when pre-existing elements are used without substantial changes;
• Medium-low complexity when the element is present in the CMF library but needs customization, or when pre-existing elements are used with non-substantial changes;
• Medium-high complexity when the element is not present in the CMF library and must therefore be implemented, or when the customization of an element in the library is substantial;
• High complexity when the element is not present in the CMF library and its implementation is complex, or when the customization of an element in the library is very extensive.
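As a minimal sketch of how instance counts and complexity degrees combine into a size value, the following Python code assumes that each element is recorded as a mapping from complexity degree to number of instances, and that the weighted sum is taken over general elements and specific functionalities as in eq. (1). The factor values are those of Table I (linear) and of the exponential alternative (0.5, 1, 2, 4) mentioned in Section IV.A; the `Element` class, the `size_estimate` function and the example project are hypothetical illustrations, not part of the method's definition.

```python
from dataclasses import dataclass, field

# Factors per complexity degree: linear values from Table I, and the
# exponential alternative (0.5, 1, 2, 4) also tried in Section IV.A.
LINEAR = {"low": 0.5, "medium-low": 1.0, "medium-high": 1.5, "high": 2.0}
EXPONENTIAL = {"low": 0.5, "medium-low": 1.0, "medium-high": 2.0, "high": 4.0}


@dataclass
class Element:
    """A general element or specific functionality (hypothetical structure).

    `instances` maps a complexity degree to the number of instances with
    that degree; a single-instance element has a single entry with count 1.
    """
    name: str
    instances: dict = field(default_factory=dict)

    def weighted_size(self, factors):
        # Each count is multiplied by the factor of its degree, then summed.
        return sum(count * factors[degree] for degree, count in self.instances.items())


def size_estimate(general, specific, factors=LINEAR):
    """Eq. (1): weighted sum over general elements and specific functionalities."""
    return (sum(e.weighted_size(factors) for e in general)
            + sum(f.weighted_size(factors) for f in specific))


# Hypothetical project, for illustration only.
general = [
    Element("Graphic layout production", {"medium-low": 1}),
    Element("Templates and navigation system", {"low": 2, "medium-high": 1}),
    Element("Multilingualism", {"low": 3}),
]
specific = [
    Element("DB and internal Query creation", {"low": 4, "medium-low": 2, "high": 1}),
]

print(size_estimate(general, specific))               # 11.0
print(size_estimate(general, specific, EXPONENTIAL))  # 13.5
```

Passing the factor table as a parameter keeps the calibration separate from the counting, which makes it straightforward to compare scales such as the linear and exponential ones.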
F. Calculation of the Estimation
After considering every element, each weighted by its own complexity, the size estimate of the Web application is the simple sum of all elements:
\[ \mathrm{Size}_{\mathrm{estimation}} = \sum_{j=1}^{M} EG_j \, c_j + \sum_{k=1}^{N} FS_k \, c_k \qquad (1) \]
where $EG_j$ is the j-th general element, of complexity $c_j$, and $M$ is the total number of general elements; $FS_k$ is the k-th specific functionality, of complexity $c_k$, and $N$ is the total number of specific functionalities.
IV. EXPERIMENTAL RESULTS
The method outlined here can be considered broadly applicable, since the elements presented in Section III are common to many Web applications. On the other hand, calibrating the method, through the choice of the complexity degree to assign to each element, strongly depends on the team developing the application. The experimental findings below should therefore be considered of limited external validity, although they represent an interesting validation of the methodology on real data. It was not possible to test the Web CMF Objects size estimation model against other methods commonly used in the literature, such as FP, COCOMO, WO, etc., because these methods measure different elements from those we considered and are therefore hardly comparable with it.
A. Dataset
We considered a dataset of 7 projects developed with Content Management Frameworks between 2009 and 2011 by the software company Datasiel. The data on all the previously examined elements were provided both during requirement gathering (estimation level) and after development was finished (final level). The projects mainly aim at developing applications for public administrations and health services. For each project:
• the elements discussed in Section III were extracted from the requirement documents;
• with the help of the project manager, a qualitative judgment on the complexity of each element (low, medium-low, medium-high, high) was given, also considering the abilities of the team and its experience on past projects;
• the qualitative complexity degree was then converted into a quantitative one, using a multiplicative factor that accounts linearly for complexity.
Table I shows the conversion factor used to account for complexity in the size estimate. The model was tested using both an exponential factor (0.5, 1, 2, 4) and the linear factor shown. We observed that, with the linear factor, the model reproduces the empirical size value of each project more faithfully.
TABLE I. LINEAR COMPLEXITY FACTOR
Complexity degree    Factor
Low                  0.5
Medium-low           1
Medium-high          1.5
High                 2
For each element present in each examined project, as described in Sections III.C and III.D, we multiplied its size by the complexity factor shown in Table I (also accounting for the number of instances at each complexity level). We then summed the values of all these elements, as in eq. (1) of Section III, obtaining the provisional size estimate. Next, we analysed the results of the completed projects (Web sites, functionalities, databases, etc.) and recomputed the size of their elements using the complexity degree actually experienced by the team in creating each element, obtaining the final actual size. As expected, this analysis showed differences from the estimation data in both the number of completed elements and their complexity.
B. Dimension Prediction and Evaluation Method
We evaluated the effectiveness of the methodology in predicting the size of the analysed applications by calculating, for each project, the MRE (Magnitude of Relative Error), a measure commonly used in the Web estimation literature to assess prediction accuracy:
\[ \mathrm{MRE} = \frac{\lvert \mathrm{Size}_{\mathrm{ACTUAL}} - \mathrm{Size}_{\mathrm{ESTIMATED}} \rvert}{\mathrm{Size}_{\mathrm{ACTUAL}}} \qquad (2) \]
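Eq. (2) translates directly into code. The Python sketch below computes the MRE and the qualitative label used in Table II for two of its projects (1 and 5); the function names are our own, and the "exact" label is a hypothetical addition for the case in which estimate and actual size coincide.

```python
def mre(actual, estimated):
    """Eq. (2): magnitude of relative error of a size estimate."""
    if actual == 0:
        raise ValueError("actual size must be non-zero")
    return abs(actual - estimated) / actual


def judgment(actual, estimated):
    """Qualitative label as in Table II ("exact" is our own addition)."""
    if estimated < actual:
        return "underestimated"
    if estimated > actual:
        return "overestimated"
    return "exact"


# (provisional estimate, final actual size) for projects 1 and 5 of Table II.
for project, (estimated, actual) in {1: (391.5, 405.0), 5: (41.0, 27.5)}.items():
    print(project, judgment(actual, estimated), round(mre(actual, estimated), 2))
# 1 underestimated 0.03
# 5 overestimated 0.49
```

Because the absolute difference is divided by the actual size, an overestimate of a small project (project 5) weighs far more than an equal absolute error on a large one (project 1).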
Similarly to [10], we completed the error evaluation by calculating the prediction level Pred, the proportion of observations within a given level of accuracy:
\[ \mathrm{Pred}(l) = \frac{k}{N} \qquad (3) \]
Over N total observations, if k is the number of observations with an MRE less than or equal to l, Pred(l) is the proportion of projects with an MRE less than or equal to l. For instance, if 8 out of 10 projects have an MRE of at most 0.2, then Pred(0.2) = 0.8. Conte et al. [5] suggest as acceptable thresholds a mean MRE less than or equal to 0.25 and a Pred(0.25) greater than or equal to 0.75 (at least 75% of the projects have an MRE of at most 0.25).
C. Results
Table II summarizes the results for the examined dataset. The method gives reassuring results, with very low MRE values, except for project 5, whose large MRE is due to an overestimate of the context and external environmental analysis. As confirmed by the more detailed MRE analysis in Table III, the Web CMF Objects methodology has a mean MRE of 0.13 and a Pred(0.25) of 85.7%, so it satisfies the criterion by Conte et al. This indicates that our methodology gives acceptable estimates of application size. A limitation of our analysis is that we computed the actual size using the same estimators as the method, rather than an independent measure such as lines of code (LOC). In Web development with a CMF, however, the LOC metric is almost meaningless, and no other sound dimensional metric can be applied. The ultimate metric is, of course, the actual effort (in man-days) needed to develop the application, but for the studied projects this was not readily available, for administrative reasons.
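As a check, the statistics in Table III can be recomputed from the sizes in Table II. The Python sketch below implements eq. (3) and the two thresholds of Conte et al.; the helper names `pred` and `meets_conte` are our own. We use the population standard deviation (`statistics.pstdev`), which reproduces the 0.16 reported in Table III, whereas the sample standard deviation would give about 0.17.

```python
import statistics

# (provisional estimate, final actual size) for projects 1-7, from Table II.
PROJECTS = [
    (391.5, 405.0), (161.5, 165.0), (71.5, 64.5), (59.5, 51.5),
    (41.0, 27.5), (331.5, 351.0), (183.5, 180.0),
]


def mre(actual, estimated):
    """Eq. (2): magnitude of relative error."""
    return abs(actual - estimated) / actual


def pred(mres, level):
    """Eq. (3): fraction of observations with MRE <= level."""
    k = sum(1 for m in mres if m <= level)
    return k / len(mres)


def meets_conte(mres, mmre_max=0.25, level=0.25, pred_min=0.75):
    """Conte et al.: mean MRE <= 0.25 and Pred(0.25) >= 0.75."""
    return statistics.mean(mres) <= mmre_max and pred(mres, level) >= pred_min


mres = [mre(actual, est) for est, actual in PROJECTS]
print(f"mean={statistics.mean(mres):.2f} median={statistics.median(mres):.2f} "
      f"std={statistics.pstdev(mres):.2f} Pred(0.25)={pred(mres, 0.25):.2%}")
print("Conte criterion met:", meets_conte(mres))
# mean=0.13 median=0.06 std=0.16 Pred(0.25)=85.71%
# Conte criterion met: True
```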
TABLE II. WEB CMF OBJECTS ON 7 DATASETS PERTAINING TO REAL PROJECTS
Project n.   Provisional size   Final actual size   |Final - provisional|   Qualitative judgment   MRE
1            391.5              405                 13.5                    underestimated         0.03
2            161.5              165                 3.5                     underestimated         0.02
3            71.5               64.5                7                       overestimated          0.11
4            59.5               51.5                8                       overestimated          0.16
5            41                 27.5                13.5                    overestimated          0.49
6            331.5              351                 19.5                    underestimated         0.06
7            183.5              180                 3.5                     overestimated          0.02
(Provisional size estimate and final actual size are both computed through Web CMF Objects.)
TABLE III. MRE STATISTICS
Min    Max    Mean   Median   Std dev   Pred(0.25)   %Pred(0.25)
0.02   0.49   0.13   0.06     0.16      6            85.71%
V. CONCLUSIONS AND FUTURE DEVELOPMENTS
We presented a new methodology for size estimation of Web applications developed with a Content Management Framework (CMF). We proposed it because the RWO method, which we had previously developed, proved inadequate for estimating the effort of the latest Web applications. Thanks to our long-standing partnership with the same software company, we could directly observe in the field the evolution of development technologies and methodologies on projects developed over a span of almost 10 years. We identified new key elements for analysis and planning, which allowed us to define every important step in the development of a Web application through a CMF. After calibrating the method, we tested it on a dataset of 7 projects developed between 2009 and 2011. The data on all the previously described elements were provided both during requirement gathering (estimation level) and after development was finished (final level). Our findings show that the method yields very low MRE values: the Web CMF Objects methodology has a Pred(0.25) of 85.7%, so it satisfies the acceptance criterion by Conte et al. [5]. This indicates that our methodology gives acceptable size estimates.
This work is the basis for developing a new methodology for effort estimation in Web applications developed with a CMF, since size estimation is the main input of cost models. We are currently working on finding the right conversion model and parameters to estimate the effort needed to develop a Web application, starting from its size estimate. To this end, we are examining several projects carried out in various companies, gathering both their initial requirement data and their final cost and effort figures.
ACKNOWLEDGMENT
We would like to acknowledge the contribution of the following persons to this paper: Marinella Ghisu, Daniele Sanna, Antonio Ariu and Francesca Serci. This work was partially funded by Regione Autonoma della Sardegna (RAS), Regional Law No. 7, 2007 on Promoting Scientific Research and Technological Innovation in Sardinia, call 14/2/2009, and RAS Integrated Facilitation Program (PIA) for Industry, Artisanship and Services, call 14/10/2008, project No. 265, Advanced Technologies for Software Measuring and Integrated Management, TAMIGIS.
REFERENCES
[1] Barabino G., Porruvecchio G., Concas G., Marchesi M., De Lorenzi R., Giaccardi M., "An empirical comparison of function points and Web Objects", International Conference on Computational Intelligence and Software Engineering (CiSE 2009), 2009.
[2] Folgieri R., Barabino G., Concas G., Corona E., De Lorenzi R., Marchesi M., Segni A., "A revised Web Objects method to estimate Web application development effort", ICSE 2011, 2011.
[3] Albrecht A.J., "Measuring application development productivity", 1979.
[4] Reifer D., "Web development: estimating quick-to-market software", IEEE Software, 2000.
[5] Conte S.D., Dunsmore H.E., Shen V.Y., "Software engineering metrics and models", Benjamin-Cummings Publishing Co., 1986.
[6] Boehm B.W., "Software engineering economics", IEEE Transactions on Software Engineering, 1981.
[7] Jørgensen M., Shepperd M.J., "A systematic review of software development cost estimation studies", IEEE Transactions on Software Engineering, 2007.
[8] Bray T., "Measuring the Web", 1996.
[9] Cowderoy A.J.C., Donaldson A.J.M., Jenkins J.O., "A metrics framework for multimedia creation", Proceedings of the Software Metrics Symposium (Metrics 1998), 1998.
[10] Ruhe M., Jeffery R., Wieczorek I., "Using Web Objects for estimating software development effort for Web applications", Proceedings of the Software Metrics Symposium, 2003.
[11] Ferrucci F., Gravino C., Di Martino S., "A case study using Web Objects and COSMIC for effort estimation of Web applications", 34th Euromicro Conference on Software Engineering and Advanced Applications (SEAA '08), 2008.
[12] Hooi T.C., Yusoff Y., Hassan Z., "Comparative study on applicability of WEBMO in Web application cost estimation within Klang Valley in Malaysia", Computer and Information Technology Workshops (CIT Workshops), 2008.
[13] Mendes E., Mosley N., Counsell S., "The application of case-based reasoning to early Web project cost estimation", Proceedings of the Computer Software and Applications Conference, 2002.
[14] Mendes E., Mosley N., Counsell S., "Early Web size measures for Web costimation", Proceedings of the Software Metrics Symposium, 2003.
[15] Aggarwal N., Prakash N., Sofat S., "Web hypermedia content management system effort estimation model", ACM SIGSOFT Software Engineering Notes, 2009.
[16] Jones T.C., "Estimating software costs", 1998.
[17] Mangia L., Paiano R., "MMWA: a software sizing model for Web applications", Proceedings of Web Information Systems Engineering (WISE 2003), 2003.
[18] Corona E., Grechi D., "CMF Survey", unpublished.
[19] www.joomla.org
[20] http://drupal.org