Criteria-formulating delphic hierarchy process: A systematic approach for open source solution adoption

Yixin Bian, Song Zhao *, Hailong Zhu

College of Computer Science and Information Engineering, Harbin Normal University, Harbin 150025, China

Received 28 June 2014; revised 12 June 2015; accepted 27 July 2015
KEYWORDS: Open source solutions; Biomedical software development; Delphic Hierarchy Process; UIMA; GATE
Abstract: Open source solutions have been widely applied by the research community across many application areas and have formed a cornerstone in enabling research in the systems science era. In the biomedical domain, open source solutions have promoted rapid innovation and wide adoption of reusable software components, standards and new computational methods. The process of evaluating, comparing and selecting open source solutions is far from trivial, and there has been very little work on systematic approaches to it. In this study, we present a systematic approach, the Criteria-Formulating Delphic Hierarchy Process (CFDHP), for the evaluation, comparison and selection of complex frameworks. The DHP method is used and integrated with criteria formulation. The application of CFDHP, and its utility in assisting the decision making process of open source solution adoption, is illustrated through the comparison of two popular open source natural language processing architecture frameworks: UIMA and GATE.

© 2015 The Authors. Production and hosting by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
* Corresponding author. E-mail addresses: [email protected], [email protected] (Y. Bian); [email protected], [email protected] (S. Zhao).

http://dx.doi.org/10.1016/j.aci.2015.07.002
1. Introduction

Informatics plays a crucial role in advancing systems science. One trend of the past decade is that researchers tend to disseminate informatics tools as open source software packages [1]. Of particular interest is the open source software framework: a group of program modules that provide the essential code for generic functionalities. The code can be selectively changed and does not need to be reimplemented for each new project [2], which allows researchers to fulfill their research objectives faster, cheaper, and better. Open source solutions have promoted rapid innovation and wide adoption of reusable software components, standards and new computational methods in the biomedical area.

Adopting a specific open source solution for a specific project is often not a trivial process, because multiple open source solutions are available. Besides development, a software product needs to be maintained. Software maintenance covers various activities, such as bug fixing, functionality enhancement, deletion and addition of capabilities, adaptation to changes in data requirements and operation environments, and improvement of performance, usability, and other quality attributes. Selection of a framework is therefore an important decision, because many of the functional and non-functional properties of a framework can have a strong impact on the success of a research project. In addition, it is impossible to explore all quality factors for a specific domain, so considerations drawn from the actual demands of the biomedical informatics domain can be the following:

Q1: Will the software be maintained easily in the future?
Q2: Does the software have strong community support, so that bugs will be fixed in time and the software will be maintained?
Q3: Is the software useful, usable and easy to use (user manual, helpdesk, etc.)? If such documentation exists, does it provide detailed information?

In this paper, we present a systematic approach named CFDHP that integrates criteria development1 with the Delphic Hierarchy Process (DHP) to evaluate software quality, documentation and maintainability for open source solution adoption. The rest of the paper is organized as follows: Section 2 provides the background of this study. The CFDHP method is presented in Section 3. In Section 4, the proposed approach is applied to a case study. Section 5 discusses the formulated evaluation criteria. Finally, conclusions and future work are given in Section 6.
1 The selected criteria are used for evaluating software frameworks in the biomedical domain. Different evaluation methods or models select different criteria. For example, although ISO 9126 defines a set of evaluation criteria that are more generalized than those of other models, in [3] the ISO 9126 model was chosen as the base model and customized to make it more suitable for the evaluation of a particular application domain, namely B2B (business to business) applications. Of course, CFDHP can be applied to other research or application areas such as telecommunications, finance, and electric power transmission. Obviously, the criteria should differ across areas according to the software products and purposes involved.
2. Background and related work

2.1. Delphic hierarchy process

Our method involves the Delphi and Analytic Hierarchy Process (AHP) techniques. The Delphi method is a technique for resolving problems and forecasting by collecting and refining expert judgements. The Analytic Hierarchy Process is an approach of measurement through pairwise comparisons that relies on expert judgements to derive priority scales for organizing and analyzing complex decisions [4]. The two techniques were integrated into the Delphic Hierarchy Process (DHP), first proposed by Reza Khorramshahgol and Vassilis S. Moustakis in 1988 [5]. DHP uses the structured communication of Delphi and the hierarchical analysis of AHP to systematically identify organizational objectives and then set priorities among them. In 2012, DHP was used to resolve problems of corporate governance quality [6].

2.2. Existing framework comparison methods

In this section, some of the most standard and well-known framework comparison methods are briefly discussed, focusing on their strengths and weaknesses. As shown in Table 1, although the criteria in each study differ, the six comparison methods share the purpose of identifying the strengths and limitations of each compared framework as thoroughly as possible.

2.3. Quality models for evaluating software products

Behkamal [3] proposed an ISO/IEC 9126 based quality model, customized in accordance with the special characteristics of B2B applications. Their work does not provide a method for evaluating quality but rather identifies a group of criteria. It is therefore unclear whether some of the criteria remain useful when the evaluated products are similar in some respects, for example when their portability is equally good. In addition, the evaluation process cannot be performed again if the results are unreasonable, and there is no communication among the experts. Young Min Lee et al. [13] suggested a complemented OSS selection process by examining existing OSS selection process methods. This approach mentions established assessment criteria in its step four, but it does not state what the criteria are or how to measure them. Taibi et al. [14] provided a method for OSS evaluation, OpenBQR, which takes advantage of the strengths of the existing methods and alleviates some of their drawbacks. The criteria of OpenBQR mainly divide software quality into
Table 1 The different framework comparison methods.

Literature | The types of evaluated software frameworks | Evaluation criteria | Advantages | Disadvantages
Casagni et al. [7] | (1) The web-centric Java 2 Enterprise Edition (J2EE) framework; (2) the FIPA-compliant multi-agent system (MAS) | Design properties, performance, resource usage, developer experience | The differences and similarities between the two component frameworks were presented | Lacking the guidance of experts for certain situations
Katrinis et al. [8] | Multimedia conferencing frameworks | Scalability, robustness, extension, transmission delay | The differences and similarities between the two component frameworks were presented | Lacking the guidance of experts for certain situations
Urbaczewski et al. [9] | Enterprise architecture frameworks | Views/perspectives, abstractions, systems development life cycle | The comparison results can be used for guidance in the selection of an EAF that meets the needed criteria | Some of the frameworks do not clearly "map" to the ideas of "viewpoints" and "aspects", making comparison of the frameworks difficult
Bitter [10] | Four image processing and visualization frameworks | More than 50 evaluation criteria | The benefits and limitations of each visualization framework are presented, along with the concrete application area for each framework | Lacking the guidance of experts for certain situations
Jean-Charles Mateo-Velez [11] | Two spacecraft charging software tools: the spacecraft plasma interaction software and the multi-utility spacecraft charging analysis tool | Cross-comparison | The benefits and limitations of each software tool are presented, along with the concrete application area for each product | Lacking the guidance of experts for certain situations
Ren Zhongming [12] | Five popular life cycle impact assessment software tools | Six comparison criteria | The benefits and limitations of each software tool are presented, along with the concrete application area for each product | Lacking the guidance of experts for certain situations
external quality and internal quality, both of which are involved in our study. However, the relative importance of the evaluation factors was not taken into account. Aversano et al. [15] applied EFFORT as a base framework, specialized to the context of CRM systems. It considered all of the characteristics defined by the ISO/IEC 9126 standard, and the totality of software product quality attributes was classified in a hierarchical tree structure. As with OpenBQR, the relative importance of the evaluation factors was not addressed. The main objective of
Sarrab et al. [16] was to identify the system quality, information quality or service quality that drives or motivates users and IT decision makers in choosing their OSS products. However, their work does not adjust the criteria according to the requirements of a specific domain. Both [15] and [16] give more criteria than Behkamal's [3] and ours, but as illustrated in [3], it is not possible to explore all quality factors in a single study. Therefore, criteria and relative weights elicited from domain experts who know the actual issues are reliable. In addition, our approach is more flexible than theirs, because it can be repeated if the final results are not acceptable or some experts disagree with others, and it can adjust the criteria according to different practical requirements.

3. Method description

An overview of the proposed evaluation approach, the Criteria-Formulating Delphic Hierarchy Process (CFDHP), is shown in Fig. 1. In CFDHP, criteria formulation is conducted prior to DHP, so that not only the evaluation criteria are obtained, but their weights and the scores to be considered in the analysis are determined as well. The views of all decision makers are incorporated into criteria formulation.

3.1. The steps of CFDHP

The evaluation approach consists of the following seven steps:

Step 1 Develop evaluation criteria for assessing the alternatives: The first step involves the development of the evaluation criteria. When there are clear evaluation criteria, DHP can facilitate decision making [5]. However, the validity of the evaluation criteria varies with the specific project. It is therefore important to identify the selection criteria and objectives that form the basis for a sound decision.
Figure 1 An overview of CFDHP.
Step 2 Collect data: On the basis of the criteria created in the first step, this step collects data on the candidate frameworks.

Step 3 Select a monitor team: The team should consist of experts who are familiar with both biomedical software development and the specific content of the research projects.

Step 4 Use the Delphi method to determine criteria weights and assign scores: The monitor team designs a questionnaire in which the participants, the main stakeholders of the project, are asked to specify the criteria weights and the scores assigned to the criteria for the alternatives. Higher scores indicate better suitability.

Step 5 Set up a pairwise comparison matrix: The weights and scores obtained in Step 4 are presented to the participants in order to obtain their subjective value judgements for the pairwise comparison matrix. If consensus is not reached on any individual element of this matrix, the monitor team takes the arithmetic mean of the value judgements of all participants for that element.

Step 6 Perform AHP: This can be done manually or with a tool, taking as input the pairwise comparison matrix and the scores confirmed in Step 4.

Step 7 Make a decision: A decision is made if there is a considerable difference among the alternatives. In the case of small differences, it is important for the decision makers to remember that they are reaching a qualitative judgement. The process can be repeated to review the formulated criteria and the earlier weight and score assignments. The dashed line in Fig. 1 shows the possibility of iteration in CFDHP: the decision process is performed again (from Step 1 to Step 7) until the result is reasonable or accepted by all experts.2
2 The monitor team will make the decision if, after several iterations, not all experts accept the result or some experts still disagree with others.
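To make Steps 4 and 5 concrete, the following is a minimal sketch (our illustration, not part of the original method's tooling) of how a monitor team might aggregate expert questionnaires: the criterion weights from Step 4 are averaged and re-normalized, and a consistent pairwise matrix is then derived from the agreed weights, with entry (i, j) equal to w_i / w_j; any element lacking consensus would instead be the arithmetic mean of the individual judgements, as Step 5 prescribes. All expert numbers here are hypothetical.

```python
import numpy as np

# Illustrative Step 4 questionnaire results: each row is one expert's
# criterion weights, and each expert's weights sum to 1.
criteria = ["code metric", "code smell", "bug", "bug-fix time", "users' manual"]
expert_weights = np.array([
    [0.20, 0.20, 0.30, 0.15, 0.15],   # expert 1
    [0.25, 0.15, 0.30, 0.15, 0.15],   # expert 2
    [0.20, 0.25, 0.25, 0.15, 0.15],   # expert 3
    [0.15, 0.20, 0.35, 0.15, 0.15],   # expert 4
])

# Arithmetic mean across experts, re-normalized so the weights sum to 1.
mean_w = expert_weights.mean(axis=0)
final_w = mean_w / mean_w.sum()

# Step 5: a consistent pairwise comparison matrix derived from the agreed
# weights, entry (i, j) being w_i / w_j.
pairwise = final_w[:, None] / final_w[None, :]

for name, w in zip(criteria, final_w):
    print(f"{name}: {w:.3f}")
```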
3.2. Development of the evaluation criteria

A set of acceptance criteria is the key to evaluating software for a specific purpose. This section presents the quantitative evaluation criteria used in Step 1 and Step 2: (a) code metrics, (b) code smells, (c) bugs, (d) bug-fix time, and (e) user manuals. These evaluation criteria were chosen by software engineering experts with respect to software quality standards, synthesizing the knowledge of the software engineering domain and the practical requirements of biomedical practice. The experts also took into account that the tools are OSS: bugs and bug-fix time are evaluation criteria directly connected with the particularity of OSS, measuring the activeness and reliability of the community producing the evaluated tool. Table 2 lists the selected metrics and their data sources.
Table 2 The selected metrics and their data sources.

No. | Metrics | The data source
1 | Code metrics: CBO, NOC, RFC, DIT, LCOM, and WMC | Understand for Java [17]
2 | Code smells: data class, data clumps, feature envy, long message chain, refused bequest, shotgun surgery and god class | Borland Together (a)
3 | Bugs | FindBugs (b), PMD (c), and Lint4j (d)
4 | Bug-fix time | The web site of UIMA (e); the web site of GATE (f)
5 | Users' manuals | The user manuals of UIMA and GATE obtained from their web sites

a https://www.borland.com/us/products/together.
b http://findbugs.sourceforge.net/downloads.html.
c http://pmd.sourceforge.net/.
d http://www.jutils.com/.
e https://issues.apache.org/jira/browse/uima#selectedTab=com.atlassian.jira.plugin.system.project%3Aissuespanel.
f http://sourceforge.net/tracker/?atid=756796&groupid=143829&func=browse.
3.2.1. Brief introduction of the evaluation criteria

This section briefly introduces the five chosen evaluation criteria: code metrics, code smells, bugs, bug-fix time, and user manuals.

Code metrics: One of the first suites of OO design measures is the CK metric suite [18]. The CK suite contains six metrics that measure different aspects of an object-oriented system, such as size, coupling, cohesion and inheritance [19]. In our earlier research [20,21], we found that dependencies concentrate in smaller modules, both within a given product release and across consecutive releases. That is, smaller modules are proportionally (per line of code) more dependent on other modules than larger ones over a product's whole lifetime. We also found that refactoring exacerbates this dependency concentration, both for a single release (i.e., a single snapshot) of a product and across successive releases: after refactoring, smaller classes were proportionally even more dependent. Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure [22], and for software it is good practice. Thus, the more small classes a software product has, the better its design quality. In the CK suite, CBO and DIT are often used to measure the dependencies between modules.

Code smells: Code smells are signs of potential problems in code [23]. Code smells are usually not bugs: they are not technically incorrect and do not change the program's function. Instead, they are signs of weaknesses in design that
may be slowing down development or increasing the risk of bugs or failures in the future.

Bugs: A software bug is an error, mistake, flaw, fault, or failure in a computer program or system. Errors made by people in a program's source code or its design are the main source of bugs. The number of bugs is commonly used to measure software quality [24].

Bug-fix time: Bug-fix time is the time taken to fix a bug after the bug was introduced [24]. Most open source software development projects include an open bug repository (one to which users of the software have full access) that tracks and reports problems in the software system [25]. The main benefit of a bug repository is a clear, centralized overview of development requests and their status. Like bug counts, bug-fix time is often used to measure software quality [24].

User manuals: A user manual is a technical communication document intended to give assistance to people using a particular system. A well-documented guide provides the users with detailed installation and uninstallation instructions, step-by-step operation, system requirements, and so on.

3.2.2. Standards for quality assurance

Different quality models apply different standards to evaluate target software products. While these studies are useful, they have also caused confusion because of the many quality aspects offered. In this paper, the base model is chosen by synthesizing ISO standards and the work of Shaikh and Cerone [26], and it is customized to make it more suitable for the evaluation of a particular application domain, namely biomedical informatics applications. Fig. 2 shows the relationships among the evaluation criteria, the quality model standard from ISO/IEC 9126, and the quality metrics in this paper.

In addition to the attributes of ISO/IEC 9126, the community behind a chosen OSS is also important, because it is usually where one goes for support, news, advice and tips [16]. In this research, development communities have been considered. One of the selected criteria, bug-fix time, belongs to community support3 (maintenance capacity and sustainability). However, in order to reduce overlap with other criteria, bug-fix time is placed under reliability, which is related to quality by development, as shown in Fig. 2. According to the characteristics described in Section 3.2.1, code metrics and code smells are grouped under Maintainability of the ISO/IEC 9126 quality standard and, at the same time, belong to the notion of quality by design from [26]. User manuals are grouped under Usability of the ISO/IEC 9126 model and belong to the notion of quality by access. In the ISO/IEC 9126 quality standard, Reliability refers to maturity, fault tolerance and recoverability.
3 If the percentage of unresolved bugs of product A is lower than that of product B at some time, bugs are resolved faster in A than in B at that time, which suggests that more people are supporting A than B.
Figure 2 The relationships among the evaluation criteria, the quality model standard from ISO/IEC 9126, and the quality metrics.
Therefore, bugs and bug-fix time are grouped under Reliability and, at the same time, belong to the notion of quality by development.

4. Applying the proposed approach to a case study

To show that the proposed approach can be used in practice, CFDHP is applied to evaluate and select between two natural language processing (NLP) frameworks, UIMA and GATE. Background information on UIMA and GATE is introduced briefly first, followed by the evaluation process and results. Steps 3–7 of CFDHP are presented with real data. The process was first approved by the Institutional Review Board, and the relevant experts then gave the evaluation data. These data were provided by Dr. Hongfang Liu from the Mayo Clinic, USA.

4.1. UIMA and GATE NLP tools

UIMA (Unstructured Information Management Architecture)4 is an open, scalable and industrial-strength framework for NLP. It supports the development of analysis applications and search solutions that process text or other unstructured information.
4 http://uima.apache.org/.
Table 3 The comparison of metrics.

Lines of code: UIMA 169,516; GATE 218,963.

Metric | UIMA Min | UIMA Max | UIMA Total | UIMA Median | UIMA Average | GATE Min | GATE Max | GATE Total | GATE Median | GATE Average
CBO | 0 | 84 | 11,822 | 5.41 | 0.07 | 0 | 65 | 11,203 | 3.99 | 0.05
NOC | 0 | 71 | 1170 | 0.53 | 0.007 | 0 | 81 | 1027 | 0.37 | 0.005
RFC | 0 | 347 | 35,220 | 15.82 | 0.21 | 0 | 214 | 29,909 | 10.65 | 0.14
DIT | 0 | 10 | 3837 | 1.72 | 0.023 | 0 | 8 | 4731 | 1.68 | 0.022
LCOM | 0 | 100 | 79,374 | 35.66 | 0.468 | 0 | 100 | 85,051 | 30.29 | 0.388
WMC | 0 | 345 | 15,166 | 6.81 | 0.089 | 0 | 180 | 15,220 | 5.42 | 0.07
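The summary statistics in Table 3 can be reproduced from per-class metric values; the "Average" column appears to be the metric total divided by lines of code (e.g., UIMA's CBO: 11,822 / 169,516 ≈ 0.07). A minimal sketch of such a summary, using hypothetical per-class CBO values for illustration:

```python
import statistics

def metric_summary(per_class_values, lines_of_code):
    """Summarize one CK metric as in Table 3: Min, Max, Total, Median,
    and Average, where Average is the total normalized per line of code
    (e.g. UIMA CBO: 11,822 / 169,516 = 0.07, rounded)."""
    total = sum(per_class_values)
    return {
        "Min": min(per_class_values),
        "Max": max(per_class_values),
        "Total": total,
        "Median": statistics.median(per_class_values),
        "Average": total / lines_of_code,
    }

# Hypothetical CBO values for a handful of classes, for illustration only.
cbo_values = [0, 3, 5, 8, 12, 84]
print(metric_summary(cbo_values, lines_of_code=169_516))
```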
GATE (General Architecture for Text Engineering)5 is a suite of Java tools. GATE was originally developed at the University of Sheffield and is now used worldwide by a wide community. Students, teachers, scientists and companies choose GATE for all sorts of NLP applications, including information extraction in many languages.

4.2. Evaluation and comparison of UIMA and GATE

This section presents the whole evaluation process.

4.2.1. Step 1 and Step 2: Developing evaluation criteria and collecting data

(1) Comparison of the software by code metrics

The values of the six metrics for UIMA and GATE are presented in Table 3. As shown there, the median and average values of every metric are larger in UIMA than in GATE. The values of CBO and DIT6 shown in Table 3 indicate that UIMA has more small classes than GATE. As a result, the design quality of UIMA is better than that of GATE.

(2) Comparison of the software by code smells

In a concrete software product, several kinds of bad smells may exist at the same time. This paper investigates seven of Fowler's code smells: Data Class, Data Clumps, Feature Envy, Long Message Chain, Refused Bequest, Shotgun Surgery and God Class. We chose Borland Together, a Java IDE that supports smell detection for Java programs, to detect the code smells. Table 4 presents the results. Table 4 shows that there is no duplicated code, the best-known code smell, in either product. There are seven kinds of code smells in UIMA and six
5 https://gate.ac.uk/overview.html.
6 CBO and DIT were used as measures of dependency between classes in our earlier research [20,21]. The CBO and DIT values of UIMA are both larger than those of GATE, and we drew our conclusions in light of the CBO and DIT values of the tested programs.
Table 4 The number of code smells in UIMA and GATE.

Code smell | Number in UIMA | Average (UIMA/KLOC) | Number in GATE | Average (GATE/KLOC)
Data class | 6 | 0.035 | 11 | 0.05
Data clumps | 63 | 0.372 | 21 | 0.091
Feature envy | 26 | 0.153 | 0 | 0
Refused bequest | 101 | 0.6 | 448 | 2.05
Long message chain | 19 | 0.112 | 30 | 0.137
Shotgun surgery | 23 | 0.136 | 189 | 0.863
God class | 16 | 0.094 | 48 | 0.219
Total | 254 | 1.5 | 747 | 3.41
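The per-KLOC averages in Table 4 follow directly from the raw smell counts and the code sizes reported in Table 3. A quick check, using only those published totals:

```python
# Smell density per KLOC, from the totals in Table 4 and the line
# counts in Table 3 (169,516 LOC for UIMA, 218,963 LOC for GATE).
for name, smells, loc in [("UIMA", 254, 169_516), ("GATE", 747, 218_963)]:
    print(f"{name}: {smells / (loc / 1000):.2f} smells/KLOC")
# -> UIMA: 1.50, GATE: 3.41, matching the Total row of Table 4
```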
Table 5 The warning count in UIMA and GATE.

Detection tool | UIMA | GATE
FindBugs (2.0.0) | 6 | 178
PMD (5.0) | 1798 | 1794
Lint4j (0.9.13) | 84 | 494
types of code smells in GATE. The average number of code smells per KLOC is larger in GATE than in UIMA.

(3) Comparison of the software by bug data

Among existing bug detection tools, FindBugs, PMD and Lint4j were applied to identify software bugs in UIMA and GATE, following the comparison results in [27]. As Table 5 shows, the numbers of bugs detected in UIMA by FindBugs and Lint4j are far lower than those in GATE.

(4) Comparison of the software by bug-fix time

Bug reporters generally provide a bug summary, a bug description, the suspected product, and the component name with its severity [28]. Bug priority indicates the importance or urgency of fixing a defect. Though priority may initially be set by the software tester, it is usually finalized by the project or product manager. In UIMA, bug priorities are categorized into five levels: Blocker, Critical, Major, Minor and Trivial. In GATE, bug priorities are expressed as eight numbers: 9, 8, 7, 6, 5, 4, 2, 1. In this paper, we consider Blocker, Critical and Major as high priority and Minor and Trivial as low priority in UIMA; in GATE, we consider 9, 8, 7 and 6 as high priority and 5, 4, 2 and 1 as low priority. The bug information for UIMA and GATE was collected from their respective repositories.7 Fig. 3 shows that the percentage of unresolved bugs in UIMA is lower than that of GATE for roughly the first 700 days, so bugs were resolved faster in UIMA than in GATE over that period (the bug information extends to March 16, 2012).
7 UIMA: https://issues.apache.org/jira/browse/uima#selectedTab=com.atlassian.jira.plugin.system.project%3Aissues-panel. GATE: http://sourceforge.net/tracker/?atid=756796&group_id=143829&func=browse.
Figure 3 The survival curve of bugs.
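A survival curve like Fig. 3 can be derived from bug-tracker exports by computing, for each elapsed number of days, the fraction of bugs still unresolved. A minimal sketch, using hypothetical (reported, resolved) records rather than the actual UIMA/GATE repository data:

```python
from datetime import date

def unresolved_fraction(bugs, cutoff, days):
    """Fraction of bugs still open `days` days after being reported.
    `bugs` is a list of (reported, resolved-or-None) date pairs; bugs
    resolved after `cutoff` (here March 16, 2012) count as unresolved."""
    open_count = 0
    for reported, resolved in bugs:
        age = resolved - reported if resolved and resolved <= cutoff else None
        if age is None or age.days > days:
            open_count += 1
    return open_count / len(bugs)

# Hypothetical bug records, for illustration only.
bugs = [
    (date(2010, 1, 5), date(2010, 1, 20)),
    (date(2010, 3, 1), date(2011, 6, 1)),
    (date(2011, 7, 9), None),            # still unresolved at the cutoff
]
cutoff = date(2012, 3, 16)
for d in (30, 365, 700):
    print(d, unresolved_fraction(bugs, cutoff, d))
```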
(5) Comparison of the software by user manuals

Table 6 presents the comparison of the users' manuals. The results show that the users' guide for UIMA contains more information than the one for GATE.

4.2.2. Step 3 to Step 7: Applying the evaluation criteria

Step 3: Select a monitor team. After the first two steps of CFDHP, a monitor team needs to be formed, consisting of experts who are involved in both biomedical software development and the specific content of the research projects.

Step 4: Use the Delphi method. The team designed a questionnaire in which the participants were asked to specify the criteria weights and scores. The obtained criteria weights, which the monitor team can use to derive preliminary results, are presented in the second row of Table 7.8 In Table 7, the criteria weights specified by each expert sum to 1, and the final value of each criterion was calculated by the monitor team from the four experts' suggestions. At the same time, the experts also gave the relative judgements for each criterion, shown in the third row of Table 7, based on the obtained results of the formulated evaluation criteria.9
8 This process was first approved by the Institutional Review Board, and the relevant experts then gave the evaluation data. These data were provided by Dr. Hongfang Liu from the Mayo Clinic, USA.
Table 6 The documentation for UIMA and GATE.

Contents compared: catalog; tutorial of manual; overview and characteristics of the software product; installation and setup; introduction of product application; Frequently Asked Questions (FAQ); known issues and problems with the software; terms and concepts and their basic definitions in the software. UIMA's manual covers all eight items; GATE's covers five of them.
Table 7 The weight assignment and relative judgements for each criterion.

Criterion | Code metric | Code smell | Bug | Bug fix time | Users' manual
Weight | 0.2 | 0.2 | 0.3 | 0.15 | 0.15
Relative judgement | 2:1 | 4:1 | 6:5 | 7:3 | 9:1
Step 5: Set up a pairwise comparison matrix. The participants provided their subjective value judgements for the pairwise comparison matrix, shown in Table 8, based on the weights and scores specified in Step 4. In [5], the criteria are denoted C1, C2, ..., Cn and assigned weights; the degree of importance of Ci with respect to Cj is expressed on the scale shown in Table 9 in the Appendix. If consensus is not reached on any individual element of the matrix, the monitor team takes the arithmetic mean of the value judgements of all participants for that element.

Step 6 and Step 7: Perform AHP and make a decision. Finally, AHP is performed using the pairwise comparison matrix and the scores confirmed in Step 5. This can be done manually or with a tool (yaahp V6.0).10 Numerical priorities are calculated for each of the decision alternatives; these numbers express the relative ability of the alternatives to achieve the final goal. In this example, the final result is UIMA: 0.697, GATE: 0.303 (a computational sketch of this step is given after Table 8). With this result, the choice for the decision makers is clear. Note that we collected real data: the weights and scores are from five NLP experts of the Mayo Clinic. The criteria set may vary among use cases, so adopters should use their best judgement, considering the characteristics of their project needs.

9 The data in the last row are the relative degrees of UIMA with respect to GATE for each criterion; for example, the cell value "2:1" is the relative degree of UIMA with respect to GATE for the code metric criterion. These relative degree values are also required by the tool, yaahp.

10 yaahp (Yet Another AHP) is a software tool for the analytic hierarchy process (AHP). It provides construction of the hierarchical model, data entry for judgement matrices, sorting, weight calculation, export of calculated data, and other functions.
Table 8 Pairwise comparison matrix.

Criteria | Code metric | Code smell | Bug | Bug fix time | Users' manual
Code metric | 1 | 1 | 2/3 | 4/3 | 4/3
Code smell | | 1 | 2/3 | 4/3 | 4/3
Bug | | | 1 | 2 | 2
Bug fix time | | | | 1 | 1
Users' manual | | | | | 1
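As a sketch of Steps 6 and 7 (our illustration, not the yaahp tool), the criterion weights can be recovered as the principal eigenvector of Table 8's matrix (completed with reciprocals below the diagonal, as is standard in AHP) and then combined with the UIMA:GATE relative judgements from Table 7, reading each ratio u:g as local priorities u/(u+g); this reproduces the reported result of 0.697 versus 0.303.

```python
import numpy as np

# Pairwise comparison matrix from Table 8, with the lower triangle
# filled in as reciprocals of the upper triangle.
A = np.array([
    [1,   1,   2/3, 4/3, 4/3],   # code metric
    [1,   1,   2/3, 4/3, 4/3],   # code smell
    [3/2, 3/2, 1,   2,   2  ],   # bug
    [3/4, 3/4, 1/2, 1,   1  ],   # bug-fix time
    [3/4, 3/4, 1/2, 1,   1  ],   # users' manual
])

# Criterion weights: principal eigenvector of A, normalized to sum to 1.
eigvals, eigvecs = np.linalg.eig(A)
w = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
w = w / w.sum()   # -> [0.2, 0.2, 0.3, 0.15, 0.15], matching Table 7

# UIMA:GATE relative judgements per criterion (third row of Table 7),
# each ratio u:g read as local priorities u/(u+g) and g/(u+g).
ratios = [(2, 1), (4, 1), (6, 5), (7, 3), (9, 1)]
uima_local = np.array([u / (u + g) for u, g in ratios])
gate_local = 1 - uima_local

print("UIMA:", round(float(w @ uima_local), 3))   # 0.697
print("GATE:", round(float(w @ gate_local), 3))   # 0.303
```

Because the matrix in Table 8 is consistent (every entry equals the ratio of the corresponding Table 7 weights), the eigenvector method returns exactly those weights; with an inconsistent matrix it would return a compromise priority vector instead.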
5. Discussion

There are many other criteria not considered in the case study.

First, the quality standard: in this paper, the quality standard ISO/IEC 9126 is adopted as a basis. However, ISO/IEC 9126 has since been replaced by ISO/IEC 25010 [29]. One important change in ISO 25010 is that security appears as one of the main software product quality characteristics [30]. Although multiple software packages are available for adoption, it is not easy to select an appropriate tool, since tools in the biomedical domain tend to be similar in several respects and most are not safety-critical software; this holds for the two NLP frameworks chosen in this study, UIMA and GATE. They are similar in design and purpose: both represent documents as text plus annotations and allow users to define pipelines of processes that manipulate the document [31], and both are developed in Java. Therefore, two quality characteristics of ISO/IEC 9126, functionality and portability, are not suitable for comparison. In addition, the experts and engineers from biomedical informatics are most concerned with the following considerations: Will the software be maintained easily in the future? Does the software have strong community support, so that bugs will be fixed in time and the software will be maintained? Is the software useful, usable and easy to use (user manual, helpdesk, etc.), and does its documentation provide detailed information? To sum up, this paper adopts ISO/IEC 9126 as a basis, considering the three characteristics reliability, maintainability, and usability.

Since ISO 25010 contains more factors than ISO 9126, we also consider it to assist the decision. The following three quality attributes of ISO 25010 can be compared, taking UIMA and GATE as the example. The first is efficiency: processing large (or many) documents in GATE requires a lot of memory, and GATE needs a lot of space to store annotations,11 so GATE uses more resources than UIMA. The second is compatibility: UIMA is compatible with a large set of external NLP tools, including OpenNLP, DKPro Core, and JULIE Lab [32], while GATE is limited to Java, making it difficult for GATE to integrate with computational linguistics tools written in other languages. The third is security: for UIMA, the
Apache Security Team12 provides information on known security vulnerabilities, while no security support is available from GATE's web site. Therefore, the result is also clear if we use the ISO 25010 standard as an aid. In the future, ISO 25010 or an updated standard will be considered as a basis for evaluating software quality, especially for safety-critical software.

11 http://stackoverflow.com/questions/15082298/getting-oom-while-using-gate-on-large-data-set/15199743#15199743.
12 http://www.apache.org/security/.

Second is the OSS community. Development communities are the main difference between commercial and F/OSS products. For OSS, many important assets of the community are also open for inspection. Soto and Ciolkowski [33] list three aspects of quality characteristics related to the community: maintenance capacity, sustainability and process maturity. Maintenance capacity is defined as the ability of a community to provide the resources necessary for maintaining its product(s) (e.g., implementing changes, removing bugs, providing support) over a certain period of time. The bug and bug-fix time criteria selected in this paper relate to this aspect. The bug and bug-fix time dataset used in our work was extracted from two online bug tracking systems, the bug repositories of UIMA and GATE. That the percentage of unresolved bugs of one product is lower than the other's suggests that more people maintain the former.

Third is the lifespan of the project: GATE has been active since 1995 and has over 2000 citations, while UIMA is relatively new, with over 500 citations, based on Google Scholar. Last is the adoption base: for example, many more NLP applications are built on top of GATE, which provides more ready-to-use code.

6. Conclusion

A quantitative decision method, CFDHP, for the evaluation, comparison, and selection of complex frameworks in the biomedical research domain is proposed in this study, in which the DHP method is integrated with criteria formulation. The first stage of CFDHP provides the necessary information for the following steps by developing a set of criteria. The important attributes of a framework, such as reliability, maintainability, and documentation, can be assessed against project requirements and priorities using these criteria. The proposed method should be immediately useful, since it provides a systematic way to document the decision making process and to make better decisions. We plan to apply CFDHP to other research and application areas such as telecommunications, finance, and electric power transmission; obviously, the criteria will differ across areas.

Acknowledgments

We thank the associate editor and the anonymous reviewers for their useful and constructive feedback. The presented research was supported by the Harbin Normal University Youth Academic Fund (No. KGB201216) and the Nature Science Foundation of Heilongjiang Province, China (No. F201321).
Appendix A

Table 9 Proportional scale [5].

The degree of importance of Ci with respect to Cj | Quantitative values
Ci and Cj are equally important | 1
Ci is weakly more important than Cj | 3
Ci is strongly more important than Cj | 5
Ci is very strongly more important than Cj | 7
Ci is absolutely more important than Cj | 9
Used to compromise between two judgements | 2, 4, 6, 8
References

[1] Clement J. McDonald, Gunther Schadow, Michael Barnes, et al., Open source software in medical informatics – why, how and what, Int. J. Med. Inform. 69 (2003) 175–184.
[2] Anton Gerdessen, Framework comparison method, PhD thesis, University of Amsterdam, 2007.
[3] Behshid Behkamal, Mohsen Kahani, Mohammad Kazem Akbari, Customizing ISO 9126 quality model for evaluation of B2B applications, Inform. Softw. Technol. 51 (3) (2009) 599–609.
[4] Thomas L. Saaty, Decision making with the analytic hierarchy process, Int. J. Services Sci. 1 (2008) 83–98.
[5] Reza Khorramshahgol, Vassilis S. Moustakis, Delphic hierarchy process (DHP): a methodology for priority setting derived from the Delphi method and analytical hierarchy process, Eur. J. Oper. Res. 37 (1988) 347–354.
[6] Michail D. Nerantzidis, Delphic hierarchy process (DHP): a methodology for the resolution of the problems of the evaluation of corporate governance quality, in: The 10th European Academic Conference on Internal Audit and Corporate Governance, 2012.
[7] M. Casagni, M. Lyell, Comparison of two component frameworks: the FIPA-compliant multi-agent system and the web-centric J2EE platform, in: 25th International Conference on Software Engineering, 2003, pp. 341–351.
[8] K. Katrinis, G. Parissidis, B. Plattner, A comparison of frameworks for multimedia conferencing: SIP and H.323, in: 8th IASTED International Conference on Internet Multimedia Systems and Applications (IMSA 2004), USA, 2004.
[9] Lise Urbaczewski, Stevan Mrdalj, A comparison of enterprise architecture frameworks, Issues Inform. Syst. 7 (2006) 18–23.
[10] Ingmar Bitter, Robert Van Uitert, Ivo Wolf, et al., Comparison of four freely available frameworks for image processing and visualization that use ITK, IEEE Trans. Visual. Comput. Graph. 13 (2007) 483–493.
[11] J.-C. Mateo-Velez, J. Roussel, V. Inguimbert, Mengu Cho, K. Saito, D. Payan, SPIS and MUSCAT software comparison on LEO-like environment, IEEE Trans. Plasma Sci. 40 (2) (2012) 177–182.
[12] Dai Zhong Su, Zhong Ming Ren, Comparison of different life cycle impact assessment software tools, Key Eng. Mater. 572 (2013) 44–49.
[13] Young Min Lee, Jong Bae Kim, Woo et al., A study on selection process of open source software, in: Sixth International Conference on Advanced Language Processing and Web Information Technology, 2007, pp. 568–571.
[14] Davide Taibi, Luigi Lavazza, Sandro Morasca, OpenBQR: a framework for the assessment of OSS, in: Open Source Development, Adoption and Innovation, vol. 234, 2007, pp. 173–186.
[15] Lerina Aversano, Maria Tortorella, Applying EFFORT for evaluating CRM open source systems, in: Product-Focused Software Process Improvement, Lecture Notes in Computer Science, vol. 6759, 2011, pp. 202–216.
[16] Mohamed Sarrab, Osama M. Hussain Rehman, Empirical study of open source software selection for adoption, based on software quality characteristics, Adv. Eng. Softw. 69 (2014) 1–11.
[17] Scientific Toolworks, Inc., Understand for Java: User Guide and Reference Manual, version 2.5, Technical report, 2010.
[18] R. Subramanyam, M.S. Krishnan, Empirical analysis of CK metrics for object-oriented design complexity: implications for software defects, IEEE Trans. Softw. Eng. 29 (2003) 297–310.
[19] Kuljit Kaur Chahal, Hardeep Singh, Metrics to study symptoms of bad software designs, SIGSOFT Softw. Eng. Notes 34 (2009) 1–4.
[20] A. Gunes Koru, Khaled El Emam, The theory of relative dependency: higher coupling concentration in smaller modules, IEEE Softw. 27 (2010) 81–89.
[21] M.A. Parande, G. Koru, A longitudinal analysis of the dependency concentration in smaller modules for open-source software products, in: IEEE International Conference on Software Maintenance (ICSM), 2010, pp. 1–5.
[22] Martin Fowler, Refactoring: Improving the Design of Existing Code, Addison-Wesley, 1999.
[23] Hui Liu, Zhiyi Ma, Weizhong Shao, Zhendong Niu, Schedule of bad smell detection and resolution: a new way to save effort, IEEE Trans. Softw. Eng. 38 (2012) 220–235.
[24] Sunghun Kim, E. James Whitehead Jr., How long did it take to fix bugs?, in: Proceedings of the International Workshop on Mining Software Repositories, USA, 2006, pp. 173–174.
[25] John Anvik, Lyndon Hiew, Gail C. Murphy, Coping with an open bug repository, in: Proceedings of the 2005 OOPSLA Workshop on Eclipse Technology eXchange, eclipse '05, 2005, pp. 35–39.
[26] Antonio Cerone, Siraj Ahmed Shaikh, Towards a metric for open source software quality, Electron. Commun. EASST 20 (2009) 1–11.
[27] Nick Rutar, Christian B. Almazan, Jeffrey S. Foster, A comparison of bug finding tools for Java, in: Proceedings of the 15th International Symposium on Software Reliability Engineering, 2004, pp. 245–256.
[28] R.K. Saha, S. Khurshid, D.E. Perry, An empirical study of long lived bugs, in: IEEE Conference on Software Maintenance, Reengineering and Reverse Engineering (CSMR-WCRE), February 2014, pp. 144–153.
[29] ISO/IEC 25010:2011, Systems and software engineering – Systems and software Quality Requirements and Evaluation (SQuaRE) – System and software quality models, 2011.
[30] Haiyun Xu, Jeroen Heijmans, Joost Visser, A practical model for rating software security, in: Proceedings of the 2013 IEEE Seventh International Conference on Software Security and Reliability Companion, USA, 2013, pp. 231–232.
[31] Nancy Ide, Keith Suderman, Bridging the gaps: interoperability for GrAF, GATE, and UIMA, in: Proceedings of the Third Linguistic Annotation Workshop, Stroudsburg, PA, USA, 2009, pp. 27–34.
[32] Peter Exner, Pierre Nugues, KOSHIK: a large-scale distributed computing framework for NLP, in: International Conference on Pattern Recognition Applications and Methods, 2014, pp. 463–470.
[33] Martin Soto, Marcus Ciolkowski, The QualOSS open source assessment model: measuring the performance of open source communities, in: The 3rd International Symposium on Empirical Software Engineering and Measurement, Washington, DC, USA, 2009, pp. 498–501.