Detecting Code Smells in Software Product Lines—An Exploratory Study

2015 12th International Conference on Information Technology - New Generations (ITNG). DOI 10.1109/ITNG.2015.76

Ramon Abílio
IT Department, Federal University of Lavras, Lavras, Minas Gerais, Brazil
[email protected]

Juliana Padilha, Eduardo Figueiredo
Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
{juliana.padilha, figueiredo}@dcc.ufmg.br

Heitor Costa
Department of Computer Science, Federal University of Lavras, Lavras, Minas Gerais, Brazil
[email protected]

Abstract—Code smells are symptoms that something is wrong in the source code. They have been catalogued and investigated in several programming techniques. These techniques can be used to develop Software Product Lines (SPL). However, feature-oriented programming (FOP) is a technique specifically aimed at the modularization of features in SPL. One of the most popular FOP languages is AHEAD and, to the best of our knowledge, we still lack systematic studies on the categorization and detection of code smells in AHEAD-based SPL. To fill this gap, this paper extends the definitions of three traditional code smells, namely God Method, God Class, and Shotgun Surgery, to take FOP abstractions into account. We then propose eight new FOP measures to quantify specific characteristics of compositional approaches like AHEAD. Finally, we combine the proposed and existing measures to define three detection strategies for identifying the investigated code smells. To evaluate the detection strategies, we performed an exploratory study involving 26 participants who relied on metrics to identify code smells in eight AHEAD systems. Our results show that the proposed detection strategies can be used as code smell predictors, since statistical tests indicate agreement between them and the study participants.

Keywords: Software Product Lines, Code Smells, Detection Strategies

I. INTRODUCTION

Software Product Line (SPL) is an approach to software design and development that aims to promote large-scale and systematic reuse of components [23]. This reuse is possible because the common features of a domain compose the kernel, while other features define points of variation [23]. Features are items that are visible and relevant to end users, or a quality or property of the software [15]. Feature-Oriented Programming (FOP) is a paradigm for software modularization in which features are the main abstractions [24]. Features can be realized in separate artifacts using compositional approaches, with FOP and Aspect-Oriented Programming (AOP), or may be identified in the source code using annotative approaches [3]. In compositional approaches, different techniques may be used to modularize features, such as AHEAD [4], AspectJ [16], and CaesarJ [19]. Some of these techniques have been compared with respect to their capability to implement features as basic blocks and to group these blocks and assign a name to them, so that they can be manipulated as cohesive modules [14]. In general, the studied techniques are based on a base code, which holds the main implementation, and deltas, which hold fragments of code that add functionality to the base code.

Despite the mechanisms for feature modularization, source code developed with FOP may present symptoms of poor quality, such as low cohesion [2] or code clones [28], and refactoring methods have been adapted from traditional ones to minimize code clones [29]. Such symptoms are known as code smells [10]: indications that there might be something wrong in the source code. Different code smells have been defined and catalogued [10] and, similarly, a set of code smells based on AOP constructs and abstractions has been proposed [21]. In fact, code smells can be found in systems written with any programming technique. However, we have not found systematic studies concerning code smell characterization and detection in FOP-based SPL.

The main goal of this work is to propose means to detect three code smells in FOP source code: God Method, God Class, and Shotgun Surgery. To detect code smells, developers may perform manual inspection or use heuristics based on software metrics [12, 22]. To achieve the study goal, we studied strategies available in the literature to detect these code smells [12, 22]. We then performed an exploratory study with eight SPLs developed with AHEAD and 26 participants, who detected code smells in a SPL by using source code metrics. The results show that these code smells may occur in SPL and that the proposed strategies may be useful in their detection. In summary, the main contributions are the definitions of three code smells adapted to address specific characteristics of FOP, eight measures for compositional approaches, and three metrics-based detection strategies.

This paper is organized as follows. Section II presents background on software composition with AHEAD, code smells, and software measures. Section III presents the code smells related to SPL and their detection strategies. Section IV details the study settings. Section V discusses the results. Section VI presents related work and threats to validity. Section VII presents final remarks.

II. BACKGROUND

A. Feature-Oriented Programming with AHEAD
In this work, FOP is addressed by using AHEAD [4] for feature implementation. AHEAD is a compositional approach based on stepwise refinement, in which programs are defined as constants and features are added by refinement functions. The AHEAD Tool Suite (ATS) was developed to support FOP with AHEAD; it is composed of tools for the realization and composition of features and uses the Jak programming language.
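For illustration, consider a minimal Jak-style sketch of a constant and a refinement (Jak extends Java, so the fragment is illustrative and is composed by the ATS composer rather than compiled directly with javac; all identifiers are hypothetical):

// Feature "Base" -- a constant: a plain class introduced by the base feature.
class Account {
    int balance = 0;
    void deposit(int amount) { balance += amount; }
}

// Feature "Logging" -- a refinement of the constant above; when the feature
// is selected, the composer merges it into Account.
refines class Account {
    void deposit(int amount) {
        Super().deposit(amount);                  // invoke the refined method
        System.out.println("deposit: " + amount); // feature-added behavior
    }
}

In the terminology used in the rest of this paper, Account is a constant, the refines declaration is a constant refinement, and the refined deposit is a method refinement.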


B. Code Smells
Code smells are symptoms that may indicate something wrong in the design [10]. They may be present in classes and methods. For example, classes may suffer from God Class and Shotgun Surgery, and methods may suffer from God Method. God Classes are classes with many instance variables that tend to centralize the intelligence of the system or to perform too much work on their own [10, 12, 25]. Shotgun Surgery may occur when one kind of change requires many little changes to different classes; when changes are spread over the source code, they are hard to find, and it is easy to miss an important one [10, 12]. God Methods are methods with many branches that tend to realize many responsibilities [22, 30].

C. Software Metrics and Detection Strategies
Software metrics can be used to quantify software quality attributes, such as coupling and cohesion [5]. Individual metrics are often too fine grained to comprehensively quantify deviations from good design principles [18], so researchers [12, 18] proposed a mechanism (detection strategy) for formulating metrics-based rules that capture code smells. A detection strategy is a composed logical condition, based on metrics, that detects design fragments with a specific code smell. For instance, the detection strategy below was proposed by Marinescu [18] to detect Shotgun Surgery [10]:

((CM, TopValues(20%)) AND (CM, HigherThan(10))) AND (CC, HigherThan(5))

This strategy is based on the Changing Method (CM) and Changing Classes (CC) metrics. CM counts the number of distinct methods that access an attribute or call a method of the given class [18]. CC counts the number of classes that access an attribute or call a method of the given class [18]. TopValues and HigherThan are filtering mechanisms parameterized with a value (threshold): TopValues selects the percentage of elements with the highest values for a metric, and HigherThan selects all elements with values above the given threshold. For instance, the Shotgun Surgery strategy above flags a class when it is among the top 20% of classes by CM, its CM is higher than 10, and its CC is higher than 5. Current design heuristic rules focus on single OOP-based systems; they do not target SPLs and FOP.
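To make the mechanics concrete, the sketch below (our illustration, not tooling from the paper; class names and metric values are hypothetical) composes such a strategy from two filtering functions in Java:

import java.util.*;

// A minimal sketch of Marinescu-style detection strategies: each filter maps
// a set of measured entities to the subset satisfying one metric condition.
public class DetectionStrategy {
    // entity name -> metric values (e.g., "CM", "CC")
    static Set<String> higherThan(Map<String, Map<String, Double>> m, String metric, double t) {
        Set<String> out = new HashSet<>();
        for (Map.Entry<String, Map<String, Double>> e : m.entrySet())
            if (e.getValue().getOrDefault(metric, 0.0) > t) out.add(e.getKey());
        return out;
    }

    // TopValues(p): the ceil(p * n) entities with the highest values for a metric
    static Set<String> topValues(Map<String, Map<String, Double>> m, String metric, double p) {
        List<String> sorted = new ArrayList<>(m.keySet());
        sorted.sort((a, b) -> Double.compare(
                m.get(b).getOrDefault(metric, 0.0),
                m.get(a).getOrDefault(metric, 0.0)));   // descending by metric
        int keep = (int) Math.ceil(p * sorted.size());
        return new HashSet<>(sorted.subList(0, keep));
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> metrics = Map.of(
                "ClassA", Map.of("CM", 14.0, "CC", 7.0),
                "ClassB", Map.of("CM", 9.0, "CC", 2.0),
                "ClassC", Map.of("CM", 3.0, "CC", 1.0));

        // ((CM, TopValues(20%)) AND (CM, HigherThan(10))) AND (CC, HigherThan(5))
        Set<String> smelly = topValues(metrics, "CM", 0.20);
        smelly.retainAll(higherThan(metrics, "CM", 10));  // AND = set intersection
        smelly.retainAll(higherThan(metrics, "CC", 5));
        System.out.println("Shotgun Surgery candidates: " + smelly);  // [ClassA]
    }
}

Since each filter returns the set of entities satisfying one condition, AND composition reduces to set intersection.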

III. METRIC-BASED CODE SMELL DETECTION IN SPL

This section presents three traditional code smells whose definitions we adapted to address characteristics of SPL, together with their detection strategies. The proposed detection strategies were adapted from the literature [12, 22]. To define the thresholds, we used statistical values calculated from a set of projects: LOW (average minus standard deviation), AVERAGE, and HIGH (average plus standard deviation); these values were used because previous studies use them [12]. The LOW, AVERAGE, and HIGH labels are used in the strategies because the concrete values may change depending on the context.
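As a small worked example (ours; the MLoC sample below is hypothetical), the three threshold labels can be derived from a metric sample as follows:

// Derive LOW/AVERAGE/HIGH thresholds for one metric from a project sample,
// following the average +/- standard deviation convention described above.
public class Thresholds {
    public static void main(String[] args) {
        double[] mloc = {12, 8, 40, 22, 5, 31, 17, 9};   // hypothetical MLoC values

        double sum = 0;
        for (double v : mloc) sum += v;
        double avg = sum / mloc.length;

        double sq = 0;
        for (double v : mloc) sq += (v - avg) * (v - avg);
        double std = Math.sqrt(sq / mloc.length);        // population std deviation

        System.out.printf("LOW=%.2f AVERAGE=%.2f HIGH=%.2f%n",
                avg - std, avg, avg + std);
    }
}

Any element whose measure exceeds HIGH is then more than one standard deviation above the project average.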

A. Feature-Oriented Software Metrics
We propose measures to address specific characteristics of compositional approaches, such as constants and refinements. Table I presents them with names, acronyms, and brief descriptions. For example, Total Number of Constants (TNCt) counts the number of constants (classes or interfaces) in a SPL. We propose these measures because previous studies [1, 20] did not present measures specific to compositional approaches. Number of Features (NOF) [13] counts the number of features in a SPL; since we are interested in features with code artifacts, we adapted the original definition.

TABLE I. FEATURE-ORIENTED SOFTWARE METRICS

Acronym  Name                                 Description
NOF      Number of Features                   Number of features that have code artifacts
NCR      Number of Constant Refinements       Number of refinements a constant has
NMR      Number of Method Refinements         Number of refinements a method has
TNCt     Total Number of Constants            Total number of constants (classes or interfaces declared as constants) in the SPL
TNR      Total Number of Refinements          Total number of refinements (classes or interfaces declared as refinements) in the SPL
TNMR     Total Number of Method Refinements   Total number of method refinements in the SPL
TNRC     Total Number of Refined Constants    Total number of refined constants in the SPL
TNRM     Total Number of Refined Methods      Total number of refined methods in the SPL

B. SPL God Method
A SPL God Method may start out as a "normal" method: it has a "normal" size and realizes only one concern. However, its responsibilities can grow with the number of features that override or refine the method. The detection strategy for this smell is based on methods that concentrate responsibilities, represented by method overrides and refinements, and on long and complex methods. Four measures were selected: Number of Operation Overrides (NOOr), Number of Method Refinements (NMR), Method's Lines of Code (MLoC), and McCabe's Cyclomatic Complexity (Cyclo). A method affected by SPL God Method may have many overrides and/or be long and complex. The detection strategy is:

((NOOr+NMR) > HIGH) OR ((MLoC > AVG) AND ((Cyclo/MLoC) > HIGH))
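In code form, the rule for a single operation can be evaluated as in the minimal Java sketch below (ours; the measure values and thresholds are hypothetical):

// Evaluate the SPL God Method strategy for one operation, given its measures
// and the context-dependent thresholds (the HIGH/AVG labels above).
public class SplGodMethod {
    static boolean isGodMethod(double noor, double nmr, double mloc, double cyclo,
                               double highRefs, double avgMloc, double highDensity) {
        boolean manyRefinements = (noor + nmr) > highRefs;
        boolean longAndComplex  = mloc > avgMloc && (cyclo / mloc) > highDensity;
        return manyRefinements || longAndComplex;
    }

    public static void main(String[] args) {
        // hypothetical operation: 1 override, 2 refinements, 45 LoC, complexity 12
        System.out.println(isGodMethod(1, 2, 45, 12,
                /*HIGH(NOOr+NMR)*/ 4, /*AVG(MLoC)*/ 20, /*HIGH(Cyclo/MLoC)*/ 0.24));
        // prints true: the operation is long (45 > 20) and dense (12/45 = 0.27 > 0.24)
    }
}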

C. SPL God Class
A SPL God Class may be a God Class that also has many refinements; that is, it may be large and complex and tend to concentrate responsibilities. The detection strategy is based on classes with a high number of refinements, and on large and complex classes coupled to other classes. Four measures were selected: Number of Constant Refinements (NCR), Lines of Code (LoC), Weighted Methods per Class (WMC), and Coupling between Object Classes (CBO). First, NCR is used to detect classes with a high number of refinements; then, LoC, WMC, and CBO are used to filter large and complex classes coupled to other classes. The detection strategy is:

(NCR > HIGH) OR ((LoC > HIGH) AND (CBO > LOW) AND ((WMC/LoC) > HIGH))
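A corresponding sketch for one component (ours; values and thresholds hypothetical):

// Evaluate the SPL God Class strategy for a single component.
public class SplGodClass {
    static boolean isGodClass(double ncr, double loc, double wmc, double cbo,
                              double highNcr, double highLoc, double lowCbo,
                              double highDensity) {
        boolean manyRefinements = ncr > highNcr;
        boolean largeComplexCoupled =
                loc > highLoc && cbo > lowCbo && (wmc / loc) > highDensity;
        return manyRefinements || largeComplexCoupled;
    }

    public static void main(String[] args) {
        // hypothetical component: 2 refinements, 320 LoC, WMC 90, CBO 8
        System.out.println(isGodClass(2, 320, 90, 8,
                /*HIGH(NCR)*/ 5, /*HIGH(LoC)*/ 250, /*LOW(CBO)*/ 3,
                /*HIGH(WMC/LoC)*/ 0.24));
        // prints true: large (320 > 250), coupled (8 > 3), dense (90/320 = 0.28 > 0.24)
    }
}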

D. SPL Shotgun Surgery
SPL Shotgun Surgery may occur in classes that have many refinements, attributes, and operations and are coupled to other classes. In other words, classes that share many attributes and operations with several refinements and have a high number of relationships with other classes may suffer from SPL Shotgun Surgery. The detection strategy is based on classes with a high number of refinements, and on classes with a high number of attributes and operations coupled to other classes. Four measures were selected: Number of Constant Refinements (NCR), Number of Attributes (NOA), Coupling between Object Classes (CBO), and Number of Operations (NOO). First, NCR is used to detect classes with a high number of refinements; then, NOA, NOO, and CBO are used to filter classes that have many attributes and operations and are coupled to other classes. The detection strategy is:

(NCR > HIGH) AND (NOA > HIGH) AND (NOO > HIGH) AND (CBO > LOW)
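Unlike the previous two strategies, this one is a pure conjunction, as the sketch below shows (ours; values and thresholds hypothetical):

// Evaluate the SPL Shotgun Surgery strategy for a single component:
// all four conditions must hold.
public class SplShotgunSurgery {
    static boolean isShotgunSurgery(double ncr, double noa, double noo, double cbo,
                                    double highNcr, double highNoa, double highNoo,
                                    double lowCbo) {
        return ncr > highNcr && noa > highNoa && noo > highNoo && cbo > lowCbo;
    }

    public static void main(String[] args) {
        // hypothetical component: 6 refinements, 12 attributes, 18 operations, CBO 7
        System.out.println(isShotgunSurgery(6, 12, 18, 7,
                /*HIGH(NCR)*/ 5, /*HIGH(NOA)*/ 10, /*HIGH(NOO)*/ 15, /*LOW(CBO)*/ 3));
        // prints true: every measure exceeds its threshold
    }
}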

IV. STUDY SETTINGS

A. Analyzed SPLs, Participants, and Tasks
We analyzed eight SPLs available in the FeatureIDE Eclipse plug-in and in AHEAD. Table II details the SPLs with their names, domains, NOF, number of components (TNC), and total lines of code (TLoC). These SPLs are from different domains, and their NOF ranges from 11 to 106. To realize those features, TNC ranges from 21 to 963, and TLoC ranges from 98 to 16,719. Note that TNC and TLoC do not grow proportionally to NOF.

TABLE II. SOFTWARE PRODUCT LINES

SPL              Domain                              NOF   TNC    TLoC
AHEAD-Bali       Grammar tool                         15    61   3,988
AHEAD-Guidsl     Graphical configuration tool         25   233   8,738
AHEAD-Java       Programming tool                    106   963  16,719
DesktopSearcher  Local searcher tool                  16    46   1,858
Devolution       E-mail and Instant Message client    11    68   3,913
EPL              Expression evaluation                12    21      98
GPL              Graph and algorithm library          36    75   1,824
TankWar          Game                                 30    88   4,669

This study involved 26 participants taking an advanced Software Engineering course at two different educational institutions. In the first one, one participant is an undergraduate student, nine are master's students, and one is a professional. In the second one, six participants are undergraduates, five are master's students, and four are PhD students. We used a background questionnaire to assess the previous knowledge of each participant. The participants were identified with a code (P) and seem to be homogeneous in regard to their courses: 96% of them took some course on OOP, only 8% did not take a course on Java Programming (P05 and P21), and 23% did not take courses on Software Modeling (P03, P10, P13, P20, P21, and P23). In regard to work experience, 15% never worked (P08, P10, P23, and P24); 4% worked for almost one year (P07); 35% had two years of experience working in companies (P01, P06, P09, P13, P16, P17, P20, P21, and P25); and 46% worked for more than three years (P02, P03, P04, P05, P11, P12, P14, P15, P18, P19, P22, and P26). Therefore, 85% of the participants have some experience working in companies.

The participants classified their knowledge in Java Programming (JP), Software Measurement (SM), Feature-Oriented Programming (FOP), and Software Product Lines (SPL) on a scale from 1 (No Knowledge) to 5 (Expert), as shown in Table III. The participants seem to be homogeneous in regard to their knowledge, since they are concentrated in levels 1 and 2 (basic), except for JP, in which they classified their knowledge in levels 3, 4, and 5 (intermediate to advanced). In spite of the basic knowledge in SM, FOP, and SPL, 85% of the participants have experience working in companies, spread across all levels. This homogeneity led the participants to behave similarly in their inspections.

TABLE III. PARTICIPANTS' KNOWLEDGE
(cells list the participants who self-assessed at each level, from 1 = No Knowledge to 5 = Expert)

JP:   2: P05 | 3: P01, P02, P03, P07, P11, P17, P18, P19, P23, P24, P25, P26 | 4: P06, P09, P10, P13, P15, P16, P20, P21 | 5: P04, P08, P12, P14, P22
SM:   1: P01, P07, P11, P17, P26 | 2: P04, P06, P10, P12, P13, P16, P19, P21, P22, P25 | 3: P02, P05, P08, P18, P23 | 4: P03, P09, P14, P15, P20 | 5: P24
FOP:  1: P06, P10, P12, P13, P14, P15, P17, P19, P21, P23, P24, P25 | 2: P01, P02, P03, P07, P08, P18, P20 | 3: P04, P05, P09, P11, P16, P22 | 5: P26
SPL:  1: P02, P05, P09, P11 | 2: P01, P03, P06, P08, P10, P12, P14, P16, P17, P18, P24 | 3: P19, P20, P21, P22, P23, P25 | 4: P04, P13, P15 | 5: P07, P26

We divided the study into two 90-minute sessions. The first was a training session to let the participants familiarize themselves with the evaluated measures, the target code smells, and AHEAD. In the second, we divided the participants into three groups according to their institutions. Each group worked with one code smell and did not have access to the source code. Each participant received a document with: i) the TankWar SPL's description, the measures, and the target code smell; ii) the thresholds; and iii) 60 operations/components and their respective measures. The participants started by indicating which thresholds they used, then inspected the components (checking whether each component has the code smell), and indicated which measures they considered.

B. Statistical Analysis
We used the inter-rater Cohen's kappa measure to evaluate the agreement among the participants (complete agreement = 1, no agreement = 0) [6], and the Binomial and Sign hypothesis tests to evaluate whether the metrics-based detection strategies can be used as code smell predictors [26]. To verify the agreement among the participants, the analyzed components and operations were divided into two categories: i) indicated: components/operations with the code smell (value = 1); and ii) not indicated: components/operations without the code smell (value = 0). We used a statistical significance test (p-value) to evaluate whether the agreement is reasonable. The null (H0) and alternative (H1) hypotheses are:
H0: There is no agreement among participants (Kappa = 0)
H1: There is agreement among participants (Kappa > 0)

We used the Binomial and Sign tests on the agreement between each detection strategy and the majority of participants. The agreement between them is represented with value = 1 when they agree and value = 0 when they disagree. The H0 and H1 hypotheses are:
H0: The agreement between strategy and participants occurred by chance
H1: The agreement between strategy and participants did not occur by chance

In both tests, if p-value > 0.05, we fail to reject H0; otherwise, we reject H0 in favor of H1.
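For illustration, a small Java sketch (ours; the indications below are hypothetical) computing two-rater Cohen's kappa over binary indications; the study aggregates over more raters, but the ingredients are the same:

// Cohen's kappa for two raters over binary (0/1) indications:
// kappa = (observed agreement - chance agreement) / (1 - chance agreement)
public class CohensKappa {
    static double kappa(int[] r1, int[] r2) {
        int n = r1.length, agree = 0, ones1 = 0, ones2 = 0;
        for (int i = 0; i < n; i++) {
            if (r1[i] == r2[i]) agree++;
            if (r1[i] == 1) ones1++;
            if (r2[i] == 1) ones2++;
        }
        double po = (double) agree / n;                // observed agreement
        double p1 = (double) ones1 / n, p2 = (double) ones2 / n;
        double pe = p1 * p2 + (1 - p1) * (1 - p2);     // agreement expected by chance
        return (po - pe) / (1 - pe);
    }

    public static void main(String[] args) {
        int[] a = {1, 1, 0, 0, 1, 0, 1, 0};   // hypothetical indications, rater A
        int[] b = {1, 1, 0, 1, 1, 0, 1, 0};   // hypothetical indications, rater B
        System.out.printf("kappa = %.3f%n", kappa(a, b));   // 0.750
    }
}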

V. RESULTS AND DISCUSSIONS

This section presents the collected data and the data analyses. The background information was collected via questionnaire, and the data were obtained from the documents inspected by the participants. We used statistical tests to evaluate the participants' agreement and the detection strategies' results.

A. Selected Thresholds
The participants selected threshold values for each code smell (Table IV). For example, eight participants selected AVERAGE for MLoC in their SPL God Method inspection. In the SPL God Method detection, P08 and P03 considered that a long method has a number of lines higher than AVERAGE and HIGH, respectively. The majority of participants chose AVERAGE for the measures (MLoC - 73%, Cyclo/MLoC - 64%, NMR+NOOr - 82%). Only 27% and 18% of the participants used HIGH for MLoC and NMR+NOOr, respectively. In regard to Cyclo/MLoC, the participants disagreed: 64% used AVERAGE and 36% used HIGH, differently from the other thresholds, for which the difference between AVERAGE and HIGH is higher. In regard to the SPL God Class detection, P02, P04, and P01 considered that a large class has a number of lines higher than LOW, AVERAGE, and HIGH, respectively. Most participants selected AVERAGE for LoC, NOO, CBO, and NCR; that is, if a component has more LoC than the average of the components, it is a long component. Observing NOA and WMC/LoC, the participants were divided between AVERAGE and HIGH; there is no consensus among the participants on what is a component with many attributes or a highly complex component. In the SPL Shotgun Surgery detection, P06 considered that a component with many operations or many refinements has more operations than the AVERAGE. Most participants selected AVERAGE for NOO, NOA, and NCR; that is, if a component has more operations, attributes, and refinements than the average of the components, it has many operations and attributes and may share them with many refinements. Observing CBO, the participants were divided between LOW and AVERAGE; there is no consensus among them on what is a highly coupled component.

TABLE IV. THRESHOLDS SELECTED BY THE PARTICIPANTS

               SPL God Method            |           SPL God Class            | SPL Shotgun Surgery
          MLoC  Cyclo/MLoC  NMR+NOOr | LoC  NOO  NOA  CBO  NCR  WMC/LoC | NOO  NOA  CBO  NCR
Low          0           0         0 |   2    1    1    2    0        1 |   0    0    4    0
Average      8           7         9 |   4    5    3    4    4        3 |   6    5    3    6
High         3           4         2 |   1    1    3    1    3        3 |   2    3    1    2

B. Results and Analyses of SPL God Method
To detect SPL God Method, 11 participants (P01-P11) inspected 60 operations. We noted that 62% of the operations were indicated as having the code smell (four operations were indicated by five or fewer participants and 33 operations by seven or more participants). Five participants failed in indicating some of these operations because they compared against different thresholds; for example, P01 indicated the use of AVERAGE for Cyclo but treated the Cyclo of operation 5 as LOW. The majority of participants indicated 89% of the 37 operations, but there was not full agreement among them, since the number of indications varied from 7 to 11: i) seven participants agreed on one operation; ii) eight participants agreed on five operations; iii) nine participants agreed on three operations; iv) 10 participants agreed on eight operations; and v) all participants agreed on 16 operations. The disagreements occurred due to the selected thresholds, different operators used in the comparisons, failures in detection, and changes of thresholds. For example, one disagreement occurred because three participants used HIGH for MLoC while the other eight used AVERAGE. In another operation, four participants used HIGH for Cyclo, but two of them considered only values higher than HIGH while the other two considered values higher than or equal to HIGH. P11 failed in detecting four operations whose MLoC and Cyclo are higher than the thresholds but which were not indicated.

The SPL God Method detection strategy indicated 52% of the operations as having the code smell and 48% as not having it. Considering the strategy's indications and the agreement among the participants, they agreed on 97% of the indications and disagreed on 3%. The two diverging operations were not indicated by the strategy because Cyclo was compared only against values strictly higher than HIGH; for example, one operation has 0.24 for Cyclo (equal to HIGH) and the other has 0.23 (less than HIGH).

C. Results and Analyses of SPL God Class
To detect SPL God Class, seven participants (P12-P18) inspected 60 components. The participants did not indicate 33% of the components, and 67% were indicated as having the code smell (23 components were indicated by four or fewer participants and 17 components by five or more). The participants failed in indicating these 23 components because, in general, they considered only one measure in their analysis. The majority of participants indicated 17 of the 40 components: i) five indications for five components; ii) six indications for five components; and iii) seven indications for seven components. This occurred due to the selected thresholds and failures in detection. For example, P15 did not indicate one component because HIGH was used for CBO while the other five participants used LOW or AVERAGE. Four more participants failed in their analyses because they considered only one measure, and these failures caused 51 wrong indications: i) P12: 1 failure; ii) P13: 36 failures; iii) P16: 13 failures; and iv) P17: 1 failure. P14 indicated one component and P16 indicated three components wrongly, because these components have measure values lower than the thresholds they had indicated.

The detection strategy indicated 15 components with the code smell and 45 without it. Considering the strategy's indications and the agreement among the participants, they agreed on 97% of the indications and disagreed on 3%. Two components were not indicated because the strategy compares WMC/LoC only against values strictly higher than HIGH; for example, one component has WMC/LoC = 0.23 and another has WMC/LoC = 0.12 (less than HIGH = 0.24). In fact, the latter received five indications, but two of them were wrongly made by P13 and P16; thus, the agreement would be 59 out of 60 (98%).

D. Results and Analyses of SPL Shotgun Surgery
To detect SPL Shotgun Surgery, eight participants (P19-P26) inspected 60 components. The participants indicated 30 components with the code smell (20 components were indicated by four or fewer participants and 10 components by five or more). The participants failed in indicating 20 components because they considered only one measure in their analysis or did not consider NCR; for example, P19, P20, P23, and P25 considered only NOA while analyzing one component. Besides, the majority of participants indicated 33% of the 30 components: i) five components received five indications; ii) one component received six indications; iii) one component received seven indications; and iv) three components received eight indications. This occurred because the participants selected different thresholds and/or failed in their analysis. For example, P24 did not indicate one component because HIGH was used for CBO, and P22 did not indicate another component even though its measure values are higher than the thresholds. Six more participants failed in their analyses because they considered only one or two measures, and these failures caused 53 wrong indications: i) P19: 17 failures; ii) P20: 3 failures; iii) P23: 19 failures; iv) P24: 1 failure; v) P25: 12 failures; and vi) P26: 1 failure. P19, P20, P23, P25, and P26 did not indicate 18 components because they did not consider NCR, and P19 and P20 did not indicate one component because they did not consider NOA.

The detection strategy indicated 4 components with the code smell and 56 without it. Considering the strategy's indications and the agreement among the participants, they agreed on 54 indications and disagreed on six. These six components were not indicated by the strategy because four have NCR = 0, one has no refinements and NOO lower than HIGH, and another has NOA lower than HIGH. The participants and the strategy disagreed only on component 59, because the participants wrongly indicated the other five components.

E. Statistical Tests
We performed statistical tests to verify whether the proposed metrics-based strategies support the detection of the SPL code smells. In the first step, we verified the agreement among the participants in regard to the operations/components detected with/without the code smell. We then took the participants' results as an oracle and used hypothesis tests to calculate the probability that the detection strategies' results occurred by chance. The Cohen's Kappa measures for SPL God Method, SPL God Class, and SPL Shotgun Surgery are presented in Table V. For example, the Cohen's Kappa measure for SPL God Method is 0.793, which indicates high agreement among the participants (the closer the measure is to 1, the higher the agreement). The p-value is < 0.001; using 0.05 as the significance level, we can reject H0 (p-value < 0.05) and accept H1 with 95% confidence (upper: 0.827, lower: 0.758). We thus have reasonable agreement among the participants (moderate to high). The agreement in SPL God Class and SPL Shotgun Surgery would have been better with fewer failures in the indications. Statistical evidence shows that the agreement among the participants did not occur by chance (p-value < 0.001); we then used the indications as an oracle and performed the Binomial and Sign tests to verify whether the detection strategies' results occurred by chance.

TABLE V. COHEN'S KAPPA MEASURES

                           SPL God Method        SPL God Class          SPL Shotgun Surgery
Cohen's Kappa              0.793                 0.427                  0.427
95% Confidence Interval    0.758 - 0.827         0.396 - 0.506          0.379 - 0.474
p-value                    < 0.001               < 0.001                < 0.001
Interpretation             High Agreement        Moderate Agreement     Moderate Agreement

We used the Binomial and Sign tests on the agreement between each detection strategy and the majority of participants. The detection strategy and the participants agreed on 58 of the 60 inspections for SPL God Method and SPL God Class, and on 54 of the 60 inspections for SPL Shotgun Surgery. The statistical test with a 0.5 "success" probability for each inspection shows that we can reject H0 and accept H1 (p-value < 0.0001). The agreement between the detection strategies and the participants did not occur by chance, and the statistical evidence shows that the strategies can be used to detect the analyzed SPL code smells in agreement with the participants.
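As a sanity check on the arithmetic (our sketch, not the authors' tooling), the one-sided sign-test p-value for 58 agreements out of 60 can be computed directly:

import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

// One-sided binomial (sign) test: probability of observing k or more
// agreements in n inspections if each agreement happened by chance (p = 0.5).
public class SignTest {
    static BigInteger choose(int n, int k) {
        BigInteger c = BigInteger.ONE;
        for (int i = 0; i < k; i++)
            c = c.multiply(BigInteger.valueOf(n - i))
                 .divide(BigInteger.valueOf(i + 1));   // exact at each step
        return c;
    }

    public static void main(String[] args) {
        int n = 60, k = 58;                      // e.g., SPL God Method agreement
        BigInteger tail = BigInteger.ZERO;
        for (int i = k; i <= n; i++) tail = tail.add(choose(n, i));
        BigDecimal p = new BigDecimal(tail)
                .divide(new BigDecimal(BigInteger.TWO.pow(n)), MathContext.DECIMAL64);
        System.out.println("p-value = " + p);    // about 1.6e-15, far below 0.05
    }
}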

VI. RELATED WORK AND THREATS TO VALIDITY

Related Work. Many existing studies address code smells in OO software [5, 12, 18]. Several studies use metrics to evaluate diverse attributes of software, such as modularity [27], instability [9, 11], and error-proneness [7, 8], among other properties. However, these studies are limited to OOP and AOP [17]. In contrast, our work focuses on code smell detection and on the particularities of FOP-based SPL. Recent research has explored the use of FOP in the development of SPL, focusing on the presentation of techniques [3, 4, 14], feature cohesion [2], code clones [28], and refactoring methods [29]. There are no systematic studies on the categorization and detection of code smells in FOP-based SPL. To fill this gap, this paper extends the definitions of three code smells to take FOP abstractions into account. We focused on AHEAD because it is a popular FOP language. Our work extends previous findings [22] by observing which metrics could also serve as reliable indicators of code smells in SPL. We performed an exploratory study with eight AHEAD SPLs, proposed new FOP measures to quantify characteristics of compositional approaches such as AHEAD, and, finally, combined the proposed and existing measures to define three detection strategies [12, 18]. The study involved 26 participants who performed manual detection using source-code measures, and two statistical tests to verify whether the results of our detection strategies agree with the participants.

Threats to Validity. This work is a first step in detecting code smells in feature-oriented SPL and has limitations, such as the number of participants in each group, the participants' knowledge of FOP, the use of a single compositional approach (AHEAD), and the number of SPLs used to calculate the thresholds.


VII. CONCLUSION


In this study, eight measures were proposed to address characteristics of compositional approaches, such as constants and refinements; three metrics-based detection strategies were proposed; and 26 participants performed manual detections. In the manual inspections, the majority of participants selected thresholds with the average value. Informally, two participants commented that they selected that value because they were in doubt about how to select a threshold and took the average as a "safe" option. The homogeneity of the participants in courses, work experience, and knowledge led them to behave similarly concerning the selected thresholds and their analyses. We calculated the agreement among them using Cohen's Kappa measure, which showed moderate to high agreement in regard to their indications. We noted that it could have been better if the training session had included exercises on manual detection in addition to the exposed theory. Strategies and participants agreed on 97% of the cases in detecting SPL God Method and SPL God Class, and on 90% in detecting SPL Shotgun Surgery. Some participants failed in their analysis because they ignored the code smell definition and used only one measure to evaluate its presence; others indicated one threshold but compared the measure values against another, so the component was not indicated. The agreement between the majority of participants and the detection strategies was statistically tested; the results indicate that our detection strategies can be used as code smell predictors in feature-oriented SPL. This claim is based on the agreement between the strategies and the majority of the participants, and on the number of failures in manual inspection, which can be avoided with automatic inspection using the proposed strategies. As future work, we suggest developing a tool to measure source code based on different compositional approaches; performing further studies with more participants and SPLs to investigate other code smells and elaborate new detection strategies; and performing a study on code smells in an evolving feature-oriented SPL. We also intend to study what specialists consider to be a large and complex method, and a long class.


ACKNOWLEDGMENT
We thank FAPEMIG, CNPq, and CAPES for their financial support.

REFERENCES

[1] Abilio, R. et al. A Systematic Review of Contemporary Metrics for Software Maintainability. SBCARS, pp. 130-139 (2012)
[2] Apel, S.; Beyer, D. Feature Cohesion in Software Product Lines: An Exploratory Study. ICSE, pp. 421-430 (2011)
[3] Apel, S.; Kastner, C. An Overview of Feature-Oriented Software Development. JOT, v.8, n.5, pp. 49-84 (2009)
[4] Batory, D. et al. Scaling Step-Wise Refinement. ICSE, pp. 187-197 (2003)
[5] Chidamber, S. R.; Kemerer, C. F. A Metrics Suite for Object Oriented Design. IEEE Transactions on Software Engineering, v.20, n.6, pp. 476-493 (1994)
[6] Cohen, J. Weighted Kappa: Nominal Scale Agreement Provision for Scaled Disagreement or Partial Credit. Psychological Bulletin, v.70, n.4, pp. 213-220 (1968)
[7] Eaddy, M. et al. Do Crosscutting Concerns Cause Defects? IEEE Transactions on Software Engineering, pp. 497-515 (2008)
[8] Ferrari, F. et al. An Exploratory Study of Fault-Proneness in Evolving Aspect-Oriented Programs. ICSE, pp. 65-74 (2010)
[9] Figueiredo, E. et al. Evolving Software Product Lines with Aspects: An Empirical Study on Design Stability. ICSE, pp. 261-270 (2008)
[10] Fowler, M. et al. Refactoring: Improving the Design of Existing Code. Addison-Wesley (1999)
[11] Greenwood, P. et al. On the Impact of Aspectual Decompositions on Design Stability: An Empirical Study. ECOOP (2007)
[12] Lanza, M.; Marinescu, R. Object-Oriented Metrics in Practice: Using Software Metrics to Characterize, Evaluate, and Improve the Design of Object-Oriented Systems. Springer (2006)
[13] Lopez-Herrejon, R.; Apel, S. Measuring and Characterizing Crosscutting in Aspect-based Programs: Basic Metrics and Case Studies. FASE, pp. 423-437 (2007)
[14] Lopez-Herrejon, R. et al. Evaluating Support for Features in Advanced Modularization Technologies. ECOOP, pp. 169-194 (2005)
[15] Kang, K. C. et al. Feature-Oriented Domain Analysis (FODA) Feasibility Study. Software Engineering Institute, Tech. Rep. CMU/SEI-90-TR-21 (1990)
[16] Kiczales, G. et al. An Overview of AspectJ. ECOOP, pp. 327-353 (2001)
[17] Kiczales, G. et al. Aspect-Oriented Programming. ECOOP, pp. 220-242 (1997)
[18] Marinescu, R. Detection Strategies: Metrics-Based Rules for Detecting Design Flaws. ICSM, pp. 350-359 (2004)
[19] Mezini, M.; Ostermann, K. Conquering Aspects with Caesar. AOSD, pp. 90-99 (2003)
[20] Montagud, S. et al. A Systematic Review of Quality Attributes and Measures for Software Product Lines. Software Quality Journal, v.20, n.3-4, pp. 425-486 (2012)
[21] Monteiro, M.; Fernandes, J. Towards a Catalog of Aspect-Oriented Refactorings. AOSD, pp. 111-122 (2005)
[22] Padilha, J. et al. Detecting God Methods with Concern Metrics: An Exploratory Study. Latin-American Workshop on Aspect-Oriented Software Development (2013)
[23] Pohl, K. et al. Software Product Line Engineering: Foundations, Principles and Techniques. Springer (2005)
[24] Prehofer, C. Feature-Oriented Programming: A Fresh Look at Objects. ECOOP, pp. 419-443 (1997)
[25] Riel, A. Object-Oriented Design Heuristics. Addison-Wesley Professional (1996)
[26] Salkind, N. (Ed.) Encyclopedia of Measurement and Statistics. SAGE Publications (2007)
[27] Sant'Anna, C. et al. On the Reuse and Maintenance of Aspect-Oriented Software: An Assessment Framework. Brazilian Symposium on Software Engineering, pp. 19-34 (2003)
[28] Schulze, S. et al. Code Clones in Feature-Oriented Software Product Lines. GPCE, pp. 103-112 (2010)
[29] Schulze, S. et al. Variant-Preserving Refactoring in Feature-Oriented Software Product Lines. VaMoS, pp. 73-81 (2012)
[30] Sjoberg, D. I. K. et al. Quantifying the Effect of Code Smells on Maintenance Effort. IEEE Transactions on Software Engineering, v.39, n.8, pp. 1144-1156 (2013)
