Experimental approaches in software engineering and about software engineering September 2013
Carlo Ghezzi Politecnico di Milano Deep-SE Group @ DEIB
1
Warning: I am NOT an epistemologist. Forgive imprecision.
2
Experimental approaches IN software engineering
3
Software engineering definition
• Software engineering (SE) is the application of a systematic, disciplined, quantifiable approach to the design, development, operation, and maintenance of software, and the study of these approaches; that is, the application of engineering to software.
• This is collected in a body of knowledge that is the subject of courses taught in Informatics curricula
• It is a research area (with several subareas) with a large community
4
SE vs other engineering fields
• The essential distinction between software and other engineered artifacts has always been the absence of fabrication cost. In conventional engineering of physical artifacts, the cost of materials and fabrication has dominated the cost of design and placed a check on the complexity of artifacts that can be designed. When one bottleneck is removed, others appear, and software engineering has therefore faced the essential challenges of complexity and the cost of design to an extent that conventional engineering has not. Software engineering has focused on issues in managing complexity, from process to modular design to cost-effective verification, because that is the primary leverage point when the costs of materials and fabrication are nil.
Young, M., Faulk, S. (2010). "Sharing What We Know About Software Engineering". FSE/SDP workshop.
5
Main SE research repositories
• 2 super-top archival journals and conferences (in SE, conferences have the same prestige and selectivity as journals!)
• IEEE Transactions on Software Engineering (TSE)
• ACM Transactions on Software Engineering and Methodology (TOSEM)
• International Conference on Software Engineering (ICSE)
• ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE) / European Software Engineering Conference (ESEC)
6
Experimental software engineering
• Sub-domain of SE focusing on experiments on software systems (software products, processes, and resources).
• Interested in devising experiments on software, in collecting data from these experiments, and in deriving laws and theories from these data.
• Empirical software engineering is a related concept, sometimes used synonymously. Empirical software engineering is a field of research that emphasizes the use of empirical studies of all kinds to accumulate knowledge. Methods used include experiments, a variety of case studies, surveys, and statistical analyses.
7
Experimental/empirical
• An experiment is an empirical method that arbitrates between competing models or hypotheses, or supports a theory or hypothesis.
• An empirical method uses collected data to ground a theory or derive a conclusion in science
• No paper can be written today in software engineering unless there is an adequate assessment of what one is claiming or has done
• Not all kinds of research require an experimental assessment
• An increasing number require convincing experiments
• Perhaps an indication that the field (software) is a mature engineering field
8
From ICSE 14 CfP
• Analytical: A paper in which the main contribution relies on new algorithms or mathematical theory. Examples include new bug prediction techniques, model transformations, algorithms for dynamic and static analysis, and reliability analysis. Such a contribution must be evaluated with a convincing analysis of the algorithmic details, whether through a proof, complexity analysis, or run-time analysis, among others and depending on the objectives.
• Empirical: A paper in which the main contribution is the empirical study of a software engineering technology or phenomenon. This includes controlled experiments, case studies, and surveys of professionals reporting qualitative or quantitative data and analysis results. Such a contribution will be judged on its study design, the appropriateness and correctness of its analysis, and threats to validity. Replications are welcome.
9
From ICSE 14 CfP
• Technological: A paper in which the main contribution is of a technical nature. This includes novel tools, modeling languages, infrastructures, and other technologies. Such a contribution does not necessarily need to be evaluated with humans. However, clear arguments, backed up by evidence as appropriate, must show how and why the technology is beneficial, whether it is in automating or supporting some user task, refining our modeling capabilities, improving some key system property, etc.
• Methodological: A paper in which the main contribution is a coherent system of broad principles and practices to interpret or solve a problem. This includes novel requirements elicitation methods, process models, design methods, development approaches, programming paradigms, and other methodologies. The authors should provide convincing arguments, with commensurate experiences, why a new method is needed and what the benefits of the proposed method are.
10
• Perspectives: A paper in which the main contribution is a novel perspective on the field as a whole, or part thereof. This includes assessments of the current state of the art and achievements, systematic literature reviews, framing of an important problem, forward-looking thought pieces, connections to other disciplines, and historical perspectives. Such a contribution must, in a highly convincing manner, clearly articulate the vision, novelty, and potential impact.
11
Traditional experimental SE
• Common in systems research to assess artifacts (e.g., an OS, a compiler, a network protocol, a machine learning algorithm, ...)
• Experiments can be via simulation or in-field
• Focuses on efficiency (time, space, energy, ...), especially in relation to scalability
12
Testing: the most venerable class of experiments
• With testing you try to devise experiments (test cases) that uncover defects in the software, until one reaches good confidence
• Testing methods try to make the generation of good test cases systematic
• good = likely to improve defect discovery
• The software is then assumed to be correct until further defect discovery invalidates the assumption
13
Trend
• Increasing empirical work addresses software development and artifacts
• How do programmers work?
• How do defects evolve?
• How do they relate to group dynamics?
• How do they relate to the technology used?
• How do they relate to the design structure?
14
An example: mining software repositories
• Very hard for researchers before 2000 to access data about how people develop software
• Data either absent or not disclosed for commercial/privacy reasons
• Open-source repositories gave access to an enormous amount of data
15
Mining software repositories
Researchers mine repositories to
• understand how software is developed in the real world
• extract models
• validate hypothesized models
The ultimate goal is to improve quality, along one or more of its dimensions; ideally it should lead to quality-improving changes to be implemented, and to further empirical assessment to compare the old and the new
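As a minimal illustration of the mining idea (the log format, file names, and bug-fix keywords below are all invented, not taken from any real repository), one could flag files that are frequently touched by bug-fixing commits:

```python
import re

# Hypothetical excerpt from a version-control log: "<file> | <commit message>"
log = """\
parser.c | fix null-pointer bug in tokenizer
parser.c | add comment handling
ui.py | refactor layout code
parser.c | fix crash on empty input
ui.py | fix typo in label
"""

# Crude heuristic: a commit is a bug fix if its message mentions these words.
BUGFIX = re.compile(r"\b(fix|bug|crash|defect)\b", re.IGNORECASE)

# Count bug-fixing commits per file as a rough defect-proneness proxy.
fix_counts = {}
for line in log.strip().splitlines():
    path, _, message = line.partition(" | ")
    if BUGFIX.search(message):
        fix_counts[path] = fix_counts.get(path, 0) + 1

print(fix_counts)  # {'parser.c': 2, 'ui.py': 1}
```

Real studies refine this heuristic considerably (linking commits to bug-tracker IDs, filtering noise), but the core loop of extracting structured counts from unstructured repository data looks like this.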
16
Examples
• Certain structural properties lead to more defective code
• Repositories of code and bugs exist (Bugzilla) and correlations may be studied
• Analysis of emails exchanged during software development, extracting structured data from unstructured texts
17
Critical evaluation of empirical results
"Man prefers to believe what he prefers to be true"
Francis Bacon
18
Threats to validity: internal vs. external
• The controlled or experimental design enables the investigator to control for threats to internal and external validity.
• Threats to internal validity compromise our confidence in saying that a relationship exists between the independent and dependent variables.
• Threats to external validity compromise our confidence in stating whether the study's results are generalizable (applicable to other settings/individuals/times).
19
Why is internal validity important?
• We often conduct research in order to determine cause-and-effect relationships.
• Can we conclude that changes in the independent variable caused the observed changes in the dependent variable?
• Is the evidence for such a conclusion good or poor?
• If a study shows a high degree of internal validity, then we can conclude we have strong evidence of causality.
• If a study has low internal validity, then we must conclude we have little or no evidence of causality.
20
Variables and internal validity
• Extraneous variables are variables that may compete with the independent variable in explaining the outcome of a study.
• A confounding variable is an extraneous variable that does indeed influence the dependent variable.
• A confounding variable varies systematically with the independent variable and also influences the dependent variable.
• Researchers must always worry about extraneous variables when they make conclusions about cause and effect.
21
Necessary conditions for causality
• Three conditions are necessary to claim that variable A causes changes in variable B:
• Relationship condition: variable A and variable B must be related.
• Temporal antecedence condition: proper time order must be established.
• Lack of alternative explanation condition: the relationship between variable A and variable B must not be attributable to a confounding, extraneous variable.
22
Notice that the existence of a relation does not imply CAUSALITY
Definitions
• True positives (TP): elements correctly retrieved by the approach under analysis
• False positives (FP): elements wrongly retrieved
• False negatives (FN): elements that are not retrieved
• Precision = TP/(TP+FP)
• Recall = TP/(TP+FN)
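These metrics can be computed directly from the sets of retrieved and relevant elements; a small sketch (the example data is invented):

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for a retrieval-style evaluation."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)   # true positives: retrieved and relevant
    fp = len(retrieved - relevant)   # false positives: retrieved but not relevant
    fn = len(relevant - retrieved)   # false negatives: relevant but missed
    precision = tp / (tp + fp) if retrieved else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall

# Example: the approach retrieves 4 elements, 3 of which are correct,
# out of 6 relevant elements overall.
p, r = precision_recall({"a", "b", "c", "d"}, {"a", "b", "c", "e", "f", "g"})
print(p, r)  # 0.75 0.5
```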
23
Experimental approaches ABOUT science (software engineering)
24
Sample questions
• Can we understand trends? Topics gaining/losing interest.
• Can we understand the influence of research/researchers?
• Can these data be used to evaluate/promote people, select themes to fund, ...?
This can be seen as empirical meta-research
25
Topics analysis--personal experience
• We selected the MAIN journals and conferences that represent a field (Software Engineering) as a whole (set A)
• We selected (expert judgement) a number of representative topics and specialized journals/conferences for each topic (e.g., software testing) (set B)
• We associated a topic profile to each paper in A based on the citations it received
• if a paper has 20 citations related with topic A, 20 related with topic B, and 40 related with topic C, its profile is 25% A, 25% B, 50% C.
• Citations are attributed to a topic if they come from set B; otherwise we recursively classify the citing paper if it is in set A
26
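The profile computation can be sketched as follows (an illustrative reconstruction, not the authors' actual tooling; the recursive classification of citations coming from set A is omitted here):

```python
from collections import Counter

def topic_profile(citing_topics):
    """Given the topic attributed to each citation a paper received,
    return its topic profile as percentages."""
    counts = Counter(citing_topics)
    total = sum(counts.values())
    return {topic: 100 * n / total for topic, n in counts.items()}

# 20 citations related with topic A, 20 with B, 40 with C
# (the example from the slide)
profile = topic_profile(["A"] * 20 + ["B"] * 20 + ["C"] * 40)
print(profile)  # {'A': 25.0, 'B': 25.0, 'C': 50.0}
```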
Inferring influence from publication data
• Citations as a measure of influence
• # paper citations as an indication of the impact of a piece of work
• # paper citations in a journal/proceedings as an indication of the impact of a venue
  - impact factor
• # citations of one's research as a measure of the impact of a researcher
  - H-index
27
H-index
• h-index is an index that attempts to measure both the productivity and impact of the published work of a scientist or scholar
• A scientist has index h if h of his/her Np papers have at least h citations each, and the other (Np − h) papers have no more than h citations each
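The definition translates directly into code; a minimal sketch (citation counts are invented):

```python
def h_index(citations):
    """h = the largest h such that at least h papers have >= h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:   # the rank-th most cited paper still has >= rank citations
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4: four papers with at least 4 citations each
print(h_index([25, 8, 5, 3, 3]))  # 3: a single highly cited paper does not raise h
```

The second example illustrates a known property of the index: one spectacularly cited paper contributes no more to h than any other paper above the threshold.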
28
Recommended readings
• "Citation Statistics", Joint Committee on Quantitative Assessment of Research report, R. Adler, J. Ewing, P. Taylor (Eds.), International Mathematical Union (IMU) in cooperation with the International Council for Industrial and Applied Mathematics (ICIAM) and the Institute of Mathematical Statistics (IMS), 6/12/2008
• "Research Evaluation for Computer Science", Informatics Europe report, viewpoint article in CACM, April 2009, B. Meyer, C. Choppy, J. Staunstrup, J. van Leeuwen (Eds.)
• D. Parnas, "Stop the Numbers Game", CACM, Nov 2007
29
Findings
• Much of modern bibliometrics is flawed (statistics improperly used)
• Objectivity and accuracy are illusory – the meaning of a citation can be even more subjective than peer review
• Sole reliance on citation data provides an incomplete and shallow understanding of research – only valid if reinforced by other judgments
• Numbers are not inherently superior to, and cannot substitute for, complex judgement
ICSE 2009
30
The complex sociology of citations
[Figure: average citations per article, by discipline]
• Citation practices differ substantially among disciplines
31
The complex sociology of citations
• Most citations are rhetorical
• Reward citations can be of many kinds
  - currency, negative credit, operational information, persuasiveness, positive credit, reader alert, social consensus
• Obliteration effect
  - the original work is incorporated into later work, which is cited instead
32
Recent thoughts
• How scientific discovery has changed because of computing
  - CERN, e-Science
• How new scientific discovery can be spawned by computing
  - Big data
33
34