Towards General Theories of Software Engineering

Pontus Johnson (a), Michael Goedicke (b), Mathias Ekstedt (a), Ivar Jacobson (c)

(a) KTH Royal Institute of Technology, Stockholm, Sweden
(b) University of Duisburg-Essen, Essen, Germany
(c) Ivar Jacobson International, Verbier, Switzerland

In 2013, as a part of the SEMAT (Software Engineering Method and Theory) initiative, we organized a workshop devoted to the theme of general theories of software engineering (GTSE). The workshop, in total involving around a hundred authors, participants and program committee members, was held in conjunction with the International Conference on Software Engineering (ICSE 2013). The interest in the workshop, both during ICSE 2013 and afterward, prompted us to follow up on the theme. The current special issue of Science of Computer Programming is the result.

The GTSE concept may be parsed into two parts, namely general theory and software engineering. Regarding the complicated concept general theory, it seems easier and more enlightening to provide an extensional definition, i.e. a definition by examples, than an intensional one. Well-known general theories in other disciplines include Maxwell's equations [1], the Standard Model of particle physics [2], the Big Bang theory [3], the theory of the cell [4], probability theory [5], general equilibrium theory in economics [6], John Maynard Keynes' general theory of employment, interest and money [7], Sigmund Freud's theory of the psyche [8], and Einstein's general theory of relativity [9]. Additionally, there are many general theories that are well established within their respective disciplines, but have not reached the general public, including Henry Mintzberg's general theory of organizational structure [10], Richard Easterlin's unified theory of income and happiness [11], Michael Gottfredson and Travis Hirschi's general theory of crime [12], Nobel Prize winner Gary Becker's theory of marriage [13], and many more.

In fact, the business of proposing general theories is in full bloom. Google Scholar reports over 200 hits on the term general theory in the titles of scientific articles in 2013 alone, including contributions such as a general theory of acute and chronic heart failure [14], a general theory of business marketing [15], a more general theory of commodity bundling [16], life history theory and the general theory of crime [17], a general theory of environment-assisted entanglement distillation [18], a general theory of implementation [19], a general theory of behavior and brain coordination [20], and an almost general theory of mean size perception [21].

A general theory stands in contrast to a specific theory. Typically, the prefix general is either attached to a theory that addresses a greater empirical domain than its predecessors, or to a theory that encompasses a whole field (which raises the question of how fields are defined). A general theory aims at answering the big questions in its field. Ideally, it should be able to explain and predict important phenomena in the field. A general theory of electromagnetism should help us understand why one transformer design is superior to another. A general theory of marriage [13] should be able to explain divorce rates. A general theory of economic equilibrium [6] should be able to explain the pricing of goods. A general theory of international relations [22] should predict the effects of foreign policy decisions.

If the concept of general theory constitutes one part of the theme of this special issue, then the second part, software engineering, may be considered by pondering the big questions that general theories of software engineering would address. Arguably, the issues that have most concerned software engineering since its inception are related to productivity and quality. Why do some software engineering projects produce a fine product at a low cost, while others do not? What tools, processes, methods and competencies will increase productivity and quality? Ideally, a general theory of software engineering will be able to predict the effects of new methods and tools before they are deployed.

1. Why strive for a general theory?

It is not self-evident why it is desirable to strive for a general theory of software engineering. As a first question, we may ask what the problem is with today's situation. It is obviously possible to get by without general theories of software engineering. In brief, there are two problems with today's state of affairs: (a) theory is implicit, and (b) theory is disconnected.

At ICSE 2013, we asked some 60 conference participants to produce diagrams describing their personal understanding of causal relationships between core software engineering constructs. No two respondents produced identical diagrams, indicating that consensus on the basic cause-and-effect of software engineering is very low. Thus, there exist general theories of software engineering, but they are subjective. The obvious effect is that two decision makers facing the same problem will make different choices. If the decision makers are on different projects, then one of them will be fortunate while the other will not. If they are on the same project, chance will determine whether the choice made has the more or the less desirable consequences. There will be no way to rationally compare their beliefs. By making theory explicit, it can be subjected to the critical gaze of science, a gaze that has been remarkably successful in producing credible decision support in many other disciplines.

In addition to personal general theories, the software engineering discipline also features a multitude of well-known specific theories. For instance, the theory of languages and automata [23] is highly mature, and it is a theory of major practical relevance for traditional software engineering problems, such as programming language design. Less formalized, yet well-established, are theories such as Brooks's Law [24], Conway's Law [25], and Boehm's Laws [26]. There are hundreds of such laws [26], principles [27][28][29] and hypotheses [24]. However, these small theories are disconnected from each other. They form no coherent knowledge structure. In fact, based on this mass of principles, an infinite number of contradictory statements can be derived. Each individual must therefore make their own selection of which laws, principles and hypotheses to believe in. Under these circumstances, it is no wonder that every software engineer seems to have their own general theory.

In the absence of a common theory, scientific progress is painstakingly slow, claims philosopher of science Thomas Kuhn [30]. Without a theory to guide information gathering, "different men confronting the same range of phenomena ... describe and interpret them in different ways." Rather than jointly scrutinizing and developing common theories, scientists talk past each other and thus fail to focus. This inhibits the development of a cumulative body of knowledge. With a common theory, however, the scope of the field narrows, as joint investigations are concentrated on the topics designated by the theory. The ensuing stage of normal science, characterized by the development of idiosyncratic vocabularies, specialized equipment and advanced skills, accelerates progress in terms of scientific depth and detail. Such focused investigations may highlight regularities and anomalies that would have been impossible to detect through the tinkering of individual researchers.

Additionally, as is also the case when developing software, joint effort often leads to more advanced results than those attainable for the solitary explorer. Perhaps most importantly of all, a general theory, scrutinized and debated by a whole scientific community, is arguably more likely to reflect the real world than the untested, often unquestioned, idiosyncratic mix of ideas harbored in the mind of any given community member.

So far, we have discussed the advantages of a common, general theory. It is, however, not certain that a single theory will come to dominate the field, especially from the start. Many academic disciplines feature two or more rival theories. In the field of international relations, there has been a longstanding debate between realists and liberals. In psychology, the cognitive-behavioral theories have confronted the psychoanalytical ones. Although these fields do not reap all of the benefits of Kuhn's normal science, they represent a significant increase in scientific maturity as compared to disciplines without any general theories. The underlying assumption in the few-theory fields is typically that one of the theories will eventually win the day, thus unifying the scientific community around a common world view.

2. How are general theories created?

We argue above that creating one or a small set of competing general theories of software engineering is a worthwhile endeavor. But how can general theories come into being? The topic of scientific theory generation is a part of the philosophical field of epistemology, and as such has a long history. One of the big epistemological debates has been the one between rationalists and empiricists, between those who believe that things can be known before experience, a priori, and those who believe that knowledge requires observations of the world, a posteriori. We can divide approaches to theory generation into two camps loosely based on this dichotomy. In the first, deductive, a priori case, the theory is derived from other, pre-existing theories. In the second, inductive, a posteriori case, the theory is based (more) directly on empirical observation.

Considering the rationalistic, deductive approach, the adaptation of existing theory to new phenomena seems to be inherent in our way of thinking. The concepts of metaphors and analogical thinking are embodiments of this strategy. In science, analogical thinking is a common approach. For instance, the wave theory of light, proposed by Hooke in the 17th century, applied the theory of fluid waves to the phenomenon of light [31]. Another way to derive new theories from preexisting ones is through integration.

A well-known example is the Standard Model of particle physics, which describes the fundamental particles of the universe and how they interact. In 1961, Sheldon Glashow discovered a way to combine the electromagnetic and weak interactions. A few years later, Steven Weinberg and Abdus Salam were able to integrate the Higgs mechanism with Glashow's theory, producing the electroweak theory that forms a cornerstone of the Standard Model [2]. A long-standing ambition for the future is to also unify the strong interaction with the electroweak interaction, an ambition oftentimes referred to as Grand Unification [2]. The sociologist Robert Merton also championed consolidating special theories into more general sets of concepts and mutually consistent propositions [32].

When it comes to the empiricist, inductive, a posteriori approach, one well-known example is Carl Linnaeus' biological taxonomy [33]. In Linnaeus' taxonomy, all living organisms are classified in a tree structure of species and subspecies. Fundamentally descriptive, the taxonomy can be used to predict properties of individuals from knowledge of other properties. For instance, observation of the plumage of a bird may allow us to predict feeding habits, clutch size, migratory habits, life expectancy, and many other aspects. Another kind of data-driven theory development is common within the healthcare field, where large empirical studies are used to identify risk factors. When a factor, such as smoking, is found, the theory is empirically refined by further studies detailing the effects of smoking on health and mortality [34]. Such studies have a clear prescriptive aim: to improve individual as well as collective decisions.

A mixed approach is represented by the scientific endeavor of codifying respondents' pre-existing theories of the investigated phenomenon. For instance, expert elicitation [35] is a deductive technique, where the respondents are assumed to know the theory in question. Methods such as grounded theory [36] view the respondents less as sources of theory than as sources of empirical data, which is subsequently analyzed in order to generate theory.

3. In what organizational settings are general theories created?

Considering the history of science, we see at least three social settings for theory generation. The one that first comes to mind is perhaps the lone genius, pondering the big questions in solitude, presenting his findings to an astonished world.

Examples of such projects may include Isaac Newton's theory of mechanics, presented in Philosophiæ Naturalis Principia Mathematica [37] in 1687, as well as the special theory of relativity, proposed by Albert Einstein in his annus mirabilis, 1905 [38]. A second kind of social setting for theory development is that of cutthroat competition between scientific rivals. One well-known case is the race to understand the molecular structure of DNA [39]. A third approach to theory development and scientific progress is through teamwork. The massive joint efforts undertaken in particle physics in places like CERN are examples of the collaborative form of social organization in the theory-generative process. The Standard Model of particle physics, mentioned earlier, is a well-established product of large-scale joint research. It is not unusual for articles on this topic to be authored by hundreds, or even thousands, of authors, as e.g. in [40].

These three forms, solitary, competitive and collaborative, can be mapped to Thomas Kuhn's stages of scientific maturity. The solitary genius represents early efforts, when the scientific community has little common understanding of the problem. The competitive scenario requires a common understanding of the problem, but different ideas about the solution, and a heterogeneous organizational structure. The collaborative approach requires a common understanding of the problem as well as of the method for finding the solution. It also requires a cohesive organizational setting, with international funding systems and research organizations.

In software engineering theory development, we are currently very clearly in the solitary stage. There is little agreement on the problem to be addressed; in fact, there is not even agreement on the relevance and viability of general theories as a concept. As the present special issue demonstrates, proposals for general software engineering theories differ widely with respect to content and form as well as process. Typically, papers proposing new theories are authored by one or two researchers. Currently, cutthroat competition is not the norm in software engineering. This may be due to the lack of joint research goals; there is plenty of room to define new research questions, and many methods to address similar questions. Nor is joint action the standard modus operandi in the software engineering community. However, the community is capable of joint action. There are many examples of successful standardization work, such as the OMG MOF/UML effort [41] and SWEBOK [42]. It should therefore not be ruled out that, given the right preconditions, the software engineering community could come to agree on a joint path forward.

4. Included articles in this special issue

In the current special issue, a number of theories are proposed, and the road to theory is discussed.

Erbas and Erbas present a general theory of software engineering based on the economic theory of transaction costs [43]. This theory is a good example of the rationalistic, deductive, a priori kind discussed above: an established theory from another discipline is adapted to the context of software engineering. The organizational setting of this work is mainly of the solitary kind.

Ralph proposes a theory of design called SCI, where the agent engages in three types of activities, namely sensemaking, coevolution and implementation. This theory is also primarily an a priori theory, but of the integrative kind, merging Christopher Alexander's Self-conscious Process [44] with various concepts from the fields of software engineering, design, management and psychology. Organizationally, the SCI theory was developed by Ralph in solitude.

Stoica, Pelckmans and Rowe propose a theory of software engineering where models and decision theory constitute the core. This, too, is an a priori theory, integrating well-established decision theory with an innovative take on models. Although authored by three contributors, the Stoica-Pelckmans-Rowe theory should also be classified under the aforementioned solitary organizational form of theory development; it was the result of neither cutthroat competition nor a larger standardization effort.

The article authored by Ng explores the SEMAT Essence [45] as a general theory of software engineering. Based on the expertise of practitioners, and explicitly aiming to capture that which is generally agreed on in the practicing community, the Essence is a good example of an inductive, a posteriori theory. It is also an excellent example of the collaborative potential of the software engineering community. Recently adopted by the Object Management Group as an international standard, the Essence is the result of a community consensus process.

Because explicit work on general theories of software engineering is new, there is a need to discuss both the definition of theory and the process of theorizing. Stol and Fitzgerald focus on these issues in their article. Inspired by results in the field of consumer research, they propose the Research Path Schema, a framework for thinking about theory in software engineering.

The paper by Dewayne Perry develops a framework of design theories and related models as reifications of such theories. This framework allows the combination and development of new theories from existing ones. As the paper explains by discussing examples of existing software engineering approaches, the resulting structures become quite complex.

Päivärinta and Smolander also target the activity of theorizing. In their Coat Hanger model, they describe the process of building theories from practices. The Coat Hanger is thus not a theory of software engineering, but rather a theory of epistemology, i.e. a theory about theory generation. As such, it subscribes to the inductive approach, advocating a tight interplay between theory and practice.

Rather than a precise point on a scale, theoretical generality is a relative notion. Inspired by Robert Merton's thoughts on middle-range theories [32], Wieringa and Daneva consider strategies for generalizing software engineering theories. As generalization requires a pre-existing theory, this article contributes to the a priori stream of theory development.

As the reader will notice, we have accepted a set of papers that do not include empirical validation of their respective contributions. This omission is intentional. In the current stage of scientific maturity, there is a need to propose and discuss candidate general theories. A demand for validation already at this stage would silence a much-needed conversation, forcing scientists to work alone, or worse, not to present their theories to a wider audience at all.

References

[1] D. J. Griffiths, Introduction to electrodynamics, 3rd ed., Prentice Hall, Upper Saddle River, NJ, 1999.

[2] D. J. Griffiths, Introduction to elementary particles, John Wiley & Sons, 2008.

[3] E. W. Kolb, M. S. Turner, The early universe, Front. Phys., Vol. 69.

[4] P. Mazzarello, A unifying concept: the history of cell theory, Nature Cell Biology 1 (1) (1999) E13–E15.

[5] W. Feller, An introduction to probability theory and its applications, Vol. 2, John Wiley & Sons, 2008.

[6] K. J. Arrow, M. D. Intriligator, W. Hildenbrand, H. Sonnenschein, Handbook of mathematical economics, North-Holland, 1986.

[7] J. M. Keynes, General theory of employment, interest and money, Atlantic Publishers & Dist, 2006.

[8] S. Freud, A. Freud, A. Richards, C. L. Rothgeb, J. Strachey, The Standard Edition of the Complete Psychological Works, Hogarth Press and the Institute of Psycho-Analysis, 1974.

[9] R. M. Wald, General relativity, University of Chicago Press, 2010.

[10] H. Mintzberg, The structuring of organizations: A synthesis of the research, Prentice Hall, 1979.

[11] R. A. Easterlin, Income and happiness: Towards a unified theory, The Economic Journal 111 (473) (2001) 465–484.

[12] M. R. Gottfredson, T. Hirschi, A general theory of crime, Stanford University Press, 1990.

[13] G. S. Becker, A theory of marriage: Part I, The Journal of Political Economy (1973) 813–846.

[14] D. H. MacIver, M. J. Dayer, A. J. Harrison, A general theory of acute and chronic heart failure, International Journal of Cardiology 165 (1) (2013) 25–34.

[15] S. D. Hunt, A general theory of business marketing: R-A theory, Alderson, the ISBM framework, and the IMP theoretical structure, Industrial Marketing Management 42 (3) (2013) 283–293.

[16] M. Armstrong, A more general theory of commodity bundling, Journal of Economic Theory 148 (2) (2013) 448–472.

[17] C. S. Dunkel, E. Mathes, K. M. Beaver, Life history theory and the general theory of crime: Life expectancy effects on low self-control and criminal intent, Journal of Social, Evolutionary, and Cultural Psychology 7 (1) (2013) 12.

[18] F. Buscemi, N. Datta, General theory of environment-assisted entanglement distillation, IEEE Transactions on Information Theory 59 (3) (2013) 1940–1954.

[19] C. May, Towards a general theory of implementation, Implementation Science 8 (2013) 18.

[20] J. Kelso, G. Dumas, E. Tognoli, Outline of a general theory of behavior and brain coordination, Neural Networks 37 (2013) 120–131.

[21] J. Allik, M. Toom, A. Raidvee, K. Averin, K. Kreegipuu, An almost general theory of mean size perception, Vision Research 83 (2013) 25–39.

[22] J. J. Mearsheimer, The tragedy of great power politics, W. W. Norton & Company, 2001.

[23] J. E. Hopcroft, Introduction to automata theory, languages, and computation, Pearson Education, 1979.

[24] F. P. Brooks Jr., The Mythical Man-Month, Anniversary Edition: Essays on Software Engineering, Pearson Education, 1995.

[25] M. E. Conway, How do committees invent?, Datamation 14 (4) (1968) 28–31.

[26] A. Endres, H. D. Rombach, A handbook of software and systems engineering: Empirical observations, laws and theories, Pearson Education, 2003.

[27] T. Gilb, S. Finzi, Principles of software engineering management, Vol. 4, Addison-Wesley, Reading, MA, 1988.

[28] C. Ghezzi, M. Jazayeri, D. Mandrioli, Fundamentals of software engineering, Prentice Hall PTR, 2002.

[29] A. M. Davis, 201 principles of software development, McGraw-Hill, Inc., 1995.

[30] T. S. Kuhn, The structure of scientific revolutions, University of Chicago Press, 2012.

[31] R. Hooke, Micrographia: or some physiological descriptions of minute bodies, Cosimo, Inc., 2007.

[32] R. K. Merton, Social theory and social structure, Simon and Schuster, 1968.

[33] C. Linnaeus, Systema naturae, Vol. 1, Editio decima, reformata.

[34] U.S. Department of Health and Human Services, The health consequences of smoking - 50 years of progress: A report of the Surgeon General, Atlanta, GA: Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Office on Smoking and Health.

[35] R. Cooke, Experts in uncertainty: Opinion and subjective probability in science, Oxford University Press, New York, 1991.

[36] B. G. Glaser, A. L. Strauss, The discovery of grounded theory: Strategies for qualitative research, Transaction Publishers, 2009.

[37] I. Newton, Philosophiæ naturalis principia mathematica (Mathematical principles of natural philosophy), London, 1687.

[38] A. Einstein, On the electrodynamics of moving bodies, Annalen der Physik 17 (1905) 891–921.

[39] K. Davies, Cracking the genome: Inside the race to unlock human DNA, JHU Press, 2002.

[40] G. Aad, B. Abbott, J. Abdallah, A. Abdelalim, A. Abdesselam, O. Abdinov, B. Abi, M. Abolins, H. Abramowicz, H. Abreu, et al., Measurement of the differential cross-sections of inclusive, prompt and non-prompt J/ψ production in proton–proton collisions at √s = 7 TeV, Nuclear Physics B 850 (3) (2011) 387–444.

[41] Object Management Group, OMG Meta Object Facility (MOF) Core Specification, Version 2.4.1, 2013.

[42] A. Abran, P. Bourque, R. Dupuis, J. W. Moore, Guide to the Software Engineering Body of Knowledge (SWEBOK), IEEE Press, 2001.

[43] O. E. Williamson, The economic institutions of capitalism, Simon and Schuster, 1985.

[44] C. Alexander, Notes on the Synthesis of Form, Vol. 5, Harvard University Press, 1964.

[45] I. Jacobson, P.-W. Ng, P. E. McMahon, I. Spence, S. Lidman, The Essence of Software Engineering: Applying the SEMAT Kernel, Addison-Wesley, 2013.
