Towards an Open Source Software Development Life Cycle

10.5465/AMBPP.2015.299

TOWARDS AN OPEN SOURCE SOFTWARE DEVELOPMENT LIFE CYCLE: A STUDY OF ROUTINE HETEROGENEITY AND DISCOURSE ACROSS MULTIPLE RELEASES

ARON LINDBERG
Weatherhead School of Management
Case Western Reserve University
Cleveland OH 44106

NICHOLAS BERENTE
University of Georgia

KALLE LYYTINEN
Case Western Reserve University

ABSTRACT

We develop an Open Source Software (OSS) lifecycle model by conducting a longitudinal mixed method analysis of a medium-scale OSS project. We identify two forms of routines, discourse-driven and direct problem solving routines, and trace their proportional distributions over time. We use these findings to propose an OSS lifecycle model.

INTRODUCTION

From its earliest days, software development has been known to follow a cyclical development pattern of inception, development and decay. Similar patterns are repeated in release control of software. Immediately after each release of software, new requirements arise (often in the form of bugs in the code), and a process of identifying requirements, designing and coding the software, and testing ensues.

Open Source Software (OSS) development processes, however, are typically not conceived of in terms of lifecycles. Since OSS projects can involve thousands of independent, typically volunteering, developers, rationalized, predictable lifecycle processes seem out of the question. Once a piece of software is made available to the OSS community, OSS developers self-organize to address bugs and issues in an emergent manner as they arise, and through this mechanism they continuously and incrementally extend and strengthen the code (Raymond, 2001). OSS developers postpone major, complex problems and instead pursue incremental, manageable contributions. These incremental contributions layer upon each other in a sedimented fashion and enable the development of complex software (Howison & Crowston, 2014).
Although existing research provides insight into various processes which OSS communities follow, we have little understanding of how these processes vary across the release cycle and whether they exhibit any stages. While OSS projects have releases, we do not know if there is a lifecycle pattern between these releases. To investigate the presence of temporal patterns of activity in OSS development, we conducted a longitudinal, exploratory, mixed methods computational analysis of a major OSS project, Ruby on Rails. Overall, we examined the project’s 126,182 activities over 28 months (May 2011 to August 2013) using computational sequence analysis techniques. These activities comprised 11,846 activity sequences, which were clustered into two classes of “routines” (repeated sequences of activities; Pentland, 2005): discourse-driven and direct problem solving routines. We found that the relative distribution of these routines is associated with the intensity of discursive shaping of software features occurring throughout the release cycle. The ebb and flow of discourse throughout the project forms the distinct stages between each release. We identify the following stages: a) cleanup: exhibiting a balance between discourse-driven and direct problem solving routines, b) sedimentation: dominated by direct problem solving routines, and c) negotiation: dominated by discourse-driven problem solving routines. This suggests the presence of a lifecycle within OSS release processes.

FINDINGS

We report the findings as follows: first we detail the overall pattern of how routine heterogeneity changes over time. Then we report findings from the cluster analysis and provide a plot of the temporal distribution of routine performances across two routine types. As Figure 1 shows, the overall routine heterogeneity increases, on average, across the period that we have studied. Note the local peak around the 3.1 release, the notable decrease in heterogeneity around the 3.2 release, and the climb towards the 4.0 beta in February 2013.

-------------------------------
Insert Figure 1 about here
-------------------------------

Next, we analyzed the routine performances using cluster analysis, which allowed us to extract two distinct, statistically validated clusters: discourse-driven and direct problem solving. To determine the temporal distribution of routine performances across these two clusters, we plot the distribution of such performances in Figure 2. Here, we can highlight three patterns, one around each of the three releases.
If one considers a six-month period with each release at its center, we can characterize the distribution of routine performances across discourse-driven and direct problem solving in the following ways: 3.1 exhibits a balanced pattern, meaning that there are roughly equal amounts of discursive and direct problem solving (61 direct and 57 discourse-driven problem solving routine performances). 3.2 exhibits a direct problem solving-dominant pattern, indicating that the features in this release require less discursive forms of problem solving (102 direct and 68 discourse-driven problem solving routine performances). Last, the 4.0 Beta exhibits a discourse-driven problem solving-dominant pattern, implying that the features in this release require larger amounts of discursive problem solving (84 direct and 100 discourse-driven problem solving routine performances).

-------------------------------
Insert Figure 2 about here
-------------------------------

TOWARDS AN OSS LIFECYCLE MODEL

Having captured the overall temporal patterns of routine heterogeneity, analyzed its constituent parts, and extracted some emergent qualitative findings regarding the possible generative mechanisms, we can now develop a tentative OSS lifecycle model. We do this by first examining how the two routine structures we uncovered form a cohesive system, and then explaining how temporal distributions of these two forms of routines are shaped across the OSS release lifecycle. This enables us to propose some basic elements of a lifecycle model of OSS development.

-------------------------------
Insert Figure 3 about here
-------------------------------

Figure 3 shows how the two routine structures, discourse-driven and direct problem solving, form a cohesive system. Jointly, these routines enable developers both to work on incremental problems, solving them directly through the logic of “open superposition” (Howison & Crowston, 2014), and to address controversial problems through discourse-driven problem solving (Scacchi, 2009; Winograd, 1987). Both routine structures start with reporting problems (various bugs and feature requests), after which controversial problems enter a discourse-driven problem solving routine and less controversial problems enter a direct problem solving routine. In the former, a process of inquiring into and aligning code is initiated, through which features are negotiated so as to satisfy constraints imposed by internal/external artifacts as well as common use cases. In the latter, direct problem solving, code can often be written in a simple and incremental manner, solving one problem at a time. After these problem solving processes have concluded, a decision is made whether to reject the solution or merge it into the codebase.
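The two-routine system described above can be sketched as a small routing function. This is only an illustration of the flow in Figure 3, not the authors' analysis code: the `Problem` class, the boolean `controversial` flag, and the example problem titles are hypothetical, since the paper does not specify how controversy is judged in practice.

```python
from dataclasses import dataclass
from enum import Enum


class Routine(Enum):
    DISCOURSE_DRIVEN = "discourse-driven problem solving"
    DIRECT = "direct problem solving"


@dataclass(frozen=True)
class Problem:
    title: str
    # Hypothetical flag: the paper does not specify how controversy is judged.
    controversial: bool


def select_routine(problem: Problem) -> Routine:
    """Controversial problems go to discourse; the rest are solved directly."""
    return Routine.DISCOURSE_DRIVEN if problem.controversial else Routine.DIRECT


def trace(problem: Problem) -> list[str]:
    """List, in order, the steps a reported problem passes through."""
    steps = ["report problem (bug or feature request)"]
    if select_routine(problem) is Routine.DISCOURSE_DRIVEN:
        steps += [
            "inquire into and align code",
            "negotiate features against artifact constraints and common use cases",
        ]
    else:
        steps.append("write code incrementally, one problem at a time")
    # Both routines end with the same decision.
    steps.append("decide: reject the solution or merge it into the codebase")
    return steps


if __name__ == "__main__":
    # Hypothetical example problems, for illustration only.
    for problem in (
        Problem("Fix typo in error message", controversial=False),
        Problem("Change default configuration behavior", controversial=True),
    ):
        print(problem.title, "->", select_routine(problem).value)
        for step in trace(problem):
            print("  -", step)
```

Both paths share the reporting step and the final merge-or-reject decision; only the middle of the routine differs, which is why the sketch treats the two routines as branches of one system rather than as separate processes.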
As we observed in our findings, the relative distributions of performances across these two kinds of routines, discourse-driven and direct problem solving, vary depending on discursive processes and community interactions unfolding across the multiple arenas of the community. By characterizing the distinct patterns that these temporally changing distributions of routine performances constitute, we can construct an OSS lifecycle model.

-------------------------------
Insert Table 1 about here
-------------------------------

As Table 1 shows, there are three qualitatively distinct stages in the overall lifecycle: cleanup, sedimentation, and negotiation. The cleanup stage consists of fixing the large number of bugs that a new release exposes, but also of prioritizing new features to be developed. This leads to an overall discursive shaping process which has some degree of controversy. To respond to this moderate level of controversy, routine heterogeneity is raised to a medium level. The next stage is labeled sedimentation, and consists of distributed, relatively independent work on incremental problems (Howison & Crowston, 2014). This reflects a situation where there is little controversy with regard to the features that are being implemented, and which therefore necessitates little discursive shaping. This leads to lower degrees of attention to concerns and arguments, lower community interaction, and therefore lower degrees of routine heterogeneity.

Last, in the negotiation stage the controversial nature of features being implemented leads to intensive forms of discursive shaping and high degrees of attention to concerns and arguments distributed across the community. Addressing such issues requires a wider structural variety, or heterogeneity, in development routines. When new features are discursively shaped across multiple arenas, different arguments are compared, contrasted, pitted against each other, and synthesized. Hence, the discursive shaping of artifacts generates community interaction in the form of routine heterogeneity, that is, the actual practices through which developers attend to concerns and arguments and stabilize the resulting material artifacts for implementation. Therefore, heterogeneity constitutes a coping mechanism for translating a set of divergent requirements into an actual feature set.

Note that direct problem solving is representative of straightforward development work based on already agreed-upon discursive artifacts, and as such does not require great degrees of heterogeneity. Direct problem solving addresses straightforward issues with the software. Discourse-driven problem solving, on the other hand, is more open-ended and involves the navigation of a variety of perspectives, concerns, and approaches to handling the issues that cannot be addressed easily and directly.

In conclusion, in this paper we have provided indications of an OSS lifecycle, how it is structured, and which mechanisms shape its ebbs and flows. The dominance of the Bazaar metaphor in OSS has obscured its dynamic nature, and through this study we show how routine structures move adaptively across time together with the intensity of discourse.

REFERENCES AVAILABLE FROM THE AUTHORS

FIGURES & TABLES

Figure 1. Routine Heterogeneity across time

Figure 2. Distribution of routine clusters across time

Figure 3. Discourse-driven & Direct Problem Solving

Table 1. OSS Lifecycle Model

Stage                 | Cleanup                                  | Sedimentation                        | Negotiation
Description           | Post-release cleaning and prioritization | Incremental superpositioning of code | Interaction, conflict & decision-making
Routine Distribution  | Balanced                                 | Direct-Dominant                      | Discourse-Dominant
Discursive Shaping    | Some controversy                         | Uncontroversial                      | Controversial
Community Interaction | Medium                                   | Low                                  | High
Routine Heterogeneity | Medium                                   | Low                                  | High
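As a rough illustration of how the Routine Distribution row of Table 1 relates to the counts reported in the findings, the following sketch computes the share of direct problem solving in each six-month release window and maps it to a stage. The counts and stage labels come from the paper. The numeric cutoff `BALANCE_TOLERANCE` is an assumption introduced here: the paper gives no threshold for "balanced", and the value 0.03 was chosen only so that the three reported windows reproduce the paper's own labels.

```python
# Routine-performance counts reported in the findings for the six-month
# window centered on each release: (direct, discourse-driven).
RELEASE_COUNTS = {
    "3.1": (61, 57),
    "3.2": (102, 68),
    "4.0 Beta": (84, 100),
}

# Table 1: routine distribution -> lifecycle stage.
STAGES = {
    "Balanced": "Cleanup",
    "Direct-Dominant": "Sedimentation",
    "Discourse-Dominant": "Negotiation",
}

# ASSUMPTION: not from the paper, which states no numeric cutoff.
# A window counts as balanced if the direct share is within this
# distance of 0.5.
BALANCE_TOLERANCE = 0.03


def direct_share(direct: int, discourse: int) -> float:
    """Fraction of routine performances that are direct problem solving."""
    total = direct + discourse
    if total == 0:
        raise ValueError("window contains no routine performances")
    return direct / total


def routine_distribution(direct: int, discourse: int,
                         tolerance: float = BALANCE_TOLERANCE) -> str:
    """Label a window Balanced, Direct-Dominant, or Discourse-Dominant."""
    share = direct_share(direct, discourse)
    if abs(share - 0.5) <= tolerance:
        return "Balanced"
    return "Direct-Dominant" if share > 0.5 else "Discourse-Dominant"


def lifecycle_stage(direct: int, discourse: int) -> str:
    """Map a window's counts to the Table 1 stage."""
    return STAGES[routine_distribution(direct, discourse)]


if __name__ == "__main__":
    for release, (direct, discourse) in RELEASE_COUNTS.items():
        share = direct_share(direct, discourse)
        print(f"{release}: direct share {share:.2f} -> "
              f"{lifecycle_stage(direct, discourse)}")
```

With the assumed tolerance, the direct shares of roughly 0.52, 0.60, and 0.46 map to Cleanup, Sedimentation, and Negotiation for the 3.1, 3.2, and 4.0 Beta windows respectively. A different tolerance could relabel the 3.1 or 4.0 Beta window, so the cutoff should be read as illustrative rather than as part of the model.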

