How Knowledge Overlap Drives (and Doesn’t Drive) Developer Preferences for Joining Related Open Source Software Projects

Nilesh Saraf (Simon Fraser University) Deepa Chandrasekaran (Lehigh University) S. Siddarth (University of Southern California)

Draft October 10, 2011

Electronic copy available at: http://ssrn.com/abstract=2002366

ABSTRACT

Open Source Software (OSS) communities are exemplars of online and virtual collaboration among software developers. However, such communities are typified by a scarcity of volunteers, with the result that OSS projects typically strive to garner the necessary expertise. This study attempts to understand how OSS projects can attract software developers to join their collective effort. We proceed by examining the role of a key construct, knowledge domain overlap, defined as the overlap between the expertise domain of a software project a developer is considering joining and the combined expertise domains of her past software projects. While greater knowledge domain overlap should intuitively attract a developer to an OSS project, this is not always so. Instead, we find that two contingencies, the community experience of a developer and the centrality of the OSS project in the community, play a counter-intuitive role in driving developer preferences for OSS projects. Our empirical analysis is based on archival data on the joining behavior of 462 developers in 538 OSS projects. First, we find that developers prefer projects that have a higher knowledge domain overlap in both dimensions, the technical and application domains. However, projects with lower technical domain overlap are still able to attract developers who have greater community experience. Similarly, projects that are centrally located in the community structure are also able to attract developers despite a lower technical domain overlap. In comparison, application domain overlap seems important across contexts. The findings have implications for knowledge-based innovation in open communities, which are a hotbed of innovation.

Keywords: open source software development; social capital; knowledge management; innovation; virtual collaboration

INTRODUCTION

The open source software (OSS) development paradigm is a prime example of how the Internet enables open innovation communities to coordinate highly knowledge-intensive tasks virtually among a diverse group of developers (Agarwal et al. 2008; Crowston & Howison 2004; Crowston & Scozzi 2002; Johnson 2002; Lakhani & von Hippel 2002; Raymond 2001; von Krogh & von Hippel 2006). Born out of the hacker culture and enabled by the rise of the Internet, the OSS movement is characterized by a globally dispersed community of programmers who contribute to and maintain software code to develop functioning software programs, which they subsequently make freely available for public consumption (Raymond 2001; von Hippel & von Krogh 2003). Several open source projects have already become so successful that they pose a significant threat to established proprietary products (Lerner & Tirole 2005). This rapid rise, apparent sustainability and success of the ‘bazaar’ style of software development has stimulated a rich stream of empirical research (Crowston & Howison 2006; Hahn et al. 2008; Kogut & Metiu 2001; Lakhani & von Hippel 2002; Lerner & Tirole 2002b; Lerner & Tirole 2005; Setia et al. 2010; Singh 2010; Stewart et al. 2006; Stewart & Gosain 2006; von Krogh et al. 2003; von Krogh & von Hippel 2006). The insights gathered from research on open innovation communities are being applied to spur significant capital investments and guide innovations within corporations (Fosfuri et al. 2008; Lerner & Tirole 2002b) and emerging fields such as biotechnology (Chesbrough 2003).

Virtual collaborative communities are often understood to nurture innovation by mixing-and-matching knowledge from diverse sources (Crowston & Scozzi 2008; Faraj & Sproull 2000; Wasko & Faraj 2005). However, it is not entirely clear how such communities can sustain this approach of introducing diverse expertise during knowledge creation because knowledge
management literature suggests the reverse -- that learning is path-dependent and that reuse of knowledge in familiar domains is more desirable and more likely than in less familiar domains (Faraj & Sproull 2000; Majchrzak et al. 2004; Markus 2001; Nonaka 1994; Tiwana & Bush 2005). Therefore, viewing OSS communities as knowledge creating communities (Lee & Cole 2003) where contribution is voluntary and there are no organizational constraints, we can expect that OSS projects should attract developers who have greatly overlapping software expertise1, matching with its requirements. Since this intuitive link has seldom been theorized in literature2, it forms our first research question: How does overlap between the software expertise of developers from their prior OSS projects and a focal project’s expertise needs, lead developers to contribute to the focal project? However, as per the above rationale, if developers choose to contribute only to OSS projects in highly related knowledge domains, then this may lead to ‘path-dependence’ (Vernon 1997) in the growth of the OSS community, where developers bridge similar OSS projects thus forming clusters which are disjoint from other dissimilar projects. Such a ‘path-dependent’ model of innovation is not entirely consistent with the vision of open, virtual collaboration facilitated by a global, electronic infrastructure (i.e., the Internet) (von Krogh & von Hippel 2006), which should ideally facilitate synthesis of chunks of knowledge from developers with diverse software expertise. At the extreme, such path-dependence can potentially lead to a community-wide myopia by restricting software knowledge within clusters of similar OSS projects thus undermining the purpose of an open community. This paradox is observed at the individual as well as the organizational levels. 
At the individual level, cognitive entrenchment in one expertise area leads to less creativity and adaptation in problem-solving, and to incremental rather than radical innovations (Dane 2010). At the organizational level, this paradox is referred to in terms of exploitation and exploration3 (Levinthal & March 1993), where both modes of knowledge acquisition are beneficial to a firm. Thus, while recent literature examines the OSS phenomenon from many different perspectives, including developer motivations to contribute (Aksulu & Wade 2010), none attempts to explain how developers choose OSS projects: whether by seeking a match between their existing expertise and that required by the projects, or by seeking greater diversity of OSS projects. We believe that understanding this preference of developers for overlap (or lack thereof) between their expertise domain and that of the OSS projects they are choosing would be a useful contribution to a theory of open source innovation. Therefore, in addition to our first research question, a deeper understanding of OSS development, and of open communities in general, can emerge from answering our second research question: What contingencies lead developers to choose OSS projects in diverse rather than overlapping knowledge domains?

We proceed by examining the role of a theoretical construct, knowledge domain overlap, in driving an OSS developer to join an OSS project. We define knowledge domain overlap as the degree of commonality between the knowledge domain of an OSS project and that of a developer’s prior OSS projects. We explain how this construct plays an important role in understanding knowledge-based innovation in open communities. Then we explain how two contingencies can moderate the role of knowledge domain overlap in driving developer choice of OSS projects: the social capital of an OSS project and the amount of a developer’s experience in the OSS community.

1 We use the terms expertise and knowledge synonymously.
2 Recent work (Hahn et al. 2008) includes the match between projects in four dimensions, as control variables in predicting whether a developer joins a newly launched OSS project.

3 Exploitation refers to an individual’s application of acquired knowledge to somewhat familiar tasks in order to better assimilate concepts and skills learned, whereas exploration refers to her tendency to diversify her learning to far more varied domains (Gupta et al. 2006).

We test our model using a novel longitudinal data set spanning a period of two years in which 538 joining decisions were made by 462 developers on SourceForge.Net, the largest OSS community web portal. We sample only experienced (cross-participating) developers, that is, those who have previously worked on at least one project on SourceForge and subsequently join an OSS project (with the latter joining decision included in our sampling frame). We test our hypotheses using a logistic regression model estimated via the RELOGIT (Rare Event Logit) procedure, which explicitly accounts for the rarity of new cross-participation occurrences as a percentage of all possible events, typical of large communities. The low occurrence of cross-participation in fact adds to the importance of understanding what factors drive it.

The rest of the paper is organized as follows. Section 2 elaborates the theoretical framework and our hypotheses. Section 3 describes the data, sample construction, the empirical model, and measures used in the estimations. Section 4 presents the results. Section 5 contains a discussion of the results and the implications. Section 6 concludes with some of the limitations of this work and some possible areas for future research.

THEORY

While a large number of recent studies spanning disciplines examine OSS networks (Aksulu & Wade 2010), research on the behavioral mechanisms of how OSS networks and communities grow and evolve is still emerging. Many studies have examined developers’ motivation to contribute software code to OSS communities (Crowston & Scozzi 2002; Hars & Ou 2002; Lakhani & von Hippel 2002; Roberts et al. 2006; Stewart & Gosain 2006; von Krogh et al. 2003). This work has identified benefits such as “scratching a personal itch” (Raymond 2001), more formally termed intrinsic motivations (Hars & Ou 2002), while others have focused on how these motivations are enhanced or subdued by various external incentives such as career benefits and monetary incentives (Roberts et al. 2006). In the more general context of an online professional community, Wasko and Faraj (2005) find that willing contributors are those who seek to enhance their professional reputations, have relevant experience and are structurally related to others in the social network. Another important stream of research provides insights into the process of knowledge exchange and learning in OSS projects (Bagozzi & Dholakia 2006; Lee & Cole 2003) and antecedents of OSS project success (Fershtman & Gandal 2004; Grewal et al. 2006; Singh et al.; Stewart et al. 2006; Weber 2004). Recent work also finds that developers decide to join newly launched OSS projects based on social connections arising from past collaboration with the members of the newly launched projects (Hahn et al. 2008). Thus, with the exception of a small body of recent literature that explains the cross-participating behavior of developers, most literature does not sufficiently elaborate on these mechanisms, which are central to the growth of an OSS community.

Developer Cross-Participation and Knowledge-Based Innovation

OSS projects are highly knowledge intensive (Mathiassen & Pourkomeylian 2003; Tiwana 2002), and are typically organized as an implicit or explicit hierarchy of developers (Crowston & Howison 2006; Jensen & Scacchi 2005; Long 2006; Long & Siau 2006). Excepting the most popular projects such as Linux, Apache and a few others, it is most often difficult for OSS projects to attract the attention of developers to contribute or join their teams (Choi et al. 2010). In a typical OSS hierarchy, a core-developer team makes key decisions such as when new code should be released, what new user-requested features it should incorporate, and how to fix newly discovered bugs. The core-team excludes peripheral developers (Setia et al. 2010), who also contribute ‘patches’ to plug software bugs or to add new features, but whose contributions have
to be scrutinized and approved by core-developers (Herraiz et al. 2006). By virtue of making multiple contributions over time, peripheral developers become candidates for inclusion in the core-team and may eventually be invited to join it (Jensen & Scacchi 2005; Qureshi & Fang 2011). However, the motivations that drive developers to join a single project also lead them to join more than one project simultaneously or sequentially. Figure 1a illustrates a social network of projects resulting from developer cross-participation. As illustrated in Figure 1b (a one-mode projection of the network in Figure 1a (Wasserman et al. 1994)), developer D1, already a member of projects P1 and P4, may subsequently join one of the other remaining OSS projects P2, P3 or P5.

[Figure 1: two panels, (a) and (b), showing a unipartite project-project network and a bipartite developer-project affiliation network over projects P1 to P5 and developers D1 to D3. Only a sample of developers and projects is shown.]
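The relationship between the two networks in Figure 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: D1's membership of P1 and P4 comes from the text, while every other affiliation, and the function names `project_network` and `candidate_projects`, are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical affiliation data: only D1's membership of P1 and P4 is taken
# from the text; every other membership is invented for illustration.
AFFILIATIONS = {
    "D1": {"P1", "P4"},
    "D2": {"P2", "P4"},
    "D3": {"P3", "P5"},
}


def project_network(affiliations):
    """One-mode projection: weight each project pair by its shared developers."""
    shared = defaultdict(int)
    for projects in affiliations.values():
        for a, b in combinations(sorted(projects), 2):
            shared[(a, b)] += 1
    return dict(shared)


def candidate_projects(developer, affiliations):
    """Projects the developer has not yet joined (possible cross-participation)."""
    all_projects = set().union(*affiliations.values())
    return sorted(all_projects - affiliations[developer])
```

With these assumed affiliations, `candidate_projects("D1", AFFILIATIONS)` lists P2, P3 and P5, the projects the text names as D1's remaining options.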

Research has demonstrated that participation as a core developer in OSS projects provides ‘private benefits’ such as learning and reputation (Hars & Ou 2002; von Hippel & von Krogh 2003). For instance, Lee and Cole (2003) explain how knowledge creation benefits participants due to the iterative cycle of code generation, followed by error identification, detection, and even rejection of contributions in an open innovation community. Lakhani and von Hippel (2002) examine knowledge sharing among users of the Apache software forum and find that almost all information providers realized direct learning benefits. Kuk (2006) identifies three types of knowledge-sharing activities including sharing of unique experience, recombining old code with new to address bugs and new feature requests.

We propose that such benefits could be significantly enhanced by developer cross-participation across multiple projects. Specifically, as developers participate in new projects4, it is not uncommon for them to reuse code that they may have previously collaborated on as developers on other projects (Banker & Kauffman 1991; Desouza et al. 2006; Haefliger et al. 2008; Newell et al. 2006)5. Because the reuse of software does not simply involve copying and pasting existing code, cross-participation provides a means for the developer to creatively apply her acquired knowledge. Re-using code requires the developer to develop and apply highly contextual knowledge of the functional needs of both the recipient and the sourcing OSS projects (Desouza et al. 2006; Long & Siau 2006; Méndez-Durón & García 2009). Thus software reuse is a partly tacit activity and, in addition to being re-used, knowledge is also re-created by cross-participating developers (Nonaka & Konno 1998).

4 In this paper, ‘new’ projects refer to OSS projects in which the developers have not participated before, and not projects which are newly launched on the community.
5 A recent report finds the reuse level for open source software at 50% for a small sample of projects, which can be considered to be high (Sophie 2005).

Knowledge-Domain Overlap

Individuals have an inherent tendency to engage in path-dependent learning (Ellis 1965), whereby their prior knowledge and expertise influence their choice of the type of knowledge to acquire in future. The importance of path dependence is often highlighted in literature on sociotechnical change (Nelson & Winter 1982; Rosenberg 1976). The speed and accuracy with which new knowledge is absorbed is greatest when learners (individuals or firms) have a related knowledge base built up from prior learning episodes. Therefore, a prior knowledge base provides an incentive to an individual to apply it to related knowledge domains and also to engage in acquisition of related knowledge (Wasko & Faraj 2005). At the firm level, path dependent learning is related to the notion of absorptive capacity (Cohen & Levinthal 1990), which is “cumulative and contributes to expectation formation” (pg. 137). As noted by Levinthal and March (1993), organizations’ “preferences for particular technologies develop in tandem with competences at them” (pg. 100). Therefore, absorptive capacity becomes both a rational behavior and a type of inertia. Absorptive capacity may lead an individual (or a firm) to begin to aspire for more knowledge in the same field, merely due to prior exposure, and thus is a form of self-reinforcing behavior. In the OSS context, whereas development effort on a specific project grounds a developer in the mental models of the specific knowledge domain as characterized by the software’s functionality, user-requirements, operating systems and programming languages (Kraut & Streeter 1995; Mayer 1981), the broader OSS community is a venue for developers to apply their expertise to other projects. The community is a source of new knowledge built upon their current expertise. This potential for knowledge creation, acquisition, and transfer is therefore a powerful incentive for developers to contribute to multiple projects. Knowledge reuse is an accepted behavior and is prevalent not only in innovation contexts (Majchrzak et al. 2004; Markus 2001) and in software development (Banker & Kauffman 1991; Sodhi & Sodhi 1999) but also in OSS development (Haefliger et al. 2008; Von Krogh et al. 2005), where relative newcomers often make their early contributions by reusing software code from other projects (Haefliger et al. 2008; Sophie 2005; Von Krogh et al. 2005; von Krogh et al. 2003). Such reuse also extends to software methods, algorithms, and software code.
Therefore, while a developer’s past projects are indicative of the knowledge domains in which the developer has acquired expertise (Adelson & Soloway 1985), we argue that future choices of OSS projects are driven by the need to speedily augment their expertise and increase the quality of their contributions, both of which result in ‘private gains’ (von Hippel & von Krogh 2003) such as greater reputation or career prospects (Roberts et al. 2006). Haefliger, von Krogh and Spaeth (2008) find that OSS developers work under severe time and skill constraints, and that efficiency becomes important because of the self-inflicted pressure to release a working product. Indeed, Fleming and Waguespack (2007) note the higher levels of technical contribution by developers as a characteristic of open source community leaders. These empirical findings suggest that developers would derive the greatest incremental benefits in terms of efficiency or reputation by contributing to other OSS projects which are most similar in knowledge domains to their prior projects.

Previous research has identified two important dimensions of the knowledge domains relevant to software development: application domain knowledge and technical domain knowledge (Lee 2005; Tiwana 2004; Vitalari 1985). Application domain knowledge refers to the tacit and explicit understanding of what users need from the software system. Participating in open source software projects permits developers to acquire a better understanding of the innate functional needs of the users (Crowston & Scozzi 2002). Technical domain knowledge refers to the knowledge and skills involved in manipulating technology components and the technical tasks of programming (Weinberg 1971), revealed by the programming language and operating system expertise of the developer. Extending functionality or debugging the software may involve either creating small software ‘patches’ or even overhauling the architecture of the software. These complex tasks can be a significant impediment if a core-team lacks the niche expertise specific to the knowledge domain of the project.
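To make the construct concrete, the overlap between a target project's domain and the combined domains of a developer's prior projects could be scored as a set similarity. The sketch below uses the Jaccard index, which is an assumption chosen for illustration and is not taken from this passage; the function name `domain_overlap` and the example labels are likewise hypothetical.

```python
def domain_overlap(target_domains, prior_project_domains):
    """Jaccard overlap between a target project's domain labels and the
    combined domain labels of a developer's prior projects.

    Assumption: domains are represented as sets of labels (for example,
    programming languages and operating systems for the technical domain).
    """
    target = set(target_domains)
    combined = set().union(*prior_project_domains)
    union = target | combined
    if not union:
        return 0.0  # no labels at all: treat as no overlap
    return len(target & combined) / len(union)


# Hypothetical technical-domain labels for one target and two prior projects.
technical = domain_overlap({"C", "Linux"}, [{"C", "Windows"}, {"Python", "Linux"}])

# Hypothetical application-domain labels with nothing in common.
application = domain_overlap({"Games"}, [{"Databases"}, {"Networking"}])
```

Computing the score separately for the technical and application dimensions, as above, keeps the two kinds of overlap distinguishable.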

The greater the similarity between the application and technical domains of any two OSS projects, the more likely it is that the software development expertise required to extend the functionality of the software, or to understand user requests for debugging and adding new features to one project, is also similar to the other. Thus selecting a similar OSS project can enhance developers’ benefit by enabling them to increase their expertise in that knowledge domain, to reduce contribution effort, to increase overall level of contribution, and therefore, to enhance their reputation. Hence, we argue that, given the freedom to voluntarily contribute to any OSS project, developers are likely to select those which have greater knowledge domain overlap with their past projects.

Hypothesis 1: The likelihood that a developer will join an OSS project will be greater when there is a higher knowledge domain overlap of the target OSS project with a developer’s prior projects, than when there is a lower knowledge domain overlap.

Social Capital of OSS Projects

An OSS project in the project level social network (Figure 1a) is highly central if it shares developers with many other projects. Conversely, it is completely isolated if none of its developers participate in any other project within the community. Thus OSS project centrality, defined in terms of the number of developer linkages with projects, captures the level of embeddedness in the network with projects as its nodes. Centrality is a form of social capital, which represents a set of actual or potential resources linked to the set of durable network ties with other projects (Bourdieu 1986). In organizational networks, firm centrality is associated with reputation (Podolny 1993), access to deep knowledge (Hagedoorn & Duysters 2002) and novel information (Burt 2000; Zaheer & Zaheer 1997). Whereas most of the literature on organizational networks illustrates the benefits to actors from greater centrality, recent literature (Barabasi & Albert 1999; Powell et al. 2005; Rosenkopf & Padula 2008) goes one step further to explain how actors in a social network (drawn by the potential for greater visibility and access to resources) can often display a preference for linking with centrally connected actors.

In the OSS community, centrally located projects gain speedier access to software development knowledge (for example, software algorithms, methods, and so forth (Crowston & Scozzi 2002; Méndez-Durón & García 2009)) from linked OSS projects. Greater centrality of an OSS project can help accelerate the diffusion of its code and related software knowledge to other linked OSS projects, with the common developers acting as conduits (Méndez-Durón & García 2009; Singh 2010). Conversely, by virtue of being linked to many other projects, developers in central projects are able to search more efficiently for software knowledge in the broader community. Thus, greater centrality of an OSS project endows its developers with greater visibility and popularity in the OSS community, and with access to more resources.

At least two types of signaling benefits are enjoyed by central projects. First, Fershtman and Gandal (2008) found that projects that were relatively isolated within the developer community were less popular among users than their more-connected counterparts; and, in a study of OSS video game developers, Grewal et al. (2006) found the embeddedness of an OSS project also contributed to the number of user downloads of the software. Second, community norms oblige developers who borrow source code from another OSS project’s repository to acknowledge and express solidarity to that project (Bergquist & Ljungberg 2001). Reuse is an implicit but significant signal that validates the quality of the borrowed code. The goodwill and reciprocity engendered by a central project make it a more credible source of software code. Both these outcomes enjoyed by structurally central OSS projects, user downloads and code reuse, lead to an increase in the project’s reputation.
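The notion of project centrality used in this section, a project being central when it shares developers with many other projects and isolated when it shares none, can be sketched as a simple degree count. This is a minimal illustration, not the study's operational measure: counting linked projects (rather than, say, shared developers) is one reading of the definition, and the affiliation data and the name `project_degree_centrality` are hypothetical.

```python
def project_degree_centrality(affiliations):
    """Count, for each project, the other projects it shares a developer with."""
    neighbours = {}
    for projects in affiliations.values():
        for project in projects:
            # Every other project of the same developer is a linked project.
            neighbours.setdefault(project, set()).update(projects - {project})
    return {project: len(linked) for project, linked in neighbours.items()}


# Hypothetical affiliations (developer -> set of projects), for illustration only.
affiliations = {
    "D1": {"P1", "P4"},
    "D2": {"P2", "P4"},
    "D3": {"P3"},
}
centrality = project_degree_centrality(affiliations)
```

In this toy data P4 is linked to two other projects and is the most central, while P3 shares no developers and is isolated in the sense described above.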

While prior studies have suggested that developers value the ability to contribute to highly visible projects (Lee & Cole 2003; Roberts et al. 2006), these studies have typically restricted their attention to single isolated projects. We advance this argument by proposing that the benefits to a developer from joining an OSS project can be significantly enhanced if it is central in the community network. An important benefit of social capital is its convertibility or transformation into ‘economic capital’ (Bourdieu 1986), such as the reputation and improved employability that OSS developers enjoy (Roberts et al. 2006). Social capital is collectively owned by the core team and does not erode very easily over time despite changes in the core-team membership. The entire OSS code repository, which records developers’ contributions and the developmental activity, preserves the accumulated social capital of OSS projects (Méndez-Durón & García 2009). Therefore, we propose below that centrality is a key criterion for developers to evaluate the learning and reputation benefits, and hence the attractiveness, of an OSS project.

Hypothesis 2: The likelihood that a developer will join a target OSS project will be greater when the project is more central in the community network than when the project is less central.

Moderating Effects: Knowledge-Domain Overlap and Project Social Capital

An OSS developer may have different motivations to contribute to a project (Roberts et al. 2006), but we argue that these motivations may not necessarily be complementary. On the one hand, the high knowledge-domain overlap of certain target projects may be attractive to developers because of the reasons discussed previously, but on the other hand, central projects are attractive to the developer because of potential benefits of reputation enhancement and access to better quality of knowledge. Raymond (2001) argues that the society of open source hackers is
a ‘gift culture’, where social status is determined not by what is controlled, but what is given away. Since software is freely shared in the open source domain, this abundance implies that reputation remains the sole valid measure of success for OSS developers. Lerner and Tirole (2005) suggest that the potential for reputation-signaling leads programmers to work on OSS projects that have a larger audience or which can attract a large number of other programmers. Projects with high social capital also provide learning opportunities to developers, outside of applying or reusing their previously learnt knowledge or honing their current expertise. These projects often may have large amounts of rigorously tested, pre-written code and may become attractive to developers for their potential in future code development or reuse. Further, by virtue of their links with several other projects, they also provide accelerated learning opportunities to a developer in new areas of knowledge. Thus, for a developer, the attractiveness of joining an OSS project with high knowledge-overlap may be outweighed by the advantages of joining a highly central project in the OSS project network. Therefore, we expect that a highly central project should be able to draw OSS developers out of their comfort zone of their current expertise and entice them to contribute to the project even if it lies in a more diverse knowledge domain.

Hypothesis 3: The likelihood that a developer joins an OSS project based on high knowledge domain overlap with prior projects decreases with the social capital of the project in the community level social network.

Moderating Effect: Knowledge-Domain Overlap and Developer Experience

The OSS community can be viewed not only as a collective of experts, but also as a community of learners (Hars & Ou 2002). Therefore, apart from displaying a tendency to reapply their existing knowledge (Ericsson et al. 2007; Weinberg 1971), learners typically also have a
longer-term preference for broadening their expertise within a community. In particular, we suggest that OSS developers’ desire for greater monetary gains and career prospects (Roberts et al. 2006; von Hippel & von Krogh 2003) should attract them to projects that offer them opportunities to broaden their expertise. The preference of career individuals to widen their expertise is also recognized in literature on how IT professionals navigate their sequence of jobs (Arthur et al. 2005; Bidwell & Briscoe 2010) and select new jobs that involve learning a different skill beyond their current expertise (O'Mahony & Bechky 2006). These contexts are similar to the OSS community because, first, there is no structured career progression that restricts their choices within a single organization’s hierarchy; and second, just as OSS developers are, career individuals are also motivated learners who seek to expand opportunities by acquiring additional skills through contingent/project-based work (Defillippi & Arthur 1994) to deepen their current technical or application expertise.

In the OSS community, greater experience gained by participating in multiple projects increases the means as well as opportunities available to developers to broaden their skill set. Participation involves searching for reusable software code created by other projects and soliciting information from the broader community (Faraj & Sproull 2000). This keeps the developers aware of complementary and competing software products being created by other OSS projects within the community (Crowston & Scozzi 2002). Therefore, development experience on multiple OSS projects allows developers to access richer information about more diverse opportunities and trends they can pursue to broaden their software expertise.

Combined with the opportunities for pursuing diverse OSS projects, greater experience also generates a preference for them because typically, individual learning is cyclical and alternates between exploitation and then exploration. The cyclical trajectory helps learners to
overcome the cognitive constraints of engaging simultaneously in exploration and exploitation. In the organizational learning literature, the notion of punctuated equilibrium refers to the alternating cycles of exploration and exploitation that organizations prefer over the more resource-intensive strategy of simultaneously engaging in both (Annique 2007; Gupta et al. 2006). Indeed career progressions within organizations are also designed in order to help employees overcome cognitive entrenchment (Dane 2010), which limits organizations to incremental innovations. Hence, initial stages of developer contribution within an OSS community may be characterized by developing a particular expertise leading them to seek projects with greater knowledge overlap. Then, as the OSS developers gain experience, they may be less attracted by the knowledge domain overlap of target projects than by the potential of target projects to help them tap into more diverse knowledge domains (Hansen et al. 2005) and thus widen their skill set. In the organizational literature on career stretchwork, individuals are typically known to follow a strategy to widen their skills by seeking more diverse projects, but with a certain overlap with their existing skills (O'Mahony & Bechky 2006). This implies that a developer’s amount of prior experience in the OSS community should moderate the impact of her knowledge domain overlap with the target project. Analogously, we propose that if a developer has greater experience within the community then she is more attracted to OSS projects that need more diverse expertise (lower knowledge domain overlap), than if she had lesser experience. Hypothesis 4: The likelihood that a developer joins an OSS project based on high knowledge domain overlap with prior projects decreases with the cumulative experience of that developer in the OSS community.


METHOD

Research Setting and Model

The process of joining the core team is one of the most important steps in the OSS development process, which can be (metaphorically) visualized using the ‘onion’ model (Crowston & Howison 2006; Herraiz et al. 2006; Long 2006; Long & Siau 2006). This model segregates contributors of a specific OSS project concentrically into inner and outer “layers”, with the inner core of the project occupied by core team members. We focus on the project participation efforts of core developers alone for two reasons. First, identification of the core developers in a project is straightforward, as they are specifically listed on the project web-page, whereas there may be several unidentified users. Second, the listing of an individual as a developer signals a more significant and impactful commitment to contribute to the project formally, as opposed to the contributions made by peripheral developers on an ad-hoc basis. An important transition point for the developer is when she achieves a level of commitment that is higher than some unknown, latent threshold, which is revealed by her joining a particular project [6]. The decision of a developer to achieve such a level of commitment is the central focus of our work and the primary dependent variable in our model.

[6] The fact that most requests from non-core developers to join a core team are seldom turned down was verified by Hahn et al. (2008) in their survey of project administrators (see footnote 2). This supports our assumption that a developer’s unilateral decision to contribute substantially to a project is represented with reasonable accuracy by her inclusion as a core member, which is our dependent variable.

We specify the probability that developer i joins project j as:

$$\Pr\left(y_{ij} = 1 \mid x = (x_j, x_{ij})\right) = \frac{e^{x_j \beta_1 + x_{ij} \beta_2}}{1 + e^{x_j \beta_1 + x_{ij} \beta_2}} \qquad (1)$$

where $x_j$ and $x_{ij}$ are vectors of covariate values characterizing project j (e.g., its prominence) and developer i’s relationship with j (e.g., knowledge domain overlap), respectively; $\beta_1$ and $\beta_2$ are the two vectors of parameters to be estimated; and $y_{ij}$ takes the value 1 if developer i joins project j, and 0 otherwise.
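As a minimal illustration of equation (1), the sketch below evaluates the joining probability for one developer-project pair. The function name, the covariate values and the coefficients are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math


def join_probability(x_project, x_dyad, beta1, beta2):
    """Logit probability of equation (1): Pr(y_ij = 1 | x_j, x_ij)."""
    # Linear index: x_j * beta_1 + x_ij * beta_2
    index = sum(x * b for x, b in zip(x_project, beta1))
    index += sum(x * b for x, b in zip(x_dyad, beta2))
    # Numerically stable evaluation of e^index / (1 + e^index)
    if index >= 0:
        return 1.0 / (1.0 + math.exp(-index))
    z = math.exp(index)
    return z / (1.0 + z)


# Hypothetical values (assumptions, not results from the paper):
# one project-level covariate and two developer-project covariates.
p = join_probability(
    x_project=[0.5],
    x_dyad=[1.0, 2.0],
    beta1=[0.2],
    beta2=[0.3, -0.1],
)
print(round(p, 4))
```

The two branches compute the same logistic function; the split only avoids overflow when the linear index is a large negative or positive number.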

As discussed in greater detail in the Data section, and as is typical of any large social network, the number of developer-project linking (i.e., joining) events in our data is far smaller than the number of times that developers choose not to join a project. Previous research has termed binary dependent variables with very few 1’s relative to the number of 0’s as rare events, and has highlighted the importance of taking special care in the statistical analysis (King & Zeng 2001; Manski & Lerman 1977). Because the average joining probability in our data is about 0.002, a level similar to those reported in past rare events studies (Hahn et al. 2008; Singh 2005), we adapt our statistical analysis appropriately.

The first source of difficulty in analyzing rare events lies in data collection. Previous research recommends that the estimation data be collected via a choice-based sampling procedure, and not simple random sampling, so as to ensure that the maximum number of “informative” 1’s (i.e., joining events) is included. Further, these ones must be supplemented by a random sample of the zeroes drawn from the population of all available zeroes. We implement this recommended choice-based sampling procedure to derive our estimation sample.

Because the choice-based sampling procedure stratifies the sample based on the dependent variable, direct maximization of the sample likelihood based on equation (1) will result in biased estimates of model parameters. Therefore, appropriate statistical corrections are required in order to yield coefficients that are consistent and efficient (King & Zeng 2001; Singh 2005). The first correction involves redefining the log-likelihood to compensate for differences in the sample and population fractions of ones used by choice-based sampling. Specifically, the probability of joining is weighted by the true proportion of 1’s in the original data while the probability of not


joining is weighted by the true proportion of 0’s (Manski & Lerman 1977). The second correction adjusts the weighted maximum likelihood estimates to correct for their underestimation of the probability Pr(Y=1) when events are rare. King and Zeng (2001) discuss these issues in greater detail and develop the Rare Events Logistic Regression, or RELOGIT, procedure that yields unbiased estimates in the presence of rare events data. We use the STATA implementation of RELOGIT (Tomz et al. 1999) to estimate the parameters of the model.

Data

Open source development activity is largely supported by the portal SourceForge, which provides a more representative sample than other smaller sites such as Freshmeat, Rubyforge, or ObjectWeb. For our empirical analyses we used monthly data on all open source projects hosted on SourceForge.net and archived at the University of Notre Dame (Madey 2006). Detailed search queries were used to extract and process data detailing the characteristics of the projects (for example, when created, project size, topic, and intended audience), core developer team members, as well as project activity statistics (for example, downloads, commits, etc.). This data was extracted, and variables computed, for each developer’s current or previous OSS projects.

Sample Construction

We used three criteria in selecting core developers. First, we ensured that all developers in our analysis belonged to the same cohort, so as to control for environmental influences that might impact their choice of new projects. Hence, we selected developers who newly registered on the SourceForge site between November 1, 2004 and June 30, 2005. Second, to ensure that our sample only included ‘serious’ developers, as distinct from the large number of amateur lurkers on most online websites, we chose those developers who had participated in at least one project hosted at SourceForge.net. Hence, we ensure that the developers selected in the first step were


active in at least one project between November 1, 2004 and November 1, 2005, which we term the ‘initialization period’ for our sample. Finally, the project choices made by the shortlisted developers between December 2005 and November 2006, the ‘estimation period’, were used to calibrate the parameters of our empirical model [7]. It is possible that a developer does not simply ‘join’ a project, but may actually initiate it. As the available data does not enable us to distinguish between project user roles, we adopt a conservative approach by considering only those projects with a group size of more than one developer in the month prior to the joining month. This ensures that our sample has no ‘project initiators/owners’. Only projects that SourceForge.net classifies as ‘active’ in both the joining month and the month prior to the joining event are deemed ‘joinable’.

Dependent Variable: The outcome variable of interest is the joining event, which is coded as ‘1’ if developer i joins project j in month t, and ‘0’ otherwise. Using the previously discussed selection criteria, we identified 538 joining decisions made by 462 unique developers [8]. The month/year combination in which a developer chooses a new project is termed the joining date. The data revealed that, during the calibration period, the 462 developers in our sample had an average choice set size of 26231 projects. Our choice-based sample included all 538 observations in which a particular project was actually chosen (i.e., $y_{ij} = 1$), and each of these was matched with 25 randomly selected projects from the total number of projects that the developer could have potentially joined that month, yielding a calibration sample of 538 actual joining events and 13448 non-events [9].

[7] Prior to selecting the focal project, seventy percent of the developers worked in one prior project in the initialization period and ninety-three percent worked in at most three projects.

[8] These developers chose 472 unique projects.

[9] We also created separate datasets by varying the number of randomly sampled projects. Specifically, we created four separate datasets each with 5, 10, 15 and 20 randomly selected projects for each genuine developer-project affiliation event. Analyses on each of these four datasets are not reported in this paper. Since, ideally, it is preferable to have the entire population of non-events in the sample, having 25 random non-events for each actual event is preferred over just having 5. As expected, the pattern of results remained stable as we decreased the event to non-event ratio from 1:20 to 1:25. Thus, the results using the dataset with a 1:25 event to non-event ratio can be considered to be robust and stable.

Independent Variables

Knowledge Domain Overlap: We capture the extent of knowledge domain overlap between the target project and a developer’s prior project(s) in two distinct dimensions: application domain overlap and technical domain overlap. These were computed by parsing the descriptions of the OSS projects in terms of four attributes: intended audience, project topic, programming languages and operating systems.

Application Domain Overlap: We capture the extent of a match between the application domains (APPREL) of the target project and all the developer’s past projects on SourceForge using the sum of two match variables: i) the match in the topic of the target project with those of the developer’s past project(s) (M_TOPIC), and ii) the match in the type of intended audience of the focal project with those of the developer’s past project(s) (M_AUD). A developer team classifies the topics and intended audiences of its project based on a comprehensive list of such categories provided by SourceForge, which is selectively expanded at infrequent intervals to include new leading-edge topics. For example, if a target project is characterized in terms of two topics, Artificial Intelligence and Astronomy, and if the focal developer’s past projects also are described in terms of these topics, then M_TOPIC = ‘2’ for that developer. Similarly, M_AUD is the number of audience types in the target project that the developer’s past projects pertain to. Typically, the list of audience types is pre-specified by SourceForge and a developer team labels the intended software users as belonging to one or more of these categories.

Technical Domain Overlap: We measure the technical domain overlap (TECHREL) of the target project with a developer’s past projects as the sum of two match variables: i) the match in the type of programming language of the focal project and the past project(s) (M_PROG), where M_PROG is the number of programming languages specified in the target project’s description which overlap with those in the description of the focal developer’s past projects; and ii) the match in the type of operating system of the focal project and the past project(s) (M_OS), where M_OS is the number of operating systems in the target project that the developer has worked on in one or more past projects. Again, a project may use one or more programming languages and one or more operating systems. All the ‘match’ variables are computed as of the month prior to the joining date of the focal project.

Project social capital (PDEG): We assess the social capital of the focal project by a degree centrality measure (Wasserman et al. 1994), measured as the number of other projects with which the target project shares one or more developers in the month prior to the joining date.

Prior Developer Experience (PRIOREX): This is measured by the total number of past projects on SourceForge in which the developer has been a core member. Because the model estimates developer-fixed effects, we cannot test (or hypothesize) main effects for PRIOREX, and it can only be used by interacting it with other variables.

To test H3, we created two additional variables, APPDEG and TECHDEG, that capture the interactions between PDEG and APPREL and between PDEG and TECHREL, respectively. To test H4, we created two additional interactions, APPEX and TECHEX, by interacting PRIOREX with APPREL and TECHREL, respectively.

Control Variables

We also included the following control variables in our analysis.
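Before turning to the individual controls, the overlap and centrality measures defined under Independent Variables can be sketched in code. This is a minimal sketch only: the dictionary layout, the function names and the toy project data are assumptions made for illustration, not the extraction pipeline actually used on the SourceForge archive.

```python
def match_count(target_values, past_projects_values):
    """Count target-project attribute values that also appear in any past project."""
    past_union = set().union(*past_projects_values)
    return len(set(target_values) & past_union)


def overlap_measures(target, past_projects):
    """Return (APPREL, TECHREL) as sums of the four match variables."""
    m = {
        key: match_count(target[key], [p[key] for p in past_projects])
        for key in ("topic", "audience", "language", "os")
    }
    apprel = m["topic"] + m["audience"]   # M_TOPIC + M_AUD
    techrel = m["language"] + m["os"]     # M_PROG + M_OS
    return apprel, techrel


def project_degree(target_id, membership):
    """PDEG: number of other projects sharing at least one developer with the target."""
    target_devs = membership[target_id]
    return sum(
        1
        for pid, devs in membership.items()
        if pid != target_id and target_devs & devs
    )


# Hypothetical target project and two hypothetical past projects of one developer.
target = {
    "topic": {"Artificial Intelligence", "Astronomy"},
    "audience": {"Developers", "End Users/Desktop"},
    "language": {"Python", "C"},
    "os": {"Linux"},
}
past_projects = [
    {"topic": {"Artificial Intelligence"}, "audience": {"Developers"},
     "language": {"C"}, "os": {"Linux", "Windows"}},
    {"topic": {"Astronomy", "Games"}, "audience": {"Developers"},
     "language": {"Java"}, "os": {"Windows"}},
]
# Hypothetical project-to-developer membership in the month before the joining date.
membership = {
    "target": {"d1", "d2"},
    "a": {"d2", "d3"},
    "b": {"d4"},
    "c": {"d1"},
}

apprel, techrel = overlap_measures(target, past_projects)
pdeg = project_degree("target", membership)
print(apprel, techrel, pdeg)
```

In this toy case both target topics appear among the developer's past projects, so M_TOPIC is 2, mirroring the Artificial Intelligence and Astronomy example in the text. Interaction terms such as TECHEX would then be products of the (standardized) component variables.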


Project license: Prior literature (Kogut & Metiu 2001; Lerner & Tirole 2002a; Stewart et al. 2006) has posited that project choice may be impacted by the type of license each project is registered under. The copyleft provisions of the GNU General Public License (GPL) and related licenses mandate that derivative code be released under the same copyleft license as the original code. This feature ensures that all GPL code remains free. Licenses that mandate a copyleft provision are termed “restrictive” in our empirical analysis. Permissive licenses, on the other hand, are the least restrictive, permitting commercial modification and redistribution of licensed source code with no royalties to the originator of the software. We control for the type of license by including two dummy variables (RESTRICTIVE and HYBRID) following Lerner and Tirole (2002a). Out of the 472 unique projects, 278 had a purely restrictive license, 10 had a hybrid-only license structure and 184 had a permissive license (RESTRICTIVE = 0 and HYBRID = 0).

Project age (AGE): Projects at an early stage of development may present developers more opportunities to contribute substantively to code generation than do mature and stable projects. Therefore, we control for this effect by using the length of time that the project has been available for community participation as a proxy for its maturity. We measure AGE as the number of months from when the project was first registered on SourceForge.net to the month prior to the joining event month. We also include a squared term (AGESQ).

Project size (SIZE): To account for the possibility that a project may need a critical mass of contributions before it can attract developers, we include the size of the core team in the month prior to the developer’s joining date.

Project activity: The higher the activity associated with a project, the greater its reputational benefits and the more developers are likely to be attracted to it (Raymond 2001). Three indicators of project-level activity were included: 1) cumulative file releases (CUM_FILE), which is the total number of releases of the software on SourceForge; 2) the cumulative number of feature requests submitted by users on SourceForge (CUM_FEATURES); and 3) cumulative commits made by the developers of each project (CUM_COMMIT). A ‘commit’ is a record of the submission of new code created by a developer and its amalgamation with the most current ‘released’ version of the project’s code repository, all of which is stored in the versioning system used by the project members.

Project feature popularity: Projects with the most popular technological features, such as the most popular operating system or the most popular programming language, may receive more attention from OSS developers simply because there are likely to be a greater number of users/developers using them. We compute two variables, TOP5_TOPIC and TOP5_AUD, to capture the extent to which the target project addresses a highly popular application domain. TOP5_TOPIC is the total number of topics specified in the target project that are among the top five most popular topics across the entire SourceForge repository in the month prior to the joining event month (t). Analogously, TOP5_AUD is the total number of audience types specified in the target project that are among the top five most popular categories across the entire SourceForge repository in the month prior to t. For example, in November 2006, we determined that the following five topics out of 246 categories, System, Backup, Benchmark, Boot, and Medical Science Applications, and the following types of audiences out of 19 categories, Developers, End users/desktop, System Administrators and Advanced End-users, were the most popular on SourceForge. We use two variables to capture the popularity of the technical attributes of the target project: TOP5_PROG (programming language) and TOP5_OS (operating system). Each of these measures takes values ranging from 0 to 5, with 5 indicating a perfect match with the technical (or application) domain, and 0 indicating no match.
Project developer centrality (DEVDEG): We also control for average developer centrality of focal projects. The notion that success is achieved by being connected to


successful others underlies many types of human relationships ranging from job mobility to marital relationships (Hahn et al. 2008; Monge & Contractor 2003). By joining a project with well-connected developers, a focal developer can enjoy higher levels of exposure to popular developers, as well as learn from experienced developers who have deep and diverse knowledge due to their multiple linkages. DEVDEG is the average number of other projects that existing developers in a focal target project belong to in the month before the joining date.

Developer Past Ties (TIES): A project can also attract a developer if its existing members have a history of past collaboration on other OSS projects with that developer (Hahn et al. 2008). The developer is attracted by several benefits from past ties with core-team members of a target project. These could include the ability to develop higher quality code more quickly through reuse, greater efficiency in scrutinizing code contributions, and jointly developing effective design changes that can improve the overall software code architecture, all mostly by virtue of greater transactive memory shared with prior co-workers (Faraj & Sproull 2000; Hahn et al. 2008; Hinds et al. 2000; Robert et al. 2008; Wellman et al. 1996). If the target project has at least one member with whom the focal developer has previously worked as a core co-developer prior to the joining month, then TIES takes the value 1; otherwise it is 0.

Results

Table 1 reports the correlations and Table 2 reports the univariate statistics for all variables. The logistic model was estimated using the RELOGIT procedure after standardizing the main variables and then creating interaction terms. Table 3 presents the results of estimating a number of incremental models to examine the stability of the coefficient signs and significance of the estimates. The first model includes only the control variables.
We add the main effects in Models 3-5, and the interaction effects in Models 4 and 5. We discuss the results based on the


full model (Model 5), which tests all the hypotheses simultaneously. Here, we find that the coefficient of RESTRICTIVE is negative and significantly different from zero. Hence, projects with restrictive licenses have a significantly smaller likelihood of being chosen by a developer than projects with permissive licenses. The coefficient for AGE is negative and significantly different from zero. This suggests that older projects have a significantly lower likelihood of being joined than younger projects. However, the positive and significant coefficient of AGESQ implies that the effect of age is non-linear and that there is a minimum age threshold for project preference. The coefficient of SIZE is positive and significantly different from zero, implying that developers prefer to join larger projects over smaller ones. Only one of the three indicators of project activity level (CUM_COMMIT) remains significant (and negative) in the full model (Model 5). This indicates that experienced developers have a higher likelihood of joining projects in the early stages of development, when the project has not begun to see much activity in terms of feature requests or code commits by developers. Surprisingly, three of the TOP5_ control variables do not have coefficients that are significantly different from zero, while TOP5_AUD has a negative and significant coefficient. The coefficient of DEVDEG is positive and significantly different from zero in the full model. As expected, the coefficient for past ties (TIES) is positive and significantly different from zero (Hahn et al. 2008).

The coefficients for both the knowledge domain variables (TECHREL and APPREL) are positive and significantly different from zero. This indicates that developers have a higher likelihood of joining projects with higher application domain overlap and technical domain overlap, supporting H1.
These results are also supported by the descriptive statistics in Table 2, which reveal that the means of the two match variables (TECHREL and APPREL) for the selected projects (JOIN=1) are higher than those for the sample including all randomly


selected projects. The coefficient of project social capital (PDEG) is not significantly different from zero in Model 5. Thus, our results do not offer support for H2.

In terms of the interaction effects, we examine two moderating effects of developer experience, one between technical domain overlap and developer experience (TECHEX) and the other between application domain overlap and developer experience (APPEX). However, we only find support for TECHEX. This suggests that developers’ preference for projects from related technical domains is moderated by developer experience: the greater a developer’s experience within the open source community, the lower her tendency to join technically related projects. We also study the interaction terms between knowledge domain overlap and social capital (TECHDEG and APPDEG). Again, we find support for only one of the interaction terms (TECHDEG). This indicates that the likelihood of developers joining projects from related technical domains is moderated by the social capital of the target project: the higher the social capital of the target project, the lower the developers’ tendency to join technically related projects. The coefficients in Model 5 are robust and remain qualitatively unchanged even after dropping each of the interaction terms from the model in turn. Hence, we find support for H3 and H4 for technical domain overlap, but not for application domain overlap.

In summary, a developer is more likely to join an OSS project based on both technical and application domain overlap. However, while the importance of technical domain overlap in attracting an OSS developer is contingent on both our interaction variables (project social capital and prior developer experience), the importance of application domain overlap is not.
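The moderation pattern summarized above can be made explicit with a short worked expression. The coefficient symbols below are assumed notation introduced only for this illustration (they do not appear in the paper's tables), and the expression assumes that TECHEX and TECHDEG are simple products of the standardized component variables, as the Results section describes.

```latex
% Assumed notation: \beta_{T}, \beta_{TE}, \beta_{TD} stand for the Model 5 coefficients
% on TECHREL, TECHEX (= PRIOREX x TECHREL) and TECHDEG (= PDEG x TECHREL).
\frac{\partial \, \operatorname{logit} \Pr(y_{ij} = 1)}{\partial \, \mathrm{TECHREL}}
  = \beta_{T} + \beta_{TE} \, \mathrm{PRIOREX} + \beta_{TD} \, \mathrm{PDEG}
```

Under this reading, a positive main effect of TECHREL combined with negative interaction coefficients means that the pull of technical domain overlap on the log-odds of joining shrinks as either the developer's prior experience or the target project's degree centrality increases, which matches the direction of the effects reported for TECHEX and TECHDEG. No analogous adjustment applies to APPREL, since neither APPEX nor APPDEG is supported.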
DISCUSSION OF RESULTS

Past research has shown that the composition of a developer team and the ability to attract and retain a large number of committed and capable developers is vital to the success of OSS


communities (Raymond 2001). However, research on OSS communities often indicates a significant presence of small-sized projects, or projects with a single member (Hahn et al. 2008). In our own research, we find that new project choices may be termed “rare events” and that the typical size of an OSS project is 2.7, while the projects that did attract developers had a size of 12.9. To understand how some OSS project teams are able to successfully attract a significant volume of talent and commitment, it is useful to examine developer-project joining events, as in this study. Therefore, we develop and test an empirical model that considers several OSS project characteristics (including its network-level social capital), a developer-level attribute (prior experience) and two features of the developer-project ‘dyad’ (knowledge domain overlap and prior social ties) in influencing the likelihood that a developer joins a project.

To develop our theory, we start with a basic premise from the IS literature that it is easier and more beneficial for knowledge workers to contribute to related knowledge domains (Banker & Kauffman 1991; Garud & Kumaraswamy 2005; Newell et al. 2006). Therefore, we propose that developers are more likely to join OSS projects in related knowledge domains. Our results indeed suggest that developers join projects with a high degree of overlap in terms of their application and technical domains, even controlling for the presence of past project co-developers. This finding is somewhat consistent with another study (Cottam & Lumsdaine 2008), which shows that developers tend to form sub-communities around the same programming language. While this is a fairly intuitive finding, past work has almost never examined it because of the difficulty of finding an empirical setting where developers have the freedom to opt for projects of their choice.
However, the above constructs of knowledge domain overlap, project social capital and important control variables such as prior social ties, provide an incomplete picture of how open


innovation networks can change. By definition, ‘open’ innovation networks should offer a different set of incentives to community members compared to those offered within an intra-organizational network (Von Hippel 1988). One of the key contributions of this study is to provide insights into how open innovation networks inculcate “exploratory” tendencies and thus lead participants to seek diversity in terms of their expertise. Thus, our findings highlight that while knowledge domain overlap is an important construct that explains the joining behavior of experienced developers, project social capital and developer experience are equally important in leading developers to join more diverse projects.

There is extensive literature on the benefits of affiliation with highly central or prominent others in a social system; specifically, where a social system is viewed as a network, structural centrality has been equated with social capital and preferential access to network resources (Wasserman et al. 1994). Not only has higher centrality been found to contribute to OSS project success (Grewal et al. 2006), but some exploratory studies of the macro-level network structure of the SourceForge community have posited a preferential attachment mechanism to explain why a few nodes (developers or projects) have high centrality and a few projects have a large number of developers (Crowston & Howison 2004; Hyland-Wood et al. 2005; Myers 2003; Weiss et al. 2006; Xu et al. 2005). Therefore, we hypothesized that OSS developers may join projects with high social capital. While interesting implications are derived in a qualitative fashion from prior exploratory studies, our statistical analysis determines that central projects do not attract developers per se, but moderate the impact of knowledge domain overlap.
That is, while developers have a strong preference for selecting projects with high technical domain overlap, this preference is diminished when considering joining projects with high centrality.


We also find that as developers gain more experience in OSS projects, they are less likely to join technically related projects. Hence, while less experienced developers may be motivated by opportunities to deepen their existing technical knowledge base, more experienced developers may be motivated by opportunities to widen it. Further, OSS developers possess the trait of openness to experience (Griffin & Hesketh 2004), which manifests itself as they mature as community members. As individuals’ learning histories build up, their learning-related idiosyncrasies may lead to accidental discovery of novel choices (Witt 2008), since developers have unlimited access to the software repository of all OSS projects in the community. These findings echo the literature on individual cognition, which points to the propensity of innovators to seek risk and enjoy new experiences for the purpose of intellectual stimulation (McAlister & Pessemier 1982; Ratner 2006; Venkatraman & Price 1990); this study makes an additional contribution in finding that it is with cumulative experience that OSS developers in fact seek diverse technical knowledge.

Interestingly, while the prior experience of a developer and the prominence of the target project moderate the effect of technical domain overlap on project choice, they do not moderate the effect of application domain overlap on project choice. An important implication of these findings is that while programmers are open to relaxing their technical match requirements, they are not as flexible in relaxing the application domain match. Consistent with many of the intuitions in the information systems development literature, this finding suggests that moving away from the application domain is harder because much of the knowledge is unstructured, semantic and conceptual (i.e., it is “sticky”) (Adelson & Soloway 1985; Lee 2005; Vitalari 1985).
Such knowledge is harder to make explicit compared to the syntactic knowledge of programming and operating systems. Thus, our findings offer a more nuanced understanding of


how the type of knowledge domain overlap among software projects manifests in OSS developer contributions.

The understanding offered by this study about how inter-team knowledge networks can change due to developer affiliations goes beyond the explanations in the literature, the bulk of which focuses on intra-organizational (employee) networks (Soda et al. 2004). In particular, while there are widely recognized benefits of inter-team or inter-group interactions on team or group performance (Marrone 2010), there is scarcely any work on how teams can attract voluntary memberships. Even in an intra-organizational context, the problem of harnessing IS development expertise in multi-project organizations remains significant (Tiwana & Bush 2005; Tiwana & McLean 2005). In the broader context of virtual work (Ahuja et al. 2003; Suh et al. 2011), knowledge integration remains a key challenge.

We believe that theorizing about the behavioral mechanisms underlying developer cross-participation can lead to a deeper understanding of the evolution of a knowledge-based community’s networked structure. Research on open innovation communities refers to advantages in a network enjoyed by an invisible college of collaborators, termed the “main component” if it is the largest cluster in that network (Merton 1996). Management literature has begun to study this phenomenon in terms of the performance and survival advantages enjoyed by new firms after entering the main component (Khwaja et al. 2011; Rosenkopf & Padula 2008).
Our findings suggest that, to the extent developers prefer to join related projects (that is, those that have their past co-developers as members, or that have knowledge overlap with their past projects), and to the extent they have relatively inflexible preferences for an application domain match, they may contribute initially to some fragmentation (clustering) of the community by seeking to remain within their individual clusters, instead of forming ‘bridging’ ties with


other network clusters such as the main component. However, projects with less technical knowledge overlap become more attractive to developers as they gain more experience in the community, which suggests that over time, a developer’s particular network cluster could begin to coalesce with the main component of the network (which typically contains highly central projects). Thus, developers’ tendency to seek highly central projects over time leads to a better integration of an innovation community. Guided by these implications, the eventual community structure can also be tested in future work by devising holistic simulation models of how innovation communities can evolve, and with the use of massive longitudinal datasets.

LIMITATIONS AND FUTURE RESEARCH

The theoretical contributions of this study in terms of understanding developer behavior can form a basis for many future studies. First, though it is the core developers who spearhead the design of a software product, the selection of its technical platform, and its evolutionary trajectory, some attention has recently also been focused on peripheral developers and key users of open source software products (Setia et al. 2010). Future studies can perhaps draw a larger sample for analysis and develop a richer theory of the behavior of different types of participants. Second, in much of our theorizing we have alluded to the anticipated learning and performance benefits that can attract developers to OSS projects. However, the actual learning benefits to developers from their cross-participation are only a part of our theorizing and have not been measured, for example, by using a survey questionnaire or qualitatively. This is another avenue for future research, though any empirical method to capture this data could reduce the sample sizes, and thus the generalizability of the studies.
Third, while our analysis controls for project activity and user downloads, we do not study the impact of developer joining on these outcomes of OSS development. Future studies can attempt to delineate more accurately the contribution of each developer to their open source software projects, which would help build more nuanced theory. Fourth, future studies should also consider the interdependencies in software code repositories, which arise (German 2007) partly as developers cross-participate in many projects. Fifth, our theorizing is focused on the ‘endogenous network’ (Rosenkopf & Padula 2008), that is, we examine only the choices of those developers who are already connected within the community. A larger theory of open innovation communities could include both the entry of new developers and the rise of new projects, as in a recent study (Hahn et al. 2008) that examines the emergence of new projects as developers make participation choices. Sixth, while OSS projects attract developers with diverse skills more strongly if those developers are more experienced, this raises a complementary question of how sustained such participation is (Fang & Neufeld 2009) and whether OSS projects also face more turnover (Robles & Gonzalez-Barahona 2006) as experienced developers seek diverse opportunities. If so, innovation may be impeded by the loss of significant expertise: departing experienced developers leave a vacuum, and novices are left with fewer opportunities to learn from experts (Grugulis & Stoyanova 2011). Thus, it is not sufficient to study only how OSS projects attract developers; it is also necessary to study how they lose developers (that is, how turnover occurs). Our study could therefore be a basis for further research towards understanding the dynamics of growth as well as decline of OSS communities. Finally, OSS projects are increasingly being sponsored by corporations, though the proportion of sponsored projects remains extremely small (Mehra et al. 2011), and OSS projects are increasingly becoming more mainstream and commercial (Fitzgerald 2006).
Because our sample does not include projects that are sponsored by corporate entities, our findings cannot be generalized to projects with sponsorship. Therefore, the literature would benefit from a better understanding of how such sponsorship is altering the structure and viability of OSS communities and how it interacts with the mechanisms studied in this paper.

IMPLICATIONS & CONTRIBUTION

Our work contributes to a better understanding of OSS innovation and, more broadly, to research on the formation of open innovation networks. Many studies have found that it is the open communities that face the challenge of recruiting contributors, rather than the contributors who face a challenge in finding opportunities to contribute their expertise (Choi et al. 2010; Fang & Neufeld 2009; Wasko & Faraj 2005). Further, it cannot be assumed that simply by being open, global and online, a community of knowledge workers has access to all necessary expertise and will therefore be innovative. Instead, just as organizations face the risk of myopia by being overspecialized (Levinthal & March 1993), this study identifies mechanisms by which an innovation community can engage (inadvertently) in path-dependent growth (through greater knowledge domain overlap) or circumvent it by recruiting more diverse expertise into work-groups (that is, through lower knowledge domain overlap). The need to balance radical with incremental innovation makes it important for OSS communities, or any knowledge community, to achieve an appropriate balance between diversity and depth of expertise (Smith 2005; Tiwana & Bush 2005). By studying developers’ cross-participation in OSS projects, our work contributes towards this objective.

The availability of micro-level data on open, virtual innovation communities is an opportunity to deepen understanding of how social ties bridging different work-groups are formed. In our study we have attempted to examine this by focusing on developer cross-participation. These online professional communities are greatly under-studied even though they consist of a large number of knowledge workers across many occupations and industries.
Whereas some of these communities emerge as vibrant professional knowledge networks, others fail to thrive. Our study has exploited available archival data on this context to extend and test theory that explains the growth of an online, virtual community.

References

Adelson, B., and E. Soloway (1985). "The Role of Domain Experience in Software Design," Software Engineering, IEEE Transactions on, SE-11:11, pp 1351-1360.
Agarwal, R., A.K. Gupta, and R. Kraut (2008). "The Interplay Between Digital and Social Networks," Information Systems Research, 19:3, pp 243-252.
Ahuja, M.K., D. Galletta, and K. Carley (2003). "Individual centrality and performance in virtual R&D groups: An empirical study," Management Science, 49:1, pp 21-38.
Aksulu, A., and M. Wade (2010). "A Comprehensive Review and Synthesis of Open Source Research," Journal of the Association for Information Systems, 11:11/12, pp 576-656.
Annique, C.U. (2007). "Managing the innovators for exploration and exploitation," Journal of Technology Management and Innovation, 2:3, pp 4-20.
Arthur, M.B., S.N. Khapova, and C.P.M. Wilderom (2005). "Career success in a boundaryless career world," Journal of Organizational Behavior, 26:2, pp 177-202.
Bagozzi, R., and U. Dholakia (2006). "Open Source Software User Communities: A Study of Participation in Linux User Groups," Management Science, 52:7, pp 1099-1115.
Banker, R.D., and R.J. Kauffman (1991). "Reuse and Productivity in Integrated Computer-Aided Software," MIS Quarterly, 15:3, pp 375-401.
Barabási, A.-L., and R. Albert (1999). "Emergence of scaling in random networks," Science, 286, pp 509-512.
Bergquist, M., and J. Ljungberg (2001). "The power of gifts: organizing social relationships in open source communities," Information Systems Journal, 11, pp 305-320.
Bidwell, M., and F. Briscoe (2010). "The Dynamics of Interorganizational Careers," Organization Science, 21:5, pp 1034-1053.
Bourdieu, P. "The forms of social capital," in: Handbook of theory and research for the sociology of education, J. Richardson (ed.), Greenwood Press, New York, 1986, pp. 241-258.
Burt, R.S. "The Social Capital of Structural Holes," in: The New Economic Sociology, M.F. Guillen, R. Collins, P. England and M. Meyer (eds.), Russell Sage Foundation, New York, 2000, pp. 148-192.
Chesbrough, H.W. Open innovation: The new imperative for creating and profiting from technology. McGraw-Hill Ryerson Agency, 2003.
Choi, N., I. Chengalur-Smith, and A. Whitmore (2010). "Managing First Impressions of New Open Source Software Projects," Software, IEEE, 27:6, pp 73-77.
Cohen, W.M., and D.A. Levinthal (1990). "Absorptive Capacity: A New Perspective On Learning And Innovation," Administrative Science Quarterly, 35:1, pp 128-152.
Cottam, J.A., and A. Lumsdaine "Extended assortativity and the structure of open source development community," in: Sunbelt, St. Pete Beach, FL, 2008.
Crowston, K., and J. Howison "The social structure of Free and Open Source software development," in: MIT Open source collection (http://opensource.mit.edu/online_papers.php?lim=1000), 2004.
Crowston, K., and J. Howison (2006). "Hierarchy and centralization in free and open source software team communications," Knowledge, Technology and Policy, 18:4, pp 65-85.

Crowston, K., and B. Scozzi (2002). "Open source software projects as virtual organizations: Competency rallying for software development," IEE Proceedings Software, 149:1, pp 3-17.
Crowston, K., and B. Scozzi (2008). "Bug fixing practices within Free/Libre Open Source software development teams," Journal of Database Management, 19:2, pp 1-30.
Dane, E. (2010). "Reconsidering the trade-off between expertise and flexibility: A cognitive entrenchment perspective," Academy of Management Review, 35:4, pp 579-603.
Defillippi, R.J., and M.B. Arthur (1994). "The Boundaryless Career: A Competency-Based Perspective," Journal of Organizational Behavior, 15:4, pp 307-324.
Desouza, K.C., Y. Awazu, and A. Tiwana (2006). "Four dynamics for bringing use back into software reuse," Commun. ACM, 49:1, pp 96-100.
Ellis, H.C. The transfer of learning. MacMillan, New York, 1965.
Ericsson, K.A., M.J. Prietula, and E.T. Cokely (2007). "The Making of an Expert," Harvard Business Review, 85:7/8, pp 114-121.
Fang, Y., and D. Neufeld (2009). "Understanding Sustained Participation in Open Source Software Projects," Journal of Management Information Systems, 25:4, pp 9-50.
Faraj, S., and L. Sproull (2000). "Coordinating expertise in software development teams," Management Science, 46:12, pp 1554-1568.
Fershtman, C., and N. Gandal "The Determinants of Output Per Contributor in Open Source Projects: An Empirical Examination," in: CEPR Discussion Paper No. 4329. Available at SSRN: http://ssrn.com/abstract=539783, 2004.
Fershtman, C., and N. Gandal "Microstructure of Collaboration: The Network of Open Source Software (Working Paper #08-01)," in: NET Institute (www.netinst.org), 2008.
Fitzgerald, B. (2006). "The Transformation of Open Source Software," MIS Quarterly, 30:3, pp 587-598.
Fleming, L., and D. Waguespack (2007). "Brokerage, Boundary Spanning, and Leadership in Open Innovation Communities," Organization Science, 18:2, pp 165-180.
Fosfuri, A., M.S. Giarratana, and A. Luzzi (2008). "The penguin has entered the building: The commercialization of open source software products," Organization Science, 19:2, pp 292-305.
Garud, R., and A. Kumaraswamy (2005). "Vicious and virtual cycles in the management of knowledge: The case of Infosys Technologies," MIS Quarterly, 29:1, pp 9-33.
German, D. "Using software distributions to understand the relationship among free and open source software projects," International Conference on Software Engineering, 2007.
Grewal, R., G.L. Lilien, and G. Mallapragada (2006). "Location, location, location: How structural embeddedness affects project success in open source systems," Management Science, 52:7, pp 1043-1056.
Griffin, B., and B. Hesketh (2004). "Why openness to experience is not a good indicator of job performance," International Journal of Selection and Assessment, 12:3, pp 243-251.
Grugulis, I., and D. Stoyanova (2011). "The missing middle: communities of practice in a freelance labour market," Work, Employment & Society, 25:2, pp 342-351.
Gupta, A.K., K.G. Smith, and C.E. Shalley (2006). "The interplay between exploration and exploitation," Academy of Management Journal, 49:4, pp 693-706.
Haefliger, S., G. von Krogh, and S. Spaeth (2008). "Code reuse in open source software," Management Science, 54:1, pp 180-193.

Hagedoorn, J., and G. Duysters (2002). "Learning in Dynamic Inter-Firm Networks: The Efficacy of Multiple Contacts," Organization Studies, 23:4, pp 525-548.
Hahn, J., J.Y. Moon, and C. Zhang (2008). "Emergence of new project teams from open source software developer networks: Impact of prior collaboration ties," Information Systems Research, 19:3, pp 369-391.
Hansen, M.T., M.L. Mors, and B.R. Løvås (2005). "Knowledge sharing in organizations: Multiple networks, multiple phases," Academy of Management Journal, 48:5, pp 776-793.
Hars, A., and S. Ou (2002). "Working for free? Motivations for participating in open source projects," International Journal of Electronic Commerce, 6:3, pp 25-39.
Herraiz, I., G. Robles, J. Jose Amor, T. Romera, and J.M.G. Barahona "The Processes of Joining in Global Distributed Software Projects," International Conference on Software Engineering, Shanghai, China, 2006, pp. 27-33.
Hinds, P.J., K.M. Carley, D. Krackhardt, and D. Wholey (2000). "Choosing Work Group Members: Balancing Similarity, Competence, and Familiarity," Organizational Behavior & Human Decision Processes, 81:2, pp 226-251.
Hyland-Wood, D., D. Carrington, and S. Kaplan "Scale-free nature of Java Software Package, Class and Method Collaboration Graphs," The 5th International Symposium on Empirical Software Engineering, Rio de Janeiro, Brazil, 2005.
Jensen, C., and W. Scacchi "Modeling Recruitment and Role Migration Processes in OSSD Projects," Sixth Intern. Workshop on Software Process Simulation and Modeling, St. Louis, MO, 2005.
Johnson, J.P. (2002). "Open Source Software: Private Provision of a Public Good," Journal of Economics & Management Strategy, 11:4, pp 637-662.
Khwaja, A.I., A. Mian, and A. Qamar "The value of business networks," in: Working paper, Harvard Business School, 2011. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1763351
King, G., and L. Zeng (2001). "Logistic regression in rare events data," Political Analysis, 9:2, pp 137-163.
Kogut, B., and A. Metiu (2001). "Open-source software development and distributed innovation," Oxford Review of Economic Policy, 17:2, p 248.
Kraut, R.E., and L. Streeter (1995). "Coordination in software development," Communications of the ACM, 38:3, pp 69-81.
Kuk, G. (2006). "Strategic interaction and knowledge sharing in the KDE developer mailing list," Management Science, 52:7, pp 1031-1042.
Lakhani, K.R., and E. von Hippel (2002). "How Open Source Software Works: "Free" User-to-User Assistance," Research Policy, 32, pp 923-943.
Lee, C.K. (2005). "Analysis of skill requirements for systems analysts in Fortune 500 organizations," Journal of Computer Information Systems, 45:4, pp 84-92.
Lee, G., and R.E. Cole (2003). "From a firm-based to a community-based model of knowledge creation," Organization Science, 14:6, pp 633-649.
Lerner, J., and J. Tirole (2002a). "The scope of open source licensing," Journal of Law, Economics & Organization, 21:1, pp 20-56.
Lerner, J., and J. Tirole (2002b). "Some simple economics of open source," The Journal of Industrial Economics, 50:2, pp 197-234.

Lerner, J., and J. Tirole (2005). "The economics of technology sharing: Open source and beyond," Journal of Economic Perspectives, 19:2, pp 99-120.
Levinthal, D.A., and J.G. March (1993). "The myopia of learning," Strategic Management Journal, 14, pp 95-112.
Long, J. (2006). "Understanding the Role of Core Developers in Open Source Software Development," Journal of Information, Information Technology, and Organizations, 1, pp 75-85.
Long, Y., and K. Siau (2006). "Social network structures in open source software development teams," Journal of Database Management, 18:2, pp 25-40.
Madey, G. "SourceForge.net Research Data Archive, University of Notre Dame," 2006.
Majchrzak, A., L.P. Cooper, and O.E. Neece (2004). "Knowledge reuse for innovations," Management Science, 50:2, pp 174-188.
Manski, C., and R. Lerman (1977). "The estimation of choice probabilities from choice-based samples," Econometrica, 45, pp 1977-1988.
Markus, M.L. (2001). "Toward a Theory of Knowledge Reuse: Types of Knowledge Reuse Situations and Factors in Reuse Success," Journal of Management Information Systems, 18:1, pp 57-93.
Marrone, J.A. (2010). "Team Boundary Spanning: A Multilevel Review of Past Research and Proposals for the Future," Journal of Management, 36:4, pp 911-940.
Mathiassen, L., and P. Pourkomeylian (2003). "Managing knowledge in a software organization," Journal of Knowledge Management, 7:2, pp 63-80.
Mayer, R.E. (1981). "The Psychology of How Novices Learn Computer Programming," ACM Comput. Surv., 13:1, pp 121-141.
McAlister, L., and E. Pessemier (1982). "Variety Seeking Behavior: An Interdisciplinary Review," Journal of Consumer Research, 9:3, pp 311-322.
Mehra, A., R. Dewan, and M. Freimer (2011). "Firms as Incubators of Open Source Software," Information Systems Research, 22:1, pp 22-38.
Méndez-Durón, R., and C. García (2009). "Returns from social capital in open source software networks," Journal of Evolutionary Economics, 19:2, pp 277-295.
Merton, R.K. On social structure and science. University of Chicago Press, Chicago, IL, 1996.
Monge, P., and N. Contractor Theories of Communication Networks. Oxford University Press, New York, New York, 2003.
Myers, C.R. (2003). "Software systems as complex networks: Structure, function, and evolvability of software collaboration graphs," Physical Review E, 68:046116, pp 1-15.
Nelson, R., and S.G. Winter An Evolutionary Theory of Economic Change. Harvard University Press, Cambridge, 1982.
Newell, S., M. Bresnen, L. Edelman, H. Scarbrough, and J. Swan (2006). "Sharing knowledge across projects: Limits to ICT-led project review practices," Management Learning, 37:2, pp 167-185.
Nonaka, I. (1994). "A dynamic theory of organizational knowledge creation," Organization Science, 5:1, pp 14-37.
Nonaka, I., and N. Konno (1998). "The concept of "Ba": Building a foundation for knowledge creation," California management review, 40:3, pp 40-54.
O'Mahony, S., and B.A. Bechky (2006). "Stretchwork: Managing The Career Progression Paradox In External Labor Markets," Academy of Management Journal, 49:5, pp 918-941.

Podolny, J.M. (1993). "A status-based model of market competition," American Journal of Sociology, 98:4, pp 829-872.
Powell, W., K.W. Koput, D.R. White, and J. Owen-Smith (2005). "Network dynamics and field evolution: The growth of inter-organizational collaboration in the life sciences," American Journal of Sociology, 110:4, pp 1132-1205.
Qureshi, I., and Y. Fang (2011). "Socialization in Open Source Software Projects: A Growth Mixture Modeling Approach," Organizational Research Methods, 14:1, pp 208-238.
Ratner, R.K. (2006). "A Variety of Explanations for Variety-Seeking Behaviors: Physiological Needs, Memory Processes, and Primed Rules," Advances in Consumer Research, 33:1, pp 529-531.
Raymond, E. The cathedral and the bazaar: Musings on Linux and open source by an accidental revolutionary. O'Reilly, Sebastopol, CA, 2001.
Robert, J.L.P., A.R. Dennis, and M.K. Ahuja (2008). "Social Capital and Knowledge Integration in Digitally Enabled Teams," Information Systems Research, 19:3, pp 314-334.
Roberts, J., I.-H. Hann, and S. Slaughter (2006). "Understanding the motivations, participation, and performance of open source software developers: A longitudinal study of the Apache projects," Management Science, 52:7, pp 984-999.
Robles, G., and J.M. Gonzalez-Barahona (2006). "Contributor Turnover in Libre Software Projects," IFIP International Federation for Information Processing, 203.
Rosenberg, N. (1976). "On technological expectations," Economic Journal, 86:343, pp 523-535.
Rosenkopf, L., and G. Padula (2008). "Investigating the microstructure of network evolution: Alliance formation in the mobile communications industry," Organization Science, Articles in advance, pp 1-19.
Setia, P., B. Rajagopalan, V. Sambamurthy, and R. Calantone (2010). "How Peripheral Developers Contribute to Open-Source Software Development," Information Systems Research (Articles in Advance), pp 1-23.
Singh, J. (2005). "Collaborative networks as determinants of knowledge diffusion patterns," Management Science, 51:5, pp 756-770.
Singh, P.V. (2010). "The Small World Effect: The Influence of Macro Level Properties of Developer Collaboration Networks on Open Source Project Success," ACM Transactions of Software Engineering and Methodology, 20:2, pp 6-27.
Singh, P.V., Y. Tan, and V. Mookerjee. "Network Effects: The Influence of Structural Social Capital on Open Source Project Success," MIS Quarterly, Forthcoming.
Smith, E.A. (2005). "Communities of Competence: new resources in the workplace," Journal of Workplace Learning, 17:1/2, pp 7-23.
Soda, G., A. Usai, and A. Zaheer "Network memory: The influence of past and current networks on performance," in: Academy of Management Journal, Academy of Management, 2004, pp. 893-906.
Sodhi, J., and P. Sodhi Software reuse: domain analysis and design processes. McGraw-Hill, New York; London, 1999.
Sophie, R. "Metrics of software reuse for free and open source software." Accessed at libre.tudor.lu/results/FOSSSoftwareReuse-Metrics-v1.0.pdf (March 15, 2010).
Stewart, K., A.P. Ammeter, and L.M. Maruping (2006). "Impacts of license choice and organizational sponsorship on success in open source software development projects," Information Systems Research, 17:2, pp 126-144.

Stewart, K., and S. Gosain (2006). "The Impact of Ideology on Effectiveness in Open Source Software Development Teams," MIS Quarterly, 30:2, pp 291-314.
Suh, A., K.-S. Shin, M. Ahuja, and M.S. Kim (2011). "The Influence of Virtuality on Social Networks Within and Across Work Groups: A Multilevel Approach," Journal of Management Information Systems, 28:1, pp 351-386.
Tiwana, A. The Knowledge Management Toolkit: Orchestrating IT, Strategy, and Knowledge Platforms. (2nd ed.) Prentice Hall, 2002.
Tiwana, A. (2004). "An empirical study of the effect of knowledge integration on software development performance," Information and Software Technology, 46:13, pp 899-906.
Tiwana, A., and A.A. Bush (2005). "Continuance in expertise-sharing networks: a social perspective," Engineering Management, IEEE Transactions on, 52:1, pp 85-101.
Tiwana, A., and E.R. McLean (2005). "Expertise Integration and Creativity in Information Systems Development," Journal of Management Information Systems, 22:1, pp 13-43.
Tomz, M., G. King, and L. Zeng "RELOGIT: Rare Events Logistic Regression," (1999), Cambridge, MA, http://gking.harvard.edu/, Accessed December 2007.
Venkatraman, M.P., and L.L. Price (1990). "Differentiating between cognitive and sensory innovativeness: Concepts, measurement, and implications," Journal of Business Research, 20:4, pp 293-315.
Vernon, W.R. (1997). "Induced Innovation, Evolutionary Theory and Path Dependence: Sources of Technical Change," The Economic Journal, 107:444, pp 1520-1529.
Vitalari, N.P. (1985). "Knowledge as a Basis for Expertise in Systems Analysis: An Empirical Study," MIS Quarterly, 9:3, pp 221-241.
Von Hippel, E. The sources of innovation. Oxford University Press, 1988.
von Hippel, E., and G. von Krogh (2003). "Open source software and the "private-collective" innovation model: Issues for organization science," Organization Science, 14:2, pp 209-233.
Von Krogh, G., S. Spaeth, and S. Haefliger "Knowledge reuse in open source software: An exploratory study of 15 open source projects," Proceedings of the 38th Hawaii International Conference on System Sciences, Hawaii, 2005.
von Krogh, G., S. Spaeth, and K. Lakhani (2003). "Community, joining, and specialization in open source software innovation: A case study," Research Policy, 32:7, pp 1217-1241.
von Krogh, G., and E. von Hippel (2006). "The promise of research on open source software," Management Science, 52:7, pp 975-983.
Wasko, M.M., and S. Faraj (2005). "Why Should I Share? Examining Social Capital and Knowledge Contribution in Electronic Networks of Practice," MIS Quarterly, 29:1, pp 35-57.
Wasserman, S., K. Faust, and M. Granovetter Social Network Analysis. Cambridge University Press, Cambridge, 1994.
Weber, S. The success of open source. Harvard University Press, Cambridge, 2004.
Weinberg, G. The psychology of computer programming. Van Nostrand Reinhold Company, New York, NY, 1971.
Weiss, M., G. Moroiu, and P. Zhao (2006). "Evolution of open source communities," IFIP International Federation for Information Processing, 203.
Wellman, B., J. Salaff, D. Dimitrova, L. Garton, M. Gulia, and C. Haythornthwaite (1996). "Computer networks as social networks: Collaborative Work, Telework, and Virtual Community," Annual Review of Sociology, 22:1, p 213.

Witt, U. (2008). "Observational learning, group selection and societal evolution," Journal of Institutional Economics, 4:1, pp 1-24.
Xu, J., Y. Gao, S. Christley, and G. Madey "A topographical analysis of the open source software development community," Proceedings of the 38th Hawaii International Conference on System Sciences, 2005.
Zaheer, S., and A. Zaheer (1997). "Catching the wave: Alertness, responsiveness, and market influence in global electronic networks," Management science, 43:11, pp 1493-1509.

Table 1 - Correlation Matrix

Variable          CHOSEN       2       3       4       5       6       7       8       9      10      11      12      13      14      15
2  RESTRICTIVE    -0.053
3  HYBRID         -0.016  -0.291
4  AGE(MNTHS)     -0.126   0.020   0.054
5  AGE SQUARE      0.105  -0.038   0.011   0.162   1.000
6  SIZE            0.240  -0.064   0.006   0.067   0.038
7  CUM_FILE        0.083  -0.042   0.017   0.151   0.058   0.263
8  CUM_FEATURE     0.080  -0.015  -0.002   0.051  -0.003   0.559   0.329
9  CUM_COMMIT      0.111  -0.045   0.012   0.105   0.093   0.271   0.174   0.188
10 TOP5_PROG      -0.005   0.013  -0.006   0.193   0.110   0.041   0.083   0.013   0.053
11 TOP5_OS        -0.041   0.038   0.015   0.275  -0.011   0.037   0.071   0.027   0.046   0.371
12 TOP5_TOPIC     -0.011  -0.045  -0.013   0.090  -0.010   0.044   0.028   0.013   0.014   0.087   0.048
13 TOP5_AUD       -0.039   0.021   0.022   0.310   0.114   0.081   0.100   0.063   0.058   0.182   0.214   0.147
14 DEVDEG          0.022  -0.074   0.023   0.121   0.045  -0.006   0.042   0.000   0.014   0.014   0.015   0.030   0.055
15 TIES            0.563  -0.057  -0.008  -0.078   0.063   0.186   0.039   0.035   0.151   0.001  -0.026   0.005  -0.037   0.058
16 TECHREL         0.277  -0.022   0.002  -0.081   0.099   0.137   0.074   0.067   0.108   0.076   0.063  -0.011   0.030  -0.029   0.244
17 APPREL          0.273  -0.037   0.004  -0.092   0.103   0.151   0.075   0.084   0.105   0.030  -0.004   0.025   0.130  -0.033   0.245
18 PDEG            0.175  -0.099   0.028   0.123   0.053   0.813   0.276   0.488   0.283   0.039   0.037   0.059   0.099   0.308   0.156
19 TECHEX          0.004  -0.005  -0.004  -0.030   0.018   0.007   0.001   0.035  -0.005   0.008   0.006   0.000   0.018  -0.013  -0.021
20 APPEX           0.010   0.004  -0.007  -0.033   0.016   0.009   0.003   0.039  -0.002   0.007   0.003   0.013   0.025  -0.013  -0.019
21 TECHDEG         0.116  -0.035  -0.006   0.031   0.012   0.490   0.133   0.306   0.328   0.024   0.025   0.031   0.048   0.000   0.137
22 APPDEG          0.087  -0.038   0.005   0.025   0.007   0.522   0.118   0.336   0.247   0.016   0.013   0.036   0.047   0.003   0.109

Table 1 (cont’d)

Variable              16      17      18      19      20      21
17 APPREL          0.591
18 PDEG            0.105   0.129
19 TECHEX          0.193   0.214  -0.001
20 APPEX           0.173   0.242   0.001   0.850
21 TECHDEG         0.204   0.178   0.458  -0.001   0.010
22 APPDEG          0.126   0.184   0.549   0.009   0.011   0.823

Table 2 - Univariate Statistics
Full Sample (N=13896); Choice sample (N=538) in parentheses

Variable        Mean                StdDev               Min, Max
CHOSEN          0.038 (1)           0.192 (0)            0, 1 (1,1)
RESTRICTIVE     0.7 (0.58)          0.458 (0.49)         0, 1 (0,1)
HYBRID          0.035 (0.02)        0.184 (0.14)         0, 1 (0,1)
AGE (MONTHS)    32.01 (19.09)       20.56 (22.02)        0, 83 (0,79)
AGESQ           1448 (848.8)        1452 (1390.91)       0, 6889 (0,6241)
SIZE            2.735 (12.96)       8.524 (31.74)        1, 350 (1,350)
CUM_FILE        4.599 (9.69)        12.338 (19.9)        0, 266 (0,153)
CUM_FEATURE     3.404 (17.78)       35.768 (94.59)       0, 1955 (0,1151)
CUM_COMMIT      143.62 (1193.35)    1888.232 (8980.2)    0, 138928 (0,138928)
TOP5_PROG       0.833 (0.81)        0.694 (0.8)          0, 6 (0,5)
TOP5_OS         1.221 (1)           1.071 (1.15)         0, 6 (0,6)
TOP5_TOP        0.279 (0.25)        0.532 (0.52)         0, 3 (0,3)
TOP5_AUD        1.328 (1.15)        0.915 (1.01)         0, 7 (0,5)
DEVDEG          2.4 (2.65)          2.23 (1.91)          1, 34 (1,15)
TIES            0.013 (0.33)        0.112 (0.47)         0, 1 (0,1)
TECHREL         0.193 (1.06)        0.622 (1.38)         0, 9 (0,8)
APPREL          0.166 (0.87)        0.52 (1.08)          0, 6 (0,5)
PDEG            3.51 (13.67)        11.63 (34.33)        0, 398 (0,397)

Table 3 - Estimation results
Standard errors in parentheses beneath each coefficient

Models                                                        1             2             3             4             5
Constant                                                      -7.810***     -7.985***     -7.722***     -7.690***     -7.520***
                                                              (0.1913)      (0.218)       (0.2089)      (0.207)       (0.2171)
Project License (RESTRICTIVE)                                 -0.249*       -0.271*       -0.224*       -0.238*       -0.241*
                                                              (0.1413)      (0.137)       (0.1355)      (0.1358)      (0.1378)
Project License (HYBRID)                                      -0.538        -0.586        -0.518        -0.525        -0.51
                                                              (0.3466)      (0.3871)      (0.4006)      (0.4032)      (0.3988)
Project Age (AGE)                                             -0.629***     -0.414***     -0.423***     -0.438***     -0.455***
                                                              (0.0678)      (0.0716)      (0.0612)      (0.0619)      (0.0649)
AGE SQUARE                                                    0.641***      0.640***      0.547***      0.534***      0.524***
                                                              (0.059)       (0.0633)      (0.0624)      (0.0622)      (0.0651)
Project Size (SIZE)                                           0.057***      0.073***      0.094***      0.089***      0.085***
                                                              (0.0096)      (0.0122)      (0.0191)      (0.0194)      (0.0191)
No. of files (CUM_FILE)                                       0.015***      0.007**       0.008**       0.008**       0.005
                                                              (0.0023)      (0.0035)      (0.0037)      (0.0038)      (0.0037)
No. of Features (CUM_FEATURE)                                 -0.012***     -0.017***     -0.009        -0.008        0.001
                                                              (0.0031)      (0.0038)      (0.0056)      (0.006)       (0.0044)
No. of commits (CUM_COMMIT)                                   -0.000***     -0.000***     -0.000**      0             -0.000***
                                                              (0.000015)    (0.000036)    (0.000062)    (0.000068)    (0.000016)
Prog. Language Popularity (TOP5_PL)                           0.176*        0.132         0.071         0.047         -0.007
                                                              (0.0955)      (0.1001)      (0.0978)      (0.0981)      (0.0994)
Operating System Popularity (TOP5_OS)                         -0.129        -0.168        -0.187*       -0.172        -0.157
                                                              (0.1265)      (0.1126)      (0.1132)      (0.1097)      (0.1067)
Topic Popularity (TOP5_TOPIC)                                 -0.076        -0.024        -0.119        -0.124        -0.175
                                                              (0.1583)      (0.17)        (0.1371)      (0.1361)      (0.1371)
Audience Popularity (TOP5_AUD)                                -0.008        -0.236**      -0.199**      -0.211**      -0.268**
                                                              (0.0844)      (0.0798)      (0.0869)      (0.0888)      (0.0871)
Developer centrality (DEVDEG)                                 -0.011        -0.015        0.086***      0.088***      0.055**
                                                              (0.0324)      (0.0351)      (0.0233)      (0.0239)      (0.0268)
Developer past ties (TIES)                                    7.364***      6.626***      6.990***      6.984***      6.639***
                                                              (0.5997)      (0.5456)      (0.6568)      (0.6659)      (0.5797)

MAIN EFFECTS
Technical Domain Overlap (TECHREL)                                          0.266***      0.303***      0.313***      0.297***
                                                                            (0.0424)      (0.0388)      (0.0381)      (0.038)
Application Domain Overlap (APPREL)                                         0.228***      0.231***      0.267***      0.335***
                                                                            (0.0443)      (0.0435)      (0.0455)      (0.0408)
Project social capital (PDEG)                                                             -0.535**      -0.553**      -0.121
                                                                                          (0.2011)      (0.2065)      (0.2575)

INTERACTION EFFECTS
Technical Domain Overlap X Developer Experience (TECHEX)                                                -0.068**      -0.058*
                                                                                                        (0.0337)      (0.0309)
Application Domain Overlap X Dev. Experience (APPEX)                                                    0.018         0.009
                                                                                                        (0.0194)      (0.0177)
Technical Domain Overlap X Project social capital (TECHDEG)                                                           -0.194***
                                                                                                                      (0.0516)
Application Domain Overlap X Project social capital (APPDEG)                                                          -0.002
                                                                                                                      (0.0545)

Notes: p
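
The interaction terms in Table 3 can be read as shifting the effect of technical domain overlap on a developer's propensity to join. The sketch below is illustrative only and rests on assumptions that the table itself does not confirm: that the coefficients are on a log-odds (logit-type) scale, as the rare-events logistic regression sources in the reference list suggest; that the last value reported in each row belongs to Model 5; and that the two moderators are entered in whatever (unreported) units were used at estimation, for example after centering or rescaling. The function names are hypothetical.

```python
import math

# Point estimates read from Table 3, assumed to belong to Model 5.
B_TECHREL = 0.297   # Technical Domain Overlap (TECHREL)
B_TECHEX = -0.058   # TECHREL x Developer Experience (TECHEX)
B_TECHDEG = -0.194  # TECHREL x Project social capital (TECHDEG)


def techrel_slope(experience: float, pdeg: float) -> float:
    """Conditional log-odds slope of TECHREL at given moderator values.

    Assumption: both moderators are expressed in the units used at
    estimation, which the table does not report.
    """
    return B_TECHREL + B_TECHEX * experience + B_TECHDEG * pdeg


def odds_ratio(slope: float) -> float:
    """Multiplicative change in the odds of joining per unit of TECHREL."""
    return math.exp(slope)


if __name__ == "__main__":
    # Compare the slope at baseline and with each moderator raised by one unit.
    for experience, pdeg in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]:
        slope = techrel_slope(experience, pdeg)
        print(f"experience={experience}, pdeg={pdeg}: "
              f"slope={slope:.3f}, odds ratio={odds_ratio(slope):.3f}")
```

Under these assumptions, both negative interaction coefficients lower the slope of technical overlap, which matches the reading that experienced developers and centrally located projects depend less on technical domain overlap.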
