User-Generated Open Source Products: Founder’s Social Capital and Time-to-Product-Release
Girish Mallapragada University of North Carolina at Chapel Hill
Rajdeep Grewal Gary L. Lilien The Pennsylvania State University
ISBM Report 01-2011
Institute for the Study of Business Markets The Pennsylvania State University 484 Business Building University Park, PA 16802-3603 (814) 863-2782 or (814) 863-0413 Fax www.isbm.org
Girish Mallapragada Assistant Professor of Marketing Kenan-Flagler Business School University of North Carolina at Chapel Hill CB#3490, McColl Building Chapel Hill, NC 27599
Rajdeep Grewal Irving & Irene Bard Professor of Marketing Smeal College of Business The Pennsylvania State University 407 Business Building University Park, PA 16802
Gary L. Lilien Distinguished Research Professor of Management Science Smeal College of Business The Pennsylvania State University 484B Business Building University Park, PA 16802
January 2011
PLEASE DO NOT CITE WITHOUT THE PERMISSION OF THE AUTHORS
The article is based on the dissertation of the first author. The authors acknowledge support for the research from the Institute for the Study of Business Markets, The Pennsylvania State University, University Park, PA, 16802.
ABSTRACT
Volunteer users employ collaborative Internet technologies to develop open source products, a form of user-generated content, for which time to product release is a crucial measure of project success. The open source community also features two separate but related subcommunities: developer-users, who contribute time and effort to develop products, and end-users, who act as collaborative testers and provide feedback. We develop hypotheses concerning how the location of the project’s founders in the social network of developer-users, the interplay of developer-users and end-users, and project and product characteristics affect time to product release. We use data on 817 development projects from SourceForge, a large open source community forum, to calibrate a split-hazard timing process model to test the hypotheses. The model supports the two-community conceptualization and most of the related hypotheses. The results also have theoretical and managerial implications; for example, the position of founders in the developer-user community can reduce time to product release by up to 51%, and products that use a forum have an 80% quicker time to product release than those that do not.
Key words: user-generated products, product development, social networks, open source, innovation.
1.
Introduction
Blogs, Twitter feeds, social network profiles, videos, and other digital products share a common feature: content consumers are also producers, so the resulting content is known as user-generated content. In the open source software (OSS) domain, where the user-generated content is normally a software product (Raymond 1999), users are developers of software code as well as consumers of software products. Because the intellectual property or source code of these products is available in the public domain, such products are referred to as “open source products”. User communities can also generate ideas for firms: for example, Cambrian House, a start-up firm founded in 2006, leveraged its community of users to develop business technology ideas and thus provided a best practice case study on distributed knowledge management and open innovation (Coles, Lakhani and McAfee 2008). OSS communities have developed a plethora of successful products, including the Firefox browser, the Apache Web server, and the Linux operating system (O'Mahony and Ferraro 2007). The OSS domain also provides an alternative to traditional, top-down, firm-driven software new product development (NPD) processes. In the OSS domain, NPD process success depends on the contributions of volunteer users, who both produce and use the products. Various NPD projects compete for the community’s attention and resources, so when project teams can release working versions of their products more quickly, they earn an early-mover advantage that offers them a superior market position (e.g., Lakhani and von Hippel 2002, Raymond 1999). Therefore, the time to product release is a critical metric for NPD process success in the OSS domain and the focus of our research. The user community for an OSS project consists of two subcommunities: developer-users, who include OSS project members and developers, and end-users, who are not developers on that project but might develop other projects.
Drawing on NPD literature in both traditional firm settings (for a meta-analysis, see Henard and Szymanski 2001) and the OSS domain (e.g., Grewal, Lilien and Mallapragada 2006, Lee, Kim and Gupta 2009), and recognizing the critical roles of both subcommunities, we infer three main factors that should influence OSS NPD process success: (1) the information and resources available to the project, which depend on the location of the project’s founders in the social network of developer-users (see Burt 1997, Lin 2001); (2) the interplay of the project's developer-user and end-user communities; and (3) the OSS project and product characteristics. The first factor reflects the founders’ social capital in the network of developer-users; we thus draw on sociological literature on social networks and social capital (e.g., Lin 2001, Obstfeld 2005) and consider the embeddedness and brokerage activities of founders. Embeddedness measures the volume of information and resource input founders receive from the network; brokerage represents the novelty of such information and resources (e.g., Fleming and Waguespack 2007, Nerkar and Paruchuri 2005). For the second factor, we measure both user engagement, which reflects how the end-user community participates in the projects, and product positioning, which determines whether products are positioned primarily for the developer-user or the end-user community. Finally, for the third factor, we consider various OSS project and product characteristics, including the project founder’s experience, the type of open source license, and team size, among others. To account for projects that never release a product, we model time to product release with a split-hazard timing process model. We collect data over 42 months pertaining to 817 projects that began in January 2003 on SourceForge.net, an Internet community that focuses on open source product development and features more than 160,000 user-generated product development projects and nearly 3 million users (as of December 2010). Results from a simulated maximum likelihood estimation of the split-hazard model show that increases in both embeddedness and brokerage hasten the time to product release (the linear, quadratic, and interactive terms are statistically significant). In addition, the engagement of end-users and product positioning significantly affect the influence of embeddedness on time to product release.
Beyond their statistical significance, our results have managerial implications; for example, time to product release declines by an average of 51% when project founders have both high embeddedness and high brokerage. In the next section, we discuss the OSS context, delineate the importance of time to product release, and develop our conceptual framework. We next develop theoretical arguments for the effects of the founders’ social capital, the interplay between the two subcommunities, and the OSS project and product characteristics.
After we outline our timing process model, we describe our data source, data collection procedure, and measures. We then discuss our findings and model validation approach, followed by the implications of our work for theory and practice, limitations, and opportunities for further research.
2.
Conceptual Background
The fundamental principle of the OSS model is simple: OSS community members who develop new digital products (developer-users) post the source code on Internet forums (e.g., SourceForge, RForge) under an open source license, and other OSS community members (end-users) use the product and may modify the code and create their own versions. The term “co-creation” describes such communities, in which content is both created and used by the same set of members (O’Hern and Rindfleisch 2009). In the OSS co-creation model, users’ contribution decisions often depend on the NPD projects’ visibility, uniqueness, and popularity (Lakhani and von Hippel 2002, Subramaniam, Sen and Nelson 2009). Developer-users contribute by writing and modifying code, resolving bugs, maintaining documentation, and addressing support requests; end-users report bugs, download products, and participate in forums, and these activities generate product visibility. Thus, the core process of user-generated content creation and maintenance involves the active and iterative interplay of the developer-user and end-user communities. Raymond’s (1999) visionary call to release early and often is also a central tenet of the OSS development model. Versions of a product that perform core functions, but may lack secondary features or final aesthetic polish, are released to prompt feedback from the OSS user community. Furthermore, releasing the product early may be essential to gain initial traction and attract a sufficient number of dedicated users who offer vital, early, and critical feedback (Lakhani and von Hippel 2002, Raymond 1999).
Therefore, time to product release is a critical measure of success for OSS development.[1]
[1] We acknowledge criticism of and limitations related to this success metric (e.g., Bayus, Jain and Rao 1997, Langerak and Hultink 2006), yet by investing heavily in product R&D, firms can release more frequent product upgrades (Bayus, et al. 1997), capture and maintain market share (Smith 1999), and increase future market valuation (Hendricks and Singhal 1997).
Meta-analytic evidence regarding the drivers of traditional NPD suggests that success depends on the characteristics of the product, organizational processes and strategies, and marketplace characteristics (Henard and Szymanski 2001). Similar research in the OSS domain (e.g., Subramaniam, et al. 2009) suggests that OSS project and product characteristics, such as the type of license, availability of prior code, user and developer interest, and the network location of project managers, also affect open source project success. Thus, the success of an OSS NPD process likely depends on the information and resources available and the characteristics of the product under development (e.g., Barczak, Griffin and Kahn 2009; Lee, et al. 2009). Furthermore, because the OSS community consists of two distinct subcommunities (developer-users and end-users), their interplay should affect OSS NPD success as well. In particular, the availability of information and resources for a project depends on the social capital of the developer-users working on the project, as reflected in their location in the social network of developer-users (e.g., Lin 2001) and their interactions with end-users (e.g., Shah 2006). As we depict in Figure 1, the drivers of success thus should include (1) founders’ social capital, (2) the interplay of developer-users and end-users, and (3) control variables related to project and product characteristics. Because developer-users drive the NPD process, we theorize that the effect of their social capital is moderated by variables concerning the interplay of the two subcommunities.
[Insert Figure 1 about here]
According to literature on social capital and social networks (e.g., Lin 2001), embeddedness and brokerage (Burt 2005, Granovetter 1985) together ensure that resources for NPD projects will be available (e.g., Fleming, Mingo and Chen 2007, Obstfeld 2005). When a new project begins, the founders’ social network represents the primary connection to the OSS world and thus the primary determinant of information and resource availability.
Research in organizational ecology (e.g., Romanelli 1989, Stinchcombe 1965) also shows that the resource environment at founding has a lasting effect on subsequent organizational structures, processes, strategies, and survival likelihood. In the OSS context, the social network of the founders therefore should be the primary source of information and resources at founding, and its characteristics should influence the success of OSS projects (e.g., Fleming and Waguespack 2007, Grewal, et al. 2006).
We conceptualize the interplay of the developer-user and end-user communities with two variables from previous research (e.g., Lee, et al. 2009, Subramaniam, et al. 2009). The degree of user engagement captures the level of interest among the user community and indicates the exchange of information between communities, which forms a channel by which feedback from the end-user community reaches the developer-user community. Product positioning indicates whether the developer-user community presents products to the end-user community as “developer-focused” (i.e., technical products) or “end-user focused” (i.e., nontechnical) (Subramaniam, et al. 2009). We theorize that the degree of user engagement and product positioning both moderate the influence of the developer-user social network constructs on the time to product release. Finally, following research on firm-centric NPD processes (e.g., Henard and Szymanski 2001), we incorporate several project and product characteristics, including the type of the open source license, the size of the first release file, the size of the project team, and prior project code, as control variables (e.g., Stewart, Ammeter and Maruping 2006, Subramaniam, et al. 2009).
2.1.
Founders’ Social Capital
Collaboration in goal-oriented groups occurs either through direct, natural collaboration between individual actors who interact (e.g., Coleman 1988, Granovetter 1985) or through brokerage (e.g., Burt 2005, Fleming and Waguespack 2007). In either case, it signals trust among network actors and entails information and knowledge (resource) sharing (e.g., Uzzi 1997). The resource benefits from direct collaboration are higher when the actor is more deeply embedded in the network; higher embeddedness implies greater access to “fine-grained information” that tends to be “tacit, complex, or proprietary” (Fleming, et al. 2007, p. 444).
The resource benefits from brokered collaboration instead reflect the strength of weak ties (Granovetter 1973): weak ties connect distant parts of a network, so actors that broker weak ties gain access to diverse information and control information sharing (Burt 1997). To capture both forms of collaboration, we characterize founders’ social capital as both embeddedness and brokerage (cf. Grewal, et al. 2006), in line with recent research on organizational social networks that recognizes their joint influence (e.g., Nerkar and Paruchuri 2005). For example, the execution and implementation of innovation requires the mobilization of available resources, which vary directly with embeddedness, but creativity often stems from integrating diverse information, obtained from brokerage (e.g., Obstfeld 2005). In Figure 2 we illustrate embeddedness and brokerage in a social network: seven users work on common projects, and two users are connected if they work on a common project (i.e., the user-by-user projection of a two-mode, or affiliation, network; Wasserman and Faust 1999). An actor’s embeddedness increases as it connects to more actors (Swaminathan and Moorman 2009) and becomes more central to the social network (Rindfleisch and Moorman 2001). For example, in Figure 2, user 4 works on two projects and user 3 works on three, so user 3 is more embedded than user 4 in terms of degree centrality. Brokerage instead increases with an actor’s ability to connect disconnected parts of the network (e.g., Burt 2005, Fleming and Waguespack 2007). In Figure 2, user 4 brokers the relationship between two groups; as the sole link between them, user 4 earns a high brokerage score, because it spans the gap and bridges a structural hole in the network (e.g., Burt 2004).
[Insert Figure 2 about here]
2.1.1.
Embeddedness
In a social network setting, embeddedness provides efficient access to other parts of the network (e.g., Grewal, et al. 2006, Ronchetto, et al. 1989), so greater embeddedness of founders signals more, and increasingly redundant, connections to the network. By serving as the access point for these redundant connections, highly embedded founders enjoy the benefits of being at the center of action and can coordinate activities (e.g., Swaminathan and Moorman 2009). That is, high embeddedness provides the resources, information, and feedback necessary to ensure coordination in the OSS domain (e.g., Fleming, et al. 2007, Kogut and Zander 1992), and efficient coordination then reduces the time to product release (Barczak, et al. 2009). In our research context, newly founded projects compete in a crowded virtual community with strict resource constraints, as evidenced by the high rate of failure of OSS projects (von Krogh and von Hippel 2006). Projects initiated by highly embedded founders gain access to diverse member skills, improve coordination (e.g., Uzzi 1997), and thus speed up product development. However, greater embeddedness also means that founders can devote less attention to any given project (Rosa, Porac, Runser-Spanjol and Saxon 1999). Therefore, embeddedness should decrease time to product release, with diminishing returns, and we hypothesize:
H1: As the project founders’ embeddedness at the time of founding increases, the time to product release decreases at a decreasing rate.
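Embeddedness measured as degree centrality can be made concrete with a small affiliation network. The Python sketch below is illustrative only: the project and user labels are hypothetical and do not reproduce Figure 2. Users are tied when they share at least one project, and a user's embeddedness is the number of distinct co-members. The paper builds its affiliation matrix in UCINET 6, so this is a sketch of the same idea, not the authors' procedure.

```python
from itertools import combinations

# Hypothetical affiliation data: project -> set of member users (not the Figure 2 network).
PROJECTS = {
    "P1": {"u1", "u2", "u3"},
    "P2": {"u3", "u4"},
    "P3": {"u3", "u5"},
    "P4": {"u4", "u6", "u7"},
}


def co_membership(projects: dict[str, set[str]]) -> dict[str, set[str]]:
    """User-by-user projection of the affiliation network: users sharing a project are tied."""
    ties: dict[str, set[str]] = {}
    for members in projects.values():
        for a, b in combinations(sorted(members), 2):
            ties.setdefault(a, set()).add(b)
            ties.setdefault(b, set()).add(a)
    return ties


def degree_centrality(ties: dict[str, set[str]]) -> dict[str, int]:
    """Embeddedness proxy: the number of distinct co-members of each user."""
    return {user: len(alters) for user, alters in ties.items()}


deg = degree_centrality(co_membership(PROJECTS))
print(sorted(deg.items()))
```

A user who works only on solo projects has no ties and does not appear in the result; treating such users as having degree 0 is a natural default.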
2.1.2.
Brokerage
Developer-users who broker relationships by linking sparsely connected parts of a social network tend to have access to and control over diverse information sets, which gives the broker power (e.g., Burt 2005) and fosters creativity (e.g., Burt 2004). The combination of power and creativity should result in innovative NPD projects developed in a timely manner (e.g., Fleming, et al. 2007, Nerkar and Paruchuri 2005). Existing NPD literature supports this assertion for the general NPD process (e.g., Sethi, Smith and Park 2001, Song and Parry 1997). In our study context, such improvement should mean shorter times to product release, though, as with other returns to social capital (e.g., Zucker, Darby, Brewer and Peng 1995), we reason that this benefit exhibits diminishing returns. Thus, we hypothesize:
H2: As the project founders’ brokerage at the time of founding increases, the time to product release decreases at a decreasing rate.
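Brokerage can be sketched the same way. This section does not spell out which brokerage index is used, so the Python example below substitutes one standard structural-holes measure, Burt's effective size in its simplified form for binary, undirected ties: n - 2t/n, where n is the number of a user's contacts and t is the number of ties among those contacts. The score rises when a user's contacts are mutually disconnected, that is, when the user bridges otherwise separate groups. The affiliation data are hypothetical.

```python
from itertools import combinations

# Hypothetical affiliation data: project -> set of member users.
PROJECTS = {
    "P1": {"u1", "u2", "u3"},
    "P2": {"u3", "u4"},
    "P3": {"u3", "u5"},
    "P4": {"u4", "u6", "u7"},
}


def co_membership(projects: dict[str, set[str]]) -> dict[str, set[str]]:
    """Tie two users if they share at least one project."""
    ties: dict[str, set[str]] = {}
    for members in projects.values():
        for a, b in combinations(sorted(members), 2):
            ties.setdefault(a, set()).add(b)
            ties.setdefault(b, set()).add(a)
    return ties


def effective_size(ties: dict[str, set[str]]) -> dict[str, float]:
    """Burt's effective size for binary undirected ties: n - 2t/n.

    n = number of contacts of the user; t = number of ties among those contacts.
    """
    result: dict[str, float] = {}
    for user, alters in ties.items():
        n = len(alters)
        t = sum(1 for a, b in combinations(sorted(alters), 2) if b in ties[a])
        result[user] = n - 2 * t / n
    return result


es = effective_size(co_membership(PROJECTS))
print({u: round(v, 2) for u, v in sorted(es.items())})
```

In this toy network, u3 and u4 each connect clusters that are otherwise unlinked, so they score highest; u1, whose two contacts already work together, scores 1.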
2.1.3.
Embeddedness and Brokerage
Several studies have predicted trade-offs in the extent to which members benefit from various types of social capital, given the interactions among these types (e.g., Burt 1997, Obstfeld 2005, Fleming, et al. 2007). For example, Burt (2004) shows that decreased brokerage by a network member might reflect greater connection density for that member, a larger network, or changes in the network hierarchy. Therefore, members in a given network structure may have to shift their location in the network to gain the benefits of one type of social capital relative to another. In our study context, this trade-off pits the desire to achieve efficiency by increasing embeddedness against the desire to foster creativity by occupying a unique network location (brokerage) (Fleming, et al. 2007). It parallels the exploration versus exploitation dichotomy that March (1991) proposed for organizational learning processes, such that exploitative processes require continuous and efficient resource availability, whereas explorative processes require continual access to new opportunities. Because founders’ connections facilitate information and resource dissemination, greater embeddedness should directly benefit founders’ new projects. As brokerage increases, founders should also gain more indirect value from combining disconnected sets of resources when they are more embedded, because embeddedness facilitates information dissemination (e.g., Fleming, et al. 2007). Therefore, projects should benefit from the founders’ ability to create an environment marked by both explorative and exploitative processes (Obstfeld 2005). We thus predict that embeddedness has a greater impact on time to product release when brokerage increases:
H3: As the project founders’ brokerage at the time of founding increases, the magnitude of the reduction in time to product release due to embeddedness increases.
2.2.
Interplay of Developer-Users and End-Users
The exchange of information between developer-user and end-user communities underlies user-generated content creation and maintenance; we characterize this exchange using the degree of user engagement and product positioning.
2.2.1.
Degree of User Engagement
As user engagement (i.e., the number of bugs reported) increases, the benefits that the project team receives from the community’s feedback also increase (Grewal, et al. 2006). Thus, products that garner many bug reports benefit from elevated user activity, which reveals problems in functionality and suggests ways to fix flaws in programming logic. Overall, then, the degree of user engagement should reduce time to product release. Moreover, when the project founders’ social capital is greater, the beneficial effects of user engagement should be stronger: more social capital enables founders to understand and take advantage of user engagement and feedback to speed up the NPD process (e.g., Grewal, et al. 2006, Hahn, Moon and Zhang 2008). Therefore, the decrease in time to product release as user engagement increases should be magnified as the founders’ social capital increases:
H4: As the degree of user engagement increases, the magnitude of the reduction in time to product release due to the founders’ (a) embeddedness and (b) brokerage increases.
2.2.2.
Product Positioning
The project’s target audience, or its positioning, is critical in an OSS context (e.g., Subramaniam, et al. 2009). The project administrator decides whether to target developer-users (technical) or end-users (nontechnical), though development teams tend to prioritize products with the broadest community appeal. Therefore, user-focused, nontechnical products should exhibit a shorter time to product release than developer-focused, technical products. The effect of the founders’ social capital should also vary with product positioning. User-focused projects may generate less technical feedback from the community, resulting in a faster development process (Lakhani and von Hippel 2002). That is, the reduction in time to product release due to social capital should be greater for user-focused than for developer-focused projects.
H5: The magnitude of the reduction in time to product release due to the founders’ (a) embeddedness and (b) brokerage is greater for user-focused projects than for developer-focused projects.
3.
Model Development
To model the effect of the three theorized factors on the time to release of user-generated products, we denote the founding point as time 0 and seek a model that accounts for (1) censored observations, (2) projects that never release a product, (3) the influence of covariates (some of which vary over time) and control variables, (4) unobserved heterogeneity, and (5) an appropriate specification of the basic timing distribution. In our base model, the random variable $T$ represents the time to the first product release, and the hazard rate (the instantaneous rate of product release at time $t$, given that the first release has not yet occurred) is

$$h(t) = \frac{f(t)}{S(t)},$$

where $f$ is the probability density function of $T$, $S$ is the survival function, $S = 1 - F$, and $f = F'$, such that $F$ is the cumulative distribution function.
3.1.
Censored Observations
Some covariates that affect the hazard vary over time, so for each project $i$ in the sample, we divide the interval $(0, t_i]$ into $k$ exhaustive, non-overlapping intervals, such that $t_0 < t_1 < \dots < t_{k-1} < t_k$, with $t_0 = 0$ and $t_k = t_i$. We let $C_i$ indicate the release event for project $i$, so $C_i = 1$ if a release occurs and $C_i = 0$ otherwise. The covariate vector is $X_i = (X_i^{DU}, X_i^{EU}, X_i^{C})$, where $X_i^{DU}$ represents the project founders’ social capital variables derived from the developer-user network, $X_i^{EU}$ represents variables that characterize the interplay between developer-users and end-users, and $X_i^{C}$ represents the OSS project and product characteristics (i.e., control variables). For simplicity, we drop the subscript $i$. The covariates may change from one interval to the next but stay constant within an interval. Therefore, the hazard function from time $t_{j-1}$ to $t_j$ can be written as $h(t \mid X_j)$, and because $h(t) = -d \log S(t)/dt$,

$$\Pr(T > t_j \mid T > t_{j-1}) = \exp\left(-\int_{t_{j-1}}^{t_j} h(s \mid X_j)\, ds\right). \tag{1}$$
Therefore, the survival function for a duration of $t_k$ or longer can be written as

$$S(t_k \mid X) = \prod_{j=1}^{k} \Pr(T > t_j \mid T > t_{j-1}) = \prod_{j=1}^{k} \exp\left(-\int_{t_{j-1}}^{t_j} h(s \mid X_j)\, ds\right). \tag{2}$$
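Equations (1) and (2) can be illustrated with a short Python sketch. It assumes a Weibull baseline with cumulative hazard $(\lambda t)^p$, one common parameterization (the section selects the distribution later, by model fit), and it treats each interval's location parameter $\lambda_j$ as already computed from the covariates $X_j$ that hold on that interval. The function names and the example cutpoints and parameter values are hypothetical.

```python
import math


def weibull_cum_hazard(t: float, lam: float, p: float) -> float:
    """Cumulative hazard H(t) = (lam * t) ** p of an assumed Weibull baseline."""
    return (lam * t) ** p


def interval_survival(t_prev: float, t_next: float, lam: float, p: float) -> float:
    """Pr(T > t_next | T > t_prev) = exp(-integral of h(s) over (t_prev, t_next]), as in Eq. (1).

    With lam fixed on the interval, the integral equals H(t_next) - H(t_prev).
    """
    return math.exp(-(weibull_cum_hazard(t_next, lam, p) - weibull_cum_hazard(t_prev, lam, p)))


def piecewise_survival(cutpoints: list[float], lams: list[float], p: float) -> float:
    """S(t_k | X) as the product of interval survival probabilities, as in Eq. (2).

    cutpoints: [t_0 = 0, t_1, ..., t_k]
    lams: lams[j] is the location parameter implied by the covariates on (t_j, t_{j+1}]
    """
    if len(cutpoints) != len(lams) + 1:
        raise ValueError("need exactly one location parameter per interval")
    surv = 1.0
    for j, lam in enumerate(lams):
        surv *= interval_survival(cutpoints[j], cutpoints[j + 1], lam, p)
    return surv


# Hypothetical project: three 720-hour intervals whose covariates (e.g., user engagement) change.
cutpoints = [0.0, 720.0, 1440.0, 2160.0]
lams = [0.0004, 0.0006, 0.0009]
print(round(piecewise_survival(cutpoints, lams, p=1.2), 4))
```

When the covariates never change (all $\lambda_j$ equal), the product collapses to $\exp(-(\lambda t_k)^p)$, the ordinary Weibull survival function, which offers a quick check on the interval bookkeeping.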
If we take the logs of both sides of Equation (2), we derive:

$$\ln S(t_k) = -\sum_{j=1}^{k} \int_{t_{j-1}}^{t_j} h(s \mid X_j)\, ds. \tag{3}$$
The log-likelihood for a single observation can be written as the sum of the contributions of the hazard and survival functions, that is,

$$\text{LogL}_i = C_i \ln h(t_k \mid X_k) - \sum_{j=1}^{k} \int_{t_{j-1}}^{t_j} h(s \mid X_j)\, ds. \tag{4}$$
Finally, projects that are not released during the observation window $(0, T_c]$, such that $C_i = 0$ at the end of the window, contribute only to the survival function. Projects released during one of the $k$ intervals of the observation window, such that $C_i = 1$, contribute to both the hazard and survival functions. If we substitute Equation (3) into Equation (4), we can simplify the log-likelihood as

$$\text{LogL}_i = C_i \ln h(t_k \mid X_k) + \ln S(t_k). \tag{5}$$
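Equation (5) can be evaluated directly once a baseline is chosen. The Python sketch below again assumes a Weibull baseline, $h(t) = \lambda p (\lambda t)^{p-1}$ with $\ln S(t) = -(\lambda t)^p$, and holds covariates fixed, so each project has a single $\lambda$; the sample is hypothetical. In the exponential case ($p = 1$), the $\lambda$ that maximizes Equation (5) has the closed form (number of observed releases) / (total observed time), which provides a simple check.

```python
import math


def weibull_log_hazard(t: float, lam: float, p: float) -> float:
    """ln h(t) for h(t) = lam * p * (lam * t) ** (p - 1) (assumed parameterization)."""
    return math.log(lam * p) + (p - 1) * math.log(lam * t)


def weibull_log_survival(t: float, lam: float, p: float) -> float:
    """ln S(t) = -(lam * t) ** p."""
    return -((lam * t) ** p)


def censored_loglik(times: list[float], released: list[int], lam: float, p: float) -> float:
    """Sum over projects of Eq. (5): C_i * ln h(t_i) + ln S(t_i).

    times: hours to first release, or to the end of the window if no release was observed
    released: C_i = 1 if the release was observed, 0 if the project is censored
    """
    total = 0.0
    for t, c in zip(times, released):
        if c:
            total += weibull_log_hazard(t, lam, p)  # hazard term only for released projects
        total += weibull_log_survival(t, lam, p)    # every project contributes survival
    return total


# Hypothetical sample: two releases, two projects censored at a 1,000-hour window.
times = [120.0, 300.0, 1000.0, 1000.0]
released = [1, 1, 0, 0]

# Exponential (p = 1) maximum likelihood estimate under right censoring: events / exposure.
lam_hat = sum(released) / sum(times)
print(lam_hat, censored_loglik(times, released, lam_hat, p=1.0))
```

Censored projects still enter through the survival term; dropping them would bias the estimated hazard upward, because only the quickest projects would remain.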
3.2.
Accounting for Projects that Never Release
A standard assumption in basic hazard models is that all observations eventually experience the event, which implies an occurrence probability of 1. However, this probability need not equal 1 (e.g., Schmidt and Witte 1989), because some open source projects may never release a product file. To account for two subpopulations (projects that eventually release and those that never do), we modify the basic hazard specification with a split-hazard formulation (Dekimpe, Gucht, Hanssens and Powers 1998, Sinha and Chandrashekaran 1992), in which the probability of eventual product release, $\pi_i$, is a function of the project founders’ social capital variables $X_i^{DU}$, the characteristics of the interplay between the two communities $X_i^{EU}$, and the OSS project and product characteristics $X_i^{C}$, as $\pi_i = \pi(X_i^{DU}, X_i^{EU}, X_i^{C})$. A released project contributes the density of an eventual releaser, $\pi_i f(t_k \mid X_k) = \pi_i h(t_k \mid X_k) S(t_k)$, whereas a censored project either never releases (probability $1 - \pi_i$) or releases after the window closes (probability $\pi_i S(t_k)$). Thus, the log-likelihood for a single observation can be specified as

$$\text{LogL}_i = C_i \left[\ln \pi_i + \ln h(t_k \mid X_k) + \ln S(t_k)\right] + (1 - C_i) \ln\left[1 - \pi_i + \pi_i S(t_k)\right], \tag{6}$$

which reduces to Equation (5) when $\pi_i = 1$.
3.3.
Model Covariates and Controls
We incorporate the effects of the project founders’ social capital ($X_i^{DU}$), the characteristics of the interplay between developer-users and end-users, that is, the moderators ($X_i^{EU}$), and the OSS project and product characteristics ($X_i^{C}$) on time to product release. We also incorporate the interactions between the moderators and the founders’ social capital. Let $\lambda_i$ be the positive location parameter and $p$ be the scale parameter of the duration distribution. We incorporate these effects by specifying the location parameter of the hazard function, $\lambda_i$, as a function of $X_i^{DU}$, $X_i^{EU}$, and $X_i^{C}$, as follows:

$$\lambda_i = \exp\left(\beta_0 + \beta' X_i^{DU} + \gamma' X_i^{EU} + \delta' (X_i^{DU} \times X_i^{EU}) + \eta' X_i^{C}\right), \tag{7}$$

where $\beta_0$ is the constant term, $\beta$ captures the effect of the founders’ social capital variables, $\gamma$ is the main effect of the moderators (i.e., degree of user engagement and product positioning), $\delta$ captures the interaction between the social capital variables $X_i^{DU}$ and the moderators $X_i^{EU}$, and $\eta$ is the effect of the control variables $X_i^{C}$. The degree of user engagement is a time-varying covariate in the model. In addition, to accommodate the effect of unobserved project characteristics, we incorporate unobserved heterogeneity. Using a random effects approach, we model the unobserved heterogeneity component with a random parameter (Sastry 1997). Thus, a normally distributed random variable $\theta_i \sim N(0, \sigma^2)$ affects the location parameter, as follows:

$$\lambda_i(\theta_i) = \exp\left(\beta_0 + \beta' X_i^{DU} + \gamma' X_i^{EU} + \delta' (X_i^{DU} \times X_i^{EU}) + \eta' X_i^{C} + \theta_i\right). \tag{8}$$

Then the log-likelihood conditional on $\theta_1, \dots, \theta_N$ can be specified as

$$\text{LogL}(\theta_1, \dots, \theta_N) = \sum_{i=1}^{N} \Big\{ C_i \ln\big[\pi_i f\big(t_i \mid \lambda_i(\theta_i), p\big)\big] + (1 - C_i) \ln\big[1 - \pi_i + \pi_i S\big(t_i \mid \lambda_i(\theta_i), p\big)\big] \Big\}. \tag{9}$$
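The split-hazard likelihood with covariates and heterogeneity can be sketched for one project at a time in Python, together with the simulation step the text turns to next. Several pieces here are assumptions rather than the authors' specification: a Weibull baseline, a logistic link for the release probability $\pi_i$ with hypothetical coefficients `alpha` (the text says only that $\pi_i$ depends on the covariates), and a covariate list `x` that already contains the constant and any $X^{DU} \times X^{EU}$ products, so a single hypothetical coefficient list `beta` stands in for $\beta_0, \beta, \gamma, \delta, \eta$. To integrate out $\theta_i$, the sketch averages the conditional likelihood (not its logarithm) over draws and then takes logs, which is the usual simulated maximum likelihood estimator (Train 2003).

```python
import math
import random


def logistic(z: float) -> float:
    """Assumed link for the eventual-release probability pi_i."""
    return 1.0 / (1.0 + math.exp(-z))


def dot(coefs: list[float], x: list[float]) -> float:
    return sum(b * v for b, v in zip(coefs, x))


def split_hazard_loglik(t, c, x, beta, alpha, p, theta=0.0):
    """Conditional log-likelihood of one project (Eqs. 6 and 9) under a Weibull baseline.

    lambda_i(theta_i) = exp(x'beta + theta_i), as in Eq. (8).
    c = 1 (released): ln pi + ln f(t) = ln pi + ln h(t) + ln S(t)
    c = 0 (censored): ln(1 - pi + pi * S(t))
    """
    lam = math.exp(dot(beta, x) + theta)
    pi = logistic(dot(alpha, x))
    log_surv = -((lam * t) ** p)
    if c:
        log_h = math.log(lam * p) + (p - 1) * math.log(lam * t)
        return math.log(pi) + log_h + log_surv
    return math.log(1.0 - pi + pi * math.exp(log_surv))


def simulated_loglik(data, beta, alpha, p, sigma, draws=200, seed=0):
    """For each project, average the conditional likelihood over R draws
    theta_ir ~ N(0, sigma^2), take the log, and sum over projects."""
    rng = random.Random(seed)
    total = 0.0
    for t, c, x in data:
        lik = sum(
            math.exp(split_hazard_loglik(t, c, x, beta, alpha, p, rng.gauss(0.0, sigma)))
            for _ in range(draws)
        ) / draws
        total += math.log(lik)
    return total


# Hypothetical data: (hours, C_i, [constant, standardized embeddedness]).
data = [(120.0, 1, [1.0, 0.3]), (800.0, 1, [1.0, -0.5]), (1000.0, 0, [1.0, -0.2])]
beta = [-6.0, 0.5]
alpha = [0.5, 1.0]
print(simulated_loglik(data, beta, alpha, p=1.2, sigma=0.5))
```

In estimation, this simulated log-likelihood would be maximized over `beta`, `alpha`, `p`, and `sigma` with the random draws held fixed (hence the fixed `seed`), so that the objective is a smooth function of the parameters.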
The timing process model in Equation (9) incorporates heterogeneity as a function of project characteristics in both the time to product release and the probability of eventual product release (Dekimpe, et al. 1998). To obtain the unconditional log-likelihood, we integrate $\theta_i$ out of the conditional likelihood underlying Equation (9), such that

$$\text{LogL} = \sum_{i=1}^{N} \ln \int \left[\pi_i f\big(t_i \mid \lambda_i(\theta_i), p\big)\right]^{C_i} \left[1 - \pi_i + \pi_i S\big(t_i \mid \lambda_i(\theta_i), p\big)\right]^{1 - C_i} \phi(\theta_i)\, d\theta_i, \tag{10}$$

where $\phi$ is the $N(0, \sigma^2)$ density of $\theta_i$. Because the integral in Equation (10) does not have a closed form, we compute it by simulating draws of $\theta_i$ and averaging the conditional likelihood across the draws before taking logs (e.g., Train 2003). Simulated maximum likelihood has been used effectively in other studies with estimation problems similar to ours (e.g., Erdem and Keane 1996, Park and Gupta 2009, Villas-Boas and Winer 1999). The simulated log-likelihood is

$$\text{LogL}_s = \sum_{i=1}^{N} \ln \left\{ \frac{1}{R} \sum_{r=1}^{R} \left[\pi_i f\big(t_i \mid \lambda_i(\theta_{ir}), p\big)\right]^{C_i} \left[1 - \pi_i + \pi_i S\big(t_i \mid \lambda_i(\theta_{ir}), p\big)\right]^{1 - C_i} \right\}, \tag{11}$$

where $\theta_{ir}$, $r = 1, \dots, R$, are $R$ simulated draws for project $i$.
3.4.
Timing Distribution Specification
The timing process model in Equations (1)–(11) can accommodate a range of timing distributions, such as the exponential (constant hazard), the Weibull (monotonic hazard), and the log-normal (nonmonotonic hazard). We have no a priori theoretical reason to prefer one of these distributions over the others, so we use model fit to select among them.
4.
Research Context and Data
4.1.
SourceForge.net: Data Context The SourceForge.net hosting platform is a collaborative product development platform that enables users
to coordinate their open source product development efforts (Hahn, et al. 2008). As of December 2010, more than 2.7 million users worked on approximately 260,000 open source projects, including database tools, application software, games, text and programming editors, utility tools, and so forth. The platform’s collaborative capabilities enable a project founder to advertise and recruit volunteer users and organize the development of the product via the Internet. Projects make their source code public, so any registered user can download the source code for his or her own use. Prior studies also gather data from SourceForge (e.g., Grewal, et al. 2006, Hahn, et al. 2008), and von Hippel and von Krogh (2006) cite it as an attractive venue for research into user communities and hybrid innovation models. We gathered data about the network structure of OSS projects at their founding, as well as the projects’ characteristics, from a sample of open source projects registered on SourceForge. Our access to these data proceeded through the SourceForge data warehouse, which records and organizes all activities on the site (Madey 2005). The data warehouse is a separate entity from the Web site and provides a repository of activities taking place on SourceForge and is maintained by researchers at the University of Notre Dame. To avoid inconsistencies and confounds from different start times and obtain the longest observation timeframe possible, we selected only new projects initiated during the first 10 days of January 2003, or earliest available data in the data warehouse. We thus identified 817 new projects, which we tracked for 42 months. Market entry occurs upon the first file release by the project, as recorded in the data warehouse. We tabulated such events for the projects in our sample and calculated the time to first release (registration to release) in hours. During our observation window, 468 projects resulted in product releases, with the times plotted in 13
Figure 3. The remaining 349 projects did not report a product release during the 42-month observation window.

4.2. The Network of User-Generated Products and Founders
Because data about the network structure of the projects are not directly available on SourceForge, we
constructed the network structure at founding from user-project membership information. The data warehouse stores monthly snapshots of all activity on the SourceForge Web site, so we accessed the database tables for January 2003, which represent the cumulative history from 1999–2003; subsequent monthly snapshots are available beginning in November 2004. We accessed data about project-user memberships from the data warehouse and obtained details about the founding users of the 817 focal projects from the user-project membership table for January 2003, downloading the relational data in list format. We then transformed the relational data (i.e., who founded which project) into network structure data, represented by an affiliation network matrix (Wasserman and Faust 1999), using the network analysis software UCINET 6. An affiliation network matrix depicts actors (i.e., users) as rows and events (i.e., projects) as columns; each cell takes a value of 1 if the user works on the project and 0 otherwise (see Figure A.1 in Appendix A). Because our objective is to study newly founded projects initiated in a connected network, we applied a snowballing procedure (Wasserman and Faust 1999): we listed all projects whose development efforts were formally initiated in the first 10 days of January 2003 and then listed all 966 developer-users working on these initiatives. We next listed the other (existing) projects on which these 966 developer-users had worked, excluding the first set of 817 projects, and identified 624 additional projects. Finally, we listed all developer-users working on these 624 projects and thereby identified 1,723 additional individuals. This procedure produced a network of 1,441 projects (817 + 624) and 2,689 developer-users (966 + 1,723). We again transformed these relational data into an affiliation network, in which a 1 in a cell indicates that the user identified by the row works as a contributor on the project identified by the column.
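The affiliation-matrix construction and the two-wave snowball procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the UCINET 6 workflow the study used. Memberships are assumed to arrive as (user, project) pairs, like rows of the data warehouse's user-project membership table. The function names and the toy identifiers are hypothetical.

```python
def affiliation_matrix(memberships):
    """Binary user-by-project affiliation matrix from (user, project) pairs.

    Rows are users (actors) and columns are projects (events); a cell is 1 if
    the user works on the project and 0 otherwise.
    """
    users = sorted({u for u, _ in memberships})
    projects = sorted({p for _, p in memberships})
    row = {u: i for i, u in enumerate(users)}
    col = {p: j for j, p in enumerate(projects)}
    matrix = [[0] * len(projects) for _ in users]
    for u, p in memberships:
        matrix[row[u]][col[p]] = 1
    return users, projects, matrix


def snowball(memberships, seed_projects):
    """Two-wave snowball sample around a set of newly founded seed projects."""
    seeds = set(seed_projects)
    # Wave 1: every developer-user working on a seed project.
    wave1_users = {u for u, p in memberships if p in seeds}
    # Other (existing) projects of those users, excluding the seeds.
    extra_projects = {p for u, p in memberships if u in wave1_users} - seeds
    # Wave 2: developer-users on the extra projects who are not already listed.
    extra_users = {u for u, p in memberships if p in extra_projects} - wave1_users
    users = wave1_users | extra_users
    projects = seeds | extra_projects
    # Keep only ties between sampled users and sampled projects.
    sample = [(u, p) for u, p in memberships if u in users and p in projects]
    return sample, wave1_users, extra_projects, extra_users


# Hypothetical toy membership list of (user, project) pairs.
memberships = [
    ("u1", "new1"), ("u2", "new1"), ("u2", "old1"),
    ("u3", "old1"), ("u3", "old2"), ("u4", "new2"),
]
sample, wave1_users, extra_projects, extra_users = snowball(memberships, ["new1", "new2"])
users, projects, matrix = affiliation_matrix(sample)
print(projects)  # ['new1', 'new2', 'old1']
for user, cells in zip(users, matrix):
    print(user, cells)
```

In the study's terms, `wave1_users` plays the role of the 966 developer-users on the 817 new projects, `extra_projects` the 624 additional projects, and `extra_users` the 1,723 additional individuals. Ties to projects outside the two waves (here, `old2`) fall outside the sampled network.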
In this matrix, a 0 in a cell indicates the absence of such a relationship. This matrix captures all relationships defined by the founders of the original 817 new initiatives. We present a visual representation of the largest connected component of the network sample in Figure 3.
[Insert Figure 3 about here]

5. Measures
5.1. Time to Product Release
We use the time until the first release of the product as our dependent variable. Of the 817 projects, 468
(57.3%) experienced a product release and thus provide an observed time to product release. We treat the remaining 349 projects (42.7%) as right-censored observations. Because we measure time in hours, we employ a continuous-time modeling approach. Because the dependent variable is a duration and can take only positive values, we log-transform it, a common and recommended practice in the hazard/duration modeling literature (Kalbfleisch and Prentice 1980, Kiefer 1988, Sinha and Chandrashekaran 1992).

5.2. Project Founders’ Social Capital
Scholars have used various operationalizations of embeddedness, depending on the context, but the most
widely employed measures include degree centrality, betweenness centrality, and closeness centrality (Ronchetto, et al. 1989); we employ all three. Degree centrality is the number of existing projects connected to a given project founder in the network; in our study context, it captures the project founders’ diverse contacts in the network. Betweenness centrality reflects the extent to which a network member can mediate flows of information. It is based on the number of shortest paths between pairs of other network members that pass through the founder, and thus represents the ease with which the founder can facilitate the transfer of resources over the network. Finally, to assess the efficiency of access to other locations through the shortest paths, we use closeness centrality. This measure is based on the sum of the shortest-path distances between the project founder and all other projects in the network. A larger sum means the founder relies more on other network members to reach the rest of the network (Ronchetto, et al. 1989). Because a higher value of this summed (normalized) measure thus indicates a less embedded founder, we take its inverse. To measure brokerage, we follow prior research (e.g., Ahuja 2000, Fleming and Waguespack 2007) and apply a version of what Burt (1992) calls "constraint," or the degree to which a network member depends on
directly connected neighbors to connect to others in the network. Brokerage, the inverse of constraint, therefore reflects the opportunity available to the founder to form new connections in the network. Because higher constraint implies lower brokerage, we inverse code this measure. When a project has multiple founders, we average their scores to form a composite measure. We provide the intuition underlying the measures in Figure 2 and the technical details regarding their calculation in Appendix A.

5.3. Interplay of Developer-Users and End-Users
The number of bugs reported provides our measure of the degree of user engagement in the project
(Grewal, et al. 2006). This measure is highly correlated with the number of bugs resolved. It therefore also partly captures product quality, because the number of bugs resolved indicates the degree to which problems in the product have been addressed. Previous research in OSS product development shows that more intense user engagement can benefit the development process and lead to better products (Crowston, Annabi and Howison 2003, Lee, et al. 2009). We downloaded the pertinent data for the projects in our sample from the concurrent versioning system history table in the data warehouse. Because the number of bugs reported varies over time, we obtained the data at six points in time: November 2004, February 2005, June 2005, October 2005, February 2006, and June 2006. This variable is the time-varying covariate in the model. Following prior research on OSS product development (e.g., Subramaniam, et al. 2009), we also characterize projects by their intended audience as technical or nontechnical. Project managers categorize their projects as developer-focused or user-focused, so we constructed a dichotomous measure from information in the project details table in the data warehouse, coded 1 for a user-focused project and 0 for a developer-focused project.

5.4. Control Variables
Project founders develop coordination and product development skills through interactions in the open
source community (e.g., O'Mahony and Ferraro 2007), and greater experience might enable them to foresee and respond to problems better (e.g., Chandrashekaran, Mehta, Chandrashekaran and Grewal 1999), which would benefit the product development process. Therefore, we measure founders’ experience as the
cumulative time since they registered in the open source community. When a project has multiple founders (as do 34% of the sample projects), we sum their experience. From the project details table in the data warehouse, we also obtained data about whether the project development process relies on the community forum for communication with users (Madey 2005). This control variable therefore indicates whether a project uses the community feature provided by the Web site. Projects that use the forum may benefit from it compared with projects that do not, so we control for this effect. This information is recorded automatically in the data warehouse: 1 if the project uses a forum and 0 if not. In the OSS setting, different licenses grant users various degrees of control over the source code (Stewart, et al. 2006, Subramaniam, et al. 2009). The most common licenses include the GNU General Public License (GPL) and the less restrictive GNU Lesser General Public License (LGPL). Because the GPL is the most widely used, we code license type as a dichotomous variable, where 1 indicates use of the GPL and 0 otherwise. We obtained data about the size of the released project files from the project details table in the SourceForge data warehouse. After identifying the release times of these files, we determined their sizes, measured in megabytes. Similarly, we obtained data about the number of project members from the project details table for January 2003 and use this number to control for the effect of team size on the time to product release. Finally, some projects release products immediately after registration because the development team had been working on the source code before registering on the site. To control for this prior code availability, we obtained data about the status of the project at the time of its registration from the project details table.
A dichotomous variable indicates whether prior code was available at the time of registration (e.g., Subramaniam, et al. 2009), equal to 1 if prior code was available and 0 otherwise.

5.5. Descriptive Statistics
We present the descriptive statistics of the variables in Table 1. A majority (52%) of the projects in our
sample were nontechnical (user-focused), and 33% used the GPL. Approximately 15% of the
projects had prior code at the time of registration, and 89% used the community forum. The average project gathered 3.55 bug reports during the observation window and had a team size of approximately two (M = 1.82) at the end of the first month (including founders and non-founders). Its founders had 112 days of experience. Of the projects that released a product file, the average file size was 1,207 MB, and the average release occurred after 487 hours, or roughly 20 days.
[Insert Table 1 about here]

6. Results
6.1. Model Selection
Following Sinha and Chandrashekaran (1992), we use a two-step procedure to select the appropriate
functional form for the final model (for details, see Appendix B). First, we chose a functional form for the timing distribution. The fit criteria indicated that the model with a log-logistic distribution (model M2) fit the duration data better than the model with a lognormal distribution (model M1; BIC(M1) = 4375.70, CAIC(M1) = 4408.70; BIC(M2) = 3958.92, CAIC(M2) = 3991.92).² Second, within the log-logistic model, we compared probit (model M3) and logit (model M4) specifications for the probability of eventual product release in the split-hazard specification (Schmidt and Witte 1989; Sinha and Chandrashekaran 1992). The logit specification fits the data better (BIC(M3) = 3878.46, CAIC(M3) = 3942.46; BIC(M4) = 3797.78, CAIC(M4) = 3861.78). For this specification (log-logistic hazard, logit link for the probability of eventual release), we also find a statistically significant improvement when we account for unobserved heterogeneity (model M5). The final model (M5) outperforms the best model from the second stage (M4) on both fit criteria (BIC(M5) = 3796.16, CAIC(M5) = 3861.16). We estimate model M5 by simulated maximum likelihood, using a sequence of 30 Halton draws rather than pseudo-random standard normal draws³ (Bhat 2003, Train 2003). Table 2 presents the results from this final model.

² BIC = Bayesian information criterion; CAIC = consistent Akaike information criterion.
³ Bhat (2003) finds, in a simulation study, that Halton draws reduce computation time without materially affecting the simulation results. The results are robust when the number of Halton draws is slightly higher than the square root of the number of observations. We use 30 draws for our sample size of 817 (√817 ≈ 28.6).
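The following Python sketch makes the split-hazard structure and the fit criteria concrete. It writes down the log-likelihood of a split-population model with a log-logistic duration and a logit probability of eventual release, in the spirit of Schmidt and Witte (1989). It is an illustration under simplifying assumptions, not the estimation code behind Table 2. It omits the time-varying covariate and the unobserved heterogeneity term of model M5, and the parameterization (log T = x'β + σε, with ε standard logistic) is our assumption. The information criteria use the standard definitions BIC = -2LL + k ln n and CAIC = -2LL + k(ln n + 1). Under these definitions, CAIC - BIC equals the number of estimated parameters k.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(coefs, values):
    return sum(c * v for c, v in zip(coefs, values))


def split_loglogistic_loglik(times, released, x, z, beta, gamma, sigma):
    """Log-likelihood of a split-population log-logistic duration model.

    times:    durations in hours (time observed so far for censored projects)
    released: 1 if a first release was observed, 0 if right-censored
    x, z:     covariate rows for the duration part and the logit part
    Assumed form: log T = x'beta + sigma * e, with e standard logistic, and
    P(eventual release) = logistic(z'gamma).
    """
    ll = 0.0
    for t, d, xi, zi in zip(times, released, x, z):
        delta = logistic(dot(gamma, zi))            # probability of eventual release
        u = math.exp((math.log(t) - dot(beta, xi)) / sigma)
        survival = 1.0 / (1.0 + u)                  # S(t) for eventual releasers
        density = u / (sigma * t * (1.0 + u) ** 2)  # f(t) = -dS/dt
        if d:
            ll += math.log(delta * density)
        else:
            # Censored: the project either never releases or releases after t.
            ll += math.log(1.0 - delta + delta * survival)
    return ll


def bic(loglik, k, n):
    return -2.0 * loglik + k * math.log(n)


def caic(loglik, k, n):
    return -2.0 * loglik + k * (math.log(n) + 1.0)


# Under these definitions CAIC - BIC = k, so each reported pair implies k.
reported = {"M1": (4375.70, 4408.70), "M2": (3958.92, 3991.92),
            "M3": (3878.46, 3942.46), "M4": (3797.78, 3861.78),
            "M5": (3796.16, 3861.16)}
implied_k = {m: round(c - b) for m, (b, c) in reported.items()}
print(implied_k)  # {'M1': 33, 'M2': 33, 'M3': 64, 'M4': 64, 'M5': 65}
```

If the standard definitions apply, the reported criteria imply k = 33 for M1 and M2, 64 for M3 and M4, and 65 for M5. That pattern is consistent with M5 adding a single heterogeneity parameter to M4.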
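A Halton sequence replaces pseudo-random numbers with a deterministic, evenly spread sequence in (0, 1). That sequence is then mapped to standard normal draws through the inverse normal CDF. The sketch below shows the radical-inverse construction for a prime base, plus a simulated likelihood that averages a heterogeneity-conditional likelihood over the draws. The base (2), skipping no initial elements, the example conditional likelihood, and the function names are all assumptions for illustration. Train (2003) discusses refinements, such as discarding the first few elements and giving each observation its own segment of the sequence.

```python
import math
from statistics import NormalDist


def halton(n, base=2, skip=0):
    """Return n Halton points in (0, 1) for a prime base, after skipping `skip`."""
    draws = []
    for index in range(skip + 1, skip + n + 1):
        fraction, value, k = 1.0, 0.0, index
        while k > 0:
            fraction /= base
            value += fraction * (k % base)  # radical inverse: mirror the base-b digits
            k //= base
        draws.append(value)
    return draws


def normal_halton(n, base=2, skip=0):
    """Map Halton points to standard normal draws through the inverse normal CDF."""
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf(u) for u in halton(n, base, skip)]


def simulated_likelihood(conditional_lik, draws):
    """Average a likelihood that is conditional on the heterogeneity draw v."""
    return sum(conditional_lik(v) for v in draws) / len(draws)


R, N = 30, 817
print(R >= math.sqrt(N))  # True: sqrt(817) is about 28.6
draws = normal_halton(R)

# Hypothetical conditional likelihood: a logit probability with a random intercept v.
prob = simulated_likelihood(lambda v: 1.0 / (1.0 + math.exp(-(0.5 + 0.8 * v))), draws)
print(round(prob, 3))
```

Because the points are spread evenly rather than randomly, a few dozen draws can cover the heterogeneity distribution well enough for estimation. This is the property Bhat (2003) exploits.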
[Insert Table 2 about here]

6.2. Eventual Product Release
The results from the logit portion of the split-hazard formulation reveal which projects eventually lead to
a product release (see Table 2). As degree centrality increases, the probability of eventual product release increases at an increasing rate (i.e., positive quadratic term; b = .04, p < .01). As betweenness centrality increases, this probability decreases (i.e., negative linear term; b = -.003, p < .10). When brokerage increases, the probability of eventual release increases at a decreasing rate (i.e., positive linear, negative quadratic terms; b = 1.36, p < .01 and b = -.60, p < .01). Moreover, betweenness centrality (b = .17, p < .01) and closeness centrality (b = .83, p < .01) strengthen the positive effect of brokerage on the probability of eventual product release. Nontechnical projects are more likely to experience product release than are user-focused projects (b = .69, p < .01), though projects that use the GPL are less likely to do so (b = -.33, p < .05). The probability of eventual product release increases with founders’ experience (b = .21, p < .01), and using the community forum increases the probability of eventual product release (b = .60, p < .05). The results for the interplay of the developer-user and end-user communities reveal that brokerage increases the probability of eventual release at a faster rate for user-focused products than for developer-focused products (b = .21, p < .10).

6.3. Time to Product Release
For this section, we turn to the results from the hazard portion of the split-hazard formulation (see Table 2).

6.3.1. Project Founders’ Embeddedness and Brokerage
In support of H1, we find that time to product release decreases at a decreasing rate with greater embeddedness, measured by both degree (i.e., negative linear, positive quadratic effects; b = -.10, p < .10 and b = .01, p < .01) and closeness (i.e., negative linear, positive quadratic effects; b = -.56, p < .01 and b = .39, p < .01) centrality. In contrast, time to product release increases at a decreasing rate as betweenness centrality increases (positive linear, negative quadratic effect; b = .005, p < .01 and b = -.19, p < .01).
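The duration part of the model specifies log time as linear in the covariates. A negative linear coefficient b1 paired with a positive quadratic coefficient b2 therefore gives a slope, b1 + 2·b2·x, that shrinks in magnitude as x grows and changes sign at x* = -b1/(2·b2). The Python sketch below applies this to the degree and closeness coefficients reported above. It is illustrative only. It holds the variables that interact with embeddedness at zero, and it assumes the coefficients apply to the measures as scaled in estimation, which this section does not detail.

```python
def turning_point(b_linear, b_quadratic):
    """x at which b_linear * x + b_quadratic * x**2 reaches its extremum."""
    return -b_linear / (2.0 * b_quadratic)


def slope(b_linear, b_quadratic, x):
    """Marginal effect of x on the log-time index: b_linear + 2 * b_quadratic * x."""
    return b_linear + 2.0 * b_quadratic * x


# Duration-part coefficients reported in the text: (linear, quadratic).
effects = {"degree centrality": (-0.10, 0.01), "closeness centrality": (-0.56, 0.39)}
for name, (b1, b2) in effects.items():
    print(f"{name}: slope at 0 = {slope(b1, b2, 0.0):.2f}, "
          f"turning point = {turning_point(b1, b2):.2f}")
```

Read this way, the speed-up from additional degree centrality is largest at low values and fades as degree centrality approaches 5, which is within one standard deviation of the Table 1 mean (2.03, SD 3.55). For closeness centrality, the corresponding point is about 0.72. If the measures were centered or rescaled before estimation, the numbers would change but the logic would not.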
These findings related to betweenness centrality merit discussion. They are consistent with other studies that report variation in the strength and direction of relationships involving betweenness centrality (e.g., Ronchetto, et al. 1989; Fleming and Waguespack 2007). Betweenness centrality measures a founder’s location on the shortest pathways in the network, so the information flowing along these pathways may overburden the founders who occupy them (e.g., Rosa, et al. 1999). In this case, such founders would slow the NPD process and lengthen the time to product release. Consistent with H2 and extant research on diminishing returns to (social) capital (e.g., Elfenbein and Zenger 2009, McFadyen and Cannella Jr. 2004), as brokerage increases, the time to product release decreases at a decreasing rate (i.e., negative linear, positive quadratic effects; b = -.92, p < .01 and b = .50, p < .01). Thus, brokerage offers benefits, but the rate of return decreases as the level of brokerage increases.

6.3.2. Embeddedness and Brokerage
In support of H3 regarding the synergy between embeddedness and brokerage in an OSS product development context, we find that the decrease in the time to product release associated with greater embeddedness becomes larger with greater brokerage. However, the effect is significant only for embeddedness measured by degree centrality (b = -.42, p < .01) and closeness centrality (b = -.68, p < .01).⁴ We display the interaction between embeddedness and brokerage in Figure 4. Panel A represents the interaction between degree centrality and brokerage, and Panel B the interaction between closeness centrality and brokerage. Panel A shows that the time to product release decreases faster as brokerage increases when degree centrality is high than when it is low, as indicated by the steeper negative slope in the high degree centrality condition. Panel B likewise shows a steeper negative slope when closeness centrality is high: the time to product release decreases faster as brokerage increases than when closeness centrality is low. Therefore, projects whose founders are endowed with both high embeddedness and high brokerage get to market sooner than projects founded by people who score high on only one of the social capital measures.

⁴ Ai and Norton (2003) indicate that in nonlinear models, the coefficient of an interaction term does not equal the marginal interaction effect. However, Greene (2010, p. 295) notes that “partial effects are neither coefficients nor elements of the specification of the model. They are implications of the specified and estimated model.” The marginal effects are an artifact of the specification of the functional form of the mean function. Therefore, for our hypothesis tests, following Greene (2010), we use the interaction effects estimated in the model, as reported in Table 2.
[Insert Figure 4 about here]

6.3.3. Interplay of Developer-Users and End-Users
As the degree of user engagement for a project increases, the time to product release shortens (b = -.05, p < .01). Consistent with H4b, we find that the decrease in the time to product release associated with greater user engagement intensifies as embeddedness increases (degree centrality b = -.09, p < .01; betweenness centrality b = -.05, p < .05). Thus, the developer-user community leverages the activity of the end-user community in the OSS context. Consistent with H5b, the reduction in the time to product release associated with greater embeddedness is larger for user-focused projects than for developer-focused projects (degree centrality b = -.01, p < .01; closeness centrality b = -.25, p < .10). In contrast, as betweenness centrality increases, the reduction in the time to product release is greater for developer-focused projects than for user-focused projects (b = .35, p < .05).

6.3.4. OSS Project and Product Characteristics
As the project founders’ experience increases, the time to product release decreases (b = -.15, p < .01), and projects that use forums experience a shorter time to product release than projects that do not use them (b = -.80, p < .01). As expected, projects with prior code at the time of registration on SourceForge release products sooner than projects without such code (b = -.69, p < .01). Finally, the effects of the type of open source license, the size of the first release file, and the size of the project team are not significant.

6.4. Robustness Checks and Model Validation
To assess the validity and robustness of our results, in addition to the model specification tests we conducted during model selection, we vary our sampling frame and test the predictive validity of our model.

6.4.1. Sampling Frame
To rule out the possibility that our results are specific to our sampling frame, we created three random subsets from the 817 projects that constitute our sample. We then created three different samples by combining the random subsets two at a time: subsample 1 (n = 582), subsample 2 (n = 510), and
subsample 3 (n = 542). We estimated a split-hazard model with unobserved heterogeneity for each subsample. The results from all three models are consistent with those from the entire sample.

6.4.2. Predictive Validity
To assess the predictive validity of the duration model, we divided the sample into two random subsets, one for estimation (sample A: n = 410, censored = 177) and another for predictive validation (sample B: n = 407, censored = 172). We estimated two models using the sample A data. One includes only project feature variables; the other also includes the social capital variables and the interplay of the developer-user and end-user communities. We account for censoring by following Grewal, Mehta and Kardes (2004) and removing any censored observations from the holdout sample before calculating the predictive validity scores. With the sample B data, we compared the two models in terms of their root mean square error (RMSE), mean absolute error (MAE), and Theil’s U statistic (TUS) (Greene 2003). The model that includes the network measures provides better predictions than the one without them (RMSE improves by 12.4%, MAE by 17.8%, and TUS by 11.3%). We also validate the model’s ability to predict eventual product release (i.e., the logit part of the split-hazard model) for the overall sample (Sinha and Chandrashekaran 1992). With a correct prediction rate of .82, our model specification outperforms the naïve criteria Cpro (= .51) and Cmax (= .57) advanced by Morrison (1969).⁵

7. Discussion
As the economic significance of user-generated content continues to grow for firms, it becomes critical to
understand how user-generated content is created, managed, and used in various commercial contexts. We study the factors that drive the success of OSS NPD projects, an important user-generated content domain. We conceptualize the interplay of two communities, developer-users and end-users, which provides the basis for creating and maintaining user-generated content. We find that the time to product release of new OSS projects is influenced by the project founders’ social capital, aspects of the interplay of the developer-user and end-user communities, and OSS project and product characteristics.

⁵ Cpro is Morrison’s (1969) proportional chance criterion, p² + (1 - p)², where p is the proportion of projects that release a product (here, 468/817 ≈ .57). It gives the proportion of projects expected to be classified correctly by chance. Cmax is the maximum chance criterion, the proportion of projects in the larger group (here, projects that release a product).

7.1. Theoretical Contributions
In Table 3, we summarize our study’s findings with respect to extant research in innovation, social
networks, and OSS product development. Several of our hypotheses are new, and research on the specific factors that influence NPD success in an OSS context, as well as on the role of social networks, remains sparse. Grewal et al. (2006) consider the role of networks in the technical and commercial success of OSS products, and Hahn et al. (2008) examine how prior ties affect the formation of new OSS project teams. Other studies have investigated the role of OSS project features in success (Stewart et al. 2006, Subramaniam et al. 2009). We extend this body of literature by studying three sets of factors simultaneously, namely founders’ social capital, the interplay of the developer-user and end-user communities, and project and product characteristics, along with their effects on time to product release.
[Insert Table 3 about here]
Emerging literature in OSS product development also recognizes the importance of social networks (e.g., Hahn, et al. 2008); we extend this research in two ways. First, rather than studying product outcomes such as commercial success (Grewal, et al. 2006), we consider the NPD process, using time to first product release as an NPD process metric. Second, together with embeddedness, which has been studied previously in the OSS domain (Grewal, et al. 2006), we consider brokerage, which captures founders’ access to diverse information sets and thus their creativity (e.g., Burt 2004, Obstfeld 2005). Our results regarding the nonlinear main effect of brokerage and the interaction between embeddedness and brokerage (Figure 4) support including both measures in our OSS conceptualization. Our work also explicitly recognizes two subcommunities, their different roles, and their interplay. This recognition is important for two reasons. First, it is central to the OSS NPD process and common to other user-generated contexts, such as pictures on Flickr and videos on YouTube.
Second, it contrasts with the firm-centric NPD process. Though firms may solicit user feedback during the NPD process, the level and intensity of such feedback is a managerial decision rather than one driven by users.

7.2. Managerial Implications
Our research has particular implications for managers of firms that actively participate in open source projects (e.g., IBM). Project founders’ social capital plays a critical role, beyond the interplay of the developer-user and end-user communities or the impact of OSS project and product characteristics. Suppose IBM were to choose between two employees, say A and B (actual members of SourceForge), to found a project. It should consider their embeddedness scores, which reveal that user A is likely to bring a project to market almost a full year (357 days) before user B. Similar comparisons could use other data about the user and the project, together with our model and its estimated parameters. For example, projects achieve a 51% decrease in their time to product release when the founder has high embeddedness (degree centrality in particular) and high brokerage. The decrease is only 12% when the founder has high degree centrality alone (see Figure 4, Panel A). Furthermore, recognizing the existence of two communities allows managers to find new ways to speed products to market. They might offer more product versions targeted toward end-users to encourage end-user participation (e.g., as beta-testers) and thus increase bug reports. All else being equal, an increase of one standard deviation in bugs reported (42 bugs), or user engagement, reduces time to product release for the average project by 11%. Finally, the effects of OSS project and product characteristics have managerial relevance. Experience pays off, and our findings help quantify that payoff: a one standard deviation increase (54 days) in the cumulative experience of project founders reduces the time to product release for the average project by 15%. Our results also can help managers justify the use of community collaboration tools: projects that use forums experience an 80% shorter time to product release than do projects that do not use forums.

7.3. Limitations and Research Opportunities
Our research centers on one user-generated content domain, without addressing other domains such as
blogs or wikis, which may not rely on the same success metrics or network dynamics. Because this study features only SourceForge, with its unique institutional idiosyncrasies, its findings require testing in other open source project domains. Our research is further limited by data availability and contains only structural data; we observe the connections but not what they comprise (e.g., communication patterns, formal or informal
hierarchies). Information about the type and quality of communication might enrich our understanding of the process. We also believe that time to product release is a sound NPD metric, but our results necessarily reflect our use of that metric; other outcomes merit investigation, data permitting. It would be interesting to investigate whether and how our two-subcommunity characterization applies to other user-generated contexts. It also would be useful to determine how our findings regarding the role of social capital in innovation apply more broadly to the role of social networks in other domains (e.g., Stephen and Toubia 2010). For example, bloggers typically link to other popular bloggers, and active Facebook users tend to have large social networks. Investigating the mechanics of network formation and growth in these contexts should enable managers of online social networking sites to develop new ways to manage and monetize traffic on their sites.

7.4. Conclusion
In the modern networked world, innovative models of communication and collaboration emerge rapidly,
creating the need to investigate various phenomena (Reibstein, Day and Wind 2009). The emergence of user-generated content, for example, is changing how consumers share information, form communities, and connect, which has serious implications for marketing strategy. The OSS context represents one such important phenomenon. As we show, the social networks of developer-users and the interplay between the developer-user and end-user communities have critical impacts on NPD process success. Social network analysis provides effective concepts and tools to address these new opportunities, and the emerging networked environment presents many interesting and important challenges for research. With this study, we hope to contribute to a better understanding of this user-generated content domain.
Table 1
Descriptive Statistics and Bivariate Correlations (n = 817)

Variable                                 Mean         SD   (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)   (9)  (10)  (11)  (12)  (13)
(1) Brokerage                             .30        .33     1
(2) Degree centrality                    2.03       3.55  -.34     1
(3) Betweenness centrality              48.35     266.02  -.14   .18     1
(4) Closeness centrality                  .03          1  -.62   .43   .19     1
(5) Product positioning                   .52        .50   .09  -.03   .01  -.07     1
(6) Type of license                       .33        .47  -.09   .05   .07   .07  -.08     1
(7) Project registration status           .15        .36   .01  -.02  -.05   .03  -.01  -.05     1
(8) Bugs reported                        3.55      42.24   .02  -.01  -.01  -.02   .03  -.02  -.01     1
(9) Project team size                    1.82       2.27  -.06   .03  -.03  -.01  -.09   .17  -.03  -.02     1
(10) Founder’s experience (hours)     2683.63    1303.23  -.23   .14   .08   .34   .03   .05   .07  -.00  -.04     1
(11) Forum usage                          .89        .31  -.09   .04   .04   .08  -.01   .03  -.10  -.13  -.01   .01     1
(12) File size at release (MB)        1207.13   18799094  -.09   .06   .06   .07   .02   .08   .02   .02  -.03   .07  -.04     1
(13) Time to product release (hours)   487.34      7.956  -.12  -.21  -.04  -.09  -.01   .04  -.05   .02   .02  -.44  -.04  -.03     1
Table 2
Effects of Social Network Variables and Product Characteristics (N = 817)

                                                  Time to Product     Probability of Eventual
                                                  Release             Product Release
Constant                                          6.89*** (1.92)      4.36 (5.74)
Founder’s social capital
  Embeddedness
    Degree centrality (DC)                        -.10* (.07)         -.04 (.13)
    DC²                                           .01*** (.003)       .04*** (.01)
    Betweenness centrality (BC)                   .01*** (.02)        .01 (.01)
    BC²                                           -.02*** (.01)       -.12 (.133)
    Closeness centrality (CC)                     -.56** (.33)        .61* (.47)
    CC²                                           .39*** (.17)        -.08 (.11)
  Brokerage
    Brokerage (B)                                 -.92*** (.18)       1.14*** (.36)
    B²                                            .50*** (.14)        -.54*** (.11)
Interaction between developer-users and end-users
  Degree of user engagement (DU)                  -.05*** (.02)       .05 (.04)
  Product positioning (PP)                        .22 (.22)           -.69** (.18)
Interaction between embeddedness and brokerage
  DC * B                                          -.42*** (.18)       .18 (.21)
  BC * B                                          .02 (.03)           .17*** (.01)
  CC * B                                          -.68*** (.19)       .83*** (.23)
Interactions of founder’s social capital with interplay of developer-users and end-users
  DC * DU                                         -.09*** (.04)a      -.02 (.03)
  BC * DU                                         -.05** (.03)        -.00 (.00)
  CC * DU                                         -.01 (.03)          .06 (.10)
  B * DU                                          .03 (.02)           .15 (.34)
  DC * PP                                         -.7*** (.2)a        .01 (.01)
  BC * PP                                         .35** (.21)         -.21 (.24)
  CC * PP                                         -.25* (.17)         .06 (.23)
  B * PP                                          -.01 (.06)          .21* (.13)
Control variables
  Experience of founders                          -.15*** (.02)       .21*** (.03)
  Forum usage                                     -.80*** (.34)       .60** (.35)
  Type of OSS license                             -.01 (.20)          -.33** (.18)
  Size of first release file                      .00 (.001)          -NA-
  Project team size                               .00 (.001)          .03 (.04)
  Prior code                                      -.69*** (.27)       .46 (.31)
Unobserved heterogeneity parameter                -.17*** (.08)       -NA-
Scale parameter of duration distribution          1.04*** (.05)       -NA-

Notes: Standard errors in parentheses. ***p < .01. **p < .05. *p < .1.
a The coefficient and standard error were multiplied by 10² for ease of presentation.
Table 3 Summary of Results and Contributions
For each hypothesis: study findings; contribution; prior literature in OSS; prior literature in social networks or innovation.

H1: Embeddedness decreases time to product release at a decreasing rate.
Study findings: Supported (5 of 6 effects).
Contribution: Established hypothesis. Broadened role of embeddedness, to encompass its effect on a critical OSS process success measure. Presence of a nonlinear effect.
Prior literature in OSS: Grewal et al. (2006): Positive effect on technical and commercial success in OSS networks.
Prior literature in social networks or innovation: Ronchetto et al. (1989): Positive effect on organizational buying influence in B2B purchasing context. Ahuja (2000): Positive effect in the context of firm innovation output. Swaminathan and Moorman (2009): Positive effect on firm abnormal returns in marketing alliance announcements.

H2: Brokerage decreases time to product release at a decreasing rate.
Study findings: Supported (2 effects).
Contribution: New hypothesis. Brokerage has a main effect on OSS product development speed. Presence of a nonlinear effect.
Prior literature in OSS: None.
Prior literature in social networks or innovation: Nerkar and Paruchuri (2005): Positive effect of brokerage in intrafirm R&D networks. Ahuja (2000): Negative effect in the context of intrafirm networks.

H3: Brokerage strengthens the negative effect of embeddedness on time to product release.
Study findings: Supported (2 out of 3 effects).
Contribution: New hypothesis. Trade-offs between types of social capital represented via interaction effects.
Prior literature in OSS: None.
Prior literature in social networks or innovation: Nerkar and Paruchuri (2005): Positive interaction effect between brokerage and centrality in intrafirm R&D networks.

H4: Interaction between (a) embeddedness and (b) brokerage and degree of user engagement.
Study findings: (a) Supported (2 out of 3 effects); (b) Not supported.
Contribution: New hypothesis. Boundaries of social capital in the presence of heterogeneity in project features.
Prior literature in OSS: Grewal et al. (2006): Degree of user engagement as a control variable. Subramaniam et al. (2009): Community interest as a dependent variable.
Prior literature in social networks or innovation: Not applicable, because the moderator variables are specific to the OSS context.

H5: Interaction between (a) embeddedness and (b) brokerage and product positioning (user-focused vs. developer-focused).
Study findings: (a) Supported (all 3 effects); (b) Not supported.
Contribution: New hypothesis. Boundaries of social capital in the presence of heterogeneity of product characteristics.
Prior literature in OSS: Subramaniam et al. (2009): License and positioning as determinants and controls of OSS success measured by user interest, developer interest, and project activity. Stewart et al. (2006): License as a determinant of user interest and development activity.
Prior literature in social networks or innovation: Not applicable, because the moderator variables are specific to the OSS context.
Figure 1 Conceptual Framework
[Diagram: Founder’s social capital, comprising embeddedness (H1) and brokerage (H2) and their interaction (H3), affects time to product release. The interaction between developer-users and end-users, captured by degree of user engagement (H4) and product positioning (H5), moderates these effects. Control variables: experience of founders, forum usage, type of open source license, size of first release file, size of project team, prior code.]
Figure 2 Illustration of Brokerage and Embeddedness
[Diagram: OSS users 1–7 connected through shared OSS projects, as described in the notes; legend: OSS project, OSS user.]
NOTES:
(1) User 1 shares a project with users 2 and 3; user 2 shares a project with users 1 and 3; user 3 shares a project with users 1, 2, and 4; user 4 shares a project with users 3 and 5; user 5 shares a project with users 4, 6, and 7; user 6 shares a project with users 5 and 7; and user 7 shares a project with users 5 and 6. (2) User 4 works on two projects, whereas user 3 works on three projects, so user 3 is more embedded than user 4 (degree centrality; Ronchetto, Hutt and Reingen 1989). (3) User 4 brokers the relationship between the two groups, one comprising users 1, 2, and 3 and the other users 5, 6, and 7. As the sole link between these otherwise disconnected groups, user 4 scores high on the measure of brokerage compared with the other users.
Figure 3 Representation of the Network Sample of Open Source Projects and Users
Notes: This network graph was created using Pajek software (Batagelj and Mrvar 1998). The squares represent newly founded open source projects (N = 817), and the triangles are users. Consider users 142, 736, and 786 (marked by stars). User 736 is connected to two projects (33 and 156) and is the sole connection between otherwise unconnected parts of the network, which indicates a large brokerage measure. User 142 is highly embedded as part of multiple projects (475, 95, 294, 69, 164, and 598). User 786 is connected to just one project (178) on the periphery of the network and has low brokerage and embeddedness.
Figure 4 Panel A: Brokerage, Degree Centrality, and Time to Product Release Notes: Time to product release changes with brokerage at high (95th percentile) versus low (65th percentile) levels of degree centrality. It decreases by 51.07% when brokerage changes from 0 to 1 in the high degree centrality condition. It decreases by 12.18% in the low degree centrality condition. The negative effect of brokerage is more pronounced in the presence of high degree centrality.
Panel B: Brokerage, Closeness Centrality, and Time-to-Product-Release Notes: Time to product release changes with brokerage at a high (95th percentile) versus low (65th percentile) level of closeness centrality. It decreases by 21.06% as brokerage changes from 0 to 1 in the high closeness centrality condition. It decreases by 6.19% in the low closeness centrality condition. The negative main effect of brokerage is more pronounced in the presence of high closeness centrality.
Appendix A Network Location–Based Social Capital: An Illustrative Example
Consider an illustrative network N of open source projects and users.
[Diagram of network N: OSS users 1–7 connected through shared OSS projects; legend: OSS project, OSS user.]
User 1 shares a project with users 2 and 3; user 2 shares a project with users 1 and 3; user 3 shares a project with users 1, 2, and 4; user 4 shares a project with users 3 and 5; user 5 shares a project with users 4, 6, and 7; user 6 shares a project with users 5 and 7; and user 7 shares a project with users 5 and 6.
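Affiliation matrix [N] for Network N (rows: users; columns: projects)
User/Project | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0
2 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0
3 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0
4 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0
5 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0
6 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1
7 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1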
The network N can be represented in matrix form [N] by placing the users in rows and the projects in columns. A 1 indicates that a user works on a project, and a 0 indicates that the user does not. Because the numbers of users and projects generally differ, this affiliation matrix need not be square; here it is 7 × 8. The objective is to use this pattern of relationships between projects and users to calculate the network measures of the users. We construct two affiliation matrices.
The affiliation matrix of users, $X^U = NN'$, indicates co-membership patterns in projects. The affiliation matrix of projects, $X^P = N'N$, indicates shared memberships of users.
$$
N = \begin{pmatrix}
1&1&0&0&0&0&0&0\\
1&0&1&0&0&0&0&0\\
0&1&1&1&0&0&0&0\\
0&0&0&1&1&0&0&0\\
0&0&0&0&1&1&1&0\\
0&0&0&0&0&1&0&1\\
0&0&0&0&0&0&1&1
\end{pmatrix},
\quad \text{and its transpose} \quad
N' = \begin{pmatrix}
1&1&0&0&0&0&0\\
1&0&1&0&0&0&0\\
0&1&1&0&0&0&0\\
0&0&1&1&0&0&0\\
0&0&0&1&1&0&0\\
0&0&0&0&1&1&0\\
0&0&0&0&1&0&1\\
0&0&0&0&0&1&1
\end{pmatrix}.
$$
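$$
X^U = NN' = \begin{pmatrix}
2&1&1&0&0&0&0\\
1&2&1&0&0&0&0\\
1&1&3&1&0&0&0\\
0&0&1&2&1&0&0\\
0&0&0&1&3&1&1\\
0&0&0&0&1&2&1\\
0&0&0&0&1&1&2
\end{pmatrix},
\quad \text{and} \quad
X^P = N'N = \begin{pmatrix}
2&1&1&0&0&0&0&0\\
1&2&1&1&0&0&0&0\\
1&1&2&1&0&0&0&0\\
0&1&1&2&1&0&0&0\\
0&0&0&1&2&1&1&0\\
0&0&0&0&1&2&1&1\\
0&0&0&0&1&1&2&1\\
0&0&0&0&0&1&1&2
\end{pmatrix}.
$$
For example, the value of 2 in row 4/column 4 of $X^U$ implies that user 4 works on two projects, and the value of 1 in row 4/column 5 implies that user 4 shares one project with user 5.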
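A minimal sketch of the two matrix products, in plain Python so that no numerical library is assumed. The matrix `N` is transcribed from the affiliation matrix for network N. The helpers `transpose` and `matmul` are our own illustrative names, not part of the authors' procedure.

```python
# Affiliation matrix N of network N: rows = users 1-7, columns = projects 1-8
# (1 = the user works on the project, 0 = the user does not).
N = [
    [1, 1, 0, 0, 0, 0, 0, 0],  # user 1: projects 1, 2
    [1, 0, 1, 0, 0, 0, 0, 0],  # user 2: projects 1, 3
    [0, 1, 1, 1, 0, 0, 0, 0],  # user 3: projects 2, 3, 4
    [0, 0, 0, 1, 1, 0, 0, 0],  # user 4: projects 4, 5
    [0, 0, 0, 0, 1, 1, 1, 0],  # user 5: projects 5, 6, 7
    [0, 0, 0, 0, 0, 1, 0, 1],  # user 6: projects 6, 8
    [0, 0, 0, 0, 0, 0, 1, 1],  # user 7: projects 7, 8
]


def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product a x b for list-of-lists matrices."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


N_t = transpose(N)
X_U = matmul(N, N_t)  # 7 x 7: entry (i, j) = number of projects users i and j share
X_P = matmul(N_t, N)  # 8 x 8: entry (i, j) = number of users projects i and j share

print([X_U[i][i] for i in range(len(X_U))])  # [2, 2, 3, 2, 3, 2, 2]
print(X_U[3][3], X_U[3][4])  # 2 1  (user 4: two projects, one shared with user 5)
```

The diagonal of `X_U` counts the projects each user works on. The off-diagonal entries count shared projects, so the one-mode user network can be read directly from `X_U`.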
We use degree centrality (e.g., Faust 1997) as one measure of embeddedness, following similar conceptualizations in prior research (e.g., Gulati and Gargiulo 1999). Methodologically, we calculate it using the following expression (e.g., Faust 1997):
$$C_D(X_i^U) = X_{ii}^U, \tag{A.1}$$
where $X_{ii}^U$ is the ith diagonal element of the affiliation matrix $X^U$, that is, the number of projects on which user i works. We use betweenness centrality (e.g., Faust 1997, Freeman 1977) as a second measure of embeddedness. Methodologically, we calculate it using the following expression:
$$C_B(X_i^U) = \sum_{j<k} \frac{g_{jk}(i)}{g_{jk}}, \tag{A.2}$$
where $g_{jk}(i)$ is the number of shortest paths between any two actors j and k (both different from i) that pass through node i, and $g_{jk}$ is the total number of shortest paths between nodes j and k. We use closeness centrality (e.g., Faust 1997) as the third measure of embeddedness and calculate it as follows:
$$C_C(X_i^U) = \left[ \frac{\sum_{j=1}^{g} d(i,j)}{g-1} \right]^{-1}. \tag{A.3}$$
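Here $d(i, j)$ is the length of the shortest path between users i and j, and g is the number of users in the network. Intuitively, closeness is the inverse of the average shortest distance between user i and all other users in the network.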
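The two shortest-path measures (Eqs. A.2 and A.3) can be checked on network N with a breadth-first search. This sketch assumes, as the values in Table A.1 suggest, that both measures are computed on the one-mode user network, where two users are adjacent if they share at least one project. The list `PROJECTS` and the helpers `bfs`, `betweenness`, and `closeness` are hypothetical names introduced for illustration. Each pair j < k is counted once, and closeness is multiplied by 100 to match the scale of Table A.1.

```python
from collections import deque
from itertools import combinations

# Projects of network N, each given as its set of member users.
PROJECTS = [{1, 2}, {1, 3}, {2, 3}, {3, 4}, {4, 5}, {5, 6}, {5, 7}, {6, 7}]
USERS = sorted(set().union(*PROJECTS))

# One-mode user network: users are adjacent if they share at least one project.
adj = {u: set() for u in USERS}
for members in PROJECTS:
    for i, j in combinations(members, 2):
        adj[i].add(j)
        adj[j].add(i)


def bfs(source):
    """Shortest-path distances and shortest-path counts from source (unweighted BFS)."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]  # every shortest path to v extends to w
    return dist, sigma


paths = {u: bfs(u) for u in USERS}


def betweenness(i):
    """Eq. A.2: sum over pairs j < k (both != i) of g_jk(i) / g_jk."""
    dist_i, sigma_i = paths[i]
    total = 0.0
    for j, k in combinations([u for u in USERS if u != i], 2):
        dist_j, sigma_j = paths[j]
        # i lies on a shortest j-k path iff d(j, i) + d(i, k) == d(j, k).
        if dist_j[i] + dist_i[k] == dist_j[k]:
            total += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return total


def closeness(i):
    """Eq. A.3: inverse of the mean shortest distance from i to the other g - 1 users."""
    dist_i, _ = paths[i]
    g = len(USERS)
    return (g - 1) / sum(dist_i[j] for j in USERS if j != i)


print([betweenness(u) for u in USERS])  # [0.0, 0.0, 8.0, 9.0, 8.0, 0.0, 0.0]
print([round(100 * closeness(u), 2) for u in USERS])
# [40.0, 40.0, 54.55, 60.0, 54.55, 40.0, 40.0]
```

User 4 scores 9 because it lies on the unique shortest path for each of the 3 × 3 pairs that join {1, 2, 3} to {5, 6, 7}.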
Brokerage is based on Burt’s (1992) measure of constraint and calculated using the following expression:
$$C_S(X_i^U) = 1 - \sum_{j \neq i} \Big( p_{ij} + \sum_{k \neq i, j} q_{ik}\, r_{kj} \Big)^2, \tag{A.4}$$
where $p_{ij}$ represents the proportional investment of node i in node j; k is a node linked to both nodes i and j; $q_{ik}$ represents the proportional investment of node i in node k; and $r_{kj}$ represents the proportional investment of node k in node j. If each link has a nominal value of unity, a node connected to two distinct nodes invests ½ in each of them. To calculate the proportional investment of a node i in node j, we use the affiliation network of projects, $X^P$. The proportional investment of a node i in node j equals the strength of the relationship between i and j (number of shared users), divided by the total number of users at the disposal of node i. We use Equations A.1–A.4 to calculate the network measures of users and present the results in Table A.1.
Table A.1. Network Measures of Users in Network N
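[Diagram of network N; legend: OSS project, OSS user.]
User | Embeddedness: Degree (Eq. A.1) | Embeddedness: Betweenness (Eq. A.2) | Embeddedness: Closeness (Eq. A.3) | Brokerage (Eq. A.4)
1 | 2 | 0 | 40 | 0
2 | 2 | 0 | 40 | 0
3 | 3 | 8 | 54.55 | .39
4 | 2 | 9 | 60 | .50
5 | 3 | 8 | 54.55 | .39
6 | 2 | 0 | 40 | 0
7 | 2 | 0 | 40 | 0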
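A sketch of Eq. A.4 on network N, under assumptions stated plainly here:

- $q_{ik}$ and $r_{kj}$ are read as the same proportional investments $p_{ik}$ and $p_{kj}$, as in Burt's constraint.
- The outer sum runs over i's direct contacts.
- Tie strength is the number of projects two users share, that is, the off-diagonal entries of $X^U$.

With these choices the sketch reproduces the .39 and .50 in Table A.1. For users 1, 2, 6, and 7, the raw value of 1 minus constraint is about -0.007, whereas the table reports 0. The sketch therefore floors negative values at zero. That floor is our assumption, not a step the appendix describes. The names `PROJECTS`, `p`, and `brokerage` are hypothetical.

```python
from itertools import combinations

# Projects of network N, each given as its set of member users.
PROJECTS = [{1, 2}, {1, 3}, {2, 3}, {3, 4}, {4, 5}, {5, 6}, {5, 7}, {6, 7}]
USERS = sorted(set().union(*PROJECTS))

# Tie strength a[i][j] = number of projects users i and j share
# (assumed here to be the off-diagonal entries of X^U).
a = {i: {j: 0 for j in USERS} for i in USERS}
for members in PROJECTS:
    for i, j in combinations(members, 2):
        a[i][j] += 1
        a[j][i] += 1


def p(i, j):
    """Proportional investment of i in j: tie strength over i's total tie strength."""
    return a[i][j] / sum(a[i].values())


def brokerage(i):
    """Eq. A.4: 1 minus Burt's constraint, floored at zero (an assumption, see lead-in)."""
    constraint = 0.0
    for j in USERS:
        if j == i or a[i][j] == 0:
            continue  # sum only over i's direct contacts
        indirect = sum(p(i, k) * p(k, j) for k in USERS if k not in (i, j))
        constraint += (p(i, j) + indirect) ** 2
    return max(0.0, 1.0 - constraint)


print([round(brokerage(u), 2) for u in USERS])  # [0.0, 0.0, 0.39, 0.5, 0.39, 0.0, 0.0]
```

For user 4, the two contacts (users 3 and 5) are not connected to each other. Each therefore contributes (½)² to constraint, giving 1 - ½ = .50.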
Appendix B Model Comparison for Model Selection
Columns: log likelihood (LL); Akaike information criterion (AIC); Bayesian information criterion (BIC); consistent Akaike information criterion (CAIC); modified Akaike information criterion (MAIC).
Model | LL | AIC | BIC | CAIC | MAIC
Step 1: Selection of functional form for the duration probability
Lognormal (M1) | -2058.07 | 4182.14 | 4375.70 | 4408.70 | 4215.14
Log-logistic (M2) | -1849.68 | 3765.36 | 3958.92 | 3991.92 | 3798.36
Notes: The fit criteria indicate that M2 provides a better fit to the data than M1. We retained the log-logistic specification for estimation of the split-hazard model in Step 2.
Step 2: Selection of functional form for the probability of eventual adoption
Log-logistic duration with probit split hazard (M3) | -1667.39 | 3503.06 | 3878.46 | 3942.46 | 3567.06
Log-logistic duration with logit split hazard (M4) | -1647.19 | 3422.38 | 3797.78 | 3861.78 | 3486.38
Notes: The fit criteria indicate that M4 provides a better fit to the data than does M3.
Step 3: Unobserved heterogeneity
M4 with unobserved heterogeneity (M5) | -1642.45 | 3414.90 | 3796.16 | 3861.16 | 3479.90
Notes: Because M4 outperformed M3 in Step 2, we incorporated unobserved heterogeneity into M4 and estimated M5.
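The appendix does not state how the fit criteria are computed. A minimal sketch using standard definitions reproduces the reported M4 and M5 rows:

- AIC = -2LL + 2k
- BIC = -2LL + k ln n
- CAIC = -2LL + k(ln n + 1)
- MAIC = -2LL + 3k, one common "modified AIC" (an assumption on our part)

This requires back-solving k and n from the table: k = 64 for M4, k = 65 for M5, and n = 2606. These parameter counts and this sample size are our inferences, labeled hypothetical in the code. The authors do not report them.

```python
import math


def fit_criteria(log_lik, k, n):
    """Information criteria for a model with k parameters fitted to n observations."""
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2 * k,
        "BIC": deviance + k * math.log(n),
        "CAIC": deviance + k * (math.log(n) + 1),  # consistent AIC
        "MAIC": deviance + 3 * k,  # assumed form of the modified AIC
    }


# Hypothetical inputs back-solved from Appendix B, not reported by the authors:
# k = 64 parameters for M4 and n = 2606 observations.
m4 = fit_criteria(-1647.19, k=64, n=2606)
print({name: round(value, 2) for name, value in m4.items()})
# {'AIC': 3422.38, 'BIC': 3797.78, 'CAIC': 3861.78, 'MAIC': 3486.38}
```

Under these definitions, CAIC minus BIC equals k for every row, which is how k can be read from the table.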
REFERENCES
Ahuja, G. 2000. Collaboration Networks, Structural Holes, and Innovation: A Longitudinal Study. Administrative Science Quarterly. 45(3) 425-455.
Ai, C., Norton, E.C. 2003. Interaction Terms in Logit and Probit Models. Economics Letters. 80(1) 123-129.
Barczak, G., Griffin, A., Kahn, K.B. 2009. PERSPECTIVE: Trends and Drivers of Success in NPD Practices: Results of the 2003 PDMA Best Practices Study. Journal of Product Innovation Management. 26(1) 3-23.
Batagelj, V., Mrvar, A. 1998. Pajek: Program for Large Network Analysis. Connections. 21(2) 47-57.
Bayus, B.L., Jain, S., Rao, A.G. 1997. Too Little, Too Early: Introduction Timing and New Product Performance in the Personal Digital Assistant Industry. Journal of Marketing Research. 34(1) 50-63.
Bhat, C.R. 2003. Simulation Estimation of Mixed Discrete Choice Models Using Randomized and Scrambled Halton Sequences. Transportation Research. 37(9) 837-855.
Burt, R.S. 1992. Structural Holes: The Social Structure of Competition. Harvard University Press. Cambridge, MA.
Burt, R.S. 1997. The Contingent Value of Social Capital. Administrative Science Quarterly. 42(2) 339-365.
Burt, R.S. 2004. Structural Holes and Good Ideas. American Journal of Sociology. 110(2) 349-399.
Burt, R.S. 2005. Brokerage and Closure: An Introduction to Social Capital. Oxford University Press, USA.
Chandrashekaran, M., Mehta, R., Chandrashekaran, R., Grewal, R. 1999. Market Motives, Distinctive Capabilities, and Domestic Inertia: A Hybrid Model of Innovation Generation. Journal of Marketing Research. 36(1) 95-112.
Coleman, J.S. 1988. Social Capital in the Creation of Human Capital. American Journal of Sociology. 94(1) 95-120.
Coles, P.A., Lakhani, K.R., McAfee, A. 2008. Cambrian House. Harvard Business School. 608016-PDF-ENG.
Crowston, K., Annabi, H., Howison, J. 2003. Defining Open Source Software Project Success. International Conference on Information Systems.
Dekimpe, M.G., Gucht, L.M.V.d., Hanssens, D.M., Powers, K.I. 1998. Long-Run Abstinence after Narcotics Abuse: What Are the Odds? Management Science. 44(11) 1478-1492.
Elfenbein, D.W., Zenger, T.R. 2009. The Economics of Relational (Social) Capital in Industrial Procurement. SSRN eLibrary.
Erdem, T., Keane, M.P. 1996. Decision-making Under Uncertainty: Capturing Dynamic Brand Choice Processes in Turbulent Consumer Goods Markets. Marketing Science. 15(1) 1-20.
Faust, K. 1997. Centrality in Affiliation Networks. Social Networks. 19(2) 157-191.
Fleming, L., Mingo, S., Chen, D. 2007. Collaborative Brokerage, Generative Creativity and Creative Success. Administrative Science Quarterly. 52(3) 443-475.
Fleming, L., Waguespack, D.M. 2007. Brokerage, Boundary Spanning, and Leadership in Open Innovation Communities. Organization Science. 18(2) 165-180.
Freeman, L.C. 1977. A Set of Measures of Centrality Based on Betweenness. Sociometry. 40(1) 35-41.
Granovetter, M.S. 1973. The Strength of Weak Ties. American Journal of Sociology. 78(6) 1360-1380.
Granovetter, M.S. 1985. Economic Action and Social Structure: The Problem of Embeddedness. American Journal of Sociology. 91(3) 481-510.
Greene, W.H. 2003. Econometric Analysis. Prentice Hall. Upper Saddle River, NJ.
Greene, W.H. 2010. Testing Hypotheses About Interaction Terms in Nonlinear Models. Economics Letters. 107(2) 291-296.
Grewal, R., Lilien, G.L., Mallapragada, G. 2006. Location, Location, Location: How Network Embeddedness Affects Project Success in Open Source Systems. Management Science. 52(7) 1043-1056.
Grewal, R., Mehta, R., Kardes, F.R. 2004. The Timing of Repeat Purchases of Consumer Durable Goods: The Role of Functional Bases of Consumer Attitudes. Journal of Marketing Research. 41(1) 101-115.
Gulati, R., Gargiulo, M. 1999. Where Do Interorganizational Networks Come From? American Journal of Sociology. 104(5) 1439-1493.
Hahn, J., Moon, J.Y., Zhang, C. 2008. Emergence of New Project Teams From Open Source Software Developer Networks: Impact of Prior Collaboration Ties. Information Systems Research. 19(3) 369-391.
Henard, D.H., Szymanski, D.M. 2001. Why Some New Products are More Successful Than Others. Journal of Marketing Research. 38(3) 362-375.
Hendricks, K.B., Singhal, V.R. 1997. Delays in New Product Introductions and the Market Value of the Firm: The Consequences of Being Late to the Market. Management Science. 43(4) 422-436.
Kalbfleisch, J.D., Prentice, R.L. 1980. The Statistical Analysis of Failure Time Data. Wiley. New York, NY.
Kiefer, N.M. 1988. Economic Duration Data and Hazard Functions. Journal of Economic Literature. 26(2) 646-679.
Kogut, B., Zander, U. 1992. Knowledge of the Firm, Combinative Capabilities, and the Replication of Technology. Organization Science. 3(3) 383-397.
Lakhani, K.R., von Hippel, E. 2002. How Open Source Software Works: “Free” User-to-User Assistance. Research Policy. 1451 1-21.
Langerak, F., Hultink, E.J. 2006. The Impact of Product Innovativeness on the Link between Development Speed and New Product Profitability. Journal of Product Innovation Management. 23(3) 203-214.
Lee, S.-Y.T., Kim, H.-W., Gupta, S. 2009. Measuring Open Source Software Success. Omega. 37(2) 426-438.
Lin, N. 2001. Building a Network Theory of Social Capital. In Social Capital: Theory and Research. eds. N. Lin, K. Cook, R.S. Burt. Aldine De Gruyter. Hawthorne, NY.
Madey, G. 2005. SourceForge.net Research Data Archive. Accessed October 2005. University.
March, J.G. 1991. Exploration and Exploitation in Organizational Learning. Organization Science. 2(1) 71-87.
McFadyen, M.A., Cannella Jr., A.A. 2004. Social Capital and Knowledge Creation: Diminishing Returns of the Number and Strength of Exchange Relationships. Academy of Management Journal. 47(5) 735-746.
Morrison, D.G. 1969. On the Interpretation of Discriminant Analysis. Journal of Marketing Research. 6(2) 156-163.
Nerkar, A., Paruchuri, S. 2005. Evolution of R&D Capabilities: The Role of Knowledge Networks Within a Firm. Management Science. 51(5) 771-785.
O’Mahony, S., Ferraro, F. 2007. The Emergence of Governance in an Open Source Community. Academy of Management Journal. 50(5) 1079-1106.
O’Hern, M.S., Rindfleisch, A. 2009. Customer Co-Creation: A Typology and Research Agenda. Review of Marketing Research. 6 84-106.
Obstfeld, D. 2005. Social Networks, the Tertius Iungens Orientation, and Involvement in Innovation. Administrative Science Quarterly. 50(1) 100-130.
Park, S., Gupta, S. 2009. Simulated Maximum Likelihood Estimator for the Random Coefficient Logit Model Using Aggregate Data. Journal of Marketing Research. 46(4) 531-542.
Raymond, E. 1999. The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary. O’Reilly. Cambridge, MA.
Reibstein, D.J., Day, G., Wind, J. 2009. Guest Editorial: Is Marketing Academia Losing Its Way? Journal of Marketing. 73(4) 1-3.
Rindfleisch, A., Moorman, C. 2001. The Acquisition and Utilization of Information in New Product Alliances: A Strength-of-Ties Perspective. Journal of Marketing. 65(2) 1-18.
Romanelli, E. 1989. Environments and Strategies of Organization Start-Up: Effects on Early Survival. Administrative Science Quarterly. 34(3) 369-387.
Ronchetto, J.R., Hutt, M.D., Reingen, P.H. 1989. Embedded Influence Patterns in Organizational Buying Systems. Journal of Marketing. 53(4) 51-62.
Rosa, J.A., Porac, J.F., Runser-Spanjol, J., Saxon, M.S. 1999. Sociocognitive Dynamics in a Product Market. Journal of Marketing. 64-77.
Sastry, N. 1997. A Nested Frailty Model for Survival Data, With an Application to the Study of Child Survival in Northeast Brazil. Journal of the American Statistical Association. 92 426-435.
Schmidt, P., Witte, A.D. 1989. Predicting Criminal Recidivism Using ‘Split Population’ Survival Time Models. Journal of Econometrics. 40(1) 141-159.
Sethi, R., Smith, D.C., Park, C.W. 2001. Cross-Functional Product Development Teams, Creativity, and the Innovativeness of New Consumer Products. Journal of Marketing Research. 38(1) 73-85.
Shah, S.K. 2006. Motivation, Governance, and the Viability of Hybrid Forms in Open Source Software Development. Management Science. 52(7) 1000-1014.
Sinha, R.K., Chandrashekaran, M. 1992. A Split Hazard Model for Analyzing the Diffusion of Innovations. Journal of Marketing Research. 29(1) 116-127.
Smith, P.G. 1999. From Experience: Reaping Benefit from Speed to Market. Journal of Product Innovation Management. 16(3) 222-230.
Song, X.M., Parry, M.E. 1997. A Cross-National Comparative Study of New Product Development Processes: Japan and the United States. Journal of Marketing. 61(2) 1-18.
Stephen, A.T., Toubia, O. 2010. Deriving Value from Social Commerce Networks. Journal of Marketing Research. Forthcoming.
Stewart, K.J., Ammeter, A.P., Maruping, L.M. 2006. Impacts of License Choice and Organizational Sponsorship on User Interest and Development Activity in Open Source Software Projects. Information Systems Research. 17(2) 126-144.
Stinchcombe, A.L. 1965. Organizations and Social Structure. In Handbook of Organizational Design. eds. P.C. Nystrom, W.H. Starbuck. Oxford University Press. London.
Subramaniam, C., Sen, R., Nelson, M.L. 2009. Determinants of Open Source Software Project Success: A Longitudinal Study. Decision Support Systems. 46(2) 576-585.
Swaminathan, V., Moorman, C. 2009. Marketing Alliances, Firm Networks, and Firm Value Creation. Journal of Marketing. 73(5) 52-69.
Train, K. 2003. Discrete Choice Methods with Simulation. Cambridge University Press. Cambridge, UK.
Uzzi, B. 1997. Social Structure and Competition in Interfirm Networks: The Paradox of Embeddedness. Administrative Science Quarterly. 42(1) 35-67.
Villas-Boas, M.J., Winer, R.S. 1999. Endogeneity in Brand Choice Models. Management Science. 45(10) 1324-1338.
von Hippel, E., von Krogh, G. 2003. Open Source Software and the “Private Collective” Innovation Model: Issues for Organization Science. Organization Science. 14(2) 209-223.
von Krogh, G., von Hippel, E. 2006. The Promise of Research on Open Source Software. Management Science. 52(7) 975-983.
Wasserman, S., Faust, K. 1999. Social Network Analysis: Methods and Applications. Cambridge University Press. Cambridge, UK.
Zucker, L.G., Darby, M.R., Brewer, M.B., Peng, Y. 1995. Collaboration Structure and Information Dilemmas in Biotechnology. In Trust in Organizations. eds. R.M. Kramer, T.R. Tyler. Sage. Thousand Oaks, CA.