mass of contributors can participate. KEY WORDS ⢠collective ... ically software) into essentially a public good (Kogut and Metiu 2001;. O'Mahony 2003). ...... subsequently or used for commercial purposes (please send requests via email to:.
REPUTATION AND RELIABILITY IN COLLECTIVE GOODS THE CASE OF THE ONLINE ENCYCLOPEDIA WIKIPEDIA
Denise Anthony, Sean W. Smith, and Timothy Williamson
ABSTRACT An important organizational innovation enabled by the revolution in information technologies is ‘open source’ production which converts private commodities into essentially public goods. Similar to other public goods, incentives for reputation and group identity appear to motivate contributions to open source projects, overcoming the social dilemma inherent in producing such goods. In this paper we examine how contributor motivations affect the type of contributions made to the open source online encyclopedia Wikipedia. As expected, we find that registered participants, motivated by reputation and commitment to the Wikipedia community, make many contributions with high reliability. Surprisingly, however, we find the highest reliability from the vast numbers of anonymous ‘Good Samaritans’ who contribute only once. Our findings of high reliability in the contributions of both Good Samaritans and committed ‘zealots’ suggest that open source production succeeds by altering the scope of production such that a critical mass of contributors can participate. KEY WORDS • collective action • collective goods • group identity • open source production • public goods • reputation • technology
1. Introduction New information and communication technologies such as the Internet, cell phones and social networking websites, often are presented as atomistic causal forces that dramatically alter society. This type of technological determinism fails to recognize not only the extent to which the production and use of technology is embedded in social and Rationality and Society Copyright © The Author(s), 2009, Reprints and permissions: http://www.sagepub.co.uk/journalsPermissions.nav, Vol. 21(3): 283–306. http://rss.sagepub.com DOI: 10.1177/1043463109336804
284
RATIONALITY AND SOCIETY 21(3)
organizational contexts (Kling and Iacono 1984; Fischer 1992; Orlikowski and Barley 2001), but also the role of communities of practice often necessary to produce the knowledge or content transmitted and used with such technologies (Lave and Wenger 1991; Orlikowski 2002; Miller and Sim 2004). One of the most important organizational innovations accompanying the Internet is the emergence of ‘open source’ production (also known as ‘open content’), defined as the free and open creation, alteration and distribution of goods, typically software, via the contributions from vast numbers of geographically distributed and uncoordinated actors (Lakhani and Wolf 2005; Open Source Initiative 2005). Open source production is remarkable because it converts a private commodity (typically software) into essentially a public good (Kogut and Metiu 2001; O’Mahony 2003). That is, open source production is organized not as a private market good produced through either contract or hierarchy (Williamson 1985) but as a common pool resource system (Ostrom 1990). Indeed, advocates of open source software often describe it as a ‘movement’ rather than a product because it entails incentives that encourage participants to become committed to it, similar to participants of social movements (Stallman 1999; Raymond 2001; Torvalds and Diamond 2001). Given the inherent social dilemma in producing public goods and protecting common pool resource systems, open source production would seem to lack the selective incentives that ensure efficient production of goods (Olson 1965; Hardin 1968; Ostrom 1990; Kollock 1998). Early studies of open source, however, suggest that production is fueled by a small number of experts who are motivated by factors such as reputation and group identity (Ghosh and Prakash 2000; Lerner and Tirole 2002; Lakhani and von Hippel 2003; Mockus et al. 2005). Such mechanisms are capable of overcoming the social dilemma inherent in collective goods production. In this paper we move beyond the initial question of what motivates contributors to open source production to ask how contributor motivations are related to the nature of the contributions they make. Given the constant opportunity for revision of content in open source production, whose contributions survive over time? Also, how are the mechanisms that motivate participation related to the survival of content in open source goods? In seeking to answer these questions, this paper makes three contributions. First, we hypothesize how contributor motivations in open source goods affect the level and reliability of contributions to the online, open-content encyclopedia, Wikipedia. Second, we use data from a
ANTHONY ET AL.: REPUTATION AND RELIABILITY
285
random sample of 7058 contributors to the French and Dutch language Wikipedia websites to test our hypotheses. Finally, we consider the implications for open source production specifically, and more broadly, for collective action in general.
2. The case of Wikipedia Wikipedia, the online, open content encyclopedia (www.wikipedia.org) is a compelling example of open source production. According to its Main Page, Wikipedia is ‘the free-content encyclopedia that anyone can edit.’ The English language version, started in 2001, currently has the most content with over 2.44 million articles (as of July 2008). Wikipedia describes itself as ‘a multilingual, web-based, free content encyclopedia project. Wikipedia is written collaboratively by volunteers; its articles can be edited by anyone with access to the Internet’ (http://en.wikipedia.org/wiki/Wikipedia). It has editions in 200 different languages and contains entries both on traditional encyclopedic topics and on almanac, gazetteer, and current events topics. Not only is Wikipedia content open access, but the creation and revision of the content is also entirely open such that anyone can add to or edit any entry.1 The precursor to Wikipedia was conceived by developers Jimmy Wales and Larry Sanger as a freely accessible online encyclopedia, but the content and its quality was to be ensured by seeking expert contributions evaluated by peer review (see Lih 2004; http://en. wikipedia.org/wiki/Wikipedia#History). In contrast, Wikipedia as it now exists succeeded by replacing the more time-consuming professional contributions and expert peer review with their most immediate and democratic extremes: no proof of identity or qualifications is necessary in order to contribute or edit content at any time (see footnote 1). As with any encyclopedia, the value of Wikipedia is the breadth and quality of its content, yet its quality is a much debated issue. Somewhat surprisingly, in the few systematic studies comparing quality of content between Wikipedia and professionally produced encyclopedias, Wikipedia is found to be comparable in quality (Lih 2004; Giles 2005; cf. Encyclopedia Britannica 2006). Yet questions about the quality of Wikipedia content persist. Concerns about quality in Wikipedia focus on the nature and skills of the contributors and editors given the lack of credentials required (Wagstaff 2004; Giles 2005; Orlowski 2005; Terdiman 2005; Encyclopedia Britannica 2006; Nature 2006).
286
RATIONALITY AND SOCIETY 21(3)
Moreover, it is the open source organization of Wikipedia that is cited as the primary problem because of the difficulties associated with participation in collective goods, i.e., the lack of individual incentives suggest that few will participate while it is extremely unlikely that ‘real experts’ will contribute at all. Furthermore, low barriers to entry also are assumed to encourage low quality contributions at best. Contributor motivations in Wikipedia In order to understand how contributor motivations are related to the nature and impact of the contributions they make, we must identify fully the types of motivations likely to encourage contributors to Wikipedia. As noted above, one factor encouraging contributions to Wikipedia and other open source goods, cited by critics as a problem, is the low cost of contributing (Lerner and Tirole 2002). The very ‘wiki’ technology used by Wikipedia reduces the costs of participation. A ‘wiki’ is a type of collaborative Webpage (comprised of a collection of pages) that enable anyone who accesses it to contribute or modify content and every edit is saved as a unique document. Wikipedia is a collection of wiki-pages on specific topics for which the entire edit history of the topic is available because each unique ‘edit’ submitted is its own wiki page. Every Wikipedia article has a button ‘edit this page’ (along with buttons for the edit ‘history’ of the page and a ‘discussion’ related to the page) that makes it easy to contribute. The wiki technology also reduces the costs of fixing a mistake because any user can view past edits and even restore a previous version of the content, as well as add his or her own content. In addition to providing clear instructions on how to make an edit, Wikipedia promotes the idea that even novice contributors can participate by encouraging that ‘you can’t break it’ when you contribute. Such low barriers to entry would seem to support critics who claim that Wikipedia must be low quality because there is little to prevent poor and even nonsense contributions. However, the formal policies of Wikipedia,2 as well as the wiki technology, limit (though do not prevent) negative contributions such as nonsense contributions or so-called ‘graffiti attacks’ in which contributors deface or overwrite the content of certain topic areas. For example, Ciffolilli (2003) argues that the very wiki technology that saves all past versions of every article, makes it very easy for friendly contributors to ‘clean up’ a damaged article, often by simply restoring a previous version. Other research similarly shows that graffiti and damage to controversial topic pages (e.g. the page on abortion) are repaired quickly at Wikipedia (Wattenberg and Viegas 2003; Viegas et al. 2004).
ANTHONY ET AL.: REPUTATION AND RELIABILITY
287
Beyond the relatively low costs of contributing, contributors can benefit from participating in Wikipedia by building a reputation within the community. Studies of various open source projects find that contributors claim that an important reason they participate is to build a reputation within the community (Lerner and Tirole 2002; Lakhani and von Hippel 2003; von Krogh et al. 2003; Lakhani and Wolf 2005). By providing a basis for status in the community (Stewart 2005), reputation systems are powerful mechanisms for overcoming collective action problems (Raub and Weesie 1990; Kollock 1998; Cheshire and Cook 2004). Indeed, reputation systems are important for the success of other new Internet-based institutions, such as the auction website eBay (Kollock 1999). Some researchers argue that reputation systems could be the basis for all secure Internet-based communication and exchange (e.g. Camp et al. 2002; Cheshire and Cook 2004), yet there is evidence that reputations can be strategically abused in such systems (David and Pinch 2005). Wikipedia encourages contributors to become ‘registered users’ by outlining the benefits of having a user account, including building a reputation in the community (http://en.wikipedia.org/wiki/ Wikipedia: Why_create_an_account, July 2008). According to Wikipedia, there are now over seven million registered user accounts, ‘plus an unknown, but quite large, number of unregistered contributors’ (http://en.wikipedia.org/wiki/ Wikipedia:Wikipedians July 2008). Though registered-user names are merely ‘cheap’ pseudonyms (Friedman and Resnick 2001) that are easily abandoned and not necessarily tied to an individual’s real identity, they provide a mechanism for establishing and tracking reputation. Indeed some users may care about the reputation of their username or online ‘avatar’ regardless of the connection to their offline identity (Balkin 2004; Anderssen 2007; Tzortzis 2007; thanks to an R&S reviewer for this point). Reputation is based on an actor’s (perceived) history of behavior. In Wikipedia, users can view the history of contributions for any topic. In doing so, users see each edit and who contributed it. Contributions from registered users will be listed by their usernames, while anonymous contributors have no name but merely an IP address listed. An IP or Internet-Protocol address is a 32-digit number used to identify a computer (or other device) on computer networks connected to the Internet. Clicking on a registered user name takes one to the ‘user’s page,’ which is Wikipedia-space where registered users create personalized pages about themselves and their contributions to Wikipedia, if they choose to
288
RATIONALITY AND SOCIETY 21(3)
do so. Wikipedia even lists the top 1000 contributors with the most edits, some of whom have been identified by name in the popular press (e.g. Terdiman 2005). Contributors with no interest in reputation can remain anonymous. Though anonymous users are listed by IP address only, it is possible to view the history of an IP address similar to a registered user, if more than one contribution is made. As shown below, however, the majority of anonymous users make one contribution only. In addition to reputation, some contributors are motivated by an apparent strong commitment to the Wikipedia community (Giles 2005; http://en. wikipedia.org/wiki/Wikipedia_community, accessed 4/2/2007). Actors who feel a strong and salient group identity are more likely to contribute to collective goods (Anthony 2005; Turner and Tajfal 1986; Dawes 1980; Dawes et al. 1990). Contributors to open source projects (Raymond 2001) and virtual communities (Wellman and Gulia 1999) express strong feelings of identification with the group, even though such groups may exist only in virtual ‘online’ space. Such contributors may even be ‘zealots’, Coleman’s (1990) term for contributors to a collective good who contribute for purely intrinsic value rather than, or in addition to, individual incentives such as the rewards of reputation or skill development (see, e.g., Raymond 2001; Lakhani and Wolf 2005). Wikipedia clearly presents itself as a community. For example, Wikipedia defines its contributors: ‘Wikipedians are the people who write and edit articles for Wikipedia…Wikipedian suggests someone who is part of a group or community. So in this sense, Wikipedians are people who form the Wikipedia Community’ (http://en.wikipedia. org/wiki/Wikipedia:Wikipedians accessed July 2008). One of the top links on the main webpage is for the ‘Community Portal’ which contains information about many different ways that users can participate in the community of Wikipedia. According to this discussion, in addition to the internal incentives that motivate zealots, at least two types of external incentives may motivate contributors to Wikipedia: (1) reputation in, and (2) commitment to, the Wikipedia community. How might these motivations influence participation in Wikipedia? Contributors interested in building a reputation will become registered users since a username and account enables them to create a track record with which to establish a reputation. Contributors with no interest in reputation will remain anonymous. Group identity, in contrast, has implications for both registration and the level of participation. Users who strongly identify with the Wikipedia community (i.e. Wikipedians)
ANTHONY ET AL.: REPUTATION AND RELIABILITY
289
will likely register and participate by making many contributions. In contrast, contributors who do not identify with the community will participate less, making few contributions, and are unlikely to register, leading to our first hypothesis: Hypothesis 1: Registered users will make more contributions than non-registered users. In addition to the number of contributions, how might motivations affect the reliability of contributions, i.e. the extent to which a contributor’s content is retained over time? If the way to gain a positive reputation is to be a registered user with many contributions, we would also expect that such contributors are interested in making valuable contributions to Wikipedia content. That is, we would expect these types of contributors to make highly reliable contributions given the ability to identify and track their contributions, particularly over many contributions. Indeed, such contributors at the intersection of strong interest in reputation and a strong Wikipedia identity are the committed contributors and zealots expected by advocates of open source and typically found to be the primary contributors to such projects. Thus we hypothesize that: Hypothesis 2: Registered users with many contributions will have higher reliability than both (a) registered users with fewer contributions, and (b) non-registered (i.e., anonymous) users. However, the risk of being edited increases with more contributions, so it may be the case that reliability declines for committed contributors who make very many contributions. Thus we examine whether there is a non-linear relationship between reliability and number of contributions for registered users with the following hypothesis: Hypothesis 3: Above some threshold of contributions, reliability will decline for registered users. The discussion above focused on users high on both types of motivations: reputation and identity. What are the implications for the nature and impact of contributions from anonymous and/or one-time contributors? Virtually all theories of social dilemmas would predict little participation and low quality contributions (reliability) from anonymous
290
RATIONALITY AND SOCIETY 21(3)
contributors since they have little incentive to contribute at all. Yet the lore of open source suggests that anonymous contributors are as important as the zealots. Who are these anonymous contributors? They are likely to be of (at least) two types. The first type of anonymous contributor to Wikipedia is simply the user who sees a mistake such as a typographical or grammatical error, and makes a contribution to fix it. These contributions are likely to be shorter and less substantive than others, leading to our next hypothesis: Hypothesis 4: Anonymous users will contribute less content per edit than registered users. Given the nature of these types of short, non-substantive contributions, we would also expect them to be highly reliable because they are unlikely to be edited or changed in the future. We discuss this hypothesis more specifically below after we describe the second type of anonymous contributor. A second type of anonymous Wikipedia contributor is the expert in a particular field who makes a contribution to the article in their area of expertise. These experts do not care about their reputation in Wikipedia so they do not register, nor are they committed to Wikipedia as a community so they are unlikely to make many contributions. Instead they care about their area of expertise and so contribute to that topic only. Taking the time to register would increase the costs of contributing for these Good Samaritan experts, and since they are not interested in reputation and do not identify with the community itself, they have no reason to incur these costs. Our next hypothesis is based on this discussion that both types of anonymous contributors, users who fix minor errors and field experts who contribute substantive content to articles in their areas of expertise, will be likely to contribute only one time: Hypothesis 5: Most anonymous contributors will contribute one time only. Furthermore, despite their anonymity, both types of anonymous users would seem to have high reliability. The minor mistakes fixed by onetime Good Samaritan contributors are likely to be retained over time since they are unlikely to be substantive changes, while the more substantive contributions of Good Samaritan ‘experts’ also are likely to be retained given their presumed expertise in the subject matter.
ANTHONY ET AL.: REPUTATION AND RELIABILITY
291
Hypothesis 6: Anonymous one-time contributors will have higher reliability than anonymous contributors with more than one contribution. But what about the reliability of anonymous users with many contributions? As noted above, high participation levels suggest that the contributor strongly identifies with the Wikipedia community. Why would a Wikipedian committed to the community by making many contributions choose to remain anonymous? One possibility is that such users know their contributions are of low quality and do not want to be identified through a registered user name. Alternatively, many contributions may mean that these users are strongly committed, but unlike the registered Wikipedians described above, their interest may be negative rather than positive. These would-be ‘hackers’ may actively seek to contribute lowquality content to harm the community. For these reasons, we expect that reliability will decline with the number of contributions for anonymous users, in contrast to registered users whose reliability is expected to increase with number of contributions (up to some threshold after which it will decline): Hypothesis 7: Reliability will decrease with number of contributions for anonymous users. Another possibility for why we might see multiple contributions from an anonymous user is that the multiple contributions identified from a single IP-address are not from the same contributor at all, but rather the result of proxies or dynamic IP-address allocation in some large companies and universities in which a user’s IP address will be different over time.3 Unfortunately, because we cannot distinguish dynamic from stable IP addresses, we may misidentify some anonymous users as having multiple contributions when in reality each is a unique edit from a single contributor. This means that we may underestimate the number of anonymous contributors with only one edit, making our test of hypothesis 6, that most anonymous users contribute one time only, a conservative test. Similarly, our estimate of reliability among anonymous users with more than one edit (hypothesis 7) may be confounded by combining the estimates of reliability for unique anonymous users. Given our expectation that anonymous users with one contribution will have high reliability, this may increase our estimate of reliability for multi-contribution anonymous users, making our test of hypothesis 7 a conservative test. We now turn to data from Wikipedia contributors to analyze these hypotheses.
292
RATIONALITY AND SOCIETY 21(3)
Table 1.
Population and sample of Wikipedia contributors by user type and language User type
Language French Population Sample Dutch Population Sample Total Population Sample
Registered
Anonymous
Total
5690 1763
48211 1729
53901 3492
2895 1819
30322 1747
33217 3566
8585 3582
78533 3476
87118 7058
3. Data and methods We selected samples of Wikipedia contributors from the populations of both the French and Dutch language sites as of March 1, 2005.4,5 By March 2005 Wikipedia was well known, but significant debate over the quality of its content had not yet occurred (Giles 2005; Encyclopedia Britannica 2006). As of March 1, 2005 there were a total of 53 901 contributors to the French language site and 33 217 contributors to the Dutch language site. In order to select our sample, we first compiled a list of all contributors within each language group, then drew two random draws within each language of up to 1000 contributors for each user-type (registered and anonymous), for a total of n = 7058 (see Table 1). Since registered users are over-represented in our sample compared to their distribution among all contributors, we weight the analyses based on the population proportions of each user-type within each language group. Variables We hypothesize that contributor motivations affect the level and reliability of contributions between registered and anonymous users (hypotheses 1, 2b and 4), among registered users (hypotheses 2a, 3) and among anonymous users (hypotheses 5, 6 and 7). We measure reliability as the rate of each contributor’s content retained in the current version of the topic article. The retention rate, R, of contributions is measured as the percentage of characters retained per contribution by
ANTHONY ET AL.: REPUTATION AND RELIABILITY
293
each contributor. More specifically, we measure the number of characters retained, C, in a given article, summed across all edits (contributions), j, for each contributor, i, divided by the sum of the total number of characters, T, in each topic article edited per contributor. K
Σ Cij
Ri =
j=1
K
ΣT
j=1 ij
For each contributor, we use the Wikipedia differencing algorithm6 to compare the differences among three documents: (1) edit, the content submitted to each topic article by the contributor, (2) previous, the version of the article prior to the edit, and (3) current, the version of the article as it exists on the day the sample was drawn. Edits generally occur in time prior to the time point at which current is measured, so current does not in general equal edit, though it is possible if the contributor contributed all of the current content. We measure the retention of an edit by calculating the number of characters from a contributor’s edit (comparing edit to previous) that are retained in the current version (comparing edit to current) as a percentage of the total number of characters in the article. For example, compare the following illustrative sentences. Previous: ‘Public goods are unlike private goods’; edit: ‘Public goods, in contrast to private goods, are non-excludable’; and current: ‘In contrast to private goods, public goods are non-excludable and non-rival.’ Comparing edit to current, we find that (when considering longest common subsequences) 62 of the total 75 characters in the current version are retained for a retention rate of 83% (note that spaces are counted in the character count). We propose reliability as one quantitative dimension of the quality of a contribution. It is likely a conservative measure to the extent that contributors are satisficing (Simon, 1957) rather than maximizing with regard to content. That is, contributors add to or edit an entry until is is ‘good enough’ rather than until it is ‘perfect’ or complete. As illustrated in the example above, a contributor’s edit may include any of the following: added material, edited or deleted content, content kept unchanged from the previous version. Our measure of retention includes all characters in the version ‘submitted’ by the contributor, no matter how much or how little of the content was added, deleted or changed by the contributor. A contributor has the opportunity to add, edit or delete whatever she chooses, so preserving content from earlier versions is taken to mean at least tacit acceptance of its quality. It is
294
RATIONALITY AND SOCIETY 21(3)
Table 2. Means for Wikipedia contributor characteristics (unweighted)
Number of cases Retention rate Number of edits Log edits Article size Log article size Contribution size Log contribution Registered user (percentage)
Total
French
Dutch
7058 72.1 (29.0) 9.4 (15.0) 1.3 (1.3) 4412 (5886) 7.8 (1.2) 358 (1545) 4.8 (1.6) 51
3566 70.4 (29.6) 9.0 (14.5) 1.2 (1.3) 5054 (6869) 7.9 (1.2) 358 (1089) 5.7 (2.5) 51
3492 73.7 (28.4) 9.7 (15.5) 1.2 (1.4) 3784 (4647) 7.7 (1.2) 358 (1889) 5.7 (2.5) 51
Note. Standard deviations in parentheses.
important to note that Wikipedia requires that contributors edit on the granularity of whole entries. For example, the data structure does not permit ‘journaling’ in which a contributor might submit an edit such as: ‘like before, except change sentence 23 as follows.’ Overall, the mean retention rate at Wikipedia is 72% (see Table 2). Level of contribution is measured in two different ways. First, the number of contributions is measured as the number of times a contributor made an edit. On average, contributors made over nine edits, with a range of 1–50 edits. Given the significant positive skew of this measure, we take the natural log in the analyses. Second, the size of a given contribution is measured as the number of characters added per edit (natural log). Smaller contributions are more likely to be a minor change such as fixing a typographical error, and thus are more likely to be retained. The key independent variable is whether a contributor is registered or anonymous. Contributor registration status is measured by whether they have a registered user name or remain anonymous as indicated by an IP address. Finally, our analyses also control for language area (French = 1, Dutch = 0), and the total size of each article to which a user contributed, measured as the total number of characters (natural log). Article size controls for the possibility that registered and anonymous users contribute to fundamentally different types of Wikipedia topics. Since Wikipedia content is constantly evolving, at any given time there are many ‘new topics’ with relatively small existing entries, as well as many well-established topics with a great deal of existing content. It may be that anonymous users are more likely to contribute only to
ANTHONY ET AL.: REPUTATION AND RELIABILITY
295
well-established articles, or conversely only to newer topics with less existing content, thus we control for these possibilities. In the analyses that follow we first present bivariate comparisons of registered to anonymous users on the dependent and control variables (hypotheses 1, 4 and 5). Next we show the analysis of reliability among registered users (hypotheses 2a and 3), then among anonymous users (hypotheses 5, 6 and 7). Finally, we compare reliability of registered to anonymous users (hypothesis 2b).
4. Results Table 3 shows the bivariate comparison of registered to anonymous users for each variable. Registered and anonymous users have the same percentage of French language users in each (as a result of the sampling frame), and they contribute to articles that are, on average, the same size. While we control for article size in all analyses below, it is worth noting that, at least on the dimension of size, registered and anonymous users do not contribute to apparently different types of articles. However, Table 3.
Wikipedia contribution characteristics by type of user (unweighted n = 7057) Registered user
Anonymous user
French language
0.49 (0.50)
0.50 (0.50)
Article size (Number of total characters in each article edited, natural log) Edits (Number of edits contributed, natural log) Percentage with one edit
7.8 (1.1)
7.8 (1.3)
1.9** (1.4)
0.6 (0.83)
Contribution size (Number of characters added per edit, natural log) Reliability (Retention rate: percentage of characters added per edit)
20.6 (40.5) 5.0** (1.5)
70.3 (28.4)
54.7 (49.8) 3.9 (1.8)
74.0** (29.5)
F = 0.19 df = 1 and 7056 F = 0.89 df = 1 and 7056
F = 2058.0*** df = 1 and 7056 F = 998.9*** df = 1 and 7056 F = 774.1** df = 1 and 7056 F = 29.7** df = 1 and 7056
Note. Standard deviations in parentheses; ** p < 0.01, *** p < 0.001; F = analysis of variance (ANOVA) test statistic; df = degrees of freedom.
296
RATIONALITY AND SOCIETY 21(3)
registered and anonymous users contribute in very different ways. Overall, registered users contribute significantly more often than anonymous users, contributing many more edits (1.9 log edits compared to 0.6 log edits for anonymous users, p < 0.001) as expected by hypothesis 1, consistent with the idea that registered users are more interested in reputation and committed to the community. Similarly, most anonymous users contribute one time only, consistent with hypothesis 5, while most registered users contribute more than once (54.7% of anonymous users contribute one time compared to 20.6% of registered users, p < 0.001). Registered users also contribute significantly more content per edit compared to anonymous users, consistent with hypothesis 4 that anonymous users will be more likely to submit shorter edits, such as fixing typographical or simple errors. Surprisingly however, anonymous users have significantly higher reliability compared to registered users (74% versus 70% for registered users, p < 0.001). This bivariate finding on reliability is not consistent with hypothesis 2b that registered users with many contributions will have higher reliability than anonymous users, but nor is this a valid test of hypothesis 2b because we are not controlling for the number or size of contributions. However, given the expected motivations of reputation and identity among registered users, this seems remarkable. Before turning to the results of the multivariate analysis of reliability between registered and anonymous users, we first present the results for outcomes within each user category. Table 4 shows the results of weighted Ordinary Least Squares (OLS) regression of retention rate for registered users. Model one shows that after controlling for language, article size, and the size of the contribution, reliability is not significantly different as the number of edits increases (i.e., log edits is negative and not significant) for registered users, indicating no support for hypothesis 2a. It is important to recognize that all of the control variables are significant. Registered users contributing to the French language site have significantly lower retention rates than those in the Dutch site (suppressed category), though we do not speculate as to why this occurs. The larger the topic article to which one contributes, the higher is the retention rate. In contrast, the larger the size of the edit contribution, the lower is the retention rate. Model two in Table 4 shows the weighted OLS regression of retention rate that includes the quadratic to test hypothesis 3 that the relation between retention rate and number of edits is non-linear for registered users. The control variables have the same effect as in model one, while in model two,
ANTHONY ET AL.: REPUTATION AND RELIABILITY
Table 4.
297
OLS unstandardized coefficients for reliability among registered users (weighted) Registered users
Constant French language Log article size Log contribution size Log edits Log edits2 Adjusted R2 Unweighted N
1
2
0.39** (0.040) −0.03** (0.010) 0.06** (0.005) −0.03** (0.003) −0.01 (0.008) – 0.07 3582
0.37** (0.040) −0.03** (0.010) 0.06** (0.005) −0.03** (0.004) 0.03* (0.015) −0.01** (0.003) 0.08 3582
Note. Standard error terms in parentheses; * p < 0.05, ** p < 0.01.
log edits is significant and positive indicating that retention rate is higher for those with more edits, consistent with hypothesis 2b. Log edits squared is significant and negative, indicating a non-linear relationship in which, above some threshold, retention rate declines with increasing number of contributions for registered users, consistent with hypothesis 3. Table 5 shows the results of the weighted OLS regression of reliability for anonymous users. Whereas log edits was positive for registered users (model 2 in Table 4), indicating increasing reliability with increasing contributions, it is negative for anonymous users, supporting hypothesis 7 that reliability decreases with number of contributions for
Table 5.
OLS unstandardized coefficients for reliability among anonymous users (weighted) Anonymous users
Constant French language Log article size Log contribution size Log edits Adjusted R2 Unweighted N
0.54** (0.040) −0.05** (0.010) 0.05** (0.004) −0.03** (0.003) −0.03** (0.009) 0.08 3476
Note. Standard error terms in parentheses; * p < 0.05, ** p < 0.01.
298
Table 6.
RATIONALITY AND SOCIETY 21(3)
OLS unstandardized coefficients for reliability among Wikipedia contributors (weighted)
Constant French language Log article size Log contribution size Log edits Registered user Registered user* Log edits Adjusted R2 Unweighted N
1
2
0.52** (0.040) −0.05** (0.009) 0.05** (0.004) −0.03** (0.003) −0.002** (0.008) −0.002 (0.009) –
0.53** (0.040) −0.05** (0.009) 0.05** (0.004) −0.03** (0.003) −0.001 (0.007) −0.05** (0.010) 0.03** (0.006)
0.08
0.08 7058
7058
Note. Standard error terms in parentheses; ** p < 0.01.
anonymous users. The control variables have similar effects on reliability for anonymous users as they did for registered users. Reliability is higher when the topic article being edited is larger. Shorter contributions have higher reliability, while French contributors are less likely to have their contributions retained compared to Dutch contributors. Table 6 shows the weighted OLS regression of reliability for all users to examine whether registered users with many contributions have higher reliability than anonymous users (hypothesis 2b). Somewhat surprising that registered users do not have significantly higher reliability than anonymous users overall. As above, the control variables have similar effects on reliability in model one in Table 6: article size is positive, contribution size and French language are negative. Among all users, the more contributions (log edits) a contributor makes, the lower their reliability. Model two in Table 6 adds the interaction between registered status and number of contributions to test hypothesis 2b. Here log edits is no longer significant while registered user status is significant but negative, while the interaction between registration and log edits is significant and positive. The result for the interaction means that reliability is higher among registered users with more contributions, consistent with hypotheses 2a and 2b. However, the finding that registered users have significantly lower reliability (registered user status is negative) once controlling for the interaction indicates that anonymous users with fewer contributions have higher reliability not only compared to other anonymous users (consistent with hypothesis 7 and results shown in Table 5), but also to registered
ANTHONY ET AL.: REPUTATION AND RELIABILITY
299
Retention rate
0.76 0.74
p