Open Source communities as Social Networks: an analysis of some peculiar characteristics
Giulio Concas, Manuela Lisci, Sandro Pinna, Guido Porruvecchio and Selene Uras
University of Cagliari
ASWEC 2008, Perth, Australia
ASWEC, 26 March 2008
Outline
1. Communication in OS Communities
2. Most successful OS Projects
3. Social Network Analysis
4. Open Source Communities as Social Networks
5. Centrality indexes on OS Communities
6. Conclusion and further work
Communication in OS Communities
OS Communities
OS development teams are characterized by several peculiar aspects:
- Developers work from different, geographically dispersed locations
- Source code is shared
- Knowledge about the project is shared
- Members are often volunteers and have different roles
- Users are sometimes involved in the decision-making process
To reconcile these distinct aspects, it is necessary to understand, facilitate and improve team communication.
Communication in OS Communities
Some studies have found that team performance also depends on the information provided about the project.
Our purpose is to analyze OS communities from the communication point of view. We need to:
- select the OS projects that are best suited for this research study
- adopt the right approach for their analysis
How should we choose the OS projects for our research? Which approach is best suited to analyze them?
Most successful Open Source Projects
SourceForge
SourceForge is a website offering different tools that facilitate the OS software development process. It is targeted at both developers and users.
SourceForge hosts more than 172,000 projects; every project has its own page.
One of the most important community tools is the Developers Mailing List.
Most successful projects
SourceForge provides many statistics that indicate the success of a project, such as activity, downloads and page views.
We decided to analyze the communication flow in the most active developers' mailing lists of SourceForge projects.
Choosing the most successful Open Source Projects
Candidate projects were classified by the status of their mailing-list (ML) archives: corrupted archives; without developers ML; without ML; ML not active enough; available archives.
Projects examined:
Crystal Space 3D, SquirrelMail, Jedit, Icewm
BO2K, Mesa3D, Small Device C Compiler, Firebird, BZFlag, User-mode Linux kernel port, Etherboot, Enlightenment
WebCalendar, Numerical Python, Exult, wxWidgets, MiKTeX, Tcl, Doom Legacy, AWStats, netsnmp, Firewall Builder, The Nebula Device, PhpWiki, CMU Sphinx, Courier Mail Server, gnuplot, Developer's Image Library, Dev-C++
Ghostscript, Python, Linux PCMCIA Card Services, Mailman, Quanta, The Freenet Project, Openads, Common C++ Libraries, Gnucleus, Gabber, Boa Constructor, Scintilla, The EDGE Project
Double Choco Latte, TUTOS, Slash, Screem, Leo, Owl Intranet Knowledgebase, WebMail-Java, Halflife Admin Mod, Cdex, DynAPI
Available archives: Gaim, Gimp, Licq, Miranda, MinGW, Netatalk, Gallery, Arianne, Geotools
Chosen projects
Only 9 of the 70 projects seem to actually use the SourceForge developers' mailing list, so they were chosen to evaluate communication among members.

Project      Topic
Arianne      Multiplayer online engine to develop games
Gaim         Instant messaging application
Gallery      Web-based photo gallery
Geotools     Open source Java GIS toolkit
Gimp-Print   Package of printer drivers
Licq         Instant messaging application
Mingw        Tool for importing libraries and header files
Miranda      Instant messaging application
Netatalk     Daemon for sharing files and printers
The framework that parses the data from the ML archives extracts, for each mail, the sender, subject, date and thread starter.
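The slides do not describe how the parsing framework is implemented. As an illustration only, the sketch below assumes the archives are available as raw RFC 2822 messages and reconstructs threads from the `Message-ID` and `In-Reply-To` headers; the function name `parse_archive` and the record field names are hypothetical. It uses only Python's standard `email` package.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def parse_archive(raw_messages):
    """Extract sender, subject, date and thread starter from raw mails.

    Assumption: threads are rebuilt from the Message-ID / In-Reply-To
    headers; the original framework may use a different rule.
    """
    records, parent_of = [], {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = (msg.get("Message-ID") or "").strip()
        parent = (msg.get("In-Reply-To") or "").strip()
        _, address = parseaddr(msg.get("From", ""))
        date = msg.get("Date")
        records.append({
            "id": msg_id,
            "sender": address.lower(),  # normalise case so one person is one node
            "subject": msg.get("Subject", ""),
            "date": parsedate_to_datetime(date) if date else None,
        })
        if parent:
            parent_of[msg_id] = parent
    # Walk reply links up to the first message of each thread.
    for rec in records:
        root, seen = rec["id"], set()
        while root in parent_of and root not in seen:
            seen.add(root)
            root = parent_of[root]
        rec["thread_starter"] = root  # Message-ID of the thread's first mail
    return records
```

If a thread's first mail is missing from the archive, the thread is keyed by the Message-ID its earliest surviving reply points to, so replies still end up grouped together.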
Advanced Users and Developers in DMLs
The DML is frequented by two different kinds of people:
- Advanced users exchange opinions on software usage, giving developers feedback that helps them implement new features and fix bugs
- Developers discuss new features and bug fixes

Project      Users  Developers  Participants
Arianne         90          12           112
Gaim          1195          52          1247
Gallery        431          53           484
Geotools       299          96           395
Gimp-Print     581          46           627
Licq           598          13           611
Mingw           62          43           105
Miranda        164          16           180
Netatalk       695          43           738
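The slides do not state how DML senders were split into advanced users and developers. A minimal sketch, assuming a hypothetical rule in which a sender counts as a developer when their address appears in the project's developer roster (for example, its SourceForge member list) and as an advanced user otherwise; `split_participants` and its arguments are illustrative names only.

```python
def split_participants(senders, developer_addresses):
    """Count advanced users, developers and distinct participants in a DML.

    Hypothetical rule: developers are senders found in the project's
    developer roster; every other sender is an advanced user.
    """
    participants = {s.lower() for s in senders}  # one entry per distinct address
    developers = participants & {d.lower() for d in developer_addresses}
    users = participants - developers
    return {
        "users": len(users),
        "developers": len(developers),
        "participants": len(participants),
    }
```

Under this rule every participant falls into exactly one of the two groups, so the participant count is always the sum of the other two.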
Social Network Analysis
Social Network Analysis (SNA) has been defined as a way to describe relationships among social entities, as well as the patterns and implications of these connections.
SNA in this research
In this research, SNA provides the tools to:
- study relationships among people in OS communities
- draw the network of each community: actors become nodes, and each relationship between them is represented by a link
SNA considers actors as people who keep in contact with each other.
Actors, links, centrality
The network formed by OS communities was defined as follows:
- Nodes are the mail senders: each member who posted a message in a discussion thread
- Links are established between two members participating in the same thread
The network built in this way was analysed by extracting centrality measures.
One of the most important aspects of SNA is the identification of the most prominent or central actors: these are particularly visible to the other actors in the network and are able to maintain many relationships with them.
To evaluate how central a node is in the network, it is important to take into account not only direct ties but also the indirect paths that link one actor to another.
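The network definition above (senders as nodes, a link between any two members posting in the same thread) can be sketched as follows. This assumes each parsed mail is a dict with hypothetical `sender` and `thread_starter` keys, and stores the undirected graph as an adjacency map from each node to the set of its neighbours; `build_network` is an illustrative name.

```python
from collections import defaultdict
from itertools import combinations


def build_network(records):
    """Build an undirected co-participation network from parsed mails.

    Nodes are senders; two senders are linked if they posted in the
    same thread. Returns a dict: node -> set of neighbours.
    """
    threads = defaultdict(set)
    for rec in records:
        threads[rec["thread_starter"]].add(rec["sender"])
    adj = {}
    for members in threads.values():
        for sender in members:
            adj.setdefault(sender, set())  # keep members with no co-posters
        for a, b in combinations(sorted(members), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj
```

Using sets means two members who meet in several threads still share a single link, so the resulting graph is unweighted, which is the setting assumed by the degree and betweenness definitions that follow.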
Centrality indexes on OS Communities
Freeman focused his attention on three fundamental measures (point indexes): degree, betweenness and closeness.
Based on them, he developed the network-level centralization indexes, which quantify the gap between the most prominent actor and all the remaining actors.
By identifying the most central members in community networks, all these centrality indexes can be useful:
- to better characterize these communities
- to find a possible relationship between prominent members and the success, quality and maturity of OS projects
Data analysis: degree index
Degree is the simplest measure of centrality: the most central actors are those with the largest number of ties to other actors in the network.
Freeman, reviewing previous related work, chose Nieminen's definition of point degree:

$$C_D(m_i) = \sum_{j} m_{ij} \qquad (1)$$

where $m_i$ indicates a community member, and $m_{ij} = 1$ if $m_i$ and $m_j$ are connected, 0 otherwise. The group degree centralization index is defined as follows:

$$C_D = \frac{\sum_{i=1}^{g} \left[ C_D(n') - C_D(n_i) \right]}{(g-1)(g-2)} \qquad (2)$$

where $C_D(n')$ is the largest point degree value and $g$ is the number of nodes in the network. This index equals 1 when one actor interacts with all the other $g-1$ actors and they interact only with him; its minimum value is 0, when all degree values are equal (the graph is regular).
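Equations (1) and (2) translate directly into code. A minimal sketch, assuming the network is undirected, unweighted and free of self-loops, stored as a map from each node to the set of its neighbours; the function names are hypothetical.

```python
def point_degree(adj):
    """Point degree C_D(m_i) = sum_j m_ij: the number of direct ties."""
    return {node: len(neigh) for node, neigh in adj.items()}


def degree_centralization(adj):
    """Freeman's group degree centralization C_D, normalised to [0, 1]."""
    g = len(adj)
    if g < 3:
        raise ValueError("centralization is undefined for fewer than 3 nodes")
    degrees = point_degree(adj)
    max_deg = max(degrees.values())  # C_D(n'), the largest point degree
    # Sum of gaps from the top actor, divided by the star graph's maximum.
    return sum(max_deg - d for d in degrees.values()) / ((g - 1) * (g - 2))
```

A star (one hub linked to every other node, and nothing else) yields exactly 1, while a cycle, where all degrees are equal, yields 0, matching the bounds stated above.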
Degree index in OS Communities
The degree distribution in log-log scale shows a similar behavior across projects: very few central members have a high degree, while the vast majority have few direct connections. This heavy-tailed pattern is characteristic of scale-free networks.
Degree index per project (N: number of nodes; Mean: mean degree; Centr.: group degree centralization $C_D$)

Project        N   Mean  Centr.
Arianne       95  0.077   0.656
Gaim        1185  0.008   0.402
Gallery      435  0.016   0.551
Geotools     301  0.031   0.431
Gimp         587  0.008   0.658
Licq         572  0.011   0.407
MinGW         97  0.103   0.535
Miranda      156  0.083   0.341
Netatalk     701  0.006   0.174
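The log-log degree plot and the per-project mean can be computed from the same adjacency map. This is a sketch under two assumptions: the plot shows the fraction of nodes P(k) having degree k, and the "Mean" column is the average degree normalized by g − 1 (the slide does not say so explicitly). Function names are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(adj):
    """Return (log10 k, log10 P(k)) points for a log-log degree plot.

    Isolated nodes (k = 0) are skipped because log10(0) is undefined.
    """
    g = len(adj)
    counts = Counter(len(neigh) for neigh in adj.values())
    return [(math.log10(k), math.log10(n / g))
            for k, n in sorted(counts.items()) if k > 0]


def mean_normalized_degree(adj):
    """Average of C_D(n_i) / (g - 1) over all nodes (network density)."""
    g = len(adj)
    return sum(len(neigh) for neigh in adj.values()) / (g * (g - 1))
```

On a scale-free network the points returned by `degree_distribution` fall roughly on a straight line with negative slope.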
Degree index analysis
Some projects (especially Miranda and Netatalk) have a low degree centralization index.
These communities do not have a single central member whose degree is much higher than the others'; instead, they have a core group of members who are more visible and exchange a higher number of messages.
As Wasserman pointed out, not all actors are directly connected to the network core: if only degree is used, an actor's position risks being evaluated only with respect to its closest ties.
The betweenness index overcomes this limitation by considering another topological aspect of the network.
Data analysis: betweenness index
Betweenness measures how often a node lies on the shortest paths (geodesic paths) between two other nodes.
Under the hypothesis that two geodesics with the same length have the same probability of being chosen, Freeman defined actor betweenness:

$$C_B(n_i) = \sum_{j<k} \frac{g_{jk}(n_i)}{g_{jk}}$$
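Freeman's actor betweenness, $C_B(n_i) = \sum_{j<k} g_{jk}(n_i)/g_{jk}$, where $g_{jk}$ is the number of geodesics between $n_j$ and $n_k$ and $g_{jk}(n_i)$ is the number of those passing through $n_i$, can be computed without enumerating every path. The sketch below swaps in Brandes' algorithm, a standard exact method for unweighted graphs; it assumes an undirected adjacency map (node to set of neighbours), and the function name is hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Unnormalised actor betweenness C_B(n_i) for an undirected graph.

    Brandes' algorithm: one BFS per source counts geodesics, then
    dependencies are accumulated in order of decreasing distance.
    """
    cb = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of geodesics from s
        dist = dict.fromkeys(adj, -1)   # BFS distance from s (-1 = unseen)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a geodesic
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # Each unordered pair (j, k) was counted once from each endpoint.
    return {v: c / 2 for v, c in cb.items()}
```

On a three-node path a-b-c, only b lies between the other two, so it scores 1 while a and c score 0; in a four-node cycle every node scores 0.5, because each opposite pair is joined by two equally likely geodesics.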