Open Source communities as Social Networks: an analysis of some peculiar characteristics

Giulio Concas, Manuela Lisci, Sandro Pinna, Guido Porruvecchio and Selene Uras

Guido Porruvecchio and Selene Uras

University of Cagliari

ASWEC 2008, Perth, Australia

ASWEC, 26 March 2008


Outline

1. Communication in OS Communities
2. Most successful OS Projects
3. Social Network Analysis
4. Open Source Communities as Social Networks
5. Centrality indexes on OS Communities
6. Conclusion and further work


Communication in OS Communities

OS Communities
OS development teams are characterized by several peculiar aspects:
- Developers are geographically dispersed
- Source code is shared
- Knowledge about the project is shared
- Members are often volunteers and have different roles
- Users are sometimes involved in the decision-making process

To reconcile these distinct aspects, it is necessary to understand, facilitate and improve team communication.


Communication in OS Communities
Some studies have found that team performance also depends on the information provided about the project.
Our purpose is to analyze OS communities from the communication point of view. We need to:
- select the OS projects that are best suited for this study
- adopt the right approach for their analysis

How should we choose the OS projects for our research? Which approach is best suited to analyze them?


Most successful Open Source Projects

SourceForge
SourceForge is a website that offers a range of tools to facilitate the OS software development process. It is targeted at both developers and users. SourceForge hosts more than 172,000 projects, each with its own page. One of the most important community tools is the developers' mailing list.

Most successful projects
SourceForge publishes many statistics that indicate the success of a project, such as activity, downloads and page views. We decided to analyze the communication flow in the most active developers' mailing lists of SourceForge projects.


Choosing the most successful Open Source Projects

Each candidate project was classified by the status of its mailing-list (ML) archives: corrupted archives, without a developers' ML, without any ML, ML not active enough, or available archives.

Projects with corrupted archives, without a developers' ML, without any ML, or with an ML that is not active enough: Crystal Space 3D, SquirrelMail, Jedit, Icewm, BO2K, Mesa3D, Small Device C Compiler, Firebird, BZFlag, User-mode Linux kernel port, Etherboot, Enlightenment, WebCalendar, Numerical Python, Exult, wxWidgets, MiKTeX, Tcl, Doom Legacy, AWStats, net-snmp, Firewall Builder, The Nebula Device, PhpWiki, CMU Sphinx, Courier Mail Server, gnuplot, Developer’s Image Library, Dev-C++, Ghostscript, Python, Linux PCMCIA Card Services, Mailman, Quanta, The Freenet Project, Openads, Common C++ Libraries, Gnucleus, Gabber, Boa Constructor, Scintilla, The EDGE Project, Double Choco Latte, TUTOS, Slash, Screem, Leo, Owl Intranet Knowledgebase, WebMail-Java, Halflife Admin Mod, Cdex, DynAPI

Projects with available archives: Gaim, Gimp, Licq, Miranda, MinGW, Netatalk, Gallery, Arianne, Geotools


Chosen projects
Only 9 of the 70 projects appear to actually use their SourceForge (SF) developers' mailing list, so these 9 were chosen to evaluate communication among members.

Project      Topic
Arianne      Multiplayer online engine for developing games
Gaim         Instant messaging application
Gallery      Web-based photo gallery
Geotools     Open source Java GIS toolkit
Gimp-Print   Package of printer drivers
Licq         Instant messaging application
MinGW        Tool for importing libraries and header files
Miranda      Instant messaging application
Netatalk     Daemon for sharing files and printers

A framework parses the data from the ML archives and, for each mail, extracts the sender, subject, date and thread starter.
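The extraction step can be sketched as follows. The archive format is not stated, so this assumes each ML archive was downloaded as a standard mbox file and uses only Python's standard library (`mailbox`, `email.utils`). Treating the first Message-ID in the `References` header (falling back to `In-Reply-To`, then to the message's own ID) as the thread starter is also an assumption; `MailRecord`, `thread_root_of` and `parse_archive` are hypothetical names, not the authors' framework.

```python
import mailbox
from dataclasses import dataclass
from datetime import datetime
from email.utils import parseaddr, parsedate_to_datetime
from typing import List, Optional


@dataclass
class MailRecord:
    sender: str               # e-mail address of the poster, lower-cased
    subject: str
    date: Optional[datetime]  # None if the Date header is missing or malformed
    thread_root: str          # Message-ID assumed to identify the thread starter


def thread_root_of(msg) -> str:
    """Assumption: the first ID in References (else In-Reply-To) starts the thread."""
    refs = (msg.get("References") or "").split()
    if refs:
        return refs[0]
    reply_to = (msg.get("In-Reply-To") or "").split()
    if reply_to:
        return reply_to[0]
    # No reply headers: this message starts its own thread.
    return (msg.get("Message-ID") or "").strip()


def parse_archive(path: str) -> List[MailRecord]:
    """Read one mbox archive and extract sender, subject, date and thread starter."""
    records = []
    for msg in mailbox.mbox(path):
        _, address = parseaddr(msg.get("From", ""))
        raw_date = msg["Date"]
        try:
            date = parsedate_to_datetime(raw_date) if raw_date else None
        except (TypeError, ValueError):  # malformed Date header
            date = None
        records.append(MailRecord(
            sender=address.lower(),
            subject=(msg.get("Subject") or "").strip(),
            date=date,
            thread_root=thread_root_of(msg),
        ))
    return records
```

Lower-casing the address is a simple way to merge the same sender written with different capitalization; real archives may also need alias merging (one person posting from several addresses), which this sketch does not attempt.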


Advanced Users and Developers in DMLs
The developers' mailing list (DML) is frequented by two different kinds of people:
- Advanced users exchange opinions on using the software, giving developers feedback that helps them implement new features and fix bugs
- Developers discuss new features and bug fixes

Project      Users  Developers  Participants
Arianne      90     12          112
Gaim         1195   52          1247
Gallery      431    53          484
Geotools     299    96          395
Gimp-Print   581    46          627
Licq         598    13          611
MinGW        62     43          105
Miranda      164    16          180
Netatalk     695    43          738


Social Network Analysis
Social Network Analysis (SNA) has been defined as a way to describe relationships among social entities, as well as the patterns and implications of these connections.

SNA in this research
In this research, SNA provides the tools to:
- study relationships among people in OS communities
- draw the network of each community, where actors become nodes and each relationship between two actors is represented by a link

SNA treats actors as people who keep in contact with one another.


Actors, links, centrality
The network formed by each OS community was defined as follows:
- Nodes are the mail senders: each member who posted a message in a discussion thread
- A link is established between two members who participated in the same thread

The network built in this way was analysed by extracting centrality measures.
One of the most important aspects of SNA is the identification of the most prominent, or central, actors: those that are particularly visible to other actors in the network and are able to maintain many relationships with them.
To evaluate how central a node is in the network, it is important to take into account not only direct ties but also the indirect paths that link one actor to another.
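The network definition above can be sketched in a few lines. This assumes the parsed mails have been reduced to hypothetical `(thread_id, sender)` pairs, that links are undirected and unweighted (the slides do not say whether repeated co-participation is weighted), and that a member who posted only in threads nobody else joined is kept as an isolated node. `build_network` is an illustrative name.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def build_network(posts: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected co-participation network from (thread_id, sender) pairs.

    Two senders are linked if they both posted in at least one common thread.
    """
    participants: Dict[str, Set[str]] = defaultdict(set)
    for thread_id, sender in posts:
        participants[thread_id].add(sender)  # a set, so repeat posts add nothing

    adjacency: Dict[str, Set[str]] = {}
    for members in participants.values():
        for sender in members:
            adjacency.setdefault(sender, set())  # every sender is a node
        for a, b in combinations(sorted(members), 2):
            adjacency[a].add(b)  # link every pair of co-participants
            adjacency[b].add(a)
    return adjacency


posts = [("t1", "alice"), ("t1", "bob"), ("t1", "carol"),
         ("t2", "bob"), ("t2", "dave"), ("t3", "erin")]
net = build_network(posts)
print(sorted(net["bob"]))  # ['alice', 'carol', 'dave']
```

Because every pair of participants in a thread is linked, a thread with many posters becomes a clique; this is one reason dense, busy lists can produce very high-degree members.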


Centrality indexes on OS Communities
Freeman focused on three fundamental point indexes: degree, betweenness and closeness. Building on them, he developed the corresponding network-level centralization indexes, which quantify the gap between the most prominent actor and all the remaining actors.
By identifying the most central members of community networks, all these centrality indexes can be useful:
- to better characterize these communities
- to find a possible relationship between prominent members and OS projects' success, quality and maturity
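Freeman's centralization indexes share one general form, written here as a reference sketch: $C_X$ stands for any of the three point indexes (degree, betweenness or closeness), $n'$ is the actor with the largest value of $C_X$, $g$ is the number of nodes, and the maximum in the denominator is taken over all possible graphs with $g$ nodes.

```latex
C_X = \frac{\sum_{i=1}^{g} \left[ C_X(n') - C_X(n_i) \right]}
           {\max \sum_{i=1}^{g} \left[ C_X(n') - C_X(n_i) \right]}
```

For degree, the maximum is reached by a star graph, where the centre has degree $g-1$ and the other $g-1$ actors have degree 1, giving $(g-1)(g-2)$ as the denominator of the group degree centralization index.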


Data analysis: degree index
Degree is the simplest measure of centrality: it defines as most central the actors who have the largest number of ties with other actors in the network. Freeman, reviewing previous related work, chose Nieminen's definition of point degree:

$$C_D(n_i) = \sum_{j} m_{ij} \qquad (1)$$

where $n_i$ indicates a community member, and $m_{ij} = 1$ if $n_i$ and $n_j$ are connected, 0 otherwise. The group degree centralization index is defined as follows:

$$C_D = \frac{\sum_{i=1}^{g} \left[ C_D(n') - C_D(n_i) \right]}{(g-1)(g-2)} \qquad (2)$$

where $C_D(n')$ is the largest point degree value and $g$ is the number of nodes in the network. This index equals 1 when one actor interacts with all the other $g-1$ actors and they interact only with him (a star graph); its minimum value is 0, when all degree values are equal (the graph is regular).
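Equations (1) and (2) translate directly into code. This minimal sketch assumes the network is stored as a dictionary mapping each member to the set of members they are linked to (undirected, no self-loops); `point_degree` and `degree_centralization` are illustrative names.

```python
from typing import Dict, Set


def point_degree(adjacency: Dict[str, Set[str]]) -> Dict[str, int]:
    """Eq. (1): C_D(n_i) = sum_j m_ij, i.e. the number of neighbours of n_i."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def degree_centralization(adjacency: Dict[str, Set[str]]) -> float:
    """Eq. (2): sum_i [C_D(n') - C_D(n_i)] / ((g - 1)(g - 2))."""
    g = len(adjacency)
    if g < 3:
        raise ValueError("the index is undefined for fewer than 3 nodes")
    degrees = point_degree(adjacency)
    top = max(degrees.values())  # C_D(n'), the largest point degree
    return sum(top - d for d in degrees.values()) / ((g - 1) * (g - 2))


# A 5-node star reaches the maximum; a 4-node ring is regular.
star = {"hub": {"a", "b", "c", "d"},
        "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}
ring = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
print(degree_centralization(star))  # 1.0
print(degree_centralization(ring))  # 0.0
```

The two extreme cases reproduce the bounds stated above: 1 for a star and 0 for a regular graph.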


Degree index in OS Communities
The degree distribution, plotted on a log-log scale, shows similar behavior across projects: very few central members have a high degree, while the vast majority have few direct connections. This heavy-tailed pattern is characteristic of scale-free networks.

Degree index per project (N: number of nodes; Mean: mean degree index; Centr.: degree centralization, Eq. (2))

Project    N     Mean   Centr.
Arianne    95    0.077  0.656
Gaim       1185  0.008  0.402
Gallery    435   0.016  0.551
Geotools   301   0.031  0.431
Gimp       587   0.008  0.658
Licq       572   0.011  0.407
MinGW      97    0.103  0.535
Miranda    156   0.083  0.341
Netatalk   701   0.006  0.174
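The log-log degree distribution can be computed along these lines. This sketch assumes the same hypothetical adjacency-dictionary representation (member mapped to the set of linked members). The least-squares slope of log P(k) against log k is only a crude indicator of heavy-tailed behavior: fitting a straight line to a log-log histogram is known to be unreliable for confirming a power law, and maximum-likelihood fitting is the more robust choice. All function names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple


def degree_distribution(adjacency: Dict[str, Set[str]]) -> List[Tuple[int, float]]:
    """Return sorted (k, P(k)) pairs: the fraction of nodes with degree exactly k."""
    counts = Counter(len(neighbours) for neighbours in adjacency.values())
    g = len(adjacency)
    return sorted((k, c / g) for k, c in counts.items())


def loglog_points(dist: List[Tuple[int, float]]) -> List[Tuple[float, float]]:
    """log10 coordinates for plotting; isolated nodes (k = 0) are skipped."""
    return [(math.log10(k), math.log10(p)) for k, p in dist if k > 0]


def loglog_slope(points: List[Tuple[float, float]]) -> float:
    """Ordinary least-squares slope of log P(k) vs log k (rough heuristic only)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct non-zero degrees")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


star = {"hub": {"a", "b", "c", "d"},
        "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}
dist = degree_distribution(star)
print(dist)  # [(1, 0.8), (4, 0.2)]
```

A steep negative slope over many distinct degree values is consistent with the "few hubs, many low-degree members" pattern described above, but on its own it does not prove the network is scale-free.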


Degree index analysis
Some projects (especially Miranda and Netatalk) have a low degree centralization index. These communities do not have a single central member whose degree is much higher than everyone else's; instead, they have a core group of members who are more visible and exchange a higher number of messages.
As Wasserman pointed out, not all actors are directly connected to the network core: using degree alone, an actor's position risks being evaluated only with respect to its direct ties. The betweenness index overcomes this limitation by considering another topological aspect of the network.


Data analysis: betweenness index
Betweenness measures how often a node lies on the shortest path (geodesic) between two other nodes. Under the hypothesis that all geodesics linking the same pair of actors are equally likely to be chosen, Freeman defined actor betweenness as:

$$C_B(n_i) = \sum_{j<k} \frac{g_{jk}(n_i)}{g_{jk}}$$

where $g_{jk}$ is the number of geodesics linking $n_j$ and $n_k$, and $g_{jk}(n_i)$ is the number of those geodesics that contain $n_i$.
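Evaluating Freeman's sum pair by pair is expensive on networks with hundreds of members. A common way to compute the same quantity on an undirected, unweighted graph is Brandes' algorithm, which runs one breadth-first search per node and accumulates dependencies backwards; this is a swapped-in computation technique, not necessarily what the authors used. The sketch assumes the hypothetical adjacency-dictionary representation (member mapped to the set of linked members).

```python
from collections import deque
from typing import Dict, List, Set


def betweenness(adjacency: Dict[str, Set[str]]) -> Dict[str, float]:
    """Freeman betweenness, sum_{j<k} g_jk(n_i) / g_jk, via Brandes' algorithm."""
    cb = {v: 0.0 for v in adjacency}
    for s in adjacency:
        stack: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in adjacency}  # geodesic predecessors
        sigma = {v: 0 for v in adjacency}   # number of geodesics from s to v
        dist = {v: -1 for v in adjacency}   # BFS distance from s (-1 = unreached)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:             # w reached for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # edge v-w lies on a geodesic from s
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adjacency}
        while stack:                        # nodes in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # Each unordered pair {j, k} was counted once from j and once from k.
    return {v: value / 2 for v, value in cb.items()}


# In the path a-b-c, b lies on the only geodesic between a and c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(betweenness(path))  # {'a': 0.0, 'b': 1.0, 'c': 0.0}
```

In a 4-node ring each node gets 0.5, because each opposite pair has two equally likely geodesics and the node lies on only one of them; this is exactly the equal-probability hypothesis behind Freeman's definition.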
