A Study of FOSS 2013 Survey Data Using Clustering ...

89 downloads 3050 Views 689KB Size Report
FOSS is an acronym for Free and Open Source Software. The ... other facets of the free software and OSS worlds. ... Android (Mostly), Top 500 Supercomputers.
A Study of FOSS 2013 Survey Data Using Clustering Techniques

A Study of FOSS 2013 Survey Data Using Clustering Techniques

A. Mani, Rebeka Mukherjee Free Software Movement Data Set Methodology

A. Mani1 1

Rebeka Mukherjee2

Women Contributors-1 MCA

Department of Pure Mathematics, Calcutta University 9/1B, Jatin Bagchi Road Kolkata-700029 India E-Mail:[email protected] Homepage: http://www.logicamani.in 2

Department of Comp.Sci.and Engg. Netaji Subhash Engineering College B-13 Rajdanga Nabapally Kolkata-700107, India Email: [email protected]

IEEE Women in Engineering Conference, AISSMS Pune’2016

Details References

pABSTRACTy

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee Free Software Movement

FOSS is an acronym for Free and Open Source Software. The FOSS 2013 survey primarily targets FOSS contributors and relevant anonymized dataset is publicly available under CC by SA license. In this study, the dataset is analyzed from a critical perspective using statistical and clustering techniques (especially multiple correspondence analysis) with a strong focus on women contributors towards discovering hidden trends and facts. Important inferences are drawn about development practices and other facets of the free software and OSS worlds.

Data Set Methodology Women Contributors-1 MCA Details References

pMyselfy

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee

Research Areas: Algebra, Logic, Rough Sets, Vagueness and related areas.

Free Software Movement

Foundations of Rough Sets:

Methodology

Course development in Machine Learning, Soft Computing. Occasionally involved in independent consultancy in KDD, Statistics and Specifications GNU/R expert, Free Software Activism: Ubuntu Women Project, GLUG Kolkata, (ilug-Cal.info), Fedora, LQ, GNU/R India, Functional Feminist Principal Organizer and Chair: Special Session on Foundations of Rough Sets, IJCRS’2017, Poland

Data Set

Women Contributors-1 MCA Details References

pFree(dom) Softwarey

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee Free Software Movement Data Set

The freedom to run the program, for any purpose

Methodology

The freedom to study how the program works, and adapt it to your needs

Women Contributors-1

The freedom to redistribute copies so you can help others

Details

The freedom to improve the program, and release them to the public for the benefit of the whole community. Copyleft Free Software licenses are Better than other FOSS licenses.

MCA

References

pFree Software Projectsy

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee Free Software Movement

GNU, Open BSD, Free BSD, NetBSD, Hurd, HelenOS, Linux, Firefox, ChromeOS

Data Set

Android (Mostly), Top 500 Supercomputers

Women Contributors-1

In GNU/Debian over 60000 free software packages are available Kernels: Linux, BSD, Mach micro kernel (Hurd), Spartan (HelenOS multi server micro kernel),Minix Desktops: KDE, GNOME, LXDE, XFCE, Plasma, FluxBox, Enlightenment, . . . Infrastructure related projects, Web Related Projects

Methodology

MCA Details References

pFOSS 2103 Surveyy

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee Free Software Movement Data Set

GSyC/LibreSoft research group at the Universidad Rey Juan Carlos FOSS’2002 Survey: 2500 Contributors FOSS’2013 Survey : Similar Questions

Methodology Women Contributors-1 MCA Details References

Participants: < 2200!! - Extremely Small Sample Linux Kernel Developers: 2500+, Drupal: 10000+ Free Software is everywhere

pGoals of Studyy

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee Free Software Movement Data Set Methodology

What can be Inferred from the Survey Data?

Women Contributors-1

What can be said about women contributors?

MCA

How Does Multiple Correspondence Analysis help?

Details

What Gender differences can be inferred from the data? What does the survey offer for future surveys?

References

pDescription of Data Sety Data Set: 262 Columns, 2183 Rows 53 Question Blocks Majority of the participants were developers and data is in factor form. Number of factor levels do not admit of easy automatic conversion into numeric values. 34% consider themselves as a part of the Free Software community, while 32% consider themselves part of the OSS community and the rest do not care about labels. 15% of those surveyed think that Free Software and OSS communities are not only different way of thinking but also different way of living. 29% think that they are different only in the principles but they work in the same way while 21% do not care. About 51% have been involved in 1 − 5 FLOSS development projects, 18% in 6 − 10 projects, 7% in 11 − 20 projects, 2% in 21 − 30 projects, 1.6% in 31 − 50 projects, and 1.6% in more than 50 projects.

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee Free Software Movement Data Set Methodology Women Contributors-1 MCA Details References

pSimple Factsy About 20% of participants were involved in a project as a lead figure, 12% in two projects, 7% in three projects, 8% in 4 − 5 projects and 6% in more than five projects in similar capacities. 29% of the participants had never been involved as a leading figure. Questions comparing the contribution of developers to the benefits they get from the FOSS community were ignored by nearly 50% of the participants. Nearly 70% of the participants believe that most developers are less concerned about money. 2% of the people surveyed totally agree that to develop FLOSS is a kind of self-exploitation, because people do not receive any benefits in return for good ideas and work, 6% agree, while 27% disagree and 36% totally disagree. Surprisingly only 51% of participants have used at least two programming languages. About 29% of Python Programmers know C as well.

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee Free Software Movement Data Set Methodology Women Contributors-1 MCA Details References

pMethodologyy Questions were split into the 3 groups: First block of 42 questions of a personal nature Questions 43 − 109 relating to participant’s general outlook in relation to free software and OSS worlds. This will be referred to as the Outlook block. Questions 110 − 168 relating to participant’s choices related to programming. This will be referred to as the Programming block. Questions 169 − 262 relating to participant’s experiences in and views about the free software and OSS development environment. This will be referred to as the Environment block. The above were further subsetted to analyze the involvement of women. Techniques Used: Exploratory Methods, Multiple Correspondence Analysis (MCA) and Clustering on GNU/R. MCA [FSJ2011] is an adaptation of PCA to factor data in which categories are looked for by both columns and rows simultaneously.

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee Free Software Movement Data Set Methodology Women Contributors-1 MCA Details References

pWomen in FOSS-1y

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee

35% of women participants were single, about 11% do not live with their partners, while 3% live with their partners. 3% are married and 0.09% are separated from their partners. 20% of women participants surveyed had children, while 79% did not.

Free Software Movement Data Set Methodology Women Contributors-1 MCA

About 5% of the participants were unemployed. 40% of the women participants work or reside in the USA, 6% in Germany, 6% in the India, 5% in the U.K., and 4% in Australia. 81% of women contributors are at least graduates. Number of Ph.Ds form 10% of the pool. 82% of women contributors are at least proficient in the English language.

Details References

pWomen in FOSS-2y

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee Free Software Movement

32% love their present job, while 5% wanted a more interesting job. 28% consider themselves as a part of the Free Software community, while 4% consider themselves part of OSS community. 10% of women think that Free Software communities are different from OSS communities in theory and praxis. 20% think that they differ only in their principles but work in the same way. 21% do not care.

Data Set Methodology Women Contributors-1 MCA Details References

pEducation Status of Womeny 10.4% of participants identified as women (way better than the 2002 figure). Women with university education: 81.4%, Whole dataset: 72% - Entry level is higher for women.

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee Free Software Movement Data Set Methodology Women Contributors-1 MCA Details References

pContribution Typey

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee

76% of women contribute code. Whole dataset: 72%. 81% of women believe that FOSS development is not self exploitive. This rhymes with belief in collaborative community development models and good motivation.

Free Software Movement Data Set Methodology Women Contributors-1 MCA Details References

Figure: Contribution=Self Exploitation?

pTwenties: Popular Starting Agey

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee Free Software Movement Data Set Methodology Women Contributors-1 MCA Details References

Figure: Age at Time of First Contribution

pMCAy

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee

Multiple correspondence analysis [FSJ2011] over the three subsets corresponding to outlook, programming and development environment for women were investigated The library FactoMineR was used for all this. Influential Dimensions were isolated by MCA Hierarchical Clustering on Principal Components (HCPC) was used to find optimal features for clusters Questions that determine clusters effectively could be identified. More details are in paper.

Free Software Movement Data Set Methodology Women Contributors-1 MCA Details References

pResults: Outlook Block: 15 Questionsy

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee

The HCPC procedures indicates 15 very important questions. Q0024: Do you also develop proprietary software? Q0022, Q0021: Age of starting with FOSS Q0025: Are you involved with proprietary software? Q0035: Do you earn money from FLOSS?

Free Software Movement Data Set Methodology Women Contributors-1 MCA Details

Q0023: how many hours per week did you work on FLOSS projects in the last year? Q0030_2: Was the urge to [to learn and develop new skills] a reason for getting into FOSS? Q0034: Do you use FLOSS at work or in academics? Q0032_2: is [to learn and develop new skills] a reason for developing FLOSS today?

References

pWomen: Development Issuesy Optimal Number of Key Questions: 20. Q0057_1, Q0057_4, Q0057_2, Q0057_6, Q0058_8, Q0057_5, Q0058_11, Q0057_7, Q0058_9, Q0057_3, Q0058_12 and Q0058_10 ...

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee Free Software Movement Data Set Methodology

importance of software stability by reduction of bugs, making software suitable for mobiles, etc, improving feature set, attracting users,

Women Contributors-1

leveraging the power of the leader/main company, reduction of time between releases,

References

communication inside projects, getting users involved in contribution, getting enough funds, improving documentation, upstream communication and legal issues On the third block of questions, the methods did not yield much of insight.

MCA Details

pDevelopment Environmenty

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee Free Software Movement

High eigenvalues for the first three dimensions #Views of Women on Development > head(mcafemvewfse$eig) eigenvalue var% cum% var dim 1 0.21 9.2 9.2 dim 2 0.14 6.5 15.7 dim 3 0.08 3.6 19.3 dim 4 0.06 2.6 21. dim 5 0.05 2.3 24.2 dim 6 0.05 2.2 26.4

Data Set Methodology Women Contributors-1 MCA Details References

pHCPCy

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee Free Software Movement

Applying the hcpc() function on the MCA object, the optimal number of questions required to obtain the position of women contributors on development issues include at least 20 twenty QA. > hcpcfemvewfse$desc.var $test.chi2 p.value df Q0057_1 3.158089e-61 21 Q0057_4 2.622154e-59 18 Q0057_2 6.476614e-59 21

Data Set Methodology Women Contributors-1 MCA Details References

pAssociated Clustersy

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee Free Software Movement Data Set Methodology Women Contributors-1 MCA Details References

pConclusiony

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee

In this research,

Free Software Movement Data Set

the FOSS’2013 survey is contextualized and critically examined Stress has been laid on women in free software and their views on free software development and outlook These have been elucidated using MCA and HCPC methods. In particular the data also suggests a higher entry level for women in FOSS This study would also be useful for improving related survey methodologies and design.

Methodology Women Contributors-1 MCA Details References

pReferences Iy L. Arjona-Reina, G. Robles, and S. DuenÌČas. (2014, January) The FLOSS2013 Free/Libre/Open Source Survey. [Online]. Available: http://floss2013.libresoft.es M. Gardiner and V. Aurora. The Ada Initiative Census,2011. [Online]. Available: https://adainitiative.org/2011/06/06/ census-march-2011-demographic-breakdown-..

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee Free Software Movement Data Set Methodology Women Contributors-1 MCA Details References

R. Stallman, Free Software Free Society: Selected Essays of Richard M. Stallman„ 3rd ed. FSF, 2015. [Online]. Available: http://www.gnu.org/doc/fsfs3-hardcover.pdf A. Mani, Free Software, Women and Functional Feminism, In WWFSâĂŹ2016 Report and Slides, 2016, [Online]. Available: http: //wiki.ubuntu-women.org/Event-Report-WWFS-FWD2016

pReferences IIy

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee Free Software Movement Data Set Methodology Women Contributors-1

F. Husson, S. Le, and J. Pages, Exploratory Multivariate Analysis by Example Using R CRC Press, 2011.

MCA Details References

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee Free Software Movement Data Set Methodology

QUESTIONS?

Women Contributors-1 MCA Details References

A Study of FOSS 2013 Survey Data Using Clustering Techniques A. Mani, Rebeka Mukherjee Free Software Movement Data Set Methodology

CHEERS !

Women Contributors-1 MCA Details References