Investigating Factors Influencing the Quality of Crowdsourced Work under Different Incentives: Some Empirical Results
Evangelos Mourelatos*, Manolis Tzagarakis
Department of Economics, University of Patras, 26504 Rio, GREECE
{vagmour, tzagara}@upatras.gr
Keywords: Crowdsourcing, Quality of Service, Incentives, Reliability, Performance
Abstract Crowdsourcing is a new form of online labor, where the process of solving a problem is approached by soliciting contributions from a large group of people. In this paper, we investigate how different incentives affect the quality of work in such contexts by submitting the same task in three different environments: in a laboratory setting, on a social networking site and on a crowdsourcing site. Analysis of the obtained results indicates that under different incentives, different factors contribute to the quality of work in crowdsourcing tasks. In general, the research highlights that identifying the factors that contribute to higher quality of work in crowdsourcing environments is a complex question that depends on the task at hand.
1. Introduction The rise of the web and its large impact on people’s daily routine have created suitable conditions for new economic processes to emerge. In this respect, crowdsourcing can be considered a fast-growing activity which involves the division of labor for tedious tasks. It can be used to successfully address a great variety of tasks, including funding, reviewing, idea generation and a general search for answers and solutions. In general, crowdsourcing is the process of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, in the form of an online community, rather than from traditional employees and/or suppliers. The process is often used to subdivide tedious work by combining the efforts of numerous self-identified volunteers or part-time workers, where each contributor, on their own initiative, adds a small portion to the greater result (Howe 2006). Nowadays, it is often used for particular types of work such as translation services, microtasks, image tagging and transcription (Estellés-Arolas et al. 2012).
In recent years, crowdsourcing has attracted the interest of researchers in various fields who aim to analyze, comprehend, assess and even improve this new form of labor, and ultimately to find strategies and frameworks that keep the quality of the work at a high level (Howe 2008). An overview of the general principles of crowdsourcing aimed at achieving high quality of work is given in the existing literature (Yuen et al. 2011). Quality of work in crowdsourcing is the extent to which the outcome provided by the worker fulfills the requirements of the requester (Allahbakhsh et al. 2013: 76-81). In general, quality of work in such contexts is considered a subjective issue. For this reason, many researchers have proposed models and metrics to assess and ensure high quality of work in such environments. Among the existing models, two approaches to achieving quality results can be identified: approaches based on the profile of individual workers and approaches focusing on the specification and design of the submitted task. Workers in crowdsourcing markets usually have different levels of expertise and experience and often adjust their efforts according to incentives, which affects the quality of the outcome (Wang et al. 2013). Approaches focusing on task design, on the other hand, concern how the requester describes the task to be completed. Task design consists of several components (task definition, user interface, granularity and compensation policy) which affect the quality of the worker’s result (Finnerty et al. 2013: 16-20). In this paper we aim to address the question of which factors affect the quality of work in crowdsourced tasks when these are performed under different incentives. The focus is in particular on whether or not different incentives are associated with different factors influencing the quality of work (Chandler 2013:123-133). 
Towards this, we conducted experiments in which the same crowdsourcing task was submitted under three different incentive schemes, and we compared the quality of the work received. This paper is structured as follows. In the next section we review related work. We then describe the task workers had to complete and present the three environments where the task was submitted. Subsequently, we present the methodology and show the results of the analysis. The paper concludes with a summary and some future research directions.
2. Related work The literature suggests that the quality of workers’ results is closely related to the incentives of the participants and also depends on the environment where the experiments take place (Kaufmann et al., 2011:340). In the existing literature, motivation for participating in crowdsourcing tasks is divided into two categories: intrinsic and extrinsic (Thompson et al. 1999:2537). Intrinsic motivation exists if an individual acts in pursuit of the fulfillment generated by the activity itself (e.g. acting just for fun).
In the case of extrinsic motivation the activity is just an instrument for achieving a certain desired outcome (e.g. acting for money or to avoid sanctions). Thompson et al. (1999) showed that extrinsic motivation is generally stronger than intrinsic motivation concerning the use of the Internet, while Brabham (2008) showed that the possibility of earning money (extrinsic motivation) is the most dominant factor for participating in crowdsourcing platforms, followed by the fun generated (intrinsic motivation). On the other hand, the research of Kaufmann et al. (2011) showed that there are cases in crowdsourcing environments where many intrinsic factors seem to dominate the extrinsic ones. Finally, Rogstadius et al. (2011:321-328) presented a study in which the effect of extrinsic and intrinsic motivators on task performance was estimated. The study concluded that work accuracy can be improved significantly through intrinsic motivators, especially in cases where extrinsic motivation is low. In general, existing research points out the positive role that intrinsic and extrinsic motivation play in crowdsourced tasks. However, one question is not addressed: how motivational aspects can be influenced or triggered by the design choices of the crowdsourcing requester, i.e. how a task has to be designed so as to motivate only specific groups of workers who can guarantee a desired level of quality of work.
3. Modeling Framework The goal of the research reported in this paper is to investigate the incentives that affect the quality of the results in crowdsourced tasks. Towards this, we chose a transcription task as the form of the experiment. In particular, the experiment was designed to collect answers from the participants that address the problem of mondegreens (Wikipedia 2014). The term mondegreen describes a misperception, or “slip of the ear”, in which an utterance is perceived differently from what has actually been said. In general, the analysis of these misperceptions permits a deeper insight into speech processing in the human brain (Meyer et al., 2011: 926-930). The phenomenon is very widespread in music, where a person mishears or misinterprets the lyrics of a song. In the context of our research we examine whether or not different incentives in crowdsourcing environments can affect the quality of results in transcription tasks. In addition, the research aims to identify factors that may increase or decrease the quality of crowdsourcing results in three different environments: i) in a controlled laboratory setting with university students, ii) on a popular social networking site (Facebook) and iii) on a crowdsourcing platform (microworkers.com) (Hossfeld et al. 2014: 541-558). The experiment aims to investigate intrinsic as well as extrinsic motivations: while the experiments in the laboratory and on microworkers.com are extrinsically motivated, the experiment on the social networking site Facebook is intrinsically motivated.
4. Methodology In order to better investigate the factors that either raise or reduce the overall quality perception, we first outline the setup of the research (Brabham, 2008:75-90). The first experiment was conducted in a controlled laboratory setting with 44 first-year undergraduate students of the Department of Economics at the University of Patras, which resulted in 44 distinct answers. At first, the students had to fill out a demographic survey which also included questions related to their personality. Subsequently, the students listened to an unknown music sample with a duration of 32 seconds. The music sample contained eight verses with a total of 42 English words. The students were asked to listen to the music sample as many times as they wanted within a two-hour period, write down the words they thought they heard, and submit their answers. In order to motivate participation, students were told that the quality of their work would be taken into consideration, as a weighted factor, for their final grade. The same experiment was conducted in two more environments: on the social networking site Facebook and on the crowdsourcing platform Microworkers (microworkers.com). On Microworkers the experiment was conducted under two different settings: one setup which solicited contributions from workers from all over the world, called the “international setting”, and one setup with a hired group of workers from which workers from certain eastern world countries were deliberately excluded, called the “western world setting”. The eastern countries that were omitted included Bangladesh, Pakistan, India, China, Egypt etc. The detailed lists of the workers’ countries in both versions of the experiment are available in the appendix of this paper. In each environment, workers participated in the same task with different incentives. 
While for students the incentive was the grade they receive (extrinsic motivation), on the crowdsourcing platform the incentive had the form of a monetary reward (extrinsic motivation). Participants in the experiment on the social networking site Facebook had no tangible reward (intrinsic motivation).
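The quality of an answer is later reported as the share of lyrics correctly identified. The paper does not state how answers were scored, so the following is only a minimal sketch of one possible word-level scoring rule, assuming case-insensitive matching of words in order. The function names and the example lyrics are hypothetical and are not taken from the study.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def transcription_score(reference, answer):
    """Return the share of reference words matched, in order, by the answer.

    Assumption: a word counts as correct only if it is spelled identically
    and appears in the same relative order as in the reference lyrics.
    """
    ref_words = tokenize(reference)
    ans_words = tokenize(answer)
    if not ref_words:
        raise ValueError("reference lyrics must contain at least one word")
    matcher = SequenceMatcher(a=ref_words, b=ans_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)


# Hypothetical reference line and misheard answer (not the study's lyrics).
reference = "hold me closer tiny dancer"
answer = "Hold me closer, Tony Danza"
print(round(transcription_score(reference, answer), 2))
```

A score of 1.0 would mean every reference word was reproduced in order; a stricter or more lenient rule (for example, tolerating spelling variants) would give different values for the same answer.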
5. Data Analysis 5.1 Demographics A total of 44 students participated in the laboratory setting of the experiment with the gender distribution shown in table 1.
Gender    Freq.   Percent   Cum.
male      23      52.27     52.27
female    21      47.73     100
Total     44      100
Table 1 Gender distribution of the laboratory experiment
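The frequency tables in this section all report the same three quantities: frequency, percent and cumulative percent. As an illustration only, the sketch below shows how these columns can be derived from raw counts. The function name `frequency_table` is hypothetical; the counts are the gender counts of the laboratory experiment.

```python
def frequency_table(counts):
    """Return (label, freq, percent, cumulative percent) rows for a dict of counts."""
    total = sum(counts.values())
    rows = []
    running = 0
    for label, freq in counts.items():
        running += freq
        percent = round(100 * freq / total, 2)
        # The cumulative share is computed from the running count, not by
        # adding rounded percentages, so rounding errors do not accumulate.
        cumulative = round(100 * running / total, 2)
        rows.append((label, freq, percent, cumulative))
    return rows


gender_counts = {"male": 23, "female": 21}
for row in frequency_table(gender_counts):
    print(row)
```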
All participants were of Greek origin and in the age group 18-19. Likewise, all had the same education level (all students had just finished the first year of their undergraduate studies) and employment status (full-time students). In addition, each participant answered questions related to family monthly income, internet use, English language qualifications and their first contact with social networks. Participants were also asked questions about their personality using a standardized personality test. The following tables (Tables 2-7) summarize the data collected from the participating students in the laboratory setting.
Monthly income   Freq.   Percent   Cum.
Missing          18      40.91     40.91
…                2       4.55      45.45
…                7       15.91     61.36
…                6       13.64     75
…                1       2.27      77.27
…                2       4.55      81.82
…                3       6.82      88.64
…                2       4.55      93.18
=3501            3       6.82      100
Total            44      100
Table 2 Family Monthly Income of participating students
English diploma       Freq.   Percent   Cum.
Missing               3       6.82      6.82
No Degree             2       4.55      11.36
Competency            20      45.45     56.82
Advanced competency   3       6.82      63.64
Proficiency           16      36.36     100
Total                 44      100
Table 3 English qualification levels of participants
Home Internet Access   Freq.   Percent   Cum.
Yes                    12      27.27     27.27
No                     32      72.73     100
Total                  44      100
Table 4 Home Internet access of participants

Internet Access in primary school   Freq.   Percent   Cum.
Yes                                 15      34.09     34.09
No                                  29      65.91     100
Total                               44      100
Table 5 Internet access in primary school of participants
Internet Access in middle school   Freq.   Percent   Cum.
Yes                                41      93.18     93.18
No                                 3       6.82      100
Total                              44      100
Table 6 Internet access in middle school of participants

Internet access in high school   Freq.   Percent   Cum.
Yes                              43      97.73     97.73
No                               1       2.27      100
Total                            44      100
Table 7 Internet access in high school of each participant
The questions related to personality were based on the principles of the big five personality test (Extraversion, Emotional Stability, Agreeableness, Conscientiousness and Openness to Experience) (Linden et al. 2010:315-327). In the experiment conducted on the social networking site Facebook, the total number of participants was also 44, with the gender and age distribution shown in Tables 8 & 9.

Gender    Freq.   Percent   Cum.
male      17      38.64     38.64
female    27      61.36     100
Total     44      100
Table 8 Gender of the Facebook participants

Variable   Obs   Mean    Std. Dev.   Min   Max
Age        44    25.81   3.53        18    33
Table 9 Age distribution of the Facebook participants
On Facebook, all participants were also of Greek origin and had the same employment status (full-time work), but they differed in their education level, as shown in Table 10.

Education level                                Freq.   Percent   Cum.
Certificate program                            1       2.27      2.27
High School Degree                             10      22.73     25
Education and Training Institute Degree        2       4.55      29.55
Technological Educational Institution Degree   7       15.91     45.45
University degree                              8       18.18     63.64
Technical University                           6       13.64     77.27
Master's Degree                                9       20.45     97.73
Doctorate                                      1       2.27      100
Total                                          44      100
Table 10 Education level of Facebook participants
Regarding the experiment conducted on the crowdsourcing site microworkers.com we had some noteworthy differences in the gender, age and education level as shown in the following tables (Tables 11, 12 and 13).
Gender World   Freq.   Percent   Cum.
Male           41      83.67     83.67
Female         8       16.33     100
Total          49      100
Table 11 Gender of the microworkers.com participants (International)

Gender western World   Freq.   Percent   Cum.
Male                   29      59.18     59.18
Female                 20      40.82     100
Total                  49      100
Table 12 Gender of the microworkers.com participants (Western world)

Variable      Obs   Mean    Std. Dev.   Min   Max
Age (world)   49    26.63   5.99        18    49
Age (West)    49    30.02   8.46        20    59
Table 13 Ages of the microworkers.com participants (International)
The participants on the crowdsourcing site microworkers.com also had to answer a question about their education level. The following tables (Tables 14 & 15) show the answers received.
Table 14 Education levels of microworkers.com participants (Intl)
Table 15 Education levels of microworkers.com participants (Western)
A first look at the retrieved data reveals that participants in the international setting on the crowdsourcing website had a lower educational level (75.51% have a bachelor level degree) than the western only participants on the same website (61.22%). One last variable which was examined was the number of times participants listened to the song (“Number of repeats”) before entering their final answer in each of the four experiments (Table 16).
Number of Repeats            Obs.   Mean    Std. Dev.   Min   Max
Laboratory                   44     16.5    10.56       2     50
Facebook                     44     7.59    4.43        2     20
Microworkers International   49     6.45    5.90        1     23
Microworkers Western         49     5.76    3.89        1     16
Table 16 “Number of repeats” variable in each experiment
This variable is important for estimating the effort participants put into providing the lyrics. As can be seen, the mean number of repeats is highest in the laboratory setting (on average 16.5 repeats) compared with the experiments conducted on Facebook and in both settings on the microworkers.com platform (on average 7.59, 6.45 and 5.76 repeats respectively).
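For readers who wish to reproduce summaries of this kind, the sketch below computes the same five descriptive statistics (observations, mean, standard deviation, minimum, maximum) for one group. It assumes the sample standard deviation (n-1 denominator), which the paper does not state explicitly. The function name `describe` and the repeat counts are hypothetical and are not the study's data.

```python
from statistics import mean, stdev


def describe(values):
    """Summarize a list of numbers with the statistics reported per experiment."""
    return {
        "obs": len(values),
        "mean": mean(values),
        "std_dev": stdev(values),  # sample standard deviation (assumption)
        "min": min(values),
        "max": max(values),
    }


# Hypothetical repeat counts for a small group of participants.
repeats = [2, 5, 8, 12, 20]
summary = describe(repeats)
print(summary["obs"], summary["mean"], summary["min"], summary["max"])
```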
5.2 Experiment Results Initial Results In the laboratory setting of the experiment, 44 participants provided 44 answers for a music sample with a duration of 32 seconds. In general, the quality of results was satisfactory (x near to 0. ). More detailed descriptive statistics are presented in the following table (see Table 17, Figure 1).
Variable   Obs   Mean   Std. Dev.   Min    Max
Results    44    0.69   0.16        0.26   0.93
Table 17 Descriptive statistics of the provided answers. The minimum percent of lyrics correctly identified by the participants is 26% while the maximum is near to 93% on total of 44 observations.
Figure 1 Distribution of the success rate in the laboratory experiment
In addition, we examined possible correlations between the observed variables. The first results showed that the quality of results in the laboratory setting is positively correlated with the students’ demographic and educational characteristics. For example, the correlation between the students’ results and gender (male, female) was marginally statistically significant at the 5% level of significance (r= 0.095, p= .045). Moreover, the correlation between the students’ results and their qualifications in English (on the scale: No degree, Competency in English, Advanced competency in English, Proficiency in English) provides strong evidence against the null hypothesis, and the two variables are positively correlated (r=0.033, p