TaskRec: A Task Recommendation Framework in Crowdsourcing Systems

Man-Ching Yuen · Irwin King · Kwong-Sak Leung
Abstract Crowdsourcing has evolved into a distributed problem-solving and business production model in recent years. In the crowdsourcing paradigm, tasks are distributed to networked people to complete, so that a company's production cost can be greatly reduced. In crowdsourcing systems, task recommendation can help workers find suitable tasks faster and help requesters receive good-quality output sooner. However, the previously proposed classification-based task recommendation approach, the only one in the literature, does not consider the dynamic scenarios of new workers and new tasks in a crowdsourcing system. In this paper, we propose a Task Recommendation (TaskRec) framework based on a unified probabilistic matrix factorization, aiming to recommend tasks to workers in dynamic scenarios. Unlike traditional recommendation systems, workers do not provide ratings on tasks in crowdsourcing systems, so we infer ratings from their interaction behaviors. This conversion enables task recommendation in crowdsourcing systems. Complexity analysis shows that our framework is efficient and scalable to large datasets. Finally, we conduct experiments on real-world datasets for performance evaluation. Experimental results show that TaskRec outperforms the state-of-the-art approach.

Keywords Crowdsourcing · Task recommendation · Matrix factorization · Probabilistic matrix factorization

M.-C. Yuen
Department of Computer Science and Engineering
The Chinese University of Hong Kong, Shatin, Hong Kong, China
E-mail: [email protected]

I. King
E-mail: [email protected]

K.-S. Leung
E-mail: [email protected]
1 Introduction

Crowdsourcing is the outsourcing of a task to a large group of networked people in the form of an open call to reduce production cost. In recent years, crowdsourcing systems have attracted much attention [1, 2]. Some popular examples of crowdsourcing systems are Amazon Mechanical Turk (MTurk, https://www.mturk.com), CrowdFlower (http://crowdflower.com) and Samasource (http://samasource.org). One problem for workers is that it is difficult to find appropriate tasks to perform, since there are simply too many tasks available. For example, in February 2011, the number of available HITs (Human Intelligence Tasks) for qualified workers on MTurk was about 80,000 on average per day. In a crowdsourcing system, a worker has to select a task from tens of thousands of tasks in order to earn the tiny associated reward of only a few cents. It is therefore important to investigate how to support workers in selecting tasks on crowdsourcing platforms easily and effectively. The available worker history makes it possible to mine workers' preferences on tasks and to provide suitable recommendations. Task recommendation can help workers find their right tasks faster as well as help requesters receive good-quality output sooner.

There are two issues in task recommendation. First, the lack of explicit ratings. Although recommendation systems [3-5] are often used to suggest relevant items (news, books, movies, etc.) to particular users on the Web, task recommendation is much more difficult than product recommendation: unlike in product recommendation, workers do not give ratings to tasks to indicate the extent of their favor for each task. Second, the cold-start problem, which is "the recommendation problem on the items that no one in the community has yet rated" [6]. In task recommendation, all tasks to be recommended are new tasks and the task selection periods are usually short, which makes the cold-start problem especially difficult to solve.

To overcome the weaknesses mentioned above, this paper proposes a Task Recommendation (TaskRec) framework for crowdsourcing systems. Our contributions are: (1) To obtain ratings on tasks, we propose a way to infer worker ratings from their interaction behaviors. (2) We propose a way to recommend tasks by performing factor analysis based on probabilistic matrix factorization, through which the worker latent feature space and the task latent feature space are learned. Our approach can solve the cold-start problem, which cannot be solved by the previously proposed classification-based task recommendation approach [7]. (3) Experimental results on real-world datasets show that our framework outperforms the state-of-the-art collaborative filtering approach, probabilistic matrix factorization (PMF). (4) Complexity analysis shows that our approach is scalable to very large datasets.

The rest of this paper is organized as follows. Section 2 presents the related work. Section 3 presents our proposed TaskRec framework for task recommendation in crowdsourcing systems.
Section 4 describes our experiments. Section 5 addresses possible challenges and concludes with future directions.
2 Related Work

2.1 Crowdsourcing Systems

Crowdsourcing is the outsourcing of a task to a large group of networked people in the form of an open call to reduce production cost. A crowdsourcing process involves operations of both requesters and workers: a requester submits a task request; a worker selects and completes a task; and the requester pays the worker only for the successful completion of the task. Task recommendation in crowdsourcing is important for the following reasons:

– Motivating workers of diverse backgrounds to work on crowdsourcing tasks in the long run. Currently, on crowdsourcing sites, most workers provide only moderate contributions [8], and there is a significant population of young and well-educated Indian workers [9]. Crowdsourcing sites can attract more workers to contribute their efforts in the long run if a worker can find a suitable task easily.
– Improving the quality of work. Workers perform better if they are familiar with the tasks. Chilton et al. showed that workers only browse the first few pages of crowdsourcing sites when searching for tasks [10]. The task list on the Amazon MTurk site is usually displayed over hundreds of pages, and a worker selects a task from the list of available tasks sorted by a specified task feature such as creation date or reward amount. When the tasks posted on the first few pages are not suitable for a worker, the worker might choose a task that he is not familiar with and try to complete it to earn the reward; otherwise, the worker does not select any task. Working on an unfamiliar task can decrease the quality of work.
2.2 Recommendation Systems

Broadly speaking, recommendation systems are based on either a content filtering approach or a collaborative filtering approach. The content filtering approach creates a profile for each user or product (for example, a movie) to characterize its nature. The profiles of users and products allow programs to associate users with matching products. The advantage is that it can handle the system's new products and users; however, the profile information might not be available or easy to collect. The collaborative filtering approach, on the other hand, relies only on past user behavior. It analyzes relationships between users and interdependencies among products to identify new user-item associations.
It is generally more accurate than the content filtering approach; however, it cannot handle the system's new products and users [11], which is the cold-start problem. To address the cold-start problem, latent factor models are an alternative approach that approximates the ratings by characterizing both users and items with a number of factors inferred from the rating patterns. "Some of the most successful realizations of latent factor models are based on matrix factorization." [11] Matrix factorization has many applications [12, 13]. Although matrix factorization can solve the cold-start problem, it is not scalable. The probabilistic matrix factorization (PMF) model [14] scales linearly with the number of observations and performs very well on large, sparse, and imbalanced datasets. Recently, several probabilistic matrix factorization methods [3] have been proposed for collaborative filtering in recommendation systems. These methods focus on using low-rank approximations to model the user-item rating matrix for making further predictions. The premise behind a low-dimensional factor model is that only a small number of factors influence preferences, and that a user's preference vector is determined by how each factor applies to that user. The above approaches are used for user recommendation in social tagging systems. However, task recommendation is much more difficult than product recommendation, since workers do not give ratings to tasks to indicate the extent of their favor for each task.

2.3 Our Motivation

Our motivation is the observation of the increasing difficulty for workers to find their preferred tasks [10, 15, 16]. Zhang et al. also described the need for routing tasks to appropriate individuals to solve complex problems [17]. Based on these observations, Ambati et al. [7] proposed a classification-based task recommendation approach that recommends tasks to users based on implicit modeling of skills and interests. However, the classification-based approach cannot solve the cold-start problem. Our proposed TaskRec framework recommends tasks to users by employing matrix factorization based on both worker performance history and worker task searching history.

3 TaskRec Framework

Our framework consists of three parts. First, we connect workers' task preferring information with workers' category preferring information through the shared worker latent feature space. Second, we connect workers' task preferring information with tasks' category grouping information through the shared task latent feature space. Third, we connect workers' category preferring information with tasks' category grouping information through the shared category latent feature space. The graphical model of the TaskRec framework is shown in Fig. 1.
Fig. 1 Graphical Model for TaskRec
Using a worker-task preferring matrix, we can measure the extent to which a worker prefers to work on a task and provides output that is accepted by requesters. Unlike traditional recommendation systems, workers do not give ratings to tasks to indicate the extent of their favor for each task. To obtain ratings on tasks, we transform workers' behaviors into values as follows:
Worker Behavior                                                      Value
Worker's work is accepted by the requester.                  −→      5
Worker's work is rejected by the requester.                  −→      4
Worker completes a task and submits the work.                −→      3
Worker selects a task to work on but does not complete it.   −→      2
Worker browses the detailed information of a task.           −→      1
Worker does not browse the detailed information of a task.   −→      0
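For concreteness, the following minimal Python sketch shows how such a transformation might be implemented; the behavior labels and helper name are our own illustration, not part of any crowdsourcing API:

```python
# Hypothetical helper mapping logged worker behaviors to inferred ratings,
# following the value transformation above. The behavior labels are
# illustrative; a real system would derive them from its event logs.
BEHAVIOR_RATING = {
    "work_accepted": 5,   # work done is accepted by the requester
    "work_rejected": 4,   # work done is rejected by the requester
    "work_submitted": 3,  # task completed and work submitted
    "task_selected": 2,   # task selected but not completed
    "task_browsed": 1,    # detailed task information browsed
    "no_interaction": 0,  # task detail page never browsed
}

def infer_rating(behaviors):
    """Return the rating of the strongest behavior a worker showed on a task."""
    return max(BEHAVIOR_RATING[b] for b in behaviors)
```

Taking the maximum over observed behaviors reflects that the behaviors form an ordered scale: acceptance subsumes submission, which subsumes selection and browsing.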
In some cases, the ratings obtained from this value transformation may reflect workers' task preferences inaccurately. For example, a worker's work may be accepted even though he does not like the task very much. The problem we study in this paper is how to effectively predict the missing values in the worker-task preferring matrix so as to provide related tasks to workers or to recommend workers for suitable tasks. We define the problem of task recommendation in crowdsourcing systems as follows:
Table 1 Basic Notations Throughout This Paper

Notation                        Description
WS = {w_i}_{i=1}^m              WS is the set of workers, w_i is the i-th worker, m is the total number of workers
VS = {v_j}_{j=1}^n              VS is the set of tasks, v_j is the j-th task, n is the total number of tasks
CS = {c_k}_{k=1}^o              CS is the set of task categories, c_k is the k-th task category, o is the total number of task categories
l ∈ R                           l is the number of dimensions of the latent feature space
W ∈ R^{l×m}                     W is the worker latent feature matrix
V ∈ R^{l×n}                     V is the task latent feature matrix
C ∈ R^{l×o}                     C is the task category latent feature matrix
R = {r_ij}, R ∈ R^{m×n}         R is the worker-task preferring matrix, r_ij is the extent of the favor of task v_j for worker w_i
U = {u_ik}, U ∈ R^{m×o}         U is the worker-category preferring matrix, u_ik is the extent of worker w_i's preference for task category c_k
D = {d_jk}, D ∈ R^{n×o}         D is the task-category grouping matrix, d_jk indicates whether task v_j belongs to task category c_k
N(x|µ, σ²)                      Probability density function of the Gaussian distribution with mean µ and variance σ²
Definition 1. Task recommendation problem: Given a worker w_i, a set of tasks VS = {v_j}_{j=1}^n and a set of ratings R = {r_ij} between worker w_i and task v_j, rank the ratings in R and select the top few tasks in VS to recommend to worker w_i. To facilitate our discussions, Table 1 defines basic terms and notations used throughout this paper.
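To make Definition 1 operational, the following minimal sketch (our own illustration; names are not from the paper) ranks candidate tasks by predicted rating, assuming the latent matrices W and V of Section 3 have already been learned:

```python
import numpy as np

def recommend_tasks(W_i, V, candidate_ids, top_k=10):
    """Sketch of Definition 1: rank candidate tasks for worker w_i by the
    predicted rating g(W_i^T V_j) (derived in Section 3.1) and return the
    top-k task indices."""
    scores = 1.0 / (1.0 + np.exp(-(W_i @ V[:, candidate_ids])))  # logistic g
    order = np.argsort(-scores)              # sort by descending preference
    return [candidate_ids[t] for t in order[:top_k]]
```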
3.1 Worker-Task Preferring Matrix Factorization

We have m workers and n tasks. The worker-task preferring matrix is denoted as R; the element r_ij of R is the extent of the favor of task v_j for worker w_i, with values in the range [0, 1]. Without loss of generality, we first map the ratings inferred from worker behavior (1, ..., 5) to the interval [0, 1] using the function f(x) = x/5. Hence, we are given a partially observed worker-task preferring matrix R with m workers and n tasks. To learn the workers' preferences on the tasks, we employ matrix factorization, more specifically Probabilistic Matrix Factorization (PMF) [18], to recover the worker-task preferring matrix. Given the partially observed matrix R, we aim to decompose R into two l-dimensional low-rank feature matrices W and V, where W ∈ R^{l×m} is the latent feature matrix for workers with column vectors W_i, and V ∈ R^{l×n} is the latent feature matrix for tasks with column vectors V_j. To learn the matrices, a Gaussian distribution is assumed on the residual of the observed ratings, as in [18]; it is defined in Eq. (1):
p(R \mid W, V, \sigma_R^2) = \prod_{i=1}^{m} \prod_{j=1}^{n} \big[ \mathcal{N}(r_{ij} \mid g(W_i^T V_j), \sigma_R^2) \big]^{I_{ij}^R},    (1)
where N(x|µ, σ²) is the probability density function of the Gaussian distribution with mean µ and variance σ², and I_ij^R is the indicator function that is equal to 1 if the entry r_ij is observed and 0 otherwise. The Gaussian distribution model can make predictions outside the range of valid values; the logistic function g(x) = 1/(1 + exp(−x)) makes it possible to bound the range of W_i^T V_j within [0, 1]. Similar to [19], to avoid overfitting, zero-mean spherical Gaussian priors are also placed on the worker and task feature matrices, as defined in Eq. (2):
p(W \mid \sigma_W^2) = \prod_{i=1}^{m} \mathcal{N}(W_i \mid 0, \sigma_W^2 I), \qquad p(V \mid \sigma_V^2) = \prod_{j=1}^{n} \mathcal{N}(V_j \mid 0, \sigma_V^2 I).    (2)
Hence, through Bayesian inference, the posterior distribution of W and V based only on the observed ratings is derived in Eq. (3):

p(W, V \mid R, \sigma_R^2, \sigma_W^2, \sigma_V^2) \propto p(R \mid W, V, \sigma_R^2)\, p(W \mid \sigma_W^2)\, p(V \mid \sigma_V^2)
= \prod_{i=1}^{m} \prod_{j=1}^{n} \big[ \mathcal{N}(r_{ij} \mid g(W_i^T V_j), \sigma_R^2) \big]^{I_{ij}^R} \times \prod_{i=1}^{m} \mathcal{N}(W_i \mid 0, \sigma_W^2 I) \times \prod_{j=1}^{n} \mathcal{N}(V_j \mid 0, \sigma_V^2 I).    (3)
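As a concrete reading of Eq. (1), the following numpy sketch (ours, under an assumed np.nan convention for unobserved entries) evaluates the rating normalization f(x) = x/5 and the observed-entry log-likelihood up to additive constants:

```python
import numpy as np

def normalize_ratings(R_raw):
    """Map inferred behavior ratings in {1, ..., 5} to [0, 1] via f(x) = x/5.
    Unobserved worker-task pairs are kept as np.nan (an assumed convention)."""
    return R_raw / 5.0

def log_likelihood(R, W, V, sigma_R=1.0):
    """Observed-entry Gaussian log-likelihood of Eq. (1), up to constants."""
    observed = ~np.isnan(R)                    # indicator I_ij^R
    pred = 1.0 / (1.0 + np.exp(-(W.T @ V)))    # g(W_i^T V_j) for all pairs
    resid = np.where(observed, np.nan_to_num(R) - pred, 0.0)
    return (-np.sum(resid ** 2) / (2 * sigma_R ** 2)
            - observed.sum() * np.log(sigma_R))
```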
3.2 Worker-Category Preferring Matrix Factorization

We have m workers and o task categories. The worker-category preferring matrix is denoted as U, where the element u_ik of U represents the extent of worker w_i's preference for task category c_k. Workers' performance histories indicate workers' preferences for task categories, so u_ik can be interpreted either as whether worker w_i has completed an accepted task of category c_k (a binary representation), or as how strong worker w_i's preference is for task category c_k (a real-valued representation). We represent u_ik as shown in Eq. (4):

u_{ik} = g(f(w_i, c_k)),    (4)

where g(·) is the logistic function, and f(w_i, c_k) is the number of times worker w_i has completed a task of category c_k that was accepted. The idea of worker-category preferring matrix factorization is to derive two low-rank l-dimensional matrices W and C, where W ∈ R^{l×m} and C ∈ R^{l×o} are the latent feature matrices for workers and task categories, respectively.
The column vectors W_i and C_k represent the l-dimensional worker-specific and category-specific latent feature vectors of worker w_i and category c_k, respectively. We can define the conditional distribution over the observed worker-category preferring matrix in Eq. (5):
p(U \mid W, C, \sigma_U^2) = \prod_{i=1}^{m} \prod_{k=1}^{o} \big[ \mathcal{N}(u_{ik} \mid g(W_i^T C_k), \sigma_U^2) \big]^{I_{ik}^U},    (5)
where N(x|µ, σ²) is the probability density function of the Gaussian distribution with mean µ and variance σ², and I_ik^U is the indicator function that is equal to 1 if worker w_i has at least one completed task of category c_k accepted and 0 otherwise. To avoid overfitting, zero-mean spherical Gaussian priors are placed on the worker and category latent feature matrices, as defined in Eq. (6):
p(W \mid \sigma_W^2) = \prod_{i=1}^{m} \mathcal{N}(W_i \mid 0, \sigma_W^2 I), \qquad p(C \mid \sigma_C^2) = \prod_{k=1}^{o} \mathcal{N}(C_k \mid 0, \sigma_C^2 I).    (6)
Hence, through Bayesian inference, the posterior distribution of W and C based only on the observed ratings is derived in Eq. (7):

p(W, C \mid U, \sigma_C^2, \sigma_W^2, \sigma_U^2) \propto p(U \mid W, C, \sigma_U^2)\, p(W \mid \sigma_W^2)\, p(C \mid \sigma_C^2)
= \prod_{i=1}^{m} \prod_{k=1}^{o} \big[ \mathcal{N}(u_{ik} \mid g(W_i^T C_k), \sigma_U^2) \big]^{I_{ik}^U} \times \prod_{i=1}^{m} \mathcal{N}(W_i \mid 0, \sigma_W^2 I) \times \prod_{k=1}^{o} \mathcal{N}(C_k \mid 0, \sigma_C^2 I).    (7)
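For illustration, one way U might be materialized from a worker performance log, following Eq. (4) and the indicator definition above (the function name and input layout are our assumptions):

```python
import numpy as np

def build_worker_category_matrix(accept_counts):
    """Build U per Eq. (4): u_ik = g(f(w_i, c_k)), where accept_counts[i, k]
    counts worker i's accepted completions in category k (an m x o array).
    Categories with no accepted work are treated as unobserved (I_ik^U = 0)."""
    U = 1.0 / (1.0 + np.exp(-accept_counts.astype(float)))  # logistic g
    U[accept_counts == 0] = np.nan                          # unobserved entries
    return U
```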
3.3 Task-Category Grouping Matrix Factorization

We have n tasks and o task categories. The task-category grouping matrix is denoted as D, where the element d_jk of D indicates the category c_k that task v_j belongs to; d_jk can be interpreted as whether task v_j belongs to category c_k (a binary representation). We represent d_jk as shown in Eq. (8):

d_{jk} = g(f(v_j, c_k)),    (8)
where g(·) is the logistic function, and f(v_j, c_k) is an indicator variable with the value 1 if task v_j belongs to category c_k and 0 otherwise. The idea of task-category grouping matrix factorization is to derive two low-rank l-dimensional matrices V and C, where V ∈ R^{l×n} and C ∈ R^{l×o} are the latent feature matrices for tasks and task categories, respectively. The column vectors V_j and C_k represent the l-dimensional task-specific and
category-specific latent feature vectors of task v_j and category c_k, respectively. We can define the conditional distribution over the observed task-category grouping matrix in Eq. (9):

p(D \mid V, C, \sigma_D^2) = \prod_{j=1}^{n} \prod_{k=1}^{o} \big[ \mathcal{N}(d_{jk} \mid g(V_j^T C_k), \sigma_D^2) \big]^{I_{jk}^D},    (9)
where N(x|µ, σ²) is the probability density function of the Gaussian distribution with mean µ and variance σ², and I_jk^D is the indicator function that is equal to 1 if the entry d_jk is observed and 0 otherwise. To avoid overfitting, zero-mean spherical Gaussian priors are placed on the task and category latent feature matrices, as defined in Eq. (10):

p(V \mid \sigma_V^2) = \prod_{j=1}^{n} \mathcal{N}(V_j \mid 0, \sigma_V^2 I), \qquad p(C \mid \sigma_C^2) = \prod_{k=1}^{o} \mathcal{N}(C_k \mid 0, \sigma_C^2 I).    (10)
Hence, through Bayesian inference, the posterior distribution of V and C based only on the observed ratings is derived in Eq. (11):

p(V, C \mid D, \sigma_C^2, \sigma_V^2, \sigma_D^2) \propto p(D \mid V, C, \sigma_D^2)\, p(V \mid \sigma_V^2)\, p(C \mid \sigma_C^2)
= \prod_{j=1}^{n} \prod_{k=1}^{o} \big[ \mathcal{N}(d_{jk} \mid g(V_j^T C_k), \sigma_D^2) \big]^{I_{jk}^D} \times \prod_{j=1}^{n} \mathcal{N}(V_j \mid 0, \sigma_V^2 I) \times \prod_{k=1}^{o} \mathcal{N}(C_k \mid 0, \sigma_C^2 I).    (11)
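Similarly, a sketch of how D could be built from task-category memberships per Eq. (8), assuming each task belongs to a single category as in our dataset (Table 3); the function name and input format are hypothetical:

```python
import numpy as np

def build_task_category_matrix(task_category, n_tasks, n_categories):
    """Build D per Eq. (8): d_jk = g(1) when task j belongs to category k.
    task_category maps task index -> category index (one category per task,
    an assumption matching Table 3); other entries stay unobserved."""
    D = np.full((n_tasks, n_categories), np.nan)
    g_one = 1.0 / (1.0 + np.exp(-1.0))       # g(1), roughly 0.731
    for j, k in task_category.items():
        D[j, k] = g_one
    return D
```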
3.4 A Unified Matrix Factorization for TaskRec

According to the graphical model of the TaskRec framework described in Fig. 1, we derive the log of the posterior distribution of TaskRec in Eq. (12):

\ln p(W, V, C \mid R, U, D, \sigma_W^2, \sigma_V^2, \sigma_C^2, \sigma_R^2, \sigma_U^2, \sigma_D^2)
= -\frac{1}{2\sigma_R^2} \sum_{i=1}^{m} \sum_{j=1}^{n} I_{ij}^R \big( r_{ij} - g(W_i^T V_j) \big)^2 - \frac{1}{2\sigma_U^2} \sum_{i=1}^{m} \sum_{k=1}^{o} I_{ik}^U \big( u_{ik} - g(W_i^T C_k) \big)^2
- \frac{1}{2\sigma_D^2} \sum_{j=1}^{n} \sum_{k=1}^{o} I_{jk}^D \big( d_{jk} - g(V_j^T C_k) \big)^2 - \frac{1}{2\sigma_W^2} \sum_{i=1}^{m} W_i^T W_i - \frac{1}{2\sigma_V^2} \sum_{j=1}^{n} V_j^T V_j - \frac{1}{2\sigma_C^2} \sum_{k=1}^{o} C_k^T C_k
- \Big( \sum_{i=1}^{m} \sum_{j=1}^{n} I_{ij}^R \Big) \ln \sigma_R - \Big( \sum_{i=1}^{m} \sum_{k=1}^{o} I_{ik}^U \Big) \ln \sigma_U - \Big( \sum_{j=1}^{n} \sum_{k=1}^{o} I_{jk}^D \Big) \ln \sigma_D
- l\,m \ln \sigma_W - l\,n \ln \sigma_V - l\,o \ln \sigma_C + \mathcal{C},    (12)
where C is a constant independent of the parameters. Eq. (12) is an unconstrained optimization problem, and maximizing the log-posterior with fixed hyperparameters is equivalent to minimizing the sum-of-squared-errors objective function with quadratic regularization terms in Eq. (13):
E(W, V, C, R, U, D) = \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{n} I_{ij}^R \big( r_{ij} - g(W_i^T V_j) \big)^2 + \frac{\theta_U}{2} \sum_{i=1}^{m} \sum_{k=1}^{o} I_{ik}^U \big( u_{ik} - g(W_i^T C_k) \big)^2
+ \frac{\theta_D}{2} \sum_{j=1}^{n} \sum_{k=1}^{o} I_{jk}^D \big( d_{jk} - g(V_j^T C_k) \big)^2 + \frac{\theta_W}{2} \sum_{i=1}^{m} W_i^T W_i + \frac{\theta_V}{2} \sum_{j=1}^{n} V_j^T V_j + \frac{\theta_C}{2} \sum_{k=1}^{o} C_k^T C_k,    (13)
where \theta_U = \sigma_R^2 / \sigma_U^2, \theta_D = \sigma_R^2 / \sigma_D^2, \theta_W = \sigma_R^2 / \sigma_W^2, \theta_V = \sigma_R^2 / \sigma_V^2, and \theta_C = \sigma_R^2 / \sigma_C^2. A local minimum can be found by performing gradient descent on W_i, V_j and C_k; the derived gradients are given in Eq. (14), Eq. (15) and Eq. (16), respectively:
\frac{\partial E}{\partial W_i} = \sum_{j=1}^{n} I_{ij}^R\, \big( g(W_i^T V_j) - r_{ij} \big)\, g'(W_i^T V_j)\, V_j + \theta_W W_i + \theta_U \sum_{k=1}^{o} I_{ik}^U\, \big( g(W_i^T C_k) - u_{ik} \big)\, g'(W_i^T C_k)\, C_k,    (14)
\frac{\partial E}{\partial V_j} = \sum_{i=1}^{m} I_{ij}^R\, \big( g(W_i^T V_j) - r_{ij} \big)\, g'(W_i^T V_j)\, W_i + \theta_V V_j + \theta_D \sum_{k=1}^{o} I_{jk}^D\, \big( g(V_j^T C_k) - d_{jk} \big)\, g'(V_j^T C_k)\, C_k,    (15)
\frac{\partial E}{\partial C_k} = \theta_U \sum_{i=1}^{m} I_{ik}^U\, \big( g(W_i^T C_k) - u_{ik} \big)\, g'(W_i^T C_k)\, W_i + \theta_C C_k + \theta_D \sum_{j=1}^{n} I_{jk}^D\, \big( g(V_j^T C_k) - d_{jk} \big)\, g'(V_j^T C_k)\, V_j,    (16)
where g'(·) is the first-order derivative of the logistic function. To reduce the model complexity, we set θ_W = θ_V = θ_C in our experiments. The training time of our model scales linearly with the number of observations.
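To tie the update rules together, here is a dense numpy sketch of the training loop (our own illustration, not the authors' implementation). It evaluates the gradients of Eq. (14)-(16) in matrix form and takes fixed-step gradient descent; unobserved entries are marked with np.nan, and a production implementation would iterate only over the non-zero entries of R, U and D, as the complexity analysis in Section 3.5 assumes:

```python
import numpy as np

def g(x):
    """Logistic function g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def train_taskrec(R, U, D, l=20, theta_U=1e-4, theta_D=1e-2,
                  theta_reg=4e-5, lr=0.01, n_iters=200, seed=0):
    """Gradient descent on the objective of Eq. (13).

    R: m x n worker-task matrix in [0, 1], np.nan for unobserved entries.
    U: m x o worker-category matrix, np.nan for unobserved entries.
    D: n x o task-category matrix, np.nan for unobserved entries.
    theta_reg plays the role of theta_W = theta_V = theta_C.
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    o = U.shape[1]
    W = 0.1 * rng.standard_normal((l, m))   # worker latent features
    V = 0.1 * rng.standard_normal((l, n))   # task latent features
    C = 0.1 * rng.standard_normal((l, o))   # category latent features
    IR, IU, ID = ~np.isnan(R), ~np.isnan(U), ~np.isnan(D)
    R0, U0, D0 = np.nan_to_num(R), np.nan_to_num(U), np.nan_to_num(D)

    for _ in range(n_iters):
        GR = g(W.T @ V)                     # m x n predicted ratings
        GU = g(W.T @ C)                     # m x o
        GD = g(V.T @ C)                     # n x o
        # residuals weighted by g'(x) = g(x) (1 - g(x)), masked to observations
        ER = IR * (GR - R0) * GR * (1 - GR)
        EU = IU * (GU - U0) * GU * (1 - GU)
        ED = ID * (GD - D0) * GD * (1 - GD)
        # gradients of Eq. (14)-(16) in matrix form
        dW = V @ ER.T + theta_U * (C @ EU.T) + theta_reg * W
        dV = W @ ER + theta_D * (C @ ED.T) + theta_reg * V
        dC = theta_U * (W @ EU) + theta_D * (V @ ED) + theta_reg * C
        W -= lr * dW
        V -= lr * dV
        C -= lr * dC
    return W, V, C
```

The masks I^R, I^U and I^D zero out residuals on unobserved entries, so each update uses only the observed data, matching the indicator functions in the objective.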
3.5 Complexity Analysis

The main computation of the gradient descent method is evaluating the objective function E and the corresponding gradients of the variables. Because of the sparsity of matrices R, U, and D, the complexity of evaluating the objective function in Eq. (13) is O(n_R l + n_U l + n_D l), where n_R, n_U and n_D are the numbers of non-zero entries in matrices R, U, and D respectively, and l is the number of dimensions of the latent feature space. Using a similar approach, we can derive the complexities of Eq. (14), Eq. (15) and Eq. (16). Therefore, the total complexity of one iteration is O(n_R l + n_U l + n_D l), which is linear in the number of observations in the three sparse matrices. This complexity analysis shows that TaskRec can scale to very large datasets.
4 Experimental Analysis

In this section, our experiments are intended to address the following two research questions:

1. How does our approach compare with existing state-of-the-art approaches?
2. How do the model parameters affect the prediction accuracy of our approach?
4.1 Description of Dataset The data required for evaluating our framework require both worker performance history and worker task searching history. However, this kind of dataset is difficult to obtain because these data is only presented to the administrators of crowdsourcing systems and is not available for public. Our dataset is retrieved from the recent NAACL 2010 workshop on crowdsourcing, which has made publicly available all the data collected as part of the workshop4 . The data was collected within a month from multiple requesters seeking data for a diverse variety of tasks on MTurk. The data does not include information about all the tasks that the user may have completed on MTurk or the other tasks that were available on MTurk during the data collection period. Table 2 provides some statistics about our dataset. We use the same dataset as shown in [20] which also conducts experiments for TaskRec. Our dataset is mainly related to tasks for creating speech and language data, and thus this dataset should be categorized into one group in MTurk. To demonstrate our framework, we categorize the dataset by keywords of tasks given by MTurk [20], but we categorize the dataset by both language and keywords given by MTurk in this paper. The number of categories greatly reduces and it improves the performance of TaskRec. Our task categorization is shown in Table 3. 4
NAACL 2010 workshop: http://sites.google.com/site/amtworkshop2010/data-1
Table 2 Statistics of our dataset

Number of workers                                               1,592
Number of different tasks                                       6,639
Number of categories                                            43
Total HITs from all tasks                                       19,815
Number of ratings                                               19,815
Max number of HITs of a worker                                  2,691
Min number of HITs of a worker                                  1
Average number of HITs of a worker                              12.4
1st quartile (25th percentile) of number of HITs of a worker    1
2nd quartile (50th percentile) of number of HITs of a worker    2
3rd quartile (75th percentile) of number of HITs of a worker    5
Table 3 Task categorization by both language and keywords given by MTurk in our dataset

1  English-Afrikaans translations      23 English-Romanian translations
2  English-Azeri translations          24 English-Russian translations
3  English-Bulgarian translations      25 English-Slovak translations
4  English-Bangla translations         26 English-Somali translations
5  English-Bosnian translations        27 English-Albanian translations
6  English-Welsh translations          28 English-Serbian translations
7  English-Spanish translations        29 English-Tamil translations
8  English-Basque translations         30 English-Thai translations
9  English-Farsi translations          31 English-Turkmen translations
10 English-Irish translations          32 English-Tagalog translations
11 English-Hindi translations          33 English-Turkish translations
12 English-Indonesian translations     34 English-Tatar translations
13 English-Korean translations         35 English-Ukrainian translations
14 English-Kurdish translations        36 English-Urdu translations
15 English-Latin translations          37 English-Uzbek translations
16 English-Latvian translations        38 English annotations
17 English-Mongolian translations      39 Spanish annotations
18 English-Maltese translations        40 Arabic annotations
19 English-Nepali translations         41 English relevance judgment
20 English-Punjabi translations        42 English creative writing
21 English-Kapampangan translations    43 English transcription
22 English-Polish translations
4.2 Evaluation Metrics

To compare the prediction quality of our method with PMF, we use the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE) as the comparison metrics. MAE is defined in Eq. (17), and RMSE is defined in Eq. (18):

MAE = \frac{\sum_{i,j} |r_{i,j} - \hat{r}_{i,j}|}{N},    (17)

RMSE = \sqrt{\frac{\sum_{i,j} (r_{i,j} - \hat{r}_{i,j})^2}{N}},    (18)
where r_{i,j} denotes the rating that indicates the extent of the favor of task j for worker i, \hat{r}_{i,j} denotes the predicted rating, and N is the total number of testing ratings.
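For reference, a small numpy helper (our own sketch, written to match Eq. (17) and Eq. (18)) that computes both metrics over the test ratings:

```python
import numpy as np

def mae_rmse(r_true, r_pred):
    """Compute MAE (Eq. 17) and RMSE (Eq. 18) over the N testing ratings."""
    err = np.asarray(r_true) - np.asarray(r_pred)
    return np.mean(np.abs(err)), np.sqrt(np.mean(err ** 2))
```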
4.3 Performance Comparison

To show the prediction performance improvements of TaskRec, we compare TaskRec with Probabilistic Matrix Factorization (PMF) [18], the state-of-the-art approach for recommendation systems. In the comparison, we employ different amounts of training data: 80%, 60%, 40% and 20%. For example, 80% training data means we randomly select 80% of the ratings from the dataset as training data and leave the remaining 20% for testing the prediction performance. The procedure is carried out 10 times independently, and we report the average values in this paper. In the comparison, we set θ_W = θ_V = θ_C = 0.00004, θ_U = 0.0001 and θ_D = 0.01. The MAE and RMSE results are reported in Table 4 and Table 5, respectively. From the results, we can see that our TaskRec approach outperforms PMF when the dimension is high and the amount of training data is not too low (80%, 60% and 40%). The biased value distribution in our dataset leads to high MAE results for both PMF and TaskRec: most rejected tasks had already been removed from our dataset, so the value transformation yields 10,411 approved tasks (value 5), 9,399 submitted tasks (value 3) and only 5 rejected tasks (value 4).
Table 4 MAE comparison with PMF (a smaller MAE value means a better performance)

                 Dimension = 5        Dimension = 15       Dimension = 20
Training data    PMF      TaskRec     PMF      TaskRec     PMF      TaskRec
80%              1.2391   1.3966      1.2415   1.3764      1.2357   0.5106
60%              1.2406   1.5087      1.2442   1.5345      1.2493   0.6470
40%              1.2366   1.4388      1.2511   1.4272      1.2570   1.0235
20%              1.2410   1.5022      1.2520   1.4865      1.2540   1.2685
Table 5 RMSE comparison with PMF (a smaller RMSE value means a better performance)

                 Dimension = 5        Dimension = 15       Dimension = 20
Training data    PMF      TaskRec     PMF      TaskRec     PMF      TaskRec
80%              1.4835   1.9192      1.4956   1.8316      1.4891   0.8159
60%              1.4869   2.0336      1.4971   1.9777      1.5005   0.9992
40%              1.4820   1.8237      1.5042   1.7802      1.5109   1.3789
20%              1.4869   1.8258      1.5069   1.7981      1.5100   1.5923
4.4 Impact of Parameters θ_U and θ_D

TaskRec utilizes both workers' task preferring information and tasks' category grouping information to perform the prediction. We incorporate the worker-task preferring matrix, the worker-category preferring matrix and the task-category grouping matrix together based on a unified probabilistic matrix factorization. The parameter θ_U controls the impact of the worker-category preferring matrix, and the parameter θ_D controls the impact of the task-category grouping matrix. If we set both θ_U and θ_D to 0, we only consider workers' task preferring information; if we set both θ_U and θ_D to +∞, we only utilize workers' category preferring information. We test the impact of the two parameters independently. To reduce the complexity, we assume that θ_W, θ_V and θ_C are equal. When we test the impact of parameter θ_U, we set θ_W = θ_V = θ_C = 0.00004 and θ_D = 0.4; the results are shown in Fig. 2. When we test the impact of parameter θ_D, we set θ_W = θ_V = θ_C = 0.00004 and θ_U = 0.0001; the results are shown in Fig. 3. We found that the impacts of the parameters are similar for different values of θ_W, θ_V and θ_C, and also for different dimensionalities. The experimental results show that both θ_U and θ_D impact the prediction accuracy significantly, indicating that both workers' task preferring information and workers' category preferring information matter for prediction. In Fig. 2(a) and Fig. 2(b), we observe that as the value of θ_U increases beyond a threshold value, both MAE and RMSE increase (performance decreases). In Fig. 3(a) and Fig. 3(b), we observe that as the value of θ_D increases, both MAE and RMSE decrease (performance increases). This observation meets our expectation, because utilizing both workers' task preferring information and workers' category preferring information together performs better than utilizing either one alone. Our TaskRec approach performs best when θ_U ∈ [0.00001, 0.001] and θ_D ∈ [0.005, 0.1].
5 Conclusion and Future Work

In this paper, we have proposed a Task Recommendation (TaskRec) framework for crowdsourcing systems. We have proposed a value transformation method to map workers' behaviors into ratings. Our proposed method can solve the cold-start problem and, as shown in our complexity analysis, can scale to very large datasets. For performance analysis, we have presented experimental results on real-world datasets and found that our TaskRec approach outperforms PMF when the amount of training data is not too low. Currently, we are building up our own crowdsourcing system to collect a large and complete dataset for a detailed performance evaluation.
Fig. 2 Impact of parameter θ_U (dimensionality = 20): (a) MAE, (b) RMSE

Fig. 3 Impact of parameter θ_D (dimensionality = 20): (a) MAE, (b) RMSE
Acknowledgments This work was partially supported by grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (CUHK 413212) and a Direct Grant (CUHK 2050498).
References

1. Man-Ching Yuen, Irwin King, and Kwong-Sak Leung. A survey of crowdsourcing systems. In SocialCom '11: Proceedings of the Third IEEE International Conference on Social Computing, pages 766-773. IEEE Computer Society, 2011.
2. Man-Ching Yuen, Ling-Jyh Chen, and Irwin King. A survey of human computation systems. In CSE '09: Proceedings of the IEEE International Conference on Computational Science and Engineering, pages 723-728. IEEE Computer Society, 2009.
3. Tom Chao Zhou, Hao Ma, Irwin King, and Michael R. Lyu. TagRec: Leveraging tagging wisdom for recommendation. In Proceedings of the 2009 International Conference on Computational Science and Engineering - Volume 04, pages 194-199, Washington, DC, USA, 2009. IEEE Computer Society.
4. Hao Ma, Irwin King, and Michael R. Lyu. Effective missing data prediction for collaborative filtering. In SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 39-46, New York, NY, USA, 2007. ACM.
5. Chen Cheng, Haiqin Yang, Irwin King, and Michael R. Lyu. Fused matrix factorization with geographical and social influence in location-based social networks. In AAAI '12: Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, Toronto, Canada, 2012.
6. Andrew I. Schein, Alexandrin Popescul, Lyle H. Ungar, and David M. Pennock. Methods and metrics for cold-start recommendations. In SIGIR '02: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 253-260, New York, NY, USA, 2002. ACM.
7. Vamsi Ambati, Stephan Vogel, and Jaime Carbonell. Towards task recommendation in micro-task markets. In AAAI '11: Proceedings of the 25th AAAI Workshop on Human Computation. AAAI Publications, 2011.
8. Osamuyimen Stewart, David Lubensky, and Juan M. Huerta. Crowdsourcing participation inequality: a scout model for the enterprise domain. In HCOMP '10: Proceedings of the ACM SIGKDD Workshop on Human Computation, pages 30-33, New York, NY, USA, 2010. ACM.
9. Joel Ross, Lilly Irani, M. Six Silberman, Andrew Zaldivar, and Bill Tomlinson. Who are the crowdworkers?: Shifting demographics in Mechanical Turk. In CHI EA '10: Proceedings of the 28th International Conference on Human Factors in Computing Systems, Extended Abstracts, pages 2863-2872, New York, NY, USA, 2010. ACM.
10. Lydia B. Chilton, John J. Horton, Robert C. Miller, and Shiri Azenkot. Task search in a human computation market. In HCOMP '10: Proceedings of the ACM SIGKDD Workshop on Human Computation, pages 1-9, New York, NY, USA, 2010. ACM.
11. Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30-37, August 2009.
12. Shangming Yang and Mao Ye. Global minima analysis of Lee and Seung's NMF algorithms. Neural Processing Letters, 38(1):29-51, August 2013.
13. Shangming Yang and Zhang Yi. Convergence analysis of non-negative matrix factorization for BSS algorithm. Neural Processing Letters, 31(1):45-64, February 2010.
14. Ruslan Salakhutdinov and Andriy Mnih. Probabilistic matrix factorization. In Advances in Neural Information Processing Systems, volume 20, 2008.
15. Man-Ching Yuen, Irwin King, and Kwong-Sak Leung. Task matching in crowdsourcing. In CPSCom '11: Proceedings of the 4th IEEE International Conference on Cyber, Physical and Social Computing, pages 409-412. IEEE Computer Society, 2011.
16. Man-Ching Yuen, Irwin King, and Kwong-Sak Leung. Task recommendation in crowdsourcing systems. In CrowdKDD '12: Proceedings of the ACM KDD 2012 Workshop on Data Mining and Knowledge Discovery with Crowdsourcing. ACM, 2012.
17. Haoqi Zhang, Eric Horvitz, Rob C. Miller, and David C. Parkes. Crowdsourcing general computation. In Proceedings of the ACM CHI 2011 Workshop on Crowdsourcing and Human Computation. ACM, 2011.
18. Ruslan Salakhutdinov and Andriy Mnih. Probabilistic matrix factorization. In NIPS '07: Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems. Curran Associates, Inc., 2007.
19. Delbert Dueck and Brendan J. Frey. Probabilistic sparse matrix factorization. Technical report, University of Toronto, 2004.
20. Man-Ching Yuen, Irwin King, and Kwong-Sak Leung. TaskRec: Probabilistic matrix factorization in task recommendation in crowdsourcing systems. In ICONIP (2), pages 516-525, 2012.