Nonlinear transformation of tensor factorization for collaborative filtering

The 21st Annual Conference of the Japanese Neural Network Society (December, 2011)

[P3-25]

Shougo Nakamura (PY), Yu Fujimoto, and Kouzou Ohara
Graduate School of Science and Engineering, Aoyama Gakuin University
E-mail: [email protected]

Abstract— In this paper, an extension of tensor factorization based on nonlinear transformation is proposed to formulate collaborative filtering using spatiotemporal information. The nonlinear function we adopt for the transformation has only one tuning parameter, but it gives us greater flexibility to model observed data more precisely. We experimentally investigate the effectiveness of our method using an artificial dataset.

Keywords— Collaborative filtering; nonlinear transformation; tensor factorization; model selection; spatiotemporal information

1  Introduction

In recent years, the rapid growth of the Internet has enabled us to access huge amounts of information. However, we sometimes cannot find the appropriate items (products, services, etc.) that we really need among the many candidates. Recommender systems have been proposed to solve this problem; such systems recommend items that are likely to be our favorites. Collaborative filtering using spatiotemporal information is an extension of the conventional setup of recommender systems; the purpose of this framework is to predict an unobserved rating s_abcd of item b for user a at time c in place d, based on observed ratings (this framework is called the multidimensional approach in [4]). The aims of this paper are to formulate collaborative filtering using spatiotemporal information based on tensor factorization, and to generalize it to obtain flexible models described with only one additional tuning parameter. In Section 2, we introduce three types of linear tensor factorization and generalize them by nonlinear transformation. In Section 3, we compare the resulting rating models and investigate the effectiveness of our method using an artificial dataset. Concluding remarks are given in Section 4.

2  Proposed Models

Our purpose is to construct an appropriate rating model from observed data for collaborative filtering using spatiotemporal information. To this end, we regard ratings as elements of a tensor and introduce tensor factorization, where a rating s_abcd is represented by parameter vectors u_a = (u_a1, ..., u_aK) ∈ R^K, i_b = (i_b1, ..., i_bK) ∈ R^K, t_c = (t_c1, ..., t_cK) ∈ R^K, and p_d = (p_d1, ..., p_dK) ∈ R^K, which represent user a ∈ {1, ..., A}, item b ∈ {1, ..., B}, time c ∈ {1, ..., C}, and place d ∈ {1, ..., D}, respectively.

In this paper, we adopt three kinds of tensor factorization methods. The first one is PARAFAC [1], a standard method for tensor factorization, by which a rating s^P_abcd is represented as follows:

    s^{P}_{abcd} = \sum_{k=1}^{K} u_{ak} i_{bk} t_{ck} p_{dk}.    (1)

The next one is the Tucker model [1], which gives the following rating model:

    s^{T}_{abcd} = \sum_{e=1}^{K} \sum_{f=1}^{K} \sum_{g=1}^{K} \sum_{h=1}^{K} u_{ae} i_{bf} t_{cg} p_{dh} x_{efgh},    (2)

where x_efgh ∈ R is called an element of the core tensor; intuitively, x_efgh represents the joint dependency of u_ae, i_bf, t_cg, and p_dh. As the last one, we consider the simplified form of PARAFAC with K = 1:

    s^{O}_{abcd} = u_a i_b t_c p_d.    (3)
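As a minimal numpy sketch, the three linear rating models of Eqs. (1)-(3) can be written as follows. The dimensions, random factors, and function names are illustrative only; the Outer-product model is shown here as PARAFAC restricted to the first component of each factor vector.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C, D, K = 4, 5, 2, 3, 2  # illustrative sizes: users, items, times, places, rank

# One K-dimensional factor vector per user, item, time, and place.
U = rng.normal(size=(A, K))
I = rng.normal(size=(B, K))
T = rng.normal(size=(C, K))
P = rng.normal(size=(D, K))
X = rng.normal(size=(K, K, K, K))  # Tucker core tensor

def parafac_rating(a, b, c, d):
    """Eq. (1): sum over k of u_ak * i_bk * t_ck * p_dk."""
    return float(np.sum(U[a] * I[b] * T[c] * P[d]))

def tucker_rating(a, b, c, d):
    """Eq. (2): quadruple sum over the core tensor elements x_efgh."""
    return float(np.einsum('e,f,g,h,efgh->', U[a], I[b], T[c], P[d], X))

def outer_product_rating(a, b, c, d):
    """Eq. (3): PARAFAC with K = 1, using only the first component."""
    return float(U[a, 0] * I[b, 0] * T[c, 0] * P[d, 0])
```

Note that when the core tensor is superdiagonal (x_efgh = 1 iff e = f = g = h, else 0), the Tucker model reduces to PARAFAC, which is why PARAFAC is often viewed as a constrained special case.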

We refer to this model as the Outer-product. These linear methods are not always appropriate for modeling individuals' ratings in various situations. Thus, we extend them by means of the interlaced generalized linear model [2], which introduces nonlinearity into a linear model through an arbitrary monotone differentiable nonlinear function. In this paper, we use the following one-parameter family of functions:

    \phi_{\pi}(s) = \operatorname{sgn}(s) (\pi |s|)^{1/\pi},    (4)

where π ∈ (0, ∞) is the parameter of this function. With this function, we propose the following interlaced generalized linear models:

    s^{P\pi}_{abcd} = \phi_{\pi}(s^{P}_{abcd}),    (5)
    s^{T\pi}_{abcd} = \phi_{\pi}(s^{T}_{abcd}),    (6)
    s^{O\pi}_{abcd} = \phi_{\pi}(s^{O}_{abcd}).    (7)

Note that Eqs. (5)-(7) reduce to Eqs. (1)-(3), respectively, at π = 1. By introducing nonlinearity through Eq. (4), these extended models are expected to be more flexible than the conventional ones.

To estimate the parameters in Eqs. (1)-(3) and (5)-(7), we minimize the sum of squared errors between the observed ratings and their estimates:

    \min_{s^{*}} \sum_{(a,b,c,d) \in S} (s_{abcd} - s^{*}_{abcd})^2,    (8)

where S = {(a, b, c, d) | s_abcd is observed}, and s^* corresponds to any of the LHSs of Eqs. (1)-(3) and (5)-(7). To solve this minimization problem, we apply the stochastic gradient descent approach [3], where the parameters of each model are iteratively estimated by simple update rules. For example, u_ak in our proposed nonlinear PARAFAC is updated for a given observation s_abcd as follows:

    u_{ak} \leftarrow u_{ak} + 2\gamma \left( \pi \left| \sum_{k'=1}^{K} u_{ak'} i_{bk'} t_{ck'} p_{dk'} \right| \right)^{\frac{1-\pi}{\pi}} (s_{abcd} - s^{P\pi}_{abcd}) \, i_{bk} t_{ck} p_{dk},    (9)

where γ is a small constant. In our experiments, we used γ = 10^-6, and determined K and π for the above models by 10-fold cross validation (10-CV).

Table 1: MSEs on training sets for the linear models.

  I        PARAFAC      Tucker       Outer-product
  7,500    1.3466 (2)   0.9051 (3)   0.8998
  15,000   0.9825 (2)   0.8588 (3)   0.8482
  30,000   0.8548 (2)   0.8478 (3)   0.8179
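The transform of Eq. (4) and the update rule of Eq. (9) can be sketched in numpy as below. The paper only spells out the update for u_ak; applying the analogous update to all four factor vectors for each observation is an assumption of this sketch, as are the function names.

```python
import numpy as np

def phi(s, pi):
    """Eq. (4): sign-preserving one-parameter power transform."""
    return np.sign(s) * (pi * np.abs(s)) ** (1.0 / pi)

def sgd_step_parafac(U, I, T, P, a, b, c, d, s_obs, pi, gamma=1e-6):
    """One stochastic gradient step for one observed rating s_obs.

    Implements Eq. (9) for u_ak and, by symmetry (an assumption here),
    the analogous updates for i_bk, t_ck, and p_dk, in place.
    """
    lin = float(np.sum(U[a] * I[b] * T[c] * P[d]))   # Eq. (1)
    pred = phi(lin, pi)                              # Eq. (5)
    scale = (pi * abs(lin)) ** ((1.0 - pi) / pi)     # derivative factor of phi at lin
    err = s_obs - pred
    # Compute all gradients before applying any update.
    gU = 2.0 * gamma * scale * err * (I[b] * T[c] * P[d])
    gI = 2.0 * gamma * scale * err * (U[a] * T[c] * P[d])
    gT = 2.0 * gamma * scale * err * (U[a] * I[b] * P[d])
    gP = 2.0 * gamma * scale * err * (U[a] * I[b] * T[c])
    U[a] += gU; I[b] += gI; T[c] += gT; P[d] += gP
    return pred
```

At π = 1 the transform is the identity and the step reduces to ordinary PARAFAC SGD; note also that for π > 1 the derivative factor blows up as the linear score approaches zero, so initialization away from zero matters in practice.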

3  Experiments

To evaluate our proposed models in collaborative filtering tasks, we compared models constructed from a subset of ratings artificially generated according to the following Tucker model with K = 2 for A = 500, B = 500, C = 2, and D = 5:

    s_{abcd} = \sum_{e=1}^{2} \sum_{f=1}^{2} \sum_{g=1}^{2} \sum_{h=1}^{2} u_{ae} i_{bf} t_{cg} p_{dh} x_{efgh} + \varepsilon_{abcd},    (10)

where ε_abcd ~ N(0, 1). Out of the total of 2,500,000 ratings, we randomly chose 1,250,000 ratings for the test set, and I = 7,500, 15,000, and 30,000 ratings for training from the rest. We first selected the best of the linear models defined by Eqs. (1)-(3) in terms of the mean squared error (MSE) derived from 10-CV on the training set, varying K from 2 to 5 for PARAFAC and Tucker. Next, we searched for the value of π that minimizes the MSE of the nonlinear extension of the chosen model. Finally, we compared the chosen model and its nonlinear extension in terms of the MSE on the test set.

Table 1 shows the MSEs of the linear models on the training sets, where the values in parentheses are the corresponding values of K for PARAFAC and Tucker. This result indicates that a simple model, the Outer-product, can be a good choice even when the true model is complex. Figure 1 illustrates the results of the parameter search for the nonlinear Outer-product. Eventually, we adopted π = 1.6, 0.8, and 1.0 for I = 7,500, 15,000, and 30,000, respectively. Note that the optimal value of π is not equal to 1.0 when I is small.

[Figure 1: Results for the Outer-product model; 10-CV MSE plotted against π for I = 7,500, 15,000, and 30,000, with selected values π = 1.6, 0.8, and 1.0, respectively.]

Table 2: MSEs of the Outer-product model and its nonlinear extension on the test set.

  I        Outer-product   Nonlinear Outer-product
  7,500    0.9490          0.8556
  15,000   0.8460          0.8459
  30,000   0.8157          0.8157

Table 2 shows the MSEs of both the linear and the nonlinear Outer-product models on the test set, where each model was constructed with the whole training set of each size and evaluated on the test set. The nonlinear model is comparable or superior to the linear model; in particular, it tends to substantially improve the MSE for the smaller training sets. From these results, we can say that introducing the nonlinear model can be effective when only a limited number of observed ratings is available.
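A sketch of how the artificial dataset of Eq. (10) and the train/test split could be generated is given below. The sizes and split counts are those reported above; the random seed and the distributions used to draw U, I, T, P, and X are assumptions, since the paper does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C, D, K = 500, 500, 2, 5, 2  # sizes used in the experiment

# True factors and core tensor of the generating Tucker model, Eq. (10).
# Standard normal draws are an assumption of this sketch.
U = rng.normal(size=(A, K)); I = rng.normal(size=(B, K))
T = rng.normal(size=(C, K)); P = rng.normal(size=(D, K))
X = rng.normal(size=(K, K, K, K))

# Noise-free ratings for all A*B*C*D cells, then additive N(0, 1) noise.
S = np.einsum('ae,bf,cg,dh,efgh->abcd', U, I, T, P, X)
S_noisy = S + rng.normal(size=S.shape)

# Randomly hold out half of the 2,500,000 cells for the test set, and
# draw the training ratings from the remainder (I = 7,500 case shown).
idx = rng.permutation(S.size)
test_idx, train_pool = idx[:S.size // 2], idx[S.size // 2:]
train_idx = train_pool[:7500]
```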

4  Conclusion

We formulated collaborative filtering using spatiotemporal information based on tensor factorization, and extended it from the viewpoint of interlaced generalized linear models. For this extension, we introduced a nonlinear function and experimentally evaluated its effectiveness on an artificial dataset containing a large number of missing ratings. The experimental results show that our method is promising for constructing more precise models from a limited number of observed ratings. In this paper, we implemented the extension based on a fixed family of nonlinear functions; however, other one-parameter families exist, and further discussion is needed to select an appropriate family for this type of extension. In addition, we plan to introduce a regularization term into the tensor factorization.

References

[1] Cichocki, A., Zdunek, R., Phan, A. H., & Amari, S. (2009). Nonnegative Matrix and Tensor Factorizations. Wiley.
[2] Delannay, N., & Verleysen, M. (2008). Collaborative filtering with interlaced generalized linear models. Neurocomputing, 71, 1300-1310.
[3] Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. IEEE Computer, 42(8), 30-37.
[4] Adomavicius, G., Sankaranarayanan, R., Sen, S., & Tuzhilin, A. (2005). Incorporating contextual information in recommender systems using a multidimensional approach. ACM Transactions on Information Systems, 23(1), 103-145.