Open Comput. Sci. 2016; 6:126–137
Research Article
Open Access
Anuja Arora*, Vaibhav Taneja, Sonali Parashar, and Apurva Mishra
Cross-domain based Event Recommendation using Tensor Factorization DOI 10.1515/comp-2016-0011 Received Mar 15, 2016; accepted Jul 29, 2016
Abstract: Context in the form of meta-data has been recognized as an important component in cross-domain collaborative filtering (CDCF). In this research paper, the CDCF concept is used to exploit event information (context) from two UI matrices so that the recommendation performance of one domain (Facebook User-Event matrix) benefits from the information of another domain (Bookmyshow Event-Tag matrix). The model-based collaborative filtering approach Tensor Factorization (TF) is used to integrate the User-Event context information provided by Facebook with the Event-Tag context information provided by Bookmyshow in order to recommend events. In contrast to standard collaborative tag recommendation, our CDCF approach uses one User-Event matrix from Facebook and takes a Bookmyshow Event-Tag matrix as an additional source of information. The proposed cross-domain based event recommendation approach is divided into three modules: i) data collection, which extracts the unstructured dataset from the two domains, Bookmyshow and the social networking site Facebook, using APIs; ii) data mapping, which integrates the common knowledge/data that can be shared between the considered domains (Facebook and Bookmyshow). This module integrates and reduces the data into structured event instances. As the dataset was collected from two different sites, the intersection of both was taken. This module is therefore carefully designed according to the reliability of the information that is common to the two domains; iii) 3-order tensor factorization and Latent Dirichlet Allocation (LDA), used to obtain the most preferable recommendations by less pertinent result
*Corresponding Author: Anuja Arora: CSE/IT Department, Jaypee Institute of Information Technology, Noida, India; Email:
[email protected] Vaibhav Taneja: CSE/IT Department, Jaypee Institute of Information Technology, Noida, India; Email:
[email protected] Sonali Parashar: CSE/IT Department, Jaypee Institute of Information Technology, Noida, India; Email:
[email protected] Apurva Mishra: CSE/IT Department, Jaypee Institute of Information Technology, Noida, India; Email:
[email protected]
reduction. The proposed 3-order tensor factorization is designed to maximize the mutual benefit of both considered domains (organizer and user). It therefore provides three recommendations. For organizers: 1) the system recommends places to conduct a specific event according to the maximum number of attendees of a particular type of event at a specific location; 2) it recommends a target audience to the organizer, i.e., those who are likely to attend the event on the basis of past data, for promotion purposes. For users: 3) it recommends events of interest to users on the basis of their past record. Our results show a significant reduction of less relevant data, and result effectiveness is measured through recall and precision. The reduction of less relevant recommendations is 64%, 72% and 63% for place recommendation to organizer, target audience recommendation to organizer and event recommendation to user, respectively. The proposed tensor factorization approach achieved 68% precision and 15.5% recall in recommending attendees to organizers, and 62% precision and 13.4% recall for event recommendation to users. Keywords: Cross Domain Based Recommendation; Event Recommendation; Matrix factorization; Tensor Factorization
1 Introduction
Recommender systems, or recommendation systems, are a subclass of information filtering systems that seek to predict the ’rating’ or ’preference’ that a user would give to an item. Recommender systems have become extremely common in recent years and are applied in a variety of applications. The most popular ones are probably movies, music, news, books, research articles and products in general. However, there are also recommender systems for experts, jokes, restaurants, financial services, life insurance, people (online dating), and Twitter followers [1, 18]. The recommender system approach proposed in this work is designed for events happening at various locations in India. It is commonly observed that the difference between the number of people
© 2016 A. Arora et al., published by De Gruyter Open. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.
Unauthenticated Download Date | 10/15/16 3:09 PM
interested in going to an event and the number of people who actually attend it is very large [7, 11]. Thus, we examined and employed cross-domain based collaborative filtering (CDCF) [19, 21, 22] to improve the performance of event recommendation by recommending fewer but more preferable events to those people who are more likely to attend them, on the basis of the past preferences of users with similar social tags. CDCF has recently started to draw significant research attention [5, 22]. CDCF aims to relieve the sparsity problem in individual CF [16] domains by transferring knowledge among relevant domains. The CDCF technique proposed here for event recommendation is of particular importance for two reasons. First, it introduces mutual benefit for both organizers and users. Organizers/data owners can use it to further improve their service quality and to get an idea of their target audience and of a place to conduct the event. Users can use it to obtain a list of preferred events that they might be interested in attending. Second, identifying a user’s preferred event tags through the single collaborative filtering domain Facebook is not possible, as Facebook does not provide event tags. Therefore, Bookmyshow is used to get the event context (tag) information, which is required to resolve this issue and recommend appropriate events. In the context of the problems stated above, this work focuses on assigning social event tags from Bookmyshow to users of the social networking site Facebook, using the extracted meta-data of the events a user is already going to, is interested in or prefers, in order to annotate and categorize them. Events and associated tags have thus been picked from Bookmyshow, and users and associated events have been extracted from Facebook; events of Bookmyshow that are present on Facebook are then recommended to users according to the tags of their past attended events, as shown in figure 1.
Figure 1 illustrates an example scenario for the chosen domains, Facebook and Bookmyshow. The proposed recommendation approach aims to improve a user’s search through large amounts of content and to identify more effectively the services (events, as of now) that are likely to be more preferable or attractive to the user. The quality of the recommendation system is defined from the perspective of the increase in the number of people going to an event. Traditional recommender system evaluation focuses on raising the recommendation accuracy, or lowering the absolute prediction error, of the recommendation algorithm. Recently, however, discrepancies between commonly used matrix factorization (e.g. random recommendation) and the experienced quality of recommendation using multiple factors of a user’s interest have been brought to light. In this research paper, we
Figure 1: CDCF example, Facebook domain provides User-event mapping [color1 represents user’s going and color2 represent user’s interested event]. However, Bookmyshow provides Event-Tag mapping. This knowledge was used to recommend most preferable events to users.
aim to address these discrepancies by developing a novel means of recommender system evaluation that incorporates the multidimensional tensor concept, which encompasses qualities identified through traditional evaluation metrics and user-centric factors. Our recommendation system would benefit organizers who plan to increase their revenue by attracting more people to their events, and individuals by recommending the most preferable events for them. The remainder of this paper is organized as follows: Sect. 2 discusses the background and related work and its shortcomings. Section 3 discusses the data collection from the considered domains (Facebook and Bookmyshow). Section 4 shows the mapping of the cross-domain datasets; as the two domains may have different events, knowledge mapping is an important aspect of CDCF. Section 5 describes the recommendation experiment using tensor factorization, in which the tensor dimensions are further reduced according to the weights assigned by the Latent Dirichlet Allocation algorithm. Section 6 presents the results, which show the reduction in less relevant recommendations and the accuracy improvement measured with recall and precision [27]. Finally, Sect. 7 concludes the paper and mentions future scope.
2 Background and related work
In this section, we briefly present some of the research literature related to collaborative filtering recommendation work. The literature survey shows that earlier work related
to collaborative filtering carried some intrinsic limitations for social networks. The growth of social networks has raised a large number of recommendation issues in the past years. User-contributed personal information has become an important source for recommendation systems [24]. Even organizers/data owner companies that do not have a social network wish to know users’ personalized choices/requirements through social networking sites. Recommendation system algorithms [2, 13] that attempt to leverage social relationships with any content presume that users who preferred a particular type of content may also like other content of a similar type. This kind of recommendation is possible using a collaborative filtering U-I matrix if a single domain contains all the required information, whereas the cross-domain concept exploits information from multiple domains when the complete content does not exist in a single domain. The idea of exploiting correlations among CF domains was first introduced by Li [15] in 2007, but no solution was provided therein. CDCF raised numerous issues, such as how to leverage the personal data available via various domains to generate better recommendations [10]. For these reasons, recent research has focused on developing cross-domain based collaborative filtering, and researchers have proposed new recommendation approaches such as multi-domain collaborative filtering methods based on it [5, 21]. The most similar work in this direction was done by Liang Hu et al. in 2013 [22], who proposed personalized recommendation via cross-domain triadic factorization (CDTF). Their CDTF model is used to learn the triadic factors for user, item and domain, where the item dimension varies with the domain. The work of Loni B. et al. [2014] follows the same direction but does not cover social network data.
They build on the assumption that different patterns characterize the way users interact with different items (books, movies and music). A recent survey paper by Evgeny Frolov and Ivan Oseledets provides a comprehensive overview of tensor methods and recommender systems [25]. Simon Dooms et al. [8, 9] collected a dataset automatically from structured social media posts (i.e., Twitter). The dataset consists of natural and realistic data and incorporates the most recent and popular movies using four dimensions: user, item, rating and time. Werner Geyer et al. [6], by understanding the different classifications of events, gave an appropriate ranking measure based on user proximity and the popularity of an event, which may be applied in order to make recommendations more effective. Einat Minkov et al. [17] proposed an active learning extension to the collaborative approach that is aimed at reducing the amount of explicit feedback required from the user. Ioannis Boutsis et al. [4] exploited the social behavior of the human crowd to identify group attendance behaviors and predict the next event for a user to attend. Their experimental study with a real-world dataset illustrates the contributions of the approach and verifies its practicality and efficiency on a single domain. In this work, we would like to emphasize that side information from various domains can be an important source of better recommendation, as it supports personalized recommendation using user-related knowledge from various domains [3]. Outside the core research on CDCF, subsequent work focuses on modeling the information gathered from various domains. Although most of the work on CDCF has focused on the traditional matrix factorization (U-I matrix) approach, there has been recent work on adding multiple contexts (tensors) to recommendation for single-domain CF. Panagiotis Symeonidis et al. [20] developed a unified framework to model the three types of entities that exist in a social tagging system: users, items and tags. These data are represented by a 3-order tensor, on which latent semantic analysis and dimensionality reduction are performed using the higher order singular value decomposition technique. Alexandros Karatzoglou et al. [12] introduced a collaborative filtering method based on Tensor Factorization, a generalization of Matrix Factorization that allows for a flexible and generic integration of contextual information by modeling the data as a User-Item-Context N-dimensional tensor instead of the traditional 2D User-Item matrix.
Thus, our work is complementary to the aforementioned works: we investigate previously unexplored features of event recommendation and try to assemble the best possible dataset, consisting of realistic upcoming as well as past events.
3 Dataset collection
The data set is extracted from Bookmyshow¹ and Facebook. The first selected domain is Bookmyshow (under Bigtree Entertainment Pvt. Ltd), which was founded in 2007 and is currently the largest event and movie ticketing brand in India. Bookmyshow is headquartered in Mumbai, Maharashtra and has offices in New Delhi, Bangalore, Hyderabad, Chennai and Kolkata. Bookmyshow has built a world-class mobile application which, according to experts, is the most successful e-commerce application in India and accounts for 60% of its transactions. It serves 800 to 900 cinemas in 200 cities and towns. It provides information about upcoming movies and events, show timings, venue details and artist bios. It occupies 85 to 90 percent of the online entertainment-ticketing market and has established itself as the leader of the Indian online ticketing industry. The second selected domain is Facebook (Facebook, Inc.), an online social networking service founded in 2004 and one of the largest social networks. It is the fastest company in the Standard & Poor’s 500 Index to reach a market cap of $250 billion. Facebook is headquartered in Menlo Park, California, with offices in the USA, Canada, Japan, India, etc. Facebook had over 1.18 billion monthly active users as of August 2015 and is used to add other users as "friends", exchange messages, post status updates and photos, share videos, use various apps, share events, etc. Facebook has turned the average person’s social network into an unprecedented web of information available in the palm of a hand.
1 in.Bookmyshow.com/.
Figure 2: Bookmyshow Events count based on location.
3.1 Bookmyshow event Dataset
Bookmyshow organizes event data on a row-per-record basis. First, we need to generate the Events dataset. We follow 3 sequential steps: (1) extract the links of the different events available on Bookmyshow using Python and Selenium WebDriver, during the period Oct 2015 – Dec 2015; (2) use the links to extract the events’ data using the Java jsoup library; (3) manually clean and structure the unstructured data and store 1095 events. The features and description of each column of the events log data are presented in Table 1.
Table 1: Bookmyshow Events Data Format.
Col # | Description | Data
1 | City | Bengaluru
2 | Event Name | Eiffel Tower - Kids Workshop
3 | Date | Oct 17 2015
4 | Event Tags | Kids, Workshops, English
5 | Address | Haagen Dazs: Bengaluru2981, Icon Mall, 12th Main, 5th Cross HAL II Stage, Indiranagar, Bengaluru, Karnataka 560038, India
Figure 3: Fragment of the Event Data cloud.
The Event Tags column contains event category tags separated by commas. The Date column mainly uses 3 formats: (1) M dd yyyy (ex. Oct 17 2015), (2) M dd – dd yyyy (ex. Oct 17 - 31 2015) and (3) M dd – M dd yyyy (ex. Oct 17 – Dec 15 2015). The extracted events come from various locations; event counts per location are summarized in figure 2. The diversity of these cities makes the dataset more interesting. Event tags and cities play an important role when a user selects an event to attend in the proposed cross-domain based event recommendation approach. Figure 3 shows a fragment of the Event Data cloud, which shows how Bookmyshow links event places and associated tags to the various events. In the event data cloud figure, nodes are depicted in three colors, pink, blue and yellow, where pink circles represent event tags, blue circles represent event names and
Table 2: Graph API Explorer Query
Query # | Query | Data type
1 | search?q=*&type=event&center={latitude},{longitude}&distance={radius in km} | Event
2 | search?q={city}&type=event | Event
3 | {event_id}/?fields=attending | User
Table 3: Facebook Events data format. Figure 4: Event-Tag graph of two cities (Mumbai & Bengaluru).
yellow circles show the geography ontology; edges are connections between two nodes according to their relationship. This event cloud contains two fundamental relationships: ‘Has_Tag’ and ‘Happens_In’. ‘Has_Tag’ links an event to its associated tags, and ‘Happens_In’ relates a geographic location to the corresponding event (the event is held at that location). The event cloud fragment shown depicts 371 events with 261 tags, held in seven cities and linked through the ‘Happens_In’ and ‘Has_Tag’ relationships. Figure 4 depicts some basic statistics of the events in the cities Bengaluru and Mumbai, showing the various categories of events and the diversity of each category. Some categories of events do well in Mumbai and others in Bengaluru (for example, Stand-Up and Hindi events happen more in Mumbai, whereas Workshops and Party events happen more in Bengaluru).
3.2 Facebook event Dataset
Events are available on Facebook, and users can mark themselves as interested in or going to an event. Users can also be invited to an event by any of their friends, or can choose from the events displayed on the events page. Data is organized on a row-per-record basis. First, we need to generate the Events dataset, for which we follow 2 sequential steps:
1. Extract the data of the different events using the Facebook Graph API Explorer and the Facepager tool. Table 2 shows query #1 and query #2, which are used to extract the events happening in a certain city. In total, 1938 events are extracted.
2. Pre-process the extracted data to clean and structure it. Finally, 1292 events remain after preprocessing.
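The query templates of Table 2 can be filled in programmatically. The following is a minimal sketch that only builds the query strings; the function names are hypothetical, and it makes no claim about how the Graph API Explorer or Facepager were actually driven, nor about whether these endpoints are still served.

```python
from urllib.parse import quote

# Hypothetical helpers that fill in the three query templates of Table 2.
# They only build strings; sending them to the Graph API is out of scope.

def nearby_events_query(latitude, longitude, radius_km):
    # Query #1: events around a coordinate within a given radius.
    return f"search?q=*&type=event&center={latitude},{longitude}&distance={radius_km}"

def city_events_query(city):
    # Query #2: events matching a city name (URL-encoded, an added precaution).
    return f"search?q={quote(city)}&type=event"

def attendees_query(event_id):
    # Query #3: attendee list of one event.
    return f"{event_id}/?fields=attending"
```

For example, `attendees_query("891396917616352")` yields the path for the sample event of Table 3.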
Col # | Description | Example
1 | Event Id | 891396917616352
2 | Name | Bengaluru Midnight Marathon 2015
3 | Start Time | 2015-12-05T23:55:00+0530
4 | Location | KTPO, Whitefield, Bengaluru
5 | Time zone | Asia/Kolkata
6 | End Time | 2015-12-06T05:00:00+0530
Table 4: Facebook extracted User-Events data format.
Col # | Description | Example
1 | Name | Amir
2 | User Id | 1123345677878
3 | Event Id | 9876545433234564
Preprocessing was carried out on the basis of some manual observations to clean the data. We observed that the date attribute in Bookmyshow uses 3 different date formats, M dd yyyy (ex. Oct 17 2015), M dd – dd yyyy (ex. Oct 17 - 31 2015) and M dd – M dd yyyy (ex. Oct 17 – Dec 15 2015), whereas Facebook uses a different date format. Therefore, we converted these dates to the Facebook format and then matched the event data of both websites on the basis of date. Similar pre-processing steps were applied to map other context information. The features and description of each column of the events log data are presented in Table 3. The events data extracted in the previous step has been used to extract the number of users attending each event and the user ids of those users. Facebook Graph API query #3 in Table 2 has been used to extract the attendee list. The features and description of each column of the users log data are presented in Table 4. The number of people attending an event determines whether that event is successful or unsuccessful. Figure 5 shows events happening in different cities according to the Facebook data, and figure 6 shows the number of users attending events at various locations.
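The date normalization described above can be sketched as follows. This is an illustrative reconstruction, not the authors' script: the function names are hypothetical, date ranges are assumed to stay within one calendar year, and the fixed +0530 offset is assumed from the sample values in Table 3.

```python
import re
from datetime import datetime

_FMT = "%b %d %Y"  # e.g. "Oct 17 2015"

def parse_bms_date(text):
    """Return (start, end) datetimes for one of the three Bookmyshow formats."""
    # Treat en dashes like hyphens and collapse repeated whitespace.
    s = re.sub(r"\s+", " ", text.replace("\u2013", "-")).strip()
    if "-" not in s:                       # format 1: "Oct 17 2015"
        day = datetime.strptime(s, _FMT)
        return day, day
    left, right = [part.strip() for part in s.split("-", 1)]
    end_parts = right.split()              # ["31", "2015"] or ["Dec", "15", "2015"]
    year = end_parts[-1]                   # assumption: start and end share the year
    start = datetime.strptime(f"{left} {year}", _FMT)
    if len(end_parts) == 2:                # format 2: reuse the start month
        month = left.split()[0]
        end = datetime.strptime(f"{month} {right}", _FMT)
    else:                                  # format 3: end carries its own month
        end = datetime.strptime(right, _FMT)
    return start, end

def to_facebook_style(moment):
    # ISO-like timestamp with the +0530 offset seen in Table 3 (assumed for all events).
    return moment.strftime("%Y-%m-%dT%H:%M:%S") + "+0530"
```

Since Bookmyshow dates carry no time of day, a comparison on the calendar date alone (for instance `start.date()`) is probably the closest match to "on the basis of date".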
Context information of the users and events (user id, user location, event tag, event location, events attended by the user, etc.) can be used to predict the target audience and the users to recommend for an event. This information is used to predict and recommend events to users with similar context information. There were 137830 users attending these events, of whom 19300 were attending multiple (more than one) events. We observed that significantly fewer events happen in Bengaluru, but the number of users attending these events is quite good for the specific period, as shown in figures 5 and 6.
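Counts such as the total number of attendees and the number of users attending more than one event can be derived directly from User-Event rows in the format of Table 4. A minimal sketch, assuming the rows are available as (user id, event id) pairs; the function name is hypothetical:

```python
from collections import Counter

def attendance_summary(user_event_pairs):
    """Return (number of distinct users, number of users with more than one event)."""
    unique_pairs = set(user_event_pairs)            # ignore duplicated rows
    events_per_user = Counter(user for user, _ in unique_pairs)
    multi_event_users = sum(1 for n in events_per_user.values() if n > 1)
    return len(events_per_user), multi_event_users
```

Applied to toy data with three users of whom one attends two events, the function returns `(3, 1)`.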
4 Mapping of cross domain Data set
A fundamental question that arises in cross-domain based recommendation is what common data can be shared between the different domains [12]. The data processing module is therefore used to integrate the common knowledge/data that can be shared between the considered domains (Facebook and Bookmyshow). This module integrates and reduces the data into structured event instances. The dataset was collected from two different sites, and Bookmyshow event organizers usually post their event data on Facebook for promotion purposes and to send invitations to users. Therefore, we intersect the data extracted from both websites on the basis of event name and event location (equation 1) and obtain a sufficiently large mapped dataset. Some manual preprocessing was applied to the event names extracted from Facebook to make them more suitable for mapping. The data extracted from Bookmyshow provides the Event-Tag matrix, and the data extracted from Facebook gives the potential Event-Attendees (users attending an event) matrix. The intersection function used to obtain the common knowledge is
$$\text{Mapped dataset} = (\text{fb\_event name} \cap \text{bms\_event name}) \cap (\text{fb\_events location} \cap \text{bms\_events location}) \tag{1}$$
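One way to read equation 1 is as a join of the two event lists on a normalized (name, location) key. The sketch below illustrates that reading under stated assumptions: the field names (`event_id`, `name`, `city`, `tags`) are hypothetical, a city has already been extracted from each Facebook location string, and the normalization rule stands in for the manual preprocessing mentioned above.

```python
import re

def normalize(text):
    # Lower-case and reduce punctuation/whitespace so near-identical names match.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def map_events(fb_events, bms_events):
    """Keep Facebook events whose (name, city) also occurs on Bookmyshow."""
    bms_index = {
        (normalize(e["name"]), normalize(e["city"])): e for e in bms_events
    }
    mapped = []
    for fb in fb_events:
        bms = bms_index.get((normalize(fb["name"]), normalize(fb["city"])))
        if bms is not None:
            # The mapped record carries the Facebook id and the Bookmyshow tags.
            mapped.append({
                "event_id": fb["event_id"],
                "name": fb["name"],
                "city": fb["city"],
                "tags": bms["tags"],
            })
    return mapped
```

Exact matching after normalization is a deliberate simplification; fuzzy matching of event names would be a natural extension but is not described in the text.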
Figure 5: Facebook Events count based on location.
After performing the cross-domain mapping, we obtain the reconstructed recommendation matrix, in which a user u is associated with an event $e_i$ having a tag t, in order to retrieve appropriate events matching the mapped tags. Finally, the information that the mapped dataset contains is shown in Table 5. Table 5: Facebook and Bookmyshow mapped data format.
Figure 6: Active users attending events at various locations.
Col # | Description | Example
1 | Event Id | 891396917616352
2 | Name | Bengaluru Midnight Marathon 2015
3 | User Id | 1123345677878
4 | Location | KTPO, Whitefield, Bengaluru
5 | Tag | Adventure, Gaming
Table 6 shows the statistics of the mapped event data, i.e., the events in the intersection of the two considered domains.
Table 6: Facebook and Bookmyshow mapped data.
Website | Initially fetched events | Events after preprocessing | No. of users
Facebook | 1938 | 1292 | 1,37,830
Bookmyshow | 1095 | 915 | NA
Mapped Events | 728 | 672 | 83,659
5 Cross domain recommendation using Tensor factorization
Tensor Factorization (TF) is a multidimensional and multimode extension of Matrix Factorization (MF) [23, 25]. This section discusses the details of TF on the considered multiple domains and describes how tensor factorization is adopted to take advantage of the multiple dimensions of the cross domains.
Figure 7: Visualization of the three unfoldings of the considered 3-order tensor to recommend the most suitable location to the organizer.
5.1 The Model:
We first briefly review the notation used in the tensor factorization technique [14]. Tensors are denoted by calligraphic boldface letters ($\mathcal{A}, \mathcal{B}, \ldots$), matrices by boldface capital letters ($\mathbf{A}, \mathbf{B}, \ldots$), and vectors by boldface lowercase letters ($\mathbf{a}, \mathbf{b}, \mathbf{c}, \ldots$). The ith entry of a vector $\mathbf{a}$ is denoted by $a_i$, element (i, j) of a matrix $\mathbf{A}$ is denoted by $a_{ij}$, and element (i, j, k) of a 3-order tensor $\mathcal{A}$ is denoted by $\mathcal{A}_{ijk}$. In this paper, we provide three recommendations, each of which uses 3-order tensors. The proposed tensor factorization technique is designed to maximize the mutual benefit of both considered domains.
1) For Organizers: R1- Recommending a place (city) to conduct an event. As shown in figure 4, some categories of events attract more attendees at specific places. When the CDCF approach is used to recommend an appropriate place to conduct an event to the organizer, the three dimensions considered are event tag (t), place (p) and count of past audience for specific event tags at a specific place (c). The three unfoldings (horizontal, lateral and frontal sides) of the considered 3-order tensor $\mathcal{A}$ for R1 (recommendation 1) are illustrated in figure 7. These three sides, horizontal, lateral
Figure 8: Visualization of the three unfoldings of the considered 3-order tensor to recommend target audience (to organizer) and event (to user).
and frontal, denote event tag (t), place (p) and count of past audience (c) respectively, as depicted in figure 7. The considered 3-order tensor $\mathcal{A}$ is described by $\mathcal{A}_{t::}$ (event tag vector), $\mathcal{A}_{:p:}$ (place vector) and $\mathcal{A}_{::c}$ (attendees count vector), where the event tag vector data $\mathcal{A}_{t::}$ is collected from Bookmyshow, the place vector data $\mathcal{A}_{:p:}$ of past conducted events is mapped from both domains, Bookmyshow and Facebook, and the attendees count vector data $\mathcal{A}_{::c}$ of past events is collected from Facebook.
2) For Organizer: R2- Predicting the target audience, i.e., the audience that might be interested in attending a specific category of event. The target audience for an event is predicted using 3-order tensors. The dimensions of the 3-order tensor are event tag (t), place (p) and user (u). The 3-order
tensor $\mathcal{B}$ for target audience prediction (recommendation 2) is illustrated in figure 8, where the horizontal, lateral and frontal slices of the considered 3-order tensor $\mathcal{B}$ are denoted by $\mathcal{B}_{t::}$ (event tag vector), $\mathcal{B}_{:p:}$ (place vector) and $\mathcal{B}_{::u}$ (user vector) respectively. The event tag vector data $\mathcal{B}_{t::}$ is collected from Bookmyshow, the place vector data $\mathcal{B}_{:p:}$ of past conducted events is formed by mapping from both domains, Bookmyshow and Facebook, and the user vector data $\mathcal{B}_{::u}$ is collected from Facebook.
3) For User: R3- Recommending events of interest on the basis of past attended events. For event recommendation to the user, the considered dimensions are the same as in R2 (tag, place and user), and the same 3-order tensor $\mathcal{B}$ is used.
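To make the structure of the tag-place-user tensor concrete, the sketch below builds a small binary 3-order tensor from (tag, place, user) triplets and computes its three mode unfoldings with plain Python lists. It is an illustration only: the function names are hypothetical, the binary encoding (1 if the user attended an event with that tag at that place) is an assumption, and the column ordering of the unfoldings follows one common convention rather than anything stated in the paper.

```python
def build_tensor(triplets):
    """Binary tensor T[tag][place][user] from (tag, place, user) triplets."""
    tags = sorted({t for t, _, _ in triplets})
    places = sorted({p for _, p, _ in triplets})
    users = sorted({u for _, _, u in triplets})
    t_ix = {t: i for i, t in enumerate(tags)}
    p_ix = {p: j for j, p in enumerate(places)}
    u_ix = {u: k for k, u in enumerate(users)}
    tensor = [[[0] * len(users) for _ in places] for _ in tags]
    for t, p, u in triplets:
        tensor[t_ix[t]][p_ix[p]][u_ix[u]] = 1
    return tensor, (tags, places, users)

def unfold(tensor, mode):
    """Mode unfolding: rows index the chosen mode, columns the other two modes."""
    I, J, K = len(tensor), len(tensor[0]), len(tensor[0][0])
    if mode == 0:   # tag mode, shape I x (J*K), place index varies fastest
        return [[tensor[i][j][k] for k in range(K) for j in range(J)] for i in range(I)]
    if mode == 1:   # place mode, shape J x (I*K), tag index varies fastest
        return [[tensor[i][j][k] for k in range(K) for i in range(I)] for j in range(J)]
    if mode == 2:   # user mode, shape K x (I*J), tag index varies fastest
        return [[tensor[i][j][k] for j in range(J) for i in range(I)] for k in range(K)]
    raise ValueError("mode must be 0, 1 or 2")
```

A dense nested list is only practical for toy data; with tens of thousands of users, a sparse representation keyed by index triples would be the more realistic choice.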
5.2 Proposed tensor reduction Approach
In this section we detail how Latent Dirichlet Allocation (LDA) is applied to reduce the tensors with less prediction error in the recommendation. We apply LDA to event tags to find the topic-level distribution and extract topics out of these event tags on the basis of the event tags preferred by users
in the past. LDA assigns a probability along with the most appropriate topic chosen from the tag topics. For example, the hill climbing, skiing and paragliding event tags all fall under the sports and adventure topic in the topic-level distribution produced by LDA, and LDA assigns an event matching probability to each user according to the user’s past preferences (past attended events). LDA-based event matching is preferable because it matches events according to their topic category; matching events on the basis of a single specific tag, such as recommending only events associated with hill climbing, is not ideal. The novelty of this research work is therefore to recommend events to users and target audiences to organizers on the basis of the LDA-assigned topic and its matching probability, which is useful in improving accuracy and reducing prediction error. In this cross-domain based recommendation, we have two 3-order tensors ($\mathcal{A}$, $\mathcal{B}$). Tensor $\mathcal{A}$ contains data represented by a set of triplets {p, c, t}, as already mentioned, and tensor $\mathcal{B}$ contains data that is also a set of triplets, {p, u, t}. Tensor reduction takes the tensors $\mathcal{A}$ and $\mathcal{B}$ as input and applies LDA to obtain the resulting tensors $\hat{\mathcal{A}}$ and $\hat{\mathcal{B}}$, which are represented by the quadruplets {p, c, t, w1} and {p, u, t, w2} for R1 and R2 respectively. Tensor $\mathcal{B}$ is further used to assign weights with respect to R3 using LDA, so the resulting tensor $\hat{\mathcal{C}}$ for R3 is represented as {p, u, t, w3}. Algorithm 1 details the factorization scheme used to improve cross-domain based recommendation with the help of LDA. We could freely choose the number of topics to be identified by LDA; we chose 30, the number of topics covered by events on our chosen event site Bookmyshow.
Further, we set up LDA such that an event belongs to at most three topics, according to the mapping of its tags to the Bookmyshow topic list, and we consider an event to relate to a topic only if the probability of that topic in the user's past preference list is at least 0.50. The pseudo code for this is as follows:

• Construct the input tensors A and B from the triplet data {place, count of past attendees, tag} and {place, user, tag} respectively.
• Matricize the tensors, i.e. unfold each tensor along each of its three modes. This gives three matrices per tensor, so six matrices in total for tensors A and B.
• Apply Latent Dirichlet Allocation (LDA) using the Python library [26] to assign weights on the basis of the tag classes of similar kinds of events, as detailed in Table 7.

The applied LDA is based on the hypothesis that each specific type of event has certain tags that are usually associated with it. For a chosen event, an appropriate probability must be assigned from the pool of tags of that type of event. An event can then be represented as a mixture of different tags. LDA is therefore used to match the similarity of events by grouping the features (tags) of events into unobserved sets. A mixture of these sets then constitutes the observable event. As shown in Table 7, the proposed approach assigns weights using LDA and constructs $\hat{A}$, $\hat{B}$, $\hat{C}$ with one added tensor, 'weight'. On the basis of $\hat{A}$, $\hat{B}$, $\hat{C}$, our approach recommends new events to users as well as a target audience to organizers. LDA gives the probability along with the most appropriate topic chosen from the user's list of past preferred events. Here we assign a topic according to the user's past attended events and recommend upcoming events according to the calculated probability (equation 3). The event with the highest probability is the most preferred one. This is known as Bayes' rule, and it minimizes the prediction error. So, if there are t topics, Bayes' rule assigns the event to the specific topic i (equation 2) for which

$$P(i \mid e_t) > P(j \mid e_t) \quad \text{for all } j \neq i \tag{2}$$
The probability $P(i \mid e_t)$ that an event tag $e_t$ belongs to group i is difficult to obtain directly. We therefore use $P(e_t \mid i)$, from which it is computed through equation 3:

$$P(i \mid e_t) = \frac{P(e_t \mid i)\, P(i)}{\sum_{j} P(e_t \mid j)\, P(j)} \tag{3}$$
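Equations 2 and 3 can be illustrated with a minimal Python sketch. The topic names, priors and likelihoods below are hypothetical values chosen only for illustration; they are not taken from the dataset, and the function names `posterior` and `assign_topic` are ours.

```python
def posterior(likelihood, prior):
    """Equation 3: P(i|e_t) = P(e_t|i) P(i) / sum_j P(e_t|j) P(j)."""
    evidence = sum(likelihood[j] * prior[j] for j in prior)
    return {i: likelihood[i] * prior[i] / evidence for i in prior}


def assign_topic(post):
    """Equation 2: choose the topic whose posterior exceeds all others."""
    return max(post, key=post.get)


# Hypothetical priors P(i) and likelihoods P(e_t|i) for one event tag.
prior = {"Sport": 0.5, "Entertainment": 0.3, "Adventure": 0.2}
likelihood = {"Sport": 0.6, "Entertainment": 0.1, "Adventure": 0.3}

post = posterior(likelihood, prior)
best = assign_topic(post)
print(best, round(post[best], 3))
```

Because the denominator is the same for every topic, the topic chosen by equation 2 depends only on the products of likelihood and prior; the normalization matters when the posterior itself is used as a weight.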
Table 7: Reconstructed tensor $\hat{B}$ after assigning weight using LDA for R3: recommending event to User1 and User2.

| User | Event Name | Tag | Place | Weight |
|------|------------|-----|-------|--------|
| U1 | Adlabs Imagica | Adventure | Bangalore | |
| U2 | Zangoora | Drama | New Delhi | |
| U1 | Bungee Jumping | Sport | Bangalore | |
| U3 | Sunday climbing session | Hill Climbing | New Delhi | |
| U2 | Zangoora | Bollywood | New Delhi | |
| U1 | Waterfall Rappelling | Fun, Mountaineering, Sport | Bangalore | 0.83 |
| U2 | Waterfall Rappelling | Fun, Mountaineering, Sport | Bangalore | 0.08 |
| U3 | Waterfall Rappelling | Fun, Mountaineering, Sport | Bangalore | 0.46 |
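The tensor construction and matricization steps of the pseudo code can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the observations are hypothetical, the helper names `build_tensor` and `unfold` are ours, and the column ordering of the unfolding follows one common convention (the earlier remaining mode varies fastest), which is not necessarily the one used in the experiments.

```python
def build_tensor(entries, dims):
    """Build a dense 3-order tensor (nested lists) from (i, j, k, value) entries."""
    n0, n1, n2 = dims
    tensor = [[[0.0] * n2 for _ in range(n1)] for _ in range(n0)]
    for i, j, k, value in entries:
        tensor[i][j][k] = value
    return tensor


def unfold(tensor, mode):
    """Mode-n matricization: rows are indexed by the chosen mode, columns by
    the two remaining modes (the earlier remaining mode varies fastest)."""
    dims = (len(tensor), len(tensor[0]), len(tensor[0][0]))
    others = [m for m in range(3) if m != mode]
    rows = []
    for r in range(dims[mode]):
        row = []
        for b in range(dims[others[1]]):
            for a in range(dims[others[0]]):
                idx = [0, 0, 0]
                idx[mode], idx[others[0]], idx[others[1]] = r, a, b
                row.append(tensor[idx[0]][idx[1]][idx[2]])
        rows.append(row)
    return rows


# Hypothetical {place, user, tag} observations: 2 places, 3 users, 2 tags.
observations = [(0, 0, 1, 1.0), (0, 1, 0, 1.0), (1, 2, 1, 1.0)]
B = build_tensor(observations, dims=(2, 3, 2))

# One matrix per mode: place x (user, tag), user x (place, tag), tag x (place, user).
unfoldings = [unfold(B, mode) for mode in range(3)]
for m in unfoldings:
    print(len(m), "x", len(m[0]))
```

Applying the same two helpers to a second tensor built from {place, count of past attendees, tag} data would give the six matrices mentioned in the pseudo code.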
In equation 3, the sum runs over all topics j, and P(i) and P(j) are the prior probabilities that topics i and j, respectively, are the known group, assigned without any measuring factor. Using LDA, the weights of the Waterfall Rappelling event are therefore 0.83, 0.08 and 0.46 for users U1, U2 and U3 respectively. The topic LDA assigns to this event is sports, and because all events previously attended by U1 fall under this category, U1 receives the highest weight for this event. This is not the case for U2, who lies in the class 'Entertainment'. U3 has attended just one event in the past, so U3 receives more weight than U2 but less than U1. The weight thus increases with the user's record of similar past events.

• Construct the 4-order tensors $\hat{A}$, $\hat{B}$, $\hat{C}$, each having one added tensor 'weight' that reduces the dimensionality, for R1, R2 and R3 respectively.
• Recommend R1, R2 and R3 on the basis of the weights of the elements of the reconstructed tensors $\hat{A}$, $\hat{B}$, $\hat{C}$ respectively.

6 Result

Whenever an event is planned, the organizer's goal is to make it successful by attracting a large number of attendees. Attendees, in turn, want to spend their time on events that match their interests. In both cases a huge amount of data is present, and finding the target audience or the preferable events is a cumbersome task. Our proposed tensor factorization method improves recommendation by exploiting multiple dimensions across domains. This section discusses the experiments and the results our approach achieves: the reduction of less relevant data in the recommendations, with efficiency measured in terms of the recall and precision of the recommended events. As already discussed, the approach is designed to benefit both sides: the organizer as well as the user who wishes to attend events of interest. We therefore briefly discuss the results for all three recommendations: place recommendation to the organizer, target audience recommendation to the organizer, and preferable event recommendation to the user.
Figure 9: Comparison of Matrix Factorization and Tensor Factorization for recommending a place to conduct an event to the organizer.
6.1 Result 1: R1- Recommending place (city) to conduct an event

As described in the data collection section, we collected mapped Bookmyshow and Facebook event data for seven cities. This recommendation compares the usability of the proposed tensor factorization with standard matrix factorization. In addition to place and tag, we use the past count of event attendees as the third piece of context information for tensor factorization. Figure 9 shows the result for an upcoming event tagged as 'English', 'Festival', 'Dance' and 'EDM' (please refer to figure 10 for the user interface). On the basis of these tags, matrix factorization recommends all seven cities, whereas TF reduces the dimensions according to the maximum count of past attendees of similar events and recommends only two cities in which to conduct the event. Across various sets of tags, we observe that adding information (count of attendees) in the form of context significantly reduces the recommendation result (on average a 67% reduction in place recommendations).
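The reduction figures can be read as the relative drop in the number of recommended items. The following small sketch assumes that reduction is computed as (MF count - TF count) / MF count; the text does not state the formula explicitly, and the function name is ours. For the single example above, going from seven cities to two is a reduction of about 71%, while the reported 67% is an average over several tag sets.

```python
def reduction_percent(mf_count, tf_count):
    """Relative reduction in recommended items when moving from MF to TF."""
    if mf_count <= 0:
        raise ValueError("mf_count must be positive")
    return 100.0 * (mf_count - tf_count) / mf_count


# The single example from the text: MF recommends 7 cities, TF recommends 2.
example = reduction_percent(7, 2)
print(round(example, 1))  # 71.4
```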
6.2 Result 2: R2- Predicting the target audience interested in attending an event

We compare the result of 3-order tensor factorization with that of traditional MF. We first apply matrix factorization, which searches for events by their tags and recommends users who have previously attended similar events; it thus provides a large target audience to focus on. We then apply tensor factorization, adding place as a new context variable. Both techniques select target users (interested attendees) from a dataset of 83659 users. Figure 11 and figure 12 show, respectively, the reduction in target audience recommendation and the recommendation accuracy obtained with tensor factorization and matrix factorization on real Facebook and Bookmyshow data. We observe that adding place as a context variable and using LDA for tensor reduction makes a significant difference in reducing the recommended data. We ran this comparison for recommendations based on one, two and three tags and reduced the target audience by 57% to 72%, depending on the tag category (refer to figure 11).

Figure 10: CDCF based event recommendation User Interface.
Figure 11: Comparison of Matrix Factorization and Tensor Factorization for Target audience recommendation using 1 tag, 2 tags and 3 tags to organizers.
The LDA based tensor reduction algorithm attains 68% precision and 37% recall, while MF attains 52% precision and 23% recall (figure 12). This shows that tensor factorization with LDA based reduction is more robust in finding the relevant target audience for the organizer.
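For reference, precision and recall over a recommended set can be computed as in the sketch below. It assumes that the relevant items are the users who actually attended the event; the text does not spell out how relevance was determined. The user ids are hypothetical and the function name is ours.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for a set of recommended items."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical user ids, not taken from the dataset.
recommended_users = {"u1", "u2", "u3", "u4"}
actual_attendees = {"u2", "u3", "u5", "u6", "u7", "u8"}

p, r = precision_recall(recommended_users, actual_attendees)
print(round(p, 2), round(r, 2))
```

A smaller recommended set that still contains the relevant users raises precision without lowering recall, which is the effect the reduction step aims for.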
Figure 12: Precision and recall using Tensor Factorization and Matrix Factorization of target audience recommendation to organizer.

6.3 Result 3: R3- Recommending event of user's interest on the basis of past attended events and place

Matrix factorization searches from the user id to find the event ids of previously visited events, thus giving us the tags of the events the user went to. These category tags can in turn be used to recommend events to the user according to his or her interests. After adding place as the third tensor using tensor factorization, we obtained a 63% reduction in recommended events (figure 13). The LDA based tensor reduction algorithm attains 62% precision and 28% recall, while MF attains 50% precision and 17% recall (figure 14). This shows that tensor factorization with LDA based reduction is more robust in finding the relevant events for users.

Figure 13: Comparison of Matrix Factorization and Tensor Factorization for event recommendation to user.

Figure 14: Precision and recall using Tensor Factorization and Matrix Factorization of event recommendation to user.

7 Conclusion and Future Scope

Predicting human nature and preferences with one hundred percent accuracy is a complex task; data mining algorithms can predict them accurately only to some extent. Understanding the data and its tables is very important, and proceeding step by step lets us predict the result with greater accuracy. In this paper we have found that (i) an appropriate dataset is of utmost importance for making accurate recommendations, (ii) incorporating data from two different platforms increases the diversity as well as the scope of the recommendation procedure, and (iii) introducing n-order tensors (3 in our case) reduces the number of items to be recommended while improving accuracy. In future work we will consider increasing the number of tensors being considered, which might lead to improved accuracy. We could also apply topic modeling (a subtopic of text mining) to the descriptions of events to find appropriate tags from Facebook, as these are not otherwise explicitly available, and use a better tensor reduction algorithm on them for better quality recommendations.

References

[1] Abel, F., Gao, Q., Houben, G. J., & Tao, K. (2011). Analyzing user modeling on Twitter for personalized news recommendations. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6787 LNCS, pp. 1–12).
[2] Anaya, A. R., Luque, M., & García-Saiz, T. (2013). Recommender system in collaborative learning environment using an influence diagram. Expert Systems with Applications, 40(18), 7193–7202.
[3] Bostandjiev, S., Donovan, J. O., & Höllerer, T. (2012). TasteWeights: A Visual Interactive Hybrid Recommender System. Proceedings of the 6th ACM Conference on Recommender Systems - RecSys ’12, 35–42.
[4] Boutsis, I., Karanikolaou, S., & Kalogeraki, V. (2015). Personalized Event Recommendation Using Social Networks. In Mobile Data Management (MDM) 16th IEEE International Conference on (Vol. 1, pp. 84-93).
[5] Cho, Y. H., Kim, J. K., & Kim, S. H. (2002). A personalized recommender system based on web usage mining and decision tree induction. Expert Systems with Applications, 23(3), 329–342.
[6] Daly, E., & Geyer, W. (2011). Effective event discovery: using location and social information for scoping event recommendations. Proceedings of the Fifth ACM Conference on 277–280.
[7] De Pessemier, T., Minnaert, J., Vanhecke, K., Dooms, S., & Martens, L. (2013). Social recommendation for events. In CEUR Workshop Proceedings (Vol. 1066).
[8] Dooms, S., De Pessemier, T., & Martens, L. (2011). A user-centric evaluation of recommender algorithms for an event recommendation system. In CEUR Workshop Proceedings (Vol. 811, pp. 67–73).
[9] Dooms, S., De Pessemier, T., & Martens, L. (2013). Movietweetings: a movie rating dataset collected from twitter. In Workshop on Crowdsourcing and human computation for recommender systems, CrowdRec at RecSys (Vol. 2013, p. 43).
[10] Fernández-Tobías, I., Cantador, I., Kaminskas, M., & Ricci, F. (2012, June). Cross-domain recommender systems: A survey of the state of the art. In Spanish Conference on Information Retrieval.
[11] Huang, A.-J., Wang, H.-C., & Yuan, C. W. (2014). De-virtualizing social events: understanding the gap between online and offline participation for event invitations. Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing - CSCW ’14, 436–448. Retrieved from http://dl.acm.org/citation.cfm?id=2531602.2531606
[12] Karatzoglou, A., Amatriain, X., & Oliver, N. (2010). Multiverse Recommendation: N-dimensional Tensor Factorization for Context-aware Collaborative Filtering. Context, 79–86. Retrieved from http://portal.acm.org/citation.cfm?id=1864727
[13] Kim, K. jae, & Ahn, H. (2008). A recommender system using GA K-means clustering in an online shopping market. Expert Systems with Applications, 34(2), 1200–1209.
[14] Kolda, T. G., & Bader, B. W. (2008). Tensor Decompositions and Applications. SIAM Review, 51(3), 455–500. Retrieved from http://epubs.siam.org/doi/abs/10.1137/07070111X
[15] Li, B. (2011). Cross-domain collaborative filtering: A brief survey. In Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI (pp. 1085–1086).
[16] McLaughlin, M. R., & Herlocker, J. L. (2004). A collaborative filtering algorithm and evaluation metric that accurately model the user experience. Proceedings of the 27th Annual International Conference on Research and Development in Information Retrieval - SIGIR ’04, 329–336. Retrieved from http://portal.acm.org/citation.cfm?doid=1008992.1009050
[17] Minkov, E., Charrow, B., Ledlie, J., Teller, S., & Jaakkola, T. (2010, October). Collaborative future event recommendation. In Proceedings of the 19th ACM international conference on Information and knowledge management (pp. 819-828). ACM.
[18] Park, D. H., Kim, H. K., Choi, I. Y., & Kim, J. K. (2012). A literature review and classification of recommender systems research. Expert Systems with Applications.
[19] Shi, Y., Larson, M., & Hanjalic, A. (2014). Collaborative Filtering beyond the User-Item Matrix: A Survey of the State of the Art and Future Challenges. ACM Computing Surveys (CSUR), 47(1), 1–45. http://doi.org/http://dx.doi.org/10.1145/2556270
[20] Symeonidis, P., Nanopoulos, A., & Manolopoulos, Y. (2010). A unified framework for providing recommendations in social tagging systems based on ternary semantic analysis. IEEE Transactions on Knowledge and Data Engineering, 22(2), 179–192.
[21] Zhang, Y., Cao, B., & Yeung, D.-Y. (2012). Multi-domain collaborative filtering. arXiv Preprint arXiv:1203.3535.
[22] Hu, L., Cao, J., Xu, G., Cao, L., Gu, Z., & Zhu, C. (2013, May). Personalized recommendation via cross-domain triadic factorization. In Proceedings of the 22nd international conference on World Wide Web (pp. 595-606). International World Wide Web Conferences Steering Committee.
[23] Loni, B., Shi, Y., Larson, M., & Hanjalic, A. (2014). Cross-domain collaborative filtering with factorization machines. In Advances in Information Retrieval (pp. 656-661). Springer International Publishing.
[24] Katta, E., & Arora, A. (2015, August). An improved approach to English-Hindi based Cross Language Information Retrieval system. In Contemporary Computing (IC3), 2015 Eighth International Conference on (pp. 354-359). IEEE.
[25] Frolov, E., & Oseledets, I. (2016). Tensor Methods and Recommender Systems. arXiv preprint arXiv:1603.06038.
[26] https://pypi.python.org/pypi/lda
[27] Behl, D., Handa, S., & Arora, A. (2014, February). A bug mining tool to identify and analyze security bugs using naive bayes and tf-idf. In Optimization, Reliabilty, and Information Technology (ICROIT), 2014 International Conference on (pp. 294-299). IEEE.