
Modeling Buying Motives for Personalized Product Bundle Recommendation

GUANNAN LIU, Beihang University
YANJIE FU, Missouri University of Science and Technology
GUOQING CHEN, Tsinghua University
HUI XIONG and CAN CHEN, Rutgers University

Product bundling is a marketing strategy that offers several products/items for sale as one bundle. While the bundling strategy has been widely used, less effort has been made to understand how items should be bundled with respect to consumers’ preferences and buying motives for product bundles. This article investigates the relationships between the items that are bought together within a product bundle. To that end, each purchased product bundle is formulated as a bundle graph with items as nodes and the associations between pairs of items in the bundle as edges. The relationships between items can be analyzed through the formation of edges in bundle graphs, which can be attributed to the associations of feature aspects. Then, a probabilistic model, BPM (Bundle Purchase with Motives), is proposed to capture the composition of each bundle graph, with two latent factors, node-type and edge-type, introduced to describe the feature aspects and the relationships, respectively. Furthermore, based on the preferences inferred from the model, an approach for recommending items to form product bundles is developed by estimating the probability that a consumer would buy an associated item together with the item already in the shopping cart. Finally, experimental results on real-world transaction data collected from well-known shopping sites show that the proposed approach is more effective than baseline methods. Moreover, the experiments also show that the proposed model can explain consumers’ buying motives for product bundles in terms of different node-types and edge-types.
Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Filtering—Recommendation; H.2.8 [Database Management]: Database Applications—Data Mining

General Terms: Algorithms, Design, Experimentation

Additional Key Words and Phrases: Product bundle, recommendation, buying motives, probabilistic graphical model

ACM Reference Format:
Guannan Liu, Yanjie Fu, Guoqing Chen, Hui Xiong, and Can Chen. 2017. Modeling buying motives for personalized product bundle recommendation. ACM Trans. Knowl. Discov. Data 11, 3, Article 28 (March 2017), 26 pages.
DOI: http://dx.doi.org/10.1145/3022185

The work was partially supported by the MOE Project of Key Research Institute of Humanities and Social Sciences at Universities (12JJD630001) and the National Natural Science Foundation of China (71110107027/71490724/71329201/71531001).

Authors’ addresses: G. Liu, Beihang University, Beijing 100191, China; Y. Fu, Missouri University of Science and Technology, 1870 Miner Cir, Rolla, MO 65409; G. Chen, Tsinghua University, Beijing 100084, China; H. Xiong and C. Chen, Rutgers University, 1 Washington Park, Newark, NJ 07029; emails: {hxiong, kenchen}@rutgers.edu.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481.

© 2017 ACM 1556-4681/2017/03-ART28 $15.00
DOI: http://dx.doi.org/10.1145/3022185

ACM Transactions on Knowledge Discovery from Data, Vol. 11, No. 3, Article 28, Publication date: March 2017.


1. INTRODUCTION

As a widely used marketing strategy, product bundling offers several products/items for sale as one combined product. For example, the hamburger and french fries bundle at McDonald’s can be purchased at a lower price than the two items bought separately. Similarly, telecommunication companies often sell phones at a low price together with two-year service contracts. In online shopping, product bundling is even more salient in consumers’ buying behavior. It was reported that one-third of the orders at Walmart.com contained more than two items in a single purchase [Zhu et al. 2014]. Indeed, online customers can usually access different products with only a few clicks, and they are more likely to be directed to other items recommended by the shopping sites. Moreover, online merchants often launch promotional campaigns for product bundle sales. For instance, if the total price of a transaction exceeds a certain value, the shipping fee may be waived or a discount may be granted. In addition to price concerns, product bundles can also be designed as special combinations of items with functional or convenience features. Examples include a collection of several CDs in memory of a music star, a set of branded camera lens accessories, a package of honey syrups made from different flowers, and the like. In general, compared to traditional shopping, online shopping enables merchants to offer consumers more choice and variety in product bundles, as well as more personalized bundling services, which are considered desirable and important in practice.

In pursuit of the benefits of product bundling, one straightforward way for online merchants to design product bundles is to find items that are frequently bought together, which could be discovered, for example, by association rule mining [Tan and Kumar 2005].
The discovered itemsets could then be designed as bundles and recommended to potential consumers. Though useful, such bundles are usually treated as associative patterns applied to all potential consumers, without distinguishing consumers’ personalized preferences. Because consumers buy product bundles with various buying motives, driven by their budgets (e.g., free shipping), preferences (e.g., functionality, brand, and uniqueness), and so on, recommending different product bundles to different consumers is meaningful and important for precision marketing and consumer identification. Figure 1 shows an example of two consumers buying product bundles. Lucy bought three sample-size pieces of brand-name perfume together as a product bundle; Lily bought the full-size perfume Chanel Chance along with two sample-size perfumes in her transaction. Although the two product bundles have two items in common, the buying motives of the two consumers might be quite different because the compositions of the bundles differ. Lily’s target item may have been the most expensive one, Chanel Chance, and she may also have wanted to try some other brands or fragrances, so she chose the two sample-size perfumes as supplementary items to form a product bundle. Lucy also preferred perfumes from luxury brands, but she might have been limited by her purchasing budget, or she may simply have needed several items together to be eligible for free shipping. Note, however, that association rule mining may discover both bundles as itemsets if each was purchased frequently by a different group of consumers. Consequently, recommendations based on such itemsets may not be effective for all potential consumers.
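As a rough illustration of this non-personalized baseline, the following minimal Python sketch counts the support of item pairs, which is the first step of frequent-itemset (association rule) mining. The transactions, the third perfume name, and the 0.5 support threshold are hypothetical, loosely modeled on Figure 1. The consumer identity is present in the data but never used, so a frequent pair such as (Burberry Sample, Chanel Sample) is mined in exactly the same way regardless of who bought it.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (fraction of transactions containing
    both items) is at least min_support. Consumer identity plays no role."""
    counts = Counter()
    for items in transactions:
        # Sort so that each unordered pair has a single canonical key.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical (consumer, items) records; only the items reach the miner.
purchases = [
    ("Lucy", ["Chanel Sample", "Burberry Sample", "Dior Sample"]),
    ("Lily", ["Chanel Chance", "Chanel Sample", "Burberry Sample"]),
    ("Ann", ["Chanel Sample", "Burberry Sample"]),
    ("Bob", ["Canon PowerShot", "SanDisk SD card"]),
]
pairs = frequent_pairs([items for _, items in purchases], min_support=0.5)
print(pairs)  # {('Burberry Sample', 'Chanel Sample'): 0.75}
```

The mined pair carries no trace of whether it came from Lucy’s all-samples bundle or Lily’s full-size-plus-samples bundle, which is the limitation discussed above.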
Therefore, designing and recommending product bundles in a personalized manner that takes into account consumers’ various preferences and buying motives remains a challenging task.

Fig. 1. An example of product bundle purchase.

In parallel, the phenomenon of product bundling has been studied in both economics and marketing [Bakos and Brynjolfsson 1999; Derdenger and Kumar 2013; Garfinkel et al. 2006; Stremersch and Tellis 2002; Yan et al. 2014]. These studies have distinguished different types of product bundles and found that the relationships between products (e.g., complements and substitutes) can influence the performance of product bundling. The strategy of product bundling has been shown empirically to be economically effective for both merchants and consumers. In prior studies, the associations between items are mostly established through their co-occurrence in consumers’ purchased bundles; however, the underlying reasons why items are bought together have rarely been investigated, and in particular these studies do not address product bundling from a personalized viewpoint. Meanwhile, some efforts have been made to understand consumers’ preferences toward products and to design personalized recommendation approaches. However, these approaches mostly consider each purchased item in isolation and ignore the co-occurrence of items in product bundles. In fact, the co-occurrence of items in a bundle usually reflects the relationships between items with different features, which can help reveal consumers’ preferences toward different types of product bundles. Consider Figure 1 again: the two customers seem quite similar in terms of the items they have bought. Thus, in traditional recommendation methods such as collaborative filtering [Adomavicius and Tuzhilin 2005; Ricci et al. 2011], the item Chanel Chance bought by Lily may be recommended to Lucy because of the similarity in their purchase histories (i.e., both bought Chanel Sample and Burberry Sample). Such a recommendation is unlikely to be effective, because consumers like Lucy would be unlikely to accept the expensive full-size perfume as part of a bundle purchase. Another example, buying digital cameras as bundles, is as follows.
Tom is buying a Nikon D810 camera body at $2499 with a SanDisk SD card at $20 and a LensPen kit at $12, while Wim is buying a Canon PowerShot SX600 camera at $169 with a SanDisk SD card at $20 and a LensPen kit at $12. Although Tom and Wim share the same subset of items (i.e., a SanDisk SD card and a LensPen kit), they may not be the same kind of consumer, as they differ in their preferences for features such as price, functionality, and type: Tom is a DSLR camera user with a photography hobby, while Wim is a compact camera user buying for simple family use who knows very little about photography. In a nutshell, bundling makes items associated with each other. The relationships between items are established not only by their co-occurrence in a bundle but also by their association on particular feature aspects. By identifying the relationships between items, the buying motives for product bundles can be discovered in terms of how consumers weigh the related feature aspects. This motivates our effort to recommend personalized bundles in light of feature aspects.

Therefore, in this article, the item relationships in product bundles are modeled to better capture consumers’ preferences and buying motives. First, each product bundle consists of several items, and items are usually related to each other on particular feature aspects. Thus, we introduce a bundle graph to represent the relationships between items, in which the items are regarded as nodes and every two items within the purchased product bundle are connected with an edge. Second, the nodes in bundle graphs carry the features of the items they represent. Since the features reflect different aspects of items, we introduce the latent factor Node-Type to cluster nodes over the mixture of features and reduce the nodes to different feature aspects. Third, the relationships between items arise from the association between item feature aspects, and we introduce the latent factor Edge-Type to explain the formation of edges in bundle graphs. Each edge-type can be viewed as an abstraction of the relationships between the nodes at its two ends, connecting pairs of particular node-types. Therefore, the type of a product bundle can be represented as a mixture of edge-types. Moreover, consumers’ buying motives for product bundles can be represented as preferences toward different edge-types. Along this line of thinking, a probabilistic model named Bundle Purchase with Motives (BPM) is proposed to capture the relationships between items for each purchased product bundle. The model generates product bundle purchases in a probabilistic framework and organizes the transaction records as bundle graphs. The generative process of bundle graphs starts from consumers, proceeds to edge-types and then to pairs of node-types, and ends at the nodes connected by each edge, which represent the purchased items.
Meanwhile, the nodes are mapped to the latent factor node-type according to their item features, which are modeled in a separate generative process. The model is learned to analyze consumers’ preferences toward different types of product bundles, which further helps explain their buying motives. The consumers’ preferences inferred from the model are then exploited to design an approach for recommending bundling items so that consumers can form bundles in a personalized manner. Finally, experimental evaluations on real-world online shopping data are conducted to validate the performance of the proposed recommendation approach. The experimental results show the advantages of the proposed model, both in understanding buying motives and in recommending product bundles.

2. RELATED WORK

Existing research related to our work can be categorized into three streams: studies of product bundling in marketing and economics, recommender systems research, and link analysis for graph structures.

In marketing and economics studies, product bundling has mainly been discussed as a promotional strategy [Harlam et al. 1995; Stremersch and Tellis 2002]. Stremersch and Tellis [2002] systematically defined product bundling and identified different types of product bundling. They also aimed to help companies make better decisions in designing different types of product bundles, as they found that different bundles could bring distinct performance in sales and profits. Bakos and Brynjolfsson [1999] found that bundling unrelated information goods could be profitable, and they also analyzed the influence of complements and substitutes on product bundling. Yan et al. [2014] revealed that product complementarity significantly influences the performance of product bundles. Derdenger and Kumar [2013] studied product bundling from the perspective of customer valuations in dynamic settings, and also examined how customer segmentation affects the effectiveness of product bundling. In brief, the literature in this field mostly focuses on the economic efficiency of designing product bundles for companies or consumers. However, its findings and the corresponding bundling strategies are usually not individual-oriented, and they can hardly prescribe specific strategies for how to combine items into product bundles.

In recommender systems research, several efforts have addressed the bundle recommendation problem, with settings based on defined optimization goals such as customers’ utility or companies’ profits. Xie et al. [2010] defined composite recommendation as recommending a set of items that meets users’ predefined cost requirements. They developed greedy algorithms to generate top-k recommendations that maximize the total value of the items in the package for users. Garfinkel et al. [2006] formulated the recommendation problem as a set covering problem in which the cost of bundling is minimized. By approximating the probability of users’ buying behaviors from purchase history, Zhu et al. [2014] defined the reward of buying a product bundle in a user-specific probabilistic setting and proposed an optimization algorithm to maximize the total reward. However, these efforts only considered the overall benefits of product bundles and did not address the composition of a product bundle. In addition, probabilistic models have recently been employed to describe users’ choices and purchasing histories, in which items are clustered into distinct topics and users’ preferences for items are represented as distributions over latent structures [Chua et al. 2013; Liu et al. 2014; Ye et al. 2012]. Wang and Blei [2011] proposed a collaborative topic model to generate users’ choices of documents and recommended documents in combination with collaborative filtering metrics. In application contexts, Ge et al. [2014] investigated how to recommend travel packages in a cost-effective way, capturing users’ preferences for cost-effectiveness. Moreover, group recommendation, which aims to recommend items to a specific group composed of multiple users, has also become a focal point of research [Gorla et al. 2013; Liu et al. 2012; Yuan et al. 2014]. Overall, these studies recommend products individually, regardless of the relationships between items bought together in one transaction.

In link analysis for graphs, researchers have developed different relational learning methods to capture the relationships between entities. Blei and Lafferty [2007] proposed the Correlated Topic Model to discover the correlations between topics and construct a topic graph, and applied the model to large document collections. Yu et al. [2006] exploited nonparametric learning approaches to study relational data and designed a stochastic relational model. Recent studies on link analysis have also incorporated node attributes and contextual information in predicting links in graphs. Nallapati et al. [2008] jointly modeled texts and the links established by citations in a topic-modeling framework to predict links and identify community structures. Wang et al. [2007] proposed a probabilistic model incorporating both topological and semantic similarities of nodes in order to predict the probability of co-occurrence of nodes in social networks. Gong et al. [2014] integrated node attributes and link structures to predict both attributes and links in social networks. Moreover, some efforts have applied the mixed membership stochastic blockmodel [Airoldi et al. 2009] to predict links in social networks. Jamali et al. [2011] proposed to model both social relations and user rating data in a stochastic blockmodel framework, which allows users and items to be assigned to different groups. Barbieri et al. [2014] aimed to explain whether links are formed for topical or social reasons; they considered node features to develop a model, WTFW, for generating social network structure, which was further used to recommend links with explanations. In sum, these efforts regard the relationships between entities as links in graphs; however, the links in these studies are treated uniformly and are not categorized into different types.


3. BUNDLE PURCHASE WITH MOTIVATIONS

3.1. Problem Statement

Shopping is a comprehensive decision-making process for consumers, and the process is generally associated with their behavioral patterns. When deciding to buy a product bundle, consumers consider each individual item along with how the items within the bundle can jointly increase the utility of the purchase. Merchants, in turn, aim to understand consumers’ behaviors in buying product bundles and to recommend appropriate product bundles that meet consumers’ purchasing preferences. Let U be the set of consumers, I be the item set, and F be the feature set of items. Each consumer u ∈ U has made multiple transactions Tu, where each transaction t ∈ Tu is composed of several items in I. Each item x ∈ I has a set of features Fx ⊂ F (e.g., price, brand, and function). Transactions with two or more items can be regarded as product bundles. In this article, rather than modeling consumers’ preferences toward individual products, we propose to model their preferences toward product bundles in order to jointly uncover the relationships between items. Moreover, the preferences can be further exploited to explain consumers’ buying motives and to recommend a list of items for consumers to form their respective product bundles. Thus, given a consumer u and an item x ∈ I already in the shopping cart, the recommendation problem is defined as recommending a list of items R(t) such that u may choose items in R(t) to accompany item x and form a personalized product bundle for u.

3.2. General Ideas

According to marketing theories of consumer behavior [Peter et al. 1999], consumers usually follow particular behavioral patterns when buying products [Liu et al. 2015], which are driven by their preferences and innate motives. Before buying a product, consumers consider different aspects of the candidate products, including brand, price, function, and so on, and finally choose the product that best fits their preferences. For a product bundle, consumers further consider how buying the products together benefits them compared with buying each separately. In this way, consumers may behave differently when buying various types of product bundles [Stremersch and Tellis 2002; Garfinkel et al. 2006; Harlam et al. 1995]. In this article, we aim to find consumers’ preferences for product bundles consistent with the following intuitions.

Intuition 1: Each item has multiple feature aspects such as brand, price, function, and so on; thus, items can be categorized into several groups with respect to these feature aspects. Each group corresponds to items with similar feature aspects.

Intuition 2: Items in a product bundle are usually related to each other, and such relationships can be described by the associations of their feature aspects. Item relationships can be categorized in accordance with specific feature aspects.

Intuition 3: All the pairwise item relationships within a product bundle together constitute the pattern of the product bundle and further reflect the buying motives.

Intuition 4: Consumers’ choices of product bundles can be reflected by their preferences for different relationships between items. For example, budget-sensitive consumers may be attracted by a pair of items that costs less together than when purchased separately, in order to save money.
Accordingly, to describe the relationships between items, we introduce a graph for each bundle in which nodes represent items and edges connecting nodes represent associations between items in the bundle. In this way, each bundle is modeled as a complete undirected graph, named a bundle graph. Thus, a product bundle with n items is represented by an n-node bundle graph with n(n−1)/2 edges. For illustrative purposes, Figure 2 shows three product bundles, represented by bundle graphs with two, three, and four nodes having 1, 3, and 6 edges, respectively.

Fig. 2. Example of bundle graphs with different numbers of nodes.

Fig. 3. The framework of modeling intuitions for product bundles.

Note that the nodes in the graph are heterogeneous, since each node carries the features of the item it represents. This gives rise to the notion of node attributes with respect to item features (e.g., price and brand). As shown in Figure 3, each node in the bundle graph is attributed with its corresponding item features¹. With respect to Intuition 1, the attribute values a node takes (such as “high” for price) may help categorize the node into a certain group or type of nodes (such as an “expensive” node group). In this regard, a latent factor Node-Type can be introduced, which is analogous in spirit to the topics of documents in topic models, each of which is a mixture over words [Blei et al. 2003]. Thus, each node-type is represented by a mixture of node features and corresponds to a group of nodes with similar features. As an example, in Figure 3 node A belongs to the type “full size perfume of luxury brand with high price,” and nodes B and C belong to the type “sample size perfume of luxury brand with low price,” where nodes A and B take different values for the features size and price, while nodes B and C share similar node-types. With the latent factor node-type, each node belongs to a particular type to a certain degree, which can be described by a probability distribution over node-types, as shown in Figure 3.

¹Hereafter, for the sake of convenience, the terms “attribute(s)” and “feature(s)” are used interchangeably unless otherwise indicated.

Fig. 4. Bundle graphs of Lucy’s and Lily’s purchases. The symbol “+” on the edges denotes relationships between two similar nodes, while the symbol “-” denotes relationships between two different nodes.

With respect to Intuition 2, items are associated with each other through their roles on particular feature aspects (e.g., price and brand). Correspondingly, the nodes in a bundle graph are connected by edges in light of the feature aspect concerned. For instance, when items are functionally compatible, the corresponding nodes are considered to be connected by “functional” edges, whereas when items are bought together as a bundle because of brand preferences, the nodes are connected by edges attributed to branding. Similarly, when items are bought together because of budget sensitivity, the nodes are connected by edges attributed to price. In this regard, the relationships between items are modeled as the formation of edges with respect to node features in bundle graphs. Each edge connects two nodes of certain types, and the formation of the edge can be attributed to the pair of node-types at its two ends. Thus, another latent factor, Edge-Type, can be introduced to semantically represent the formation of edges. An edge-type can be represented by a group of similar edges that connect similar pairs of node-types; thus, the formation of edges can be further attributed to the association of particular node-types. In other words, an instance of an edge-type with respect to two node-types is an edge between two nodes, each belonging to one of the node-types concerned. Symbolically, an edge-type can be represented by a pair of connected node-types such as (node-type #1, node-type #2).
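The bundle-graph construction and the reading of an edge-type as a pair of node-types can be sketched in a few lines of Python. This is a deterministic simplification under stated assumptions: each node is given a single hard node-type label, whereas BPM treats node-types and edge-types as latent, probabilistic memberships. The node names (including a third sample perfume "D" for Lucy) and the type labels are hypothetical, loosely following the perfume example. The sketch builds the complete undirected graph, checks that an n-item bundle has n(n−1)/2 edges, and summarizes a bundle by the proportions of its edge-types.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def bundle_graph_edges(items):
    """Edges of the bundle graph: every unordered pair of items in the bundle."""
    return list(combinations(items, 2))


def edge_type(node_type, x, y):
    """Hard edge-type of edge (x, y): the unordered pair of its end nodes' types."""
    return tuple(sorted((node_type[x], node_type[y])))


def edge_type_mixture(items, node_type):
    """Fraction of the bundle's edges that fall into each edge-type."""
    edges = bundle_graph_edges(items)
    counts = Counter(edge_type(node_type, x, y) for x, y in edges)
    return {t: Fraction(c, len(edges)) for t, c in counts.items()}


# Hypothetical hard node-type labels; BPM would infer soft memberships instead.
node_type = {
    "A": "full/high-price",  # e.g., Lily's full-size Chanel Chance
    "B": "sample/low-price",
    "C": "sample/low-price",
    "D": "sample/low-price",
}

# A complete undirected graph on n nodes has n(n-1)/2 edges: 1, 3, 6, 10, ...
assert [len(bundle_graph_edges(range(n))) for n in (2, 3, 4, 5)] == [1, 3, 6, 10]

lucy = ["B", "C", "D"]  # three samples: all edges share one edge-type ("+")
lily = ["A", "B", "C"]  # full size + two samples: two edge-types ("-" and "+")
print(edge_type_mixture(lucy, node_type))
print(edge_type_mixture(lily, node_type))
```

Under these assumptions, Lucy’s bundle puts all of its edges in a single edge-type, whereas Lily’s splits them 2/3 to 1/3, even though the edge (B, C) occurs in both. A mixture over edge-types is what lets the two bundles be told apart at the bundle level.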
For example, in Figure 3, we have the edge-type (“full size perfume of luxury brand with high price,” “sample size perfume of luxury brand with low price”). Here, an edge-type can be semantically explained as a form of relationship between items, which, more fundamentally, reflects consumers’ motives in buying product bundles. Similarly to node-types, each edge belongs to a certain type to some degree, and a probability distribution is used to describe the formation of edges. However, a single edge, considered independently, cannot reflect the overall characteristics of a bundle graph. With respect to Intuition 3, the relationships between items together make up a particular type of product bundle. Considering Figure 1 again, the bipartite graph of the two consumers’ purchases can be modeled as two bundle graphs, as shown in Figure 4. The three nodes in Lucy’s bundle graph (Figure 4(a)) have similar feature values, such as luxury perfume, sample size, and low price. The three nodes may then be labeled with the same node-type, “sample perfume with low price,” so that the three edges belong to the identical edge-type “(sample perfume with low price, sample perfume with low price),” denoted with the symbol “+”. In contrast, in Lily’s case (Figure 4(b)), because the item Chanel Chance takes different values on size and price, its node may be labeled with another type, “full perfume with high price.” This node therefore forms edges of the edge-type (“full perfume with high price,” “sample perfume with low price”) with the other two nodes, denoted with the symbol “-”. Note that the edge BC appears in both bundle graphs; however, the overall mixtures of edge-types over all the edges differ between the two cases, yielding two types of product bundles with different buying motives. Thus, in modeling bundle graphs, we need to consider all the nodes and edges at the bundle level. In other words, the two bundle graphs in Figure 4 represent two different buying patterns, corresponding to Lucy’s and Lily’s different preferences in bundle purchases. As mentioned above, this enables us to treat consumers differently according to their personalized preferences. In accordance with Intuition 4, consumers’ preferences for product bundles can be decomposed into preferences for different relationships between items; correspondingly, we model consumers’ preferences as a mixture over edge-types.

3.3. Model Specification

With the above discussion and intuitions, we model each consumer’s transaction records of bundle purchases in a probabilistic model. First, each transaction t of consumer u ∈ U is represented as a bundle graph Gt = (Vt, Et), where Vt denotes the set of nodes representing the items and Et denotes the edges between every two nodes. Let Nt = |Vt|; since the bundle graph for t is an undirected complete graph, the number of edges is |Et| = Nt(Nt − 1)/2. To capture the composition of bundle graphs, we need to model the generative process of all the nodes and edges, along with the features of the nodes. In a bundle graph, each edge e connects a pair of nodes, and we use (x) and (y) to denote the two ends of the edge. With the latent factor node-type, all nodes in bundle graphs can be clustered on certain features, with each cluster represented by a node-type z. Thus, in our model, any node x is sampled from the type-specific multinomial distribution over nodes, whose entry for (z, x) measures the degree to which node x belongs to type z. We denote the types of the two nodes by z(x) and z(y), respectively; the pair of node-types (z(x), z(y)) then constitutes the type le of the edge e = ⟨(x), (y)⟩ connecting the two nodes. The node-types are sampled from the multinomial distribution of node-types given the edge-type, ϒl,z, which measures the degree to which node-type z belongs to edge-type l. Moreover, consumers’ preferences are manifested by the mixture of edge-types in their bundle graphs. Therefore, for each edge e ∈ Et in the bundle graph of consumer u, the edge-type le is drawn from a consumer-specific multinomial distribution over edge-types, whose entry for (u, l) measures the consumer’s preference for edge-type l. Overall, the generative process of the bundle graph structure starts from the consumer, proceeds to the edge-type and then to the pair of node-types, and ends at the nodes representing the items.
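The edge-level generative story just described (consumer, then edge-type, then a pair of node-types, then the two end nodes) can be simulated with a short Python sketch. It is an illustration under assumptions, not the authors' implementation: theta_u and phi are hypothetical names chosen here for the consumer-specific distribution over edge-types and the node-type-specific distribution over nodes, upsilon stands for ϒ, the sizes and Dirichlet hyperparameters are arbitrary, and Dirichlet draws are produced from normalized Gamma variates using only the standard library. Node features and inference are left out.

```python
import random


def dirichlet(alpha, k, rng):
    """Symmetric Dirichlet(alpha) sample of length k, via normalized Gamma draws."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(g)
    return [v / total for v in g]


def categorical(probs, rng):
    """Sample an index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_bundle_edges(n_nodes, theta_u, upsilon, phi, rng):
    """Simulate the edge-level generative story for one bundle graph of consumer u.

    For each of the n_nodes*(n_nodes-1)/2 edges of a complete graph:
      edge-type   l        ~ Multi(theta_u)     (consumer's edge-type preference)
      node-types  z_x, z_y ~ Multi(upsilon[l])  (node-types given the edge-type)
      end nodes   x, y     ~ Multi(phi[z])      (items given each node-type)
    As in the generative process, each edge's end nodes are drawn independently.
    """
    edges = []
    for _ in range(n_nodes * (n_nodes - 1) // 2):
        l = categorical(theta_u, rng)
        z_x, z_y = categorical(upsilon[l], rng), categorical(upsilon[l], rng)
        x, y = categorical(phi[z_x], rng), categorical(phi[z_y], rng)
        edges.append((l, z_x, z_y, x, y))
    return edges


rng = random.Random(0)
L, K, n_items = 3, 4, 10  # edge-types, node-types, catalogue size (arbitrary)
theta_u = dirichlet(0.5, L, rng)                        # hypothetical name
upsilon = [dirichlet(0.5, K, rng) for _ in range(L)]    # plays the role of ϒ
phi = [dirichlet(0.1, n_items, rng) for _ in range(K)]  # hypothetical name
edges = sample_bundle_edges(4, theta_u, upsilon, phi, rng)
print(len(edges))  # 6 edges for a 4-node bundle graph
for l, z_x, z_y, x, y in edges:
    print(f"edge-type {l}: node-types ({z_x}, {z_y}) -> items ({x}, {y})")
```

Learning the model runs in the opposite direction: given observed bundle graphs and node features, the latent edge-types and node-types are inferred, which this forward simulation does not attempt.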
In this way, all the nodes of a bundle graph and their connecting edges are organized. The nodes and their features are modeled in a separate plate. Specifically, each node $x$ has a set of features $(f_1^x, f_2^x, \ldots, f_j^x, \ldots)$, and we denote a node-feature pair as $a = (x, f_j^x)$. For each node-feature pair, a node-type $z$ is first sampled from a prior distribution $\theta$. Then, the node $x$ is sampled from the node-type $z$ via the multinomial distribution $\phi_{z,x}$, and the corresponding feature is sampled from the multinomial distribution of features given the node-type, $\psi_{z,f}$. In particular, $\phi_{z,x}$ measures the degree to which a node belongs to a type, and it bridges the nodes in the two separate plates (i.e., the plate of the bundle graph and the plate of the nodes with features).

ACM Transactions on Knowledge Discovery from Data, Vol. 11, No. 3, Article 28, Publication date: March 2017.

G. Liu et al.

More formally, the graphical model is shown in Figure 5, where shaded circles represent observed variables and blank circles represent latent variables. The generative process is described in detail as follows, and the notation is listed in Table I.

- Sample $\theta \sim \mathrm{Dir}(\beta)$
- For each consumer $u \in U$, sample $\pi_u \sim \mathrm{Dir}(\tau)$
- For each edge-type $l$, sample $\Upsilon_l \sim \mathrm{Dir}(\rho)$
- For each node-type $z$:
  - sample $\phi_z \sim \mathrm{Dir}(\alpha)$
  - sample $\psi_z \sim \mathrm{Dir}(\mu)$
- For each transaction $t \in T_u$ purchased by $u$, represent $t$ by a graph $G_t = (V_t, E_t)$:
  - For each edge $e \in E_t$:
    - sample $l_e \sim \mathrm{Multi}(\pi_u)$
    - sample the types of the two nodes $\langle x_e, y_e \rangle$: $z_e^x \sim \mathrm{Multi}(\Upsilon_{l_e})$, $z_e^y \sim \mathrm{Multi}(\Upsilon_{l_e})$
    - sample $x_e \sim \mathrm{Multi}(\phi_{z_e^x})$, $y_e \sim \mathrm{Multi}(\phi_{z_e^y})$
- For each node-feature pair $a = (x \in I, f_j \in F)$:
  - sample node-type $z \sim \mathrm{Multi}(\theta)$
  - sample node $x \sim \mathrm{Multi}(\phi_z)$
  - sample feature $f_j \sim \mathrm{Multi}(\psi_z)$

Fig. 5. The graphical model of BPM.

Table I. Notations

| Symbol | Description |
|---|---|
| $U, I, F$ | User set, item set, feature set |
| $G_t = (V_t, E_t)$ | Bundle graph of transaction $t$ with nodes $V_t$ and edges $E_t$ |
| $e = \langle (x), (y) \rangle$ | Edge $e \in E_t$, with $(x)$ and $(y)$ as its two ends |
| $l$ | Edge-type |
| $z$ | Node-type |
| $\phi$ | Distribution of nodes given a node-type |
| $\psi$ | Distribution of features given a node-type |
| $\Upsilon$ | Distribution of node-types given an edge-type |
| $\pi$ | Distribution of edge-types for a consumer |
| $\theta$ | Multinomial distribution of node-types |
| $\alpha, \beta, \tau, \rho, \mu$ | Dirichlet priors for $\phi, \theta, \pi, \Upsilon, \psi$ |
| $L$ | Number of edge-types |
| $K$ | Number of node-types |

3.4. Model Inference

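Let $\Lambda = \{\alpha, \beta, \tau, \rho, \mu\}$ denote the hyperparameters of the Dirichlet priors, and let $\Theta = \{\pi, \Upsilon, \phi, \psi, \theta\}$ denote the multinomial parameters. Writing $\tilde{\mathbf{x}}$ for the nodes of the node-feature plate and $\mathbf{z}$ for their node-types, and following the generative process of BPM, the joint distribution can be factored as

$$
p(\mathbf{z}, \mathbf{l}, \mathbf{x}, \mathbf{y}, \mathbf{f}, \mathbf{u}, \Theta \mid \Lambda) = p(\mathbf{l} \mid \mathbf{u}; \pi)\, p(\mathbf{z}^x, \mathbf{z}^y \mid \mathbf{l}; \Upsilon)\, p(\mathbf{x} \mid \mathbf{z}^x; \phi)\, p(\mathbf{y} \mid \mathbf{z}^y; \phi)\, p(\mathbf{z} \mid \theta)\, p(\tilde{\mathbf{x}} \mid \mathbf{z}; \phi)\, p(\mathbf{f} \mid \mathbf{z}; \psi)\, p(\Theta \mid \Lambda). \tag{1}
$$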

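We first integrate out the parameters in each term of Equation (1):

$$
p(\mathbf{l} \mid \mathbf{u}; \tau) = \int p(\mathbf{l} \mid \mathbf{u}; \pi)\, p(\pi \mid \tau)\, d\pi = \prod_{u} \frac{1}{\Delta(\tau)} \int \prod_{l=1}^{L} \pi_{u,l}^{\,n_u^l + \tau_l - 1}\, d\pi_u = \prod_{u} \frac{\Delta(\vec{n}_u + \tau)}{\Delta(\tau)}, \tag{2}
$$

where $\vec{n}_u = \{n_u^l\}_{l=1}^{L}$, and $n_u^l$ denotes the number of edges in $u$’s bundle graphs that are assigned to edge-type $l$. $\Delta(\tau)$ is computed as

$$
\Delta(\tau) = \frac{\prod_{l=1}^{L} \Gamma(\tau_l)}{\Gamma\left(\sum_{l=1}^{L} \tau_l\right)},
$$

where $\Gamma(\cdot)$ denotes the Gamma function. Similarly, we obtain the integrated forms of the other terms:

$$
p(\mathbf{z}^x, \mathbf{z}^y \mid \mathbf{l}; \rho) = \prod_{l} \frac{\Delta(\vec{n}_l^{(x)} + \vec{n}_l^{(y)} + \rho)}{\Delta(\rho)}, \tag{3}
$$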

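where $\vec{n}_l^{(x)} = \{n_l^{(x),k}\}_{k=1}^{K}$ (and analogously $\vec{n}_l^{(y)}$), and $n_l^{(x),k}$ ($n_l^{(y),k}$) denotes the count of node-type $k$ assigned at position $(x)$ ($(y)$) of edges with edge-type $l$. Since $p(\mathbf{x} \mid \mathbf{z}^x)$, $p(\mathbf{y} \mid \mathbf{z}^y)$, and $p(\tilde{\mathbf{x}} \mid \mathbf{z})$ share the same multinomial distribution $\phi$, we can integrate out this parameter jointly:

$$
p(\mathbf{x} \mid \mathbf{z}^x; \alpha)\, p(\mathbf{y} \mid \mathbf{z}^y; \alpha)\, p(\tilde{\mathbf{x}} \mid \mathbf{z}; \alpha) = \prod_{z} \frac{\Delta(\vec{n}_z^{(x)} + \vec{n}_z^{(y)} + \vec{n}_z^{(\tilde{x})} + \alpha)}{\Delta(\alpha)}, \tag{4}
$$

where $\vec{n}_z^{(x)} = \{n_z^{(x),i}\}_{i=1}^{|I|}$ (analogously for $(y)$ and $(\tilde{x})$), and $n_z^{(x),i}$ denotes the count of node $i$ assigned to node-type $z$ at position $(x)$. Likewise,

$$
p(\mathbf{f} \mid \mathbf{z}; \mu) = \prod_{z} \frac{\Delta(\vec{n}_z + \mu)}{\Delta(\mu)}, \tag{5}
$$

where $\vec{n}_z = \{n_z^f\}_{f=1}^{|F|}$, and $n_z^f$ denotes the count of feature $f$ assigned to node-type $z$.

The model has two latent variables, $l$ and $z$, so we employ collapsed Gibbs sampling to sample the latent assignments and update the parameters. Specifically, the sampler sequentially updates the edge-type assignment $l_e$ of each edge of a transaction, the node-types $z_e^{(x)}$ and $z_e^{(y)}$ of the two nodes of each edge, and the node-type assignment $z$ of each node-feature pair. We therefore sample from the full conditional distributions $p(l_e \mid \mathbf{l}_{\neg e}, \mathrm{Rest})$, $p(z_e^{(x)} \mid \mathbf{z}^x_{\neg e}, \mathrm{Rest})$, and $p(z_a \mid \mathbf{z}_{\neg a}, \mathrm{Rest})$, where Rest denotes all remaining variables except the one being sampled. The full conditional distribution of the edge-type $l_e$ for each edge $e \in G_t$ can then be derived as follows:

$$
\begin{aligned}
p(l_e = c \mid \mathbf{l}_{\neg e}, \mathrm{Rest})
&\propto \frac{p(\mathbf{l} \mid \mathbf{u})\, p(\mathbf{z}^x, \mathbf{z}^y \mid \mathbf{l})}{p(\mathbf{l}_{\neg e} \mid \mathbf{u}_{\neg e})\, p(\mathbf{z}^x_{\neg e}, \mathbf{z}^y_{\neg e} \mid \mathbf{l}_{\neg e})} \\
&\propto \frac{n_{u,\neg e}^{c} + \tau_c}{\sum_{l=1}^{L} (n_{u,\neg e}^{l} + \tau_l)}
\cdot \frac{\prod_k \Gamma(n_c^{(x),k} + n_c^{(y),k} + \rho_k)}{\prod_k \Gamma(n_{c,\neg e}^{(x),k} + n_{c,\neg e}^{(y),k} + \rho_k)}
\cdot \frac{\Gamma\big(\sum_k (n_{c,\neg e}^{(x),k} + n_{c,\neg e}^{(y),k} + \rho_k)\big)}{\Gamma\big(\sum_k (n_c^{(x),k} + n_c^{(y),k} + \rho_k)\big)}.
\end{aligned}
\tag{6}
$$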
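As a numerical companion to Equation (6), the following Python sketch evaluates the edge-type full conditional from count arrays with log-Gamma functions (i.e., the $\Delta$ ratios of Equations (2) and (3) before simplification) and normalizes over the $L$ candidate edge-types. The function name, argument names, and array layout (node-type counts pooled over positions $(x)$ and $(y)$) are our assumptions. The consumer-side denominator is omitted because it does not depend on $c$ and cancels in the normalization; the Gamma-function recurrence discussed next yields the same values without evaluating $\Gamma$.

```python
from math import exp, lgamma, log


def edge_type_conditional(n_u_minus, m_minus, k1, k2, tau, rho):
    """Normalized p(l_e = c | l_-e, Rest) for every edge-type c, following Equation (6).

    n_u_minus[c]  : edges of consumer u with edge-type c, excluding edge e
    m_minus[c][k] : n^{(x),k}_{c,-e} + n^{(y),k}_{c,-e}, node-type counts excluding e
    k1, k2        : current node-types of the two ends of e
    tau, rho      : Dirichlet hyperparameters (length L and K)
    """
    log_weights = []
    for c in range(len(n_u_minus)):
        before = [m + r for m, r in zip(m_minus[c], rho)]  # counts without e, plus prior
        after = list(before)
        after[k1] += 1  # adding e back contributes one node at (x) ...
        after[k2] += 1  # ... and one at (y); if k1 == k2 that type gains 2
        lw = log(n_u_minus[c] + tau[c])
        lw += sum(lgamma(a) for a in after) - sum(lgamma(b) for b in before)
        lw += lgamma(sum(before)) - lgamma(sum(after))
        log_weights.append(lw)
    top = max(log_weights)  # subtract the max before exponentiating for stability
    weights = [exp(v - top) for v in log_weights]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, with two edge-types, $\tau = \rho = 1$, all node-type counts zero, $k_1 = k_2 = 0$, and a consumer who already has three edges of type 0, `edge_type_conditional([3, 0], [[0, 0], [0, 0]], 0, 0, [1.0, 1.0], [1.0, 1.0])` returns approximately `[0.8, 0.2]`: the node-type factors are identical for both candidates, so only the consumer term $(3 + 1) : (0 + 1)$ matters.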

In Equation (6), $n_{u,\neg e}^{c}$ denotes the number of consumer $u$’s edges that belong to edge-type $c$, excluding edge $e$, and $n_{c,\neg e}^{(x),k}$ denotes the number of nodes located at position $(x)$ of edges with edge-type $c$ and node-type $k$, excluding edge $e$. We can further reduce the last two terms by applying the recurrence of the Gamma function, $\Gamma(x) = (x-1)\Gamma(x-1)$. In the second term, let the node-types of the two nodes of $e$ be $k_1$ and $k_2$. If $k_1 = k_2$, then $n_{c,\neg e}^{(x),k_1} + n_{c,\neg e}^{(y),k_1} = n_c^{(x),k_1} + n_c^{(y),k_1} - 2$; if $k_1 \neq k_2$, then $n_{c,\neg e}^{(x),k_1} + n_{c,\neg e}^{(y),k_1} = n_c^{(x),k_1} + n_c^{(y),k_1} - 1$, and likewise for $k_2$. In the last term, $\sum_k (n_{c,\neg e}^{(x),k} + n_{c,\neg e}^{(y),k}) = \sum_k (n_c^{(x),k} + n_c^{(y),k}) - 2$. Thus, we can apply $\Gamma(x+2) = (x+1)\Gamma(x+1) = (x+1)x\,\Gamma(x)$ to reduce each ratio to a product of at most two factors, which is directly computable.

For the two nodes of an edge $e = \langle (x_e), (y_e) \rangle$ in the bundle graph, let $s \in I$ be the node at $(x_e)$ and $r \in I$ the node at $(y_e)$. Their node-types $(z_e^{(x),s}, z_e^{(y),r})$ are sampled as follows (shown for $z_e^{(x),s}$; the case of $(y)$ is symmetric):

$$
\begin{aligned}
p(z_e^{(x),s} = k \mid \mathbf{z}^x_{\neg e}, \mathrm{Rest})
&\propto \frac{p(\mathbf{x} \mid \mathbf{z}^x)\, p(\mathbf{y} \mid \mathbf{z}^y)\, p(\tilde{\mathbf{x}} \mid \mathbf{z})\, p(\mathbf{z}^x, \mathbf{z}^y \mid \mathbf{l})}{p(\mathbf{x}_{\neg e} \mid \mathbf{z}^x_{\neg e})\, p(\mathbf{y} \mid \mathbf{z}^y)\, p(\tilde{\mathbf{x}} \mid \mathbf{z})\, p(\mathbf{z}^x_{\neg e}, \mathbf{z}^y \mid \mathbf{l})} \\
&\propto \frac{n_{l,\neg e}^{(x),k} + n_l^{(y),k} + \rho_k}{\sum_{z=1}^{K} (n_{l,\neg e}^{(x),z} + n_l^{(y),z} + \rho_z)}
\cdot \frac{n_{k,\neg e}^{(x),s} + n_k^{(y),s} + n_k^{(\tilde{x}),s} + \alpha_s}{\sum_{i=1}^{|I|} (n_{k,\neg e}^{(x),i} + n_k^{(y),i} + n_k^{(\tilde{x}),i} + \alpha_i)},
\end{aligned}
\tag{7}
$$

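where $l = l_e$ is the current edge-type of $e$, and $n_{k,\neg e}^{(x),s}$ denotes the number of times node $s$, located at position $(x)$ of an edge, is assigned node-type $k$, excluding edge $e$.

In the plate of node features, we need the full conditional distribution of $z_a$ for each node-feature pair $a$. For a node-feature pair $a = (x_i, f_j)$, the latent assignment is sampled as follows:

$$
\begin{aligned}
p(z_a = k \mid \mathbf{z}_{\neg a}, \mathrm{Rest})
&\propto \frac{p(\tilde{\mathbf{x}} \mid \mathbf{z})\, p(\mathbf{x} \mid \mathbf{z}^x)\, p(\mathbf{y} \mid \mathbf{z}^y)\, p(\mathbf{f} \mid \mathbf{z})\, p(\mathbf{z} \mid \beta)}{p(\tilde{\mathbf{x}}_{\neg a} \mid \mathbf{z}_{\neg a})\, p(\mathbf{x} \mid \mathbf{z}^x)\, p(\mathbf{y} \mid \mathbf{z}^y)\, p(\mathbf{f}_{\neg a} \mid \mathbf{z}_{\neg a})\, p(\mathbf{z}_{\neg a} \mid \beta)} \\
&\propto \frac{n_{k,\neg a} + \beta_k}{\sum_{z=1}^{K} (n_{z,\neg a} + \beta_z)}
\cdot \frac{n_k^{(x),i} + n_k^{(y),i} + n_{k,\neg a}^{(\tilde{x}),i} + \alpha_i}{\sum_{s=1}^{|I|} (n_k^{(x),s} + n_k^{(y),s} + n_{k,\neg a}^{(\tilde{x}),s} + \alpha_s)}
\cdot \frac{n_{k,\neg a}^{j} + \mu_j}{\sum_{f=1}^{|F|} (n_{k,\neg a}^{f} + \mu_f)},
\end{aligned}
\tag{8}
$$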

where $n_{k,\neg a}$ is the number of node-feature pairs assigned to node-type $k$ excluding pair $a$, $n_{k,\neg a}^{(\tilde{x}),i}$ is the number of times node $i$ in the node-feature plate is assigned node-type $k$ excluding $a$, and $n_{k,\neg a}^{j}$ is the number of times feature $j$ is assigned node-type $k$ excluding $a$.
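Equation (8) can be sketched in the same way. In this hypothetical helper, `item_counts_minus[k][s]` pools the counts of item $s$ under node-type $k$ over positions $(x)$, $(y)$, and the node-feature plate, with pair $a$ removed; the $\beta$ denominator is dropped because it is constant across $k$. The function name and array layout are assumptions for illustration, not the authors' code.

```python
def node_feature_conditional(i, j, n_pairs_minus, item_counts_minus, feat_counts_minus,
                             beta, alpha, mu):
    """Normalized p(z_a = k | z_-a, Rest) for the pair a = (item i, feature j), per Equation (8).

    n_pairs_minus[k]        : node-feature pairs with node-type k, excluding a
    item_counts_minus[k][s] : n_k^{(x),s} + n_k^{(y),s} + n_{k,-a}^{(x~),s}
    feat_counts_minus[k][f] : feature f assigned to node-type k, excluding a
    """
    weights = []
    for k in range(len(n_pairs_minus)):
        w = n_pairs_minus[k] + beta[k]  # node-type prior term (shared denominator dropped)
        item_row = item_counts_minus[k]
        w *= (item_row[i] + alpha[i]) / sum(c + a for c, a in zip(item_row, alpha))
        feat_row = feat_counts_minus[k]
        w *= (feat_row[j] + mu[j]) / sum(c + m for c, m in zip(feat_row, mu))
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]
```

With uniform priors and no other evidence, an item seen three times under node-type 0 and never under node-type 1 is pulled toward node-type 0: the item factor is $4/6$ versus $1/2$, giving probabilities $4/7$ and $3/7$.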



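After sampling the assignments of all the latent variables, the parameters can be updated as follows:

$$
\begin{aligned}
\pi_{u,l} &= p(l \mid u) = \frac{n_u^l + \tau_l}{\sum_{l'} (n_u^{l'} + \tau_{l'})}, \\
\Upsilon_{l,z} &= p(z \mid l) = \frac{n_l^z + \rho_z}{\sum_{z'} (n_l^{z'} + \rho_{z'})}, \\
\phi_{z,i} &= p(i \mid z) = \frac{n_z^{(x),i} + n_z^{(y),i} + n_z^{(\tilde{x}),i} + \alpha_i}{\sum_{i'} (n_z^{(x),i'} + n_z^{(y),i'} + n_z^{(\tilde{x}),i'} + \alpha_{i'})}, \\
\psi_{z,f} &= p(f \mid z) = \frac{n_z^f + \mu_f}{\sum_{f'} (n_z^{f'} + \mu_{f'})},
\end{aligned}
\tag{9}
$$

where $n_l^z = n_l^{(x),z} + n_l^{(y),z}$.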
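Each update in Equation (9) is the posterior mean of a Dirichlet-multinomial, i.e., a smoothed and normalized count vector. A minimal sketch, with hypothetical names and with the count tables assumed to be already pooled as in Equation (9) (e.g., node-type counts summed over positions $(x)$ and $(y)$ for $\Upsilon$):

```python
def posterior_mean(counts, prior):
    """(n + prior) / sum(n + prior), the point estimate used in Equation (9)."""
    total = sum(c + p for c, p in zip(counts, prior))
    return [(c + p) / total for c, p in zip(counts, prior)]


def estimate_parameters(n_ul, n_lz, n_zi, n_zf, tau, rho, alpha, mu):
    """Return (pi, upsilon, phi, psi) from sampled count tables.

    n_ul: dict consumer -> edge-type counts; n_lz: per edge-type node-type counts;
    n_zi: per node-type item counts; n_zf: per node-type feature counts.
    """
    pi = {u: posterior_mean(row, tau) for u, row in n_ul.items()}
    upsilon = [posterior_mean(row, rho) for row in n_lz]
    phi = [posterior_mean(row, alpha) for row in n_zi]
    psi = [posterior_mean(row, mu) for row in n_zf]
    return pi, upsilon, phi, psi
```

For instance, `posterior_mean([3, 1], [1.0, 1.0])` gives $[4/6, 2/6]$: the Dirichlet prior acts as pseudo-counts, so no estimated probability is ever exactly zero.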

3.5. Personalized Bundle Recommendation

The proposed BPM aims to uncover consumers’ preferences for different forms of bundles and to explain their buying motives through latent factors in a probabilistic model. Building on it, we design an approach for recommending supplementary items to be bundled with a target item. Our setting for this recommendation problem is as follows: given that user $u$ has a target item $x$ in transaction $t$, we recommend a list of products $R(t)$ for the user to form a product bundle. The recommendation problem can then be formulated as estimating the probability that $u$ would buy item $y$ together with $x$, i.e., $p(y \mid x, u)$, which can be derived with Bayes’ rule as follows:

$$
\begin{aligned}
p(y \mid x, u) \propto p(x, y \mid u) &= \sum_{l, z^{(x)}, z^{(y)}} p(l \mid u)\, p(z^{(x)}, z^{(y)} \mid l)\, p(x \mid z^{(x)})\, p(y \mid z^{(y)}) \\
&= \sum_{l, z^{(x)}, z^{(y)}} \pi_{u,l}\, \Upsilon_{l,z^{(x)}}\, \Upsilon_{l,z^{(y)}}\, \phi_{z^{(x)},x}\, \phi_{z^{(y)},y}.
\end{aligned}
\tag{10}
$$
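Given estimated $\pi$, $\Upsilon$, and $\phi$, Equation (10) can be evaluated directly. The sketch below uses hypothetical function names and toy parameters. It relies on the fact that, for a fixed edge-type $l$, the double sum over $z^{(x)}$ and $z^{(y)}$ factorizes into $(\sum_z \Upsilon_{l,z}\phi_{z,x})(\sum_z \Upsilon_{l,z}\phi_{z,y})$, so scoring one candidate costs $O(LK)$ rather than $O(LK^2)$.

```python
def bundle_score(u, x, y, pi, upsilon, phi):
    """Unnormalized p(y | x, u) from Equation (10).

    pi[u][l], upsilon[l][z], phi[z][item] are the estimated multinomials.
    """
    num_node_types = len(phi)
    score = 0.0
    for l, p_l in enumerate(pi[u]):
        # For fixed l, the sum over (z_x, z_y) splits into two independent sums.
        s_x = sum(upsilon[l][z] * phi[z][x] for z in range(num_node_types))
        s_y = sum(upsilon[l][z] * phi[z][y] for z in range(num_node_types))
        score += p_l * s_x * s_y
    return score


def recommend(u, x, candidates, pi, upsilon, phi, n=10):
    """Rank candidate items y by bundle_score and return the top n."""
    ranked = sorted(candidates, key=lambda y: bundle_score(u, x, y, pi, upsilon, phi),
                    reverse=True)
    return ranked[:n]


# Toy parameters: one consumer, 2 edge-types, 2 node-types, 3 items.
pi = {"u": [1.0, 0.0]}
upsilon = [[1.0, 0.0], [0.0, 1.0]]
phi = [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
print(recommend("u", 0, [1, 2], pi, upsilon, phi, n=1))  # → [1]
```

In the toy example, consumer `u` only uses edge-type 0, which pairs node-type 0 with itself, so item 1 (same node-type as the target item 0) outranks item 2.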

As shown in Equation (10), the probability that $u$ will buy $y$ along with the target item $x$ decomposes into the consumer’s preferences for edge-types (i.e., $p(l \mid u)$), the affiliation of node-types with edge-types (i.e., $p(z^{(x)}, z^{(y)} \mid l)$), and the affiliation of nodes with node-types (i.e., $p(x \mid z^{(x)})\, p(y \mid z^{(y)})$). Intuitively, for the item at $(y)$ to be recommended together with the item at $(x)$ with high probability, both items should have a high degree of membership in two related item groups that consumer $u$ is likely to prefer.

There are two common e-commerce scenarios to which the recommendation approach may apply. In the first, consumers have preliminary needs for certain items, namely the target items, when they log on to an e-commerce shopping site. Beyond the target item, they may consider buying other items to form a product bundle, according to their various preferences. In this scenario, consumers may first put the target item in the “shopping cart,” and the system can then recommend other items to form product bundles. The second scenario is based on the item webpages of online shopping sites. For instance, when users browse the webpage of a particular item, other items may be recommended under headings such as “customers who bought this also bought” or “frequently bought together.” However, such recommendations are generally not personalized with respect to feature aspects. With our proposed approach, we can personalize the webpage of an item based on the item relationships the consumer prefers. More specifically, when a consumer browses the webpage of an item, for example, an HP laptop, a system built on our approach can identify his/her preferred relationship formed with the laptop. A traditional system may recommend a cheap laptop bag because it is “frequently bought together” with the laptop; our system, however, may discover that the consumer prefers to buy items with “compatible functionality” and “comparable price” together, and may therefore recommend an HP display rather than a bag to form a bundle. In such a system, we can add another category of recommendation, “Most Preferred Item Relationships,” to the webpage.

Table II. Datasets Statistics

| Statistic | cosmetics | perfume |
|---|---|---|
| Number of users | 18,805 | 4,132 |
| Number of items | 4,914 | 1,252 |
| Number of transactions | 126,449 | 39,906 |
| Avg. number of items per transaction | 1.88 | 1.48 |
| Proportion of transactions with more than 2 items | 0.410 | 0.226 |
| Avg. number of transactions per user | 3.53 | 5.73 |
| Avg. price per transaction | 197.84 | 239.53 |
| Avg. standard deviation of price per transaction | 41.55 | 18.62 |

4. EXPERIMENTAL RESULTS

4.1. Experiment Setup

We obtained the transaction records of two online merchants on Taobao,² the largest e-commerce shopping site in China, owned by Alibaba. One merchant mainly sold skin care products, including cream, lotion, facial masks, and the like (the dataset is named cosmetics); the other merchant mainly sold perfume (the dataset is named perfume). Both datasets contain consumers’ transactions and the specific items bought in each transaction within a time period. The cosmetics data covers the transactions of the first four months of 2013, and the perfume data covers the whole year of 2013. To better capture consumers’ preferences for bundle purchases, we sampled consumers with frequent purchases, i.e., only consumers with more than three transactions and at least one product bundle purchase within the time period were retained. Moreover, item features were extracted, including name, brand, price, functionality, origin, fragrance, and so forth. With the detailed transactional records and item features, the datasets were used to train the proposed BPM model. The statistics of the datasets are listed in Table II. For both datasets, order sizes follow a long-tailed distribution, and transactions that included more than two items accounted for more than 40% of cosmetics and more than 20% of perfume. Since the proposed model processes transactions as pairs of items, we added a pseudo-item named “empty” to every transaction with only one item, so that single-item transactions are treated as a special case of product bundles. An “empty” item can be regarded as a real item with no features, and it forms an edge with the existing item in the transaction.
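The conversion of transactions into bundle-graph edges, including the “empty” pseudo-item for single-item transactions, can be sketched as follows. The function name and the choice to de-duplicate repeated items within a transaction are our assumptions, not details stated in the paper.

```python
from itertools import combinations

EMPTY = "empty"  # pseudo-item with no features, used for single-item transactions


def bundle_graph_edges(transaction):
    """Return the undirected edges of the complete bundle graph of one transaction.

    A transaction with N distinct items yields N * (N - 1) / 2 edges; a
    single-item transaction yields exactly one edge (item, EMPTY).
    """
    items = sorted(set(transaction))  # de-duplicate and fix an order for (x), (y)
    if len(items) == 1:
        return [(items[0], EMPTY)]
    return list(combinations(items, 2))


print(bundle_graph_edges(["lotion", "cream", "mask"]))
# → [('cream', 'lotion'), ('cream', 'mask'), ('lotion', 'mask')]
print(bundle_graph_edges(["perfume"]))  # → [('perfume', 'empty')]
```

Padding adds exactly one edge per single-item transaction, which is why the extra inference cost grows only linearly with the number of such transactions.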
Specifically, for transactions containing such an edge $e = \langle x, \text{empty} \rangle$, the edge-types and node-types are inferred with the regular inference procedure of Equations (6)–(8). We first compute the latent assignment of the edge, followed by the latent assignments of node $x$ and of “empty.” Because the “empty” item has no features, the last term in Equation (8) drops out, so we can still compute the latent assignment of the “empty” item. Meanwhile, adding “empty” items affects inference efficiency only linearly: each single-item transaction is formulated as one edge, so the number of item relationships to be inferred increases linearly with the number of such transactions.

² http://www.taobao.com.

Fig. 6. The price distribution.

The price of items is one of the key features of a product bundle, since consumers’ preferences toward bundles may be related to their purchasing power. The price distributions of the two datasets are shown in Figure 6: both datasets have skewed price distributions, with that of perfume being flatter. We also noticed that, apart from the cheap items, a considerable number of items in perfume were priced between 200 and 400. We calculated the standard deviation of price within each transaction; as shown in Table II, although the average price per transaction in perfume was higher than in cosmetics, the within-transaction variation was lower. To model the continuous price as a feature in BPM, we used equal-depth binning to discretize prices into intervals and treated each discretized price interval as one feature of the item. In the experiments, we fixed the number of intervals to 10. We also held out a proportion of transactions as test data by splitting each dataset according to transaction time with a ratio of 75/25. For cosmetics, we used the transactions of the first three months as the training set and those of the last month as the test set.
Analogously, the transaction records of perfume from January to September 2013 were used for training, and the remaining records were used for testing.

4.2. Model Training

Model training started with random assignments of the latent variables and iteratively updated the assignments and the parameters. We set symmetric hyperparameters with $\alpha = 1/|I|$, $\beta = 1/K$, $\rho = 1/K$, $\tau = 1/L$, and $\mu = 1/|F|$. We also fitted the model with different settings of the number of edge-types and the number of node-types. Figure 7 shows that the log likelihood increased with more Gibbs sampling iterations until the model converged. For both datasets, the likelihood increased sharply in the first few iterations and converged after 200 iterations. We also noticed that the models converged to different log-likelihood values under different parameter settings, meaning that these settings influence how well the model fits the transaction records.

Fig. 7. Log likelihood with iterations.

4.3. Experiments on Bundle Recommendation

With the method proposed in Section 3.5 (i.e., BPM), we experimented on the datasets to validate its bundle recommendation performance. Since the data obtained from the online shopping sites contain no information about the target item of a purchased product bundle, we assumed that the item with the highest price in a transaction was its target item, based on the observation that the prices of associated items were usually not greater than that of the target item in a transaction. Furthermore, the items bought together with a target item $x$ in the training set were treated as recommendation candidates.

4.3.1. Baseline Methods. For comparison with our proposed method, several baseline recommendation methods were considered and adapted to the bundle recommendation problem.

Bundle recommendation based on co-purchase (BRP): Several studies [Garfinkel et al. 2006; Xie et al. 2010; Zhu et al. 2014] have addressed the bundle recommendation problem in light of co-purchases. They usually predefine a goal, such as utility or revenue, and use greedy strategies to optimize it. Here, we considered one of the BRP algorithms [Zhu et al. 2014] for comparison, with the following optimization goal:

$$
\max \; \mathbf{r}^{\mathsf{T}} \mathbf{x} + \lambda\, \mathbf{x}^{\mathsf{T}} Q\, \mathbf{x}, \tag{11}
$$

where $\mathbf{r}$ is the reward vector and $Q$ is the cross-dependency matrix, computed from the co-purchase frequencies of items in the purchasing history.

User-based collaborative filtering (UBCF) [Adomavicius and Tuzhilin 2005]: In UBCF, users are recommended the items preferred by other, similar users. For bundle recommendation, each co-purchased pair of items was treated as a “meta-item,” and if the meta-item appeared in a user’s transactions, the corresponding rating entry was 1. User similarity based on the purchased item edges was computed via cosine similarity.

Probabilistic matrix factorization (PMF) [Mnih and Salakhutdinov 2007]: As in collaborative filtering, a user–item matrix was constructed, where each meta-item in the matrix was a pair of items that appeared together in transactions. The method decomposes the matrix into low-rank factors, denoted by user latent factors $U$ and item latent factors $V$. With the estimated latent factors, the score of each candidate item $y$ to accompany target item $x$ for user $u_i$ in a bundle is computed as

$$
\mathrm{score}(y; x, u_i) = \sum_{k} U_{i,k} V_{k,y}. \tag{12}
$$
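The UBCF and PMF baselines both operate on meta-items, i.e., co-purchased item pairs. The following is a minimal sketch of building a user’s binary meta-item profile and the cosine similarity that UBCF uses between two users. The function names are hypothetical, and storing a binary rating vector as the set of its nonzero entries is our design choice for sparsity, not a detail from the baseline papers.

```python
from itertools import combinations
from math import sqrt


def meta_item_profile(transactions):
    """Set of meta-items (unordered co-purchased item pairs) across a user's transactions.

    Each meta-item in the set corresponds to a rating entry of 1.
    """
    profile = set()
    for t in transactions:
        profile.update(combinations(sorted(set(t)), 2))
    return profile


def cosine(p, q):
    """Cosine similarity of two binary vectors represented as sets of nonzero entries."""
    if not p or not q:
        return 0.0
    return len(p & q) / sqrt(len(p) * len(q))


alice = meta_item_profile([["a", "b", "c"]])       # {(a,b), (a,c), (b,c)}
bob = meta_item_profile([["a", "b"], ["c", "d"]])  # {(a,b), (c,d)}
print(round(cosine(alice, bob), 4))  # → 0.4082
```

The two users share a single meta-item out of three and two, respectively, giving $1/\sqrt{6}$. Even in this tiny example most entries are unshared, which illustrates the sparsity problem these baselines face as the number of item pairs grows.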

Non-negative matrix factorization (NMF) [Koren et al. 2009]: Like PMF, NMF decomposes the rating matrix into user latent factors and item latent factors. We again treated each purchased item pair as a meta-item in the rating matrix and computed the score of each candidate with Equation (12).

Relational topic model (RTM) [Chang and Blei 2009]: RTM was originally designed to model the relationships between documents. To use it as a baseline, we treated each item as a document whose words are the item’s features. We can then predict from the model the probability that any two items are related, i.e., given one item $x$, the probability $p(y \mid x)$ that $x$ is related to another item $y$.

Non-feature BPM (NF-BPM): In the proposed BPM, transaction records and item features are modeled jointly. To examine the effect of feature modeling, which aims to capture more of the semantics of consumers’ behaviors through product feature aspects, BPM without feature modeling (i.e., NF-BPM) was also included for comparison.

4.3.2. Evaluation Metrics. To evaluate the accuracy of the recommendation list for forming a product bundle, we adopted metrics commonly used for traditional recommendation algorithms, namely Precision@N, Recall@N, and nDCG@N. For each transaction $t$ in the test set, we recommend a list of items $R_t = (y_1, y_2, \ldots, y_N)$, given that consumer $u$ has added item $x$ to the shopping cart. Let $I_t$ be the items actually bought in transaction $t$ along with $x$. Precision@N(t) is then the proportion of recommended items that actually appear in the transaction:

$$
\mathrm{Precision@}N(t) = \frac{|R_t \cap I_t|}{N}. \tag{13}
$$

Similarly, Recall@N(t) for the transaction is computed as

$$
\mathrm{Recall@}N(t) = \frac{|R_t \cap I_t|}{|I_t|}. \tag{14}
$$

Moreover, the ranking of the recommendation list also affects recommendation quality, since actually bought items should rank higher than the others in the list. We therefore also consider nDCG, where the Discounted Cumulative Gain (DCG) is computed as

$$
\mathrm{DCG@}N = rel_1 + \sum_{i=2}^{N} \frac{rel_i}{\log_2 i}, \tag{15}
$$

where $rel_i = 1$ if the $i$th item in the recommendation list was actually bought and $rel_i = 0$ otherwise ($i = 1, 2, \ldots, N$). Normalizing DCG@N, we have

$$
\mathrm{nDCG@}N = \frac{\mathrm{DCG@}N}{\mathrm{IDCG@}N},
$$


Fig. 8. The recommendation performance of cosmetics.

Fig. 9. The recommendation performance of perfume.

where IDCG@N is the maximum possible value of DCG@N, obtained with the optimal ranking of the recommendation list. We then average over all transactions in the test set to obtain Precision@N, Recall@N, and nDCG@N. Larger values of these metrics indicate better recommendation performance.

4.3.3. Recommendation Performance. We first compared the recommendation performance of the proposed method with that of the baseline methods in terms of these metrics. For each transaction in the test set, the item with the highest price was regarded as the target item, and a list of items was recommended for the customer to form product bundles. In the experiments, we set the number of latent variables to $L = 15$, $K = 30$ for cosmetics and $L = 5$, $K = 40$ for perfume, and obtained the recommendation lists from the proposed model. The performance comparison with the baseline methods on the two datasets is shown in Figures 8 and 9. Overall, the proposed model achieved the best performance on all evaluation metrics, and its simplified variant NF-BPM also performed well. BRP, whose recommendations are based on the co-occurrence of items in transactions, performed best among the baseline methods. Concretely, compared with BRP, BPM obtained improvements of 45.18% in Precision@15, 40.15% in Recall@15, and 45.91% in nDCG@15 on cosmetics, and of 15.22% in Precision@15, 11.4% in Recall@15, and 0.4% in nDCG@15 on perfume. The performance of RTM, which models the relationships between items, was also poorer than that of BPM, since RTM does not model bundle purchases at the personalized level. Furthermore, UBCF and the matrix-factorization-based approaches were largely inferior to BPM, because the number of item pairs was huge, resulting in a sparse rating matrix from which ratings are difficult to estimate accurately.

Fig. 10. Varying number of edge-types.

Moreover, compared with its simplified variant NF-BPM, BPM obtained improvements of 5.13% in Precision@15, 5.68% in Recall@15, and 27.04% in nDCG@15 on cosmetics, which shows the contribution of item features to modeling product bundles and improving recommendation performance. On perfume, NF-BPM performed closer to the complete BPM than on cosmetics, although it still could not match BPM overall. This was largely because the items in perfume are all perfumes of particular brands and fragrances, so their features are similar, differing only slightly. Specifically, whereas cosmetics contains 5,389 unique item features, perfume has only 635. The item features of perfume therefore play a minor role in categorizing the items, so their contribution to recommendation is small, and the model without features performs comparably to the complete BPM. We note that the average number of transactions per user is only around 5 and some users have only a few transactions; however, the proposed bundle recommendations are derived from three levels, namely the preference for edge-types, the affiliation of node-types with an edge-type, and the affiliation of nodes with a node-type, as shown in Equation (10).
Indeed, cosmetics contains 494,617 item relationships and perfume 44,720, all of which were used in the inference process to identify specific edge-types. In other words, for a consumer with only a few transactions, we can still recommend items given a target item, based on the relationships inferred from other users. In addition, since the model has two latent variables, edge-type and node-type, we varied the number of each to examine the recommendation performance under different parameter settings.

Number of edge-types: In this experiment, the number of node-types was fixed at 30, and the number of edge-types was varied between 2 and 20. Figure 10 shows the precision at the top 5, 10, and 15 items for different numbers of edge-types. For cosmetics, performance peaked at 14 edge-types, while for perfume the best performance was achieved with a small number of edge-types, such as 3. This indicates that the number of distinct edge-types is relatively limited for perfume, and setting it too high may cause overfitting. For other metrics, such as Recall@N and nDCG@N, performance changed with the number of edge-types in a way similar to Precision@N.

Fig. 11. Varying number of node-types.

Number of node-types: In this experiment, the number of edge-types was fixed at 5, while the number of node-types was varied from 10 to 100. The results on Precision@N are shown in Figure 11. Overall, the number of node-types did not greatly influence recommendation performance; the best Precision@5 was achieved with 60 node-types for cosmetics and 40 for perfume.

4.3.4. The Target Item in Bundle Recommendation. In the experimental settings above, we treated the item with the highest price as the target item. However, different target items may lead to different recommendation results, because items with different features form distinctive relationships with others. To understand how the target item influences bundle recommendation performance, we compared three target-item scenarios: the item with the highest price, the item with the lowest price, and a random item were each treated as the target item. As shown in Figure 12, the precision and recall of recommendation differed significantly across the scenarios in both datasets. Specifically, when the target item had the lowest price, both precision and recall were significantly lower, whereas when the target item had the highest price, performance was much better.
These differences in recommendation performance across target-item scenarios can be attributed to the edge-types formed by item pairs. In the datasets, cheap items were often bought together with other cheap items, resulting in high probabilities for the related edge-types. Thus, when a cheap item was treated as the target item, items with similar prices appeared at high ranks in the recommendation list, which may lead to ineffective recommendations when the transaction also contains more expensive items. Moreover, in many cases, consumers are more likely to accept recommendations of lower-priced items. Therefore, recommendation performance when the target item is the cheapest is inferior to that when the target item is the most expensive.
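For reference, the per-transaction metrics of Section 4.3.2 (Equations (13)–(15) and nDCG@N) can be sketched as below. Here `recs` is the ranked list $R_t$ and `bought` is $I_t$, assumed to exclude the target item $x$; the function names are hypothetical. Averaging these values over all test transactions gives the reported Precision@N, Recall@N, and nDCG@N.

```python
from math import log2


def precision_at_n(recs, bought, n):
    """|R_t ∩ I_t| / N, Equation (13)."""
    return len(set(recs[:n]) & set(bought)) / n


def recall_at_n(recs, bought, n):
    """|R_t ∩ I_t| / |I_t|, Equation (14)."""
    bought = set(bought)
    return len(set(recs[:n]) & bought) / len(bought) if bought else 0.0


def dcg_at_n(recs, bought, n):
    """rel_1 + sum_{i=2}^{N} rel_i / log2(i), Equation (15)."""
    bought = set(bought)
    gains = [1.0 if y in bought else 0.0 for y in recs[:n]]
    return sum(g if i == 1 else g / log2(i) for i, g in enumerate(gains, start=1))


def ndcg_at_n(recs, bought, n):
    """DCG@N divided by the DCG of an ideal ranking (all bought items first)."""
    hits = min(len(set(bought)), n)
    idcg = sum(1.0 if i == 1 else 1.0 / log2(i) for i in range(1, hits + 1))
    return dcg_at_n(recs, bought, n) / idcg if idcg > 0 else 0.0


recs, bought = ["a", "b", "c", "d"], ["b", "d"]
print(precision_at_n(recs, bought, 4), recall_at_n(recs, bought, 4), ndcg_at_n(recs, bought, 4))
# → 0.5 1.0 0.75
```

In the example, both bought items are recommended (recall 1.0), but they sit at ranks 2 and 4, so DCG is $1/\log_2 2 + 1/\log_2 4 = 1.5$ against an ideal of $2.0$.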



Fig. 12. Different scenarios of target items.

Fig. 13. The node-type distribution over edge-types.

4.4. Product Bundle Analysis

To examine product bundling in light of the parameters inferred by the proposed BPM, we analyzed the relationships between the latent variables edge-type and node-type to uncover buying patterns and the underlying motives for buying product bundles. We set L = 5 and K = 30 to train the model on both datasets, and used the learned parameters to analyze different product bundles. Figure 13 shows the ϒ matrix, whose row for edge-type l is a distribution over node-types; in other words, it shows how each edge-type groups different node-types. In Figure 13, darker colors denote higher values of the learned parameter ϒ_{l,z}, i.e., node-types z that edge-type l is more likely to connect.

ACM Transactions on Knowledge Discovery from Data, Vol. 11, No. 3, Article 28, Publication date: March 2017.

G. Liu et al.

As can be seen in Figure 13, each edge-type concentrates on a few node-types, which appear as dark blocks along the diagonal as a result of clustering the node-types by their feature aspects. For instance, edge-type 01 had high values for node-types 01 and 02, while edge-type 02 mainly consisted of node-types 07, 08, and 09. The distribution of node-types over edge-types was even more concentrated in perfume: edge-type 01 was composed primarily of node-type 01, and edge-type 04 exclusively of node-type 21, meaning that such edge-types connect pairs of nodes of similar types. This suggests that, in the perfume dataset, consumers were more likely to buy items of similar node-types together.

Furthermore, since BPM also clusters the nodes in feature space, we could analyze the representative features of each node-type from its learned multinomial distribution over features; the features with the highest probabilities p(f|z) were regarded as representative. In the cosmetics dataset, the items are mainly skin-care products such as lotions and facial masks, which address different skin-care needs such as aging and dryness. In the perfume dataset, the items are luxury perfumes with different fragrances, brands, and sizes, and the feature vocabulary consists of the corresponding terms. Table III lists the representative features of several selected node-types and shows how the items were clustered in terms of feature aspects such as function, brand, and price. In cosmetics, node-type 01 consisted of features such as facial mask, a low price interval, and on sale. Node-types 11 and 17 were similar in function and brand (Estee Lauder), both being used to address skin aging. Notably, these two node-types differed tremendously in price: node-type 17 was much cheaper ([65.0, 99.0] versus [419.0, 2585.0]) and also carried the feature on sale.

Table III. Representative Features for Node-Types

Cosmetics
Node-type 01: facial mask: 0.029; sheet: 0.025; essential: 0.023; tightening: 0.016; on sale: 0.012; Sololife*: 0.011; [0.01, 18.0]: 0.011
Node-type 11: repairing: 0.024; tightening: 0.023; anti-wrinkle: 0.018; aging: 0.016; Estee Lauder*: 0.015; revitalize: 0.012; [419.0, 2585.0]: 0.012
Node-type 15: skin care: 0.048; set: 0.029; essential: 0.020; perfection: 0.015; L'ORÉAL*: 0.015; freckle: 0.013; [65.0, 99.0]: 0.0161
Node-type 17: tightening: 0.037; lift: 0.027; anti-wrinkle: 0.025; cream: 0.022; on sale: 0.013; Estee Lauder*: 0.009; [65.0, 99.0]: 0.007
Node-type 22: soft: 0.023; toner: 0.016; dry skin: 0.015; cleansing: 0.015; comfort: 0.015; Clinique*: 0.014; [145.0, 195.0]: 0.0107
Node-type 27: fluid foun: 0.033; concealer: 0.027; complexion: 0.018; perfection: 0.017; sun protection: 0.017; Dior*: 0.016; [419.0, 2585.0]: 0.012

Perfume
Node-type 13: floral: 0.050; persistence: 0.049; Dior*: 0.039; EDT: 0.022; sample size: 0.0180; [22.5, 49.0]: 0.013
Node-type 24: EDP: 0.063; Chanel*: 0.055; normal size: 0.044; all skin types: 0.041; France: 0.027; [569.0, 1480.0]: 0.027
Node-type 26: EDP: 0.0457; Marc Jacobs*: 0.046; gardenia: 0.041; fragrance: 0.036; normal size: 0.029; [378.0, 449.0]: 0.010

Notes: The numerical intervals denote price intervals. Features marked with * denote the brands of cosmetics and perfumes.

Table IV. Most Representative Item Pairs of Selected Edge-Types for Cosmetics
(Each item: brand; function; price; other details)

Edge-type 03
Pair 1. x: Estee Lauder; anti-wrinkle creme; 88; Time Zone, Night | y: Estee Lauder; eye complex; 105; sample size, 5ml
Pair 2. x: Estee Lauder; anti-wrinkle creme; 88; Time Zone, Night | y: Estee Lauder; lotion; 49; intensive boosting
Pair 3. x: Estee Lauder; lotion; 49; intensive boosting | y: Estee Lauder; eye complex; 105; sample size, 5ml
Pair 4. x: Estee Lauder; face creme; 65; SPF 15, resilience | y: Estee Lauder; eye complex; 105; sample size, 5ml
Pair 5. x: Lancome; multi lift; 59; reenergy | y: Estee Lauder; anti-wrinkle creme; 88; Time Zone, Night

Edge-type 05
Pair 1. x: Sololife; repairing mask; 139; natural | y: Estee Lauder; eye complex; 395; full size, 15ml
Pair 2. x: Estee Lauder; eye complex; 395; full size, 15ml | y: Estee Lauder; foam cleanser; 208; vitality, radiant
Pair 3. x: Estee Lauder; lotion; 358; energy, vitality, radiant | y: Estee Lauder; eye complex; 395; full size, 15ml
Pair 4. x: Sololife; repairing mask; 139; natural | y: Estee Lauder; foam cleanser; 208; vitality, radiant
Pair 5. x: Chanel; makeup base; 528; whitening, revealing | y: Estee Lauder; eye complex; 395; full size, 15ml

In addition, in BPM each edge-type generates the pair of node-types at the two ends of an edge; correspondingly, an edge-type can be represented by a group of item pairs with similar relationships. Concretely, we computed the probability of every edge (i.e., item pair) under a specific edge-type, p((x, y)|l), which can be derived via Bayes' rule by summing over users u and the node-types z_x and z_y of the two endpoints:

$$p((x,y)\mid l) \;\propto\; \sum_{u,\,z_x,\,z_y} p(l\mid u)\, p(z_x\mid l)\, p(z_y\mid l)\, p(x\mid z_x)\, p(y\mid z_y) \;=\; \sum_{u,\,z_x,\,z_y} p(l\mid u)\, \Upsilon_{l,z_x}\, \Upsilon_{l,z_y}\, p(x\mid z_x)\, p(y\mid z_y). \qquad (16)$$
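Two observations make Equation (16) cheap to evaluate for all pairs at once: for a fixed edge-type l, the sum over users only contributes a constant factor, and the sums over z_x and z_y separate, so the scores form an outer product of a single per-item vector with itself. The following is a minimal NumPy sketch under assumed conventions, not the authors' implementation: the array names `theta` (p(l|u)), `upsilon` (ϒ), and `emit` (p(x|z)), their layouts, and the helper names are hypothetical, and how p(x|z) is obtained from item features in BPM is not reproduced here. Ranking only unordered pairs with x < y (no self-pairs) is also an assumption.

```python
import numpy as np

def edge_scores(theta, upsilon, emit, l):
    """Unnormalized p((x, y) | l) of Eq. (16) for every item pair (x, y).

    theta:   (U, L) array, theta[u, l]   = p(l | u)   (assumed layout)
    upsilon: (L, K) array, upsilon[l, z] = p(z | l)   (the Upsilon matrix)
    emit:    (K, N) array, emit[z, x]    = p(x | z)   (assumed layout)
    """
    # For a fixed l, the sum over users is a constant factor.
    user_weight = theta[:, l].sum()
    # The sums over z_x and z_y separate: m[x] = sum_z Upsilon[l, z] * p(x | z).
    m = upsilon[l] @ emit                # shape (N,)
    return user_weight * np.outer(m, m)  # shape (N, N), symmetric

def top_pairs(scores, k):
    """Indices of the k highest-scoring unordered pairs (x, y) with x < y."""
    xs, ys = np.triu_indices(scores.shape[0], k=1)  # drop self-pairs and duplicates
    order = np.argsort(scores[xs, ys])[::-1][:k]
    return [(int(xs[i]), int(ys[i])) for i in order]

if __name__ == "__main__":
    # Toy, made-up parameters: 2 users, 2 edge-types, 2 node-types, 3 items.
    theta = np.array([[0.7, 0.3], [0.4, 0.6]])
    upsilon = np.array([[0.9, 0.1], [0.2, 0.8]])
    emit = np.array([[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]])
    print(top_pairs(edge_scores(theta, upsilon, emit, l=0), k=2))  # [(0, 1), (0, 2)]
    print(top_pairs(edge_scores(theta, upsilon, emit, l=1), k=2))  # [(1, 2), (0, 2)]
```

Because the user factor is constant within an edge-type, it does not change the ranking of pairs; it is kept only so that the returned values match the right-hand side of Equation (16).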

The edges with higher probabilities were regarded as representative edges for the particular edge-type. Tables IV and V list the representative edges of several selected edge-types for the two datasets, together with the node features at both ends of each edge. These tables show that edge-types reflect distinct patterns in the co-purchase of item pairs. Concretely, in cosmetics, both edge-types 03 and 05 were mostly composed of items of the brand "Estee Lauder" with similar functions; however, the prices of the items in the two edge-types were remarkably different. The items in edge-type 03 had medium prices near 100 that were comparable to each other, suggesting that this edge-type may reflect consumers' pursuit of discounts or free-shipping policies. In contrast, the items connected by edge-type 05 had higher prices and bigger brands; such purchases may be motivated by feature aspects such as brand and function, and mainly made by consumers with high purchasing power. Similarly, in perfume, edge-type 03 was mainly composed of sample-size perfumes at both ends, whereas edge-type 05 mainly paired a normal-size brand-name perfume with a sample-size perfume. It is also worth noting that the item "Burberry Brit Sheer Sample 4.5ml" formed different edges under the two edge-types, which supports the interpretability of the latent factor edge-type.
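The two node-type summaries used earlier in this section, the representative features of each node-type (Table III) and how concentrated each edge-type's node-type distribution is (Figure 13), can likewise be read off the learned parameters. The sketch below assumes a hypothetical array `phi` holding p(f|z) and a matching hypothetical feature list `vocab`; measuring concentration by entropy is an illustrative choice here, since the section itself judges concentration visually from the heat map.

```python
import numpy as np

def representative_features(phi, vocab, z, k=5):
    """Return the k features with the highest p(f | z) for node-type z.

    phi:   (K, F) array; row z is the multinomial p(f | z)  (assumed layout)
    vocab: list of F feature tokens, e.g. words, brands, price intervals
           (hypothetical input)
    """
    top = np.argsort(phi[z])[::-1][:k]
    return [(vocab[f], float(phi[z, f])) for f in top]

def concentration(upsilon):
    """Entropy in nats of each row Upsilon[l] (node-types given edge-type l).

    Lower entropy means edge-type l draws on fewer node-types, i.e. a more
    concentrated row in a heat map like Figure 13.
    """
    logs = np.log(np.clip(upsilon, 1e-12, 1.0))  # clip so that log(0) is avoided
    return -(upsilon * logs).sum(axis=1)

if __name__ == "__main__":
    # Toy, made-up parameters for illustration only.
    vocab = ["facial mask", "sheet", "on sale", "Dior"]
    phi = np.array([[0.50, 0.30, 0.15, 0.05],
                    [0.10, 0.10, 0.20, 0.60]])
    print(representative_features(phi, vocab, z=0, k=2))
    # [('facial mask', 0.5), ('sheet', 0.3)]
    print(concentration(np.array([[1.0, 0.0], [0.5, 0.5]])))
    # entropies close to 0 and ln 2
```

Under this measure, the claim that perfume edge-types are more concentrated than cosmetics edge-types would correspond to lower average row entropy of the perfume ϒ matrix.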

Table V. Most Representative Item Pairs of Selected Edge-Types for Perfume
(Each item: brand; fragrance; price; other details)

Edge-type 03
Pair 1. x: Burberry; Brit Sheer; 49; sample, 4.5ml | y: Versace; bright crystal; 49; sample, 5ml
Pair 2. x: Lavine Paris; EDP; 38; sample, 5ml | y: Versace; bright crystal; 49; sample, 5ml
Pair 3. x: Burberry; Brit Sheer; 49; sample, 4.5ml | y: Lavine Paris; EDP; 38; sample, 5ml
Pair 4. x: Avon; little black; 21; 9ml | y: Versace; bright crystal; 49; sample, 5ml
Pair 5. x: Versace; crystal noir EDP; 49; – | y: Versace; bright crystal; 49; sample, 5ml

Edge-type 05
Pair 1. x: Burberry; Brit Sheer; 49; sample, 4.5ml | y: Chanel; Chanel Chance, EDP; 709; full size, 100ml
Pair 2. x: Anna Sui; secret wish; 378; 75ml | y: Burberry; Brit Sheer; 49; sample, 4.5ml
Pair 3. x: Chanel; bleu; 575; male, 50ml | y: Burberry; Brit Sheer; 49; sample, 4.5ml
Pair 4. x: Chanel; Chanel Chance, EDP; 709; full size, 100ml | y: Burberry; brit EDP; 39; sample, 5ml
Pair 5. x: Chanel; bleu; 575; male, 50ml | y: Burberry; brit EDP; 39; sample, 5ml

From the above analysis, we find that BPM can effectively explain distinct motives for buying product bundles. Different edge-types reflect consumers' different preferences driven by distinct motives; thus, companies can design personalized product bundles in accordance with different edge-types.

5. CONCLUSIONS

By formulating each purchased product bundle as a bundle graph, this article has proposed a probabilistic model, Bundle Purchases with Motives (BPM), that describes the generative process of bundle graphs with item features and thereby captures consumers' preferences and underlying motives for different types of product bundles. With the latent factors node-type and edge-type, the relationships between bundled items are identified to explain consumers' preferences and buying motives and to support personalized bundle recommendation. Building on this, we developed an approach that recommends items to bundle with a given target item that has already been bought. Experiments on real-world data comparing BPM with baseline methods on the evaluation metrics showed that BPM outperformed the baselines. Moreover, the parameters inferred from the model uncovered the composition of the purchased product bundles and, to some extent, indicated consumers' buying motives.

Future research can proceed in several directions. One is to extend the model by incorporating consumers' demographic information, such as gender and age, to better model consumers' purchasing behaviors and preferences. Another is to consider the directions of edges in bundle graphs so that sequential patterns in the purchase of items can be captured. Moreover, a target item may need to be linked to a bundle as a whole, which extends the current setting to a more general one that can bundle not only separate items but also bundles, leading to a bundle of bundles. This scenario of embedded bundling requires further exploration of both its practical justification and its technical modeling.

Received November 2015; revised September 2016; accepted November 2016