Inf Syst E-Bus Manage (2015) 13:445–473 DOI 10.1007/s10257-014-0265-0 ORIGINAL ARTICLE

A sales forecasting model for consumer products based on the influence of online word-of-mouth

Ching-Chin Chern • Chih-Ping Wei • Fang-Yi Shen • Yu-Neng Fan



Received: 16 October 2013 / Revised: 15 July 2014 / Accepted: 22 September 2014 / Published online: 10 October 2014
© Springer-Verlag Berlin Heidelberg 2014

Abstract Sales forecasting is one of the most critical steps in the business process. Because the forecasting accuracy of traditional techniques is generally unacceptable for products with irregular or non-seasonal sales trends, a new forecasting method is needed. Past research shows a strong relationship between online word-of-mouth and product sales, but the extent of the impact of word-of-mouth varies with product category. This study aims to explain how electronic word-of-mouth affects product sales by analyzing online review properties, reviewer characteristics and review influences. This electronic word-of-mouth perspective contributes to sales forecasting research in two ways. First, a novel classification model involving polarity mining, intensity mining and influence analysis is proposed, together with a framework that elucidates the differences between review categories. Second, the influence of online reviews (i.e., electronic word-of-mouth) is estimated and then used to construct a sales forecasting model. The proposed online word-of-mouth-based sales forecasting method is evaluated using real data from a well-known cosmetic retail chain in Taiwan. The experimental results demonstrate that the proposed method is especially suitable for products with abundant online reviews and outperforms traditional time series forecasting models for most of the consumer products examined.

C.-C. Chern (corresponding author) · C.-P. Wei · F.-Y. Shen · Y.-N. Fan
Department of Information Management, National Taiwan University, 50, Lane 144, Sec. 4, Keelung Road, Taipei 106, Taiwan, ROC

Keywords Electronic word-of-mouth · Online review · Sales forecasts · Text mining · Time series data

1 Introduction

In the current Internet era, information technology greatly facilitates information sharing and transcends the limitations of traditional word-of-mouth (WOM), so WOM has evolved into a more powerful form: electronic word-of-mouth (eWOM). “For many businesses, online customer opinions have become a type of virtual currency that can make or break their products” (Wright 2009). In recent years, e-commerce companies such as Amazon and eBay have provided forums and rating platforms as eWOM services. Through these services, regardless of geographical and time differences, consumers can share their experiences with and opinions on a variety of products and affect each other’s purchasing decisions. Prior research has shown that consumer purchasing decisions are significantly affected by online product reviews (Chen et al. 2008, 2011; Chevalier and Mayzlin 2006; Ku and Chen 2007). Although the importance of consumer advocacy is well understood, determining how eWOM affects product sales remains a challenge. Some studies found that the extent of eWOM’s impact differs with product category (Mudambi and Schuff 2010; Zhu and Zhang 2010). Other scholars have studied consumer interactions and their impact on sales through the online review platforms provided by e-commerce companies such as Amazon or eBay. Evidence shows that negative reviews are more influential than positive ones (Chevalier and Mayzlin 2006), while reviewer characteristics may also play a critical role in the impact of their reviews (Forman et al. 2008; Hu et al. 2008).

Sales forecasting is one of the most critical steps in the business process because it is the foundation for other operations. Once the expected sales level is determined, it is used to estimate optimal purchasing or manufacturing quantities, and other related decisions can be made afterward. The value of sales forecasting depends on its accuracy.
When sales are overestimated, a company overstocks and incurs the related holding costs; to boost sales before overstocked products expire, it may need to discount prices. In contrast, when sales are underestimated, the company loses revenues and profits. Statistical methods such as moving averages and exponential smoothing can be used to forecast sales (Keller 2012). These methods are suitable for products with stable demand. When demand is irregular and volatile over a short period of time, however, traditional statistical methods generally become ineffective for forecasting future sales, which makes it necessary to construct a new forecasting method. Prior research has shown that eWOM is a critical factor affecting sales and that it is as influential as advertising. As the first form of social interaction, WOM is a well-established construct in the marketing literature (Arndt 1967). Recent research has confirmed the significant relationship between online consumer advocacy and
product sales (Chen et al. 2008, 2011; Chevalier and Mayzlin 2006; Moe and Trusov 2011). For example, Chen and Xie (2008) as well as Chevalier and Mayzlin (2006) examined the impact of online reviews on products sold on e-commerce websites. Moe and Trusov (2011) studied the relationship between existing average ratings and sales trends. Zhu and Zhang (2010) have shown that the extent of eWOM’s impact on sales may differ with product characteristics. Although eWOM topics are highly diverse, most existing research focuses on products sold on e-commerce websites such as Amazon and eBay. It is important to check whether these findings apply to other categories of products sold through different channels, such as grocery stores or personal-care chain stores. Furthermore, even though prior studies have empirically shown that eWOM and sales are related, only a few have used eWOM as a predictor for sales forecasting, because the relationship between eWOM and sales is not straightforward (Chevalier and Mayzlin 2006; Forman et al. 2008; Zhu and Zhang 2010). Most studies treat every review identically and adopt the average rating score provided by e-commerce websites as the product’s eWOM. Nevertheless, not all reviews are created equal (Chen et al. 2008; Hu et al. 2008). In this study, we propose a model to estimate the influence of each review by examining various factors such as the properties of the review (e.g., polarity, sentiment intensity), the characteristics of the reviewer and the responses from readers (e.g., the number of readers who like the review). In addition, the impact span of eWOM is another critical issue in our study and thus should be carefully evaluated. Although existing studies often treat the impact of an online review on sales as discrete and occurring during a single period, we assume that as long as the online review remains accessible, its impact on sales lasts for several periods.
Therefore, several important questions arise. How long does the impact of an online review persist? Does the impact of a review on sales remain identical from one period to the next, or does the effect diminish in some particular form? The time dimension complicates the sales forecasting task, but it is an important issue that should be investigated. In short, our study focuses on three issues of sales forecasting for consumer products. First, we identify products with a strong relationship between consumer advocacy and sales, since these products are suitable candidates for eWOM-based forecasting methods. Second, a sentiment text mining-based system is proposed to evaluate the influence of each individual product review. Finally, a search algorithm is constructed to find the best-fit eWOM-based forecasting model for a focal product. The rest of the paper is organized as follows. Section 2 describes our research problem. Section 3 presents our proposed eWord-of-Mouth Sales Forecasting Algorithm (WOMSFA). Section 4 demonstrates the implementation of WOMSFA in a real case. In Sect. 5, we compare the results obtained with WOMSFA to those of traditional forecasting methods and discuss the applicability of our proposed method. Finally, we offer our conclusions and suggestions for future research in Sect. 6.
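The introduction contrasts eWOM-based forecasting with traditional statistical methods such as moving averages and exponential smoothing. As a minimal sketch of those baselines, assuming a plain list of per-period sales and a simple non-seasonal exponential smoothing variant whose level is initialized at the first observation, they can be written as follows; the function names and the initialization choice are illustrative assumptions, not the authors' implementation.

```python
from typing import Sequence


def moving_average_forecast(sales: Sequence[float], window: int) -> float:
    """One-step-ahead forecast: the mean of the last `window` observations."""
    if not 1 <= window <= len(sales):
        raise ValueError("window must be between 1 and len(sales)")
    return sum(sales[-window:]) / window


def exponential_smoothing_forecast(sales: Sequence[float], alpha: float) -> float:
    """One-step-ahead forecast from simple (non-seasonal) exponential smoothing.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, with the level
    initialised to the first observation (an assumed, common choice).
    """
    if len(sales) == 0:
        raise ValueError("sales must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    level = float(sales[0])
    for y in sales[1:]:
        level = alpha * y + (1.0 - alpha) * level
    return level


weekly_sales = [100, 120, 80, 160]  # made-up illustrative numbers
print(moving_average_forecast(weekly_sales, 3))           # 120.0
print(exponential_smoothing_forecast(weekly_sales, 0.5))  # 127.5
```

Both forecasts depend only on past sales, which is why, as argued above, they struggle when demand shifts for reasons (such as a burst of online reviews) that the sales history itself does not reveal.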

2 Problem description

The research problem investigated in this study is formally stated as follows: given a sequence of sales data prior to time $t$ for product $i$ (i.e., $\ldots, S_{t-2,i}, S_{t-1,i}$) and a collection of online reviews pertaining to product $i$ prior to time $t$, we construct a sales forecasting model that considers the influence of these online reviews to predict the future sales of product $i$ (i.e., $S_{t,i}, S_{t+1,i}, \ldots$). Sales are often affected by seasons or special events such as holidays and promotions. However, as Fig. 1 illustrates, some consumer products have irregular sales trends even after the effects of seasons and special events have been removed. Traditional forecasting methods are insufficient to analyze the sales trends of these consumer products, let alone forecast their future sales. Because these consumer products are usually experience goods, it is justifiable to use eWOM as a predictor of their future sales.

Using eWOM for sales forecasting first requires estimating the influence strength of each review. In this study, we consider several factors, including review properties (e.g., polarity, sentiment intensity), the characteristics of reviewers and responses from readers. Most prior studies examine the polarity (i.e., positive, negative, or neutral) of reviews to determine the orientations expressed in them. Some research uses the “product ratings” given in reviews to determine their polarity (Duan et al. 2008; Moe and Trusov 2011; Mudambi and Schuff 2010; Zhu and Zhang 2010). Product rating systems often adopt a five-star rating mechanism, e.g., Amazon’s product rating system. This way of determining the polarity of a review is simple and fast, but may be too general. Another way to detect the polarity of a review is to examine its textual content (Ku and Chen 2007; Su et al. 2007).
In this approach, a keyword dictionary generally needs to be built before the polarity of reviews can be analyzed with text mining techniques. In addition, a review’s sentiment intensity (i.e., the strength of the sentiment expressed in the review) is also likely to affect the strength of its influence on future sales (Mudambi and Schuff 2010). As with polarity mining, the product rating shown in a review could be used to detect its sentiment intensity; however, this simple method has the same drawback here as it does for polarity mining. Text mining is one alternative: we can build a dictionary that assigns a degree of intensity to each term and then use the dictionary to analyze reviews. Another potential method relies on grammar (Zhang et al. 2009). For example, after sentences are decomposed by syntactic function, we can analyze the intensity of sentiment by examining adverbs. Online reviews are diverse in content. Some reviews contain only brief personal preferences, while others contain product information and personal opinions on or experiences with products or specific product features. Reviews with more product details and explanations for the reviewers’ preferences appear to be more useful and, consequently, more influential than short reviews. Accordingly, review length (i.e., the number of characters in a review) is also an important factor that could affect the influence strength of a review on sales.

Fig. 1 Weekly sales for Product 2 (vertical axis: sales in $1000 NTD, from 0 to 600; horizontal axis: time, 64 weekly periods)

The next factor considered in this study is reviewer characteristics. The celebrity effect is often observed in blogs and on social networking platforms: opinions from celebrities are often more influential and persuasive than opinions from non-celebrities. In addition, reviews written by celebrities often have more page views and are therefore more likely to reach customers and exert stronger influence. Responses from readers are another category of factors that could affect the influence strength of reviews. Many review systems report the number of page views and a “helpful count” for each review, and some consider reviews with more page views or a higher helpful count to be more informative and place them on the first page to reduce users’ search costs. Accordingly, the number of page views and the helpful count of a review indicate its potential influence strength and are adopted in this study as factors for assessing it.

In addition to the influence strength of reviews, the next issue to investigate is the impact span of reviews. Sales data are time series data, and so are online reviews. Aggregating all individual sales at each time point forms a time series of sales data. Similarly, for a specific product, the number of online reviews released in each period forms a series over time. Reviews usually remain accessible to readers on online review platforms for a long time. Therefore, a review affects not only current sales but also sales in several subsequent periods. Moreover, the impact of a review is a function of time: as a review gets older, it becomes less popular and thus less influential.
At the same time, as new reviews are continuously released, readers may shift to more recent reviews for more current information. Hence, the impact of a review on sales diminishes over time. We therefore assume that the impact of a review on sales follows a downward curve over time and eventually vanishes. For example, a review may affect the sales of a product with a degree of 9 in its first period after posting, 7 in the second period, 5 in the third, and so on. Using text mining techniques, we analyze the polarity and sentiment intensity of online reviews and then determine the influence strength of these reviews.


Subsequently, we model the impact of a review over time using an exponential function and possibly other curves. Then, we consider linear regression models using the impact of reviews as a predictor and finally choose the one with the lowest mean absolute percentage error (MAPE) (Kahn 1998; Keller 2012).
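Because every candidate model is ranked by MAPE, a short sketch of the metric may help. It assumes the standard definition, MAPE = (100/n) Σ |A_t − F_t| / |A_t|, and rejects periods with zero actual sales, where the metric is undefined; the helper `pick_lowest_mape` and the sample numbers are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error in percent: (100 / n) * sum(|A_t - F_t| / |A_t|).

    MAPE is undefined when an actual value is zero, so such periods are rejected.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        if a == 0:
            raise ValueError("MAPE is undefined for zero actual sales")
        total += abs(a - f) / abs(a)
    return 100.0 * total / len(actual)


def pick_lowest_mape(actual: Sequence[float], candidates: dict) -> str:
    """Return the name of the candidate forecast series with the lowest MAPE."""
    return min(candidates, key=lambda name: mape(actual, candidates[name]))


actual = [100, 200]
print(mape(actual, [110, 180]))  # 10.0
print(pick_lowest_mape(actual, {"model_a": [110, 180], "model_b": [100, 150]}))  # model_a
```

Because each error is divided by the actual value, MAPE is scale-free, so it can be compared across products with very different sales volumes.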

3 eWord-of-Mouth Sales Forecasting Algorithm (WOMSFA)

This study forecasts future sales of consumer products based on online reviews, or eWOM. The proposed method not only analyzes the polarity and intensity of reviews, but also separates influential reviews from neutral ones. Because the text mining process is closely tied to sentiment properties, we must carefully analyze the sentiments expressed in reviews and observe how online users network with each other and respond to reviews. How do reviewers express their opinions on product properties and their experiences with the products? How do online users interact and respond to the reviews? What makes a review helpful and influential? With these questions in mind, we propose WOMSFA, illustrated in Fig. 2, which uses the impact of online reviews as a predictor of sales. A brief overview of WOMSFA follows. In Step 1, online product reviews are collected from relevant and popular websites and analyzed to assess the influence strength of each review. In Step 2, WOMSFA converts each review into an influence curve over time using several candidate review influence curves. In Step 3, WOMSFA searches for the regression model with the lowest MAPE by comparing regression models whose input variables are built from the different candidate review influence curves. In Step 4, WOMSFA validates the selected regression model using residual analysis; if the best model does not satisfy the requirements of the regression model, it is abandoned and the second-best model is examined. Finally, in Step 5, WOMSFA uses the validated regression model for sales forecasting. Because Step 5 is straightforward, we detail only the first four steps in the following subsections.

Fig. 2 WOMSFA steps (flowchart: Start; Step 1: Classify the Online Product Reviews; Step 2: Construct the Review Influence Curve; Step 3: Search for the Regression Model with the Lowest MAPE; Step 4: Validate the Regression Model, returning to the search if validation fails (No) and continuing if it passes (Yes); Step 5: Perform Sales Forecasting; End)

3.1 Online product review classification (Step 1)

In Step 1, the online reviews are extracted and analyzed from multiple perspectives to determine whether a single review has a strong influence on the sales of the product under discussion. As Fig. 3 shows, the procedure starts by collecting online product reviews, continues by analyzing them and finally classifies each review into one of three categories with different influence strengths: strongly positive, strongly negative, and neutral. As discussed in the previous section, the polarity and the sentiment intensity of a review are likely to affect its influence strength on future sales. In this study, we identify and employ six semantic categories to capture the polarity and the sentiment intensity of a review: Positive, Negative, Fine, Bad, Strong, and Weak. The Positive category contains keywords (e.g., “excellent,” “great”) that explicitly indicate a positive attitude (polarity) toward a product or product feature, whereas the Negative category contains keywords (e.g., “bad,” “terrible”) that explicitly indicate a negative attitude toward a product or product feature. Because a positive or negative attitude may be stated implicitly, we also incorporate the Fine (implicit positive attitude) and Bad (implicit negative attitude) categories. For example, when a review states that the photos taken by a digital camera are rich in color, the keyword “rich” implies a positive attitude toward the camera. In contrast, when a review indicates that a specific mascara product results in raccoon eyes, the “raccoon eyes” description typically conveys a negative attitude toward the mascara.
Finally, keywords that suggest high sentiment intensity (e.g., “very,” “extremely”) are classified as Strong, while keywords that show low sentiment intensity (e.g., “somewhat”) are classified as Weak. In this study, we rely on domain experts to construct keyword dictionaries for these semantic categories (i.e., Positive, Negative, Fine, Bad, Strong and Weak) for the products (i.e., personal care and cosmetic products) included in our evaluations.

Fig. 3 Procedure for the first step of WOMSFA (flowchart: Start; Collecting the Online Product Reviews; Constructing Keyword Dictionaries; Determining the Weights of Semantic Categories in Each Review; Analyzing the Influence of a Reviewer; Constructing the Feature Vector; Naive Bayes Classification; End)

To assess the importance of each semantic category in an online product review, we adopt and extend the TF-IDF metric commonly employed in information retrieval research (Manning et al. 2009). Traditionally, a TF-IDF value is derived for each keyword in a document to show how important that keyword is in the document. In our study, however, the keywords relevant to polarity and intensity mining are grouped into semantic categories, so the content of a review is better analyzed at the semantic category level (i.e., groups of keywords) than at the individual keyword level. Accordingly, the TF-IDF metric is extended as follows. Let the keyword dictionary for semantic category $i$ (Positive, Negative, Fine, Bad, Strong or Weak) be $D_i = \{w_1, w_2, \ldots, w_k, \ldots\}$, where $w_k$ is a keyword in $D_i$. The extended TF-IDF metric for semantic category $i$ in review $j$ is defined as:

$$\text{Extended-TF-IDF}_{ij} = \frac{\sum_{w_k \in D_i} tf_{kj}}{\left(\sum_{i'=1}^{6} \sum_{w_k \in D_{i'}} tf_{kj}\right) + 0.01} \times \ln\frac{N}{N_i},$$

where $tf_{kj}$ is the term frequency of keyword $w_k$ in review $j$, $N$ is the total number of reviews in the collection, and $N_i$ is the number of reviews containing at least one keyword in $D_i$. Table 1 shows a mascara review (denoted Review 1) of “Product 1” extracted from the Internet, and Table 2 shows the keywords of each semantic category that appear in Review 1. For example, Review 1 contains no keywords in the Positive category but contains one keyword (“rich”) in the Fine category, and its term frequencies over all six categories sum to 7. Assume that there are only four reviews in the collection, and that the numbers of reviews containing at least one keyword in the Positive, Negative, Fine, Bad, Strong, and Weak categories are 1, 1, 3, 2, 1, and 1, respectively. The extended TF-IDF weight of each semantic category in Review 1 is then calculated as follows:

• Positive category: $\text{Extended-TF-IDF}_{1,1} = \frac{0}{7 + 0.01} \times \ln(4/1) = 0$
• Negative category: $\text{Extended-TF-IDF}_{2,1} = \frac{0}{7 + 0.01} \times \ln(4/1) = 0$
• Fine category: $\text{Extended-TF-IDF}_{3,1} = \frac{1}{7 + 0.01} \times \ln(4/3) = 0.0410$
• Bad category: $\text{Extended-TF-IDF}_{4,1} = \frac{3}{7 + 0.01} \times \ln(4/2) = 0.2966$
• Strong category: $\text{Extended-TF-IDF}_{5,1} = \frac{2}{7 + 0.01} \times \ln(4/1) = 0.3955$
• Weak category: $\text{Extended-TF-IDF}_{6,1} = \frac{1}{7 + 0.01} \times \ln(4/1) = 0.1978$

Table 1 An example of an online mascara review (Review 1)
Somewhat rich but too easy to smudge! The degree of smudginess is too dramatic! Since my skin type is oily, the raccoon eyes showed up in less than two hours! It is really unbearable.

Table 2 Calculation of term frequency for Review 1
Semantic category | Keywords in Review 1 that belong to the category | Sum of term frequencies in Review 1
Positive | N/A | 0
Negative | N/A | 0
Fine | rich | 1
Bad | smudge, raccoon eyes | 3 (after stemming, “smudge” appears twice)
Strong | too easy, too dramatic | 2
Weak | somewhat | 1
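To make the extended TF-IDF definition concrete, the sketch below reproduces the Review 1 example. It assumes that keyword matching and stemming have already produced per-category term frequencies (as in Table 2) and document frequencies $N_i$; the function and dictionary names are hypothetical, and the 0.01 smoothing constant is taken from the definition.

```python
import math
from typing import Dict

CATEGORIES = ("Positive", "Negative", "Fine", "Bad", "Strong", "Weak")


def extended_tf_idf(
    category_tf: Dict[str, int],
    n_reviews: int,
    reviews_with_category: Dict[str, int],
    smoothing: float = 0.01,
) -> Dict[str, float]:
    """Extended TF-IDF weight of each semantic category in one review.

    category_tf[c]: summed term frequency of the D_c keywords in the review.
    reviews_with_category[c]: N_c, the number of reviews containing a D_c keyword.
    """
    total_tf = sum(category_tf.get(c, 0) for c in CATEGORIES)
    weights = {}
    for c in CATEGORIES:
        tf = category_tf.get(c, 0)
        n_c = reviews_with_category[c]
        # If no review contains the category, its tf here is also 0; avoid log(N/0).
        idf = math.log(n_reviews / n_c) if n_c > 0 else 0.0
        weights[c] = tf / (total_tf + smoothing) * idf
    return weights


review_1_tf = {"Positive": 0, "Negative": 0, "Fine": 1, "Bad": 3, "Strong": 2, "Weak": 1}
reviews_with_keyword = {"Positive": 1, "Negative": 1, "Fine": 3, "Bad": 2, "Strong": 1, "Weak": 1}
weights = extended_tf_idf(review_1_tf, 4, reviews_with_keyword)
print({c: round(w, 4) for c, w in weights.items()})
# {'Positive': 0.0, 'Negative': 0.0, 'Fine': 0.041, 'Bad': 0.2966, 'Strong': 0.3955, 'Weak': 0.1978}
```

Normalizing by the review's total category frequency (7 here) keeps long and short reviews comparable, while the $\ln(N/N_i)$ factor down-weights categories that appear in many reviews.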

In addition to the keywords that reflect the polarity and sentiment intensity of online product reviews, this study also considers some properties of the reviews and of reader responses. For example, evidence shows that a “helpful” review tends to have more influence than an ordinary one (Chen et al. 2008; Forman et al. 2008; Hu et al. 2008). Accordingly, we employ several measurements to analyze the influence strength of an online review: the length of the review (in number of words), the number of “likes,” the number of “clicks,” and the product rating given in the review. Previous studies also suggest that the reviewer is an important factor affecting the influence strength of a product review (Amblee and Bui 2008; Hu et al. 2008). In this study, we develop a method to quantify the importance of each reviewer that takes several dimensions into consideration. First, celebrities are believed to attract more attention and have more influence than non-celebrities; hence, recruiting a movie star as the spokesperson for a product is a widely adopted marketing strategy. In addition to the celebrity effect, a reviewer’s experience with and knowledge about a product affect the information content of his or her review. Online word-of-mouth from an expert tends to be more accurate and informative than word-of-mouth from ordinary reviewers, so expert impact is also a dimension to consider. In this study, we score celebrity and expert impact on multiple characteristics: the number of previously posted reviews, the number of previously posted reviews in the product category, the number of “likes,” the number of followers, and the activity level of the focal reviewer, which shows how frequently the reviewer participates in discussions and shares his or her experiences.
Each of the five dimension scores is multiplied by a weight, and the weighted scores are summed to obtain a total score representing the importance of a reviewer. In this study, the weights are set based on expert advice and need to be adjusted according to product properties and the characteristics of the online review platform. Table 3 shows an example of the reviewer analysis. The score of Reviewer 1 is $147 \times 0.5 + 80 \times 0.2 + 1540 \times 0.3 + 196 \times 0.3 + 99 \times 0.1 = 620.2$, while that of Reviewer 2 is $30 \times 0.5 + 1 \times 0.2 + 166 \times 0.3 + 8 \times 0.3 + 76 \times 0.1 = 75$. In this example, Reviewer 1 is more influential than Reviewer 2, and thus reviews from Reviewer 1 should be considered more important than reviews from Reviewer 2.

Table 3 An example of scoring the influence of a reviewer
Dimension | Weight | Reviewer 1 | Reviewer 2
Total number of reviews | 0.5 | 147 | 30
Number of reviews in category | 0.2 | 80 | 1
Number of likes | 0.3 | 1,540 | 166
Number of followers | 0.3 | 196 | 8
Activeness | 0.1 | 99 | 76
Score | | 620.2 | 75

Finally, for each online product review, we construct a feature vector covering the importance of each semantic category (the extended TF-IDF weights) in the focal review, the properties of the review, the responses from readers, and the characteristics of the reviewer who posted it. Specifically, the feature vector of review $i$ is defined as $V_i$ = (Positive category, Negative category, Fine category, Bad category, Strong category, Weak category, review length, number of “likes,” number of “clicks,” product rating, reviewer score). We use the review in Table 1 to illustrate how its feature vector is derived. The review contains 34 words (i.e., its review length is 34). Assume that it has earned “likes” from three readers and has been read ten times, and that it was written by Reviewer 1, whose reviewer score is 620.2 (see Table 3). Further assume that Reviewer 1 rated the product three stars out of seven. The corresponding feature vector for this review is then (0, 0, 0.0410, 0.2966, 0.3955, 0.1978, 34, 3, 10, 3, 620.2). After constructing the feature vectors for all reviews, we use the data mining software WEKA¹ (Hall et al. 2009) to build a Naive Bayes classifier. All product reviews form a document space $X$. Each review in $X$ is assigned to one of three classes with different influence strengths: strongly positive, strongly negative and neutral. A subset of the reviews, $D \subset X$, is selected as the training set (the input of the learning method), and another subset, $E \subset X$, serves as the test set used to verify the accuracy and efficiency of the constructed classifier.
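The reviewer score and the feature vector are simple arithmetic on the quantities listed in Table 3 and the Review 1 example. A minimal sketch, assuming the Table 3 weights (which the text says must be re-tuned per product and platform) and using hypothetical names for the dictionaries and helpers; the Naive Bayes training itself, which the study performs in WEKA, is not reproduced here.

```python
from typing import Dict, Sequence, Tuple

# Weights taken from the Table 3 example; per the text they come from expert
# advice and would need re-tuning for other products or review platforms.
REVIEWER_WEIGHTS: Dict[str, float] = {
    "total_reviews": 0.5,
    "reviews_in_category": 0.2,
    "likes": 0.3,
    "followers": 0.3,
    "activeness": 0.1,
}


def reviewer_score(stats: Dict[str, float], weights: Dict[str, float] = REVIEWER_WEIGHTS) -> float:
    """Weighted sum of the five reviewer dimensions (celebrity and expert impact)."""
    return sum(weights[name] * stats[name] for name in weights)


def feature_vector(category_weights: Sequence[float], length: int, likes: int,
                   clicks: int, rating: int, reviewer: float) -> Tuple[float, ...]:
    """V_i = (six extended TF-IDF weights, review length, likes, clicks, rating, reviewer score)."""
    if len(category_weights) != 6:
        raise ValueError("expected one weight per semantic category (six in total)")
    return tuple(category_weights) + (length, likes, clicks, rating, reviewer)


reviewer_1 = {"total_reviews": 147, "reviews_in_category": 80, "likes": 1540,
              "followers": 196, "activeness": 99}
reviewer_2 = {"total_reviews": 30, "reviews_in_category": 1, "likes": 166,
              "followers": 8, "activeness": 76}
score_1, score_2 = reviewer_score(reviewer_1), reviewer_score(reviewer_2)
print(round(score_1, 1), round(score_2, 1))  # 620.2 75.0

v = feature_vector((0, 0, 0.0410, 0.2966, 0.3955, 0.1978), 34, 3, 10, 3, round(score_1, 1))
print(v)  # (0, 0, 0.041, 0.2966, 0.3955, 0.1978, 34, 3, 10, 3, 620.2)
```

Note that the raw counts (likes, followers) are not normalized, so dimensions with large magnitudes such as likes dominate the score; that follows the worked example rather than any normalization the authors may apply.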
¹ Available from: http://www.cs.waikato.ac.nz/ml/weka/.

3.2 Construct the review influence curve (Step 2)

In most previous studies, the impact of WOM is simplified as a single-period, cross-sectional factor (Duan et al. 2008). Since an online review remains accessible on the review platform once it is posted, treating its impact as a single-period cross-sectional factor is unreasonable. On the other hand, most products have limited life cycles, and by the end of a product’s life cycle, online discussions of the product can barely be found. Furthermore, manufacturers frequently release new product versions as upgrades, so readers usually prefer more recently published reviews. Hence, the timeliness of an online review is a significant factor in analyzing its influence, and it is reasonable to set a “time limit” on the influence of online reviews. Two factors affect the time limit: the product life cycle and the number of online reviews. The shorter the life cycle, the shorter the time limit; and the fewer the related reviews, the longer and stronger the impact of each review. Candidate review influence curves should either decrease over time and approach zero, or increase at first and decrease after a peak. In this study, we select the probability density function of an exponential distribution and the Poisson probability distribution with various values of $\lambda$ as candidate review influence curves.

For example, assume there are four strongly positive reviews about Product 1: Review 1 appears at time 1, Reviews 2 and 3 appear at time 2, and Review 4 appears at time 6. Let the influence of these reviews follow the Poisson probability distribution with $\lambda = 2$, written as $f(k; \lambda) = e^{-\lambda}\lambda^{k-1}/(k-1)!$ for the $k$-th period after a review appears. Assume that each review’s influence lasts for eight time periods with varying strength ($p_i = 8$), as shown in Table 4. We graphically illustrate the influence curve of each review and the sum of all review influences in Fig. 4.

Table 4 An example of review influence calculations
Time period | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Review 1 | 0.1353 | 0.2707 | 0.2707 | 0.1804 | 0.0902 | 0.0361 | 0.0120 | 0.0034 |
Review 2 | | 0.1353 | 0.2707 | 0.2707 | 0.1804 | 0.0902 | 0.0361 | 0.0120 | 0.0034
Review 3 | | 0.1353 | 0.2707 | 0.2707 | 0.1804 | 0.0902 | 0.0361 | 0.0120 | 0.0034
Review 4 | | | | | | 0.1353 | 0.2707 | 0.2707 | 0.1804
Sum | 0.1353 | 0.5413 | 0.8120 | 0.7218 | 0.4511 | 0.3519 | 0.3549 | 0.2982 | 0.1873

Fig. 4 Example of review influence curves (influence of Reviews 1 to 4 and their sum, plotted against time periods 1 to 9; vertical axis from 0 to 0.9)

Nevertheless, reviews are not independent of each other: the influence of one review may be affected by other reviews. For example, suppose review R1 is posted and its reviewer evaluates the product very positively. Analyzed on its own, R1 may be deemed important. However, if several subsequent reviews describe bad experiences, consumers may doubt the opinion in R1, and its influence will diminish. To address this problem, we add a “weight over time” parameter, $w_{it}$, to adjust the review influence predictor: the influence of review $i$ in each period is multiplied by the weight of the corresponding period. For instance, the strongly positive review influences are multiplied by the weights of the corresponding periods, and the strongly positive review influence at time $t$ is calculated as:

$$\mathit{SPos}_{jt} = \sum_{i=1}^{\mathit{XPos}_{jt}} w_{it}\, g(i,t),$$

where $\mathit{XPos}_{jt}$ is the number of strongly positive reviews for product $j$ posted up to time $t$, $w_{it}$ is the weight of period $t$ for review $i$, $g(i,t)$ is the influence of review $i$ at time $t$, and $\mathit{SPos}_{jt}$ is the strongly positive review influence for product $j$ at time $t$. For instance, when the influence of a strongly positive review follows the Poisson curve with $\lambda = 2$ and lasts eight periods, Review 1 (posted at time 1) has $g(1,t) = e^{-2}2^{t-1}/(t-1)!$ for $t = 1, \ldots, 8$ and $g(1,t) = 0$ for $t > 8$, while Review 2 (posted at time 2) has $g(2,1) = 0$, $g(2,t) = e^{-2}2^{t-2}/(t-2)!$ for $t = 2, \ldots, 9$, and $g(2,t) = 0$ for $t > 9$. The same applies to strongly negative and neutral reviews. In other words, each review influences eight time periods with different strengths: Review 1 from period 1 to period 8, Review 2 from period 2 to period 9, and so on. The strongly positive review influence $\mathit{SPos}_{jt}$ is obtained by summing the weighted influences of all reviews in each time period. The weight over time is time-dependent and should be determined according to the properties of the product being discussed; in this study, we try various weight-over-time adjustments to find the weighting that best represents the influence of online reviews.

3.3 Search for the regression model with the lowest MAPE (Step 3)

In the previous two steps, the collected online product reviews are analyzed and classified into three classes with different influence strengths, and various review influence curves are then hypothesized. The resulting review influence series are the inputs of the least squares regression models.
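The review influence series that feed these regressions can be sketched as follows, reproducing the Table 4 example. This is a minimal sketch under stated assumptions: the Poisson curve is the shifted form $e^{-\lambda}\lambda^{k-1}/(k-1)!$ used in the text, the exponential candidate is discretized by sampling its density at ages 0, 1, 2, … (an assumption, since the text does not give the discretization), and `weight(i, t)` is a hypothetical stand-in for $w_{it}$ that defaults to 1.

```python
import math
from typing import Callable, List, Optional, Sequence


def poisson_curve(lam: float, span: int) -> List[float]:
    """g at ages k = 1..span: e^{-lam} * lam^{k-1} / (k-1)!, zero afterwards."""
    return [math.exp(-lam) * lam ** (k - 1) / math.factorial(k - 1) for k in range(1, span + 1)]


def exponential_curve(rate: float, span: int) -> List[float]:
    """Exponential pdf rate * e^{-rate * age} sampled at ages 0..span-1 (assumed discretisation)."""
    return [rate * math.exp(-rate * age) for age in range(span)]


def review_influence_series(posting_periods: Sequence[int], curve: Sequence[float], horizon: int,
                            weight: Optional[Callable[[int, int], float]] = None) -> List[float]:
    """S_t = sum_i w_it * g(i, t) for t = 1..horizon.

    A review posted in period p contributes curve[t - p] for t = p .. p + len(curve) - 1.
    `weight(i, t)` plays the role of w_it; it defaults to 1 (no adjustment).
    """
    totals = [0.0] * horizon
    for i, p in enumerate(posting_periods):
        for t in range(p, min(p + len(curve), horizon + 1)):
            w = 1.0 if weight is None else weight(i, t)
            totals[t - 1] += w * curve[t - p]
    return totals


# Table 4 setting: reviews posted in periods 1, 2, 2 and 6; Poisson curve with lambda = 2 over 8 periods.
sums = review_influence_series([1, 2, 2, 6], poisson_curve(2.0, 8), horizon=9)
print([round(s, 4) for s in sums])
# [0.1353, 0.5413, 0.812, 0.7218, 0.4511, 0.3519, 0.3549, 0.2982, 0.1873]
```

Swapping in `exponential_curve(...)` or a Poisson curve with another $\lambda$ yields a different candidate series; Step 3 then compares the regressions built from each one.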
Different adjustment approaches are employed to tune the independent variables and find a best fit regression model (with the lowest MAPE) (Kahn 1998; Keller 2012). Three parameters are adjusted: the delay of effect, the review influence curve, and the weight over time. The dependent variable of the regression model is the product sales in each time period, $S_{i,t}$. The independent variables include the product sales of the preceding time period $S_{i,t-1}$, the strongly positive review influence $SPos_{i,t}$, the strongly negative review influence $SNeg_{i,t}$, the neutral review influence $Neutral_{i,t}$, and the time period t. To prevent multicollinearity, the correlations among the independent variables are examined. If a correlation coefficient is greater than 0.7, one of the independent variables is discarded in the following order: the neutral review influence is discarded first, followed by the strongly negative review influence. For each independent variable pertaining to the influence of online reviews (strongly positive, strongly negative, or neutral), we try different adjustment approaches and build a corresponding regression model. The adjustment approaches are the delay of effect $L_i$, the review influence curve, and the weight over time. For the delay of effect, each independent variable pertaining to the influence of online reviews is tried with values of $L_i$ from 0 to $L_i^{\max}$. In addition, we use different influence curves (Exponential and Poisson with different λ) to adjust the review influence. The last adjustment step is to add weights over time: the adjusted review influences are multiplied by the weights over time.

3.4 Validate the regression model (Step 4)

The validation of a regression model evaluates whether the model violates the four required assumptions of residual analysis: the error term is a random variable with a mean of zero, the error term follows a normal distribution, the variance of the error term is the same for all values of the independent variables (homoscedasticity), and the error terms are independent of each other. We adopt the χ² test for normality and the Durbin–Watson test to test these requirements. Furthermore, the coefficients of the selected regression model need to be consistent with our assumptions: strongly positive reviews have a positive influence on sales and strongly negative reviews have a negative influence on sales. The selected regression model has to satisfy all of the above requirements. If any required condition is violated, the regression model is discarded, the model with the second lowest MAPE is taken as its substitute, and the requirements are examined again. The procedure is repeated until a best fit regression model is selected for each product.
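The residual and coefficient checks of Step 4, together with the fall-back to the next-lowest MAPE, might be organized as in the following Python sketch. It covers only MAPE, the Durbin–Watson statistic and the coefficient-sign rule; the χ² normality test and the homoscedasticity check are omitted, and all helper names are hypothetical rather than part of WOMSFA.

```python
def mape(actual, predicted):
    """Mean absolute percentage error (%); assumes no actual value is zero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, predicted)) / len(actual)


def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest no first-order autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def signs_consistent(coef):
    """SPos must raise sales and SNeg must lower them; a discarded predictor is absent from coef."""
    return coef.get("SPos", 1.0) > 0 and coef.get("SNeg", -1.0) < 0


def select_model(candidates, is_valid):
    """Walk candidates from lowest to highest MAPE; return the first that passes validation."""
    for cand in sorted(candidates, key=lambda c: c["mape"]):
        if is_valid(cand):
            return cand
    return None


candidates = [
    {"mape": 10.00, "coef": {"SPos": -2.0}},  # hypothetical candidate with a wrong sign: rejected
    {"mape": 14.87, "coef": {"SPos": 286.3348, "SNeg": -459.154}},  # Product 3, scenario 1
    {"mape": 20.45, "coef": {"SPos": 362.24}},  # Product 3, scenario 2 (SNeg discarded)
]
best = select_model(candidates, lambda c: signs_consistent(c["coef"]))
print(best["mape"])  # 14.87
```

In a fuller version, `is_valid` would also run the residual tests on each candidate's fitted residuals.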

4 The implementation of a real case example

In this section, we use the bestselling products of a well-known personal care product retail chain in Taiwan to demonstrate the effect of eWOM. The selected company had more than 355 chain stores in Taiwan in 2012 and a turnover of NTD 6.4 billion in 2009. The scale of this continuously expanding retail chain makes it highly representative. We consider the bestselling products sold by this retailer from August 2004 to April 2006. After removing products with incomplete data, the remaining 107 products are analyzed in this section.

To demonstrate WOMSFA, in the first step we collect online product reviews from Urcosme (www.urcosme.com.tw) (Urcosme 2012), a well-known and popular cosmetics forum in Taiwan. A total of 8,386 online reviews for 100 different products are collected from Urcosme, which means that 7 products do not have any online reviews at all. Experts are invited to manually classify 10.84 % of the reviews (909 reviews) into three classes: strongly positive, strongly negative, and neutral. The feature vectors for these manually classified reviews are then constructed in the first step of the WOMSFA method. WEKA (Hall et al. 2009), a data mining software package, is adopted to build the classification model, which is then used to categorize the remaining reviews automatically. The model is based on the Naive Bayes classification approach. We have 909 reviews for model training. Using ten-fold cross-validation, the classification model yields 133 strongly positive (i.e., 104 + 2 + 27 = 133), 73 strongly negative (i.e., 2 + 47 + 24 = 73), and 703 neutral reviews (i.e., 74 + 50 + 579 = 703), as Table 5 shows. The model has an accuracy rate of 80.31 %. A closer inspection of Table 5 reveals that the discrepancies between the expert classification and the classifier mainly occur when reviews are moved from strongly positive/negative to neutral or from neutral to strongly positive/negative, but rarely from strongly positive/negative to strongly negative/positive. Serious classification mistakes (strongly positive reviews classified as strongly negative, or strongly negative reviews classified as strongly positive) seldom occurred (only two in each case).

Table 5 Result of the classification of the training set (rows: classified by experts; columns: classified by WOMSFA)

| Experts classified as / WOMSFA classified as | Strongly positive | Strongly negative | Neutral |
|---|---|---|---|
| Strongly positive | 104 | 2 | 74 |
| Strongly negative | 2 | 47 | 50 |
| Neutral | 27 | 24 | 579 |

The product selected for demonstration here is "Product 3, Brand AM-M Mascara", whose sales period $SN_3$ is 28 weeks. The maximum delay, $L_3^{\max}$, is set to 2 weeks ($LL_3$ = 0–2). The number of periods for which the influence of a single review lasts ($p_3$) is set to eight. There are four different candidate review influence curves and two types of weight over time (k = 4, LW = 2). Two of the curves are exponential with λ equal to 1 or 0.5; the others are Poisson functions with λ equal to 4 or 2. Considering dependency among reviews, we set two types of weights to adjust review influences: $w_t = 1$ or $w_t = 1/\sqrt{t}$, where $t \in \{1, \ldots, SN_i\}$. The former is a constant, which assumes that the reviews are independent of each other and of time, while the latter is a temporal function that decreases over time, which assumes that earlier eWOM has more influence than later eWOM. The regression model involves five independent variables: the product sales of the preceding time period, the strongly positive review influence, the strongly negative review influence, the neutral review influence, and the current time period.
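A sketch of the predictor screening of Sect. 3.3, written in Python under two assumptions: a correlation pair that is not listed is treated as uncorrelated, and the sample-size rule is read as requiring at least six observations per predictor, the rule the paper applies to Product 3 (Keller 2012). With 28 weeks of data it reproduces the predictor sets that the text reports for scenario 1 (neutral influence dropped for sample size) and scenario 2 (strongly negative influence dropped because r = 0.8176). The function and variable names are hypothetical.

```python
DISCARD_ORDER = ("Neutral", "SNeg")  # Sect. 3.3: neutral influence first, then strongly negative


def screen_predictors(predictors, corr, n_obs, threshold=0.7, obs_per_predictor=6):
    """Drop predictors in DISCARD_ORDER for multicollinearity, then for sample size.

    predictors: names in model order, e.g. ["S_prev", "SPos", "SNeg", "Neutral", "t"]
    corr: {frozenset({a, b}): r}; pairs not listed are treated as uncorrelated (assumption)
    """
    kept = list(predictors)
    # Multicollinearity: drop a candidate whose |r| with any other kept predictor exceeds the threshold.
    for name in DISCARD_ORDER:
        if name in kept and any(
            abs(corr.get(frozenset((name, other)), 0.0)) > threshold
            for other in kept
            if other != name
        ):
            kept.remove(name)
    # Sample size: keep dropping in the same order until n_obs >= obs_per_predictor * len(kept).
    for name in DISCARD_ORDER:
        if n_obs >= obs_per_predictor * len(kept):
            break
        if name in kept:
            kept.remove(name)
    return kept


ALL = ["S_prev", "SPos", "SNeg", "Neutral", "t"]
scenario1 = screen_predictors(ALL, {}, n_obs=28)
scenario2 = screen_predictors(ALL, {frozenset(("SPos", "SNeg")): 0.8176}, n_obs=28)
print(scenario1)  # ['S_prev', 'SPos', 'SNeg', 't']
print(scenario2)  # ['S_prev', 'SPos', 'Neutral', 't']
```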
To search for a best fit prediction model, we try various adjustments. For each class of reviews, there are four possible influence curves and three possible delays ($LL_3$ = 0, 1, 2). With this parameter setting and two types of weight over time, each product is tested on $4^3 \times 3^3 \times 2$ = 3,456 different scenarios. In our collection of online product reviews, 175 reviews are related to Product 3: 45 of them are categorized as strongly positive, 16 as strongly negative and 114 as neutral by the Naive Bayes classifier. For scenario 1, the delays of effect for all the reviews are set to 1 ($LPos_3 = LNeg_3 = LNeutral_3 = 1$). The influence curve of the strongly positive reviews is the exponential distribution with λ = 0.5 ($CPos_3 = 2$). The influence curve of the strongly negative reviews is the Poisson distribution with λ = 2 ($CNeg_3 = 4$). The influence curve of the neutral reviews is the exponential distribution with λ = 1 ($CNeutral_3 = 1$). The weight over time $w_t$ is a constant of 1.
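The scenario counts can be checked by brute-force enumeration. The Python sketch below encodes the four curves with the numbering implied by the Product 3 scenarios (Curves 1 and 2: exponential with λ = 1 and 0.5; Curves 3 and 4: Poisson with λ = 4 and 2) and yields one configuration per combination of curve and delay for each of the three review classes and each weight scheme. The dictionary layout is illustrative only.

```python
from itertools import product

# Curve ids as numbered in the Product 3 example: id -> (family, lambda)
CURVES = {1: ("exponential", 1.0), 2: ("exponential", 0.5), 3: ("poisson", 4.0), 4: ("poisson", 2.0)}
WEIGHTS = ("w=1", "w=1/sqrt(t)")
CLASSES = ("SPos", "SNeg", "Neutral")


def scenarios(max_delay):
    """Yield every (curve, delay) assignment per review class, combined with each weight scheme."""
    per_class = list(product(CURVES, range(max_delay + 1)))  # 4 curves x (max_delay + 1) delays
    for combo in product(per_class, repeat=len(CLASSES)):
        for weight in WEIGHTS:
            scenario = {cls: {"curve": c, "delay": d} for cls, (c, d) in zip(CLASSES, combo)}
            scenario["weight"] = weight
            yield scenario


n2 = sum(1 for _ in scenarios(2))  # LL_i = 2
n3 = sum(1 for _ in scenarios(3))  # LL_i = 3
print(n2, n3)  # 3456 8192
```

Enumerating lazily with a generator keeps memory flat even when each scenario triggers a regression fit.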


For scenario 1, the influence curves of strongly positive, strongly negative, and neutral reviews are presented in Fig. 5. WOMSFA calculates the correlation coefficients between these review influences. The correlation coefficients among the three review influences are all < 0.7, implying that the influences are not strongly linearly related. Nevertheless, there are only 28 weeks of sales, which is fewer than six times the number of predictors (6 × 5 = 30) (Keller 2012). To make the model valid, we need to discard at least one predictor. Following the discard order, we remove the influence of neutral reviews from the regression model and solve the regression model. Using the least squares method, the best fit regression model is $\hat{y}_{3,t} = 311.8076 + 286.3348 \times SPos_{3,t} - 459.154 \times SNeg_{3,t} + 0.2243 \times S_{3,t-1} - 7.88 \times t$. This prediction model leads to a 14.87 % MAPE, which is much better than 28.33 %, the MAPE of a 3-period moving average method.

Fig. 5 Review influence curves for Product 3, Scenario 1 (Pos, Neg and Neutral influence vs. time)

For scenario 2, the delay of effect for strongly positive reviews is set to 2, while the others are set to 1 ($LPos_3 = 2$, $LNeg_3 = LNeutral_3 = 1$). Both the strongly positive and strongly negative review influence curves are the Poisson distribution with λ = 4 ($CPos_3 = CNeg_3 = 3$). The influence curve of neutral reviews is the Poisson distribution with λ = 2 ($CNeutral_3 = 4$). The weight over time $w_t$ is a temporal function. For scenario 2, the three influence curves are shown in Fig. 6. The first step is to check the correlation between the predictors. The correlation coefficient between the influence of strongly positive reviews and that of strongly negative reviews is 0.8176, which means that adopting both predictors simultaneously would lead to multicollinearity. Based on the priority, we discard the strongly negative review influence, keep the strongly positive review influence as an independent variable, and then solve the regression model. Using the least squares method, the best fit regression model is $\hat{y}_{3,t} = 110.41 + 362.24 \times SPos_{3,t} + 179.84 \times Neutral_{3,t} + 0.42 \times S_{3,t-1} + 3.83 \times t$. This prediction model leads to a 20.45 % MAPE, which is also better than the 28.33 % MAPE of a 3-period moving average method.

Fig. 6 Review influence curves for Product 3, Scenario 2 (Pos, Neg and Neutral influence vs. time)

After solving the 3,456 scenarios, we select the one with the lowest MAPE and validate its usability. For the model validation process, we test the four required assumptions of the regression model; in addition, the coefficients need to be consistent with our assumptions. The valid regression model with the lowest MAPE among the 3,456 scenarios is $\hat{y}_{3,t} = 311.8076 + 286.3348 \times SPos_{3,t} - 459.154 \times SNeg_{3,t} + 0.2243 \times S_{3,t-1} - 7.88 \times t$, which is obtained with scenario 1. Accordingly, we choose this best fit model and use it to predict the sales of time period 29. To forecast the sales of period 29, we need to use the reviews of the preceding eight periods (period 21 to period 28) and employ the selected curves to model the review influences. The selected curve for strongly positive reviews is Curve 2, and $SPos_{3,29} = 2.8493$.
The selected curve for strongly negative reviews is Curve 4, and $SNeg_{3,29} = 0.4992$. Also, the observed sales level for period 28 is 591.62. Hence, the predicted sales level for period 29 is

$$\hat{y}_{3,29} = 311.8076 + 286.3348 \times SPos_{3,29} - 459.154 \times SNeg_{3,29} + 0.2243 \times S_{3,28} - 7.88 \times 29 = 311.8076 + 286.33 \times 2.8493 - 459.154 \times 0.4992 + 0.2243 \times 591.62 - 7.88 \times 29 = 802.62.$$

The sales predicted by this model and the actual sales of Product 3 are illustrated in Fig. 7.

Fig. 7 The observed and predicted sales for Product 3 (observed vs. predicted sales, periods 1–29)
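The period-29 arithmetic can be reproduced directly from the reported coefficients and inputs. Note that the paper's intermediate step rounds the SPos coefficient to 286.33, which gives 802.62; keeping the full 286.3348 gives about 802.63. The dictionary keys are illustrative names, not identifiers from WOMSFA.

```python
# Fitted scenario-1 coefficients for Product 3, as reported in the text.
coef = {"intercept": 311.8076, "SPos": 286.3348, "SNeg": -459.154, "S_prev": 0.2243, "t": -7.88}
# Period-29 inputs: review influences built from periods 21-28, sales in period 28, and t = 29.
x29 = {"SPos": 2.8493, "SNeg": 0.4992, "S_prev": 591.62, "t": 29}

y_hat = coef["intercept"] + sum(coef[name] * value for name, value in x29.items())
print(f"{y_hat:.2f}")  # 802.63 with the unrounded SPos coefficient
```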

5 The real case result analysis

We use the total sales volume to select the 107 bestselling products, which include cosmetic products such as mascara, eye shadow and foundation, as well as daily commodities such as facial cleanser and shampoo. Some products have abundant online reviews because they have longer sales periods. Among the products with online reviews available, the average number of released reviews per week ranges from 0.03 (Product 42) to 7.30 (Product 12). As mentioned before, WOMSFA forecasts future sales for consumer products based on online reviews, or eWOM. Because WOMSFA can only be applied to products with enough reviews, we require a product to have a weekly average of at least 0.5 released reviews for WOMSFA to be applied. Of the 107 products, 67 have weekly averages of more than 0.5 released reviews. Product price is another possible factor. The prices of Products 25, 35, 52, 54 and 69 are all below the average price of the 107 products. If the product price is relatively low, users are less likely to spend time searching for online comments. Hence, price is one of the important factors explaining the impact of online reviews on product sales. The average selling price of the 107 products is NT$208 (US$7) with a standard deviation of NT$87 (US$3). Therefore, we require a product to have a selling price of at least NT$121 (US$4), which is the average selling price minus one standard deviation, for WOMSFA to be applied. Of the 107 products, 87 have selling prices of over NT$121 (US$4). The scatter diagram in Fig. 8 shows the relationship between price and the number of reviews per period for the 107 products. In summary, 52 products (the green shaded area in Fig. 8) have selling prices of over NT$121 (US$4) and weekly averages of more than 0.5 released reviews. Thirty-five (35) products (the peach shaded area in Fig. 8) have selling prices of over NT$121 (US$4) but weekly averages of fewer than 0.5 released reviews, while 20 products (the blue shaded area in Fig. 8) have selling prices of less than NT$121 (US$4). Therefore, 52 products are suitable for WOMSFA.

Fig. 8 Scatter diagram for 107 products (x-axis: No. of reviews / period; y-axis: Price (NT$))

We collect and classify 5,848 product reviews for these 52 products using the Naive Bayes classifier we constructed. Not every product is associated with all three classes of reviews (strongly positive, strongly negative and neutral), and the proportions of the classes also differ between products. For example, Products 17, 18, 19, 20, 30, 33 and 38 do not have any strongly negative reviews, and Product 48 does not have any strongly positive reviews. Product 1 has 52.1 % strongly positive reviews (62 of 119 reviews), while Product 2 has only 7.1 % strongly positive reviews (22 of 311 reviews). Product 19 has the largest proportion (91.15 %) of strongly positive reviews, whereas Product 48 has the largest proportion (75 %) of strongly negative reviews. We set the maximum delay $LL_i$ to 2 weeks for most of the products. For some products with highly volatile sales trends (Products 2, 50, 47, 44 and 45), we set the maximum delay to 3 weeks to expand the search scope.
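The two eligibility screens described at the start of this section reduce to a one-line predicate. This Python sketch hard-codes the reported mean (NT$208) and standard deviation (NT$87) of the 107 selling prices; the function name and the example products are hypothetical.

```python
MEAN_PRICE, SD_PRICE = 208.0, 87.0  # NT$, over the 107 products
MIN_REVIEWS_PER_WEEK = 0.5


def womsfa_eligible(price, n_reviews, n_weeks):
    """Price >= mean - 1 SD (NT$121) and at least 0.5 released reviews per week."""
    return price >= MEAN_PRICE - SD_PRICE and n_reviews / n_weeks >= MIN_REVIEWS_PER_WEEK


# Hypothetical products: (price in NT$, number of reviews, weeks on sale)
print(womsfa_eligible(250, 175, 28))  # True: 6.25 reviews/week
print(womsfa_eligible(100, 175, 28))  # False: below NT$121
print(womsfa_eligible(250, 10, 40))   # False: 0.25 reviews/week
```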
All of the products have four candidate influence curves for each review category (i.e., strongly positive, strongly negative and neutral) and two types of weight over time. The curves and weights are identical to those described in Sect. 4. Using these parameters, each product with $LL_i = 2$ examines 3,456 different scenarios, while each product with $LL_i = 3$ examines $4^3 \times 4^3 \times 2$ = 8,192 different scenarios. Among the 52 products, WOMSFA could not find suitable models for 10 products (Products 43 to 52 in Table 6). Comparing the sales trends of these 10 products with those of other products, we find that they generally have smoother sales trends and longer life cycles than other cosmetic products. Beyond having smoother sales trends, products with longer life cycles are more resistant to the influence of online reviews. Products 49 and 51 are examples of products with long life cycles for which we cannot find any acceptable forecasting model using WOMSFA. Their observed sales and review influence curves are shown in Figs. 9, 10, 11 and 12 (the reviews are adjusted using the exponential distribution with λ = 1, and the delays are set to 0). As shown in the figures, the sales trends are smooth while the review influence curves are highly volatile. It is reasonable to assume that the relationship between sales and online reviews is weak for this type of product. Furthermore, some products are aggregates of multiple types or colors. For example, Product 50 is an aggregate of several series of eye shadows and Product 52 is an aggregate of different color blushes. We can hardly separate sales data and online reviews by series and color. The difference in sales and online reviews


JP-K

AM-M

AM-M

AM-M

JP-W

AM-M

JP-Z

AM-M

JP-W

AM-M

JP-W

JP-W

JP-S

4

5

6

7

8

9

10

11

12

13

14

15

16

JP-K

AM-M

3

EU-L

AM-M

2

18

JP-S

1

17

Brand

Item no

Eye shadow

Mascara

Mascara base

Lotion

Brightening powder

Mascara

Toner

Foundation

Pressed powder

Mascara

Pressed powder

Mousse foundation

Mascara

Mascara

Eye shadow

Mascara

Mascara

Makeup remover

Product

34

79

66

87

63

91

75

34

87

15

79

20

90

78

48

28

66

50

No. of sales periods

0.59

1.28

1.09

1.09

1.17

2.59

1.92

1.26

2.13

7

1.24

6.05

2.44

1.81

2.54

6.25

4.71

2.38

No. of reviews/ period

Table 6 Forecasting results for the 52 products

228

290

245

294

322

219

273

259

266

240

273

366

185

218

260

218

259

245

Selling Price

20

101

72

95

74

236

144

43

185

105

98

121

220

141

122

175

311

119

Total number of reviews

17

78

44

52

58

106

63

15

77

60

52

52

80

56

84

45

22

62

Strongly positive reviews

0

0

6

3

3

12

7

2

16

8

7

8

41

22

1

16

86

11

Strongly negative reviews

3

23

22

40

13

118

74

26

92

37

39

61

99

63

37

114

203

46

Neutral reviews

15.02

23.41

15.60

21.27

20.07

19.21

18.03

15.52

15.85

34.12

13.45

20.68

20.53

16.72

20.84

14.87

18.69

14.95

MAPE of WOMSFA (%)

25.54

29.95

27.49

29.11

28.26

24.90

32.77

25.79

21.01

55.48

21.07

39.08

28.61

23.91

35.12

28.33

28.80

23.40

MAPE of moving average (%)

23.92

27.44

25.09

26.45

26.36

23.58

29.15

23.99

18.68

48.74

18.98

33.31

25.37

21.79

31.95

25.95

25.64

20.83

MAPE of weighted moving average (%)


Brand

JP-T

AMM

JP-S

JP-M

JP-Z

JP-W

JP-S

JP-S

JP-Z

JP-W

JP-S

JP-K

JP-S

AMM

JP-W

JP-Z

JP-Z

Item no

19

20


21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

Lotion

Makeup base

Facial cleanser

Liquid eyeliner

Sunscreen

Blush

Liquid sunscreen

Toner

Mascara

Finishing mascara

Mascara

Mask

Face scrubs

Blush

Makeup remover

Lipstick

Lip gloss

Product

Table 6 continued

83

58

74

67

82

26

71

76

89

68

69

62

88

91

77

26

27

No. of sales periods

1.25

0.86

0.77

0.78

2.13

0.73

1.28

0.95

3.56

1.93

0.57

0.65

1.64

0.55

4.35

0.85

4.19

No. of reviews/ period

202

196

188

185

168

221

168

258

175

203

245

252

175

137

244

204

162

Selling Price

104

50

57

52

175

19

91

72

317

131

39

40

144

50

335

22

113

Total number of reviews

20

16

36

24

99

12

78

3

177

71

19

18

19

29

70

14

103

Strongly positive reviews

19

8

0

7

8

0

4

35

16

8

3

6

59

2

101

0

0

Strongly negative reviews

65

26

21

21

68

7

9

34

124

52

17

16

66

19

164

8

10

Neutral reviews

21.90

17.31

18.87

18.09

30.11

25.99

35.35

20.88

23.16

17.64

31.30

26.33

23.23

21.61

39.81

14.97

14.69

MAPE of WOMSFA (%)

27.65

24.97

22.18

24.41

38.52

48.16

36.66

24.95

33.38

25.98

44.82

30.83

29.14

23.34

125.28

27.72

27.48

MAPE of moving average (%)

25.56

23.68

20.48

22.23

35.06

41.87

34.28

24.07

29.68

23.85

40.80

28.31

27.20

21.14

96.42

24.31

24.00

MAPE of weighted moving average (%)


Brand

JP-Z

JP-Z

JP-K

JP-Z

JP-T

JP-C

JP-T

EU-L

JP-M

JP-Z

JP-Z

JP-M

AM-M

EU-G

JP-K

EU-G

AM-M

Item no

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

Blush

Concentrate

Eye shadow

Brightening pencil

Eyeliner

Pressed powder

Eyeliner

Powder

Makeup base

Eye makeup remover

Liquid eyeliner

Hair treatment

Mascara

Moisture gel

Powder

Toner

Sunscreen cream

Product

Table 6 continued

88

53

74

59

9

54

80

84

65

84

27

50

27

38

12

86

84

No. of sales periods

0.74

0.81

0.82

0.83

0.89

1.26

1.3

1.32

1.51

2.94

7.3

1.6

1.04

2.5

0.67

1.4

3.25

No. of reviews/ period

185

240

254

252

144

234

126

196

143

224

189

252

189

224

325

202

140

Selling Price

65

43

61

49

8

68

104

111

98

247

197

80

28

95

8

120

273

Total number of reviews

32

1

35

3

0

41

8

68

45

197

116

60

8

59

7

42

142

Strongly positive reviews

6

11

2

35

6

1

51

5

6

2

2

3

5

4

0

17

27

Strongly negative reviews

27

31

24

11

2

26

45

38

47

48

79

17

15

32

1

61

104

Neutral reviews





















71.82

12.99

10.47

18.70

13.80

24.28

36.75

MAPE of WOMSFA (%)

40.61

27.35

26.78

33.85

18.14

21.73

20.76

26.45

34.19

50.17

113.33

19.29

29.24

38.65

32.42

29.05

45.43

MAPE of moving average (%)

36.82

25.20

24.17

29.78

17.07

20.68

20.10

24.72

30.92

43.34

101.75

17.99

26.11

35.21

29.52

27.07

41.58

MAPE of weighted moving average (%)


Fig. 9 Observed sales for Product 49 (sales vs. time)

Fig. 10 Review influence curves for Product 49 (Positive, Negative and Neutral influence vs. time)

Fig. 11 Observed sales for Product 51 (sales vs. time)

between different colors might end up being large, and the aggregation would lead to inaccurate forecasting. Finally, the prices of Products 45, 46 and 52 are all below the average price of the 107 products. If the product price is relatively low, users are less likely to spend time searching for online comments. Hence, low price is one of the explanations for the diminished impact of online reviews on product sales.

Fig. 12 Review influence curves for Product 51 (Positive, Negative and Neutral influence vs. time)

Of the 52 products shown in Table 6, 42 have favorable WOMSFA forecasting results when the MAPE of WOMSFA is compared with that of traditional forecasting methods. For example, the MAPE of the 3-period moving average for Product 3 is 28.33 %, while the MAPE attained by WOMSFA is 14.87 %. The MAPE of the 3-period moving average for Product 8 is 21.07 %, while with WOMSFA it is 13.45 %. The review influence curves and the observed and predicted sales are shown in Figs. 5 and 7 for Product 3 and in Figs. 13 and 14 for Product 8. As can be seen in Figs. 5 and 7, the combined effect of the strongly positive and strongly negative curves mimics the sales pattern, reflecting the importance of these two predictors in the sales regression model for Product 3. For Product 8, the similarity between the strongly positive curve and the sales pattern shows why online word-of-mouth is a good predictor for sales forecasting for this product.

Fig. 13 Review influence curves for Product 8 (Pos, Neg and Neutral influence vs. time)

Fig. 14 Observed and predicted sales for Product 8

WOMSFA can improve forecasting accuracy even for products with exceptionally volatile sales trends. For Product 21, the MAPEs of the 3-period moving average and the weighted moving average are 125.28 and 96.42 %, respectively, while WOMSFA achieves a MAPE of 39.81 %. The review influence curves and the observed and predicted sales are shown in Figs. 15 and 16. The review influence pattern for Product 21 is not very similar to the sales pattern. However, since the sales are so irregular, the review influence can actually help explain the fluctuation of sales and improve forecasting accuracy.

Fig. 15 Review influence curves for Product 21 (Positive, Negative and Neutral influence vs. time)

Fig. 16 Observed and predicted sales for Product 21

For Product 29, we obtain a valid forecasting regression model, but its performance is not better than that of the other forecasting approaches. We compare the MAPE of WOMSFA with the MAPE of the 3-period moving average and the MAPE of the weighted 3-period moving average, whose weights are $(w_{t-3}, w_{t-2}, w_{t-1}) = (0.2, 0.3, 0.5)$. The MAPEs of the 3-period moving average and the weighted moving average are 36.66 and 34.28 %, respectively. WOMSFA achieves a MAPE of 35.35 % (bold values in Table 6), which is better than the 3-period moving average but not as good as the weighted moving average.

We perform a Student's paired t test on the 42 products to compare the MAPEs of WOMSFA with those of the 3-period moving average and with those of the weighted 3-period moving average. The average and standard deviation of the MAPEs are 22.12 and 10.38 % for WOMSFA, 34.56 and 20.71 % for the 3-period moving average, and 30.94 and 10.81 % for the weighted 3-period moving average. The t statistic and p value for comparing WOMSFA with the 3-period moving average are -5.9396 and $2.64 \times 10^{-7}$, while those for comparing WOMSFA with the weighted 3-period moving average are -6.0712 and $1.71 \times 10^{-7}$. Statistically, WOMSFA significantly outperforms both the 3-period moving average and the weighted 3-period moving average.

With this real case verification, we conclude that not all product sales are affected by online WOM. Products with abundant online discussions are often influenced by these online reviews. The core of WOMSFA is online product reviews; thus, WOMSFA can improve forecasting accuracy and help companies reduce extra holding costs or potential sales losses when products have numerous online reviews. Daily commodities, such as facial cleansers and shampoos, often have smoother sales trends and fewer online discussions, and are hardly affected by online reviews. To improve forecasting accuracy for commodity products, one should consider other predictors, such as discounts, promotions, or advertisements. Classifying online reviews into strongly influential and neutral ones is effective: our results show that the influences of the three categories of reviews are distinct. The importance of each review is analyzed by considering the review content, the characteristics of reviewers and the responses from readers. Some celebrity reviewers are more influential, and their posts attract more consumer attention. Some reviews are less useful, and readers tend to ignore them.


Separating more important reviews from neutral ones helped us better understand the relationship between online reviews and sales. The assumption that the influence of a single review is continuous and nonhomogeneous over time is also verified. In our proposed approach, we model the influence of a single WOM as a curve that first ascends and then descends over time. Compared with previous research, our proposed model is more reasonable and more applicable to the real case. WOMSFA has the advantage of automation: once the system is constructed, text retrieval, review classification, forecasting model search and validation are processed automatically. In addition, WOMSFA is robust because it searches for the best fit model within a defined search scope, which shows that WOMSFA is not overly sensitive to the classification result. As long as the classification error is acceptable, WOMSFA is able to find a favorable model. The model search scope determines the complexity of WOMSFA. The search scope consists of the delay of effect, the influence curves and the weights over time, and thus their selection is critical to the efficiency of WOMSFA. To determine the delay of effect, one should consider the product life cycle: the shorter the life cycle, the shorter the delay. In our study, we choose 2 weeks as the upper bound of the delay for products that usually have a life cycle of 6 months to 2 years. The influence curves are also related to the product life cycle. If the life cycle is short, the influence curve is short and sharp; in contrast, products with a longer life cycle come with smoother influence curves. Finally, to choose the weight over time, one should carefully investigate the interaction between users and reviewers on the online review platform. The proposed algorithm can be applied to other types of products as long as customers' product opinions are likely to influence their sales.
For example, both 3C products and movie box-office revenues could be forecast with WOMSFA, since their sales are closely related to online WOM. Companies can build keyword dictionaries and adjust parameters according to their products' characteristics. As the relationship between online WOM and product sales is confirmed, companies can adopt WOM as a marketing strategy and improve their operational efficiency. The drawback of WOMSFA is the cost of system building: reviews that serve as the training dataset must be manually classified by domain experts. Moreover, WOMSFA can only be used for products with historical sales data, i.e., new products are not suitable for WOMSFA. Another property of WOMSFA is that the text retrieval procedure depends strongly on the language and the products. Keyword dictionaries must be constructed according to the language characteristics, the vocabulary commonly used by reviewers/consumers, and the product properties.
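The evaluation protocol used in this section (3-period moving-average baselines with equal weights or with weights (0.2, 0.3, 0.5), MAPE, and a paired t test over per-product MAPEs) can be sketched in standard-library Python as below. The sales series and MAPE pairs are hypothetical, and only the t statistic is computed; `scipy.stats.ttest_rel` would also return the p value if SciPy is available.

```python
import math
from statistics import mean, stdev


def moving_average_forecasts(sales, weights=(1 / 3, 1 / 3, 1 / 3)):
    """One-step-ahead forecasts for periods 4..n; weights ordered (w_{t-3}, w_{t-2}, w_{t-1})."""
    return [sum(w * s for w, s in zip(weights, sales[t - 3:t])) for t in range(3, len(sales))]


def mape(actual, predicted):
    """Mean absolute percentage error (%); assumes no actual value is zero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, predicted)) / len(actual)


def paired_t(x, y):
    """Student's paired t statistic for per-product MAPE pairs (x_k, y_k)."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


# Hypothetical weekly sales series for one product.
sales = [100, 120, 110, 130, 125, 140]
actual = sales[3:]  # periods that have three preceding observations
ma = moving_average_forecasts(sales)
wma = moving_average_forecasts(sales, weights=(0.2, 0.3, 0.5))
print(round(mape(actual, ma), 2), round(mape(actual, wma), 2))

# Hypothetical MAPE pairs (WOMSFA vs. a baseline) for three products; negative t favors WOMSFA.
print(paired_t([22.0, 15.0, 30.0], [28.0, 21.0, 33.0]))
```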

6 Conclusion This study contributes to sales forecasting research in two ways. First, by building a novel online review classification model based on text mining and influence analysis, we provide a theoretical framework for understanding the difference between semantic categories of reviews. In addition, we introduce review influence

123

A sales forecasting model for consumer products

471

on sales forecasting and verify the close relationship between online word-of-mouth and consumer behavior. A real case is examined and verified in the study. This study shows that there is a significant relationship between online customer opinions and product sales. Previous research has acknowledged the relationship between e-commerce platforms and their online review systems. This study expands the concept by adding a third party online discussion platform to physical channels, such as the retail chain stores. This also suggests that e-commerce and physical channels should view online WOM as a type of virtual currency that can make or break their products. However, not all product sales are affected by online WOM. For most of the products in our testing data set, WOMSFA improves forecasting accuracy. However, for products with long life cycles, they tend to generate few online discussions, and hence, are not suitable to apply WOMSFA. A novel classification model that involves polarity mining, intensity mining and influence analysis is proposed in this study. The polarity mining and intensity mining are based on specialized keyword dictionaries constructed by domain experts, while the influence analysis considered the review content and the reviewers’ properties simultaneously. Our finding shows that this classification method is favorable and the influences of the three classes of reviews differ significantly, implying that companies should not pay attention to review volumes or positive reviews only. We conclude that some reviews have greater influence on consumer behavior than others, and that these reviews should be extracted and analyzed individually. Our work confirms that the influence of a single review should be continuous, but decrease over time. Compared with the single-period impact model of previous studies, the developed temporal influence model is more reasonable and applicable to analyzing the influence of online WOM. 
This study also shows that the forecasting method that adopts the refined review influence model outperforms traditional time series forecasting models. These findings imply that investigating the influence of online reviews can improve forecasting accuracy. Overall, this study contributes to the literature by proposing a new method of review classification and by introducing review influence into sales forecasting for fashion products. The result is favorable and shows that online WOM is a type of virtual currency that affects product sales. The improved forecasting accuracy of WOMSFA can help companies avoid unnecessary holding costs and potential lost sales. In addition, companies are encouraged to manage online customer opinions and regard eWOM as a critical intangible asset.

Sales forecasting for new products is of great importance, because firms know little about the market and consumer tastes before a new product is launched. However, the forecasting method proposed in this study, WOMSFA, is based on historical data and previously released online reviews. Sufficient historical sales data and a sufficient number of reviews are required before a suitable forecasting model can be developed. Future research could extend WOMSFA to new products by linking them to similar existing products.

Academics have recognized the importance of consumer advocacy. Our work has shown that there is a significant relationship between online customer opinions and
product sales. In our study, reviews extracted from popular digital forums are predictive of product sales. However, customers post their opinions only after buying and experiencing a product, and they are highly unlikely to buy the same product again in the near future. Future research should therefore determine whether product sales in turn affect online reviews, and whether the two interact.

The timing and frequency of updates to the proposed forecasting model for each focal product are two very important issues. This study does not address them, because doing so would require an entirely new, large-scale investigation. Sales forecasts are often used to support purchasing and production decisions, for which long-term stability is an important consideration. Sales departments, however, usually want to react to the market more quickly and therefore prefer to update the forecasting model more frequently. Accuracy and timeliness are thus crucial considerations for sales-related decisions, and the trade-off between long-term stability and accuracy/timeliness matters for every type of decision. A more thorough investigation is needed to address these two issues.

Acknowledgments
This research was sponsored by the Ministry of Science and Technology of Taiwan under grants NSC 100-2410-H-002-022-MY3 and NSC 100-2410-H-002-021-MY3.
