Int. J. Electronic Marketing and Retailing, Vol. 5, No. 1, 2012
Framework for simulating random clickstream data

Mansour Abdoli
Kiewit Infrastructure Company, Santa Fe Springs, CA 90670-4040, USA
E-mail: [email protected]

Paul Savory*
Department of Management, University of Nebraska-Lincoln, Lincoln, NE 68588-0491, USA
Fax: +1-402-472-5855
E-mail: [email protected]
*Corresponding author

F. Fred Choobineh
Electrical Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0557, USA
E-mail: [email protected]

Abstract: Improving e-store productivity requires well-designed algorithms that efficiently model the online behaviour of customers. However, due to competitive pressures and privacy concerns, e-commerce data is often not readily available to researchers, which restricts the pace of innovation in e-store design and applications. As a solution, this research develops a framework for generating random clickstream data representing the online behaviour of visitors to an e-store. The framework is implemented as a discrete-event simulator that models a hypothetical e-store. Key outcomes of the research are the development of the simulator, the structure of the e-store, the parameterisation used to model the e-store and its visitors, and the implementation of automated sales agents. The generated clickstream data was verified to represent what one expects from a real e-store.

Keywords: clickstream; e-store; e-commerce; electronic commerce; simulator; MatLab; marketing; retailing.

Reference to this paper should be made as follows: Abdoli, M., Savory, P. and Choobineh, F.F. (2012) 'Framework for simulating random clickstream data', Int. J. Electronic Marketing and Retailing, Vol. 5, No. 1, pp.63–76.

Biographical notes: Mansour Abdoli is a Project Engineer with Kiewit Infrastructure Company. He completed both his PhD and MS degrees in the Department of Industrial and Management Systems Engineering at the University of Nebraska-Lincoln.
Paul Savory is a Professor in the Department of Management, University of Nebraska-Lincoln. His research interests include healthcare systems and discrete-event simulation. He has received numerous awards for his teaching effectiveness and has been inducted into the University of Nebraska Academy of Distinguished Teachers.

F. Fred Choobineh is a Professor of Electrical Engineering and Milton E. Mohr Distinguished Professor of Engineering at the University of Nebraska-Lincoln. His research interests include the design and control of manufacturing systems and the use of approximate reasoning techniques such as fuzzy sets and evidence theory in modelling decision situations and risk analysis. He is a Fellow of the Institute of Industrial Engineers and a member of IEEE and IIE.
1 Introduction
The actions of an online visitor to a website are defined by the sequence of webpages that she visits, the information fields that she fills out, and the buttons that she clicks. This sequence of actions is called the visitor's clickstream. Analysis of clickstream data is useful for determining the objectives of visitors so as to craft individual or group marketing and sales tactics (Boyer and Hult, 2005; Heer and Chi, 2001). It also allows for exploring the impact of website design, sales strategies, and website applications. Unfortunately, there are valid privacy concerns surrounding the use of visitors' information (Taylor et al., 2009). Additionally, given the competition among e-commerce websites (e-stores), companies are hesitant to share clickstream data with researchers for fear that it will divulge business practices or proprietary information. Even when such access is granted for a case study analysis, the underlying data is often not made available to other researchers for comparison or extension of the original study.

Since Fishwick (1996) introduced the idea of web-based simulation, simulation has been used to model the internet and online activities in many ways: using web-technology to model order processing (Lulay and Reinhart, 1998), simulating the internet itself (Floyd and Paxson, 2001), modelling multicast communication protocols in the internet (Pullen et al., 1999), approximating the structure and growth of the internet (Boudourides and Antypas, 2002), and evaluating the performance of routing control algorithms of workflow systems in e-commerce applications (Yang, 2003). Modelling the clickstream behaviour of online visitors to a website is an area that has not been fully explored. Of the limited research that has been done, Montgomery et al. (2004) use Markov chain Monte Carlo (MCMC) simulation to re-create the surfing behaviour of a shopper at an e-store, where statistical inferences are drawn using randomly generated variations of a given scenario. Since the structure of their website is not explicitly modelled, this approach offers little opportunity for creating new scenarios or speculating on non-existing systems.

This paper highlights the issues and details involved in simulating visitor clickstream sequences for a hypothetical e-store that incorporates automated sales agents to increase sales. The availability of random clickstream data offers a minimum-cost approach for testing new website strategies, validating results for case studies of website use, and exploring the impact of website applications and sales agent tools. Section 2 describes the hypothetical e-store and discusses key issues for creating SurfSim, a
MatLab implementation of the framework. Section 3 offers an overview of SurfSim. Section 4 describes the types of visitors to the hypothetical e-store and explains how their behaviour is simulated. Section 5 describes the incorporation of automated sales agents. Section 6 presents a series of experiments used to validate the clickstream sequences generated by SurfSim. Section 7 concludes with potential applications of the research.
2 Hypothetical e-store
To provide a structure for generating random clickstream data, a hypothetical e-store was created. The e-store has eight link (webpage) classes:

• Home: the e-store's homepage.
• Account: represents the webpage(s) associated with customer accounts.
• Info: informative webpages, including FAQ, Help, and Contact Us.
• Product category: all webpages introducing a group of similar products.
• Product items: all webpages detailing the specifications of a given product. This class is further divided into subclasses according to the product category class.
• Shopping cart: represents the purchase process.
• Order: represents the confirmation stage that is reached after the order confirmation link is selected on the shopping cart webpage.
• Exit: a virtual class that represents the end of a visit.
Tucker (2008) comments, "There is no standard way of building the navigation of [an] e-commerce site." Nonetheless, by defining a standard configuration at the link class level, we maintain a level of commonality among otherwise different e-stores. This commonality allows validating the clickstream data generated by SurfSim over a broader spectrum. The composition of link classes is similar to the work of Montgomery et al. (2004), who model the surfing behaviour of online visitors based on the page classes viewed. Table 1 lists the classes and their associated links (webpages) for the e-store. The e-store offers four product categories, each containing four distinct products.

Aside from a visitor's intention, the e-store shapes a visitor's browsing behaviour in two ways. First, the hyperlink structure limits a visitor's navigational choices. Second, the content of a webpage affects the way a visitor chooses her next link. The hyperlink structure of an e-store is best presented by a directed graph, where graph nodes are webpages and graph arcs represent the e-store's hyperlinks. The directed graph representation of the e-store is defined in SurfSim using two relations: Page2Link, representing the outgoing links of each page, and Link2Page, representing the pages that each link is connected to. Nodes of the graph (i.e., webpages) are defined in an array named Pages, and the Links array lists the arcs (i.e., hyperlinks) of the graph.

In practice, the hyperlink structure of an e-store is dynamic (e.g., the link taking a visitor to the order page is enabled only when the shopping cart is not empty). That is, the relations defined by Page2Link and Link2Page have to change for each visitor according to the surfing status of the visitor. Dynamics of the e-store structure and other exceptions are handled in SurfSim by implementing a link selection process at the visitor level.
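As an illustration, the sketch below shows one way this graph could be encoded in MatLab. Only the names Pages, Links, Page2Link and Link2Page come from the paper; the use of cell arrays and index vectors, and the specific links shown, are assumptions made for the example.

```matlab
% Illustrative encoding of the e-store hyperlink graph (the cell
% arrays, index vectors, and particular links are assumptions; only
% the variable names follow the paper).
Pages = {'Home', 'Account', 'FAQ', 'Category 1', 'Product 1', ...
         'Shopping Cart', 'Order confirmation', 'Exit'};
Links = {'Home', 'Account', 'FAQ', 'Category 1', 'Product 1', ...
         'Shopping Cart', 'Order confirmation', 'Exit'};

% Page2Link{p}: indices of the outgoing links available on page p
Page2Link = cell(1, numel(Pages));
Page2Link{1} = [2 3 4 6 8];     % links available on the homepage
Page2Link{4} = [1 5 6 8];       % links available on Category 1

% Link2Page(l): index of the page that link l leads to
Link2Page = 1:numel(Links);     % here, each link targets its namesake page

% example: pages reachable in one click from the homepage
reachableFromHome = Pages(Link2Page(Page2Link{1}))
```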
Table 1  List of classes and links/pages associated with them

| Classes | Links | Pages |
|---|---|---|
| Home | Home | Home |
| Account | Account | Account |
| Info | FAQ, Help, Contact Us | FAQ, Help, Contact Us |
| Product category | Category 1, Category 2, Category 3, Category 4 | Category 1, Category 2, Category 3, Category 4 |
| Product items (Category 1) | Product 1, Product 2, Product 3, Product 4 | Product 1, Product 2, Product 3, Product 4 |
| Product items (Category 2) | Product 5, Product 6, Product 7, Product 8 | Product 5, Product 6, Product 7, Product 8 |
| Product items (Category 3) | Product 9, Product 10, Product 11, Product 12 | Product 9, Product 10, Product 11, Product 12 |
| Product items (Category 4) | Product 13, Product 14, Product 15, Product 16 | Product 13, Product 14, Product 15, Product 16 |
| Shopping cart | Shopping Cart, Add, Remove, Continue | Shopping Cart |
| Order | Order confirmation | Order confirmation |
| Exit | Exit | Exit |
The effect of webpage content on a visitor's link selection is modelled as a two-step link selection process: first, a link class is selected; next, a link is selected from the available links of that class. This two-step approach provides added control at the class level, which permits simulating e-store scenarios for which detailed information at the link level is not available. The viewing time behaviour of a visitor is also affected by webpage content. In SurfSim, the mean of the viewing time distribution is first generated randomly for each visitor and then adjusted according to a defined array of content-related factors. This approach supports control over parameters at both the visitor and webpage levels.

In addition to the hyperlink graph and webpage content, other factors such as the visitor's intention, product price, and product demand affect a visitor's link selection. Inclusion of these factors allows one to explore the dynamics of the link selection process due to changes in price (e.g., generated by an automated sales agent offering a discount: 'buy these two books for $X') and demand (e.g., caused by seasonality) for different types of visitors. In SurfSim, four types of visitors are considered, based on their intention. The price and demand of products are maintained in the PrdPrice and PrdDmnd arrays, respectively. Furthermore, a product desire factor is used to reflect a visitor's product preference. The combination of these factors is summarised as the product desire, which affects the visitor's likelihood to view and purchase a product (see the sketch below).

The e-store hyperlink structure, visitors' intentions, and their behaviour can vary across industry types. SurfSim is designed to provide a flexible basis for modelling e-stores and visitors that can adapt to various scenarios, most of which can be attained by minor parameter adjustments. In addition, SurfSim supports the implementation of an automated sales agent that assigns individual discounts through minimal code adjustment.
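As a hedged illustration of how these arrays interact, the following sketch combines PrdPrice, PrdDmnd and a visitor's desire factors into the selection weight whose formula is given in Section 4. The array names follow the paper; the array sizes, the random desire factors, and the vectorised form are assumptions.

```matlab
% Sketch of combining price, demand, and desire into a selection weight
% (PrdPrice and PrdDmnd are the paper's array names; the sizes and the
% random per-visitor desire factors are assumptions).
PrdPrice = ones(1, 16);           % normalised price ratio per product
PrdDmnd  = ones(1, 16);           % normalised demand ratio per product
desire   = 0.5 + rand(1, 16);     % assumed per-visitor desire factors

% weight = 2d / (d + price/demand), the formula given in Section 4:
% a lower price or higher demand raises the chance of view/purchase
weight = 2 .* desire ./ (desire + PrdPrice ./ PrdDmnd);
```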
3 Overview of SurfSim
The framework for the clickstream generator consists of four major components: a discrete-event engine, the e-store, visitors, and agents. Figure 1 presents an overview. The core of SurfSim, a MatLab (http://www.mathworks.com) implementation of the framework, is a discrete-event simulation engine which controls the processing of visitors to the defined e-store and directs the use of automated sales agents for assisting visitors during their visit (Abdoli, 2006). As such, SurfSim models how online customers visit, surf, shop, leave, and return to the e-store, and it is capable of simulating the use of pricing techniques and sales tactics at the individual visitor level.

Figure 1  Module structure of a clickstream generator (modules: e-store, visitor, discrete-event engine, sales agents)
SurfSim is based on scheduling and performing events initiated by different entities. This approach is commonly known as discrete-event simulation (or event-driven simulation). The backbone of the engine is the list of events that are scheduled to occur in the future, known as the event calendar. The discrete-event simulation engine maintains this list and executes one event at a time. The continuation of the simulation is guaranteed by allowing existing events to schedule future events during their execution. For example, a 'visitor generating' event schedules another 'visitor generating' event and any other events that are supposed to take place for the generated entity.

A simulation run can be customised in two ways: modifying the simulation scenario and/or modifying the events. Given a set of defined initial events, a scenario can be changed through, for example, the number/type of initial events and the simulation length (or other stopping criteria). Events, which are implemented through MatLab function files, are parameterised to allow for event customisation. Further customisation is possible by modifying the MatLab source code of existing event functions or even introducing new events (Abdoli, 2006). When an event is changed or a new one is introduced, the event calendar is updated to accommodate an appropriate call sequence that includes the new event.
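A minimal sketch of such an engine loop is shown below. The struct-based calendar, the event names, and the timing constants are all assumptions made for illustration; SurfSim implements events as parameterised MatLab function files.

```matlab
% Minimal sketch of the event-calendar loop at SurfSim's core (the
% struct-based calendar, event names, and timing constants are
% assumptions; SurfSim implements events as MatLab function files).
calendar  = struct('time', 0, 'event', 'GenerateVisitor');
simLength = 7 * 24 * 3600;                 % e.g., a one-week run, in seconds
simTime   = 0;

while ~isempty(calendar) && simTime <= simLength
    [~, k] = min([calendar.time]);         % earliest scheduled event
    ev = calendar(k);  calendar(k) = [];   % pop it from the calendar
    simTime = ev.time;
    switch ev.event
        case 'GenerateVisitor'
            % executing an event schedules future events, which keeps the
            % simulation going: the next arrival and the new visitor's
            % first link-selection event
            calendar(end+1) = struct('time', simTime + exprnd(300), ...
                                     'event', 'GenerateVisitor');
            calendar(end+1) = struct('time', simTime + 1, ...
                                     'event', 'LinkSelection');
        case 'LinkSelection'
            if rand < 0.9                  % keep surfing, else exit
                calendar(end+1) = struct('time', simTime + gamrnd(2, 30), ...
                                         'event', 'LinkSelection');
            end
    end
end
```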
4 Visitor entities
Visitor entities represent online customers of the e-store. A visitor entity is created as a visit starts and is terminated upon a visitor’s exit from the e-store. Attributes are assigned
to each entity to represent characteristics of the visitor (e.g., buyer or browser) and the status of the visit (e.g., assigned discounts, pages visited). Some of these attributes are recorded at the termination of the visit, when the visitor entity selects the exit link. When an entity is created to represent a returning visitor, the previous attribute values of the returning visitor are loaded from the recorded information. This replicates the use of cookies that record details of a visit to an actual website for future reference.

Visitor entities surf the e-store's structure through a series of events orchestrated by three probability distributions modelling the link selection, the viewing time behaviour during the visit, and the initial page selection at the start of the visit. The parameters of these probability distributions are controlled by the type of visitor. Each type relates the behaviour of visitors to their intention in visiting the e-store. Similar to the classification of visitors suggested by Moe (2003), four visitor types are considered:

• Browsers – neither intend to make a purchase nor are looking for a specific product. They follow links and visit a broad range of pages in the hope of finding something that attracts their attention.
• Knowledge builders – have no intention to make an immediate purchase; however, they are looking for a specific product and try to learn more about it, possibly comparing the product's price and characteristics to others.
• Bargain hunters – seek discounts and enjoy the shopping (browsing) experience more than the product bought, if any.
• Buyers – have a specific product in mind and intend to purchase it if available at a reasonable price.
Upon arrival, new visitors are randomly assigned one of the four types based on a multinomial probability distribution. A type-dependent probability determines whether a visitor will return, and the time of return is randomly set using an exponential probability distribution with a type-dependent parameter. When a visitor returns, she is assigned a new type based on a first-order Markov chain process. When this feature is not required, an identity (unity) transition probability matrix can be used to model the case where the initial visitor type stays the same over the simulation run. Jayalal et al. (2007) use a similar approach, applying a Markov chain model based on an exponentially smoothed transition probability matrix. Table 2 summarises the default parameters for the arrival process, where the visitor type is kept intact between two consecutive visits. Details on the development of these values can be found in Abdoli (2006). The desire factor affects the online behaviour of the visitor and is discussed later.

Following an arrival event, a visitor entity is set to visit her first webpage (based on an initial page selection distribution) and a link selection event is scheduled for a random time in the future (generated using the viewing time distribution). The initial page is selected using a simple multinomial probability distribution, which can also represent the simpler case where all visitors start from the e-store's homepage.

The link selection behaviour of visitors is modelled as a two-step process: first a class of links is selected, and then a link is selected within that class. The class selection probability for each visitor type is based on the marginal probabilities shown in Table 3 (default values). The base marginal values are set such that buyers have a higher probability than other visitors of selecting links leading towards the shopping cart (i.e., ordering a product). The selection probability distributions are calculated from these marginal probabilities and updated based on the visitor type, visit status, product demand, and product price (a sketch of this process follows Table 3).
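Before turning to the tables, here is a hedged sketch of the arrival logic just described. The numeric values come from Table 2 below; the sampling code itself is illustrative.

```matlab
% Sketch of type assignment and return scheduling using Table 2 defaults
% (the values come from Table 2; the sampling code is illustrative).
ratio   = [2 3 1 8];                 % generating ratio: knowledge builder,
                                     % browser, bargain hunter, buyer
pReturn = [0.70 0.90 0.80 0.40];     % probability of return, by type
mttr    = [3 1 1 5];                 % mean time to return (days), by type

vType = find(rand <= cumsum(ratio) / sum(ratio), 1);   % multinomial draw

if rand < pReturn(vType)             % will this visitor return?
    daysToReturn = exprnd(mttr(vType));    % exponential return time
    % on return, the type follows a first-order Markov chain; the
    % default identity (unity) matrix keeps the type unchanged
    P = eye(4);
    nextType = find(rand <= cumsum(P(vType, :)), 1);
end
```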
Table 2  Range of parameter values assigned to a visitor entity, by visitor shopping attitude

| Simulation parameters | Knowledge builder | Browser | Bargain hunter | Buyer |
|---|---|---|---|---|
| Generating ratio for visitors' type | 2 | 3 | 1 | 8 |
| Visitors' type change | No change of type (unity transition probability matrix) | | | |
| Probability of return | 0.70 | 0.90 | 0.80 | 0.40 |
| Mean-time-to-return (days) | 3 | 1 | 1 | 5 |
| Mean of viewing time (sec.) | (120, 400) | (20, 100) | (40, 200) | (60, 300) |
| Variance of viewing time (sec.²) | 1,200 | 300 | 300 | 600 |
| Desire factor | 1 | 0.5 | 2 | 1 |

Table 3  The marginal probability of selecting link classes for different visitor types

| Visitor type | (1) Home | (2) Account | (3) Info | (4) Product categories | (5) Product items | (6) Shopping cart | (7) Order | (8) Exit |
|---|---|---|---|---|---|---|---|---|
| Browsers | 0.11 | 0.05 | 0.20 | 0.25 | 0.22 | 0.05 | 0.03 | 0.09 |
| Knowledge builders | 0.09 | 0.02 | 0.13 | 0.29 | 0.30 | 0.05 | 0.03 | 0.09 |
| Bargain hunters | 0.14 | 0.03 | 0.08 | 0.26 | 0.25 | 0.10 | 0.08 | 0.06 |
| Buyers | 0.08 | 0.06 | 0.05 | 0.26 | 0.25 | 0.15 | 0.10 | 0.05 |
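As promised above, the following is a hedged sketch of the two-step selection for a buyer using the Table 3 marginals. The availability mask, the link indices, and the uniform within-class choice are simplifying assumptions; SurfSim additionally weights product links by product desire.

```matlab
% Two-step link selection for a buyer, using the Table 3 marginals
% (availability mask, link indices, and the uniform within-class
% choice are simplifying assumptions).
classP = [0.08 0.06 0.05 0.26 0.25 0.15 0.10 0.05]; % buyer row of Table 3
avail  = [1 1 1 1 1 1 0 1];      % e.g., empty cart: order link disabled
p = classP .* avail;             % zero out inaccessible link classes
p = p / sum(p);                  % renormalise over the available ones

cls = find(rand <= cumsum(p), 1);            % step 1: pick a link class

classLinks = {1, 2, 3:5, 6:9, 10:25, 26:29, 30, 31};  % assumed indices
candidates = classLinks{cls};
link = candidates(randi(numel(candidates))); % step 2: pick a link in class
```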
Visitor type represents the visitor's intention. These intentions are unknown, yet they can be estimated based on visit history and verified by analysing the simulation outcomes. Product demand and price represent the effect of supply-and-demand and store pricing on visitors' purchasing behaviour: higher demand and lower price are modelled by increasing the visitor's desire to make a purchase. At the product level, the desire is adjusted by assigning each visitor a random desire factor for each product. In SurfSim, the impact of demand, price, and the product desire factor on the likelihood of selecting a product, for view or purchase, is modelled as 2 × product desire factor / (product desire factor + (product price / product demand)). For example, for a visitor with desire factor 1, a product discounted to a price ratio of 0.8 under unit demand yields a weight of 2 × 1/(1 + 0.8) ≈ 1.11, against a baseline weight of 1.0 at nominal price and demand.

The price and demand are represented as ratio values; that is, price and demand for the period of the study are normalised based on their values at a given time (typically the simulation start time). The product demand ratio is modelled as a periodic function to capture seasonal fluctuations in market demand. The product price ratio is set to represent the e-store's pricing policy (e.g., fixed pricing, demand-based dynamic pricing, or visitor-based dynamic pricing). In SurfSim, pricing is controlled at the visitor level, which allows for implementing and simulating the use of sales agents.

The marginal probability for the exit link (also treated as a link class) is updated based on the length of the visit. The adjustment reduces the probability of selecting the exit link at the beginning of the visit and ensures that the simulated visitor does not leave the e-store too early. The length of visit, like other visitor attributes, is initialised and updated during the simulation. After these initial adjustments to the probabilities of the exit and other links, the marginal probabilities are adjusted for link availability (determined by the Page2Link relation and the visitor's surfing status). This is done by setting the probabilities of inaccessible link classes to zero and normalising the probabilities of the available links.

Studies of website design often do not account for view time statistics, mainly because reliable data is not always available. However, view times are used in SurfSim to prioritise events and create a more realistic representation. Random view times are assigned to each viewing action of a visitor. The view time distribution is considered a function of the visitor type and webpage content, and is assumed to follow a truncated gamma probability distribution. The gamma distribution is chosen for its controllable skewness; it is truncated to impose a minimum time delay and a maximum session length. The parameters of the gamma distribution are found from a random average and a given variance, both functions of visitor type (see Table 2). For instance, the average viewing time of a webpage for a browser is randomly selected from the interval (20, 100) seconds, and the variance is taken to be 300 squared-seconds. Using the generated average and the given variance, the gamma parameters are calculated and used to generate a random viewing time for the webpage. By default, values less than 1 second or more than 600 seconds (10 minutes) are truncated.
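A sketch of the viewing-time generation for a single browser page view follows. Fitting the gamma parameters by matching the mean and variance, and truncating by resampling, are assumptions consistent with the description above.

```matlab
% Viewing time for one browser page view (the method-of-moments fit
% and resampling-based truncation are assumptions consistent with the
% text above).
mu = 20 + (100 - 20) * rand;    % visitor-level mean drawn from (20, 100) s
v  = 300;                       % given variance for browsers (s^2)

shape = mu^2 / v;               % gamma with mean = shape*scale
scale = v / mu;                 %       and variance = shape*scale^2

t = gamrnd(shape, scale);       % truncate to [1, 600] s by resampling
while t < 1 || t > 600
    t = gamrnd(shape, scale);
end
```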
5 Sales agents
E-stores often employ online decision-making tools such as automated sales agents (Abdoli, 2006). Agents are computer programmes that use their inputs (sensors) and a predefined logic to affect the environment through their outputs (actuators). The logic can be as simple as a reflexive rationale which remains the same throughout the life of the agent, or an adaptive decision engine that learns over time and modifies its reactions to customer actions. SurfSim provides the opportunity to implement and analyse different agents at e-stores.

In SurfSim, sales agents use the webpages viewed by visitors as inputs and learn about visitors by modelling visitor behaviour from these input data. The sales agents' decision engine is developed to address the potential for implementing agents that make autonomous decisions, such as setting prices and employing a sales tactic. The current implementation of SurfSim adjusts price ratios by estimating the visitor's type after observing five link selections by the visitor.

SurfSim's incorporation of agents is supported through two MatLab script files: InitAgent and Agent. InitAgent initialises the parameters that define an agent and allows for customising and fine-tuning an agent's learning and decision-making logic. Agent contains the functions that perform the duties of agents and is called as an event after each link selection event occurs for a visitor.

The role of an agent is to collect information during the visit and execute tactics (e.g., a package deal price or a product recommendation) so that an objective (e.g., making a sale) is achieved. Since the information of each visitor is collected and processed separately, the implementation represents a multi-agent system in which each agent deals with a separate visitor. For the framework, the sales agents use a high-order Markov chain to learn about customer behaviour. Different sales strategies are used, some of which rely on an estimate of a visitor's probability-of-purchase calculated from the Markov model. For details of the Markov model and the estimation of probability-of-purchase, see Abdoli (2006).
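As a hedged illustration of the reflexive end of this spectrum, the sketch below estimates a visitor's type from five observed link-class selections by likelihood under the Table 3 marginals, then applies a simple pricing tactic. SurfSim's actual agents use a high-order Markov chain model (Abdoli, 2006); this naive maximum-likelihood estimator and the 10% discount are assumptions made for the example.

```matlab
% Naive agent step: estimate the visitor type from five observed
% link-class selections using the Table 3 marginals, then react.
% (SurfSim's agents use a high-order Markov chain; this estimator
% and the 10% discount are illustrative assumptions.)
M = [0.11 0.05 0.20 0.25 0.22 0.05 0.03 0.09;   % browsers
     0.09 0.02 0.13 0.29 0.30 0.05 0.03 0.09;   % knowledge builders
     0.14 0.03 0.08 0.26 0.25 0.10 0.08 0.06;   % bargain hunters
     0.08 0.06 0.05 0.26 0.25 0.15 0.10 0.05];  % buyers

observed = [4 5 5 4 6];          % e.g., categories, items, items, cart
loglik   = sum(log(M(:, observed)), 2);
[~, estType] = max(loglik);      % most likely type given five clicks

if estType == 3                  % looks like a bargain hunter:
    priceRatio = 0.9;            % offer an individual 10% discount
end
```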
6 Simulation validation
A simulation is only meaningful if the data it generates is representative of what one would expect from a real e-store. For this reason, a series of experiments was conducted; the outcomes were analysed and, when possible, compared with the clickstream summary data provided by Montgomery et al. (2004). Overall, the results verify that the simulation programme generates outcomes consistent with the varied parameters. In combination with the default simulation values discussed in the previous sections, Table 4 lists the parameter values for nine of the experiments that were conducted. The parameters are the length of the simulation in weeks, the number of initial visitors, the normalised product price, and the normalised product demand.
Table 4  List of parameters used in SurfSim experiments

| Experiment | Length of simulation (weeks) | Number of initial visitors | Product price | Product demand |
|---|---|---|---|---|
| 1 | 2 | 10 | 1.0 | 1.0 constant |
| 2 | 2 | 100 | 1.0 | 1.0 constant |
| 3 | 1 | 10 | 0.8 | 1.0 constant |
| 4 | 1 | 10 | 0.8 | 1.5 constant |
| 5 | 1 | 10 | 1.0 | 1.0 constant |
| 6 | 1 | 10 | 1.0 | 1.5 constant |
| 7 | 1 | 10 | 1.2 | 1.0 constant |
| 8 | 1 | 10 | 1.2 | 1.5 constant |
| 9 | 2 | 10 | 1.0 | 1.0 on average |
6.1 Results for Experiments 1 and 2
To find out whether the simulation reaches a steady-state, the outcomes of Experiments 1 and 2 are compared. Figure 2 shows the number of online visitors for Experiment 1 with 10 initial visitors and Figure 3 shows the results for Experiment 2 with 100 initial visitors. The maximum number of online visitors is much greater in Experiment 2 than Experiment 1. However, the average number of online visitors after considering a two-week warm-up period is almost the same for both experiments (12.96 and 12.60 for Experiments 1 and 2, respectively). Figure 4 and Figure 5 show the histograms for the number of clicks in individual visits for both experiments after the warm-up period. Similarity between the figures also supports the notion that the initial number of visitors does not dramatically affect the steady-state results of the simulation. Both experiments show a mode of six clicks in each visit, which is close to the mode of five reported by Montgomery et al. (2004).
Figure 2  Number of online visitors in Experiment 1 (time series over the run; average = 12.9649)

Figure 3  Number of online visitors in Experiment 2 (time series over the run; average = 12.5952)

Figure 4  Histogram of number of clicks in visit sessions in Experiment 1

Figure 5  Histogram of number of clicks in visit sessions in Experiment 2
Outcomes of Experiments 1 and 2 result in sales conversion rates of 9.5% and 8.7%, respectively. Since the characteristics of products are considered the same in both experiments, it is expected that the number of purchases made will not differ significantly among products. Using a chi-squared goodness-of-fit test, the distribution of the number of purchases made was tested against a uniform distribution for both experiments. The p-values were found to be 0.19 and 0.92 for Experiments 1 and 2, respectively. Therefore, there is no strong evidence suggesting that the simulation outcomes differ significantly from the expectations.
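For reference, a sketch of this uniformity check is given below. The purchase counts here are synthetic placeholders, not the paper's data; the paper reports p-values of 0.19 and 0.92 for Experiments 1 and 2.

```matlab
% Sketch of the chi-squared uniformity check applied to purchases per
% product (counts are synthetic placeholders, not the paper's data).
counts   = randi([50 80], 1, 16);           % placeholder purchase counts
expected = sum(counts) / numel(counts);     % uniform expectation per product
chi2     = sum((counts - expected).^2 ./ expected);
df       = numel(counts) - 1;
p        = 1 - chi2cdf(chi2, df);           % goodness-of-fit p-value
```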
6.2 Results for Experiments 3 to 8
These experiments highlight the effect of price and demand on the number of product purchases. With other factors equal, it is expected that the lower the price, the higher the probability of purchase. The supply-and-demand relation also suggests that, at a constant price, higher demand results in a higher probability of purchase. Figure 6 summarises the number of purchases made in Experiments 3 to 8, where two levels of demand (100% and 150% of nominal demand) and three levels of price (80%, 100%, and 120% of nominal price) are used. As expected, a decrease in price and an increase in demand lead to higher numbers of purchases.
6.3 Results for Experiment 9
Experiment 9 shows the capability of SurfSim in simulating situations where product demand varies rather than remaining constant. In this experiment, a periodic demand is assigned to each product at the beginning of the simulation; the other parameters are set as in Experiment 1. Table 5 shows the minimum, maximum, and average (normalised) demand for each product. The average demand over all products is close to 1, the constant demand used in Experiment 1.

Compared to Experiment 1, the results show a slight drop in the average number of online visitors (11.02 versus 12.96) and in the sales conversion rate (7.4% versus 9.5%). This behaviour can be attributed to the non-linearity of the purchasing behaviour of visitors, modelled here as a function of price and demand: 20% higher demand does not increase the desire to purchase as much as 20% lower demand diminishes it. This result is additional evidence that the simulation performs as expected.
Figure 6  Checking the effect of demand and price on the number of sales (bar chart of sales counts at prices 0.8, 1.0, and 1.2 for demand levels 1 and 1.5; reported counts are 1613, 1338, 1119, 1119, 1051, and 837)
Table 5  Demand parameters for Experiment 9

| Product | Minimum demand | Maximum demand | Average demand |
|---|---|---|---|
| 1 | 0.6578 | 1.3896 | 1.0237 |
| 2 | 0.1900 | 1.3570 | 0.7735 |
| 3 | 0.4949 | 1.5979 | 1.0464 |
| 4 | 0.6009 | 1.8472 | 1.2240 |
| 5 | 0.3442 | 1.4249 | 0.8846 |
| 6 | 0.2527 | 1.3703 | 0.8115 |
| 7 | 0.3262 | 1.6720 | 0.9991 |
| 8 | 0.3230 | 1.8945 | 1.1087 |
| 9 | 0.2984 | 1.8504 | 1.0744 |
| 10 | 0.6174 | 1.3019 | 0.9597 |
| 11 | 0.3771 | 1.8382 | 1.1077 |
| 12 | 0.6417 | 1.8870 | 1.2643 |
| 13 | 0.3107 | 1.8599 | 1.0853 |
| 14 | 0.1254 | 1.7141 | 0.9198 |
| 15 | 0.4390 | 1.3249 | 0.8819 |
| 16 | 0.5819 | 1.9508 | 1.2664 |
| Average | 0.4113 | 1.6425 | 1.0269 |
7 Concluding remarks
While discrete-event simulation is commonly used to model scenarios in manufacturing, service systems, and health care, its potential applications are far broader. This research explores using simulation to generate clickstream data that is representative of visitor behaviour at an e-store website. Key outcomes of the research include:

• highlighting the issues involved in programming a clickstream simulator
• proposing a framework (or data structure) for modelling a hypothetical e-store
• discussing how one can parameterise visitors and visitor actions at an e-store
• exploring the implementation of automated sales agents in the e-store simulation.
While SurfSim models a specific, defined hypothetical e-store, its structure and approach are extendible through simple modifications to the underlying parameter values and MatLab code. An important caveat is that while the simulation results have been validated for internal consistency, they have not been compared against data from a real e-store.

By being able to use, share, and compare clickstream datasets, researchers can test new website strategies, explore the impact of website applications, and compare the effects of changes in website design. This supports the continued development of systems and approaches for improving e-commerce usability, such as the E-menu work proposed by Tucker (2008) and the web task automation research of Centeno et al. (2004).

In addition to replicating the shopping experience of online visitors, the framework also provides a means for implementing different decision-making tools. It allows one to explore the potential of implementing and configuring automated sales agents (e.g., 'customers interested in this product might also be interested in…') for increasing the probability of a purchase.

This research also supports the development of data mining approaches for exploring online customer behaviour. Wang et al. (2005) comment, "Discovering hidden and meaningful information about Web users usage patterns is critical to determine effective marketing strategies to optimise the Web server usage for accommodating future growth." The design and testing of such data mining (or web usage) algorithms requires clickstream datasets (Cooley et al., 1999). SurfSim's capability for generating random clickstream datasets can increase the pace of development of data mining approaches for understanding online customer behaviour. This is a rich research area in need of additional development and expansion.
References

Abdoli, M. (2006) 'Implementation of virtual sales agents at e-stores', PhD dissertation, University of Nebraska-Lincoln, Lincoln, NE, December.

Boudourides, M. and Antypas, G. (2002) 'A simulation of the structure of the world-wide web', Sociological Research Online, Vol. 7, No. 1, available at http://www.socresonline.org.uk/7/1/boudourides.html (accessed on 20 January 2010).

Boyer, K.K. and Hult, G.T.M. (2005) 'Customer behavior in an online ordering application: a decision scoring model', Decision Science, Vol. 3, No. 4, pp.569–598.

Centeno, V.L., Kloos, C.D., Fernandez, L.S. and Garcia, N.F. (2004) 'Web task automation: a standards-based proposal', International Journal of Web Engineering and Technology, Vol. 1, No. 3, pp.374–391.

Cooley, R., Mobasher, B. and Srivastava, J. (1999) 'Data preparation for mining world wide web browsing patterns', Knowledge and Information Systems, Vol. 1, No. 1, pp.5–32.

Fishwick, P.A. (1996) 'Web-based simulation: some personal observations', in Charnes, J.M., Morrice, D.J., Brunner, D.T. and Swain, J.J. (Eds.): Proceedings of the 1996 Winter Simulation Conference, Coronado, California, USA, pp.772–779.

Floyd, S. and Paxson, V. (2001) 'Difficulties in simulating the internet', IEEE/ACM Transactions on Networking, Vol. 9, No. 4, pp.392–403.

Heer, J. and Chi, E.H. (2001) 'Identification of web user traffic composition using multi-modal clustering and information scent', Proceedings of the 1st SIAM International Conference on Data Mining Workshop on Web Mining, Chicago, IL, USA, pp.51–58.

Jayalal, S., Hawksley, C. and Brereton, P. (2007) 'Website link prediction using a Markov chain model based on multiple time periods', International Journal of Web Engineering and Technology, Vol. 3, No. 3, pp.271–287.

Lulay, W.E. and Reinhart, G. (1998) 'Coordinating order processing in decentralized production units using hierarchical simulation models and web-technologies', in Medeiros, D.J., Watson, E.F., Carson, J.S. and Manivannan, M.S. (Eds.): Proceedings of the 1998 Winter Simulation Conference, Washington, DC, USA, pp.1655–1662.

Moe, W.W. (2003) 'Buying, searching, or browsing: differentiating between online shoppers using in-store navigational clickstream', Journal of Consumer Psychology, Vol. 13, Nos. 1–2, pp.29–40.

Montgomery, A.L., Li, S., Srinivasan, K. and Liechty, J.C. (2004) 'Modelling online browsing and path analysis using clickstream data', Marketing Science, Vol. 23, No. 4, pp.579–595.

Pullen, J.M., Lavu, L., Malghan, R., Duan, G., Ma, J. and Nah, H. (1999) 'A simulation model for IP multicast with RSVP', Internet Engineering Task Force Informational RFC 2490, Internet Society, available at ftp://ftp.rfc-editor.org/in-notes/rfc2490.txt (accessed on 20 January 2010).

Taylor, D.G., David, D.F. and Jillapalli, R. (2009) 'Privacy concern and online personalization: the moderating effects of information control and compensation', Electronic Commerce Research, Vol. 9, No. 3, pp.203–223.

Tucker, S-P. (2008) 'E-commerce standard user interface: an e-menu system', Industrial Management and Data Systems, Vol. 108, No. 8, pp.1009–1028.

Wang, X., Abraham, A. and Smith, K.A. (2005) 'Intelligent web traffic mining and analysis', Journal of Network and Computer Applications, Vol. 28, No. 2, pp.147–165.

Yang, S.J. (2003) 'Design issues and performance improvements in routing strategy on the internet workflow', International Journal of Network Management, Vol. 13, No. 5, pp.359–374.