From Real Purchase to Realistic Populations of Simulated Customers Philippe Mathieu, S´ebastien Picault
To cite this version: Philippe Mathieu, S´ebastien Picault. From Real Purchase to Realistic Populations of Simulated Customers. Demazeau, Y., Ishida, T., Corchado Rodr´ıguez, J.M., Bajo P´erez, J. 11th International Conference on Practical Applications of Agents and Multia-Agent Systems (PAAMS 2013), May 2013, Salamanca, Spain. Springer, 7879, pp.216-227, 2013, Lecture Notes in Computer Science.
HAL Id: hal-01004252 https://hal.archives-ouvertes.fr/hal-01004252 Submitted on 27 Aug 2014
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est destin´ee au d´epˆot et `a la diffusion de documents scientifiques de niveau recherche, publi´es ou non, ´emanant des ´etablissements d’enseignement et de recherche fran¸cais ou ´etrangers, des laboratoires publics ou priv´es.
From Real Purchase to Realistic Populations of Simulated Customers Philippe Mathieu and S´ebastien Picault Universit´e Lille 1, Computer Science Dept. LIFL (UMR CNRS 8022), Cit´e Scientifique 59655 Villeneuve d’Ascq Cedex, France
[email protected] http://www.lifl.fr/SMAC/
Abstract. The use of multiagent-based simulations in marketing is quite recent, but is growing quickly as the result of the ability of such modeling methods to provide not only forecasts, but also a deep understanding of complex interactions that account for purchase decisions. However, the confidence in simulation predictions and explanations is also tightly dependent on the ability of the model to integrate statistical knowledge and the situatedness of a retail store. In this paper, we propose a method for automatically retrieving prototypes of consumer behaviors from statistical measures based on real data (receipts). After preliminary experiments to validate this data mining process, we show how to populate a multiagent simulation with realistic agents, by initializing some of their goals with those prototypes. Endowed with the same overall behavior, validated in earlier experiments, those agents are put into a spatially realistic store. During the simulation, their actual actions reflect the diversity of real customers, and finally their purchase reproduce the original clusters. Besides, we explain how such statistically realistic simulation may be used to support decision in retail, and be extended to other application domains. Keywords: Agent-based Simulations, Knowledge discovery, Marketing, Interactions
1
Introduction
Since several years, individual-based models and multiagent-based simulations have been used to enhance the understanding of complex marketing issues and support decision in the context of retail stores [1, 2]. Due to the introduction of fine-grained information regarding individual behaviors, agent-based models are able to provide not only global predictions (such as the quantities of transactions or revenue) but also insights concerning the reasons that make marketing techniques efficient or not. Classical marketing analysis techniques, used e.g. to segment customers into subgroups with similar needs, or to detect items that are frequently bought together, consist in retrieving global information from large databases through data mining algorithms [3, 4]. For instance, association rules capture co-occurrences between items in customer baskets, and thus allow the retailer to offer promotions on frequently associated items, or to propose relevant similar products. But, since those techniques only capture statistical features of customer behavior, without being able to provide any kind of causal
2
P. Mathieu, S. Picault
explanation, their quality is highly dependent on the application that uses the retrieved knowledge [5]. Besides, the data collected in retail stores result from a complex decision process, which is affected by seasonal, geographical, cultural, environmental factors, by demographic and psychographic variations of the customers, by the brand management, and by in-store events such as promotions. Thus it is difficult to assess the stability of the rules that are built from the data, and quite impossible to predict how changes in the store management affect the rules. On the contrary, agent-based models allow to take into account individual preferences and even psychological expertise [6], so as to build an accurate description of the motivations and needs of each customer. Then it is the actions of those simulated customers that are responsible for the purchases that are predicted. Hypotheses regarding the factors that influence sales are made explicit in the model: they can be understood and examined by experts, and validated (or not) through an appropriate experimental setup. As a counterpart, individual-based models, especially when they involve cognitive agents (e.g. for accounting psychological motivations [6]), tend to require too much expertise, which is not always easy to acquire or implement. In addition, few store simulation models do take into account the spatial issues which are considered crucial in retail stores, such as shelves allocation, items placement, checkout sizing, point of sale display, etc. In a previous work, we addressed those questions in order to build a simulation of supermarkets, where the agents were situated in a realistic environment [7]. This situatedness is necessary for raising multiagent systems from casual, ad hoc simulations, to full decision support systems, able to predict how the clients react to changes in the spatial organization of the store, marketing events (e.g. discount, publicity...), and competitor shops. In this paper, we propose a simulation approach which does not rely much on expert knowledge; instead, it tries to retrieve as much information as possible from retail data (e.g. receipts) as in classical basket analysis methods; but, this knowledge discovery process is used to initialize the purchase preferences of the population of simulated customers with statistically realistic traits. Combined with a model of customer behavior, those traits produce statistically realistic purchase when the agents are acting in their environment. Since we designed and validated a model of customer behavior in a previous work [7], we will focus in this paper on the realism of customer populations, i.e. on a purchase decision model which can mimic actual behaviors. The main classical technique for information retrieval in actual data is the affinity analysis [3, 8], which is based on the census of item co-occurrences in the purchases. It can be directly applied to actual receipts. On this basis, association rules between items (i.e. X → Y where X,Y are disjoint sets of items) can be inferred [3], together with a support (proportion of purchases which include both X and Y ) and a confidence (conditional probability of buying the articles of Y when those of X are in the basket). This approach is very helpful for cross-selling or up-selling, and to some extent it can provide indications in product placement (e.g. try to associate in the shelves items that are frequently bought together). Its first limitation is the computational time, which grows at least with the cube of the number of items [9]. But also, it is quite difficult
From Real Purchase to Realistic Populations of Simulated Customers
3
to use association rules to drive the purchase decisions of agents, since a rule can only suggest items that are related to others, but not how to bootstrap the basket. An alternative approach consists in trying to predict shopping lists from the existing receipts. For instance in [10] this method is implemented in a personal assistant that learns individual purchase habits, so as to remind the customers of their most probable needs during the shopping trip, and to propose personalized promotions. The purpose of this application is very far from ours, and the classification methods that are used do not build any symbolic description of the shopping list, but only a prediction over rough product categories. Nevertheless, this work assesses the possibility to perform some kind of induction over real receipts so as to identify an underlying shopping list. In our proposal, we try to join the identification of similar customers (like in classical market segmentation) and the induction of abstract descriptors for those clusters, specifically prototypical shopping lists. Then we use the association of clusters and shopping lists to generate profiles of agents which buy close items. Hence the approach we propose follows a kind of methological loop. We define a representation frame for transactions (e.g. receipts) and their abstract description: prototypes built by generalization. Afterwards those prototypes are used to initialize the agents in the simulation and the simulated transactions can be in turn analysed. The paper is organized as follows: section 2 presents briefly the context of the grocery store simulation that has been designed and tested in [7, 11]; especially we explain how shopping lists are used to induce purchase preferences. Section 3 describes the way we represent relevant information to identify and characterize items, transactions and prototypes. Section 4 presents the data mining process that builds prototypes from transactions, and section 5 how the overall procedure has been validated. Finally we explain in section 6 how to use our approach within multiagent simulations.
2
Context of the simulations
The knowledge discovery process we propose in this paper has been tested within an existing grocery store simulation [7, 11], endowed with a realistic environment and populated with behaviorally convincing artificial customers. 2.1
An Interaction-oriented model
In this work, we had to acquire expertise and build the simulation model in a quite incremental and empirical way. Thus the modeling method had to be highly understandable by experts outside the field of computer science, e.g. psychologists or marketing advisors, and enable a step-by-step design of behaviors. Therefore, we used the principles of the ’Interaction-Oriented’ approach [12]. In this method, each relevant entity of the model is represented by an agent, without any prior distinction between “true” agents and resources or objects; each behavior is modeled by a separated piece of code called an “interaction”, i.e. a sequence of actions between a source agent and one or more target agents, controled by some conditions. The interaction is realizable if the source and target agents fulfil the conditions.
4
P. Mathieu, S. Picault
Agents and interactions can be developed in independent libraries, then the simulation is designed by assigning interactions to pairs of agents (in an ’interaction matrix’ – see table 1). It is a visual way to express what families of agents are allowed to interact, and with what interactions. This interaction matrix is processed by a generic simulation engine, which essentially evaluates what interactions can be realized, between what agents, and subsequently determines the actions to be performed by each agent. Actually, describing the action capabilities of the entities of the model in terms of interactions that can be performed or undergone, instead of behaviors embedded in the agents, is very close to the theory of affordances [13]. Thus, it makes this quite natural to use for psychologists and facilitates knowledge acquisition. 2.2
Model of a customer agent
Table 1. The interaction matrix that defines the behavior of all agents in our simulation. For instance, the intersection between line ’Customer’ and column ’Item’ contains two interactions, ’Take’ and ’MoveTowards’, which means that a customer agent may either take an item agent, or move close to it, depending on the priority (first number – here, taking an item has the highest) and on the distance between agents (second number), assuming that the conditions of the interactions are fulfiled for both agents. The 0/ column contains reflexive interactions (i.e. where the target agent is the source itself). Empty columns and lines have been removed. PP
PP Target PP P
Source
Customer Item Sign Checkout Door
0/
Customer
Wander (0) GoToPlace (1, ∞) Notify (1, 10) Notify (1, 10) Open (10) Notify (1, 15) Close (10) CheckOut(7, 1) SpawnCustomer(1) Notify(1, 10)
Item
Checkout
Queue
Door
MoveTowards (2, 10) MoveTowards (3, 10) StepIn (5, 2) Exit (8, 1) Take (4, 1) MoveOn (6, 1) WalkOut (7, 1)
Handle (8, 1) ShutDown (9, 1)
In the work presented here, we use a model of customer behavior that was previously developed for the simulation of a retail store, aimed at studying human vendors confronted to artificial clients [11]. Thus, the overall behaviors of simulated customers have been validated by marketing experts and are set once and for all in the work described here. In what follows, the vendor was removed for studying only the clients. The corresponding interaction matrix is shown on table 1. To summarize, it is assumed here that all clients have the same overall behavior, but differ in their needs: thus they are endowed at startup with a shopping list, which specifies more or less precisely what items they are likely to buy in the store. The needs may be specified with accuracy, e.g. “SodaCola light, family pack”, or with vague indications, e.g. “spring water”, which can match much more actual items. The purchase decision is implemented by the ’Take’ interaction, which consists of two conditions (the target agent matches an item of the shopping list of the source agent; and: the source has enough money) and one action (put the target in the basket of the source).
From Real Purchase to Realistic Populations of Simulated Customers
5
During the simulation, interactions occur according to the interaction matrix and the state, perceptions and positions of agents, producing a consistent consumer behavior: artificial clients try to find all items which figure in their list, within a limited amount of time. They may be endowed with a mental map of the shop or with no prior knowledge; they also are notified by panels, checkouts, items, etc. about relevant information to help them. In the original work, the shopping lists were either computed through a pure random process (following a Poisson distribution based on the average basket size), or implemented “by hand”, according to a specific mise en sc`ene, in a deliberate intend to put the vendor in a problematic case. In the experiments described below (see section 5), those lists have been built through the knowledge retrieval algorithm we propose.
3
Knowledge representation
Before describing the data mining process which builds prototypes from transactions, we must explain how we represent both kinds of information. This step may require the intervention of a marketing expert, but afterwards the knowledge retrieval process is automatic. 3.1
Items identifiers: from SKUs to meaningful descriptions
In retail stores, each unique product is usually identified by a “Stock-Keeping Unit” (SKU) in order to track availability and demands. The SKU does not necessarily carry any special meaning regarding the nature and characteristics of the product. Other methods, such as the Universal Product Code (UPC), European Article Number (EAN), etc. can be used as well. It may also happen that the actual purchase are anonymised through an automatically generated identifier, e.g. in order to perform basket analysis under strong confidentiality constraints. Those identification methods are actually not well suited for extracting more than co-occurence rules. Relevant marketing knowledge (e.g. product family, quality, relative price, brand image, organic label...) must be added to characterize the products so as to allow an explanatory analysis. In our approach, each unique product is identified by a tuple of strictly positive integers, which encodes the features values that are considered relevant in the application context. For instance, if the relevant features are the brand, the product family and the details (e.g. respectively “SodaCola”, “beverage”, “soda with cola”), then products will be identified only by a triple of integers, e.g. (31, 4, 15). This allows a representation of all products at an arbitrary fine level, including specific labels such as “organic”, “fair trade” or “gluten-free”. Also, continuous values such as the price or weight may be encoded through a prior categorization (e.g. 1 for “cheap”, 2 for “average”, 3 for “expensive”; or 1 to 4 for small to extra large packings). To some extent, the mapping between SKU (or other identification systems) to this kind of integer tuple can be performed automatically, through an appropriate join of databases. Yet, the selection of features that have to be taken into account for providing a relevant description of the products may involve a marketing expertise.
6
P. Mathieu, S. Picault
3.2
Transactions and prototypes
Our data mining process relies upon the recording of actual purchases. The simplest way to do this is to retrieve information directly from the receipts. A transaction can be computed as a mere enumeration of the unique products of a receipt. Quantities are not taken into account as such (exactly like in the classical affinity analysis process), though they could be added as a trait of each item (like other continuous features, see above). Thus, from a list of SKU, we build a set of integer tuples. From those transactions, the knowledge retrieval process consists in building prototypes, which are aimed at describing an “abstract receipt” so as to characterize clusters of receipts. In order to do so, we introduce the concept of prototype item, which are also integer tuples, but allowing the value 0 as a wildcard. For instance, a product characterized as “any brand”, “beverage”, “soda with cola”, could be described by the triple (0, 4, 15). The null tuple (0, 0, 0) means “any item”. A prototype is simply a set of prototype items. Those prototypes, built from real data, may also be used as a “shopping list” for simulated customers, because it may often happen that only few traits of the desired items are specified. For instance, Mr Smith always buys “soda with cola” but is indifferent to the brand, while Mr Wesson is likely to buy any organic yoghurts from the brand “Yoopla”. The use of the 0 wildcard is very helpful for expressing such vague wishes. In the next section, we show how such prototypes are actually built from the transactions.
4
Steps of the data mining process
In order to analyse the purchases, we proceed as follows: 1. The transactions database is partitioned into clusters (this requires first to define a distance measure between receipts, which is itself based on a distance measure between items). 2. For each cluster: (a) all items that appear in the transactions are in turn classified so as to build prototype items ; (b) the prototype composed of the union of prototype items is scored against the transactions of the cluster. 4.1
A measure of item similarity
Since some items of a customer’s shopping list may be not fully specified, it is expected that several customers who have the same prototype (i.e. the same shopping list) will not get exactly the same items. Thus, if the distance between transactions relies only upon the equality of items, it is likely to produce a defective clustering. Instead, we propose to modulate the comparison between transactions, by taking into account the distance between items.
From Real Purchase to Realistic Populations of Simulated Customers
7
A simple way to do so is to compute a Hamming-like distance (or conversely, a Hamming-like similarity index). If two items are encoded by the tuples I = ( f1 , ..., fn ) and I ′ = ( f1′ , ..., fn′ ), the similarity between items is defined as: σ (I, I ′ ) = 1n ∑ni=1 δ ( fi , fi′ ) where δ ( fi , fi′ ) = 1 if fi = fi′ or fi = 0 or fi′ = 0, and 0 otherwise (note that this definition works fine for actual items and for prototype items as well). 4.2
A measure of transaction similarity
In order to compare transactions, we started with a well-known measure which is frequently used for measuring similarities between sets: the Jaccard index [14]. It is de| |X∩Y | fined for any subset X,Y as follows: J(X,Y ) = |X∩Y |X∪Y | = |X|+|Y |−|X∩Y | As it has been previously said, the rough use of the Jaccard index cannot fit our needs, since it would make no difference between disjoint sets and sets that contain very similar but different items. Thus we propose an extension of the Jaccard index, based on the distance between items. It consists in computing the best matching score between items of both transactions (thus we called it the best-match Jaccard index, denoted by JBM ). To compute it between a transaction T = {I1 , ..., I p } and another transaction T ′ = {I1′ , ..., Iq′ }, we follow these steps: 1. Compute the matching matrix (σi, j ) with σi, j = σ (Ii , I ′j ) 2. For each k between 1 and min(p, q): (a) compute µk = maxi, j (σi, j ) = σi⋆ , j⋆ (if several (i, j) values verifyµk = σi, j , we take one pair (i⋆ , j⋆ ) which minimizes: ∑i6=i⋆ σi, j⋆ + ∑ j6= j⋆ σi⋆ , j ) (b) replace (σi, j ) by the submatrix obtained by deleting row i⋆ and column j⋆ min(p,q)
3. µBM = ∑k=1 µk plays the same role as |X ∩Y | in the classical Jaccard index, so µBM we have: JBM (T, T ′ ) = p+q−µ BM For example, we take T ={(1, 1, 2), (3, 5, 8), (13, 21, 34)} and T ′ ={(1, 1, 2), (3, 6, 8), (12, 13, 14), (1, 1, 34)}. Since T ∩ T ′ ={(1, 1, 2)} only, we have J(T, T ′ ) ≈ 0.1666667, while JBM (T, T ′ ) = 0.4 because several items of T and T ′ are close. A large number of similarity and distance measures can be used as well [15, 16]; actually, the method we propose is also suitable for frequently used measures, such as Ochiai [17] or Sørensen-Dice [18], which can be extended using the same best matching algorithm as we did for Jaccard. This point was checked experimentally through the same procedure as we present in section 5. 4.3
Transactions clustering
We apply the best-match Jaccard index for computing a distance matrix between all transactions in the database: ∆BM = (di, j ) with di, j = 1 − JBM (Ti , T j ). The distance matrix can be used with a large number of clustering techniques; we chose a very classical hierarchical clustering algorithm, namely that implemented in the flashClust library in R [19], which easily provides dendrograms w.r.t. the similarity between transactions.
8
P. Mathieu, S. Picault
Then, the dendrogram can be cut into K classes (e.g. with the R function cutree [20]). The appropriate value of K is far from obvious, so we tried to evaluate empirically the appropriate height to cut the tree from randomly generated prototypes, where the value of K was known (see section 5). We found that a height h ≈ 0.6 was quite convenient in all cases. 4.4
Prototype induction
Building a prototype for a class, means essentially finding a set of integer tuples (with zeros allowed) which has the highest matching value with the N transactions of the class w.r.t. the best-match Jaccard index. We start with a frequency analysis, i.e. for each item I that appears on some transactions of the given class, we compute f (I) as the ratio between the number of transactions that include I and N. Rare items, i.e. with f (I) < ε, may be considered casual purchase, or “noise”, and simply discarded (typically this works fine with ε ≈ N1 ). Conversely, very frequent items, i.e. with f (I) > θ , may be considered a must-have and kept unchanged in the prototype (typically θ ≈ 0.95). Regarding intermediate items, we have to classify them again, in order to be able to detect that e.g. “SodaCola” products are always associated with “organic yoghurts” of several brands. Thus we compute a matrix distance between items: (Di, j ) with Di, j = 1 − σ (Ii , I j ) and use it for building a dendrogram of the items. There again, the number of classes KI is not known a priori. Therefore, we iterate the following process for several possible values of KI : 1. for each cluster: build a prototype item by putting zeros where features differ ; for instance, if the items in the cluster are (1, 5, 7), (1, 6, 7) and (1, 12, 7), then the prototype item is (1, 0, 7) 2. collect all prototype items and join them with the very frequent items (see above) to build the candidate prototype PKI 3. compute the score of KI as the average value of JBM (PKI , T ) for all transactions T in the original class. Finally, we keep the value KI⋆ (and associated prototype PKI⋆ ) which maximizes this score.
5
Validation of the data mining process
As explained before, prototypes are used in a simulation process to produce artificial transactions. Since the agents perform autonomous behaviors, according to their shopping lists which contain prototype items (i.e. with wildcards), and in the situated context of a realistic store with possibly missing (or hard to find) items, the transactions that occur in the simulation are not expected to be exactly the same than the real ones. Yet, we have to assess that the simulation is able to reproduce the same kind of customer behavior that is observed in reality. Therefore we can analyse the transactions produced by the simulation, build the corresponding prototypes through the same data mining process, and compare them to the prototypes that resulted from real data.
0.4
0.6
0.8
1.0
0.2
0.4
0.6
0.8
Dendrogram
(b)
Height 0.0
0.2
0.4
0.6
0.8
507 508 506 505 501 526 519 520 511 523 502 528
504 509 527 513 515 521 503 522
518 524 525
Dendrogram
(c)
21 6614 52 70 35 87 8 65 23 85 4 49 2 44 65 27 40 55 7853 71 33 34210 86 1 58 36 22 18 12 67 8 2 48 9 80 31 11 13 72 68 57 88 5 4 74 47 73 93 4 92 83 95 17 2659 16 9691 39 56 98 30 895032 46 63 79 19 69 25 2484 29 51 9997 5 38 34 61 94 37 76 75 90 43 81 60 15 20 4162 28 100 77 7 64 222 271 230 285 206 243 227 265 297 223 217 293 263 276 258 281 287 202 218 242 292 283 240 235 254 290 228 275272 209 251 232 249 248 277 241 262 273 298 247 289 278 208 221 2 55 246 282 211 267 219 261 205 237 210 201 260 203 299 266 284 256 252264 279 225 294 257 229 234 300214 207 288 236 295 238 259245 250 224 226 216 213239 270 212 280 233 253244 274 204 286 231 291 220 268 215 296 269 453 408 457 485 487 452 492 481 411 470436 416 455 419 429 412 473 407 479 489 431 459 490 500 417 424 496 443 486406 414 456 468480 432 445 422 464 465 498 403 410 441 440 475458 409 434 447 418 448 499 425 437 420 450 467 471 478 494 474 483 482 462 484 401 404 491 415 461 460454 438 469 413 472 449 495 423 421 428 430 433402 488405 442 477 444 446 466 476 426 451 427 435 497 439 463493 133 195 172155 166 187 110 118 150 193 138 141 165 148 164 130 176161 169 175 131 158 168 198 122 136 124 177 197190 146 117 199 137 103 139 153 107 191192 160 112 189 132 145 134 185 114 101 126 125 196 144 113 104 106 135 188 183 173 184 154 123 121 156 181 115 102 128 129 147 109 171 1142 20 200 105 116 162 167 170 182 127 178 108 152 157 143 180 186 111 119 194 151 163 140 159 174 149 179 363 312 313 352 330 358 368 393 346 351319 325 365 348 362 309 302 306 335 308 360 397 321 383 332 334 329 388 395 304 364 339 373386 372 385 343 369 376 399 328 347 370 371 375 307 378 387 394 317 318 398 320 323 400 316 342 344 382 353 361311 305 381 337 389 354 303 338 301 315 357 326 390 359 396314 380 377 310 345 350322 379 333 349 340 341 366336 355324 327 374 391 392331 356 384367
512
510
514 517516
From Real Purchase to Realistic Populations of Simulated Customers
Height 0.0
6642063 2587 192 2990 1042 2896 1504 1974 2482 2558 3162 259 2097 2082 1152 1364 411 2203 576 1767 2342 2871 2786 1916 2224 1671 2538 2517 3168 1710 1525 2951 3106 836 1088 244 412 428 212 2288 220 1134 617 1004 2118 341 2829 1 653 2242 840 1261 194 1913 1743 3115 1927 2914 2380 887 2693 2687 3169 2045 1706 3191 1369 18 683 642 1113 2765 675 1166 2314 2407 1476 2825 1911 195 898 2356 2462 647 1940 2960 1647 681 2909 2544 2804 1834 1959 197 238251 396 2416 308 2463 22561 62 707 515 1292 2034 2217 1 902 1040 3069 2256 257 3199509 70 222 293 1812 405 1954 1263 2601 2272 2396 283 1615 1661 675 1140 1813 1642 2956 1835 1441 2120 990 1481 756 779264 178 975 985 1178 2534 44 2019 2596 2604 599 1656 1264 1870 504 749 231 1068 1309 2414 8284 242 215 3156 976 1532 2 099 2659 2337 2232 2627 374 2585 1 284 1556 1054 1213 43 758 2237 2893 1942 2060 327 1530 1033 1039 2736 1346 30551599 302 3146 949 1874 1050 1059 235 2920 2141 1236 922 1543 2067 216 2044 1624 1772 2346 1008 1925 1183 1394 564 3044 554 1077 1783 1872 2137 2318 3104 1890 2644 414 2490 2931 3067 158 2029 26 38 369 906 2182 162 1689 1 149 1838 1881 475 2325 2154 973 1845 3010 1922 2386711 316 2971 947 1108 1295 735 1253 2726 139 465 1173 1027 367 1725 2236 820 135 1828 2630 2234 968 2481 813 3077 986 517 2977 2992 294 930 2899 357 3058 2235 763 2762 1583 2682 149 3017 1745 1938 680 965 1937 908 1486 2746 3054 1470 2611 3186 368 2068 1492 1917 1632 2504 257 624 769 1546 904 1353 2014 2941 425 566 2824 2939 3084 2729 943 2929 494 2102 951 1224 1466 2984 2975 2266 2855263 1019 1055 1822 2963 1436 1161 1205 1516 2219 3026 2519 2979 2844 2974397 690 1318 46 250 80 2945 896 3189 3020 1967 3028 2962 1851 2695 1 607 3160 933 1063 3184 1956 2017 125 254 444 807 1089 2477 2713 2803 2675 2738 431 1499 834 2867 353 2194 2998 867 1330 104 551 2105 2 406 2350 2665 2339 2768 1294 956 2289 417 753 2424 528 544 2618 874 2111 3008 1798 330 1190 2721 888 2352 1781 384 2381 1453 1852 710 37 2514 1397 2767 1445 23481427 967 2859 9 1045 1237 639 659 761 3078 892 1320 1862 688 1639 1323 1893 2502 2569 1242 2005 12955 758 123 1551 1383 1663 438 2995 1107 2640 2617 2324 1121 3083 2458 2589 2428 624 1975 1018 1137 1879 1485 2579 2599 1900 2280 2069 2781 3196 394 745 2226 7 22 505 2255 243 1757 142 236 386 1774 241 1086 2982 1892 2331 486 1372 2774 800 1608 2507 2715 787 1433 1549 2146 1550 1747 1764 2933 416 1248 23065 663 228 253 2870 23019 916 300 60 668 2882 377 2669 731 2479 2127 1030 2074 3041 1795 1327 2389 1904 2523 3125 1971 3085 1399 2653 484 2718 146 1939 1299 2714 2185 830 2894 1184 2292 2316 895 388 850 1303 1849 11 1109 1508 2830 2841 531 2070 2935 1701 2547 2648 319 1483 2908 645 2177 654 746 1558 1987 1096 1272 2271 176 1012 1003 1017 778 1765 558 3166 2847 608 1375 4 841 1825 2811 3087 40 2405 1740 1317 1731 2430 2484 402 1104 928 2053 2250 2937 1381 496 970 1367 945 1707 1832 2501 1842 832591 572 2862 1186 2705 1875 2925 1310 1962 992 1037 2225 3130 1720 2436 3792658 2281 1542 532 437 69 1823 480 2708 1794 2731 3113 2622 387 1735 1928 2717 1335 2969 1442 3037 481 3050 2228 164 206 2305 2302 2441 507 607 944 1339 2175 1670 755 1930 2869 1439 1616 2186 22571826 2290 404 940 2950 454 573 1752 17 1304 3128 1534 2103 1880 2052 32 2915 2522 2806718 857 2176 770 1098 1536 574 210 2275 2836 1082 2207 290 651 1540 809 2938 1270 1630 1837 1722 3136 2311 137 271 78 321 1469 2900 180 2593 2371 2525 1960 3098527 822 269 1196 747 1562 1201 2131 143 1329 852 1677 2754 808 2089 11949 064 1792 1564 3040 2880 256 47 1250 430 1579 1382 2719 991 12 245 998 7 86 794 382 1620 1746 2168 432 2152 2307 1021 1420 2584 1 800 30312756 1538 487 1079117 2637 705 2306 657 1810 371 1882 1978 1555 1692 806 783 1131 217 1288 897 631 2499 1182 2166 2101 288 730 1868 2229 2512 1561 2858 1262 2652 2025 1419 1610 1600 172 2957 495 1484 1488 1301 2560 1498 1696 1202 2890 1151 153 913 1628 1345 1918 1601 2066 48 2865 2796 1278 1885 2750 2184 508 1114 2326 1120 1535 1430 177 1618 1733 3157 772 42 221 2734 15 25131990 2180 342 2641 165 560 866 1362 9115 2247 2764 2555 1779 2351 2470 460 669 1464 2887 373 1176 869 2061 335 632 1714 1948 1277 27882155 2309 418 2942 383 1804 488 2295 2073 1443 2744 2338 1544 2013 3061 3080 1143 102 21 1116 1985 2940 846 868 3141 652 798 2997 278 1816 611 2773 2839 604 1332 536 1728 569 2388 1300 161 592 3122 2541 2733 640 1319 2364 1594 2113 1750 1438 1963 2215 2656 3097 2264 2083 2993 1 48 1865 7171907 855 491 2 286 1207 2794 1296 277814 251 1691 167 1233 1524 1155 555 633 2494 2415 2581 2320 1889 36 623 2459 1477 520 537 1658 2710 2874 2873 1171 2655 1254 2128 1282 1958 2167 912 919 3094 2651 8 1245 1643 955 516 1259 1894 3182 3074 1209 76 122 701 1705 348 953 1729 2161 1541 2864 646 856 663 1153 2149 1596 1888 865 2885 109280 55 901 152 199 2270 3185 2590 2845 1220 24 612 1226 828 1991 2489 95 734 2039 525 2818 861 1170 1631 2666 119 981 2769 1719 3180 3072 1866 1869 3171 2385 2716 462 2686 129 420 1302 853 1644 2711 401 744 740 1821 2159 2691 506 1545 1396 25392964 613 538 1356 803 1449 1859 163 1276 829 2930 1699 175 239 1943 2822 999 2570 1749 2571 1133 2790 2723 20882926 542 2093 1289 1410 2273 51995 1920 1103 941 957 1260 2003 1239 2533 1337 2282 406 2603 1366 939 2378 12402 412 1043 952 1478 1863 2528 1298 2819 1500 2041 3029 2654 2493 3110 724 2301 2798 3013 2518 3073 1075 2973 2932 1435 1574 1083 1931 3076 2668 2793725 2072 1667 445 1011 1101 29 3093 423 448 1081 1297 2875 2204 2431 875 1945 1313 927 1078 1379 227 317 1724 1844 1984 2465 2179 2983 1650 1801 501 1188 63 2837 1340 2958 11858 651 183 1144 1737 709 2347 1069 1243 1748 99 1093 24432065 2265 2058 26492742 291 814 1848 2610 2133 2727 1423 1754 3107 2018 2379 2605 160 2461 1361 2553 2008 92 2438 31 1814 1921 3082 1046 1128 1047 2857 1405 625 1234 1831 2451 2813 1727 605 1391 2233 1341 1578 2417 2469 776 2683 2142 381 37 15821567 1871 2548 2464 1041 2355 723 2384 815 1843 2104 191 1090 1167 413 2004 410 557 1652 2816 1057 2626 643 2427 1846 565 1533 1495 359 500 860 1462 1654 609 2285 1235 2759 719 2680 863 854 1126 588 1452 1807 1977 824 2524 435 2792 1487 1451 2801 2888 559 224452 1279 2492 364 2706 497 1136 3190 980 190 2009 799 2473 3005 1877 436 407 716 2080 1590 762 1460 2197 2868 1563 2586 1895 543 1903 696 1191 2021 1531 3133 3167 2800 49 1148 1229 314 356 2485 1147 739 1680 1 05 1480 2707 1467 22601521 987 672 630 1981 593 687 1097 2390 1378 114 2279 3 47 424 193 2552 492 1613 1458 1588 1734 2328 1393 935 1611 3101 2614 10 426 1673 849 3066 303 2546 274 1117 2886 2972 2536 899 170 936 1527 2515 1007 1223 272 3172 2577 1355 547 2919 3109 498 2452 1791 203679 338 391 459 2511 3138 403 703 322 785 419 873 336 715 790 1702 1517 2245 1336 638 1897 2632 616 19572124 2075 2206 1172 1519 511 2486 1266 1512 400 2238 1169 1979 2211 218 2789 1479 1307 2480 2031 2147 1342 1461 2483 1371 2802126 1392 2212 267 1424 2595 346 22 3036 2787 226 2861 182 118240 2852 2304 2308 1269 1657 224 2843 3198 39 881 325 977 1216 1365 816 1780826 1187 1909 1400 2620 577 1106 1857 1513 1773 1951 582 1668 1409 2709 2775 156 916 2449 3165 2684 587 2132 1992 2035 2062 2361 2989 1739 2336 145 2815 121 26741174 2100 2542 27581674 1228 2672 819 1238 929 1510 1782 134 299 1738 1537 1697 765 1711 3108 695 1350 479 2616 238 366 729 934 545 1641 979 1715 2012 2333 1659 2784 1173 1820 842 1769 523 261 1204 2985 1363 1883 514 782 1221 502 3147 324 526 959 529 530 2253 884 22021493 1016 1062 2496 2623 996 909 693 827 202 1306 1402 490 1672 3053 1732 1360 2959 1351 2216 1926 2598 265 1418 1283 200 1570 2751 2823 1999 2913 767 1473 328 19 946 315 2474 2895 3143 1789 2907 2162 2988 585 1741 246 439 1157 2724 301 674 879 2510 101 2130 1271 472 2516 2832 764 1162 61 266 2110 2447 332 2426 361 570 1785 2468 1763 247 1447 1029 1584 2457 25561753 915 229 1759 1217 2987 1026 2071 1617 1929 2898 1291 2055 11708 568 2976 1770 2866 2737 469 1860 627 2126 1626 1703 433 1598 2612 1225 2435 673 141 329 1645 3033 252 1965 2568 2334 468 1946 2606 1623 2129914 1118 2050 349 1968 2200 2418 449 1482 1867 2676 1087 1501 784 2341 2300 2526 3014 2905 1333 1604 3178 188 1955 590 2850533 1168 2608 25081348 2943 2027 2838 591 2965 446 1833 636 1125 948 1208 3154 1 678 2358 393 477 2809 320 2201 629 237 339 1518 2967 2391 1589 2442 2872 3090 864 1344 2671 2741 1119 821 1256 2403 1456 2139 1718 2410 1685 2602 2891 1387 2703 2115 340 2678 1786 2049 2688 331 96 2096 1966 2924 635 1547 535 691 2573 307 2936 1246 1471 1395 2095 1840 147 1240 2432 201 1060 2254 886 326 1368 1514 2114 2024 2897 3002 89 1571 2098 1666 309 811 2849 2884 1572 757 1560 3021 1761 2520 2183 443 1806 995 2981 305 3012 2952 282 1953 2434 1193 1352 1325 225 586 2419 775 2319 1474 2037 1575 2777 2712 1415 2192 2860 1818 1712 843 1421 2638 2360 87 1334 2509 2340 1676 284 1061 1377 3145 2297 1247 2056 3197 2743 911 1163 292 1634 1941 464 1428 2785 2740 1648 1884 773 1694 2600 1322 1384 2689 2471 2467 111 174 1688 2231 7 66 679 463 399 1910 3 58 273 2770 2444 2010 567 3192 2633 1526 876 1222 1633 1961 2051 962 1994 2222 2643 499 13 276 1805 1597 3022 2700 1591 3174 62 422 1554 296 2287 2685 3068 209 1552 805 2745 2094 3193 556 562 1606 2854 65 1784 2125 3183 1972 1686 2820 1401 3179 2136 2701 2267 2158 658 2551 1573 795 1811 2399 1619 2365 323 112 989 11177 4402109 334 2043 136 450 352 950 1936 1357 2269 2283 1629 2720 870 2856 1388 2367 2853 2108 1934 1557 28781411 5391723 1730 1878 1331 2532 1682 2163 1494 2375 1084 1454 1775 1964 2195 2564 2776 2033 1853 2170 1976 3095 204 844 2048 2421 1386 1001 1425 1716 2826 1407 2171 2227 1416 3135 2748 3149 3045 1824 3027 2223 3148 682 2928 2312 1085 2335 2883 2978 2362 2251 1625 138 1679 3105 942 994 1873 546 1199 372 2771 2922 3175 310 972 1417 2828 2580 1146 2970 1717 1343 1528 2646 2387 3060 1581 53 670 130 2376 6 568 1 51 584 1052 2621 1993 2567 926 1437 441 743 2411 578 600 1265 24401051 2214 620 789 21036 917 179 3127 593024 614 698 3176 72 802 2011 1490 2563 181 28 3116 2220 1444 1998 198 71 3099 285 287 1520 1609 1756 984 297 2980 1275 3088 1839 777 1799 2466 997 1446 1509 2377 150 1145 1693 2797 634 7102500 2357 2398 91 113 350 2472 268 442 563 1506 1459 2575 23153 757 1891 2150 1690 2209 622 1952 1192 73 157 2085 3052 2112 1338 2370 2799 1308 2747 2148 2566 2892 2007 2028 2079 1847 2877 720 721 3150 2369 1829 2808 1850 2699 2425 2193 2160 311 2549 918 2827 925 2354 2679 345 902 1316 2609 534 580 1660 2647 380 771 1760 2291 1290 2498 2506 871 1091 1713 75 637 3039 85 1553 3158 2923 1180 2345 589 2673 699 552 1219 1076 2284 74 1924 650 1970 31649 119 3103 2263 969 2274 2393 893 900 312 2535 1206 2576 3051 823 1698 697 2879 1463 2246 2557 42645 73 1071 2791 1565 1000 1622 3132 365 2084 318 1515 3187 1214 2221 2187 3043 389 727 1855 2761 1548 11811700 258 2642 2450 1135 2116 583 655 3032 2032 1432 512 20462766 2760 561 2543 2445 155 1010 3131 2550 2588 1025 2495 385 94 1112 792 1595 2144 26342353 1009 2530 3177 351 2213 451 1129 248 86 306 3064 648 903 1159 741 1755 2091 1158 1742 1602 2239 41408 55 2488 2639 103 3151 503 1403 3025 3124 1 200 2821 1431 1390 159 732 2181 2401 2755 2722 1070 1434219 1429 2022 1827 1776 2540 1404 1566 2934 313 25 774 2 863 838 2889 392 3120 2057 2848 1035 2059 2344 2814 3140 2117 2248 2876 736 2299 2140 1986 2961 3797 030 2446 2392 457 1559 1067 2559 35 2092 482 714 700 1503 752 1919 3063 2947 1102 2042 397 1586 540 982 081 575 1646 2 968 603 127 2408 510 2261 1593 2077 279 2157 64 1726 833 1933 3007 344 2704 2 278 354 2763 1465 1681 1635 2090 2363 1569 2531 3195 1932 2657 780 1778 1100 1448 232 932 1267 1111 1905 3 056 678 847 1210 1376 831 1796 825 1640 1080 2635 708 2702 2332 2368 2038 1195 3117 1185 3114 796 1389 6 06 610 3188 676 376 1472 1028 2078 295 553 1354 2991 2670 931 954 2592 2851 1603 2783 889 168 628 2906 1809 2491 7 68 2554 1286 1935 1215 2487 1370 1856 2503 548 1886 1996 748 1022 1274 185 2565 2782 124 880 2121 2323 1251 1766 3134 1475 108 223 1665 203 1803 2690 3102 1073 2681 106 733 2413 602 1105 1901 3164 1049 726 1212 3046 2151 120 375 260 1704 68 2313 1997 2310 3038 2412 1095 3121 2780 1324 1621 2537 1906 3048 1915 45 810 3004 2016 133 1092 2327 1150 2054 1655 2437 213 2153 541 2650 2460 2383 3071 1771 1522 3075 1507 1491 781 3092 3023 1422 1074 427 706 3015 1819 1989 6 601110 619 207 378 728 2772 2453 1973 2583 1969 1189 2881 8 39 598 1281 1 69 597 3137 2994 2321 2999 2294 1896 1768 1683 1636 110 461 2047 1983 1876 1130 907 817 751 1950 493 960 1 402400 415 594 2954 3 1287 2218 2619 1580 2191 804 1489 1141 2277 1024 30 1005 2026 1258 1160 3126 684 2064 2296 3111 3070 3001 1614 1751 1529 453 851 923 33 878 704 41 98 2840 1861 518 1034 702 1154 56 2521 3096 1255 208 2903 2366 470 1006 3152 54 1099 3144 2000 2433 23 3159 2404 894 2911 128 1787 2190 2268 3 006 2625 3129 2 249 649 694 360 818 2258 3091 1138 1232 1426 6011218 2040 2173 653 2015 1002 2293 2779 27 1257 1058 2698 1 044 1197 66 656 2529 211 1014 2904 667 166 519 1326 3173 1914 626 1777 97 2953 1066 2572 2831 2834 395 1797 2259 2455 877 1115 2597 2317 1311 1505 2835 1038 2178 2276 2119 2662 100 429 521 1815 2243 2409 1947 1312 1695 920 1523 1031 2169 1802 483 595 304 988 2189 2002 3049 3891 112 16 665 2423 641 662 2817 2 578 1887 801 882 2 343 2298 1912 2349 3 170 692 298 1721 145 196 1175 2422 2 329 275 2986 1156 2135 1592 2230 1231 2631 2739 50 938 1315 1347 921 2948 1664 2505 2545 184 3057 677 2478337 1577 270 333 1349 476 1244 685 859 1065 249 1122 759 2286 582 474 2448 1836 671 1123 2 728 2475 3047 3123 513 712 1053 478 1539 2165 2628 1132 93 1020 2 210 2006 3100 214 3200 1864 596 1015 1314 549 2439 1230 1496 2174 2901 793 458 1511 77 1808 1273 858 1898 1385 255 421 2374 2753 689 1788 2805 3194 363 963 2661 2846 2106 3118 2076 2629 971 2810 466 812 2205 2497 186 1124 571 2023 2696 2725 2198 3062 408 467 835 2020 1687 2372 1638 230 1293 1502 1359 2697 1321 2574 644 233 1980 2667 187 2456 205 1627 107 550 1450 2394 2303 1094 832 1398 1252 485 1023 7 3003 618 132 2134 3034 3142 2749 2138 2692 1908 2812 2795 1457 1127 2322 8 83 905 131 277 2199 20 1662 452 90 848 1203 2454 3018 2395 2315 1744 983 2359 917 2087 2660 837 1227 1709 67 1587 7 38 1605 3079 2107 1374 1164 440 2664 1684 154 362 978 2607 81 1899 2143 974 1830 1373 234 2921 1988 1736 2123 2833 2842 2594 456 754 2240 1612 2636 3016 1048 2001 281 966 2420 2429 1669 1139 666 2164 2944 2677 291888 2912 1793 2927 409 522 581 1249 791 1013 1194 1305 2966 171 447 2188 2562 615 2730 2208 3011 1576 2241 3163 1358 1406 2949 3081 1198 2172 3161 3035 993 1280 1817 621 1285 2373 1762 144 2732 1944 964 1585 1211 1661 3086 937 1032 1072 3089 2910 1380 434 1179 2196 2615 390 3009 58 845 1328872 1413 1982 3139 2086 742 961 686 750 2030 21923 946 885 355 34 3181 862 1790 2262 1165 2735 579 1414 713 2902 1241 3155 189 524 471 2613 3 98 2527 2996 289 1497 370 1142 788 2752 2807 2156 2252 1854 2476 13042 16 2122958 1468 2694 343 1268 890 1455 2330 489 1056 3059 1637 1841 760 924 3000
9
However, a first step towards analysing the outcome of the multiagent simulation is to prove the robustness of our prototype building method. Otherwise, possible differences between the prototypes that reflect the activity of the agents and the original ones, could not be explained by the behavior of the agents or the characteristics of the environment, but maybe just by a high sensitivity of the data mining process to perturbations. Thus we present here how the robustness of our analysis method has been tested.
0.2
Dendrogram
(a)
Stochastic simulations of prototypes instantiation
Fig. 1. Dendrograms built according to the similarity of transactions in 3 experiments. Parameters: (a) NP = 4, NI = {5, 10, 20, 40}, NT = 200, NA = 5 %, NM = 5 %, NO = 0; (b) NP = 8, NI = 10, NT = 400, NA = 5 %, NM = 5 %, NO = 0; (c) NP = 5, NI = 20, NT = 100, NA = 5 %, NM = 5 %, NO = 5 %. The horizontal line represents the cut height (0.6).
5.1
In order to perform extensive tests, we generated several sets of prototypes, each one containing random prototype items. Then, in order to obtain a coarse-grained, but quick, simulation of the purchase induced by such prototypes, we instantiated them through a stochastic process, based on the following parameters:
– the number NP of prototypes to test – the number of prototype items NI (i) in each prototype i – the number of transactions per prototype, NT (i), which determines how many transactions are instantiated for prototype i – the number of additional random items NA (i) which indicates how many randomly chosen items are added to transactions of prototype i (we do so to represent e.g. casual or compulsory purchase which are not necessarily related to the prototypical purchase habits) – the number of missing items NM (i) which indicates, in prototype i, how many prototype items will be kept uninstantiated (this offers the possibility that some items may not be found or not be needed) – the number NO of transactions that do not belong to any prototype (and generated completely randomly).
The “instantiation” of a prototype item consists in replacing each zero by a random, strictly positive, value. Thus this operation depends on the domain of the tuples that
Height 0.0
483 542 379 599 420 625 645 154 33 316 196 141 560 569 707 216 279 289 418 523 543 475 694 715 377 571 680 82 180 274 533 566110 619 698 574 735 198 106 419 563 774 140 265 111 713 737 41 349 215 237 515 327 609 526 66 246 43 153 671 678 643 120 656 497 663 218 283 22 299 644 720513 8 56 509 182 44 547 247 116 408 103 261 145 186 286 244 57 512 189 452 796 212 217 275 164 752 143 321 459482 702 584 494 602 73172 455 578 2 20 284 480 800 407 76 736 428 556 626 403 444 538 343 440 100 424 426 262 502 367 425 605 409 173 748 133 381 750 469 313 397 206 142 570 658 641 788 203 229 466 537 635 301 356 298 465 567 315 340 514 170 209 354 79 765 795 740 467 572 267 777 199 21 31 151 149 414 434 783 329 78 302 5 49733 699 339 431 762 34 136 767 172 75626 450 147 668 266 516 317 287 594 188 306727 587 744 310 506 524 675 277632 673 109 243 352 227 596 16 264 719 350 476 127 410 137 755 263 285 214 637 67 27195 96 446 48 29385 781 689 558 199 79917 360 107 487749 314 665672 63 651 763 90 112 169 276 441 660 40 295 474 723 37 92 346 326 385 624 320 398 197 739 579 347 52 436 190 126 399 86 761 117 535 695 36 430 224323 337 369191 548 544 10 618 627 696 114 205 507 115 305 481 655 161 24 631 29 576 383 236 640 9270 361 772 201 705 146 500 662 12 38 135 519 192 580 621 47 401 269 622 94318 144 463 531 581 242 449 484 230 380 488 657 7 78 798 13 685 322 457 768 545 211 590 7791 76 525 89 168 319 583 167 611 623 634 477 57565 447 18 208 540 129 292 179 390 789387 155 231 272 14161 128 511 400 309 797 453 184 225 5 423 252 388221 3549 670734 554 642260 486 775 690 238 239 522732 234 770412 125 248 58 207 138 328 46 123 171 75 91693 51 204 588 28312 335613 692 373 759448 429 667 213 332 250 753647 573 721176 98 617 378 21 130 518 794 751 747 709 697 652 646 636 589 562 552 527 521 510 472 470 456 435 402 368 290 281 222 193 185 150 88 87 68 4 32 790 786 757 730 714 701 677 676 638 607 601 598 592 561 559 553 517 443 417 415 406 376 375 351 333 296 268 253 249 235 228 174 118 163 325 257 11 194 661 490 532764 278 280 771 458 564427 557 288 421254 251 464792 374 565 55577 460 371 633766 536 202 25 59195 364241 45 541 308 97 232529 348 20485 259793 725 754717 610 471 586 712 331 628 787649 113 716629 491 604722 608 365 654 39 393 77 639745 53 255 784 210 593 729 479 81 615 437 131 528 503 504 132 370 616 156 439389 158454 746 700 664 758 223 294 653 760 74344 738 728 219 496 683 83 256 630 438 782 84 432 686 152 612157 166 741 358 679 779 539 614 582 258 336 422 785 291 395 404 650 445 555 345 70 304 297 688 493 620 724 102 468 462 175 391 226 568 687 353 282 520 691 706 366 334 104 433 743 71 451 585 357 597 534 405 183 372 489 1 245 105 591 338 341 674 159 648 742 708 330 60 187 773 24 181 681 669 461 416 359 342 50 73 307 6 311 233 505 492 7 108 684 69 501 42 16215 413 392 293 363 19 273 119 134 101 200 603 718 122 27 240 726 396 80 703 324 551 139 178 600 659 62 442 499 362 780 394 354 498 177 384 355 508 550 710711 165 382 606 666 723 04682 495 546 595 769 160 300148 64 411 478 30 530473 303 386
10
P. Mathieu, S. Picault
describe concrete items, i.e. the maximum values allowed for of each integer in the tuple. In our experiments we used 5-element tuples with maximum values (20, 100, 10, 5, 2) (on the basis of earlier work [11]). We conducted automatic experiments and evaluations with a combination of all parameters within the following ranges: NP : 4 – 10; NI : 5, 10, 20, 40; NT : 50, 100, 200, 400, 800; NA : 0, 5, 10 % of items in the transactions; NM 0, 5, 10 % of items in the transactions; NO : 0, 5, 10 % of total transactions. 5.2
Results and discussion
As figure 1 shows on three experiments, transactions produced by the instantiation of random prototypes are well discriminated: cutting the trees at height ≈ 0.6 is sufficient to identify clusters that exactly reflect the original ones (cf. fig. 2a-b), even when transactions are built with random additional items or with random missing items. When the database also contains random transactions, i.e. which do not come from any existing prototype, the clustering still identifies the original clusters, but also very small classes (see fig. 2c), which are most of the time singletons. When applying the induction process, those classes can be simply discarded because there is nothing to generalize in them. In all experiments (up to NO = 10 % of the total number of transactions) the prototype building process was successful, which indicates enough robustness.
best.match.jaccard 5ï200ïï10ï200ïï20ï200ïï40ï200 0
50
100
best.match.jaccard 5ï20ï100ïo5ï20
best.match.jaccard 8ï10ï400
150
200
0
50
100
150
200
250
300
350
400
0
20
40
60
80
100
8
4 7
5 6
actual prototype
actual prototype
actual prototype
3 5
4
2 3
2
4 3 2 1
1 1
1
2
3
estimated prototype
4
0 1
2
3
4
5
estimated prototype
6
7
8
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
estimated prototype
(a)
(b) (c)
Fig. 2. Level plot comparison between estimated transactions clusters (abscissae) and original prototypes (ordinates) for experiments (a), (b) and (c). While (a) and (b) match perfectly, in (c) there are “noise” transactions (prototype “0”) generated from pure random choice. It appears that they are not classified among the “true” clusters, but instead are put in small classes (here, 1 for each random transaction).
6
Experimental setup for multiagent simulations
At the present time, the multiagent simulator designed in [11] has been modified so as to represent shopping lists with prototypes and the items by integer tuples. We have conducted preliminary experiments with the same randomly generated prototypes than in stochastic simulations. For now, the simulated transactions reproduce the same prototypes.
From Real Purchase to Realistic Populations of Simulated Customers
11
However, this result is dependent on the time limit which is given to the agents for their shopping trip. If too short, they exit the store without purchasing all items of their list. This does not affect the transaction clustering (because of its robustness towards missing items) but may alter the prototypes that are built from the simulated transactions. Indeed, the observation of the paths of the customers in the store points several “hot areas” where items are easily found: conversely, missing items are often the same (contrary to what happened in stochastic simulations), thus the corresponding prototype is a subset of the original one. Far from being a limitation of the system, this property gives insights on how such kinds of simulation may help the placement of products, the spatial organization of the store, etc.: the limitation of time spent in the shopping trip is crucial, not only for the purpose of realism, but more significantly because it appears as the criterion that forces the store manager to optimize the positioning of products. Ongoing work focuses now on the analysis and integration of large databases of real receipts. We are also modifying the environment in order to reproduce the store where the data were collected. We have for instance to integrate the actual positioning of items and information signs in the store. Indeed, the correctness of those informations may have a serious impact on the outcomes of the simulations, so we have to check them carefully and validate them with experts before we can start large-scale simulations.
7
Conclusion
The design of an integrated tool for decision support in the field of grocery retail and marketing is a long-term purpose indeed. However, in this paper we try to combine an incremental simulation approach (which is quite convenient to express complex hypotheses regarding individual behaviors, environmental configuration, etc.) with data mining algorithms (which usually indicates global, statistical features of a system). Our proposal is therefore able to endow agents populations with statistically realistic features, which in turn affect the behavior of the agents so as to produce statistically similar outputs. Far from being only qualitative, we show that this similarity can be measured. As shown above, our process is quite robust to noise in data. The results of the integration of real receipts in the multiagent simulation (still in progress) will be described in further publications. Noteworthy, our method does not try to discover “true” classes of customers, such as a socio-economic, or demographic, or geographic segmentation would aim at. We only intend to capture similarities between traces left by individual actions, and use an abstract description of those traces as parameters of agents behaviors. Thus, taking into account geographic influences or seasonal variations merely relies upon an appropriate choice of the recording extent and duration for the real data. Besides, we believe (though we cannot provide experimental evidences yet) that the transaction and prototype representation we propose can be applied to many other fields where the behavior of the entities can be characterized by such sets of features (e.g. molecular biology with phenotypical expressions of co-activated genes, ecology, or auction management), so as to participate in the bootstrap of multiagent simulations from empirical data.
12
P. Mathieu, S. Picault
References 1. Schwaiger, A., Stahmer, B.: Simmarket: Multiagent-based customer simulation and decision support for category management. In: Proceedings of MATES 2003: multiagent system technologies. Volume 2831 of LNAI., Springer (2003) 74–84 2. Siebers, P.O., Aickelin, U., Celia, H., Clegg, C.W.: Using intelligent agents to understand management practices and retail productivity. In: Proceedings of the Winter Simulation Conference (WSC’07), Washington, D.C. (December 2007) 2212–2220 3. Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. In: Proceedings of the 13th ACM SIGMOD Conference, Washington, D.C. (May 1993) 4. Mladeni´c, D., Eddy, W.F., Ziolko, S.: Exploratory analysis of retail sales of billions of items. In Goodman, A., Smyth, P., eds.: Frontiers in Data Mining and Bioinformatics. The 33rd Symposium on the Interface of Computing Science and Statistics. (2001) 5. Sheth-Voss, P., Carreras, I.E.: How informative is your segmentation? a simple new metric yields surprising results. Marketing Research (Winter 2010) 9–13 6. Zhang, T., Zhang, D.: Agent-based simulation of consumer purchase decision-making and the decoy effect. Journal of Business Research 60(8) (August 2007) 912–922 Complexities in Markets Special Issue. 7. Yoann, K., Mathieu, P., Picault, S.: An interaction-oriented model of customer behavior for the simulation of supermarkets. In Huang, X.J., Ghorbani, A.A., Hacid, M.S., Yamaguchi, T., eds.: Proceedings of IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT’10), IEEE Computer Society (September 2010) 407–410 8. Han, J., Kamber, M.: Data mining: concepts and techniques. Elsevier (2006) 9. Cavique, L.: A scalable algorithm for the market basket analysis. Journal of Retailing and Consumer Services 14(6) (November 2007) 10. Cumby, C., Fano, A., Ghani, R., Krema, M.: Predicting customer shopping lists from pointof-sale purchase data. In: Proceedings of the 10th International Conference on Knowledge Discovery and Data mining (KDD’04), New York, NY, USA, ACM (2004) 402–409 11. Mathieu, P., Panzoli, D., Picault, S.: Virtual customers in an agent world. In Demazeau, Y., et al., eds.: Proceedings of the 10th International Conference on Practical Applications of Agents and Multi-Agent Systems (PAAMS’12). Advances in Soft Computing, Springer (2012) 12. Kubera, Y., Mathieu, P., Picault, S.: IODA: an interaction-oriented approach for multi-agent based simulations. Journal of Autonomous Agents and Multi-Agent Systems (2011) 1–41 13. Gibson, J.J.: The Ecological Approach to Visual Perception. Hillsdale (1979) ´ 14. Jaccard, P.: Etude comparative de la distribution florale dans une portion des alpes et du jura. Bulletin de la Soci´et´e Vaudoise des Sciences Naturelles 37 (1901) 547–579 15. Choi, S.S., Cha, S.H., Tappert, C.C.: A survey of binary similarity and distance measures. Journal of Systemics, Cybernetics and Informatics 8(1) (2010) 43–48 16. Boriah, S., Chandola, V., Kumar, V.: Similarity measures for categorical data: A comparative evaluation. In: Proceedings of the SIAM International Conference on Data Mining (SDM 2008), SIAM (2008) 243–254 17. Ochiai, A.: Zoogeographic studies on the soleoid fishes found in japan and its neighbouring regions. Bulletin of the Japanese Society for Fish Science 22 (1957) 526–530 18. Dice, L.R.: Measures of the amount of ecologic association between species. Ecology 26(3) (1945) 297–302 19. Langfelder, P., Horvath, S.: Fast R functions for robust correlations and hierarchical clustering. Journal of Statistical Software 46(11) (2012) 1–17 20. Becker, R.A., Chambers, J.M., Wilks, A.R.: The New S Language. Wadsworth & Brooks/Cole (1988)