On Aggregation Bias in Structural Demand Models
Pei-Chun Lai and David A. Bessler Department of Agricultural Economics, Texas A&M University, College Station, TX 77843
Poster prepared for presentation at the Agricultural & Applied Economics Association 2010 AAEA, CAES, & WAEA Joint Annual Meeting, Denver, Colorado, July 25-27, 2010
Copyright 2010 by Pei-Chun Lai and David A. Bessler. All rights reserved. Readers may make verbatim copies of this document for non-commercial purposes by any means, provided that this copyright notice appears on all such copies.
Introduction
Consumer demand analysis attracts considerable attention. It remains an open question, however, whether demand estimated from aggregate data is reliable when disaggregate store-level data are available. Demand models may produce biased results when applied to data aggregated across stores with different pricing strategies. In this study, a graphical model is used to investigate the following question: Do we find the same structure when we fit causal models on sub-groupings of stores as we find when we fit models on aggregate data from all stores?
Graphical methods for discovering causal connections in structural equation models (SEM) provide useful tools for justifying causal claims between variables. Nevertheless, an observed relation among variables might reflect the influence of a hidden common cause, making the correlation spurious. The Fast Causal Inference (FCI) algorithm was developed to explore causal structure when latent confounders exist.
We apply the constraint-based FCI algorithm to the Dominick's scanner data and zip code information for the chain stores. The data set contains weekly sales information (03/02/95-03/06/96) on Coke (6-pack, 12 fl oz) for about 74 supermarket chain stores in the Chicago area. The sales information includes the supermarket's retail price (Pr), the manufacturer's wholesale price (Pw), the weekly quantity sold (Q), and store-specific median family income (I).
If there is an association between the corresponding error terms (i.e., the error terms indexed 3 and 4), as in an SEM with correlated errors, the possible influence of latent (unobserved) confounders can be taken into account by implementing the FCI algorithm. Because we aim to detect aggregation bias, we divide the data into aggregate and disaggregate groups. Figure 1 illustrates the processing flow of our analysis.
Figure 1. Flow of model processing
• Start from the Coke 6/12 fl oz weekly sales data.
• Split the data into aggregate and disaggregate parts.
• Disaggregate: the supermarket stores are categorized into low consumer income and high consumer income groups.
• Aggregate: all stores are treated as homogeneous.
• Run a series of statistical tests on partial correlations and conditional independence relations among the variables.
• Compare the resulting causal structures.
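The testing step in Figure 1 can be sketched in code. The poster does not say which conditional independence test was used; the sketch below assumes Fisher's z-test on partial correlations, a common choice in constraint-based search, and assumes that the "p=0.01" in the figure captions is the significance level. The function names and the toy data are hypothetical and are not taken from the scanner data.

```python
import math
import numpy as np

def partial_corr(data, i, j, cond=()):
    """Partial correlation of columns i and j given the columns in cond."""
    idx = [i, j] + list(cond)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.inv(corr)  # precision matrix of the selected columns
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])

def fisher_z_pvalue(data, i, j, cond=()):
    """Two-sided p-value for H0: partial correlation is zero (Fisher z)."""
    n = data.shape[0]
    r = partial_corr(data, i, j, cond)
    r = min(max(r, -0.999999), 0.999999)  # keep atanh finite when |r| is 1
    z = math.atanh(r) * math.sqrt(n - len(cond) - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

# Toy data (assumption, not the scanner data): an exact chain x -> y -> z
# built from orthogonal +/-1 patterns, so x and z are uncorrelated given y.
h1 = np.tile([1, 1, 1, 1, -1, -1, -1, -1], 50).astype(float)
h2 = np.tile([1, 1, -1, -1, 1, 1, -1, -1], 50).astype(float)
h3 = np.tile([1, -1, 1, -1, 1, -1, 1, -1], 50).astype(float)
x = h1
y = x + h2
z = y + h3
toy = np.column_stack([x, y, z])

ALPHA = 0.01  # assumed significance level
p_marginal = fisher_z_pvalue(toy, 0, 2)           # x vs z, no conditioning
p_given_y = fisher_z_pvalue(toy, 0, 2, cond=(1,))  # x vs z given y
print("x-z dependent marginally:", p_marginal < ALPHA)
print("x-z dependent given y:", p_given_y < ALPHA)
```

In a constraint-based search, a pair of variables loses its edge once some conditioning set makes the test fail to reject independence. In this toy chain that happens for x and z when conditioning on y.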
Materials and methods
We do not impose an a priori causal flow among the four demand-related variables studied here. The usual structure of demand has the following causal graph:
[Causal graph of the usual demand structure: nodes Q, Pr, Pw, and I, with error terms indexed 1 to 4.]
The disaggregate-level data are defined using 1990 U.S. Census information. Group one consists of stores facing a consumer base whose median family income is less than $35,597 (the first quartile). Group two consists of stores located in zip codes with median household income greater than $48,705 (the third quartile). We ignore stores where median family income falls between the first and third quartiles.
Figures 2 and 3 display the PAGs for the aggregate-level and disaggregate-level data. Our findings show that:
• Each of Pw, Pr, and I has a direct effect on quantity sold (Q), or its relation with Q is due to a common cause, or both.
• In the aggregate and low median household income graphs, either the manufacturer has more pricing power over the supermarket retailer, or the supermarket retailer has more pricing power over the manufacturer, or there is a latent common cause of Pw and Pr, or there is a combination of these.
• For stores facing median family income greater than $48,705, there is no relation between Pw and Pr.
• The aggregate and disaggregate graphs agree on three edges and their directions, but one edge (between Pw and Pr) is missing from the high-income graph.
Mean       Median Value   First Quartile   Third Quartile
42486.7    42065          35597            48705
Table 1. Statistics of store-specific median household income (U.S. dollars). The first and third quartiles are used to define the disaggregate groups.
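The grouping rule based on the Table 1 quartiles can be written as a short sketch. The two cutoffs come from Table 1. The function name, the group labels, and the store records are hypothetical and are used only for illustration.

```python
Q1_INCOME = 35597  # first quartile of median income (Table 1)
Q3_INCOME = 48705  # third quartile of median income (Table 1)

def income_group(median_income):
    """Return 'low', 'high', or None for stores between the quartiles."""
    if median_income < Q1_INCOME:
        return "low"    # group one
    if median_income > Q3_INCOME:
        return "high"   # group two
    return None         # ignored in the disaggregate analysis

# Hypothetical stores (not from the scanner data).
stores = {"store_A": 31000, "store_B": 42065, "store_C": 52000}

groups = {"low": [], "high": []}
for name, income in stores.items():
    label = income_group(income)
    if label is not None:
        groups[label].append(name)

print(groups)
```

Stores at or between the two cutoffs are left out of both groups, which matches the rule of ignoring stores between the first and third quartiles.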
The output of the FCI algorithm is a partial ancestral graph (PAG). The edges in a PAG can be interpreted as follows:
a → b: a is a cause of b.
a ↔ b: there is a latent common cause of a and b, so that a does not cause b and b does not cause a.
a ◦→ b: a is a cause of b, or there is a latent common cause of a and b, or both.
a ◦-◦ b: either a is a cause of b, or b is a cause of a, or there is a latent common cause of a and b, or there is a combination of these.
Results
Figure 2. PAG of aggregate-level data (p=0.01).
Literature cited
Kwon, D.H. 2007. Causality and Aggregation Economics: the Use of High Dimensional Panel Data in Micro-Econometrics and Macro-Econometrics. Ph.D. Dissertation, Department of Agricultural Economics, Texas A&M University.
Pearl, J. 2009. Causality: Models, Reasoning, and Inference. 2nd edition. Cambridge University Press.
Temme, D. 2006. Constraint-Based Inference Algorithms for Structural Models with Latent Confounders - Empirical Application and Simulations. Computational Statistics 21:151-182.
Tenn, S. 2006. Avoiding Aggregation Bias in Demand Estimation: a Multivariate Promotional Disaggregation Approach. Journal of Machine Learning Research 9:1437-1474.
Figure 3. PAGs of disaggregate-level data (p=0.01). The two disaggregate-level groups are defined by median family income: (a) PAG of group one; (b) PAG of group two.
Conclusions
Demand estimates based on aggregate data may be biased when stores are heterogeneous. In this study, we use the FCI algorithm to test whether aggregation bias arises when data are aggregated across stores with different geographic population distributions. The question we ask is: does aggregation across stores give us the same result as disaggregate analysis? The answer is no. The aggregate result is not fully consistent with the disaggregate results, although they are similar to each other. Our results suggest that aggregating data may introduce spurious associations between variables. How to modify the aggregate demand framework to avoid this problem, however, is not addressed in this poster.
Unlike traditional statistical methods, we examine the causal patterns between variables to check for aggregation bias. Causal discovery techniques usually assume that all causes are observed and known a priori; this is the so-called causal sufficiency assumption. However, this assumption does not always hold. The FCI algorithm helps to check for possible unobserved latent confounders between variables when causal sufficiency fails.
Finally, consistent with several previous studies in marketing, our results show that retail price and consumers' family income may affect purchase behavior. We found this result without imposing the causal structure a priori.
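The point that pooling heterogeneous stores can produce an association that is absent within each group can be illustrated with a toy calculation. The numbers below are made up for illustration and are not drawn from the scanner data. The two variables stand for any pair of store-level measurements, and the helper name `pearson` is an assumption of this sketch.

```python
import math

def pearson(points):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical groups: within each group x and y are uncorrelated,
# but the groups sit at different levels of both variables.
low_group = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
high_group = [(9, 9), (9, 11), (11, 9), (11, 11)]

r_low = pearson(low_group)
r_high = pearson(high_group)
r_pooled = pearson(low_group + high_group)

print("within low group:", r_low)
print("within high group:", r_high)
print("pooled:", round(r_pooled, 3))
```

Here the within-group correlations are zero by construction, while the pooled correlation is strongly positive because the group-level difference drives both variables. This is only a sketch of the general mechanism and not a reproduction of the poster's analysis.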
Acknowledgments
We gratefully acknowledge the provision of panel scanner data by the University of Chicago, James M. Kilts Center, Graduate School of Business.
For further information
Please contact [email protected].