
Key view: a protocol for extracting and summarising the main insights of survey responses

By Stephane Ganassali and Jean Moscarola, IAE-IREGE - University of Savoie (France) - sgana@univ-savoie.fr
1st Southern European Conference on Survey Methodology (SESM) - Barcelona, 12 to 14 December 2013 - Universitat Pompeu Fabra

Introduction

Research analysts are continuously looking for efficient ways to extract the most relevant information from their survey responses (Han, Kamber & Pei, 2011). Result visualisation is one of the steps of the Knowledge Discovery Process (Goebel and Gruenwald, 1999), and arguably one of the most important for survey data analysts. Indeed, when the number of questions is high, analysts may face an overload of possible results, with a huge number of descriptive and comparative analyses. Ideally, professional analysts would like to reach the most significant results of the survey quickly and to visualise them in a very synthetic way. This summarisation approach is one of several data-mining methods (Fayyad, 1996) and helps both interpretation and communication to decision makers.

More specifically, comparative (or bivariate) analyses are potentially a very demanding and subtle step for long surveys. In such large studies (> 30 questions [1]), we usually find dozens of behaviour, opinion or motivation indicators and frequently a dozen identity descriptors (such as socio-demographics in consumer studies). That means that we may be interested in hundreds (sometimes thousands) of possible relationships between variables. Depending on the number of categories of the variables, the total number of possible associations between all the categories in the survey can easily reach several tens of thousands.

The “key-view” approach

The “key-view” approach we propose is a protocol aimed at quickly extracting and summarising the main findings of a survey. It starts with the consolidation of all survey variables (except socio-demographics) for each respondent (Cathelat, 1990) and the transformation of all indicators into dummy variables. Then, a cluster analysis is performed through ascending hierarchical classification to identify the main “opinion groups” in the sample. Key-view tables are then produced to illustrate the main (usually positive) specificities of the clusters. The following table is an example taken from a study about chocolate attitudes and consumption patterns (Ganassali, 2013).

[1] In a study based on a large panel of 3500 surveys, 30 was identified as the average number of questions for a standard questionnaire (Ganassali, 2005).

Based on the significant elements identified in the key-view table, researchers may produce a correspondence analysis map, which can be very useful for visualising the main internal characteristics of the clusters. In the graph below, we can clearly see for example that the "involved/connoisseurs" group is able to quote more specialised chocolate brands like Valrhona, Menier or Villars. Its members clearly associate chocolate consumption with specific moments, places, moods and persons, and they score particularly high on the involvement measures proposed in the survey.


In a second stage, clusters can be described and contextualised via the socio-demographics registered in the survey. From the table below, we can see for example that the "involved/connoisseurs" group is specifically composed of executives or teachers, aged over 37, generally living in a household earning more than 5000 euros. Negative specificities may also be presented (with a minus sign in the table), for instance the relative lack of students or entrepreneurs in the same "involved/connoisseurs" group.

The UK beverages survey

As another illustration of the approach, a long survey focused on beverage consumption was conducted in the United Kingdom in 2011 within the Coberen research project (Santos et al., 2013). The questionnaire first addressed beverage consumption representations and patterns. Then, more global questions were asked about general consumption attitudes and preferences. The survey was composed of 120 questions, covering about a dozen pages. (See http://www.sphinxonline.net/coberen/drink_uk/quest_uk.htm to view the questionnaire in English.) The median completion time was 19 minutes. The final structure of the Internet questionnaire was defined as follows:

• Beverage Consumption
  o Wall of pictures and word associations
  o Picture scales for drinking preferences
  o Preferred alcoholic and non-alcoholic beverages, consumption patterns (volume, expense, places, brands, distribution channels…) and motives
  o Beverage consumption situations
• General Consumption Culture
  o The Consumer “Mind Set”: Overt and covert dimensions
  o Consumer practices
  o Consumption contexts
• Others
  o National culture dimensions
  o Socio-Demographics: Country, age, gender, education and income levels

As noted earlier, the potential number of associations to be studied is huge for that survey, and data analysts were interested in summarisation methods for identifying the most relevant results.

As an illustration of the key-view approach, we consider a selection of the questions asked in the Coberen survey in the UK, including categorical, scale, numerical and textual variables. We study the preferred beverages, the consumed quantities for tea, coffee, water, beer and wine, the consumption circumstances and motives for the same five beverages, the so-called overt and covert “consumer practices” (more general consumption patterns), the items chosen in a wall of pictures presented in the introduction of the survey and the related choice justifications (textual data). Finally, six socio-demographic variables are taken into consideration in the data analyses. 630 responses were collected in the United Kingdom through a professional panel provider.

As defined in the Key-View protocol, all survey variables are consolidated for each respondent and transformed into dummy variables. A cluster analysis is performed through ascending hierarchical classification to identify the main consumer groups in the sample. Thanks to the key views built around the cluster variable (which is then defined as the “pivot” variable), it is quite easy to interpret the classes and to identify the type of consumers they include. Class 1 gathers the coffee drinkers, class 2 the tea fans, class 3 the wine and beer consumers and class 4 the healthy drinkers.
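The consolidation, dummy-coding and clustering steps described above can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the procedure actually run on the Coberen data: the respondents are invented, the function names (dummy_code, ward_clusters) are hypothetical, and Ward's criterion is assumed as the aggregation rule, since the text only specifies an ascending hierarchical classification.

```python
from itertools import combinations

def dummy_code(rows):
    """Turn categorical answers into 0/1 vectors, one column per (question, category)."""
    columns = sorted({(q, a) for row in rows for q, a in row.items()})
    vectors = [[1 if row.get(q) == a else 0 for q, a in columns] for row in rows]
    return columns, vectors

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def centroid(members, vectors):
    """Mean vector of the respondents whose indices are in `members`."""
    n = len(members)
    return [sum(vectors[i][k] for i in members) / n for k in range(len(vectors[0]))]

def ward_clusters(vectors, n_clusters):
    """Naive ascending hierarchical classification (Ward's criterion, an assumption)."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ca, cb = clusters[a], clusters[b]
            # Increase in within-cluster variance if the two clusters were merged.
            cost = (len(ca) * len(cb) / (len(ca) + len(cb))) * sq_dist(
                centroid(ca, vectors), centroid(cb, vectors))
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

# Hypothetical respondents (invented for illustration only).
respondents = [
    {"favourite": "coffee", "moment": "morning"},
    {"favourite": "coffee", "moment": "morning"},
    {"favourite": "tea", "moment": "afternoon"},
    {"favourite": "tea", "moment": "afternoon"},
    {"favourite": "wine", "moment": "evening"},
    {"favourite": "wine", "moment": "evening"},
]
columns, vectors = dummy_code(respondents)
groups = ward_clusters(vectors, n_clusters=3)
print(groups)
```

The resulting cluster membership plays the role of the “pivot” variable. The naive loop recomputes every pairwise cost at each merge, which is acceptable for a toy sample but would be slow on several hundred respondents; a dedicated statistical package would normally be used at that scale.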

To illustrate the process of selection and summarisation, the following figure represents the conversion from a set of six cross tabulations to one key-view table, in which only the significant correspondences are presented. In the example, 116 possible correspondences are considered and fewer than half of them show a significant difference from the null hypothesis. Finally, 27 pairs of categories are significantly over-represented (chi-square test, significance threshold p = 0.01) and are selected for inclusion in the “Key-view” table.
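The selection rule can be sketched as follows. This is a hedged illustration rather than the exact computation performed by the authors or by their software: it assumes a single-answer question crossed with the cluster variable, tests each cell with a one-degree-of-freedom chi-square on the collapsed 2x2 table, and keeps only over-represented cells. The counts, the function names (cell_test, key_view) and the min_count default are invented for the example.

```python
from math import erfc, sqrt

def cell_test(n_cell, n_row, n_col, n_total):
    """Chi-square test (1 df) on the 2x2 table 'category or not' x 'cluster or not'."""
    observed = [
        [n_cell, n_row - n_cell],
        [n_col - n_cell, n_total - n_row - n_col + n_cell],
    ]
    row_totals = [n_row, n_total - n_row]
    col_totals = [n_col, n_total - n_col]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n_total
            if expected > 0:  # skip degenerate margins
                chi2 += (observed[i][j] - expected) ** 2 / expected
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df
    over_represented = n_cell > n_row * n_col / n_total
    return chi2, p_value, over_represented

def key_view(crosstab, alpha=0.01, min_count=5):
    """Keep the (category, cluster) pairs that are significantly over-represented."""
    clusters = sorted({c for counts in crosstab.values() for c in counts})
    col_totals = {c: sum(counts.get(c, 0) for counts in crosstab.values())
                  for c in clusters}
    n_total = sum(col_totals.values())
    selected = []
    for category, counts in crosstab.items():
        n_row = sum(counts.values())
        for c in clusters:
            n_cell = counts.get(c, 0)
            if n_cell < min_count:
                continue  # minimum-frequency rule
            _, p, over = cell_test(n_cell, n_row, col_totals[c], n_total)
            if over and p < alpha:
                selected.append((category, c, n_cell, p))
    return selected

# Hypothetical cross tabulation: preferred beverage by cluster (invented counts).
crosstab = {
    "coffee": {"class 1": 40, "class 2": 5, "class 3": 5},
    "tea": {"class 1": 5, "class 2": 40, "class 3": 5},
    "wine": {"class 1": 5, "class 2": 5, "class 3": 40},
}
selected = key_view(crosstab)
for category, cluster, count, p in selected:
    print(f"{category} / {cluster}: n={count}, p={p:.2g}")
```

Applying the same function to each cross tabulation built around the pivot variable and concatenating the selected pairs gives the content of one key-view table.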


Looking at the consumer practices, it is interesting to go further in the description of the clusters. For example, the “coffee drinkers” appear to be uninvolved shoppers while the “healthy consumers” are more careful. The same type of table for the socio-demographic variables would tell us, for example, that alcohol drinkers tend to be male consumers with quite a high monthly household income (3000-5000 euros).

It is also possible to include more “qualitative” variables, in our case the selected pictures (from a wall proposed in the introduction of the survey) and the words used to justify those choices (grouped and recoded through lexical analysis). We can see from the table hereafter that “healthy consumers” tend to choose pictures representing non-alcoholic beverages (tap water, cup of tea, espresso etc.) and mention functional motives to explain their choices: refreshment and warming for example. Those analyses help to get deeper insights into the representations that UK people associate with the consumption of the different beverages.

As a synthetic picture, we could decide to take all the dependent variables together in one single Key-View and, for a better visualisation, to represent the correspondences via a factorial map. On the one hand, that makes the results a little difficult to read; on the other hand, we can gather all the significant correspondences in one single figure, mixing behaviours, motives, representations (selection of pictures), verbatim (related justifications), and identity variables.


Options

Technically, it is worth mentioning that in some software, such as Sphinx Survey (2013), it is possible to control the selection of the significant correspondences to be extracted into the key-view table (or map). As the screenshots below show, the researcher may choose the variables to be considered in the analysis, the statistical rules of the selection (significance threshold, minimum frequency) and the information to be presented in the Key View table: frequency, percentage or p value for example.

Conclusion

A large majority of statistical techniques for the social sciences are still designed according to an old tradition in which data were rare and expensive. Today the situation is different: social scientists and practitioners are able to collect and receive huge amounts of data, and their main concern is probably shifting from estimation issues to selection and summarisation problems. The “key-view” approach could be one of several responses to the expectations of contemporary data analysts.

References

Cathelat B. (1990), Socio-Styles-Système... les Styles de Vie : théorie, méthodes, applications, Paris, Editions d'Organisation.
Fayyad U. M. (1996), Data Mining and Knowledge Discovery: Making Sense Out of Data, IEEE Expert, vol. 11, no. 5, pp. 20-25.
Ganassali S. (2005), Les caractéristiques rédactionnelles du questionnaire : fondements et pratiques, Colloque Francophone sur les Sondages, Université Laval, Québec/Canada.
Ganassali S. (2013), Le protocole du mur d’images en ligne et son impact sur la qualité des réponses, Proceedings of the 29th Congress of the French Association of Marketing, University of la Rochelle.
Han J., Kamber M. & Pei J. (2011), Data Mining: Concepts and Techniques, 3rd edition, Waltham, Elsevier.
Santos Rodriguez C., Ganassali S., Casarin F., Laaksonen P. & Kaufmann H. (2013), Consumption Culture in Europe: Insight into the Beverage Industry, Hershey, IGI Global.
Sphinx Survey software handbook (2013) - http://infos.lesphinx.eu/en/resources/pdf/Handbook.pdf - accessed 10th December 2013.

       
