Santos Rodriguez C., Ganassali S., Casarin F., Laaksonen P., & Kaufmann H. (2013). Consumption Culture in Europe: Insight into the Beverage Industry.
Key view: a protocol for extracting and summarising the main insights of survey responses
By Stephane Ganassali and Jean Moscarola, IAE-IREGE, University of Savoie (France), sgana@univ-savoie.fr
1st Southern European Conference on Survey Methodology (SESM), Barcelona, 12 to 14 December 2013, Universitat Pompeu Fabra
Introduction
Research analysts are continuously looking for ways to efficiently extract the most relevant information from their survey responses (Han, Kamber & Pei, 2011). Result visualisation is one of the steps of the Knowledge Discovery Process (Goebel and Gruenwald, 1999), and arguably one of the most important for survey data analysts. Indeed, when the number of questions is high, research analysts may face an overload of possible results, with a huge number of descriptive and comparative analyses. Ideally, professional analysts would like to access the most significant results of the survey efficiently and to visualise them in a very synthetic way. This summarisation approach is one of the data-mining methods (Fayyad, 1996) and is very helpful both for better interpretation and for more efficient communication to decision makers.
More specifically, comparative (or bivariate) analyses can be a very demanding and subtle step for long surveys. In such large studies (more than 30 questions¹), we usually find dozens of behaviour, opinion or motivation indicators and frequently a dozen identity descriptors (such as socio-demographics in consumer studies). This means that we may be interested in hundreds (sometimes thousands) of possible relationships between variables. Depending on the number of categories of the variables, the total number of possible associations between all the categories in the survey can easily reach several tens of thousands.
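The growth described above can be checked with simple arithmetic. The Python sketch below uses a hypothetical helper, not part of the protocol, to count the variable pairs and the category couples (the cells of all pairwise cross-tabulations) for a survey whose variables have the given numbers of categories.

```python
from itertools import combinations


def count_associations(category_counts):
    """Count the bivariate analyses implied by a survey (hypothetical helper).

    category_counts: number of categories of each variable.
    Returns (variable_pairs, category_couples), where a category couple is
    one cell of one cross-tabulation between two different variables.
    """
    variable_pairs = 0
    category_couples = 0
    for k_a, k_b in combinations(category_counts, 2):
        variable_pairs += 1
        category_couples += k_a * k_b  # cells in the k_a x k_b cross-tab
    return variable_pairs, category_couples


if __name__ == "__main__":
    # Illustrative assumption: 40 variables with 5 categories each.
    pairs, couples = count_associations([5] * 40)
    print(pairs, couples)  # 780 19500
```

For 40 variables with five categories each (an illustrative assumption, not a figure from the study), this gives 780 variable pairs and 19,500 category couples; with a fixed number of categories per variable, both counts grow roughly with the square of the number of variables.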
The “key-view” approach
The “key-view” approach we propose is a protocol aimed at quickly extracting and summarising the main findings of a survey. It starts with the consolidation of all survey variables (except socio-demographics) for each respondent (Cathelat, 1990) and the transformation of all indicators into dummy variables. Then, a cluster analysis is performed through ascending hierarchical classification to identify the main “opinion groups” in the sample. Key-view tables are then produced to illustrate the main (usually positive) specificities of the clusters. The following table is an example taken from a study about chocolate attitudes and consumption patterns (Ganassali, 2013).
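The consolidation step can be sketched as follows. This is a minimal illustration, assuming each respondent's answers are stored as a Python dict and that scale and numerical answers have already been recoded into classes (the protocol does not say how); the function name `to_dummies` and the `variable=category` column naming are hypothetical. Each category of a single-choice question, and each option of a multiple-choice question, becomes one 0/1 column.

```python
def to_dummies(responses, variables):
    """Consolidate answers into 0/1 dummy columns (hypothetical helper).

    responses: list of dicts, one per respondent, mapping a variable name to
        a category (str) or to a list of categories (multiple-response question).
    variables: names of the variables to consolidate (socio-demographics are
        left out by the caller, as the protocol suggests).
    Returns (column_names, matrix) where each column is "variable=category".
    """
    columns = []
    for var in variables:
        seen = set()
        for r in responses:
            value = r.get(var)
            if value is None:
                continue
            seen.update(value if isinstance(value, list) else [value])
        columns.extend(f"{var}={cat}" for cat in sorted(seen))

    index = {name: j for j, name in enumerate(columns)}
    matrix = []
    for r in responses:
        row = [0] * len(columns)
        for var in variables:
            value = r.get(var)
            if value is None:
                continue  # missing answer: all dummies of this variable stay 0
            for cat in (value if isinstance(value, list) else [value]):
                row[index[f"{var}={cat}"]] = 1
        matrix.append(row)
    return columns, matrix
```

For example, with the two respondents `{"preferred": "tea", "motives": ["warming", "taste"]}` and `{"preferred": "coffee", "motives": ["energy"]}`, the columns are `preferred=coffee`, `preferred=tea`, `motives=energy`, `motives=taste` and `motives=warming`.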
¹ In a study based on a large panel of 3500 surveys, 30 was identified as the average number of questions for a standard questionnaire (Ganassali, 2005).
According to the significant elements identified in the key-view table, researchers may produce a correspondence analysis map, which can be very useful for visualising the main internal characteristics of the clusters. In the graph below, we can clearly see, for example, that the "involved/connaisseurs" group is able to name more specialised chocolate brands such as Valrhona, Menier or Villars. Its members clearly associate chocolate consumption with specific moments, places, moods and persons, and they score particularly high on the involvement measures proposed in the survey.
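The paper does not detail how the map is computed. A standard way to obtain such a map is a simple correspondence analysis of a clusters × categories count table, computed from the singular value decomposition of the standardised residuals. The NumPy sketch below assumes that input format (the function name is hypothetical) and returns principal coordinates for both rows and columns.

```python
import numpy as np


def correspondence_analysis(table, n_dims=2):
    """Simple correspondence analysis of a clusters x categories count table.

    Returns (row_coords, col_coords, inertias): principal coordinates of the
    rows (clusters) and columns (categories) on the first n_dims axes, and the
    inertia (squared singular value) of every axis.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                    # correspondence matrix
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    expected = np.outer(r, c)
    # Standardised residuals D_r^(-1/2) (P - r c^T) D_c^(-1/2)
    S = (P - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = U[:, :n_dims] * sv[:n_dims] / np.sqrt(r)[:, None]
    col_coords = Vt.T[:, :n_dims] * sv[:n_dims] / np.sqrt(c)[:, None]
    return row_coords, col_coords, sv ** 2


if __name__ == "__main__":
    # Hypothetical counts: 3 clusters x 3 selected categories.
    counts = [[20, 5, 5], [5, 20, 5], [5, 5, 20]]
    rows, cols, inertia = correspondence_analysis(counts)
    print(rows.round(3))
    print(cols.round(3))
```

Plotting the first two columns of both coordinate arrays on the same axes gives the usual symmetric map. Rows or columns with a zero total must be removed beforehand, since the masses appear in denominators.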
In a second stage, clusters can be described and contextualised via the socio-demographics recorded in the survey. From the table below, we can see, for example, that the "involved/connaisseurs" group is specifically composed of executives or teachers, aged over 37, generally living in a household earning more than 5000 euros. Negative specificities may also be presented (with a minus sign in the table), for instance the relative lack of students or entrepreneurs in the same "involved/connaisseurs" group.
The UK beverages survey
As another illustration of the approach, a long survey focused on beverage consumption was conducted in the United Kingdom in 2011 within the Coberen research project (Santos et al., 2013). The questionnaire first addressed beverage consumption representations and patterns. Then, more general questions were asked about consumption attitudes and preferences. The survey comprised 120 questions covering a dozen pages (see http://www.sphinxonline.net/coberen/drink_uk/quest_uk.htm to view the questionnaire in English). The median completion time was 19 minutes. The final structure of the Internet questionnaire was as follows:
• Beverage Consumption
  o Wall of pictures and word associations
  o Picture scales for drinking preferences
  o Preferred alcoholic and non-alcoholic beverages, consumption patterns (volume, expense, places, brands, distribution channels…) and motives
  o Beverage consumption situations
• General Consumption Culture
  o The Consumer “Mind Set”: overt and covert dimensions
  o Consumer practices
  o Consumption contexts
• Others
  o National culture dimensions
  o Socio-demographics: country, age, gender, education and income levels
As discussed above, the potential number of associations to be studied is huge for such a survey, and data analysts were interested in summarising methods for identifying the most relevant results. As an illustration of the key-view approach, we consider a selection of the questions asked in the Coberen survey in the UK, including categorical, scale, numerical and textual variables. We study the preferred beverages, the quantities consumed of tea, coffee, water, beer and wine, the consumption circumstances and motives for the same five beverages, the so-called overt and covert “consumer practices” (more general consumption patterns), the items chosen in a wall of pictures presented in the introduction of the survey and the related choice justifications (textual data). Finally, six socio-demographic variables are taken into consideration in the data analyses.
A total of 630 responses were collected in the United Kingdom through a professional panel provider. As defined in the key-view protocol, all survey variables are consolidated for each respondent and transformed into dummy variables. A cluster analysis is then performed through ascending hierarchical classification to identify the main consumer groups in the sample. Thanks to the key views built around the cluster variable (which then serves as the “pivot” variable), it is quite easy to interpret the classes and to identify the type of consumers they include. Class 1 gathers the coffee drinkers, class 2 the tea fans, class 3 the wine and beer consumers and class 4 the healthy drinkers.
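Ascending hierarchical classification starts with each respondent as its own class and repeatedly merges the two closest classes. The paper does not state the aggregation criterion; the pure-Python sketch below assumes Ward's criterion on Euclidean distances between dummy vectors, a common choice but an assumption here, and the function name is hypothetical. It is deliberately naive (its cost grows with the cube of the number of respondents) and meant only for illustration; on 630 respondents, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage(..., method="ward")` followed by `fcluster` would be more practical.

```python
def ward_clustering(X, n_clusters):
    """Naive ascending hierarchical classification with Ward's criterion.

    X: list of rows (e.g. 0/1 dummy vectors), one per respondent.
    Repeatedly merges the two clusters whose fusion least increases the
    within-cluster sum of squares, until n_clusters remain.
    Returns a cluster label (1..n_clusters) for each row.
    """
    clusters = [[i] for i in range(len(X))]
    centroids = [[float(v) for v in row] for row in X]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                dist2 = sum((p - q) ** 2 for p, q in zip(centroids[a], centroids[b]))
                cost = na * nb / (na + nb) * dist2  # Ward: increase in within SS
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, nb = len(clusters[a]), len(clusters[b])
        # The merged centroid is the size-weighted mean of the two centroids.
        centroids[a] = [(na * p + nb * q) / (na + nb)
                        for p, q in zip(centroids[a], centroids[b])]
        clusters[a].extend(clusters[b])
        del clusters[b], centroids[b]  # b > a, so index a is unaffected

    labels = [0] * len(X)
    for k, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = k
    return labels


if __name__ == "__main__":
    X = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0],
         [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
    print(ward_clustering(X, 2))  # [1, 1, 1, 2, 2, 2]
```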
To illustrate the process of selection and summarisation, the following figure represents the conversion of a set of six cross-tabulations into one key-view table, in which only the significant correspondences are presented. In the example, 116 possible correspondences are considered and fewer than half of them show a significant difference from the null hypothesis. Finally, 27 couples of categories are significantly over-represented (chi-square test, p < 0.01) and are selected for inclusion in the key-view table.
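The selection rule can be sketched in Python. The paper reports a chi-square test at the 0.01 level but not the exact form of the test; this sketch assumes one Pearson chi-square test (1 degree of freedom, no continuity correction) per cluster × category couple, on the 2 × 2 table crossing membership of the cluster with possession of the category, and keeps a couple when p < alpha and the observed count exceeds the expected count. Function names and the row layout are hypothetical.

```python
import math


def chi2_2x2_pvalue(a, b, c, d):
    """p-value of Pearson's chi-square test (1 df, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # degenerate table: no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df


def key_view(labels, columns, X, alpha=0.01, keep_negative=False):
    """Select the significant (cluster, category) couples.

    labels: cluster of each respondent; X: 0/1 dummy matrix with the given
    column names. Returns tuples (cluster, category, count, % in cluster,
    % overall, p-value, sign).
    """
    n = len(labels)
    col_totals = [sum(row[j] for row in X) for j in range(len(columns))]
    rows = []
    for k in sorted(set(labels)):
        members = [i for i in range(n) if labels[i] == k]
        n_k = len(members)
        for j, name in enumerate(columns):
            a = sum(X[i][j] for i in members)  # in cluster k, has category j
            b = n_k - a                        # in cluster k, lacks category j
            c = col_totals[j] - a              # outside k, has category j
            d = (n - n_k) - c                  # outside k, lacks category j
            p = chi2_2x2_pvalue(a, b, c, d)
            # Observed count a compared with expected count n_k * total / n.
            sign = "+" if a * n > n_k * col_totals[j] else "-"
            if p < alpha and (sign == "+" or keep_negative):
                rows.append((k, name, a, 100 * a / n_k,
                             100 * col_totals[j] / n, p, sign))
    return rows
```

Setting `keep_negative=True` also returns the significantly under-represented couples, marked with a minus sign, which correspond to the negative specificities mentioned earlier.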
Looking at the consumer practices, we can go further in the description of the clusters. For example, the “coffee drinkers” appear to be uninvolved shoppers, while the “healthy consumers” are more careful. The same type of table for the socio-demographic variables would show, for example, that alcohol drinkers tend to be male consumers with a fairly high monthly household income (3000-5000 euros).
It is also possible to include more “qualitative” variables, as in our case: the selected pictures (from a wall proposed in the introduction of the survey) and the words used to justify these choices (grouped and recoded through lexical analysis). We can see from the table hereafter that “healthy consumers” tend to choose pictures representing non-alcoholic beverages (tap water, cup of tea, espresso, etc.) and mention functional motives to explain their choices: refreshment and warming, for example. These analyses are relevant for gaining deeper insights into the representations that UK people associate with the consumption of the different beverages.
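The grouping of justification words into themes can be approximated with a dictionary lookup. The sketch below is a deliberately simplified stand-in for lexical analysis, not the procedure used in the study: the `THEMES` dictionary and the function name are hypothetical, and in practice the dictionary would be built by the analyst after inspecting the most frequent words. The resulting themes can then be treated as a multiple-response variable and dummy-coded like any other question.

```python
import re

# Hypothetical theme dictionary: word form -> theme.
THEMES = {
    "refreshing": "refreshment", "refresh": "refreshment", "thirst": "refreshment",
    "warm": "warming", "warming": "warming", "cosy": "warming",
    "relax": "relaxation", "relaxing": "relaxation",
}


def recode_verbatim(text, themes=THEMES):
    """Return the sorted list of themes mentioned in a free-text justification."""
    words = re.findall(r"[a-z']+", text.lower())  # crude tokenisation
    return sorted({themes[w] for w in words if w in themes})
```

For instance, `recode_verbatim("A cup of tea keeps me warm and I find it relaxing")` returns `["relaxation", "warming"]`.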
For a synthetic picture, we could decide to take all the dependent variables together in one single key view and, for better visualisation, represent the correspondences on a factorial map. On the one hand, this makes the results a little more difficult to read; on the other hand, it gathers all the significant correspondences in one single figure, mixing behaviours, motives, representations (selection of pictures), verbatim (related justifications) and identity variables.
Options
Technically, it is worth mentioning that in some software, such as Sphinx Survey (2013), it is possible to control the selection of the significant correspondences to be extracted into the key-view table (or map). As the screenshots below show, the researcher may decide which variables are considered in the analysis, which statistical rules govern the selection (significance threshold, minimum frequency) and which information is presented in the key-view table: frequency, percentage or p-value, for example.
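The selection options described above can be expressed as a small configuration object. The sketch below is hypothetical: it does not reproduce Sphinx Survey's interface or file formats, the names `KeyViewOptions` and `apply_options` are invented for illustration, and the default minimum frequency of 5 is an arbitrary value. It filters precomputed key-view rows on the significance threshold and the minimum frequency, then keeps only the requested information.

```python
from dataclasses import dataclass


@dataclass
class KeyViewOptions:
    """Hypothetical selection settings for a key-view table."""
    alpha: float = 0.01                            # significance threshold
    min_count: int = 5                             # minimum frequency (assumed default)
    show: tuple = ("count", "percent", "p_value")  # information to display


def apply_options(rows, options):
    """Filter key-view rows and keep only the requested fields.

    rows: list of dicts with keys cluster, category, count, percent, p_value.
    """
    kept = []
    for row in rows:
        if row["p_value"] < options.alpha and row["count"] >= options.min_count:
            kept.append({"cluster": row["cluster"], "category": row["category"],
                         **{key: row[key] for key in options.show}})
    return kept
```

Keeping the rules in one object makes it easy to rerun the same selection with a looser threshold or a different display without recomputing the tests.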
Conclusion
A large majority of statistical techniques for the social sciences are still designed according to an old tradition, in which data were rare and expensive. Today, the situation is different: social scientists and practitioners are able to collect and receive huge amounts of data, and their main concern is probably shifting from estimation issues to selection and summarisation problems. The “key-view” approach could be one of the many responses to the expectations of contemporary data analysts.
References
Cathelat B. (1990), Socio-Styles-Système... les Styles de Vie : théorie, méthodes, applications, Paris, Editions d'Organisation.
Fayyad U. M. (1996), Data Mining and Knowledge Discovery: Making Sense Out of Data, IEEE Expert, vol. 11, no. 5, pp. 20-25.
Ganassali S. (2005), Les caractéristiques rédactionnelles du questionnaire : fondements et pratiques, Colloque Francophone sur les Sondages, Université Laval, Québec/Canada.
Ganassali S. (2013), Le protocole du mur d’images en ligne et son impact sur la qualité des réponses, Proceedings of the 29th Congress of the French Association of Marketing, University of La Rochelle.
Han J., Kamber M. & Pei J. (2011), Data Mining: Concepts and Techniques, 3rd edition, Waltham, Elsevier.
Santos Rodriguez C., Ganassali S., Casarin F., Laaksonen P. & Kaufmann H. (2013), Consumption Culture in Europe: Insight into the Beverage Industry, Hershey, IGI Global.
Sphinx Survey software handbook (2013), http://infos.lesphinx.eu/en/resources/pdf/Handbook.pdf, accessed 10 December 2013.