KU LEUVEN
FACULTY OF SOCIAL SCIENCES

Mixed-mode Data Collection: Basic Concepts and Analysis of Mode Effects

Jorre VANNIEUWENHUYZE

Dissertation submitted in order to obtain the degree of Doctor in the Social Sciences

Supervisor: Prof. Dr. Geert Loosveldt
Co-supervisor: Prof. Dr. Geert Molenberghs [I-BioStat]
Research unit: Centre for Sociological Research [CeSO]

Nr. 238

Composition of the examination committee:
Prof. Dr. Jan Van den Bulck (chair)
Prof. Dr. Geert Loosveldt (supervisor)
Prof. Dr. Geert Molenberghs [I-BioStat] (co-supervisor)
Dr. ir. Barry Schouten [Centraal Bureau voor de Statistiek & Universiteit Utrecht, NL]
Prof. Dr. Bart Meuleman
Prof. Dr. Edith de Leeuw [Universiteit Utrecht, NL]
Prof. Dr. Peter Lynn [University of Essex, UK]

2013

The responsibility for the positions taken in this work rests solely with the author.
Published by: Faculty of Social Sciences - Research unit: Centre for Sociological Research [CeSO], KU Leuven, Parkstraat 45 bus 3601 - 3000 Leuven, Belgium.
© 2013 by the author. No part of this book may be reproduced in any form without the permission in writing from the author.
D/2013/8978/8
Contents

Contents
List of Tables
List of Figures
Acknowledgement

I  Introduction & Basic concepts

1  Introduction
   1.1  Problem Statement
   1.2  Note on the writing process
   1.3  Outline of the dissertation

2  Some Basic Concepts of Mixed-Mode Surveys
   2.1  A working definition for modes
   2.2  Combining modes in surveys
   2.3  Mixed-mode survey designs
   2.4  Survey Error in Mixed-Mode Surveys
   2.5  Illustration: The Survey on Surveys

II  Papers

3  Evaluating Mode Effects in MM Surveys
   3.1  Introduction
   3.2  A method to disentangle mode effects
   3.3  An illustration with the ESS data
   3.4  Discussion

4  Mode Effects on Means and Variances
   4.1  Introduction
   4.2  Defining the mode effects
   4.3  A method to evaluate the mode effects
   4.4  Example
   4.5  Conclusion

5  Evaluating Mode Effects on Data Quality
   5.1  Introduction
   5.2  The European Social Survey
   5.3  Methods
   5.4  Results
   5.5  Discussion

6  Evaluating Mode Effects: Three Methods
   6.1  Introduction
   6.2  Defining mode effects
   6.3  Three methods to evaluate mode effects
   6.4  An illustration using a survey on surveys
   6.5  General discussion

7  Mode Effects through Back- & Front-Door
   7.1  Introduction
   7.2  The problem of counterfactuals
   7.3  The back-door and front-door methods
   7.4  Illustration
   7.5  General discussion

8  The Advantage of Mixed-Mode Surveys
   8.1  Introduction
   8.2  A procedure to evaluate mixed-mode designs
   8.3  An illustration with the Survey On Surveys
   8.4  General discussion

III  Conclusions

9  Concluding remarks
   9.1  The future of mixed-mode surveys
   9.2  Guidelines for future research
   9.3  Beyond mixed-mode surveys

IV  Appendices

A  Example of back-door method SAS code
B  Example of front-door method SAS code
C  Example instrumental variable SAS code

References

Summary / Samenvatting / Résumé
List of Tables

2.1  Data-collection modes
2.2  Survey modes
2.3  Possible data-collection mode combinations in surveys
2.4  Possible survey mode combinations in surveys
2.5  Components of selection error
2.6  Components of measurement error
3.1  Response frequencies and response rates
3.2  Sample proportions
3.3  Mode effects on political interest
3.4  Mode effects on perceived political complexity
3.5  Mode effects on voter turnout
3.6  Required sample sizes to detect moderate mode effects
4.1  Opinion about surveys items
4.2  Descriptive statistics
4.3  Mode effects estimates
5.1  Response frequencies of the ESS round four
5.2  The ESS MTMM experiments
5.3  Reliability, validity, and quality estimates
5.4  Mode effects on quality estimates
6.1  Response frequencies and response rates
6.2  Opinions about surveys items in the three sample groups
6.3  Mode effect estimates on the means of the three methods
6.4  Comparison of the three methods to evaluate mode effects
7.1  Full mixed-mode data
7.2  Observed and counterfactual means
7.3  Target Variables
7.4  Selection and measurement effect estimates
8.1  Response frequencies
8.2  Fixed and variable costs of the designs
List of Figures

2.1  Causal models of mixed-mode data
2.2  The Survey on Surveys design
4.1  Survey design with response frequencies
5.1  Causal graphs representing mixed-mode data
5.2  Causal graphs representing MTMM experiments
6.1  Process underlying data in a mixed-mode context
6.2  Analysis model with control variables
6.3  Analysis model with comparative data
7.1  Causal graphs representing mixed-mode data
7.2  The Survey on Surveys design
7.3  More complex analysis models
8.1  MSE curves when mail is benchmark mode
8.2  Difference MSE, mail versus mixed-mode
8.3  Difference MSE, face-to-face versus mixed-mode
8.4  MSE curves when face-to-face is benchmark mode
8.5  Difference MSE, mail versus mixed-mode
8.6  Difference MSE, face-to-face versus mixed-mode
9.1  Complex analysis models
Acknowledgement

Thanks to Geert Loosveldt for thoroughly supervising my work for four years and for giving me all the opportunities to grow in my discipline. Thanks to Geert Molenberghs for co-supervising my work and for helping me with numerous thoughts and questions about data analysis. Thanks to Barry Schouten, Thomas Klaush, Koen Beullens, Hideko Matsuo, Bart Meuleman, and Jaak Billiet for discussing my work from time to time and for providing very useful suggestions which certainly improved my research. Thanks to Rina, Ilias, and Anaïs for bringing joy and love to my life and for preventing me from working whole days long, especially during weekends and holidays.
Part I Introduction & Basic concepts
Chapter 1  Introduction

1.1 Problem Statement

In empirical research, an investigator aims to answer a research question as accurately as possible. Sometimes, such a research question involves the characteristics of a human population. In that situation, a possible approach to get information about this population is to collect data on a set of variables from a sample of population members by conducting a survey (Punch, 2003).
Traditionally, survey designs start from using one single data-collection mode, such as Computer-Assisted Personal Interviewing (CAPI), Computer-Assisted Telephone Interviewing (CATI), Mail Self-Administered Questionnaires (MSAQ's), or Web Self-Administered Questionnaires (WSAQ's), to collect data from all sample members. However, each data-collection mode has its own strengths and weaknesses, because several types of mode-specific survey error may be introduced (Biemer & Lyberg, 2003; de Leeuw, 2005; Dillman, Smyth & Christian, 2009). Such survey error can lead to incorrect conclusions regarding the research question. In order to combine the strengths and neutralize the weaknesses of different data-collection modes, survey researchers increasingly suggest using survey designs that include more than one data-collection mode (Hochstim, 1967; de Leeuw, 2005; Brick & Lepowski, 2008; Dillman, Smyth & Christian, 2009). A special form of such a design allows data from distinct sample members to be collected by different data-collection modes or sets of data-collection modes (de Leeuw, 2005; Roberts, 2007). Such a design is called a mixed-mode survey and is becoming increasingly popular among surveyors. Mixed-mode surveys are thus the counterparts of single-mode surveys, in which the data of all respondents are collected by the very same set of data-collection modes.

Mixed-mode surveys are argued to be advantageous over single-mode surveys because they may result in either lower systematic or lower random selection error, i.e. the error introduced by only observing a small subset of population members instead of the entire population (de Leeuw, 2005; Voogt & Saris, 2005). Firstly, a mixed-mode survey may reduce systematic selection error (e.g. nonresponse error or coverage error) relative to a single-mode survey because certain population members might not be willing or able to respond to the mode of the single-mode survey but do respond to another mode in the mixed-mode survey. Secondly, a mixed-mode survey may reduce random selection error (e.g. sampling error) because some respondents may respond by a cheap mode in the mixed-mode survey whereas they would respond by an expensive mode in the single-mode survey; as a result, larger samples can be drawn within the same budget constraints.

However, the advantage of lower selection error is not guaranteed, for two reasons. First, a reduction in selection error might be overwhelmed by an increase in measurement error (de Leeuw, 2005; Voogt & Saris, 2005; Dillman, Smyth & Christian, 2009; Weisberg, 2005). Measurement error refers to the error introduced by observing distorted values instead of respondents' true values. Typical examples of measurement errors are social desirability error in interview modes, recency effects in telephone surveys, or primacy effects in self-administered modes (see, among others, de Leeuw, 2005; Dillman, Smyth & Christian, 2009; Schwarz, Strack, Hippler & Bishop, 1991). The choice for a mixed-mode design thus firstly involves a trade-off between selection error and measurement error. Second, the use of mixed-mode designs requires additional fixed costs relative to single-mode designs for the development and implementation of additional modes. These additional fixed costs limit the budget available for the actual data collection and thus might result in higher rather than lower random selection error. The choice for a mixed-mode design thus secondly involves a trade-off between additional random selection error and general selection error reduction.
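Both trade-offs can be summarised in a mean squared error. The decomposition below is a standard textbook expression, added here only for illustration and using notation that does not appear in the original chapters:

    MSE(\hat{\theta}) = (B_{sel} + B_{meas})^2 + Var(\hat{\theta}),

where \hat{\theta} is the survey estimate of the target statistic, B_{sel} is the systematic selection bias (e.g. coverage and nonresponse bias), B_{meas} is the systematic measurement bias, and Var(\hat{\theta}) is the random error, which decreases as the affordable sample size increases. Relative to a single-mode design, a mixed-mode design is expected to reduce B_{sel} and, by using cheaper modes, Var(\hat{\theta}); at the same time it may increase B_{meas} and, through its additional fixed costs, Var(\hat{\theta}).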
Unfortunately, evaluating selection error and measurement error in mixed-mode surveys is difficult because selection effects and measurement effects between the modes are completely confounded. Selection effects refer to differences in respondent composition between the modes and make mixed-mode surveys attractive relative to single-mode surveys (Voogt & Saris, 2005; Dillman, Phelps et al., 2009). Indeed, absence of selection effects would mean that the respondents of a single-mode survey are similar to the respondents of the mixed-mode survey. Measurement effects, in turn, refer to differences in measurement error between the modes (Bowling, 2005; Voogt & Saris, 2005); they occur when the answers of the very same respondents differ between the modes. As a consequence, measurement effects make mixed-mode surveys unattractive relative to single-mode surveys because they result in additional measurement error.

Direct estimation of selection and measurement effects, however, is impossible within mixed-mode data. Indeed, differences between respondents of different modes may stem from differences in respondent composition (i.e. a selection effect) or from differences in measurement error (i.e. a measurement effect). This confounding makes it difficult to estimate the real data quality of mixed-mode surveys relative to single-mode surveys, and to adjust mixed-mode data for measurement error. Nevertheless, the confounding between selection and measurement effects in mixed-mode surveys overlaps with the central theme of the causal inference literature (among others, Morgan & Winship, 2009; Pearl, 2009; Weisberg, 2010). This literature provides, among others, three distinct methods to disentangle selection and measurement effects: the back-door method, the front-door method, and the instrumental variable method (Pearl, 1995, 2009). The back-door method starts from the inclusion of covariates which capture the selection effects. The front-door method, in contrast, starts from the inclusion of covariates which capture the measurement effects. The instrumental variable method, in turn, involves a comparison of mixed-mode data with comparable single-mode data. To date, the application of the back-door, front-door, and instrumental variable methods has hardly been discussed in the context of mixed-mode surveys.
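For later reference, the back-door and front-door adjustments take the following general form in the causal inference literature (Pearl, 2009). The notation is added here purely for illustration, with Y a target variable, M the data-collection mode, X a set of back-door covariates and Z a set of front-door covariates:

    P(y | do(m)) = \sum_x P(y | m, x) P(x)                              (back-door)
    P(y | do(m)) = \sum_z P(z | m) \sum_{m'} P(y | m', z) P(m')         (front-door)

The instrumental variable method does not rely on such adjustment covariates; instead it exploits an exogenous variable, in this dissertation the random assignment of sample members to the single-mode or the mixed-mode design, which affects the answers only through the mode by which they are given.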
The aim of this dissertation is to fill this gap by providing a theoretical discussion of all three methods within a mixed-mode survey framework. Such a discussion provides guidelines for the development of better error-reduction methods and error-estimation techniques through appropriate survey designs and adjustment covariates.
1.2 Note on the writing process

The main part of this dissertation includes six papers which are published in or submitted to international scientific journals:

1. A Method for Evaluating Mode Effects in Mixed-mode Surveys (Chapter 3, published in Public Opinion Quarterly, volume 74, issue 5, pages 1027-1045, in 2010).
2. A Method to Evaluate Mode Effects on the Mean and Variance of a Continuous Variable in Mixed-Mode Surveys (Chapter 4, published in International Statistical Review, volume 80, issue 2, pages 306-322, in 2012).
3. Evaluating Relative Mode Effects on Data Quality in Mixed-Mode Surveys (Chapter 5, under review).
4. Evaluating Relative Mode Effects in Mixed-Mode Surveys: Three Methods to Disentangle Selection and Measurement Effects (Chapter 6, published in Sociological Methods & Research, volume 42, issue 1, pages 82-104, in 2013).
5. Evaluating Mode Effects in Mixed-Mode Survey Data Using Back-Door and Front-Door Methods (Chapter 7, under review).
6. On the Relative Advantage of Mixed-Mode versus Single-Mode Surveys (Chapter 8).

These papers discuss the application of the back-door method, the front-door method, and the instrumental variable method to mixed-mode survey data. In doing so, the papers also provide examples of the application of all three methods in practice. This section elucidates the writing process of the papers in order to facilitate reading and to improve understanding of this dissertation.

The research process started with the analysis of two particular datasets: the Survey On Surveys dataset (Storms & Loosveldt, 2005) on the one hand and the European Social Survey (ESS) round four dataset on the other hand (Eva et al., 2010; ESS Round 4: European Social Survey Round 4 Data, 2008). Both datasets stem from surveys with a broadly similar survey design. This design includes the selection of two independent random samples, where the data of one sample are collected by a mixed-mode survey design while the data of the other sample are collected by a single-mode survey design. The general design of both the Survey On Surveys and the ESS was developed in order to compare the single-mode sample outcomes with the mixed-mode sample outcomes. Such comparisons are the topic of the first three papers of this dissertation. These papers describe statistical methods for disentangling selection effects and measurement effects on several types of parameters with increasing complexity.
The first paper restricts its focus to the estimation of mode effects on simple means and proportions. The second paper broadens this focus by simultaneously analysing mode effects on variances. Subsequently, the third paper focuses on the analysis of mode effects on quality estimates obtained from multitrait-multimethod (MTMM) experiments, which requires knowledge of covariance matrices.

However, it was soon discovered that the problem of confounded selection and measurement effects overlaps with a central theme of the causal inference literature (Morgan & Winship, 2009; Pearl, 2009; Weisberg, 2010). Within this causal inference literature, it was further discovered that comparing a mixed-mode dataset with a single-mode dataset boils down to using an instrumental variable (Angrist, Imbens & Rubin, 1996; Bowden & Turkington, 1990; Heckman, 1997, 1996). The instrumental variable method is one possible method to disentangle selection and measurement effects, and it has already been thoroughly discussed in several theoretical papers and books. These discussions elaborate on, among others, the requirements and assumptions of the method, and they allowed us to improve and simplify the explanation of the method relative to the initial papers. Indeed, even though the initial two papers make implicit use of the causal inference framework and the instrumental variable method, they do not explicitly refer to the causal inference literature or the instrumental variable method. The third paper, in contrast, does refer to the causal inference literature and the instrumental variable method.

However, within the context of mixed-mode surveys, the instrumental variable method is not optimal for several reasons. Firstly, the instrumental variable method only allows for estimating conditional mode effects but not marginal mode effects. For example, measurement effects can only be estimated for people responding by mode a but not for people responding by mode b. Secondly, the instrumental variable method automatically implies that the survey mode of the single-mode survey is the benchmark mode, that is, the mode which comes with ignorable measurement error according to the researcher's belief. This forced choice of the benchmark mode might be strange and unwanted in many situations. Thirdly, the instrumental variable method does not prove itself to be useful for the estimation of target statistics, which usually is the first goal of a survey. Nonetheless, next to the instrumental variable method, the causal inference literature also provides the back-door and front-door methods (Pearl, 2010, 2009), which can likewise be used for the estimation of mode effects and which do allow for estimating marginal mode effects, for freely choosing a benchmark mode, and for estimating target statistics. The back-door method has already been widely applied within the mixed-mode literature, but its assumptions are mainly ignored. The front-door method, in turn, remains relatively unexplored within mixed-mode survey studies.
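In potential-outcomes notation, added here only to make the first of these limitations concrete, the instrumental variable method identifies a conditional contrast such as

    E[ Y(a) - Y(b) | R = a ],

the measurement effect among the sample members who respond by mode a in the mixed-mode design, rather than the marginal contrast E[ Y(a) - Y(b) ] over all respondents.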
As a consequence, both the back-door and front-door methods are introduced in the three latter papers and are compared with the instrumental variable method as well as with each other. The fourth paper of this dissertation compares the instrumental variable method with the back-door method, in which socio-demographic variables are used to adjust for differential selection into the different modes. Within the mixed-mode literature, socio-demographic variables are often used for back-door adjustment, but these variables probably do not meet the required back-door method assumptions. The resulting mode effect estimates obtained by the back-door method are very different from the estimates obtained by the instrumental variable method. The fifth paper of this dissertation, in turn, compares the back-door method, again using socio-demographic adjustment covariates, with the front-door method. For the front-door method, an adjustment variable is selected which includes information about whether the respondents found answering the questions a pleasant or an unpleasant task. Once again, large differences in mode effect estimates are found between both methods. The question thus is which covariate meets the assumptions in the best way.
Finally, the sixth paper differs slightly from the previous five. This sixth paper does not explore estimation techniques for mode effects but provides a discussion of the relative quality of mixed-mode surveys compared to single-mode surveys. Indeed, the final goal of mixed-mode surveys is to increase survey quality relative to single-mode surveys, but it is difficult to examine this quality due to the confounded selection and measurement effects. However, the methods for disentangling selection and measurement error, which are discussed in the first five papers, more or less allow for comparing quality between mixed-mode and single-mode surveys.
1.3 Outline of the dissertation

The main body of this dissertation includes the six papers. However, these papers use some concepts whose meaning may slightly differ from the standard meaning in the existing research. For that reason, the next chapter of this introductory Part I, that is Chapter 2, first provides a discussion of some basic concepts of mixed-mode surveys in order to facilitate clear understanding. This chapter includes discussions about defining the concept `mode', about how modes can be combined in survey designs, about different mixed-mode survey designs, and about how mixed-mode surveys relate to survey error.

Part II subsequently includes the six papers and thus forms the main body of this Ph.D. dissertation. The papers are more or less chronologically ordered based on the moment of writing and publication. Most of the papers focus on the instrumental variable method, which was the first method to be explored. However, note that the instrumental variable method is not named as such within the earliest papers. When these papers were written, the link with the causal inference literature had not been made yet. This link was made in the later papers, in which the back-door and front-door methods are introduced as well.

Chapter 3 includes the first paper, `A Method for Evaluating Mode Effects in Mixed-mode Surveys', which was published in Public Opinion Quarterly, volume 74, issue 5, pages 1027-1045, in 2010. This paper explores the instrumental variable method (even though it is not named as such) for the estimation of measurement and selection effects on proportion and mean parameters. The estimation of these mode effects is illustrated on variables about political interest, political complexity, and voter turnout within the European Social Survey (ESS) round four data.

Chapter 4 includes the second paper, `A Method to Evaluate Mode Effects on the Mean and Variance of a Continuous Variable in Mixed-Mode Surveys', which was published in International Statistical Review, volume 80, issue 2, pages 306-322, in 2012. This paper continues the exploration of the instrumental variable method but provides algorithms for the estimation of measurement and selection effects on mean and variance parameters. It also includes algorithms to calculate optimal sample sizes for the estimation of mode effects. The use of these algorithms is illustrated on six items about opinions about surveys, which stem from the Survey on Surveys.

Chapter 5 includes the third paper, `Evaluating Relative Mode Effects on Data Quality in Mixed-Mode Surveys', which has been submitted to a journal and received a first positive review but has not been accepted yet. This paper uses the instrumental variable method for the estimation of selection and measurement effects on quality estimates obtained from MTMM experiments using structural equation models. The European Social Survey is used as a guiding example, but these data show a lot of survey design deficiencies. The main goal of this paper is to provide some guidelines for future research on mixed-mode surveys.

Chapter 6 includes the fourth paper, `Evaluating Relative Mode Effects in Mixed-Mode Surveys: Three Methods to Disentangle Selection and Measurement Effects', which was published in Sociological Methods & Research, volume 42, issue 1, pages 82-104, in 2013. This paper compares the back-door method, which is commonly used within the existing mixed-mode literature, with two versions of the instrumental variable method. The comparison between the methods is illustrated with the Survey on Surveys dataset and yields very different estimates and conclusions.

Chapter 7 includes the fifth paper, `Evaluating Mode Effects in Mixed-Mode Survey Data Using Back-Door and Front-Door Methods', which is nearly accepted by a journal after two positive reviews. This paper compares the back-door method with the front-door method, discusses the required assumptions of both methods, and provides guidelines for suitable covariates. The Survey on Surveys data are used to illustrate the methods.

Chapter 8 includes the final paper, `On the Relative Advantage of Mixed-Mode versus Single-Mode Surveys', which will be submitted to a journal in the near future. This paper discusses the mean squared error as a tool for estimating the relative advantage of mixed-mode designs versus single-mode designs. However, estimation of this advantage requires knowledge of the population target statistic. An optimal estimate of the target statistic can be obtained by the back-door or front-door method. The front-door method is used for an illustration with the Survey on Surveys data.

Finally, Chapter 9 in Part III provides a brief concluding discussion. Firstly, this discussion briefly repeats the shortcomings of the datasets which are analysed in this dissertation. These shortcomings may provide guidelines for future mixed-mode survey research designs. Secondly, it provides topics for future studies which may help mode effect estimation in mixed-mode surveys. Thirdly, it draws analogies to other survey-methodological topics which resemble mixed-mode surveys and thus require similar solutions.
Chapter 2  Some Basic Concepts of Mixed-Mode Surveys

Within the existing literature on mixed-mode surveys, there is some confusion about important basic concepts. Many terms are given different meanings or are used interchangeably. This chapter aims to discuss and provide clear definitions of the most important terms in order to improve understanding of the remainder of this dissertation. Section 2.1 first provides a working definition for `mode' as a concept. In doing so, it makes a distinction between data-collection modes and survey modes. Section 2.2 discusses two possible ways to combine modes within survey designs, one of which results in mixed-mode survey designs. Section 2.3 provides an overview of several possible mixed-mode survey subdesigns. These subdesigns differ from each other by the way respondents are selected for the different survey modes. Section 2.4 discusses survey error issues within mixed-mode surveys. Within this section, survey error is decomposed into selection error and measurement error, where selection error is believed to be reduced by mixed-mode surveys but measurement error may be increased. Finally, Section 2.5 provides an illustration of all concepts by applying them to a real survey example.
2.1 A working definition for modes

An outline of the basic concepts of mixed-mode data-collections first requires a working definition for `mode'. Within the survey methodology literature, modes refer to means of communication for the transmission of information between the research team and sample members. A means of communication is generally defined as any combination of objects, devices and persons used for communicating information between two significant people (DeFleur & Dennis, 1991). Survey methodologists usually pay most attention to the data-collection phase because this phase has a considerable effect on overall data quality (Biemer & Lyberg, 2003). As a consequence, the data-collection phase is generally used as a framework for defining and classifying communication modes within a survey context. Nonetheless, such a framework is inaccurate because the research team may communicate with sample members at time points beyond the mere data-collection itself (de Leeuw, 2005). This difference between the entire contact process and the mere data-collection phase obliges us to make a distinction between data-collection modes in particular and survey modes in general. A data-collection mode is further defined as a means of communication used for communicating the survey questionnaire and the answers to this questionnaire during the data-collection phase of a survey. A survey mode, in turn, is defined more generally as a means of communication used for communicating overall survey information, which may go beyond the mere data-collection phase. As such, data-collection modes are special forms of survey modes because data-collection modes only include communication about the data rather than the entire survey. Within the existing literature, `mode' is usually thought of as a categorical descriptor and this dissertation follows this trend. Subsection 2.1.1 provides a typology of data-collection modes while subsection 2.1.2 provides a more general typology of survey modes. However, note that such typologies may not be sufficiently precise because `mode' is in fact a difficult multidimensional construct. For an illuminating discussion on this point, refer to Couper (2011).
2.1.1 Data-collection modes

Data-collection modes can be classified by the communication devices used, by the presence or absence of agent-assistance, and by the devices used for data administration. First, data-collection modes may include one of four possible communication devices, that is face-to-face communication (i.e. no device), telephonic communication, communication via paper, or communication over the Internet. The distinction between face-to-face, telephonic, paper, and Internet communication is, however, arbitrary because some devices can be split up further. For example, telephonic communication can be split up into communication via fixed land-line telephone and communication via mobile phone (Callegaro et al., 2007; Brick et al., 2007; Vicente, Reis & Santos, 2009). Likewise, Internet communication can be split up into communication via computer (desktop, laptop, tablets) and communication via smartphones (Fuchs, 2008; Peytchev & Hill, 2010). Such detailed distinctions, however, go beyond the scope of this dissertation. Second, data-collection modes can also be distinguished by the presence or absence of an active intermediate agent (Couper, 2008). This intermediate agent usually is the interviewer, whose main task is to help establish communication between the research team and the sample member. Third, data-collection modes can be classified by the device of data administration. In general, two administration devices are used, that is paper or computers. Compared to data administration on paper, data administration on a computer allows for more complex routing structures between the questions and avoids additional data-processing.

In total, ten data-collection modes can be distinguished, which are defined by particular combinations of the four communication devices, agent-assistance, and the two administration devices (see Table 2.1). The most popular data-collection modes at present day are Computer-Assisted Personal Interviewing (CAPI), Computer-Assisted Telephone Interviewing (CATI), Mail Self-Administration Questionnaires (MSAQ's) and Web Self-Administration Questionnaires (WSAQ's). CAPI involves a personal face-to-face interview, usually at the respondent's house, by an interviewer who uses a portable computer with a preprogrammed questionnaire. CATI is similar to CAPI except that the interview is conducted via telephone. With an MSAQ, the researcher delivers a paper printed
16
questionnaire to the respondent by postal service and reckons on the respondent's willingness to complete the questionnaire and to send it back.
For WSAQ's, a
link is oered to the respondent which guides him or her to a questionnaire on an Internet-page. Besides the four popular modes, many other data-collection modes can be mentioned.
In general, these modes are simple variations of the four popular
modes but may aect survey results in particular ways.
A variation on CAPI,
Computer Assisted Self-Interviewing Paper-and-Pencil Personal Interview Paper-and-Pencil Self-Interviewing Automated Computer-Assisted Telephone Interviewing Touch-tone Data Entry Interactive voice response Disk By Mail Self-Administered Questionnaire
is
(CASI) where the interviewer turns over
his portable computer to the respondent to answer a set of questions without knowledge of the interviewer.
(PAPI) and
(PASI) are similar to CAPI and CASI, except
that the interviewer uses a paper questionnaire instead of a computer.
(ACATI) is a variation to CATI where
the questions are read out by an automatic voice and the respondents' answers are administered automatically by using
(TDE) or
(IVR) technology. An alternative for an MSAQ is a
(DBMSAQ) but this mode never gained much
popularity. An alternative for WSAQ's includes Voice over IP (VoIP) technology which allows to conduct an interview over the Internet similar to CATI. This is called
Computer-Assisted Web Interviewing
(CAWI).
2.1.2 Survey modes Like data-collection modes, survey modes can be categorized along the four communication devices and agent-assistance. However, survey modes do not specically refer to data collection and can thus not be categorized by a data administration device. In total, seven survey modes can be distinguished which are dened by all possible combinations of the four communication devices and agent-assistance (except for one; see Table 2.2). The three most prevalent survey modes are
Telephone
(AT),
Unassisted Mail
(UM), and
Unassisted Web
Agent-assisted
(UW). For AT con-
tact, the sample members are usually contacted by telephone from a call-centre. With UM and UW, in turn, the sample members are contacted by postal mail
. . . via the Web
CAWI
CAPI CATI
MSAQ
PASI
on paper
administration
WSAQ
DBMSAQ
ACATI
CASI
on a computer
administration
The four most popular modes are typeset in bold. Acronyms: PAPI = Paper-and-Pencil Personal Interview, CAPI = Computer-Assisted Personal Interviewing, PASI = Paper-and-Pencil Self-Interviewing, CASI = Computer Assisted Self-Interviewing, CATI = ComputerAssisted Telephone Interviewing, ACATI = Automated Computer-Assisted Telephone Interviewing, MSAQ = Mail Self-Administration Questionnaire, DBMSAQ = Disk By Mail Self-Administered Questionnaire, CAWI = Computer-Assisted Web Interviewing, WSAQ = Web Self-Administration Questionnaire.
. . . via mail
PAPI
on a computer
on paper
. . . via telephone
. . . face-to-face
administration
administration
unassisted by intermediate agent
assisted by intermediate agent
Survey data can be collected by dierent data-collection modes.
Communication goes . . .
Table 2.1:
2.1. A working definition for modes 17
2. Some Basic Concepts of Mixed-Mode Surveys
18
or over the Internet respectively. Another common survey mode is
Face-to-face
Agent-assisted
(AF) but its use is mainly restricted to surveys with face-to-face data-
collections (PAPI, CAPI, PASI, and CASI). A further possible survey mode is
Agent-assisted Web
(AW) where contact is
made by an Internet call using, for instance, Voice over IP (VoIP) technology. A next possible survey mode includes Unassisted Telephone (UT) where a telephone contact is made using an automated message to the sample member. A last possible survey mode is
Unassisted Face-to-face
(UF) where a personal letter is handed over
to the sample member by the intermediate agent. As such, the intermediate agent has no active role in the mere communication itself. By denition, agent-assisted mail contact is theoretically and practically impossible.
2.2 Combining modes in surveys Within surveys, data-collection and survey modes can be combined in two distinct ways; across questionnaire items and contact moments, or across sample members. Both ways have important consequences with respect to survey error, but, nevertheless, the distinction between them is a source of confusion within the existing literature. This section aims to clarify this confusion by providing a detailed explanation of the two possible combinations. Like in the previous section, subsection 2.2.1 rst focuses on data-collection modes while subsection 2.2.2 broadens this focus by considering survey modes.
2.2.1 Combining data-collection modes Data-collection modes can be combined in two distinct ways, that is across questionnaire items and across sample members. Because of these two possible combinations, two times two possible data-collection designs can be distinguished (Table 2.3).
The simplest data-collection design uses only one single mode
a
for the
collection of all data for all sample members (Table 2.3a). The rst way to combine data-collection modes is a
multi-mode condition
in
which dierent data-collection modes are used to collect responses on dierent sets of survey items or questions (de Leeuw, 2005). For example, responses of the rst
2.2. Combining modes in surveys
19
Contact between the research team and the sample members can be made by dierent survey modes.
Table 2.2:
assisted by
unassisted by
intermediate agent
intermediate agent
. . . face-to-face
AF
UF
. . . via telephone
AT
UT
. . . via mail
UM
AW
UW
Communication goes . . .
. . . via the Web
Acronyms: AF = Agent-assisted Face-to-face, AT = Agent-assisted Telephone, AW = Agentassisted Web, UF = Unassisted Face-to-face, UT = Unassisted Telephone, UM = Unassisted Mail, UW = Unassisted Web.
items of a questionnaire might be collected by mode latter items might be collected by mode
b
a
while the responses of the
(Table 2.3b).
Such a combination of
modes is called a multi-mode condition because the condition to collect data of one sample member includes multiple modes. When a sample member is always contacted by the very same survey mode, his condition is called a
uni-mode condition
(Table 2.3a). Multi-mode conditions are standard practice in survey methodology because they use the advantages of dierent data-collection modes while all sample members are nonetheless treated equally (de Leeuw, 2005; Dillman, Smyth & Christian, 2009). A major advantage of multi-mode conditions is their ability to reduce survey error. For example, face-to-face interviews often include CAPI for the principal part of the questionnaire but include CASI or PASI for a limited set of sensitive questions. These sensitive questions are then treated more anonymously and social desirable responses may be avoided. An other example is a telephone interview which starts with CATI for some general questions, but then switches to a cheaper ACATI for the bulk of the questionnaire. Taking all together, multi-mode conditions are thus supposed to provide a win-win situation because survey error is reduced while every sample member is allocated to the very same condition (de Leeuw, 2005).
For that reasons, this dissertation does not further concern
multi-mode conditions. The second way to combine data-collection modes is a
collection
mixed-mode data-
in which the same data of dierent sample members is collected by
2. Some Basic Concepts of Mixed-Mode Surveys
20
Table 2.3:
ways.
Data-collection designs can combine data-collection modes in several
A single-mode data-collection with a uni-mode condition uses the very same data-collection mode for all items and all sample members. (a)
Sample member 1 2 ...
item 1 mode a mode a ...
item 2 mode a mode a ...
item 3 mode a mode a ...
item 4 mode a mode a ...
item 5 mode a mode a ...
item 6 mode a mode a ...
... ... ... ...
item 1 mode a mode a ...
item 2 mode a mode a ...
item 3 mode a mode a ...
item 4 mode b mode b ...
item 5 mode b mode b ...
item 6 mode b mode b ...
... ... ... ...
item 1 mode a mode b ...
item 2 mode a mode b ...
item 3 mode a mode b ...
item 4 mode a mode b ...
item 5 mode a mode b ...
item 6 mode a mode b ...
... ... ... ...
item 1 mode a mode c ...
item 2 mode a mode c ...
item 3 mode a mode c ...
item 4 mode b mode d ...
item 5 mode b mode d ...
item 6 mode b mode d ...
... ... ... ...
A single-mode data-collection with a multi-mode condition uses dierent datacollection modes for dierent items, but the same data-collection mode for each item separately across the sample members. (b)
Sample member 1 2 ...
A mixed-mode data-collection with uni-mode conditions uses the same datacollection mode for all items with each sample member separately, but dierent datacollection modes for dierent sample members (c)
Sample member 1 2 ...
A mixed-mode data-collection with multi-mode conditions uses dierent datacollection modes for dierent items with each sample member separately, and dierent data-collection mode combinations for dierent sample members. (d)
Sample member 1 2 ...
2.2. Combining modes in surveys dierent data-collection modes.
21
Mixed-mode data-collections thus assign dier-
ent sample members to dierent conditions. For example, data of the rst sample member may be collected by mode be collected by mode
b
a while data of the second sample member may
(Table 2.3c). When the same mode condition is used for
all sample members, a survey is further called a
single-mode survey
even though
dierent survey modes might be used (Table 2.3b). Mixed-mode data-collections might be advantageous over single-mode datacollections because they may provide analysis samples which better represent the population (de Leeuw, 2005; Dillman, Smyth & Christian, 2009).
Nevertheless,
mixed-mode data-collections might simultaneously introduce other error relative to single-mode data-collections.
This interplay between dierent forms of error
within mixed-mode and single-mode data-collections is complex and is discussed in further detail in Section 2.4. To conclude this subsection, note that multi-mode conditions and mixed-mode data-collections are not mutual exclusive.
Multi-mode conditions can easily be
used within mixed-mode data-collection designs. For example, both modes
b
a
and
might be used for the collection of the rst sample member's responses, while
modes
c
and
d
might be used for the collection of the second sample member's
responses (Table 2.3d).
2.2.2 Combining survey modes Like data-collection modes, the more generally dened survey modes can also be combined in two distinct ways, that is across contact moments and across sample members.
Combinations across contact moments are also called
mode conditions mixed-mode surveys
multi-
while combinations across sample members are further called . Note that, within the remainder of this dissertation, mixed-
mode surveys will also be used to refer to mixed-mode data-collections because data-collection modes merely are particular forms of survey modes (see Section 2.1). Once again, the two possible survey mode combinations lead to two times two possible general survey designs (Table 2.4), where the simplest design uses only one single survey mode
a
for every contact between the research team and
the sample members (Table 2.4a).
2. Some Basic Concepts of Mixed-Mode Surveys
22
Table 2.4:
Survey designs can combine survey modes in several ways.
A single-mode survey with a uni-mode condition uses the very same mode for all contacts with all sample members. (a)
Sample member 1 2 ...
prenotication mode a mode a ...
screening mode a mode a ...
data-collection contact mode a mode a ...
reminder mode a mode a ...
data-collection recontact mode a mode a ...
... ... ... ...
prenotication mode a mode a ...
screening mode b mode b ...
data-collection contact mode b mode b ...
reminder mode a mode a ...
data-collection recontact mode b mode b ...
... ... ... ...
prenotication mode a mode b ...
screening mode a mode b ...
data-collection contact mode a mode b ...
reminder mode a mode b ...
data-collection recontact mode a mode b ...
... ... ... ...
prenotication mode a mode b ...
screening mode a mode b ...
data-collection contact mode b mode ∅ ...
reminder mode ∅ mode b ...
data-collection recontact mode ∅ mode a ...
... ... ... ...
A single-mode survey with a multi-mode condition uses dierent modes for dierent contact moments, but the same mode at each contact for each sample member.
(b)
Sample member 1 2 ...
A mixed-mode survey with uni-mode conditions uses the same mode for all contacts with each sample member separately, but dierent modes for dierent sample members (c)
Sample member 1 2 ...
A mixed-mode survey with multi-mode conditions uses dierent modes for dierent contacts with each sample member separately, and dierent mode combinations for dierent sample members. A special situation is no communication (`mode ∅').
(d)
Sample member 1 2 ...
2.2. Combining modes in surveys First, a
multi-mode condition
23
now refers to using dierent survey modes for
dierent contact phases like prenotications, screening procedures, and reminders (Biemer & Lyberg, 2003; de Leeuw, 2005; Dillman, Smyth & Christian, 2009). For example, prenotications and reminders may be carried out by mode other phases may be carried out by mode
b
a
while the
(Table 2.4b). When a sample member
is always contacted by the very same survey mode, his condition is, once again, called a
uni-mode condition
(Table 2.4a).
Like with data-collection modes, multi-mode conditions with survey mode combinations are standard practice in survey methodology because they may reduce survey error while all sample members are nonetheless treated equally (de Leeuw, 2005; Dillman, Smyth & Christian, 2009). For example, nonresponse error may be reduced by establishing legitimacy and trust for face-to-face or telephone interviews through sending postal advance letters to all sample members (de Leeuw, Callegaro, Hox, Korendijk & Lensvelt-Mulders, 2007). Taking all together, multimode conditions with survey mode combinations are thus also supposed to provide a win-win situation (de Leeuw, 2005) and do not form any further concern within this dissertation. Second,
mixed-mode surveys
refer to surveys in which dierent survey modes
are used for contact with dierent sample members within at least one contact phase. For example, the rst sample member may always be contacted by mode
a
while the second sample member may always be contacted by mode
b
(Table
2.4c). When the same mode condition is used for all sample members, a survey is called a
single-mode survey
even though dierent survey modes might be used
(Table 2.4b). Like mixed-mode data-collections, mixed-mode surveys might be advantageous over single-mode surveys because they may provide analysis samples which better represent the population (de Leeuw, 2005; Dillman, Smyth & Christian, 2009). Nevertheless, mixed-mode surveys might also simultaneously introduce other error relative to single-mode surveys and, as already mentioned, this interplay between dierent forms of error is discussed in further detail in Section 2.4. To conclude this subsection, like with data-collection modes, both a mixedmode survey and multi-mode conditions with survey mode combinations can be included in the very same survey design. For example, the rst sample member
2. Some Basic Concepts of Mixed-Mode Surveys
24
may be prenotied and reminded by mode
a
but contacted by mode
wise, while the second sample member may be contacted by mode a data-collection follow-up in mode
a
b
b
other-
except for
(Table 2.4d).
Further, also note that no
communication might be treated as a special mode.
Theoretically speaking, no
communication between particular sample members and the research team may aect survey error and thus may introduce dierent survey conditions for dierent sample members.
The introduction of dierent conditions then render a survey
a mixed-mode survey even though this was not the initial intention.
No com-
munication may occur from dierent sources; There might be no contact because sample members can not be localized, because sample members lack a mean of communication (e.g. sample members without a telephone in a telephonic contact phase), or because a contact makes no sense. An example of the latter situation is the fact that no reminders are sent to sample members who already responded to the survey.
2.3 Mixed-mode survey designs Sample members of mixed-mode surveys can be selected for dierent modes according to dierent designs (see, among others, de Leeuw, 2005; Roberts, 2007; Dillman, Smyth & Christian, 2009; Martin, 2011). In general, mixed-mode survey designs can be divided into two classes, that are unstratied and stratied mixedmode survey designs.
Unstratied mixed-mode survey designs are characterised
by an a priori possibility of each sample unit to respond by each mode, while this is not true for stratied mixed-mode survey designs. Within both unstratied and stratied mixed-mode survey designs, sample members can further be selected for the dierent modes in two ways. Within unstratied mixed-mode survey designs, modes can be oered simultaneously within within
sequential
designs.
concurrent
designs, or sequentially
Within stratied mixed-mode survey designs, modes
can be oered on the basis of units' characteristics within or at random within
allocative
designs.
comparative
designs,
All four designs are discussed below in
Subsections 2.3.1 and 2.3.2, but note that these designs are merely archetypes. In practice, these four basic designs can simultaneously be included in one single survey.
2.3. Mixed-mode survey designs
25
Further, note that all four mixed-mode survey designs can theoretically as well as practically be used in cross-sectional and longitudinal settings.
In a cross-
sectional setting, each sample unit refers to one single person so that mixed-mode surveys refer to using dierent modes to collect data from dierent people.
In
a longitudinal setting, however, each sample unit refers to one single time point within a person rather than this person him- or herself, and mixed-mode surveys may thus refer to using dierent modes at dierent time points of data-collection. For that reason, it is better to speak in terms of sample units rather than sample members or persons.
2.3.1 Unstratied mixed-mode survey designs The rst class of mixed-mode survey designs includes
unstratied designs
, which
start from one random sample of units. Subsequently, the sample units are asked to respond by one of the dierent data-collection modes. Put dierently, in unstratied designs, each sample unit has an a priori possibility to respond by each mode.
Mode selection of sample units thus mainly depends on the units them-
selves. Within unstratied designs, units' selection for the dierent modes can be guided in either of two ways, that is concurrently or sequentially. First, in
concurrent mixed-mode survey designs
, dierent modes are oered
simultaneously to the sample units. This design might improve cooperation because people may appreciate being able to choose their preferred mode (Diment & Garrett-Jones, 2007; Shih & Fan, 2007). However, rm empirical evidence for the advantage of concurrent mixed-mode survey designs is lacking because several studies found equal or lower response rates compared to single-mode surveys (Medway & Fulton, 2012; Dillman, West & Clark, 1994; Millar & Dillman, 2011). These lower response rates might be explained by the observation that oering mode choices might make each mode less appealing (Schwartz, 2004). Second, in
sequential mixed-mode survey designs
, dierent modes are oered
sequentially to the sample units. Usually, a cheap mode is used for the initial datacollection phase while an expensive mode is used for non-response or non-contact follow-up in a second phase (de Leeuw, 2005). Sequential designs are considered more attractive than concurrent designs for lowering costs and increasing response
2. Some Basic Concepts of Mixed-Mode Surveys
26
rates (Hochstim, 1967; Siemiatycki, 1979; Converse, Wolfe, Huang & Oswald, 2008; Dillman, Phelps et al., 2009; Millar & Dillman, 2011). Firstly, lower costs are achieved because the cheap modes may be exhausted to their fullest potential. Secondly, higher response rates may be achieved because a dicult mode choice is circumvented (Schwartz, 2004).
2.3.2 Stratied mixed-mode survey designs The second class of mixed-mode survey designs includes
stratied designs
, which
rst divide the population in dierent strata and subsequently assign each stratum to one mode.
In contrast to unstratied mixed-mode survey designs, stratied
mixed-mode survey designs are thus characterized by an a priori impossibility of each sample unit to respond by each mode because units will never be oered more than one mode (or set of modes). Mode selection of sample units thus mainly depends on the research team and not on the units themselves. Unit stratication can be done on the basis of population members' characteristics within
allocative comparative mixed-mode survey designs
designs, or at random within First, in
comparative
designs.
, dierent modes are oered to
dierent groups or strata on the basis of units' characteristics. For example, strata can be dened by country of residence within cross-national or cross-cultural research, or by time points within longitudinal surveys. Although group comparison relies on a `principle of equivalence' (Jowell, 1998), the reasons for using comparative mixed-mode survey designs are numerous as illustrated by the examples below (de Leeuw, 2005). In cross-national research, comparative mixed-mode survey designs may be justied for the following reasons. Firstly, dierent survey traditions among dierent countries or survey agencies may urge to mix modes across countries. Secondly, differences in practical restraints across countries may oblige to use dierent modes. For example, in some countries detailed registers and address informations are available, but in others they are not.
Thirdly, population groups may differ on several aspects (e.g., literacy levels or electronic know-how) and require different survey modes. Fourthly, mode coverage may differ among groups (e.g., the lack of electronic equipment within some countries, or immigrants with less Internet access). Lastly, geographical considerations may make comparative mixed-mode data-collections more suitable (e.g., densely populated areas make face-to-face surveys attractive, but sparsely populated areas do not). In short, a mode that is optimal for one country may be less suitable for another, and single-mode surveys might thus lead to large variation in data quality among different population groups while mixed-mode surveys might avoid this problem.
In longitudinal research, comparative mixed-mode survey designs may be justified by both practical and cost considerations (Dillman & Christian, 2005; de Leeuw, 2005). Firstly, expensive modes are often chosen for the initial phase because these modes usually allow good sampling frames, are effective in gaining cooperation, and allow optimal screening.
Once a good panel of respondents is established in this first phase, subsequent phases may be executed by cheaper modes to lower total survey costs. Secondly, the mode of the initial phase may become inefficient in later phases. For example, when panel members disperse geographically, face-to-face contacts become difficult in contrast to mail, Internet or telephone contacts. Thirdly, as a panel study continues, new data-collection techniques may become available to replace the original data-collection mode (e.g. Internet surveys in the last decades). Fourthly, some phases require specific data-collection modes because of their content (e.g. medical tests require face-to-face contacts). Lastly, modes can be switched back and forth during the whole study in order to have better control on data quality.
Second, in allocative mixed-mode survey designs, independent samples are drawn from the population and each sample is assigned to a different mode (Feskens, Kappelhof, Dagevos & Stoop, 2010). This design may be preferred because it avoids annoying recontacts like in sequential designs or the possibly difficult choice between modes in concurrent designs. Allocative designs are sometimes used for survey-methodological research as a kind of experimental design. In applied research, however, their use is rather limited because thorough research towards this design is still lacking (possibly because this design is often erroneously classified as a concurrent design where units themselves choose the data-collection mode).
2.4 Survey Error in Mixed-Mode Surveys
Mixed-mode surveys are argued to be advantageous over single-mode surveys because they may reduce survey error. However, this advantage might not automatically be guaranteed because survey error might also be larger in mixed-mode surveys. The relation between mixed-mode surveys and survey error forms the main topic of this section. As such, this section aims to clarify the merits and pitfalls of mixed-mode surveys. Subsection 2.4.1 first provides a general discussion of survey error and its components. Subsequently, Subsection 2.4.2 will apply this general survey error framework to mixed-mode surveys. As such, it will argue that mixed-mode surveys might lower survey error because they obtain better analysis samples, but might also increase survey error because of increased survey costs and measurement error. Nonetheless, it is difficult to evaluate survey error within mixed-mode surveys because selection effects are completely confounded with measurement effects. Possible solutions for this confounding problem are discussed in Subsection 2.4.3.
2.4.1 Survey error: a short overview
The main goal of surveys is to provide answers to scientific research questions as accurately as possible. The answer to one particular research question requires knowledge of a target statistic which is defined on a target variable within a target population (Hansen, Hurwitz, Marks & Mauldin, 1951; Punch, 2003). However, in practice, target statistics can only be approximately estimated (Lessler, 1984; Hansen et al., 1951). The difference between a target statistic and its observed estimate is called survey error and may result in incorrect conclusions with respect to the research question (Groves, 1989; Biemer, 2010; Groves et al., 2004). Survey error may be introduced during two inferential steps, that are the selection step and the measurement step (Groves et al., 2004; Groves, 1989; Särndal, Swensson & Wretman, 1992; Saris, 1997; Voogt & Saris, 2005).
First, the selection step refers to the selection of a small set of population members which is used for data-analysis. The true values of this small set of sample members define a true sample statistic which would be an unbiased estimator of the target statistic under simple random sampling from the entire population. The difference between this true sample statistic and the target statistic is called selection error (or error of non-observation; Groves, 1989). Selection error thus refers to error caused by differences between the selected small set of population members and the entire target population. Such differences may occur from several sources including the operationalization of the population, coverage of the sampling frame, random sampling, non-contact of population members, response incapability of population members, response refusal of population members, item non-response, case-deletion during data-cleaning, case-deletion during the analyses, or changes in population composition over time (see Table 2.5). Most of these errors depend on decisions from the research team, except for non-contact, response-incapability, response-refusal, and item non-response, which depend on the population members themselves.
Second, the measurement step refers to the measurement of the selected population members' data. From these observed data, an observed sample statistic is calculated as an estimate of the true sample statistic and, by extension, the target statistic. The difference between this observed sample statistic and the true sample statistic is called measurement error (or error of observation; Groves, 1989). Measurement error thus refers to error caused by differences between observed and true values of selected population members. Such differences may occur from several sources including the operationalization of the variables, the formulation of the questions, the questionnaire and question design, the questionnaire transmission, the respondents' comprehension of the questions, the respondents' cognitive efforts for retrieving relevant information to answer the question, the respondents' judgements of their answers, the respondents' answer reporting, the response administration, data processing, statistical analysis, and timeliness of measurement (see Table 2.6). Most of these errors depend on decisions from the research team, except for question comprehension, retrieval of relevant information, judgement of answer, and response reporting, which depend on the respondents themselves.
Given a particular survey design, selection and measurement errors may be systematic or random. Systematic error is error which is persistent over all hypothetical realisations of a survey using the very same design. Systematic selection error is called selection bias and may arise from, among others, coverage error, non-contact error, or response refusal error.
Table 2.5: Selection error may occur from different sources.

Research team dependent sources:
1) operationalization of the population: A difference between a fuzzy definition of the target population within the research question and a well-defined verbal description of the population.
2) coverage: A difference between the described population and the sampling frame.
3) sampling: A difference between the sampling frame and the random sample from this sampling frame.
Population members dependent sources:
4) non-contact: A failure to get contact with particular sample members.
5) response-incapability: Particular sample members are incapable of participating in the survey.
6) response-refusal: Particular sample members refuse to participate in the survey.
7) item non-response: Respondents do not provide substantial answers to the target variable.
Research team dependent sources:
8) data-cleaning deletion: Erroneous deletion of respondents from the data file during data-cleaning.
9) analysis deletion: Particular respondents are ignored during analysis (e.g. listwise deletion).
10) timeliness in selection: The population composition changes between the start of the survey and the tabulation of the results.

Table 2.6: Measurement error may occur from different sources.

Research team dependent sources:
1) operationalization of the variables: Difference between a fuzzy definition of the target variable within the research question and a well-defined verbal description of that variable.
2) question formulation: Formulation of a question which asks for one particular value instead of the intended true (possibly varying) value.
3) questionnaire & question design: The influence on the outcome of, among others, the question order, questionnaire and question lay-out, the presence of additional explanation, the questionnaire introduction, the use of open or closed questions, and the number of response categories.
4) questionnaire transmission: Effect of the communication medium on the outcome (e.g. interviewer effect).
Population members dependent sources:
5) question comprehension: A mismatch between the interpretation of the respondent and the intended interpretation of the researcher with respect to the questions and all possible answer categories.
6) retrieval of relevant information: Respondent puts insufficient effort into getting the correct answer (i.e. satisficing).
7) judgement of answer: Respondent decides to misreport the answer (e.g. socially desirable answers).
8) response reporting: Respondent inaccurately reports the answer.
Research team dependent sources:
9) response administration: Inaccurate administration of the responses by the research team.
10) data processing: Erroneous adaptations of values in the data file.
11) statistical analysis: The use of imprecise or biased estimators or analysis models.
12) timeliness of measurement: Respondents change between the start of the survey and the tabulation of the results.
Systematic measurement error is called measurement bias and may arise from social desirability, primacy and recency effects, recall bias, and acquiescence. Random error, in contrast, is error which is neutralized over repetitions of the survey using the very same design. The best known example of random selection error is sampling error. An example of random measurement error might be primacy and recency effects when response categories are randomly ordered. The variability of random selection and measurement errors is called selection variance and measurement variance respectively.
The occurrence of survey error further defines the quality of a survey design, which can be decomposed into validity (or accuracy) and reliability (or precision) (Lessler & Kalsbeek, 1992). The validity of a survey design is determined by the occurrence of systematic errors, while the reliability is determined by the occurrence of random errors. Survey quality can further also be decomposed between selection and measurement error. First, selection error specifies the external quality of a design because it refers to incorrect inferences from the analysis sample to population members outside the sample. Selection bias thus determines external validity, while selection variance determines external reliability. Second, measurement error specifies the internal quality of a design because it refers to incorrect inferences from observed responses to the true values within the analysis sample. Measurement bias thus determines internal validity, while measurement variance determines internal reliability.
2.4.2 Survey error in mixed-mode surveys
Traditionally, survey research makes use of single-mode designs which start from the choice of one particular data-collection mode. However, this choice generally involves a trade-off between different forms of selection error because data-collection modes with high systematic selection error (e.g. nonresponse or coverage error) mainly go with low random selection error (e.g. sampling error) and vice versa. In order to avoid this trade-off between random and systematic selection error, survey methodologists have suggested using mixed-mode surveys instead of single-mode surveys (de Leeuw, 2005; Dillman, Smyth & Christian, 2009).
Indeed, mixed-mode surveys are argued to reduce selection error relative to single-mode surveys for two reasons. First, a mixed-mode survey may reduce systematic selection error (e.g., nonresponse error or coverage error) relative to a single-mode survey because certain population members might not be willing or able to respond to the mode of the single-mode survey but do respond to another mode in the mixed-mode survey. In this case, the mixed-mode survey offers greater external validity than the single-mode survey. Second, a mixed-mode survey may reduce random selection error (e.g. sampling error) because some respondents may respond by a cheap mode in the mixed-mode survey while they would respond by an expensive mode in the single-mode survey. In this case, the mixed-mode survey offers greater external reliability than the single-mode survey because larger samples can be drawn within the same budget constraints. Note that this situation also allows lowering the total survey costs while preserving both external validity and reliability.
Nevertheless, both reasons make clear that mixed-mode surveys are only advantageous over single-mode surveys if different groups of respondents are selected for the different data-collection modes. Such a difference is called a selection effect because it refers to differences in selection error when both groups are separately compared with the target population. The absence of selection effects would mean that the population is equally well represented by the different mode groups of respondents. In that case, one could restrict focus to merely one group of respondents by using a single-mode design.
However, notwithstanding a possible advantage of selection effects on selection error reduction, mixed-mode designs do not necessarily perform better than single-mode designs. Indeed, data quality of mixed-mode designs might be worse relative to single-mode designs for two reasons. First, the use of mixed-mode designs requires additional fixed costs relative to single-mode designs for the development and implementation of the additional modes. These additional fixed costs might overwhelm a reduction in variable costs per survey respondent. The choice for a mixed-mode design thus firstly involves a trade-off between fixed and variable costs.
Second, selection effects might be counteracted by another type of mode effects, namely measurement effects (de Leeuw, 2005; Voogt & Saris, 2005; Dillman, Smyth & Christian, 2009; Weisberg, 2005). Measurement effects are differences in measurement error accompanying the different data-collection modes (Voogt & Saris, 2005; Weisberg, 2005) and thus occur when the answers of the very same respondents would differ if different data-collection modes were used. Typical examples of measurement effects are social desirability error in interview modes, recency effects in telephone surveys, or satisficing in self-administered modes (see, among others, de Leeuw, 2005; Dillman, Smyth & Christian, 2009; Schwarz et al., 1991). The choice for a mixed-mode design thus secondly involves a trade-off between selection error and measurement error.
Unfortunately, both trade-offs are difficult to evaluate because selection effects and measurement effects are completely confounded within mixed-mode data. Indeed, selection and measurement effects are confounded because differences between the groups of respondents selected for the different modes can either be caused by differences between these respondents (i.e., a selection effect) or by differences in measurement (i.e., measurement effects). The simultaneous occurrence of selection effects and measurement effects in mixed-mode surveys thus complicates evaluation of the real data quality of mixed-mode designs relative to single-mode designs.
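To make this confounding concrete, the following small simulation may help; it is an illustrative sketch only, and the population model, effect sizes, and variable names are assumptions introduced here rather than anything taken from this chapter. It generates a population in which respondents with higher true values are more likely to end up in mode B, while mode B additionally shifts the reported answer; the naive comparison of the two mode groups then mixes both effects.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000

    true_y = rng.normal(0, 1, n)                  # true opinion of each population member
    p_mode_b = 1 / (1 + np.exp(-true_y))          # selection: higher true_y -> more likely mode B
    mode = np.where(rng.random(n) < p_mode_b, 'b', 'a')
    measurement_shift = 0.5                       # measurement effect of mode B on reported answers
    reported = true_y + np.where(mode == 'b', measurement_shift, 0.0)

    # The naive mode comparison mixes the selection effect and the measurement effect:
    naive_gap = reported[mode == 'b'].mean() - reported[mode == 'a'].mean()
    selection_part = true_y[mode == 'b'].mean() - true_y[mode == 'a'].mean()
    print(f"naive gap {naive_gap:.2f} = selection {selection_part:.2f} + measurement {measurement_shift:.2f}")

In this constructed example the observed gap between the mode groups is exactly the sum of the two components, which is precisely why additional information (well-chosen covariates or a comparable single-mode sample) is needed to separate them.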
2.4.3 Solutions for mixed-mode surveys
The previous subsection made clear that it is difficult to evaluate data quality in mixed-mode surveys due to confounded selection and measurement effects. However, this confounding forms a central topic of the causal inference literature (e.g., among others, Morgan & Winship, 2009; Pearl, 2009; Weisberg, 2010). Indeed, causal inference theory can be applied to mixed-mode surveys because measurement effects merely refer to a causal effect of survey mode M on the target variable Y. Selection effects, in turn, refer to spurious correlations between Y and M because different population members are selected for the different modes. As a consequence, the relation between Y and M contains both the selection and measurement effects (Figure 2.1a).
Figure 2.1: The relations between variables in mixed-mode data can be represented by causal graphs (Pearl, 1995, 2009). (a) In a mixed-mode dataset, measurement and selection effects are completely confounded. (b) Back-door covariates B allow for unbiased estimation of mode effects by blocking or explaining the selection effect. (c) Front-door covariates F allow for unbiased estimation of mode effects by blocking or explaining the measurement effect. (d) Instrumental variables I allow for unbiased estimation of conditional mode effects by manipulating mode selection.
In general, the causal inference literature provides three strategies to solve the confounding problem between selection and measurement effects by the inclusion of well-chosen covariates into the analysis model. These strategies involve the use of back-door covariates, front-door covariates, and instrumental covariates.
The first strategy to circumvent counterfactuals is the back-door method (Pearl, 1995, 2009). This method involves the inclusion of a set of variables B into the analysis model, where B explains the selection effects as a common cause of Y and M (see Figure 2.1b). The back-door method starts from two assumptions (Pearl, 2009; Morgan & Winship, 2009). The first assumption is the ignorable mode selection assumption and requires that B fully captures the selection effect between the modes. If this assumption does not hold, part of the selection effect is not captured and the confounding problem remains. The second assumption is the mode-insensitivity assumption and requires that the B variables are mode-insensitive, i.e. there is no measurement effect on B. If this assumption does not hold, part of the measurement effect is channelled through B and the confounding problem remains once again.
Within the existing mixed-mode literature, the back-door method has already been widely applied (e.g., among others, Lugtig, Lensvelt-Mulders, Frerichs & Greven, 2011; Heerwegh & Loosveldt, 2011; Jäckle, Roberts & Lynn, 2010; Hayashi, 2007; Fricker, Galesic, Tourangeau & Yan, 2005; Holbrook, Green & Krosnick, 2003; Greenfield, Midanik & Rogers, 2000), but most of these applications merely use socio-demographic variables as back-door covariates. However, such socio-demographic variables might easily be argued to be mode-insensitive, but they might not sufficiently explain why different people are selected for the different modes. Nonetheless, this thorniness is largely ignored within the existing studies. Future studies may thus focus on searching for better back-door covariates for explaining selection effects. Because selection effects mostly depend on the respondents themselves, focus should be set on mode differences in non-contact, response capability, response refusal, and item non-response (see Table 2.5). Put differently, proper back-door covariates should try to measure population members' capabilities to be contacted and to respond by each mode, population members' mode preferences, and population members' willingness to comprehensively complete surveys.
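As a rough illustration of the back-door idea (not the estimators developed in the papers of this dissertation), the sketch below compares the mode groups within strata of a back-door covariate B and averages the within-stratum differences over the sample distribution of B. The column names, the 'a'/'b' coding of the mode variable, and the function name are hypothetical, and the result is only unbiased under the ignorable mode selection and mode-insensitivity assumptions described above.

    import pandas as pd

    def backdoor_adjusted_mode_difference(df, y='y', mode='mode', b='b'):
        """Average within-stratum difference in mean(y) between the two mode
        groups, weighted by the overall sample distribution of the back-door
        covariate b (assumes every stratum contains both modes)."""
        cell_means = df.groupby([b, mode])[y].mean().unstack(mode)  # rows: strata of b
        stratum_weights = df[b].value_counts(normalize=True)
        diffs = cell_means['b'] - cell_means['a']                   # per-stratum mode difference
        return (diffs * stratum_weights).sum()

    # Hypothetical usage: df holds one row per respondent with columns
    # 'y' (target variable), 'mode' ('a' or 'b'), and 'b' (e.g. education level).
    # adjusted_diff = backdoor_adjusted_mode_difference(df)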
The second strategy to circumvent counterfactuals is the front-door method (Pearl, 1995, 2009). The front-door method involves the inclusion of a set of variables F into the analysis model where F explains, in contrast to the back-door method, the measurement effect as an intermediate variable between M and Y (see Figure 2.1c). Like the back-door method, the front-door method also starts from two assumptions (Pearl, 2009; Morgan & Winship, 2009). The first assumption is the exhaustiveness assumption and requires that F fully captures the measurement effects between the modes. If this assumption does not hold, part of the measurement effect is not captured and the confounding problem remains. The second assumption is the isolation assumption and requires the absence of selection effects on F. If this assumption does not hold, part of the selection effect is channelled through F and the confounding problem remains once again.
Unlike the back-door method, no application of the front-door method could be found within mixed-mode studies so far. Future studies may thus also focus on searching for better front-door covariates for explaining measurement effects. Because measurement effects mostly depend on the respondents themselves, focus should be set on mode differences in respondents' question comprehension, respondents' retrieval of relevant information, respondents' judgement of answer, and respondents' reporting (see Table 2.6). Put differently, proper front-door variables should try to measure, among others, recency and primacy effects, response burdens, satisficing, acquiescence, or social desirability.
The third method to circumvent counterfactuals is the instrumental variable method (Bowden & Turkington, 1990; Angrist et al., 1996; Heckman, 1997, 1996). The instrumental variable method involves the inclusion of a binary variable I into the analysis model which divides the sample into two groups. However, this variable must meet one important requirement, namely that all respondents of one group respond by one single mode, say mode m1 (Vannieuwenhuyze, Loosveldt & Molenberghs, 2010). Put differently, variable I involves a comparison between a mixed-mode dataset and a single-mode dataset and indicates to which dataset a sample member belongs. As a result, this variable determines the mode of data-collection M and partly breaks the confounding between selection and measurement effects (Figure 2.1d). The use of an instrumental variable also starts from two assumptions. The first assumption is the measurement equivalence assumption and requires that measurement error of mode m1 is equal in both the mixed-mode and the single-mode sample. The second assumption is the representativity assumption and requires that the single-mode and the mixed-mode datasets represent the same population.
Like the front-door method, the instrumental variable method is less well-known within the mixed-mode literature. Though this method merely requires one single-mode dataset and one mixed-mode dataset to be compared, future studies may also focus on survey design improvements such that the instrumental variable method assumptions are optimally met. However, within the context of mixed-mode surveys, the instrumental variable method might not be optimal for several reasons (Vannieuwenhuyze et al., 2010; Vannieuwenhuyze, Loosveldt & Molenberghs, 2012). Firstly, the instrumental variable method only allows for estimating conditional mode effects but not marginal mode effects. Secondly, the instrumental variable method automatically implies that the survey mode of the single-mode survey is the benchmark mode, that is the mode which comes with ignorable measurement error according to the researcher's belief. This forced choice of the benchmark mode might be strange and unwanted in some situations. Thirdly, the instrumental variable method does not prove itself to be useful for estimation of target statistics, which usually is the first goal of a survey. Nevertheless, the instrumental variable method might still be useful for obtaining selection and measurement effect estimates.
The back-door, front-door, and instrumental variable methods provide promising tools for estimating mixed-mode data quality and for measurement error adjustment. The papers of this dissertation, which are included in Chapters 3 to 8, provide thorough discussions of all three methods including the assumptions, advantages, disadvantages, possible improvements, and possible applications.
2.5 Illustration: The Survey on Surveys
This section discusses the `Survey on Surveys' dataset as an illustration of the concepts introduced in this chapter. This dataset is analysed in four of the six papers of this dissertation. The Survey on Surveys data stem from a survey about population members' opinions about surveys which was organized in Flanders, Belgium, by the Centre for Sociological Research of the KU Leuven, in the fall of 2004 (Storms & Loosveldt, 2005). The survey was presented as a survey about opinion polls in Flanders and included a €5 gift voucher as an incentive.
The Survey on Surveys sample consisted of 1200 Flemish persons aged between 18 and 80 sampled from the national register. First, each sample member was randomly selected for one of two arms (Figure 2.2). The upper arm includes 960 sample members, while the lower arm includes 240 sample members. A sample member of the upper arm was first contacted by a letter (UM) with an invitation to complete an enclosed paper questionnaire (MSAQ). If the sample member did not return the paper questionnaire, a first reminder was sent by mail (UM) two weeks later. If the sample member still did not return the questionnaire, a second reminder accompanied by a new questionnaire was sent by mail (UM once again) four weeks after the first reminder. Sample members who did not return the paper questionnaire after two months were contacted by an interviewer at home (AF) to complete a face-to-face interview (CAPI). Nonetheless, the sample members did not know about this face-to-face follow-up during the initial mail phase. A sample member of the lower arm was immediately contacted by an interviewer at home to complete a face-to-face interview (AF followed by CAPI).
During the entire survey process, different sequences of data-collection and survey modes are used for communication between the research team and the sample members. With respect to survey modes, five groups of respondents can be distinguished (see Figure 2.2). Groups 1, 2, and 3 are mail respondents but differ from each other by the number of required reminders or mail contacts, that is no reminder, one reminder, and two reminders respectively. Group 4 includes respondents who answered by a face-to-face interview but who were first contacted by mail. Group 5 also contains face-to-face respondents, but who were not contacted for a mail questionnaire initially. These groups thus clearly show that the entire Survey on Surveys is a mixed-mode survey where different sequences of survey modes are used for different respondents.
In practice, survey researchers mainly focus on the data-collection phase because most error is introduced during this phase. Only considering the data-collection modes (which are highlighted by the rectangles in Figure 2.2), groups 1, 2 and 3 collapse to one group because they all involve an MSAQ.
Figure 2.2: The survey design of the Survey on Surveys can be represented as a decision tree in which each sample member follows one path from left to right and encounters several survey and data-collection modes.
Likewise, groups 4 and 5 collapse as they include CAPI. Put differently, two distinct data-collection mode groups are encountered within the entire Survey on Surveys, namely an MSAQ and a CAPI group.
Because sample members are randomly assigned to the upper or lower arm of the total survey design, both arms can be treated as separate studies as well. As a consequence, both arms can separately be evaluated for being a mixed-mode or single-mode survey. The upper arm clearly is a mixed-mode survey with four mode groups, three mail groups 1, 2, 3 with a different number of reminders, and the face-to-face group 4 (Figure 2.2). The upper arm remains a mixed-mode survey when one merely focusses on data-collection modes because an MSAQ group (1, 2 and 3) can be distinguished from a CAPI group (4). The lower arm, in contrast, is a single-mode survey because only one group of respondents can be distinguished, namely group 5 which includes a sequence of an AF contact and a CAPI data-collection.
Further, mixed-mode survey data can be collected by different designs. The Survey on Surveys includes an allocative design because of the randomization in two arms. The upper arm, however, also includes a sequential mixed-mode design because the sample members have no knowledge of the face-to-face follow-up at the outset. The lower arm, in turn, does not include an additional mixed-mode design. In short, the entire Survey on Surveys design is a combination of one sequential design within one allocative design.
The particular survey design of the Survey on Surveys was set up in order to lower survey error. Indeed, the sequential mixed-mode mail/face-to-face design was used for obtaining lower selection error relative to a single-mode mail or single-mode face-to-face design. First, a single-mode mail design would probably involve larger selection bias because non-respondents would likely be more negative about surveys. Within the mixed-mode design, data of such non-respondents could still be collected by the face-to-face follow-up because face-to-face contact is known to be more persuasive for response compared to contact via mail. Second, a single-mode face-to-face design would probably involve larger selection variance because of the expensiveness of face-to-face interviews. Indeed, face-to-face interviews require interviewer payments and thus restrict the possible total sample size. Within the mixed-mode design, a larger sample can be drawn because part of the respondents answer by a cheap mail questionnaire.
The particular topic of the Survey on Surveys, however, might cause a confounding between selection effects and measurement effects. First, as already noted, there might be selection effects as non-respondents to the mail questionnaire are likely to be more negative about surveys (Loosveldt & Storms, 2008). The mail group data confirmed this expectation: the later a mail questionnaire was returned, the more negative the opinions about surveys. Measurement effects may occur because respondents will probably tend to report more positive, socially desirable opinions about surveys when interviewed face-to-face relative to administration by anonymous mail questionnaires. The mere presence of the interviewer may lead respondents to give socially desirable answers. Consequently, the positive answers obtained in the face-to-face follow-up may not reflect the respondents' real opinions. In contrast to the mail survey, face-to-face interviews thus introduce a serious risk of measurement error, which results in a measurement effect (Dillman, Smyth & Christian, 2009; Voogt & Saris, 2005).
Four of the six papers in Chapters 3 to 8 use the Survey on Surveys data for illustration of the back-door, the front-door, and the instrumental variable methods. The instrumental variable method can straightforwardly be applied by comparing the two arms of the Survey on Surveys design, that is by comparing the sequential mixed-mode mail/face-to-face data with the single-mode face-to-face data. Application of the back-door and front-door methods, however, required ad hoc solutions because the Survey on Surveys data were not collected for the application of both methods. For the back-door method, socio-demographic variables like sex, age, education level, and employment status are used as back-door covariates because these variables are also widely used in other mixed-mode studies. For the front-door method, a question is used about whether respondents found answering the questions a pleasant or unpleasant task. The application of all three methods provides very different selection and measurement effect estimates. These differences show that the assumptions of some methods are not optimally met.
Part II Papers
Chapter 3
A Method for Evaluating Mode Effects in Mixed-mode Surveys
Jorre T. A. Vannieuwenhuyze, Geert Loosveldt, & Geert Molenberghs
Published in Public Opinion Quarterly, 74(5), 1027-1045, in 2010.

Abstract: Survey designs in which data from different groups of respondents are collected by different survey modes become increasingly popular. However, such mixed-mode (MM) designs lead to a confounding of selection effects and measurement effects (measurement error) caused by mode differences. Consequently, MM data has poor quality. Nevertheless, comparing MM data with data from a comparable single-mode survey allows measuring selection effects and measurement effects separately. The authors develop a method to evaluate mode effects and illustrate this method with data from a Dutch MM experiment within the European Social Survey program. In this experiment, respondents could choose between three modes: a web survey, a telephone interview, or a face-to-face interview. Mode effects on three political variables are evaluated: interest in politics, perceived complexity of politics, and voter turnout in the last national election.
Keywords: Measurement Error, Mode effects, Mixed-Mode Survey, Selection effects, European Social Survey
3.1 Introduction
Increasingly, data are gathered by mixing different survey modes in one design (Dillman, Smyth & Christian, 2009; Weisberg, 2005). One type of such mixed-mode (MM) designs includes the collection of the same data from different sample members by different modes. Such a MM data collection can be advantageous in several ways (de Leeuw, 2005; Dillman, Smyth & Christian, 2009). First, it can help reduce coverage error because several modes are available to contact different groups of hard-to-reach respondents. Second, a MM data collection can help lower non-response and non-response bias in order to reduce the Total Survey Error (TSE) because every respondent can choose his mode of preference between several modes. Third, MM data collections can help reduce costs because a substantial part of the sample will be surveyed by a cheap mode.
However, notwithstanding their advantages, MM designs do not automatically lead to higher data quality or smaller TSE (Voogt & Saris, 2005). MM designs may lower non-response bias and avoid coverage error, but they may introduce other forms of bias as well. Mode effects can make MM data highly unusable by simultaneously generating selection effects and measurement effects (measurement error).
Selection effects occur when different types of respondents choose different modes to complete the survey. As such, they are forms of non-response error, i.e. various types of respondents do not respond in certain modes by self-selecting themselves for another mode. The occurrence of a selection effect is in itself not a problem. On the contrary, its occurrence makes using a MM design valuable. Indeed, because of selection effects, some respondents may accept participation while they would not (non-response) or could not (non-coverage) in a single-mode survey (Biemer, 2001; Day, Dunt & Day, 1995; de Leeuw & Van Der Zouwen, 1988; Dillman, Phelps et al., 2009; Voogt & Saris, 2005). Similarly, others will accept participation by a cheap mode, lowering total survey costs.
Measurement effects, on the other hand, refer to the influence of a survey mode on the answers respondents give, so that one person would give different answers in different modes (Bowling, 2005; Voogt & Saris, 2005; Weisberg, 2005). Put differently, measurement effects are caused by differences in measurement errors (Groves, 1989). These errors may originate from differences in, among others, whether items are presented sequentially or simultaneously to the respondent, interviewer effects and social desirability, primacy and recency effects, recall bias, acquiescence, etc. (Bowling, 2005; Brick & Lepowski, 2008; de Leeuw, 2005, 1992; Dillman, 1991; Dillman, Smyth & Christian, 2009; Schwarz et al., 1991).
In order to evaluate the TSE introduced by a MM data collection, selection effects and measurement effects should be investigated separately. The major problem of MM designs, however, is that selection effects and measurement effects are completely confounded. Differences (or similarities) between the outcomes of modes can be caused by differences between the respondents or by differences in measurement error (de Leeuw, 1992; Weisberg, 2005). The literature suggests using response matching on a set of mode-insensitive variables (e.g. gender, age, education level, etc.) to disentangle both mode effects (e.g. de Leeuw, 2005; Jäckle et al., 2010). Nevertheless, this method assumes that the matching variables are closely related with the variables of interest, but this assumption can hardly be supported. So, an exclusive focus on MM survey data almost precludes evaluation of selection effects and measurement effects separately. However, comparing MM data with data of a comparable single-mode survey allows disentangling mode effects to a certain extent.
This article aims to develop a method to disentangle measurement effects from selection effects on the proportions and the mean of a multinomial variable by comparing a MM dataset with a comparable single-mode dataset. This method will be introduced in the next section. Subsequently, Section 3.3 illustrates this method with mixed-mode data from the European Social Survey by calculating the mode effects on the parameters of three politics-related variables.
3.2 A method to disentangle mode effects in a mixed-mode dataset using comparable single-mode data
Let us assume we have a mixed-mode (MM) dataset of size n_m where some respondents responded by mode A while the others responded by mode B. Let us further assume that we also have a single-mode dataset where all respondents responded by mode A. We will call this dataset the comparative dataset because this data will be compared with the data from the MM sample. Let n_c denote the sample size of this comparative sample, and n = n_m + n_c the total sample size.
Further, we denote by Y the multinomial variable of interest with J categories. Two versions of this variable can be distinguished, namely Y_a and Y_b. Y_a refers to the values of Y when this variable is observed by mode A, while Y_b refers to the same variable though observed by mode B. We assume that each population member takes values on both these variables and these values are not necessarily the same for each person. Considering the outcome of different survey modes as different variables allows us to evaluate measurement effects merely by comparing Y_a and Y_b. Of course, given the survey design either Y_a or Y_b is observed for each respondent and this problem should be circumvented. Both Y_a and Y_b follow a multinomial distribution with parameter vector π_m = (π_{m1}, ..., π_{mJ}) where m = a or b respectively.
Additionally, we define variable M as the mode the respondent `chooses' when he or she is or would be a respondent of the Mixed-Mode experiment. Thus, M is a binary variable with values a (mode A) or b (mode B) following a Bernoulli distribution.
3.2.1 Representativity assumption
As already noted, our method to evaluate mode effects involves comparing the MM sample with the comparative sample. However, in doing so we implicitly assume that the realized samples (MM and comparative) represent the same population. Put differently, we assume that differences in the distribution of the unbiased version of the variable(s) of interest are only caused by sampling error (or purely random non-response and coverage error). We call this assumption the representativity assumption. Differences in systematic coverage error can usually be evaluated easily by comparing how the sampling frame was set up in both survey designs. Unfortunately, differences in systematic non-response error, in contrast, can generally not be evaluated directly. Nevertheless, two arguments can be put forward to substantiate this assumption.
First, if both samples contain a comparable set of respondents, all respondents of the MM sample, either responding by mode A or B, would also accept participation in a single-mode survey completely conducted by mode A. In some situations, this assumption is reasonable given the used modes, as our example in Section 3.3 illustrates. As a consequence, we expect the difference between the response rates to be zero and this difference can be tested statistically. A difference in response rates which is not significant is an argument enforcing the representativity assumption. Still, a comparison of response rates as an argument for the representativity assumption is not decisive because both samples may have attracted different respondents by putting different effort into reaching certain types of respondents.
A second argument for the representativity assumption involves a comparison of the composition of both datasets on a set of `mode-insensitive' socio-demographical variables. If both samples turn out to have a comparable composition, this can be used as an additional argument in favor of the representativity assumption. Still, this argument is not decisive either because it is only valid if these socio-demographical variables are closely related with the unbiased version of the variable(s) of interest.
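The response-rate comparison used as a first argument can be carried out with a standard two-proportion z-test. The sketch below is a minimal illustration only; the function name and its inputs are placeholders, and the counts would come from response tables such as Table 3.1.

    import math
    from scipy.stats import norm

    def response_rate_test(resp_mm, n_mm, resp_comp, n_comp):
        """Two-sided z-test for equal response rates in the MM and comparative samples."""
        p1, p2 = resp_mm / n_mm, resp_comp / n_comp
        p_pool = (resp_mm + resp_comp) / (n_mm + n_comp)     # pooled response rate
        se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_mm + 1 / n_comp))
        z = (p1 - p2) / se
        return z, 2 * (1 - norm.cdf(abs(z)))                 # z statistic and p-value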
3.2.2 Defining the mode effects
We can now define the selection effect on the proportion parameter of category j as the difference between this proportion measured by the same mode, but observed on the two different groups of respondents, namely those who would answer by mode A and those who would answer by mode B in the MM sample. If we choose mode A as the standard mode, the selection effect S_a(π_j) on proportion π_j of category j can be defined as follows:

    S_a(π_j) = P(Y_a = j | M = a) − P(Y_a = j | M = b).    (3.1)

Next, we can define the measurement effect M(π_j) on the proportion parameter π_j of category j as the difference between this proportion measured by the two different modes, though observed on the same group of respondents. If this group of respondents are the respondents who would choose mode B (i.e. M = b), the measurement effect is equal to

    M_b(π_j) = P(Y_b = j | M = b) − P(Y_a = j | M = b).    (3.2)

In both these definitions, P(Y_a = j | M = a) and P(Y_b = j | M = b) can simply be estimated with the MM data. P(Y_a = j | M = b), however, is never observed directly because Y_a is not measured for the respondents who chose to answer by mode B. Nonetheless we can use the law of total probability to prove that

    P(Y_a = j | M = b) = P(Y_a = j) / P(M = b) − P(Y_a = j | M = a) P(M = a) / P(M = b).    (3.3)

If we substitute (3.3) into (3.1) and (3.2), we get

    S_a(π_j) = [P(Y_a = j | M = a) − P(Y_a = j)] / P(M = b)    (3.4)

and

    M_b(π_j) = P(Y_b = j | M = b) − P(Y_a = j) / P(M = b) + P(Y_a = j | M = a) P(M = a) / P(M = b).    (3.5)

Given the available data we can estimate the factors on the right hand side of both (3.4) and (3.5):
- P(Y_a) from the comparative dataset, which is a sample completely surveyed by mode A;
- P(Y_a | M = a) from the MM data, more specifically from the respondents who responded by mode A;
- P(Y_b | M = b) from the MM data as well, but now from the respondents who responded by mode B;
- P(M = a) and P(M = b) from the whole MM dataset.

Sometimes, variable Y is a scale variable where the categories can be ordered and the difference between every two adjacent categories can be assumed to be equal. In that situation, we can also define the mode effects on the mean, because the mean can be expressed as a function of the proportions:

    µ_m = Σ_{j=1}^{J} j π_{m,j}    for m = a or b.    (3.6)

It can be shown that the selection effect on the mean equals

    S_a(µ) = (µ_a | M = a) − (µ_a | M = b) = Σ_{j=1}^{J} j S_a(π_j)    (3.7)

and the measurement effect on the mean is

    M_b(µ) = (µ_b | M = b) − (µ_a | M = b) = Σ_{j=1}^{J} j M_b(π_j).    (3.8)

All mode effects, as defined in (3.4), (3.5), (3.7), and (3.8), are transformations of proportion parameters. All these proportions can be estimated from the sample data and their sampling distribution is known to be asymptotically normal (Agresti, 2002; Casella & Berger, 2002). The sampling variances and covariances of these proportion estimates can also be calculated easily. Given these properties, the Delta method restricted to the first-order Taylor series approximation (Agresti, 2002; Casella & Berger, 2002) proves that the selection and measurement effects are asymptotically normal as well, and provides approximations of their sampling variances. For a detailed overview of these calculations, we refer to the technical note of Vannieuwenhuyze and Molenberghs (2010).
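A direct plug-in implementation of estimators (3.4), (3.5), (3.7) and (3.8) might look as follows. This is a minimal sketch under assumed data structures (arrays of answers and of 'a'/'b' mode labels); it omits the design and post-stratification weights used in the illustration below as well as the Delta-method variance approximations.

    import numpy as np

    def mode_effects(y_mm, m_mm, y_comp, categories):
        """Plug-in estimates of the selection effect S_a(pi_j) (eq. 3.4) and the
        measurement effect M_b(pi_j) (eq. 3.5) for every category j, plus the
        corresponding effects on the mean (eqs. 3.7-3.8).

        y_mm   : answers of the mixed-mode respondents
        m_mm   : mode of each mixed-mode respondent, 'a' or 'b'
        y_comp : answers of the comparative (single-mode, mode A) respondents
        categories : category values in scale order; scores 1..J are used as in eq. (3.6)
        """
        y_mm, m_mm, y_comp = map(np.asarray, (y_mm, m_mm, y_comp))
        p_m_a = np.mean(m_mm == 'a')          # P(M = a)
        p_m_b = 1.0 - p_m_a                   # P(M = b)

        p_ya    = np.array([np.mean(y_comp == j) for j in categories])              # P(Ya = j)
        p_ya_ma = np.array([np.mean(y_mm[m_mm == 'a'] == j) for j in categories])   # P(Ya = j | M = a)
        p_yb_mb = np.array([np.mean(y_mm[m_mm == 'b'] == j) for j in categories])   # P(Yb = j | M = b)

        sel  = (p_ya_ma - p_ya) / p_m_b                           # eq. (3.4)
        meas = p_yb_mb - p_ya / p_m_b + p_ya_ma * p_m_a / p_m_b   # eq. (3.5)

        j = np.arange(1, len(categories) + 1)
        return {'S_a(pi)': sel, 'M_b(pi)': meas,
                'S_a(mu)': np.sum(j * sel), 'M_b(mu)': np.sum(j * meas)}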
3.2.3 Required sample size calculations / Power issues
An additional question which should be asked when evaluating the mode effects is whether the total sample size n is sufficiently large to detect small to medium mode effect sizes. Let θ denote the size of the mode effect we want to detect with a minimal power β given that we use a significance level α. θ corresponds with a z-value

    z = θ / √(σ^2),    (3.9)

where σ^2 is the sampling variance of the mode effect estimate. The absolute value of this z-value should at least be equal to Φ^{-1}(α) + Φ^{-1}(β), where Φ^{-1} is the inverse cumulative normal function. Φ^{-1}(α) corresponds with the minimal z-value to detect a significant effect with significance level α. Φ^{-1}(β) is the difference between Φ^{-1}(α) and the required z-value of θ so that θ is detected with a minimal power β. For example, if we like to detect a mode effect with a power of 0.80 while it is evaluated with a one-sided test with significance level α of 0.95, the z-value corresponding with θ should be

    |z| ≥ Φ^{-1}(0.95) + Φ^{-1}(0.80) = 1.64 + 0.84 = 2.48.

Further, using the properties of the Delta method (Agresti, 2002; Casella & Berger, 2002), it can be shown that all the sampling variances of the mode effects are of the form

    σ^2 = a_c/n_c + a_m/n_m.    (3.10)

In this equation, a_c/n_c and a_m/n_m represent the contribution of respectively the comparative and the mixed-mode sample to the sampling variance of the mode effect estimates. a_c and a_m can be calculated in analogy with the Delta method but using a covariance matrix for the sample proportions which is not corrected for the sample sizes of both samples. As a result these statistics do not depend on these sample sizes. The exact formulas of a_c and a_m can be found in the technical note of Vannieuwenhuyze and Molenberghs (2010) as well.
Given the estimation of a_c and a_m from the data, implementing (3.10) into (3.9) allows calculating the minimal required sample sizes to achieve a decent power given the critical significance level:

    (a_c/n_c + a_m/n_m)^{-1} θ^2 ≥ [Φ^{-1}(α) + Φ^{-1}(β)]^2.    (3.11)

Because the total sample includes two independent samples, two strategies can be used. In the first strategy, the sample size of the mixed-mode sample, n_m, or the comparative sample, n_c, is held constant and the required sample size of the other sample is calculated by rearranging the terms in (3.11). For a fixed sample size of the comparative group, the minimal sample size of the mixed-mode sample becomes:

    n_m ≥ a_m [ θ^2 / (Φ^{-1}(α) + Φ^{-1}(β))^2 − a_c/n_c ]^{-1}.    (3.12)

The second strategy involves keeping the ratio of both n_m and n_c constant, so that they can be expressed as functions of the overall total sample size: n_m = λn and n_c = (1 − λ)n where 0 < λ < 1. λ refers to the proportion of the total sample size which is assigned to the MM design. When λ is kept constant, the required overall total sample size n to achieve the preferred power can be calculated by

    n ≥ (λ a_c + (1 − λ) a_m) [ θ^2 λ(1 − λ) / (Φ^{-1}(α) + Φ^{-1}(β))^2 ]^{-1}.    (3.13)
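Expressions (3.12) and (3.13) translate directly into code. The sketch below is illustrative only; a_c and a_m are assumed to have been estimated beforehand (e.g. with the formulas in the technical note), and the example call uses made-up numbers.

    from scipy.stats import norm

    def required_n_mixed(a_c, a_m, n_c, theta, alpha=0.95, beta=0.80):
        """Minimal mixed-mode sample size n_m for a fixed comparative size n_c (eq. 3.12).
        Returns a negative value if the target power is unreachable for this n_c."""
        crit = (norm.ppf(alpha) + norm.ppf(beta)) ** 2
        return a_m / (theta**2 / crit - a_c / n_c)

    def required_n_total(a_c, a_m, lam, theta, alpha=0.95, beta=0.80):
        """Minimal overall sample size n when n_m = lam * n and n_c = (1 - lam) * n (eq. 3.13)."""
        crit = (norm.ppf(alpha) + norm.ppf(beta)) ** 2
        return (lam * a_c + (1 - lam) * a_m) * crit / (theta**2 * lam * (1 - lam))

    # Hypothetical usage: required_n_mixed(a_c=0.9, a_m=1.6, n_c=2500, theta=0.10)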
3.3 An illustration with the ESS data

3.3.1 The ESS and the mixed-mode experiment
The European Social Survey (ESS) started in 2002 as a biennial survey conducted in 30 European countries (ESS Round 4: European Social Survey Round 4 Data, 2008). Its goal is to chart and explain the interaction between Europe's changing institutions and the attitudes, beliefs and behavior patterns of its diverse populations. It contains topics like, among others, trust, politics, social values, social exclusion, discrimination, religion, national identity, and life course. So far, four waves of data gathering have been performed with the last wave fielded in 2008/2009. In order to encourage equivalence across countries, all ESS surveys have completely been carried out by face-to-face personal interviews (CAPI) so far. Because of the costs of FTF interviews, declining funds, declining response rates, changing coverage issues, and the resistance from certain countries without a tradition of conducting FTF interviews, a MM experiment was set up in the Netherlands parallel to the 4th round (Eva et al., 2010). The purpose of this MM experiment was to compare a mixed-mode survey design with the main Dutch ESS survey by using exactly the same questionnaire.
In this illustration, we will use the 2674 sample members of the main Dutch ESS data which could be matched to a telephone number in the sampling list. These respondents were reached by at most 10 interviewer contact attempts at home. In the MM experiment, a sample of 878 persons with a matched phone number was drawn from the very same sampling list and assigned to a concurrent MM design. In this concurrent design, sample members could choose between three survey modes, a web questionnaire (CAWI), a telephone interview (CATI) or a face-to-face personal interview at home (CAPI), from the very first contact [1]. Sample members without a matched telephone number were also included in both the main ESS and the MM experiment, but almost all of these experiment respondents responded by a FTF interview as well. Consequently, this group is hardly useful to evaluate mode effects. Both samples contain a simple random sample of households in which one household member older than 15 years was selected randomly. To correct for differences in household sizes, normalized design weights proportional to the household size were used in all analyses.
The MM experiment started with a telephone contact (1st telephonic screening) including 14 call attempts. If a person was willing to participate in the survey, the different survey modes were offered simultaneously so that the respondent could immediately choose his or her preferred mode. All sample members who could not be contacted or refused to participate in the 1st screening were subject to a second telephonic screening which was performed analogously to the first screening. The follow-up of non-response depended on the mode someone chose in the telephonic screenings.
First, the respondents who chose to complete the web questionnaire were recontacted at most 14 times telephonically to remind them to complete the questionnaire. If a respondent refused to complete the web questionnaire, still a telephone or FTF interview was offered. Nonetheless, these non-respondents were not automatically recontacted by an interviewer at their house.
Second, the sample members who chose a telephone interview were either interviewed immediately during the telephonic screening or an appointment was made for a call back. Although these sample members were allowed to change their mind and to ask for a Web survey or a FTF interview, only one switched to a Web survey. Non-response could occur if there was no contact at an appointment. These non-respondents were approached FTF for a personal interview in a follow-up phase after the telephonic screening phase.

[1] The MM experiment also contains a sequential design in which modes were offered sequentially (first web, then telephone, then FTF) instead of simultaneously. However, we restrict our analyses to the concurrent MM data.
Lastly, the respondents who chose a FTF interview were visited by an interviewer at home. Non-contacts or non-response were not followed up in another survey mode. Sample members who could not be contacted or who refused to participate during the telephonic screening were subject to a FTF follow-up as well. These respondents were offered to complete a personal interview. If they refused, still the web survey and the telephone survey were offered, in that order.
Response frequencies of both datasets can be found in Table 3.1. For convenience, respondents with partial incomplete answers on the variables described in the next section were left out for the further analyses. Both the main ESS data and the MM experiment data were further separately weighted on a set of socio-demographical variables (age x sex, urbanization, and household size), increasing the population representativeness. The marginal population distributions of these variables were obtained from the `Centraal Bureau voor de Statistiek (CBS)' [2]. The adjusting post-stratification weights were calculated using iterative proportional fitting or raking procedures (Deming & Stephan, 1940; Izrael, Hoaglin & Battaglia, 2000).
To end, we should make one additional remark. The MM sample is gathered by three survey modes and selection effects and measurement effects can be expected between all of these modes. However, our method only allows evaluating differences between CAPI (mode A) on the one hand and a combination of CATI and CAWI (mode B) on the other hand. The latter two modes cannot be compared with each other without additional assumptions. As a consequence, mode A corresponds in a certain way with a single-mode CAPI survey while mode B corresponds with a concurrent mixed-mode CATI-CAWI survey. The measurement effects then represent the differences between the parameter estimates of both these surveys. The selection effects, on the other hand, represent the differences between the respondents who choose CAPI or who choose CATI or CAWI in a three-mode design, but on the parameter estimates which would be obtained with a two-mode CATI-CAWI survey for all these sample members (i.e. mode B). This specific problem would not have occurred if the MM experiment contained only two modes (CAPI and any other mode).

[2] www.cbs.nl
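The raking (iterative proportional fitting) step mentioned above can be sketched as follows. The sketch is illustrative only: the margin names and data layout are hypothetical, it assumes every margin category occurs in the sample, and convergence is handled crudely with a fixed number of iterations rather than a tolerance check.

    import pandas as pd

    def rake(df, margins, weight_col='weight', n_iter=25):
        """Iteratively adjust weights so that weighted sample margins match the
        population margins. `margins` maps a column name to a dict of
        {category: population proportion}."""
        w = df[weight_col].copy()
        for _ in range(n_iter):
            for col, target in margins.items():
                current = w.groupby(df[col]).sum() / w.sum()        # weighted sample proportions
                factors = df[col].map(pd.Series(target) / current)  # per-category adjustment
                w = w * factors
        return w * len(df) / w.sum()                                # normalize to mean weight 1

    # Hypothetical usage:
    # df['weight'] = rake(df, {'agesex': {...}, 'urbanization': {...}, 'hhsize': {...}})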
3. Evaluating Mode Effects in MM Surveys
56 Table 3.1:
rates
Response frequencies and response ESS MM exp.
ESS round 4
CAWI
160
CATI
88
CAPI
104
1294
352
1294
15
72
nonresponse
313
1022
noncontact
108
125
not eligible
90
161
878
2674
44,67%
51,49%
total response partial response
total sample ∗ response rate
based on sample members with matched phone number only : = total response/(total sample - not eligible) ∗
3.3.2 Checking the representativity assumption Since both samples are drawn from the very same sampling frame, there can be no dierence in systematic coverage error. Further, it is well-known and generally observed that CAPI often results in high response rates (relative to the other modes) (de Leeuw, 1992). Consequently, a switch from a single-mode CAPI survey to a mixed mode survey is probably mainly driven by the idea of lowering costs rather than increasing response and coverage. Put dierently, it makes sense to theoretically assume that the CAWI and CATI choosers of the MM experiment, would also accept to participate by a FTF survey when they were sampled for the main ESS round 4 data collection. However, the response rate of the ESS MM experiment is, remarkably, signicantly smaller than the response rate of the main ESS survey (±7%, see Table 3.1). This inequality is probably caused by dierences between the two surveys in eorts made to reach all sample members. Sample members of the MM experiment who choose to participate by CAWI but did not respond were not followed up by a CAPI indeed. This inaccuracy in sample design might explain the dierence in response rates.
On the other hand, a comparison of the realized samples of the MM experiment and ESS round 4 on several socio-demographic variables (age × sex, urbanization, household size, education), corrected only by the design weights, did not show any significant differences (tables not included). This can be used as an argument supporting the representativity assumption. Nevertheless, we corrected for the small remaining differences using normalized propensity score weights derived from the complete set of variables mentioned above (Rosenbaum & Rubin, 1983; Sato & Matsuyama, 2003). As a consequence, both datasets are comparable on these socio-demographic characteristics.
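The normalized propensity score weighting mentioned above can be sketched as follows. This is only one common implementation, not the authors' actual procedure; the column names and the choice of a logistic regression model are assumptions for the illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def propensity_weights(stacked, covariates, indicator):
    """Normalized propensity-score weights aligning two samples on covariates.

    stacked    : DataFrame containing the rows of both samples
    covariates : list of covariate column names (e.g. age x sex, urbanization, ...)
    indicator  : name of a 0/1 column, 1 = MM experiment, 0 = main ESS sample
    """
    X = pd.get_dummies(stacked[covariates], drop_first=True).to_numpy(dtype=float)
    z = stacked[indicator].to_numpy()
    # estimated probability of belonging to the MM experiment sample
    p = LogisticRegression(max_iter=1000).fit(X, z).predict_proba(X)[:, 1]
    w = np.where(z == 1, 1.0 / p, 1.0 / (1.0 - p))   # weight towards pooled distribution
    w[z == 1] /= w[z == 1].mean()                    # normalize within each sample
    w[z == 0] /= w[z == 0].mean()
    return w
```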
3.3.3 Variables

In this illustration we will separately analyze three politics-related variables: political interest, perceived political complexity, and voter turnout. Respondents were asked how interested they are in politics and could choose one out of four answer categories: (1) not at all interested, (2) hardly interested, (3) quite interested, and (4) very interested. Subsequently, respondents were asked how often politics seems so complicated that they cannot really understand what is going on. Five possible answers were offered: (1) never, (2) seldom, (3) occasionally, (4) regularly, and (5) frequently. Further, the respondents were asked whether they voted in the last Dutch national election in November 2006, yes (1) or no (2).

In the CAPI mode all answer categories were read out to the respondent by the interviewer in a fixed order (for political interest, the reverse of the order mentioned above), excluding don't know categories. For the political complexity question, the reading was accompanied by a show card with all five substantive answer categories. In the CATI mode, the question and the answers were read out to the respondent analogously to CAPI, but no show cards were used. In the CAWI mode the questions were shown using the very same wording and order of answer categories. If the respondent tried to skip a question, however, a don't know answer appeared at the bottom of the answer list. The respondent was obliged to select one answer.

All three variables are expected to be susceptible to mode effects. First, political interest may be affected by a measurement effect because it is seen as a
civic duty (Voogt & Van Kempen, 2002). It has been argued that measurement effects are strongest on questions about such socially desirable behavior (Brick & Lepowski, 2008; Schwarz et al., 1991; Voogt & Saris, 2005; Weisberg, 2005). Because of the interaction between interviewer and respondent, respondents act according to social norms and give culturally acceptable answers in an interview survey. As a consequence, we expect that people tend to over-report their interest in face-to-face surveys, while this tendency will occur less frequently in self-administered questionnaires (Aquilino, 1994; Bowling, 2005; de Leeuw, 1992; Dillman & Christian, 2005; Dillman, Phelps et al., 2009; Voogt & Saris, 2005; Weisberg, 2005). Perceived complexity of politics and voter turnout are generally highly correlated with political interest (e.g. in the ESS round 4 the correlation between interest and perceived complexity is −0.433, p < 0.001; the difference in interest between voters and nonvoters in the ESS round 4 is 0.673, p < 0.001). Highly interested people generally evaluate politics as less complex, and voters are usually more interested in politics. So, we expect measurement effects on these variables as well.

Second, Voogt and Van Kempen (2002) also argue that non-respondents are usually less interested in politics. Because the CAPI group of the MM experiment contains a considerable group of non-respondents of the first phase of the survey, we can expect selection effects on all three variables as well. We expect that the CAPI choosers of the MM experiment are less interested in politics, perceive politics as more complicated, and are less likely to have voted in the last election.
3.3.4 Results

Table 3.2 summarizes the observed sample proportions and means which are used to calculate the mode effect estimates. The mean perceived political complexity already shows a remarkable trend. If there were no measurement effects, we would expect the mean in the main ESS data to fall between the means of the two MM groups, provided that the representativity assumption holds. Indeed, the representativity assumption means that the ESS sample and the mixed-mode sample, which is the combination of the two MM groups, represent the same population. The data, however, show a different trend. The mean political complexity in the
main ESS is smaller than in both MM groups, which might be explained by mode effects.
3.3.4.1 Political interest

In Table 3.3 the reader can find the estimated measurement effects and selection effects for political interest. As this table makes clear, significant measurement effects can be found for the categories `hardly interested' and `quite interested'. The measurement effect on the category `hardly interested' is positive, which means that more respondents will indicate being `hardly interested' when this question is asked by CAWI or CATI than when it is asked by CAPI. As the measurement effect on the category `quite interested' is negative, the opposite conclusion can be drawn. Further, the measurement effect on the mean is negative as well, which is in line with our expectation that the CAPI mode measures a higher mean political interest compared to a combination of CAWI and CATI. As a consequence, the one-sided p-value can be used, and this turns out to be significant as well. So, respondents may report a higher interest in politics in front of an interviewer because this probably is socially desirable behavior (Voogt & Van Kempen, 2002).

If the two-sided p-values of the selection effects are considered, none of the selection effects seems to be significant. We expected that the CAPI choosers in the MM design were less interested in politics because this group contains more non-respondents of the first phase of the survey (Voogt & Van Kempen, 2002). This means that the selection effect on the mean should be negative, but, as Table 3.3 shows, this expectation is not met. Consequently, we cannot conclude that the respondents choosing CAPI in the MM experiment are on average less interested in politics than their CATI or CAWI choosing counterparts because the former group contains more hard-to-reach respondents.
3.3.4.2 Perceived political complexity

Table 3.4 summarizes the estimated mode effects on perceived political complexity. Considering the proportions of all answer categories, there is a significant negative measurement effect on the category `seldom'. So, respondents are more likely to
Table 3.2: Sample proportions

                                        MM exp.
                                 CATI/CAWI      CAPI      ESS r4
  Political interest
    P(not at all interested)       0.084        0.033      0.067
    P(hardly interested)           0.330        0.188      0.224
    P(quite interested)            0.488        0.679      0.607
    P(very interested)             0.098        0.100      0.101
    mean                           2.600        2.846      2.743
  Political complexity
    P(never)                       0.113        0.007      0.082
    P(seldom)                      0.171        0.136      0.269
    P(occasionally)                0.379        0.518      0.355
    P(regularly)                   0.236        0.297      0.208
    P(frequently)                  0.102        0.042      0.085
    mean                           3.043        3.231      2.947
  Voter turnout
    P(voted)                       0.857        0.826      0.854
  P(M=1)                           0.255
Table 3.3: Mode effects on political interest

                              effect   SE(effect)   p (two-sided)   p (one-sided)    a_m      a_c
  MEASUREMENT EFFECT
  P(not at all interested)     0.005      0.021         0.823           0.412        0.118    0.113
  P(hardly interested)         0.093      0.037         0.012           0.006        0.368    0.313
  P(quite interested)         -0.094      0.041         0.023           0.012        0.439    0.430
  P(very interested)          -0.004      0.025         0.877           0.439        0.160    0.164
  mean                        -0.107      0.062         0.086           0.043        0.998    0.951
  SELECTION EFFECT
  P(not at all interested)    -0.046      0.028         0.100           0.050        0.224    0.113
  P(hardly interested)        -0.049      0.060         0.420           0.210        1.080    0.313
  P(quite interested)          0.097      0.072         0.178           0.089        1.542    0.430
  P(very interested)          -0.002      0.046         0.964           0.482        0.635    0.164
  mean                         0.139      0.098         0.154           0.077        2.797    0.951
consider politics seldom complex when they answer this question by CAPI than when they answer by CATI or CAWI. Further, the selection effects on `never' and `seldom' are significantly negative, and the selection effect on `occasionally' is significantly positive. So, respondents choosing the CAPI mode are less likely to never or seldom, but more likely to occasionally, find politics too complex than respondents choosing CATI or CAWI.

Because we expected the CAPI mode to measure a lower perceived political complexity compared to the CATI/CAWI combination, the measurement effect on the mean should be positive, which is confirmed by the data. Moreover, the one-sided p-value shows that this measurement effect is significant. So, respondents tend to report that they understand politics better when they are surveyed by a personal FTF interview. This observation might be explained by social desirability bias. The sign of the selection effect on the mean matches our expectations as well, because a positive selection effect means that the CAPI choosers evaluate politics as more complex. This selection effect is significant as well, which confirms our hypothesis.
3.3.4.3 Voter turnout

Table 3.5 summarizes the sample proportions and the estimated mode effects of the variable voter turnout. Since this variable has only two answer categories, measurement effects and selection effects are complementary for both probabilities (did vote or did not vote) and the mean. No measurement effect or selection effect significantly different from zero can be noticed. As a consequence, a combination of CATI and CAWI as survey modes does not seem to result in a different estimate of the probability of voting compared to a survey conducted entirely by CAPI. Analogously, a difference in voting behavior between CAPI choosers and CATI/CAWI choosers is not confirmed either.
3.3.4.4 Sample size calculation

Let us now illustrate how to calculate the required sample sizes of the ESS example to detect small to moderate values of the mode effects with a specific power. For this paper, we restrict our example to the means of the three politics-related
Table 3.4: Mode effects on perceived political complexity

                     effect   SE(effect)   p (two-sided)   p (one-sided)    a_m      a_c
  MEASUREMENT EFFECT
  P(never)            0.005      0.023         0.815           0.408        0.141    0.135
  P(seldom)          -0.144      0.033         0.000           0.000        0.255    0.354
  P(occasionally)     0.080      0.042         0.056           0.028        0.447    0.413
  P(regularly)        0.058      0.036         0.112           0.056        0.342    0.297
  P(frequently)       0.002      0.024         0.945           0.473        0.142    0.141
  mean                0.194      0.089         0.029           0.015        2.009    2.059
  SELECTION EFFECT
  P(never)           -0.100      0.017         0.000           0.000        0.054    0.135
  P(seldom)          -0.179      0.054         0.001           0.001        0.841    0.354
  P(occasionally)     0.218      0.077         0.005           0.003        1.781    0.413
  P(regularly)        0.118      0.070         0.090           0.045        1.479    0.297
  P(frequently)      -0.058      0.032         0.070           0.035        0.288    0.141
  mean                0.382      0.121         0.002           0.001        4.134    2.059
Table 3.5: Mode effects on voter turnout

                     effect   SE(effect)   p (two-sided)   p (one-sided)    a_m      a_c
  MEASUREMENT EFFECT
  P(voted)           -0.006      0.030         0.835           0.418        0.231    0.225
  SELECTION EFFECT
  P(voted)           -0.037      0.058         0.523           0.262        1.016    0.225
variables. Let us assume that we would like to detect a small mode effect equal to 0.05 times the range of the variables, with a power of .80 and a significance level of .05 (one-sided). The sample estimates of a_c and a_m can be found in Tables 3.3, 3.4, and 3.5.

Using the first strategy, we fix the sample size of the comparative group and manipulate the sample size of the MM experiment. This would be useful in the ESS because the MM experiment has been conducted in addition to the main ESS data collection. We fix n_c at 1294, which is the achieved sample size of the main ESS round 4. The calculated required sample sizes n_m for this strategy can be found in the last-but-one column of Table 3.6. The realized sample in the mixed-mode experiment contains 352 respondents. As the results show, this sample size was only sufficient to detect small measurement effects on the variable political interest. Other small mode effects on the means of the three variables of interest would not be detected with such a small sample size in the experiment. Some n_m's even run up to approximately 1000, which means that the MM experiment should include a rather large sample to detect a small mode effect. Further, it should be noted that it is impossible to detect a selection effect of 0.05 on voter turnout with a power of .80 for any possible n_m. This results from the fact that the variance introduced by the main ESS (= a_c/n_c) is already larger than the maximum acceptable variance of the selection effect.

In the second strategy we fix λ at 0.214, which is the contribution of n_m to the total sample size of the ESS round 4 and the MM experiment. The required total sample sizes can be found in the last column of Table 3.6. These results show that a total sample size of approximately 2300 respondents allows for detecting small mode effects with a power of .80, except for the mode effect on voter turnout. With respect to the latter, the total sample size should be almost 6000. The actual total sample size, however, is only 1646, which means that the realized ESS and mixed-mode experiment samples can only detect a significant small measurement effect on the variable political interest.

The sample size calculations in this section lead to the conclusion that the realized sample sizes of the ESS, the mixed-mode sample, or both are mostly too small to detect small mode effects, except for the measurement effect on political interest.
Table 3.6: Required sample sizes to detect moderate mode effects with power = .80 and significance level = .05

  Variable      effect        minimal effect    n_m*      n°
  pol. intr.    meas. e.          0.15           332      1572
  pol. intr.    sel. e.           0.15           644      2201
  pol. comp.    meas. e.          0.2            419      1884
  pol. comp.    sel. e.           0.2            629      2302
  vote          meas. e.          0.05           998      3331
  vote          sel. e.           0.05           N.A.     5801

  *: keeping n_c constant
  °: keeping λ constant
  N.A.: impossible to estimate
With a sample size of 650 instead of 352 in the mixed-mode experiment, for example, small mode effects on political interest and perceived complexity could have been detected. With respect to voter turnout, however, the sample sizes would have to be unreasonably large to be able to detect small mode effects.
3.4 Discussion

The purpose of this article is to illustrate how two different types of mode effects, i.e. selection effects and measurement effects, can be disentangled within a MM survey context. This kind of evaluation is quasi-impossible if only a simple MM survey dataset is available, but we showed that the presence of data from a single-mode comparative survey allows investigating selection effects and measurement effects separately. However, this evaluation of mode effects relies on some assumptions which need further discussion.

The first and probably most stringent assumption is the representativity assumption, which has already been discussed in Section 3.2.1. This assumption means that systematic coverage and non-response error should be equal in both the MM sample and the comparative sample. The more this assumption is violated, the more the mode effect estimates will probably be biased. The magnitude of this bias depends on the correlation between the variable of interest and, what we call, the survey acceptance patterns of the sample members.
With the survey acceptance pattern, we refer to the willingness of a respondent to participate in both the mixed-mode survey and the comparative survey. The larger the group of sample members who would only participate in one of the survey designs, the less both samples represent the same population and the more bias the method can introduce in the mode effect estimates. The magnitude of the bias then depends on the extent to which this group of sample members differs from the sample members who would participate in both surveys. Put differently, the larger the correlation between the survey acceptance pattern and the variable of interest, the larger the bias in the mode effect estimates. In short, the bias in the mode effect estimates thus depends on the survey acceptance patterns and their correlation with the variable of interest. Future research may include a sensitivity analysis of the robustness of the mode effect estimates to fluctuations in these patterns and correlations.

Second, our method also implies that measurement error and bias for mode A are equal in both the MM sample and the comparative sample. The particular survey designs of both samples might, however, induce differences in measurement. Nevertheless, we expect this assumption to be less stringent than the representativity assumption.

Further, our method has two limitations as well.
The first limitation of the method refers to the definition of the measurement effect. We calculated the measurement effect as the difference between the statistics obtained in mode A and mode B respectively, but only for the respondents who choose mode B in a MM design. So, these effects are not calculated on the whole sample, but only on a part of the sample. The question is whether these measurement effects can be generalized to the respondents who choose mode A in the MM survey.

Second, the method we offered works fine if there is only one comparative dataset and the MM data are gathered by only two modes. When the MM sample includes more than two modes (say modes A, B, and C), as in the ESS data, additional comparative samples and assumptions are required. Otherwise modes B and C are completely confounded in the conclusions. Such an additional sample should include two of the three modes, for example A and B, so that mode effects between A and B can be estimated exactly. However, in order to estimate the exact mode effects, the researcher must assume in that situation that the respondents of
mode B and C in the triple-mode sample are comparable to the mode B choosers in the double-mode sample. This assumption might threaten the validity of the mode effect estimates.

To conclude, we would like to make a suggestion for future surveys. Our method is applicable as soon as a mixed-mode and a comparative sample are available with strong signs that the comparability assumption is valid or only slightly violated. This means that the MM data need not only be gathered by a concurrent design as in the ESS, but can also be gathered by a sequential design where the modes are offered one after the other to the sample members. Such a sequential design can start with a cheap mode B to lower costs as much as possible, and a follow-up can be organized in mode A to reduce non-response as much as possible. Parallel to this MM data collection, a small comparative sample can be drawn and surveyed completely in mode A. Such an extended MM design allows for evaluating the mode effects, even though costs are reduced and non-response is probably lowered. An additional advantage of such a design is that the implementation of mode B can be organized with a primary focus on lowering measurement error, while non-response is only a secondary concern. The implementation of mode A, in contrast, should focus on non-response reduction, while measurement error is of secondary concern. To guarantee the validity of the representativity assumption, a considerable time gap between the initial and the follow-up phase in the mixed-mode sample may help counter the influence of a refusal in the initial phase on participation in the follow-up.
Chapter 4

A Method to Evaluate Mode Effects on the Mean and Variance of a Continuous Variable in Mixed-Mode Surveys

Jorre T. A. Vannieuwenhuyze, Geert Loosveldt, & Geert Molenberghs

Published in International Statistical Review, 80(2), 306–322, in 2012.
Abstract: Mixed-mode surveys, in which different respondents complete the survey by different modes, become increasingly popular. That said, such surveys may lead to a confounding of two forms of mode effects, i.e. selection effects and measurement effects. Exclusive focus on mixed-mode data almost precludes disentangling both effects. In this paper, we show how this problem can be circumvented merely by comparing mixed-mode data with a comparable single-mode sample. The proposed method allows estimating the mode effects on the mean and variance of a continuous variable. As an illustration, the authors estimate mode effects on six items on opinions about surveys.

Keywords: mixed-mode, selection effect, measurement effect, mode effect, opinion about surveys
4.1 Introduction

Increasingly, survey data are gathered by mixed-mode designs in which different respondents complete the survey by different survey modes such as CAPI, CATI, mail questionnaires, or web questionnaires (de Leeuw, 2005; Dillman, Smyth & Christian, 2009; Voogt & Saris, 2005). The aim of such mixed-mode survey designs is to reduce nonresponse and nonresponse bias or to lower the total survey costs. Nevertheless, a gain in data quality compared to single-mode designs cannot be guaranteed because of the occurrence of distinct mode effects. A mode effect refers to the impact of the mode on the data quality. Two forms of mode effects can be distinguished: selection effects and measurement effects.

Selection effects occur when the respondents of different modes differ on the variables of interest. Their occurrence is not a problem, quite the contrary. Thanks to selection effects, nonresponse error can be reduced because some respondents would respond in a mixed-mode survey while they would not (or less often) in single-mode surveys, and costs can be lowered because some respondents would accept cheap modes (Biemer, 2001; de Leeuw, 2005; Dillman, Phelps et al., 2009). As a consequence, to evaluate the advantage of using mixed-mode designs, selection effects should be estimated. If selection effects are absent, a single-mode design exists that provides the same data quality, probably at lower costs.

Measurement effects refer to differences in measurement error accompanying the different survey modes (Voogt & Saris, 2005; Weisberg, 2005). Put differently, because of measurement effects, the same respondents possibly provide different answers to the same questions in different modes. Examples of measurement effects are social desirability error in interviewer modes, recency effects in telephone surveys, or satisficing in self-administered modes (see, among others, de Leeuw, 1992; Dillman, Smyth & Christian, 2009; Schwarz et al., 1991). The occurrence of measurement effects means that mixed-mode data cannot simply be compared with single-mode data unless the data are adjusted for these measurement effects.

It is clear that it is worthwhile to estimate both the selection effects and measurement effects and to make inferences about them. Nevertheless, within mixed-mode data, it is almost impossible to estimate selection effects and measurement effects because they are completely confounded (de Leeuw, 1992; Weisberg, 2005).
Indeed, differences between the groups of respondents can be caused by differences between these respondents (i.e., a selection effect) or by differences in measurement (i.e., measurement effects).

Existing research reported in the literature usually tries to disentangle mode effects by calibrating mode groups on a set of mode-insensitive variables (e.g., among others, Heerwegh & Loosveldt, 2011; Jäckle et al., 2010; Hayashi, 2007; Fricker et al., 2005; Holbrook et al., 2003; Greenfield et al., 2000) using techniques from the non-response and causal inference literature (Rubin, 1976; Little & Rubin, 2002; Schafer & Graham, 2002; Morgan & Winship, 2009; Pearl, 2009). These techniques include matching, weighting, or the inclusion of the mode-insensitive variables in the analysis model. In order to get unbiased mode effect estimates, however, these techniques assume independence between the mode group allocation and the variables of interest when controlling for these mode-insensitive variables. If not, part of the selection effect is not removed by the mode-insensitive variables and it remains unclear to what extent differences between the mode groups are caused by selection or measurement effects. Nonetheless, this assumption is problematic because it is usually difficult to find a convenient set of mode-insensitive variables in practice. As a consequence, existing research has generally failed to uncover both forms of mode effects.

This paper starts from the idea of enriching the available information to assess mode effects by extending mixed-mode survey data with comparable single-mode data. Within such a setting, Vannieuwenhuyze et al. (2010) already illustrated how mode effects on proportions and mean parameters of univariate multinomially distributed variables can be disentangled. Under certain circumstances, their model probably involves milder assumptions compared to the methods using mode-insensitive variables. In this paper, we extend the technique of Vannieuwenhuyze et al. to the mean and variance of a continuously distributed variable.

We start this paper with an overview of possible formalizations of both mode effects in Section 4.2. Section 4.3 discusses how to measure mode effects with survey data and how to make inferences on the mode effects. Additionally, this section provides tools to calculate minimal sample sizes to detect mode effects with a certain power, and to calculate the required sample sizes to minimize total survey costs. Finally, Section 4.4 provides an illustration of the methods using real survey data from a survey on opinions about surveys.
4.2 Defining the mode effects

Let us assume that we have a mixed-mode dataset where the data of some respondents are obtained by mode a while the data of the other respondents are obtained by mode b. Let us denote by Y the continuous variable of interest. As such, two versions of Y can be distinguished, depending on the mode used to administer the data. Let Y_a refer to the values of Y when these are administered by mode a, while Y_b refers to the data collected by mode b. All population members theoretically take values on both Y_a and Y_b, and these values are not necessarily equal because of possible measurement effects. Of course, in a mixed-mode dataset, either Y_a or Y_b is observed for each respondent.

Let further G represent the mode group allocation of a population member when he was a sample member of the mixed-mode sample (ignoring the possibility of nonresponse). G thus is a binary variable with possible values a and b. This variable follows a Bernoulli distribution with parameter τ_a, the proportion of population members who would complete the mixed-mode survey by mode a.

Selection effects and measurement effects on a distributional parameter θ can now be defined in several ways. The selection effect S(θ) can be defined as the difference between the θ's in both groups of respondents, namely those responding by mode a and those responding by mode b respectively. This definition leads to two relevant quantities:

S_a(θ) = (θ_a | G = a) − (θ_a | G = b),    (4.1a)

and

S_b(θ) = (θ_b | G = a) − (θ_b | G = b),    (4.1b)

where θ_a and θ_b describe the distribution of Y_a and Y_b respectively. Both quantities reflect the difference between the mode groups a and b, but differ with respect to the mode of measurement. It is up to the researcher to decide whether to use Y_a or Y_b according to his beliefs about the measurement error accompanying both survey modes.

The marginal measurement effect M(θ) on parameter θ can be defined as the difference in this parameter between the two versions of Y, Y_a and Y_b respectively:

M(θ) = θ_a − θ_b.    (4.2a)

However, sometimes the researcher might only be interested in a measurement effect conditional on the mode group allocation G rather than the marginal measurement effect. This can be the case when one mode is taken as the standard because the researcher, for example, believes that this mode does not involve measurement error. If mode b is the standard mode, the researcher might only be interested in the measurement effect for the population members choosing mode a:

M_a(θ) = (θ_a | G = a) − (θ_b | G = a).    (4.2b)

Analogously, the measurement effect can also be defined conditional on mode b:

M_b(θ) = (θ_a | G = b) − (θ_b | G = b).    (4.2c)

It can further be shown that the marginal measurement effect is a convex combination of the conditional measurement effects:

M(θ) = τ_a · M_a(θ) + (1 − τ_a) · M_b(θ).
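Because Y_a and Y_b are only both visible when the counterfactual answers are known, a small simulation can make these definitions concrete. The sketch below is purely illustrative and not part of the original paper: it generates both versions of Y for every population member under an arbitrarily assumed measurement shift and verifies numerically that the marginal measurement effect on the mean equals the convex combination of the conditional effects.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000                                        # hypothetical population size

# Counterfactual answers: every member has a value under both modes.
Y_a = rng.normal(loc=3.0, scale=1.0, size=N)
Y_b = Y_a - 0.2 + rng.normal(scale=0.1, size=N)    # assumed measurement shift

# Mode choice depends on Y_a, so a selection effect exists by construction.
G = rng.random(N) < 1.0 / (1.0 + np.exp(-(Y_a - 3.0)))   # True = chooses mode a
tau = G.mean()

# Selection effects on the mean (4.1a, 4.1b)
S_a = Y_a[G].mean() - Y_a[~G].mean()
S_b = Y_b[G].mean() - Y_b[~G].mean()

# Marginal and conditional measurement effects on the mean (4.2a-4.2c)
M   = Y_a.mean() - Y_b.mean()
M_a = Y_a[G].mean() - Y_b[G].mean()
M_b = Y_a[~G].mean() - Y_b[~G].mean()

# The marginal effect is the convex combination of the conditional effects.
assert np.isclose(M, tau * M_a + (1 - tau) * M_b)
```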
4.3 A method to evaluate the mode effects on the mean and variance of a continuous variable

4.3.1 Calculation of the mode effects

An ordinary mixed-mode sample hardly allows estimating any of the mode effects of the previous section without stringent assumptions. Indeed, these definitions of the mode effects require estimating the probability distribution of Y_b | G = a or Y_a | G = b, but these variables are counterfactual events within mixed-mode data (Pearl, 2009; Morgan & Winship, 2009). However, if an independent comparable single-mode dataset is present next to the mixed-mode data, say completely collected by mode a (for an example see Vannieuwenhuyze et al., 2010, or the illustration in Section 4.4), the selection effect defined in (4.1a) and the measurement effect defined in (4.2c) can be estimated under the assumptions of the next paragraph (Vannieuwenhuyze et al., 2010). This single-mode dataset is called the comparative sample because it is compared with the mixed-mode sample.

A comparison of a mixed-mode dataset with a single-mode dataset starts from two assumptions. First, comparing both datasets assumes that they contain a comparable set of respondents. Vannieuwenhuyze et al. (2010) call this the representativity assumption because it means that both datasets should represent the same population. The validity of this assumption is often hard to check but in some situations reasonable to accept (e.g., see the illustration). Second, a comparison also assumes that the measurement error accompanying the modes is equal in both datasets. More specifically, the measurement error of mode a should be stable among the datasets, which means that exactly the same variables are observed. As soon as these assumptions are met (possibly after design corrections), the method outlined in the next paragraphs is applicable.

Let us denote by p_{a|a}(y), p_{a|b}(y), and p_{b|b}(y) the density functions of Y_{a|a} := (Y_a | G = a), Y_{a|b} := (Y_a | G = b), and Y_{b|b} := (Y_b | G = b), respectively. As a consequence, Y_a follows a mixture distribution with two components given by the previously defined probability functions (Frühwirth-Schnatter, 2006; McLachlan & Peel, 2000):

p_a(y) = τ_a · p_{a|a}(y) + (1 − τ_a) · p_{a|b}(y).

Such a mixture distribution has the nice feature that the mean µ_a and the variance σ²_a of Y_a can easily be expressed as functions of the means and variances of Y_{a|a} and Y_{a|b} (Frühwirth-Schnatter, 2006):

µ_a = τ_a µ_{a|a} + (1 − τ_a) µ_{a|b}    (4.3)

and

σ²_a = τ_a (µ²_{a|a} + σ²_{a|a}) + (1 − τ_a)(µ²_{a|b} + σ²_{a|b}) − µ²_a.    (4.4)
Using (4.3) and after some algebra, selection effect (4.1a) on the mean can now be expressed as

S_a(µ) = µ_{a|a} − µ_{a|b} = (µ_{a|a} − µ_a)/(1 − τ_a),    (4.5)

while measurement effect (4.2c) on the mean can be expressed as

M_b(µ) = µ_{a|b} − µ_{b|b} = µ_a/(1 − τ_a) − τ_a µ_{a|a}/(1 − τ_a) − µ_{b|b}.    (4.6)

The mode effects on the variance, on the other hand, can be expressed as functions of the distribution parameters as well, by using both (4.3) and (4.4):

S_a(σ²) = σ²_{a|a} − σ²_{a|b}
        = σ²_{a|a} − (µ²_a + σ²_a)/(1 − τ_a) + τ_a(µ²_{a|a} + σ²_{a|a})/(1 − τ_a) + [µ_a/(1 − τ_a) − τ_a µ_{a|a}/(1 − τ_a)]²,    (4.7)

and

M_b(σ²) = σ²_{a|b} − σ²_{b|b}
        = (µ²_a + σ²_a)/(1 − τ_a) − σ²_{b|b} − τ_a(µ²_{a|a} + σ²_{a|a})/(1 − τ_a) − [µ_a/(1 − τ_a) − τ_a µ_{a|a}/(1 − τ_a)]².    (4.8)

All expressions on the right-hand side of (4.5), (4.6), (4.7), and (4.8) can be estimated directly from the data:
• τ_a can be estimated by the proportion of respondents allocated to mode a within the mixed-mode sample, denoted by t̂_a.

• µ_{a|a} and σ²_{a|a} can be estimated by the sample mean ȳ_{a|a} and sample variance σ̂²_{a|a} of the respondents allocated to mode a in the mixed-mode sample. Likewise, the sample estimates ȳ_{b|b} and σ̂²_{b|b} of µ_{b|b} and σ²_{b|b} can be obtained from the mixed-mode sample, though now using the respondents allocated to mode b.

• µ_a and σ²_a, finally, can be estimated by the sample mean ȳ_a and sample variance σ̂²_a of the comparative sample because these data are solely collected by mode a.
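As a small illustration (not code from the paper; the function and argument names are ours), the plug-in estimates of (4.5)–(4.8) can be computed directly from these six sample statistics and t̂_a.

```python
def mode_effects(t_a, ybar_aa, var_aa, ybar_bb, var_bb, ybar_a, var_a):
    """Plug-in estimates of the mode effects on the mean and variance.

    t_a             : proportion of mixed-mode respondents allocated to mode a
    ybar_aa, var_aa : mean/variance of Y for mode-a respondents (mixed-mode sample)
    ybar_bb, var_bb : mean/variance of Y for mode-b respondents (mixed-mode sample)
    ybar_a,  var_a  : mean/variance of Y in the comparative (mode-a only) sample
    """
    f = 1.0 / (1.0 - t_a)
    # implied mean of Y_a among the mode-b choosers (from the mixture relation 4.3)
    mu_ab = f * ybar_a - f * t_a * ybar_aa
    # implied variance of Y_a among the mode-b choosers (from 4.3 and 4.4)
    var_ab = f * (ybar_a**2 + var_a) - f * t_a * (ybar_aa**2 + var_aa) - mu_ab**2

    S_mu  = ybar_aa - mu_ab      # selection effect on the mean, eq. (4.5)
    M_mu  = mu_ab - ybar_bb      # measurement effect on the mean, eq. (4.6)
    S_var = var_aa - var_ab      # selection effect on the variance, eq. (4.7)
    M_var = var_ab - var_bb      # measurement effect on the variance, eq. (4.8)
    return S_mu, M_mu, S_var, M_var
```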
4.3.2 Inferences for the mode effects

Let us now denote by γ the column vector containing all the necessary parameters to calculate the mode effects in (4.5), (4.6), (4.7), and (4.8), i.e.,

γ = (µ_a, σ²_a, µ_{a|a}, σ²_{a|a}, µ_{b|b}, σ²_{b|b}, τ_a)′,

and let γ̂ contain their sample estimates. Relying on large-sample theory and the central limit theorem, the asymptotic distribution of γ̂ is normal with mean γ and covariance matrix Σ, of which the elements can be derived under simple random sampling as follows (Lehmann, 2001; Agresti, 2002; Cramér, 1971). The sampling variance of a mean estimate ȳ_∗ is σ²_∗/n and the sampling variance of a variance estimate σ̂²_∗ is υ²_∗/n, where υ²_∗ = var[(Y_∗ − µ_∗)²]. The sampling covariance of ȳ_∗ and σ̂²_∗ is (n − 1)µ_{3,∗}/n², where µ_{3,∗} is the third central moment of Y_∗ (Cramér, 1971; Espejo & Singh, 1999). This covariance is approximately equal to µ_{3,∗}/n if n is sufficiently large. As a result, the following asymptotic sampling distributions are derived:

(ȳ_a, σ̂²_a)′ → N( (µ_a, σ²_a)′, n_c⁻¹ [ σ²_a, µ_{3,a} ; µ_{3,a}, υ²_a ] ),
(ȳ_{a|a}, σ̂²_{a|a})′ → N( (µ_{a|a}, σ²_{a|a})′, (τ_a n_m)⁻¹ [ σ²_{a|a}, µ_{3,a|a} ; µ_{3,a|a}, υ²_{a|a} ] ),
(ȳ_{b|b}, σ̂²_{b|b})′ → N( (µ_{b|b}, σ²_{b|b})′, ((1 − τ_a) n_m)⁻¹ [ σ²_{b|b}, µ_{3,b|b} ; µ_{3,b|b}, υ²_{b|b} ] ),

where n_m and n_c denote the sample sizes of the mixed-mode sample and the comparative sample respectively. Further, the asymptotic sampling distribution of t̂_a is (Agresti, 2002)

t̂_a → N( τ_a, n_m⁻¹ τ_a (1 − τ_a) ).

The remaining sampling covariances between the elements of γ̂ are zero:

• The covariance between (ȳ_a, σ̂²_a) and the other sample parameters is zero because these estimates come from two different and independent random samples, the comparative and the mixed-mode sample respectively.

• (ȳ_{a|a}, σ̂²_{a|a}) is independent from (ȳ_{b|b}, σ̂²_{b|b}) because these sample estimates refer to two different subpopulations of the mixed-mode sample. Assuming that the value of G is fixed for each population member, the mixed-mode sample in fact contains two random samples for both groups of respondents. Further, these sample estimates are also independent from t̂_a, the proportional size of these subpopulations.

The mode effects (S_a(µ), M_b(µ), S_a(σ²), M_b(σ²))′ can be written as a column vector function g(γ) of γ. Let now ∆ denote the Jacobian matrix of this vector function, containing the partial derivatives ∂g(t)/∂t′ evaluated at t = γ (the exact derivatives can be found in the appendix). The mode effects g(γ) can now be estimated by g(γ̂), and the Delta method (Agresti, 2002; Casella & Berger, 2002; Lehmann, 2001) shows that these mode effect estimates follow an asymptotic normal distribution of which the variance can be approximated as follows:

g(γ̂) → N( g(γ), ∆Σ∆′ ).    (4.9)

The result of (4.9) allows us to make approximate inferences about all four mode effects as defined in (4.5), (4.6), (4.7), and (4.8).
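The paper derives the Jacobian ∆ analytically (see its appendix). As a hedged, purely illustrative alternative, the covariance in (4.9) can also be approximated by differentiating g numerically; the sketch below assumes the `mode_effects` function from the earlier sketch and a covariance matrix `Sigma` assembled from the expressions above.

```python
import numpy as np

def delta_method_cov(g, gamma_hat, Sigma, eps=1e-6):
    """Approximate Cov[g(gamma_hat)] = Delta Sigma Delta' with a numerical Jacobian.

    g         : function mapping the parameter vector to the vector of mode effects
    gamma_hat : estimated parameter vector (mu_a, var_a, mu_aa, var_aa, mu_bb, var_bb, t_a)
    Sigma     : estimated 7 x 7 sampling covariance matrix of gamma_hat
    """
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    g0 = np.asarray(g(gamma_hat), dtype=float)
    J = np.zeros((g0.size, gamma_hat.size))
    for j in range(gamma_hat.size):
        step = np.zeros_like(gamma_hat)
        step[j] = eps
        J[:, j] = (np.asarray(g(gamma_hat + step)) - g0) / eps   # forward difference
    return J @ Sigma @ J.T

# usage sketch: standard errors are the square roots of the diagonal
# cov = delta_method_cov(lambda v: mode_effects(v[6], v[2], v[3], v[4], v[5], v[0], v[1]),
#                        gamma_hat, Sigma)
# se = np.sqrt(np.diag(cov))
```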
4.3.3 Power analysis for mode effects

An additional question which should be asked when evaluating the mode effects is whether the total sample size is sufficiently large to detect small to medium mode effect sizes. Let ϑ denote the size of the mode effect we want to detect with a minimal power β, given that we use a significance level α. ϑ corresponds with a z-value

z = ϑ/√σ²,    (4.10)

where σ² is the sampling variance of the mode effect estimate. The absolute value of this z-value should at least be equal to

|z| ≥ Φ⁻¹(1 − α) + Φ⁻¹(β),    (4.11)

where Φ⁻¹ is the inverse cumulative normal function. Φ⁻¹(1 − α) corresponds to the minimal z-value to detect a significant effect with significance level α. Φ⁻¹(β) is the difference between this minimal z-value and the required z-value of ϑ, so that ϑ is detected with a minimal power β.

The value of z now depends on the sample sizes through σ², which in turn depends on the sample sizes through Σ but not through ∆. Let us now first define the 7 × 7 covariance matrices Σ_{c,obs} and Σ_{m,obs}:

Σ_{c,obs} = n_c · Σ ∘ [ 1_{2,2}, 0_{2,5} ; 0_{5,2}, 0_{5,5} ]

and

Σ_{m,obs} = n_m · Σ ∘ [ 0_{2,2}, 0_{2,5} ; 0_{5,2}, 1_{5,5} ],

where ∘ denotes the element-wise matrix multiplication, and n_{i,j} denotes an i × j matrix filled with the number n. The four upper-left elements of Σ_{c,obs} in fact contain the sampling covariances of the parameter estimates of the comparative sample uncorrected for the sample size n_c. Likewise, Σ_{m,obs} contains the sampling covariances of the parameter estimates of the mixed-mode sample uncorrected for n_m. Using the properties of the Delta method (Agresti, 2002; Casella & Berger, 2002; Lehmann, 2001), it can be shown that all the sampling variances of the mode effects are of the form

σ² = a_c/n_c + a_m/n_m,    (4.12)

where the a_c's and a_m's are provided by the diagonal elements of ∆Σ_{c,obs}∆′ and ∆Σ_{m,obs}∆′ respectively. In this equation a_c/n_c and a_m/n_m represent the contribution of, respectively, the comparative and the mixed-mode sample to the sampling variance of the mode effect estimates, while a_c and a_m do not depend on the sample sizes.

Given the estimation of a_c and a_m from the data, substituting (4.12) into (4.11) allows calculating the minimal required sample sizes to achieve a decent power given the critical significance level:

a_c/n_c + a_m/n_m ≤ ϑ²/[Φ⁻¹(1 − α) + Φ⁻¹(β)]².    (4.13)

Because the total sample includes two independent samples, two strategies can be used. In the first strategy, the sample size of the mixed-mode sample, n_m, or the comparative sample, n_c, is held constant and the required sample size of the other sample is calculated by rearranging the terms in (4.13). The second strategy involves keeping the ratio of n_m and n_c constant, so that they can be expressed as functions of the overall total sample size: n_m = λn and n_c = (1 − λ)n, where 0 < λ < 1. λ refers to the proportion of the total sample size which is assigned to the mixed-mode design, and the minimal n can once again be calculated by rearranging the terms in (4.13).
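A minimal sketch (not from the paper) of both strategies in (4.13): given a_c and a_m from the delta-method diagonals, the first function returns the smallest n_m that achieves the target power when n_c is held fixed, returning None when no finite n_m suffices (as for the voter-turnout selection effect in Chapter 3); the second returns the smallest total n for a fixed λ.

```python
import math
from scipy.stats import norm

def required_n_m(a_c, a_m, n_c, theta, alpha=0.05, beta=0.80):
    """Smallest mixed-mode sample size n_m satisfying (4.13) for fixed n_c."""
    z = norm.ppf(1 - alpha) + norm.ppf(beta)
    budget = theta**2 / z**2 - a_c / n_c      # maximum allowed value of a_m / n_m
    if budget <= 0:
        return None                           # effect undetectable for any n_m
    return math.ceil(a_m / budget)

def required_n_total(a_c, a_m, lam, theta, alpha=0.05, beta=0.80):
    """Smallest total n with n_m = lam * n and n_c = (1 - lam) * n."""
    z = norm.ppf(1 - alpha) + norm.ppf(beta)
    return math.ceil((a_c / (1 - lam) + a_m / lam) * z**2 / theta**2)
```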
4.3.4 Minimizing costs

The second strategy of the previous subsection involves estimating the required minimal total sample size by fixing the proportion λ of the total sample size assigned to the mixed-mode sample. If the relative costs of both modes a and b are known, however, a λ_min can be chosen in order to minimize the total cost of the complete survey. The total cost T is equal to

T = T_0 + c_a n_c + [τ_a c_a + (1 − τ_a) c_b] n_m
  = T_0 + c_a (n − n_m) + [τ_a c_a + (1 − τ_a) c_b] n_m,    (4.14)

where T_0 refers to the fixed costs including the development and implementation of the questionnaire in both modes, c_a is the marginal cost of conducting one interview by mode a, and c_b the marginal cost of one completed interview in mode b.

The total cost is minimal if the total derivative dT/dn_m is equal to zero. This total derivative takes into account the implicit relation between n and n_m given by (4.13), and is equal to

dT/dn_m = ∂T/∂n_m + (∂T/∂n)(dn/dn_m) = 0.    (4.15)

Replacing the `greater than' sign by an equal sign in (4.13), we get

constant + a_c/(n − n_m) + a_m/n_m = F(n, n_m) = 0,

and thus

dn/dn_m = −(∂F/∂n_m)/(∂F/∂n) = 1 − (a_m/a_c)·[(n − n_m)/n_m]².    (4.16)

Substituting (4.16) into (4.15) and after some algebra, we get that n_m = λ_min n, where

λ_min = [1 + √((a_c/a_m)·(τ_a + (1 − τ_a) c_b/c_a))]⁻¹.    (4.17)
4.4. Example
79
the survey which was to collect data from as many sample units as possible at the lowest price. A simple random sample of 960 people aged between 18 and 80 was drawn from the Flemish population using the national register, and the mail questionnaire was introduced to this sample as a survey about polls in Flanders. Because the use of cash incentives was not allowed, a gift voucher of
e5 for returning the questionnaire
was used as an incentive. The survey started on October 18, 2004, a rst reminder was sent by mail two weeks later, and a second reminder accompanied by a new questionnaire was sent 4 weeks after the rst reminder.
The whole mail phase
lasted two months. Afterwards, nonrespondents of the mail phase were recontacted at home for a FTF interview.
Yet, some of these nonrespondents lled in the
mail questionnaire during the follow-up phase instead of being interviewed. These respondents will also be assigned to the mail group for the remainder of this paper. Besides the mixed-mode sample, however, an additional simple random sample of 240 people was drawn and only contacted to participate in a FTF interview. The survey questionnaire and procedure for this sample was entirely equal to the second phase of the mixed-mode group. The only dierence with the mixed-mode sample is that respondents did not get the mail questionnaire rst. As a consequence, this additional sample can act as the comparative dataset. A FTF interview then is equal to mode
a in our method,
while the mail questionnaire refers to mode
b.
An
overview of the complete sample design can be found in gure 4.1. For simplicity, we will restrict this illustration to those respondents who responded to all variables listed in the next subsections. Further, we compared both datasets with the population on the combined dis-
1
tribution of gender and age-categories (age categories: 18-27,28-37,38-47,48-57,5867 and 68-80), but we did not nd signicant dierences (MM data: DF=11,
p=.120;
comparative data:
χ2 =6.844,
DF=11,
p=.812).
χ2 =16.614,
This indicates
that both datasets describe the population well with respect to the distribution of these two basic characteristics. Nevertheless, we calculated post-stratication weights using the ratio of the sample and population proportions of sex
×
age,
and used these in all further analyses.
1 Population
data retrieved on 12/05/2010 from the Eurostat Website: http://epp.eurostat
.ec.europa.eu/portal/page/portal/population/data/database
4. Mode Effects on Means and Variances
80
initial phase
follow-up phase
Mail: full response
Mixed-Mode Sample N=960
A@ A@ A @ R A @ A A AA U
N=447 Mail: partial response N=74 Mail: non response N=426 Mail: not eligible N=13
Mail: full response
@ AB BA@ BA @ R BA @ BA B A B AA U B B B BN
N=27 Mail: partial response N=7 FTF: full response N=124 FTF: partial response N=8 FTF: non response N=211 FTF: not eligible N=49
FTF: full response
Comparative Sample N=240
A@ A@ A @ R A @ A A AA U
N=155 FTF: partial response N=7 FTF: non response N=61 FTF: not eligible N=17
Figure 4.1:
Survey design with response frequencies
4.4. Example
81
4.4.2 Checking the representativity assumption The method discussed in the previous section involves comparing the mixed-mode with the comparative data. However, as we already mentioned, this comparison involves the representativity assumption which means that both datasets represent the same population with respect to the variable of interest.
However, this
assumption can not be tested for the same reason the Missing At Random (MAR) assumption is untestable within missing data-analysis (Rubin, 1976; Little & Rubin, 2002; Schafer & Graham, 2002).
Indeed, testing this assumption needs a
comparison of the distributions of either information is partially missing because spondent.
Ya
Yb on the either Ya or Yb or
entire samples, but this is observed for each re-
Still, two arguments can be put forward towards the validity of the
representativity assumption: a comparison of the response rates and a comparison of the samples on a set of mode-insensitive variables. First, if both samples represent the same population, the response rates should be more or less equal because the same population members would accept participation in both the mixed-mode and the comparative design. Only considering full responses, the initial mail phase of the MM design reached a response rate of 47.20%, but the FTF follow-up pushed up this response rate to 66.59% which is rather high in the context of a general population survey.
The comparative
sample, however, even got a higher response rate, namely 69.51%. The dierence between these response rates is only 2.91 percent points, which is not signicant (S.E.=
.051, p = .567).
So, this can be used as an argument in favour of the repres-
entativity assumption but we should note that this argument does not necessarily hold in the other direction. Equal response rates do not necessarily mean that the samples are comparable. Second, both datasets can also be compared on their composition of a set of
mode-insensitive
variables.
This mode-insensitiveness in fact is an assumption
that respondents will always give the same answer to these variables, regardless of the mode by which they complete the survey. pared the samples on age
×
For this illustration, we com-
gender, educational level (no degree, primary school,
lower secondary, upper secondary, college , or university), ownership of a personal email-address, activity status (full time employees,
>50%
and
18 years), adolescents (between 12 and 18 years) and children (