Proceedings of Q2006 European Conference on Quality in Survey Statistics
An experimental hedonic regression approach to estimate quality adjusted price changes in Italian Consumer Price Survey
Alessandro Brunetti, Federico Polidoro, Anna Volpe Rinonapoli¹
Keywords: quality changes, pure price index, hedonic prices.
1. Introduction
Managing quality changes is a crucial issue in consumer price statistics. The frequent changes in the characteristics that identify each elementary item selected for the consumer price survey make it necessary to enhance the statistical tools used to detect quality changes and to estimate price movements. The European Regulations and Eurostat's approach to consumer price indices stress that the Harmonized Index of Consumer Prices (HICP) is a pure price index, which should not be influenced by changes in other variables. Eurostat has therefore paid growing attention to statistical techniques for treating and adjusting consumer prices for quality changes, in order to ensure comparability among Member States. Following the U.S. debate that arose from the conclusions of the Boskin Commission, and the experience of the Bureau of Economic Analysis (BEA) and the Bureau of Labor Statistics (BLS), the use of hedonic methods to adjust consumer prices for quality changes has attracted growing attention from Eurostat and National Statistical Institutes. For several groups of products, Eurostat recommends hedonic methods as the best practice for quality adjustment. In Italy, the practices adopted by Istat to deal with quality changes in consumer prices are, with minor exceptions, based on the so-called “matched models approach”. The aim of this paper is to investigate empirically the results obtained by adopting a hedonic regression approach to quality changes. The analysis focuses on prices collected for a specific group of products (durables), and in particular for clothes washing machines. Preliminary results of the use of hedonic regressions for the consumer prices of clothes washing machines are analysed in the light of the evidence from the data collected. Problems related to the practical applicability of hedonic methods are also investigated.
¹ A. Brunetti, Italian National Institute of Statistics (Istat), Via Torino 6, Rome 00184 Italy ([email protected]); F. Polidoro, Istat, Via Torino 6, Rome 00184 Italy ([email protected]); A. Volpe Rinonapoli, Istat, Via Torino 6, Rome 00184 Italy ([email protected])
2. The issue of quality adjustment (QA) and main approaches to QA
The Consumer Price Index aims to measure the effects of price changes while holding other economic factors constant. However, the frequent changes in the characteristics that identify each elementary item in the basket, due to innovation and quality improvements, make it necessary to enhance the statistical tools used to detect quality changes and to estimate pure price movements. QA is the process by which prices are adjusted to account for changes in product quality. The HICP manual (Eurostat, 2001) and the CPI manual (ILO et al., 2004) distinguish three main types of quality adjustment procedures:
• Direct comparison: the analyst chooses the new product that most closely matches the disappearing one; no quality change is observed and no adjustment is made.
• Explicit methods (e.g. package size adjustment, judgmental QA, hedonic regression, option pricing): the value of the quality change is estimated from observed changes in the characteristics of the model.
• Implicit methods (e.g. overlap, bridged overlap, class mean imputation): the quality change is estimated from price differences between similar models.
The traditional method for controlling for quality change, generally used by statistical offices, is known as the “matched model” method. According to Silver (1999), “it attempts to compare ‘like’ with ‘like’”: on the assumption that the old and the new model have the same specification (the same relevant characteristics) and are simultaneously available on the market in two adjacent periods, it commonly assumes either that there is no quality change or that the difference in quality is equal to the difference in price between the models.
A number of problems connected with this method (samples may be unrepresentative or out of date; the method is sensitive to unusual price changes when items appear or disappear; decisions about replacements depend on the subjective judgment of the statistical analyst) may lead to upward or downward biases (Moulton, 2001). Hedonic quality adjustment can be a valid alternative to the implicit adjustment embedded in the matched model method, in particular when building price indices for fast-changing products (e.g. information and communication technology products).
3. Brief review of hedonic methods
Even though the first application of hedonic methods dates back to the 1930s, their use in the CPI began in the second half of the 1980s in the United States. Since 1996, when the Boskin Commission reported that the U.S. CPI was biased upward by 1.1 percentage points per year (Boskin et al., 1996) and that about half of this bias could be attributed to neglected product innovations, hedonic models have played an increasingly important role in the CPI. Today, several statistical offices have implemented hedonic methods and others are researching their use for quality adjustment, especially for categories of goods and services where quality changes are frequent, such as computers, DVD players, audio equipment, clothes washers and dryers, microwave ovens and refrigerators.
3.1 Hedonic function
The hedonic method is one of the explicit methods used for quality adjustment. On the assumption that consumers, when deciding to purchase a product, check and compare the characteristics of similar products, the quality adjustment is usually based on a regression model that expresses the price as a function h of product characteristics:

$$p_i = h(\mathbf{c}_i) \qquad (1)$$
where $p_i$ is the price of variety i of an item (e.g. a computer) and $\mathbf{c}_i$ is the vector of characteristics associated with that variety (speed, memory, monitor size, etc.). From this perspective, a new model is simply a different combination of existing characteristics. The hedonic method uses information on changes in product characteristics to split the price change into two components: the pure price change (the product has the same characteristics in the different periods) and the price change due to the change in the quality of the product. In econometric applications, the three most commonly used functional forms are the log-log, the semilog and the linear.

Log-log model:
$$\ln p = \ln \beta_0 + \beta_1 \ln x_1 + \beta_2 \ln x_2 + \dots + \beta_K \ln x_K + \varepsilon \qquad (2)$$

Semilog model:
$$p = \beta_0 + \beta_1 \ln x_1 + \beta_2 \ln x_2 + \dots + \beta_K \ln x_K + \varepsilon \qquad (3)$$

Linear model:
$$p = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + \varepsilon \qquad (4)$$
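As a minimal sketch of how a hedonic function such as the log-log form (2) might be estimated, the code below fits it by ordinary least squares with NumPy. The data are synthetic and the two characteristics (a spin-velocity-like and a capacity-like variable) and their true coefficients are hypothetical; in the log-log form each $\beta_k$ is the elasticity of price with respect to characteristic k.

```python
import numpy as np

def fit_log_log(prices, characteristics):
    """OLS estimate of the log-log hedonic function (2).

    prices: shape (n,), strictly positive prices p_i
    characteristics: shape (n, K), strictly positive characteristics x_ik
    Returns (ln_beta0, betas); betas[k] is the price elasticity of characteristic k.
    """
    y = np.log(prices)
    # Design matrix: a constant (for ln beta_0) plus the log characteristics.
    X = np.column_stack([np.ones(len(y)), np.log(characteristics)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1:]

# Hypothetical synthetic data (not Istat data): two characteristics,
# e.g. spin velocity in rpm and charge capacity in kg.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform([800.0, 4.0], [1600.0, 7.0], size=(n, 2))
true_ln_b0, true_b = 1.5, np.array([0.7, 0.15])
p = np.exp(true_ln_b0 + np.log(x) @ true_b + rng.normal(0.0, 0.05, n))

ln_b0, b = fit_log_log(p, x)
print(round(float(ln_b0), 2), np.round(b, 2))
```

The semilog form (3) and the linear form (4) only change how the response and the design matrix are built: p replaces ln p in both, and in the linear case the characteristics enter without logs.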
where $x_k$ are the characteristics, $\beta_k$ are the hedonic coefficients (the implicit prices of the characteristics) and ε is a random error term. Adopting the hedonic method requires time and effort (a large amount of information is needed) and is limited by a number of problems, such as the difficulty of identifying the product characteristics that influence p, rapid changes in consumer tastes, multicollinearity (if the characteristics are correlated, the variance of the estimated coefficients is high) and the choice of the appropriate functional form.

3.2 Hedonic price indexes
Using the hedonic function, Triplett (2001) describes four methods for calculating price indexes: on the one hand, the time dummy variable method and the characteristics price index method (the so-called “direct” methods); on the other hand, the hedonic price imputation method and the hedonic quality adjustment method (the so-called “indirect” methods). The time dummy variable method pools all available data on a product over several time periods, fits a regression model and uses the coefficients on the time variables as a direct measure of the price index (Ball and Allen, 2003). The regression equation used is:
$$\ln p_{it} = \alpha + \sum_{k=1}^{K} \alpha_k \ln z_{ik} + \sum_{t=1}^{T} \beta_t D_t + \varepsilon_{it} \qquad (5)$$
where α is the constant term, $\alpha_k$ is the regression coefficient (implicit hedonic price) of characteristic k and $\varepsilon_{it}$ is an error term. The regression coefficients $\beta_t$ of the time dummy variables $D_t$ represent the pure price change not associated with the product characteristics $z_k$. Each characteristic is either included in the regression for a given item or not, by multiplying it by 1 or 0. The regression model adopted in this paper (Section 6) belongs to the “Adjacent Period Time Dummy variable approach”, in Triplett's terminology. This method holds the hedonic coefficients $\alpha_k$ constant over only two periods, which makes it preferable, since the constancy of the implicit prices is imposed over a shorter time span. The characteristics price index method uses the implicit prices of the characteristics in a conventional weighted index formula; its advantage is that the functional form and the index number formula can be chosen independently. With the imputation method, the shadow prices of the characteristics can differ in each period, and the price index is constructed by comparing the price of a model in the current year with the price that the same model would have had in the reference year.
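The pooled time dummy method of equation (5) can be sketched with NumPy as follows. This is a minimal illustration on synthetic data, not a reproduction of any statistical office's production system; the function name `time_dummy_index`, the three-period setup and all parameter values are hypothetical. The base period has no dummy, so $100\,e^{\beta_t}$ is the index of period t relative to the base period.

```python
import numpy as np

def time_dummy_index(log_price, log_chars, period, n_periods):
    """Pooled time dummy hedonic index in the spirit of equation (5).

    log_price: shape (n,), log prices ln p_it
    log_chars: shape (n, K), log characteristics ln z_ik
    period: shape (n,), integer labels 0..n_periods-1 (0 = base period)
    Returns 100 * exp(beta_t) for t = 0..n_periods-1, with beta_0 = 0.
    """
    n = len(log_price)
    # One dummy D_t per non-base period; the base period is absorbed by alpha.
    dummies = np.column_stack([(period == t).astype(float)
                               for t in range(1, n_periods)])
    X = np.column_stack([np.ones(n), log_chars, dummies])
    coef, *_ = np.linalg.lstsq(X, log_price, rcond=None)
    betas = np.concatenate([[0.0], coef[-(n_periods - 1):]])
    return 100.0 * np.exp(betas)

# Hypothetical data: three periods with true log price changes of -0.02 and -0.05
rng = np.random.default_rng(1)
n, T = 300, 3
period = rng.integers(0, T, size=n)
log_chars = rng.normal(0.0, 0.2, size=(n, 2))
true_beta = np.array([0.0, -0.02, -0.05])
log_price = (6.0 + log_chars @ np.array([0.7, 0.1])
             + true_beta[period] + rng.normal(0.0, 0.01, n))
print(np.round(time_dummy_index(log_price, log_chars, period, T), 1))
```

Pooling all periods forces the implicit prices $\alpha_k$ to be identical across the whole sample; the adjacent-period variant described above relaxes this by fitting a separate regression for each pair of periods.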
4. European Regulation and international context
To date, the reference for QA in European legislation is Regulation 1749/96 of 9 September 1996. Article 2 of the Regulation defines QA, and Article 5 establishes minimum standards for quality adjustment procedures. Concerning minimum standards, the Regulation stresses that “(…) In no case should a quality change be estimated as the whole of the difference in price between the two items, unless this can be justified as an appropriate estimate”. Eurostat has made a considerable effort to provide Member States (MS) with more detailed guidelines. Since 2001 a “case by case” approach has been adopted and a Task Force has been set up. The central idea of the case by case approach is to identify QA methods for different groups of products, rating them as A (the best methods), B (methods that may be used) or C (methods that should not be used). The performance of each method for each group of products has been evaluated on the basis of five criteria. The Task Force has released conclusions concerning some groups of products: clothing; books, CDs and software (games); cars; durables. For durables, the group to which the case study of the present paper refers, no general conclusions have been drawn so far: the hedonic method has been rated as an A method, but it should not be used in all cases. Therefore, for the time being, QA is still one of the main sources of the lack of harmonization in the HICP at European level, and further efforts are needed to improve the comparability of the methods adopted across MS. Hedonic methods have indeed gained importance in the practice of some European countries, but they still do not appear to be adopted on a general scale as they are in the USA.
5. Italian practices
Since 1999, the chain index methodology adopted by Istat for the CPI (with the consequent annual revision of the basket of products, of the local data collection plan for elementary items and of the weights) has reduced, within each year, the incidence of missing model matches and of substitutions of the elementary items selected in the base period (December of the previous year). Substitutions carried out in 2005 amounted to less than 3% of the total number of elementary prices collected. Moreover, the chain index methodology has also made it possible, in some respects, to deal with the “outside the sample” problems (Triplett, 2004), in particular with possible lags in introducing new products that had not been selected for the index calculation. Generally speaking, the chain index methodology keeps the basket of products, the weights and the elementary items selected for data collection up to date, but it does not solve the issue of QA, not even in the base period when the list of elementary items is revised. Therefore, as in most countries, QA is very important for the Italian CPI in order to improve the quality of the data disseminated. In 2005, for the products whose prices are collected at territorial level (about 80% of the basket; for the remaining 20%, data collection is carried out directly by Istat), three practices were adopted: overlap (implicit QA, the most widespread approach), package size adjustment (explicit QA) and direct comparison (which implies no quality adjustment, because the replaced and the replacing elementary items are deemed comparable). Package size adjustment is very frequent for processed food and grocery products, whereas direct comparison and overlap are adopted for clothing, durables and other products.
For these groups of products, the main problem arises when direct comparison is not possible and the price of the replacing elementary item cannot be observed in a period in which the replaced item is also available, so that it has to be estimated. This estimate implies the “link to show no price change” method (Triplett, 2004), which attributes the entire price difference between the two elementary items to quality change. This practice is not permitted by Eurostat, unless it can be justified as the most appropriate estimate of the quality change (European Regulation 1749/96). This is the main reason why the present paper focuses on a group of products for which the current QA practice in Italy is the overlap method, and on the possibility of adopting an explicit QA method (hedonic regression) as an alternative to the current practice. In particular, the case study is devoted to clothes washing machines, in order to investigate the issues raised by, and the results of, the application of a hedonic regression.
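A hypothetical numerical illustration of why the two treatments differ: assume a replaced model priced at 400 euro in January; in February the old model costs 404 and its replacement 460; in March only the replacement is observed, at 469.20. The helper names below are illustrative only and do not describe Istat's actual procedures.

```python
def overlap_relatives(old_prev, old_overlap, new_overlap, new_next):
    """Overlap method: both items are priced in the overlap period, and the
    price gap between them in that period is treated as the value of the
    quality difference, so each item is only compared with itself.
    Returns the price relatives for the overlap period and the next period."""
    return old_overlap / old_prev, new_next / new_overlap

def link_no_price_change(old_prev, new_now):
    """'Link to show no price change': the replacement enters with a price
    relative of 1, so the whole gap new_now / old_prev is implicitly treated
    as quality change. Returns (price relative, implied quality factor)."""
    return 1.0, new_now / old_prev

# Hypothetical prices in euro
print(overlap_relatives(400.0, 404.0, 460.0, 469.2))   # 1% rise, then 2% rise
print(link_no_price_change(400.0, 460.0))              # no rise; 15% "quality"
```

Under the no-price-change link, the 1% February increase that the overlap method records is absorbed into the implied quality factor of 1.15, which is exactly the situation the Regulation warns against.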
6. Case study
In 2005, the 54 durables in the basket accounted for 11.1% of its weight, and clothes washing machines for 0.1%. For 42 of these products (e.g. domestic electrical appliances), prices are collected at territorial level by the Municipal Offices of Statistics (MOS), while for the remaining 12 (e.g. cars) they are collected directly by Istat. Prices for most durables were collected monthly (except furniture). Over the whole of 2005, 210,697 elementary prices were collected. For the durables whose prices are collected by the MOS, Table 1 summarises the frequency of substitutions and of the different QA methods adopted. The peaks of substitutions in February, May, August and November are due to the products (mostly furnishings) whose prices are collected quarterly. Compared with the percentage of substitutions for the entire basket in 2005 (less than 3% of the elementary prices collected), the percentage for durables is higher (almost 5% in 2005). The second remark concerns the prevailing QA method, which is clearly the overlap method.

Table 1 – Durables in Italian CPI: substitutions and QA. Year 2005 (absolute and relative frequencies)
Month | Substitutions (number) | Substitutions per 100 monthly observations | Direct comparisons per 100 substitutions | Overlaps per 100 substitutions
Jan | 861 | 4.9 | 2.4 | 97.6
Feb | 1769 | 10.1 | 1.7 | 98.3
Mar | 515 | 2.9 | 1.0 | 99.0
Apr | 504 | 2.9 | 0.6 | 99.4
May | 1405 | 8.0 | 1.2 | 98.8
Jun | 493 | 2.8 | 1.2 | 98.8
Jul | 423 | 2.4 | 0.5 | 99.5
Aug | 895 | 5.1 | 1.8 | 98.2
Sep | 366 | 2.1 | 0.5 | 99.5
Oct | 506 | 2.9 | 0.6 | 99.4
Nov | 1192 | 6.8 | 1.9 | 98.1
Dec | 484 | 2.8 | 3.5 | 96.5
For clothes washing machines, the product on which the case study focuses, the first remark is that the percentage of substitutions carried out in 2005 (7.2%) is higher than the figure mentioned above for durables (almost 5%). Moreover, apart from the beginning and the end of the year, Table 2 shows that the method adopted was the overlap method in almost all cases. On the basis of this evidence from the 2005 micro data, the alternative adoption of hedonic quality adjustment was investigated for clothes washing machines. First, the most relevant technical characteristics for specifying hedonic functions were identified. Four characteristics were selected: brand, energy consumption, spin velocity and charge capacity.
The preliminary issue to be dealt with was the availability of information on the technical characteristics of the appliances whose prices were collected. As usual, at the end of the previous year Istat provided a general description of the product. On the basis of this description, the MOS identified the elementary items to be observed during the whole of 2005, specifying their brand and technical characteristics. When the information on technical characteristics specified by the MOS was analysed, it proved very difficult to find the characteristics needed to define hedonic functions. Finally, a subset of 167 elementary items in 39 chief towns was selected, for a total of 2,004 elementary prices collected monthly (167 items over 12 months).

Table 2 – Clothes washing machines in Italian CPI: substitutions and QA. Year 2005 (absolute and relative frequencies)
Month | Substitutions (number) | Substitutions per 100 monthly observations | Direct comparisons per 100 substitutions | Overlaps per 100 substitutions
Jan | 42 | 8.6 | 4.8 | 95.2
Feb | 60 | 12.3 | 3.3 | 96.7
Mar | 41 | 8.4 | 0.0 | 100.0
Apr | 33 | 6.8 | 0.0 | 100.0
May | 37 | 7.6 | 0.0 | 100.0
Jun | 21 | 4.3 | 0.0 | 100.0
Jul | 24 | 4.9 | 0.0 | 100.0
Aug | 33 | 6.8 | 3.0 | 97.0
Sep | 25 | 5.1 | 0.0 | 100.0
Oct | 43 | 8.8 | 0.0 | 100.0
Nov | 37 | 7.6 | 0.0 | 100.0
Dec | 27 | 5.5 | 11.1 | 88.9
6.1 Description and model
The regression model adopted belongs to the “Adjacent Period Time Dummy variable approach” (APTD for short). This approach implies a single hedonic function covering the data of two adjacent periods:

$$\ln p_i^t = \alpha_0 + \sum_{k=1}^{K} \alpha_k z_{ik} + \beta D^{\tau+1} + \varepsilon_i^t, \qquad t = \tau, \tau+1 \qquad (6)$$

where the $z_{ik}$ variables include both quantitative variables (in logs) and qualitative variables (modelled with dummies), $D^{\tau+1}$ is the time dummy, which takes the value 0 (1) when the price observation refers to period τ (τ+1), and $\varepsilon_i^t$ summarizes the effects of unobservable variables (assumed to be independent of $z_{ik}$ and $D^{\tau+1}$). Notably, equation (6) has the following two implications:

• The difference in the (log) prices of the same model i (that is, holding its relevant characteristics constant) between periods τ and τ+1 is, apart from the error terms, given by:

$$\ln \frac{p_i^{\tau+1}}{p_i^{\tau}} = \beta \qquad \forall i \qquad (7)$$
The anti-log of the coefficient β measures the average price relative of clothes washing machines between the two periods, holding their characteristics fixed; $100(e^{\beta} - 1)$ is therefore the average percentage price change, which β itself approximates when it is small. Specifically, a negative (positive) β measures the rate at which the price of clothes washing machines is falling (rising) between periods τ and τ+1.
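• The difference in the (log) prices of two different models i and j in the same period t is, apart from the error terms:

$$\ln \frac{p_j^t}{p_i^t} = \sum_{k=1}^{K} \alpha_k \left(z_{jk} - z_{ik}\right) \qquad \forall i \neq j, \quad t = \tau, \tau+1 \qquad (8)$$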
The price variability among models observed in the same period t is thus decomposed into the sum of the effects due to differences in the relevant features of each clothes washing machine model³. The model was estimated for each pair of adjacent months in 2005 (from January-February to November-December) using unweighted least squares⁴. As for the model specification, the vector of independent variables includes three quantitative variables: energy consumption, spin velocity and charge capacity. In addition, to capture the effect of brand on prices, clothes washing machine brands were grouped into four classes: low, medium-low, medium-high and high. Brands were classified according to the distribution of the prices of the clothes washing machine models observed in the reference year. For example, a brand was assigned to the high class if the prices of all its models (or at least the majority of them) fall in the top quartile of the distribution of clothes washing machine prices⁵.

6.2 Results and conclusive remarks
Results from the hedonic regressions are reported in Table 3, which shows the estimated coefficients of the independent variables together with their standard errors and t-values. The overall performance of the model is fairly satisfactory: across the eleven regressions, the R-squared never falls below 70 per cent. As for the quantitative variables, the estimated coefficients have the expected signs.
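To make the estimation step of Section 6.1 concrete, the sketch below fits equation (6) for one pair of months by unweighted least squares with NumPy and converts β into a percentage change. It runs on a hypothetical balanced sample (167 synthetic models priced in both months, 334 observations); the function name `aptd_beta`, the regressors and all parameter values are assumptions, not the Istat data or the authors' Stata code.

```python
import numpy as np

def aptd_beta(log_price, Z, in_next_period):
    """Unweighted OLS estimate of the adjacent-period time dummy model (6).

    log_price: shape (n,), log prices pooled over periods tau and tau+1
    Z: shape (n, K), regressors z_ik (log quantitative characteristics and
       brand-class dummies with one class omitted)
    in_next_period: shape (n,), the time dummy D (1 for tau+1, 0 for tau)
    Returns beta and the implied average pure price change in per cent.
    """
    X = np.column_stack([np.ones(len(log_price)), Z, in_next_period])
    coef, *_ = np.linalg.lstsq(X, log_price, rcond=None)
    beta = coef[-1]
    return beta, 100.0 * (np.exp(beta) - 1.0)

# Hypothetical balanced sample: 167 models priced in both months (334 obs)
rng = np.random.default_rng(2)
m = 167
z = np.column_stack([rng.normal(6.9, 0.15, m),    # log spin velocity
                     rng.normal(1.7, 0.10, m)])   # log charge capacity
ln_p_tau = 1.5 + z @ np.array([0.7, 0.15]) + rng.normal(0.0, 0.10, m)
ln_p_next = ln_p_tau - 0.01 + rng.normal(0.0, 0.01, m)
beta, pct = aptd_beta(np.concatenate([ln_p_tau, ln_p_next]),
                      np.vstack([z, z]),
                      np.concatenate([np.zeros(m), np.ones(m)]))
print(round(float(beta), 4), round(float(pct), 2))
```

When exactly the same models are observed in both months, as in this synthetic case, the estimated β equals the mean log price change of the matched models, so β can only depart from zero if those prices actually move. Applied to a reported value, the Jan-Feb estimate in Table 3 (β = -0.0103) corresponds to $100(e^{-0.0103} - 1) \approx -1.02$ per cent.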
³ It is worth noting that, in this model specification, the assumption that the regression coefficients do not vary over time is imposed only over a span of two periods.
⁴ The model coefficients were estimated with the Stata software.
⁵ Models of different brands with similar characteristics tend to have different prices: this price difference may be due to a sort of “reputation effect”.
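The brand grouping of Section 6.1 can be sketched as a quartile-voting rule. This is an assumed reading of the text: each model is placed in a quartile of the reference-year price distribution, and a brand takes the quartile that contains most of its models. The tie-breaking rule, the function name `brand_classes` and the example brands and prices are all hypothetical.

```python
from collections import Counter

import numpy as np

def brand_classes(brands, prices):
    """Assign each brand to a class from 1 (low) to 4 (high).

    Each model is placed in a quartile of the reference-year price
    distribution; a brand takes the quartile containing most of its models.
    Ties go to the lower class (an assumption; the paper does not say).
    """
    prices = np.asarray(prices, dtype=float)
    q1, q2, q3 = np.quantile(prices, [0.25, 0.50, 0.75])
    quartile = 1 + (prices > q1).astype(int) + (prices > q2) + (prices > q3)
    votes = {}
    for brand, q in zip(brands, quartile):
        votes.setdefault(brand, Counter())[int(q)] += 1
    # Most frequent quartile per brand; the lower quartile wins a tie.
    return {b: min(c, key=lambda q: (-c[q], q)) for b, c in votes.items()}

# Hypothetical brands and reference-year prices in euro
brands = ["A", "A", "B", "B", "C", "C", "D", "D"]
prices = [250, 270, 330, 350, 420, 440, 600, 650]
print(brand_classes(brands, prices))   # {'A': 1, 'B': 2, 'C': 3, 'D': 4}
```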
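Table 3 – Clothes washing machines in Italian CPI: hedonic regression coefficients and statistics. Year 2005

Pairs of adjacent months, January-February to June-July
Variable | Statistic | Jan-Feb | Feb-Mar | Mar-Apr | Apr-May | May-Jun | Jun-Jul
N obs | | 334 | 334 | 334 | 334 | 334 | 334
R-squared | | 0.7014 | 0.7018 | 0.7185 | 0.7272 | 0.7291 | 0.7218
Constant | coefficient | 1.4291 | 1.5155 | 1.5541 | 1.3739 | 1.4703 | 1.8175
 | std. err. | 0.2397 | 0.2321 | 0.2203 | 0.2106 | 0.2130 | 0.2280
 | t | 5.96 | 6.53 | 7.05 | 6.52 | 6.90 | 7.97
Energy consumption | coefficient | 0.0994 | 0.0787 | 0.0743 | 0.0661 | 0.0653 | 0.0641
 | std. err. | 0.0461 | 0.0447 | 0.0425 | 0.0421 | 0.0420 | 0.0420
 | t | 2.16 | 1.76 | 1.75 | 1.57 | 1.56 | 1.53
Spin velocity | coefficient | 0.7198 | 0.6995 | 0.6947 | 0.6935 | 0.6709 | 0.6538
 | std. err. | 0.0356 | 0.0345 | 0.0330 | 0.0327 | 0.0328 | 0.0331
 | t | 20.25 | 20.28 | 21.07 | 21.20 | 20.47 | 19.77
Charge capacity | coefficient | 0.1454 | 0.1298 | 0.1193 | 0.1215 | 0.1520 | 0.1338
 | std. err. | 0.1141 | 0.1101 | 0.1052 | 0.1042 | 0.1031 | 0.1038
 | t | 1.27 | 1.18 | 1.13 | 1.17 | 1.47 | 1.29
Brand (1st class) | coefficient | -0.3370 | -0.3226 | -0.3201 | -0.1500 | -0.1515 | -0.3680
 | std. err. | 0.0427 | 0.0408 | 0.0393 | 0.0242 | 0.0238 | 0.0381
 | t | -7.89 | -7.90 | -8.16 | -6.19 | -6.35 | -9.66
Brand (2nd class) | coefficient | -0.1952 | -0.1622 | -0.1631 | - | - | -0.2065
 | std. err. | 0.0438 | 0.0422 | 0.0404 | - | - | 0.0390
 | t | -4.45 | -3.84 | -4.04 | - | - | -5.29
Brand (3rd class) | coefficient | -0.1718 | -0.1464 | -0.1440 | 0.0194 | 0.0172 | -0.1913
 | std. err. | 0.0397 | 0.0378 | 0.0364 | 0.0207 | 0.0205 | 0.0353
 | t | -4.33 | -3.88 | -3.96 | 0.93 | 0.84 | -5.43
Brand (4th class) | coefficient | - | - | - | 0.1770 | 0.2053 | -
 | std. err. | - | - | - | 0.0396 | 0.0389 | -
 | t | - | - | - | 4.47 | 5.28 | -
Time dummy | coefficient | -0.0103 | 0.0003 | -0.0045 | -0.0007 | -0.0017 | -0.0056
 | std. err. | 0.0151 | 0.0149 | 0.0143 | 0.0141 | 0.0138 | 0.0138
 | t | -0.68 | 0.02 | -0.32 | -0.05 | -0.13 | -0.40

Pairs of adjacent months, July-August to November-December
Variable | Statistic | Jul-Aug | Aug-Sep | Sep-Oct | Oct-Nov | Nov-Dec
N obs | | 334 | 334 | 334 | 334 | 334
R-squared | | 0.7116 | 0.7117 | 0.7059 | 0.7015 | 0.7049
Constant | coefficient | 1.9303 | 1.8453 | 1.8217 | 1.6948 | 1.5098
 | std. err. | 0.2224 | 0.2154 | 0.2176 | 0.2073 | 0.2101
 | t | 8.68 | 8.57 | 8.37 | 8.18 | 7.19
Energy consumption | coefficient | 0.0461 | 0.0652 | 0.0977 | 0.1260 | 0.1427
 | std. err. | 0.0418 | 0.0418 | 0.0422 | 0.0416 | 0.0418
 | t | 1.10 | 1.56 | 2.31 | 3.03 | 3.41
Spin velocity | coefficient | 0.6367 | 0.6464 | 0.6582 | 0.6574 | 0.6773
 | std. err. | 0.0332 | 0.0337 | 0.0342 | 0.0340 | 0.0342
 | t | 19.16 | 19.16 | 19.22 | 19.33 | 19.80
Charge capacity | coefficient | 0.1081 | 0.1390 | 0.1396 | 0.1318 | 0.1853
 | std. err. | 0.1016 | 0.0996 | 0.0932 | 0.0865 | 0.0862
 | t | 1.06 | 1.40 | 1.50 | 1.52 | 2.15
Brand (1st class) | coefficient | -0.3717 | -0.3723 | -0.3709 | -0.1757 | -0.1671
 | std. err. | 0.0387 | 0.0395 | 0.0407 | 0.0261 | 0.0269
 | t | -9.60 | -9.42 | -9.10 | -6.72 | -6.22
Brand (2nd class) | coefficient | -0.2005 | -0.1999 | -0.1998 | - | -
 | std. err. | 0.0398 | 0.0407 | 0.0420 | - | -
 | t | -5.04 | -4.91 | -4.76 | - | -
Brand (3rd class) | coefficient | -0.1879 | -0.1929 | -0.1900 | 0.0097 | 0.0055
 | std. err. | 0.0359 | 0.0366 | 0.0378 | 0.0224 | 0.0231
 | t | -5.24 | -5.26 | -5.03 | 0.43 | 0.24
Brand (4th class) | coefficient | - | - | - | 0.1963 | 0.2125
 | std. err. | - | - | - | 0.0421 | 0.0422
 | t | - | - | - | 4.66 | 5.03
Time dummy | coefficient | -0.0090 | -0.0025 | 0.0016 | -0.0045 | 0.0019
 | std. err. | 0.0141 | 0.0144 | 0.0149 | 0.0152 | 0.0148
 | t | -0.64 | -0.18 | 0.11 | -0.30 | 0.13

Note: “-” marks the brand-class dummy omitted from that regression (the reference class).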
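Equation (8) turns the coefficients in Table 3 into price ratios between models observed in the same month. The sketch below uses the Jan-Feb column; in that regression the 1st-3rd brand-class dummies are reported, so the 4th class is the reference. It assumes natural logarithms, and the two spin velocities (800 and 1200 rpm) are hypothetical; because the variable enters in logs, only their ratio matters, not the unit of measurement. The helper name `price_ratio` is illustrative.

```python
import math

# Jan-Feb 2005 coefficients from Table 3 (log quantitative characteristics
# and brand-class dummies; the 4th class is the omitted reference).
ALPHA_JAN_FEB = {
    "ln_energy_consumption": 0.0994,
    "ln_spin_velocity": 0.7198,
    "ln_charge_capacity": 0.1454,
    "brand_1st": -0.3370,
    "brand_2nd": -0.1952,
    "brand_3rd": -0.1718,
}

def price_ratio(z_diff, alpha=ALPHA_JAN_FEB):
    """p_j / p_i implied by equation (8), ignoring the error term.

    z_diff maps a regressor name to z_jk - z_ik; regressors that are not
    listed are assumed to be identical for the two models.
    """
    return math.exp(sum(alpha[k] * d for k, d in z_diff.items()))

# Hypothetical models identical except spin velocity: 1200 vs 800 rpm
spin = price_ratio({"ln_spin_velocity": math.log(1200) - math.log(800)})
# Same characteristics, but model j is in the 1st class and model i in the 4th
brand = price_ratio({"brand_1st": 1.0})
print(round(spin, 3), round(brand, 3))   # 1.339 0.714
```

Read this way, a 50% higher spin velocity is associated with a price about 34% higher, and a 1st-class brand with a price about 29% lower than an otherwise identical 4th-class model.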
Moreover, while the coefficients of charge capacity and, to a lesser extent, those of energy consumption appear to be barely significant, spin velocity appears to be a relevant factor in explaining the price variability of the observed models. As for the brand variables, to avoid multicollinearity problems, only three of the four dummies representing the brand classes entered the regression model explicitly⁶. The size of their coefficients, which increases with the level of the brand class, reflects the criteria according to which the groups were defined⁷. Moreover, the estimated coefficients of the brand classes were, with only a few exceptions, significant. By contrast, this set of regressions provides no evidence that the time dummy affects the dependent variable: none of the time dummy coefficients is significantly different from zero (|t| ≤ 0.68). That is, on the basis of the present dataset, the pure price change measured by the β coefficients appears to be nil over the periods considered. This result appears to depend crucially on the inertia shown by the prices of the clothes washing machine models included in the dataset. Indeed, among the elementary items that were not substituted, the relative frequency of price changes never exceeds about 9 per cent in any pair of adjacent months (Table 4).

Table 4 – Clothes washing machines in Italian CPI: diffusion of price changes with respect to the previous month among elementary items without substitution. Year 2005 (absolute and relative frequencies)
Month | Total elementary items | Items without substitution | of which: with a price change vs previous month | with an increase | with a decrease | Relative frequency of changes (over total items) | Relative frequency of changes (over items without substitution)
Feb-05 | 167 | 131 | 12 | 3 | 9 | 0.07 | 0.09
Mar-05 | 167 | 141 | 8 | 6 | 2 | 0.05 | 0.06
Apr-05 | 167 | 140 | 9 | 6 | 3 | 0.05 | 0.06
May-05 | 167 | 142 | 10 | 4 | 6 | 0.06 | 0.07
Jun-05 | 167 | 148 | 9 | 2 | 7 | 0.05 | 0.06
Jul-05 | 167 | 144 | 13 | 1 | 12 | 0.08 | 0.09
Aug-05 | 167 | 131 | 10 | 2 | 8 | 0.06 | 0.08
Sep-05 | 167 | 139 | 8 | 2 | 6 | 0.05 | 0.06
Oct-05 | 167 | 144 | 9 | 4 | 5 | 0.05 | 0.06
Nov-05 | 167 | 145 | 7 | 2 | 5 | 0.04 | 0.05
Dec-05 | 167 | 147 | 4 | 2 | 2 | 0.02 | 0.03
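The two relative-frequency columns of Table 4 differ only in their denominator: all 167 items, or only the items without substitution in that month. The short check below recomputes both from the counts in the table, rounded to two decimals as in the table; the dictionary layout and helper name are illustrative.

```python
# Counts from Table 4: (total items, items without substitution, items with a price change)
TABLE4 = {
    "Feb-05": (167, 131, 12), "Mar-05": (167, 141, 8), "Apr-05": (167, 140, 9),
    "May-05": (167, 142, 10), "Jun-05": (167, 148, 9), "Jul-05": (167, 144, 13),
    "Aug-05": (167, 131, 10), "Sep-05": (167, 139, 8), "Oct-05": (167, 144, 9),
    "Nov-05": (167, 145, 7), "Dec-05": (167, 147, 4),
}

def change_frequencies(total, unsubstituted, changed):
    """Share of price changes over all items and over unsubstituted items."""
    return round(changed / total, 2), round(changed / unsubstituted, 2)

for month, counts in TABLE4.items():
    print(month, *change_frequencies(*counts))
```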
⁶ Stata provides routines that automatically choose the model specification in the presence of this kind of problem.
⁷ In each regression, the dropped dummy represents the reference case, and the coefficients of the other dummies measure the effect on prices of any deviation from it.
The scatter plot matrix also supports the view that the variability of the observed prices is almost entirely explained by the variability of the models in the dataset. For the observations collected in January and February, the scatter plots⁸ highlight the significant correlation between spin velocity and prices (Figure 1) and the weak correlation of prices with energy consumption (Figure 3) and charge capacity (Figure 2). Figure 4 shows that the scatter of prices does not change between January and February 2005, confirming, in terms of correlation, the regression results, which are also stable for the other pairs of months compared during 2005. The results obtained are clearly partial, but they allow some preliminary conclusions to be drawn and some future developments of this kind of analysis for the Italian CPI to be outlined. These future developments can be summarized as follows:
• verifying whether other direct or indirect hedonic approaches confirm the preliminary results of the present analysis;
• extending the time period considered;
• estimating a hedonic quality-adjusted price index, in order to compare it with the index calculated by Istat according to current practices.
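As for the third development, one common way to obtain an index from adjacent-period regressions, assumed here rather than taken from the paper, is to chain the estimated time dummies: $I_t = I_{t-1}\,e^{\beta_t}$. The sketch applies this arithmetic to the eleven time-dummy coefficients in Table 3 purely to show the mechanics. Since none of those coefficients is significantly different from zero, the resulting path should not be read as an estimate of the price change of clothes washing machines; building and validating such an index remains future work, as stated above.

```python
import math

# Time-dummy coefficients of the eleven adjacent-month regressions in Table 3
# (Jan-Feb, Feb-Mar, ..., Nov-Dec 2005).
BETAS_2005 = [-0.0103, 0.0003, -0.0045, -0.0007, -0.0017, -0.0056,
              -0.0090, -0.0025, 0.0016, -0.0045, 0.0019]

def chained_aptd_index(betas, base=100.0):
    """Chain adjacent-period time-dummy estimates: I_t = I_(t-1) * exp(beta_t)."""
    index = [base]
    for beta in betas:
        index.append(index[-1] * math.exp(beta))
    return index

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
for month, value in zip(MONTHS, chained_aptd_index(BETAS_2005)):
    print(month, round(value, 2))
```

Because the chain multiplies eleven separately estimated factors, sampling errors accumulate along it; this is one reason a comparison with the official index needs standard errors for the chained values, not just point estimates.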
Figure 1 – Clothes washing machines in Italian CPI: correlation between log spin velocity (X) and log prices (Y) in the hedonic regression. January-February 2005
⁸ Figures 1-4 consider only the quantitative variables and the time dummy. For the other pairs of months, the scatter plot matrix remains almost unchanged.
Figure 2 – Clothes washing machines in Italian CPI: correlation between log charge capacity (X) and log prices (Y) in the hedonic regression. January-February 2005
Figure 3 – Clothes washing machines in Italian CPI: correlation between log energy consumption (X) and log prices (Y) in the hedonic regression. January-February 2005
Figure 4 – Clothes washing machines in Italian CPI: correlation between the time dummy variable (X) and log prices (Y) in the hedonic regression. January-February 2005
Acknowledgements
Special thanks to Stefania Occhiobello (Istat) for contributing to data extraction and processing.
References
Ball, A. and A. Allen (2003), “The introduction of Hedonic Regression Techniques for the quality adjustment in the Producer Prices Index (PPI) and Harmonised Index of Consumer Prices (HICP)”, Economic Trends, 592, February, London: Office for National Statistics.
Bascher, J. and T. Lacroix (1999), “Dish-washers and PCs in the French CPI: Hedonic Modeling, from Design to Practice”, presented at the Fifth Meeting of the International Working Group on Price Indices, Reykjavik, Iceland, August 25-27.
Boskin, M. J., E. R. Dulberger, R. J. Gordon, Z. Griliches and D. Jorgenson (1996), “Toward a More Accurate Measure of the Cost of Living”, final report to the Senate Finance Committee, Advisory Commission to Study the Consumer Price Index, December 4.
Botrić, V. (2004), “Hedonic Regression and Price Indices – Application to the Personal Computer Price Index in Croatia”, Economic Trends and Economic Policy, No. 98, pp. 30-61.
Feenstra and M. D. Shapiro (eds.), Scanner Data and Price Indexes, National Bureau of Economic Research Studies in Income and Wealth, Vol. 64. Chicago, IL: University of Chicago Press, pp. 317-48.
Eurostat (2001), Compendium of HICP Reference Documents, 2/2001/B75, Working Documents.
Haschka, P. (2005), “Inflation measures: too high – too low – internationally comparable?”, paper presented at the OECD Seminar, Paris, 21-22 June 2005.
Hulten, C. R. (2002), “Price Hedonics: A Critical Review”, Federal Reserve Bank of New York Economic Policy Review, 9(3) (September), pp. 5-15.
ILO/IMF/OECD/UNECE/Eurostat/The World Bank (2004), Consumer Price Index Manual: Theory and Practice, Geneva: International Labour Office.
Kokoski, M., K. Waehrer and P. Rozaklis (2000), “Using Hedonic Methods for Quality Adjustment in the CPI: The Consumer Audio Products Components”, BLS WP 344.
Liegey, P. R. (), “Hedonic quality adjustment methods for clothes dryers in the U.S. CPI”, Bureau of Labor Statistics.
Moulton, B. (2001), “The Expanding Role of Hedonic Methods in the Official Statistics of the United States”, OECD Meeting of National Accounts experts, Paris, 9-12 October.
Pakes, A. (2003), “A Reconsideration of Hedonic Price Indexes with an Application to PCs”, American Economic Review, 93(5) (December), pp. 1578-96.
Silver, M. (1999), “An Evaluation of the Use of Hedonic Regressions for Basic Components of Consumer Price Indices”, Review of Income and Wealth, Vol. 45, No. 1, pp. 41-56.
Silver, M. and S. Heravi (2004), “Hedonic price indexes and the matched models approach”, The Manchester School, Vol. 72, No. 1, January 2004, 1463-6786, pp. 24-49.
Triplett, J. (2001), “IT, Hedonic Price Indexes and Productivity”, paper presented at the IAOS Satellite Conference, Tokyo, 30-31 August.
Triplett, J. (2004), “Handbook on Hedonic Indexes and Quality Adjustments in Price Indexes: Special Application to Information Technology Products”, OECD Science, Technology and Industry Working Papers, 2004/9, OECD.
An experimental hedonic regression approach to estimate quality adjusted price changes in Italian Consumer Price Survey
Federico Polidoro (Istat), Alessandro Brunetti (Istat), Anna Volpe Rinonapoli (Istat)
Outline of the presentation
1. Introduction
2. Brief review of hedonic methods
3. International practices
4. Italian practices
5. Case study
6. Conclusive remarks
1. Introduction QA approaches in CPI
CPI: it aims to measure the effect of price changes while holding other economic factors constant, BUT in the real world goods and services keep changing their characteristics.
1. Introduction QA approaches in CPI
Quality adjustment: the process by which prices are adjusted to account for changes in product quality.
Matched model method:
• DIRECT COMPARISON: no quality change is observed, no adjustment
• EXPLICIT METHODS: based on observed changes in characteristics
• IMPLICIT METHODS: based on price changes observed for similar models
1. Introduction QA approaches in CPI
EXPLICIT METHODS
• Package size adjustment
• Single variable adjustment
• Option pricing
• Production cost adjustment
• Judgmental quality adjustment
• Supported judgmental quality adjustment
• Hedonic regression for price imputation
• Other hedonic regression methods
• Mixed approaches
IMPLICIT METHODS
• Price change taken as quality change
• Overlap
• Bridged overlap
• Monthly chaining and replenishment
• Class mean imputation
• Retropolation
2. Brief review on hedonic methods History and main issues
History:
• 1939 Court: concept
• 1961 Griliches: first application to CPI
• 1974 Rosen: theoretical basis
• 1986-88-90 Triplett: further studies
• 1996 Boskin Report: CPI bias
The quality adjustment is based on a regression model that expresses the price as a function of product characteristics:
$$p_i = h(c_i)$$
Price difference = quality change + pure price change
Assumption: consumers, when deciding to purchase a product, check and compare the characteristics of similar products.
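In logs, the decomposition "price difference = quality change + pure price change" can be written as an exact identity under a simplifying assumption that the slide does not make: both observed prices lie exactly on period-specific hedonic functions, $p_{\mathrm{old}}^{t-1} = h^{t-1}(c_{\mathrm{old}})$ and $p_{\mathrm{new}}^{t} = h^{t}(c_{\mathrm{new}})$. The time superscripts on $h$ are added here for illustration only.

```latex
\ln\frac{p_{\mathrm{new}}^{t}}{p_{\mathrm{old}}^{t-1}}
= \underbrace{\left[\ln h^{t}(c_{\mathrm{new}}) - \ln h^{t}(c_{\mathrm{old}})\right]}_{\text{quality change}}
+ \underbrace{\left[\ln h^{t}(c_{\mathrm{old}}) - \ln h^{t-1}(c_{\mathrm{old}})\right]}_{\text{pure price change}}
```

The first bracket values the change in characteristics at period-$t$ characteristic prices; the second is the shift of the hedonic function for an unchanged item. Evaluating the characteristics change with $h^{t-1}$ instead of $h^{t}$ gives an alternative split of the same total.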
2. Brief review on hedonic methods Functional forms
LOG-LOG MODEL
$$\ln p = \ln\beta_0 + \beta_1 \ln x_1 + \beta_2 \ln x_2 + \dots + \beta_k \ln x_k$$
SEMILOG MODEL
$$p = \beta_0 + \beta_1 \ln x_1 + \beta_2 \ln x_2 + \dots + \beta_k \ln x_k$$
LINEAR MODEL
$$p = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$$
2. Brief review on hedonic methods Hedonic price indexes
1. TIME DUMMY VARIABLE METHOD
It pools all available data on a product over several time periods, fits a regression model and uses the coefficients on the time variables as a direct measure of the price index (Ball and Allen, 2003):
$$\ln p_i^t = \alpha_0 + \sum_{t=1}^{T} \beta^t D_i^t + \sum_{k=1}^{K} \alpha_k \ln z_{ik}^t + \varepsilon_i^t$$
The regression coefficients of the dummy variables D represent the pure price change not associated with the product characteristics z.
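To make the time dummy method concrete, the sketch below fits the pooled regression on a small synthetic dataset and reads the index for period t as 100·exp(β^t), with period 0 as the base. Everything in it is an assumption for illustration: the six hypothetical models, their two characteristics (capacity in kg and spin speed in rpm), the parameter values used to generate noise-free log prices, and the helper names `ols`, `make_observations` and `time_dummy_index`. OLS is solved in pure Python through the normal equations only to keep the block dependency-free; real work would use a statistics library.

```python
import math

def ols(X, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical models: (capacity in kg, spin speed in rpm), illustrative only
MODELS = {"A": (5, 800), "B": (6, 1000), "C": (7, 1200),
          "D": (5, 1200), "E": (8, 1400), "F": (9, 1600)}
SAMPLE = {0: "ABCD", 1: "ABCDE", 2: "BCDEF"}   # the sample changes over time
BETA_TRUE = {0: 0.0, 1: -0.02, 2: -0.05}        # assumed pure log price changes vs period 0

def make_observations():
    """Noise-free synthetic data: ln p = 2 + beta_t + 0.6 ln(cap) + 0.4 ln(spin)."""
    obs = []
    for t, models in SAMPLE.items():
        for m in models:
            cap, spin = MODELS[m]
            ln_p = 2.0 + BETA_TRUE[t] + 0.6 * math.log(cap) + 0.4 * math.log(spin)
            obs.append((t, cap, spin, ln_p))
    return obs

def time_dummy_index(obs, periods):
    """Pooled time dummy regression; periods[0] is the base period.
    Returns {period: 100 * exp(beta_t)}."""
    X, y = [], []
    for t, cap, spin, ln_p in obs:
        dummies = [1.0 if t == s else 0.0 for s in periods[1:]]
        X.append([1.0] + dummies + [math.log(cap), math.log(spin)])
        y.append(ln_p)
    coef = ols(X, y)
    betas = coef[1:len(periods)]  # coefficients of the time dummies
    index = {periods[0]: 100.0}
    for s, b in zip(periods[1:], betas):
        index[s] = 100.0 * math.exp(b)
    return index

print({t: round(v, 2) for t, v in time_dummy_index(make_observations(), [0, 1, 2]).items()})
# {0: 100.0, 1: 98.02, 2: 95.12}
```

Because the synthetic prices contain no error term, the regression recovers the assumed pure price changes even though the set of models differs from period to period; with real data the β^t are estimates subject to sampling error.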
2. Brief review on hedonic methods Hedonic price indexes
2. CHARACTERISTICS PRICE INDEX METHOD
It uses the implicit characteristics prices in a conventional weighted index number formula. Advantage: the functional form and the index number formula can be chosen independently.
3. INDIRECT METHODS:
• THE HEDONIC PRICE IMPUTATION METHOD
• THE HEDONIC QUALITY ADJUSTMENT METHOD
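The two indirect methods can be sketched for a single replacement, assuming a log-log hedonic function has already been estimated for period t. The coefficient values, the two characteristics, the prices (in arbitrary units) and the function names are hypothetical and serve only to show the arithmetic: imputation predicts the vanished item's period-t price from the period-t function, while quality adjustment deflates the replacement's observed price by the estimated value of the characteristic differences.

```python
import math

# Hypothetical period-t log-log hedonic function (illustrative values only):
# ln p = const + 0.6 * ln(capacity_kg) + 0.4 * ln(spin_rpm)
COEF_T = {"const": 2.22, "capacity_kg": 0.6, "spin_rpm": 0.4}

def fitted_log_price(coef, item):
    """Predicted ln p for an item described by {characteristic: value}."""
    return coef["const"] + sum(coef[k] * math.log(v) for k, v in item.items())

def imputed_relative(coef_t, old_item, old_price_prev):
    """Hedonic price imputation: predict the vanished item's period-t price
    from the period-t function and compare it with its last observed price."""
    return math.exp(fitted_log_price(coef_t, old_item)) / old_price_prev

def quality_adjusted_relative(coef_t, old_item, new_item, old_price_prev, new_price_t):
    """Hedonic quality adjustment: remove the estimated value of the
    characteristic differences from the replacement's price (the constant
    cancels), then compare with the old item's previous price."""
    log_quality = sum(coef_t[k] * (math.log(new_item[k]) - math.log(old_item[k]))
                      for k in old_item)
    return new_price_t / math.exp(log_quality) / old_price_prev

old = {"capacity_kg": 5, "spin_rpm": 1000}   # priced at 400 in t-1, then vanishes
new = {"capacity_kg": 6, "spin_rpm": 1200}   # replacement priced at 460 in t
print(round(quality_adjusted_relative(COEF_T, old, new, 400.0, 460.0), 4))  # 0.9583
print(round(imputed_relative(COEF_T, old, 400.0), 4))
```

Here the quality factor is 1.2^0.6 × 1.2^0.4 = 1.2, so the raw 15% price rise (460/400) becomes a quality-adjusted fall of about 4.2% (460/1.2/400).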
2. Brief review on hedonic methods Matched model versus hedonic model
MATCHED MODEL METHOD
• samples may be unrepresentative or out of date
• sensitive to unusual price changes
• decisions about replacements may be faulty
• risk of upward or downward biases
HEDONIC METHOD
• it requires time and effort
• identifying the product characteristics that influence p is hard
• rapid changes in consumer tastes
• multicollinearity
• choice of functional form
• whether or not to weight observations
3. International practices Brief review
• BLS-USA: rent, PCs, televisions, audio equipment, video cameras, DVD players, refrigerators, microwave ovens, washing machines, clothes dryers
• CANADA: PCs
• FRANCE: dishwashers, PCs, TVs, books
• SWEDEN: clothing, PCs
• UK: washing machines, PCs
• GERMANY: PCs, cars, EDP products, electrical household appliances and entertainment electronics, owner-occupied dwellings
3. International practices European Regulation
Context: Regulation 1749/96
Definitions: quality change, quality adjustment, replacement price
Minimum standards: the price difference between two items (the substitute and the item it replaces) must not be attributed entirely to the quality change (unless this is justified)
3. International practices Eurostat guidelines
Classification of QA methods:
• A methods (reference methods)
• B methods (methods that may be used)
• C methods (methods that should not be used)
Task force and case-by-case approach (by specific product area)
4. Italian practices General review
• Centralised data collection: sampling approach based on consumer preferences
• Territorial data collection (prevalent practices):
a. Direct comparison (no QA)
b. Package size adjustment (explicit QA)
c. Overlapping link (implicit QA)
4. Italian practices Product area: Durables in 2005
• Rating of QA methods for durables uncertain at Eurostat level
• Data collection moved from quarterly to monthly frequency
• 54 products (12 whose prices are collected by Istat, 42 by Municipal Offices of Statistics)
• 210,697 elementary prices collected in 12 months (for the 42 products)
4. Italian practices Substitutions and QA of Durables in 2005
Durables in Italian Consumer Price Survey: substitutions and QA in 2005
                                   Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
Total amount of substitutions      861  1769   515   504  1405   493   423   895   366   506  1192   484
% on total monthly observations    4.9  10.1   2.9   2.9   8.0   2.8   2.4   5.1   2.1   2.9   6.8   2.8
% of Direct Comparison             2.4   1.7   1.0   0.6   1.2   1.2   0.5   1.8   0.5   0.6   1.9   3.5
% of Overlapping link             97.6  98.3  99.0  99.4  98.8  98.8  99.5  98.2  99.5  99.4  98.1  96.5
4. Italian practices Substitutions and QA of clothes washing machines in 2005
Clothes washing machines in Italian Consumer Price Survey: substitutions and QA in 2005
                                   Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
Total amount of substitutions       42    60    41    33    37    21    24    33    25    43    37    27
% on total monthly observations    8.6  12.3   8.4   6.8   7.6   4.3   4.9   6.8   5.1   8.8   7.6   5.5
% of Direct Comparison             4.8   3.3   0.0   0.0   0.0   0.0   0.0   3.0   0.0   0.0   0.0  11.1
% of Overlapping link             95.2  96.7 100.0 100.0 100.0 100.0 100.0  97.0 100.0 100.0 100.0  88.9
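The two tables report shares rather than counts. The short sketch below, assuming that the direct-comparison and overlapping-link percentages are shares of each month's substitutions (they sum to 100 in every month), converts the washing-machine shares back to item counts and sums the monthly substitutions transcribed from the tables. The constant and function names are hypothetical.

```python
# Monthly figures (Jan-Dec 2005) transcribed from the two tables above
DURABLES_SUBS = [861, 1769, 515, 504, 1405, 493, 423, 895, 366, 506, 1192, 484]
WASHER_SUBS = [42, 60, 41, 33, 37, 21, 24, 33, 25, 43, 37, 27]
WASHER_DIRECT_PCT = [4.8, 3.3, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 11.1]

def implied_counts(totals, pct):
    """Turn rounded monthly percentages back into whole numbers of items,
    assuming each percentage is a share of that month's substitutions."""
    return [round(n * p / 100) for n, p in zip(totals, pct)]

direct = implied_counts(WASHER_SUBS, WASHER_DIRECT_PCT)
print("Durables substitutions in 2005:", sum(DURABLES_SUBS))       # 9413
print("Washing machine substitutions in 2005:", sum(WASHER_SUBS))  # 423
print("Washing machine direct comparisons per month:", direct)
# [2, 2, 0, 0, 0, 0, 0, 1, 0, 0, 0, 3]
```

Read this way, December's 11.1% direct-comparison share corresponds to about 3 of 27 substitutions, and only about 8 of the year's 423 washing-machine substitutions were treated by direct comparison.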
5. Case study Brief description of the sample
• Subset of the prices collected for clothes washing machines and used to compute the Italian CPI
• Different types of retail outlets
• Period: January-December 2005
5. Case study Brief description of the sample
• A panel of about 250 elementary items per month
• Data collected in 48 chief towns
• More than 31 brands considered
5. Case study Brief description of the sample
Each elementary item (model) has been classified according to six characteristics (the characteristics vector of the hedonic regression):
a. Class of energy consumption (quantitative)
b. Charge capacity (quantitative)
c. Spin dry velocity (quantitative)
d. Brand (qualitative)
e. Washing efficiency (qualitative)
f. Delivery service (qualitative)
5. Case study The dummy model adopted
• The regression model adopted belongs to the “Adjacent Period Time Dummy variable approach”, in Triplett's terminology (APTD for short)
• The regression equation used is:
$$\ln p_{it} = \alpha_0 + \sum_{k=1}^{K} \alpha_k z_{ik} + \beta D_{\tau+1} + \varepsilon_{it}, \qquad t = \tau, \tau+1$$
5. Case study The dummy model adopted
• where:
• the $z_{ik}$ variables refer both to quantitative variables (entered in logs) and to qualitative variables (modelled using dummies);
• $D_{\tau+1}$ is the time dummy: it takes the value 0 (1) when the price observation refers to period τ (τ+1).
5. Case study The dummy model adopted
• APTD implies a single hedonic function that covers the data of two adjacent periods
• The anti-log of the coefficient β, $e^{\beta}$, measures the ratio between washing machine prices in the two periods, holding the characteristics of the washing machine fixed; $e^{\beta} - 1$ is the corresponding percentage change
5. Case study The dummy model adopted
• Specifically, if β is negative (positive), it measures the rate at which the price of the washing machine is falling (rising) between periods τ and τ+1
• β therefore measures the pure price change
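A minimal sketch of the APTD approach on synthetic data may help. The models, their characteristics (capacity, spin speed and a two-level brand variable standing in for the qualitative characteristics), the parameter values and the helper names are all hypothetical and are not the Istat data. Each pair of adjacent months gets its own regression, with quantitative characteristics in logs, a 0/1 dummy for the qualitative one and the time dummy D_{τ+1}; exp(β) − 1 gives the month-on-month pure price change, and the links are chained into an index.

```python
import math

def ols(X, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical models: (capacity in kg, spin speed in rpm, brand), illustrative only
MODELS = {"A": (5, 800, "Y"), "B": (6, 1000, "X"), "C": (7, 1200, "Y"),
          "D": (5, 1200, "X"), "E": (8, 1400, "Y"), "F": (9, 1600, "X")}
SAMPLE = {0: "ABCD", 1: "ABCDE", 2: "BCDEF"}   # models priced in each month
LEVEL = {0: 0.0, 1: -0.01, 2: -0.04}            # assumed pure log price level
BRAND_X = 0.15                                  # assumed log premium of brand X

def ln_price(model, t):
    """Noise-free synthetic log price of a model in month t."""
    cap, spin, brand = MODELS[model]
    return (2.2 + LEVEL[t] + 0.6 * math.log(cap) + 0.4 * math.log(spin)
            + (BRAND_X if brand == "X" else 0.0))

def aptd_beta(tau):
    """Regress ln p on a constant, log capacity, log spin speed, a brand dummy
    and the time dummy D_{tau+1}, pooling months tau and tau+1 only."""
    X, y = [], []
    for t in (tau, tau + 1):
        for m in SAMPLE[t]:
            cap, spin, brand = MODELS[m]
            X.append([1.0, math.log(cap), math.log(spin),
                      1.0 if brand == "X" else 0.0,
                      1.0 if t == tau + 1 else 0.0])
            y.append(ln_price(m, t))
    return ols(X, y)[-1]  # coefficient of D_{tau+1}

index = {0: 100.0}
for tau in (0, 1):
    beta = aptd_beta(tau)
    index[tau + 1] = index[tau] * math.exp(beta)  # chain the adjacent-period links
    print(f"month {tau} -> {tau + 1}: beta = {beta:.4f}, "
          f"pure price change = {100 * (math.exp(beta) - 1):.2f}%")
print({t: round(v, 2) for t, v in index.items()})
```

Because each regression uses only two adjacent months, the characteristic coefficients are free to change over time, at the cost of chaining the monthly links. With noise-free synthetic prices the fitted β values reproduce the assumed monthly log changes of −0.01 and −0.03.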
6. Conclusive remarks
• A crucial issue: collecting qualitative information in the field (the experimental approach in Milano, using digital cameras and portable PCs)
• Future developments:
a. possible extension of the experimentation of hedonic prices to other product areas (starting from clothes and footwear)
b. implementing a single classification of brands and technical characteristics
Proceedings of Q2006 European Conference on Quality in Survey Statistics
Improving Questionnaire Translations and Translation Processes Barbara H. Forsyth, Martha Stapleton Kudela, Kerry Levin, Deirdre Lawrence and Gordon B. Willis 1
1. Introduction This paper explores procedures for developing and evaluating questionnaire translations for surveys administered in multiple languages. We focus on recent work for the National Cancer Institute (NCI) translating an English-language questionnaire on tobacco use into Mandarin Chinese, Cantonese Chinese, Korean and Vietnamese. We used an iterative translation, evaluation and review process. This paper describes the iterative process and lessons learned. Our purposes are pragmatic. Can we identify useful practices for developing and testing questionnaire translations? The Tobacco Use Survey (TUS) is administered in the United States as a component of the Current Population Survey (CPS). It asks questions about tobacco use patterns, smoking prevalence, workplace smoking policies, level of nicotine dependence, medical advice to quit smoking, quit attempts, cessation methods used, and changes in smoking norms and attitudes. Several features of the TUS questionnaire and the content area in general make this questionnaire a good candidate for translation into four Asian languages. Although the general Asian-American population tends to have a lower rate of tobacco use than the general U.S. population (Centers for Disease Control and Prevention, 1998), tobacco use rates vary considerably among Asian-American subgroups (Ma et al., 2004; Ma et al., 2002). Furthermore, local surveys show high tobacco use rates for some Asian-American subgroups (Lew & Tanjasiri, 2003; Ma et al., 2002). Because tobacco use is associated with serious health risks, it is important to conduct tobacco use surveys using methods that ensure complete and accurate data from Asian-Americans. Survey researchers have advocated a range of practices for producing effective translations (e.g., Census Bureau, 2004; Harkness, Van de Vijver & Mohler, 2003; McKay, Breslow, Sangster, Gabbard, Reynolds, Nakamoto & Tarnai, 1996). 
Translation researchers seem to prefer team-based approaches when study resources permit them. Team approaches generate more translation options and provide sounder and less idiosyncratic translation review and evaluation (Census Bureau, 2004; European Social Survey, 2002; Harkness et al., 2003). We selected an iterative, team-based approach based on the five-stage translation framework described in Harkness et al. (2003).
1 Barbara H. Forsyth, Martha Stapleton Kudela and Kerry Levin, Westat, 1650 Research Blvd., Rockville, MD, U.S.A., 20850; Deirdre Lawrence and Gordon B. Willis, National Cancer Institute, 6130 Executive Blvd., Rockville, MD, U.S.A., 20852
2. Translation and Evaluation Processes
We developed a 5-step model for translating the tobacco use questionnaire into Mandarin, Cantonese, Korean and Vietnamese. The five steps are: translation; review; initial adjudication; cognitive interview pretesting; and final review and adjudication. This paper focuses on the procedures we used to conduct each step.
2.1 Translation
The translation staff consisted of three independent professional translators. One was multilingual in English, Mandarin and Cantonese, one was bilingual in English and Korean and one was bilingual in English and Vietnamese. The translation project aimed to develop translations that “ask the same questions” (e.g., Harkness et al., 2003). Based on these instructions, the three translators worked independently to produce target-language translations. A translation coordinator supervised their work and was available as needed to answer questions and provide guidance.
The translation step yielded 3 sets of translated items: a single translation for the Mandarin and Cantonese dialects of Chinese, a second translation in Korean, and a third translation in Vietnamese. Translators also provided documentation that described specific translation challenges they encountered and the decisions they made to deal with these challenges. These translation products were the input for the next review step.
Because of the way our research was funded, the translation step was independent of the succeeding review, adjudication and pretesting steps. We would prefer to set up collaborative working relations among translators, reviewers and adjudicators early in the translation step. Fortunately, we were able to implement a more collaborative approach after the initial translation step was finished.
Lessons learned: Translation step
• Involve review, adjudication and pretest staff early in the translation step, while setting up translation goals.
• Include early review as part of the translation step to identify and eliminate ambiguities in translation task specifications before translation moves too far ahead.
• Give unambiguous instructions to translators, including the reasons for and structure of the survey interview conversation.
2.2 Review
For the review, adjudication and pretesting phases, we created the position of Survey Language Consultant (SLC) for each of the target languages. The SLCs fulfilled two broad functions: they reviewed the initial questionnaire translations to identify translation options, and they supervised cognitive interview pretest activities. Using the same staff for review and pretest roles was a way to ensure that review results informed pretesting designs and that pretesting results informed ongoing review activities. We hired a total of 4 SLCs. Three of them were engaged early in the project, and we hired a fourth later when we recognized errors in the initial rounds of reviewing and pretesting for the Vietnamese-language translation. One SLC was multilingual in English, Mandarin Chinese and Cantonese Chinese, one was bilingual in English and Korean, and two were bilingual in English and Vietnamese.
At the outset, we anticipated that the review and initial adjudication steps would be relatively brief and straightforward. Early on, it became clear that the project would benefit from more formal review and adjudication processes. We developed a template that SLCs used to structure their reviews and to document the results. SLCs completed the template by identifying items in the target translations that seemed problematic, describing the reasons each item was problematic, suggesting a possible solution, and describing how each suggested revision would improve the target translation.
The SLCs had different levels of experience with survey methods. We developed training materials, on-the-job training activities, and other feedback and support resources to provide the survey methods background needed to ensure comparable levels of review across the three target-language questionnaires. We conducted a 2-hour training session to give SLCs information about the survey purposes and measurement goals, the objectives of the translation review tasks, how to use the review template to accomplish their reviews, and cognitive pretest interview methods. We set up routine biweekly group meetings to discuss review progress, questions, and interim results.
In addition, we set up informal meetings with SLCs as needed to discuss more detailed questions, language-specific issues, and unexpected problems. The templates that SLCs used to document the results of their reviews, together with the original translations and translator notes, were inputs to the initial adjudication phase.
Lessons learned: Review step
• Previous experience with survey methods is useful but not necessary to ensure effective input from reviewers.
• SLCs who lack previous survey experience benefit from ongoing conversations with research staff and each other about item intent, wording and translation options.
• Engage reviewers early, during translation, to reduce the need for large-scale revisions after translation is completed and to benefit from direct interaction between translators and reviewers.
2.3 Initial Adjudication
Effective adjudication requires knowledgeable and versatile adjudicators (e.g., Census Bureau, 2004; Harkness et al., 2003; Harkness, Pennell & Schoua-Glusberg, 2004). Working through university research centers specializing in tobacco-related research, we found a lead adjudicator who had subject-matter expertise, translation experience, and familiarity with survey data collection methods. The lead adjudicator spoke Chinese as her first language and was skilled in the Mandarin and Cantonese dialects. Through her academic appointment, she had access to other tobacco researchers who spoke Vietnamese or Korean as their strongest (or “first”) languages. These researchers formed the adjudication team. Wherever SLCs identified potential problems with a target-language translation, the adjudicators’ task was to review the problem and the suggested solution and to decide whether and how to revise the original translation. We adopted all decisions made by the adjudication team during the initial adjudication phase and folded them into the original translations to produce a pretest version for each target language.
Lessons learned: Initial adjudication step
• University research centers are good resources for knowledgeable and versatile adjudicators.
• The structure of the adjudication process was effective. All adjudicators’ decisions fit within the task guidelines, and reviewers and translators respected the adjudicator role.
2.4 Cognitive Interview Pretest
Cognitive interviews are structured, open-ended interviews designed to gather detailed information about the cognitive processes respondents use to understand and answer survey questionnaire items (Beatty, 2004; Forsyth & Lessler, 1991; Willis, 2005). For example, results can identify items that use unfamiliar or inappropriate terminology, items that respondents interpret in unexpected ways, or items that ask for information respondents have difficulty remembering. When cognitive interviews are used to test a questionnaire translation, results can also reveal translation deficiencies: for example, target-language terms that respondents interpret differently than intended, target-language terms that are unfamiliar, or target-language terms that have culture-specific meanings.
We developed an English-language cognitive interview script that consisted of the questionnaire items with cognitive probes inserted after selected questions. For example, probes asked respondents to describe how they understood particular question and response wordings, whether the response sets seemed incomplete, and whether any questions asked for information they had difficulty recalling. SLCs translated the cognitive interview script into the target languages.
Each SLC hired two cognitive interviewers to conduct cognitive pretest interviews in the appropriate target language. We conducted a 6-hour session to train interviewers to administer cognitive interviews in their target languages. The training session gave overviews of standard interview practices and conventions and of the content of the tobacco use items. In addition, the training covered cognitive interview goals and techniques, reviewed the cognitive interview probes and their purposes, and included an English-language demonstration of a cognitive interview. Most of the training was in English, but trainees spent roughly 2 hours using role-playing methods to conduct practice cognitive interviews in their target languages.
Lessons learned: Train cognitive interviewers
• Include separate reviews of the questionnaire and the cognitive interview script to highlight the different functions of survey questions and cognitive interview probes.
• Provide ample time for monitored practice and feedback. Conduct monitored practice interviews both in English and in the target language.
• Give guidelines and additional practice in adapting cognitive interview probes as necessary to avoid repetition and to follow up unexpected responses.
SLCs recruited respondents who had limited ability to speak or understand English, seeking a mix of education, years in the United States, gender, and socioeconomic status (represented by occupation). Our budget was not large enough to also include respondents from a mix of regional dialects within each target language.
Lessons learned: Recruit respondents
• Document respondent recruiting activities in detail. Details will help staff evaluate alternative approaches and suggest better approaches to anyone who has trouble finding eligible volunteer respondents.
• Word-of-mouth contacts through community networks were effective for finding and recruiting eligible volunteer respondents. Contacts through professional networks were particularly productive.
• For word-of-mouth recruiting methods, SLCs should be prepared to provide extensive explanation about the study, including study goals, what it requires of respondents, and the SLC’s role in the study.
• Volunteer respondents had relatively few concerns about participating in the pretest interviews.
We anticipated the cognitive interviews would last about an hour, and we paid respondents a small incentive to thank them for their time. SLCs suggested supplementing the incentives with small gifts such as fruit or cookies, particularly when interviews were held in respondents’ homes. For example, the Vietnamese-language interviews coincided with the Moon Festival, and interviewers brought moon cakes, an ethnic Vietnamese sweet, for respondents. Other interviewers brought fruit baskets or gift certificates for local ethnic food markets.
We conducted two rounds of cognitive interviews. In the first round, 6 interviewers conducted a total of 27 interviews. Nine interviews tested the Chinese-language translations; roughly half of them were in Mandarin and roughly half in Cantonese. Nine interviews tested the Korean-language translation, and nine tested the Vietnamese-language translation. Results from the first round of cognitive interviews revealed important problems with the Vietnamese-language translation. After the Vietnamese-language translation was re-reviewed and revised, 2 new cognitive interviewers conducted a second round of 5 interviews to test the revised Vietnamese-language translation.
SLCs observed all cognitive interviews, took detailed notes, reviewed interview audiotapes, and wrote a summary for each interview. SLCs monitored cognitive interview quality based on their observations, notes and audiotape reviews. We scheduled debriefing meetings with individual SLCs every second or third interview to give them an opportunity to report any issues or concerns about interviewer performance. SLCs used these meetings as opportunities to gather suggestions, advice and operational support. Research staff used these meetings to monitor quality and identify interim results.
Based on interim results reported by the Vietnamese-language SLC, we came to doubt the quality of the cognitive interviews and of the translation itself. We hired a second Vietnamese-language SLC, who reviewed tape recordings from the Vietnamese-language interviews. He discovered a variety of errors.
Cognitive probes and interviewer instructions were translated incorrectly, tested survey items were administered improperly, and interviewers mistakenly read interviewer instructions and skip patterns to respondents. We decided we could not trust results from the first round of Vietnamese-language interviews. The new Vietnamese-language SLC followed our established review and adjudication process to revise the Vietnamese-language translation. Also, he hired and trained 2 new cognitive interviewers to conduct a second round of 5 interviews testing the revised Vietnamese-language translation. We have reported the cognitive interview results elsewhere (Willis et al., 2005a; 2005b). The focus here is on identifying effective analytic processes. SLCs’ interview summaries were the primary source for identifying key findings and recommendations. We asked SLCs to write summaries that “described everything that happened.” We coached them to avoid editing out details that seemed unimportant to them, and to include information about interviewer behaviors and observer reactions as well as information about respondent behaviors and reactions. The resulting, inclusive summaries gave analysts access to a full range of reactions, including verbal, nonverbal and emotional reactions from respondents and reactions from interviewers and observers.
Lessons learned: Conduct and summarize cognitive interviews • Introduce independent quality control activities early, during initial review and adjudication steps. Careful quality control activities are particularly important when relatively few research staff are skillful with the target languages. • Anticipate retraining. Use SLCs’ summaries and reviews to identify retraining needs. Look for adequate detail, minimal redundancy within interviews, and evidence of proper use of cognitive interview probes. • Add active quality control activities during initial cognitive interviews to support interview observation and SLC debriefing activities. • The following active quality control activity seems promising to us because it increases the likelihood that early interviews will be as useful as later interviews. − Break the first round of cognitive interviews into small sets of 1 or 2 interviews. − Separate these sets, giving time to observe and/or review interviews and re-train interviewers as needed. − When possible, observe initial interviews using simultaneous interpretation to ensure observers have access to all interview details. − Engage a range of staff in interview observation and review to ensure input from several viewpoints. • Consider using SLCs to conduct cognitive interviews rather than hiring a separate cognitive interviewing staff to further benefit from the knowledge and perspective SLCs gain during review and adjudication steps.
We used qualitative analytic methods to review the summaries and identify general themes. As part of the analysis, we often consulted with SLCs to verify that we were interpreting interview results correctly and to gather advice about recommendations and priorities. SLCs reviewed draft reports and recommendations and provided additional feedback. We used their feedback to finalize a set of recommendations in preparation for the final review and adjudication.
Lessons learned: Identify key findings and develop recommendations • The general and nondirective instruction to “describe everything that happened” helped SLCs produce detailed and useful interview summaries.
2.5 Final review and adjudication
We convened a meeting to conduct the final review and adjudication. At this meeting, NCI’s project director determined which recommendations to accept and what revisions to make to the three target-language questionnaires. We did not attempt to define specific roles for research staff and the SLCs during the final review and adjudication step. We found that research staff tended to focus on identifying and classifying problems respondents had understanding and answering survey questions, while SLCs focused on giving examples and context to illustrate the types of problems observed and to clarify effects on survey responses. The project director found that both types of information were necessary to make good decisions about revising the three target-language questionnaires.
The initial and final adjudication steps differed considerably, both in the number of changes considered and in the kinds of changes considered. At the initial adjudication step, adjudicators made extensive revisions to the target-language questionnaires, and most of the changes focused on improving the individual translations. In contrast, the project director made relatively few changes during the final adjudication step. Some of these changes improved the translations, but several were more general, addressing problems observed across all three translations.
We believe that the differences in the number and kinds of revisions made during the initial and final adjudication steps indicate the general success of the five-step model for translation and evaluation. The review and initial adjudication steps identified the lion’s share of the shortcomings in the draft translations, and these shortcomings were effectively addressed by changes made in the initial adjudication step. Thus, the cognitive testing and final adjudication steps could focus on more universal issues such as clarity of question purpose and response set completeness.
Lessons learned: Final review and adjudication step • Aside from the decision-making role filled by the project director, leave other roles unspecified. This gives research staff and SLCs freedom to select the topics about which they feel qualified to speak. • When early steps of the translation process effectively address translation errors and shortcomings, then later steps can focus on more general questionnaire design issues that may influence responses regardless of the language used to administer interviews.
3. Conclusions
In summary, we believe the 5-step translation and evaluation model implemented here produced effective target-language translations for the English-language tobacco use items. Our assessment is based on qualitative observations: review and cognitive interview results suggest the target-language translations effectively represent the source questionnaire, and successive rounds of review and evaluation produced successively smaller revisions. The translation and evaluation methods we employed supported a collaborative research environment, so the project benefited from the diverse kinds of expertise that individual team members brought to their tasks. We are currently conducting research to evaluate the 5-step translation and evaluation model more objectively, using behavior coding methods (Cannell, Fowler & Marquis, 1968) to quantify data quality. Our goal is to compare behavior coding results with the review and cognitive interview results reported here to determine whether revisions made on the basis of review and cognitive testing actually enhanced survey data quality.
References
Beatty, P. (2004), “The Dynamics of Cognitive Interviewing”, In Presser, S., Rothgeb, J.M., Couper, M.P., Lessler, J.T., Martin, E., Martin, J. & Singer, E. (eds.) Methods for Testing and Evaluating Survey Questionnaires, New York: Wiley.
Cannell, C. F., Fowler, F. J., & Marquis, K. (1968). The influence of interviewer and respondent psychological and behavioral variables on the reporting in household interviews. Vital and Health Statistics, Series 2, No. 26. Washington, DC: U.S. Government Printing Office.
Census Bureau (U.S.) (2004). Census Bureau Guideline: Language translation of data collection instruments and supporting materials. Suitland, MD. Issue date April 5, 2004.
Centers for Disease Control and Prevention (U.S.) (1998). Tobacco use among U.S. Racial/Ethnic Minority Groups – African Americans, American Indians and Alaska Natives, Asian Americans and Pacific Islanders, and Hispanics: A report of the Surgeon General. Atlanta, Georgia.
European Social Survey (2002). An outline of ESS translation strategies and procedures. Available at http://naticent02.uuhost.uk.uu.net/ess_docs/translation.pdf. Accessed September 30, 2005.
Forsyth, B.H. & Lessler, J.T. (1991), “Cognitive Laboratory Methods: A Taxonomy”, In Biemer, P.P., Groves, R.M., Lyberg, L.E., Mathiowetz, N.A. & Sudman, S. (eds.) Measurement Errors in Surveys, New York: Wiley.
Harkness, J.A., Pennell, B-E, & Schoua-Glusberg, A. (2004), “Survey Questionnaire Translation and Assessment”. In Presser, S., Rothgeb, J.M., Couper, M.P., Lessler, J.T., Martin, E., Martin, J. & Singer, E. (eds.) Methods for Testing and Evaluating Survey Questionnaires, New York: Wiley.
Harkness, J.A., Van de Vijver, F.J.R., & Mohler, P. (2003), Cross-Cultural Survey Methods, Hoboken, N.J.: Wiley.
Lew, R. & Tanjasiri, S.P. (2003). Slowing the epidemic of tobacco use among Asian Americans and Pacific Islanders. American Journal of Public Health, 93, 764-768.
Ma, G., Tan, Y., Toubbeh, J.I., Su, X., Shive, S.E., & Lan, Y. (2004). Acculturation and smoking behavior in Asian-American populations. Health Education Research, 19, 615-625.
Ma, G.X., Shive, S., Tan, Y. & Toubbeh, J. (2002). American Journal of Public Health, 92, 1013-1020.
McKay, R.B., Breslow, M.J., Sangster, R.L., Gabbard, S.M., Reynolds, R.W., Nakamoto, J.M., & Tarnai, J. (1996). “Translating survey questionnaires: Lessons learned”, New Directions for Evaluation, 70, pp. 93-105.
Willis, G.B. (2005), Cognitive Interviewing: A Tool for Improving Questionnaire Design, Thousand Oaks, CA: Sage.
Willis, G.B., Lawrence, D., Kudela, M., Levin, K. & Miller, K. (2005a). “The use of cognitive interviewing to study cultural variation in survey response.” Paper presented to the Questionnaire Evaluation Standards (QUEST) Workshop.
Willis, G.B., Lawrence, D., Thompson, F., Kudela, M., Levin, K. & Miller, K. (2005b). “The use of cognitive interviewing to evaluate translated survey questions: Lessons learned.” Paper presented to the Federal Committee on Statistical Methodology. Washington, DC.
11
Improving questionnaire translations and translation processes
Barbara Forsyth, Martha Kudela and Kerry Levin, Westat
Deirdre Lawrence and Gordon Willis, National Cancer Institute
Overview
• Tobacco use items
• 5-step translation model
• Review steps
• Survey Language Consultant role
• Implementation
• Lessons learned
• Conclusions
Tobacco use items
• Patterns of use
• Smoking prevalence
• Workplace smoking policies
• Nicotine dependence
• Advice about quitting
• Attempts to quit and methods used
• Changes in smoking attitudes
5-Step model
1. Translation
2. Review
3. Initial adjudication
4. Cognitive interview pretest
5. Final adjudication
Survey language consultants (SLCs)
• Reviewed initial translations
• Supervised cognitive interview pretests
• Four SLCs: Mandarin and Cantonese Chinese, Korean, Vietnamese
Step 1: Translation
• Three independent translators
• Instruction: “Ask the same questions”
• Products: translated survey items; documentation of translation issues and approaches
• Lesson learned: set up a more collaborative translation process when feasible
Step 2: Review
• Formal review process
• Template tool to identify problematic items, describe problems and suggest solutions
• Biweekly meetings: progress, questions and interim results
• Lesson learned: SLC experience with surveys is useful but not necessary
Step 3: Initial adjudication
• Adjudication team
• Adjudicator tasks: examine identified problems, assess suggested solutions, make decisions about changes to translations
Step 4: Cognitive interview pretest
• Cognitive interview purposes
• Design: 9 interviews in Chinese and Korean; 14 interviews in Vietnamese
• Procedures: script; cognitive interviewer training; incentives and gifts; observation and quality monitoring
Step 4: Cognitive interview pretest
• Lesson learned: further quality review (first round with smaller sets of interviews; time to observe, review and re-train; observation with simultaneous interpretation; use SLCs as cognitive interviewers)
• Analysis: SLC summaries describing “everything that happened”; qualitative analysis; input and feedback from SLCs
Step 5: Final adjudication
• Few additional revisions
• Some language-specific improvements
• Several more general revisions, addressing issues across all three target languages
Conclusions
• 5-step model and processes effective
• Translations judged accurate
• Successively smaller revisions across steps
• Team approach: translation benefited from the mix of individual expertise
Additional research
• More quantitative evaluation of translations and data quality
• Behavior coding field test
Proceedings of Q2006 European Conference on Quality in Survey Statistics, Cardiff (UK)
Cohort analysis of household budget data: Potentialities and limitations of analyzing social change in tobacco expenditures over time
Thomas Grund, Georgios Papastefanou, Matthias Fleck 1
1 Thomas Grund, Sidney Sussex College, University of Cambridge, CB2 3HU Cambridge, United Kingdom; Georgios Papastefanou, Centre for Survey Research and Methodology (ZUMA), P.O. Box 122155, 68072 Mannheim, Germany; Matthias Fleck, Centre for Survey Research and Methodology (ZUMA), P.O. Box 122155, 68072 Mannheim, Germany.
Key words: Household-budget data, Income and Expenditure Survey, cohort analysis, social change, individualization, consumption
1. Introduction
Our paper is part of a research project that asks whether the influence of socio-economic status on food consumption in Germany has changed over the last decades. Analyses of social change are often based on period comparisons of two or more cross-sectional datasets, which are used to identify a trend trajectory. We want to show that an adequate understanding of social change requires a further differentiation by cohort analysis, a methodological tool for separating age, period and cohort effects (Glenn 1977). Using differences in tobacco expenditure between blue-collar and white-collar workers in Germany, we show to what extent household budget data can be used for a cohort analysis. In doing so, we highlight the potential and the limitations of German household budget data for analysing social change in consumption behavior.
2. Social Change and Food Consumption
An influential theoretical approach that connects socially structured resources with lifestyles and consumption patterns was developed by Pierre Bourdieu (Bourdieu 1982). In his concept of the “Habitus”, social stratification and consumption behaviour map onto each other very closely. However, some theorists are sceptical about the importance of socio-economic status in modern societies (see e.g. Clark et al. 1993; Clark & Lipset 1991; Pahl 1989; Pakulski & Waters 1996; Waters 1994). In this context, theorists such as Beck (1992), Giddens (1973, 1995) and Bauman (1988, 1997) have developed an individualization argument: individual behavior is no longer governed by social structures but increasingly by individual preferences. Applied to food preferences, this argument implies that contrasts in consumption practices between blue-collar and white-collar households should have diminished over time. A straightforward empirical test of this hypothesis would be to compare the covariation of food practices and socio-economic position over time. But this strategy can be misleading, because it looks only at period processes and fails to account for other mechanisms that can also produce social change. In his seminal work on generational succession, Karl Mannheim (1928/1929) argued that social change is reflected in at least three fundamental processes: 1. a period process, 2. an age process and 3. a generation process. The period process postulates that contextual changes or historical events affect the population in a general and direct way. The age process means that behaviour changes as people get older, for biological, psychological or sociological reasons. Finally, the generational process means that people who share formative events and periods in a sensitive phase of their life develop enduring orientations and preferences. Applying these concepts to consumption patterns over time, we can specify the following research questions:
• Did social status differences in consumption, such as those between blue-collar and white-collar households, diminish in the course of societal change over the last decades?
• Is there an age process of consumption differentiation between blue-collar and white-collar households that contributed to the observed changes over time in social status differences?
• Is it possible to disentangle observed consumption differences between blue-collar and white-collar workers over time into a cohort process and a period process?
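The three processes cannot simply be estimated side by side, because the birth cohort is fully determined by survey year and age (cohort = period - age). The following Python sketch is our own illustration, not part of the authors' method: it builds a design matrix with an intercept and linear age, period and cohort terms for the age-by-period cells used later in the paper, and computes its rank exactly. Representing the open "up to 24" group by the age 20 is an assumption made only for this example.

```python
from fractions import Fraction

# Survey years analysed in the paper.
PERIODS = [1983, 1993, 1998, 2003]
# Lower bounds of the 5-year age groups; 20 stands in for the open
# "up to 24" group (an illustrative assumption).
AGE_LOWER_BOUNDS = list(range(20, 65, 5))


def design_matrix(periods, ages):
    """One row per age-by-period cell: intercept, age, period, cohort,
    where cohort is the implied birth year period - age."""
    return [[1, age, period, period - age] for period in periods for age in ages]


def matrix_rank(rows):
    """Rank of a matrix via exact Gauss-Jordan elimination on Fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


X = design_matrix(PERIODS, AGE_LOWER_BOUNDS)
print(len(X[0]), matrix_rank(X))  # 4 3
```

The rank is 3 although there are 4 columns: the cohort column adds no information beyond age and period, so a model with linear terms for all three is not identified without further constraints. This identification problem is what motivates reading the cohort table descriptively from several angles.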
3. Adequacy of Data: Potentialities and Limitations
One way to obtain empirical answers to these questions is a cohort analysis with adequate data. But which data are suitable for a cohort analysis? For Germany, the focus of one part of our research, the data produced by the official Income and Expenditure Survey (“Einkommens- und Verbrauchsstichprobe”, EVS) seem to be the most promising source. However, they also have some important limitations. We assess the adequacy and limitations of the data along several dimensions of representation.
3.1 Time representation
How are age, cohort and period time represented in the dataset? Period time now covers a span of more than 40 years: the first survey was conducted in 1962/63, it was repeated in 1969 and 1973, and thereafter every 5 years, from 1978 to 2003. In all surveys the age and year of birth of the household head are recorded.
3.2 Representation of food consumption
The surveys are designed to measure household food expenditure comprehensively. Households record the money spent on all food and drink items in diaries over a one-month period. Consumption over a whole year is then covered by 12 sub-samples of households, each recording a different calendar month. The food classification system has remained the same over the years, which is an important strength; however, information on specific items may not have been collected some decades ago, because those items were practically absent from the market at the time. One example is expenditure on exotic fruits, which were introduced to the market relatively recently. For other items, such as tobacco, this is not a problem. The EVS measures consumption at the household level, so there is no information on individual consumption within the household. To compare the consumption patterns of different households, either a) the analysis has to be restricted to households with the same composition, or b) a per-head consumption indicator has to be used, which requires an appropriate weighting scheme reflecting the number of household members.
3.3 Representation of socio-economic status over time
Besides expenditure and income, the EVS program collected information on household composition and the household’s type of residence, as well as on the socio-economic status of each household member.
By using the data on the household head (the main income earner), it is possible to characterize the social status of the household. In more recent surveys the EVS expanded the number and detail of the variables; in the latest survey, for example, educational status is also available. This is not the case in earlier surveys. For the analysis of change, therefore, only very general information on social status is available, distinguishing between blue-collar workers, white-collar workers, civil servants, the self-employed and farmers. This also limits the interpretation of changes, because it is plausible that the occupational and educational composition of, for example, the group of blue-collar workers changed significantly between 1983 and 2003.
3.4 Population representation
A cohort analysis of social status differentiation requires a large number of cases, because the sample has to be broken down by age, period, cohort and social status groups. The German EVS is suitable for this kind of analysis because its samples are very large: the datasets available as scientific use files each contain more than 10,000 households. This is a very good basis for comparing homogeneous groups, which makes the interpretation more clear-cut. Since the start of the EVS, samples have been drawn using a quota design. Households that have declared their willingness to participate are incorporated into the survey according to a distribution scheme. This scheme is based on the distribution of state, household type, income status and occupational status in the Microcensus, a compulsory annual labor force survey. At best, therefore, the EVS data match the Microcensus quotas on these variables. Owing to voluntary participation and the relatively high respondent burden, among other things, the EVS samples show a strong tendency towards self-selection bias. This leads to a middle-class bias, a well-known phenomenon in sample surveys with voluntary participation (Hartmann 1990; Hartmann/Schimpl-Neimanns 1992). Because the chances of contacting selected households, and their willingness to cooperate both initially and during the survey period, differ systematically, the data are biased for several subgroups. Under-representation can be observed for: single-person households; households headed by very young and, especially, relatively old persons; households of blue-collar workers, the self-employed and farmers; extremely rich and very poor households; and households of foreigners (Becker/Hauser 2003; Börsch-Supan/Reil-Held/Schnabel 2003; Braun 1978; Lang 1998; Pöschl 1993).
Households of blue-collar workers, the self-employed and farmers are less inclined to cooperate than those of white-collar workers and civil servants. Although the EVS is regarded as the best sample data source in Germany for the various sources of income (and for expenditure as well), the under-representation of extremely rich and very poor households is evident. Households of foreigners (sampled since 1993) are so severely under-represented that an analysis of this group is not justifiable. With regard to drop-outs during the survey period, there are some indications that the subgroups that are hard to recruit are also more likely to drop out during the observation period.
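The quota design described in 3.4 allocates participating households to cells defined by Microcensus distributions. The exact EVS allocation rules are not given here, so the following Python sketch only assumes proportional allocation with largest-remainder rounding; the cells and weights are hypothetical and are not Microcensus or EVS figures.

```python
from math import floor


def allocate_quotas(weights, total):
    """Split `total` households across quota cells in proportion to
    `weights`, using largest-remainder rounding so quotas sum to `total`."""
    weight_sum = sum(weights.values())
    exact = {cell: total * w / weight_sum for cell, w in weights.items()}
    quotas = {cell: floor(x) for cell, x in exact.items()}
    leftover = total - sum(quotas.values())
    # Give the remaining units to the cells with the largest fractional parts.
    ranked = sorted(exact, key=lambda cell: exact[cell] - quotas[cell], reverse=True)
    for cell in ranked[:leftover]:
        quotas[cell] += 1
    return quotas


# Hypothetical cells (occupational status, household type) with made-up
# relative weights; these are NOT Microcensus or EVS figures.
weights = {
    ("blue-collar", "single"): 7,
    ("blue-collar", "couple"): 18,
    ("white-collar", "single"): 12,
    ("white-collar", "couple"): 25,
    ("other", "single"): 16,
    ("other", "couple"): 22,
}
quotas = allocate_quotas(weights, 333)
for cell, n in quotas.items():
    print(cell, n)
print(sum(quotas.values()))  # 333
```

Largest-remainder rounding keeps the quotas summing exactly to the target. A quota scheme of this kind controls only the composition of the sample across cells; it does not remove the self-selection within cells discussed above.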
4 Data, Indicators and Method
4.1 Dataset
Our illustrative study of differences in tobacco expenditure between blue- and white-collar workers is based on scientific use files of the German Income and Expenditure Survey, the EVS (Einkommens- und Verbrauchsstichprobe). The EVS, which began in 1962/63, collects income and expenditure data of private households and is based on a stratified quota sample of about 0.2 percent of all private households in Germany. Participation is voluntary. For the present analysis we had access to data from 1983, 1993, 1998 and 2003. Specifically, we analysed the data of a supplementary subsample of households that kept a one-month diary of expenditure on food and beverages (survey on nutrition, beverages and tobacco). To make the data comparable over time, we restricted the 1993, 1998 and 2003 samples to households resident in West Germany whose household head had German citizenship. The 1983 sample consisted of 21 968 households, the 1993 sample of 12 638 households, the effective 1998 sample of 9 825 households, and the 2003 sample of 25 736 households.
4.2 Indicators
In the EVS surveys of 1983, 1993, 1998 and 2003, expenditure on tobacco had the same classification, so it could be analysed consistently over time. To compare the expenditures of households with different compositions, we divided the amount of money spent on tobacco by the number of adults in each household. Expenditures were recorded in Pfennig (hundredths of the Deutsche Mark, the German currency before the changeover to the euro). Because the 2003 survey used the euro, we converted all values to euro cents per month. Changes in expenditure may, however, also reflect tax increases or inflation between 1983 and 2003. Without elaborating on the causes of these developments, and hence without correcting the level of expenditures, we can still analyse the differences between the two groups in the observed expenditure data. For this purpose we expressed the net difference between the two groups relative to the tobacco expenditure of white-collar workers.
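The indicator construction in 4.2 can be sketched in a few lines of Python. The sketch assumes that Pfennig amounts were converted with the irrevocably fixed rate of 1.95583 DM per euro (the paper does not state its conversion method), and the function names are our own. Applied to the rounded Table 2 inputs for 1983 and 1993, it reproduces the published relative differences.

```python
DM_PER_EUR = 1.95583  # irrevocably fixed Deutsche Mark / euro conversion rate


def pfennig_to_euro_cent(pfennig):
    """Convert Pfennig to euro cents: 100 Pfennig = 1 DM and
    100 cent = 1 EUR, so the factor is simply 1 / 1.95583."""
    return pfennig / DM_PER_EUR


def per_adult(household_amount, n_adults):
    """Household tobacco expenditure divided by the number of adults."""
    if n_adults < 1:
        raise ValueError("a household needs at least one adult")
    return household_amount / n_adults


def relative_difference(blue, white):
    """Blue-collar minus white-collar expenditure, in per cent of the
    white-collar expenditure."""
    return 100 * (blue - white) / white


# 1983 and 1993 rows of Table 2 (EUR-cent per adult per month).
print(round(relative_difference(1991, 1723)))  # 16
print(round(relative_difference(2741, 2012)))  # 36

# Hypothetical household: 6000 Pfennig per month, two adults.
print(round(per_adult(pfennig_to_euro_cent(6000), 2)))
```

Dividing by the number of adults is the simplest per-head weighting; the sketch keeps that choice rather than substituting an equivalence scale.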
By using such a relative measure, we can track differences between social status groups over time despite changes in the overall level of expenditure.
4.3 Method
In this paper we focus on the tobacco expenditures of blue-collar and white-collar households in West Germany from 1983 to 2003 and track the relative differences between these social status groups over time. In a further step we conducted subgroup analyses and examined these relative differences between blue- and white-collar households for different age groups. For this purpose we used the age of the household head (the main income earner) to assign households to an age group. To derive a standard cohort table, we split our samples into 5-year age groups. The first age group had to be defined as “up to 24 years” because of the information available in the datasets; the other age groups were “25 to 29 years”, “30 to 34 years”, etc. The last category we could construct was “70 years and older”; however, we only looked at households up to age 64, because there were only a few employed household heads older than 64 in our datasets. Even after breaking our datasets down into subgroups, we still had a sufficiently large number of cases in all subgroups (see Table 1).
Table 1: Minimum number of cases in the EVS, 1983 to 2003¹
Group | 1983 | 1993 | 1998 | 2003
Overall | 337 | 200 | 145 | 423
Blue-collar workers | 67 | 21 | 22 | 51
White-collar workers | 103 | 75 | 49 | 186
¹ In all years the minimum number of cases for overall, blue-collar and white-collar households was in the age group “up to 24 years”.
As stated above, it is necessary to distinguish between period, age and cohort effects. In this paper we take a descriptive approach, analysing the cross-tabulation of the tobacco expenditure indicator by age and period. Examining this table from different angles brings us close to separating age, cohort and period effects. Figure 1 shows three schematic ways of reading the table that can help to identify a) period, b) age and c) cohort effects.
Figure 1: Three ways to read a standard cohort table: a) period effect, b) age effect, c) cohort effect (each panel has a period axis and an age axis)
5 Results
Looking at the overall indicator of the relative difference between blue- and white-collar households from 1983 to 2003, we observe increasing contrasts (Table 2). While in 1993 blue-collar households spent on average 36 per cent more on tobacco than white-collar households, by 2003 they spent 48 per cent more. In 1998 the contrast between the two groups is slightly smaller (28 per cent); nevertheless, the overall trend is one of growing differences.
Table 2: Net tobacco expenditures of blue- and white-collar worker households (EUR-cent per adult per month)
Year | Blue-collar workers | White-collar workers | Difference | Relative difference (%)
1983 | 1991 | 1723 | 268 | 16
1993 | 2741 | 2012 | 728 | 36
1998 | 2574 | 1010 | 563 | 28
2003 | 3210 | 2194 | 1016 | 48
Following the reasoning above, this table also needs to be broken down by age group. The results are shown in Table 3.
Table 3: Relative differences (per cent) between blue- and white-collar workers in their tobacco expenditures, by age group of the household head
Year | up to 24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49 | 50-54 | 55-59 | 60-64
1983 | -31 | 21 | 39 | 11 | 26 | 19 | 8 | 22 | -24
1988¹ | - | - | - | - | - | - | - | - | -
1993 | 64 | 73 | 58 | 29 | 44 | 26 | 5 | 16 | 10
1998 | 43 | 41 | 83 | 46 | 22 | 36 | -15 | -16 | 15
2003 | 34 | 107 | 100 | 40 | 42 | 38 | 47 | -13 | 68
¹ Data not available
5.1 Period Effect
To find out whether events between 1983 and 2003 affected the contrasts reported above, we follow Figure 1a and compare the values of each age group across the survey years in Table 3. A pure period effect (no age or cohort effects) would mean that the differences between age groups remain the same while the overall level changes over time. Comparing 1983 with 2003, we find increasing contrasts in the age groups from 25 to 54 years: blue-collar workers spend more on tobacco than white-collar workers. Comparing 1983 with 1993 or 1998, we also see an increase in contrasts, especially in the age groups from 25 to 49. For the groups up to 24 and 55+ years there is no clear trend from 1983 to 2003. This suggests a period effect of increasing contrasts in the age groups from 25 to 49 between 1983 and 2003. However, the large differences between age groups within a single year suggest that age or cohort effects may be present as well.
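The period reading of Figure 1a compares the same age group across survey years. As a minimal sketch, the Table 3 values (without the empty 1988 row) can be held in a Python dictionary and the change between two survey years computed for every age group; the data structure and the function name `period_change` are illustrative choices, not part of the original analysis.

```python
AGE_GROUPS = ["up to 24", "25-29", "30-34", "35-39", "40-44",
              "45-49", "50-54", "55-59", "60-64"]

# Relative differences (per cent) from Table 3; 1988 has no data.
TABLE3 = {
    1983: [-31, 21, 39, 11, 26, 19, 8, 22, -24],
    1993: [64, 73, 58, 29, 44, 26, 5, 16, 10],
    1998: [43, 41, 83, 46, 22, 36, -15, -16, 15],
    2003: [34, 107, 100, 40, 42, 38, 47, -13, 68],
}


def period_change(table, start, end):
    """Change in the relative difference of each age group between two
    survey years, i.e. the period reading of the cohort table."""
    return {group: table[end][i] - table[start][i]
            for i, group in enumerate(AGE_GROUPS)}


change = period_change(TABLE3, 1983, 2003)
for group, delta in change.items():
    print(f"{group:>8}: {delta:+d}")
```

Every group from 25 to 54 shows a higher contrast in 2003 than in 1983, in line with the text. A single start-to-end difference ignores the 1993 and 1998 values, so it is only a first summary of the period reading.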
5.2 Age Effect
A pure age effect would mean that each age group shows the same values in every year, while the contrasts differ with the age of the household head. Looking across the age groups in 1983, it is difficult to find a clear pattern. In 1993 the contrast does not decline steadily with age either. Nevertheless, there is a tendency for the contrast between blue- and white-collar workers to be stronger at younger ages, although the groups “up to 24”, “35 to 39” and “50 to 54” do not fit this pattern exactly. In 1998 and 2003 it is also difficult to identify a pattern. However, if we look only at ages 30 to 54, we again find diminishing contrasts among older people. In the youngest (under 30) and oldest (55+) groups there is considerable variation, and these groups do not fit the pattern observed for middle-aged households.
5.3 Cohort Effect
A pure cohort effect would show up in our table as identical differences when birth cohorts are followed through time. We do not find exactly the same values in the cells of people born in the same period. Nevertheless, a pattern of changing contrasts emerges when we compare different birth cohorts: the differences are much stronger in younger birth cohorts than in older ones. This suggests that year of birth affects the strength of the contrast between blue- and white-collar workers.
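The cohort reading of Figure 1c follows a birth cohort along a diagonal of Table 3: a household head aged a in survey year t was born roughly in t - a. A minimal Python sketch, assuming cohorts are labelled by the range of birth years implied by each closed 5-year age group (the open "up to 24" group is left out), collects the cells that belong to each cohort; the function name is hypothetical.

```python
from collections import defaultdict

# Lower bounds of the closed 5-year age groups in Table 3.
AGE_LOWER = [25, 30, 35, 40, 45, 50, 55, 60]

# Table 3 values for these age groups (the "up to 24" column is omitted).
TABLE3 = {
    1983: [21, 39, 11, 26, 19, 8, 22, -24],
    1993: [73, 58, 29, 44, 26, 5, 16, 10],
    1998: [41, 83, 46, 22, 36, -15, -16, 15],
    2003: [107, 100, 40, 42, 38, 47, -13, 68],
}


def cohort_diagonals(table, age_lower):
    """Group cells by birth cohort; the label (first, last) covers the
    approximate birth years of people aged age_lo..age_lo+4 in that year."""
    cohorts = defaultdict(list)
    for year in sorted(table):
        for age_lo, value in zip(age_lower, table[year]):
            label = (year - age_lo - 4, year - age_lo)
            cohorts[label].append((year, value))
    return dict(cohorts)


diagonals = cohort_diagonals(TABLE3, AGE_LOWER)
print(diagonals[(1954, 1958)])
```

Because the 1983 to 1993 gap spans two age groups, each cohort appears in at most four cells and many in only one or two, which is why the cohort reading in this paper remains descriptive.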
6 Discussion and Conclusions
In this illustrative study we have analysed relative differences between blue- and white-collar workers in their expenditure on tobacco between 1983 and 2003 in West Germany. A traditional analysis of the overall trend data shows increasing contrasts between blue- and white-collar households, which is inconsistent with the individualization argument (Table 2). By breaking down the overall trend results by age group, we were able to conduct a descriptive cohort analysis. Separately from an age effect and a period effect, we also identified a cohort effect on tobacco expenditure. This means that the trend towards increasing contrasts between blue- and white-collar households has at least two sources: general period changes in the societal context, and changes via the succession of generations. The contrasts between blue- and white-collar households in tobacco expenditure were most pronounced among younger-generation households. These tentative results show that it is necessary and worthwhile to try to decompose changes over time into generational, historical and age-based changes. With further refinement through multivariate analysis of the expenditure outcome measures, household expenditure data replicated over time seem to provide an interesting database for an empirical discussion of the individualization thesis.
References
Bauman, Z. (1988), Freedom. Milton Keynes: Open University Press.
Bauman, Z. (1997), Postmodernity and its Discontents. New York: New York University Press.
Beck, U. (1992), Risk Society. Towards a New Modernity. London etc.: Sage.
Becker, I.; Hauser, R. (2003), Anatomie der Einkommensverteilung. Ergebnisse der Einkommens- und Verbrauchsstichproben 1969-1998. Berlin: edition sigma.
Börsch-Supan, A.; Reil-Held, A.; Schnabel, R. (2003), “Household Saving in Germany”, in A. Börsch-Supan (Ed.), Life-Cycle Savings and Public Policy. A Cross-National Study of Six Countries. New York: Academic Press, pp. 57-99.
Bourdieu, P. (1982), Die feinen Unterschiede. Kritik der gesellschaftlichen Urteilskraft. Frankfurt a.M.: Suhrkamp.
Braun, H.-U. (1978), „Werbung der Haushalte für die Einkommens- und Verbrauchsstichprobe 1978“, Wirtschaft und Statistik, 7/1978, pp. 410-412.
Clark, T. et al. (1993), The Declining Political Significance of Social Class, International Sociology, 3, pp. 293-316.
Clark, T.; Lipset, S. (1991), Are Social Classes Dying? International Sociology, 4, pp. 397-410.
Giddens, A. (1973), The Class Structure of the Advanced Societies. London: Hutchinson.
Giddens, A. (1995), „Strukturation und sozialer Wandel“, in H.-P. Müller, M. Schmid (Eds.) Sozialer Wandel. Modellierung und theoretische Ansätze. Frankfurt a.M.: Suhrkamp, pp. 151-191.
Glenn, N. (1977), Cohort Analysis. London: Sage.
Hartmann, P. H. (1990), „Wie repräsentativ sind Bevölkerungsumfragen? Ein Vergleich des ALLBUS und des Mikrozensus“, ZUMA-Nachrichten, Vol. 26, pp. 7-30.
Hartmann, P. H.; Schimpl-Neimanns, B. (1992), „Sind Sozialstrukturanalysen mit Umfragedaten möglich? Analysen zur Repräsentativität einer Sozialforschungsumfrage“, Kölner Zeitschrift für Soziologie und Sozialpsychologie 44, pp. 315-340.
Lang, O. (1998), Steueranreize und Geldanlagen im Lebenszyklus: Empirische Analysen zu Spar- und Portfolioentscheidungen deutscher Privathaushalte. ZEW-Wirtschaftsanalysen, Band 32. Baden-Baden: Nomos.
Mannheim, K. (1928/1929), „Das Problem der Generationen“, Kölner Zeitschrift für Soziologie und Sozialpsychologie 7: 157-185, 309-330.
Pahl, R. (1989), Is the Emperor naked? International Journal of Urban and Regional Research, 4, pp. 711-720.
Pakulski, J.; Waters, M. (1996), The Death of Class. London: Sage.
Pöschl, H. (1993), „Werbung und Beteiligung der Haushalte an der Einkommens- und Verbrauchsstichprobe 1993“, Wirtschaft und Statistik 6/1993, pp. 385-390.
Waters, M. (1994), Succession in the Stratification System: A Contribution to the ‘Death of Class’ Debate, International Sociology, 3, pp. 295-312.
Large-scale household budget data and cohort analysis: potentials and restrictions
European Conference on Quality in Survey Statistics 2006, Cardiff (UK), 25 April 2006
Thomas Grund, Georgios Papastefanou, Matthias Fleck
Outline
1 Food consumption and social change
2 Research questions
3 Four criteria of data adequacy
4 Example: Tobacco expenditures in Germany
5 Discussion
6 Conclusion
Food consumption and social change
Do social differences become less important for food consumption over time?
(Schematic: time axis with points t0, t1, t2, t3)
Research questions
• Can we find a period effect in the differences between blue- and white-collar workers in their expenditures? Did such differences (if they exist) diminish over time?
• Are there significant differences in the social contrasts between age groups?
• Do changes in the composition of the samples over time (different birth cohorts) affect our results?
(Schematic diagrams with period and age axes)
Four criteria of data adequacy
1 Time representation
2 Consumption representation
3 Social status representation
4 Population representation
Example: Tobacco expenditures in Germany
Can we find a period effect in the contrast between blue- and white-collar workers in their expenditures on tobacco? Did such differences (if they exist) diminish over time? Are there also age or cohort effects?
Data
• Cross-sectional data from the German Income and Expenditure Surveys 1983, 1993, 1998 and 2003
• Restricted to households in West Germany whose head has German citizenship
Method
• Cohort analysis
• Sub-analysis of different age groups
Tobacco expenditures of blue- and white-collar households in Germany (EUR-cent per month per adult)
(Chart: blue-collars and white-collars, survey years 1983, 1993, 1998 and 2003)
• Blue-collars spend more money on tobacco than white-collars.
• The relative difference also seems to increase from 1983 to 2003.
• Something happened in 1998.
Measuring relative differences
$$\text{Relative expenditure difference} = \frac{\text{Difference between blue-collars and white-collars}}{\text{Overall expenditure average}}$$
Example (tobacco expenditure per month, EUR-cent): 1016 / 2052 ≈ 0.50, i.e. a relative difference of 50 per cent.
Number of cases in age groups (all cases)
(Chart: number of cases by age range, from “up to 24” to “70 and more”, for each survey year)
1983: 21968 | 1993: 12638 | 1998: 9825 | 2003: 20219 | Σ = 64650
Number of white-collars in age groups (all cases)
(Chart: number of cases by age range, from “up to 24” to “70 and more”, for each survey year)
1983: 5345 | 1993: 4450 | 1998: 3007 | 2003: 7519 | Σ = 20321
Number of blue-collars in age groups (all cases)
(Chart: number of cases by age range, from “up to 24” to “70 and more”, for each survey year)
1983: 3806 | 1993: 1477 | 1998: 1278 | 2003: 2244 | Σ = 8805
Period effect
Age | 1983 | 1988 | 1993 | 1998 | 2003
up to 24 | -50 | - | 59 | 60 | 38
25-29 | 21 | - | 62 | 47 | 89
30-34 | 39 | - | 56 | 71 | 74
35-39 | 12 | - | 29 | 42 | 37
40-44 | 27 | - | 45 | 21 | 38
45-49 | 19 | - | 26 | 34 | 35
50-54 | 9 | - | 4 | -17 | 45
55-59 | 23 | - | 15 | -16 | -13
60-64 | -32 | - | 10 | 15 | 75
Period effect
When we compare the results for 1983 and 2003, we find increasing contrasts in the age groups from 25 to 54: blue-collars spend more than white-collars. Comparing 1983 with 1993 or 1998, we also see an increase in contrasts, especially in the age groups from 25 to 49. For the groups up to 24 and 55+ years there is no clear trend from 1983 to 2003. This suggests a period effect of increasing contrasts in the age groups from 25 to 49 between 1983 and 2003.
Age effect
Following the same age groups through time, we find an increase in contrasts, as mentioned before, so there is no pure age effect. When we compare different age groups at each point in time (thus controlling for the period), we find diminishing contrasts as people get older. We find this for the age range 25-54. The effect is much clearer in 1993, 1998 and 2003 than in 1983.
Cohort effect
(Chart: the period-effect table read along its diagonals, which follow the birth cohorts 1929-33, 1934-38, 1939-43, 1944-48, 1949-53, 1954-58, 1959-63, 1964-68, 1969-73 and 1974-78 through time)
Cohort effect
We cannot find a clear increase or decrease in contrasts when we follow the cohorts through time. The differences are much stronger in younger birth cohorts than in older birth cohorts.
Discussion
Looking simply at the contrasts in tobacco expenditure between blue- and white-collar workers over time, we would conclude that they increased from 1983 to 2003. In a cohort analysis, this period effect is no longer as pure: we also find age and cohort effects. This suggests that the period effect identified at first is partly due to changes in the sample composition (other cohorts). Differences are greater among younger people, and also among people born more recently.
Conclusion
Data from household budget surveys can be used to analyse social contrasts in food (or tobacco) consumption over time.
In order to find out whether there are changes over time, we have to carry out a cohort analysis and look at period, age and cohort effects.
In the case of Germany, data from the Income and Expenditure Surveys offer enough cases to carry out a cohort analysis.
Some methodological concerns remain when we use German household budget data.
Contact
Thomas Grund
Georgios Papastefanou
Matrix procedure in business surveying
Heikki Hella, Maria Huhtaniska¹
1. Introduction
In the field of statistical survey sampling, the development of business survey methods is in a progressive phase. This means that various official statistics have to be compiled and produced with different survey methods and from an increasing amount of response data. The main problem is that this is happening at the same time as strong and sudden structural changes in the respondent populations. Survey research faces a real challenge: to compile the data fast, at lower cost, with fewer errors and with minimum bias. The Matrix procedure (MP) is defined as a set of coefficient calculations to a) describe the structure of the survey data and b) obtain enterprise coefficients for determining a survey frame of the respondents. The MP also uses graphics to support the calculations. The employment of the MP can be divided into five main steps: 1) calculating the structural parameters and coefficients, 2) quality control, 3) itemwise coverage examinations with different MP coefficient versions, 4) defining the final survey frame, and 5) additional calculations for the extra-frame population. The statistical production process contains several stages: survey and sample design, use of administrative registers, pilot studies, data collection from enterprises, statistical calculations, quality control and dissemination of statistical publications. In pilot and pretesting studies, the MP forms a supporting framework for investigating and assessing how the frame, the questionnaire and single items should be designed, and how enterprises should be selected for the survey so as to produce high-quality data for compiling the final statistics. For its original purpose, the selection of respondents, the MP offers several alternatives, one of which is currently used at the Bank of Finland.
¹ Heikki Hella, Bank of Finland, PO Box 160, FI-00101 Helsinki, Finland; Maria Huhtaniska, Bank of Finland, PO Box 160, FI-00101 Helsinki, Finland. The views expressed are those of the authors and do not necessarily reflect the views of the Bank of Finland.
The purpose of this paper is to demonstrate the MP by presenting its theoretical framework, its four versions and its use in treating business survey data to determine the survey frame. In addition, the authors suggest further development of the intensity indicator of the MP, the use of the MP in various pilot and pretesting studies, and ways of linking graphical tools with the MP.
2. Matrix procedure
2.1 Theoretical framework
2.1.1 Structure and steps
A survey data set can be displayed as a matrix in which the columns are single questions (items i = 1,…,M), the rows are the respondents (enterprises e = 1,…,N) and the elements are the respective survey responses (πei). The Appendix displays the theoretical concepts of the MP. The MP contains three structural parameters: the survey item weight (di), which gives the percentage distribution of the items; the enterprise item weight (cei), which gives the percentage distribution of the enterprises within each item; and the item enterprise weight (gei), which gives the percentage distribution of the items within each enterprise. The intensity indicator (I) shows how strong the response intensity of an enterprise, an item or the whole matrix is. The intensity indicator for enterprises is used in one of the MP coefficient versions, the Weighted Combined Factor (WCF). The other three versions are the enterprise sum coefficient (ESC), the enterprise maximum weight coefficient (EMAWC) and the enterprise minimum weight coefficient (EMIWC). The use of the MP can be described in five steps. First, one displays the data in matrix form in order to calculate the structural parameters cei, gei and di and the four MP coefficients, including the intensity indicator Ie for each enterprise. Second, one carries out the final quality checking using the distributions of the structural parameters. Because the checking threshold is a percentage share, it can be decided before the calculations. For example, choosing 0.5 as the checking threshold means checking every response that covers half or more of its item sum. Third, the calculations are used in itemwise coverage investigations. The different MP versions are compared at selected cut-off points, for example 85, 90 and 95 per cent. The goal of the survey will determine the number of cut-off points to be examined.
Besides comparing the coverage of each item in different MP versions, one must also compare the total number of enterprises needed for each cut-off point. At this stage, one determines the acceptable coverage per item in each MP version and thus the preliminary cut-off points.
Fourth, one needs to check the enterprises around the preliminary cut-off point and the frames of preceding years. Mergers, acquisitions, births and deaths of enterprises must also be checked, since registers are never fully up to date when used. At this stage, both the MP version to be used and the final cut-off point are determined. Fifth, if desired, one can carry out additional calculations for the extra-frame population, that is, the enterprises not included in the final survey frame. One option, for example, is to draw a simple random or systematic sample. For obvious reasons, the fifth step is not applied in pilot surveys and pretesting.
2.1.2 Matrix procedure versions
Enterprise sum coefficient (ESC)
The enterprise sum coefficient, ESC, is calculated by summing the values of the responses, πei, for each enterprise. The ESC for enterprise e is
$$\mathrm{ESC}_e = \sum_{i=1}^{M} \pi_{ei} = \pi_{e1} + \dots + \pi_{eM}. \qquad (1)$$
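Formula (1) can be sketched in a few lines of Python. This is an illustration, not the Bank of Finland implementation: the matrix is assumed to be a list of rows (one per enterprise), empty elements are assumed to be stored as `None` and skipped in the sum, and the function names `esc` and `rank_enterprises` are hypothetical.

```python
from typing import List, Optional

# Assumed layout: rows are enterprises e, columns are items i; None marks an empty element.
Matrix = List[List[Optional[float]]]


def esc(matrix: Matrix) -> List[float]:
    """Enterprise sum coefficient, formula (1): ESC_e = sum over i of pi_ei."""
    return [sum(v for v in row if v is not None) for row in matrix]


def rank_enterprises(coeffs: List[float]) -> List[int]:
    """Enterprise indices ordered from the largest to the smallest coefficient."""
    return sorted(range(len(coeffs)), key=lambda e: coeffs[e], reverse=True)


if __name__ == "__main__":
    pi = [
        [70.0, None],  # enterprise 0: a large value in the first item only
        [10.0, 20.0],  # enterprise 1: responds to both items
    ]
    scores = esc(pi)
    print(scores)                    # [70.0, 30.0]
    print(rank_enterprises(scores))  # [0, 1]
```

The ranking is what the later steps work with: the cut-off point is placed somewhere along this ordered list.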
The ESC is the simplest way to calculate a commensurable coefficient with which the enterprises can be ranked to form a frame for the survey selection. If there is no need to give extra weight to particular items or enterprises, the ESC should be used, since it treats all items and enterprises equally.
Enterprise maximum weight coefficient (EMAWC)
The maximum weight coefficient, EMAWC, is calculated by multiplying the values of the responses, πei, by the respective survey item weights, di, and summing these products. The EMAWC for enterprise e is
$$\mathrm{EMAWC}_e = \sum_{i=1}^{M} \pi_{ei}\, d_i = \pi_{e1} d_1 + \dots + \pi_{eM} d_M. \qquad (2)$$
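A sketch of formula (2) in Python, assuming rows are enterprises and that an empty element is stored as `None` and counted as zero. It also computes the survey item weights di as each item's share of the matrix total, following the Appendix; the function names are illustrative only.

```python
from typing import List, Optional

Matrix = List[List[Optional[float]]]  # rows: enterprises e, columns: items i; None = empty


def survey_item_weights(matrix: Matrix) -> List[float]:
    """d_i: share of item i in the total of all responses (empty elements count as 0)."""
    m = len(matrix[0])
    col_sums = [sum(row[i] or 0.0 for row in matrix) for i in range(m)]
    total = sum(col_sums)
    return [s / total for s in col_sums]


def emawc(matrix: Matrix) -> List[float]:
    """Enterprise maximum weight coefficient, formula (2): sum over i of pi_ei * d_i."""
    d = survey_item_weights(matrix)
    return [sum((v or 0.0) * d_i for v, d_i in zip(row, d)) for row in matrix]


if __name__ == "__main__":
    pi = [[70.0, None], [10.0, 20.0]]
    print(survey_item_weights(pi))  # [0.8, 0.2]
    print(emawc(pi))                # approximately [56.0, 12.0]
```

In this invented example the first item carries 80 per cent of the matrix total, so the enterprise concentrated in it is pulled further ahead than under the ESC.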
Calculating the coefficient for each enterprise with the EMAWC gives higher weight to enterprises that have larger response values in the items that are large in the data matrix. Therefore, if larger weight should be given to the items that are large in the whole survey, the EMAWC can be used.
Enterprise minimum weight coefficient (EMIWC)
The minimum weight coefficient, EMIWC, is calculated similarly from the values of the responses, πei, but multiplying them by (1 − di), i.e. by one minus the respective survey item weights, and summing these products. The EMIWC for enterprise e is
$$\mathrm{EMIWC}_e = \sum_{i=1}^{M} \pi_{ei}\,(1 - d_i) = \pi_{e1}(1 - d_1) + \dots + \pi_{eM}(1 - d_M). \qquad (3)$$
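The EMIWC only swaps the weights di for 1 − di, so a sketch can reuse a generic weighted sum. Rows are assumed to be enterprises and `None` an empty element counted as zero; the data and function names are hypothetical. In the example, enterprise 0 leads under the ESC and the EMAWC, but enterprise 1, whose responses lean towards the small item, leads under the EMIWC.

```python
from typing import List, Optional

Matrix = List[List[Optional[float]]]  # rows: enterprises e, columns: items i; None = empty


def survey_item_weights(matrix: Matrix) -> List[float]:
    """d_i: share of item i in the total of all responses."""
    m = len(matrix[0])
    col_sums = [sum(row[i] or 0.0 for row in matrix) for i in range(m)]
    total = sum(col_sums)
    return [s / total for s in col_sums]


def weighted_coefficient(matrix: Matrix, weights: List[float]) -> List[float]:
    """Generic sum over i of pi_ei * w_i for every enterprise."""
    return [sum((v or 0.0) * w for v, w in zip(row, weights)) for row in matrix]


def emiwc(matrix: Matrix) -> List[float]:
    """Enterprise minimum weight coefficient, formula (3): sum over i of pi_ei * (1 - d_i)."""
    d = survey_item_weights(matrix)
    return weighted_coefficient(matrix, [1.0 - d_i for d_i in d])


if __name__ == "__main__":
    pi = [[70.0, None], [10.0, 20.0]]
    maw = weighted_coefficient(pi, survey_item_weights(pi))  # EMAWC, formula (2)
    miw = emiwc(pi)
    print(miw)  # approximately [14.0, 18.0]: enterprise 1 now ranks first
    # Since d_i + (1 - d_i) = 1, EMAWC_e + EMIWC_e equals ESC_e.
    print([a + b for a, b in zip(maw, miw)])  # approximately [70.0, 30.0]
```

The identity EMAWC + EMIWC = ESC holds for any data and is a cheap consistency check on an implementation.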
Calculating the coefficient for each enterprise with the EMIWC gives higher weight to enterprises that have larger response values in the items that are small in the data matrix. Therefore, if larger weight should be given to the items that are small in the whole survey, the EMIWC can be used.
Intensity indicator (I)
The concept of the intensity indicator (I) is used in the fourth MP version. The intensity of an enterprise (Ie) is defined as the number of items in which the enterprise has non-empty responses relative to the number of all items in the survey. Correspondingly, the intensity of an item (Ii) is defined as the number of non-empty responses within the item relative to the total number of enterprises in the survey. The intensity of the total survey (IT) is defined as the number of all non-empty responses relative to the number of all elements of the data matrix. The general definition of this simple intensity indicator is
$$(I_e;\ I_i;\ I_T) = \frac{\text{number of non-empty elements}}{\text{total number of elements}} \times 100. \qquad (4)$$
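Formula (4) can be sketched in Python as below. Which elements count as empty is an assumption here: only missing values (`None`) are treated as empty, while a reported zero counts as a response, since the text does not specify this. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

Matrix = List[List[Optional[float]]]  # rows: enterprises e, columns: items i; None = empty


def intensity_indicators(matrix: Matrix) -> Tuple[List[float], List[float], float]:
    """Formula (4): share of non-empty elements, in per cent.

    Returns (I_e for each enterprise, I_i for each item, I_T for the whole matrix).
    Assumption: only None counts as empty; a reported 0 counts as a response.
    """
    n, m = len(matrix), len(matrix[0])
    filled = [[v is not None for v in row] for row in matrix]
    i_e = [100.0 * sum(row) / m for row in filled]
    i_i = [100.0 * sum(filled[e][i] for e in range(n)) / n for i in range(m)]
    i_t = 100.0 * sum(sum(row) for row in filled) / (n * m)
    return i_e, i_i, i_t


if __name__ == "__main__":
    # The example in the text: 10 items, 4 non-empty responses.
    row = [5.0, 1.0, 2.0, 7.0] + [None] * 6
    i_e, _, _ = intensity_indicators([row])
    print(i_e)  # [40.0]
```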
The range of these variables is [0, 100]. For example, in a survey of 10 items, an enterprise with 4 non-empty responses has an intensity Ie of 40 per cent.
Weighted combined factor (WCF)
The Weighted Combined Factor (WCF) differs from its predecessors in that the MP coefficient is weighted with some value α ∈ (0, 1) and the intensity indicator of the enterprise (Ie) is added with a weight β = 1 − α. The WCF for enterprise e is
$$\mathrm{WCF}_e(\mathrm{MP}) = \alpha\,\frac{\mathrm{MP}_e}{100} + \beta\, I_e, \qquad (5)$$
where MPe denotes the MP coefficient (ESC, EMAWC or EMIWC) of enterprise e and Ie is the intensity indicator of enterprise e. The weights α and β are determined subjectively; for example, if α = 0.90 and β = 0.10, the WCF for enterprise e in the case of the ESC is
$$\mathrm{WCF}_e(\mathrm{ESC}) = 0.90\,\frac{\mathrm{ESC}_e}{100} + 0.10\, I_e. \qquad (6)$$
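A sketch of formula (5) in Python. The MP coefficients and intensities are taken as inputs, so any of the ESC, EMAWC or EMIWC can be plugged in. The values in the example are invented for illustration, the function name is hypothetical, and the default α = 0.90 merely mirrors the example in (6).

```python
from typing import List


def wcf(mp: List[float], intensity: List[float], alpha: float = 0.90) -> List[float]:
    """Formula (5): WCF_e = alpha * MP_e / 100 + beta * I_e, with beta = 1 - alpha.

    mp holds one MP coefficient per enterprise (ESC, EMAWC or EMIWC);
    intensity holds the matching I_e values in per cent.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in the open interval (0, 1)")
    if len(mp) != len(intensity):
        raise ValueError("mp and intensity need one value per enterprise")
    beta = 1.0 - alpha
    return [alpha * m / 100.0 + beta * i for m, i in zip(mp, intensity)]


if __name__ == "__main__":
    esc_values = [500.0, 80.0]   # hypothetical ESC values
    intensities = [20.0, 100.0]  # answers 2 of 10 items vs. all 10 items
    print(wcf(esc_values, intensities))  # approximately [6.5, 10.72]
```

With these invented figures, the enterprise that answers every item outranks the one with the larger value, which is how the intensity term lets the frequency of responses count alongside their value.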
Calculating the coefficient for each enterprise with the WCF takes into account not only the values of the responses but also their number. If both the frequency and the value of the responses need to be taken into account, the WCF can be used. By altering the weights α and β, one can give more weight either to the frequency or to the value of the responses. Note that the MP version selected for use in the WCF may also weight the items according to their size (except in the case of the ESC).
2.2 Implementation
Once the coefficient for each enterprise has been calculated, the enterprises can be arranged in decreasing or increasing order to create an ordering of the enterprises for analysis and for the survey selection process. Whether to use the ESC, EMAWC, EMIWC or WCF depends on the survey at hand; a statistician may want the frame to include more enterprises with large values in the large items or, vice versa, with large values in the small items. As shown later, the borderline cases near the cut-off point can be investigated with different MP versions. The objective of using the MP versions is to obtain different lists (orderings) of the respondent candidates. These lists are then analysed and assessed in order to determine the final cut-off point of the survey. Auxiliary information can naturally provide a valuable signal for placing the cut-off point. For example, data from the taxation register may reveal new enterprises that should be included in the survey. Comparing the cumulative per cent distributions of the coefficients from each MP version can also give important supplementary information about the cut-off area and help determine the final cut-off point.
2.3 Joint use of matrix procedure and cut-off sampling
Sampling in statistics may be probability-based, non-probability-based or mixed. The cut-off² procedure is one of the most frequently used non-probability sampling methods. This method is convenient when (see e.g. Särndal et al., 1997):
- the population has a highly skewed distribution, i.e. most of the values are concentrated in a few statistical units,
- the bias in the sample estimates caused by the cut-off is assumed to be negligible,
- no reliable frame exists for the large number of small respondents in the target population, and
- a comparison of costs and other resource requirements against the required accuracy of the estimates supports a fast and simple cut-off procedure.
One additional prerequisite for using the cut-off sampling method is the stability condition: neither the frame nor the single enterprises may change too much from the preceding period; in turbulent periods the cut-off procedure is at its weakest (Hansen, Hurwitz and Madow, 1953, vol. 1). Elisson and Elvers (2001) remark that there should be clear principles for using cut-off sampling. In cut-off sampling, all enterprises above a certain size (e.g. turnover) are included in the surveyed set, and beyond this threshold value no units are collected in the sampling process. The threshold value of the cut-off method is often determined subjectively. In the MP approach, the terms preliminary and final cut-off point are used. The preliminary cut-off point means that, after calculating the MP version(s) and ordering the enterprises by the respective coefficient, one determines a preliminary cut-off point, for example 85 per cent, explores the coverage per item at this cut-off point and examines the structure of the remainder. When all the necessary adjustments to the preliminary cut-off point have been made, the final cut-off point is determined as the one that defines the list of enterprises to be included in the survey. For the joint use of the MP and cut-off sampling, some supporting activities are needed, e.g. annual checking of the births and deaths of enterprises and up-to-date reviewing of published mergers and acquisitions. These supporting procedures help to keep the relative bias due to cut-off sampling at a negligible level.
² The word "cut-off" refers to the borderline value between the included and the excluded sample units.
2.4 Fields of application
2.4.1 Frame design and maintenance
Frame design is defined here as a set of investigations of the registers and other information sources that contain the potential respondents, aimed at providing as relevant a coverage of enterprises as possible for the survey being conducted. Maintenance of a frame is a continuous activity needed to keep the list of respondents up to date³. The MP is used to investigate the survey frames, which may reveal a need to update them. This is done in the second and fourth steps of the MP. Besides frame maintenance, the MP technique is suitable for designing survey questionnaires and for analysing the results of pilot and pretesting studies before the actual business survey process. It is well known that pilot studies are, unfortunately, a neglected topic in the survey literature (see Biemer and Lyberg, 2003).
In the pilot stage, practical and simple statistical calculations can be carried out with the MP, and the results may give signals, for example, to rename an item, remove an item or divide an item into two.
2.4.2 Quality control and rearrangement of matrix elements
The MP can also be used as a simple quality control (QC) tool. Quality control is defined here as the use of the cei and gei coefficients (enterprise item weights and item enterprise weights) and simple graphics to search for and detect inconsistencies⁴ in the data matrix. The outlying observations are then checked and corrected if found erroneous. A very large or a very small value of cei in the matrix signals that the corresponding response may need correcting. A threshold value should be set for each cei curve (for each item). For example, selecting 0.4 as the threshold value means that all responses covering 40 per cent or more of the item total should be checked. The inconsistency is evaluated within each question i, since the cei form the percentage distribution of each item. Another way to discover outliers is to calculate the distribution of the single responses over the different items in each row of the matrix. The gei curves can be displayed with the enterprises on the y-axis and the items on the x-axis⁵. In practice, for example, the 10 largest (by size) enterprises are placed on the y-axis. The cei and gei curves are presented in Section 3.2. As noted, inconsistency is defined here relative to a certain sub-sample: one item or one enterprise. Figure 1 displays the two-way rearranged items and enterprises in the WCF version of the MP. It shows the concentration and the increasing ordering of the survey data towards the lower left corner of the cube (marked with arrows). Besides the concentration, the rearranged matrix shows the structure of the survey or the pilot experiment. The information obtained from the rearrangement is important, e.g., for evaluating and comparing the items, single responses or the zero-zone areas in the matrix.
Figure 1. Two-way rearranged BoP data matrix
³ A classic reference on business survey frames and registers is Colledge (1995).
⁴ One often used, informal definition of an outlier is "an observation or a patch of observations which appears to be inconsistent with the remainder of that set of data" (Barnett and Lewis, 1994). The business survey framework has its own types of outliers: representative and non-representative outliers, see Lee (1995).
⁵ In this case a patchy outlier (i.e. a group of inconsistent observations) may also occur. This happens if some observations of an enterprise are not its own but those of the group company to which the enterprise belongs as a subsidiary.
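The quality check described above can be sketched as follows: compute cei column by column, flag every response whose cei reaches the threshold (0.4 here, as in the text), and compute gei row by row for the enterprisewise view. Empty elements are assumed to be `None` and counted as zero; the data and function names are hypothetical.

```python
from typing import List, Optional, Tuple

Matrix = List[List[Optional[float]]]  # rows: enterprises e, columns: items i; None = empty


def enterprise_item_weights(matrix: Matrix) -> List[List[float]]:
    """c_ei = pi_ei / sum over e of pi_ei: enterprise e's share of item i's total."""
    n, m = len(matrix), len(matrix[0])
    col = [sum(matrix[e][i] or 0.0 for e in range(n)) for i in range(m)]
    return [[(matrix[e][i] or 0.0) / col[i] if col[i] else 0.0 for i in range(m)]
            for e in range(n)]


def item_enterprise_weights(matrix: Matrix) -> List[List[float]]:
    """g_ei = pi_ei / sum over i of pi_ei: item i's share of enterprise e's total."""
    result = []
    for row in matrix:
        s = sum(v or 0.0 for v in row)
        result.append([(v or 0.0) / s if s else 0.0 for v in row])
    return result


def responses_to_check(matrix: Matrix, threshold: float = 0.4) -> List[Tuple[int, int]]:
    """(enterprise, item) pairs whose c_ei reaches the checking threshold."""
    c = enterprise_item_weights(matrix)
    return [(e, i) for e, row in enumerate(c)
            for i, c_ei in enumerate(row) if c_ei >= threshold]


if __name__ == "__main__":
    pi = [[50.0, 1.0], [30.0, 2.0], [20.0, 17.0]]
    print(responses_to_check(pi))          # [(0, 0), (2, 1)]
    print(item_enterprise_weights(pi)[2])  # enterprise 2's item shares, summing to 1
```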
3. Application with Finnish BoP data
3.1 Structural parameters and coefficients
In this section, the calculations are based on the Finnish balance of payments (BoP) business survey data. The data include the enterprise-sector BoP data for the year 2004 and were used in the survey selection process for the survey year 2006⁶. The total number of enterprises included in the matrix is 551, and they come from all the available BoP surveys (monthly survey, annual frame survey, FDI surveys and mergers and acquisitions survey). The BoP data matrix contains 10 items that were chosen as a result of benchmark investigations; they cover the BoP stocks at an aggregate level. The largest item, "Assets from foreign affiliate", has a share of 37 per cent, while the smallest item, "Portfolio investment assets", covers only 0.5 per cent of the total survey. The structural parameters cei, gei and di were calculated with the formulas given in the Appendix, and the MP version coefficients with formulas (1)–(5) in Subsection 2.1.2.
3.2 Quality control
In Figure 2, the itemwise enterprise item weights of the BoP data are used as a quality control (QC) tool. The figure shows the responses of the 97 enterprises⁷ (90 per cent of the total sum) by item. The number of enterprises is limited for lack of space and for confidentiality reasons. Because the enterprise item weights are calculated within each item, very large values of cei indicate a possible error (or a genuine outlier) and are worth checking. If 0.4⁸ is chosen as the checking limit, 3 data values should be checked (see Figure 2). Note that using the enterprise item weights and the item enterprise weights as a QC tool requires the total data or conditional data (e.g. the 50 largest enterprises) to be available.
⁶ The selection process for statistical year t+1 is carried out in the autumn of year t, when the registers (BoP data and the official business register maintained by Statistics Finland) concerning year t-1 are ready for use.
⁷ One enterprise is omitted for technical reasons.
⁸ A threshold of 0.4 means that one observation covers 40 per cent of the item sum in question.
Figure 2. Itemwise enterprise item weights (cei) of the BoP data, 97 largest enterprises by the WCF
[Chart: share (y-axis, 0.0 to 0.5) for each enterprise (x-axis, 0 to 100), one series per item: c1 PI Assets, c2 OI Assets, c3 TC Assets, c4 FDI outward Assets, c5 FDI outward Liabilities, c6 FDI inward Assets, c7 FDI inward Liabilities, c8 PI Liabilities, c9 OI Liabilities, c10 TC Liabilities. 0.00 = empty matrix elements.]
Figure 3 displays the item enterprise weights (gei) for the 10 largest enterprises of the BoP data by the WCF. There are 3 questions in which 4 different enterprises account for more than 60 per cent of the item total. Given that the survey deals with financial items, this is not surprising; it is rather interesting that even among the 10 largest enterprises the values in some items are quite small.
Figure 3. Enterprisewise item enterprise weights (gei) of the BoP data, 10 largest enterprises by the WCF
[Chart: share (y-axis, 0.0 to 1.0) for each of the ten items (PI, OI and TC Assets; FDI outward Assets and Liabilities; FDI inward Assets and Liabilities; PI, OI and TC Liabilities) on the x-axis, one series per enterprise, g1 Enterprise1 to g10 Enterprise10. 0.00 = empty matrix elements.]
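3.3 Itemwise coverage in matrix procedure versions
Table 1 first presents the item sums and the corresponding percentage shares. The variation between items is enormous: the largest item (assets from foreign affiliate) covers some 37 per cent of the total survey, while the smallest item (portfolio investment assets) covers less than 1 per cent. Table 1 also presents, for each item, the cumulative sums of the response values, the respective percentages and the number of enterprises in the different MP versions at different cut-off points. As the cut-off point increases, the cumulative percentage of each item increases by a different amount, and these changes differ clearly between MP versions. The itemwise results provide information for drawing inferences about the versions and for improving the frame selection process. The itemwise cumulative percentage may be larger than the corresponding total cut-off point, while for some items it remains markedly smaller than the total cut-off point. Comparing the itemwise differences a) between the items in each MP version and b) for a given item across MP versions provides important information about the survey data and therefore about the list of enterprises to be selected for the survey in question.

Table 1. Itemwise value and coverage with respect to the cumulative per cent of the MP versions in the BoP data

Item sums
| Item | Item sum, EUR million | Share, % |
|---|---|---|
| Assets: Portfolio investment (PI assets) | 501 | 0,54 |
| Assets: Loans, deposits and other (OI assets) | 1502 | 1,62 |
| Assets: Trade credits (TC assets) | 5027 | 5,41 |
| FDI outward: Assets from foreign affiliate | 34353 | 36,94 |
| FDI outward: Liabilities to foreign affiliate | 12989 | 13,97 |
| FDI inward: Assets from foreign owner | 3187 | 3,43 |
| FDI inward: Liabilities to foreign owner | 13242 | 14,24 |
| Liabilities: Portfolio investment (PI liabilities) | 9638 | 10,36 |
| Liabilities: Loans, deposits and other (OI liabilities) | 11225 | 12,07 |
| Liabilities: Trade credits (TC liabilities) | 1325 | 1,42 |
| Total | 92990 | 100,00 |

Cumulative euro value (EUR million) and coverage (%) by MP version at total cut-off points of 80, 85, 90 and 95 %

ESC (number of enterprises: 39 / 52 / 74 / 125)
| Item | Value 80 % | Value 85 % | Value 90 % | Value 95 % | Cov. 80 % | Cov. 85 % | Cov. 90 % | Cov. 95 % |
|---|---|---|---|---|---|---|---|---|
| PI assets | 402 | 411 | 437 | 458 | 80,30 | 82,02 | 87,24 | 91,41 |
| OI assets | 645 | 730 | 892 | 1282 | 42,96 | 48,57 | 59,38 | 85,31 |
| TC assets | 4589 | 5002 | 5027 | 5027 | 91,28 | 99,51 | 99,99 | 100,00 |
| FDI outward assets | 30779 | 32321 | 33260 | 33686 | 89,60 | 94,08 | 96,82 | 98,06 |
| FDI outward liabilities | 11617 | 12343 | 12594 | 12903 | 89,44 | 95,02 | 96,96 | 99,34 |
| FDI inward assets | 918 | 1060 | 1616 | 2170 | 28,80 | 33,27 | 50,72 | 68,11 |
| FDI inward liabilities | 7291 | 7604 | 9180 | 11211 | 55,06 | 57,42 | 69,33 | 84,66 |
| PI liabilities | 9601 | 9601 | 9601 | 9638 | 99,62 | 99,62 | 99,62 | 100,00 |
| OI liabilities | 7689 | 8979 | 9824 | 10656 | 68,50 | 79,99 | 87,51 | 94,93 |
| TC liabilities | 1128 | 1144 | 1324 | 1325 | 85,11 | 86,34 | 99,91 | 100,00 |

EMAWC (number of enterprises: 26 / 35 / 51 / 87)
| Item | Value 80 % | Value 85 % | Value 90 % | Value 95 % | Cov. 80 % | Cov. 85 % | Cov. 90 % | Cov. 95 % |
|---|---|---|---|---|---|---|---|---|
| PI assets | 353 | 353 | 429 | 454 | 70,44 | 70,49 | 85,52 | 90,60 |
| OI assets | 322 | 407 | 459 | 752 | 21,41 | 27,06 | 30,52 | 50,05 |
| TC assets | 4384 | 4385 | 4608 | 5027 | 87,20 | 87,23 | 91,66 | 100,00 |
| FDI outward assets | 30596 | 32231 | 33001 | 33701 | 89,06 | 93,82 | 96,06 | 98,10 |
| FDI outward liabilities | 11181 | 11654 | 12409 | 12772 | 86,08 | 89,72 | 95,53 | 98,33 |
| FDI inward assets | 26 | 32 | 199 | 695 | 0,81 | 1,01 | 6,26 | 21,82 |
| FDI inward liabilities | 4984 | 5852 | 7395 | 10345 | 37,64 | 44,19 | 55,84 | 78,12 |
| PI liabilities | 8539 | 8539 | 9601 | 9601 | 88,60 | 88,60 | 99,62 | 99,62 |
| OI liabilities | 6076 | 6921 | 8549 | 9859 | 54,13 | 61,66 | 76,16 | 87,83 |
| TC liabilities | 870 | 877 | 1148 | 1158 | 65,63 | 66,21 | 86,67 | 87,39 |

EMIWC (number of enterprises: 41 / 55 / 79 / 132)
| Item | Value 80 % | Value 85 % | Value 90 % | Value 95 % | Cov. 80 % | Cov. 85 % | Cov. 90 % | Cov. 95 % |
|---|---|---|---|---|---|---|---|---|
| PI assets | 402 | 411 | 437 | 458 | 80,30 | 82,02 | 87,24 | 91,41 |
| OI assets | 645 | 736 | 892 | 1306 | 42,96 | 48,96 | 59,38 | 86,92 |
| TC assets | 4867 | 5002 | 5027 | 5027 | 96,82 | 99,51 | 99,99 | 99,99 |
| FDI outward assets | 30633 | 32321 | 33177 | 33631 | 89,17 | 94,08 | 96,58 | 97,90 |
| FDI outward liabilities | 11687 | 12343 | 12541 | 12903 | 89,97 | 95,02 | 96,55 | 99,34 |
| FDI inward assets | 918 | 1385 | 1664 | 2296 | 28,80 | 43,45 | 52,22 | 72,05 |
| FDI inward liabilities | 7291 | 8083 | 9970 | 11420 | 55,06 | 61,04 | 75,29 | 86,24 |
| PI liabilities | 9601 | 9601 | 9601 | 9638 | 99,62 | 99,62 | 99,62 | 100,00 |
| OI liabilities | 8272 | 8979 | 9802 | 10689 | 73,69 | 79,99 | 87,32 | 95,23 |
| TC liabilities | 1128 | 1144 | 1325 | 1325 | 85,11 | 86,34 | 99,99 | 99,99 |

WCF (number of enterprises: 56 / 81 / 132 / 235)
| Item | Value 80 % | Value 85 % | Value 90 % | Value 95 % | Cov. 80 % | Cov. 85 % | Cov. 90 % | Cov. 95 % |
|---|---|---|---|---|---|---|---|---|
| PI assets | 437 | 437 | 460 | 469 | 87,24 | 87,24 | 91,78 | 93,49 |
| OI assets | 734 | 892 | 1256 | 1399 | 48,85 | 59,38 | 83,60 | 93,12 |
| TC assets | 5006 | 5027 | 5027 | 5027 | 99,58 | 99,99 | 100,00 | 100,00 |
| FDI outward assets | 32721 | 33260 | 33733 | 34195 | 95,25 | 96,82 | 98,20 | 99,54 |
| FDI outward liabilities | 12381 | 12594 | 12912 | 12976 | 95,32 | 96,96 | 99,41 | 99,90 |
| FDI inward assets | 1385 | 1664 | 2258 | 2806 | 43,45 | 52,22 | 70,87 | 88,06 |
| FDI inward liabilities | 7841 | 10107 | 11353 | 12649 | 59,21 | 76,32 | 85,73 | 95,52 |
| PI liabilities | 9601 | 9601 | 9638 | 9638 | 99,62 | 99,62 | 100,00 | 100,00 |
| OI liabilities | 9019 | 9824 | 10687 | 11083 | 80,35 | 87,51 | 95,21 | 98,73 |
| TC liabilities | 1145 | 1325 | 1325 | 1325 | 86,44 | 99,99 | 100,00 | 100,00 |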
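The mechanics behind Table 1 can be illustrated with a small sketch: order the enterprises by an MP coefficient, keep adding them until their cumulative share of the coefficient total reaches the cut-off point, and then compute each item's coverage among the selected enterprises. Two details are assumptions rather than documented rules of the authors: the enterprise that crosses the cut-off is included, and the cut-off is applied to the cumulative share of the coefficient itself. The four-enterprise data set and the function names are invented.

```python
from typing import List, Optional

Matrix = List[List[Optional[float]]]  # rows: enterprises e, columns: items i; None = empty


def select_by_cutoff(coeffs: List[float], cutoff_pct: float) -> List[int]:
    """Take enterprises in decreasing coefficient order until their cumulative share
    of the coefficient total reaches cutoff_pct (the crossing enterprise is included)."""
    order = sorted(range(len(coeffs)), key=lambda e: coeffs[e], reverse=True)
    target = cutoff_pct / 100.0 * sum(coeffs)
    selected: List[int] = []
    running = 0.0
    for e in order:
        if running >= target:
            break
        selected.append(e)
        running += coeffs[e]
    return selected


def itemwise_coverage(matrix: Matrix, selected: List[int]) -> List[float]:
    """Per cent of each item's total that the selected enterprises cover."""
    coverage = []
    for i in range(len(matrix[0])):
        item_total = sum(row[i] or 0.0 for row in matrix)
        covered = sum(matrix[e][i] or 0.0 for e in selected)
        coverage.append(100.0 * covered / item_total if item_total else 100.0)
    return coverage


if __name__ == "__main__":
    pi = [[60.0, None], [20.0, 5.0], [5.0, 5.0], [None, 5.0]]
    esc = [sum(v or 0.0 for v in row) for row in pi]  # ESC values 60, 25, 10, 5
    chosen = select_by_cutoff(esc, 80.0)
    print(chosen)                          # [0, 1]
    print(itemwise_coverage(pi, chosen))   # roughly 94.1 and 33.3
```

In the example, an 80 per cent cut-off on the ESC selects two enterprises that cover about 94 per cent of the first item but only about 33 per cent of the second, the same kind of gap between total and itemwise coverage that Table 1 shows for the small items.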
Because the ESC treats all items equally regardless of their size, the other versions should be compared against it. The EMAWC weights large items more than small items and was a major alternative to the ESC. However, for BoP compilation, collecting more of the large items does not serve the purpose, since the main goal is to collect as much data as possible from all items with as few enterprises as possible. Choosing the EMAWC with a total coverage of 90 per cent would decrease the number of enterprises to 51, compared with 74 with the ESC, but the coverage of most items would decrease accordingly (see Table 1); the coverage of the item "Assets from foreign owner" (FDI inward assets) would drop to an alarming 6 per cent. Given that the EMAWC was not acceptable, attention turned to its opposite, the EMIWC, which weights small items more than large items. The itemwise coverage remains almost unchanged compared with the ESC, but the number of enterprises increases. The EMIWC might also lead to selecting enterprises that have smaller values in the items other than the small ones. In relatively small countries like Finland, confidentiality issues must also be taken into account. If the share of one enterprise is excessively large, the whole item might become confidential, and with secondary confidentiality even more items must be suppressed.
3.4 Survey frame
Manual checking is necessary rather than relying solely on the technical data matrix as a survey selection tool. A comparison with the data matrices of preceding years is also useful. In fact, one might want to apply different MP versions to the borderline cases around the chosen cut-off point. The itemwise sensitivity paths for the WCF are presented in Figure 4; the same cumulative percentages are presented stepwise in Table 1. However, the cut-off need not be a round figure, and Figure 4 shows how the cumulative shares of the different items converge towards 100 per cent. With the WCF, 90 per cent coverage in all items requires about 95 per cent total cumulative coverage, and 80 per cent coverage in all items requires only 92 per cent total coverage. The item "Assets from foreign owner" is the slowest to converge towards 100 per cent. Then again, this item is relatively small, so it might not carry much weight when analysing the whole frame.
Figure 4. Itemwise coverage with respect to the WCF cumulative per cent of the BoP data
[Chart: coverage per item, % (y-axis, 30 to 100) against the WCF cumulative % (x-axis, ticks at 85, 89, 91, 93, 95, 96, 97.5, 98.4, 99.1 and 99.6), one line per item: PI Assets, OI Assets, TC Assets, FDI outward Assets, FDI outward Liabilities, FDI inward Assets, FDI inward Liabilities, PI Liabilities, OI Liabilities, TC Liabilities.]
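The sensitivity paths in Figure 4, and the question of which total cut-off gives a required coverage in every item, can be sketched by recording, after each enterprise in the ordering, the cumulative per cent of the coefficient and the itemwise coverage. This is an illustrative Python sketch on invented data with hypothetical function names, not the procedure used to draw Figure 4; empty elements are assumed to be `None` and counted as zero.

```python
from typing import List, Optional, Tuple

Matrix = List[List[Optional[float]]]  # rows: enterprises e, columns: items i; None = empty
Path = List[Tuple[float, List[float]]]


def sensitivity_path(matrix: Matrix, coeffs: List[float]) -> Path:
    """After each enterprise (in decreasing coefficient order), record the cumulative
    per cent of the coefficient total and the cumulative coverage per item, in per cent."""
    order = sorted(range(len(coeffs)), key=lambda e: coeffs[e], reverse=True)
    total = sum(coeffs)
    m = len(matrix[0])
    item_totals = [sum(row[i] or 0.0 for row in matrix) for i in range(m)]
    cum_coeff = 0.0
    cum_items = [0.0] * m
    path: Path = []
    for e in order:
        cum_coeff += coeffs[e]
        for i in range(m):
            cum_items[i] += matrix[e][i] or 0.0
        coverage = [100.0 * c / t if t else 100.0 for c, t in zip(cum_items, item_totals)]
        path.append((100.0 * cum_coeff / total, coverage))
    return path


def smallest_cutoff(path: Path, target_pct: float) -> Optional[float]:
    """First cumulative per cent at which every item reaches target_pct coverage."""
    for cum_pct, coverage in path:
        if min(coverage) >= target_pct:
            return cum_pct
    return None


if __name__ == "__main__":
    pi = [[60.0, None], [20.0, 5.0], [5.0, 5.0], [None, 5.0]]
    esc = [sum(v or 0.0 for v in row) for row in pi]
    path = sensitivity_path(pi, esc)
    print(smallest_cutoff(path, 60.0))  # 95.0
    print(smallest_cutoff(path, 90.0))  # 100.0
```

Reading the path this way mirrors the observation in Section 3.4 that a given coverage in every item needs a noticeably higher total cumulative coverage.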
For the final selection of the enterprise frame to be included in the survey, one needs to check
1. enterprises near the preliminary cut-off point,
2. frames of the preceding years, and
3. mergers, acquisitions, births and deaths of enterprises after the register has been updated,
and to decide
4. the acceptable coverage per item,
5. the final MP version to be used, and
6. the final cut-off point to be used.
4. Conclusions and further development
The matrix procedure combined with the cut-off sampling method constitutes a handy method for practical business surveying. The MP is divided into five steps, which produce three structural parameters and four MP versions for quality control, itemwise coverage examination and the definition of the final cut-off point and survey frame list. The structural parameters (cei, gei and di) can be used in a simple two-way quality control operation: itemwise with the cei coefficients and enterprisewise with the gei coefficients. The di coefficients are used in three of the four MP versions, but also in the structural analysis of survey data, e.g. in pilot studies. A special variable, the intensity indicator, is constructed to obtain a more informative coefficient for every enterprise. Before the final cut-off point of the total survey, and thus the final survey frame, is determined, the itemwise cumulative coverage in each MP version (ESC, EMAWC, EMIWC and WCF) is investigated. As seen, the MP developed here is not an automatic or mechanical procedure; it requires assessment, experience and inference at the different steps of the application. Three objectives for further research are essential: 1) to study the form and volatility of the itemwise sensitivity paths of the different MP versions, 2) to study the coefficient of variation (CV) of the itemwise variables in each MP version, and 3) to study the shape of the cei coefficient distributions to see whether the data are suitable (skewed enough) for cut-off sampling. These studies can also be conducted by simulation. For the fifth step of the MP, additional calculations for the extra-frame population, we suggest augmenting the joint MP and classic cut-off method with two components: 1) the so-called modified cut-off sampling of Young et al. (1999) and 2) a set of graphical tools (classic and robust).
For the pilot, pretesting and the post survey studies, it is important to produce itemwise investigation results e.g. graphically by a bag plot or via some robust graphics (see Rousseeuw et al., 1999). The results obtained may be valuable for redesigning and/or correcting the first survey version.
For the WCF version, new variables for the intensity indicator should be sought to strengthen the cut-off sampling procedure (see Elisson and Elvers, 2001). For example, the number of personnel and the number of branches of the enterprises could be such new variables. A third example would be the number of sectors in which an enterprise has business activity.
Bibliography
Barnett, V. and Lewis, T. (1994), Outliers in Statistical Data, Third Edition, New York: Wiley.
Biemer, P. and Lyberg, L. (2003), Introduction to Survey Quality, New York: Wiley.
Colledge, M. J. (1995), "Frames and Business Registers: An Overview", in Cox, B. G., Binder, D. A., Chinnappa, B. N., Christianson, A., Colledge, M. J. and Kott, P. S. (eds.), Business Survey Methods, New York: Wiley, Chapter 2.
Elisson, H. and Elvers, E. (2001), "Cut-off sampling and estimation", Proceedings of Statistics Canada Symposium 2001, Achieving data quality in a statistical agency: a methodological perspective, 8 pages.
Groves, R., Fowler, F. Jr., Couper, M., Lepkowski, J., Singer, E. and Tourangeau, R. (2004), Survey Methodology, New Jersey: Wiley.
Hansen, M. H., Hurwitz, W. N. and Madow, W. G. (1953), Sample Survey Methods and Theory, vol. 1 and 2, New York: Wiley.
Jocelyn, W., Brodeur, M., Garriguet, D. and Forget, N. (2000), "Sampling and estimation strategies for commodity surveys: the Canadian annual survey of manufactures", Proceedings of the Second International Conference on Establishment Surveys (ICES II), pp. 147–155.
Lee, H. (1995), "Outliers in Business Surveys", in Cox, B. G., Binder, D. A., Chinnappa, B. N., Christianson, A., Colledge, M. J. and Kott, P. S. (eds.), Business Survey Methods, New York: Wiley, Chapter 26.
Rousseeuw, P. J., Ruts, I. and Tukey, J. W. (1999), "The Bagplot: a Bivariate Boxplot", The American Statistician, 53, pp. 382–387.
Särndal, C.-E., Swensson, B. and Wretman, J. (1997), Model Assisted Survey Sampling (Corrected fourth printing), New York: Springer-Verlag.
Young, J., Sang Eun, L. and Jae Woo, N. (1999), "Comparing the cut-off and modified cut-off sample design on factory wastes statistics", Contributed paper, The 52nd Session of the International Statistical Institute, Helsinki, 2 pages.
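Appendix. Theoretical framework

Enterprises are indexed by e = 1, …, N and items (questions) by i = 1, …, M. The euro value response of enterprise e to item i is π_ei. Each cell of the response matrix holds π_ei together with the weights c_ei and g_ei. The right-hand columns hold the enterprise row sums, and the bottom rows hold the item column sums and the item terms d_i:

$$
\begin{array}{l|ccccc|ccc}
 & \text{Item } 1 & \cdots & \text{Item } i & \cdots & \text{Item } M & \sum_{i=1}^{M}\pi_{ei} & \sum_{i=1}^{M}\pi_{ei}d_i & \sum_{i=1}^{M}\pi_{ei}(1-d_i) \\
\hline
\text{Enterprise } 1 & \pi_{11};\ c_{11},\ g_{11} & \cdots & \pi_{1i};\ c_{1i},\ g_{1i} & \cdots & \pi_{1M};\ c_{1M},\ g_{1M} & \sum_{i}\pi_{1i} & \sum_{i}\pi_{1i}d_i & \sum_{i}\pi_{1i}(1-d_i) \\
\vdots & \vdots & & \vdots & & \vdots & \vdots & \vdots & \vdots \\
\text{Enterprise } e & \pi_{e1};\ c_{e1},\ g_{e1} & \cdots & \pi_{ei};\ c_{ei},\ g_{ei} & \cdots & \pi_{eM};\ c_{eM},\ g_{eM} & \sum_{i}\pi_{ei} & \sum_{i}\pi_{ei}d_i & \sum_{i}\pi_{ei}(1-d_i) \\
\vdots & \vdots & & \vdots & & \vdots & \vdots & \vdots & \vdots \\
\text{Enterprise } N & \pi_{N1};\ c_{N1},\ g_{N1} & \cdots & \pi_{Ni};\ c_{Ni},\ g_{Ni} & \cdots & \pi_{NM};\ c_{NM},\ g_{NM} & \sum_{i}\pi_{Ni} & \sum_{i}\pi_{Ni}d_i & \sum_{i}\pi_{Ni}(1-d_i) \\
\hline
 & \sum_{e=1}^{N}\pi_{e1} & \cdots & \sum_{e=1}^{N}\pi_{ei} & \cdots & \sum_{e=1}^{N}\pi_{eM} & \text{Total} & \sum_{e}\sum_{i}\pi_{ei}d_i & \sum_{e}\sum_{i}\pi_{ei}(1-d_i) \\
 & d_1 & \cdots & d_i & \cdots & d_M & & &
\end{array}
$$

Each enterprise row is completed by the coefficients ESC_e, EMAWC_e and EMIWC_e, the intensity indicator I_e and the weighted combined factor WCF_e^MP, defined as follows.

Total:
$$\text{Total} = \sum_{e=1}^{N}\sum_{i=1}^{M}\pi_{ei}$$

Enterprise item weight:
$$c_{ei} = \frac{\pi_{ei}}{\sum_{e'=1}^{N}\pi_{e'i}}$$

Item enterprise weight:
$$g_{ei} = \frac{\pi_{ei}}{\sum_{i'=1}^{M}\pi_{ei'}}$$

Enterprise sum coefficient:
$$ESC_e = \frac{\sum_{i=1}^{M}\pi_{ei}}{\sum_{e'=1}^{N}\sum_{i=1}^{M}\pi_{e'i}}$$

Enterprise maximum weight coefficient:
$$EMAWC_e = \frac{\sum_{i=1}^{M}\pi_{ei}\,d_i}{\sum_{e'=1}^{N}\sum_{i=1}^{M}\pi_{e'i}\,d_i}$$

Enterprise minimum weight coefficient:
$$EMIWC_e = \frac{\sum_{i=1}^{M}\pi_{ei}\,(1-d_i)}{\sum_{e'=1}^{N}\sum_{i=1}^{M}\pi_{e'i}\,(1-d_i)}$$

Intensity indicator⁹:
$$I_e = \frac{f_e}{M}\cdot 100$$
where f_e is the number of responses to different items for enterprise e.

Weighted combined factor:
$$WCF_e^{MP} = \alpha\cdot MP_e + \beta\cdot\frac{I_e}{100}$$
where α and β are subjectively determined and α + β = 1.

⁹ The intensity indicator can correspondingly be calculated for each item (I_i) and for the total matrix (I_T).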
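The appendix quantities are simple row, column and matrix-wide shares, so a short computation can make them concrete. The sketch below rests on several assumptions. All function names and parameters are hypothetical. The quantities it computes are:

- c_ei: enterprise e's share of the item-i total.
- g_ei: item i's share of the enterprise-e total.
- ESC_e: the enterprise's share of the grand total.
- EMAWC_e and EMIWC_e: the same share computed with each response weighted by d_i and by (1 − d_i). The denominators are taken as the matrix-wide weighted totals, which is one reading of the garbled layout, not a confirmed definition.
- I_e: f_e / M × 100.
- WCF_e: α·MP_e + β·I_e/100, with α + β = 1.

The remaining assumptions are as follows. The terms d_i are supplied by the caller, since their definition is not part of this excerpt. f_e counts the nonzero cells in a row. The `mp` argument chooses which coefficient plays the role of MP_e.

```python
from typing import Dict, List

Matrix = List[List[float]]


def _share(num: float, den: float) -> float:
    # A zero denominator (e.g. an item nobody answered) yields a share of 0.
    return num / den if den else 0.0


def cell_weights(pi: Matrix) -> Dict[str, Matrix]:
    """c[e][i]: share of enterprise e in the total for item i (column share).
    g[e][i]: share of item i in the total for enterprise e (row share)."""
    col_tot = [sum(row[i] for row in pi) for i in range(len(pi[0]))]
    row_tot = [sum(row) for row in pi]
    c = [[_share(p, col_tot[i]) for i, p in enumerate(row)] for row in pi]
    g = [[_share(p, rt) for p in row] for row, rt in zip(pi, row_tot)]
    return {"c": c, "g": g}


def enterprise_scores(pi: Matrix, d: List[float], alpha: float,
                      mp: str = "ESC") -> List[Dict[str, float]]:
    """ESC, EMAWC, EMIWC, intensity indicator I and combined factor WCF per enterprise."""
    m = len(d)
    if m == 0 or any(len(row) != m for row in pi):
        raise ValueError("each row of pi needs one value per item in d")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if mp not in ("ESC", "EMAWC", "EMIWC"):
        raise ValueError("mp must be 'ESC', 'EMAWC' or 'EMIWC'")
    beta = 1.0 - alpha  # alpha + beta = 1
    # Matrix-wide totals used as denominators (assumed reading of the layout).
    tot = sum(sum(row) for row in pi)
    tot_d = sum(p * d[i] for row in pi for i, p in enumerate(row))
    tot_1d = sum(p * (1.0 - d[i]) for row in pi for i, p in enumerate(row))
    scores = []
    for row in pi:
        coef = {
            "ESC": _share(sum(row), tot),
            "EMAWC": _share(sum(p * d[i] for i, p in enumerate(row)), tot_d),
            "EMIWC": _share(sum(p * (1.0 - d[i]) for i, p in enumerate(row)), tot_1d),
        }
        f_e = sum(1 for p in row if p != 0)  # assumption: nonzero value = a response
        intensity = f_e / m * 100.0
        wcf = alpha * coef[mp] + beta * intensity / 100.0
        scores.append({**coef, "I": intensity, "WCF": wcf})
    return scores


if __name__ == "__main__":
    # Toy 2 x 3 matrix and d terms, for illustration only.
    pi = [[10.0, 0.0, 30.0], [20.0, 40.0, 0.0]]
    print(cell_weights(pi))
    print(enterprise_scores(pi, [1.0, 0.0, 1.0], alpha=0.5))
```

By construction, the ESC values sum to 1 over enterprises, and each nonempty column of c and row of g sums to 1. Dividing I_e by 100 puts both terms of WCF on a 0-to-1 scale.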
Do Questionnaire Recommendations Lead to Measurable Improvements? Some Experiments with Alternate Versions of Complex Survey Questions
Paul Beatty¹, Floyd J. Fowler², and Carol Cosenza²
1. Introduction
In designing survey questionnaires, researchers must often decide how best to present complex information to respondents. For example, researchers may need to decide how to organize concepts within a question, whether to use one or more questions to obtain some particular information, or whether to illustrate concepts through examples or definitions. As questions grow more complex to meet data requirements, the need for better guidance on such decisions increases. While some questionnaire design guidance derives from experimental research (see Sudman, Bradburn and Schwarz, 1996 for an overview), much of it derives from principles, past experience, and common sense. At other times, guidance is unavailable or contradictory.
We are often asked to provide recommendations on the wording and structure of complex health questions. Some recommendations have been derived from cognitive interview results, while others have been based entirely on technical review. In either case, it has generally been difficult to predict how the recommended changes would actually affect response distributions, or whether they would make the questions easier to administer. Such information would help to assess how much improvement, if any, adopting these recommendations would bring. We used a set of these questions and their alternatives as the basis for a series of experiments described below. Collectively, these experiments were designed to provide additional questionnaire design guidance on several issues related to complex questions.
2. Methods
We selected a number of draft complex questions that had been proposed for major federal health surveys. Not all of these questions had actually been administered, and some were substantially modified before going to the field. In each case, we constructed an alternative version designed to obtain the same information.
Some alternatives used almost identical wording but varied the question structure. Other alternatives included simplified definitions or different approaches to illustrating concepts (e.g., replacing a list of examples with a general definition). In this paper we discuss five separate experimental manipulations. Question alternatives were embedded in one of two questionnaire instruments. These questionnaires were administered in a random-digit dialing (RDD) telephone survey (n=450) conducted by the University of Massachusetts-Boston. The purpose of the study was split-ballot experimentation (including a number of experiments not covered in this paper), not the production of population-based estimates. We therefore accepted any adult from contacted households as a respondent. There was no attempt to convert initial refusals, and callbacks were minimal. In each experiment, we were interested in whether the survey responses differed across question versions. Of course, it is not always possible to determine which of two distributions is more accurate. However, we generally had a priori hypotheses about how the experimental manipulations would affect responses, as well as arguments about the problems we expected with one version of the question. Examining the response distributions should enable us to confirm or refute these expectations. In addition, we tape-recorded the interviews whenever respondents gave permission to do so. These recordings were behavior-coded using procedures summarized by Fowler and Cannell (1996). The purpose of behavior coding is to help us understand how easy the questions are to administer. A number of different codes were used, but this paper focuses on a select few: the number of times that the initial response to the question was inadequate; the number of times the respondent interrupted the question before it was fully read; the number of times some sort of probing was required to get a response; and the number of times the respondent asked for clarification, a repeat of the question, or similar assistance.
¹ Paul Beatty, National Center for Health Statistics, 3311 Toledo Road, Room 3218, Hyattsville, MD 20782, USA
² Floyd J. Fowler and Carol Cosenza, Center for Survey Research, University of Massachusetts-Boston, Boston, MA 02125, USA
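Each behavior-code comparison between question versions reduces to two proportions: the share of administrations of version 1 and of version 2 in which a given code (for example, an interruption) occurred. The paper does not say which significance test was used. A pooled two-proportion z-test is one standard choice, sketched here with a hypothetical function name and hypothetical counts.

```python
import math
from typing import Tuple


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test of H0: p1 == p2.

    x1 of n1 administrations of version 1 received a given behavior code,
    and x2 of n2 administrations of version 2 did.
    Returns (z, two-sided p-value) using the normal approximation.
    """
    if n1 <= 0 or n2 <= 0 or not (0 <= x1 <= n1 and 0 <= x2 <= n2):
        raise ValueError("need n > 0 and 0 <= x <= n for both versions")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        # The code was never (or always) observed under both versions.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided p-value for a standard normal Z: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical counts, not results from the study.
    z, p = two_proportion_z_test(30, 150, 15, 150)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

Because the behavior problems affect only a small share of cases, the normal approximation can be rough when a version has very few coded events. An exact test, such as Fisher's, is a safer choice in that situation.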
Behavior coding has been used for many years to identify problematic questions, i.e., those that are difficult to administer in a standardized manner. However, using behavior coding to compare the administration of alternative versions of the same question is relatively new and poses some challenges. Most of the differences we are considering are subtle, and the problems captured by behavior codes generally affect only a small proportion of the overall sample. We did not expect most differences to be statistically significant at the α = .05 level and were looking for general trends. Nevertheless, we did perform significance tests and include those results below.
3. Experiment 1: Question structure
Consider the following survey question: “What kind of place do you usually go to for routine medical care? Is it a doctor’s office, clinic or health center, hospital emergency room, hospital outpatient clinic, or some other place?” The designers of this question wanted to present an exact set of response categories to respondents. Ideally, respondents should hear the entire list of response categories, identify the one that best suits their situation, and report that single response to the interviewer. We do not want respondents to answer before hearing the entire list, because a more appropriate choice may be read after they have responded. We also want respondents to answer using one of these exact response categories. This minimizes the amount of interpretation or probing that interviewers need to obtain a quantitative response, and it maximizes standardization. However, there is ample anecdotal evidence that respondents often interrupt before the end of the list. At other times, respondents give answers that do not fit exactly into one of the response choices. Presumably this is because respondents begin formulating their answer as soon as the interviewer reaches the question mark. Some of them then stop attending to the information about the specific response choices we would like them to use. Given that, is there an alternative way to structure the question that might improve its performance?
The structure of the question above is very traditional: question followed by responses. One alternative would be to present the eligible responses before actually asking the question. Such a wording might look like this: “People can get routine care in different places, including a doctor’s office, clinic or health center, hospital emergency room, hospital outpatient clinic, or some other place. Which of those places do you usually go to for routine medical care?” In this version, the question mark comes only at the very end, after all of the response categories have been read. At that point, respondents have heard everything that we want to present to them, so there is very little opportunity for them to interrupt earlier. On the other hand, there are some reasons to dislike this structure. It is potentially more awkward. Are respondents actually able to keep the response categories in mind and use them before they know the response task those categories are meant for? Could this structure actually cause more administration problems than it fixes?
Table 1: Behavior codes for Experiment 1 (question structure)
V1 (n=156)
V2 (n=175)
signif
Interruptions
28.2%
2.9%
p