ReBEQI WP2 Theory of Planned Behaviour Questionnaires: Discussion paper
APPENDIX C: DISCUSSION PAPER
MEASUREMENT ISSUES IN THE THEORY OF PLANNED BEHAVIOUR: A supplement to the Manual for constructing questionnaires based on the Theory of Planned Behaviour
Jillian J Francis1; Marie Johnston2; Martin P Eccles1; Jeremy Grimshaw3; Eileen F S Kaner1
May, 2004
1 Centre for Health Services Research, University of Newcastle, 21 Claremont Place, Newcastle upon Tyne NE2 4AA, United Kingdom
2 University of Aberdeen, Scotland
3 University of Ottawa, Canada
The Theory of Planned Behaviour (TPB; Ajzen, 1985, 1988) and its precursor, the Theory of Reasoned Action (TRA; Fishbein & Ajzen, 1975) are prominent models in health psychology research. Between 1985 and April 2004, the TPB featured in 622 papers included in the PsycINFO database and 230 papers included in the Medline database, and the frequency has increased during each five-year period since the publication of the theory (for the combined databases: 8 in 1985-1989; 107 in 1990-1994; 312 in 1995-1999; 425 in 2000-04/2004). This theory has been applied to the maintenance and change of health-related behaviours such as smoking cessation, condom use, exercise behaviour and dietary behaviour (e.g. reviews by Armitage & Conner, 2001; Godin & Kok, 1996). More recently the TPB has been used to investigate the behaviour of health care professionals (e.g. prescription of antibiotics for uncomplicated sore throat, Walker, Grimshaw & Armstrong, 2001; radiology referral for patients with lower back pain, Eccles, Bonetti, Johnston, Steen, Grimshaw, Baker, Walker & Pitts, 2002). In this context, TPB studies have been used in process evaluation (to identify mechanisms underlying interventions that attempt to change behaviour) and to identify the specific beliefs associated with professionals’ adherence to clinical guidelines.

The TPB proposes that intention and perceived control over the behaviour are the proximal predictors of behaviour, and that the three predictors of intention are attitudes, subjective norms and, again, perceived behavioural control. These three predictors are ‘latent’ variables, that is, they cannot be directly observed, but must be inferred from questionnaire responses. They may be measured either directly (e.g.
by asking people to report whether their attitude to the behaviour is favourable or unfavourable) or indirectly (by asking people about specific beliefs and then combining the scores according to theoretical principles to infer whether the attitude is favourable or unfavourable). TPB constructs are measured using questionnaires that are tailored to the specific behaviour of interest, following a preliminary qualitative study that determines item content. The behaviour of interest is first carefully defined in terms of doing what, to whom, where and when (Action, Target, Context and Time; the TACT principle).
TPB questionnaires take a ‘TACT-specific’ form, so each new TPB investigation requires the development of a tailored questionnaire. Because of the breadth of potential applications of the TPB in health behaviour and health care, a range of researchers across several disciplines are interested in using this model. For this reason, we have written a manual (Francis, Eccles, Johnston, Walker, Grimshaw, Foy, Kaner, Smith, Bonetti, 2004) to guide researchers in constructing questionnaires to measure the TPB constructs. The manual sets out the steps in creating a TPB questionnaire, including initial qualitative work, wording of questions, response formats and scoring. There are already several sources of advice about constructing TPB questionnaires (e.g. Ajzen, 2004; Conner & Sparks, 1995; Godin & Kok, 1996), but the manual referred to here seeks to provide a streamlined guide that integrates information from a range of sources. The relevant theoretical and research literature is sometimes confusing for researchers outside the field of psychology, as it contains diverse views about how to operationalise the theory. This paper raises some major points of conflict and presents arguments in an attempt to resolve them. In order to assist researchers to construct questionnaires using the manual, it was necessary to make decisions about issues that are often debated and to recommend a measurement strategy. Our aim in being prescriptive rather than discursive in the manual is not to stifle consideration of these issues, merely to assist those who find it efficient to follow a clear set of guidelines for constructing questionnaires. However, discussion of our operational decisions in the context of current and longstanding debates is required, and we provide it in this paper. 
The following issues are discussed in this paper: (1) Reasons why it is advisable to include in questionnaires both direct and indirect measures of the predictor variables; (2) The practice of scoring the indirect measures using the multiplicative composite approach; (3) The choice of appropriate endpoints when constructing response formats for measuring beliefs; (4) Appropriate tests of reliability of indirect measures; and (5) The suggestion that completing a questionnaire constitutes an intervention that may change behaviour, thereby confounding systematic attempts to use TPB questionnaires in evaluating interventions.
1. It is advisable to use both direct and indirect measures of the predictor variables

The predictor variables are attitude to the behaviour; perceived social pressure to perform the behaviour; and perceived control over the behaviour. Direct measures take a generic form that is potentially applicable to a range of behaviours (e.g. Doing X is harmful/beneficial; pleasant/unpleasant; good/bad; worthless/valuable, for measuring attitude). There are obvious advantages of this approach. First, questionnaire items are easy to develop, as the same format can be used for a range of behaviours without the necessity to develop a different questionnaire for every behaviour under investigation. Second, responses with respect to different behaviours are directly comparable. A less obvious feature of this approach is that it assumes that people have direct access to these evaluations and can accurately give a ‘summary report’ of an underlying cognitive structure that may include components that are complex (i.e. consisting of subcategories), ambivalent (i.e. consisting of some positive and some negative beliefs) or irrelevant (i.e. unlikely to influence behaviour).

Indirect measures, in contrast, take a specific form (e.g. By doing X it is unlikely … likely that I will have to review the patient’s medication). Such questions are applicable only to the behaviour being studied. Before constructing such a question it is usual to conduct a qualitative study to elicit the beliefs that most often come to mind among individuals in the target population. The TPB questionnaire includes questions about the likelihood that a belief itself is true, and also questions about the desirability – or undesirability – of the outcome described (e.g. Having to review the patient’s medication is extremely undesirable . . . extremely desirable).
By weighting (multiplying) perceived likelihood by a number representing outcome desirability, an estimate can be made of the size of the contribution of a specific belief to global attitude, relative to the size of the contributions of other beliefs. This ‘indirect’ measurement approach does not make the assumption that individuals can give a summary estimate of their global attitude. However, it makes other assumptions. It assumes that people can accurately report their beliefs in a probabilistic way and can also report relative weightings. It further assumes that attitudes are composed of a rational combination of these weighted probabilities. Another complication is that this measurement
method depends on the development of questionnaire items that, together, have sufficient content validity – that cover the breadth of content of beliefs about the behaviour – to correlate well with the direct measure. It is likely that TPB questionnaires vary in this respect, as a function of a number of practical constraints such as desired questionnaire length, complexity of the behaviour and quality of information accessed in the elicitation study. This form of method variance alone is likely to account for a large portion of the differences between study results. Such differences may not reflect inadequacies of the model itself.

Thus, the direct and indirect ways of measuring attitudes (and the other TPB predictors) make different assumptions about the cognitive structures and processes that underlie these variables and about the capacity of individuals to access and report them. Therefore, unless there are overwhelming reasons not to do so, it is good practice to include both types of measures in TPB questionnaires. It is also likely that in so doing, it will be possible to explain more variance in intentions than by using only one type of measure.

A further reason why it is advisable to use both direct and indirect measures is that the correlations between them can be used to establish convergent validity. However, a problem associated with this is that, if correlations are unacceptably low, it is not at all obvious which type of measure is the ‘gold standard’ or criterion against which the validity of the other should be assessed. In such cases, one solution is to use an intention measure as the criterion. This sounds dangerously like circular reasoning if the aim of the study is to test the TPB model. However, given that hundreds of studies and several meta-analyses have shown that the model is robust, it seems reasonable to accept the model and, with some caution, to evaluate the validity of both direct and indirect measures against intention scores.

2. The scoring of indirect measures using the multiplicative composite approach

It has been argued that using multiplicative composites is unsatisfactory from a statistical point of view because it involves multiplying entities by zero, when zero is part of an arbitrary scale rather than a ‘true score’ (e.g. Bagozzi, 1984; Schmidt, 1973). This claim requires some serious attention. As explained above, attitudes are measured ‘indirectly’ by weighting the perceived likelihood of a behavioural belief by a number representing the
desirability of the outcome and summing the weighted scores. In a similar way, subjective norms are measured by weighting normative beliefs (about whether specific groups or individuals are perceived to exert pressure on the responder to enact the behaviour) by the responder’s motivation to comply with these social pressures. Perceived behavioural control is measured indirectly by asking responders to estimate the probability that controlling factors will occur and these probabilities are weighted by the responder’s estimate of the power that these factors have to influence the behaviour. We argue that the statistical claim is based on confusion between multiplication as an interaction – which does require a scale that includes a ‘true’ zero – and as a weighting process – which is not subject to this constraint. An example of a multiplicative composite is illustrated in Figure 1.
B1. Patients get upset if I measure their blood pressure
    Unlikely  1  2  3  4  5  6  7  Likely

B2. If I measure a patient’s blood pressure I have to see patients more often
    Unlikely  1  2  3  4  5  6  7  Likely

E1. Patients getting upset is …
    Extremely undesirable  -3  -2  -1  0  +1  +2  +3  Extremely desirable

E2. Having to see patients more often is …
    Extremely undesirable  -3  -2  -1  0  +1  +2  +3  Extremely desirable

Figure 1. Example of questionnaire items to illustrate a multiplicative composite. Note. B = item to assess a behavioural belief; E = item to assess an outcome evaluation; bolded numbers indicate hypothetical responses (the bolding is not preserved in this text version).
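The weighting and summing described above for the three predictors can be written compactly. The notation is an assumption introduced here for illustration and does not appear in the original text: $b_i$ is the strength of behavioural belief $i$ and $e_i$ its outcome evaluation; $n_j$ is normative belief $j$ and $m_j$ the motivation to comply with that referent; $c_k$ is the strength of control belief $k$ and $p_k$ the perceived power of that factor.

```latex
A_{\text{indirect}} = \sum_{i} b_i \, e_i, \qquad
SN_{\text{indirect}} = \sum_{j} n_j \, m_j, \qquad
PBC_{\text{indirect}} = \sum_{k} c_k \, p_k
```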
To combine these responses into an attitude score, the response for behavioural belief B1 is weighted by the response for outcome evaluation E1. Thus, even if a behavioural belief is held to be likely, it will not impact on overall attitude if the individual’s evaluation is that, if it occurred, it would be neither a good nor a bad consequence of the behaviour. Conversely, even if the behavioural belief is held to be not very likely, it will impact (negatively) on the overall attitude score if the individual’s evaluation is that, if it occurred, it would be undesirable (see Figure 1, items B2 and E2). To allow for both kinds of judgements (behavioural belief and outcome evaluation), and to include an appropriate range of beliefs, the weighted beliefs are summed to create an overall (‘indirectly’ measured) attitude score. At this point it is worth noting that this summated weighting method is widely used across a range of theoretical and disciplinary perspectives. The subjective expected utility model of behavioural decision theory (Edwards, 1954), the instrumentality-value model of attitude (Rosenberg, 1956) and the expectancy-value models of social psychology (such as the TPB) have enjoyed wide currency for several decades. The statistical argument thus calls into question the validity of thousands of studies conducted over several decades. It is thus useful to consider whether the argument is valid. Although this statistical problem was pointed out three decades ago (e.g. Schmidt, 1973), it has been brought into clear focus with the recent publication of a thorough critique and recommended solutions by French and Hankins (2003). Focusing on the TPB, these authors make serious claims about the inadequacies of scoring the antecedent variables, e.g. behavioural beliefs and outcome evaluations, which together are proposed to be correlated with direct measures of attitudes. 
This argument is then generalised to the other composites in the model (normative beliefs and motivation to comply to predict subjective norms; control belief strength and control belief power to predict perceived behavioural control).

Problems with Indirect Measures of Attitudes

French and Hankins (2003) give a number of reasons why the multiplicative approach is flawed. First, “for this procedure of correlating the multiplicative composites derived in this way to be statistically sensible, a ratio scale is needed with a true rational zero” (p. 38). The
authors go on to argue that there is no true zero in a scale from -3 to +3 for measuring behavioural beliefs because it would be more sensible to treat them as probabilities and score them from 0 (impossible) to 1 (certain). This argument is very persuasive. Furthermore, it has a long history in psychology, according with Tolman’s (1932) views about ‘expectancies’ and with Fishbein and Ajzen’s (1975) advice that behavioural beliefs are subjective probabilities (supported by their example, scored from 0 to 1; p. 29)1. Thus, a unipolar scoring system for behavioural beliefs appears to be the most sensible approach. It does not necessarily follow from this, however, that the same argument applies to the other components of indirect measures (outcome evaluations, normative beliefs, and so on). This problem does not permit a simple rescaling solution, as French and Hankins point out. The choice of scaling (-3 to +3 versus +1 to +7) makes a difference to the rank orderings of individuals’ expectancy-belief scores and therefore influences the size of the correlation coefficients obtained. Substantial empirical and statistical evidence supports this claim. French and Hankins conclude that “potentially, this invalidates all the analyses examining the association between beliefs and attitude reported in the many TRA and TPB studies” (p. 40). However, from the point that correlations differ for the two methods, it does not logically follow that both of these methods are flawed; it simply follows that at least one of them is flawed. Perhaps it is preferable to acknowledge the problem and search for a suitable solution. The second problem is that, if both behavioural beliefs and outcome evaluations are scaled as -3 to +3, then multiplying them raises the issue of the double negative. Thus, a consequence that is both likely and desirable (+3 x +3 = +9) contributes to positive attitude to the same extent as a consequence that is both unlikely and undesirable (-3 x -3 = +9). 
This is a problem that threatens the face validity of such a scoring system2 (illustrated in Table 1) and so it seems appropriate to reject the option of scoring both variables on a -3 to +3 scale.
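The double-negative problem, and the way the choice of belief scale can reorder respondents, can be checked with a few lines of arithmetic. The sketch below is illustrative only: the two respondents and the `product` helper are hypothetical, and each respondent is assumed to hold a single belief.

```python
# Hypothetical respondents, each holding one belief (bipolar coding, -3..+3).
# Respondent A: outcome seen as extremely unlikely and extremely undesirable.
# Respondent B: outcome seen as fairly likely and fairly desirable.
respondents = {"A": (-3, -3), "B": (+2, +2)}  # (belief, evaluation)


def product(belief, evaluation, unipolar=False):
    """Weight a belief by its evaluation.

    With unipolar=True the belief is recoded from -3..+3 to +1..+7
    by adding 4; the evaluation stays bipolar.
    """
    if unipolar:
        belief += 4
    return belief * evaluation


bipolar = {name: product(b, e) for name, (b, e) in respondents.items()}
unipolar = {name: product(b, e, unipolar=True) for name, (b, e) in respondents.items()}

print(bipolar)   # {'A': 9, 'B': 4}
print(unipolar)  # {'A': -3, 'B': 12}
```

With two bipolar scales, respondent A appears more favourable than respondent B; recoding only the belief scale reverses that ordering, which is why the two codings can yield different correlations.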
1 However, they do recommend that bipolar scoring “may be preferable in many practical situations” (p. 61, Note 7).
2 Interestingly, 30 years ago this very property of a scaling system that permits a double negative was seen as an advantage by Mitchell (1974, p. 1064).
Table 1 presents a hypothetical example illustrating both these scoring methods. According to both methods, a total attitude score of zero would represent a neutral attitude whereas positive and negative numbers would represent, respectively, attitudes for and against this behaviour. There is thus a meaningful zero in this scale. The sense of these responses (from a hypothetical research participant) suggests that, on balance, the responder is slightly against ordering radiology tests for lower back pain. However, the double negative in item 4 results in an apparently positive attitude when two bipolar scales are used. A comparison of the face validity of Items 2 and 4 suggests that Item 2 represents a more strongly positive belief than Item 4; yet, the double negative results in a score of +9, compared with only +4 for Item 2. Ajzen (1991) has suggested that the solution is to use ‘optimal scaling’, whereby the scaling decision is made after patterns in the data have been detected. According to this view, the perceptions of study participants may be better reflected by a unipolar scale in some studies and by a bipolar scale in others. It seems reasonable to accept French and Hankins’ argument that this approach makes unacceptable assumptions about the similarities and differences between responders in different studies and thus poses problems of generalisability. The third problem follows from the analytic strategy suggested by Cohen (1978). Using this standard statistical approach, behavioural beliefs and outcome evaluations are entered into a hierarchical regression equation on the first step and the belief-evaluation interaction (the multiplicative composite) is entered at the second step. According to French and Hankins, most studies using this approach have reported that the interaction terms entered at the second step did not contribute further to the prediction of global attitude scores. 
That is, the interaction model did not add to the variance already explained by an additive model. However, it is important to note the argument of Eagly and Chaiken (1993) that a model in which behavioural beliefs and outcome evaluations are used in isolation from each other to predict attitudes is not theoretically meaningful. From this perspective, the theoretically coherent approach is to enter the multiplicatives first.
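The two steps of this hierarchical analysis can be written out as follows. The symbols are assumptions chosen to follow the notation of Formula 1: $A'$ is the predicted global attitude score, $B$ and $E$ are the behavioural belief and outcome evaluation scores, and $R^2_1$ and $R^2_2$ are the proportions of variance explained at each step (the coefficients are re-estimated at each step).

```latex
\begin{aligned}
\text{Step 1:}\quad & A' = a + \beta_1 B + \beta_2 E \\
\text{Step 2:}\quad & A' = a + \beta_1 B + \beta_2 E + \beta_3 (B \times E) \\
& \Delta R^2 = R^2_2 - R^2_1
\end{aligned}
```

On this reading, the finding reported by French and Hankins corresponds to a $\Delta R^2$ close to zero.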
Table 1. Hypothetical data to illustrate the difference between scoring using two bipolar scales and one each of bipolar and unipolar scales.

Items
1a. Ordering a radiology test for lower back pain is useless diagnostically
1b. Doing something that is useless diagnostically is
2a. Ordering a radiology test for lower back pain shows the patient that I regard the symptoms as serious
2b. Showing the patient that I regard the symptoms as serious is
3a. Ordering a radiology test for lower back pain is a waste of NHS resources
3b. Wasting NHS resources is
4a. A radiology test for lower back pain exposes the patient to unsafe levels of radiation
4b. Exposing the patient to unsafe levels of radiation is

Scores
        Two bipolar scales          One bipolar, one unipolar scale
Item    B       E       B x E       B       E       B x E
1       +1      -2      -2          +5      -2      -10
2       +2      +2      +4          +6      +2      +12
3       +1      -1      -1          +5      -1      -5
4       -3      -3      +9          +1      -3      -3

Behavioural belief (B). Two bipolar scales: range -3 (extremely unlikely) to +3 (extremely likely). One bipolar, one unipolar scale: range +1 (extremely unlikely) to +7 (extremely likely).
Outcome evaluation (E). Both methods: range -3 (extremely undesirable) to +3 (extremely desirable).
Product terms (B x E). Possible range: ±9 for two bipolar scales; ±21 for one bipolar, one unipolar scale.
Sums of products (attitude scores). Two bipolar scales: +10 (possible range ±36). One bipolar, one unipolar scale: -6 (possible range ±84).
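The arithmetic in Table 1 can be reproduced directly. In this sketch the item values are those of the hypothetical responder in Table 1; the variable and function names are illustrative assumptions, and the unipolar belief score is obtained by adding 4 to the bipolar score.

```python
# (belief on -3..+3, evaluation on -3..+3) for items 1 to 4 of Table 1
items = [(+1, -2), (+2, +2), (+1, -1), (-3, -3)]


def attitude_score(pairs, unipolar_beliefs):
    """Sum of belief x evaluation products across items."""
    shift = 4 if unipolar_beliefs else 0  # -3..+3 becomes +1..+7
    return sum((b + shift) * e for b, e in pairs)


two_bipolar = attitude_score(items, unipolar_beliefs=False)
one_unipolar = attitude_score(items, unipolar_beliefs=True)

print(two_bipolar)   # 10
print(one_unipolar)  # -6
```

Only the coding of the belief scores differs between the two calls, yet the sign of the total changes; the difference is driven largely by the double negative in item 4.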
Proposed Solution for Indirect Measurement of Attitudes

French and Hankins (2003) recommend that the use of multiplicative composites should cease and that alternative analytic strategies should be chosen, depending on the research aims. Experimental designs (and ANOVA analyses) are effective solutions to the problem of multiplicatives if the research focuses on providing a test of the interaction between manipulated variables. This approach is appropriate if the research objective is to find whether solutions based on multiplicative composites make sense to people (but see Table 2 for a caveat to this general principle). Smith (1996) used this type of design and reported that data generated in this way gave strongest support for a model using unipolar scoring for behavioural beliefs and bipolar scoring for outcome evaluations. This is the approach that makes the most sense theoretically, as argued below. According to French and Hankins, to determine whether people use information in an additive or multiplicative way the technique called conjoint measurement analysis is appropriate. Any of the above solutions may be satisfactory from a statistical point of view, but as French and Hankins admit, they do not apply to the majority of TPB research, which is more concerned with conducting surveys to predict variance in an outcome variable.

According to French and Hankins (2003), a technically acceptable solution for TPB surveys is to use individually salient beliefs rather than ‘modal’ beliefs (where questionnaire items are constructed based on a qualitative study, so as to represent the most frequently elicited beliefs from a sample selected from the target population). When an individual is asked to list beliefs about the outcomes of an action, it is reasonable to assume that these beliefs are regarded as likely by that individual.
If that is the case, the issue of multiplicative composites may be bypassed, as outcome evaluations alone will predict attitude scores (Cronen & Conville, 1975). In practice, however, this method is unwieldy. It is much more demanding for participants, thus leading to low response rates and it may be difficult for some responders to understand the task, thus leading to missing or poor quality data3. Giving
3 Although Haddock and Zanna (1998) have established that responders are capable of answering open-ended questions, they did not investigate response rates.
examples of behavioural beliefs to help participants understand the task is problematic, as these same beliefs may be reported by participants, leaving researchers unsure about whether they are actually salient or whether giving the example artificially increases the cognitive accessibility of a belief (Bargh, Chaiken, Govender & Pratto, 1992).

French and Hankins’ (2003) concluding recommendations are that, when researchers are concerned with relating behavioural beliefs and outcome evaluations to global attitude scores, they should use the approach described by Schmidt (1973) and Cohen (1978), i.e. entering the individual constructs first and the interaction terms second; or the approach used by Haddock and Zanna (1998), i.e. using individually salient beliefs. We find neither of these solutions acceptable, the first because it is not theoretically sensible and the second because it is not operationally sensible.

Taken together, then, the arguments of French and Hankins (2003) lead to an uncomfortable pair of options. Either we accept the statistically unsatisfactory practice of correlations based on variables derived from multiplying scores on two bipolar scales, or we accept the theoretically unsatisfactory practice of using behavioural beliefs and outcome evaluations in isolation to predict attitudes. If the problem consists of choosing between two unsatisfactory methods, perhaps we need to ask whether the problem arises from the way in which the issue is understood.

Multiplicative Composites: Interactions or Weights?

We propose that, in current discussions of the problem, researchers have developed a somewhat habitual way of construing the process of multiplication as an interaction. Cohen (1982) defined an interaction as “an effect on Y, over and above the overall effects of the separate research factors U and V, due to the joint operation of U and V” (p. 54).
Rather than seeing multiplication always as the interaction between two variables, it is instructive to consider multiplication as a weighting process whereby different behavioural beliefs are assigned different levels of importance or influence. This is, we believe, in the spirit of the original theoretical formulation (Fishbein & Ajzen, 1975). In this text, Fishbein and Ajzen state that “a person’s attitude toward any object is a function of his beliefs about the object
and the implicit evaluative responses associated with those beliefs” (p. 29). It is notable that they use the phrase ‘a function of’ rather than ‘an interaction between’. Seeing multiplication as a weighting process has important implications. Two examples illustrate this (Boxes 1, 2).

Box 1

When playing contract bridge, the score awarded for winning a trick varies according to the context in which the trick is won. For example, playing in a contract to take 8 tricks with hearts as trump, if the bidder makes an extra (ninth) trick, this trick may be worth 30, 100, 200 or 400 depending on whether the opponents challenged by ‘doubling’ or not (30), and if doubled, whether the bidder’s side was ‘not vulnerable’ (100) or ‘vulnerable’ (200) and whether the bidder ‘redoubled’ or not (200 not vulnerable or 400 vulnerable). However if the bidder had been in a contract to take 10 tricks but made only nine, that ninth trick would earn zero points. Thus, the score for the same behaviour – winning one trick – is weighted (multiplied) by 400, 200, 100, 30 or zero depending on the context of the game.

In this example, multiplication is a valid process even when the scales involved are not ratio scales. For bridge, the behaviour (winning one extra trick) is scored on a nominal (dichotomous) scale and the weighting scale is an ordinal scale whose values are determined arbitrarily (by the rule book). Furthermore, the weighting scale includes zero and performance on each trick is multiplied (weighted) by the relevant number. Bridge enthusiasts may analyse many aspects of the game, but it seems certain that none will consider the weighting system in terms of a statistical interaction with the winning of tricks because, conceptually, a weighting process is very different from an interaction even though both processes involve multiplication. In the context of contract bridge, the weights 400, 200, 100, 30 and zero make no sense on their own.
So it is with the weighting of behavioural beliefs by outcome evaluations. The same level of behavioural belief will represent different contributions to attitude scores for different individuals, depending on how it is weighted by different strengths of positive and negative outcome evaluations.
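The weighting reading can be made concrete with a small sketch. The respondents and numbers below are hypothetical; beliefs are assumed to be scored +1 to +7 and evaluations -3 to +3.

```python
# One behavioural belief held equally strongly (6 on a +1..+7 scale)
# by three hypothetical respondents who evaluate the outcome differently.
belief_strength = 6
evaluations = {"dislikes outcome": -3, "indifferent": 0, "likes outcome": +2}

# Each evaluation acts as a weight on the same belief strength.
contributions = {who: belief_strength * e for who, e in evaluations.items()}

for who, value in contributions.items():
    print(f"{who}: {value}")
```

The same belief strength yields a negative, a zero and a positive contribution to the attitude score, depending only on the weight each respondent attaches to the outcome.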
Box 2

The second example more closely parallels the processes involved in creating multiplicative composites. Consider the fundamental multiple regression equation of Y scores on X scores, where a is the intercept constant and where each of k predictors is qualified by ‘beta weights’, β₁, β₂, β₃ and so on (Formula 1).

Y′ = a + β₁X₁ + β₂X₂ + . . . + βₖXₖ     (Formula 1)
These beta weights represent the relative contribution of each independent variable, X, to prediction of the dependent variable, Y (Cohen, 1978) and are identified empirically. Beta weights may be positive or negative. They may also be zero, that is, they may be attached to a variable which has absolutely no impact on the dependent variable. However, there are very few situations in which it would make sense to analyse the beta weights separately from the variables they qualify. Nor would it make sense to think of the regression analysis as an investigation of the interaction between the sum of the beta weights and the sum of the predictor variables. This is because here, also, multiplication is part of a summated weighting process, as distinct from an interaction effect. In essence, by proposing the use of an additive model, French and Hankins are suggesting that we should first examine the summed beta weights to determine whether they contribute to the prediction of Y.
In their seminal paper on expectancies, Olson, Roese and Zanna (1996) describe this process of weighting, also implying that one dimension is unipolar (subjective probability) and the other is bipolar (evaluation: good-bad). “In perhaps the best known expectancy-value formulation of attitudes, Fishbein and Ajzen (1975) propose that attitudes towards objects are based on the perceivers’ beliefs about the attributes of the object (subjective probability estimates that the object possesses various attributes), weighted by the perceivers’ evaluations of those attributes (judgments of the attributes along a good-bad dimension).” (p. 219)
In practice, weights are more commonly constant (across a sample) whereas interactions involve variables. However, weights can sometimes be variables. Indeed, Fishbein and Ajzen (1975) compared the effects of a linear model of information integration with the expectancy-value model, noting that, “according to a linear model, the weight placed on each variable is constant across stimulus persons” whereas “according to the expectancy-value model, the ‘weights’ … can vary across stimulus persons” (p. 237). Some important distinctions between weighting schemes and interaction effects are presented in Table 2.

From these examples we argue that outcome evaluations are weights. As such, they may be negative, zero or positive and they reflect the differential impact of behavioural beliefs on attitudes. Thus, it does not make conceptual sense for them to be treated as predictor variables (main effects) in their own right. Furthermore, if numbers are regarded as mere symbols that represent concepts, then it makes sense to base scaling decisions on the meaning of the concepts and not only on the properties of the numbers. Hence, a unidirectional concept (e.g. probability, as in behavioural beliefs) is better reflected by a unipolar measurement scale, whereas a bidirectional concept (e.g. good-bad, as in outcome evaluations) is better reflected by a bipolar measurement scale.

The scaling solution with respect to investigating the antecedents of attitudes, then, is to measure behavioural beliefs on a unipolar scale (+1 to +7) and to weight them using outcome evaluations measured on a bipolar scale (-3 to +3). This approach is consistent with the advice of Mitchell (1974) and with the findings of Smith (1996). It has the added advantage of resulting in a composite score whose valence indicates the direction of the attitude. This reflects a fundamental feature of attitudes as defined in the literature.
Fishbein (1967) agreed with Gordon Allport in stating that the “bipolarity in the direction of an attitude (i.e., the favourable versus the unfavourable) is often regarded as the most distinctive feature of the concept” (p. 477). A range of possible scores for which zero is the mid-point thus seems to have considerable face validity and is easy to interpret. A further argument for a bipolar weighting scale for outcome evaluations relates to the idiosyncratic and directional nature of evaluations. The example in Box 3 illustrates.
Table 2. Differences between weighting schemes and interaction effects in the context of multiple regression analysis.
Features of interactions (e.g. between variables U and V) | Features of weighting schemes (e.g. behavioural beliefs by outcome evaluations)
1. Interest is in the combined effect on the dependent variable (Y) of two constructs, not individual items. The constructs may be assessable by means of multiple items but these would be summed prior to testing for interaction. | 1. Interest is in the strength of influence on the dependent variable (attitude) of single items (Behavioural belief 1) in comparison with the strength of influence of other items (Behavioural beliefs 2, 3, 4). The items may be summed to obtain a score for the construct, but only after weighting.
2. The multiplication term “is not the interaction … but merely contains it”. For example, “for the product set UV to be made into the U x V interaction set … we need to partial U and V from it” (Cohen, 1982, p. 56; italics original). | 2. The product term in its entirety (e.g., Behavioural belief 1 x its corresponding Outcome evaluation) contains the information of interest and nothing else.
3. The effects of the multiplicative elements are mathematically symmetrical. That is, when the Y, V relationship is conditional upon the level of U, this also holds with U and V interchanged (Cohen, 1982, p. 55). They are also conceptually symmetrical. | 3. The effects of the multiplicative elements are not conceptually symmetrical because, although it makes sense to think of the items (Behavioural belief 1, Behavioural belief 2, etc.) influencing attitude to different degrees, it does not make sense to think of the weights on their own as influencing attitude.
4. Only one or two interaction terms at a time are readily interpretable, as each element in each product set is a different construct (e.g. attitude and educational attainment). | 4. Three or more product sets (weighted items) are easily interpretable, as they refer to the same construct (e.g. attitude).
5. As a result of 4, the appropriate analytic strategy is to enter each product term (interaction) in separate steps of a hierarchical regression. | 5. As a result of 4, the appropriate analytic strategy is to enter each product term (weighted value) in the same step of a hierarchical regression.
6. In the context of the TPB, there are no theoretical assumptions underlying the interaction approach: this technique is relevant to exploratory research, not theoretically based research. | 6. In the context of the TPB, the weighting procedure is based on a theoretical assumption: that individuals can effectively weight their beliefs and report those weightings.
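Point 2 of Table 2 rests on Cohen's distinction between the raw product UV and the U x V interaction. The sketch below is one possible illustration of that partialling step, not a procedure taken from the sources cited: it regresses an invented product term on a constant, U and V by ordinary least squares and keeps the residual. The scores and the function names (solve_linear, partial_out) are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def partial_out(product, u, v):
    """Residual of the product term after regressing it on a constant, u and v."""
    cols = [[1.0] * len(u), u, v]
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, product)) for ci in cols]
    b0, b1, b2 = solve_linear(xtx, xty)
    return [y - (b0 + b1 * a + b2 * b) for y, a, b in zip(product, u, v)]


# Invented scores for two constructs U and V (assumption, for illustration only).
u = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
v = [3.0, 1.0, 4.0, 1.0, 5.0, 2.0, 6.0]
uv = [a * b for a, b in zip(u, v)]

interaction = partial_out(uv, u, v)
# The residual is uncorrelated with U and V: it is what remains of the product
# once the main effects have been partialled out.
print([round(r, 3) for r in interaction])
```

Under the weighting interpretation argued for in this paper, no such partialling is applied: the product of a belief and its weight is used in its entirety.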
Box 3
Imagine that the research goal is to determine people’s attitudes to watching Film X. A qualitative study leads to identification of the following modally salient beliefs:
Film X is scary
Film X is violent
Film X features Gloria Dweeb
In Film X, the sound track has more dramatic effect than the script
To assess outcome evaluations, some researchers would construct questionnaire items in the form, ‘The scariness of a film is an important issue to me’ and so on. However, this does not clarify the critical question of whether the responder likes or dislikes scary films, likes or dislikes violent films, likes or dislikes the actor Gloria Dweeb, or prefers effective sound tracks relative to effective scripts. Thus, a weighting process ideally should elicit directional information from the responder. For some items, it is possible for the researcher to make a reasonable guess about the direction, but this may not always be accurate.
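The argument of Box 3 can be sketched numerically. In the hypothetical example below, two responders hold identical beliefs about Film X but evaluate its attributes in opposite directions. All ratings, the labels 'fan' and 'avoider', and the function name weighted_sum are invented for illustration; only the scale ranges (+1 to +7 and -3 to +3) come from the text.

```python
# Belief strength on a unipolar scale (+1 to +7): how likely is it that Film X
# is scary, is violent, features Gloria Dweeb, and has a sound track that
# outweighs the script? Both responders give the same (invented) answers.
belief_strength = [7, 6, 7, 5]

# 'Importance' ratings (+1 to +7) carry no direction: a responder who loves
# scary films and one who avoids them can both rate scariness as important.
importance = {"fan": [7, 6, 5, 4], "avoider": [7, 6, 5, 4]}

# Outcome evaluations on a bipolar scale (-3 to +3) do carry direction.
evaluation = {"fan": [3, 2, 2, 1], "avoider": [-3, -2, -2, -1]}


def weighted_sum(strengths, weights):
    """Sum of belief strength x weight across the salient beliefs."""
    return sum(s * w for s, w in zip(strengths, weights))


for person in ("fan", "avoider"):
    by_importance = weighted_sum(belief_strength, importance[person])
    by_evaluation = weighted_sum(belief_strength, evaluation[person])
    print(person, by_importance, by_evaluation)
```

With importance weights the two responders receive the same composite, whereas the bipolar evaluations yield composites of opposite sign, which is the directional information the weighting process is meant to capture.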
Indirect Measures of Subjective Norms
Having come a little way down the conceptual path, let us return to the antecedent variables relating to subjective norms and perceived behavioural control. French and Hankins deal with these constructs with a brief statement of generality: “The statistical models of how other types of respondents’ beliefs combine to produce the Subjective Norm and Perceived Behavioural Control constructs in the TRA and TPB are essentially the same as the statistical model linking ‘expectancy-value’ beliefs and Attitude. Hence, the statistical criticisms can be generalized to the relationships proposed between other types of beliefs and the Subjective Norm and Perceived Behavioural Control constructs.” (p. 39)
It may be true that the statistical problems are the same for all three constructs, but we have proposed a solution to the problem for attitudes, based on a conceptual distinction (between a weighting process and an interaction effect) relating to an arithmetic operation which is the same for both (multiplication). We have also suggested that the scaling decision should rest on the meaning of the concept to be measured (in particular, whether it is a unidirectional or bidirectional concept). Therefore, it may be illuminating to deal separately with the conceptual issues relating to the theoretical antecedents of subjective norms and perceived behavioural control.
Proposed Solution for Indirect Measurement of Subjective Norms
Subjective norms reflect perceived social pressure to engage in the behaviour. In the same way as for attitudes, indirect measurement involves summing a set of weighted beliefs. For subjective norms, normative beliefs – whether important others engage in, or approve/disapprove of, the behaviour – are weighted by how motivated the responder is to comply with that pressure. In this case the initial qualitative study elicits the individuals or groups whose opinions and practices are likely to result in perceived social pressure. For example (from Box 3), the responder’s closest friends may be going to see Film X this evening, resulting in perceived social pressure to go along. But the responder’s parents may strongly disapprove of the film, resulting in perceived social pressure not to go. The responder’s music teacher may be impressed by the sound track and think the responder should watch the film and observe how the music enhances the dramatic effects. The responder’s employer may wish her to work this evening instead of going to see the film. Which of these pressures will have the greatest influence on the behaviour?
Scores representing these pressures are weighted by the responder’s motivation to comply with each, resulting in a set of directional scores which are then summed. Here also, we maintain that a weighting process is different from an interaction effect and that weights (scores for ‘motivation to comply’) make no sense in isolation from the beliefs they qualify. We have also argued the benefits of combining a unipolar scale with a
bipolar scale so that, after weighting, the valence of the score reflects the direction of the pressure. From the example given above, it seems important to register the direction of the social pressure by using a bipolar scale (-3 to +3) to assess normative beliefs. This poses a problem for the weighting scale, because if, to be consistent with our policy for attitudes, we apply a bipolar weighting scale, we return to the unpalatable possibility of double negatives. It is logically possible for double negatives to make sense (My parents disapprove of my going to see Film X and I always do the opposite of what they say, so I’ll go). However, because the vast majority of social referents are positive, it makes sense to adopt a ‘directional hypothesis’ by predicting that, unless there are exceptional circumstances, it is reasonable to use a unipolar weighting scale reflecting (positive) motivation to comply, scored from +1 to +7.
Proposed Solution for Indirect Measurement of Perceived Behavioural Control
This construct reflects internal (self-efficacy) and external (actual control) constraints on behaviour and was the predictor added to the original TRA model, resulting in the more recent TPB. Its antecedents are proposed to be control belief strength (e.g. It may be difficult for me to find transport to the cinema) weighted by the power of that belief to influence behaviour (e.g. Difficulty in finding transport to the cinema makes it much less likely that I will go to see the film). Thinking of the concepts as presented in this example, it seems that control belief strength is a matter of probability and hence would be scored on a unipolar scale (+1 to +7). In contrast, control belief power should reflect a bidirectional judgement (less likely or more likely), which is best assessed using a bipolar scale (-3 to +3).
Taken together, these weighted control beliefs would then also be expressed in terms of a greater (positive scores) or lesser (negative scores) likelihood of engaging in the behaviour. The measurement of perceived behavioural control (PBC), however, may not be as straightforward as the Ajzen (1988) model suggests. Conner and Sparks (1995) note that “further research may be required to clarify the best way to assess the determinants of PBC” (p. 127). Several studies have investigated multiple measures of perceived control over health behaviour (e.g., Bonetti, Johnston, Rodriguez-Marin, Pastor, Martin-Aragon, Doherty,
Sheehan, 2001; Steptoe & Wardle, 2001). Furthermore, Skinner (1991) argues that the structure of control-related beliefs changes with age. Taken together, these studies suggest that perceived control over health outcomes is a multidimensional construct comprising as many as six factors. Furthermore, internal and external control beliefs with respect to health behaviours are related to other conceptualisations of control (e.g. Skinner 1991; Wallston, Wallston & De Vellis, 1978). This multidimensionality suggests that, in constructing a set of control belief items, careful attention should be given to achieving a balance between different aspects of control (as suggested by Ajzen, 2002). As we argue below, this is an area in which it is advisable to maximise content validity through ensuring breadth of content, even if this appears to be at the expense of internal consistency. Before leaving the question of measuring control, one further issue is worth highlighting. Some confusion has arisen in some studies in which the behaviour of interest is not doing something, e.g. not prescribing antibiotics for patients with sore throat. For some behaviours (e.g. X), ‘doing X’ and ‘not doing X’ are not symmetrical with respect to control issues. For example, if it is relatively easy not to prescribe antibiotics for patients with sore throat, it does not follow that it is difficult to prescribe in this situation. Indeed, people with high generalised self-efficacy are likely to report that it is often easy not to do something and also easy to do it. Salient beliefs may also differ depending on whether the behaviour of interest involves doing something or not doing it. For these reasons, when defining the behaviour of interest in a TPB study, the direction of the definition should be consistent throughout the questionnaire. 
In our example of the prescribing behaviour of general medical practitioners, studies investigating ‘prescribing’ and ‘not prescribing’ as the behaviour of interest would each require a different TPB questionnaire.
Summary of Proposed Solutions for Indirect Measurement of TPB Constructs
In summary, we have argued that the antecedent variables for each of the three predictor variables in the TPB are best measured using a weighting process involving one unipolar and one bipolar scale. The conceptual reasons to support this policy are different for each of the three predictor variables, but the properties of the resulting summed scores are the
same: each ranges from a negative number through zero to a positive number, thus reflecting both the direction and magnitude of the composite. In analysis, it makes sense to investigate the weighted beliefs, but not the weighting systems in isolation, because to have meaning, the weights (i.e. outcome evaluations; motivation to comply; and control power) require a set of corresponding beliefs to which they apply. We have argued that it is valid to use multiplicative composites as indirect measures of the predictor variables in the TPB, because they represent weighting procedures rather than interactions. Based on the principle of multiplying scores on a unipolar scale by scores on a bipolar scale, this method results in a composite score that is easily interpretable. Positive scores reflect favourable attitudes; social pressure to enact the behaviour; and control factors that make the behaviour more likely. Negative scores reflect unfavourable attitudes; social pressure not to enact the behaviour; and control factors that make the behaviour less likely. The proposed solutions for each of the predictor variables are summarised in Table 3.
Table 3. Summary of proposed solutions for indirect measurement of the predictor variables.
Construct | Beliefs | Weights
Attitude | Behavioural beliefs, scored +1 to +7 | Outcome evaluations, scored -3 to +3
Subjective Norm | Normative beliefs, scored -3 to +3 | Motivation to comply, scored +1 to +7
Perceived Behavioural Control | Control beliefs, scored +1 to +7 | Control power, scored -3 to +3
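The scoring scheme in Table 3 can be sketched as a single summed-products routine. This is a minimal illustration rather than a prescribed scoring procedure: the scale ranges come from Table 3, while the names SCALES, composite and possible_range, and the example ratings, are hypothetical.

```python
UNIPOLAR = range(1, 8)   # +1 to +7
BIPOLAR = range(-3, 4)   # -3 to +3

# Scale assigned to beliefs and to weights for each construct (from Table 3).
SCALES = {
    "attitude": {"beliefs": UNIPOLAR, "weights": BIPOLAR},
    "subjective_norm": {"beliefs": BIPOLAR, "weights": UNIPOLAR},
    "perceived_behavioural_control": {"beliefs": UNIPOLAR, "weights": BIPOLAR},
}


def composite(construct, beliefs, weights):
    """Return the sum of belief x weight products for one responder."""
    scales = SCALES[construct]
    if len(beliefs) != len(weights):
        raise ValueError("each belief needs exactly one corresponding weight")
    for b, w in zip(beliefs, weights):
        if b not in scales["beliefs"] or w not in scales["weights"]:
            raise ValueError(f"score out of range for {construct}: {b}, {w}")
    return sum(b * w for b, w in zip(beliefs, weights))


def possible_range(n_items):
    """Lowest and highest possible composite: each product lies in -21..+21."""
    return (-21 * n_items, 21 * n_items)


# Invented ratings for the Film X referents: friends, parents, music teacher
# and employer (normative belief, then motivation to comply).
score = composite("subjective_norm", [3, -3, 2, -2], [6, 3, 2, 5])
print(score, possible_range(4))
```

Because every construct pairs one unipolar with one bipolar scale, the composite for any number of items is centred on zero, so its sign can be read as the direction of the attitude, pressure or control influence.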
3. Appropriate endpoints and response formats
The ideal number of response options is discussed by several authors (e.g. Ajzen, 1988; Godin & Kok, 1996). Consideration may be given to whether the responders in the sample are likely to be able to make valid distinctions between, say, 5, 6 or 7 levels of
likelihood and whether it is appropriate to offer a neutral response. Empirical work on the number of response options was done by Osgood, Suci & Tannenbaum (1957), who advised that the ideal number of points depends on the sample. More highly educated groups and more motivated groups are likely to manage a greater number of options. It seems that the majority of TPB studies have used student samples and a 7-option response format. Although pilot testing of a particular questionnaire may lead researchers to use a lower number of options, this will reduce potential variance in the data set and so should be done with caution. The labelling of endpoints is an important issue and there is some variation reported in the research literature. Of course, the choice depends on the stem of the question. An appropriate match between stem and endpoint is a significant consideration in maximising the face validity of the questions (and hence in enabling responders to give coherent answers). First, endpoints for direct measures of the predictor variables are considered. Direct measures of attitudes (i.e. whether the person is in favour of or against enacting the behaviour in question) usually involve semantic differential scales (Osgood et al., 1957), in which adjective pairs represent opposite ends of a dimension (e.g. good – bad; useful - useless). Valois & Godin (1991) have demonstrated that adjective pairs should be selected carefully, as the same set of endpoints can have different levels of consistency and predictive power in the context of different behaviours. Direct measures of subjective norms (i.e. perceived social pressure to enact the behaviour or not) are of two kinds: injunctive norms are people’s perceptions of what important other people think they should do. These are represented by the endpoints (People who are important to me think I) should/should not and approve/disapprove (of). 
In contrast, descriptive norms are people’s perceptions of what important other people actually do. These are represented by the endpoints (People who are important to me) do/do not and always (do)/never (do). The examples given here are of ‘endpoints’ which occur in the middle of a sentence, with a stem and tail formatted around the response scale. Another option is to make a statement in a full sentence and ask the responder to indicate their level of agreement on a scale from strongly agree to strongly disagree.
Direct measures of perceived behavioural control are designed to tap the proposed internal and external dimensions of control. Internal control concerns a person’s perceived self-efficacy, i.e. whether the person feels confident about being able to enact the behaviour. It may be assessed using items involving endpoints not at all confident and extremely confident. External control concerns whether there are external constraints on the person’s ability to enact the behaviour and may be assessed using items involving endpoints completely/not at all (up to me). The three predictors of behavioural intentions may also be assessed indirectly by examining the antecedent beliefs (using the summed weighting process described above). Our earlier arguments relating to the use of multiplicative composites have some direct consequences for the choice of endpoints. If it is accepted that indirect measurement of attitudes involves a set of behavioural beliefs expressed as probabilities, the appropriate endpoints are likely/unlikely. Similarly, outcome evaluations may be expressed as directional statements by using extremely undesirable/extremely desirable or extremely bad/extremely good. Indirect measurement of subjective norms involves a set of normative beliefs expressed as directional statements (using the endpoints always do/never do and approve/disapprove) and a set of statements of motivation to comply expressed as a unipolar scale (not at all important/extremely important to me). Indirect measurement of PBC involves a set of control strength beliefs expressed as probabilities (likely/unlikely) and a set of control belief power statements expressed directionally (much less likely/much more likely). In summary, decisions about appropriate endpoints for questionnaire items are based on earlier decisions about the use of unipolar and bipolar scales. Proposed solutions are summarised in Table 4. 
In addition, some researchers strongly prefer to present items as complete sentences and to offer the endpoints, Agree/Disagree. This seems fine as long as it does not obscure the intent of the item as unipolar or bipolar. A central factor in selecting endpoints has been the issue of face validity: ensuring that there is clarity and consistency in sentences constructed as questionnaire items.
Table 4. Summary of recommended endpoints for labelling direct and indirect measurement scales for the three TPB predictor variables.
Construct | Type of measure | Recommended endpoints
Attitude | Direct | Semantic differentials, carefully selected (To complete a sentence)
Attitude | Indirect: Behavioural belief | Likely / unlikely
Attitude | Indirect: Outcome evaluation | Extremely undesirable / extremely desirable; Extremely bad / extremely good
Subjective Norm | Direct: injunctive norm | (Insert into sentence) think I should / should not; approve / disapprove
Subjective Norm | Direct: descriptive norm | do / do not; always do / never do; (After complete sentence) Agree / Disagree
Subjective Norm | Indirect: Normative belief | As for direct measurement of subjective norm
Subjective Norm | Indirect: Motivation to comply | (To complete a sentence) not at all important / extremely important
Perceived Behavioural Control | Direct: internal | (Insert into sentence) not at all confident / extremely confident
Perceived Behavioural Control | Direct: external | (To complete a sentence) completely up to me / not at all up to me
Perceived Behavioural Control | Indirect | (Insert into sentence) much less likely / much more likely
4. Using internal consistency as the criterion for reliability of indirect measures
The reliability of a measurement instrument is “the extent to which it yields consistent results over repeated observations” (Eagly & Chaiken, 1993, p. 67). Eagly and Chaiken explain that the internal consistency co-efficient, ‘alpha’, is an estimate of reliability that is based on the average of the correlations between all items in a scale. They state that “alpha is the current standard statistic for assessing the reliability of a scale composed of multiple items. … It is the most appropriate reliability measure to use for Likert and semantic differential scales because these methods assume that the items are parallel sample measures of the same attitude content domain” (p. 67). The objective of this section is to argue that Eagly and Chaiken’s view does not apply to indirect measures of attitudes, subjective norms and perceived behavioural control because beliefs relating to any one of these predictors do not necessarily belong to the same content domain. For example, Box 4 presents two behavioural beliefs, two normative beliefs and two control beliefs that may be associated with GPs’ examining the feet of patients with diabetes. On the face of it, there is no reason to suppose that responses to these pairs of items would be highly correlated with each other, so they are not obviously parallel measures of the same content domain. Hence, Eagly and Chaiken’s advice suggests that alpha reliability is not appropriate for these indirect measures.
Box 4
The following pairs of beliefs do not necessarily belong to the same content domain. Poor correlations between scores on these items would not necessarily indicate poor validity.
Behavioural beliefs:
Examining the feet of my patients with diabetes encourages them to care for their feet
Examining the feet of my patients with diabetes may alert me to circulation problems
Normative beliefs:
Other GPs examine the feet of their patients with diabetes
Chiropodists would approve of my examining the feet of my patients with diabetes
Control beliefs:
Examining the feet of my patients with diabetes can be done quickly
Some of my patients with diabetes have poor foot hygiene
It is generally agreed (e.g. Weinstein & Sandman, 2002) that expectancy–value models aim to represent the cognitive processes by which expected benefits, advantages or positive evaluations are weighed against expected costs, disadvantages or negative evaluations. That is, attitudes or other constructs involving evaluations are composed of positive and negative components, with negatives subtracted from positives to give an overall evaluation. In other words, such constructs as attitudes are not necessarily internally consistent. Fishbein (1967) argues this point convincingly: “while an individual’s attitude will be highly correlated with an estimate based on a consideration of many of his beliefs, it may be uncorrelated or even negatively correlated with any single belief considered in isolation” (p. 480). To reject inconsistent items is akin to rejecting all the negative beliefs about a behaviour and including only the positive beliefs, thereby not only losing important information but also producing an attitude score that is more extreme than the ‘true’ attitude. It is little wonder, then, that attitudes measured in this way are often not good predictors of behaviour. It is also hardly surprising that, by adding another construct called attitude ambivalence, prediction of behaviour can be improved: adding this new construct is like putting back the same information that was earlier rejected because of its inconsistency with the majority of behavioural belief x evaluation products.
Eagly and Chaiken (1993) go on to explain the relation between reliability and validity. “The reliabilities of the various measures [of a construct] set an upper bound for the validity co-efficients. According to classic true score theory, a measure’s [convergent] validity co-efficient cannot be greater than the square root of its reliability co-efficient” (p. 69).
However, in the case of indirect measures, the above argument shows that raising the alpha reliability co-efficient by eliminating items that correlate poorly with other items actually reduces the content validity of the attitude measure by requiring the attitude to be composed of evaluatively and probabilistically uniform beliefs. For all three predictor variables, content validity is crucially dependent on including items that cover the breadth of content of the constructs. In this situation, then, there seems to be a trade off between validity and reliability if reliability is assessed using an internal consistency criterion.
There are two possible solutions to this problem. One is to enter positively evaluated beliefs and negatively evaluated beliefs as separate predictors in a regression analysis. However, this still does not overcome the problem that beliefs representing extremely positive and probable features and slightly positive or slightly probable features will not be highly correlated with each other. But a greater problem lies in the possibility that this process does not represent the subtractive nature of the trade off between positive and negative features. Examples are presented in Boxes 5 and 6.
Box 5
Imagine that I want to measure my financial savings capacity. I earn £1,000 each month and my cost of living is £800 each month, and I want to use these data to infer savings capacity. I can assess the convergent validity of these measures against the increase in my bank balance at the end of each month. I could enter earnings and costs in a reliability analysis in which the benefit of working is 1,000 and the benefit of living is -800. Of course, neither of these will produce an alpha co-efficient unless they fluctuate over repeated measures. If there are fluctuations, then the alpha co-efficient will probably be unacceptable because when my income is 1,100 I will probably spend more, so my ‘benefit of living’ is likely to be -900, i.e. the two measures may be zero- or negatively correlated. But savings capacity would be much more easily and accurately measured by ‘adding’ the positive and negative scores, giving a savings capacity score of 200. This is likely to converge well with my bank balance data.
Box 6
Imagine that I want to measure a person’s attitude to eating fattening foods. Imagine also that, at Time 1, the person rates sweet, high-fat foods as extremely delicious (and deliciousness as extremely desirable; score of +7 x +3 = +21) and also as extremely fattening (and fattening-ness as quite undesirable; score of +7 x -2 = -14). Over repeated measures of these items I could enter deliciousness ratings and fattening-ness ratings, and subject the scores to internal consistency analysis. Because these beliefs are not logically connected, there is no reason to suppose that deliciousness beliefs will fluctuate in the same direction as fattening-ness beliefs. So there will likely be a close-to-zero correlation between weighted beliefs relating to this attitude. Thus the alpha co-efficient will be unacceptable. But attitude would be much more easily and accurately measured by adding the positive and negative scores, giving a general attitude score of +7 at Time 1. This is likely to be consistent with actual behaviour, that is, we would predict that the person will tend to eat fattening foods.
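The arithmetic of Boxes 5 and 6 can be sketched as follows. The monthly fluctuations are invented, loosely following Box 5 (income of 1,100 paired with living costs of -900), and alpha is computed with the usual variance-based formula for Cronbach's alpha, which is an assumption about how the co-efficient would be obtained in practice. The function name cronbach_alpha is hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists (one list per item)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance / variance(totals))


# Box 5 with invented fluctuations over four months: income and the
# (negative) 'benefit of living'. Higher income goes with higher spending.
income = [1000, 1100, 1000, 1200]
living = [-800, -900, -800, -950]
savings = [i + c for i, c in zip(income, living)]

# Box 6 at Time 1: belief strength x outcome evaluation for each belief.
delicious = 7 * 3     # +21
fattening = 7 * -2    # -14
attitude = delicious + fattening

print(cronbach_alpha([income, living]))  # negative for these data
print(savings, attitude)
```

For these invented data alpha is negative because the two items move in opposite directions, yet the summed scores remain perfectly interpretable as savings capacity and as an overall attitude.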
In summary, it is perfectly reasonable to propose that people may hold beliefs of opposite valence about any particular behaviour. However, psychology appears to have taken on the habit of using internal consistency as a criterion for reliability without considering the nature of the construct being measured. It would be counter-productive to lose important data about attitudes by unthinkingly applying this principle. Reliability, then, is better assessed using test-retest methods or perhaps parallel forms of the indirect attitude measurement instrument. Note that no trade off is required if reliability is assessed using the test-retest method. Test-retest reliability is recommended by Ajzen (2004).
5. Completion of a TPB questionnaire may change behaviour
Some have argued (e.g. Ogden, 2003) that completing a questionnaire may itself alter cognitions and behaviour, and that it is thus not valid to use questionnaires in social cognition research, including TPB research. This criticism is simply a specific example of the general scientific principle that measurement affects behaviour. Measurement influences the phenomenon to be measured in the physical domain as well as the psychological domain. For example, it is well known that ‘white coat syndrome’ can result in elevated blood pressure readings. So, rather than arguing that models requiring measurement are flawed, it is more instructive to ask, to what extent is this a problem, what are the underlying psychological processes and are there strategies to manage the problem? There are many models in psychology that document and explain the effect of measurement on behaviour. Large research literatures focus on the effects of research procedures such as priming, salience, order effects in questionnaires, systems whereby people generate targets, goals or plans, and other types of self-regulation. These models not only explain these effects but predict them: they are part of the theory.
An example is Self-awareness theory (Duval & Wicklund, 1972). According to this theory, attention can be turned outward to the environment (with self as subject) or inward towards the self (with self as object). Objective self-awareness activates a process in which salient aspects of the self and relevant ‘self-standards’ are compared. When people are objectively self-aware, they are likely to be discomforted by self-standard discrepancies and therefore will tend to alter their
behaviour so that it more closely meets the standard. Objective self-awareness can be triggered by fairly simple environmental cues such as a picture of a camera on a motorway or a questionnaire that asks about a particular behaviour. The idea that TPB questionnaires invite a process of self-regulation and behaviour change is thus not new. Of course, a proposed causal chain whereby questionnaire completion influences cognitions and, in turn, behaviour is not the only possibility. It is also well known that behaviour can change cognitions (e.g. Festinger & Carlsmith, 1959), and thoughtful consideration of this phenomenon is needed in some research contexts.
So, how can these effects best be managed? If administering a questionnaire may confound the effects of other interventions, for example, when a TPB survey is conducted in parallel with a controlled experiment or trial, then at least two management strategies may be effective. First, where the study procedure includes before (Time 1) and after (Time 2) measures administered to the intervention and control groups, the study design may include a second control group of participants who are matched on known variables, and who complete only the Time 2 measures. Alternatively, possible confounding effects of administering a TPB questionnaire may be minimised by using a brief form of the questionnaire (e.g. using only direct measures) at Time 1, to establish group equivalence, and administering a longer form (using direct and indirect measures) at Time 2. Changes in specific beliefs as a result of the intervention could then be inferred from differences between the intervention and control groups at Time 2. In the context of observational studies, it is important for researchers to be mindful of the possibility that studies that are apparently observational may in fact be intervention studies. There are several implications of this possibility.
First, TPB surveys that are undertaken to assess the prevalence of a particular behaviour in a population may misestimate the true population parameters. For example, if a sample of employees completes a TPB survey about their attitudes, subjective norms and perceived behavioural control with respect to exercising at the office gym and their behaviour is assessed through gym records, it is possible that completing the questionnaire will increase exercise behaviour. Estimates of exercise behaviour among the larger employee group based on this study would thus be somewhat optimistic. A further implication is that, even in the context of an exploratory pre-intervention study, health researchers may succeed in altering cognitions and behaviour in the desired (more healthy) direction. Rather than threatening the validity of the theoretical models used, this possibility represents good news for public health researchers.
In summary, the effect of questionnaire completion on behaviour is a well-documented phenomenon and, depending on the objectives of the research, may influence researchers’ decisions about study designs, materials and procedures.
To conclude, the Theory of Planned Behaviour has enjoyed nearly two decades of empirical work and has informed hundreds of investigations into health behaviour and behaviour change. The theoretical structure is adaptable to a vast range of behaviours. However, this very flexibility renders the model scientifically treacherous: every action, target, context and time must use a new measurement instrument that requires careful development, pre-testing and assessment of psychometric properties. In short, operationalising the TPB constructs is hard work. Instead of continuing to conduct TPB research and reporting that the results generally support the model, or provide a useful framework for the research, it behoves psychologists to redouble efforts to maximise the validity and reliability of the measurement tools and methods. The development of each questionnaire using rigorous methods and critical thinking is required before research findings can be used to judge the model.
This paper has addressed some recent issues and longer standing traditions in measuring the TPB constructs. The discussion is grounded in theoretical, statistical and conceptual arguments. These arguments are offered in the hope that a critical, theoretically coherent perspective on measurement will give researchers confidence in the capacity of the Theory of Planned Behaviour to be a fruitful basis for behaviour change research for some time yet.
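As a supplementary illustration of the first management strategy discussed above (adding a second control group that completes only the Time 2 measures), the comparison of group means at Time 2 might be sketched as follows. This is a minimal sketch under stated assumptions, not a prescribed analysis: the function names, the outcome (gym visits per week) and all data values are hypothetical, and a real study would use an appropriate inferential test rather than raw mean differences.

```python
from statistics import mean

def questionnaire_effect(control_t2, time2_only_t2):
    """Mean Time 2 difference between controls who completed the Time 1
    questionnaire and matched controls who completed only Time 2."""
    return mean(control_t2) - mean(time2_only_t2)

def intervention_effect(intervention_t2, control_t2):
    """Mean Time 2 difference between the intervention group and the
    control group (both completed the Time 1 questionnaire)."""
    return mean(intervention_t2) - mean(control_t2)

# Hypothetical Time 2 outcome data: gym visits per week (invented values).
intervention = [4, 5, 3, 4]   # Time 1 questionnaire + intervention
control = [3, 2, 3, 4]        # Time 1 questionnaire, no intervention
time2_only = [2, 2, 3, 1]     # matched controls, Time 2 measures only

q_effect = questionnaire_effect(control, time2_only)
i_effect = intervention_effect(intervention, control)
print("Estimated questionnaire effect:", q_effect)
print("Estimated intervention effect:", i_effect)
```

Under these assumptions, a non-zero questionnaire effect would suggest that completing the Time 1 measures itself changed behaviour, which would need to be taken into account when interpreting the intervention effect.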
References
Ajzen I (1985). From intentions to actions: A theory of planned behavior. In J Kuhl, J Beckmann (Eds), Action control: From cognition to behavior (pp. 11-39). New York: Springer.
Ajzen I (1988). Attitudes, personality and behavior. Milton Keynes: OUP.
Ajzen I (1991). The theory of planned behaviour. Organizational Behavior and Human Decision Processes, 50, 179-211.
Ajzen I (2002). Perceived behavioral control, self-efficacy, locus of control, and the Theory of Planned Behavior. Journal of Applied Social Psychology, 32, 665-683.
Ajzen I (2004). Website: http://www-unix.oit.umass.edu/%7Eaizen/
Armitage CJ, Conner M (2001). Efficacy of the Theory of Planned Behaviour: A meta-analytic review. British Journal of Social Psychology, 40, 471-499.
Bagozzi RP (1984). Expectancy-value attitude models: An analysis of critical measurement issues. International Journal of Research in Marketing, 1, 295-310.
Bargh JA, Chaiken S, Govender R, Pratto F (1992). The generality of the automatic attitude activation effect. Journal of Personality and Social Psychology, 62, 893-912.
Bonetti D, Johnston M, Rodriguez-Marin J, Pastor M, Martin-Aragon M, Doherty E, Sheehan K (2001). Dimensions of perceived control: A factor analysis of three measures and an examination of their relation to activity level and mood in a student and cross-cultural sample. Psychology and Health, 16, 655-674.
Cohen J (1978). Partialed products are interactions: partialed powers are curve components. Psychological Bulletin, 85, 858-866.
Cohen J (1982). “New-Look” multiple regression/correlation analysis and the analysis of variance/covariance. In G Keren (Ed.), Statistical and methodological issues in psychology and social sciences research (pp. 41-69). Hillsdale, NJ: Erlbaum.
Conner M, Sparks P (1995). The Theory of Planned Behaviour and health behaviours. In M Conner, P Norman (Eds), Predicting health behaviour (pp. 121-162). Buckingham: OUP.
Cronen VE, Conville RL (1975). Fishbein’s conception of belief strength: A theoretical, methodological, and experimental critique. Speech Monographs, 42, 143-150.
Duval TS, Wicklund RA (1972). A theory of objective self-awareness. New York: Academic Press.
Eagly A, Chaiken S (1993). The psychology of attitudes. Fort Worth, TX: Harcourt Brace Jovanovich.
Eccles MP, Bonetti D, Johnston M, Steen N, Grimshaw J, Baker R, Walker A, Pitts N (2002). The impact of audit and feedback and educational reminder messages on radiology referrals: An implementation modelling experiment.
Edwards W (1954). The theory of decision making. Psychological Bulletin, 51, 380-417.
Festinger L, Carlsmith JM (1959). Cognitive consequences of forced compliance. Journal of Abnormal and Social Psychology, 58, 203-210.
Fishbein M (1967). Attitude and the prediction of behavior. In M Fishbein (Ed.), Readings in attitude theory and measurement (pp. 477-492). New York: Wiley.
Fishbein M, Ajzen I (1975). Belief, attitude, intention, and behaviour. New York: Wiley.
Francis JJ, Eccles MP, Johnston M, Walker A, Grimshaw J, Foy R, Kaner EFS, Smith E, Bonetti D (2004). Constructing questionnaires based on the Theory of Planned Behaviour: A manual for health services researchers. Centre for Health Services Research, University of Newcastle upon Tyne, UK.
French DP, Hankins M (2003). The expectancy-value muddle in the theory of planned behaviour – and some proposed solutions. British Journal of Health Psychology, 8, 37-55.
Godin G, Kok G (1996). The Theory of Planned Behaviour: A review of its applications to health-related behaviours. American Journal of Health Promotion, 11, 87-98.
Haddock G, Zanna MP (1998). On the use of open-ended measures to assess attitudinal components. British Journal of Social Psychology, 37, 129-149.
Mitchell TR (1974). Expectancy models of job satisfaction, occupational preference and effort: A theoretical, methodological, and empirical appraisal. Psychological Bulletin, 81, 1053-1077.
Ogden J (2003). Some problems with social cognition models: A pragmatic and conceptual analysis. Health Psychology, 22, 424-428.
Olson JM, Roese NJ, Zanna MP (1996). Expectancies. In ET Higgins, AW Kruglanski (Eds), Social psychology: Handbook of basic principles (pp. 211-238). New York: Guilford.
Osgood CE, Suci GJ, Tannenbaum PH (1957). The measurement of meaning. Urbana: University of Illinois Press.
Rosenberg MJ (1956). Cognitive structure and attitudinal affect. Journal of Abnormal and Social Psychology, 53, 367-372.
Schmidt FL (1973). Implications of a measurement problem for expectancy theory research. Organizational Behavior and Human Performance, 10, 243-251.
Skinner EA (1991). Development and perceived control: A dynamic model of action in context. In MR Gunnar, LA Sroufe (Eds), Self processes and development. The Minnesota symposia on child psychology, Vol. 23 (pp. 167-216). Hillsdale, NJ: Erlbaum.
Smith JL (1996). Expectancy, value and attitudinal semantics. European Journal of Social Psychology, 26, 501-506.
Steptoe A, Wardle J (2001). Locus of control and health behaviour revisited: A multivariate analysis of young adults from 18 countries. British Journal of Psychology, 92, 659-672.
Tolman EC (1932). Purposive behaviour in animals and men. New York: Appleton-Century-Crofts.
Valois P, Godin G (1991). The importance of selecting appropriate adjective pairs for measuring attitude based on the semantic differential method. Quality and Quantity, 25, 57-68.
Walker AE, Grimshaw JM, Armstrong EM (2001). Salient beliefs and intentions to prescribe antibiotics for patients with a sore throat. British Journal of Health Psychology, 6, 347-360.
Wallston KA, Wallston BS, DeVellis RF (1978). Development of the Multidimensional Health Locus of Control (MHLC) Scale. Health Education Monographs, 6, 160-170.
Weinstein ND, Sandman PM (2002). Reducing the risks of exposure to radon gas: An application of the precaution adoption process model. In D Rutter & L Quine (Eds), Changing health behaviour: Intervention and research within social cognition models (pp. 66-86). Buckingham: OUP.