CHAPTER 6
Relationships between two or more variables: Diagrams and tables
Overview
• Most research in psychology involves the relationships between two or more variables.
• Relationships between two score variables may be represented pictorially as a scattergram (or scatterplot). Alternatively, a crosstabulation table with the scores broken down into ranges (or bands) is sometimes effective.
• If both variables are nominal (category), then compound bar charts of various sorts may be used or, alternatively, crosstabulation tables.
• If there is one score variable and one nominal (category) variable, then tables of means of the score variable tabulated against the nominal (category) variable will often be adequate. Alternatively, a compound histogram may be used.
Preparation
You should be aware of the meaning of variables, scores and the different scales of measurement, especially the difference between nominal (category) measurement and numerical scores.
PART 1 DESCRIPTIVE STATISTICS
6.1 Introduction

Although it is fundamental and important to be able to describe the characteristics of each variable in your research both diagrammatically and numerically, interrelationships between variables are more characteristic of research in most areas of psychology and the social sciences. Public opinion polling is the most common use of single-variable statistics that most of us come across. Opinion pollsters ask a whole series of questions about political leaders and voting intentions, and the answers are generally reported separately. However, researchers often report relationships between two variables. So, for example, asking whether the voting intentions of men and women differ is really asking whether there is a relationship between the variable ‘gender’ and the variable ‘voting intention’. Similarly, asking whether the popularity of the President of the USA has changed over time implies that there may be a relationship between the variable ‘time’ and the variable ‘popularity of the President’. Many of these questions seem so familiar that we regard them almost as common sense, so we should not have any great difficulty in understanding the concept of interrelationships among variables. Interrelationships between variables form the bedrock of virtually all psychological research. It is rare in psychology to have research questions that require data from only one variable at a time. Much of psychology concerns explanations of why things happen (what causes what), which is clearly about relationships between variables. This chapter describes some of the main graphical and tabular methods for presenting interrelationships between variables. Diagrams and tables often overlap in function, as will become apparent in the following discussion. We should emphasise that graphs and tables are not simply ways of smartening up a report or dissertation.
Their function in statistical analysis is much deeper than this: they are at the heart of the researcher's analytic work. Graphs and tables should be the mainstay of a good statistical analysis, not merely its end product. Their role is crucial from the start of the analysis, as part of the process of familiarising yourself with your data, which leads to an understanding of what is going on in the data. So the initial stage is to look at charts that show the distribution of each of the variables in your study. This can help you identify problems such as very skewed distributions or the bunching and clustering of scores around particular values. Then you can move on to the graphs and tables that allow you to understand the relationships between two variables. This may well be your first indication that your expectations are being confirmed by your data. But it may also show that the relationships you are expecting are more complex than you imagined, or that outliers create the spurious appearance of a relationship between your variables when there is no relationship for the bulk of the data. You have to enter this phase with an open mind, since it involves getting to understand your data and becoming familiar with its characteristics; this is why you do research. These may seem like very basic procedures compared with the riches of more advanced statistics, but they are basic because they are the base from which your analysis is built. Figure 6.1 gives the key steps to consider when describing relationships between two variables in diagram and table form.
6.2 The principles of diagrammatic and tabular presentation

Choosing appropriate techniques to show relationships between two variables requires an understanding of the difference between nominal category data and numerical score data. If we are considering the interrelationships between two variables (X and Y), then the types of variable involved are as shown in Table 6.1. Once you have decided to which category your pair of variables belongs, it is easy to suggest appropriate descriptive statistics. We have classified the different situations as type A, type B and type C. Thus type A has both variables measured as numerical scores, type B has both variables measured on the nominal category scale of measurement, and type C has one variable of each kind.

Table 6.1 Types of relationships based on nominal categories and numerical scores

                                  | Variable X = numerical scores | Variable X = nominal categories
Variable Y = numerical scores     | type A                        | type C
Variable Y = nominal categories   | type C                        | type B

FIGURE 6.1 Conceptual steps for showing relationships between two variables
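Table 6.1 amounts to a simple two-way decision. The Python sketch below encodes it; the function name `relationship_type` and the `PRESENTATION` lookup are hypothetical illustrations, and the sketch assumes you have already judged whether each variable is a numerical score or a nominal category, which is the real decision the researcher has to make.

```python
def relationship_type(x_is_numeric: bool, y_is_numeric: bool) -> str:
    """Classify a pair of variables as type A, B or C, following Table 6.1."""
    if x_is_numeric and y_is_numeric:
        return "A"  # both numerical scores
    if not x_is_numeric and not y_is_numeric:
        return "B"  # both nominal categories
    return "C"      # one nominal category variable, one numerical score variable


# Presentation options for each type, as listed in the chapter overview.
PRESENTATION = {
    "A": "scattergram, or a crosstabulation of banded scores",
    "B": "crosstabulation table or compound bar chart",
    "C": "table of means (and other statistics) per category, or a compound histogram",
}

# The two examples from the introduction; time is treated as a numerical score.
examples = {
    "gender vs voting intention": (False, False),
    "time vs popularity of the President": (True, True),
}
for name, (x_num, y_num) in examples.items():
    kind = relationship_type(x_num, y_num)
    print(f"{name}: type {kind} -> {PRESENTATION[kind]}")
```

Note that type C is symmetrical: it does not matter which of the two variables is the nominal one.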
6.3 Type A: both variables numerical scores

Where both variables take the form of numerical scores, generally the best form of graphical presentation is the scattergram or scatterplot. This is a graph in which the values on one variable are plotted against the values on the other variable. The most familiar form of such a graph plots a variable against time; graphs like this are very familiar from newspapers, especially the financial sections (see Figure 6.2).
FIGURE 6.2 The dramatic fall in share price in the Timeshare Office Company

FIGURE 6.3 A scattergram showing the relationship between two variables
Time is no different, statistically speaking, from a wide range of other numerical scores. Figure 6.3 is an example of a scattergram from a psychological study, and you will see that the essential features remain the same. In Figure 6.3, the point marked with an arrow represents a case whose score on the X-variable is 8 and whose score on the Y-variable is 120. It is sometimes possible to see that the points of a scattergram fall more or less on a straight line. The straight line that best fits the points of a scattergram is called the regression line, and Figure 6.3 includes the regression line for its points. One complication you will sometimes come across is that several points on the scattergram overlap completely. In these circumstances you may see a number next to a point, giving the number of overlapping points at that position on the scattergram. In line with general mathematical notation, the horizontal axis (horizontal dimension) is called the X-axis and the vertical axis (vertical dimension) is called the Y-axis. It helps to label one set of scores the X scores, since these belong on the horizontal axis, and the other set the Y scores, because these belong on the vertical axis (Figure 6.4).
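Counting overlapping points amounts to counting how many cases share exactly the same (X, Y) pair. A minimal Python sketch follows; the scores are hypothetical (only the point X = 8, Y = 120 echoes the text) and are not the data behind Figures 6.3 or 6.4, and `overlap_counts` is an illustrative name.

```python
from collections import Counter


def overlap_counts(points):
    """Map each distinct (x, y) position to the number of cases plotted there."""
    return Counter(points)


# Hypothetical (x, y) scores for eight cases.
points = [(8, 120), (5, 90), (8, 120), (3, 70), (5, 90), (8, 120), (10, 140), (6, 100)]

for (x, y), n in sorted(overlap_counts(points).items()):
    # A position with n > 1 is where a scattergram would show a count (or n 'petals').
    label = f"  <- {n} overlapping cases" if n > 1 else ""
    print(f"X = {x:>2}, Y = {y:>3}{label}")
```

Eight cases therefore produce only five visible points, which is exactly why the overlaps need to be marked.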
FIGURE 6.4 A scattergram with the X- and Y-axes labelled and overlapping points illustrated

Table 6.2 Use of bands of scores to tabulate the relationship between two numerical score variables

                | Variable X
Variable Y      | 1–5 | 6–10 | 11–15 | 16–20 | 21–25
0–9             | 15  | 7    | 6     | 3     | 4
10–19           | 7   | 12   | 3     | 5     | 4
20–29           | 4   | 9    | 19    | 8     | 4
30–39           | 1   | 3    | 2     | 22    | 3
40–49           | 3   | 2    | 3     | 19    | 25
In Figure 6.4, overlapping points are marked not with a number but with lines around the point on the scattergram. These are called ‘sunflowers’: the number of ‘petals’ is the number of cases overlapping at the same point. So if there are two ‘petals’, then two people have exactly the same pattern of scores on the two variables; if there are three ‘petals’, then three people do. Another way of indicating overlaps is simply to put the number of overlapping cases next to the scattergram point.

Apart from cumbersomely listing all of your pairs of scores, it is often difficult to think of a succinct way of presenting data from pairs of numerical scores in tabular form. The main possibility is to categorise each of your score variables into ‘bands’ of scores and express the data as frequencies of occurrence in these bands; a table like Table 6.2 might be appropriate. Such tables are known as ‘crosstabulation’ or ‘contingency’ tables. In Table 6.2 there does seem to be a relationship between variable X and variable Y: people with low scores on variable X also tend to get low scores on variable Y, and high scorers on variable X also tend to score highly on variable Y. However, the trend is less easily discerned in the table than in the equivalent scattergram.
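Banding two score variables and counting the pairs in each combination of bands can be sketched in a few lines of Python. The band widths below mirror Table 6.2 (X in bands of 5 starting at 1, Y in bands of 10 starting at 0), but the score pairs are hypothetical and the helper names `band_label` and `banded_crosstab` are illustrative, not taken from any package.

```python
from collections import Counter


def band_label(score, start, width):
    """Return the label of the band (e.g. '6-10') that contains `score`."""
    lower = start + ((score - start) // width) * width
    return f"{lower}-{lower + width - 1}"


def banded_crosstab(pairs, x_start, x_width, y_start, y_width):
    """Count (Y band, X band) combinations for a list of (x, y) score pairs."""
    return Counter(
        (band_label(y, y_start, y_width), band_label(x, x_start, x_width))
        for x, y in pairs
    )


# Hypothetical (x, y) score pairs, not the data behind Table 6.2.
pairs = [(2, 5), (3, 8), (7, 14), (12, 25), (18, 33), (19, 36), (22, 44), (24, 47)]

table = banded_crosstab(pairs, x_start=1, x_width=5, y_start=0, y_width=10)
for (y_band, x_band), count in sorted(table.items()):
    print(f"Y {y_band:>5}  X {x_band:>5}  f = {count}")
```

Because `Counter` returns 0 for a combination it has never seen, empty cells of the full table can be read off as zero frequencies.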
6.4 Type B: both variables nominal categories

Where both variables are in nominal categories, it is necessary to report the frequencies in all of the possible groupings of the variables. If you have more than a few nominal categories, the tables or diagrams can become too big. Take the imaginary data shown in Table 6.3 on the relationship between a person’s gender and whether they have been hospitalised at any time in their life for a psychiatric reason. These data are ideal for certain sorts of tables and diagrams because each variable has only two categories. A suitable table for summarising these data might look like Table 6.4, which is called a contingency or crosstabulation table. The number (frequency) of people in each combination of categories is instantly obvious from this table. You might prefer to express the table in percentages rather than frequencies, but some thought needs to go into the choice of percentages. For example, you could express the frequencies as percentages of the total number of people, males and females combined (Table 6.5).
Table 6.3 Gender and whether previously hospitalised for a set of 89 people

Person | Gender | Previously hospitalised
1      | male   | yes
2      | male   | no
3      | male   | no
4      | male   | yes
5      | male   | no
...    | ...    | ...
85     | female | yes
86     | female | yes
87     | female | no
88     | female | no
89     | female | yes

Table 6.4 Crosstabulation table of gender against hospitalisation

                            | Male   | Female
Previously hospitalised     | f = 20 | f = 25
Not previously hospitalised | f = 30 | f = 14
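A crosstabulation such as Table 6.4 is produced by counting the person-level records of Table 6.3 in each (gender, hospitalised) combination. The Python sketch below uses only the ten rows actually printed in Table 6.3 (persons 1 to 5 and 85 to 89), so its counts are not those of Table 6.4, which would need all 89 records; the function name `crosstab` is illustrative.

```python
from collections import Counter

# Only the ten rows printed in Table 6.3 (persons 1-5 and 85-89); the elided rows are omitted.
records = [
    ("male", "yes"), ("male", "no"), ("male", "no"), ("male", "yes"), ("male", "no"),
    ("female", "yes"), ("female", "yes"), ("female", "no"), ("female", "no"), ("female", "yes"),
]


def crosstab(rows):
    """Count how many records fall into each (gender, hospitalised) combination."""
    return Counter(rows)


counts = crosstab(records)
for status, label in (("yes", "Previously hospitalised"), ("no", "Not previously hospitalised")):
    print(f"{label:<28} male f = {counts[('male', status)]}  female f = {counts[('female', status)]}")
```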
Table 6.5 Crosstabulation table with all frequencies expressed as a percentage of the total number of frequencies

                            | Male  | Female
Previously hospitalised     | 22.5% | 28.1%
Not previously hospitalised | 33.7% | 15.7%
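The percentages in Table 6.5 divide every frequency in Table 6.4 by the grand total of 89 people. A short Python sketch of that arithmetic, using the Table 6.4 frequencies and rounding to one decimal place as the table does (`percent_of_total` is an illustrative name):

```python
# Frequencies from Table 6.4: (row, column) -> f
freqs = {
    ("Previously hospitalised", "Male"): 20,
    ("Previously hospitalised", "Female"): 25,
    ("Not previously hospitalised", "Male"): 30,
    ("Not previously hospitalised", "Female"): 14,
}


def percent_of_total(table):
    """Express every cell as a percentage of the grand total, to one decimal place."""
    total = sum(table.values())
    return {cell: round(100 * f / total, 1) for cell, f in table.items()}


for (row, col), pct in percent_of_total(freqs).items():
    print(f"{row:<28} {col:<7} {pct}%")
```

For example, 20 / 89 × 100 = 22.47, which rounds to the 22.5% shown for previously hospitalised males.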
You probably think that Table 6.5 is not much of an improvement in clarity. An alternative is to express the frequencies as percentages of males and percentages of females separately (Table 6.6). Presented this way, it is easier to see the trend for females to have been previously hospitalised for a psychiatric reason relatively more often than males (64.1% of females compared with 40.0% of males). The same data can be expressed as a compound bar chart. In a compound bar chart, information is given about the subcategories formed by a pair of variables. Figure 6.5 shows one example, in which the proportions are expressed as percentages of the males and females separately. The golden rule for such data is to ensure that the number of categories is manageable. In particular, avoid having too many empty or near-empty categories. The compound bar chart shown in Figure 6.6 is a particularly bad example and is not to be copied: it fails any reasonable clarity test and is too complex to decipher quickly.
Table 6.6 Crosstabulation table with hospitalisation expressed as a percentage of the male and female frequencies taken separately

                            | Male  | Female
Previously hospitalised     | 40.0% | 64.1%
Not previously hospitalised | 60.0% | 35.9%
FIGURE 6.5 Compound percentage bar chart showing gender trends in previous hospitalisation
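The percentages in Table 6.6, and the bars of a compound percentage bar chart like Figure 6.5, use each gender's own total as the denominator (50 males, 39 females) rather than the grand total. The Python sketch below computes those column percentages from the Table 6.4 frequencies and draws a rough plain-text version of the chart, one bar per gender of an arbitrarily chosen 50 characters, with ‘#’ for previously hospitalised and ‘.’ for not. It is a text approximation of the idea, not how Figure 6.5 was produced; the function names are illustrative.

```python
# Frequencies from Table 6.4, grouped by gender (column).
freqs = {
    "Male": {"Previously hospitalised": 20, "Not previously hospitalised": 30},
    "Female": {"Previously hospitalised": 25, "Not previously hospitalised": 14},
}


def column_percentages(table):
    """Express each cell as a percentage of its own column (gender) total."""
    result = {}
    for column, cells in table.items():
        total = sum(cells.values())
        result[column] = {row: round(100 * f / total, 1) for row, f in cells.items()}
    return result


def text_bar(pct_yes, width=50):
    """Draw one percentage bar: '#' for previously hospitalised, '.' for not."""
    filled = round(width * pct_yes / 100)
    return "#" * filled + "." * (width - filled)


percentages = column_percentages(freqs)
for gender, cells in percentages.items():
    yes = cells["Previously hospitalised"]
    print(f"{gender:<7}|{text_bar(yes)}| {yes}% previously hospitalised")
```

Because every bar has the same length, the chart compares the proportions within each gender directly, which is what makes the female trend visible despite there being fewer females than males.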
FIGURE 6.6 How not to do a compound bar chart
6.5 Type C: one variable nominal categories, the other numerical scores

This final type of situation offers a wide variety of ways of presenting the relationships between variables. We have examined the compound bar chart, so it is not surprising to find that there is also a compound histogram. To be effective, a compound histogram needs to consist of:
• a small number of categories for the nominal category variable;
• a few ranges for the numerical scores.
So, for example, if we wish to plot the relationship between managers’ anxiety scores and whether they are managers in a high-tech or a low-tech industry, we might create a compound histogram like Figure 6.7, in which there are only two values of the nominal variable (high-tech and low-tech) and four bands of anxiety score (low, medium, high and very high anxiety). An alternative way of presenting such data is to use a crosstabulation table, as in Table 6.7. However, it is almost as easy to draw up a table (Table 6.8) that gives the mean, median, mode and other statistics for the anxiety scores of the two different groups.

FIGURE 6.7 A compound histogram

Table 6.7 Crosstabulation table of anxiety against type of industry

                     | Frequency of anxiety score
                     | 0–3 | 4–7 | 8–11 | 12–15
Low-tech industry    | 7   | 18  | 3    | 1
High-tech industry   | 17  | 7   | 0    | 0

Table 6.8 Comparison of the statistical characteristics of anxiety in two different types of industry

                     | Mean | Median | Mode | Interquartile range | Variance
High-tech industry   | 3.5  | 3.9    | 3    | 2.3–4.2             | 2.2
Low-tech industry    | 5.3  | 4.7    | 6    | 3.9–6.3             | 3.2
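A table like Table 6.8 is simply the same set of summary statistics computed separately for each category of the nominal variable. The Python sketch below does this with the standard `statistics` module. The anxiety scores are hypothetical and are not the data behind Table 6.8; the name `describe` is illustrative. Two assumptions are worth flagging: quartiles are computed with `statistics.quantiles` using its default ‘exclusive’ method (other packages, SPSS included, may use different quartile rules and so give slightly different interquartile ranges), and `statistics.variance` is the sample variance with an n − 1 denominator.

```python
import statistics


def describe(scores):
    """Return the Table 6.8 style summary statistics for one group of scores."""
    q1, _, q3 = statistics.quantiles(scores, n=4)  # default 'exclusive' quartile method
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "interquartile range": (q1, q3),
        "variance": statistics.variance(scores),  # sample variance (n - 1 denominator)
    }


# Hypothetical anxiety scores; these are NOT the data behind Table 6.8.
groups = {
    "High-tech industry": [1, 2, 3, 3, 3, 4, 5, 6],
    "Low-tech industry": [3, 4, 5, 6, 6, 6, 7, 9],
}

for name, scores in groups.items():
    s = describe(scores)
    q1, q3 = s["interquartile range"]
    print(f"{name}: mean {s['mean']:.2f}, median {s['median']}, mode {s['mode']}, "
          f"IQR {q1:.2f}-{q3:.2f}, variance {s['variance']:.2f}")
```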
Key points
• Never assume that your tables and diagrams are good enough at the first attempt. They could probably be improved with a little care and adjustment.
• Do not forget that tables and diagrams are there to present clearly the major trends in your data (or lack of them). There is not much point in having tables and diagrams that do not clarify your data.
• Your tables and diagrams are not means of tabulating your unprocessed data. If you need to present your data in full, then most of the methods to be found in this chapter will not help you much.
• Labelling tables and diagrams clearly and succinctly is an important part of the task; without clear titling and labelling you are probably wasting your time.
COMPUTER ANALYSIS
The SPSS Statistics instruction book to this text is Dennis Howitt and Duncan Cramer (2011), Introduction to SPSS Statistics in Psychology: For version 19 and earlier, Harlow: Pearson. Chapters 8 (tables) and 9 (diagrams) in that book give detailed step-by-step procedures for the statistics described in this chapter, together with advice on how to report the results. Figure 6.8 gives the SPSS Statistics steps for producing contingency tables, compound charts and histograms.
FIGURE 6.8 SPSS Statistics steps for contingency tables, compound charts and histograms