J. R. Statist. Soc. A (2007) 170, Part 2, pp. 425–445
Understanding the 2001 UK census migration and commuting data: the effect of small cell adjustment and problems of comparison with 1991

John Stillwell and Oliver Duke-Williams
University of Leeds, UK

[Received March 2006. Final revision September 2006]

Summary. Origin–destination statistics have been produced from the last three UK censuses. The paper describes what is new about the 2001 census interaction data on migration and commuting, considers the disclosure control methods that were applied to cells containing small values and demonstrates the problems that are associated with making comparisons with 1991 data. The effect of small cell adjustment procedures on the interaction data sets is investigated by means of selective analyses at different spatial scales. Some recommendations are made in light of the problems that were manifest in 2001.

Keywords: Census counts; Commuting; Destinations; Interaction; Migration; Origins; Small cell adjustment
1. Introduction
As in 1991, two major migration and commuting data sets are available from the 2001 UK census: the special migration statistics (SMS) and the special workplace statistics (SWS) (Rees et al., 2002; Cole et al., 2002). However, in Scotland, the SWS in 2001 have been replaced with a new set of special travel statistics (STS) that include journeys to place of study as well as to place of work. Collectively, these data sets are also known as the 2001 origin–destination statistics. In this paper we refer to them as the ‘census interaction data’. They are currently accessible to members of the academic community and data suppliers who are registered with the Census Registration Service via the ‘Web-based interface to census interaction data’ (WICID). The WICID system is the software system (Stillwell and Duke-Williams, 2003) that was developed by the Census Interaction Data Service (CIDS), the ‘data support unit’ funded by the Economic and Social Research Council that has been rebadged as the Centre for Interaction Data Estimation and Research (CIDER) from August 1st, 2006. The initial part of this paper (Section 2) aims to provide an understanding of the structure and content of the various sets of 2001 SMS, SWS and STS that were released at various times during late 2004 and early 2005 by the Office for National Statistics (ONS) and the General Register Office for Scotland. It also seeks to emphasize what is different from previous censuses. New data sets have been added to the existing database in the WICID system and researchers have been encouraged to utilize 2001 data in research projects. However, it is important to recognize limitations that are associated with the data that may obscure interpretation and inhibit time series analysis. 
One significant area for consideration is the effect on the interaction data of small cell adjustment, the method which was chosen by the ONS to eliminate the risk of disclosure in 2001. This adjustment procedure differed from the suppression methods that were used for disclosure control of interaction data in 1991 (Rees and Duke-Williams, 1997). The adoption of this new adjustment measure complicates the task of making comparisons of migration or commuting flows between 2001 and 1991. Section 3 of the paper presents selective analyses of the effect of small cell adjustment on the interaction data at various spatial scales, whereas Section 4 explores some of the other confounding features of comparative analysis between 2001 and 1991. The evidence that is presented is summarized in the concluding section of the paper, which contains recommendations for interaction data collection and adjustment by the census agencies at the next round of census taking in 2011. The final section also gives some indications of how the CIDER will evolve in the meantime to include data sets that will complement and support the analysis of the migration and commuting that censuses provide.

Address for correspondence: John Stillwell, School of Geography, University of Leeds, Leeds, LS2 9JT, UK.
© 2007 Royal Statistical Society 0964–1998/07/170425

2. The content of interaction data from the 2001 census
2.1. Spatial scales and table counts

The 2001 census interaction data are collectively much larger and more complex than those that preceded them in 1991 or in 1981. They come in three sets, where each set refers to a particular level or set of spatial units (Table 1). Level 1 refers to the local authority districts across Great Britain, a mixture of different types of local government authorities in England, Wales and Scotland, together with Parliamentary constituencies in Northern Ireland. We refer to these sets of administrative units as ‘interaction districts’. Level 2 involves ward level data and contains an amalgamation of census area statistics (CAS) wards in England, Wales and Northern Ireland, and standard table wards in Scotland. Standard tables provide the most statistically detailed tables of census wards or postcode sectors whereas CAS tables have been designed as reduced versions of the standard tables (General Register Office for Scotland, 2003) for use at a much lower level of geography, that of output areas (OAs). The wards that are used for reporting the CAS are slightly different from those used in the standard tables. This occurs because there were about 50 instances where CAS ward counts were below the permitted threshold and these spatial units were merged with neighbours. There are no differences between the CAS and standard table wards in Northern Ireland. Thus, since there is a mixture of types of ward at level 2, the spatial units are referred to collectively as ‘interaction wards’. The spatial units at level 3 across the UK are OAs, a new tier of small areas built up from unit postcodes, containing around 125 households and nesting within wards and parishes (Martin, 2002).

Table 1. Geographical units used in the 2001 SMS, SWS and STS

Country            Level 1                              Level 2                                Level 3
England            London boroughs (33)                 Census area statistics wards (7969)    Output areas (165665)
                   Metropolitan districts (36)
                   Unitary authorities (46)
                   Other local authorities (239)
Wales              Unitary authorities (22)             Census area statistics wards (881)     Output areas (9769)
Scotland           Council areas (32)                   Standard table wards (1176)            Output areas (42604)
Northern Ireland   Parliamentary constituencies (18)    Census area statistics wards (582)     Output areas (5022)
Total              Districts (426)                      Interaction wards (10608)              Output areas (223060)

The overall volume of interaction data is larger than before partly because of the flows that are produced at the new geographical scale of OAs, partly because of the inclusion of new statistics for Northern Ireland and STS for Scotland for the first time and partly because of more detailed classifications of the variables in the tables.

The 2001 interaction data sets were prepared as counts contained within series of tables at each of the three levels. A summary of the tables and counts from the 2001 and 1991 censuses (Table 2) indicates that, although the number of tables is much the same, there are considerably more counts in 2001 than in 1991. In particular, the STS for Scotland in 2001 contain counts for children aged under 16 years and therefore require additional categories in certain tables. The 1991 SWS data identified in Table 2 are the 10% sample of journey from home to work flows produced only at ward level and referred to as SWS set C (Cole et al., 2002). These are distinct from set A, which contained the characteristics of residents living in specified areas who had jobs at the time of the census, and set B, which included equivalent characteristics of individuals at their places of work.

Table 2. Tables and counts in the 2001 and 1991 interaction data sets

Data set    Level 1 (district)             Level 2 (ward)                  Level 3 (OA)
2001 SMS    10 tables, 996 counts          5 tables, 96 counts             1 table, 12 counts
1991 SMS    Set 2: 11 tables, 94 counts    Set 1: 2 tables, 12 counts      —
2001 SWS    7 tables, 936 counts           6 tables, 354 counts            1 table, 36 counts
1991 SWS    —                              Set C: 9 tables, 274 counts     —
2001 STS    7 tables, 1176 counts          6 tables, 478 counts            1 table, 50 counts

2.2. Migration data

What is different about the migration tables and counts of flows in 2001 compared with 1991? All the 2001 migration tables begin with the characters MG followed by a code for the level (1, 2 or 3) and then a table code. In a number of cases, modified versions of tables exist for flows to destinations in Northern Ireland. In these cases, the letter N is used as a suffix to the table code in the WICID system to distinguish between versions.
In Table 3, we compare the different tables for migration produced from the two censuses at level 1, using the variables of age, family status, ethnicity, illness, economic activity, moving groups, tenure, occupation and language. Thus, in 2001 SMS Table MG101, for example, migrants are disaggregated by 24 age groups and two sex categories (plus person and ‘all age’ totals), giving a total of 75 counts, whereas in the 1991 SMS there are two tables containing age and sex counts with five and 19 age categories respectively, providing 48 counts altogether (see Stillwell et al. (2005) for more detail). Unlike 1991, where infants aged less than 1 year were excluded altogether (since they had not been born 12 months before the census), the 2001 migrant status for children aged under 1 year in households is determined by the migrant status of their ‘next of kin’ (which was defined as, in order of preference, mother, father, sibling with nearest age, other related person and household reference person).
Table 3. Tables and counts from 2001 SMS level 1 and 1991 SMS set 2

Variable                                    2001 level 1 tables      2001 counts   1991 set 2 tables   1991 counts
Age                                         Table MG101              75            Tables 1, 3         48
Family status                               Table MG102              54            —                   —
Ethnicity                                   Tables MG103, MG103N     33            Table 5             4
Limiting illness                            Table MG104              84            Table 6             4
Economic activity                           Tables MG105, MG108      378           Tables 7, 9, 10     21
Moving groups                               Table MG106              16            Table 2             2
Tenure                                      Table MG107              32            Tables 8, 8S        7
Occupation                                  Table MG109              288           —                   —
Some knowledge of Gaelic, Welsh or Irish    Table MG110              36            Tables 11S, 11W     2
Marital status                              —                        —             Table 4             6
There were no family status or occupation tables for the district migration data in 1991, whereas the marital status table that was used in 1991 was discontinued in 2001. Important changes which were introduced in 2001 are the increased disaggregation of counts in the ethnicity table to include six groups other than white, the addition of age and sex categories in the illness, economic activity and language tables, and the introduction of an occupation table based on the new national statistics socio-economic classification that replaced the classification of social class based on occupation and socio-economic group that was used in 1991. An alternative table of counts by ethnic group is included for Northern Ireland at level 1 since the classification of flows to destinations within Northern Ireland is only twofold: whites and others. One major difference between the migration data sets in 2001 and 1991 is the treatment of students. In 1991, most students were recorded at their place of parental domicile rather than at their term-time address, unless they were living away from the parental home. To address concerns about the poor recording of students in 1991, the Office of Population Censuses and Surveys produced an additional table (Table 100) as part of the small area statistics and local base statistics series of tables, giving counts of resident students and school-children for every local authority district by local authority district of term-time address. In the 2001 census, students were counted at their term-time addresses and, consequently, those students migrating from parental home to term-time location, from one term-time location to another or from term-time location to another area (after graduation) in the 12 months before the census are included in the data. This may cause some problems when interpreting migration data that are classified by occupation, for example, because migrant status is that recorded at the time of the 2001 census. 
A student living in Leeds who graduated in the summer of 2000 and moved immediately to London to take a new job working for a finance company would be recorded as a higher professional rather than a student (Champion and Coombes, 2007).

The inclusion of student migrants in 2001 relates to another new dimension of the composition of the migration data, which is the introduction of an entirely new unit of measurement in 2001, the ‘moving group’. This is a development from the concept of the ‘wholly moving household’ which underpinned counts in 1991 Tables 8 (on tenure) and 9 (on economic position). The concept of a moving group refers to a single person or a group of people within a household or communal establishment who moved together from the same usual address one year before census day. Thus, a single ‘migrant’ who moves alone actually constitutes a moving group as does a ‘wholly moving household’, a household in which all members of the household are migrants and moved from the same address, as used in 1991. To provide some clarification and to give an indication of the relative volumes of counts in different categories, Table 4 indicates, for flows within the UK, the numbers in wholly moving households and other moving groups by size of group (1, 2 and 3 or more people). It is apparent that the ‘one-person’ counts for wholly moving households and other moving groups are the same for groups and for migrants. Whereas the data that are used in Table 4 indicate the counts that were available from 2001 SMS Table MG106, counts of moving groups are available by tenure (SMS Table MG107), economic activity (SMS Table MG108) and national statistics socio-economic classification (SMS Table MG109).

Table 4. Numbers of moving groups within the UK, 2000–2001†

Results for the following numbers of people:

                                        1          2          3 or more   All
Groups
  Wholly moving households   Number     719379     500461     486356      1706196
                             %          42.2       29.3       28.5        100
  Other moving groups        Number     1545286    178041     104753      1828080
                             %          84.5       9.7        5.7         100
Migrants
  Wholly moving households   Number     719379     1000922    1821914     3542215
                             %          20.3       28.3       51.4        100
  Other moving groups        Number     1545286    356082     367029      2268397
                             %          68.1       15.7       16.2        100

†Source: 2001 census SMS level 1, Table 6.

Table 4 shows that there were approximately 5.8 million people within the UK migrating in 2000–2001 in 3.5 million moving groups, of which 48% were wholly moving households and 52% were ‘other moving groups’. A very high proportion of those people moving in other moving groups were in fact individual movers (85%). Among the 3.5 million people migrating in wholly moving households, over half involved three or more people moving together, with 28.3% in two-person households and a fifth as single persons. In contrast, over two-thirds of migrants in other moving groups were single persons, with similar numbers split between two-person and three- or more person groups. Where there is only one person in the moving group, that person is the moving group reference person. For a group of students moving from one house to another, for example, the moving group reference person would be the eldest of the group.
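The shares quoted from Table 4 can be recomputed directly from its counts. The Python sketch below does this; the dictionary layout and the helper name `shares` are illustrative assumptions and are not part of the SMS.

```python
# Counts transcribed from Table 4 (flows within the UK, 2000-2001).
# Keys are group sizes; "3+" means three or more people.
GROUPS = {
    "wholly moving households": {"1": 719379, "2": 500461, "3+": 486356},
    "other moving groups": {"1": 1545286, "2": 178041, "3+": 104753},
}
MIGRANTS = {
    "wholly moving households": {"1": 719379, "2": 1000922, "3+": 1821914},
    "other moving groups": {"1": 1545286, "2": 356082, "3+": 367029},
}


def shares(counts):
    """Return each size category as a percentage of the row total."""
    total = sum(counts.values())
    return {size: round(100 * n / total, 1) for size, n in counts.items()}


total_groups = sum(sum(row.values()) for row in GROUPS.values())
total_migrants = sum(sum(row.values()) for row in MIGRANTS.values())
wmh_share = 100 * sum(GROUPS["wholly moving households"].values()) / total_groups

print(total_groups, total_migrants)  # moving groups and people migrating
print(round(wmh_share))              # % of groups that are wholly moving households
print(shares(MIGRANTS["other moving groups"]))
```

The row totals reproduce the ‘All’ column of Table 4, and the rounded shares match the percentages that are quoted in the text.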
If each of the students had moved from a different address one year previously, then they would each be recorded as a separate household. A household is described as partly moving if one or more members of the household is a migrant but not all members of the household have moved from the same usual address.

At level 2, there are five tables of ward migration data in 2001 compared with only two in 1991. This provision contains important new information on ethnicity, on occupation and on tenure, as well as more disaggregated age groups. The data counts of migration by ethnic group at district and ward level provide new insights into the internal migration behaviour of ethnic minority groups, although the disaggregation at ward scale is only between whites and non-whites. However, it is disappointing that the flows of immigrants in the 2001 SMS at all spatial scales are confined only to one count of those migrants with origin outside the UK. This is a retrograde development from the 98 foreign origins that were specified in 1991 and prevents analysis of white or non-white immigrants from different world regions.

The classification of migrants with ‘unknown origin’ that was used in 1991 is not used in the 2001 data, owing to imputation of missing origins. However, a new classification of migrants with ‘no usual address’ one year before the census was introduced in the 2001 data. Although these two classifications are distinct, they may be expected to have some overlap. In 2001, the number of people migrating within or into the UK with no prior usual address was 456736 compared with 325630 migrants within or into Great Britain with origin unknown in 1991. The circumstances that lead to an individual’s categorization as having no usual address may include extended vacations, homelessness, vagrancy or living in some form of temporary accommodation. Analysis of the 2001 migration data indicates that, in comparison with other migrants, those with no usual address one year before the census were more likely to be young adult males, to be living alone and to be unemployed.

At level 3, the 2001 census SMS data for OAs consist of one table with flow counts for four age groups (all ages, 0–15 years, 16 years–pensionable age and over pensionable age) for three sex groups (person, male and female), giving 12 counts altogether.

2.3. Commuting data

The 2001 commuting data are divided into the journey to work or SWS data sets in England, Wales and Northern Ireland and the travel to work or study (STS) in Scotland. The destination location in the SWS refers to the place where a person works in their ‘main job’ or the depot address for people who report to a depot, whereas the destination in the STS is the place that a person travels to for their main job or course of study (including school). Like the SMS, the 2001 SWS and STS have been produced at three levels, whereas the 1991 SWS set C data were produced only for wards. Moreover, the 2001 SWS and STS are 100% counts whereas the 1991 SWS are a 10% sample of the population and need to be scaled up when used for comparative analysis. No commuting data were published for Northern Ireland in 1991.

Tables and counts for the two SWS data sets at this spatial scale are compared in Table 5. The coding of each table in 2001 involves the single character W together with a level code and a numeric code signifying the table number. There are similarities in the tables on age, family status and mode of travel, but the main differences are the counts for the new national statistics socio-economic classification of occupation that are included in 2001 SWS Table W104 and the discontinuation in 2001 of counts relating to hours worked, distance travelled and cars available, all of which were included in the 1991 SWS.

A similar set of SWS tables to the 2001 level 2 suite has been provided at level 1 with the addition of one further table on ethnic groups (seven categories) by type of employment (all persons, full-time student, in full-time employment and in part-time employment) by sex. A separate version of this table is available for Parliamentary constituencies in Northern Ireland which is identical except that the ethnic group categories are restricted to only white and non-white.
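The scaling of the 1991 sample that is mentioned in Section 2.3 can be sketched as follows. This is a minimal illustration in Python: the function names and the example flow values are invented, and the simple multiplication by 10 ignores the sampling error that a grossed-up 10% sample carries.

```python
SCALE_FACTOR_1991 = 10  # the 1991 SWS set C is a 10% sample


def scale_1991_flow(sampled_count, factor=SCALE_FACTOR_1991):
    """Gross up a sampled 1991 commuting count to an estimated full count."""
    return sampled_count * factor


def change_1991_to_2001(sampled_1991, count_2001):
    """Change between a grossed-up 1991 flow and a 2001 100% count."""
    return count_2001 - scale_1991_flow(sampled_1991)


# Hypothetical ward-to-ward flow: 42 sampled commuters in 1991, 455 counted in 2001.
print(scale_1991_flow(42))
print(change_1991_to_2001(42, 455))
```

A small sampled flow scales only to a multiple of 10, so differences of a few commuters between the two censuses should not be over-interpreted.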
At level 3, one table is available, providing 36 counts of commuters by mode of travel to work for all individuals, students and non-students. No level 3 commuting data were produced at all for residences in Northern Ireland, owing to concerns about disclosure. The STS data sets for Scotland at all three levels are very similar to those of the SWS for the rest of the UK except that the age categories extend to ages below 16 and above 74 years, with cells that refer to employment being masked out for these age groups. Finally, all STS tables include additional columns tabulating ‘other persons’ including those who neither work nor study and thus do not have a daily travel event.

Table 5. Tables and counts available from 2001 SWS level 2 and 1991 SWS set C

Variable           2001 level 2 tables    2001 counts   1991 set C tables   1991 counts
Age and type       Table W101             72            Table 1             54
Family status      Table W102             108           Table 2(2)          12
Mode of travel     Table W103             52            Table 5             22
Occupation         Tables W104, W105      86            Tables 7, 8         102
Employment         Table W106             36            Table 9             48
Hours worked       —                      —             Table 2(1)          10
Distance to work   —                      —             Table 4             16
Cars available     —                      —             Table 6             10

Although the 2001 interaction data represent a huge and very rich data resource, there are some characteristics or shortcomings that need to be recognized. One of the most important limitations or problems with the 2001 data relates to the disclosure control methods that were used.

3. Effect of small cell adjustment
In 2001, following a consultation exercise and discussion on various alternatives, the ONS decided to adopt a series of methods to prevent risk of disclosure known as ‘pre-tabulation’ and ‘post-tabulation’ adjustments. Pre-tabulation modifications come from edit and imputation and involve ‘record’ level imputation for a single person or household and ‘item’ level imputation for an individual field or variable. One important imputation for the migration and commuting counts was that of migrants whose origin was missing or unstated and of commuters with a missing or unstated workplace. Some of these counts were quite substantial in number and, owing to imputation, these counts were not tabulated separately in 2001 as they were in 1991; consequently there is no way of distinguishing these flows in the 2001 SMS, SWS or STS. One-number census adjustments such as ‘thresholding’ and ‘record swapping’ also occurred pre tabulation, i.e. before the SMS and SWS were generated. The setting of minimum thresholds of numbers of people and households for the release of sets of output occurred at different spatial scales whereas the process of record swapping involved exchanging a sample of records in one area with similar records from other geographical areas. The third technique that was used to prevent disclosure is the so-called ‘small cell adjustment method’ (SCAM) and involves the adjustment of small counts appearing in all the SMS and SWS tables in the interaction data with the exception of flows of migration with destinations in Scotland and of flows of commuters with origins in Scotland. The STS flows at OA level were also subject to adjustment although those at district and ward level were left unadjusted. 
It is this adjustment which has caused most profound concern among the user community of the origin–destination statistics since, as we shall show, many of the cells of interaction matrices will contain small flows, particularly as the size of the spatial units reduces. Although the precise details of the adjustment methodology have not been released, it is assumed that small counts refer to values of 1 or 2 and these have been adjusted to values of 0 and 3 in such a way that there is a 2/3 probability that a value of 1 will be adjusted to 0 and a 1/3 probability that it will be adjusted to 3, whereas the probabilities for adjusting the value of 2 to 0 or 3 are the other way around (Duke-Williams and Stillwell, 2007). These assumptions are supported by the observation that the values 1 and 2 do not appear in tables which have been modified by using the SCAM, whereas values of 0, 3 and higher do occur. One important feature of the SCAM adjustment is that it has been applied separately to each table at each spatial scale rather than being applied once at the scale of the smallest units.

Intuitively we would expect the major effect of the SCAM to be on OA-to-OA flows since these areas are the smallest spatial units, but how can the effect be assessed at this scale? This is a difficult challenge because of the huge volume of potential flows that are involved at this level and the sparse matrices that will contain very large numbers of origin–destination pairs between which there is no interaction flow. In theory, if we consider total migrants, the potential matrix of adjusted interarea flows contains 223060 origins and 180456 destinations, which represent over 40 billion cells! There are fewer destinations because migrant flows with destinations in Scotland have not been disclosure controlled.

The table containing migration flow data at the OA level is Table MG301 (‘Total migrants by broad age group and sex’). The ONS provides adjusted data on OA-to-OA flows for so-called ‘interior’ cells; these are cells numbered 5 and 6, 8 and 9, and 11 and 12 in Fig. 1. If any one of these cells contained a non-zero flow, then all six cell counts were generated and included in the data that were supplied to the CIDS, together with the relevant totals.

Fig. 1. Cells in Table MG301 as shown in the WICID system
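The adjustment rule that is assumed in this section can be sketched in Python. This is an illustration of the assumed probabilities only and not the ONS procedure, whose details have not been released; the names `scam_adjust` and `expected_adjusted` and the seeded random generator are assumptions made for the example.

```python
import random
from fractions import Fraction

# Assumed probability that a small count is adjusted up to 3;
# otherwise it is adjusted down to 0.
P_UP = {1: Fraction(1, 3), 2: Fraction(2, 3)}


def scam_adjust(count, rng):
    """Apply the assumed small cell adjustment to one cell count."""
    if count in P_UP:
        return 3 if rng.random() < P_UP[count] else 0
    return count  # 0 and values of 3 or more are left unchanged


def expected_adjusted(count):
    """Expected value of the adjusted count under the assumed rule."""
    if count in P_UP:
        return 3 * P_UP[count]
    return Fraction(count)


rng = random.Random(2001)  # seeded only to make the sketch repeatable
adjusted = [scam_adjust(c, rng) for c in [0, 1, 2, 3, 7, 1, 2]]
print(adjusted)  # no value of 1 or 2 survives
print(expected_adjusted(1), expected_adjusted(2))
```

Under these assumed probabilities the expected adjusted value of a cell equals its original count, so the rule is unbiased cell by cell even though any individual small cell is changed.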
The values in cells 1, 2, 3, 4, 7 and 10 are all based on aggregations of the interior cells and are therefore consistent with the respective sums of the interior values of each row and of each column. Data were thus supplied for just under 1.5 million OA–OA pairs in this system, representing less than 0.004% of all potential OA–OA pairs. In other words, there were 1.5 million pairs of OAs between which at least one migrant moved usual address. When there was no such flow between a given pair of OAs (either because there really was no flow or every cell had been adjusted to 0), then no record was produced in the output file that was distributed.

So what is the frequency of the adjusted counts that are included in the interior cells of this table? In total, there are about 5 million migrants moving within this system of OAs. Fig. 2 is a frequency distribution of interior cell values and shows that there are over 10 million values of 0, no values of 1 or 2 and over 1 million values of 3 in the data supplied. Values of 4 and above exhibit a steady rate of decline in frequency, whereas the value of 3 has a higher frequency than suggested by a linear extrapolation of the higher count values. Only cell values up to 30 are shown. From a user perspective, this shows that almost all of the flows that are extracted from an OA–OA query will be either 0 or 3: the evidence of the SCAM at this level is very prominent. In fact, 99.3% of values in the interior cells supplied are either 0 or 3, with 3 accounting for 95% of all migrants. Of course, the majority of the cells with zero flows will have been 0 originally; only a proportion will have been ‘adjusted to 0’ from values 1 and 2 but it is not possible to distinguish these.

Fig. 2. Distribution of interior cell values in 2001 SMS Table MG301

Turning to the SWS for the same spatial system, we observe that, although the interior cells of Table W301 (OA level journey-to-work flows) have been modified by the SCAM in the same way, the clustering is not quite so extreme. The proportion of all interior cells that are either 0 or 3 is almost the same as is the case with MG301, at 99.5%. However, 3 accounts for a smaller proportion (77.5%) of all commuters. The remaining 22.5% of workers are tabulated in unmodified cells of this table. The difference is explained by the spatial focusing of commuting flows: some OAs in the centres of cities, for example, have very high numbers of workers travelling to them. We refer to this effect as the single-table impact of the SCAM.

However, in addition to this single-table effect, there are two other noticeable ways in which the effects of the SCAM will impact on users. Firstly, the SCAM has caused variation between conceptually equivalent counts within sets of tables for a single flow at the same scale (the ‘between-tables effect’); secondly, the SCAM has created variation in conceptually equivalent values within the same table of counts across different spatial scales (the ‘aggregation effect’).

The between-tables effect is most obvious when we consider totals and subtotals in tables. These are calculated as the sums of the adjusted data so all tables are internally additive. However, we know that different tables are independently adjusted and this means that counts of the same population in two or more different tables at the same scale may not necessarily be equivalent.
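The between-tables effect can be reproduced with a small simulation. The Python sketch below assumes the adjustment rule that was described earlier in this section (a value of 1 becomes 3 with probability 1/3 and a value of 2 becomes 3 with probability 2/3, otherwise 0); the interior cell values are invented for a single flow of 12 migrants tabulated in two ways and are not census data.

```python
import random


def scam_adjust(count, rng):
    """Assumed rule: 1 -> 3 with probability 1/3, 2 -> 3 with probability 2/3, else 0."""
    if count in (1, 2):
        return 3 if rng.random() < count / 3 else 0
    return count


def adjusted_total(interior_cells, rng):
    """Adjust each interior cell independently, then sum, so the table stays additive."""
    return sum(scam_adjust(c, rng) for c in interior_cells)


# Hypothetical flow of 12 migrants, broken down in two different ways.
by_age_and_sex = [1, 2, 0, 1, 2, 1, 0, 2, 1, 2]  # sums to 12
by_ethnic_group_and_sex = [5, 4, 1, 2]           # sums to 12

rng = random.Random(0)  # seeded only to make the sketch repeatable
trials = 10_000
differing = 0
for _ in range(trials):
    total_a = adjusted_total(by_age_and_sex, rng)
    total_b = adjusted_total(by_ethnic_group_and_sex, rng)
    if total_a != total_b:
        differing += 1

print(f"totals differ in {100 * differing / trials:.0f}% of simulated adjustments")
```

Each simulated table is internally additive, yet the two totals for the same flow frequently disagree because their interior cells were adjusted independently.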
The problem occurs in the SMS and SWS data sets at level 2 (ward) and level 1 (district); it is not relevant to any level 3 (OA) data sets, as the latter contain only a single output table for each flow. There are two direct counts of total migrants in SMS level 2, and five in SMS level 1. SWS level 2 has five counts of total commuters, whereas SWS level 1 has six such counts. We shall use data from both levels 1 and 2 to illustrate the effects. The way in which the SCAM introduces differences between these totals can be examined by comparing the two available totals in SMS level 2 from Table MG201 (age by sex) and Table MG203 (ethnic group by sex) (Fig. 3). Although the interior cells in each of these tables will
434
J. Stillwell and O. Duke-Williams
(b)
(a)
Fig. 3.
(a) Table MG201 (age by sex) and (b) Table MG203 (ethnic group by sex) in the 2001 SMS
be subject to independent SCAM adjustment, cell 1 in each of these tables contains the total number of migrants for each ward pair. These totals are aggregations of flows in interior cells. The question that we might ask in this case is ‘What differences occur in non-zero ward-to-ward flow totals between the two tables?’. There are over 1.1 million ward-to-ward flow totals that are non-zero in either Table MG201 or MG203 or both. The totals are different in 61% of those cases. Fig. 4 shows differences (not totals) between the two tables. It is possible to have difference values of 1 and 2 in this context but the distribution of differences is again dominated by 3 (owing to the predominance of values of 0 in one table and 3 in another rather than, say, the difference between 9 and 12). The frequency of differences that are greater than 10 is very low and the largest difference between the absolute totals is 21. In this case, the total that is taken from Table MG201 is 60, whereas the total that is taken from Table MG203 is 39. Clearly, from a user perspective, taking a total from one table as a denominator for a value taken from the other table could lead to misleading
Fig. 4.
Distribution of differences between Table MG201 and Table MG203 totals
Understanding Census Migration and Commuting Data Table 6.
435
Net migration rates comparison for London boroughs, 2000–2001
Borough
Top five City of London Kingston upon Thames Lambeth Sutton Barking and Dagenham Bottom five Brent Kensington and Chelsea Hounslow Ealing Newham Mean net rate Correlation with Table TT37 rate
Net migration rate per 1000 population for the following tables: TT033
MG101
MG201
MG301
4.87 3.33 0.00 −0.15 −0.49
5.01 3.95 0.55 1.39 −0.16
11.70 2.76 −0.69 0.66 −0.23
3.34 3.14 0.76 1.30 −0.16
−13.33 −13.50 −14.04 −14.64 −16.73 −7.10 —
−12.94 −13.70 −13.12 −15.08 −18.04 −6.94 0.985
−13.47 −14.49 −14.24 −14.70 −17.30 −7.34 0.975
−12.68 −14.37 −13.49 −14.84 −18.05 −7.18 0.980
results, yet it would be a mistake that would be easy to make. This situation is also relevant to the use of OA area statistics counts as denominators since different tables contain inconsistent total individual and household counts and thereby lead to different rates being calculated (Rees et al., 2005). To examine the SCAM aggregation effect, we consider different net migration rates per 1000 population for London boroughs (Table 6) based on data derived directly from (a) Table TT033 and (b) SMS Table MG101 and data aggregated up from (c) SMS Table MG201 and (d) SMS Table MG301. Table TT033 ‘Migration (people)’ is the theme table containing aggregate gross in-migration and out-migration counts extracted via CASWEB, the Web interface to census aggregate outputs and digital boundary data. Tables MG101, MG201 and MG301 are the three SMS tables containing an age breakdown of migration flows at each spatial scale. Across most of Greater London, boroughs experience significant losses in terms of net internal migration rates. Table 6 presents the five boroughs of London that are at each end of the net migration rate ‘league table’. Comparison by eye of the figures that are presented gives an indication of the variation in rates that occurs. The most significant variation is apparent for the City of London but this is likely to be an anomaly created because of the small population size of this borough and the fact that it may have a higher proportion of small flows subject to the SCAM. Table 7 contains a summary of the aggregation and between-table differences for total migrants from the SMS and indicates significant differences between counts from different tables at the same scale and, in particular, between total counts from different tables at different scales. There is no obvious pattern: the total from Table MG201 is higher than the total from Table
Table 7. UK internal migration in 2000–2001: totals from different SMS tables

Results for the following levels from which total derived:

Variable                      Level 1               Level 2               Level 3
Age                           Table MG101: 6202016  Table MG201: 6267740  Table MG301: 6164996
Family status                 Table MG102: 6204876  †                     †
Ethnicity                     Table MG103: 6206216  Table MG203: 6190926  †
Long-term limiting illness    Table MG104: 6205128  †                     †

†Not applicable.
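The counts in Table 7 can be compared directly as a quick check. The following is a minimal sketch that uses only the totals printed in the table, taking level 1 as district, level 2 as ward and level 3 as OA; the variable names are illustrative and not part of any SMS product.

```python
# Totals of UK internal migrants, 2000-2001, as printed in Table 7.
# Keys are (level, table); level 1 = district, 2 = ward, 3 = OA.
totals = {
    (1, "MG101"): 6202016,
    (1, "MG102"): 6204876,
    (1, "MG103"): 6206216,
    (1, "MG104"): 6205128,
    (2, "MG201"): 6267740,
    (2, "MG203"): 6190926,
    (3, "MG301"): 6164996,
}

# Number of distinct national totals across all tables and levels.
n_distinct = len(set(totals.values()))

# Gaps between levels for the age tables (MG101, MG201, MG301).
ward_minus_district = totals[(2, "MG201")] - totals[(1, "MG101")]
district_minus_oa = totals[(1, "MG101")] - totals[(3, "MG301")]

# Spread between the largest and smallest district-level totals.
level1 = [v for (lvl, _), v in totals.items() if lvl == 1]
level1_spread = max(level1) - min(level1)

print(n_distinct, ward_minus_district, district_minus_oa, level1_spread)
```

On the printed counts the ward-minus-district gap is 65724 and the district-minus-OA gap is 37020, so the ordering of the two gaps is as described in the text, and the four district-level tables differ from one another by at most 4200 migrants.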
MG101 but the total from Table MG301 is lower. It is surprising to find that the difference between ward and district level for the age table (64000) is greater than that between OA and district level (37000). It seems ironic that we observe seven different counts of total migrants in the UK in the 2000–2001 period from a census that is supposed to be the first so-called ‘one-number census’!
Finally in this section, we examine the effects of the SCAM by comparing in-migration for a local authority district in England (Leicester) that has been modified by the SCAM with in-migration for a local authority district in Scotland (Edinburgh) that has not been modified. In this instance, we report on three comparisons:
(a) the frequency distributions of all interior cells in Table MG101 that are actually subject to the SCAM;
(b) the frequency distribution of overall totals in Table MG101 that are the most commonly used values and are sums of cells that have been treated by the SCAM;
(c) the totals by age and sex between Tables MG101 and MG201, i.e. at district and ward level.
The interior cell values in this case are those in the cells for males and females for each age group, whereas the persons cells for each group are an aggregation of the modified data for males and females and the overall totals are an aggregation of the totals in each person–age group. The comparison of interior cell values from Table MG101 (age and sex) (Fig. 5) indicates similar distributions of in-migration flows to Leicester and Edinburgh but, as expected, no values of 1 or 2 for Leicester. Fig. 6 shows the frequency distributions of total in-migration flows from other districts to Leicester and Edinburgh from Table MG101. The Leicester values display obvious groupings on values of 3 and multiples of 3, suggesting that the effect of the SCAM is pronounced.
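The grouping of totals on multiples of 3 follows directly from summing adjusted interior cells. The sketch below illustrates this under stated assumptions: the paper says only that values of 1 and 2 are replaced with 0 or 3, so the rounding probabilities used here (chosen to preserve the expected value) are an assumption, and the flows are hypothetical rather than census data.

```python
import random
from collections import Counter

def scam_adjust(count, rng):
    """Return a small-cell-adjusted count: 1 and 2 become 0 or 3.

    The rounding probabilities are an assumption chosen to preserve the
    expected value (1 -> 3 with p = 1/3, 2 -> 3 with p = 2/3).
    """
    if count in (1, 2):
        return 3 if rng.random() < count / 3 else 0
    return count

def adjusted_total(cells, rng):
    """Adjust each interior cell independently, then sum to a flow total."""
    return sum(scam_adjust(c, rng) for c in cells)

rng = random.Random(2001)

# Hypothetical flows: each has 10 interior (age by sex) cells holding 0, 1 or 2.
flows = [[rng.choice([0, 0, 0, 1, 1, 2]) for _ in range(10)] for _ in range(5000)]

true_totals = Counter(sum(cells) for cells in flows)
scam_totals = Counter(adjusted_total(cells, rng) for cells in flows)

print("unadjusted totals:", sorted(true_totals.items())[:8])
print("adjusted totals:  ", sorted(scam_totals.items())[:8])
```

Because every interior cell in this illustration is small, each adjusted total is a sum of 0s and 3s and therefore a multiple of 3. Where some interior cells exceed 2 and are left untouched, the clustering would be weaker but would still be visible among the smaller totals.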
The frequency distributions of total in-migration flows to Edinburgh are rather different, indicating that there are fewer small flows and more larger flows. Producing similar graphs (not shown) at ward level (i.e. all wards in UK to all wards in Leicester, excluding flows within Leicester) generates results that are closer to expectation—Leicester exhibits clustering on multiples of 3, whereas for Edinburgh the highest frequency flow is ‘1’ followed by a steady decay. We compare age-specific in-migration flows to Leicester between SMS levels 1 and 2 in Table 8. In this case, we need to aggregate the Table MG101 age categories to those used in Table MG201. Thereafter, we calculate the absolute differences by age group based on Table MG101 minus Table MG201 and compute the absolute difference as a percentage of the Table MG101 value. The total flow from Great Britain to Leicester is 35343 from Table MG101 and 35487 from
Fig. 5. Comparison of interior cells flows in Table MG101 for (a) Leicester and (b) Edinburgh
Fig. 6. Comparison of in-migration totals from Table MG101 for (a) Leicester and (b) Edinburgh
Table MG201, giving a difference of −144 or −0.4%. There appears to be no pattern of positive or negative association by age and sex. Some differences are proportionately large; others small. Values in the aggregate table are well above 3, encouraging users to forget that the data are still affected by the SCAM.
4. Problems associated with comparison between censuses
One of the main objectives of the CIDS is to enable researchers to extract data sets that facilitate comparisons of patterns of migration or commuting over time. If we consider migration in this context, according to the 2001 SMS count in Table MG101, the total number of migrants
Table 8. Comparison of levels 1 and 2 in-migration to Leicester by age and sex

              Absolute difference         Absolute difference as % of level 1 total
Age (years)   Total    Males   Females    Total    Males   Females
0               −37      −16       −21     −4.3     −3.8      −4.9
1–2             −54       19       −73     −4.9      3.3     −13.6
3–4              33        1        32      3.6      0.2       7.0
5–9              19       −2        21      1.0     −0.2       2.3
10–11            −7       16       −23     −1.2      5.6      −7.8
12–14            −6        7       −13     −0.7      1.7      −3.0
15               −3        3        −6     −1.1      2.3      −4.5
16–17           −25       −9       −16     −3.8     −3.1      −4.3
18–19           −34      −26        −8     −1.2     −2.2      −0.5
20–24          −118       31      −149     −1.2      0.7      −2.7
25–34           102      −73       175      1.3     −1.8       4.4
35–44            −3      −43        40     −0.1     −2.3       2.6
45–59          −103      −31       −72     −5.3     −3.0      −8.1
60–64            59       47        12     14.6     23.3       5.9
65–74            17       15         2      3.2      5.5       0.8
≥75              16      −11        27      1.9     −4.6       4.6
Total          −144      −72       −72     −0.4     −0.4      −0.4
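The calculation behind Table 8 (level 1 minus level 2, expressed as a percentage of the level 1 value) can be sketched as follows. Only figures given in the text and in the table are used; the function name is illustrative.

```python
def level_difference(level1, level2):
    """Return (absolute difference, difference as % of the level 1 value)."""
    diff = level1 - level2
    pct = 100.0 * diff / level1
    return diff, pct

# Total in-migration to Leicester from Great Britain, as quoted in the text.
mg101_total = 35343  # level 1 (district), Table MG101
mg201_total = 35487  # level 2 (ward), Table MG201

diff, pct = level_difference(mg101_total, mg201_total)
print(diff, round(pct, 1))  # -144 -0.4

# Age-specific differences from Table 8 (ages 0 to 75 and over, in table order).
males = [-16, 19, 1, -2, 16, 7, 3, -9, -26, 31, -73, -43, -31, 47, 15, -11]
females = [-21, -73, 32, 21, -23, -13, -6, -16, -8, -149, 175, 40, -72, 12, 2, 27]
persons = [-37, -54, 33, 19, -7, -6, -3, -25, -34, -118, 102, -3, -103, 59, 17, 16]

# Internal consistency: males + females = persons in every age group,
# and each column sums to its 'Total' row.
rows_consistent = all(m + f == p for m, f, p in zip(males, females, persons))
print(rows_consistent, sum(males), sum(females), sum(persons))
```

The printed differences are internally consistent: the male and female columns add to the persons column in each age group and to −144 overall, even though individual age groups differ in sign and size.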
Fig. 7. Differences between migrants in 2001 and 1991 by age and sex, shown separately for males and females (sources: 1991 census SMS set 2 Table 3 and 2001 census SMS level 1 Table 1)
who were aged at least 1 year in Great Britain in the 12 months before the census date in 2001 (referred to as 2000–2001) was 5923202 compared with 1991 SMS set 2 count in Table 1 of 4688180 total migrants in the equivalent 12-month period, 1990–1991. This suggests an increase of 1.24 million or 26% between the two periods. A comparison of the age–sex structure of total migration in the two periods (Fig. 7) indicates major increases in the 35–39, 50–54 and 55–59 years age groups for both sexes. There are increases also in the 16–19 years age group, although we must remember that the 2001 data include students whereas the 1991 data do not. In older ages, differences are smaller, and decreases are apparent in the migration of females in their late 70s and early 80s. Differences appear to be greater for females than for males in the middle age
Fig. 8. Net migration balances, districts, Great Britain, in (a) 1990–1991 and (b) 2000–2001 (sources: 1991 and 2001 census SMS): positive net migration individuals, 298 in 1990–1991 and 216 in 2000–2001; negative net migration individuals, 161 in 1990–1991 and 191 in 2000–2001
groups and less in the younger and older age ranges. In fact, the percentage change is negative for females aged between 75 and 84 years. At the district scale, the geographical patterns of net migration loss and gain (Fig. 8) show some similarity although the spatial units are different. It is evident that, in aggregate terms, London and the big cities are continuing to experience net losses, whereas rural areas are gaining, although the ratio of net gainers to net losers is reducing from 1.85 to 1.13 (from 298:161 to 216:191). However, an accurate comparison of aggregate and subgroup migration in 2000–2001 and 1990–1991 is confounded by various problems, several of which have been discussed earlier in the paper. In summary, these problems relate to the definition of variables, the measurement and adjustment of counts and the existence of inconsistencies in the spatial units. The problems in the definition of particular counts of migrants stem from the inclusion of new tables in 2001 for family status and occupation, the exclusion of a marital status table in 2001, new categorizations of variables that were used in 2001, no breakdown of immigrants by region of origin in 2001, the new concept of moving group in 2001 and the inclusion of students in the 2000–2001 data. Among the problems of the measurement and adjustment of counts are the inclusion of infants aged under 1 year in 2001, different treatment of underenumeration in each census, the miscoding of 3141 migrants in 2001 as immigrants rather than internal migrants, the imputation of migrants with unstated origins and inclusion of migrants with no usual address in 2000–2001,
the difficulties in defining populations at risk with some of the variables (e.g. moving groups) and the SCAM adjustment of counts for confidentiality that was discussed in the previous section. The miscoding error appears to relate to British Forces Post Office addresses, some of which are overseas but many are in the UK. If the British Forces Post Office address was outside the UK, the address one year ago should relate to the ‘overseas country’. However, it appears that some of those based in the UK one year before the census were coded as ‘migrants from outside the UK’, but the migrant country coded as one in the UK. Finally, problems of inconsistency in geographical area definition are due primarily to the absence of migration data for Northern Ireland in 1990–1991, local government reorganization during the 1990s, which was particularly associated with the creation of unitary authorities in Wales, Yorkshire and the Humber and the South West, as well as the redrawing of ward boundaries, and the introduction of a new tier of OAs in 2001 (although no data were produced at enumeration district level in 1991). Two strategies have been adopted by the CIDS in response to the problem of geographical boundary change. Firstly, at the supradistrict scale, a five-tier hierarchy of approximately common zones has been constructed from the initial set of 459 districts in 1991 and 426 districts in 2001 across the UK. The levels in the hierarchy are as follows:
(a) 417 ‘district’ zones;
(b) 218 ‘intermediate’ zones;
(c) 100 ‘health area’ zones;
(d) 47 ‘city region’ zones;
(e) 12 Government office regions based on the 1991 definition.
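A hierarchy of this kind is typically applied to origin–destination data by mapping each base zone to its parent zone and summing the flows. The following is a minimal sketch of that idea; the zone codes, the look-up and the flow counts are all hypothetical and are not taken from the WICID system.

```python
from collections import defaultdict

def aggregate_flows(flows, lookup):
    """Sum origin-destination flows after mapping zones to parent zones.

    flows  : dict mapping (origin, destination) -> count
    lookup : dict mapping base zone code -> parent zone code
    """
    out = defaultdict(int)
    for (origin, dest), count in flows.items():
        out[(lookup[origin], lookup[dest])] += count
    return dict(out)

# Hypothetical district codes and a hypothetical district -> zone look-up.
lookup = {"D1": "Z1", "D2": "Z1", "D3": "Z2"}
flows = {("D1", "D2"): 40, ("D1", "D3"): 25, ("D3", "D2"): 10, ("D2", "D1"): 5}

zone_flows = aggregate_flows(flows, lookup)
print(zone_flows)
```

One consequence worth noting is that flows between districts that share a parent zone (D1 and D2 here) become intrazonal flows after aggregation, so the number of interzonal migrants falls as the geography becomes coarser while the overall total is preserved.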
The 100-zone geography is based on past work using National Health Service central register data for 98 health areas in England and Wales plus Scotland and Northern Ireland as single spatial units (Office of the Deputy Prime Minister, 2002) whereas the city region system has been used previously for comparative analysis of migration in the UK and Australia (Stillwell et al., 2000, 2001). At the most disaggregate level, best fit district zones have been identified through geographical information system overlay of the district boundaries for 1991 and 2001. Districts with new boundaries in 2001 have been amalgamated with neighbours to provide approximate consistency with 1991. Thereafter, higher level zone sets have been created through aggregation of the base level districts by using look-up tables. The second approach that was used by the

Table 9. Comparison of adjusted total migration in Great Britain, 1990–1991 and 2000–2001†

Sample                                      Males             Females           Total
                                            Number      %     Number      %
Primary 1990–1991                           2293059   48.9    2395121   51.1    4688180
MIGPOP 1990–1991                            2644586   50.3    2615217   49.7    5259803
Primary 2000–2001                           2974734   49.2    3076652   50.8    6051386
Infants                                       65550   51.1      62634   48.9     128184
Students: economically active, full time      94488   42.8     126413   57.2     220901
Students: economically inactive              231217   48.6     244736   51.4     475953
Adjusted 2000–2001                          2583479   49.4    2642869   50.6    5226348
Change 1991–2001                             −61107             27652            −33455
% change 1991–2001                            −2.31              1.06             −0.64

†Sources: 1991 census SMS and 2001 census SMS.
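The adjustment in Table 9 is simple arithmetic and can be reproduced from the counts in the table. The sketch below assumes, as the row labels suggest, that the adjusted 2000–2001 count is the primary count minus infants and both student groups, and that change is measured against the MIGPOP 1990–1991 count; the variable and function names are illustrative.

```python
# Counts from Table 9 (Great Britain), keyed by sex.
migpop_1991 = {"males": 2644586, "females": 2615217}  # adjusted for underenumeration
primary_2001 = {"males": 2974734, "females": 3076652}
infants = {"males": 65550, "females": 62634}
students_active = {"males": 94488, "females": 126413}
students_inactive = {"males": 231217, "females": 244736}

def adjusted_change(sex_keys):
    """Return (adjusted 2000-2001 count, change, % change) summed over sex_keys."""
    base = sum(migpop_1991[s] for s in sex_keys)
    adjusted = sum(
        primary_2001[s] - infants[s] - students_active[s] - students_inactive[s]
        for s in sex_keys
    )
    change = adjusted - base
    return adjusted, change, 100.0 * change / base

results = {}
for label, keys in [("males", ["males"]), ("females", ["females"]),
                    ("total", ["males", "females"])]:
    results[label] = adjusted_change(keys)
    adjusted, change, pct = results[label]
    print(f"{label:8s} {adjusted:8d} {change:7d} {pct:6.2f}")
```

Under these assumptions the recomputed changes are −61107 for males, 27652 for females and −33455 in total, corresponding to percentage changes of about −2.31, 1.06 and −0.64 respectively.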
Fig. 9. Net migration (adjusted) for intermediate zones in (a) 1990–1991 and (b) 2000–2001
CIDS team to tackle the problem of boundary change at the subdistrict scale has been to re-estimate 1991 and 1981 interaction data for 2001 wards to create a consistent set of time series data for 1981, 1991 and 2001. The methodology is spelt out in detail in Boyle and Feng (2002) and is not repeated here. The application of this modelling has been to create a series of new derived data sets from the SMS and SWS that users can access via the WICID system. Given the problems with the measurement of migration that were referred to earlier, we can attempt to make some adjustments to compensate for underenumeration in 1991 and the inclusion of infants and students in 2001. Table 9 shows the effect of using the count of migration adjusted for underenumeration that was produced by Simpson and Middleton (1999) for 1991 (the so-called MIGPOP data set in the WICID system) and the count of migrants from 2001 adjusted to remove infants plus economically active and inactive students. The result is to reduce the change between 1991 and 2001 to a small decline in total migration, involving a drop in migration of males of 2.3% which is partly offset by a small increase of 1% in migration of females. The geographical patterns of net migration that result from these adjustments for intermediate zones (Fig. 9) are very similar indeed between 1991 and 2001.
5. Looking ahead: some recommendations and proposals
Several of the issues emerging from this paper enable us to formulate recommendations for how the census offices handle interaction data sets at the 2011 census and how the CIDER might
evolve in the meantime to support research that is based on these and other interaction data sets. A successful 2011 census will add to the extensive body of interaction data that have been captured from previous censuses and it is encouraging to note that migration and commuting are definite question topics in the 2011 census consultation document (Office for National Statistics, 2005). The issues that were discussed earlier allow us to frame recommendations relating to outputs and questions. Firstly, there is little doubt that the SCAM has had a major effect on the interaction data at all spatial scales but especially at the OA level, where a very large number of small flows are recorded. The replacement of values 1 and 2 with 0 or 3 has created immense uncertainty among users of the OA flow data with severe limitations on their usefulness. Moreover, the separate adjustment of each table at different spatial scales presents an additional problem of lack of consistency. Given that there appear to have been no disclosure risk problems with the unadjusted flows in Scotland, our recommendation is that the SCAM should be dropped altogether in 2011 or that a less damaging method of disclosure control should be considered that produces consistent counts across different tables at the same level (if not between levels) and which has a less dramatic effect on flows at level 3. One concern is that, in the aftermath of the SCAM implementation, the ONS may simply decide not to produce any OA level data. However, simple unadjusted counts (with no age or sex disaggregation for migration or mode of travel for commuting) at OA level would be better than no data provision at this level or the provision of more detailed age–sex flows that have been heavily modified. The flow totals from Table MG301 for destinations in Scotland were not subject to the SCAM.
An analysis of these totals showed that about 20% of all OA–OA flows to destinations in Scotland were of three or more migrants. These flows, which account for 44% of migrants, would have been left unaffected if a table that contained a single count had been produced and adjusted by the SCAM process. It is worrying that even stricter disclosure control methods might be applied than those which were used in 2001. If an alternative to the SCAM were to be chosen post tabulation, then we would suggest that the ONS produces sets of unadjusted data and considers allowing users in the academic community access to these data in a safe setting, under conditions that are similar to those used for handling the controlled access microdata sample (http://www.ccsr.ac.uk/sars/2001/hhold-cams/index.html) or the Longitudinal Study data (http://www.celsius.lshtm.ac.uk/). A second recommendation on outputs, stemming from the difficulties in making comparisons between 2001 and 1991, is to maintain the same count definitions, classification frameworks and geographical units as those which were used in 2001 or to make changes that enable aggregations of 2011 data that will be consistent with 2001 counts. Thirdly, and more specifically, we suggest that more detail is provided in the SMS on the location of immigrants to the UK from overseas. At a time when immigration is a high profile issue, when the extent of ethnic concentration is under debate and when there is a conspicuous paucity of alternative sources of data on immigrant flows, it seems counter-productive to have no detailed data on the origins of migrants to the UK. Turning to questions that might be added to the 2011 census that would be of very considerable benefit to the research community as well as to practitioners, we make a further three recommendations. 
Firstly, the inclusion of students in the flow data for migration during 2000–2001 has highlighted one of the key disadvantages of transition data that record only those characteristics of migrants at the time of the census rather than one year previously as well. As a consequence, although knowing the geographical origin and destination of a migrant individual, household or moving group, we do not know the socio-economic characteristics of the individuals or units concerned before they made their migration. Champion
and Coombes (2007) demonstrate the problems that this causes when interpreting the flows of moving group reference individuals by occupation (derived from 2001 SMS Table MG109) out of large cities; part of the migration losses of human capital from our cities measured by the net balance between outflows and inflows of those in professional, managerial and technical occupations is due to the inclusion of those who were students in April 2000 and who moved into jobs that were defined by these occupational categories after graduating. Similar problems are associated with interpreting the data on economic position and tenure and, consequently, the inclusion of a request for information about these characteristics as well as occupation on the migration question would be very beneficial for more accurate data interpretation as well as better understanding of the relationship between geographic mobility and socioeconomic mobility that has hitherto only been possible by using the Longitudinal Study (Fielding, 1992). The second recommendation on questions is for the ONS and the Northern Ireland Statistics and Research Agency to follow the example of the General Register Office for Scotland by extending the journey-to-work question to include journey to place of study across the rest of the UK and thereby creating a set of STS that are consistent across the whole country. This would provide very useful information about the daily flows of children to schools that are partly responsible for the problems of traffic congestion that are so apparent in many of our cities. Finally, it is important to remember that the results of the census should be used to obtain a better understanding of human behaviour, of migration and commuting in this case. We know that increasing numbers of people have more than one home and move between different locations over the course of a year, spending different durations of time at each place. 
Similarly, increasing numbers of individuals have more than one job or work in more than one workplace, spending different amounts of time in different commuting destinations. Consequently, we favour the inclusion of supplementary questions about other homes or workplaces and duration of stay. In relation to this, it is gratifying to see that the General Register Office for Scotland are using questions about multiple usual addresses in their census test in 2006 (General Register Office for Scotland, 2005) that will provide potential for new sets of interaction data that are derived from these questions. Questions 21 and 22 on the census test form both relate to ‘other’ usual addresses. Question 21 asks about motivation for living elsewhere during part of the week or year and question 22 allows the identification of a precise geographical location together with information about duration of stay on a ‘nights per week’ and ‘weeks per year’ basis. These types of question would provide valuable information about personal mobility (if not migration) and the duration-of-stay questions would tell us much about movement patterns of those owning second homes, for example. It is a shame that the census test in Scotland does not include any parallel questions about place and duration of work. During 2001–2006, the primary concern of the CIDS was to develop a service allowing members of the academic community to access and download migration and commuting data. The key objective was to develop the WICID system, the software interface system, and to install the various data sets from the 2001, 1991 and 1981 censuses. This paper has reviewed some of the characteristics that are associated with the new 2001 interaction data sets and highlighted some of their shortcomings. 
Relatively little research based on the interaction data for 2001 has yet appeared in the academic literature, given the limited duration of their availability, but we can expect to see publications starting to appear as the research that is currently under way bears fruit. The paper has also attempted to provide some reaction to the challenges and concerns surrounding interaction data. During the next 5 years, users will be keen to undertake more sophisticated research and it will be essential that the CIDER can support that research, facilitating access through further development of the interface, adding further data
sets and providing direct support for using the data as well as information about the data sets to users. The following five objectives have been set for the CIDER over the next 5 years. The primary objective will be to continue to maintain a high quality service for users to access and extract data from the WICID system on a continuous basis, to deal with enquiries immediately and professionally, to respond to user feed-back and suggestions on improvements and to revise documentation where necessary. Second, there are key developments of the WICID system that need further attention: system updating and improving the user interface, and extending the analysis facilities. A third objective is to gather or estimate further UK census-based data sets. There are essentially two types of census-based data that will be added to the existing system: ‘primary’ census interaction data and estimated or ‘derived’ data. In relation to the former, it will be important to aggregate 2001 interaction data that are available at existing spatial scales— district, ward or OA—to more aggregate spatial scales both above the district level (e.g. city regions) or below district level (e.g. super OAs in England and Wales; data zones in Scotland). Since data will be published at super OA or data zone levels for a wide variety of topics including deprivation, housing, crime, health, employment and education, these data are expected to provide a new focus for academic investigation and therefore migration and commuting flow data that are estimated for these geographies will be very useful. The fourth objective is to include some important UK non-census-based data sets in the WICID system. This is the major departure from the current data provision, reflecting the transformation of the existing CIDS into CIDER, where users can extract and analyse data that have been assembled from a wider range of sources than just the Census of Population, thus increasing interest in and use of the system. 
Data sets of particular value are the National Health Service central register patient data, and Higher Education Statistics Agency data and pupil level annual school census data. Underpinning this expansion is the recognition that these various interaction data sets provide key information about the time series context for census data and the trends in interaction behaviour between censuses. Some of these data sets are currently used for subnational population estimation and projection. Finally, it is imperative to prepare for incorporating the results of the 2011 census into the WICID system, to monitor the proposed implementation of a continuous population survey (Office for National Statistics, 2004) and to align with the proposed integrated population statistics system (Office for National Statistics, 2003). The CIDER will lobby to ensure that the 2011 census questions on migration and commuting allow consistency with previous questions and will argue strongly that the returns are processed and adjusted so as to cause the minimum loss of information and lack of clarity. As we have attempted to demonstrate in the paper, the SCAM has caused huge uncertainty among users of the interaction data, and the census agencies are advised to reconsider alternative disclosure control methods for application to the outputs of the 2011 census, given the negative effects of the SCAM. The results of the 2011 census will form the basis of the information that is contained in the proposed integrated population statistics system that will subsequently be updated with data from further censuses (if maintained), the proposed continuous population survey and other administrative and registration systems. This system is likely to generate interaction data more regularly and it will be very important to ensure that data released are maximized without the effects of disclosure control becoming too detrimental. 
The experience of the CIDER in handling different types of registration data in the run-up to 2011 will be beneficial if and when data from the new integrated system eventually come on line. Although the prospects of more regular and consistent data on migration in the UK under a registration system are attractive and exciting, it is much less clear what the implications will be for commuting data post 2011.
Acknowledgements
The authors acknowledge financial support for the CIDS from the Economic and Social Research Council and the Joint Information Systems Committee under the 2001–2006 census programme (project H507255177) and from the Economic and Social Research Council under the 2006–2011 census programme (project RES-348-25-0005). The authors are grateful to the Joint Editor and the referees for their comments on earlier versions of this paper. Census output is Crown copyright and is reproduced with the permission of the Controller of the Stationery Office and the Queen’s Printer for Scotland.
References
Boyle, P. and Feng, Z. (2002) A method for integrating the 1981 and 1991 GB Census interaction data. Comput. Environ. Urb. Syst., 26, 241–256.
Champion, A. and Coombes, M. (2007) Using the 2001 census to study human capital movements affecting Britain’s larger cities: insights and issues. J. R. Statist. Soc. A, 170, 447–467.
Cole, K., Frost, M. and Thomas, F. (2002) Workplace data from the census. In The Census Data System (eds P. Rees, D. Martin and P. Williamson), pp. 269–280. Chichester: Wiley.
Duke-Williams, O. and Stillwell, J. (2007) Investigating the potential effects of small cell adjustment on interaction data from the 2001 Census. Environ. Planng A, 39, no. 5, in the press.
Fielding, A. (1992) Migration and social change. In Migration Processes and Patterns, vol. 2, Population Redistribution in the United Kingdom (eds J. Stillwell, P. Rees and P. Boden), pp. 225–247. London: Belhaven.
General Register Office for Scotland (2003) Scotland’s Census 2001 – Reference Volume. Edinburgh: General Register Office for Scotland.
General Register Office for Scotland (2005) GROS current view on questions for the 2006 census test and 2011 Census. General Register Office for Scotland, Edinburgh. (Available from http://www.gro-scotland.gov.uk/files/current-view-for-questions-for-2006-and-2011.pdf.)
Martin, D. (2002) Geography for the 2001 Census in England and Wales. Popln Trends, 108, 7–15.
Office of the Deputy Prime Minister (2002) Development of a Migration Model. London: Office of the Deputy Prime Minister.
Office for National Statistics (2003) Proposals for an integrated population statistics system. Discussion Paper. Office for National Statistics, London.
Office for National Statistics (2004) Proposals for a continuous population survey. Consultation Paper. Office for National Statistics, London.
Office for National Statistics (2005) The 2011 Census: initial view on content for England and Wales. Consultation Document. Office for National Statistics, London.
Rees, P. and Duke-Williams, O. (1997) Methods for estimating missing data on migrants in the 1991 British Census. Int. J. Popln Geogr., 3, 323–368.
Rees, P., Parsons, J. and Norman, P. (2005) Making an estimate of the number of people and households for output areas in the 2001 Census. Popln Trends, 122, 27–34.
Rees, P., Thomas, F. and Duke-Williams, O. (2002) Migration data from the census. In The Census Data System (eds P. Rees, D. Martin and P. Williamson), pp. 245–267. Chichester: Wiley.
Simpson, L. and Middleton, E. (1999) Undercount of migration in the UK 1991 Census and its impact on counterurbanisation and population projections. Int. J. Popln Geogr., 5, 387–405.
Stillwell, J., Bell, M., Blake, M., Duke-Williams, O. and Rees, P. (2000) A comparison of net migration flows and migration effectiveness in Australia and Britain: part 1, total migration patterns. J. Popln Res., 17, 17–41.
Stillwell, J., Bell, M., Blake, M., Duke-Williams, O. and Rees, P. (2001) A comparison of net migration flows and migration effectiveness in Australia and Britain: part 2, age-related migration patterns. J. Popln Res., 18, 19–39.
Stillwell, J. and Duke-Williams, O. (2003) A new web-based interface to British census of population origin–destination statistics. Environ. Planng A, 35, 113–132.
Stillwell, J., Duke-Williams, O., Feng, Z. and Boyle, P. (2005) Delivering census interaction data to the user: data provision and software developments. Working Paper 05/01. School of Geography, University of Leeds, Leeds.