Environment and Planning A, 1980, volume 12, pages 1357-1382
Functional regionalisation of spatial interaction data: an evaluation of some suggested strategies
I Masser
Department of Town and Regional Planning, University of Sheffield, Sheffield S10 2TB, England
J Scheurwater
Instituut voor Planologie, Rijksuniversiteit Utrecht, Utrecht 2506, The Netherlands
Received 4 July 1979, in revised form 30 April 1980
Abstract. Three alternative approaches (the functional distance approach, the iterative proportional fitting based procedure and the intramax procedure) to the functional regionalisation of spatial interaction data are compared and evaluated in this paper. After a general discussion of the basic properties of each approach, the detailed processes that are involved in each case are illustrated by reference to a worked example. The last section of the paper presents the findings obtained by each of these approaches as a result of empirical studies of Greater London, Merseyside, and the Netherlands. The study as a whole identifies a number of ambiguities in the existing literature which point to the need for a more rigorous examination of current methodology. They also suggest that more attention needs to be given to the role that is played by aggregation procedures as against data transformation procedures in the evaluation process. Although each user must decide which of the three methods best suits his particular purposes, the results obtained by means of the intramax procedure seem generally more readily interpretable than those obtained by either of the other two methods.

1 Introduction
The results of functional regionalisations of interaction data are of practical value to analysts because of the insights that they give into the underlying structure of spatial systems and the central role that they play in applied research in a variety of fields. In the last few years there has been a considerable growth in interest in the delimitation of functional regions as a result of a number of national and international comparative urban systems studies (see, for example, Drewett et al, 1976; Hay and Hall, 1977). Alongside these developments, there has been a growing recognition of the need for more systematic studies of spatial representation in the accounting systems that are used in a wide variety of planning models (see, for example, Masser and Brown, 1978). The importance of functional regional delimitation in applied research is now becoming recognised by the authorities that are responsible for the collection of data in Britain and there are signs that this may be reflected in the presentation of the results of the 1981 Census (Coombes et al, 1979).

However, the selection of an appropriate functional regionalisation strategy presents both practical and technical problems for the analyst. Although there is a broad consensus amongst analysts as to what is meant by a functional region in general terms, there is considerable diversity of opinion as to how it can be specified in practice. The key criterion is the strength of interaction that takes place between spatial units. On this basis, functional regions can be defined as "areas or locational entities which have more interaction or connection with each other than with outside areas" (Brown and Holmes, 1971, page 57). This broad definition leaves considerable flexibility to the analyst regarding the choice of measures. The simplest solution to the problem is to define ad hoc indices of self-containment.
Measures of this kind have been very widely used, particularly in the delimitation of travel to work areas (Smart, 1974), even though they suffer from obvious deficiencies from the point of view of comparability and transferability.

[Footnote to J Scheurwater. Current address: National Physical Planning Agency, Zwolle, The Netherlands.]
To overcome these difficulties a number of specialised procedures have been put forward for functional regionalisation purposes in relation to interaction data. The oldest of these strategies is that which was based on the functional distance approach that was developed initially by Brown and his colleagues (Brown and Horton, 1970; Brown et al, 1970; and Brown and Holmes, 1971). This strategy makes use of the assumptions that underlie Markov chain analysis for the delimitation of functional and nodal regions. An alternative to this approach has been put forward by Masser and Brown (1975) as a tool for zoning system design in interaction modelling. Their intramax procedure is a variant of Ward's (1963) well-known hierarchical aggregation procedure which has been modified to take account of the effects that variations in zone size have on the strength of interaction. In addition Slater (1975) has developed a functional regionalisation procedure for the analysis of hierarchical structure in interregional migration flows. This procedure involves the identification of the strong components of directed graphs in an interaction matrix which has been standardised by means of an iterative proportional fitting procedure.

These three strategies indicate the diversity of interpretations that can be given in practice to the basic concept of a functional region. They also highlight the element of choice that exists in these methods when seen from the standpoint of potential users. Under these circumstances it is not surprising that the differing properties of these methods have provoked a lively debate on their strengths and weaknesses. Unfortunately, the debate so far has been restricted to specific methods and little attempt has been made to compare alternative approaches in a systematic way. This paper seeks to remedy this deficiency in some measure with respect to the three approaches to the functional regionalisation of interaction data that were cited above.
The paper falls into three parts. The first part consists of a discussion of the essential features of each of the three procedures in relation to the criticisms that have been levelled against them. A worked example of the application of the three procedures with reference to a limited subsystem of a larger spatial system is given in the second part to enable a detailed analysis to be made of the computational processes involved in each case. The third part presents the findings of several empirical studies involving the three procedures and a comparison is made of the results obtained by these means with respect to three different data sets. The three procedures have two essential elements in common. They are all designed for the functional regionalisation of spatial interaction data and developed in such a way that data units are uniquely allocated to specific groups. For this reason, the three procedures form a distinctive subset of strategies which can be used for functional regionalisation purposes and the discussion of functional distance excludes the option that is contained in this approach for delimiting overlapping nodal regions. It should also be noted that the authors have been closely involved with the development of one of the three procedures that is being evaluated in this paper and that they cannot therefore be considered as totally neutral observers. However, it should be emphasised at the outset that the exercises that are reported upon in the paper were undertaken with the primary objective of obtaining comparative results which would improve the general level of discussion and awareness of the strengths and weaknesses of all three methods. For this reason it is hoped that the findings of this exercise will be of value to the analyst who must make his own decisions as to which procedure best suits his purposes. 
2 Three approaches to the functional regionalisation of interaction data

2.1 Functional distance
The concept of functional distance has been put forward by Brown and his colleagues as a tool for the delimitation of functional and nodal regions from spatial interaction data.
Functional regions are defined in this case as "areas or locational entities which have more interaction or connection with each other than with outside areas" (Brown and Holmes, 1971, page 57), and nodal regions are regarded as a special type of functional region in which the notion of dominance or order is introduced to reflect the relationship of one area to another. Functional distance is measured in terms of the "net effect of entity properties upon the propensity of the entities to interact" (Brown and Holmes, 1971, page 60). This is indicated by the values of the mean first passage time matrix of the initial interaction matrix. Given this data transformation, various alternative grouping strategies can be used to obtain a functional regionalisation and the results obtained by these means may differ considerably from one another despite the use of a common starting point. In their own work on functional regionalisation Brown and Holmes select a hierarchical aggregation procedure for their purpose which maximises between-group variation and minimises within-group variation of the standardised scores matrix that can be derived from the mean first passage time matrix (Ward, 1963).

The use of mean first passage time matrices as a starting point for functional regionalisation has been questioned by several authors (see, for example, Stephenson, 1974) on the grounds that measures which incorporate indirect flows are only appropriate for the analysis of linked sequences of movements over time within systems such as the movement of population. In their view serious questions of interpretation arise when this approach is used with respect to other types of interaction. This is particularly the case in connection with daily circular movements such as commuting flows, shopping and school trips.
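As a rough illustration of the data transformation described above, the following sketch computes mean first passage times for a small row-standardised flow matrix. It is a minimal sketch under stated assumptions, not Brown and Holmes's own implementation: the function names (`row_standardise`, `solve`, `mean_first_passage_times`) and the three-zone flows are hypothetical, and the treatment of intrazonal flows is left open because the text does not specify it.

```python
def row_standardise(flows):
    """Turn a flow matrix into a row-stochastic transition matrix."""
    return [[v / sum(row) for v in row] for row in flows]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mean_first_passage_times(p):
    """m[i][j] = expected number of steps to reach j for the first time from i.

    For each target j the other entries satisfy
    m[i][j] = 1 + sum over k != j of p[i][k] * m[k][j].
    Diagonal entries (mean recurrence times) are left as None.
    """
    n = len(p)
    m = [[None] * n for _ in range(n)]
    for j in range(n):
        others = [i for i in range(n) if i != j]
        a = [[(1.0 if i == k else 0.0) - p[i][k] for k in others] for i in others]
        x = solve(a, [1.0] * len(others))
        for i, value in zip(others, x):
            m[i][j] = value
    return m


# Hypothetical three-zone flows with no intrazonal trips (illustration only).
flows = [[0, 10, 10], [5, 0, 5], [20, 20, 0]]
forward = mean_first_passage_times(row_standardise(flows))

# The same calculation on the transpose of the flow matrix.
transposed = [list(col) for col in zip(*flows)]
backward = mean_first_passage_times(row_standardise(transposed))
```

In this hypothetical case every row-standardised off-diagonal probability is 0.5, so each entry of `forward` is 2 steps, whereas `backward` differs from zone to zone. This shows that the initial matrix and its transpose can yield different mean first passage time matrices.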
The question of interpretation is further complicated by the fact that two mean first passage time matrices can be obtained from each interaction matrix according to whether the initial matrix or its transpose is used as a basis for the calculation. These reflect the 'forward' and the 'backward' projection possibilities, respectively. This presents no problem where sequential (and thereby 'forward') movements are involved as is the case with migration but raises serious difficulties in other cases. Issues of this kind demand very careful consideration particularly where comparisons between data sets are involved. For example, a strict comparison of commuting flows and migration data requires a destination (or 'backward') based measure for the former and an origin (or 'forward') based measure for the latter. In addition, the values that are obtained for the mean first passage time matrix do not take into account the effect of differences in size between the destination zones so that there is likely, in consequence, to be a high degree of correlation between them and the size of each zone. This can be interpreted as a measure of the rank order or hierarchy that is associated with each zone. However, it must be borne in mind that these values are dependent on the configuration of zones that is used in the analysis and that different results could be obtained for different configurations of the same set of basic locational entities. For this reason it is important to select entities which can be interpreted in terms of rank order when using this method in empirical studies. The choice of aggregation procedures also presents some awkward problems. It is difficult to reconcile the objectives of the hierarchical aggregation procedure (or, for that matter, most of the other standard procedures in cluster analysis), with the concept of the functional region as it is defined by Brown and Holmes when it is applied to standardised scores of the mean first passage time matrix. 
There is no guarantee that methods which seek to minimise within-group variation and to maximise between-group variation at each stage of the grouping process in this case will necessarily produce regions which have more interaction with each other than with outside areas.
Despite these limitations, the functional distance approach is an attractive concept particularly in relation to the delimitation of nodal regions in which rank and order questions are given special prominence. But the interpretation of the transformation of the data set that is used and the choice of aggregation procedure that must be applied to the transformed scores for regionalisation purposes are both open to criticism.

2.2 The iterative proportional fitting based procedure
Slater's iterative proportional fitting based procedure involves "the fitting of a hierarchical structure to an asymmetric matrix of linkages" (Slater, 1976b, page 125) which represent the relative intensity of interaction between zones. The initial interaction matrix is transformed in this case to eliminate the effects that differences in its row and column totals have on the volume of interaction by means of the iterative proportional fitting procedure that has been put forward by Fienberg (1970). The entries in this standardised matrix have a number of useful statistical properties. The interaction structure of the original matrix as defined by its cross product ratios is preserved and the entries in the adjusted matrix can also be interpreted as maximum entropy estimates of the original matrix, given the constraints that all its row and column totals are equal (Bacharach, 1970, pages 83-85). A hierarchical clustering algorithm based on the concept of the strong components of a directed graph is then used in relation to the transformed interaction matrix for regionalisation purposes. This can be viewed as "a directed graph analogue of single linkage cluster analysis" (Leusmann and Slater, 1977). Slater is primarily concerned with the analysis of hierarchical structure.
He suggests that zones or groups of zones with high scores can be interpreted as possessing a very strong local identity whereas zones or groups of zones with low scores that are isolated at an early stage of the grouping process can be regarded as having a strong sense of national identity insofar as they have no strong tie to any particular zone (see, for example, Slater, 1975, page 455; 1976b, page 128). This interpretation has been questioned by Holmes (1977) on the grounds that low scores may also reflect peripheral regions, and the connection between hierarchical structure and functional regionalisation is also not clear despite claims that the results obtained by these means yield "a highly satisfactory system of functional regions" (see, for example, Slater, 1977; 1978). Holmes (1977; 1978) has criticised the iterative proportional fitting based procedure that is used for transformation purposes on the grounds that this distorts the initial values in the interaction matrix. However, this is more of a practical question than a fundamental statistical question. It reflects the problems that may arise when there are very large differences between zones in terms of their row and column totals. These problems are likely to be accentuated when the method is applied to matrices containing a large number of zero elements and can, under certain conditions, result in the failure of the iterative proportional fitting based procedure to produce a convergent solution (see Mohr, 1975, parts II and III). The aggregation procedure that is used in this case shares most of the deficiencies that have been attributed to conventional single linkage cluster analysis. The most important of these is the property called chaining "which refers to the tendency of the method to cluster together at a relatively low level objects linked by chains of intermediates" (Everitt, 1974, page 61).
Single linkage cluster analysis is a useful tool for identifying optimally connected clusters but is of only limited value in defining homogeneous compact clusters which have more interaction with each other than with other groups. Despite these limitations, the iterative proportional fitting based procedure is a useful tool for the analysis of the hierarchical structure that underlies interaction data.
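The standardisation step of this procedure can be sketched as an alternating rescaling of rows and columns until every total is (nearly) equal to one. This is a minimal sketch of generic iterative proportional fitting, not Fienberg's or Slater's own code; the function names and tolerance are assumptions. The input is the four-zone subset of flows from the worked example in table 2 with the diagonal set to zero, so the output will not match the figures in table 2, which were fitted on the full data set.

```python
def ipf_standardise(flows, tol=1e-9, max_iter=10000):
    """Alternately rescale rows and columns until all totals are close to 1."""
    n = len(flows)
    x = [[float(v) for v in row] for row in flows]
    for _ in range(max_iter):
        # Row step: make every row sum to 1.
        x = [[v / sum(row) for v in row] for row in x]
        # Column step: make every column sum to 1.
        col = [sum(x[i][j] for i in range(n)) for j in range(n)]
        x = [[x[i][j] / col[j] for j in range(n)] for i in range(n)]
        # Columns are now exact, so convergence depends on the rows.
        if max(abs(sum(row) - 1.0) for row in x) < tol:
            return x
    raise RuntimeError("iterative proportional fitting did not converge")


def cross_product_ratio(m, i, j, k, l):
    """Cross product ratio of the 2 x 2 subtable on rows i, j and columns k, l."""
    return (m[i][k] * m[j][l]) / (m[i][l] * m[j][k])


# Four-zone subset (A, B, C, D) of the basic data in table 2, zero diagonal.
flows = [[0, 36, 147, 185],
         [62, 0, 149, 125],
         [20, 58, 0, 86],
         [59, 72, 118, 0]]
fitted = ipf_standardise(flows)

# The ratio for rows A, B and columns C, D is unchanged by the rescaling.
before = cross_product_ratio(flows, 0, 1, 2, 3)
after = cross_product_ratio(fitted, 0, 1, 2, 3)
```

Because each step only multiplies whole rows or whole columns by a constant, `before` and `after` agree up to rounding. This is the sense in which the interaction structure defined by the cross product ratios is preserved. The `RuntimeError` branch corresponds to the non-convergence that the text warns may occur with very unequal totals or many zero elements.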
However, the unclear connection between hierarchical structure and functional regions together with the particular problems arising from the use of single linkage cluster analysis based methods may limit its potential usefulness for functional regionalisation.

2.3 The intramax procedure
The objectives of the intramax procedure correspond completely with the initial definition of functional regions. The intramax procedure seeks "to maximise the proportion of the total interaction which takes place within the aggregations of basic data units that form the diagonal elements of the matrix, and thereby to minimise the proportion of cross-boundary movements in the system as a whole" (Masser and Brown, 1975, page 510). Like the iterative proportional fitting based procedure the intramax procedure is concerned with the relative strength of interactions once the effect of variations in the size of the row and column totals is removed. In this case, however, relative strength is expressed in terms of the difference between the observed values and the values that would be expected on the basis of the multiplication of the row and column totals alone. This measure can also be interpreted in terms of the additional information that is gained with respect to the structure of interaction from the elements of the matrix as against its row and column totals.

The aggregation procedure is simply a modification of Ward's (1963) hierarchical aggregation procedure in which the conventional form of objective function is replaced by criteria which take account of the overall effect of interaction across group boundaries. In this case the zones with the greatest difference in terms of their observed and expected probabilities are fused together at each stage in the grouping process and the process of transformation and comparison is repeated at subsequent stages in the grouping procedure to take account of the effects of previous fusions.
The intramax procedure has features in common with both of the other procedures. Like the iterative proportional fitting based procedure, it is concerned with comparisons between pairs of values in the matrix of relative flows but it differs from this procedure in that it is also concerned with the cumulative effect of previous groupings and the combined strength of these flows at every stage in the aggregation process. In the latter respect it is analogous to the procedure that is recommended for the functional distance method except that, in this case, the objective function is based on pairwise comparisons rather than a variance measure which takes account of all observations for the respective (groups of) zones. The formulation of the objective function in the original intramax procedure has been criticised by Hirst (1977) on the grounds that it does not eliminate the effects of unequal marginal distributions. Although this objection can be partially overcome by reformulating the objective function, Hirst argues that the results obtained by these means will tend to favour small zones because of the differences in upper bounds between the values obtained for small as against large zones. Unexpected results may be obtained, particularly when there is a large difference between the amount of interaction that is produced and attracted to a particular zone (see Tyree, 1973). Despite these objections, and the use of Hirst's reformulation of the objective function in subsequent applications (see, for example, Masser and Scheurwater, 1978) the intramax procedure is the only one of the three procedures which explicitly identifies regions which have more (direct) interaction with each other than with other areas at each stage of the grouping process. Because of this, it avoids most of the criticisms that have been raised regarding the interpretation of the results obtained by means of the functional distance approach. 
The intramax procedure also avoids most of the criticisms that have been levelled at the interpretation of the results obtained by means of the iterative proportional fitting procedure in that it explicitly considers the cumulative effects of previous fusions at each stage in the grouping process.
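The stepwise logic of the intramax procedure can be made concrete with the four-zone flows and totals of the worked example in table 2. This is a sketch under stated assumptions, not the authors' program. The score used here, flow divided by the product of the row and column totals (including intrazonal flows), is inferred from the first-cycle figures in table 2, which it reproduces after rounding; the text itself does not state the formula. The function names are hypothetical, and only the four tabulated zones are considered, whereas the full procedure operates on the whole data set.

```python
from itertools import combinations


def intramax_scores(flows, row_tot, col_tot):
    """Score each off-diagonal flow relative to its row and column totals."""
    n = len(flows)
    return [[None if i == j else flows[i][j] / (row_tot[i] * col_tot[j])
             for j in range(n)] for i in range(n)]


def best_pair(scores):
    """Pair of zones with the largest combined (two-way) score."""
    return max(combinations(range(len(scores)), 2),
               key=lambda p: scores[p[0]][p[1]] + scores[p[1]][p[0]])


def fuse(labels, flows, row_tot, col_tot, a, b):
    """Merge zone b into zone a by adding rows, columns, and totals."""
    n = len(labels)
    rows = [[flows[a][j] + flows[b][j] for j in range(n)] if i == a
            else list(flows[i]) for i in range(n)]
    for row in rows:
        row[a] += row[b]  # fold column b into column a
    keep = [k for k in range(n) if k != b]
    new_flows = [[rows[i][j] for j in keep] for i in keep]
    new_labels = [labels[a] + "/" + labels[b] if k == a else labels[k] for k in keep]
    new_row = [row_tot[k] + (row_tot[b] if k == a else 0) for k in keep]
    new_col = [col_tot[k] + (col_tot[b] if k == a else 0) for k in keep]
    return new_labels, new_flows, new_row, new_col


# Basic data from table 2: flows with intrazonal values on the diagonal,
# and row and column totals that include flows to and from other zones.
labels = ["A", "B", "C", "D"]
flows = [[1472, 36, 147, 185],
         [62, 608, 149, 125],
         [20, 58, 941, 86],
         [59, 72, 118, 787]]
row_tot = [2138, 1046, 1269, 1322]
col_tot = [1975, 980, 1619, 1709]

first_cycle = intramax_scores(flows, row_tot, col_tot)
sequence = []
while len(labels) > 1:
    scores = intramax_scores(flows, row_tot, col_tot)
    a, b = best_pair(scores)
    sequence.append((labels[a], labels[b]))
    labels, flows, row_tot, col_tot = fuse(labels, flows, row_tot, col_tot, a, b)
```

Under these assumptions `sequence` records B fusing with C, then D with B/C, then A with the rest, which is the order reported for the intramax procedure in table 2. Recomputing the scores after every fusion is what allows the procedure to take account of the cumulative effect of previous groupings.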
The intramax procedure has also a number of practical advantages over the other two methods from the point of view of the user. It involves only a series of direct comparisons between the observed and the expected values that are calculated by the multiplication of the respective row and column totals. This avoids the complex set of matrix manipulations that are required for the calculation of the mean first passage times in the functional distance approach and the repeated iterations of the entire interaction matrix that are needed for the iterative proportional fitting based procedure. Because it avoids overall matrix manipulation, the intramax procedure is more readily applicable to large data sets than either of the other procedures and can also be adapted more easily to deal with large, sparse matrices containing a large number of zero elements.

2.4 Evaluation
The main features of the three functional regionalisation procedures are summarised in table 1. This shows that each of the three procedures has advantages and limitations. The intramax procedure is the only one which is directly interpretable in terms of its original objectives. The interpretation of results obtained by the functional distance approach in terms of the authors' own definition of a functional region can be questioned because of the inclusion of indirect flows in the analysis and also when a total variance minimising method is used for aggregation purposes. Similar questions can also be raised in connection with the use of a form of single linkage cluster analysis to obtain functional regions in the iterative proportional fitting based procedure.

The functional distance approach differs fundamentally from the other two methods in that it takes account of indirect as well as direct flows. This gives rise to additional problems of interpretation where there is no clear directionality in the data set as is the case in daily commuting flows. Problems may also arise in practical applications of the iterative proportional fitting based procedure where there are large variations in zone size and/or matrices containing a large number of zero elements. These problems are largely resolved in the intramax procedure at the expense of a bias in favour of small zones.

When the aggregation methods are compared, it can be seen that the iterative proportional fitting based procedure and the intramax procedure are concerned with pairwise comparisons whereas the functional distance approach utilises criteria based on rowwise and columnwise comparisons. Despite these differences, the functional distance and the intramax procedures make use of the same stepwise aggregation procedure for grouping purposes. It should be noted that stepwise procedures of this kind do not guarantee optimal results for any given level of aggregation but this can be resolved by extending the basic analytical framework (Masser and Scheurwater, 1978).

Table 1. A summary of the main features of three methods for functional regionalisation.

Objectives
  Functional distance: Delimitation of functional (and nodal) regions: that is, regions which have more interaction with each other than with other regions
  Iterative proportional fitting based procedure: Hierarchical regionalisation: that is, fitting a hierarchical structure to an asymmetric matrix of linkages
  Intramax procedure: Delimitation of functional regions: that is, regions which have more interaction with each other than with other regions

Data transformation: method
  Functional distance: Estimation of mean first passage time matrix
  Iterative proportional fitting based procedure: Estimation of matrix of relative interaction
  Intramax procedure: Estimation of a matrix of differences between expected and observed values

Data transformation: comments
  Functional distance: Summarises indirect as well as direct flows. Assumption that flows are sequentially directional in nature (forwards or backwards)
  Iterative proportional fitting based procedure: Distortions likely to occur when large differences in size between zones. These problems will be accentuated where sparse matrices are involved
  Intramax procedure: Estimation procedure repeated at each step of the grouping process. Bounding problems imply bias towards small zones in practice

Aggregation procedure: method
  Functional distance: Choice left open but stepwise cluster analysis of standardised matrix for functional regionalisation
  Iterative proportional fitting based procedure: Strong components of directed graphs identified sequentially
  Intramax procedure: Stepwise cluster analysis for functional regionalisation

Aggregation procedure: comments
  Functional distance: Can results be interpreted in terms of initial objective? Inherent suboptimality of procedure. Procedure can also be used for delimitation of nodal regions
  Iterative proportional fitting based procedure: Modified version of single linkage cluster analysis. Divisive method based on critical links between individual zones, not groups of zones
  Intramax procedure: Results can be directly interpreted in terms of original objective once effect of size differentials taken into account. Inherent suboptimality of procedure

3 A worked example

3.1 Introduction
The detailed operation of the three functional regionalisation procedures can best be studied by means of a worked example based on one of the data sets that are considered in the next section.
In this section the order of presentation is changed so that the results of the functional distance approach can be considered last rather than first because of the greater complexity that is involved in these calculations. The worked example involves a subset of four contiguous zones within a larger data set (zones 12, 15, 25, and 30 in Greater London). The basic data, together with the results of all three functional regionalisation procedures, are set out in table 2.

3.2 The iterative proportional fitting based procedure
A comparison of the results obtained by the iterative proportional fitting procedure with the basic data demonstrates the effect of eliminating the differences in the size of the row and column totals on the strength of the interaction that is recorded in the matrix. This is particularly striking in relation to the directional emphasis that was present in the basic data set. For example, after standardisation, the flow from C to B is nearly as large as that from B to C even though the latter was nearly two and a half times as great as the former in the original matrix. This reflects the influence that the lowest row and the lowest column total exert on the former as against the effect that the second highest column total and the third highest row total have on the latter. Similar changes in emphasis can be seen elsewhere in the table. After transformation the flow from D to B is greater than that from B to D even though the latter was nearly double the size of the former in the basic data set.

The sequence of grouping illustrates the way in which the iterative proportional fitting procedure both complements and deviates from conventional single linkage cluster analysis. When the values in the matrix are ranked in order of size, it can be seen that the two largest flows concern the interaction between zones B and C. These fuse together at the first stage of the grouping process.
The second step of the grouping process involves a more complex procedure by which A groups to B and C despite the very low values that are recorded for the flows from A to B and from C to A. This fusion arises because of the circuit that is created by the strong flows from A to C, from C to B and from B to A. This type of grouping is a special feature of the strong component method which carries with it the possibility that more than two zones will be combined at the same stage in the stepwise procedure as the circuit of directed graphs is completed. For example, if there was only a weak flow between B and C in this case, all three zones would join together at the same time.

Table 2. A worked example.

Basic data set of flows from one zone to another (rows are origins, columns are destinations)

          A            B            C            D          ...   Total
A       (1472)         36          147          185         ...   666 (2138)
B         62         (608)         149          125         ...   438 (1046)
C         20           58         (941)          86         ...   328 (1269)
D         59           72          118         (787)        ...   535 (1322)
...
Total   503 (1975)   372 (980)    678 (1619)   922 (1709)   ...   27984 (64509)

Note: Figures in brackets refer to intrazonal flows and to total flows including intrazonal flows. The ... in the tables indicate that data for other zones are included in data set estimates, but are omitted from the tables.

Iterative proportional fitting based procedure

Results obtained by iterative proportional fitting based procedure

          A        B        C        D       ...   Total
A         -      0.0995   0.1951   0.1713    ...   1.0000
B       0.1948     -      0.3256   0.1906    ...   1.0000
C       0.0696   0.2922     -      0.1452    ...   1.0000
D       0.1194   0.2111   0.1662     -       ...   1.0000
...
Total   1.0000   1.0000   1.0000   1.0000

Sequence of strong components defined for grouping purposes

B -> C   0.3256
C -> B   0.2922   B and C grouped together
D -> B   0.2111
A -> C   0.1951
B -> A   0.1948   A grouped with B and C
B -> D   0.1906   D grouped with A, B, and C

Intramax procedure

Results of first cycle of intramax procedure (in units of 10^-7)

        A      B      C      D
A       -     172    425    506
B      300     -     880    699
C       80    466     -     397
D      226    556    551     -

Combined scores for grouping purposes

        A      B      C
B      472
C      505   1346
D      732   1255    948

B and C group together.

Results of second cycle of intramax procedure (in units of 10^-7)

        A     B/C     D
A       -     329    506
B/C    179     -     533
D      226    553     -

Combined scores for grouping purposes

        A     B/C
B/C    508
D      732   1086

D groups with B and C; subsequently A groups with B/C and D.

Functional distance

Mean first passage time matrix (last row gives mean values)

          A       B       C       D
A         -     60.32   29.44   20.14
B       43.84     -     24.37   19.27
C       48.84   56.00     -     22.28
D       46.67   57.92   30.70     -
Mean    56.79   72.67   45.38   34.52

Z scores derived from the mean first passage time matrix

          A         B         C         D
A         -      -2.0148   -2.3662   -2.3729
B      -2.6583      -      -3.1175   -2.5165
C      -1.6333   -2.7199      -      -2.0206
D      -2.0075   -2.4069   -2.1785      -

Distance squared matrix used for grouping purposes (first cycle only)

        A      B      C
B      4.54
C      6.09   8.25
D      4.45   5.92   2.97

C groups with D; subsequently A groups with B, and A and B with C and D.

Functional distance: the transpose case

Mean first passage time matrix of the transpose of the basic data set

          A        B        C        D      (Mean values)
A         -      52.31    45.31    47.40     65.63
B       88.17      -      75.17    84.35    104.32
C      103.32    91.30      -      96.86    111.79
D       59.83    53.96    52.97      -       71.20

Z scores derived from the mean first passage time matrix

          A         B         C         D
A         -      -1.9737   -3.0091   -2.7002
B      -2.0487      -      -3.6996   -2.5344
C      -1.5000   -3.6291      -      -2.6456
D      -1.9014   -2.8821   -3.0477      -

Distance squared matrix used for grouping purposes (first cycle only)

        A       B       C
B     10.02
C     10.56   14.26
D      8.28   12.58    8.44

A groups with D; C groups with A and D; B groups with A, C, and D.
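The grouping sequence reported in table 2 for the iterative proportional fitting based procedure can be traced with a short sketch: links are added in descending order of their standardised values, and zones are grouped whenever they come to lie on a common circuit (a strong component). This is a minimal illustration rather than Slater's algorithm as published. The helper `strong_components` is a hypothetical name, it uses a naive reachability calculation that is adequate only for a handful of zones, and it considers only the links among the four tabulated zones.

```python
def strong_components(nodes, links):
    """Group nodes that can reach each other along the directed links."""
    reach = {u: {u} for u in nodes}
    changed = True
    while changed:  # naive transitive closure
        changed = False
        for u, v in links:
            for w in nodes:
                if u in reach[w] and not reach[v] <= reach[w]:
                    reach[w] |= reach[v]
                    changed = True
    return {frozenset(v for v in nodes if v in reach[u] and u in reach[v])
            for u in nodes}


# Standardised values from table 2, keyed by (origin, destination).
ipf = {("A", "B"): 0.0995, ("A", "C"): 0.1951, ("A", "D"): 0.1713,
       ("B", "A"): 0.1948, ("B", "C"): 0.3256, ("B", "D"): 0.1906,
       ("C", "A"): 0.0696, ("C", "B"): 0.2922, ("C", "D"): 0.1452,
       ("D", "A"): 0.1194, ("D", "B"): 0.2111, ("D", "C"): 0.1662}

nodes = ["A", "B", "C", "D"]
links = []
history = []  # (link that completed a circuit, newly formed groups)
previous = {frozenset([n]) for n in nodes}
for link, value in sorted(ipf.items(), key=lambda kv: -kv[1]):
    links.append(link)
    current = strong_components(nodes, links)
    if current != previous:
        new_groups = [sorted(c) for c in current if c not in previous]
        history.append((link, new_groups))
        previous = current
```

With the table 2 values, `history` shows B and C grouping when the link from C to B is added, A joining when the link from B to A closes the circuit A to C to B to A, and D joining when the link from B to D is added. The intermediate links from D to B and from A to C change nothing on their own, which illustrates why a fusion depends on the completion of a circuit and not on the size of any single flow.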
3.3 The intramax procedure
The differences between the observed and the expected values that are given in table 2 are based on the revised objective function suggested by Hirst (1977). These estimates also differ from those that have been obtained for the other two procedures in that they take account of the values of the diagonal elements of the matrix in accordance with recommended practice (Masser and Brown, 1975, page 522). Consequently the relative difference in size between the row and column totals is much less in this case than that in the zero diagonal case. For this reason the results of the data transformation are less spectacular than those obtained by means of the iterative proportional fitting procedure, and the relationships between B and C, and B and D are to a large extent preserved in this case instead of being substantially altered as was the case with the iterative proportional fitting procedure.

Despite these differences, the first step in the grouping process involves the same pairing as that for the iterative proportional fitting procedure, but the results obtained for the second and third steps are different in this case because the grouping of A to B and C is inhibited by the weak flows that are recorded from A to B and from C to A. Consequently D groups with B and C at the second stage in the grouping process and A groups with B/C and D at the third stage. The results obtained by these means also tend to confirm expectations about the bias towards smaller zones that is implicit in the intramax procedure. In this case the sequence of groupings correlates directly with the size order of the basic data units.

3.4 Functional distance
The application of the functional distance approach is much more complex than either of the other two methods. In the first place the mean first passage time matrix must be calculated by a series of operations on the initial matrix.
Then the standardised Z scores must be derived from the mean first passage time matrix to provide the basic data that are required for the grouping process. In this case, a distance squared matrix is then calculated at each step in the grouping process to identify the fusion which minimises this function. The results of only the first stage in the grouping process are given in table 2. The mean first passage time matrix must be seen as an inverse transformation of the basic matrix. For example, the column means reflect the differences in the column totals of the initial matrix in inverse order. The values of the elements themselves also take account of the effects of the rowwise standardisation as carried out on the initial data set in the first stage of the data transformation process. For instance, the flow from B to D emerges as being lower (and therefore more powerful) than the flow from A to D despite the difference that exists in the absolute values for these flows. The Z score matrix indicates the extent to which size differences are eliminated by a further standardisation of the mean first passage time matrix. It should be noted that there is an overall similarity between these values and those obtained by means of the iterative proportional fitting and the intramax procedures once the inverse nature of these scores is taken into account. For example the key values of the flows from B to C and C to B are easily identifiable in both cases. However, these points of comparison disappear once the distance squared matrix is calculated on the basis of overall rowwise and columnwise comparisons between zones. Despite the high Z scores that are recorded for the pairwise relationship between zones B and C, the overall distance between these zones is much greater than that for several other possible pairings. Consequently zones C and D group together first in this case, and are followed by the fusion of zones A and B. 
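The first step of the functional distance approach can be illustrated with a short sketch. The Python code below is not Brown and Holmes' own implementation. It assumes that the initial matrix is standardised rowwise into a transition matrix and that the mean first passage times are then obtained by solving the standard first passage equations of a Markov chain. The function names and the four-zone flow matrix are hypothetical, every zone is assumed to be reachable from every other zone, and the subsequent Z score standardisation and distance calculations are not reproduced.

```python
def row_standardise(flows):
    """Divide each row by its total so that every row sums to 1."""
    return [[value / sum(row) for value in row] for row in flows]


def transpose(matrix):
    """Swap origins and destinations (the transpose case)."""
    return [list(column) for column in zip(*matrix)]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mean_first_passage_times(p):
    """Return M, where M[i][j] is the expected number of steps from i to first reach j.

    For each destination j the off-diagonal entries satisfy
    M[i][j] = 1 + sum over k != j of p[i][k] * M[k][j].
    Diagonal entries are left as None.
    """
    n = len(p)
    mfpt = [[None] * n for _ in range(n)]
    for j in range(n):
        others = [i for i in range(n) if i != j]
        a = [[(1.0 if i == k else 0.0) - p[i][k] for k in others] for i in others]
        times = solve(a, [1.0] * len(others))
        for i, t in zip(others, times):
            mfpt[i][j] = t
    return mfpt


# Hypothetical flows between four zones A to D (zero diagonal), for illustration only.
example_flows = [
    [0, 30, 10, 20],
    [25, 0, 40, 15],
    [5, 35, 0, 30],
    [20, 10, 25, 0],
]
initial_case = mean_first_passage_times(row_standardise(example_flows))
transpose_case = mean_first_passage_times(row_standardise(transpose(example_flows)))
for label, row in zip("ABCD", initial_case):
    print(label, ["-" if t is None else round(t, 2) for t in row])
```

A low mean first passage time indicates a strong link, which is the inverse relationship referred to in the text. Running the same function on the transposed flows gives the starting point for the transpose case.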
This disparity between the strength of the pairwise interactions and that of the row and column distances reinforces the objections
that were discussed in the previous section with respect to the interpretation of the results obtained by this method in terms of Brown and Holmes' own definition of functional regions.

3.5 Functional distance: the transpose case
In this case the results obtained for the mean first passage time matrix reflect the columnwise standardisation of the basic data and the row means correlate inversely with the row totals. For this reason the value of the flow from C to B emerges as being lower (and therefore stronger) than the flow from C to D even though the latter is nearly half as great again in numerical terms as the former. Some points of similarity and some basic differences can be seen if the Z scores from the transpose case are compared with those obtained from the previous transformation. Although the flows from B to C and C to B emerge as the two strongest flows in both cases, the rank order of the other flows is often substantially changed. As in the previous case, these differences in individual scores are only indirectly reflected in the distance squared matrix, and A fuses with D at the first step of the grouping process and then C groups with A and D at the second stage. The interpretation of these findings is further complicated because they are also in complete contrast to those obtained in the other application of the functional distance approach.

3.6 Discussion
The main findings of the worked example are summarised in figure 1 in dendrogram form. This shows that no two methods give the same results and that there are substantial variations both in the sequence of grouping and in the combinations of zones that fuse together. In the case of functional distance the interpretation of results is further complicated by the existence of two solutions with considerable differences between them. Despite these differences, certain common features between the methods emerge from the worked example.
This is particularly evident in the results obtained by different data transformation methods. Generally the consequences of using different transformation methods appear to be less dramatic than those of using different aggregation procedures. For example, the intramax procedure would have produced results similar to those of the iterative proportional fitting based procedure if conventional single linkage cluster analysis had been used in place of a modified version of Ward's method. The main conclusion that emerges from the worked example is that far more attention needs to be given to the evaluation of aggregation procedures in discussions of functional regionalisation than has hitherto been the case. The results of the worked example also support some of the observations that have been made by the critics of particular procedures. The sequence of grouping in the intramax procedure correlates with the relative sizes of the row and column totals in the manner that might be expected in view of its bias towards smaller zones.

Figure 1. Dendrograms for the worked example: (a) iterative proportional fitting based procedure, (b) intramax procedure, (c) functional distance, and (d) functional distance transpose case.
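The pairing and fusion steps of the intramax procedure described in section 3.3 can be sketched as follows. This is a simplified illustration and not the authors' program. It assumes that the expected value for each cell is the product of its row and column totals divided by the grand total, and that the revised criterion takes the relative form (observed - expected)/expected, summed over both directions of a pair. The exact form of Hirst's objective function is not reproduced in this section, so that form is an assumption. Diagonal values are included, no contiguity constraint is applied, all row and column totals are assumed to be positive, and the function names and data are hypothetical.

```python
def expected_flows(flows):
    """Expected cell values from the row totals, column totals and grand total."""
    n = len(flows)
    total = sum(map(sum, flows))
    rows = [sum(row) for row in flows]
    cols = [sum(flows[i][j] for i in range(n)) for j in range(n)]
    return [[rows[i] * cols[j] / total for j in range(n)] for i in range(n)]


def best_pair(flows):
    """Return the pair (i, j) with the largest relative excess of observed over expected flow."""
    exp = expected_flows(flows)
    n = len(flows)

    def score(i, j):
        # Assumed relative criterion, summed over both directions of the pair.
        return ((flows[i][j] - exp[i][j]) / exp[i][j]
                + (flows[j][i] - exp[j][i]) / exp[j][i])

    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return max(pairs, key=lambda pair: score(*pair))


def fuse(flows, labels, i, j):
    """Merge zone j into zone i by summing rows and columns.

    Flows between i and j become intrazonal flows of the merged zone.
    """
    keep = [k for k in range(len(flows)) if k != j]

    def members(k):
        return (i, j) if k == i else (k,)

    merged = [[sum(flows[a][b] for a in members(r) for b in members(c))
               for c in keep] for r in keep]
    new_labels = [labels[k] + "+" + labels[j] if k == i else labels[k] for k in keep]
    return merged, new_labels


def stepwise_grouping(flows, labels):
    """Fuse the best pair repeatedly until a single group remains; return the fusions in order."""
    steps = []
    while len(flows) > 1:
        i, j = best_pair(flows)
        steps.append((labels[i], labels[j]))
        flows, labels = fuse(flows, labels, i, j)
    return steps


# Hypothetical three-zone matrix with strong interaction between A and B.
example_flows = [
    [10, 8, 1],
    [8, 10, 1],
    [1, 1, 10],
]
print(stepwise_grouping(example_flows, ["A", "B", "C"]))
```

Because the expected values are recomputed from the merged row and column totals after every fusion, the order in which zones enter the grouping depends on the aggregation rule as well as on the data transformation, which is the point made in the discussion above.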
Similarly, the lack of correspondence between the strongest pairwise interactions and the way in which zones are grouped together by the functional distance method provides supporting evidence for the criticisms that have been raised regarding the difficulties of interpreting the results obtained by these means in terms of the authors' own objectives.

4 Three empirical studies
4.1 Introduction
Three studies were carried out to evaluate the performance of the three functional regionalisation procedures in practice. The intermetropolitan borough migration material for Greater London, and the journey to work data for Merseyside that have been used in an earlier paper (Masser and Brown, 1975) were again utilised to enable comparisons to be made between the results obtained by the initial and revised intramax procedures. As both data sets concern interaction at the intraregional scale, and most applications of the functional distance and iterative proportional fitting procedures have occurred at the interregional scale, a third study was carried out at the interregional level for comparative purposes. This was based on information relating to migration between the forty 'Corop' regions that have been defined for statistical purposes in the Netherlands.
The main features of these three data sets are summarised in table 3. This shows that they all involve medium-sized spatial systems varying from twenty-nine to forty zones and that they represent essentially an intermediate level of spatial aggregation. This is reflected in the relatively high proportion of intrazonal trips in Greater London. It should be noted that the somewhat lower proportion of intrazonal trips that is recorded for the Netherlands refers only to movements across local authority boundaries, and that this proportion would be effectively doubled by the inclusion of intralocal authority migration in the statistics.

Table 3. Basic features of the data sets. [Sources: Greater London (Gibbons, 1972); Merseyside (Liverpool Corporation, 1969); and Netherlands, special tabulations of material provided by the Central Bureau of Statistics.]

Study area                                                   Greater London   Merseyside        Netherlands
Type of interaction                                          migration        journey to work   migration
Year                                                         1965-1966        1965              1973
Number of zones                                              33               29                40
Percentage of zero elements in the interaction matrix        7.6              22.4              0.2
Percentage of intrazonal trips                               56.6             27.2              39.9

General indicators, including diagonal values
Coefficient of variation of the row totals                   0.442            0.412             0.835
Coefficient of variation of the column totals                0.354            0.935             0.696
Correlation coefficient for row totals/column totals         0.932            -0.156            0.951
Correlation coefficient for interaction matrix/transpose     0.987            0.642             0.975

General indicators, excluding diagonal values
Coefficient of variation of the row totals                   0.597            0.446             0.840
Coefficient of variation of the column totals                0.358            1.222             0.622
Correlation coefficient for row totals/column totals         0.785            -0.389            0.845
Correlation coefficient for interaction matrix/transpose     0.774            0.152             0.770

The intermediate level of spatial aggregation is also reflected in the proportion of zero elements in the interaction matrix and none of the three data sets can be considered as a sparse matrix of the
kind that might give rise to convergence problems with respect to the iterative proportional fitting based procedure. A more detailed analysis of the properties of the matrix and its row and column totals shows that the commuting data differ in certain important ways from the migration data. In the first place, as the coefficient of variation for the column total which relates to the distribution of workplaces indicates, there are very large differences in size between the number of workplaces in each zone. In this case, one zone, the central business district of Liverpool, attracts one quarter of all work trips in the Merseyside area. Second, it should be noted that there is an inverse relationship between the distribution of workplaces and the distribution of residences in the area. This results in a lack of symmetry in the data set which can be seen from the low value that is recorded for the correlations between the matrix and its transpose particularly when the diagonal values are excluded from the analysis. The two migration data sets are broadly similar in character in that they demonstrate a high correlation between the distribution of origins and destinations which is reflected in a relatively high degree of symmetry in the matrix itself. In both cases the range of variation in the row totals relating to the origins is greater than that of the column totals which relate to the destinations. This reflects the extent to which migration flows at both the intraregional and interregional levels are leading to an overall reduction in the range of variation of population sizes between zones. The values for the coefficients of variation show that the range of values both for the row totals and for the column totals is greater in the Dutch data than in the Greater London data. This reflects the essential differences in size between large urban agglomerations such as Amsterdam and Rotterdam and the rural regions of the Netherlands. 
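The general indicators reported in table 3 can be computed from any interaction matrix along the following lines. This is a minimal sketch rather than the authors' own calculation. It assumes that the coefficient of variation is the population standard deviation divided by the mean and that the correlations are ordinary product-moment coefficients, neither of which is stated explicitly in the text. The function names and the small matrix are hypothetical.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Product-moment correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


def general_indicators(flows, include_diagonal=True):
    """Size variation and symmetry indicators for a square interaction matrix."""
    n = len(flows)
    cells = [(i, j) for i in range(n) for j in range(n)
             if include_diagonal or i != j]
    row_totals = [0] * n
    col_totals = [0] * n
    for i, j in cells:
        row_totals[i] += flows[i][j]
        col_totals[j] += flows[i][j]
    return {
        "cv_row_totals": pstdev(row_totals) / mean(row_totals),
        "cv_column_totals": pstdev(col_totals) / mean(col_totals),
        "r_row_column_totals": pearson(row_totals, col_totals),
        # Each cell is paired with its mirror image across the diagonal.
        "r_matrix_transpose": pearson([flows[i][j] for i, j in cells],
                                      [flows[j][i] for i, j in cells]),
    }


def percentage_intrazonal(flows):
    """Share of all interaction that stays within the zone of origin."""
    total = sum(map(sum, flows))
    return 100 * sum(flows[i][i] for i in range(len(flows))) / total


# Hypothetical commuting-style matrix in which zone 0 attracts most trips.
example_flows = [
    [40, 5, 3],
    [30, 10, 4],
    [25, 6, 12],
]
print(general_indicators(example_flows, include_diagonal=True))
print(general_indicators(example_flows, include_diagonal=False))
print(percentage_intrazonal(example_flows))
```

A matrix/transpose correlation close to 1 indicates a nearly symmetric data set, as in the two migration studies, whereas a low value indicates the kind of asymmetry noted for the Merseyside commuting data.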
By comparison the values for Greater London show a remarkable similarity in size of the row and column totals with one conspicuous exception, that is the City of London.

4.2 Greater London
The main findings of the Greater London study are summarised in figure 2 in the form of the dendrograms that were obtained by means of the three functional regionalisation procedures. The results of the transpose case for the functional distance approach are also included for comparative purposes. From this it can be seen that there is a fundamental difference between the results obtained by the iterative proportional fitting based procedure and those produced by the other procedures. Several instances occur where more than two zones combine in a single step in the iterative proportional fitting based procedure and there are also many cases where single zones do not fuse with other zones until the later stages of the grouping process. Seven out of the last nine steps in the grouping process involve single zones. As expected there is some tendency towards chaining which is not evident in the other dendrograms. According to Slater, late entries of single zones into the grouping process should be interpretable in terms of zones that look towards the region as a whole rather than zones which possess a strongly local character. However, although this explanation seems reasonable for cases such as the City of London, Camden, and even possibly Islington, it is hard to see Hackney and Tower Hamlets in these terms. Because of this, these findings raise further questions regarding the interpretation of the results obtained by this method.
In contrast, the results obtained by means of the intramax and the functional distance procedures have well-defined tree structures which are characteristic of most applications of Ward's stepwise grouping procedure.
Figure 2. Dendrograms obtained from the Greater London migration data: iterative proportional fitting based procedure, intramax procedure, functional distance, and functional distance transpose case.

In the case of intramax there are relatively few late entries of single zones and none occur after the twenty-fourth step of the grouping process. Despite their similarity in appearance, there are
considerable differences between the results obtained for the functional distance approach according to whether the initial matrix or its transpose is taken for grouping purposes. For example, the top three zones that are contained in the dendrogram for the transpose form members of substantially different groups in the initial case.
Figure 3 summarises the main features of all four sets of results in spatial terms for the groupings that occur at the twenty-seventh stage in the aggregation process. This shows that the application of the iterative proportional fitting procedure gives rise to a mixture of very large (local?) regions and very small (national?) regions in the inner city whereas the intramax procedure produces regions which are broadly similar in size and compact in shape with the exception of the central core with its elongation to the City of London. Figure 3 also indicates that the differences between these results and those obtained with the initial intramax procedure are very small at this level of aggregation. In this case, Hackney is allocated to the north eastern group rather than to the northern group whereas Southwark is allocated to the eastern rather than the southern group of zones (see Masser and Brown, 1975, page 514).

Figure 3. Groupings of basic data units (bdu) at the twenty-seventh stage of the aggregation process in Greater London: iterative proportional fitting based procedure; intramax procedure [shadings indicate bdu allocated to different groups in initial (1975) intramax procedure]; functional distance (shadings indicate earlier fusions of noncontiguous bdu); functional distance transpose case (shadings indicate earlier fusions of noncontiguous bdu).

The functional distance approach produces less compact regions than those obtained by the intramax procedure. A more disturbing question which emerges
from these results is the extent to which noncontiguous zones are fused together by this procedure. This occurs in four cases in the basic functional distance procedure and in three entirely different cases in the transpose case. These findings are particularly hard to interpret because the noncontiguous groupings take place to a very large extent in the earlier rather than the later stages of the aggregation process, which implies that they involve zones with very strong similarities rather than possible problem zones which have no obvious connection to any other zones. These findings raise new questions with respect to the interpretation of the results that may be obtained by this version of the functional distance approach in terms of zones which have more interaction with each other than with outside areas in that it could be taken as axiomatic that nearby (and therefore contiguous) zones should have more interaction with each other than noncontiguous zones. It should also be noted that the grouping of noncontiguous zones has an important effect on the structure of the grouping itself in that it influences and restricts possible subsequent fusions. This will accentuate the extent to which stepwise procedures such as that used in this case produce suboptimal groupings for any given level of aggregation.

4.3 Merseyside
The essential features of the dendrograms that are shown in figure 4 are broadly similar to those shown in figure 2 despite the different character of the data set that is involved in this case. The question of late entries together with the tendency towards chaining can be clearly seen in the results for the iterative proportional fitting based procedure. Similar problems also arise with respect to the interpretation of zones which enter the grouping process at a late stage.
The inclusion of the central business district in this category might be expected as this is clearly the most nonlocal zone in the Merseyside area, but the interpretation of other late entries is much less clear. Similarly the results obtained for the intramax and functional distance procedures all show clearly the marked tree structures that characterise this type of aggregation procedure. However, as in the case of Greater London, there are potential differences between the results obtained from the transpose of the initial matrix for the functional distance approach as against the initial matrix itself, but this is complicated by the fact that the data are concerned with circular rather than sequential movement and plausible arguments can be put forward to justify a decision to select either as the starting point for an analysis. In the case of the initial data set the analyst would be concerned with the extent to which the distribution of workplaces is the product of journeys from residential areas. In the transpose case the analyst would be concerned with the extent to which the distribution of residential areas is influenced by the location of workplaces. The spatial groupings that are obtained at the twenty-fourth stage in the aggregation process by these procedures are shown in figure 5. Like the dendrograms, the main features of these maps are similar to the maps that were obtained from the Greater London data. The iterative proportional fitting procedure produces three very large groups and two small groups in the inner area. The intramax procedure produces groups which are broadly similar in size, but with a more marked sectoral structure in the case of North Merseyside than was found in Greater London. When these results are compared with those obtained by the initial intramax procedure (Masser and Brown, 1975, page 516), some interesting differences emerge, although the overall pattern of spatial grouping is much the same in both cases. 
Figure 4. Dendrograms obtained from the Merseyside commuting data: iterative proportional fitting based procedure, intramax procedure, functional distance, and functional distance transpose case.

The central business district merges with the north eastern group in the results of the revised formulation of this procedure rather than the southern group, and Huyton (zone 15) does not remain as an isolated zone in this case. As in Greater London, the results of both versions of the functional distance procedure give rise to noncontiguous groupings,
and different noncontiguous groupings occur according to whether the initial matrix or its transpose is used as the starting point for the analysis.

Figure 5. Groupings of basic data units at the twenty-fourth stage of the aggregation process in Merseyside: iterative proportional fitting based procedure; intramax procedure [shadings indicate bdu allocated to different groups in initial (1975) intramax procedure]; functional distance (shadings indicate earlier fusions of noncontiguous bdu); functional distance transpose case (shadings indicate earlier fusions of noncontiguous bdu).
4.4 The Netherlands
The main features of the two previous studies are confirmed by the findings of the interregional study that was carried out on the Netherlands migration data. This can be seen in figure 6 which shows the spatial grouping of zones that had been obtained by the thirty-first stage in the grouping process. As in the other two cases the results of the iterative proportional fitting based procedure show a number of large zones, mainly on the periphery of the Netherlands together with a number of further zones in the inner urban core, and Amsterdam (zone 23) remains a separate zone at this stage in the grouping process. The results obtained by the intramax procedure show essentially a pattern of compact regions which takes into account differences in population density within the country, and both variants of the functional distance approach produced groupings of noncontiguous zones. In this case, however, there is a tendency for the same combinations of noncontiguous zones to be repeated in both
versions of this procedure, but there are nevertheless very substantial differences between the two sets of results obtained by these means.

Figure 6. Groupings of basic data units at the thirty-first stage of the aggregation process in the Netherlands: iterative proportional fitting based procedure; intramax procedure; functional distance (shadings indicate earlier fusions of noncontiguous bdu); functional distance transpose case (shadings indicate earlier fusions of noncontiguous bdu).
4.5 Discussion
In all three studies, the results obtained by means of the intramax procedure appear to be more easily interpretable in terms of functional regions than those of either of the other methods. The combination of single zones and very large groups that are produced by the iterative proportional fitting based procedure in all these cases highlights the essential difference between this strategy and the other two methods. The empirical studies show that the interpretation of the single zones in terms of nationwide (or regionwide) as against local characteristics is not always straightforward and the large groups lack the compactness that might be associated with functional regions.
Although the dendrograms obtained by means of the functional distance approach have a more regular tree structure and give rise to regions which are more comparable in size, the interpretation of these results is inhibited by the existence of noncontiguous groupings which, in some cases, is further complicated by irregularities of shape. In some cases there is the additional complication of the need to choose between the initial matrix and its transpose as the starting point for investigation. This is ruled out on a priori grounds in the case of Greater London and the Netherlands because of the sequential nature of migrational flows, but presents a major problem with respect to the interpretation of the work trip data that were used in the Merseyside case. It should also be noted that the largest differences between the results occur in this case where this ambiguity of interpretation is the greatest.
In comparison, the results obtained by means of the intramax procedure present few difficulties with respect to interpretation, but it must be borne in mind that the intramax procedure has been criticised on the grounds that it favours groupings between small zones.
If this is the case, there should be a positive correlation between the order of grouping and the size of the row and/or the column totals in these studies. To test this hypothesis Spearman's rank order correlation coefficient was calculated not only for the results obtained by the intramax procedure but also for those obtained by means of the other methods in the empirical studies and for the initial version of the intramax procedure that was presented in an earlier paper (Masser and Brown, 1975). These results are summarised in table 4 which shows that in every case the intramax procedure, as expected, gives rise to a positive correlation between zone size and order in the grouping process. However, table 4 also shows that the results obtained by use of the iterative proportional fitting based procedure are similar to those of the intramax procedure and that those of the standard version of the functional distance approach also show a positive correlation between the sequence of grouping and the size of the row and column totals. Although the values of the correlation coefficient are generally lower than those obtained for the other two procedures, there are important exceptions as in the crucial case of the column total in Merseyside where the correlation obtained by means of the functional distance approach is greater than that obtained by either of the other methods. The interpretation of the results obtained for the transpose case presents more serious problems. Although the values of the correlation coefficient are fairly strongly positive with respect to the critical test regarding the Merseyside commuting data where a plausible case can be made for basing an analysis on the transpose in preference to the initial data set, the values obtained in the other two studies vary from (barely) negative in the Netherlands case to markedly negative in the Greater London case.
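The test described above can be reproduced for any set of zones with a few lines of code. The sketch below computes Spearman's rank order correlation coefficient between zone size and the step at which each zone enters the grouping process. It assumes that tied values (for example, two zones that fuse with each other at the same step) receive the mean of their ranks, which may differ from the treatment used in the original calculations. The function names and the data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


def spearman(x, y):
    """Spearman's coefficient: the product-moment correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical zone sizes (row totals) and the step at which each zone first fuses.
zone_size = [120, 45, 300, 80, 45, 210]
entry_step = [4, 1, 5, 2, 1, 3]
print(round(spearman(zone_size, entry_step), 3))
```

A positive coefficient means that small zones tend to enter the grouping process early, which is the pattern expected if a procedure favours groupings between small zones.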
A comparison of the results obtained by the revised version of the intramax procedure and the initial version shows that there is a fundamental difference between them. The results obtained by the first version of the intramax procedure show a negative correlation with zone size. This confirms Hirst's criticism that the initial formulation of the objective function did not fully take account of differences in row
and column totals. Consequently the results tend to be biased in favour of large zones in this case. The results of the revised intramax procedure also support Hirst's expectations, but table 4 demonstrates that the problem is by no means resolved in the two alternative approaches to functional regionalisation in that similar correlations can be seen in the case of the iterative proportional fitting based procedure and, to a lesser extent, in respect of the functional distance approach when used in conjunction with the initial data set. It is also apparent that the problem of bias cannot be resolved by reverting to the initial formulation of the intramax procedure. This only replaces a positive bias by a negative bias. A possible explanation for these findings can be found in the influence that the initial configuration of spatial units has on the sequence of grouping. This is demonstrated most clearly in the iterative proportional fitting based procedure where the effect of differences in the size of spatial units is also accentuated by the exclusion of the values of the diagonal elements of the matrix following Slater's (1976b, page 126) own recommendations. Given a zero diagonal for spatial units of considerably different size both in population and in areal terms, it might be expected that the strongest links will be formed between smaller units in that local links within larger units will be mainly classified as intrazonal interaction and explicitly excluded from the analysis. Similar problems might arise in the case of peripheral regions where the number of opportunities for interaction is substantially restricted as a result of the location of the spatial units themselves. 
Table 4. Correlations between zone size and point of entry into the grouping process (a). Spearman rank order correlation coefficients.

                                                  Greater London      Merseyside     Netherlands
Row totals
  Iterative proportional fitting based procedure           0.532           0.210           0.535
  Intramax method                                          0.622           0.371           0.640
  Functional distance                                      0.230           0.006           0.178
  Functional distance, transpose case                     -0.273           0.247          -0.051
  (Initial intramax method)                               -0.455          -0.144          -0.388
Column totals
  Iterative proportional fitting based procedure           0.314           0.245           0.580
  Intramax method                                          0.646           0.145           0.659
  Functional distance                                      0.203           0.354           0.108
  Functional distance, transpose case                     -0.516           0.310          -0.074
  (Initial intramax method)                               -0.409          -0.285          -0.386
Row and column totals combined
  Iterative proportional fitting based procedure           0.503           0.423           0.542
  Intramax method                                          0.637           0.422           0.634
  Functional distance                                      0.258           0.332           0.133
  Functional distance, transpose case                     -0.438           0.523          -0.075
  (Initial intramax method)                               -0.435          -0.265          -0.384

(a) The intramax results include the diagonal values, the others exclude the diagonal values.

Some support for this interpretation is forthcoming from a more recent paper by Slater (1979) in which he demonstrates that there are similarities between the dendrograms that are obtained for the USA from interstate migration data and those that can be derived from information regarding spatial contiguity and the number of units that adjoin each spatial entity. With this in mind it is worth considering what effect the initial configuration of spatial units might have on the intramax procedure, given that the diagonal
values are explicitly included in this case. From the interpretation that has been developed above, it might be expected that stronger connections would appear between pairs of smaller zones containing a relatively low proportion of intrazonal interaction than between pairs of larger zones containing a relatively high proportion of intrazonal interaction and that, all other things being equal, the former would tend to fuse together before the latter. For this reason it might be argued that the bias noted by Hirst, far from being a disadvantage, is in fact a positive advantage in that it is a reflection of the inherent characteristics of the structure of spatial interaction in the matrix. Obviously more work is needed before any firm conclusions can be drawn on these issues, but the findings of the empirical studies suggest that these issues cannot be considered solely in statistical terms and that there may be a behavioural explanation for these phenomena.

5 Conclusions

This paper draws attention to some of the problems which are associated with the development of functional regionalisation procedures for the analysis of spatial interaction data. It has identified a number of ambiguities in the existing literature on the subject which point to the need for a more rigorous examination of current methodology. The results of the worked example raise important questions as to the role that is played by aggregation procedures as against data transformation procedures in this process. The findings of the three empirical studies suggest that the results obtained by means of the intramax procedure appear to be more readily interpretable in terms of functional regions than those obtained by either of the other two methods.
They also suggest that more consideration needs to be given to the extent that the initial configuration of spatial units exerts an influence upon the results that are obtained in all three procedures before definite conclusions can be reached with respect to their statistical interpretation. Each of the three methods has its own particular advantages and limitations from the point of view of the functional regionalisation of spatial interaction data and it is, of course, up to the user to choose which method best suits his particular purposes. When considering the choice of regionalisation procedure certain additional qualifications must also be borne in mind. It must be emphasised that, although the iterative proportional fitting based and intramax procedures incorporate an explicit aggregation method, the functional distance approach leaves the choice of aggregation method open. Two basic strategies have been put forward for this purpose by its authors. The first of these includes the recommendation that a stepwise aggregation procedure of the kind used in this study is employed in practical applications. Other types of aggregation procedure could be used for this purpose, but it seems likely that they would bring with them similar problems to those discussed above with respect to the interpretability of the results. For example, the use of single linkage cluster analysis in this case would probably resolve the problems of noncontiguity at the expense of chaining in the grouping process and the results obtained would be subject to the same criticisms as those levelled at the iterative proportional fitting based procedure on the grounds that they do not necessarily give rise to groupings of zones which have more interaction with each other than with other zones. The second strategy involves a completely different set of rules for grouping purposes. 
These permit, for example, the creation of overlapping regions and the possibility that zones remain unallocated at the end of the regionalisation process. As stated at the outset, this paper is concerned with the evaluation of procedures which involve a unique allocation of zones to groups and no attempt has been made to consider the results that might be obtained by the nodal regionalisation option that exists within the functional distance approach.
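The criterion referred to above, that the zones in a group should have more interaction with each other than with other zones, can be checked for any unique allocation of zones to groups. The following is a minimal sketch under one possible reading of that criterion: the flows within a group are compared with the flows crossing its boundary in either direction. The function name, the flow matrix, and the grouping are hypothetical.

```python
def group_interaction(flows, groups):
    """For each group return (internal, external) flow totals.

    internal: flows whose origin and destination are both in the group
              (the diagonal is included unless it has been zeroed beforehand).
    external: flows with exactly one end in the group, in either direction.
    """
    n = len(flows)
    result = {}
    for name, members in groups.items():
        inside = set(members)
        internal = external = 0
        for i in range(n):
            for j in range(n):
                i_in, j_in = i in inside, j in inside
                if i_in and j_in:
                    internal += flows[i][j]
                elif i_in or j_in:
                    external += flows[i][j]
        result[name] = (internal, external)
    return result

# Hypothetical 4-zone flow matrix and a two-group allocation of its zones.
flows = [
    [50, 10, 5, 2],
    [12, 30, 4, 1],
    [6, 3, 20, 2],
    [1, 2, 3, 8],
]
groups = {"A": [0, 1], "B": [2, 3]}

with_diagonal = group_interaction(flows, groups)

# The same check with intrazonal flows excluded (zero diagonal).
no_diag = [[0 if i == j else v for j, v in enumerate(row)] for i, row in enumerate(flows)]
without_diagonal = group_interaction(no_diag, groups)

for name in groups:
    print(name, with_diagonal[name], without_diagonal[name])
```

In this hypothetical case both groups satisfy the criterion when the diagonal values are included (102 against 24, and 33 against 24) and both fail it when they are excluded (22 against 24, and 5 against 24), so the outcome of such a check depends on the treatment of intrazonal interaction.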
It must also be emphasised that the empirical studies in this paper have been concerned with only two types of interaction data in the context of medium-sized spatial systems. Further examination is needed of a wide range of data types in relation to a greater variety of spatial systems before more general conclusions can be drawn in this context. Such studies should give special attention to the problems that must be resolved in the functional regionalisation of large spatial systems where the interaction matrices involved contain a large number of zero elements. At the same time further consideration needs to be given to the extent that sample size affects the quality of results obtained by these means. Studies of this kind are essential if the continued development of purposeful tools for spatial interaction analysis is to take place.

Acknowledgements. The authors wish to express their gratitude to Dr G A van der Knaap (Economic Geographical Institute, Erasmus University, Rotterdam, Netherlands) and Dr P B Slater (Regional Research Institute, West Virginia University) for carrying out the computational work that was required in connection with the application of the functional distance and the iterative proportional fitting procedures respectively. The authors also wish to emphasise that they alone are responsible for the interpretation that is given to the results obtained by these procedures in this paper.
References

Bacharach M, 1970 Biproportional Matrices and Input Output Change (Cambridge University Press, Cambridge)
Brown L A, Horton F E, 1970 "Functional distance: an operational approach" Geographical Analysis 2 76-83
Brown L A, Holmes J H, 1971 "The delimitation of functional regions, nodal regions and hierarchies by functional distance approaches" Journal of Regional Science 11 57-72
Brown L A, Odland J, Golledge R G, 1970 "Migration, functional distance and the urban hierarchy" Economic Geography 46 472-485
Coombes M G, Dixon J S, Goddard J B, Openshaw S, Taylor P J, 1979 "Daily urban systems in Britain: from theory to practice" Environment and Planning A 11 556-574
Drewett R, Goddard J B, Spence N, 1976 British Cities: Urban Population and Employment Trends 1951-1971 RR-10, Department of the Environment, London
Everitt B, 1974 Cluster Analysis (Heinemann, London)
Fienberg S E, 1970 "An iterative procedure for estimation in contingency tables" Annals of Mathematical Statistics 41 907-917
Gibbons C, 1972 "Interborough migration" RM 340, Planning and Transportation Department, Greater London Council, London
Hay D, Hall P, 1977 Urban Regionalisation of Great Britain 1971 European Urban Systems WP-1.1, Department of Geography, University of Reading, Reading, Berks
Hirst M A, 1977 "Hierarchical aggregation procedures for interaction data: a comment" Environment and Planning A 9 99-103
Holmes J H, 1977 "Hierarchical regionalisation by iterative proportional fitting procedures: a comment" IEEE Transactions on Systems, Man and Cybernetics 7 474-477
Holmes J H, 1978 "Transformations of flow matrices to eliminate the effects of differing sizes of origin-destination units: a further comment" IEEE Transactions on Systems, Man and Cybernetics 8 325-332
Leusmann C S, Slater P B, 1977 "A functional regionalisation program based on the standardisation and hierarchical clustering of transaction flow tables" Computer Applications 4 769-777
Liverpool Corporation, 1969 Merseyside Area Land Use Transportation Study, Final Report Liverpool Corporation, Liverpool
Masser I, Brown P J B, 1975 "Hierarchical aggregation procedures for interaction data" Environment and Planning A 7 509-523
Masser I, Brown P J B (Eds), 1978 Spatial Representation and Spatial Interaction (Nijhoff, Leiden)
Masser I, Scheurwater J S, 1978 "The specification of multi level systems for spatial analysis" in Spatial Representation and Spatial Interaction Eds I Masser, P J B Brown (Nijhoff, Leiden) pp 151-172
Mohr M, 1975 A Consistency Problem of Multi Regional Input Output and Existence Conditions of Constrained Biproportional Matrices PhD thesis, Massachusetts Institute of Technology, Cambridge, Mass
Slater P B, 1975 "A hierarchical regionalisation of Russian administrative units using 1965-1969 migration data" Soviet Geography 16 453-465
Slater P B, 1976a "Hierarchical internal migration regions of France" IEEE Transactions on Systems, Man and Cybernetics 6 321-324
Slater P B, 1976b "A hierarchical regionalisation of Japanese prefectures using 1972 inter prefectural migration flows" Regional Studies 10 123-132
Slater P B, 1977 "Reply to Holmes" IEEE Transactions on Systems, Man and Cybernetics 7 477
Slater P B, 1978 "Reply to Holmes" IEEE Transactions on Systems, Man and Cybernetics 8 332
Slater P B, 1979 "State boundary length as a determinant of migration regions" unpublished manuscript, Regional Research Institute, Morgantown, W Va, USA
Smart M W, 1974 "Labour market areas: use and definition" Progress in Planning 2 239-353
Stephenson L K, 1974 "On functional regions and indirect flows" Geographical Analysis 6 383-385
Tyree A, 1973 "Mobility ratios and association in contingency tables" Population Studies 27 577-588
Ward J H, 1963 "Hierarchical grouping to optimise an objective function" Journal of the American Statistical Association 58 236-244
© 1980 a Pion publication printed in Great Britain