Regional Science and Urban Economics 10 (1980) 503-537. © North-Holland
REGIONAL TAXONOMY
A Comparison of Some Hierarchic and Non-Hierarchic Strategies*

Manfred M. FISCHER

Final version received March 1979
The effects of subjective decisions, particularly in the case of regional taxonomic methodology, are often underestimated. Therefore the aptitude of alternative methods needs more careful evaluation. Comparisons of competing methods could provide important guidelines. A general evaluative approach based upon the discrepancy between the original resemblance matrix and the derived cophenetic matrix is outlined for the hierarchic regional taxonomic case. Five main aggregative linkage strategies are appraised by means of quantitative measures. Furthermore, two iterative non-hierarchic methods are compared for handling non-hierarchic regional taxonomic problems, where particular attention is paid to the influence of different initial partitions upon the (sub)optimal result.
1. Introduction: Different regional taxonomic problems

One of the most central terms in geography and regional science is the term 'region'. From a system-theoretic point of view a (general) region can be considered as a (relational) system, in which a set of spatial (in the geographic space topologically connected) basic units is characterized by a set of $k$-ary predicates with $k \geq 1$. Then a homogeneous region consists of spatially contiguous basic units which show a high degree of correspondence concerning the chosen 1-ary predicates (= attributes), and a functional region consists of spatially contiguous basic units which indicate a high degree of interdependence concerning the $k$-ary predicates with $k > 1$ (= $k$-ary relations). Furthermore, the subset of $k$-ary predicates with $k \geq 2$ is empty in the case of a homogeneous region, whereas in the case of a functional region the subset of attributes is empty. In practice, functional regions defined by only one 2-ary relation, such as the interaction of goods, services, capital or labour, are often used.

*This paper was presented at the Regional Conference of the International Geographical Union, Working Group 'Systems Analysis and Mathematical Models', Lagos, July 25-31, 1978.

Table 1
A classification of disjoint regional taxonomic problems.

                          Hierarchic regional             Non-hierarchic regional
                          taxonomic problems              taxonomic problems

Homogeneous regional      Hierarchic homogeneous          Non-hierarchic homogeneous
taxonomic problems        regional typification           regional typification

                          Hierarchic homogeneous          Non-hierarchic homogeneous
                          regionalization                 regionalization

Functional regional       Hierarchic functional           Non-hierarchic functional
taxonomic problems        regional typification           regional typification

                          Hierarchic functional           Non-hierarchic functional
                          regionalization                 regionalization

It is necessary to distinguish homogeneous and functional regions from homogeneous and functional regional types, where the
spatial basic units do not occupy a contiguous area in geographic space. The terms region and regional type can be subsumed under the category regional taxon. Therefore the problems of determining and delimiting functional or homogeneous regions and regional types can be denoted as regional taxonomic ones. The main distinguishing characteristic between regions and regional types is the spatial contiguity of the spatial basic units.

The above-mentioned distinction into four different regional taxonomic problems, however, will not always be sufficient. A further disaggregation into hierarchic and non-hierarchic problems seems to be more adequate. The resulting classification of these eight different regional taxonomic problems is shown in table 1. Although it may be desired in some specific situations to construct overlapping (= non-disjoint) regional taxa, we have chosen to discuss only regional taxonomic problems for which the result is a set of non-overlapping regional taxa in the non-hierarchic case, or a finite family of distinct partitions of the given spatial basic units in the hierarchic case, where each of these partitions contains only disjoint regional taxa.

The relevance of considering regional taxonomic problems may be clarified by the following remarks:

(i) Efficient regional policy and planning require, as a first step, the availability of relevant regional data and therefore purpose-oriented, adequate regional taxa in order to obtain insight into certain aspects of the regional structure and to use suitable instruments. Regional typifications and regionalizations can help to reduce the number of spatial basic units with respect to a certain purpose. In this way a more adequate and more efficient ordering and storing of regional data can be achieved. It should be noted that real empirical systems cannot be explained by means of regional typifications or regionalizations. By giving insight into correspondences and differences concerning regional phenomena they can, however, stimulate us to explore and to develop meaningful spatial hypotheses.

(ii) If we accept that a geographic theory (in the sense of the analytic philosophy of science) is a system of informative statements whose explanandum includes certain spatially indicated terms, and where a spatial indication of the explanandum leads to a spatial indication of the statements of the antecedent conditions, the boundary conditions and/or the nomological hypotheses of the system, then regional taxa could make an important contribution to such a spatial indication and therefore, more generally formulated, to geographic theory building and to the confirmation or rejection of existing geographic theories or spatial hypotheses.

(iii) Spatial mathematical model building can be considered as a 3-stage process [cf. Cliff and Ord (1975, discussion pp. 342-343)]. The first stage
involves the specification of the model form using mathematical and statistical techniques. The second one is primarily concerned with the design of a zoning system (i.e., with solving non-hierarchic regional taxonomic problems) and the third stage with the integration of some expression of macrospatial structure (e.g., the definition of an interaction function) into the model. Openshaw (1977) demonstrated by a series of empirical studies concerning spatial interaction models that different regionalizations of a study area affect both the interpretation and the acceptability of this model type. Therefore the process of solving non-hierarchic regionalization problems plays an important role in the second stage of the above-mentioned model building process.

There are a number of numeric methods for solving the regional taxonomic problems. In order to choose between them one needs to know their relative merits and the kinds of taxonomic information they produce. One main purpose of this contribution is to discuss and to evaluate some competing taxonomic methods for the hierarchic homogeneous as well as for the non-hierarchic homogeneous case.
2. A framework for solving homogeneous regional taxonomic problems

In order to specify and to illustrate the homogeneous regional taxonomic problems we start from the following situation: Let the set $O = \{O_1, \ldots, O_n\}$ ($= \{1, \ldots, n\}$) denote the $n$ spatial basic units characterized by a set $A = \{A_1, \ldots, A_p\}$ of $p$ (in our study metric) attributes. The measurements on the attributes of the $n$ spatial basic units can be arranged in an $(n, p)$-data matrix $(\tilde{x}_{ij})$, where each entry $\tilde{x}_{ij}$ in the data matrix is the score of $O_i$ for $A_j$. The $(p, 1)$-matrix $\tilde{x}_i := \tilde{x}_{i.} = (\tilde{x}_{i1}, \ldots, \tilde{x}_{ip})'$ denotes the vector representing $O_i$. The $n$ spatial basic units represented by the $n$ vectors $\tilde{x}_1, \ldots, \tilde{x}_n$ can be considered as $n$ spatial points in the $p$-dimensional attribute space.

In each case of the homogeneous regional taxonomic problems the $n$ spatial basic units are to be grouped into smaller subsets, so-called regional taxa, in such a manner that at least one of the following two principles is fulfilled:

- The principle of internal homogeneity: The individual regional taxa should be as homogeneous in the attribute space as possible with respect to the specified attributes.
- The principle of external separation: Different regional taxa should be as far apart in the attribute space as possible with respect to the specified attributes.

The process for solving the homogeneous regional taxonomic problems by means of numeric methods involves various stages [compare Fischer (1978a, b)]:
(i) the definition of problem and target, which influence the size and the number of the spatial basic units in the study area,
(ii) the selection of attributes characterizing the spatial units with respect to (i) and the standardization or normalization of the attribute values,
(iii) the choice of a resemblance measure,
(iv) the choice of a taxonomic strategy,
(v) the diagnosis and evaluation of the result by means of statistical procedures (e.g., by discriminant analytic methods).

In this particular contribution stage (iv) is considered. Attention is also paid to the influence of different resemblance measures upon the results in the hierarchic regional taxonomic case [cf. stage (iii)].

There are a number of different taxonomic methods. Hierarchic methods 'optimize' in some way a hierarchy of regional taxa by successively aggregating or disaggregating the regional taxa, but in general do not produce an 'optimal' non-hierarchic structure of the set $O$. Misclassifications resulting from earlier steps in the hierarchic process also influence the subsequent groupings. Iterative non-hierarchic methods, however, possess the reallocation property, which can be considered a valuable advantage for solving non-hierarchic regional taxonomic problems. Therefore hierarchic taxonomic methods, especially aggregative ones,¹ can be used only for solving hierarchic regional taxonomic problems, while iterative non-hierarchic methods are particularly suited to solving non-hierarchic regional taxonomic problems.

A solution of the non-hierarchic (disjoint) regional taxonomic problems leads to a partition $P := \{T_1, \ldots, T_m\}$ of the given set $O$ into pairwise non-overlapping and non-empty regional taxa $T_1, \ldots, T_m$, where $m$ is smaller than $n$ and any spatial basic unit is assigned to one taxon only. There is a large number of different taxonomic methods for solving this kind of problem. In section 4 two iterative non-hierarchic methods will be compared.
¹ Whereas polythetic disaggregative hierarchic methods require an intolerably large amount of computation, monothetic disaggregative hierarchic methods produce rather heterogeneous regional taxa and often lead to misclassifications.

A solution of the hierarchic regional taxonomic problems yields a hierarchy of regional taxa or, more precisely formulated, a family $(P^{(0)}, \ldots, P^{(r)}, \ldots, P^{(t)})$ of distinct partitions of $O$ into pairwise non-overlapping regional taxa with the following properties:

(i) $P^{(0)}$ is the trivial partition containing a regional taxon for each spatial basic unit.
(ii) $P^{(t)}$ is the trivial partition containing only one regional taxon with all spatial basic units, and $t \leq n-1$.
(iii) $P^{(r)}$ is an immediate refinement of $P^{(r+1)}$ for $0 \leq r \leq t-1$, that is, $P^{(r)}$ is formed from $P^{(r-1)}$ by aggregating one or more sets of regional taxa of $P^{(r-1)}$ into one or more new taxa. Therefore the number $m^{(r)}$ of taxa in $P^{(r)}$ must obey the constraint $m^{(r)} > m^{(r+1)}$ for $0 \leq r \leq t-1$, where $m^{(r+1)}$ denotes the number of taxa in $P^{(r+1)}$.
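Properties (i) to (iii) can be checked mechanically for any candidate family of partitions. The following is a minimal sketch, assuming that basic units are represented by integers and regional taxa by frozensets; the function names and the four-unit example are hypothetical and not taken from the study.

```python
from typing import FrozenSet, List, Set

Partition = List[FrozenSet[int]]

def is_partition(taxa: Partition, units: Set[int]) -> bool:
    """True if the taxa are non-empty, pairwise disjoint and cover all units."""
    if any(len(t) == 0 for t in taxa):
        return False
    covered = [u for t in taxa for u in t]
    return len(covered) == len(set(covered)) and set(covered) == units

def is_refinement(finer: Partition, coarser: Partition) -> bool:
    """True if every taxon of `finer` lies inside some taxon of `coarser`."""
    return all(any(t <= c for c in coarser) for t in finer)

def is_hierarchy(family: List[Partition], units: Set[int]) -> bool:
    """Check properties (i)-(iii) for a family P(0), ..., P(t)."""
    if not family or not all(is_partition(p, units) for p in family):
        return False
    starts_with_singletons = all(len(t) == 1 for t in family[0])    # (i)
    ends_with_one_taxon = len(family[-1]) == 1                      # (ii)
    nested = all(
        is_refinement(family[r], family[r + 1])
        and len(family[r]) > len(family[r + 1])                     # (iii)
        for r in range(len(family) - 1)
    )
    return starts_with_singletons and ends_with_one_taxon and nested

# Hypothetical example with four basic units.
units = {1, 2, 3, 4}
family = [
    [frozenset({1}), frozenset({2}), frozenset({3}), frozenset({4})],
    [frozenset({1, 2}), frozenset({3}), frozenset({4})],
    [frozenset({1, 2}), frozenset({3, 4})],
    [frozenset({1, 2, 3, 4})],
]
print(is_hierarchy(family, units))
```

The bound $t \leq n-1$ is not tested separately, because it follows from the strictly decreasing number of taxa.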
[Map: Case study Portugal 1970]
Fig. 1. Disaggregation of the study area Continental Portugal into $n = 18$ districts (= spatial basic units).
The effects of subjective decisions in hierarchic and non-hierarchic taxonomic strategies are often underestimated. Therefore the role of alternative methods needs more careful evaluation. Methodological and theoretical considerations could help to guide the choice of a method [e.g., Fischer (1977, 1978a)], but are not quite satisfactory because they give only little insight into the performance of the methods when applied to empirical data. In order to compare the different methods we have taken Continental Portugal with its spatial disaggregation into $n = 18$ districts as an actual example. The location of these spatial basic units is given in fig. 1. A general breakdown of the 22 attributes² chosen to characterize the socioeconomic structure of the Portuguese spatial basic units in 1970 is outlined in table 2.

² The selection of the 22 attributes for the analysis was highly influenced by the data availability as well as by the knowledge of the socioeconomic structure of the Portuguese districts, and was based on an examination of the product moment correlation matrix of 40 attributes by discarding those with very similar (logical) correlation patterns.

Table 2
A list of the attributes used to characterize the $n = 18$ Portuguese spatial basic units.ᵃ

Attributes characterizing the demographic structure in 1970
$A_1$     general fertility rate (= number of births per 1000 females in the age group 15-49)
$A_2$     infant mortality rate (= number of infant deaths per 1000 live births)
$A_3$     emigration rate (= number of emigrants per 1000 of the population)
$A_4$     percentage of the population aged 65 years and older

Attributes characterizing the employment structure in 1970
$A_5$     percentage of labor force
$A_6$     percentage of managerial skilled workers in service
$A_7$     percentage of population engaged in industry
$A_8$     percentage of population employed in agriculture, hunting, forestry and fishing

Attributes characterizing the housing structure in 1970
$A_9$     percentage of dwellings with bathroom and/or shower
$A_{10}$  percentage of dwellings constructed after 1945
$A_{11}$  percentage of dwellings with 6 and more rooms
$A_{12}$  percentage of dwellings with electricity

Attributes characterizing the household structure in 1970
$A_{13}$  percentage of single-person households
$A_{14}$  percentage of owner-occupied households
$A_{15}$  percentage of worker households

Attributes reflecting other socioeconomic aspects in 1970
$A_{16}$  private consumption of electric energy per household
$A_{17}$  number of hospital beds per 1000 of the population
$A_{18}$  number of pupils in secondary schools per 1000 of the population
$A_{19}$  GNP per labor force
$A_{20}$  number of illegitimate births per 1000 live births
$A_{21}$  population density
$A_{22}$  percentage of urban population

ᵃ Sources of the data: Instituto Nacional de Estatística (1º Recenseamento da Habitação 1970; Anuário Estatístico 1970, 1974, 1976; Estatística da Educação 1970; Estatísticas da Saúde 1970; Estatísticas da Construção e da Habitação 1970; Estatísticas da Energia 1970).

The choice of the attributes usually depends upon different factors, such as the purpose of the regionalization or regional typification (and therefore an implicit or explicit theory or hypothesis), the data availability and the discriminating power of the attributes. It is important to note that the choice of the attributes also reflects in some way the investigator's judgement of relevance for the purpose of the regionalization or regional typification.

Since certain attributes may possess a high degree of variation owing to measurement in different dimensions and would dominate the other attributes in the taxonomic process, all attribute values are reduced to the same scale. The particular transformation used in this study is a normalization procedure, where each vector $\tilde{x}_{.j}$ of the raw data matrix $(\tilde{x}_{ij})$ is divided by its norm $\|\tilde{x}_{.j}\|_2$. The normalized vector is denoted by $x_{.j}$ and is defined as

$$x_{.j} := \frac{\tilde{x}_{.j}}{\|\tilde{x}_{.j}\|_2}, \qquad \|\tilde{x}_{.j}\|_2 = \Bigl(\sum_{i=1}^{n} \tilde{x}_{ij}^2\Bigr)^{1/2}, \qquad j = 1, \ldots, p.$$
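The normalization can be illustrated with a short sketch. It assumes the raw data matrix is given as a list of rows (one row per basic unit, one column per attribute); the function name and the three-unit example values are hypothetical and are not the Portuguese data.

```python
import math
from typing import List

def normalize_columns(raw: List[List[float]]) -> List[List[float]]:
    """Divide every column of an (n, p) data matrix by its euclidean norm."""
    n, p = len(raw), len(raw[0])
    norms = [math.sqrt(sum(raw[i][j] ** 2 for i in range(n))) for j in range(p)]
    if any(norm == 0.0 for norm in norms):
        raise ValueError("an all-zero attribute column cannot be normalized")
    return [[raw[i][j] / norms[j] for j in range(p)] for i in range(n)]

# Hypothetical scores of three basic units on two attributes with very different scales.
raw = [[3.0, 100.0], [4.0, 200.0], [0.0, 200.0]]
x = normalize_columns(raw)
for row in x:
    print([round(v, 3) for v in row])
```

After the transformation every attribute column has unit euclidean length, so an attribute measured in large units no longer dominates the distance computation merely because of its scale.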
In section 3 we will discuss and compare some aggregative hierarchic linkage strategies for solving the hierarchic regional taxonomic problems, and in section 4 two iterative non-hierarchic methods for non-hierarchic regional taxonomic problems.
3. Hierarchic homogeneous regional taxonomic problems: An evaluative comparison of some aggregative linkage strategies

The starting point of the aggregative hierarchic strategies considered in this section is the estimation of taxonomic resemblances between any two spatial units.

3.1. Resemblance measures used in the study
Several resemblance measures have been proposed for metric attributes. In principle two different resemblance concepts, and therefore similarity and dissimilarity (= distance) measures, can be distinguished. Similarity measures produce larger values in the case of greater resemblance and smaller values in the case of less resemblance, while the opposite is true of distance measures.³ The most commonly used of all (metric) measures is the euclidean distance, which can be derived from the $l_2$-norm of a vector and is defined for any $(O_i, O_j) \in O \times O$ as

$$d_2(x_i, x_j) := \|x_i - x_j\|_2 = \Bigl(\sum_{k=1}^{p} (x_{ik} - x_{jk})^2\Bigr)^{1/2}.$$

It is a special case of the Minkowski metrics

$$d_r(x_i, x_j) := \Bigl(\sum_{k=1}^{p} |x_{ik} - x_{jk}|^r\Bigr)^{1/r}$$

with $r \geq 1$ and $r$ integer, rational or $\infty$. Among these distance measures $d_1$ is known as the Manhattan or City-Block metric. In taxonomic use the distance $d_r(x_i, x_j)$ is usually multiplied by a factor $p^{-1/r}$ in order to take into account that $p$ may vary, e.g., in the case of missing data. These modified distance measures $\bar{d}_r$ are called average Minkowski metrics and possess the same properties as the Minkowski metrics. They are invariant under any translation. $d_2$ (respectively $\bar{d}_2$) is furthermore invariant under any orthogonal linear transformation, but none of these measures is invariant under changes of scale or other non-singular transformations. The parameter $r$ has the function of internally weighting the differences $|x_{ik} - x_{jk}|$ for $k = 1, \ldots, p$ according to their respective magnitude. Therefore the $d_2$- (respectively $\bar{d}_2$-) metric, or more generally the $d_r$- (respectively $\bar{d}_r$-) metrics with $r > 1$, attach greater importance to larger differences than to smaller ones in the measurement of resemblance, whereas in the case of $d_1$ (respectively $\bar{d}_1$) smaller and larger differences are dealt with in the same way. In contrast to these overall distance measures, so-called partial resemblance measures emphasize some particular aspects of the resemblance structure but neglect others.
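The internal weighting role of $r$ can be seen in a small sketch of the average Minkowski metric $\bar{d}_r = p^{-1/r} d_r$. The function name and the example vectors are hypothetical, and the case $r = \infty$ is deliberately left out of this sketch.

```python
from typing import Sequence

def avg_minkowski(xi: Sequence[float], xj: Sequence[float], r: float) -> float:
    """Average Minkowski distance: (1/p * sum |x_ik - x_jk|^r)^(1/r), for finite r >= 1."""
    if len(xi) != len(xj) or not xi:
        raise ValueError("vectors must be non-empty and of equal length")
    if r < 1:
        raise ValueError("r must be at least 1")
    p = len(xi)
    return (sum(abs(a - b) ** r for a, b in zip(xi, xj)) / p) ** (1.0 / r)

# Two hypothetical basic units with one small and one large attribute difference.
xi, xj = [0.0, 0.0], [0.1, 0.7]
d1 = avg_minkowski(xi, xj, 1)   # average Manhattan distance
d2 = avg_minkowski(xi, xj, 2)   # average euclidean distance
print(round(d1, 3), round(d2, 3))
```

For these vectors the average euclidean distance exceeds the average Manhattan distance, because squaring gives the large difference of 0.7 more weight than the small difference of 0.1.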
As a representative of partial resemblance measures we have chosen the non-metric angular measure product moment correlation, which ignores additive and proportional size differences:⁴

$$r_{ij} := r(x_i, x_j) := \frac{\sum_{k=1}^{p} (x_{ik} - \bar{x}_{i.})(x_{jk} - \bar{x}_{j.})}{\Bigl(\sum_{k=1}^{p} (x_{ik} - \bar{x}_{i.})^2 \sum_{k=1}^{p} (x_{jk} - \bar{x}_{j.})^2\Bigr)^{1/2}} \qquad (4)$$

with $-1 \leq r_{ij} \leq 1$ and where $\bar{x}_{i.}$ (respectively $\bar{x}_{j.}$) is the mean of the $i$th (respectively $j$th) row of the data matrix $(x_{ij})$. This measure proves to be adequate only in the case when the resemblance between basic units can be expressed by a linear relation between corresponding components of $x_i$ and $x_j$ (i.e., $x_{ik} = c\,x_{jk} + c'$ for $k = 1, \ldots, p$ and with constants $c$, $c'$) and when therefore the resemblance is determined by the shape components and not by the size components of the resemblance structure. In general, any fixed non-linear relation between corresponding components of $x_i$ and $x_j$ will lead to an $r_{ij}$-value which is different from $+1$. Furthermore $r$ is neither invariant with respect to any translation nor invariant with respect to changes of scale and therefore depends strongly upon the chosen scale unit as well as upon the zero vector. The three resemblance measures used for solving hierarchic regional taxonomic problems are summarized in table 3.

⁴ The resemblance between basic units can be separated into size and shape components. The term size difference may be used to describe situations in which the attribute values of one basic unit $O_i$ differ from those of another unit $O_j$ by the multiplication of some constant factor $c$ (i.e., $x_{ik} = c\,x_{jk}$ for all $k$: proportional size difference) or by some constant amount $c$ (i.e., $x_{ik} = x_{jk} + c$ for all $k$: additive size difference). On the contrary, two spatial basic units which are at equal distances from the origin in the attribute space differ only in the shape [cf. Boyce (1969)].

Table 3
Resemblance measures used for solving the hierarchic regional taxonomic problems.

Name: Average Manhattan distance measure
Formula: $\bar{d}_1(x_i, x_j) := \frac{1}{p} \sum_{k=1}^{p} |x_{ik} - x_{jk}|$
Important properties: Overall distance measure; metric; invariant under any translation; larger and smaller differences $|x_{ik} - x_{jk}|$ are dealt with in the same manner.

Name: Average euclidean distance measure
Formula: $\bar{d}_2(x_i, x_j) := \Bigl(\frac{1}{p} \sum_{k=1}^{p} (x_{ik} - x_{jk})^2\Bigr)^{1/2}$
Important properties: Overall distance measure; metric; invariant under any translation; invariant under any orthogonal linear transformation; particular stress is laid upon larger differences $|x_{ik} - x_{jk}|$.

Name: Product moment correlation measure
Formula: $r(x_i, x_j) := \frac{\sum_{k=1}^{p} (x_{ik} - \bar{x}_{i.})(x_{jk} - \bar{x}_{j.})}{\bigl(\sum_{k=1}^{p} (x_{ik} - \bar{x}_{i.})^2 \sum_{k=1}^{p} (x_{jk} - \bar{x}_{j.})^2\bigr)^{1/2}}$
Important properties: Partial similarity measure; ignores proportional and additive differences in size; non-metric.

3.2. Linkage strategies used in the study

The hierarchic taxonomic strategies compared in this study are aggregative. They are based upon the following general principle:
(i) Starting position: Each of the $n$ spatial basic units forms a regional taxon, i.e., the finest partition $P^{(0)}$ is the starting point.
(ii) $r$th step in the aggregation process: Those two regional taxa $T_\mu, T_\nu \in P^{(r-1)}$ with $\mu \neq \nu$ which show a maximum of similarity (respectively a minimum of separation) among all pairs of regional taxa belonging to $P^{(r-1)}$ are grouped together to form a new regional taxon denoted by $(T_\mu, T_\nu)$. This step is repeated until the partition consisting of a single regional taxon with all the $n$ spatial basic units is achieved.⁵

Evidently, the taxonomic process depends essentially on the definition of a resemblance measure between taxa. Five different possibilities, which lead to the following taxonomic strategies, are considered here:⁶

- Single linkage strategy (SLS),
- Complete linkage strategy (CLS),
- Unweighted centroid linkage strategy (UZLS),
- Unweighted arithmetic average linkage strategy (UAALS),
- Weighted arithmetic average linkage strategy (WAALS).

⁵ If the maximum respectively minimum is not determined uniquely, one may arbitrarily choose one pair showing the highest respectively the smallest value.
⁶ These five linkage strategies combined with the above-mentioned three resemblance measures yield 15 different linkage methods. We have chosen to use the MINT package program, a minimal NT-SYS version, developed by Rohlf (Department of Ecology and Evolution, State University of New York at Stony Brook).

In the case of the single linkage strategy the resemblance between distinct regional taxa is defined as the (dis)similarity between their closest spatial basic units, one in each regional taxon. Consequently SLS is capable of recognizing non-ellipsoidal regional taxa in the attribute space, but is not able to delineate poorly separated regional taxa. The disadvantage of often leading to long serpentine-like regional taxa (i.e., the so-called chaining tendency) and generally to rather heterogeneous regional taxa has been criticized frequently.

The CLS is conceptually as simple as the SLS. The resemblance between two distinct regional taxa is now defined as the (dis)similarity between their most remote pair of spatial basic units. This criterion in general makes the CLS produce compact regional taxa which can be joined to others only with difficulty and at relatively low overall similarity values. As all basic units in one regional taxon are linked to every other basic unit of this taxon at some minimum similarity (respectively maximum distance) value, this strategy leads to more homogeneous regional taxa than SLS.

In contrast to SLS and CLS, the arithmetic average linkage strategies UAALS and WAALS compute the resemblance between two regional taxa $T_\mu$, $T_\nu$ as the arithmetic average of the (dis)similarities between any pair $(O_i, O_j)$, where $O_i$ belongs to $T_\mu$ and $O_j$ to $T_\nu$. WAALS differs from UAALS in that it attaches equal importance to the regional taxa being fused, regardless of their sizes. UAALS, on the contrary, weights the single basic units of both regional taxa equally and therefore assigns greater weight to larger regional taxa.

The unweighted centroid linkage strategy assumes that the regional taxa are represented adequately by their centroids and measures the resemblance between regional taxa as the (dis)similarity between their centroids. UZLS assigns, in contrast to the weighted centroid linkage strategy, equal weight to each basic unit of the regional taxa being aggregated. The centroid strategy UZLS often does not yield monotonic results, i.e., the similarity (respectively distance) value associated with the aggregation of the most similar regional taxa often does not decrease (respectively increase) monotonically from aggregation step to aggregation step.
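The general aggregation principle and three of the five strategies can be sketched in a few lines. This is a naive illustration that recomputes the resemblance between taxa directly from the definitions given above; it is not the MINT program used in the study, it covers only SLS, CLS and UAALS for a distance matrix, and all function names and the example distances are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Tuple

Taxon = FrozenSet[int]
Dist = List[List[float]]

def single(d: Dist, a: Taxon, b: Taxon) -> float:      # SLS: closest pair
    return min(d[i][j] for i in a for j in b)

def complete(d: Dist, a: Taxon, b: Taxon) -> float:    # CLS: most remote pair
    return max(d[i][j] for i in a for j in b)

def average(d: Dist, a: Taxon, b: Taxon) -> float:     # UAALS: mean over all pairs
    return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

def aggregate(d: Dist, linkage: Callable[[Dist, Taxon, Taxon], float]
              ) -> List[Tuple[Taxon, Taxon, float]]:
    """Return the fusion history [(T_mu, T_nu, level), ...] of the aggregation."""
    taxa: List[Taxon] = [frozenset({i}) for i in range(len(d))]   # step (i)
    history = []
    while len(taxa) > 1:                                          # step (ii)
        # Ties are broken by taking the first minimal pair encountered.
        a, b = min(combinations(taxa, 2), key=lambda pair: linkage(d, *pair))
        history.append((a, b, linkage(d, a, b)))
        taxa = [t for t in taxa if t not in (a, b)] + [a | b]
    return history

# Hypothetical distances between four basic units (symmetric, zero diagonal).
d = [[0.0, 1.0, 4.0, 5.0],
     [1.0, 0.0, 2.0, 6.0],
     [4.0, 2.0, 0.0, 3.0],
     [5.0, 6.0, 3.0, 0.0]]
for name, linkage in [("SLS", single), ("CLS", complete), ("UAALS", average)]:
    print(name, [round(level, 2) for _, _, level in aggregate(d, linkage)])
```

On this small example SLS chains the units one after another, whereas CLS and UAALS first form two pairs and then fuse them, which mirrors the chaining tendency described above.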
Because the centroid strategy is not monotonic, it may turn out that the similarity (respectively distance) between two regional taxa is greater (respectively smaller) than the similarity (respectively distance) between pairs of regional taxa which have been aggregated at earlier stages of the hierarchic process. This phenomenon, known as the reversal phenomenon, diminishes the efficiency as well as the interpretability of the centroid strategy.

These five taxonomic strategies are generally applicable to similarity and distance matrices. Lance and Williams (1967) have developed a recursive formula for the five above-mentioned taxonomic strategies by which the resemblance $R_{T_\lambda T_{\mu\nu}}$ between regional taxa $T_{\mu\nu} = T_\mu \cup T_\nu$ and $T_\lambda$ can always be computed from the resemblances between regional taxa of the previous finer partition,

$$R_{T_\lambda T_{\mu\nu}} = \alpha_\mu R_{T_\lambda T_\mu} + \alpha_\nu R_{T_\lambda T_\nu} + \beta R_{T_\mu T_\nu} + \gamma\,|R_{T_\lambda T_\mu} - R_{T_\lambda T_\nu}|,$$

where $\alpha_\mu$, $\alpha_\nu$, $\beta$, $\gamma$ are parameters which depend on the specific taxonomic strategy, and where $R_{T_\lambda T_{\mu\nu}}$ (respectively $R_{T_\lambda T_\mu}$, $R_{T_\lambda T_\nu}$, $R_{T_\mu T_\nu}$) denotes the resemblance between the regional taxa $T_\lambda$ and $T_{\mu\nu} = T_\mu \cup T_\nu$ (respectively $T_\lambda$ and $T_\mu$, $T_\lambda$ and $T_\nu$, $T_\mu$ and $T_\nu$), measured by a distance or by a similarity measure. The values of the parameters $\alpha_\mu$, $\alpha_\nu$, $\beta$, $\gamma$ for the different strategies are presented in table 4, where $q_{\mu\nu}$ (respectively $q_\mu$, $q_\nu$) denotes the number of spatial basic units belonging to the regional taxon $T_{\mu\nu}$ (respectively $T_\mu$, $T_\nu$).

Table 4a
Characterization of the aggregative linkage strategies used for solving the hierarchic regional taxonomic problems.ᵃ

Strategy: SLS (= nearest neighbor,ᵇ minimum methodᶜ)
Parameter values: $\alpha_\mu = \alpha_\nu = 1/2$, $\beta = 0$, $\gamma = -1/2$
Aggregation criterion: $\min \{d_{T_\mu T_\nu}\} = \min \{d_{ij}\}$, respectively $\max \{s_{T_\mu T_\nu}\} = \max \{s_{ij}\}$, with $O_i \in T_\mu$, $O_j \in T_\nu$
Important properties: Relatively heterogeneous regional taxa and often tendency to chaining, particular stress laid upon the principle of external separation [more generally formulated: attribute space contracting strategy in the sense of Lance and Williams (1967)].

Strategy: CLS (= furthest neighbor,ᵇ maximum methodᶜ)
Parameter values: $\alpha_\mu = \alpha_\nu = 1/2$, $\beta = 0$, $\gamma = 1/2$
Aggregation criterion: $\min \{d_{T_\mu T_\nu}\} = \max \{d_{ij}\}$, respectively $\max \{s_{T_\mu T_\nu}\} = \min \{s_{ij}\}$, with $O_i \in T_\mu$, $O_j \in T_\nu$
Important properties: Relatively compact regional taxa, particular stress laid upon the principle of internal homogeneity [more generally formulated: attribute space dilating strategy in the sense of Lance and Williams (1967)].

Strategy: UZLS (= centroid,ᵇ unweighted pair-group centroid methodᵈ)
Parameter values: $\alpha_\mu = q_\mu / q_{\mu\nu}$, $\alpha_\nu = q_\nu / q_{\mu\nu}$, $\beta = -\alpha_\mu \alpha_\nu$, $\gamma = 0$
Aggregation criterion: $\min \{d_{T_\mu T_\nu}\} = \min \{d(\bar{x}_{T_\mu}, \bar{x}_{T_\nu})\}$, respectively $\max \{s_{T_\mu T_\nu}\} = \max \{s(\bar{x}_{T_\mu}, \bar{x}_{T_\nu})\}$, with $\bar{x}_{T_\mu}$, $\bar{x}_{T_\nu}$ the centroids of $T_\mu$, $T_\nu$
Important properties: Equal weight to each basic unit of the regional taxa being fused, sometimes chaining tendency, reversal phenomenon, no particular stress laid upon either the principle of external separation or the principle of internal homogeneity [attribute space conserving strategy in the sense of Lance and Williams (1967)].

ᵃ Notes: $d_{T_\mu T_\nu}$ respectively $s_{T_\mu T_\nu}$ denote a distance respectively similarity measure between regional taxa $T_\mu$, $T_\nu$; $q_\mu$, $q_\nu$, respectively $q_{\mu\nu}$ denote the number of spatial basic units belonging to the regional taxon $T_\mu$, $T_\nu$ respectively $T_{\mu\nu}$. The signs of $\gamma$ are those for distance measures. Note: $\bar{x}_{T_{\mu\nu}} := (q_\mu \bar{x}_{T_\mu} + q_\nu \bar{x}_{T_\nu}) / q_{\mu\nu}$.
ᵇ Lance and Williams (1967). ᶜ Johnson (1967). ᵈ Sneath and Sokal (1973).

Table 4b
Characterization of the aggregative linkage strategies used for solving the hierarchic regional taxonomic problems.ᵃ

Strategy: UAALS (= group average,ᵇ unweighted pair-group method using arithmetic averagesᵈ)
Parameter values: $\alpha_\mu = q_\mu / q_{\mu\nu}$, $\alpha_\nu = q_\nu / q_{\mu\nu}$, $\beta = 0$, $\gamma = 0$
Aggregation criterion: $\min \{d_{T_\mu T_\nu}\} = \min \Bigl\{\frac{1}{q_\mu q_\nu} \sum_{O_i \in T_\mu} \sum_{O_j \in T_\nu} d_{ij}\Bigr\}$, respectively $\max \{s_{T_\mu T_\nu}\} = \max \Bigl\{\frac{1}{q_\mu q_\nu} \sum_{O_i \in T_\mu} \sum_{O_j \in T_\nu} s_{ij}\Bigr\}$
Important properties: Equal weight to each basic unit of the regional taxa being fused, no particular stress laid upon either the principle of external separation or the principle of internal homogeneity, reversal phenomenon when combined with the 1-stage regionalization approach [attribute space conserving strategy in the sense of Lance and Williams (1967)].

Strategy: WAALS (= weighted pair-group method using arithmetic averagesᵈ)
Parameter values: $\alpha_\mu = \alpha_\nu = 1/2$, $\beta = 0$, $\gamma = 0$
Aggregation criterion: $\min \{d_{T_\mu T_\nu}\} = \min \Bigl\{\sum_{O_i \in T_\mu} \sum_{O_j \in T_\nu} w_i w_j d_{ij}\Bigr\}$, respectively $\max \{s_{T_\mu T_\nu}\} = \max \Bigl\{\sum_{O_i \in T_\mu} \sum_{O_j \in T_\nu} w_i w_j s_{ij}\Bigr\}$, with $w_i = (1/2)^{a_i}$ respectively $w_j = (1/2)^{a_j}$ and $a_i$ respectively $a_j$ the number of prior aggregation steps of $O_i$ respectively $O_j$
Important properties: Equal weight to the regional taxa being aggregated (regardless of their sizes), no particular stress laid upon either the principle of external separation or the principle of internal homogeneity, reversal phenomenon when combined with the 1-stage regionalization approach [attribute space conserving strategy in the sense of Lance and Williams (1967)].

ᵃ Notes: $d_{T_\mu T_\nu}$ respectively $s_{T_\mu T_\nu}$ denote a distance respectively similarity measure between regional taxa $T_\mu$, $T_\nu$; $q_\mu$, $q_\nu$, respectively $q_{\mu\nu}$ denote the number of spatial basic units belonging to the regional taxon $T_\mu$, $T_\nu$ respectively $T_{\mu\nu}$.
ᵇ Lance and Williams (1967). ᶜ Johnson (1967). ᵈ Sneath and Sokal (1973).

3.3. A general perspective of evaluative comparison of the hierarchic regional typification results
After having discussed some of the properties of the linkage strategies as well as of the resemblance measures used in the study, we first compare the five aggregative strategies applied to the average Manhattan distances between the 18 Portuguese spatial basic units, and then the three resemblance measures using one particular taxonomic strategy.

Hierarchic regional taxonomic results obtained from alternative strategies can be compared and appraised in an intuitive, subjective manner. A visual inspection of the dendrograms produced by CLS, UAALS and WAALS and based upon average Manhattan distances shows only slight topological differences, and even no topological differences between the UAALS and the WAALS result. Comparing the UAALS and the WAALS dendrograms with the SLS and the UZLS ones, marked differences in outcome can be observed. It should be noted that only the regional subtaxon (0,, 0,, O,,, 0 ,,, Ols) is simultaneously recognized by all five linkage strategies. Furthermore the chaining tendency of SLS and UZLS is noteworthy (cf. fig. 2).

Fig. 2. Comparison between different linkage strategies: Dendrograms produced by (a) SLS, (b) CLS, (c) UZLS, (d) UAALS. Dendrograms based upon average Manhattan distances calculated from normalized attribute values.

Substantial assistance for an adequate choice between the different solutions of the hierarchic regional typification problem may be obtained by means of a quantitative evaluation. Jardine, Jardine and Sibson (1967) and others have shown that a dendrogram can be characterized mathematically as an ultrametric $d^*$ respectively $s^*$. This ultrametric measure is defined by the monotonically increasing (respectively monotonically decreasing) numeric values which are associated with each level of the tree or, more precisely formulated, with each partition of the finite family of partitions. By means of this ultrametric a corresponding resemblance matrix $(d^*_{ij})$ respectively $(s^*_{ij})$, the so-called cophenetic matrix, can be derived. An entry of this matrix describing the resemblance between a pair of spatial basic units is defined as the numerical value associated with that partition in which these basic units first occur in the same regional taxon.

The general evaluative perspective used is based upon the discrepancy between the original resemblance matrix $(d_{ij})$ respectively $(s_{ij})$ and the derived cophenetic matrix $(d^*_{ij})$ respectively $(s^*_{ij})$. A hierarchic regional taxonomic result will be considered to be 'optimal' if it represents the original resemblance matrix as closely as possible. For such an analysis there are different distortion measures. The best known one is the cophenetic correlation measure, a product moment correlation between the resemblance and the cophenetic values, developed by Sokal and Rohlf (1962) and defined as

$$r_{coph}(d, d^*) := \frac{\sum_{i<j} (d_{ij} - \bar{d})(d^*_{ij} - \bar{d}^*)}{\Bigl(\sum_{i<j} (d_{ij} - \bar{d})^2 \sum_{i<j} (d^*_{ij} - \bar{d}^*)^2\Bigr)^{1/2}}$$

with $d_{ij}$: distance between $O_i$ and $O_j$, $\bar{d}$: mean of the $\binom{n}{2}$ corresponding distances, $d^*_{ij}$: cophenetic distance between $O_i$ and $O_j$, $\bar{d}^*$: mean of the $\binom{n}{2}$ corresponding cophenetic distances, in the case of a chosen distance measure $d$, respectively in the case of a chosen similarity measure $s$ as

$$r_{coph}(s, s^*) := \frac{\sum_{i<j} (s_{ij} - \bar{s})(s^*_{ij} - \bar{s}^*)}{\Bigl(\sum_{i<j} (s_{ij} - \bar{s})^2 \sum_{i<j} (s^*_{ij} - \bar{s}^*)^2\Bigr)^{1/2}}$$
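The derivation of the cophenetic matrix and of the cophenetic correlation can be sketched for the distance case as follows. The sketch assumes the hierarchy is given as a fusion history of triples (taxon, taxon, level); this representation, the function names and the four-unit example are hypothetical and not taken from the study.

```python
import math
from typing import FrozenSet, List, Tuple

Fusion = Tuple[FrozenSet[int], FrozenSet[int], float]

def cophenetic_matrix(n: int, history: List[Fusion]) -> List[List[float]]:
    """d*_ij = level of the fusion in which O_i and O_j first share a taxon."""
    coph = [[0.0] * n for _ in range(n)]
    for a, b, level in history:
        for i in a:
            for j in b:
                coph[i][j] = coph[j][i] = level
    return coph

def cophenetic_correlation(d: List[List[float]], coph: List[List[float]]) -> float:
    """Product moment correlation between d_ij and d*_ij over all pairs i < j."""
    n = len(d)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [d[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical distances and a hand-written single linkage fusion history for them.
d = [[0.0, 1.0, 4.0, 5.0],
     [1.0, 0.0, 2.0, 6.0],
     [4.0, 2.0, 0.0, 3.0],
     [5.0, 6.0, 3.0, 0.0]]
history = [
    (frozenset({0}), frozenset({1}), 1.0),
    (frozenset({2}), frozenset({0, 1}), 2.0),
    (frozenset({3}), frozenset({0, 1, 2}), 3.0),
]
coph = cophenetic_matrix(len(d), history)
print(coph[0][2], coph[1][3], round(cophenetic_correlation(d, coph), 3))
```

A value close to $+1$ indicates that the dendrogram distorts the original distances only slightly. The sketch needs at least three basic units and a hierarchy with more than one fusion level, since otherwise a variance in the denominator is zero.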