Hierarchical Grouping to Optimize an Objective Function Author(s): Joe H. Ward, Jr. Source: Journal of the American Statistical Association, Vol. 58, No. 301 (Mar., 1963), pp. 236244 Published by: American Statistical Association Stable URL: http://www.jstor.org/stable/2282967 Accessed: 23/09/2010 10:19 Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use. Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at http://www.jstor.org/action/showPublisher?publisherCode=astata. Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact
[email protected].
American Statistical Association is collaborating with JSTOR to digitize, preserve and extend access to Journal of the American Statistical Association.
http://www.jstor.org
GROUPING TO OPTIMIZE HIERARCHICAL OBJECTIVE FUNCTION*
AN
JoEH. WARD, JR. AerospaceMedical Division,LacklandAir ForceBase A procedurefor forminghierarchicalgroupsof mutuallyexclusive subsets,each of whichhas membersthat are maximallysimilarwith is suggestedfor use in large-scale respectto specifiedcharacteristics, (n > 100) studieswhena preciseoptimalsolutionfora specifiednumber of groupsis not practical. Given n sets, this procedurepermitstheir theunionofall reductionto n -1 mutuallyexclusivesetsby considering possiblen(n -1)/2 pairs and selectinga unionhavinga maximalvalue for the functionalrelation,or objective function,that reflectsthe criterionchosen by the investigator.By repeatingthis process until only one group remains,the complete hierarchicalstructureand a quantitativeestimate of the loss associated with each stage in the groupingcan be obtained. A general flowcharthelpfulin computer and a numericalexampleare included. programming
S
ITUATIONS
1. FORMULATION
OF THE GROUPING
PROBLIEM
oftenarisein whichit is desirableto clusterlargenumbersof
objects, symbols,or persons into smaller numbersof mutually exclusive groups,each having membersthat are as much alike as possible. Groupingin this mannermakes it easier to considerand understandrelationsin large colofmanagement.Grouping,however, lections;henceit oftenincreasesefficiency of information that may be quantifiedin a ordinarilyresults in some loss number. "value-reflecting" In earlierformulationsofthe groupingproblem,Cox [2] and Fisher [31were concernedprimarilywithgroupingbased on similaritywith respectto a single variable. Cox treatedthe special case whenthe variableis normallydistributed, and brieflydiscussed the two-dimensionalproblem. Fisher described a technique for groupingwithout regard for the distributionof the variable. His procedureis designed to minimizea weighted sum of squares about group fromboth. means on one variable. Our formulationof the problemdiffers For our purposes,it was necessaryto take account of the similarityof group memberswithrespectto many variables. Our desirewas to formeach possible numberof groups,n, n -1, * * * , 1, in a mannerthat would minimizethe loss associated with each grouping,and to quantifythat loss in a formthat could be readily interpreted.Since it was not feasible to obtain the many optimal solutionsthat would be requiredforlarge-scalestudies (n > 100), a compromise was necessary. Therefore,we proposed (a) to reduce the number of groups fromn to n -1 in a mannerthat would minimizethe loss and then, without modifyingthe groupsformed,to repeat the processuntilthe numberof groups was systematicallyreduced fromn to 1, if desired,and (b) to evaluate loss in termsof whateverfunctionalrelationbest expressedan investigator'scriterion forgrouping. * The researchreportedin this paper was sponsoredby 6570th PersonnelResearch Laboratory,Aerospace Medical Division,underAFSC Project 7784, Task 773403.
236
GROUPING
237
TO OPTIMIZE
Function Objective {2, 6, 5, 6,2, 2, 2,0, 0,0 }, a common Givena setofratings for10individuals, practiceis to use the meanvalue to represent all the scoresratherthan to consider individualscores.The "loss"in information thatresultsfromtreating the 10 scoresas one groupwitha meanof2.5 can be indicatedby a 'valuereflecting" number, theerrorsumofsquares(ESS). The errorsumofsquaresis givenbythefunctional relation, 1
2
ESS=Z 0=1x --(n
/n
xi)
2
=1
wherexi is thescoreoftheithindividual.The ESS fortheexampleis 10
ESS(one
group)
X -- 0 2
1
2
0
xi
= 113 -62.5
= 50.5.
ifthe10 individuals are classified to theirscoresintofour Similarly, according sets, {0 {o,o0,O}
2, 2 ,2 2}, {5}2
{6, 6}
can be evaluatedas the sumofthefourerrorsumsof squares, thisgrouping ESS (fourgroups) = ESS (Group1) + ESS (Group2) + ESS (Group3) + ESS (Group4).
A functional relationthatprovidesa "value-reflecting" numberofthistype willbe referred In general,an objective to hereas an "objectivefunction." selectsto reflect function maybe anyfunctional relationthatan investigator ofgroupings. In thisexample,theobjectivefunction is therelativedesirability as reflected 'loss ofinformation" by theerrorsumofsquares.The mostdesiris its minimum able levelofthisobjectivefunction value,00. The objectivevalues that werecomputedindicatethe information lost whenthe function 10 scoresare treatedas a singleset (50.5) and as foursets(0.0). The natureof ofcourse,dictatethechoiceand interpretation theproblemand thecriterion, oftheobjectivefunction. A numberofdifferent objectivefunctions have beendevelopedand put to inapplications ofthegrouping use.Theseareexemplified operational procedure theclusternowprogrammed ResearchLaboratory to accomplish byPersonnel withrespectto measuredcharingof (a) personsto maximizetheirsimilarity timewhenmenare reassigned acteristics, (b) jobs to minimizecross-training to minimizeerrorsin accordingto establishedpolicies,(c) job descriptions and a largenumberof jobs witha smallnumberof descriptions, describing theloss ofpredictive (d) regression equationsto minimize efficiency resulting in the numberofequations.In (a) the objectivefunctionis fromreductions the grandsum of the squareddeviationsabout the meansof all measured The objectivefunction for(b) is expectedcross-training characteristics. time, assumingrandommovement amongjobs in each cluster;whereasin (c) it is amountofjob timeincorrectly andin (d) lossofpredictive described efficiency. the same generalcomputerprogramforgroupingcan be used Fortunately, In fact,the four withmanyobjectivefunctions thatappearto be different.
238
AMERICAN
STATISTICAL
ASSOCIATION
JOURNAL,
MARCH 1963
objectivefunctionsjust describedall lead to the use ofthe same program[1, 7]. In some situations,the desirednumberofgroupscan be specifiedin advance; in others,this is difficultto do.' With this groupingprocedure,one need not value associated with specifythe numberin advance, forthe objective-function any given numberof groups (fromn to 1) is available. Indeed, the changesin these values as the numberof groupsis systematicallyreduced furnishuseful clues to the appropriatenumberof groupsto be used foroperationalpurposes. HierarchicalGroups The groupingprocedurereportedhereis based on the premisethat the greatas indicated by an objective function,is available est amount of information, whena set of n membersis ungrouped.Hence the groupingprocessstartswith these n members,which are termedgroups,or subsets,althoughthey contain only one member.The firststep in groupingis to select two of these n subsets which,whenunited,will reduce by one the numberof subsetswhile producing the least impairmentof the optimalvalue of the objective function.The n-1 resultingsubsets then are examinedto determineif a thirdmembershould be unitedwiththe firstpair or anotherpairingmade in orderto securethe optimal value of the objective functionforn-2 groups. This procedure can be continued,if desired,until all n membersof the originalarray are in one group. * , 1), the Since the numberof subsets is systematicallyreduced (n, n-i, processis termed"hierarchicalgrouping"and the resultingmutuallyexclusive groups "hierarchicalgroups." Applicationsof HierarchicalGroupingProcedure Hierarchicalgroups,formedin the mannerdescribed,are particularlyuseful for classificationpurposes [2, 3, 4, 5, 6, 7]. They may be used, for example, to establish taxonomiesof plants and animals with respect to genetic background. Such groups also permit organizingand cataloging materials, e.g., librarydocuments,so as to facilitatethe storageand retrievalof information. Similarly,hierarchicalgroupingsof jobs may be helpful in identifyingjob "types" and "subtypes." As indicated previously,the usefulnessof this hierarchicalapproach is not restrictedto classificationproblems.No seriousloss of predictiveaccuracy resulted,forinstance,when a hierarchicalgroupingof regressionequations was used to select a smallernumberto replace the 60 equations developed forpredictionof success in Air Force technicalschools. While a hierarchicalarrangement may not always be needed forthe "best" partitioninginto groups,it is preferredin many practical situations.Even when a nonhierarchicalgrouping is sought, the approach described here probably will yield a good solution, althoughit may be one that does not optimizethe objective functionforthe specifiednumberof groups. A problemoftenlinked with the groupingproblem is that of assigningor 1 When it is desiredto assign objects to k, a specifiednumberof groups-withoutregardforthe subdivisions of these groups-the goal is only to identifythe k groupsthat optimizethe objective function.If the objective problem.If the objectivefunction functionis a linearform,the problemcan be formulatedas a linearprogramming problem [2]; howevercertain is nonlinearin form,the problemcan be formulatedas a nonlinearprogramming must be considered. computationaldifficulties
GROUPING
239
TO OPTIMIZE
newobjectsintoacceptedgroups.This problem,as wellas those classifying and decidingupon associatedwithselectingappropriateobjectivefunctions thenumberofgroupsto be utilized,willnot be discussedhere.The purpose believedto be of procedure grouping ofthisarticleis to reporta hierarchical sinceit has manyapplications. generalinterest 2. HIERARCHICAL
GROUPING
PROCEDURE
Cycle Grouping Hierarchical startswith a uniprocedure S(i, n). The grouping Optimalunionofsubsets subsets,i.e.,wellofn one-element versalset (U), {el,e2,* I,en1,consisting numberedin sedefinedobjects,symbols,or persons.These are arbitrarily in computerprocessing. To reducethe identification quenceforconvenient thechangein the numberofsubsetsto n-1, onenewsubset,whichminimizes by unitingtwooftheoriginaln subsets, value,is formed objectivefunction's say [S(1, n)] U [S(2, n)]
=
{ei,
e2}I
foreachofthen(n-1)/2 Thisrequiresan evaluationoftheobjectivefunction possibleunionsof subsetsS(i, n), i= 1, 2, * * , n,wherei refersto the number n at thisstage,refersto the the set, and the secondparameter, identifying in turn,the As each unionis considered numberof setsunderconsideration. hypothesized and is computed function objective value of the corresponding union.The identityof to be "equal to or betterthan"thatofany preceding This ofcomparisons. the is sequence union maintained "best" throughout the value of thatunionwhichhas an objective-function identification facilitates "equal to or betterthan"thatofany of then(n-1)/2 possibleunions.This ofsubsetsis reduced whenthenumber unionis acceptedas an optimalgrouping fromn to n-1. value.For the ofnewsubsetand its associatedobjective-function Designation in subsetsare which the to show a out sequence of display purpose printing machine in it is convenient 100 to studies in 1000), (n= united large-scale new each to subset (resulting identify designations special to give processing in fromacceptanceof a union)and its members.Thus the unionresulting is denoted: n-1 subsets S(pn
-
1)
=
[S(pn-1,
n)] U [S(q,-, n)]
where Pn-1
the subsetin the n used to identify =the smallerof the twonumbers
thenewsubset. originalsubsets.Thisnumberis used to identify the subsetin the n to used identify qn-l=the largerof the two numbers after "inactive" it is used at this is number This subsets. original beenunited. subsets have which two the for showing printout stage
it with valueis denotedin thesamemannerto identify The objective-function thisunion: Z[pn-1)
qn-1,n - 1].
240
AMERICAN
STATISTICAL
ASSOCIATION
JOURNAL,
MARCH 1963
The termat theright,n-1, showsthenumberofsubsetsremaining afterthe numbers union;theleft-hand and centertermsare the originalidentification oftheunitedsubsets. Optimalunionofsubsets S(i, n-1), *, S(i, n- [n-i]). Withn-I subsets,now,we have to defineand consider S(i, n
wheni==1,2,
,
-
1)
S(i, n)
=
n; i5pPn1; anditqn-i [S(pn-,, n) |J [S(qn-1,n) ]
and S(pn_1, n-1) when i-pn4.
Selectionof an optimalunionto reducethe n -1 subsetsto n -2 requires ofthe (n-I) (n-2)/2 possibleunionsin a manner evaluationand comparison analogousto thatused to reducethe n subsetsto n-1. Whenthishas been value are done, the acceptedunion and its associatedobjective-function designated S(pn2y
n - 2) = [S(pn-2,n - 1)] U
[S(qn-2,
n-1)]
and Z [pn-2
qn-2)
n - 21
(pn-2 < qn-2).
are maintainedas before,i.e., Pn-2 iS the identification The identifications value and qn- is thatwiththelargernunumberwiththesmallernumerical mericalvalue. ifdesired,untilall subsetshave been Thisgrouping cyclecan be continued, unitedto formthe universalset, U. At any phase in whichk mutuallyexthe objective-function clusivesubsetsare underconsideration, value,and the as unionwithwhichit is associatedwouldbe expressed Z [i,j, k-1 ] associatedwith[S(i, k)]U [S(j, k)] where i=1,2,*
-,n-1
i 5? qn-1) qn-2)
j =i+
1,i+2,
qk
**,n
j y?qn-1l) qn-2) * * * ) qkc
Followingselectionof an optimalunion,this union and its corresponding value wouldbe designated objective-function S(pk-1,
k - 1) = [S(pk-1,k)]UJ[S(qk-1,iG)]
and Z[Pk-ly qk-1, kc
11
(pk-1 < qk-1).
the elementsofany subset,S(i, c), wouldbe designated Furthermore, S(i, k) =
{em1, ..
, ema, *
, e.,)
GROUPING
241
TO OPTIMIZE
where
in subset t=number ofelements inthesubset. numberofathelement ma= identification
NumericalExample ofHierarchicalGroupingCycle
let us considera smallproblemin whichfive the procedure, To illustrate areto be groupedon thebasisofratingstheyhavegivenone object individuals is the sum of the value to be minimized (Figure1). The objective-function likethisare squareddeviationsaboutthegroupmean(ESS). Whileproblems numbersand readilydoneby handand do notrequiretheuse ofidentification whenn> 25. Thereofdata is seldomeconomical hand-processing designations, programming thatmaybe helpfulin computer fore,wegivea generalflowchart operationsforthe exampleis (Figure2). The patternforthe computational shownin Figure3. Numerical Example
PrintoutDisplayingHierarchicalGroupingResults
Per- Object son Rating Person
1
2
Numberof Groups 3
4
5
1
1
1
1
1
1
1
1
2
7
3
3
3
3
3
3
3
2
2
2
2
2
2
2
4
9
4
4
4
5
12
FIGURE.
5 5 ObjectiveFunction Value 86.80 (ESS) A1 A2
5
73.63
< -------------4 4