41, Bid Napoleon III, 06041 Nice cedex, France. *INRIA Sophia-Antipolis. 2004 route des Lucioles, 06565 Valbonne Cedex, France giraudon@mirsa. inria.fr.
MVA'gO
IAPR Workshop on Machine Vision Applications Nov. 28-30,1990,Tokyo
Component labeling on SIMD-SPMD Architecture P. Duclos, G . Giraudon*, P. Kaplan, M. Auguin, F. Boeri l,iil)o~.at~iS r ei~g ~ i a t ~ ets S!.ste~iies (L;\SSI'. Lni\.crsitc tle Kice, i l l , Bltl N a l ~ o l c o ~111. r 06041 S i c e cctlcs. F1.ancc) *INI?II\ Sol)hia-,-\ntipolis. .LOO? r o u t e tlcs Lucioles, 06565 \ ; a l l ~ o ~ ~ lCi e d e s . F r a n c e giraudonQmirsa.inria.fr
Abstract
after p r o c e s s i ~ ~tlre g I)est loacl balancing efficiency. If wt, hay t11(, processor load t l e p e ~ ~ t on l s the n u ~ n b e rof region pisrls. we want I ) ; I ~ ; I I I I > I ; ~ r c l ~ i t c c t n rrr;1111ctl c~ OPSII,,\. OPSIL:\ i h an r s p e r i n ~ r r ~ - ilftrr processing tlie same ~rnrnberof pixels i r r each prorrssor. 111 I;II g(>n(~r;rI 1)11r1)own111ltil)rocessor \vIricI~nses t\vo tlifferent f o r ~ n s gen(>ral.this involves that \ve W O I I ' ~get the salnc I I I I I I I ~ of ) ~ ~rrgio~r of ~ ~ ~ r i ~ l l r l i s'I'l~r n r . first, on(, is a well known SI.\II) mode ( S ~ I I - in e a c l ~processor ant1 therc~forrinvolves a commnnication rost he~ I I ~ ~ ) I I O I I ~S ~ ~ o and t l r )rlrr sccootl o n r is called SI'.\IIl (single I'rot ~ v c ~processors c ~ ~ ~ wliicl~must be taken into account. :~;IIII S111Itiple1)iita s t r o a ~ n \vIriclr ) is a11 a s y ~ ~ c l i r o ~ Ir oI ~I ~Os~\YP ~. So. ivc 1)roposr a load tlistribution algoritl~rn\ v l ~ i c lincorl)o~ Ilavi. sl~o\vnOI'SILl\'s rffict.ivcnrss for the low-level i ~ n a g rproratcs a tolerance factor on tlre load function. Tlris a l g o r i t l ~ ~can n voshing [ I ) I I ~ R H and ~ I ] \vo d(~111011st~ri1te Irere its effectiveness for take into consitleration t11(, initial localization of rrgion i~ntlwr t IIP i ~ ~ l e ~ r ~ n ( ~l(>vel (lii~ ~vitli t o a co1111)one11t 1al)eli11gi ~ ~ ~ ~ ) l e ~ i ~ e ~ r s11o\v l a t i oit~ is l . the I)rtt.er cl~oicct o tlrr loatl I ) a l a ~ r c i ~efficiency ~g vrrsll:, I'lrr ainr of tile inlplenle~rtationis ol~tainedafter processi~rgthe tlre cost of list transfer. Sorne examples and results are given on I)c.x~ loatl 1)alanring efficiency. So. \ve propose a loatl tlistribureill inrage. lion illgoritl~nl\vliic11 i1icor1)orates a tolerance factor on the load 2 Presentation of OPSILA fl~nclio~l. 'l'l~isa l g o r i t l ~ ncall ~ take illto consideration tlrc initial loOI'SILA is a general purpose parallel arclritecturr belo~rging to c.ali~i~tion of region ant1 tve slioiv it is the better choice l o the loatl I ) a l i ~ ~ ~ cclficiency iug vcrst~stlrc cost. of list tranhfer. Some r e s ~ ~ l t s tlrc ~ilisedcla.ss (see [i\ugSi] for Inore tlrtails). It rnns \vitli two tlilfcrent forlrrx or ~ ) i ~ r i r l l e l i ~T~h ~e i first . one is the \vcll ! ~ I I O \ V I I SlhlD l~rotle( a s y n c l ~ r o n o t form ~ s of parallelism) [ReeXl]. Tlre see1 Introduction ontl one is callrtl SPhlD (single P r o g r a n ~ .S I ~ ~ l t i p Data lc Strea111) wlriclr is an a s y ~ ~ c l ~ r o ~ triode. r o u s Tllesc tv.0 ~notlcsa r e tly1ia111111 C'o~ul)c~tcr \.ision, inrage interpretation is ~ i i i ~ trougl~ly le in three "pil)(4inrtl" stel)"IlaniP]. At each step corresl)onds a d a t a strucically configurable in one ~ n a c h i n ccycle. OPSILA is colnposctl 1 I I ~ C'I'hr . . I,o\v 1,rvc.l vision tlcals tvitl~iconic structrrrc (pixel array of t,wo pa.rts : ;I centr;~lcont;.ol nnit and a parallel coirrl)~~tation tunit. ~vlriclicont.ains p = ICi processors. 1) mcniory nnits i ~ n dan 11.\.(>1). .I'll(. I l i g l ~I,(.veI ~ ~ r o c e s s i {~~r og r l itsv i l l r t r c ~ant1 , grapli st rncnct~vork[FengHl]. 111 S I l I D ~notlc. t l r r c ( h y ~ r ~ l ) ~IcveI). lic T h e I~rtlernietliateLrvrl ])rocc.;sil~g~ I I I I S Omega./Benes ir~terco~lnection Illc~mory is ~lli1I'Otl: (Yrcll I)rOresSOr C i l l l arC1~SSill illly i l d d l ' ~of~ ~ !)I t;tking ;IS inj)nl ~nirtl.ixof pixrlh ~ I I ereatin:: I ~ its O ~ I I ~ ) I li\ts II of Inelnory. In Sl'hlD ~ ~ r o t lc>i~ch r . procrshor acccssrs only at its lorill 4y1111)o1s(1)ix~I~ ~ ~ i r t .3 r i xlist s e t ) . 'l'l~(' typical aIgoritl1111of the I n t ( ~ r l ~ ~ ~ ~I,t,v(,l ( l i ; ~ist (t~l ~ t ,c o ~ n p o n ( ~ nl at l ) e l i ~ ~algoritlr~n g [13aIX2]. nren1ory. 'I'll(, ( . O I I I ~ ) O Ili~I) I I ~ goill is to o1)tirin :I srt ofrc>gio11lists fro111 TIIF scalar-CCJIII rol 1111itperfor1115ov(,rall n~anirgenientof t I I V system. I t consists in two processors : t h e Scalar I'rocessor (Sl') ;I 1)isc.l i ~ ~ ~ i :\ ~ grrgion v. is a set of eon~rectctlpisrls wlricli are 110I I I O ~ I ~ I I ( > for ~ I I ~;I givr11 crit(srio~~s. 111 gen(>ral.tl~(,selists arc S O I I I ~ ant1 the Ilrstrnct,ion I'roccssor ( I P ) . 'l'l~r I P Inanages t h r vrctor I . ( ' I ) ~ I ' ~ ( > I I ~ ~ofI ~i~n;ige O I I ~ featr~rcslike regions. contours. lines etc ... unit (menrory. ~~c,twork. vrct.or instrnctions and s y ~ r c l ~ r o ~ ~ i s i ~ t i o ~ ~ s SO. ~ ) ; ~ r ; ~ l l ~ l i z aoft i ointcrlnediate ~r procrss is an i ~ n p o r t a n t I)et.ween SIMD and SI'blD ~ n o d e s ) . ( . I I ; I I I ~ ~ I I ~ i( l~l ro1111)ntorvision I ) c ~ i r l ~ sof r t l r ~ 111ixc>(1( i c o ~ ~ i c T h e vector unit. consists of p=2"= l (i Ele~nentaryl'roccssors ~ ! 1 1 1 I ) ~ > l i i ~ )I I ; I ~ ; I 5t1,11el11rc> in 1)roccshi11g, lIi111y i111t110rsIrave ( P E s ) , each associatctl wit11 a I I I C I I I O ~l)i\~lk ~ ( . \ I n ) . t\ s y ~ ~ e I r r o n ( ~ , ~ I I ( I I ( Y I 1 1 1 ~~ ~ i ~ r ; ~ l I of ( ~ conipo~rent lis~~r 1aI)rling [Sl1iX2], [ H I I I I ~ ~ ~Omega/nenes ]. i n t e r c o ~ r n c c t i onc,t\vork ~~ is rlsrtl t o perfor~nd a t a [1)11fl'S(i].[('yI)"!)]. [I.;IIII)X!)] 011 hIlhll) o r fine grain SI.\tD arcliicxchangcs. Particnlar facilities are providetl as any length vec11'1.t11ri'h. 11111 C O I I I ~ ) O I I ~ I 1al)c~lilrg I~ is j11s1.;I stol) lo\vards tlrr final tors, a l ~ t o m a t i cInerrlory ant1 nc~t\vo~.li I I I ~ I I I ~ ~ ~and ~ I Iindir(\ct ICII~. .( i.11i. i ~ l t ~ ~ r l ~ ~ ~ i111t1 ( ~ tthe ; r tdi aot ~a ~s t r l ~ c t r ~ cfficirncy re li)r t l ~ rnrst parallel adtlrc,ssing C~:\'l'IIIC11 and SC':\'l'TI~It. OE'SIL.\ is I ) r ~ i l t \ l c s i ) I I I I I S ~I)(\ t i 1 1 i ~ 1 1into ;~cconntfor ;I glol)i~loplinlization of conlwith bit slice teclrnology. T h e processors are X l I D 29116. Each I I I I I V ~visi011~)rocrss. MB capacity is 96 1il)ytes. T h e base cycle time of t l ~ emachine is \\.c. ~)r(\srnti l l this pal)cr, a conrl)onent lal)eling algorithm de700 ns. Operantls can l)e integers of 16 or 32 bits o r 32 hits real vcsIo1)1)(~I i l l 198s [DucSSl)] for a coarse-grainrtl parallel arcl~itecnumbers. (see figure 1 for OPSIL:\ I)locli tliagram) 1 11rcII;IIIICYI OPSllr:\. OI'SILI-\ is an exl)erinrr~rt;rlgeneral purpose T h e SIMD operating m o d e : I I I I I I I ii)roc(\hsor tlo\-c~lol)l)c~l ant1 l,nilt 1)y tlrc I,.-\SS\- ant1 snpportctl In SIhID ~ ~ r o t l tcl. ~ ci~l)l)lic;~tion progra111 is entirely .torctiveness for the nlelnory. and R is the conllnon addresses difference for conseclo\v-Ii~v~~l i~n;rg(,procc>ssi~~g [1)11cHSa];111(l \v(, ~Ic~~~rc)~rhtr;rt(, lrrrr its ~ r t i v rclen~c~nts.In ortlcr to r x c r n t r operations I)rt\vrrn vrctors i s f [ i ~i\.rnrss l for Ill(, int,rr~nctlii~te love1 witli a co~nponc~n t lal~el- of ally Irngtlr. 1 1 1 ~11' s c ~ g ~ n c ~at n t srnnning t i ~ n rt l ~ rvcctot.s into illy ~ I I I ~ ) ~ ( ~ I I I ( ~ I '1I '1~ 1(, ~ ~ail11 ~ O I of I . t111t i~nplrnrr~rtation is oI)tainrtl subvectors of ir lrngtlr at 111ost rclnal to the ~rnnrl)c>r1, of El's. \\'v ~ ~ ( ~ s (a, n conlponent t 1al)elirrg algorithm for a coarse-grained
I)c5c>11
t r c > a t n ~ cof ~ ~ttl y ~ ~ a ~ r sr it cr ~ t c t ~ ~ (lists) res Represc,ntirtio~~ \ \ ' I I ~ ItIl ~ rI(11rgtl1of sr~bvrctoris less tlra~r1). tllc El's c o n t a i ~ ~ i ~ r g ill a11 11111~iproc(~sso1~ a r c l ~ i ~ ( ~ c.\!I t ~ ~ol)jet r ~ ~ p. ; ~ r ~ ~ l l risl iill1 s~~~ 1 1 0 hig~~ificir~rt ( I i i t i i irrc III~ISIi l l ~ ~ s t r a~~I ItPtI I I OII a partic111;rr c;rscX. i n t l c ~ l ) c ~ ~ rht lci ~ ~( ~ l ~l ~ r ~ )l r~oi~ael s s o r(lac11 s . associaled \\.it11 tll(' S ~ I I I C I I I I I I I ~ I) I~ I ~P ~I I I O ~ 1)nnk. ~ Iin(Icr SI'hlD ~notle.I'Es c;ililrol esclriingc~ 1. Ilivitlr pict~~rc' illto I(; vrrtic;~l I)al~tls\\'ill1 ol~c.( . O ~ I I I I I I I i ~ i f o ~ . ~ ~ l ; r ~l)at(> i o l r csclr;r~iges . can only occur iu SlSII) n ~ o t l rvia overlappi~i:: (SlhlI) ~notlr). t \I(. s y n c l i r o ~ ~i ~c l~t c r c o n ~ i c ~ c~t~i or t~. ~ work. 111 fact. tlivitli~~g i ~ n a g cill rrgi~larsclilarc ~ n i ~ ~ i n ~the i z cI so ~ ~ g t l ~ S P l I I ) ~ ~ r o t is l c i~~itializetl I)y t l ~ r1P \vlliclr 1)rovitlrs c a c l ~PI.: of fro~~tir>l.s. I ) I I I i t i ~ ~ r r r a s c s ~ O I I I I I I ~ I I ~costs ~ ~ ~ ~\vitl~ O I I I IIC h l i ~ r t i ~ SI'lII) ~ g code atltlress. T h e n~aclrinccall I](, t l y ~ i a ~ ~ r i c a l l , ~ special 11.c.at111cwt.s for c o r ~ ~ c wliicl~ ~ r s I)rlo~rgat 4 ~)rocc~hsors. co~~figi~rc>tl i l l SIlIL) or SPhID ~notlc. This allorvs 11s to process 2. Run t h r seque~rtialalgorit,hnr into every processor (SPM1) \.(,(.tor01. ~ n i ~ t r glol~ally is in SIhID ~ ~ r o da c~ i dlocally ill SPSID. ~ n o t l e ) .Tlre a l g o r i t l ~ ~runs n i l l single pass. 'l'llr synelrro~rizatio~is reqniretl for ilritializi~lg the SPSID ~ n o t l e 3. For each processor, nralie a label merging process by excllangr ant1 for returlii~rgt o SIR~II)mode are a fork-join operating over of overlapped vertical columns(SIh1D and SPRID rnodt.) t11e set of YES. . i s the synclrronization lnecanism is sinlple. its -1. C o n ~ l ~ n;it cg1ol);rl lal)clin:, I? a 111ergi11g of 16 e(111ivale11retaol)oraIiol~is v c v elficient : tra~rsitionfro111Sl'AlD lo SISID motlr o~~ ~)roccssors1.0 ble ant1 colnputc tlrc loatl d i s t r i l ) ~ ~ t iI)et\vce~~ is ~ ~ r n ti ll lr o ~ i eI I I R C I I ~ I I C cycle after tllc- clrtl of e s c ~ c i ~ t i o of~ ithe P E obtain a I~onrogenons loatl bala~rci~rg (seque~rtial ~ n o t l crcal\vill~tlrr I a r g c ~ t~ v o r kload. ized by scalar ~)rocessor) Programming a n d environment: .. .5. Each processor puts sub-regio~ilist i l l a output ma.ilbos for IIIC. i ~ ~ r ~ , l r ~ r r r ~ r tofa t iparallel o~i algorithnr 011 OPSII,>\ is carot11c.r processor (SP;MD nlotle) i l l relation to the eqnivalence r i c ~ lout \vitl~a lrigl~level s t r l ~ c t ~ ~ Iangl~agc red ~ i n n ~ eHELI,ES..\ tl table. hlailboses arc exchange ill SIhlD mode. Then each [.I(szS(j]. :I P.ISC*:\I, like. i ~ i ~ p r o v eby (l : processor n1ergc.s sull-region lists whir11 are in its i ~ l p r ~n~ailt \ . c ~ ~ oi ~r i s t r ~ ~ c t itlcfinctl o ~ r s l)y e s t e r ~ d i ~snclar ~ g opcratio~rst o 110s(se~rtlingfroln tllc other processors) (SPXID node). a1.1a!. of scalari (I)? o\.crIoatli~rgof ol)erato~,s) ant1 rsecr~tetli l l 3.2.1 O p t i i n a l a l g o r i t h i n f o r l o a d b a l a n c i n g SlIlI) n~otlr nln(.k illstr~~ctiolrs 11sct1t o enter arid to leave 1Ire SI'lID n~otlc. .\ load l)ala~~cirig is ;III i~l)plicatio~r f : O -- P. ~ v h e r eO is a set of11 ol)jects O 1 ..... 0,, ant1 P is ii sc.1 of 11 ~,rocrssorsPI..... I'l, 11.e tlrfine : 3 Parallel implementation of the compo111, as tlrr cost o f o l ~ j c c it for i E [ I , I,]. In our case. the ol)jccl nent labeling is n regio~l-list.atltl \ve c o ~ ~ s i d ethe r cost as tlrr 11un1I)rrof pixels. 3.1 S e q u e n t i a l a l g o r i t h l l l L, as the cost of processor 11. rr E [ l , p ] . wlrert L,, = C, 111,. I ' h r I I I ~ ~ Iitlea I of tlre algorith~nis t,o assign tliffcre~rteolnpolrellts i.e. cq11aI the srlln of o l ~ j e r tcosts into processor 11. \ \ i t 1 1 elifTere~lt1al)rls [BalS2]. ;\ compone~ltis tlrfi~letlas a set of tI1(8
pisc>ls c o ~ ~ ~ r c c licl lt lone of the cligllt direct ions (S-coll~rectivity ). l l i r 1al)rl.; arc, like cc~uivalence tal)les \vitl~ec~uivalc~rce relatio~r "l)c~longsto the hame co~rll)onel~t as". For our propose. the result \vr. ~ v a ~ohtai~r rt is a identified list for ear11 e q ~ ~ i v a l r n ctable e .. l11c. ;~lgoritlr~rr scans image, pixel by pixel, fro111 left t o right ; ~ ~ i tfl r o t ~top ~ to b o t t o ~ u .wit11 a \vi~rtlowtlcfi~ictli l l figure 2. For each current pisel. t h e algorithm tests t h e ~ ~ e i g h h o u r l ~ oconfigod 11rati011 a ~ r dill f i l ~ ~ c t i oofn this, luakes some ones of the following ol)eratio~~s: create a netv lahel ( t h e generated labels start from 1) ant1 a new list. labelilrg the current pixel and put the pixel in a list if t\vo n e i g l ~ b o ~ ~ r h opixels otl have different lal~elsL1 autl LZ \vitlr L1 < LZ. ~ v r i t et h a t L Z is a11 e q l ~ i v a l e ~tahlc ~ t t o L1 .-liter t h e s c a ~ l n i ~ rthe g , equivalence tables arc reorganised and the lists of salnr equivalence are mergiug. So. wit11 this algoritllm. \ve ~ r ~ a l ai ec o n ~ p o n e n 1al)eling t in single pass over image. T h e main tlificnltp of this algorithlll is tlie pixel treat,lllelit i h r o ~ ~ t ~ ~ s t - d e ~ )Tlie e ~ ~process d a ~ ~ t is. not l ~ o n r o g e ~ r oand ~ ~ snot rcgr~larwith a nlacl~inecycle. So, we can t l ~ i l ~tkh a t an asynclrronoi~sarcl~itecturefits better than a. synchro~rousarchitecture. 111 the 11est section. we slro~van i m p l e m e n t a t i o ~\vitlr ~ OPSILA's a s y ~ ~ c l r r o ~ n~ ~o ou ds e .
3.2
Parallel implementation I'or a coiirsr ~)arallelarchitecture, conlpolrent labeling sets follolvi11g proI)len~s:
So\v the j~roblenris to ~ ~ ~ i ~ r ianglol)al ~ i z e cost from elenrel~tary cost associated to ol)ject. ;\ s o l r ~ t i oco~lsists ~~ to filrtl 11" soli~tio~rh autl to take tlre hest. So to avoid this. \ve i ~ r t r o t l ~ ~a cquality e criterius of the load tlistrihntion process whiclr nieasures the effiriency Ec of 1oa.d tlistri11111 ion Ec = Lt / p . L ~ n a s where Lt = total load = CtTlP,, Lmax = load of the processor which is the most loaded p = number of processor. For Opsila, t h e hcst load h~nction(F,c=l) is for L ~ I I ( I=. ~I,, = I,) = L t / l G . i # j. 1 5 i . j 5 16 (1) So, if q is the n u n ~ b c rof 01)jects ant1 p the nr~nlbcrof procrhsors. the o p t i ~ n a lalgoritlrni is : 1. Find the proccssor 1'" the least loaded among p 2. Fintl the ol)jcct 0, tvit,h the I~iglrcst cost among the ol)jels remaining 3. Put 0, i l l P,, -1. goto 1 until o l ~ j c c tlist is e1n1)t.v Tlie algoritliil~complesitp is O(r12+ pq). \Ve can retlucr this co~nplexit,yby sorting objects before ~)roccssingin fulrct.io11of ohject cost. But this algoritlrm tloes not take illto accou~rtthc initial localization of objects vel.sus processors. So. this algoritlrnl tloes not minimize d a t a transferts I)etwecn processors as s l ~ o w ~i i rr figure 3.
3.3
B e s t efficiency w i t h t a k i n g a c c o u n t t h e list t r a n s fer cost '1;) soIv(~t11r I ) ~ O ~ ) ~ P I I I Isl10tv11 II i l l figure 1%. a s t ~ o ~rriterius ~tl ~rrust IN, ~ ~ I I I I ( \\.lrirl~ I lakrs into ;~c.ror~llt costs of list transf(\rs I ) r t w c c ~ ~ 1)ro(.t's"o "lo ol)lirin ;I c o ~ ~ ~ l ) lrcr tgri o ~in~ a processor. 170r tlris. I I I I I ~1i111itrtI ~ t l ~ et r a l ~ s f t w . So. \v(- 1)ro1)0w a loat1 t l i s l r i l ) ~ ~ t i oalgoritll~lr l~ \vlticl~i~reorl)oratrs ;I tol(~r;~r~c.c~ factor on t l ~ rlo;ltl f11nrtio11.'This algoritl~~rr ~ ; I I I talir illlo c.o~~sitlrratiot~ t.lrr i11it.ii11 lorali~ationof rrgion itn(l i t is the I)(>\!c,ltoicc. to 111(,lo;~tl l ) a l i ~ ~ ~rfficir~rcy c i ~ ~ g ver>11st l ~ erost or list I r;1115f('r. So. \VC i n l r o ( l ~ ~ac~~ ~' o t i or o ~tolcrance t 0 1 1 Ec c~xprcsscdi n ~ ) t ~ ~ ~ .rl'l~is ( ~ t rtolcra~lcc~ t. givc.s a set of the least loatlrtl processors ;III(I c;rll ( ' 1 1 0 0 h ? anlolrg tlrii. ~ I I C procehsor for \~lrirlrthe t1.atrs1'1.1.tcost i h t ~ ~ i ~ r i ~ lTIIC r a l . o1)timal algorithnl 1vitl1 load tolerance \\(I
loatl. \Ve sholv tlrr loatl I ) a l a ~ ~ c cfiice~lcy i~~g is c o ~ r s t a ~ant1 lt t l ~ r cost of transfer tlec~~cirses i n f ~ ~ n c t i oof~ tr, o l e r a ~ ~ ~)ara.~net.'r. cr
5
Conclusion
\\'e Iravr tle~nonstratrtlin this paper t.lre elfectivr~irssof general p ~ ~ r p o s~)arallcl c a r c h i t e c t ~ ~ rOPSII,i\ c for the irrtrr~lrrtliatelcvcl \ v i t l ~a C O I I ~ I I O I I C I Ilal)cli~rg ~ i ~ n p l r ~ ~ r r ~ ~ t iSo. ~ t . iwe o nhave . proposrtl t\vo algoritllr~lsof loatl d i s t r i b l ~ t i oant1 ~ ~ sllown t h r benefit t o introtlr~cea toleri~~rcc factor on the loatl fu~lctio~r for a bet.ter loatl I)alancir~geflicie~rcy.\Ye Irave rr~atlran i~r~l)lrnreutation ant1 tcstetl t l~c,algoritl~nls011 rr;~li n ~ i ~ g r .
References
[.-\11g87] Al. . - \ u g ~ ~:i ~"Expcri~uents l OII a ~)arallrlSIhIDISPhlT) architectltre ant1 its progra~nr~ri~lg". i l l France-.lal)a~r111tellige~lcea~rtlComputer Science Syniposin~n.Nov 1987 I. I,'intl t l ~ cs ~ St of tllc least loatletl processors among p proces[BalX'L] D.H. BXLLARD et C.11. BRO\VN : "Conrpnter \'ision" h 0 l . h \\.it 11 ;I load tolrranct t . Pre~rticeIlall. S e ~ vJersey (1082). 2 . I.'intl tlrr ol),jt>ct 0, witl~the I~iglrrst cost alnolrg the objets I , ( > I I I ~ ~ I I ~ I I ~ [C'yl)P9] It. Cyl)lier. .J.I>.C. Sanz. L. S~rytlcr: ">\II F:prrw P ~ ~ I I I .\lgoritlrm for Image Component Labeli~lg", in PAIII. :I. (iivc 0,to 1)rocrssor 1'. P E .S. if the rlroice I' ~ n i ~ ~ i ~ n i z e s c ~ s c l ~ i ~cost. ~~ge \;ol 11 . Alarch 1989. pp 258-261 [DucRSa] P. Duclos. F. Boeri. 11. i\uguin. G . Giral~tlon: "Image I . (ioto I 11ntil ol)jret list is rlnl)ty. Processi~lgon a SIXIDISPRID Architecture : OPSILA". 4 Results in Proc I E E E of 9th ICPR. Roma. November 1988. pp 111 tlris srctio~r.\CP give two kinds of r r s ~ ~ l a11o11t ts compo~lent 430-433. li~l)('li~~g i~~rj)lerl~e~rtat,ion. firstly a11 inrpleme~ltatio~l on OPSILI\ [Dr~cAdb]P. Drtclos : Etude tlu par;~.llclismr en trait.(,wi I 11 t lrc first algori ~ I I I I I of loatl I ) a l i ~ c i ~ ~Secontlly. g. 1r.r present a men1 tl'images. Realisations stlr arclrit,rct~~rcm i s t r s i ~ t l ~ ~ l ito~ ill~~stratc, l i o ~ ~ 1.11~. I)ctl~aviorof rnotlifirtl algoritli~uof load SIMDISPMI)". PIID, Universite of Nicc 1988 (111R e n c h ) I);~l;~ring. [DrtffSG] 51.J.n. Duff ed.. Chapter 7 i n I~rtern~ediatc-Le\,cl In~agr Processing, ilc. Press London 1986 4.1 hardware c o n s t r a i n t [Emh89] If. Enibreclrts. D. Roosr. P. \Yamt)accl : " C o m l ) o n r ~ ~ t OI'SIl,i\ is a prototype of a parallcl arclritectr~re. I%ecal~sc caclr Iabeli~rgOIL a tlistribu tetl mcrllory ~ ~ l l ~ l t i ~ ) r o c r s s oin rs", ~ ~ ~ r ~1);11ik l ~ oIlolds r y only 49kb of nrcxnlory, we l i r ~ ~tiht r sizr of inProc of t.he first Er~ropearnworkshop 011l~ypercul)esant1 pr~t i~n;rgcto I)? less tlra~r25Gx25G ~)ixtls. 111 a t l d i t i o ~ ~Inelnory . d i s t r i l ~ ~ ~ trolnprlters. etl Rrnnrs 1989, Nort11-llolla~~tl. 1)p silt ~ ~ r i ~\vit t e 11s 200 l is1.s per ~)rocessor( r e p r ~ s e nt i ~ r gahortt 800 pix5-17. (,Is). So. wo I~avt.cllosc~nt o rcvtlizc a ronlpolrent 1al)elilrg on a edge [For881 l I . C . Forgr~r:Parallelisatio~rtlu Iarrcrr r;\yo~lshr~r1111 111it1) of a i ~ ~ t a g 0r 1. 1 OI'SILI\. \vr have cornl)~~tctl an edge tletecc a l c ~ ~ l a t c utle r typo SIXlD/SPhll). l'1111 of 1J11ivc~rsito of t i o ~\\.it11 ~ 11011 ~ ~ l i r x isr~ppression ~~ri~ ant1 taken 20 per cent of these Nice. S c p t e ~ l r l ~ 1988 r e (in frenclr) c~lgrs.11liti;ll i l r ~ ; ~ gisr t!lr well k11on.11image of girl Lena. For this [Fcng81] T.Y. Feng : ";\ Survey of Interconnrction Nct\vorl;s", it1 i~lri\g(>. \ve I I ~ V C t.he follo\viug clraracteristics (wit11 16 processors): Computer pl) 12-27. Dece~nher1981. Sumllocr of etlge-pixels is 6273 representing 396 different lists [JegdG] 1.. Jegon : "Le langage vectoricl Ilellena". Rapport tlc :\\.(,rage le~rgtlrof list,s : 13.S Rechcrclie INRIA S o 703. J111.v 1986 (in frrnch) .\vcr;~ge~ r n m l ~ of c r lists per processor 24 [HIIII~PS] R. Hnnlmcl. A . Rojer: "I~iiple~llenti~ig a I'arallel Co11.\vcrage cost per 1)rocessor : 392 nectetl Conrl)oner~t~\lgorithiir 011 hIIMD i\rchit,ectr~res". 4.2 R e a l i m p l e r n e ~ l t a t i o ~ l IEEE \Vorksllop on Co111p11ter Architectltre for P a t an Image Data I3ase 1lanagcn1e1lt. I I i a ~ n i t e r ~.-\rralysis ~ \\.(- ~ I I O \ V I I of figl~rc'1 a co~rrplete;rnalysis of the loatl balaci~rg. Floritla 19s: 11.i1rgtlrr first ;~lgoritlrln. \Vc hho\\. in figurc 1.a processor loatl [Ha11781 A.R. H.-\SSOS et E.51. RISE3I;ZN : "\:ISIOSS: :\ C o n ( i . r . I I I I I I I ~ of C ~ 1)ixel) l~eforcand aftcr shari~rgO I I ~ T .h e eficiency pnter Systenr for I~lterpreti~rg Scenes". New York i-\ca;~l'lrr~)rocossingis O.T(i = 6273110: * 510. wlrerc 510 is the loatl tlemic Press 1)p 303-333 1978 of ~)roccssorYo 13. \\:c show ill figure 4.11 list tlistrib~~tion after [Rlai88] J . LIaillartl. .I. Siva. XI. i\uguin , F Rorri : "Accrlcrator I;~l)c~ling (first step), ant1 after glol~aldistril~lrtionin relation with Simulation 011 a I'arallcl Co~rrp~lter". E11ro1)can ('on frr1ig11rv.I.;I. 'l'l~rI I I I I I I ~ of I ~ ~list is rc(111ce(ll)ec:111st$of list ~ i i e r g i ~ ~ g . ence on i\ccelerator. Rome J I I I I1088. ~ I.'ig~~rr .1 .r I ) ~ C S ( \ I I ~rrsl~lts S a l ~ o l ~r~rrittrtl t and receivtvl ~)iselsfor r r Irnagr [Reedl] A.P. Reeves : "Parallel Coml)r~t,erA r c l ~ i t r c t r ~for (~;I(.II ~ ) r o ( . ( ~ s c )\\'t, r . Ir:~vc>fort~~tl tl1;113935 pixels (62 per c c ~ r tof Processing". pp 199-206. ICCI' 1981 1)isrlx) ~ r r l ) r r s r ~ ~ t173 i n g list,s lravr 11rc11~rrovetl. [Shi42] Y. Shiloach a.nti U . Vishkin: "An O(log 11) Parallel C ~ I I \Co prt'wnt(. 011 figure 5 run time in f r ~ ~ ~ c t of i o n11111llber of pronectivity Algorithm". ill .Jor~r~lal of A l g o r i t l l ~ ~ ~:I.s . pp c.c~sso~.s \\.(, It;~vc;~ctivatctl.Time is given i l l 1Irgaryclr of OPSILt\ 57-G7 (19S2) (cy(.Iv = 700 11s). C : o ~ n l ~ o n r I~arlt) c l i ~ ~time g tlrcreasrs as we use [ S h ~ r d i ]I