
Research Track Poster

Density-Based Clustering of Uncertain Data

Hans-Peter Kriegel
University of Munich, Germany
Institute for Computer Science
[email protected]

Martin Pfeifle
University of Munich, Germany
Institute for Computer Science
[email protected]

ABSTRACT

In many different application areas, e.g. sensor databases, location-based services or face recognition systems, distances between objects have to be computed based on vague and uncertain data. Commonly, the distances between these uncertain object descriptions are expressed by one numerical distance value. Based on such single-valued distance functions, standard data mining algorithms can work without any changes. In this paper, we propose to express the similarity between two fuzzy objects by distance probability functions. These fuzzy distance functions assign a probability value to each possible distance value. By integrating these fuzzy distance functions directly into data mining algorithms, the full information provided by these functions is exploited. In order to demonstrate the benefits of this general approach, we enhance the density-based clustering algorithm DBSCAN so that it can work directly on these fuzzy distance functions. In a detailed experimental evaluation based on artificial and real-world data sets, we show the characteristics and benefits of our new approach.

Categories and Subject Descriptors
G.3 [Probability and Statistics]: Probabilistic algorithms (including Monte Carlo)

General Terms
Algorithms

Keywords
density-based clustering, uncertain data, fuzzy distance functions

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. KDD, August, Chicago, Illinois, USA. Copyright ACM.

1. INTRODUCTION

In many modern application ranges, e.g. the clustering of moving objects [] or sensor databases [], only uncertain data is available. For instance, in the area of mobile services, the objects continuously change their positions so that exact positional information is often not available. In other application areas, such as the clustering of distributed feature vectors [], only approximated information is transmitted to a central server site, due to security aspects or to limited bandwidth.

In order to extract knowledge from these fuzzy object descriptions by means of standard data mining algorithms, the similarity between the objects has to be measured by one numerical value, i.e. the complete fuzzy distance information is aggregated into only one distance value. Obviously, aggregation goes hand in hand with information loss. For instance, we have no information about the degree of uncertainty of such a single distance value. Even if we had one, it would be of no use, because traditional data mining algorithms, e.g. clustering algorithms, cannot handle this additional information.

In this paper, we propose to use fuzzy distance functions to measure the similarity between fuzzy objects. Contrary to the traditional approaches, we do not extract aggregated values from the fuzzy distance functions but propose to enhance the data mining algorithms so that they can exploit the full information provided by these functions. For many important application ranges where fuzzy distance functions naturally occur, e.g. the clustering of moving objects, density-based clustering algorithms seem to be the method of choice []. We therefore demonstrate in this paper how fuzzy distance functions can be integrated into the density-based clustering algorithm DBSCAN []. We call the resulting clustering algorithm FDBSCAN, indicating that it is applicable to cluster fuzzy objects.

The remainder of this paper is organized as follows. In Section 2, we present the related work in the area of density-based clustering of uncertain data. In Section 3, we introduce fuzzy distance functions. In Section 4, we show how we can integrate these functions into the density-based clustering algorithm DBSCAN. In Section 5, we experimentally show the benefit of our new fuzzy clustering algorithm FDBSCAN. We close this paper in Section 6 with a short summary and a few remarks on future work.

2. RELATED WORK

Given a set of objects with a distance function on them, an interesting data mining question is whether these objects naturally form groups (called clusters) and what these groups look like. Data mining algorithms that try to answer this question are called clustering algorithms. In Section 2.1, we briefly classify clustering algorithms according to different categorization schemes. Then, in Section 2.2, we present the basic concepts of fuzzy clustering algorithms and describe how the approach of this paper differs from the fuzzy clustering approaches presented in the literature. In Section 2.3, we present the density-based clustering algorithm DBSCAN at the level of detail needed to understand the remainder of this paper. As fuzzy objects can also be regarded as multi-represented objects, we finally present a density-based clustering approach which is suitable for clustering multi-represented objects.

2.1 Clustering Algorithms

Clustering algorithms can be classified along different, independent dimensions. One well-known dimension categorizes clustering methods according to the result they produce. Here, we can distinguish between hierarchical and partitioning clustering algorithms []. Partitioning algorithms construct a flat (single-level) partition of a database $D$ of $n$ objects into a set of $k$ clusters such that the objects in a cluster are more similar to each other than to objects in different clusters. Another dimension classifies clustering algorithms from an algorithmic point of view. Here, we can distinguish between optimization-based or distance-based algorithms and density-based algorithms. Density-based algorithms apply a local cluster criterion. Clusters are regarded as regions in the data space in which the objects are dense, and which are separated by regions of low object density (noise). In this paper, we present an extension of the partitioning density-based clustering algorithm DBSCAN []. For a more detailed general overview on clustering algorithms, we refer the interested reader to [].

2.2 Fuzzy Clustering

In real applications there is very often no sharp boundary between clusters, so that fuzzy clustering is often better suited for the data. Membership degrees between zero and one are used in fuzzy clustering instead of crisp assignments of the data to clusters. In contrast to fuzzy clustering algorithms, where objects are assigned to different clusters, in this paper we cluster fuzzy object representations and assign each fuzzy object to exactly one cluster. For more details about fuzzy clustering algorithms, we refer the reader to [].

2.3 Density-based Clustering

The key idea of density-based clustering is that for each object of a cluster the neighborhood of a given radius $\varepsilon$ has to contain at least a minimum number of $\mu$ objects, i.e. the cardinality of the neighborhood has to reach a given threshold. In the following, we present the basic definitions of density-based clustering.

Definition 1 (Core Object)
Object $o$ is called a core object w.r.t. $\varepsilon$ and $\mu$ in a set of objects $D$ if $|N_\varepsilon(o)| \geq \mu$, where $N_\varepsilon(o)$ denotes the subset of $D$ contained in the $\varepsilon$-neighborhood of $o$.

Definition 2 (Directly Density-Reachable)
Object $p$ is directly density-reachable from object $o$ w.r.t. $\varepsilon$ and $\mu$ in a set of objects $D$ if $o$ is a core object and $p \in N_\varepsilon(o)$, where again $N_\varepsilon(o)$ denotes the subset of $D$ contained in the $\varepsilon$-neighborhood of $o$. Note that objects can be directly density-reachable only from core objects.

DBSCAN. A flat density-based cluster is defined as a set of density-connected objects which is maximal w.r.t. density-reachability. The noise is then the set of objects not contained in any cluster. Thus, a cluster contains not only core objects but also objects that do not satisfy the core object condition. These border objects are directly density-reachable from at least one core object of the cluster.

The algorithm DBSCAN [], which discovers the clusters and the noise in a database, is based on the fact that a cluster is equivalent to the set of all objects in $D$ which are density-reachable from an arbitrary core object in the cluster (cf. the corresponding lemmas in []). The retrieval of density-reachable objects is performed by iteratively collecting directly density-reachable objects. DBSCAN checks the $\varepsilon$-neighborhood of each point in the database. If the $\varepsilon$-neighborhood $N_\varepsilon(o)$ of a point $o$ has at least $\mu$ elements, $o$ is a so-called core point, and a new cluster $C$ containing the objects in $N_\varepsilon(o)$ is created. Then, the $\varepsilon$-neighborhood of all points $p$ in $C$ which have not yet been processed is checked. If $N_\varepsilon(p)$ contains at least $\mu$ points, the neighbors of $p$ which are not already contained in $C$ are added to the cluster and their $\varepsilon$-neighborhood is checked in the next step. This procedure is repeated until no new point can be added to the current cluster $C$. Then the algorithm continues with a point which has not yet been processed, trying to expand a new cluster.

2.4 Clustering of Multi-Represented Objects

In many different application ranges several representations exist for each object, e.g. molecules are characterized by an amino acid sequence, a secondary structure and a 3D representation. Fuzzy objects (cf. the fuzzy object representation introduced in Section 3) can also be regarded as multi-represented objects. In [], a density-based approach for clustering such multi-represented objects was proposed which is based on DBSCAN. To determine a clustering which takes all representations into account, the basic definitions of DBSCAN, i.e. the core object definition and the reachability definition, are extended. Thereby, the $\varepsilon$-neighborhoods of each representation are combined into a global neighborhood. For sparse data sets the union method was proposed, which assumes that an object is a core object if $\mu$ objects are found within the union of all $\varepsilon$-neighborhoods of all representations. Furthermore, the intersection method was introduced, where an object is a core object if at least $\mu$ objects are within the intersection of all $\varepsilon$-neighborhoods of all representations. In our experimental evaluation, we use the approach presented in [] as comparison partner. As a side effect of this paper, it becomes clear that a slight adaptation of the FDBSCAN algorithm would be much more suitable for clustering multi-represented objects than the approaches introduced in [].
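The traditional DBSCAN procedure described in Section 2.3 can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the function names `region_query` and `dbscan`, the use of the Euclidean distance and the linear-scan neighborhood query are assumptions made for this example only. A point is treated as a core object if its neighborhood (including the point itself) contains at least `mu` objects.

```python
from math import dist

def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (idx included)."""
    return [j for j, q in enumerate(points) if dist(points[idx], q) <= eps]

def dbscan(points, eps, mu):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < mu:           # i is not a core object
            labels[i] = NOISE             # may still become a border object later
            continue
        cluster += 1                      # i is a core object: start a new cluster
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:        # border object reached from a core object
                labels[j] = cluster
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= mu:    # j is a core object: expand through it
                seeds.extend(j_neighbors)
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, mu=3)
```

In this toy data set the first three points and the next three points each form one cluster, and the isolated last point is reported as noise.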

3. FUZZ…

[…] For instance, an object is located somewhere within a moving micro-cluster represented by a rectangle, and in [] an object is located somewhere in a hypersphere. In [], 1-dimensional probability density functions (pdf) are used to describe attributes of uncertain sensor data. We extend this approach and propose to describe an object no longer by one single feature vector, but by a probability density function indicating the likelihood that an object is located at a certain position.

Definition 3 (Fuzzy Object Representation)
Let $o \in D \subseteq \mathbb{R}^d$ be an object from a database. A fuzzy object representation is a function $o_{fuzzy}: \mathbb{R}^d \to \mathbb{R}_0^+ \cup \{\infty\}$ for which the following condition holds:

$$\int_{\mathbb{R}^d} o_{fuzzy}(v)\, dv = 1$$

3.1 Distance Functions between Fuzzy Objects

Traditional data mining algorithms require distance functions which express the similarity between two objects by exactly one numerical value. In this section, we introduce distance functions which do not express the similarity between two objects by a single numerical value. Instead, we propose to use fuzzy distance functions, where the similarity between two fuzzy objects is expressed by means of a probability function which assigns a numerical value to each distance value. Two fuzzy distance functions are the distance density function and the distance distribution function.

Definition 4 (Distance Density Function)
Let $d: D \times D \to \mathbb{R}_0^+$ be a distance function, and let $P(a \leq d(o, o') \leq b)$ denote the probability that $d(o, o')$ is between $a$ and $b$. Then a probability density function $p_d: D \times D \to (\mathbb{R}_0^+ \to \mathbb{R}_0^+ \cup \{\infty\})$ is called a distance density function if the following condition holds:

$$P(a \leq d(o, o') \leq b) = \int_a^b p_d(o, o')(x)\, dx$$

If the distance $\tau = d(o, o')$ between two objects can be determined exactly, the probability density function $p_d$ is equal to the Dirac delta function $\delta$, i.e. $p_d(o, o')(x) = \delta(x - \tau)$. For arbitrary functions $f$, the Dirac delta function has the following important property []:

$$\int_a^b f(x)\, \delta(x - \tau)\, dx = \begin{cases} f(\tau) & \text{if } a \leq \tau \leq b \\ 0 & \text{otherwise} \end{cases}$$

Similar to distance density functions, we can define distance distribution functions.

Definition 5 (Distance Distribution Function)
Let $d: D \times D \to \mathbb{R}_0^+$ be a distance function, and let $P(d(o, o') \leq b)$ denote the probability that $d(o, o')$ is smaller than $b$. Then a probability distribution function $P_d: D \times D \to (\mathbb{R}_0^+ \to [0, 1])$ is called a distance distribution function if the following condition holds:

$$P_d(o, o')(b) = P(d(o, o') \leq b)$$

Let us note that $P_d(o, o')(b) = \int_{-\infty}^{b} p_d(o, o')(x)\, dx$ holds, and that $p_d$ and $P_d$ therefore contain basically the same information.

As already mentioned, traditional algorithms can only handle distance functions which yield a unique distance value. In order to make our fuzzy distance functions useful for standard clustering algorithms, we could extract an aggregated value from them. […] For instance, think of situations where the centroids are close to each other, but, due to a rather high fuzziness of the objects, the distance expectation values indicate a rather high distance between the objects. In this case, where it is not very likely that the objects form a cluster, the centroid approach would wrongly detect clusters and the expectation approach would correctly detect no clusters.

4. FDBSCAN

In this section, we describe our extended clustering algorithm FDBSCAN, which does not rely on lossy aggregated information but exploits the complete information provided by the fuzzy distance functions. We first present the formal definitions underlying the FDBSCAN algorithm (cf. Section 4.1), before we look at computational aspects (cf. Section 4.2).

4.1 Theoretical Foundations

The algorithm FDBSCAN is based on an enhanced version of the core object definition (cf. Definition 1). The core object probability of an object $o$ indicates the likelihood that $o$ is a core object.

Definition 7 (Core Object Probability)
Let $D$ be a database, and let $P_d: D \times D \to (\mathbb{R}_0^+ \to [0, 1])$ be a distance distribution function. Then the core object probability of an object $o$ is defined as:

$$P^{core}_{\varepsilon,\mu,d,D}(o) = \sum_{A \subseteq D,\ |A| \geq \mu}\ \prod_{p \in A} P_d(p, o)(\varepsilon) \cdot \prod_{p \in D \setminus A} \bigl(1 - P_d(p, o)(\varepsilon)\bigr)$$

Lemma 1. The core object probability $P^{core}_{\varepsilon,\mu,d,D}(o)$ is equal to the probability value $P(|N_\varepsilon(o)| \geq \mu)$ indicating the likelihood that $o$ is a core object.

Proof. In Definition 7, we determine for each subset $A$ of $D$ having a cardinality of at least $\mu$ the probability that only the points of $A$ are within an $\varepsilon$-range of $o$ but no points of $D \setminus A$. The sum of all these probability values indicates the probability that $o$ is a core object, i.e. $P^{core}_{\varepsilon,\mu,d,D}(o) = P(|N_\varepsilon(o)| \geq \mu)$.

Note that the traditional definition of a core object can also be regarded as a function which assigns to each object $o$ a value equal to 1 iff $o$ is a core object, and 0 otherwise. If the distance distribution function $P_d$ yields only the values 0 and 1 at position $\varepsilon$, the traditional and the probability definition of a core object coincide.

Figure 1 shows how our probability definition of a core object differs from the "traditional" approach where the similarity between fuzzy objects is measured by their distance expectation values. Although the object $o$ in Figure 1a does not seem to be located in a very dense area, it is a core object according to the traditional approach, as the distance expectation value between $o$ and $\mu$ other objects is smaller than $\varepsilon$. On the other hand, it is very unlikely that all $\mu$ objects are indeed located in $N_\varepsilon(o)$. Therefore, the probability that $o$ is a core object is very small.
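The core object probability of Definition 7 can be illustrated with a small sketch. The first function below follows the definition literally and sums over all subsets $A$ with $|A| \geq \mu$; the second one computes the same value by building the distribution of the neighbor count step by step. The second variant is a standard dynamic-programming shortcut that we add for illustration; it is not part of the paper. The names `core_probability_bruteforce` and `core_probability_dp` are hypothetical, and `probs` is assumed to hold the values $P_d(p, o)(\varepsilon)$ for all objects $p$ of $D$, treated as independent as in the product formula.

```python
from itertools import combinations
from math import prod

def core_probability_bruteforce(probs, mu):
    """Literal evaluation of the definition: sum over all subsets A with |A| >= mu."""
    n = len(probs)
    total = 0.0
    for size in range(mu, n + 1):
        for subset in combinations(range(n), size):
            inside = set(subset)
            # objects in A are within eps of o, all other objects are not
            total += prod(probs[i] if i in inside else 1.0 - probs[i]
                          for i in range(n))
    return total

def core_probability_dp(probs, mu):
    """Same value without enumerating subsets (assumed shortcut, O(n^2) time)."""
    count_dist = [1.0]  # count_dist[k] = probability of exactly k neighbors so far
    for p in probs:
        nxt = [0.0] * (len(count_dist) + 1)
        for k, mass in enumerate(count_dist):
            nxt[k] += mass * (1.0 - p)   # object is outside the eps-range
            nxt[k + 1] += mass * p       # object is inside the eps-range
        count_dist = nxt
    return sum(count_dist[mu:])

probs = [0.9, 0.5, 0.2]
p_brute = core_probability_bruteforce(probs, mu=2)
p_dp = core_probability_dp(probs, mu=2)
```

For crisp inputs, i.e. if `probs` contains only the values 0 and 1, both functions return 1 exactly when at least `mu` objects lie in the neighborhood, which mirrors the remark that the traditional and the probability definition of a core object coincide in that case.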

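The difference between the distance distribution function of Section 3.1 and a single aggregated distance value can be made concrete with sampled objects. The sketch below assumes that each fuzzy object is given by a list of equally weighted sample points and that all sample pairs are considered; the function names `distance_distribution` and `distance_expectation` and the Euclidean distance are choices made for this example.

```python
from math import dist

def distance_distribution(samples_o, samples_x, b):
    """Empirical P_d(o, x)(b): fraction of sample pairs with distance <= b."""
    hits = sum(1 for oi in samples_o for xj in samples_x if dist(oi, xj) <= b)
    return hits / (len(samples_o) * len(samples_x))

def distance_expectation(samples_o, samples_x):
    """Aggregated value: mean distance over all sample pairs."""
    pairs = [dist(oi, xj) for oi in samples_o for xj in samples_x]
    return sum(pairs) / len(pairs)

o = [(0.0, 0.0), (0.0, 0.0)]
x = [(0.5, 0.0), (3.5, 0.0)]
p_within = distance_distribution(o, x, 1.0)
expected = distance_expectation(o, x)
```

With $\varepsilon = 1$, the distance expectation value of 2 suggests that `x` is not a neighbor of `o`, although half of the sample pairs lie within the $\varepsilon$-range. This is the kind of information that is lost by aggregation.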

In Figure 1b, the reverse situation is sketched. Object $o$ is located in a very dense area, but there do not exist $\mu$ objects $p$ for which $E(d(o, p)) \leq \varepsilon$ holds. Therefore, $o$ is no core object according to the traditional approach, although it is very likely that there exist $\mu$ elements $p$ for which $d(o, p) \leq \varepsilon$ holds. Note that clustering on the centroids of the fuzzy object representations would suffer from the same drawbacks as the approach based on the distance expectation values.

[Figure 1: Determination of the core-point property.
a) Object $o$ is a core object according to the traditional approach based on the distance expectation values $E_d$, but very unlikely a core object according to the probability approach of Definition 7.
b) Object $o$ is no core object according to the traditional approach based on the distance expectation values $E_d$, but very likely a core object according to the probability approach of Definition 7.]

Based on the core object probability definition, we can define how likely it is that an object $p$ is directly density-reachable from an object $o$. In the traditional approach two conditions have to hold. First, $o$ has to be a core object and, second, the distance between $p$ and $o$ has to be smaller than $\varepsilon$. In the context of this paper, both of these conditions are fuzzy, holding only with a certain probability.

Definition 8 (Reachability Probability)
Let $D$ be a database, and let $P_d: D \times D \to (\mathbb{R}_0^+ \to [0, 1])$ be a distance distribution function. Then the reachability probability of $p$ w.r.t. $o$ is defined as follows:

$$P^{reach}_{\varepsilon,\mu,d,D}(p, o) = P^{core}_{\varepsilon,\mu-1,d,D \setminus \{p\}}(o) \cdot P_d(p, o)(\varepsilon)$$

Lemma 2. $P^{reach}_{\varepsilon,\mu,d,D}(p, o)$ reflects the probability that $p$ is directly density-reachable from $o$.

Proof. According to Lemma 1, the probability that at least $\mu - 1$ objects from $D \setminus \{p\}$ are located in $N_\varepsilon(o)$ is equal to $P^{core}_{\varepsilon,\mu-1,d,D \setminus \{p\}}(o)$. Second, the probability that the distance between $p$ and $o$ is smaller than $\varepsilon$ is equal to $P_d(p, o)(\varepsilon)$. As these two conditions are independent of each other, their product corresponds to the probability that at least $\mu$ objects from $D$ are located in $N_\varepsilon(o)$ and that $p$ is one of them. Note that this value reflects the probability that $p$ is directly density-reachable from $o$.

Definition 8 can be regarded as an extension of the traditional approach. It coincides with the traditional approach if we assume that the core object probability is always 0 or 1 and the distance distribution function $P_d$ yields only the values 0 and 1 at position $\varepsilon$.

4.2 Computational Aspects

The traditional DBSCAN algorithm clusters a data set by always adding objects to the current cluster which are directly density-reachable from the current query object $o$. The fuzzy version FDBSCAN works very similarly to the traditional approach. An object $p$ is added to the current cluster if the value $P^{reach}_{\varepsilon,\mu,d,D}(p, o)$ exceeds a fixed probability threshold, where $o$ is the current query object. Note that if $P^{core}_{\varepsilon,\mu-1,d,D \setminus \{p\}}(o)$ is below this threshold, the value $P^{reach}_{\varepsilon,\mu,d,D}(p, o)$ cannot exceed it for any object $p$. Therefore, $p$ will not be added to the current cluster. Again, this is a generalization of the traditional approach.

The remaining question is how to compute the values $P^{reach}_{\varepsilon,\mu,d,D}(p, o)$ efficiently. Although there might exist situations where we can compute these values directly based on the fuzzy object representations (cf. Definition 3), in this paper we propose a generally applicable approach based on Monte Carlo sampling. In many applications the fuzzy objects might already be described by a discrete probability density function, i.e. we have the sample set already. If the fuzzy object is described by a continuous probability density function, we can easily sample according to this function and thus derive a sequence of samples. In the following, we assume that each object $x$ is represented by a sequence of $s$ sample points, i.e. $x$ is represented by $s$ different representations $\langle x_1, \ldots, x_s \rangle$.

Based on the sample sequences, we could now compute discrete distance density functions consisting of $s^2$ discrete distance values. Based on these functions, we could then compute the reachability probabilities according to Definition 8. The big problem is that we have to compute for each query object $o$ $O(|DB|)$ many different core object versions, always leaving out one element from the database. Furthermore, the computation of each of these core object values has to consider exponentially many (in $|DB|$) sets $A \subseteq DB$ (cf. Definition 7). Obviously, this is impracticable.

The idea of our approach is to determine the core object probabilities based on meaningful samples. Then, we compute the reachability values according to Definition 8.

We first compute for all objects $x$ the minimum bounding rectangle $MBR(x)$ of the sample points $\langle x_1, \ldots, x_s \rangle$ (cf. Figure 2). If we now carry out a range query around $o$, we create a sample matrix $M(o)$ which contains for each object instance $o_i$ $s$ different values $m_{i,j} = |N_{\varepsilon,D_j}(o_i)|$, where $D_j$ denotes the $j$-th database instance $\{x_j \mid \langle x_1, \ldots, x_j, \ldots, x_s \rangle \in D \wedge x_j \neq o_j\} \cup \{o_i\}$ and $N_{\varepsilon,D_j}(o_i)$ denotes the set $\{x_j \mid d(o_i, x_j) \leq \varepsilon \wedge x_j \in D_j\}$ (cf. Figure 2). We test for each object $x$ in the database whether there exist sample instances $x_j$ for which $d(o_i, x_j) \leq \varepsilon$ holds. If this is true, we increase the current value of $m_{i,j}$.

Note that often we do not have to compute the $s^2$ distances $d(o_i, x_j)$; instead, we can decide based on the boxes $MBR(o)$ and $MBR(x)$ whether we have to increase all values of the sample matrix $M(o)$ or none of them:

• If $d_{max}(o, x) \leq \varepsilon$ holds for the maximum distance $d_{max}(o, x)$ between the two boxes $MBR(o)$ and $MBR(x)$, we can increase all values of the sample matrix $M(o)$ by 1 (cf. object c in Figure 2).
• If $d_{min}(o, x) \geq \varepsilon$ holds for the minimum distance $d_{min}(o, x)$ between the two boxes, we do not have to increase any of these values (cf. object d in Figure 2).
• Only if the value of $\varepsilon$ is somewhere in between the two values $d_{min}(o, x)$ and $d_{max}(o, x)$ do we have to compute the distances between the samples to decide which values $m_{i,j}$ of the sample matrix have to be increased (cf. object a in Figure 2).

Finally, we would like to mention that we can compute this sample matrix by only one range scan.
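The pruning test on the minimum bounding rectangles described in Section 4.2 can be sketched as follows. The helper names `mbr`, `min_dist`, `max_dist` and `prune` are hypothetical, and the Euclidean distance between axis-parallel boxes is an assumption of this example. The sketch uses a strict comparison for the lower bound, so that a sample pair at distance exactly $\varepsilon$ is still examined.

```python
from math import sqrt

def mbr(samples):
    """Minimum bounding rectangle of the sample points as (lower, upper) corners."""
    dims = range(len(samples[0]))
    lo = tuple(min(s[k] for s in samples) for k in dims)
    hi = tuple(max(s[k] for s in samples) for k in dims)
    return lo, hi

def min_dist(box_a, box_b):
    """Smallest possible distance between a point of box_a and a point of box_b."""
    (alo, ahi), (blo, bhi) = box_a, box_b
    gaps = (max(bl - ah, al - bh, 0.0) for al, ah, bl, bh in zip(alo, ahi, blo, bhi))
    return sqrt(sum(g * g for g in gaps))

def max_dist(box_a, box_b):
    """Largest possible distance between a point of box_a and a point of box_b."""
    (alo, ahi), (blo, bhi) = box_a, box_b
    spans = (max(ah - bl, bh - al) for al, ah, bl, bh in zip(alo, ahi, blo, bhi))
    return sqrt(sum(s * s for s in spans))

def prune(box_o, box_x, eps):
    """Decide how the sample matrix of o is affected by object x."""
    if max_dist(box_o, box_x) <= eps:
        return "all"      # every sample pair is within eps: increase all entries
    if min_dist(box_o, box_x) > eps:
        return "none"     # no sample pair can be within eps: increase nothing
    return "refine"       # eps lies in between: compute the sample distances

box_o = mbr([(0.0, 0.0), (1.0, 1.0)])
box_x = mbr([(2.0, 0.0), (3.0, 1.0)])
```

Only the "refine" case requires the exact sample distances, which is why the filter is most effective when the boxes are small compared to $\varepsilon$.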

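A sample-based estimate of the reachability probability of Definition 8 can be sketched along the lines of the sample matrix $M(o)$ of Section 4.2. The sketch below is an illustration under several assumptions: every object has the same number $s$ of sample points, each instance $o_i$ counts itself as a neighbor, the estimates are normalized by the $s \cdot s$ entries of the matrix, and no MBR pruning is applied. The names `sample_matrix` and `reachability` are hypothetical.

```python
from math import dist

def sample_matrix(o_samples, others, eps):
    """m[i][j]: size of the eps-neighborhood of instance o_i in database instance j."""
    s = len(o_samples)
    m = [[1] * s for _ in range(s)]          # each instance o_i counts itself
    for x_samples in others:
        for i, oi in enumerate(o_samples):
            for j, xj in enumerate(x_samples):
                if dist(oi, xj) <= eps:
                    m[i][j] += 1
    return m

def reachability(o_samples, others, x_index, eps, mu):
    """Estimate of P_reach(x, o) as the product of the two sampled probabilities."""
    s = len(o_samples)
    m = sample_matrix(o_samples, others, eps)
    x_samples = others[x_index]
    close = [[dist(oi, xj) <= eps for xj in x_samples] for oi in o_samples]
    # core probability of o w.r.t. mu - 1 in the database without x
    core_hits = sum(1 for i in range(s) for j in range(s)
                    if m[i][j] - close[i][j] >= mu - 1)
    p_core = core_hits / (s * s)
    # probability that the distance between x and o is at most eps
    p_dist = sum(map(sum, close)) / (s * s)
    return p_core * p_dist

o_samples = [(0.0, 0.0), (0.0, 0.0)]
others = [[(0.5, 0.0), (3.0, 0.0)],          # object x0: one near, one far sample
          [(0.2, 0.0), (0.3, 0.0)]]          # object x1: both samples near
m = sample_matrix(o_samples, others, eps=1.0)
```

In this toy example, `x1` is reachable from `o` with an estimated probability of 1 for `mu = 2`, whereas `x0` reaches only 0.5 because one of its two samples lies outside the $\varepsilon$-range.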

[Figure 2: Computation of fuzzy reachability distances. The figure shows the instances of a query object $o$ with $MBR(o)$ and its $\varepsilon$-range, the database objects a, b, c and d with the pruning distances $d_{min}(o, a)$ and $d_{max}(o, a)$, the resulting sample matrix of $o$, and example computations of $P^{reach}_{\varepsilon,\mu,d,D}(a, o)$, $P^{reach}_{\varepsilon,\mu,d,D}(b, o)$ and $P^{reach}_{\varepsilon,\mu,d,D}(c, o)$. The legend distinguishes objects for which the exact samples are relevant from objects for which they are not relevant.]

After having computed the sample matrix $M(o)$, we can easily derive the reachability values for all objects $x$ in the database $D$ w.r.t. $o$. To this end, $P^{core}_{\varepsilon,\mu-1,d,D \setminus \{x\}}(o) \cdot P_d(x, o)(\varepsilon)$ (cf. Definition 8) has to be computed. The first part can be derived from the matrix $M(o)$ if we decrease by 1 the values $m_{i,j}$ for which $d(o_i, x_j) \leq \varepsilon$ holds. Then, we can count the number of elements in the sample matrix $M(o)$ which contain values higher than or equal to $\mu - 1$. Normalizing this number by $s^2$ yields the probability $P^{core}_{\varepsilon,\mu-1,d,D \setminus \{x\}}(o)$. The values $P_d(x, o)(\varepsilon)$ can be computed by counting the number of events $d(o_i, x_j) \leq \varepsilon$ and by normalizing this number again by $s^2$. For the computation of $P_d(x, o)(\varepsilon)$, the distances $d_{max}(o, x)$ and $d_{min}(o, x)$ can again be used for pruning.

If we assume $n$ database objects which are not stored in any index structure, and a sample rate of $s$, we can summarize the characteristics of the FDBSCAN implementation as follows:

• We need $O(n)$ range scans.
• We require between $O(n^2)$ and $O(s^2 \cdot n^2)$ many distance computations between $d$-dimensional feature vectors.

Note that especially in the important case where the fuzzy objects are not too fuzzy, we only need around $O(n^2)$ many distance calculations. In this case, the introduced pruning distances $d_{min}$ and $d_{max}$ are very effective as they are rather close to each other. Therefore, it is very unlikely that the $\varepsilon$-value is in between them. Furthermore, in this case it is beneficial to organize the minimum bounding rectangles of the sample sets in R-tree [] like index structures. As we can typically use rather small $\varepsilon$-values for DBSCAN, the number of distance computations can thus further be reduced to $O(n \cdot \log n)$, which corresponds to the time complexity of the original DBSCAN algorithm based on index structures.

5. EXPERIMENTAL EVALUATION

In this section, we present the experimental results of the introduced clustering algorithm FDBSCAN, demonstrating the characteristics and benefits of our approach.

5.1 Setup

Data Sets. All experiments were based on two different test data sets, an artificial data set and an engineering data set, which are normalized in a data space $[0, 1]^d$. For each data set we have exact object representations, i.e. an object is described by exactly one feature vector. Furthermore, each object is randomly surrounded by a box having a side length of $\rho$ in each dimension. For our fuzzy clustering approaches, we assume that each position within the box is equally likely.

The artificial data set (ART) consists of … …-dimensional objects which are normally distributed in the data space. The engineering data set (PLANE) consists of … 3D CAD objects provided by our industrial partner, an American airplane manufacturer. Each object is represented by a …-dimensional feature vector which is derived from the cover sequence model as described in [].

Implementation. For clustering the fuzzy object representations, we have implemented the algorithm FDBSCAN as described in Section 4. Furthermore, we implemented the two approaches UNION and INTERSECTION as described in [], the standard DBSCAN approach which carries out the fuzzy clustering based on the distance expectation values (referred to as EXP-DBSCAN), and the clustering on the exact object representations. All algorithms were implemented in Java. The experiments were run on a Windows laptop with a … MHz processor and … MB main memory. If not otherwise stated, we used a sample rate of $s$ = ….

Quality Measures. For comparing a given reference clustering to the clusterings resulting from clustering the fuzzy object representations, we used the approximating quality measure introduced in []. In [], a quality measure for clusters based on the symmetric set difference was introduced and, based on this distance measure between clusters, a quality criterion for approximated partitioning clusterings, QAPC, was introduced. This quality measure is based on the minimum weight perfect matching of sets.

Parameters. In all our tests, we set $\mu$ = … and used an $\varepsilon$-parameter for the various DBSCAN implementations such that between … and … clusters and between … and … noise objects were created for the reference clustering.

5.2 Experimental Results

Efficiency. First, we investigate the runtimes of our fuzzy DBSCAN clustering approaches. The following table depicts the absolute runtimes in seconds for the ART data set ($\rho$ = …, $s$ = …):

FDBSCAN    EXP-DBSCAN    UNION    INTERSECTION
…          …             …        …

The good performance of the FDBSCAN approach demonstrates the suitability of the filters introduced in Section 4.2, resulting in only $O(n^2)$, and not $O(s^2 \cdot n^2)$, many distance computations. Furthermore, we can see that all other fuzzy clustering approaches are slower, which can be explained by the higher number of distance computations which have to be carried out, i.e. UNION and INTERSECTION require $O(s \cdot n^2)$ many distance computations and EXP-DBSCAN requires $O(s^2 \cdot n^2)$ many distance computations.

In the following, we will show that from an effectivity point of view our approach also outperforms the chosen comparison partners by far.

Effectivity. In a first set of experiments, we investigated the qualities of the different fuzzy clustering approaches w.r.t. a given reference clustering. Figure 3 shows for the ART and for the PLANE data set that for all fuzzy clustering algorithms the quality decreases with an increasing value of $\rho$, i.e. an increasing uncertainty area of the objects.

Furthermore, we can see that the FDBSCAN algorithm is for both data sets the most effective clustering algorithm over the complete range of $\rho$. For the ART data set, EXP-DBSCAN also performs quite well, but for higher-dimensional data, e.g. for the PLANE data set, its quality is much worse than the quality of the FDBSCAN approach. On the other hand, the UNION approach performs well for sparse high-dimensional data, but unfortunately not for the low-dimensional ART data set.

[Figure 3: Quality of fuzzy DBSCAN clustering algorithms. Panels: a) ART, b) PLANE. x-axis: $\rho$ [1/1000]; y-axis: quality QAPC [%]; curves: FDBSCAN, EXP-DBSCAN, UNION, INTERSECTION.]

An explanation for the superiority of the FDBSCAN algorithm can be found in Figure 4. In this figure, we investigate the accuracy of the core-point classification of the different algorithms. (In the FDBSCAN approach, we classified an object $o$ as core object iff $P^{core}_{\varepsilon,\mu,d,D}(o)$ reaches a fixed probability threshold.) For the EXP-DBSCAN approach, the precision of the detected core objects is very high but unfortunately the recall is very low, i.e. the approach fails to detect many core objects. Thus, we very often have the situation depicted in Figure 1b. Similar observations, but much more pronounced, can be made for the INTERSECTION approach. For the UNION approach, the opposite observation can be made. The precision of this approach is very low, as the UNION approach classifies far too many objects as core objects which actually are no core objects. Thus, we often have situations similar to the one depicted in Figure 1a. On the other hand, for the FDBSCAN approach both precision and recall are rather high.

[Figure 4: Core-object classification of fuzzy DBSCAN clustering algorithms ($\rho$ = 0.01). Panels: a) ART, b) PLANE. y-axis: Precision / Recall [%]; bars: recall and precision for FDBSCAN, EXP-DBSCAN, UNION, INTERSECTION.]

Furthermore, we would like to mention that our FDBSCAN algorithm outperforms the server-sided clustering approaches of state-of-the-art density-based distributed clustering algorithms []. The server-sided approach presented in [] corresponds to an FDBSCAN approach using a sample rate of …. We noticed that if we use sample rates around … instead of …, we can increase the average quality values considerably, e.g. for $\rho$ = … on the PLANE data set we can increase the quality from … ($s$ = …) to … ($s$ = …). Furthermore, small sample rates always bear the risk of extreme values, e.g. for $\rho$ = … on the PLANE data set we noticed quality values less than … when using a sample rate of …. As sample rates higher than … do not pay off much, we suggest using a sample rate of … to get a good trade-off between accuracy and efficiency. Furthermore, the server-sided approach presented in [] corresponds to the EXP-DBSCAN approach. Figures 3 and 4 show that the FDBSCAN approach also clearly outperforms this comparison partner.

6. CONCLUSION

In this paper, we demonstrated how density-based clustering can be carried out based on vague and uncertain information, which often occurs in modern application ranges like sensor databases, spatial-temporal applications and biometric information systems. Besides laying the theoretical foundations for density-based clustering of fuzzy data, we showed how to put these concepts into practice. The resulting partitioning density-based algorithm FDBSCAN can be used to cluster fuzzy data, e.g. moving objects, effectively and efficiently. The algorithm follows the new paradigm of integrating fuzzy distance functions directly into data mining algorithms instead of working on lossy aggregated information. In our experimental evaluation, we demonstrated that the newly introduced clustering algorithm FDBSCAN achieves much more accurate results than state-of-the-art comparison partners without sacrificing efficiency.

In our future work, we will show that other data mining algorithms working on vague information can also benefit from a direct integration of fuzzy distance functions.

7. REFERENCES

Bracewell R.: The Impulse Symbol. In: The Fourier Transform and Its Applications, 3rd ed., McGraw-Hill.
Cheng R., Kalashnikov D. V., Prabhakar S.: Evaluating probabilistic queries over imprecise data. SIGMOD.
Ester M., Kriegel H.-P., Sander J., Xu X.: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD.
Guttman A.: R-trees: A Dynamic Index Structure for Spatial Searching. Proc. SIGMOD.
Höppner F., Klawonn F., Kruse R., Runkler T.: Fuzzy Cluster Analysis. Wiley.
Januzaj E., Kriegel H.-P., Pfeifle M.: Scalable Density-Based Distributed Clustering. PKDD.
Jain A. K., Murty M. N., Flynn P. J.: Data Clustering: A Review. ACM Computing Surveys.
Kriegel H.-P., Brecheisen S., Kröger P., Pfeifle M., Schubert M.: Using Sets of Feature Vectors for Similarity Search on Voxelized CAD Objects. SIGMOD.
Kriegel H.-P., Kunath P., Pfeifle M., Renz M.: Approximated Clustering of Distributed High Dimensional Data. PAKDD.
Kriegel H.-P., Kailing K., Pryakin A., Schubert M.: Clustering Multi-Represented Objects with Noise. PAKDD.
Kriegel H.-P., Pfeifle M.: Measuring the Quality of Approximated Clusterings. BTW.
Li