Research Track Poster
Density-Based Clustering of Uncertain Data
Hans-Peter Kriegel, Martin Pfeifle
University of Munich, Germany, Institute for Computer Science
ABSTRACT
In many different application areas, e.g. sensor databases, location-based services or face recognition systems, distances between objects have to be computed based on vague and uncertain data. Commonly, the distances between these uncertain object descriptions are expressed by one numerical distance value. Based on such single-valued distance functions, standard data mining algorithms can work without any changes. In this paper, we propose to express the similarity between two fuzzy objects by distance probability functions. These fuzzy distance functions assign a probability value to each possible distance value. By integrating these fuzzy distance functions directly into data mining algorithms, the full information provided by these functions is exploited. In order to demonstrate the benefits of this general approach, we enhance the density-based clustering algorithm DBSCAN so that it can work directly on these fuzzy distance functions. In a detailed experimental evaluation based on artificial and real-world data sets, we show the characteristics and benefits of our new approach.

Categories and Subject Descriptors
G.… [Probability and Statistics]: Probabilistic algorithms (including Monte Carlo)

General Terms
Algorithms

Keywords
density-based clustering, uncertain data, fuzzy distance functions

INTRODUCTION
In many modern application ranges, e.g. the clustering of moving objects [] or sensor databases [], only uncertain data is available. For instance, in the area of mobile services, the objects continuously change their positions so that exact positional information is often not available. In other application areas, such as the clustering of distributed feature vectors [], only approximated information is transmitted to a central server site, due to security aspects or to limited bandwidth.

In order to extract knowledge from these fuzzy object descriptions by means of standard data mining algorithms, the similarity between the objects has to be measured by one numerical value, i.e. the complete fuzzy distance information is aggregated into only one distance value. Obviously, aggregation goes hand in hand with information loss. For instance, we have no information about the degree of uncertainty of such a single distance value. Even if we had one, it would be of no use, because traditional data mining algorithms, e.g. clustering algorithms, cannot handle this additional information.

In this paper, we propose to use fuzzy distance functions to measure the similarity between fuzzy objects. Contrary to the traditional approaches, we do not extract aggregated values from the fuzzy distance functions but propose to enhance the data mining algorithms so that they can exploit the full information provided by these functions. As density-based clustering algorithms seem to be the method of choice [] for many important application ranges where fuzzy distance functions naturally occur, e.g. the clustering of moving objects, we demonstrate in this paper how fuzzy distance functions can be integrated into the density-based clustering algorithm DBSCAN []. We call the resulting clustering algorithm FDBSCAN, indicating that it is applicable to cluster fuzzy objects.

The remainder of this paper is organized as follows. We first present the related work in the area of density-based clustering of uncertain data. We then introduce fuzzy distance functions and show how we can integrate these functions into the density-based clustering algorithm DBSCAN. Next, we experimentally show the benefit of our new fuzzy clustering algorithm FDBSCAN. We close this paper with a short summary and a few remarks on future work.
RELATED WORK
Given a set of objects with a distance function on them, an interesting data mining question is whether these objects naturally form groups (called clusters) and what these groups look like. Data mining algorithms that try to answer this question are called clustering algorithms. In this section, we first briefly classify clustering algorithms according to different categorization schemes. Then, we present the basic concepts of fuzzy clustering algorithms and describe how the approach of this paper differs from the fuzzy clustering approaches presented in the literature. Next, we present the density-based clustering algorithm DBSCAN in the level of detail that is indispensable to understand the remainder of this paper. As fuzzy objects can also be regarded as multi-represented objects, we finally present a density-based clustering approach which is suitable for clustering multi-represented objects.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
KDD'…, August …, Chicago, Illinois, USA. Copyright … ACM …
Clustering Algorithms
Clustering algorithms can be classified along different, independent dimensions. One well-known dimension categorizes clustering methods according to the result they produce. Here, we can distinguish between hierarchical and partitioning clustering algorithms []. Partitioning algorithms construct a flat (single-level) partition of a database D of n objects into a set of k clusters such that the objects in a cluster are more similar to each other than to objects in different clusters. Another dimension along which we can classify clustering algorithms is the algorithmic point of view. Here we can distinguish between optimization-based or distance-based algorithms and density-based algorithms. Density-based algorithms apply a local cluster criterion. Clusters are regarded as regions in the data space in which the objects are dense, and which are separated by regions of low object density (noise). In this paper, we present an extension of the partitioning density-based clustering algorithm DBSCAN []. For a more detailed general overview on clustering algorithms, we refer the interested reader to [].

Fuzzy Clustering
In real applications there is very often no sharp boundary between clusters, so that fuzzy clustering is often better suited for the data. Membership degrees between zero and one are used in fuzzy clustering instead of crisp assignments of the data to clusters. In contrast to fuzzy clustering algorithms, where objects are assigned to different clusters, in this paper we cluster fuzzy object representations and assign each fuzzy object to exactly one cluster. For more details about fuzzy clustering algorithms, we refer the reader to [].

Density-based Clustering
The key idea of density-based clustering is that for each object of a cluster the neighborhood of a given radius $\varepsilon$ has to contain at least a minimum number of $\mu$ objects, i.e. the cardinality of the neighborhood has to reach a given threshold. In the following, we present the basic definitions of density-based clustering.

Definition … (Core Object)
Object o is called a core object w.r.t. $\varepsilon$ and $\mu$ in a set of objects D if $|N_\varepsilon(o)| \ge \mu$, where $N_\varepsilon(o)$ denotes the subset of D contained in the $\varepsilon$-neighborhood of o.

Definition … (Directly Density-Reachable)
Object p is directly density-reachable from object o w.r.t. $\varepsilon$ and $\mu$ in a set of objects D if o is a core object and $p \in N_\varepsilon(o)$, where again $N_\varepsilon(o)$ denotes the subset of D contained in the $\varepsilon$-neighborhood of o. Note that objects can be directly density-reachable only from core objects.

DBSCAN. A flat density-based cluster is defined as a set of density-connected objects which is maximal w.r.t. density-reachability. The noise is then the set of objects not contained in any cluster. Thus, a cluster contains not only core objects but also objects that do not satisfy the core object condition. These border objects are directly density-reachable from at least one core object of the cluster.

The algorithm DBSCAN [], which discovers the clusters and the noise in a database, is based on the fact that a cluster is equivalent to the set of all objects in D which are density-reachable from an arbitrary core object in the cluster (cf. lemma … and … in []). The retrieval of density-reachable objects is performed by iteratively collecting directly density-reachable objects. DBSCAN checks the $\varepsilon$-neighborhood of each point in the database. If the $\varepsilon$-neighborhood $N_\varepsilon(o)$ of a point o has at least $\mu$ elements, o is a so-called core point, and a new cluster C containing the objects in $N_\varepsilon(o)$ is created. Then, the $\varepsilon$-neighborhood of all points p in C which have not yet been processed is checked. If $N_\varepsilon(p)$ contains at least $\mu$ points, the neighbors of p which are not already contained in C are added to the cluster and their $\varepsilon$-neighborhood is checked in the next step. This procedure is repeated until no new point can be added to the current cluster C. Then the algorithm continues with a point which has not yet been processed, trying to expand a new cluster.

Clustering of Multi-Represented Objects
In many different application ranges, several representations exist for each object, e.g. molecules are characterized by an amino acid sequence, a secondary structure and a 3D representation. Fuzzy objects (cf. Definition …) can also be regarded as multi-represented objects. In [], a density-based approach for clustering such multi-represented objects was proposed which is based on DBSCAN. To determine a clustering which takes all representations into account, the basic definitions of DBSCAN, i.e. the core-object definition and the reachability definition, are extended. Thereby, the $\varepsilon$-neighborhoods of each representation are combined into a global neighborhood. For sparse data sets, the union method was proposed, which assumes that an object is a core object if $\mu$ objects are found within the union of all $\varepsilon$-neighborhoods of all representations. Furthermore, the intersection method was introduced, where an object is a core object if at least $\mu$ objects are within the intersection of all $\varepsilon$-neighborhoods of all representations. In our experimental evaluation, we use the approach presented in [] as comparison partner. As a side effect of this paper, it becomes clear that a slight adaptation of the FDBSCAN algorithm would be much more suitable for clustering multi-represented objects than the approaches introduced in [].
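The DBSCAN expansion procedure described in this section (check the neighborhood of an unprocessed point, open a cluster at a core point, then keep collecting directly density-reachable objects) can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: the function name `dbscan`, the Euclidean distance, the brute-force neighborhood search and the label `-1` for noise are assumptions made here, and the core-object test uses $|N_\varepsilon(o)| \ge \mu$.

```python
from math import dist

NOISE = -1  # hypothetical label for objects contained in no cluster

def dbscan(points, eps, mu):
    """Assign a cluster id to every point; NOISE marks objects in no cluster."""
    labels = [None] * len(points)          # None = not yet processed

    def neighborhood(i):
        # Brute-force eps-neighborhood; it contains the point itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighborhood(i)
        if len(seeds) < mu:                # i is not a core object
            labels[i] = NOISE              # may still turn into a border object
            continue
        labels[i] = cluster_id             # open a new cluster at core object i
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:         # border object of the current cluster
                labels[j] = cluster_id
                continue
            if labels[j] is not None:      # already assigned to a cluster
                continue
            labels[j] = cluster_id
            nj = neighborhood(j)
            if len(nj) >= mu:              # j is a core object: expand further
                queue.extend(nj)
        cluster_id += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, mu=3)
```

In this toy data set the two groups of four points form two clusters and the isolated point is labelled as noise.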
FUZZ…] For instance, think of situations where the centroids are close to each other but, due to a rather high fuzziness of the objects, the distance expectation values indicate a rather high distance between the objects. In this case, where it is not very likely that the objects form a cluster, the centroid approach would wrongly detect clusters and the expectation approach would correctly detect no clusters.
stance, an object is located somewhere within a moving micro-cluster represented by a rectangle, and in [] an object is located somewhere in a hypersphere. In [], …-dimensional probability density functions (pdf) are used to describe attributes of uncertain sensor data. We extend this approach and propose to describe an object no longer by one single feature vector, but by a probability density function indicating the likelihood that an object is located at a certain position.

Definition … (Fuzzy Object Representation)
Let $o \in D \subseteq \mathbb{R}^d$ be an object from a database. A fuzzy object representation is a function $o_{fuzzy}: \mathbb{R}^d \to \mathbb{R}$ for which the following condition holds:

$$\int_{\mathbb{R}^d} o_{fuzzy}(v)\,dv = 1$$

FDBSCAN
In this section, we describe our extended clustering algorithm FDBSCAN, which does not rely on lossy aggregated information but exploits the complete information provided by the fuzzy distance functions. We first present the formal definitions underlying the FDBSCAN algorithm (cf. "Theoretical Foundations") before we look at computational aspects (cf. "Computational Aspects").

Distance Functions between Fuzzy Objects
Traditional data mining algorithms require distance functions which express the similarity between two objects by exactly one numerical value. In this section, we introduce distance functions which do not express the similarity between two objects by a single numerical value. Instead, we propose to use fuzzy distance functions, where the similarity between two fuzzy objects is expressed by means of a probability function which assigns a numerical value to each distance value. Two fuzzy distance functions are the distance density function and the distance distribution function.
Definition … (Distance Density Function)
Let $d: D \times D \to \mathbb{R}$ be a distance function, and let $P(a \le d(o,o') \le b)$ denote the probability that $d(o,o')$ is between a and b. Then a probability density function $p_d: D \times D \to (\mathbb{R} \to \mathbb{R})$ is called a distance density function if the following condition holds:

$$P(a \le d(o,o') \le b) = \int_a^b p_d(o,o')(x)\,dx$$

If the distance $\tau = d(o,o')$ between two objects can be determined exactly, the probability density function $p_d$ is equal to the Dirac delta function $\delta$, i.e. $p_d(o,o')(x) = \delta(x - \tau)$. For arbitrary functions f, e.g. f(x) = …, the Dirac delta function has the following important property []:

$$\int_a^b f(x)\,\delta(x - \tau)\,dx = \begin{cases} f(\tau) & \text{if } a \le \tau \le b \\ 0 & \text{otherwise} \end{cases}$$

Theoretical Foundations
The algorithm FDBSCAN is based on an enhanced version of the core-object definition (cf. Definition …). The core-object probability of an object o indicates the likelihood that o is a core object.

Definition 7 (Core Object Probability)
Let D be a database, and let $P_d: D \times D \to (\mathbb{R} \to [0,1])$ be a distance distribution function. Then the core-object probability of an object o is defined as:

$$P^{core}_{\varepsilon,\mu,d,D}(o) = \sum_{\substack{A \subseteq D \\ |A| \ge \mu}} \; \prod_{p \in A} P_d(p,o)(\varepsilon) \cdot \prod_{p \in D \setminus A} \bigl(1 - P_d(p,o)(\varepsilon)\bigr)$$

Lemma …. The core-object probability $P^{core}_{\varepsilon,\mu,d,D}(o)$ is equal to the probability value $P(|N_\varepsilon(o)| \ge \mu)$ indicating the likelihood that o is a core object.

Proof. In Definition 7, we determine for each subset A of D having a cardinality of at least $\mu$ the probability that only the points of A are within an $\varepsilon$-range of o, but no points of $D \setminus A$. The sum of all these probability values indicates the probability that o is a core object, i.e. $P^{core}_{\varepsilon,\mu,d,D}(o) = P(|N_\varepsilon(o)| \ge \mu)$.
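To make the core-object probability concrete, the following Python sketch evaluates the subset sum literally and compares it with an equivalent dynamic-programming count. It is an illustration under stated assumptions, not part of the paper: the list `q` is assumed to hold the values $P_d(p,o)(\varepsilon)$ for all $p \in D$, the events are treated as independent, and the function names and the dynamic-programming shortcut are introduced here only for illustration.

```python
from itertools import combinations
from math import prod

def core_probability_bruteforce(q, mu):
    """Sum over all subsets A of D with |A| >= mu, as in the definition."""
    n = len(q)
    total = 0.0
    for k in range(mu, n + 1):
        for subset in combinations(range(n), k):
            inside = set(subset)
            # objects in A lie within eps, objects in D \ A do not
            total += prod(q[i] if i in inside else 1.0 - q[i] for i in range(n))
    return total

def core_probability_dp(q, mu):
    """Same value without enumerating subsets (hypothetical shortcut)."""
    counts = [1.0]                 # counts[c] = P(exactly c objects within eps)
    for qi in q:
        nxt = [0.0] * (len(counts) + 1)
        for c, pr in enumerate(counts):
            nxt[c] += pr * (1.0 - qi)
            nxt[c + 1] += pr * qi
        counts = nxt
    return sum(counts[mu:])

q = [1.0, 0.5, 0.5]   # e.g. o itself plus two uncertain neighbors
p_brute = core_probability_bruteforce(q, mu=2)
p_dp = core_probability_dp(q, mu=2)
```

The brute-force variant needs exponentially many terms in |D|, which is why it is only useful for checking small cases.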
Similar to distance density functions, we can define distance distribution functions.

Note that the traditional definition of a core object can also be regarded as a function which assigns to each object o a value equal to 1 iff o is a core object, and 0 otherwise. If the distance distribution function $P_d$ yields only the values 0 and 1 at position $\varepsilon$, the traditional and the probability definition of a core object coincide.
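The coincidence of the two definitions can be spelled out with a short derivation sketch. It assumes that every distance is known exactly, so that each distance density function is a Dirac delta function; the symbol $\tau_p = d(p,o)$ is introduced here only for illustration.

```latex
% Step 1: with exact distances, the distance distribution function is a step function.
P_d(p,o)(\varepsilon) = \int_{-\infty}^{\varepsilon} \delta(x - \tau_p)\,dx
  = \begin{cases} 1 & \text{if } \tau_p \le \varepsilon \\ 0 & \text{otherwise} \end{cases}

% Step 2: exactly one subset A contributes to the sum, namely A = N_\varepsilon(o).
\prod_{p \in A} P_d(p,o)(\varepsilon) \cdot \prod_{p \in D \setminus A} \bigl(1 - P_d(p,o)(\varepsilon)\bigr)
  = \begin{cases} 1 & \text{if } A = N_\varepsilon(o) \\ 0 & \text{otherwise} \end{cases}

% Step 3: the core-object probability reduces to the crisp core-object test.
P^{core}_{\varepsilon,\mu,d,D}(o)
  = \begin{cases} 1 & \text{if } |N_\varepsilon(o)| \ge \mu \\ 0 & \text{otherwise} \end{cases}
```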
Figure … shows how our probability definition of a core object differs from the "traditional" approach, where the similarity between fuzzy objects is measured by their distance expectation values. Although the object o in Figure …a does not seem to be located in a very dense area, it is a core object according to the traditional approach, as the distance expectation value between o and $\mu$ other objects is smaller than $\varepsilon$. On the other hand, it is very unlikely that all $\mu$ objects are indeed located in $N_\varepsilon(o)$. Therefore, the probability that o is a core object is very small. In Figure …b, the reverse situation

Definition … (Distance Distribution Function)
Let $d: D \times D \to \mathbb{R}$ be a distance function, and let $P(d(o,o') \le b)$ denote the probability that $d(o,o')$ is smaller than b. Then a probability distribution function $P_d: D \times D \to (\mathbb{R} \to [0,1])$ is called a distance distribution function if the following condition holds:

$$P_d(o,o')(b) = P(d(o,o') \le b)$$

Let us note that $P_d(o,o')(b) = \int_{-\infty}^{b} p_d(o,o')(x)\,dx$ holds, and that therefore $p_d$ and $P_d$ contain basically the same information.

As already mentioned, traditional algorithms can only handle distance functions which yield a unique distance value. In order to make our fuzzy distance functions useful for standard clustering algorithms, we could extract an aggregated value from them. For in-
reachable from the current query object o. The fuzzy version FDBSCAN works very similarly to the traditional approach. An object p is added to the current cluster if the value $P^{reach}_{\varepsilon,\mu,d,D}(p,o)$ exceeds …, where o is the current query object. Note that if $P^{core}_{\varepsilon,\mu-1,d,D\setminus\{p\}}(o) \le$ … holds, the value $P^{reach}_{\varepsilon,\mu,d,D}(p,o)$ can exceed … for no object p. Therefore, p will not be added to the current cluster. Again, this is a generalization of the traditional approach.
Figure …: Determination of core point property ($\mu$ = …).
(a) Object o is a core object according to the traditional approach based on the distance expectation values $E_d$, but very unlikely a core object according to the probability approach of Definition 7.
(b) Object o is no core object according to the traditional approach based on the distance expectation values $E_d$, but very likely a core object according to the probability approach of Definition 7.
The remaining question is how to compute the values $P^{reach}_{\varepsilon,\mu,d,D}(p,o)$ efficiently. Although there might exist situations where we can compute these values directly, based on the fuzzy object representations (cf. Definition …), in this paper we propose a generally applicable approach based on Monte Carlo sampling. In many applications, the fuzzy objects might already be described by a discrete probability density function, i.e. we have the sample set already. If the fuzzy object is described by a continuous probability density function, we can easily sample according to this function and thus derive a sequence of samples. In the following, we assume that each object x is represented by a sequence of s sample points, i.e. x is represented by s different representations $\langle x_1, \ldots, x_s \rangle$.

is sketched. Object o is located in a very dense area, but there do not exist $\mu$ objects p for which $E(d(o,p)) \le \varepsilon$ holds. Therefore, o is no core object according to the traditional approach, although it is very likely that there exist $\mu$ elements p for which $d(o,p) \le \varepsilon$ holds.
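The gap between the distance expectation value and the probability that an object lies within $\varepsilon$ can be seen in a tiny numerical example. The sketch below is only an illustration: it assumes that each fuzzy object is given by equally weighted sample points, uses the Euclidean distance, and the function names and sample values are made up for this example.

```python
from math import dist

def distance_expectation(o_samples, p_samples):
    """Average distance over all pairs of samples (equal weights assumed)."""
    pairs = [dist(a, b) for a in o_samples for b in p_samples]
    return sum(pairs) / len(pairs)

def probability_within(o_samples, p_samples, eps):
    """Fraction of sample pairs whose distance is at most eps."""
    pairs = [dist(a, b) for a in o_samples for b in p_samples]
    return sum(d <= eps for d in pairs) / len(pairs)

o = [(0.0, 0.0)]                                           # o is known exactly
p = [(0.5, 0.0), (0.5, 0.0), (0.5, 0.0), (10.0, 0.0)]      # p has one far outlier

expected = distance_expectation(o, p)
likely = probability_within(o, p, eps=1.0)
```

Here a single far sample pushes the expectation value well above `eps`, so an expectation-based test would not count p as a neighbor of o, although three out of four sample pairs lie within `eps`.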
Based on the sample sequences, we could now compute discrete distance density functions consisting of $s^2$ many discrete distance values. Based on these functions, we could then compute the reachability probabilities according to Definition …. The big problem is that we have to compute, for each query object o, O(|DB|) many different core-object versions, always leaving out one element from the database. Furthermore, the computation of each of these core-object values has to consider exponentially many (in |DB|) sets $A \subseteq DB$ (cf. Definition 7). Obviously, this is impracticable.
Based on the core-object probability definition, we can define how likely it is that an object p is directly density-reachable from an object o. In the traditional approach, two conditions have to hold. First, o has to be a core object, and second, the distance between p and o has to be smaller than $\varepsilon$. In the context of this paper, both of these conditions are fuzzy, holding only with a certain probability.

Definition … (Reachability Probability)
Let D be a database, and let $P_d: D \times D \to (\mathbb{R} \to [0,1])$ be a distance distribution function. Then the reachability probability of p w.r.t. o is defined as follows:

$$P^{reach}_{\varepsilon,\mu,d,D}(p,o) = P^{core}_{\varepsilon,\mu-1,d,D\setminus\{p\}}(o) \cdot P_d(p,o)(\varepsilon)$$

Lemma …. $P^{reach}_{\varepsilon,\mu,d,D}(p,o)$ reflects the probability that p is directly density-reachable from o.

The idea of our approach is to determine the core-object probabilities based on s meaningful samples. Then, we compute the reachability values according to Definition ….
We first compute for all objects x the minimum bounding rectangle MBR(x) of the sample points $\langle x_1, \ldots, x_s \rangle$ (cf. Figure …). If we now carry out a range query around o, we create a sample matrix M(o) which contains for each object instance $o_i$ s different values $m_{i,j} = |N_{\varepsilon,D_j}(o_i)|$, where $D_j$ denotes the j-th database instance $\{x_j \mid \langle x_1, \ldots, x_j, \ldots, x_s \rangle \in D \wedge x_j \ne o_j\} \cup o_i$ and $N_{\varepsilon,D_j}(o_i)$ denotes the set $\{x_j \mid d(o_i, x_j) \le \varepsilon \wedge x_j \in D_j\}$ (cf. Figure …). We test for each object x in the database whether there exist sample instances $x_j$ for which $d(o_i, x_j) \le \varepsilon$ holds. If this is true, we increase the current value of $m_{i,j}$.

Note that often we do not have to compute the $s^2$ distances $d(o_i, x_j)$, but can decide based on the boxes MBR(o) and MBR(x) whether we have to increase all values of the sample matrix M(o) or none of them. If for the maximum distance $d_{max}(o,x)$ between the two boxes MBR(o) and MBR(x), $d_{max}(o,x) \le \varepsilon$ holds, we can increase all values of the sample matrix M(o) by 1 (cf. object c in Figure …). If for the minimum distance $d_{min}(o,x)$ between the two boxes, $d_{min}(o,x) > \varepsilon$ holds, we do not have to increase any of these values (cf. object d in Figure …). Only if the value of $\varepsilon$ is somewhere in between the two values $d_{min}(o,x)$ and $d_{max}(o,x)$ do we have to compute the distances between the samples to decide which values $m_{i,j}$ of the sample matrix have to be increased (cf. object a in Figure …). Finally, we would like to mention that we can compute this sample matrix by only one range scan.
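The box-based filter described above can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes Euclidean distances and axis-parallel boxes stored as (lower corner, upper corner) tuples, and the function names `mbr`, `d_min`, `d_max` and `filter_step` as well as the return labels are hypothetical.

```python
from math import sqrt

def mbr(samples):
    """Minimum bounding rectangle (lower corner, upper corner) of sample points."""
    lo = tuple(min(c) for c in zip(*samples))
    hi = tuple(max(c) for c in zip(*samples))
    return lo, hi

def d_min(box_a, box_b):
    """Smallest possible distance between a point in box_a and a point in box_b."""
    (alo, ahi), (blo, bhi) = box_a, box_b
    gaps = (max(bl - ah, al - bh, 0.0) for al, ah, bl, bh in zip(alo, ahi, blo, bhi))
    return sqrt(sum(g * g for g in gaps))

def d_max(box_a, box_b):
    """Largest possible distance between a point in box_a and a point in box_b."""
    (alo, ahi), (blo, bhi) = box_a, box_b
    spans = (max(ah - bl, bh - al) for al, ah, bl, bh in zip(alo, ahi, blo, bhi))
    return sqrt(sum(s * s for s in spans))

def filter_step(box_o, box_x, eps):
    """Decide how the sample matrix has to be updated for object x."""
    if d_max(box_o, box_x) <= eps:
        return "all"      # every sample pair is within eps: increase all m[i][j]
    if d_min(box_o, box_x) > eps:
        return "none"     # no sample pair can be within eps
    return "refine"       # sample distances have to be computed

box_o = mbr([(0.0, 0.0), (1.0, 1.0)])
box_x = mbr([(3.0, 0.0), (4.0, 1.0)])
```

With these two boxes, `d_min` is 2 and `d_max` is $\sqrt{17}$, so only values of `eps` between those two bounds require computing the individual sample distances.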
Proof. According to Lemma …, the probability that at least $\mu - 1$ objects from $D \setminus \{p\}$ are located in $N_\varepsilon(o)$ is equal to $P^{core}_{\varepsilon,\mu-1,d,D\setminus\{p\}}(o)$. Second, the probability that the distance between p and o is smaller than $\varepsilon$ is equal to $P_d(p,o)(\varepsilon)$. As these two conditions are independent of each other, their product corresponds to the probability that at least $\mu$ objects from D are located in $N_\varepsilon(o)$ and that p is one of them. Note that this value reflects the probability that p is directly density-reachable from o.

Definition … can be regarded as an extension of the traditional approach. It coincides with the traditional approach if we assume that the core-object probability is always 0 or 1, and the distance distribution function $P_d$ yields only the values 0 and 1 at position $\varepsilon$.
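The product form of the reachability probability can be illustrated with a small Python sketch. It is a toy calculation under stated assumptions, not the paper's sampling-based algorithm: the dictionary `within` is assumed to hold $P_d(x,o)(\varepsilon)$ for every database object x, these events are treated as independent, and the helper names are hypothetical.

```python
def at_least(probs, k):
    """Probability that at least k of the independent events occur."""
    counts = [1.0]                       # counts[c] = P(exactly c events so far)
    for q in probs:
        nxt = [0.0] * (len(counts) + 1)
        for c, pr in enumerate(counts):
            nxt[c] += pr * (1.0 - q)
            nxt[c + 1] += pr * q
        counts = nxt
    return sum(counts[max(k, 0):])

def reachability_probability(within, p, mu):
    """P_core with threshold mu-1 on D without p, times P_d(p,o)(eps)."""
    others = [q for x, q in within.items() if x != p]
    return at_least(others, mu - 1) * within[p]

within = {"a": 0.9, "b": 0.5, "c": 0.2}   # made-up values of P_d(x,o)(eps)
reach_a = reachability_probability(within, "a", mu=2)
reach_c = reachability_probability(within, "c", mu=2)
```

For object a, at least one of b and c has to lie within `eps` (probability 0.6), and a itself lies within `eps` with probability 0.9, which gives 0.54.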
Computational Aspects
The traditional DBSCAN algorithm clusters a data set by always adding objects to the current cluster which are directly density-

Note that clustering on the centroids of the fuzzy object representations would suffer from the same drawbacks as the approach based on the distance expectation values.
clustering approaches, we assume that each position within the box is equally likely.
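A test object of this kind can be generated with a few lines of Python. The sketch below is one possible reading of the setup, not the authors' generator: it assumes that the box of side length $\rho$ is placed at a random offset so that it still contains the exact feature vector, that samples are drawn uniformly from the box, and it ignores clipping at the border of the data space. All function names are hypothetical.

```python
import random

def random_box(center, rho, rng):
    """Place a box of side length rho at a random offset so that it contains center."""
    lo = tuple(c - rng.uniform(0.0, rho) for c in center)
    hi = tuple(l + rho for l in lo)
    return lo, hi

def sample_fuzzy_object(box, s, rng):
    """Draw s sample points uniformly from the box (every position equally likely)."""
    lo, hi = box
    return [tuple(rng.uniform(l, h) for l, h in zip(lo, hi)) for _ in range(s)]

rng = random.Random(42)                       # fixed seed for reproducibility
box = random_box((0.5, 0.5), rho=0.01, rng=rng)
samples = sample_fuzzy_object(box, s=5, rng=rng)
```

Each fuzzy object is then represented by its `s` samples, which is the representation assumed by the sampling-based computation of the reachability values.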
The artificial data set ART consists of … …-dimensional objects, which are normally distributed in […]. The engineering data set PLANE consists of … 3D CAD objects provided by our industrial partner, an American airplane manufacturer. Each object is represented by a …-dimensional feature vector which is derived from the cover sequence model as described in [].
Implementation. For clustering the fuzzy object representations, we have implemented the algorithm FDBSCAN as described in Section …. Furthermore, we implemented the two approaches UNION and INTERSECTION as described in [], the standard DBSCAN approach which carries out the fuzzy clustering based on the distance expectation values (referred to as EXPDBSCAN), and the clustering on the exact object representations.
Figure …: Computation of fuzzy reachability distances (s = …, $\mu$ = …).

After having computed the sample matrix M(o), we can easily derive the reachability values for all objects x in the database D w.r.t. o. To this end, $P^{core}_{\varepsilon,\mu-1,d,D\setminus\{x\}}(o) \cdot P_d(x,o)(\varepsilon)$ (cf. Definition …) has to be computed. The first part can be derived from the matrix M(o) if we decrease by 1 those values $m_{i,j}$ for which $d(o_i, x_j) \le \varepsilon$ holds. Then we can count the number of elements in the sample matrix M(o) which contain values higher than or equal to $\mu - 1$. Normalizing this number by $s^2$ yields the probability $P^{core}_{\varepsilon,\mu-1,d,D\setminus\{x\}}(o)$. The values $P_d(x,o)(\varepsilon)$ can be computed by counting the number of events $d(o_i, x_j) \le \varepsilon$ and by normalizing this number again by $s^2$. For the computation of $P_d(x,o)(\varepsilon)$, the distances $d_{max}(o,x)$ and $d_{min}(o,x)$ can again be used for pruning.
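The counting scheme just described can be sketched in Python as follows. This is one possible reading of the procedure, offered as an illustration only: it assumes Euclidean distances, that every object has the same number s of samples, that $o_i$ counts as a member of its own neighborhood, and it omits the MBR-based pruning. The function name `reachability_from_samples` and the toy data are hypothetical.

```python
from math import dist

def reachability_from_samples(o_samples, others, eps, mu):
    """Estimate the reachability value of every object x w.r.t. o from samples."""
    s = len(o_samples)
    # within[x][i][j] is True if sample j of object x lies within eps of sample i of o
    within = {name: [[dist(oi, xj) <= eps for xj in xs] for oi in o_samples]
              for name, xs in others.items()}
    # sample matrix: m[i][j] = size of the eps-neighborhood of o_i in database
    # instance j; the leading 1 counts o_i itself (assumption made here)
    m = [[1 + sum(w[i][j] for w in within.values()) for j in range(s)]
         for i in range(s)]
    result = {}
    for name, w in within.items():
        # leave x out of the database: decrease m[i][j] where x_j was counted
        core_hits = sum(m[i][j] - w[i][j] >= mu - 1
                        for i in range(s) for j in range(s))
        dist_hits = sum(w[i][j] for i in range(s) for j in range(s))
        result[name] = (core_hits / s ** 2) * (dist_hits / s ** 2)
    return result

o_samples = [(0.0, 0.0), (0.0, 0.0)]
others = {"a": [(1.0, 0.0), (1.0, 0.0)],     # always within eps = 2 of o
          "b": [(1.0, 0.0), (5.0, 0.0)]}     # within eps for one of two samples
reach_mu2 = reachability_from_samples(o_samples, others, eps=2.0, mu=2)
reach_mu3 = reachability_from_samples(o_samples, others, eps=2.0, mu=3)
```

With `mu=2`, object a is reachable with value 1.0 and object b with value 0.5; raising `mu` to 3 lowers the value of a to 0.5, because o then needs b as an additional neighbor, which holds in only half of the database instances.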
All algorithms were implemented in Java. The experiments were run on a Windows laptop with a … MHz processor and … MB main memory. If not otherwise stated, we used a sample rate of s = ….

Quality Measures. For comparing a given reference clustering to the clusterings resulting from clustering the fuzzy object representations, we used the approximating quality measure introduced in []. In [], a quality measure for clusters based on the symmetric set difference was introduced, and based on this distance measure between clusters, a quality criterion for approximated partitioning clusterings, $Q_{APC}$, was introduced. This quality measure is based on the minimum weight perfect matching of sets.

If we assume n database objects which are not stored in any index structure, and a sample rate of s, we can summarize the characteristics of the FDBSCAN implementation as follows:
• We need O(n) range scans.
• We require between $O(n^2)$ and $O(s^2 \cdot n^2)$ many distance computations between d-dimensional feature vectors.

Parameters. In all our tests, we set $\mu$ = … and used an $\varepsilon$-parameter for the various DBSCAN implementations such that between … and … clusters and between … and … noise objects were created for the reference clustering.

Experimental Results

Note that especially in the important case where the fuzzy objects are not too fuzzy, we only need around $O(n^2)$ many distance calculations. In this case, the introduced pruning distances $d_{min}$ and $d_{max}$ are very effective, as they are rather close to each other. Therefore, it is very unlikely that the $\varepsilon$-value lies in between them. Furthermore, in this case it is beneficial to organize the minimum bounding rectangles of the sample sets in R-tree [] like index structures. As we can typically use rather small $\varepsilon$-values for DBSCAN, the number of distance computations can thus be further reduced to $O(n \cdot \log n)$, which corresponds to the time complexity of the original DBSCAN algorithm based on index structures.
Efficiency. First, we investigate the runtimes of our fuzzy DBSCAN clustering approaches. The following table depicts the absolute runtimes in seconds for the ART data set ($\rho$ = …, s = …).

FDBSCAN | EXPDBSCAN | UNION | INTERSECTION
… | … | … | …
The good performance of the FDBSCAN approach demonstrates the suitability of the filters introduced in Section …, resulting in only $O(n^2)$, and not $O(s^2 \cdot n^2)$, many distance computations. Furthermore, we can see that all other fuzzy clustering approaches are slower, which can be explained by the higher number of distance computations which have to be carried out, i.e. UNION and INTERSECTION require $O(s \cdot n^2)$ many distance computations and EXPDBSCAN requires $O(s^2 \cdot n^2)$ many distance computations.
EXPERIMENTAL EVALUATION
In this section, we present the experimental results of the introduced clustering algorithm FDBSCAN, demonstrating the characteristics and benefits of our approach.

Setup

In the following sections, we will show that, from an effectivity point of view, our approaches also outperform the chosen comparison partners by far.

Data Sets. All experiments were based on two different test data sets, an artificial data set and an engineering data set, which are normalized in a data space $[0,1]^d$. For each data set, we have exact object representations, i.e. an object is described by exactly one feature vector. Furthermore, each object is randomly surrounded by a box having a side length of $\rho$ in each dimension. For our fuzzy

Effectivity. In a first set of experiments, we investigated the qualities of the different fuzzy clustering approaches w.r.t. a given reference clustering. Figure … shows for the ART and for the PLANE data set that for all fuzzy clustering algorithms the quality decreases with an increasing value of $\rho$, i.e. an increasing uncertainty area of the objects. Furthermore, we can see that the FDBSCAN algorithm
Figure …: Quality of fuzzy DBSCAN clustering algorithms (FDBSCAN, EXPDBSCAN, UNION, INTERSECTION) on (a) ART and (b) PLANE. x-axis: $\rho$ [1/1000]; y-axis: quality $Q_{APC}$.

Figure …: Core-object classification (precision and recall) of fuzzy DBSCAN clustering algorithms ($\rho$ = 0.01) on (a) ART and (b) PLANE.
sides laying the theoretical foundations for density-based clustering of fuzzy data, we showed how to put these concepts into practice. The resulting partitioning density-based algorithm FDBSCAN can be used to cluster fuzzy data, e.g. moving objects, effectively and efficiently. The algorithm follows the new paradigm of integrating fuzzy distance functions directly into data mining algorithms instead of working on lossy aggregated information. In our experimental evaluation, we demonstrated that the newly introduced clustering algorithm FDBSCAN achieves much more accurate results than state-of-the-art comparison partners without sacrificing efficiency.

is for both data sets the most effective clustering algorithm over the complete range of $\rho$. For the ART data set, EXPDBSCAN also performs quite well, but for higher-dimensional data, e.g. for the PLANE data set, its quality is much worse than the quality of the FDBSCAN approach. On the other hand, the UNION approach performs well for sparse high-dimensional data, but unfortunately not for the low-dimensional ART data set.

An explanation for the superiority of the FDBSCAN algorithm can be found in Figure …. In this figure, we investigate the accuracy of the core-point classification of the different algorithms. For the EXPDBSCAN approach, the precision of the detected core objects is very high, but unfortunately the recall is very low, i.e. the approach fails to detect many core objects. Thus, we very often have the situation depicted in Figure …b. Similar observations, but much more pronounced, can be made for the INTERSECTION approach. For the UNION approach, the opposite observation can be made. The precision of this approach is very low, as the UNION approach classifies way too many objects as core objects which actually are no core objects. Thus, we often have situations similar to the one depicted in Figure …a. On the other hand, for the FDBSCAN approach both precision and recall are rather high.

In our future work, we will show that other data mining algorithms working on vague information can also benefit from a direct integration of fuzzy distance functions.
REFERENCES
Furthermore, we would like to mention that our FDBSCAN algorithm outperforms the server-sided clustering approaches of state-of-the-art density-based distributed clustering algorithms []. The server-sided approach presented in [] corresponds to an FDBSCAN approach using a sample rate of …. We noticed that if we use a sample rate s around … instead of …, we can increase the average quality values considerably, e.g. for $\rho$ = … on the PLANE data set, we can increase the quality from …% (s = …) to …% (s = …). Furthermore, small sample rates always bear the risk of extreme values, e.g. for $\rho$ = … on the PLANE data set, we noticed quality values of less than …% when using a sample rate of …. As sample rates higher than … do not pay off much, we suggest using a sample rate of … to get a good trade-off between accuracy and efficiency. Furthermore, the server-sided approach presented in [] corresponds to the EXPDBSCAN approach. Figures … and … show that the FDBSCAN approach also clearly outperforms this comparison partner.
CONCLUSION
In this paper, we demonstrated how density-based clustering can be carried out based on vague and uncertain information, which often occurs in modern application ranges like sensor databases, spatial-temporal applications and biometric information systems. Be-
In the FDBSCAN approach, we classified an object o as core object iff $P^{core}_{\varepsilon,\mu,d,D}(o)$ … holds.
Bracewell R.: The Impulse Symbol. Ch. … in The Fourier Transform and Its Applications, 3rd ed., McGraw-Hill, ….
Cheng R., Kalashnikov D. V., Prabhakar S.: Evaluating probabilistic queries over imprecise data. SIGMOD'…, pp. ….
Ester M., Kriegel H.-P., Sander J., Xu X.: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD'…, pp. ….
Guttman A.: R-trees: A Dynamic Index Structure for Spatial Searching. Proc. SIGMOD'…, pp. ….
Höppner F., Klawonn F., Kruse R., Runkler T.: Fuzzy Cluster Analysis. Wiley, ….
Januzaj E., Kriegel H.-P., Pfeifle M.: Scalable Density-Based Distributed Clustering. PKDD …, pp. ….
Jain A. K., Murty M. N., Flynn P. J.: Data Clustering: A Review. ACM Computing Surveys, Vol. …, No. …, Sep. …, pp. ….
Kriegel H.-P., Brecheisen S., Kröger P., Pfeifle M., Schubert M.: Using Sets of Feature Vectors for Similarity Search on Voxelized CAD Objects. SIGMOD …, pp. ….
Kriegel H.-P., Kunath P., Pfeifle M., Renz M.: Approximated Clustering of Distributed High Dimensional Data. PAKDD'…, pp. ….
Kriegel H.-P., Kailing K., Pryakin A., Schubert M.: Clustering Multi-Represented Objects with Noise. PAKDD'…, pp. ….
Kriegel H.-P., Pfeifle M.: Measuring the Quality of Approximated Clusterings. BTW'…, pp. ….
Li