Fourth International Conference on Parallel, Distributed and Grid Computing (PDGC)
A Scalable Query Materialization Algorithm for Interactive Data Exploration
Archana Dhankar
Computer Engineering Department, National Institute of Technology, Kurukshetra, Haryana, India
Archanadhankar@gmail.com
Vikram Singh
Computer Engineering Department, National Institute of Technology, Kurukshetra, Haryana, India
viks@nitkkr.ac.in
Abstract- Data exploration is the process of efficiently extracting insights from stored data even when the user does not know, or is uncertain about, exactly what he wants. Iterative interaction between the user and a search system helps to achieve this goal; interactive data exploration (IDE) is one such system, supporting data exploration by incorporating user feedback on the retrieved data. These systems are a key ingredient of many discovery- and recall-oriented real-life applications such as scientific computing, financial analysis, social analytics and evidence-based medicine. IDE supports the user's navigation through query-to-query transitions in the data space, via an 'exploratory session'. An exploratory session often consists of long, complex, analytical queries which, when processed against large and multi-faceted data, consume a lot of processing time. The user's original query can be decomposed into multiple candidate queries, the 'checkpoint queries', which are then selected for materialization. In this paper, we introduce the notion of 'checkpoint queries' and discuss how the heuristics query frequency and query result overlap ratio (QROR) are used in a greedy selection of these candidate queries from an exploratory session. We observe that checkpoint queries significantly improve query processing time, as materialized checkpoint queries can be reused for query answering during query navigation. The key challenges in this process are selecting the checkpoint queries for materialization and improving query reuse for query answering. In spite of its simplicity, our algorithm selects queries that give better performance than the views selected by existing algorithms.
Keywords- Checkpoint Query; Interactive Data Exploration; Query Lattice; Query Reuse; Query Result Overlap Ratio
I. INTRODUCTION
A naïve user interacts with the database system in an information-seeking task via queries. This query-result mode of interaction assumes that users are familiar with the database and have clarity on their information needs. However, database systems are now growing and accessible to a wide spectrum of users. A technical user is assumed to be familiar with the database semantics or structure, and this prior knowledge helps in formulating the initial query. The database in these application domains is often huge and complex. Thus, a database system is required which assists a user in data exploration by either simply suggesting some queries [] or incorporating the user feedback.

In data exploration systems, simple queries are posed initially, more complex predicates are then added, and the query is re-executed interactively until the desired results are achieved. This paradigm of querying is both incremental and interactive: within a session the user query constantly evolves via the cyclic path of intent, query, execution and result. For example, consider a relational table citizen in a country database having more than a million tuples. Suppose a social scientist wants to find interesting facts such as the living standards and lifestyle of senior citizens in a particular region. A user query like select * from citizen where age > 60 would be issued, which might return millions of records. It becomes tedious for him to infer any conclusion from this huge result set. A fundamental data exploratory system generates a few query suggestions [] according to correlated data attributes, previously issued queries, size of the result dataset, frequently issued queries, etc. The suggestion can be a list of queries like select * from citizen where age between 60 and 70 and income < 5 lakh, or select name, address, contact_details from citizen where has_relative = "no", or select * from citizen where location = "downtown". These initial queries are expensive, but based on the suggested queries and the result set obtained, the user applies focus/defocus tools to the result set []. A form of this is also referred to as guided interaction [].

An exploratory search encompasses two aspects that go beyond the classical problem of looking up information: it has the general goals of (i) learning, and thus acquiring new knowledge, and (ii) investigating, to possibly reveal new facts. Interactive data exploration (IDE) systems are one such example. IDE systems explore the underlying data space by adjusting the basic queries according to the user's relevance feedback on initially retrieved query results. IDE is fundamentally a long-running, multi-step process with user interests specified in imprecise terms. In IDE applications, users try to make sense of the large stored data space by steering through it. The two fundamental ways of steering are focusing and defocusing: focusing on parts of the data to identify interesting "insights", defocusing on data that is of little or no interest, jumping to related "stuff", and repeating this process, typically all through a visual interface. Data exploration systems simply implement these navigational operations via query-to-query transitions, also called query steering. In order to retrieve data objects of real interest, it is highly probable that a user reviews the same query result set multiple times [], as queries are related and the data exploration system often reformulates the query. In an exploratory session, the possibility of result overlap leads to query reuse []. How to reuse this result overlap among queries in data exploration is thus an interesting research problem, which we explore in this paper. The checkpoint queries are created from the user queries in the same session, and highly overlapping queries are materialized. In later data exploration tasks, these views are used to retrieve the data (Figure 1).
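To make the idea of result overlap between two session queries concrete, the following is a minimal Python sketch built on the citizen example above. The table contents, the column types, and the helper name `result_overlap_ratio` are hypothetical, and the Jaccard-style ratio (shared rows divided by all distinct rows) is only one plausible way to measure overlap; the paper's own QROR formula is not reproduced here.

```python
import sqlite3

def result_overlap_ratio(rows_a, rows_b):
    """Assumed overlap measure: |A intersect B| / |A union B| over result rows."""
    a, b = set(rows_a), set(rows_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Tiny in-memory stand-in for the citizen table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE citizen (id INTEGER PRIMARY KEY, age INTEGER, income REAL)")
conn.executemany(
    "INSERT INTO citizen VALUES (?, ?, ?)",
    [(1, 62, 3.0), (2, 68, 7.5), (3, 75, 2.0), (4, 45, 4.0), (5, 66, 4.5)],
)

# Initial query and one refined query from the same exploratory session.
q1 = conn.execute("SELECT id FROM citizen WHERE age > 60").fetchall()
q2 = conn.execute(
    "SELECT id FROM citizen WHERE age BETWEEN 60 AND 70 AND income < 5"
).fetchall()

ratio = result_overlap_ratio(q1, q2)
print(f"overlap ratio between q1 and q2: {ratio:.2f}")
```

A pair of queries whose ratio exceeds some chosen threshold would be a natural candidate for materialization, since the refined query can be answered from the stored result of the broader one.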
Figure 1. Interactive Data Exploration (IDE) system

In this paper, we devise an approach for reusing the result overlap among queries in an 'exploratory session', by simply materializing the previous queries or candidate queries issued by the user that have highly overlapping results. The selection of reusable queries, the 'checkpoint queries', is purely based on the result overlap with the recent or past queries in a 'query session'. For a user's exploratory session, top-k queries are identified based on optimization heuristics like query result overlap, query frequency, etc. The query result overlap ratio (QROR) indicates the ratio of overlapping data objects in a pair of queries in a user session; thus, in a session some queries will have QROR values higher than the optimal threshold. Materialized views from checkpoint queries are used as the data source during query reformulation in query steering for data exploration. These views could further be used in an expanded or enhanced query issued by the user in the data space.

A. Related Work
Query materialization on historical data is a familiar approach in data warehouses []. Most of the existing view selection approaches consider frequency, the size of views, and constraints like maintenance cost, time and disk space as common query selection heuristics for materialization. Many solutions have been proposed and analyzed in []. The survey [] concentrates on methods of finding a rewriting of a query using a set of materialized views. The study presented in [] focuses on the state of the art in materialization for web databases. [] gives an analysis of methodologies to materialize views in the context of warehousing. A complete survey classifying the view selection problem is given in [].

However, in exploratory search either similar queries or an expanded query are used repeatedly. As a result, the notion of query materialization from data warehouses cannot be adopted directly for IDE. Earlier, correctness and completeness were the key criteria, but now the main focus is to find interesting patterns rather than exact answers. [] exploits knowledge of the data to help naive users in constructing exploratory queries, which are further modified by the user to produce the desired result []. Similar to web search, the user is guided from imprecise queries to relevant web pages, and the results in turn steer him to dig more extensively into the search results obtained and alter the query to get sufficient answers []. [] defines a query steering algebra with operators like DRILLDOWN and RELATE for navigation in query sessions, and [] suggests query manipulations for performance benefits. In IDE, the submission of similar queries, which might have common subparts, is highly likely. Hence, we need to rethink the query-result paradigm in the case of IDE. A good survey on IDE is given in []. Queries are constructed for exploratory research using sample queries in []. For Big Data, [] offers one interesting solution. A lattice can be used to represent multidimensional data; [] give methods to manipulate lattices for data analysis. In our work, for a selected query, a lattice is constructed using heuristics. Then, the query with maximum benefit is materialized.

B. Contribution and Outline
An approach for query steering in interactive data exploration systems is the prime contribution of this paper. Trends are changing from stringent queries to next-generation systems where queries are submitted on the basis of their intent rather than focusing on the syntax of queries. The results obtained from the issued queries should provide guidance for exploring the meaningful data. A query continues to evolve in the course of the user session in IDE. The approach adapts the notion of view materialization to frequent queries posed by the user in a query session or exploratory session. Instead of materializing all input user queries, their sub-queries or candidate queries are considered for materialization. It is highly unlikely that the exact query will be issued again in the same user session, but the user might explore the same data space. Therefore, the repetition of the sub-queries or checkpoint queries is more likely. The reuse of materialized checkpoint queries for data exploration is the main motivation of this paper, as it also improves the overall efficiency of the data exploration system.

In this paper, Section II discusses the fundamentals of interactive data exploration systems and illustrates a formal system model. Section III contains the proposed approach for checkpoint query generation; the identification of frequent queries for materialization is described in its subsections. A later section highlights various challenges in the current approach. The conclusion is a brief review of the work discussed in the paper.

II. INTERACTIVE DATA EXPLORATION
An IDE can be modeled as a process as well as a system. When viewed as a system, the IDE behaves like a black box to the user, to which the user only provides "relevant" and "irrelevant" as feedback. It is then the job of the IDE system to come up with result sets that might be interesting to the user. Based on the feedback, the system provides more samples to satisfy the requirements of the user. In this case, the user is not capable of narrowing down his interests by understanding the behavior of the system; hence he is highly dependent on the system. In this case, the system suggests by giving [...]. On the contrary, when IDE is viewed as a process, the user can learn more about his requirements by browsing through the result sets. After a few iterations, the user is able to drill down his interests to the point where he is very specific about his data requirements. In this paper, we build IDE as a process.

In the data exploration system, the user can look inside the final results and use them to change the current exploratory process by adjusting the current predicates or operations. This query-to-query navigation for data exploration often has overlapping result sets as well as similar query terms. Query reuse by customization in IDE is an efficient strategy that can be used without the help of data experts, which is the primary aim of data exploration systems. An IDE search is a many-step process that usually consists of a wide range of queries with inexact goals, and the submitted queries are often slightly modified versions of earlier submitted queries []. Such observations, involving high overlap and substantial reuse in the information needs of the user, form the basis of our query checkpointing framework. In this paper, an approach for query reuse is proposed. Since queries that share a high degree of overlap, either in the result set or in the query itself, are likely to appear again in the near future, it would be beneficial for the system if these frequent queries are pre-computed.

A. System Model
The workflow of the proposed framework is depicted in Figure 2. The framework works under the assumption that the checkpoint queries are valid for a single user session; hence the various queries of one user are the input. Our proposed approach utilizes the overlap among user queries in an exploration session. The approach works in parallel with an exploratory session to identify the overlapped portions of queries, called checkpoint queries. The materialized views are used in subsequent queries after the user query is reformulated. These checkpoint queries significantly improve the exploration process.

Since there can be a large number of input queries, it is wise to reduce their number. The first phase starts by applying the criterion of frequency of queries issued by the user in the exploratory session. Only the top-k frequent queries are selected for the next phase. The second heuristic is the size of the result set, that is, the number of rows and columns returned after executing the queries. The size of the result set is a measure of how expensive it would be to materialize the query: the larger the result set, the costlier it is to store. Larger queries are not preferred since their maintenance cost is high too. The output of this phase is fed to the lattice construction phase, in which a lattice is constructed where queries act as nodes.

Once the query lattice is constructed, an iterative selection of top-k queries starts, which yields the set of candidate checkpoint queries. For this, the Net Gain of each candidate query is evaluated, and queries with positive Net Gain are identified as checkpoint queries and thus materialized. Note that this phase is iterative: it keeps selecting candidate queries until k queries are found. The queries whose Net Gain fails to meet the required criteria in phase P_i are discarded, and subsequently phase P_{i+1} starts. Multiple checkpoint query sets are created, one after every fixed interval defined by the checkpoint threshold, since the user's requirements as well as the parameters used might change as the search progresses.
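The phases described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names, the `max_size` cut-off, and the precomputed `result_size` and `net_gain` dictionaries are all hypothetical, and the lattice construction phase is skipped entirely (candidates are passed straight to the final selection).

```python
from collections import Counter

def top_k_frequent(session_queries, k):
    """Phase 1: keep the k most frequently issued queries of the session."""
    return [q for q, _ in Counter(session_queries).most_common(k)]

def drop_large_results(queries, result_size, max_size):
    """Phase 2: discard queries whose result set is too costly to store."""
    return [q for q in queries if result_size[q] <= max_size]

def select_checkpoints(candidates, net_gain, k):
    """Final phase: greedily take up to k candidates with positive net gain."""
    ranked = sorted(candidates, key=lambda q: net_gain[q], reverse=True)
    return [q for q in ranked if net_gain[q] > 0][:k]

# Hypothetical session log: each entry is the identifier of an issued query.
session = ["q1", "q2", "q1", "q3", "q2", "q1", "q4", "q3", "q2", "q1"]
result_size = {"q1": 500, "q2": 90_000, "q3": 1_200, "q4": 50}  # rows, illustrative
net_gain = {"q1": 120.0, "q3": -15.0, "q4": 40.0}               # illustrative values

frequent = top_k_frequent(session, k=3)
small = drop_large_results(frequent, result_size, max_size=10_000)
checkpoints = select_checkpoints(small, net_gain, k=2)
print(checkpoints)
```

In this toy run q2 is frequent but dropped for its large result set, and q3 survives the size filter but is discarded for its negative net gain, leaving q1 as the only checkpoint query.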
Figure 2. Checkpoint Query generation System for IDE

B. Problem Definition
The main aim of the system is to find the most beneficial checkpoint query sets, such that the performance of the system is maximized if these queries are materialized. The challenge is to design these checkpoints to maximize reusability in consideration of future queries. This can be seen as an online materialized view selection problem. We can leverage knowledge of the semantics of steering operators and profiles to predict upcoming query transitions, e.g., the attributes likely to be involved in aggregations, relate operations and predicates. The order of materializing the checkpoints is also of concern, since the user can interrupt the current query at any time. Dynamic views [] have manifold benefits over static views. Our proposed system dynamically materializes queries to match the workload, but also takes into account maintenance issues such as the time to update views and space availability. A static selection of views might become outdated, and it cannot correct a wrong selection.

III. QUER[...]

Advantage(Q) = [{Size(Parent) - Size(Q)} / Records per Block] × Frequency × Time of one Block Access
SF(Q): selection, projection, join
Cost to find qualifying tuples(Q) = SF × Block Retrieving Cost × Size
Cost(Q) = initialization cost + cost to find qualifying tuples + cost to process selected tuples
Net-Gain(Q3) = Advantage(Q) - Cost(Q)

Table. Calculations performed in the selected example
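The quantities listed in the table can be turned into a small worked computation. The sketch below is an assumption-laden reading of the table rather than the authors' code: the operators between terms were lost in the source and are reconstructed here (difference of sizes divided by records per block, products for the remaining factors, a sum for the cost terms), SF is read as a selectivity factor, and all class names, parameter names and numeric values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CandidateQuery:
    size: int           # size of the candidate's result (records), assumed unit
    parent_size: int    # size of its parent node in the lattice (records)
    frequency: int      # how often the query was issued in the session
    selectivity: float  # SF, read here as a selectivity factor (assumption)

def advantage(q, records_per_block, block_access_time):
    """Block accesses saved by reading Q instead of its parent, per use."""
    blocks_saved = (q.parent_size - q.size) / records_per_block
    return blocks_saved * q.frequency * block_access_time

def cost(q, block_retrieving_cost, init_cost, process_cost):
    """Initialization + finding qualifying tuples + processing selected tuples."""
    find_cost = q.selectivity * block_retrieving_cost * q.size
    return init_cost + find_cost + process_cost

def net_gain(q, records_per_block, block_access_time,
             block_retrieving_cost, init_cost, process_cost):
    """Net gain = advantage - cost; positive values favour materialization."""
    return (advantage(q, records_per_block, block_access_time)
            - cost(q, block_retrieving_cost, init_cost, process_cost))

# Illustrative numbers only; none of them come from the paper's example.
q3 = CandidateQuery(size=2000, parent_size=10000, frequency=5, selectivity=0.25)
adv = advantage(q3, records_per_block=100, block_access_time=1.0)
cst = cost(q3, block_retrieving_cost=0.5, init_cost=10.0, process_cost=20.0)
gain = net_gain(q3, 100, 1.0, 0.5, 10.0, 20.0)
print(adv, cst, gain)
```

With these made-up inputs the advantage outweighs the cost, so the candidate would be kept as a checkpoint query under the positive-net-gain rule described in the system model.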
IV. ANAL[...]

REFERENCES
[] M. Drosou and E. Pitoura, [...]
[] Kersten et al., The Researcher's Guide to the Data Deluge: Querying a Scientific Database in Just a Few Seconds. PVLDB.
[] M. M. Zloof, Query by example. In Proceedings of the May national computer conference and exposition. ACM.
[] K. Dimitriadou, O. Papaemmanouil and [...]
[] C. Shah and G. Marchionini, Query reuse in exploratory search tasks. In Poster at Workshop on Human-Computer Interaction and Information Retrieval (HCIR).
[] E. Balfe and B. Smyth, An analysis of query similarity in collaborative web search. In Advances in Information Retrieval.
[] [...]
[] I. Mami and Z. Bellahsene, A survey of view selection methods. ACM SIGMOD Record.
[] C. A. Dhote and M. S. Ali, Materialized view selection in data warehousing: a survey. Journal of Applied Sciences.
[] A. Labrinidis, Q. Luo, J. Xu and W. Xue, Caching and materialization for web databases. Foundations and Trends in Databases.
[] A[...]
[] S. Chaudhuri and U. Dayal, An overview of data warehousing and OLAP technology. ACM SIGMOD Record.
[] S. Idreos, O. Papaemmanouil and S. Chaudhuri, Overview of data exploration techniques. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM.
[] T. Sellam and M. Kersten, Meet Charles, big data query advisor.
[] H. Uchiyama, K. Runapongsa and T. J. Teorey, A progressive view materialization algorithm. In Proceedings of the 2nd ACM international workshop on Data warehousing and OLAP. ACM.
[] A. Nandi, Querying Without Keyboards. In CIDR.
[] A. Abouzied, J. M. Hellerstein and A. Silberschatz, Playful query specification with DataPlay. Proceedings of the VLDB Endowment.
[] J. Fan, G. Li and L. Zhou, Interactive SQL query suggestion: Making databases user-friendly. In IEEE International Conference on Data Engineering. IEEE.
[] B. Qarabaqi and M. Riedewald, User-driven refinement of imprecise queries. In IEEE International Conference on Data Engineering. IEEE.
[] [...]
[] S. Idreos and E. Liarou, dbTouch: Analytics at your Fingertips. In CIDR.
[] A. Nandi and H. V. Jagadish, Guided interaction: Rethinking the query-result paradigm. Proceedings of the VLDB Endowment.
[] B. Hill, A lattice framework for reusing top-k query results. In IRI, IEEE International Conference on Information Reuse and Integration. IEEE.
[] V. Harinarayan, A. Rajaraman and J. D. Ullman, Implementing data cubes efficiently. In ACM SIGMOD Record.