Dealing with categorical data

DEALING WITH CATEGORICAL DATA - LECTURE NOTE

Session

Dealing with categorical data
E. Allan, R.D. Stern, R. Coe and J. De Wolf

Categorical Dependent Variables

If we take a closer look at the dataset of the study "Improved fallows and rock phosphate: farmers' experiences", we see that many responses are measured on qualitative scales. Often questions are answered with yes/no, absent/present, positive/neutral/negative, or poor/good/very good. In other cases farmers are asked to give a score on a numeric scale. Analyses used so far, based on models which assume constant variance and normal distributions, would not seem appropriate.

An earlier session described the general ideas of dealing with non-normal distributions using generalized linear models. These are our main tool for tackling this complication.

The simplest situation is when the answer can only be one out of two possibilities (e.g. yes/no, absent/present, failure/success, false/true). In this case we have a binary response, which can take one of two values, conveniently coded 0 and 1.

Categorical data can also consist of more than two possibilities. Sometimes those categories do not have any particular order. For example, household type may have three classes: male headed (male present), male headed (male absent) and female headed. If these are labelled with numbers, the labels themselves are meaningless and the information in the data would be the same if they were changed. Household type is called a nominal categorical response.

In other cases there will be some order in the various categories. Think about answers that can be "never", "seldom" or "often", or that can be "negative", "neutral" or "positive". In these cases the answers are structured in a logical order that cannot be changed. Such responses are called ordinal or ordered categorical responses.

For simple cases of these categorical response variables, useful descriptive summaries can be constructed as tables of counts or percentages, with the response on one margin and a classifying variable (such as treatment) on the other. Examples were used in an earlier session. It is harder to find good summaries when we are interested in the relationship between a categorical variable and continuous 'explanatory' variables.






A cross-tabulation of counts is sometimes referred to as a contingency table. For two-way contingency tables a simple confirmatory test exists, the Chi-square test, but the limits of the applicability of this method are soon reached, as we will see. For more general problems, a modelling approach is used. The logistic regression model introduced in an earlier session is explored further in this session, as it is widely useful and is the basis for other developments.

Contingency tables

We said in an earlier session that cross-tabulations, or contingency tables, often give sufficient information for our report without the need for formal analysis. Sometimes, though, a more formal test is needed, and we consider this here.

As an example, look at the most important benefit of improved fallows reported by men and women. A tabulation of the data, using either Excel or Genstat, would produce the following summary.

Observed values
[Table: rows F and M; columns crop, seed, soil, weed, wood, total. The counts are missing from the source.]

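These data are relevant to the question 'Do the genders differ in their perception of the most important benefit of improved fallows?' The patterns in the table are easier to see if the data are presented as percentages.

[Table: percentages for F and M by benefit (crop, seed, soil, weed, wood, total). The values are missing from the source.]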
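As a rough illustration of presenting counts as percentages, the sketch below converts each row of a two-way table into percentages of its row total. The counts are hypothetical, made up only for this example (they are not the study's data), and the names `benefits`, `observed` and `row_percentages` are likewise assumptions.

```python
# Hypothetical counts, NOT the study's data: rows are sexes, columns are benefits.
benefits = ["crop", "seed", "soil", "weed", "wood"]
observed = {
    "F": [12, 6, 10, 14, 3],
    "M": [9, 2, 8, 4, 7],
}


def row_percentages(counts):
    """Return each row's counts as percentages of that row's total."""
    result = {}
    for group, row in counts.items():
        total = sum(row)
        result[group] = [100.0 * n / total for n in row]
    return result


percentages = row_percentages(observed)
for group, row in percentages.items():
    cells = "  ".join(f"{name}={pct:5.1f}%" for name, pct in zip(benefits, row))
    print(f"{group}: {cells}")
```

Percentages within each sex put the two rows on the same footing even when the numbers of men and women interviewed differ.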

It appears, for example, that women value the benefits to weeding more than men do, which may be expected as they do much of the weeding. On the other hand, more men value the wood produced in the fallow. Now a confirmatory test can be done to check whether these patterns could just be due to sampling variation. A Chi-square test is appropriate.

This Chi-square test calculates, for each cell in the table, the frequency that is expected for each category, assuming no difference between males and females. A comparison is then made between the observed frequencies and the expected frequencies, and the further these two are apart, the more convincing the evidence is to reject the hypothesis.

Totals are included in the table above to show how expected values can be calculated. The table of expected values is given below, calculated as follows. If there is no sex effect, then the proportion who say "crop" is estimated to be the total number of "crop" responses divided by the total number of respondents.






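Hence we would expect that, of the females in the study, this proportion multiplied by the number of females would respond with "crop". And so on for the rest of the cells in the table.

Expected values
[Table: rows F and M; columns crop, seed, soil, weed, wood. The values are missing from the source.]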
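The calculation of expected values described above can be sketched as follows. The counts are hypothetical (they are not the study's data) and `expected_counts` is a name chosen for this illustration. Each expected count is the row total multiplied by the column total, divided by the grand total.

```python
# Hypothetical observed counts (NOT the study's data).
benefits = ["crop", "seed", "soil", "weed", "wood"]
observed = [
    [12, 6, 10, 14, 3],  # F
    [9, 2, 8, 4, 7],     # M
]


def expected_counts(table):
    """Expected counts under 'no difference between rows'.

    Each cell is row total * column total / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for label, row in zip(["F", "M"], expected):
    print(label, "  ".join(f"{name}={value:.1f}" for name, value in zip(benefits, row)))
```

With these made-up counts, 21 of the 75 respondents say "crop", a proportion of 0.28, so of the 45 females we would expect 45 x 0.28 = 12.6 to respond with "crop".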

The test then compares the observed and expected values, a large discrepancy being evidence against the hypothesis used to calculate expected values.

*Warning: some cells in the table of expected values have less than 5
Pearson chi-square value is 8.98 with 4 df.
Probability level (under null hypothesis) p = 0.062
Likelihood chi-square value is 9.71 with 4 df.
Probability level (under null hypothesis) p = 0.046
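As a check on the output above, the sketch below recomputes the two probability levels from the reported chi-square values and their 4 degrees of freedom. It also shows the standard formulas for the Pearson and likelihood statistics. This is not Genstat's own code, and the function names are assumptions made for illustration. The tail probability uses a closed form that holds only for an even number of degrees of freedom.

```python
import math


def pearson_statistic(observed, expected):
    """Pearson chi-square: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))


def likelihood_statistic(observed, expected):
    """Likelihood chi-square: 2 * sum over cells of O * ln(O / E).

    Cells with an observed count of zero contribute nothing.
    """
    return 2.0 * sum(o * math.log(o / e)
                     for obs_row, exp_row in zip(observed, expected)
                     for o, e in zip(obs_row, exp_row) if o > 0)


def chi_square_upper_tail(x, df):
    """P(chi-square with df degrees of freedom > x), for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Statistics and degrees of freedom as reported in the output above.
p_pearson = chi_square_upper_tail(8.98, 4)
p_likelihood = chi_square_upper_tail(9.71, 4)
print(f"Pearson p = {p_pearson:.3f}")        # prints 0.062
print(f"Likelihood p = {p_likelihood:.3f}")  # prints 0.046
```

The 4 degrees of freedom follow from the shape of the table: (2 - 1) x (5 - 1) = 4 for two sexes and five benefits.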

Many researchers will be familiar with Pearson's chi-square statistic, which is the commonly used form of the Chi-square test. What is less well known, though, is the other form which can be used: the maximum likelihood statistic. Both tests focus on the comparison of the observed and expected data under an assumption of no difference between males and females. It is therefore not surprising that they have similar chi-square values (at around 9) and similar p-values (close to the 5% significance level). There is no advantage in using one as opposed to the other. The Pearson statistic is the traditionally used one because the hand calculation is easy, but most software packages now give both.

The Genstat warning points out one limitation of the chi-square test: the expected value of a particular cell in the contingency table should not be too small. As a rule, the test will be valid provided that fewer than 20% of cells have an expected count below 5 and none are below 1. The above warning is highlighting the low expected count for the 'seed' benefit for men when no difference between men and women is assumed. This is one cell out of 10 (10%), and all other cells have expected counts greater than 5, so the test is acceptable. This restriction about expected cell counts implies that the chi-square test is only useful for large data sets or for crude categorisations of data.

The low p-values indicate that the hypothesis of no difference between men and women is difficult to sustain, i.e. there is evidence to suggest that there is a difference between the genders. From the contingency table we see that this is caused by the difference between the genders when it comes to both weeds and wood. For both of these benefits, the observed data are quite far from what one would expect if there was no difference between men and women.

The chi-square statistic and the accompanying p-value make only a general statement concerning all categories. This situation is similar to when we used ANOVA and its F-test. For ANOVA, a more detailed analysis using s.e.d.s had to be used to look at particular differences. In the case of contingency tables this will have to be done by analysing sub-tables, for instance one that leaves out the category 'weed'.

The problem with these Chi-square tests is that they are only applicable to simple two-way tables. Unfortunately this rarely corresponds to our analysis objectives. In an earlier session we saw that we needed a multiway table to relate use of rock phosphate to the sex of the farmer and their prior experience with the use of fertilisers. More generally we might want to relate our response to a continuous explanatory characteristic, such as farm size or amount of fertiliser used.

In the remaining sections we see how to analyse categorical dependent variables when the structure of the data is more than just a simple comparison of groups. We begin with binary responses and then consider how to deal with responses where there are more than two categories to choose from.
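To close this section, the rule of thumb on expected cell counts discussed above can be sketched as a small check. The thresholds are passed as parameters; their defaults (5, 1 and 20%) are the commonly quoted rule-of-thumb values and should be read as assumptions. The expected counts are hypothetical, chosen only so that one cell out of ten falls below 5, and the function name is made up for this example.

```python
def expected_count_check(expected, small=5.0, tiny=1.0, max_share=0.20):
    """Summarise how many expected counts are small.

    small, tiny and max_share default to commonly quoted rule-of-thumb
    values (assumptions): the test is treated as acceptable when fewer
    than max_share of cells are below `small` and none is below `tiny`.
    """
    cells = [e for row in expected for e in row]
    n_small = sum(1 for e in cells if e < small)
    n_tiny = sum(1 for e in cells if e < tiny)
    share_small = n_small / len(cells)
    return {
        "cells": len(cells),
        "below_small": n_small,
        "below_tiny": n_tiny,
        "share_small": share_small,
        "acceptable": share_small < max_share and n_tiny == 0,
    }


# Hypothetical expected counts for a 2 x 5 table (NOT the study's values).
expected = [
    [15.0, 6.0, 12.0, 12.0, 9.0],  # F
    [10.0, 4.0, 8.0, 8.0, 6.0],    # M
]
result = expected_count_check(expected)
print(result)
```

Here one cell in ten (10%) has an expected count below 5 and none is below 1, so the check reports the test as acceptable.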

Modelling Binary Responses

Many hypotheses or research questions concern some binary response, such as whether or not improved fallows are used, and how this relates to a set of continuous or discrete explanatory variables. For example, in our trial we might want to investigate how large the plot under improved fallow can become (acreage) before it starts to give the farmers problems with labour. This could be analysed by relating the indication of the labour problem (yes/no) to the plot size.

Another common situation is when we want to evaluate factors associated with a certain behaviour, for example "how do users of improved fallow differ from non-users?" This would involve relating several explanatory variables to the user/non-user response.

To do this we move to fitting models to data, bearing in mind that our dependent variable is no longer continuous. The general principles of model fitting still apply. With continuous data we used means and standard errors to summarise the pattern in our responses, then went back to the raw data to fit a model which assumed an underlying Normal distribution. Here we use multiway contingency tables to summarise the patterns in our binary responses, but go back to the raw data to model the effects of various features on this response.

Let us use the example of "the determinants for having improved fallow" to illustrate the analysis of binary data. The raw data are depicted below, each row representing a different farm.






The variable presIF is our binary response: it takes the value 1 if improved fallow is used and 0 if it was not used. The variable luo records the ethnic group (one of two codes, indicating the Luo or Luhya ethnic group) and natfal the use of natural fallow (one of two codes, indicating the absence or presence of traditional fallow). The continuous variable farmsize gives the size of the farm in hectares.

Consider the simple question "Is the use of improved fallows related to ethnicity?", which can be addressed by the contingency table:

        Luo   Luhya   Total
No