
Empirical Validation of a Usability Inspection Method for Model-Driven Web Development

Adrian Fernandez (a, 1), Silvia Abrahão (a), Emilio Insfran (a)

(a) ISSI Research Group, Department of Information Systems and Computation, Universitat Politècnica de València, Camino de Vera, s/n, 46022, Valencia, Spain.

Abstract

Web applications should be usable in order to be accepted by users and to improve their success probability. Despite the fact that this requirement has promoted the emergence of several usability evaluation methods, there is a need for empirically validated methods that provide evidence about their effectiveness and that can be properly integrated into early stages of Web development processes. Model-driven Web development processes have grown in popularity over the last few years, and offer a suitable context in which to perform early usability evaluations due to their intrinsic traceability mechanisms. These issues have motivated us to propose a Web Usability Evaluation Process (WUEP) which can be integrated into model-driven Web development processes. This paper presents a family of experiments that we have carried out to empirically validate WUEP. The family of experiments was carried out by 64 participants, including PhD and Master's computer science students. The objective of the experiments was to evaluate the participants' effectiveness, efficiency, perceived ease of use and perceived satisfaction when using WUEP in comparison to a widely used industrial inspection method: Heuristic Evaluation (HE). The statistical analysis and meta-analysis of the data obtained separately from each experiment indicated that WUEP is more effective and efficient than HE in the detection of usability problems. The evaluators were also more satisfied when applying WUEP, and found it easier to use than HE. Although further experiments must be carried out to strengthen these results, WUEP has proved to be a promising usability inspection method for Web applications which have been developed by using model-driven development processes.

Keywords: Usability inspection, Web applications, Model-driven development, Family of experiments

1 Corresponding author: Tel: +34 96 387 73 50 (Ext: 83525); Fax: +34 96 387 73 59. Email addresses: [email protected] (Adrian Fernandez), [email protected] (Silvia Abrahão), [email protected] (Emilio Insfran).

1. INTRODUCTION

Web applications play an important role in business activities, information exchange, and social networks. The acceptability of Web applications relies on the ease or difficulty that users experience with this kind of system [Matera et al. 2006]. Usability is therefore considered to be one of the most important quality factors for Web applications [Offut 2002]. The challenge of developing more usable Web applications has led to the emergence of usability evaluation methods with which to address Web usability. These methods can be principally classified into two different types: empirical methods and inspection methods. Empirical methods are based on observing, capturing and analyzing usage data from real end-users, whereas inspection methods are performed by expert evaluators or Web designers, and are based on reviewing the usability aspects of Web artifacts (e.g., mockups, conceptual models, user interfaces) with regard to their conformance with a set of guidelines.

The employment of usability evaluation methods to evaluate Web artifacts was investigated through a systematic mapping study in previous work [Fernandez et al. 2011a]. This study revealed various findings, such as:

a) There is a lack of usability evaluation methods that can be properly integrated into the early stages of Web development processes.
b) There is a shortage of usability evaluation methods that have been empirically validated.

These results, and particularly finding 'a)', motivated us to propose the Web Usability Evaluation Process (WUEP) in previous research [Fernandez et al. 2011b]. WUEP is an inspection method which can be instantiated and integrated into different model-driven Web development (MDWD) processes. Most MDWD processes break the Web application design up into three dimensions: content, navigation and presentation. These dimensions allow proper levels of abstraction to be established [Casteleyn et al. 2009]. An MDWD process basically transforms models that are independent of technological implementation details (i.e., Platform-Independent Models, PIMs), such as structural models, navigational models or abstract user interface (UI) models, into other models that contain specific aspects of a particular technological platform (i.e., Platform-Specific Models, PSMs), such as specific user interface models or database schemas. This is done automatically by applying transformation rules. PSMs can be automatically compiled to generate the source code of the final Web application. In this respect, evaluations of these models can provide early usability evaluation reports with which to identify problems that would appear in the final Web application and that can consequently be corrected prior to the generation of the source code.

With regard to finding 'b)', we presented in a short paper [Fernandez et al. 2010] a controlled pilot experiment as an initial step in the empirical validation of the Web Usability Evaluation Process (WUEP). However, replications are necessary to provide more evidence about the validity of these results. The concept of replication is extended to the "family of experiments" reported by Basili et al. [1999]. A family is composed of multiple similar experiments that pursue the same goal in order to build the knowledge needed to extract significant conclusions. This motivated the main contribution of this paper, which is a family of three controlled experiments aimed at empirically validating the Effectiveness, Efficiency, Perceived Ease of Use, and Perceived Satisfaction of Use of WUEP when compared with a representative inspection method that is commonly applied in industry: Heuristic Evaluation (HE) [Nielsen 1994]. These controlled experiments were conducted with Computer Science Master's degree students from the Universitat Politècnica de València (UPV) in Spain. In addition, we have also performed a meta-analysis in order to aggregate the results obtained in the individual experiments and provide more general conclusions.

The paper is structured as follows. Section 2 discusses related work in the empirical validation of inspection methods with which to evaluate Web usability. Section 3 briefly introduces the inspection methods (WUEP and HE) which were used in the family of experiments. Section 4 presents the family of experiments. Section 5 provides details of the individual designs of each experiment. Section 6 reports and analyzes the results obtained from each experiment. Section 7 summarizes the results of the family of experiments and presents a meta-analysis that provides a global analysis of the individual experiments. Section 8 discusses possible threats to validity. Finally, Section 9 presents our conclusions and final remarks.

2. RELATED WORK

Since the late 1980s, usability inspection methods have emerged as a cost-effective alternative to empirical methods for identifying usability problems [Cockton et al. 2003]. In this context, several inspection methods (e.g., Heuristic Evaluation, Cognitive Walkthrough) were proposed by usability experts from the Human Computer Interaction (HCI) field. Since the term "Web Engineering" was first published in 1997 [Gellersen et al. 1997], these existing HCI methods have been adapted and improved in order to be applied to Web applications, and other new usability evaluation methods specifically crafted for the Web domain have also appeared. In this section, we discuss related works that report on empirical validations and comparisons of usability inspection methods for Web applications.

2.1 Empirical Studies Involving Usability Inspection Methods for Traditional Web Development

Several empirical studies with which to validate the performance of usability inspection methods have been reported. These studies can be classified into two types according to their aim: a) empirical studies that were intended to perform comparative studies involving well-known usability inspection methods in order to guide researchers and practitioners, and b) empirical studies that were intended to empirically validate a specific usability inspection method which had been specifically proposed for the Web domain.

The following representative examples of comparative studies involving well-known usability inspection methods should be highlighted:

• Hvannberg et al. [2007] reported an experiment in which two usability inspection methods were compared: Heuristic Evaluation and Gerhardt-Powals Principles. A within-subjects experimental design was applied to evaluate the usability of a Web portal. The study found that there were no significant differences between the two methods as regards their effectiveness and efficiency in the specified context.
• Koutsabasis et al. [2007] reported a case study in which the effectiveness of four usability evaluation methods was compared. Participants were divided into 9 groups, of which 3 and 2 groups of participants applied the Heuristic Evaluation and Cognitive Walkthrough inspection methods, respectively, and 3 and 1 groups of participants applied two empirical methods: Think-aloud protocol and Co-discovery Learning, respectively. The Co-discovery Learning method was found to be slightly more effective than the others.
• Ssemugabi and De Villiers [2007] reported a case study whose aim was to investigate the extent to which Heuristic Evaluation identifies usability problems in a Web-based learning application by comparing the results with those of Survey Evaluations among end-users. The Heuristic Evaluation performed by four expert evaluators proved to be an appropriate and effective usability evaluation method for e-learning applications.
• Tan and Bishu [2009] reported an experiment in which Heuristic Evaluation was compared to User Testing. Although Heuristic Evaluation was able to identify more usability problems, there were no significant conclusions regarding the effectiveness and efficiency of the two methods since they aimed to evaluate different aspects of the Web application.

Most of the aforementioned empirical studies presented comparisons between usability inspection methods and empirical methods. It is important to highlight that these kinds of comparisons are useful for practitioners in that they provide guidance in the selection of proper usability evaluation methods in a specific context. However, we argue that usability inspection methods should be compared to other usability inspection methods, since empirical methods tend to evaluate usability aspects discovered during user interaction rather than usability aspects discovered in Web artifacts.

The following representative examples of empirical validations of a specific usability inspection method which had been specifically proposed for the Web domain should be highlighted:

• Costabile and Matera [2001] presented the empirical validation of the Systematic Usability Evaluation (SUE) method, which employed operational guidelines called Abstract Tasks. Two experiments involving 26 and 20 novice evaluators, respectively, were conducted. The first experiment confirmed that the SUE method enhanced the effectiveness and efficiency of the usability evaluation, along with the evaluators' satisfaction. The second experiment aimed to predict the number of evaluators needed to achieve a certain percentage of usability problems detected.
• Chattratichart and Brodie [2004] presented the empirical validation of the Heuristic Evaluation Plus method (HE-Plus), which is an extended version of the Heuristic Evaluation (HE) [Nielsen 1994]. The experiment consisted of two groups containing five participants each, which were randomly assigned to the two methods. The results showed that HE-Plus was more effective than HE.
• Hornbæk and Frøkjær [2004] presented the empirical validation of the Metaphor of Human-Thinking method (MOT). The experiment compared the proposed method with the Cognitive Walkthrough method. Evaluators applied both methods in a different order. The results showed that the participants were more effective in the detection of usability problems when using MOT. In addition, it achieved a broader coverage in the type of usability problems detected.
• Blackmon et al. [2005] presented the empirical validation of the Cognitive Walkthrough for the Web method (CWW). The experiment showed that CWW was more effective than the Cognitive Walkthrough method on which it is based, and it also considered CWW to be an effective inspection method with which to repair usability problems related to unfamiliar and confusable links.
• Conte et al. [2009] presented the empirical validation of the Web Design Perspectives method (WDP), which defines a set of heuristics by considering four different perspectives of a Web application: conceptual, structural, navigation and presentation. Two experiments that pursued different goals were performed in order to refine the approach. The results of the first experiment showed that WDP was a feasible method with which to detect usability problems, whereas the second experiment showed that WDP was more effective when it was compared to Nielsen's Heuristic Evaluation.
• Malak and Sahraoui [2010] presented the definition and empirical validation of a probabilistic approach for building Web quality models in order to manage uncertainty and subjectivity, which are inherent to quality evaluation. This approach was instantiated to evaluate the navigability of Web applications, which is considered to be a relevant sub-characteristic of usability [Leavit and Shneiderman 2006]. The results of an experiment showed that the scores given by the proposed model are strongly correlated with navigability as perceived by the user.

Although the aforementioned empirical studies present the empirical validation of usability inspection methods, the majority of them tend to present isolated empirical studies with no replications that could support a meta-analysis aimed at aggregating empirical evidence from individual studies. This fact was also evidenced in a systematic review on the effectiveness of Web usability evaluation methods performed in previous work [Fernandez et al. 2012]. In addition, although there are some empirical studies, such as those by Hornbæk [2006] and Hornbæk and Law [2007], in which a meta-analysis of usability measures is presented, these studies are aimed at evaluating the usability of a user interface (i.e., the usability experienced when interacting with a software product) rather than the usability of the evaluation method itself (i.e., the usability experienced by a usability evaluator during the employment of the evaluation method). Nevertheless, these studies provide evidence of the importance of understanding the relation between usability metrics in order to select the right metrics for usability studies.

In addition, most of the empirical studies comparing usability inspection methods only consider objective variables related to the methods' employment (mainly their effectiveness). Although objective dependent variables such as effectiveness and efficiency are relevant, subjective dependent variables related to the evaluator's perceptions should also be considered, since they likewise contribute to the acceptance of the usability inspection method in practice.

2.2 Empirical Studies Involving Usability Inspection Methods for Model-driven Web Development

Studies such as that of Juristo et al. [2007] claim that usability evaluations should also be performed during the early stages of the Web development process in order to improve user experience and decrease maintenance costs. We argue that model-driven Web development (MDWD) processes provide an appropriate context in which to conduct early usability evaluations, since models which are applied at all stages can be evaluated throughout the entire Web development process. Despite the fact that several MDWD processes have been proposed since the late 2000s, and are still evolving [Valderas and Pelechano 2011], few works address usability evaluations in model-driven Web development (e.g., Abrahão and Insfran [2006], Sottet et al. [2007], and Molina and Toval [2009]). Consequently, few empirical studies have been presented in this context. Some examples are Abrahão et al. [2007] and Panach et al. [2008].

Abrahão et al. [2007] present an empirical study which evaluates the user interfaces that were generated automatically by a model-driven development tool. This study applies two usability evaluation methods: an inspection method (i.e., Action Analysis [Olson and Olson 1990]) and an empirical method (i.e., User Testing), with the aim of comparing what types of usability problems are detected in the user interfaces and what their implications are for transformation rules and platform-independent models. However, the usability evaluation methods employed were not adapted to be applied to Web artifacts, and no dependent variables were defined in order to compare the performance of the two methods.

Panach et al. [2008] extended the usability model proposed in Abrahão and Insfran [2006], which decomposes usability into measurable attributes that are applied to software products obtained as a result of a model-driven development process. The aim was to provide metrics with which to evaluate the understandability of Web applications (i.e., a usability sub-characteristic) and to aggregate the values obtained in order to provide attribute indexes. These indexes were compared to the perception of these same attributes by end users. However, the empirical validation was based on correlations between metric calculation and attribute perception. Moreover, it did not consider any performance measure of method usage. As indicated by Hornbæk [2010], when assessing the quality of usability evaluation methods, not only the count of usability problems detected should be considered, but also the evaluators' observations and satisfaction with the methods under evaluation.

2.3 Discussion

The analysis of the aforementioned studies has allowed us to detect some limitations in the empirical validation of usability inspection methods, such as: 1) the low number of empirical studies, particularly in the context of model-driven Web development; 2) the lack of frameworks and standard criteria for the comparison of usability evaluation methods; and 3) the fact that the majority of empirical validations tend to be isolated and not replicated.

The first limitation is in line with the results of our systematic mapping study, which revealed that only 44% of Web usability studies have reported empirical validations of the proposed and/or employed usability evaluation methods [Fernandez et al. 2011a]. This study showed that experiments were one of the most frequently employed types of empirical methods used for validation purposes, since they provide a high level of control and are useful for comparing usability evaluation methods in a more rigorous manner. However, the majority of these experiments involved usability inspection methods that are oriented towards traditional Web development processes, and usability evaluations therefore principally took place in the later stages of the Web development process.

The second limitation is in line with studies such as that of Gray and Salzman [1998], in which it is claimed that most of the experiments based on comparisons of usability evaluation methods do not clearly identify which aspects of these methods are being compared. This issue was also detected by Hartson [2003], in which several studies were analyzed in order to determine which measures had been used in the validation of usability evaluation methods. The majority of these studies evaluated the effectiveness of usability evaluation methods using the thoroughness metric (i.e., the ratio between the number of real usability problems found and the total number of real usability problems). This study also claimed that the majority of these comparative studies did not provide the descriptive statistics needed to perform a meta-analysis of the empirical findings extracted from different sources.

The third limitation is in line with studies that have been performed in the Software Engineering field, such as that of Sjøberg et al. [2005]. This work claims that only 20 out of 113 controlled experiments are replications. A replication is the repetition of an experiment to confirm findings or to ensure accuracy. There are two types of replications: close replications, also known as strict replications (i.e., replications that attempt to keep almost all the known experimental conditions much the same or at least very similar), and differentiated replications (i.e., replications that introduce variations in essential aspects of the experimental conditions, such as executions of replications with different kinds of participants) [Lindsay and Ehrenberg 1993]. Both types of replications are necessary to achieve greater validity in the results obtained through empirical studies. Dealing with experimental replications has been addressed by the concept of the family of experiments. Although many empirical studies of this type have been applied in the Software Engineering field (e.g., Cruz-Lemus et al. [2011]; Abrahão et al. [2011]), few families of experiments have been reported in the Web Engineering field (e.g., Abrahão and Poels [2009]).

Another issue also appears which is specific to the Web Engineering field: the majority of empirical studies cannot be considered to be methodologically rigorous. A systematic review presented by Mendes [2005] was performed to determine the rigor of claims of Web Engineering research. This review demonstrated that only 5% should be considered as rigorous. It also found that numerous Web Engineering papers used incorrect terminology (e.g., they used the term experiment rather than experience report, or the term case study rather than proof of concept).

3. METHODS EVALUATED

The methods evaluated through the family of experiments were two inspection methods: our proposal (WUEP) and the Heuristic Evaluation (HE) proposed by Nielsen [1994]. An overview of both methods is presented in the following subsections. The rationale for selecting HE as the method against which to compare our proposal is based on the following statements:

• WUEP should be compared with another inspection method, since these methods allow us to evaluate Web artifacts that are produced during the early stages of the Web development process. Empirical methods which involve the participation of real users and are often used after development to assess a design are therefore discarded (e.g., User Testing or End-user Questionnaires). In this work, we are thus interested in comparing WUEP against another method that can be applied to obtain formative evaluations (i.e., evaluations carried out during development to improve a design).
• HE is one of the best-known inspection methods. This allows us to gather more accurate information about its employment [Hollingsed and Novick 2007].
• HE is one of the most widely-used evaluation methods in industry. For instance, half of the ten Web intranets that won a 2005 competition used this method [Nielsen 2005].
• HE covers a broader range of usability aspects than other inspection methods such as, for instance, Cognitive Walkthroughs, whose usability definition is more focused on ease of navigation.
• HE has provided useful results when used to conduct Web usability evaluations [Sutcliffe 2002; Allen et al. 2006; Ssemugabi and De Villiers 2007].
• HE has often been used for comparison with other inspection methods [Costabile and Matera 2001; Chattratichart and Brodie 2004; Conte et al. 2009].
• No usability evaluation method has been previously defined to be applied in model-driven Web development processes. Since there is currently no standard inspection method for conducting Web usability evaluations, we cannot evaluate WUEP against a control method.

3.1 Web Usability Evaluation Process (WUEP)

Th e Web Usa bilit y E va lu a t ion P r ocess (WUE P ) [F er n a n dez et a l. 2011b] ext en ds a n d a da pt s t h e qu a lit y eva lu a t ion pr ocess pr oposed in t h e ISO 25000 st a n da r d (SQu a RE ) [2005] wit h t h e a im of in t egr a t in g u sa bilit y eva lu a t ion s in t o m odel-dr iven Web developm en t pr ocesses. WUE P em ploys a Web Usa bilit y Model [F er n a n dez et a l.

2009] t h a t decom poses t h e u sa bilit y con cept in t o su b -ch a r a ct er ist ics a n d m ea su r a ble a t t r ibu t es. Met r ics wit h a gen er ic defin it ion a r e a ssocia t ed wit h t h ese a t t r ibu t e s in or der for t h em t o be oper a t ion a lized a t differ en t a bst r a ct ion levels (P la t for m In depen den t Models, P la t for m -Specific Models a n d fin a l User In t er fa ces) in a n y m odel-dr iven Web developm en t pr ocess. Th e Web Usa bilit y Model (in clu din g a ll t h e su b-ch a r a ct er ist ics a t t r ibu t es a n d t h eir a ssocia t ed gen er ic m et r ics ) is a va ila ble a t h t t p://www.dsic.u pv.es/~a fer n a n dez/ WebUsa bilit yModel. Th e a im of a pplyin g m et r ics wa s t o r edu ce t h e su bject ivit y in h er en t t o exist in g in spect ion m et h ods. It is im por t a n t t o n ot e t h a t by a pplyin g m et r ics, t h e eva lu a t or s in spect t h ese a r t ifa ct s in or der t o det ect pr oblem s r ela t ed t o t h e u sa bilit y for en d u ser s bu t n ot r ela t ed t o t h e u sa bilit y of m odel-dr iven a r t ifa ct s t h em selves. Th er efor e, in spect ion of t h ese m odels (by con sider in g t h e t r a cea bilit y a m on g t h em ) a llows t h e sou r ce of t h e u sa bilit y pr oblem t o be discover ed a n d fa cilit a t es t h e pr ovision of r ecom m en da t ion s t o cor r ect t h ese pr oblem s du r in g t h e ea r lier st a ges of t h e Web developm en t pr ocess. In ot h er wor ds, we a r e r efer r in g t o a Web a pplica t ion t h a t ca n be u sa ble by con st r u ct ion [Abr a h ã o et a l. 2007]. F igu r e 1 sh ows a n over view of t h e m a in st a ges of WUE P in wh ich t h r ee r oles a r e in volved: eva lu a t ion design er , eva lu a t or , a n d Web developer . 
The evaluation designer performs the first three stages: 1) Establishing the requirements of the evaluation; 2) Specification of the evaluation; and 3) Design of the evaluation. The evaluator performs the fourth stage: 4) Execution of the evaluation, and the Web developer performs the last stage: 5) Analysis of changes. A brief description of each stage is provided as follows:

1. In the establishment of the evaluation requirements stage, the scope of the evaluation is defined by a) establishing the purpose of the evaluation; b) specifying the evaluation profiles (type of Web application, Web development method employed, context of use); c) selecting the Web artifacts to be evaluated; and d) selecting the usability attributes from the Web usability model which are going to be evaluated.
2. In the specification of the evaluation stage, the metrics associated with the selected attributes are operationalized in order for them to be applied to the Web artifact to be evaluated. This operationalization consists of establishing a mapping between the generic description of the metric and the concepts that are represented in the Web artifacts (modeling primitives in models or UI elements in the final Web application). In addition, rating levels are established for ranges of values obtained for each metric by considering their scale type and the guidelines related to each metric whenever possible. These rating levels provide a classification of usability problems based on their severity: low, medium, or critical. It is important to note that the operationalization needs to be performed only once for a given Web development method, and can be reused in further evaluations that involve Web applications from the same Web development method.
3. In the design of the evaluation stage, the template for usability reports is defined and the evaluation plan is elaborated (e.g., number of evaluators, evaluation restrictions).
4. In the execution of the evaluation stage, the evaluator applies the operationalized metrics to the selected artifacts in order to detect usability problems by considering the rating levels of each metric. An example of this metric application can be found in Appendix A.1.
5. In the analysis of changes stage, all the usability problems detected are analyzed in order to propose changes with which to correct the affected artifacts from a specific stage of the Web development process. The changes are applicable to the previous intermediate Web artifacts (i.e., platform-independent models, platform-specific models and model transformations if the evaluation is performed on the final Web user interface).

3.2 Heuristic Evaluation (HE)

The Heuristic Evaluation (HE) method requires a group of evaluators to examine Web artifacts (commonly user interfaces) in compliance with commonly-accepted usability principles called heuristics. HE proposes ten heuristics that are intended to cover the best practices in the design of any user interface (e.g., minimize the user workload, error prevention, recognition rather than recall). In order to facilitate both the method application and the method comparison, we have structured the method in the same main stages provided by WUEP.

Figure 2 shows an overview of these stages in which three roles are also involved: evaluation designer, evaluation executor and Web developer. The evaluation designer performs the first three stages: 1) Establishing the requirements of the evaluation; 2) Specification of the evaluation; and 3) Design of the evaluation. The evaluator performs the fourth stage: 4) Execution of the evaluation, and the Web developer performs the last stage: 5) Analysis of changes. A brief description of each stage is provided as follows:

1. In the establishment of the evaluation requirements stage, the scope of the evaluation is defined by: a) establishing the purpose of the evaluation; b) specifying the evaluation profiles (type of Web application, Web development method employed, context of use); and c) selecting the Web artifacts to be evaluated.
2. In the specification of the evaluation stage, the ten heuristics are described in detail by providing guidelines about which elements from the selected artifacts can be affected by each heuristic.
3. In the design of the evaluation stage, the template for usability reports is defined (e.g., structured reports or verbalized findings), and the evaluation plan is elaborated (e.g., number of evaluators, mechanisms to aggregate results, evaluation restrictions).
4. In the execution of the evaluation stage, the evaluator applies the heuristics to the selected artifacts (when their expressiveness allows the heuristic to be applied) in order to detect usability problems. An example of this heuristic application can be found in Appendix A.2.
5. In the analysis of changes stage, all the usability problems detected are analyzed in order to propose changes with which to correct the affected artifacts.

4. THE FAMILY OF EXPERIMENTS

An increasing understanding exists that empirical studies are needed to create, improve, or assess processes, methods, and tools for software development [Basili et al. 1986; Basili 1996; Fenton 1993], maintenance [Colosimo et al. 2009; Dzidek et al. 2008], and quality evaluation [Bolchini and Garzotto 2007]. An empirical study is generally an act or operation by which to discover something that is unknown, or to test hypotheses [Basili 1993]. Research strategies include controlled experiments, qualitative studies, surveys, and archival analyses [Juristo and Moreno 2001; Wohlin et al. 2000]. However, replications of these studies are necessary if their results are to achieve greater validity [Shull et al. 2008; Kitchenham 2008]. In this respect, the “family of experiments” as an empirical research methodology has arisen with the aim of extracting significant conclusions from multiple similar experiments that pursue the same goal.

In this section, we present the family of experiments that we performed to empirically validate WUEP. This empirical study is also intended to contribute to Software Engineering research through proposing a well-defined framework that can be reused by other researchers in the empirical validation of their usability evaluation methods. The research methodology adopted is an extension of the five steps proposed by Ciolkowski et al. [2002], in which the fifth step, “Family data analysis”, has been replaced with “Family data analysis and meta-analysis”, and it was guided by the experimental process of Wohlin et al. [2000].

4.1 Step 1. Experiment Preparation

The experiment was prepared by carrying out the following steps: 1) the establishment of the goal of the family of experiments; 2) the selection of variables; 3) the formulation of hypotheses; and 4) the experimental design, which all the individual experiments have in common. These issues are described in the following subsections.

4.1.1. Goal of the family of experiments. According to the Goal-Question-Metric (GQM) paradigm [Basili and Rombach 1988], the goal of our family of experiments is to analyze the Web Usability Evaluation Process (WUEP) in order to evaluate it with regard to its effectiveness, efficiency, perceived ease of use, and perceived satisfaction in comparison to the Heuristic Evaluation (HE) from the viewpoint of a set of usability inspectors. This experimental goal will also allow us to show the feasibility of our approach when it is applied to Web artifacts from a model-driven Web development process, in addition to detecting issues that can be improved in future versions of WUEP.

4.1.2. Independent and Dependent Variables. There are two independent variables in the family of experiments:
• The evaluation method, with nominal values: WUEP and HE.
• The experimental objects (collection of Web artifacts) to which both methods are applied, with nominal values: O1 and O2. A detailed description of these experimental objects is provided in Section 4.2.1.

There are two objective dependent variables, which were selected by considering works such as Hartson et al. [2000] and Gray and Salzman [1998]:
• Effectiveness, which is calculated as the ratio between the number of usability problems detected and the total number of existing (known) usability problems. We consider one usability problem as one defect that can be found in different artifacts independently of its severity level and its total number of occurrences.
• Efficiency, which is calculated as the ratio between the number of usability problems detected and the total time spent on the inspection process.

The measurement of these variables involves several issues. Since the experimental objects have been extracted from a real Web application, it is not possible to anticipate all the existing problems in the artifacts to be evaluated. For this reason, a control group (formed of two independent evaluators who are experts in usability evaluations and one of the authors of this paper) was created in order to provide a baseline of usability problems by applying an Expert Evaluation as an ad-hoc inspection method based on their own expertise. Since this baseline may be biased by the evaluators' expertise, we only considered this baseline as an initial set of usability problems which could evolve by adding the new usability problems detected by the participants. For this reason, this control group was also responsible for determining whether the usability problems reported by the participants in each experiment were false positives (no real usability problems), whether the usability problem had already been detected by a participant in the whole experimental object (replicated problems), or whether there were new usability problems that needed to be added to the baseline (increasing the total number of existing usability problems). Disagreements among control group members were resolved by consensus.

There are also two subjective dependent variables, which were based on constructs from the Technology Acceptance Model (TAM) [Davis 1989], since TAM is one of the most widely applied theoretical models to study user acceptance and usage behavior of emerging information technologies, and it has received extensive empirical support through validations and replications [Venkatesh 2000]:
• Perceived Ease of Use, which refers to the degree to which evaluators believe that learning and using a particular evaluation method will be effort-free.
• Perceived Satisfaction of Use, which refers to the degree to which evaluators believe that the employment of a particular evaluation method can help them to achieve specific abilities and professional goals.

Both variables are measured using a set of 8 closed-questions: 5 questions with which to measure Perceived Ease of Use (PEU), and 3 questions with which to measure Perceived Satisfaction of Use (PSU). The closed-questions were formulated by using a 5-point Likert scale, using the opposing statement question format.
In other words, each question contains two opposite statements which represent the maximum and minimum possible values (5 and 1), in which the value 3 is considered to be a neutral perception. Each subjective dependent variable was quantified by calculating the arithmetic mean of its closed-question values. Table 1 presents the questions associated with each subjective dependent variable.

It is important to note that both objective and subjective variables are related to the employment of Web usability evaluation methods, not the usability evaluation of a Web application by involving end users.

4.1.3. Hypotheses. We formulated the following null hypotheses, which are one-sided since we expected WUEP to be superior to HE for each dependent variable. Each null hypothesis and its alternative hypothesis are presented as follows:
• H1_0: There is no significant difference between the effectiveness of WUEP and HE.
• H1_a: WUEP is significantly more effective than HE.
• H2_0: There is no significant difference between the efficiency of WUEP and HE.
• H2_a: WUEP is significantly more efficient than HE.
• H3_0: There is no significant difference between the perceived ease of use of WUEP and HE.
• H3_a: WUEP is perceived to be significantly easier to use than HE.
• H4_0: There is no significant difference between the perceived satisfaction of employing WUEP and HE.
• H4_a: WUEP is perceived to be significantly more satisfactory to use than HE.
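
As an illustration, the four dependent variables defined in Section 4.1.2 could be computed as in the following minimal Python sketch. The function names, the use of minutes as the time unit, and the recoding of reverse-worded questionnaire items are assumptions made for this example; they are not taken from the experimental material.

```python
from statistics import mean


def effectiveness(problems_detected: int, known_problems: int) -> float:
    """Ratio of detected usability problems to all known usability problems."""
    if known_problems <= 0:
        raise ValueError("known_problems must be positive")
    return problems_detected / known_problems


def efficiency(problems_detected: int, minutes_spent: float) -> float:
    """Detected usability problems per unit of inspection time.

    Minutes are an assumed unit; the text only says "total time spent".
    """
    if minutes_spent <= 0:
        raise ValueError("minutes_spent must be positive")
    return problems_detected / minutes_spent


def likert_mean(answers, reversed_items=()):
    """Arithmetic mean of 5-point answers for one construct (PEU or PSU).

    Positions listed in reversed_items are recoded as 6 - value so that 5
    is always the most favourable answer (an assumption of this sketch).
    """
    if any(a < 1 or a > 5 for a in answers):
        raise ValueError("answers must lie on the 1..5 scale")
    recoded = [6 - a if i in reversed_items else a
               for i, a in enumerate(answers)]
    return mean(recoded)


if __name__ == "__main__":
    # Hypothetical participant: 6 of 24 known problems found in 60 minutes.
    print(effectiveness(6, 24))
    print(efficiency(6, 60))
    # Hypothetical answers to the five PEU questions.
    print(likert_mean([4, 5, 3, 4, 4]))
```

Keeping the three calculations as separate functions mirrors the way the variables are defined: the two objective variables share the same numerator, while each subjective variable is simply the mean of its own questions.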

4.1.4. Experimental Design. The experiment was planned as a balanced within-subject design with a confounding effect, signifying that the same number of participants used both methods in a different order and with different experimental objects. Table 2 shows the schema of the experimental design which has been used in all the individual experiments. Although this experimental design was intended to minimize the impact of learning effects on the results, since none of the participants repeated any of the methods in the same experimental object, other factors were also present that needed to be controlled since they may have influenced the results. These factors were:
• Complexity of experimental objects, since the comprehension of the modeling primitives from Web artifacts may have affected the application of both inspection methods. We attempted to alleviate the influence of this factor by selecting representative Web artifacts that were considered suitable, in both size and complexity, for application in the time available for the execution of the experiment, and also by providing a complete description of the Web artifacts to be evaluated (graphical and textual).
• Order of experimental objects and methods, since this may have caused learning effects, thus biasing results. We attempted to check the influence of this factor by applying proper statistical tests.

4.2 Step 2. Context Definition

The context was determined by a) the Web application to be evaluated; b) the usability evaluation methods to be applied; and c) the subject selection. These are described in the following subsections.

4.2.1. Web Application Evaluated. We contacted a Web development company located in Alicante (Spain) in order to obtain Web artifacts from a real Web application. This Web application was developed through the use of a model-driven Web development method called the Object-Oriented Hypermedia (OO-H) method [Gomez et al. 2000], which is supported by the VisualWade tool (www.visualwade.com). OO-H provides the semantics and notation needed to develop Web applications. The platform-independent models (PIMs) that represent the different concerns of a Web application are: a class model, a navigational model, and a presentation model. The Class Model is UML-based and specifies the content requirements; the navigational model is composed of a set of Navigational Access Diagrams (NADs) that specify the functional requirements in terms of navigational needs and users' actions; and the presentation model is composed of a set of Abstract Presentation Diagrams (APDs), whose initial version is obtained by merging the Class Model and NADs, which are then refined in order to represent the visual properties of the final UI. The platform-specific models (PSMs) are embedded in a model compiler, which automatically obtains the source code (CM) of the Web application by taking all the previously mentioned platform-independent models as input.

We have selected the OO-H method for the following reasons:
• The fact that it is a model-driven Web development method that is being employed in the development of real Web projects in a local company.
• The availability of the corresponding conceptual models of real Web applications as well as their generated source code.
• The fact that it can be considered a representative method of the whole set of model-driven Web development methods, as mentioned in Moreno and Vallecillo [2008].
• The flexibility of its CASE tool (VisualWade) to be extended in order to automate the evaluation of some usability attributes.

The type of the provided Web application was an intranet for task management to be used in the context of a software development company. Two different functional features (Task management and Report management) were selected for the composition of the experimental objects (O1 and O2), as Table 3 shows in detail. We selected these functional features because they are relevant to the Web users. These functional features are also similar in complexity, and their related Web artifacts are also similar in size. Each experimental object contains three Web artifacts: a Navigational Access Diagram (NAD), an Abstract Presentation Diagram (APD) model, and a Final User Interface (FUI).

4.2.2. Inspection Methods Evaluated. Since the context of our family of experiments was from the viewpoint of a set of usability inspectors, we evaluated the execution stages of both methods (WUEP and HE), or in other words, the evaluators' application of both methods. Two of the authors therefore performed the evaluation designer role in both methods in order to design an evaluation plan. In critical activities such as the selection of usability attributes in WUEP, we required the help of two external Web usability experts. The outcomes of the stages performed by the evaluation designers are described as follows.

With regard to the establishment of the evaluation requirements stage, the first three activities (i.e., purpose of the evaluation, evaluation profiles, and selection of Web artifacts) were the same for both methods. In the case of the HE, all 10 heuristics were selected. In the case of the WUEP, a set of 20 usability attributes was selected as candidates from the Web Usability Model through the consensus reached by the two evaluation designers and the two Web usability experts. The attributes were selected by considering the evaluation profiles (i.e., which of them would be more relevant to the type of Web application and the context in which it is going to be used). Of these 20 candidates, 12 attributes were then randomly selected in order to maintain a balance in the number of metrics and heuristics to be applied.

With regard to the specification of the evaluation stage, the 10 heuristics from the HE were described in detail by providing guidelines concerning which elements can be considered in the Web artifacts to be evaluated.
Examples of these heuristics can be found in Appendix B.1.2. In the case of the WUEP, 13 metrics associated with the 12 selected attributes were obtained from the Web Usability Model, and then associated with the artifact in which they could be applied. Since metrics can be applied at different abstraction levels, the highest level of application was selected. Once the metrics had been associated with the artifacts, these metrics were operationalized in order to provide a calculation formula for artifacts from the OO-H method and to establish rating levels for them. Examples of these operationalized metrics can be found in Appendix B.1.1.

With regard to the design of the evaluation stage, the same evaluation plan (i.e., the experiment design), along with the same template with which to report usability problems, was defined for both methods. The templates employed for both inspection methods can be found in Appendix B.4.

4.2.3. Subject selection. Although expert evaluators are able to detect more usability problems than novice evaluators [Hertzum and Jacobsen 2001], we focus on this latter evaluator profile since the intention is to provide a Web usability evaluation method which enables inexperienced evaluators to perform their own usability evaluations. Therefore, the following groups of subjects were identified in order to facilitate the generalization of results:
• Master's students, all of whom had previously obtained a degree in Computer Science. At the moment of each experiment, they were attending a “Quality of Web Information Systems” course within the Master's in Software Engineering at the Universitat Politècnica de València. It has been shown that, under certain conditions, there is no great difference between this type of students and professionals [Basili et al. 1999; Höst et al. 2000], and they could therefore be considered as the next generation of professionals [Kitchenham et al. 2002]. We therefore believe that their ability to understand Web artifacts obtained with model-driven Web development processes, and to apply usability evaluation methods to them, can be comparable to that of typical novice practitioners. With regard to their participation, all the Master's students were given one point in their final grades, regardless of their performances.
• PhD students, all of whom had previously obtained a degree in Computer Science and whose research activities are performed in the Software Engineering field. At the moment of each experiment, they were participants in the PhD Doctorate Program in Computer Science at the Universitat Politècnica de València. The participation of these PhD students in the experiments was voluntary.

We did not establish a classification of participants, since neither the Master's nor the PhD students had any previous experience in conducting usability evaluation studies. The assignment of the participants to the experimental groups was therefore random. Regarding the number of evaluators required for conducting usability studies, some previous studies [Hwang and Salvendy 2010] claim that 10±2 evaluators are needed to perform a usability evaluation to find around 80% of usability problems. However, recent studies such as Schmettow [2012] refute the idea of an existing magic number of inspectors for usability evaluations in order to detect a certain percentage of usability problems. For this reason, we did not establish any number of evaluators per experiment, but we tried to enroll the maximum possible participants in each individual experiment in order to detect a representative number of usability problems.

4.3 Step 3. Experimental Tasks and Materials

The material was composed of the documents needed to support the experimental tasks and the training material. The documents used to support the experimental tasks were:
• Four kinds of data gathering documents in order to cover the four possible combinations (WUEP-O1, WUEP-O2, HE-O1, and HE-O2). Each document contained: the set of Web artifacts from the experimental object with a description of their modeling primitives (an example of the Web artifact evaluated can be found in Appendix B.2); and the description of the tasks to be performed in these artifacts (an example of these tasks for both usability inspection methods can be found in Appendix B.3). Although only three artifacts were evaluated (NAD, APD, and FUI), we also included a Class Diagram in order to provide a better understanding of the Web application's structure and content.
• Two appendixes containing a detailed explanation of each evaluation method (WUEP and HE); these appear at the end of this paper.
• Two questionnaires (one for each method), which contained the closed-questions presented in Section 4.1.2 with which to evaluate the two subjective dependent variables (i.e., Perceived ease of use and Perceived satisfaction). Various questions belonging to the same dependent variable (i.e., construct group) were randomized to prevent systematic response bias. In addition, in order to ensure the balance of items in the questionnaire, the left-hand statement of half of the questions was written as a negative sentence to avoid monotonous responses [Hu and Chau 1999]. We also added two open-questions in order to obtain feedback on how to improve the ease of use and the employment of both methods. These open questions were formulated as follows:
  o Q1: What suggestions would you make in order to improve the method's ease of use?
  o Q2: What suggestions would you make in order to make the metrics/heuristics more useful in the context of Web usability evaluations?

The training materials included: (i) a set of slides containing an introduction to the Object-Oriented Hypermedia method in order to present the modeling primitives of Web artifacts; (ii) a set of slides describing the WUEP method, with examples of metric application and the procedure to be followed in the experiments; and (iii) a set of slides describing the HE method with examples of heuristic application and the procedure to be followed in the experiments. All the documents were created in Spanish, since this was the participants' native language. All the material (including the experimental tasks and the training slides) is available for download at http://www.dsic.upv.es/~afernandez/JSS/familyexp.html.

4.4 Step 4. Individual Experiments

Figure 3 summarizes the family of experiments by representing each individual experiment as a rectangle. This figure shows the order in which the experiments were executed (e.g., 1st experiment), the kind of participants involved and their number, the name associated with each experiment (e.g., EXP), and the kind of replication (e.g., internal replication). It is important to note that the number of participants corresponds to the final accepted samples, since we discarded incomplete samples, in addition to random samples when it was necessary to maintain the balanced within-subject design (i.e., the same number of participants per group). The second and third experiments (REP1 and REP2) were differentiated replications of the original experiment (i.e., EXP) since they were performed in different settings. This means that we made some controlled modifications in the experiment design (e.g., profile of participants, experiment schedule). In order to confirm the results obtained in REP1, we replicated this experiment (REP2) under the same conditions (strict replication), changing only the subjects [Basili et al. 1999]. Strict replications are needed to increase confidence in the conclusion validity of the experiment.

4.5 Step 5. Family Data Analysis and Meta-Analysis

The results of each individual experiment and the family of experiments were collected and analyzed. With regard to the analysis of each individual experiment, we used boxplots and statistical tests to analyze the data collected. In particular, we tested the normality of the data distribution by applying the Shapiro-Wilk test. The results of the normality test allowed us to select the proper significance test in order to test our hypotheses. When data was assumed to be normally distributed (p-value ≥ 0.05), we applied the parametric one-tailed t-test for independent samples [Juristo and Moreno 2001]. However, when data could not be assumed to be normally distributed (p-value < 0.05), we applied the non-parametric Mann-Whitney test [Conover 1998]. In order to test the influence of Order of Method and Order of Experimental Objects (both independent variables), we used a method similar to that proposed by Briand et al. [2005]. We used the Diff function:

$$\mathit{Diff}_x = \mathit{observation}_x(A) - \mathit{observation}_x(B) \qquad (1)$$

where x denotes a particular subject, and A, B are the two possible nominal values of an independent variable. We created Diff variables from each dependent variable (e.g., Effec_Diff(WUEP) represents the difference in effectiveness of the subjects who used WUEP first and HE second, whereas Effec_Diff(HE) represents the difference in effectiveness of the subjects who used HE first and WUEP second). The aim was to verify that there were no significant differences between Diff functions, since that would signify that the order of the independent variables had no influence. We also applied the Shapiro-Wilk test to check the normality of the Diff functions. Table 4 presents the hypotheses related to the Diff functions, which are two-sided since we did not make any assumption about whether one specific order would be more influential than another. We verified these hypotheses by applying the parametric two-tailed t-test for independent samples or the non-parametric Mann-Whitney test, depending on the results of the normality test.

These statistical tests were chosen because they are very robust and sensitive, and have been used in experiments similar to ours in the past, e.g., [Ricca et al. 2010; Briand et al. 2005; Conte et al. 2005]. As usual, in all the tests we decided to accept a probability of 5% of committing a Type-I-Error [Wohlin et al. 2000], i.e., of rejecting the null hypothesis when it is actually true.
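As an illustration only, the Diff computation of Equation (1) and the rule used to select a significance test can be sketched as follows. The sketch is not the procedure run in the statistical tool: the function names, the sample scores, and the p-values are hypothetical, and the Shapiro-Wilk p-values are assumed to have been computed elsewhere.

```python
ALPHA = 0.05  # significance level adopted in the text


def diff_scores(observations_a, observations_b):
    """Return Diff_x = observation_x(A) - observation_x(B) for each subject x."""
    if len(observations_a) != len(observations_b):
        raise ValueError("each subject needs one observation per condition")
    return [a - b for a, b in zip(observations_a, observations_b)]


def choose_test(normality_p_values, tails):
    """Pick the significance test from the Shapiro-Wilk p-values of the samples."""
    if all(p >= ALPHA for p in normality_p_values):
        return f"{tails}-tailed t-test for independent samples"
    return "Mann-Whitney test"


# Hypothetical effectiveness scores for three subjects (not data from the study).
effec_a = [0.62, 0.54, 0.46]
effec_b = [0.31, 0.38, 0.23]
effec_diff = diff_scores(effec_a, effec_b)

# Hypothetical Shapiro-Wilk p-values for the two Diff samples being compared.
selected = choose_test([0.21, 0.03], tails="two")
print(effec_diff, selected)
```

A single sample with a normality p-value below 0.05 is enough to select the non-parametric test, which mirrors the rule stated above.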
We also performed a meta-analysis in order to aggregate the results, since the experimental conditions were very similar for each experiment. This analysis, which is detailed in Section 7.2, enabled us to extract more general conclusions than those drawn from each individual experiment.

5. DESIGN OF INDIVIDUAL EXPERIMENTS

In this section, we describe the main characteristics of each of the three individual experiments that constitute our family of experiments. In order to avoid useless redundancies, we discuss some clarifications of the original experiment related to the information presented in the previous section, and we only discuss the differences in the replications with regard to the original experiment.

5.1 The Original Experiment (EXP)

5.1.1. Planning. This section details the experimental plan by describing the context, the variables, hypotheses, experiment design, and instrumentation. The context of the experiment: we used both of the experimental objects described in Section 4.2.1 (O1 and O2), we evaluated the execution stages by providing an evaluation design as described in Section 4.2.2 (10 heuristics to be applied with the HE method and 13 metrics to be applied with the WUEP method), and we selected 12 PhD students as participants whose profile is described in Section 4.2.3. The variables: we selected all the independent and dependent variables described in Section 4.1.2. The hypotheses: we tested all the hypotheses related to each dependent variable (Section 4.1.3) and all the hypotheses related to the influence of the order of methods and order of experimental objects (Section 4.5). The experimental design: we used the balanced within-subject design with a confounding effect, presented in Section 4.1.4. Three participants were randomly assigned to each of the four groups, since there was no difference in their experience in Web usability evaluations. The instrumentation: we used the documents presented in Section 4.3 to support the experimental tasks (4 data gathering documents, 2 appendices and 2 questionnaires) and the training material (3 slide sets).
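The random assignment of 12 participants to four groups of three can be sketched as below. This is a minimal illustration under stated assumptions: the group definitions (which method is applied first, and to which experimental object) are a plausible reading of a balanced within-subject design and are not taken from Section 4.1.4; the participant identifiers and the seed are hypothetical.

```python
import random

# Assumed group definitions: (first session, second session).
GROUPS = {
    "G1": ("WUEP on O1", "HE on O2"),
    "G2": ("WUEP on O2", "HE on O1"),
    "G3": ("HE on O1", "WUEP on O2"),
    "G4": ("HE on O2", "WUEP on O1"),
}


def assign_balanced(participants, group_names, rng):
    """Shuffle the participants and split them into equally sized groups."""
    if len(participants) % len(group_names) != 0:
        raise ValueError("participants cannot be split into equal groups")
    shuffled = list(participants)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(group_names)
    return {
        name: shuffled[i * size:(i + 1) * size]
        for i, name in enumerate(group_names)
    }


participants = [f"P{i:02d}" for i in range(1, 13)]  # 12 hypothetical subjects
assignment = assign_balanced(participants, list(GROUPS), random.Random(2010))
for name, members in assignment.items():
    print(name, GROUPS[name], members)
```

Passing an explicit `random.Random` instance keeps the assignment reproducible, which is convenient when the allocation has to be documented.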

5.1.2. Operation. This section details the experimental operation by describing the preparation, the execution, the data recording, and the data validation. With regard to the preparation of the experiment, the experiment was planned to be conducted in two days owing to the participants' availability and the optimization of resources. Table 5 shows the planning for both days. The subjects were given a training session before each of the inspection methods was applied, in which they were also informed about the procedure to follow in the execution of the experiment. We established a time slot of 90 minutes as an approximation for each method application. However, we allowed the participants to continue the experiment even though these 90 minutes had passed in order to avoid a possible ceiling effect [Sjøberg et al. 2003].

With regard to the execution of the experiment, the experiment took place in a single room and no interaction between participants was allowed. We logged all the interventions that were necessary to clarify questions concerning the completion of the experimental tasks, along with possible improvements that could be made to the experiment material. Finally, with regard to the data validation, we ensured that all the participants had completed all the requested data, and it was not therefore necessary to discard any samples.

5.2 The Second Experiment (REP1)

This second experiment (first replication) was different in three respects as regards the original experiment. These differences are described as follows:
• Subject selection. The participants were initially 38 Master's students. The profile of these subjects is described in Section 4.2.3, and all of them attended the "Quality of Web Information Systems" course which took place from April 2010 to July 2010. This course was selected because the necessary preparation and training, and the experimental task itself, fitted the scope of this course well. We took a "convenience sample" (i.e., all the students available in the class). We created two groups of 10 participants, and two groups of 9 participants, despite the fact that it would later be necessary to discard samples in order to maintain a balanced design.
• Metrics selection. Since only 12 out of 20 usability attributes were randomly selected from the Web Usability Model in the original experiment, we made minimal variations in order to enable new attributes to be evaluated as long as the evaluation design was not altered. In particular, we replaced one usability attribute with another, and we also replaced a metric from an existing attribute with another metric. We therefore maintained the same number of metrics to be applied, which was 13.
• Questionnaire. Table 6 presents the two new closed-questions that were added in order to evaluate the Perceived Satisfaction of Use. The questionnaire therefore contained a total of 10 closed-questions.
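The subject selection item above mentions that samples may need to be discarded to keep the design balanced. One way to do this, sketched here under stated assumptions, is to trim every group to the size of the smallest one by random discarding. The group sizes after removing incomplete samples, the sample identifiers, and the seed are hypothetical; only the general idea of random discarding comes from the text.

```python
import random


def balance_groups(groups, rng):
    """Trim every group to the smallest group's size by random discarding."""
    target = min(len(members) for members in groups.values())
    balanced, discarded = {}, {}
    for name, members in groups.items():
        kept = rng.sample(members, target)  # random subset of size `target`
        balanced[name] = sorted(kept)
        discarded[name] = sorted(set(members) - set(kept))
    return balanced, discarded


# Hypothetical complete samples per group once incomplete ones are removed.
complete = {
    "G1": [f"G1-{i}" for i in range(9)],
    "G2": [f"G2-{i}" for i in range(9)],
    "G3": [f"G3-{i}" for i in range(8)],
    "G4": [f"G4-{i}" for i in range(8)],
}
balanced, discarded = balance_groups(complete, random.Random(7))
print({name: len(members) for name, members in balanced.items()})
print(sum(len(members) for members in discarded.values()), "samples discarded")
```

With these hypothetical sizes (9, 9, 8, 8), two samples are discarded at random and 32 remain, eight per group.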
With regard to the experiment preparation, the experiment was planned to be conducted over three days owing to the course timetable and the optimization of resources. Table 7 shows the planning for these days. On the first day, the participants were given the complete training and they were also informed of the procedure to follow in the execution of the experiment. They were told that their answers would be treated anonymously, and were also informed that their grade for the course would not be affected by their performance in the experiment. On the second and third days, the participants were given an overview of the complete training before applying the evaluation method, since all the groups were located in the same session. As in the previous experiment, we established a time slot of 90 minutes for each method application as an approximation, without enforcing it as a time limit.

As in the original experiment, the experiment also took place in a single room and no interaction between participants was allowed. With regard to the data validation, we checked whether all the participants had completed all the requested data. A total of 6 samples were discarded: 4 owing to incomplete data, and 2 that were randomly discarded to maintain the same number of samples per group. The experiment eventually considered the results of only 32 evaluators (8 samples per group).

5.3 The Third Experiment (REP2)

This third experiment (second replication) was a strict replication of REP1. The difference with regard to REP1 was the subject selection. The participants were initially 35 Master's students (Section 4.2.3), all of whom attended the "Quality of Web Information Systems" course which took place from April 2011 to July 2011. We created three groups of 9 participants, and one group of 8 participants, despite the fact that it would later be necessary to discard samples in order to maintain a balanced design. With regard to experiment preparation and execution, there were no differences with regard to REP1 since the same three-day planning was followed. With regard to the data validation, we checked that all the participants had completed all the requested data. However, a total of 15 samples were discarded: 9 owing to incomplete data, and 6 of which were randomly discarded to maintain the same number of samples per group. The experiment eventually considered the results of only 20 evaluators (5 samples per group).

6. RESULTS

After the execution of each experiment, the control group analyzed all the usability problems detected by the subjects. If a usability problem was not in the initial list, this group determined whether it could be considered as a real usability problem or a false positive. Replicated problems were considered only once. Discrepancies in this analysis were solved by consensus. The control group determined a total of 13 and 14 usability problems in the experimental objects O1 and O2, respectively.

In this section, we discuss the results of each individual experiment by quantitatively analyzing the results for each dependent variable and testing all the formulated hypotheses. We also analyze the influence of the order of methods and experimental objects. All the results were obtained by using the SPSS v16 statistical tool (see footnote 3) with a statistical significance level of α = 0.05. A qualitative analysis based on the feedback obtained from the open-questions in the questionnaire will also be provided.
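The screening described above (false positives removed, replicated problems counted once) can be illustrated with a short sketch. It assumes that Effectiveness is the share of known usability problems that a subject detected and that Efficiency is the number of real problems detected per minute, which is consistent with the units reported in this section; the problem identifiers, the reports, and the 50-minute duration are hypothetical, and the list of known problems is treated as final.

```python
def real_problems_found(reported, known_problems):
    """Drop false positives and count replicated reports only once."""
    return set(reported) & set(known_problems)


def effectiveness(reported, known_problems):
    """Share of the known usability problems that the subject detected."""
    return len(real_problems_found(reported, known_problems)) / len(set(known_problems))


def efficiency(reported, known_problems, minutes):
    """Real usability problems detected per minute of inspection."""
    return len(real_problems_found(reported, known_problems)) / minutes


# 13 known problems, as determined for experimental object O1.
known_o1 = {f"UP{i:02d}" for i in range(1, 14)}

# Hypothetical reports from one subject: one replicated entry, one false positive.
reports = ["UP01", "UP02", "UP02", "UP05", "UP07", "UP09", "UP11", "UP13", "X-layout"]

print(round(effectiveness(reports, known_o1), 3))
print(round(efficiency(reports, known_o1, minutes=50), 2))
```

Using set intersection handles both screening rules at once: duplicates collapse when the reports become a set, and anything outside the known list is excluded.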

3 SPSS version 12.1.0 for Windows. SPSS Inc., Chicago, 2004.

6.1 Quantitative analysis

Table 8 summarizes the overall results of the usability evaluations performed in each experiment. The cells in bold type indicate the subjects' best performance in each statistic. The overall results show that the subjects achieved their best performance with WUEP in all the statistics that were analyzed. As observed in these results, WUEP tends to provide a low degree of false positives (detected usability problems which were considered as not real usability problems by the control group) and replicated problems (detected usability problems which have already been detected by a participant in the whole experimental object). The low degree of false positives can be explained by the fact that WUEP aims to minimize the subjectivity of the evaluation by providing a more systematic procedure (metrics) to detect usability problems rather than interpreting whether the usability principles have been supported or not (heuristics). The low degree of replicated problems can be explained by the fact that WUEP provides operationalized metrics which are specifically tailored for each type of artifact of the Web development process, reducing in this way the subjectivity associated with generic rules that rely on the experience of the evaluator.

The analysis of each dependent variable (Effectiveness, Efficiency, Perceived Ease of Use, and Perceived Satisfaction of Use) and the hypotheses testing is detailed in the following subsections.

6.1.1. Effectiveness. Figure 4 presents the boxplots containing the distribution of the Effectiveness variable per subject and per method for each of the individual experiments. These boxplots show that WUEP was relatively more effective than HE when inspecting the usability of the experimental objects. Although we found the WUEP scores to be more scattered than those of HE (specifically in EXP and REP1), the median value for WUEP (between 50% and 60% of usability problems detected) was much higher than that for HE (between 20% and 40%). This may represent some variability in the participants' performance when detecting usability problems. However, the middle 50 percent of WUEP scores is above the third quartile of HE in all the individual experiments.

In order to determine whether or not these results were significant, we applied the Mann-Whitney non-parametric test to verify H1 in EXP, since Effectiveness(WUEP) for EXP was not normally distributed (p-value = 0.029), and the one-tailed t-test for independent samples to verify this in REP1 and REP2, since both Effectiveness(WUEP) and Effectiveness(HE) were normally distributed. The p-values obtained for these tests were: 0.001 for EXP, 0.000 for REP1, and 0.000 for REP2. These results therefore support the rejection of the null hypothesis H1_0 for each individual experiment (p-value < 0.05), and the acceptance of its alternative hypothesis, meaning that the effectiveness of WUEP is significantly greater than the effectiveness of HE.

6.1.2. Efficiency. Figure 5 presents the boxplots containing the distribution of the Efficiency variable per subject and per method for each individual experiment. These boxplots show that WUEP was relatively more efficient than HE when inspecting the usability of the experimental objects. As in the effectiveness results, the median value for WUEP (around 0.12 usability problems detected per minute) was much higher than that for HE (between 0.05 and 0.07). In fact, the middle 50 percent of the WUEP scores is also above the third quartile of HE in all the individual experiments. However, we found the WUEP scores to be more scattered than those of HE in all the individual experiments. This might have been caused by differences in the duration of the evaluation in each method employment, since HE achieved a more constant and higher value than WUEP.

In order to determine whether or not these results were significant, we applied the Mann-Whitney non-parametric test to verify H2 in EXP, since Efficiency(HE) for EXP was not normally distributed (p-value = 0.045), and the one-tailed t-test for independent samples to verify this in REP1 and REP2, since both Efficiency(WUEP) and Efficiency(HE) were normally distributed. The p-values obtained for these tests were: 0.000 for EXP, 0.000 for REP1, and 0.000 for REP2. These results therefore support the rejection of the null hypothesis H2_0 for each individual experiment (p-value < 0.05), and the acceptance of its alternative hypothesis, meaning that the efficiency of WUEP is significantly greater than the efficiency of HE.

6.1.3. Perceived Ease of Use. Figure 6 presents the boxplots showing the distribution of the Perceived Ease of Use (PEU) variable per subject and per method for each individual experiment. These boxplots show that the participants perceived WUEP to be relatively easier to use than HE. The median value for WUEP (between 3.8 and 4.4 points in the 5-point Likert scale) was slightly higher than that for HE (between 3 and 3.2 points). However, we found the HE scores to be more scattered than those of WUEP in all the individual experiments. This may reflect divergent perceptions among participants.

In order to determine whether or not these results were significant, we applied the one-tailed t-test for independent samples to verify H3 in each individual experiment, since both PEU(WUEP) and PEU(HE) were normally distributed. The p-values obtained for these tests were: 0.003 for EXP, 0.000 for REP1, and 0.002 for REP2. These results therefore support the rejection of the null hypothesis H3_0 for each individual experiment (p-value < 0.05), and the acceptance of its alternative hypothesis, meaning that WUEP is perceived as easier to use than HE.

6.1.4. Perceived Satisfaction of Use. Figure 7 presents the boxplots showing the distribution of the Perceived Satisfaction of Use (PSU) variable per subject and method for each individual experiment. These boxplots show that the participants were more satisfied with WUEP than HE. The median value for WUEP (between 3.8 and 4.4 points in the 5-point Likert scale) was slightly higher than that for HE (around 3.5 points). However, we also found that the HE scores were more scattered than those for WUEP in all the individual experiments, particularly in EXP.

In order to determine whether or not these results were significant, we applied the one-tailed t-test for independent samples to verify H4 in EXP and REP1, since both PSU(WUEP) and PSU(HE) were normally distributed, and the Mann-Whitney non-parametric test to verify this in REP2, since PSU(HE) for REP2 was not normally distributed (p-value = 0.012). The p-values obtained for these tests were: 0.000 for EXP, 0.000 for REP1, and 0.025 for REP2. These results therefore support the rejection of the null hypothesis H4_0 in each individual experiment (p-value < 0.05), and the acceptance of its alternative hypothesis, meaning that the subjects were more satisfied with the use of WUEP as compared to HE.

6.2 Influence of Order of Experimental Objects and Methods

We then applied the Shapiro-Wilk test to the Diff functions (Section 4.5), and this allowed us to determine that most of these functions were normally distributed (p-value ≥ 0.05). We also applied the two-tailed t-test for independent samples and the Mann-Whitney test (depending on the data distribution) in order to verify all the hypotheses related to the influence of order of method application (i.e., HM1, HM2, HM3, and HM4), and the influence of order of experimental object employment (i.e., HO1, HO2, HO3, and HO4). Table 9 shows that all the p-values obtained were ≥ 0.05. We therefore found no statistically significant effect of the order of method application or of experimental object employment on any dependent variable.

6.3 Qualitative Analysis

This analysis revealed several important issues which should be considered if WUEP is to be improved. With regard to the first open-question, "What suggestions would you make in order to improve the method's ease of use?" (Section 4.3), the participants suggested that WUEP might be more useful if the evaluation process were automated or computer-aided (particularly the calculation of certain metrics). With regard to the second open-question, "What suggestions would you make in order to make the metrics more useful in the context of Web usability evaluations?", the participants noted that providing more examples of how to apply the metrics might improve the application of the method. In addition, they suggested that a more detailed description of the operationalized metric might be useful since it was not always easy to identify elements of the Web artifacts involved in the metric calculation.

In the case of HE, and with regard to the first open-question, the participants recommended a prior classification of heuristics in order to determine which ones might be applicable to each kind of Web artifact obtained from a model-driven Web development process, since this method has been commonly applied to the inspection of final user interfaces. With regard to the second open-question, the participants agreed that the heuristics need to be redefined to be more useful since their descriptions are too generic, thus leading inexperienced evaluators to obtain different interpretations.

7. FAMILY DATA ANALYSIS

This section provides a summary of the results obtained. We first present an analysis of the results in the context of the family of experiments, followed by the results of a meta-analysis that aggregates the empirical findings obtained in the individual experiments.

7.1 Summary of Results

We performed a global analysis of the results to determine whether the general goal of our family of experiments had been achieved. We also studied all the results to search for possible differences. A summary of the experiments and their results is provided in Table 10.

Three experiments were performed, in which data gathered from 64 subjects was used to test the formulated hypotheses (see Section 4.1.3). The main result of the family of experiments indicates that all the alternative hypotheses (H1_a, H2_a, H3_a, and H4_a) were supported in all the experiments. This outcome shows that WUEP was more effective and efficient than HE in the detection of usability problems in artifacts obtained using a specific model-driven Web development process (OO-H). In addition, the evaluators were more satisfied when they applied WUEP, and found it easier to use than HE.

With regard to the Effectiveness variable (see Table 8 and Figure 4), we found that WUEP was able to detect at least 50% of the total existing usability problems in each experiment, whereas HE accounted for at least 30% of the defects. It is important to note that only one set of metrics was selected in the evaluation design stage of WUEP, whereas in HE all ten heuristics were considered. This may represent promising results as regards the range of usability aspects that are considered in WUEP owing to the employment of its Web usability model. However, these results show that the ratio of usability problems detected is low for both methods, and could be improved by considering more usability attributes in WUEP and by refining the heuristic descriptions in HE.
With regard to the Efficiency variable (see Table 8 and Figure 5), we detected that those participants who used WUEP were able to detect one usability problem approximately every 7 minutes (between 0.14 and 0.17 usability problems per minute), whereas those participants who used HE detected one usability problem approximately every 14 minutes (between 0.05 and 0.07 usability problems per minute). This could have been owing to the fact that HE evaluators are required to spend more time on the interpretation of each heuristic in each Web artifact.

With regard to the Perceived Ease of Use variable (see Table 8 and Figure 6), we detected that WUEP achieved a mean score of 4.25, 4.16 and 3.73 points in the 5-point Likert scale, whereas HE achieved a mean score of 3.23, 3.44 and 3.03 points. This may indicate that metrics are perceived as easier to apply than heuristics. However, it is important to highlight that both scores are good results for both methods since all of them were above the neutral value established at 3 points.

With regard to the Perceived Satisfaction of Use variable (see Table 8 and Figure 7), we found that WUEP achieved a mean score of 4.32, 4.18 and 3.82 points in the 5-point Likert scale, whereas HE achieved a mean score of 3.36, 3.56 and 3.32 points. This may represent that metrics are perceived as a useful procedure by which to evaluate Web artifacts. These scores are also good results for both methods since all of them were above the neutral value established at 3 points.
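The relation between the efficiency rates quoted above (usability problems per minute) and the "one problem every N minutes" reading is a simple reciprocal. The following sketch works through the arithmetic for the end points of the reported ranges; the function name is hypothetical.

```python
def minutes_per_problem(problems_per_minute):
    """Convert a detection rate into the average time between detections."""
    if problems_per_minute <= 0:
        raise ValueError("rate must be positive")
    return 1.0 / problems_per_minute


# End points of the efficiency ranges reported in the text.
reported_rates = {"WUEP": (0.14, 0.17), "HE": (0.05, 0.07)}

for method, (low, high) in reported_rates.items():
    slowest = minutes_per_problem(low)   # lowest rate, longest interval
    fastest = minutes_per_problem(high)  # highest rate, shortest interval
    print(f"{method}: one problem every {fastest:.1f} to {slowest:.1f} minutes")
```

The approximate figures of 7 and 14 minutes correspond to the rates 0.14 and 0.07, respectively.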
We also detected slight differences between both types of participants, since the PhD students achieved better results than the Master's students. This could have been owing to the former's level of experience in model-driven engineering. With regard to the influence of other factors, statistical tests allowed us to conclude that there was no influence with regard to the order of method application and experimental object employment for any dependent variable. This strengthens the validity of our experimental design and also minimizes the possible learning effect when both methods are employed.

In summary, the results support the hypothesis that WUEP would achieve better results than HE in the specified context. According to the previously discussed results, we can conclude that WUEP can be considered as a promising approach with which to perform usability evaluations of Web artifacts obtained from a model-driven Web development process. However, WUEP was operationalized in the context of a specific process (OO-H). We plan to apply WUEP to evaluate the usability of Web artifacts obtained with other model-driven development processes (e.g., WebML [Ceri et al. 2000], UWE [Koch and Kraus 2003]). Feedback on how to improve the approach was also obtained. Running a family of experiments (including replications) rather than a single experiment provided us with more evidence of the external validity, and thus the generalization of the study results. Each replication provided further evidence of the confirmation of the hypothesis. We can thus conclude that the general goal of the empirical validation has been achieved.

7.2 Meta-Analysis

Although there are several statistical methods with which to aggregate and to interpret the results obtained from interrelated experiments [Glass et al. 1981; Hedges and Olkin 1985; Rosenthal 1986; Sutton et al. 2001], we used meta-analysis because it allowed us to extract more general conclusions. Meta-analysis is a set of statistical techniques for combining the different effect sizes of the experiments to obtain a global effect of a factor. In particular, the estimation of effect sizes can be used after comparing studies to evaluate the average impact across studies of an independent variable on the dependent variable. Since measures may come from different settings and may be non-homogeneous, a standardized measure must be obtained for each experiment: these measures must be combined to estimate the global effect size of a factor. In our study, we considered that the usability inspection method was the main factor in the family of the experiments. The meta-analysis was conducted by using the Meta-Analysis v2 tool [Biostat 2006]. We employed the mean value obtained using the WUEP method minus the mean value achieved when using the HE method to calculate the effect sizes for all the dependent variables (i.e., Effectiveness, Efficiency, Perceived Ease of Use, Perceived Satisfaction of Use) for each of the individual experiments, and these values were then used to obtain the Hedges' g metric [Hedges and Olkin 1985; Kampenes et al. 2007], which was used as a standardized measure.
This measure expresses the magnitude of the effect of the method employed. In order to obtain the overall conclusion, we calculated the Z-score based on the mean and standard deviation of the Hedges' g statistics of the experiments. More specifically, we used correlation coefficients, which provided the effect sizes that had a normal distribution ($z_i$) once they had been transformed by the Fisher transformation [Fisher 1915]. The global effect size was obtained by using the Hedges' g metric, whose weights were proportional to the experiment's size:

$$Z = \frac{\sum_i w_i z_i}{\sum_i w_i} \tag{2}$$

Where $w_i = 1/(n_i - 3)$ and $n_i$ is the sample size of the i-th experiment. The higher the value of Hedges' g, the higher the corresponding correlation coefficient is. Table 11 summarizes the results of the meta-analysis: for each experiment, it reports the effect size, the values of the Hedges' g metric, and its significance. For studies in Software Engineering, the effect size is rated as small (0 to 0.37), medium (0.38 to 1), or large (above 1) [Kampenes et al. 2007] depending on the standardized difference between the two means $m_1$ and $m_2$. For example, an effect size of 0.5 indicates that $m_1 = m_2 + (0.5 \cdot d)$, where $d$ is the standard deviation (i.e., a positive value signifies that WUEP achieved better results than HE in the dependent variable defined).
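The computations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure implemented by the tool used in the study: `hedges_g` uses the usual pooled-standard-deviation form with the small-sample correction for two independent groups, `classify_effect_size` applies the rating quoted from Kampenes et al. [2007] to the absolute value of g, and `global_z` is the weighted mean of Equation (2). All function names and the numbers in the usage section are hypothetical. The weights are passed in explicitly, so that either the definition given in the text or the inverse-variance weights $n_i - 3$ commonly used with Fisher-transformed correlations can be supplied.

```python
import math


def hedges_g(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Bias-corrected standardized mean difference (Hedges' g).

    Assumes two independent groups and a pooled standard deviation.
    """
    df = n_a + n_b - 2
    pooled_sd = math.sqrt(((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df)
    correction = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    return correction * (mean_a - mean_b) / pooled_sd


def classify_effect_size(g):
    """Rating quoted in the text: small (0 to 0.37), medium (0.38 to 1), large (above 1).

    Using the absolute value of g is an assumption of this sketch.
    """
    size = abs(g)
    if size < 0.38:
        return "small"
    if size <= 1.0:
        return "medium"
    return "large"


def fisher_z(r):
    """Fisher transformation of a correlation coefficient r, with -1 < r < 1."""
    return math.atanh(r)


def global_z(z_values, weights):
    """Weighted mean of the transformed effect sizes, as in Equation (2)."""
    if len(z_values) != len(weights) or not weights:
        raise ValueError("z_values and weights must be non-empty and of equal length")
    return sum(w * z for w, z in zip(weights, z_values)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical figures for illustration only; they are not data from the study.
    g = hedges_g(mean_a=10.0, mean_b=8.0, sd_a=2.0, sd_b=2.0, n_a=10, n_b=10)
    print(round(g, 3), classify_effect_size(g))

    sample_sizes = [10, 20, 30]
    z_values = [fisher_z(r) for r in (0.5, 0.6, 0.7)]
    weights = [1 / (n - 3) for n in sample_sizes]  # weight definition as given in the text
    print(round(global_z(z_values, weights), 3))
```

Keeping the weighting scheme outside `global_z` makes it easy to compare how the choice of weights changes the relative contribution of each experiment to the global value.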

For the reader's convenience, we show the meta-analysis results in diagram form by using a forest plot (or blobbogram). Figure 8 shows the four diagrams as provided by the tool used. On the left-hand side, the experiments are reported in chronological order from the top downwards. On the right-hand side, the effect of the Hedges' g metric is plotted for each experiment by a square whose dimensions are proportional to the weight of the experiment in the meta-analysis. The estimations for studies with a large sample size are more accurate, signifying that they make a greater contribution to the overall effect. The square size is proportional to the number of participants and the experiment effect size, and the square position with regard to the x axis indicates the Hedges' g value. The confidence intervals of each experiment are represented by the horizontal lines. Here we have considered a confidence interval of 95% for each experiment. The confidence interval [-1, 0] indicates a negative correlation, whereas the confidence interval [0, 1] indicates a positive correlation. The overall conclusion is represented by a diamond in the last row of the figure. In particular, the summary measure is the center line of the diamond, while the associated confidence interval is the lateral tips of the diamond.

The effect size obtained was large for the objective dependent variables (i.e., Effectiveness and Efficiency) and medium for the subjective dependent variables (i.e., Perceived Ease of Use and Perceived Satisfaction of Use).
This was probably a result of the number of experiments used in the data meta-analysis. Despite the fact that the first experiment contributed to the overall results of the meta-analysis to a lesser extent, these results present a significant positive effect, and we can thus reject the null hypotheses which were formulated for each dependent variable (i.e., "there are no significant differences between WUEP and HE"). The meta-analysis therefore strengthens all the alternative hypotheses, providing promising results as regards WUEP's performance.

8. THREATS TO VALIDITY

We must consider certain issues which may have threatened the validity of the family of experiments:

8.1 Internal Validity. The threats to internal validity are relevant in those studies that attempt to establish a causal relationship. In our case, the main threats to the internal validity of the family of experiments were: learning effect, usability experts' experience, subjects' experience, information exchange among participants, and understandability of the documents.

The learning effect was alleviated by ensuring that each participant applied each method to different experimental objects, and all the possible order combinations were considered. We also assessed the effect of order of method and order of experimental object by using statistical tests.

Usability experts' experience may be an influential factor in building the baseline of usability problems detected. We intended to alleviate this threat by involving these experts in the evolution of this baseline according to the new usability problems detected by the participants.

Subjects' experience was not an influential factor since none of the participants had any experience in usability evaluations. We confirmed this fact by asking the participants about their experience with usability evaluation methods. This fact was the rationale for providing the training sessions in both methods in each experiment since we intended to balance the subject's knowledge on the Web usability evaluation method according to the novice evaluator profile. However, the training sessions may have affected the performance of the experiments, since the participants received the complete training immediately before the experimental tasks in the original experiment (EXP), whereas in the replications (REP1 and REP2) the participants received the complete training on the previous day. In order to alleviate this issue, we included a training slot before the experimental tasks in REP1 and REP2 in order to remind the participants of the employment of both inspection methods.

In order to minimize information exchange among participants, they were monitored by the experiment conductors to avoid communication biases while performing the tasks. However, this might have affected the results since the experiment took place over more than one day, and it is difficult to be certain whether the participants exchanged any information with each other. In order to alleviate this situation, at least to some extent, the participants were asked to return all the material at the end of each task. Moreover, since the participants in REP1 and REP2 were from the same Master's course but from different academic years, we ensured that no participants who were enrolled in REP1 were also enrolled in REP2.

Finally, understandability of the material was alleviated by clearing up all the misunderstandings that appeared in each experimental session.

8.2 External validity. This refers to the approximate truth of conclusions involving generalizations within different contexts. In our case, the main threats to the external validity of the family of experiments were: representativeness of the results and the size and complexity of the tasks.

The representativeness of the results might be affected by the evaluation design and the participant context selected. The evaluation design might have made an impact on the results owing to the selection of Web artifacts (experimental objects) and usability attributes to be evaluated during the design stage of WUEP. With regard to the selection of Web artifacts, we attempted to alleviate this by considering a set of artifacts with the same size and complexity, and which also contained representative artifacts of a model-driven Web development process (i.e., navigational model, presentation model and final user interface). With regard to the selection of usability attributes, we attempted to alleviate this threat by considering a set of relevant usability attributes by involving Web usability experts in this decision. In order to alleviate these issues, we intend to evaluate more Web applications, and to carry out surveys to provide a predefined set of usability attributes to be evaluated in different Web application families (e.g., intranets, social networks, virtual marts) which will be useful as guidance for evaluator designers. In addition, WUEP has been operationalized to be used in the context of a specific model-driven Web development method (OO-H).
Consequently, our results can only be generalized to Web applications that follow a model-driven Web development process that is based on the OO-H method. Nevertheless, it can be considered a representative method of the whole set of model-driven Web development methods [Moreno and Vallecillo 2008]. In order to perform differentiated replications by considering other model-driven Web development methods, we would need to perform some adaptations in the stages belonging to the design evaluator role proposed in WUEP. With regard to the establishment of the evaluation requirements stage (see Section 3.1), the adaptations needed would be:

• Update the evaluation profile, since the model-driven Web development method into which WUEP is integrated has changed. In addition, other aspects such as the type of Web application to be evaluated and its context of use would also change to reflect the characteristics of the Web application to be evaluated.
• Update the selection of Web artifacts (models) as a consequence of the new evaluation profile.
• Update the selection of usability attributes from the Web Usability Model if the type of Web application has changed, in order to consider other usability attributes that are relevant to this specific type of Web application.

With regard to the specification of the evaluation stage (see Section 3.1), the adaptations needed would be:

• Update the selection of metrics as a consequence of the new usability attributes selected.
• Operationalize the selected metrics in order to be applied to the new Web artifacts (models) selected. This operationalization consists of establishing a mapping between the generic definition of the metric and the modeling primitives of the new Web artifacts considered, which will require a previous analysis of the expressiveness of the modeling primitives. In order to support this task, the equivalence of the modeling primitives of the OO-H method with regard to other model-driven Web development methods (e.g., UWE, WebML) described in Cachero et al. [2007] can be used.

It is important to note that metrics which were operationalized to be applied at the final user interface can be reused without any adaptation.
Despite the fact that all the individual experiments were performed in an academic context (PhD and Master's students), the participants' performance could be considered to be representative of single-experienced evaluators (i.e., evaluators who have experience on the domain, but not in usability evaluations) since the kinds of students involved will soon be integrated into the industry's market. As further work, we intend to conduct more experiments involving double-experienced evaluators (i.e., evaluators who have experience on the domain and in usability evaluations) in order to assess how the experience level would impact on the obtained results. In addition, since only internal replications were conducted, more external replications need to be conducted by other experimental conductors in other settings to confirm these results. In order to address the aforementioned limitations, these external replications will involve participants from different contexts and also from different levels of experience in Web usability evaluations.

The size and complexity of the tasks might have also affected the external validity. We decided to use relatively small tasks that would be applied in few representative Web artifacts since a controlled experiment requires participants to complete the assigned tasks in a limited amount of time.

8.3 Construct validity. The construct validity of the family of experiments may have been influenced by the measures that were applied in the quantitative analysis and the reliability of the questionnaire. We intended to alleviate the first threat by evaluating the dependent variables that are commonly employed in experiments in which usability inspection methods are involved. In particular, we employed the Effectiveness and Efficiency measures as suggested by Hartson et al. [2000] for formative evaluations (i.e., usability evaluations during the Web development process). These measures have also been employed in similar empirical studies [Conte et al. 2009]. In addition, the subjective measures employed were Perceived Ease of Use and Perceived Satisfaction of Use, based on the Technology Acceptance Model (TAM) [Davis 1989], a well-known and thoroughly validated model for evaluating information technologies. The reliability of the questionnaires was tested by applying the Cronbach test. Table 12 shows the Cronbach's alpha obtained for each set of closed questions intended to measure both subjective dependent variables (Perceived Ease of Use and Perceived Satisfaction of Use). All the values obtained were higher than the acceptable minimum threshold (α ≥ 0.70) [Maxwell 2002].

8.4 Conclusion validity. The main threats to the conclusion validity of the family of experiments were the data collection and the validity of the statistical tests applied. With regard to the data collection, we applied the same procedure in each individual experiment in order to extract the data, and ensured that each dependent variable was calculated by applying the same formula. With regard to the validity of the statistical tests applied, we applied the most common tests that are employed in the empirical software engineering field owing to their robustness and sensitivity [Maxwell 2002].

9. CONCLUSIONS AND FUTURE WORK

The lack of usability evaluation methods that can properly be integrated into early stages of Web development processes motivated us to propose, in a previous work, the Web Usability Evaluation Process (WUEP) [Fernandez et al. 2011b] as an inspection method which can be instantiated and integrated into different model-driven Web development processes. In this paper, we have reported the results of a family of experiments aimed at evaluating participants' effectiveness, efficiency, perceived ease of use, and perceived satisfaction of use when using WUEP in comparison to a widely used industrial inspection method based on heuristics: Heuristic Evaluation (HE). The results of the quantitative analysis showed that WUEP was more effective and efficient than HE in the detection of usability problems in artifacts obtained from a specific model-driven Web development process (i.e., OO-H). These results were supported by a meta-analysis that was performed in order to aggregate empirical findings from each individual experiment. The low ratio of false positives obtained by WUEP suggests that the use of metrics as part of the evaluation process reduces the degree of subjectivity in the evaluation of Web artifacts. The low ratio of replicated problems can be explained by the fact that WUEP provides operationalized metrics which are specifically tailored for each type of artifact of the Web development process, reducing in this way the subjectivity associated with generic rules that rely on the experience of the evaluator.
In addition, with regard to the evaluators' perceptions, the participants were more satisfied when they applied WUEP, and they also found it easier to use than HE. The results of the qualitative analysis suggest that WUEP could be greatly improved with a tool that automates most of the tasks involved in the method, including the calculation of some metrics and allowing the generation of usability reports.

From a research perspective, the family of experiments was a valuable means to obtain feedback with which to improve our Web Usability Evaluation Process. As far as we know, this is the first empirical study that provides evidence of the usefulness of a usability evaluation method for a model-driven Web development process. This empirical study is intended to contribute to Web Engineering research through its proposal of a well-defined framework that can be reused by other researchers in the empirical validation of their Web usability evaluation methods.

From a practical perspective, we are aware that this study provides only preliminary results on the usefulness of our Web Usability Evaluation Process in practice. Although the experiments provided good results as regards the performance of our usability inspection method for Web applications developed using model-driven development, these results need to be interpreted with caution since they are only valid within the context established in this family of experiments. There is a need for more empirical studies with which to test our proposal in other settings. Nevertheless, this study has value as a first study to test the integration of usability evaluations into model-driven Web development processes.

As future work, we intend to perform more replications in order to minimize the influence of the threats to validity identified. In particular, these replications will consider: different experimental designs conducted by external experimenters (i.e., external replications); new kinds of participants such as practitioners from industry with different levels of experience in usability evaluations; other Web artifacts from different model-driven Web development processes (e.g., WebML [Ceri et al. 2000], UWE [Koch and Kraus 2003]); and Web applications from different domains (e.g., e-commerce, intranet) using predefined sets of usability attributes (usability profiles). We plan to define these usability profiles by conducting surveys among usability and Web domain experts.

APPENDIX A: EXAMPLES OF BOTH USABILITY INSPECTION METHODS

This appendix presents brief examples of how both inspection methods can be applied in order to evaluate Web artifacts. Section A.1 shows an example of the WUEP execution stage when a metric is applied to a navigational model (Navigational Access Diagram from Object Oriented Hypermedia). Section A.2 shows an example of the HE execution stage when a heuristic is applied to a final user interface.

A.1 WUEP example

Let us suppose that we wish to evaluate the Cancel Support attribute (extracted from the Web Usability Model) in a Navigational Access Diagram (NAD0) which represents the contact management functionality of a Web application that has been developed by using Object Oriented Hypermedia (OO-H). A brief explanation of the modeling primitives of a NAD in OO-H is provided in Table A1 for the reader's convenience.

Figure A1 presents the Web artifact that is going to be evaluated (NAD0). Users can retrieve the information concerning all contacts or they can search for a given contact by providing an initial or a search string. These functionalities are represented by the three navigational links that connect the menu collection and the Contact Details navigational class. Users can also create a new contact by executing the New method in the Create Contact navigational class, and can modify an existing contact by executing the Modify method in the Contact Details navigational class.

Table A2 presents the operationalization of the User Operation Cancellability metric (associated with the Cancel Support attribute) in order for it to be applied to Navigational Access Diagrams (NADs) from Object Oriented Hypermedia (OO-H). As is shown in Figure A2, when the UOC metric is applied to NAD0, we obtain the value UOC(NAD) = 1/2 = 0.5, since only the Service Link associated with the Modify method provides a return Target Link to the previous navigation step. Considering the established thresholds in the metric operationalization, this is a medium usability problem (UP001) which is reported as shown in Table A3.

A.2 HE example

Let us suppose that we wish to evaluate the final user interface (FUI0, presented in Figure A3) of the same Contact Management functionality previously mentioned in A.1 by applying the Visibility of the System Status heuristic.

The Visibility of the System Status (VSS) heuristic states that: "The system should always keep users informed about what is going on, through appropriate feedback within reasonable time. The two most important things that users need to know at your site are probably where am I? and where can I go next?. So it is important to keep users informed about what is happening. To prove this, look for feedback on each user interaction. Make sure each page is branded and that you indicate which section it belongs to. Links to other pages should be clearly marked. Since users could be jumping to any part of your site from somewhere else, you need to include this status on every page. For example, when a user clicks on a 'Send' link in an order form, feedback is needed that tells you about whether your order has been received by the site. This information may appear as a different page or a popup that contains a link back to the main site".

As is shown in Figure A4, when the VSS heuristic is applied to FUI0, we detect four elements with the capability of providing feedback about the current status of the Web application. These elements are: the tabs that present the main menu, the text box that shows the connected user, and both titles for the left menu and for the main content.
Since the tabs do not provide feedback about which section has previously been selected and the title "List of contacts" does not provide feedback about what criteria were used to filter the contacts, we can consider the usability principle that is represented by the heuristic as partially supported by the Web artifact. There is therefore a usability problem that is reported as shown in Table A4. In this case, the severity level of the usability problem is based on the evaluator's opinion. It is also important to note that, in a model-driven Web development approach, this usability problem may be corrected by considering the previous Web artifacts that were employed to automatically generate this final user interface (i.e., the Abstract Presentation Diagram that defines the user interface and the code generation rules that transform NADs and APDs into the final UI source code).

APPENDIX B: EXCERPTS FROM THE EXPERIMENTAL MATERIAL

This appendix presents excerpts from all the different experimental materials. Section B.1 presents an excerpt from both the WUEP and HE appendixes that contain the operationalized metrics and heuristics to be applied, respectively. Section B.2 shows an example of a Web artifact to be evaluated: the Abstract Presentation Diagram (Web artifact APD1) for the Task Management functionality extracted from the Experimental Object 1 (which is included in the data gathering documents: WUEP-O1 and HE-O1). Section B.3 collects the experimental tasks to be carried out when WUEP and HE are applied to APD1 (these tasks are also included in the data gathering documents: WUEP-O1 and HE-O1). Section B.4 shows the template which was employed to report usability problems in WUEP and HE. The original materials have been translated into English for the reader's convenience. The original experimental material and the raw data are available for download at http://www.dsic.upv.es/~afernandez/JSS/familyexp.html.

B.1 Examples of operationalized metrics and heuristics

B.1.1. Operationalized Metrics.

Sca le In t er pret a t ion Oper a t ion a liza t ion

Th resholds

Metric: Depth of the Navigation (DN)
Usability attribute: Appropriateness recognisability / Navigability / Reachability
Generic description: Level of depth in the user navigation; in other words, the longest navigation path (without loops) that the user needs to follow in order to reach any content/feature of the Web app.
Scale: Integer greater than 0.
Interpretation: The higher the value, the more difficult it is for the user to reach the content/feature.
Operationalization: This metric can be calculated for each Navigational Access Diagram (NAD) by considering the number of navigation steps in the longest navigation path, where:
- Navigation step: exists when a Target Link connects two nodes (a node being any modeling primitive, or more than one modeling primitive connected by Automated Links and/or Source Links).
- Longest navigation path: the path with the greatest number of navigation steps, which begins in the first Navigational Class or Collection where the navigation starts, and which ends in the last Navigational Class or Service Link, from which it is not possible to reach another modeling primitive previously visited.
The calculation formula is therefore: DN(NAD) = Number of navigation steps in the longest navigation path.
Thresholds:
- [1 ≤ DN ≤ 4]: No usability problem.
- [5 ≤ DN ≤ 7]: Low usability problem.
- [8 ≤ DN ≤ 10]: Medium usability problem.
- [DN > 10]: Critical usability problem.

Metric: Proportion of links without meaningful names (PLM)
Usability attribute: Learnability / Predictability / Meaningful links
Generic description: Ratio between the number of links without a meaningful name and the total number of links.
Scale: Ratio between 0 and 1.
Interpretation: The higher the value, the worse the predictability that is provided, since the user may experience difficulties in predicting the target and results of his/her actions.
Operationalization: This metric can be calculated in all the abstract pages belonging to an Abstract Presentation Diagram (APD) by considering the proportion of non-proper names used by APD links. The calculation formula is therefore:
$$PLM(APD) = \frac{\text{Number of links without a meaningful name}}{\text{Total number of links in the APD}}$$
Thresholds:
- [PLM = 0]: No usability problem.
- [0 < PLM ≤ 0.3]: Low usability problem.
- [0.3 < PLM ≤ 0.6]: Medium usability problem.
- [0.6 < PLM ≤ 1]: Critical usability problem.

Metric: Headings according to the target of the links (HAT)
Usability attribute: Ease of use / Consistency / Heading consistency
Generic description: Number of headings whose name is not in accordance with the link name from which the heading was reached.
Scale: Integer greater than or equal to 0.
Interpretation: The higher the value, the worse the consistency that exists in the Web application content, thus affecting the ease of use.
Operationalization: This metric can be calculated in the final user interface (FUI) by considering the names of the links and the headings of the content reached by these links. The calculation formula is therefore: HAT(FUI) = Number of headings that are not in accordance with the link name which was followed to reach the current content.
Thresholds:
- [HAT = 0]: No usability problem.
- [1 ≤ HAT ≤ 3]: Low usability problem.
- [4 ≤ HAT ≤ 6]: Medium usability problem.
- [HAT ≥ 7]: Critical usability problem.
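The three operationalized metrics above can be illustrated with a small computational sketch. The Python below is a minimal illustration under stated assumptions, not the authors' tooling: it assumes a NAD is available as a plain adjacency mapping whose edges are the Target Links (one edge per navigation step), that the graph is loop-free as the DN definition requires, and it reads the first DN range as 1 to 4. It classes DN = 10 as Medium and any larger value as Critical. All function names and the example graph are hypothetical.

```python
from functools import lru_cache


def depth_of_navigation(nad, start):
    """Count the navigation steps on the longest path that begins at `start`.

    `nad` maps each node to the nodes reachable through one Target Link.
    Assumption: the graph is acyclic (the metric is defined "without loops").
    """
    @lru_cache(maxsize=None)
    def longest_from(node):
        successors = nad.get(node, ())
        if not successors:
            return 0
        return 1 + max(longest_from(nxt) for nxt in successors)

    return longest_from(start)


def classify_dn(dn):
    """Map a DN value to a severity level using the listed thresholds."""
    if dn < 1:
        raise ValueError("DN is an integer greater than 0")
    if dn <= 4:
        return "none"
    if dn <= 7:
        return "low"
    if dn <= 10:
        return "medium"
    return "critical"


def classify_plm(plm):
    """Map a PLM ratio (0 to 1) to a severity level."""
    if not 0 <= plm <= 1:
        raise ValueError("PLM is a ratio between 0 and 1")
    if plm == 0:
        return "none"
    if plm <= 0.3:
        return "low"
    if plm <= 0.6:
        return "medium"
    return "critical"


def classify_hat(hat):
    """Map a HAT count to a severity level."""
    if hat < 0:
        raise ValueError("HAT cannot be negative")
    if hat == 0:
        return "none"
    if hat <= 3:
        return "low"
    if hat <= 6:
        return "medium"
    return "critical"


# Hypothetical NAD used only for illustration.
example_nad = {
    "Home": ["Folders", "Tasks"],
    "Folders": ["Tasks"],
    "Tasks": ["TaskDetail"],
    "TaskDetail": [],
}

dn = depth_of_navigation(example_nad, "Home")
print(dn, classify_dn(dn))  # 3 none
```

In this hypothetical graph the longest path is Home, Folders, Tasks, TaskDetail, which has three navigation steps and therefore falls in the "no usability problem" range.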

B.1.2. Heuristics.

Heuristic: Match between system and the real world
Description: The system should speak the users' language, with words, phrases and concepts that are familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order. On the Web, you have to be aware that users will probably come from diverse backgrounds, so figuring out their "language" can be a challenge. An example of a real-world concept that is applied to Web applications may be the icons employed to distinguish between errors, warnings, or advice. Another example would be the shopping cart metaphor. In many Web stores, customers usually click once to select an element (equivalent to taking it off the shelf in a real store), click again to "add to cart" (equivalent to placing the item in their real cart) and then add a third click to confirm their purchase intention (equivalent to approaching the cashier in order to pay for it).

Heuristic: User control and freedom
Description: Users often choose some functions by mistake and will need a clearly marked "emergency exit" to leave the unwanted state without having to go through an extended dialogue. It is important to provide control operations such as cancel, undo and redo. Many of the "emergency exits" are provided by the browser, but there is still plenty of room on the site to support user control and freedom. Conversely, there are many ways, built into the Web, in which authors can take away user control. A "home" button on every page is a simple way to let users feel in control of the site. Be careful when forcing users into certain fonts, colors, screen widths or browser versions. And watch out for some of those "advanced technologies": user control is not usually added until the technology has matured. One example is animated GIFs. Until browsers let users stop and restart the animations, they can do more harm than good.

Heuristic: Recognition rather than recall
Description: Minimize the user's memory load by making objects, actions, and options visible. The user should not have to remember information from one part of the dialogue to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate. Good labels and descriptive links are also crucial for recognition. It is best to always keep links, menus, structures, actions and options visible so that they can be memorized. For example, if a website has a lot of submenus, you should use a system that allows users to know which section they are in at any time. This could be leaving a "trail of crumbs", or the Web application could use a color scheme that makes it possible to differentiate between the sections.

B.2 Example of a Web artifact to be evaluated

Figure B1 shows the Abstract Presentation Diagram (APD1), including its six abstract pages. Detailed information about the content of these abstract pages is provided as follows. Elements marked with "(*)" are attributes from the Navigational classes, and their display text in the final Web application will be the values of the attribute.

The first abstract page (Fig. B1 (a)) represents the access to the different existing folders: predefined, created, user-specific. It contains:
- 1 label: "Folder".
- 1 image: portfolio icon with one tick.
- 7 links: "New folder", "All tasks", "Pending tasks", "Ended tasks", "Task out of date", "folder_name(*)", "user_name(*)".

The second abstract page (Fig. B1 (b)) represents the task list which is filtered by the selected folder. It contains:
- 3 labels: "Task list", "folder_name (*)", "description (*)", "!", "Description", "End date".
- 2 images: folder icon, portfolio icon.
- 2 links: "New Task", "name and status (*)".

The third abstract page (Fig. B1 (c)) represents the warning message that appears when the selected folder does not contain any attached task. It contains:
- 1 label: "NOTICE The selected …"
- 1 image: exclamation icon.

The fourth abstract page (Fig. B1 (d)) represents the detailed task information in conjunction with the available operations: attribute modification, ended percentage update, and user assignment. It contains:
- 21 labels: "Task detail", "EN1 (*)", Task title, Begin date, End Date, etc.
- 4 links: "a Ie", "parent_folder (*)", Modify, Reassign.

The fifth abstract page (Fig. B1 (e)) represents the creation of a new task. Form fields refer to the attributes from the Task class that was defined in the Class Model. It contains:
- 7 labels: "New Task", "Task name", "description", "priority", "assigned user", "begin date", "End date (deadline)".
- 1 link: "New".

The sixth abstract page (Fig. B1 (f)) represents the creation of a new folder. Form fields refer to the attributes from the Folder class that was defined in the Class Model. It contains:
- 3 labels: "New Folder", "Folder name", "Folder description".
- 1 link: "OK".
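To make the artifact description concrete, the link lists above can be encoded as data and used as the denominator of a ratio metric such as PLM (links without a meaningful name over the total number of links in the APD). The Python sketch below is illustrative only: the dictionary mirrors the link lists given for the six abstract pages (the third page lists no links), while the function `plm` and the set of links flagged as not meaningful are hypothetical, since deciding which names are meaningful is the evaluator's judgment and is not stated in the text.

```python
# Links per abstract page of APD1, copied from the description above.
APD1_LINKS = {
    "a": ["New folder", "All tasks", "Pending tasks", "Ended tasks",
          "Task out of date", "folder_name(*)", "user_name(*)"],
    "b": ["New Task", "name and status (*)"],
    "c": [],
    "d": ["a Ie", "parent_folder (*)", "Modify", "Reassign"],
    "e": ["New"],
    "f": ["OK"],
}


def plm(links_by_page, not_meaningful):
    """Ratio of links flagged as not meaningful to all links in the APD.

    `not_meaningful` is a set of (page, link name) pairs chosen by the evaluator.
    """
    all_links = [(page, name)
                 for page, names in links_by_page.items()
                 for name in names]
    if not all_links:
        raise ValueError("the APD contains no links")
    flagged = [link for link in all_links if link in not_meaningful]
    return len(flagged) / len(all_links)


total_links = sum(len(names) for names in APD1_LINKS.values())
print(total_links)  # 15

# Hypothetical evaluator judgment, for illustration only.
flagged_links = {("d", "a Ie")}
print(round(plm(APD1_LINKS, flagged_links), 3))  # 0.067
```

Under this hypothetical judgment the ratio lies between 0 and 0.3, which the PLM thresholds class as a low usability problem; a different evaluator judgment would give a different value.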

B.3 Examples of experimental tasks

B.3.1. Experimental tasks for applying WUEP to APD1.

1. Using as support the list of operationalized metrics:
   a. Select the metrics that can be applied to the APD that is shown in Figure B1.
   b. Apply each metric in order to obtain its value.
   c. Classify the value obtained according to the threshold established for each metric.
2. For each detected usability problem (low, medium, critical), fill in the required fields provided by the usability report template, and write the ID of the problem in the last column.

Write starting time (hh:mm): ________

Metric Acronym | Metric calculation | Severity level of the usability problem | Usability problem ID

Write finishing time (hh:mm): ________

B.3.2. Experimental tasks for applying HE to APD1.

1. Using the list of heuristics as support, identify whether the principles that are represented by each heuristic can be applied to the APD that is shown in Figure B1. If not, mark the "Not Applicable" box.
2. For each applicable heuristic, indicate the degree to which the represented principles are supported by the heuristic (YES = Supported; P = Partially supported; NO = Not supported). Justify your decision by indicating some elements from the artifact evaluated.
3. For each heuristic whose usability principles were not supported, fill in the usability problems detected in the usability report template, and write the ID of the problem in the last column.

Write starting time (hh:mm): ________

Heuristic ID | Usability principle represented (Not Applicable / YES / P / NO) | Justification by elements of the device | ID observed usability problem | Usability problem ID
…

Write finishing time (hh:mm): ________

B.4 Examples of templates for reporting usability problems

B.4.1. Template for reporting usability problems in WUEP.

Fields to complete for each usability problem identified:
- Description: Textual description of the problem identified.
- Occurrences: Number of times the usability problem is repeated in the same Web artifact evaluated (if applicable).
- Recommendations: Guidance on how to prevent and/or correct the usability problem detected.

ID | Description | Occurrences | Recommendations
P001 | | |
P002 | | |
…

B.4.2. Template for reporting usability problems in HE.

Fields to complete for each usability problem identified:
- Description: Textual description of the problem identified.
- Occurrences: Number of times the usability problem is repeated in the same Web artifact evaluated (if applicable).
- Severity level: Classification of the usability problem: critical, medium or low.
- Recommendations: Guidance on how to prevent and/or correct the usability problem detected.

ID | Description | Severity level | Occurrences | Recommendations
P001 | | Low / Medium / Critical | |
P002 | | Low / Medium / Critical | |
…

ACKNOWLEDGMENTS

The authors would like to thank all the participants in the experiments, along with the usability experts who supported certain tasks of the evaluation design stage and who made up the control group. This research work is funded by the MULTIPLE project (TIN2009-13838) and the FPU program (AP200703731) from the Spanish Ministry of Science and Education.



Figure Captions

Fig. 1. Overview of the Web Usability Evaluation Process

Fig. 2. Overview of the Heuristic Evaluation Process

Fig. 3. Overview of the family of experiments

Fig. 4. Boxplots for the Effectiveness variable

Fig. 5. Boxplots for the Efficiency variable

Fig. 6. Boxplots for the Perceived Ease of Use variable

Fig. 7. Boxplots for the Perceived Satisfaction of Use variable

Fig. 8. Meta-analysis for all the dependent variables

Fig. 9. Example of a NAD for contact management (NAD0)

Fig. 10. Example of the UOC metric application to NAD0

Fig. 11. Example of a FUI for contact management (FUI0)

Fig. 12. Example of the VSS heuristic application to FUI0

Fig. 13. Example of the Web artifact (APD1)

Table 1. Closed-Questions to Evaluate Both Subjective Dependent Variables

| Question | Positive statement (5 points) | Negative statement (1 point) |
|---|---|---|
| PEU1 | The application procedure of the method is simple and easy to follow. | The application procedure of the method is complex and difficult to follow. |
| PEU2 | I have found the evaluation method easy to learn. | I have found the evaluation method difficult to learn. |
| PEU3 | In general terms, the evaluation method is easy to use. | In general terms, the evaluation method is difficult to use. |
| PEU4 | The proposed metrics/heuristics are clear and easy to understand. | The proposed metrics/heuristics are confusing and difficult to understand. |
| PEU5 | It was easy to apply the evaluation method to the Web artifacts. | It was difficult to apply the evaluation method to the Web artifacts. |
| PSU1 | In general terms, I believe the evaluation method provides an effective manner with which to detect usability problems. | In general terms, I believe the evaluation method provides an ineffective manner with which to detect usability problems. |
| PSU2 | The employment of the evaluation method would improve my performance in Web usability evaluations. | The employment of the evaluation method would not improve my performance in Web usability evaluations. |
| PSU3 | I believe that it would be easy to be skillful in the use of the evaluation method. | I believe that it would be difficult to be skillful in the use of the evaluation method. |

Table 2. Experimental Design Schema

Groups (sample size: 4n subjects)

| Session | G1 (n subjects) | G2 (n subjects) | G3 (n subjects) | G4 (n subjects) |
|---|---|---|---|---|
| 1st Session | WUEP applied in O1 | WUEP applied in O2 | HE applied in O1 | HE applied in O2 |
| 2nd Session | HE applied in O2 | HE applied in O1 | WUEP applied in O2 | WUEP applied in O1 |

Table 3. Experimental Objects

| Experimental Object | User | Functional Feature | Use Cases | Web Artifacts to be evaluated |
|---|---|---|---|---|
| O1 | Software Programmer | Task Management | Create/Modify/Delete tasks, Categorize tasks, etc. | 1 Navigational Access Diagram (NAD1); 1 Abstract Presentation Diagram (APD1); 1 Final User Interface (FUI1) |
| O2 | Project Manager | Report Management | Create daily reports, Access to partner reports, etc. | 1 Navigational Access Diagram (NAD2); 1 Abstract Presentation Diagram (APD2); 1 Final User Interface (FUI2) |

Table 4. Hypotheses to Test the Influence of the Order of Independent Variables

| Dependent variable | Order of Methods | Order of Experimental Objects |
|---|---|---|
| Effectiveness | HM1_0: Effec_Diff(WUEP) = Effec_Diff(HE); HM1_a: Effec_Diff(WUEP) ≠ Effec_Diff(HE) | HO1_0: Effec_Diff(O1) = Effec_Diff(O2); HO1_a: Effec_Diff(O1) ≠ Effec_Diff(O2) |
| Efficiency | HM2_0: Effic_Diff(WUEP) = Effic_Diff(HE); HM2_a: Effic_Diff(WUEP) ≠ Effic_Diff(HE) | HO2_0: Effic_Diff(O1) = Effic_Diff(O2); HO2_a: Effic_Diff(O1) ≠ Effic_Diff(O2) |
| Perceived Ease of Use | HM3_0: PEU_Diff(WUEP) = PEU_Diff(HE); HM3_a: PEU_Diff(WUEP) ≠ PEU_Diff(HE) | HO3_0: PEU_Diff(O1) = PEU_Diff(O2); HO3_a: PEU_Diff(O1) ≠ PEU_Diff(O2) |
| Perceived Satisfaction of Use | HM4_0: PSU_Diff(WUEP) = PSU_Diff(HE); HM4_a: PSU_Diff(WUEP) ≠ PSU_Diff(HE) | HO4_0: PSU_Diff(O1) = PSU_Diff(O2); HO4_a: PSU_Diff(O1) ≠ PSU_Diff(O2) |

Table 5. Planning for the Original Experiment (EXP)

Id. Group Training (15+20 minutes) 1st Session (90 minutes) Training (20 minutes) 1st Session (90 minutes)

1st Day 2nd Day G3 (3 subjects) G4 (3 subjects) G1 (3 subjects) G2 (3 subjects) OO-H Introduction Training with HE

Training with WUEP

HE in O1 HE in O2 WUEP in O1 WUEP in O2 Questionnaire for HE Questionnaire for WUEP Break (180 minutes) Training with WUEP WUEP in O1 WUEP in O2 Questionnaire for HE

Training with HE HE in O2 HE in O1 Questionnaire for WUEP

Table 6. New Closed-Questions Added to the Questionnaire

| Question | Positive statement (5 points) | Negative statement (1 point) |
|---|---|---|
| PSU4 | I believe the evaluation method helps to improve my skills in Web usability evaluation. | I do not believe the evaluation method helps to improve my skills in Web usability evaluation. |
| PSU5 | I am satisfied with the use of the evaluation method, to the point that I would recommend its use in the evaluation of Web applications. | I am not satisfied with the use of the evaluation method, to the point that I would not recommend its use in the evaluation of Web applications. |

Table 7. Planning for the Second Experiment (REP1)

Groups: G1 (9 subjects), G2 (10 subjects), G3 (10 subjects), G4 (9 subjects)

1st Day (60 minutes)

2nd Day (30 + 90 minutes)

3rd Day (30 + 90 minutes)

OO-H Introduction Training with HE Training with WUEP

OO-H Introduction Training with WUEP Training with HE WUEP in O1 WUEP in O2 HE in O1 HE in O2 Questionnaire for WUEP Questionnaire for HE OO-H Introduction Training with HE Training with WUEP HE in O2 HE in O1 WUEP in O2 WUEP in O1 Questionnaire for HE Questionnaire for WUEP

Table 8. Overall Results of the Usability Evaluations

| Statistics | Method | EXP (N=12) Mean | EXP (N=12) SD | REP1 (N=32) Mean | REP1 (N=32) SD | REP2 (N=20) Mean | REP2 (N=20) SD |
|---|---|---|---|---|---|---|---|
| Number of problems per subject | HE | 4.25 | 1.40 | 3.81 | 1.06 | 3.30 | 1.22 |
| | WUEP | 7.00 | 2.21 | 6.88 | 1.64 | 7.05 | 1.47 |
| False positives per subject | HE | 2.08 | 2.15 | 2.28 | 1.57 | 2.50 | 1.76 |
| | WUEP | 0.00 | 0.00 | 0.66 | 0.60 | 0.40 | 0.60 |
| Replicated problems per subject | HE | 1.41 | 0.79 | 1.72 | 1.65 | 2.25 | 1.48 |
| | WUEP | 0.00 | 0.00 | 0.00 | 0.00 | 0.10 | 0.31 |
| Duration (min) | HE | 61.83 | 14.43 | 61.28 | 19.33 | 63.50 | 10.89 |
| | WUEP | 44.16 | 13.53 | 53.56 | 13.81 | 53.50 | 15.17 |
| Effectiveness (%) | HE | 31.63 | 10.89 | 30.53 | 8.63 | 26.28 | 9.13 |
| | WUEP | 51.83 | 16.09 | 54.91 | 12.49 | 56.41 | 11.45 |
| Efficiency (Prob. / min) | HE | 0.07 | 0.02 | 0.07 | 0.03 | 0.05 | 0.02 |
| | WUEP | 0.17 | 0.06 | 0.14 | 0.05 | 0.14 | 0.06 |
| Perceived Ease of Use | HE | 3.23 | 1.01 | 3.44 | 0.70 | 3.03 | 0.89 |
| | WUEP | 4.25 | 0.57 | 4.16 | 0.61 | 3.73 | 0.56 |
| Perceived Satisfaction of Use | HE | 3.36 | 0.84 | 3.56 | 0.64 | 3.32 | 0.84 |
| | WUEP | 4.52 | 0.36 | 4.18 | 0.47 | 3.82 | 0.49 |
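The Effectiveness (%) and Efficiency (Prob. / min) rows of Table 8 are ratios derived from per-subject counts. The following is a minimal sketch, assuming that effectiveness is the share of the known usability problems that a subject detects and that efficiency is the number of detected problems per minute of evaluation. The function names and the sample figures (7 problems, 13 known problems, 45 minutes) are hypothetical and are not taken from the experiments.

```python
def effectiveness(detected: int, total_known: int) -> float:
    """Percentage of the known usability problems that a subject detected."""
    if total_known <= 0:
        raise ValueError("total_known must be positive")
    if not 0 <= detected <= total_known:
        raise ValueError("detected must lie between 0 and total_known")
    return 100.0 * detected / total_known


def efficiency(detected: int, minutes: float) -> float:
    """Detected usability problems per minute of evaluation time."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return detected / minutes


# Hypothetical subject: 7 problems detected in 45 minutes, 13 known problems.
print(f"Effectiveness: {effectiveness(7, 13):.2f} %")
print(f"Efficiency: {efficiency(7, 45):.3f} problems/min")
```

False positives and replicated problems would have to be excluded from the detected count before applying either ratio; how that filtering is done is outside the scope of this sketch.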

Table 9. p-values obtained for the Influence of Order of Methods and Experimental Objects

| | Dependent variable | EXP | REP1 | REP2 |
|---|---|---|---|---|
| Order of Methods | Effectiveness | No (0.161) | No (0.166) | No (0.275) |
| | Efficiency | No (0.846) | No (0.769) | No (0.536) |
| | Perceived Ease of Use | No (0.871) | No (0.672) | No (0.350) |
| | Perceived Satisfaction of Use | No (0.339) | No (0.160)¹ | No (0.579)¹ |
| Experimental Objects | Effectiveness | No (0.394) | No (0.642)¹ | No (0.664) |
| | Efficiency | No (0.910) | No (0.882) | No (0.709) |
| | Perceived Ease of Use | No (0.908) | No (0.734) | No (0.454) |
| | Perceived Satisfaction of Use | No (0.514) | No (0.270)¹ | No (0.419) |

¹ Result obtained with the Mann-Whitney non-parametric test.

Table 10. Summary of the Results of the Family of Experiments

| Experiment | Type of subjects | Num. of subjects | Hypotheses accepted | Influence of method order | Influence of object order |
|---|---|---|---|---|---|
| EXP | PhD Students | 12 | H1_a, H2_a, H3_a, and H4_a | No | No |
| REP1 | Master’s Students | 32 | H1_a, H2_a, H3_a, and H4_a | No | No |
| REP2 | Master’s Students | 20 | H1_a, H2_a, H3_a, and H4_a | No | No |

Table 11. The Hedges’ metric values for all the dependent variables

| Dependent variable | Experiment | Effect Size (Hedges’ g) | Significance (p-value) |
|---|---|---|---|
| Effectiveness | EXP | Large (1.022) | Yes (p = 0.003) |
| | REP1 | Large (1.146) | Yes (p < 0.001) |
| | REP2 | Large (1.697) | Yes (p < 0.001) |
| | Global Effect Size | Large (1.243) | Yes (p < 0.001) |
| Efficiency | EXP | Large (2.261) | Yes (p < 0.001) |
| | REP1 | Large (1.146) | Yes (p < 0.001) |
| | REP2 | Large (1.443) | Yes (p < 0.001) |
| | Global Effect Size | Large (1.352) | Yes (p < 0.001) |
| Perceived Ease of Use | EXP | Medium (0.904) | Yes (p = 0.006) |
| | REP1 | Medium (0.811) | Yes (p < 0.001) |
| | REP2 | Medium (0.682) | Yes (p = 0.005) |
| | Global Effect Size | Medium (0.785) | Yes (p < 0.001) |
| Perceived Satisfaction of Use | EXP | Large (1.294) | Yes (p < 0.001) |
| | REP1 | Medium (0.825) | Yes (p < 0.001) |
| | REP2 | Medium (0.451) | Yes (p = 0.046) |
| | Global Effect Size | Medium (0.747) | Yes (p < 0.001) |
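For readers unfamiliar with the effect size reported in Table 11, the sketch below shows the textbook form of Hedges' g for two independent groups: a standardized mean difference based on the pooled standard deviation, multiplied by a small-sample correction factor. This is an assumption about the general technique only. The experiments used a within-subject design and the exact variant applied by the authors is not stated in this section, so the sketch is not expected to reproduce the values in Table 11. The function name and the input figures are hypothetical.

```python
import math


def hedges_g(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> float:
    """Hedges' g for two independent groups (textbook formulation)."""
    df = n1 + n2 - 2
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # Pooled standard deviation, weighting each variance by its degrees of freedom.
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    if pooled_sd == 0:
        raise ValueError("pooled standard deviation is zero")
    d = (mean1 - mean2) / pooled_sd
    # Small-sample bias correction (approximate form).
    correction = 1 - 3 / (4 * df - 1)
    return correction * d


# Hypothetical groups: means 4.0 and 3.0, both with SD 1.0 and 10 subjects.
print(f"g = {hedges_g(4.0, 1.0, 10, 3.0, 1.0, 10):.3f}")
```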

Table 12. Cronbach’s alphas for the reliability of questionnaires

| Dependent variable | EXP | REP1 | REP2 |
|---|---|---|---|
| Perceived Ease of Use | Acceptable (0.909) | Acceptable (0.762) | Acceptable (0.842) |
| Perceived Satisfaction of Use | Acceptable (0.802) | Acceptable (0.780) | Acceptable (0.785) |
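Cronbach's alpha, as reported in Table 12, measures the internal consistency of a set of questionnaire items. The sketch below uses the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and the response matrix are hypothetical; the matrix is not data from the experiments.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha; one row per subject, one column per questionnaire item."""
    if len(responses) < 2:
        raise ValueError("need at least two subjects")
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every subject must answer every item")
    # Sample variance of each item (column) across subjects.
    item_vars = [variance([row[i] for row in responses]) for i in range(n_items)]
    # Sample variance of each subject's summed score.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical answers of four subjects to three 5-point items.
answers = [[4, 5, 4], [3, 3, 2], [5, 5, 5], [2, 3, 2]]
print(f"alpha = {cronbach_alpha(answers):.3f}")
```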

Table A1. Some modeling primitives of a NAD from OO-H

| Modeling primitive | Meaning |
|---|---|
| Entry Point | Each NAD has a unique Entry Point User that indicates the starting point of the navigation process. |
| Collection | A Collection is a hierarchical structure that groups a set of navigational links. It is an abstraction from the menu concept. |
| Navigational Class | A Navigational Class represents a view of a set of attributes and methods in a class from the UML class diagram that defines the content and the static structure of the Web application. |
| Target Link | A Target Link represents that the target node is reachable by explicit user navigation. (Depicted as a bold arrow) |
| Source Link | A Source Link represents that the target node is reachable in the same navigation step in which the source node was reached. (Depicted as an empty arrow) |
| Automated Link | An Automated Link represents that the target node is reachable with no need for user navigation. (Depicted as an arrow with a broken line) |
| Service Link | A Service Link represents the execution of a method from a navigational class. (Depicted as a Target Link or Source Link with a gear icon) |

Table A2. User Operation Cancellability Metric

Metric: User Operation Cancellability (UOC)

Usability attribute: Operability / Controllability / Cancel support

Generic description: Proportion between the number of operations provided that cannot be cancelled by the user prior to completion and the total number of operations requiring the pre-cancellation capability.

Scale: Ratio between 0 and 1.

Interpretation: The higher the value, the worse the controllability of the WebApp, because the user must resort to external operations (Web browser actions) to return to a previous state in order to cancel the current operation.

Operationalization: This metric can be calculated for each NAD by considering the Service Links that have been associated with the Navigational Classes methods as user operations. A method supports cancellation if a Target Link exists that returns from the Service Link to the previous navigation step. The calculation formula is therefore:

$$\mathrm{UOC}(\mathrm{NAD}) = \frac{\text{Number of Service Links without a return Target Link}}{\text{Total number of Service Links}}$$

Thresholds: [UOC = 0]: No usability problem. [0 < UOC ≤ 0.3]: Low usability problem. [0.3 < UOC ≤ 0.6]: Medium usability problem. [0.6 < UOC ≤ 1]: Critical usability problem.
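The UOC formula and its thresholds can be sketched in a few lines. The function names below are hypothetical, and the example NAD with two Service Links is an illustrative assumption rather than a model taken from the paper.

```python
def uoc(links_without_return: int, total_service_links: int) -> float:
    """UOC(NAD): share of Service Links that lack a return Target Link."""
    if total_service_links <= 0:
        raise ValueError("the NAD must contain at least one Service Link")
    if not 0 <= links_without_return <= total_service_links:
        raise ValueError("links_without_return must lie between 0 and the total")
    return links_without_return / total_service_links


def uoc_severity(value: float) -> str:
    """Map a UOC value to the severity levels listed in the thresholds."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("UOC is a ratio between 0 and 1")
    if value == 0:
        return "No usability problem"
    if value <= 0.3:
        return "Low usability problem"
    if value <= 0.6:
        return "Medium usability problem"
    return "Critical usability problem"


# Hypothetical NAD: 2 Service Links, 1 of them without a return Target Link.
value = uoc(1, 2)
print(value, uoc_severity(value))
```

Keeping the ratio and the threshold mapping in separate functions allows the same severity scale to be reused if the thresholds are recalibrated for another metric.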

Table A3. Usability report for usability problem UP001

ID: UP001

Description: The operation “create a new contact” cannot be cancelled by the user.

Usability attribute: Ease of use / Controllability / Cancel support

Severity level: Medium

Evaluated artifacts: NAD (Navigational Access Diagram)

Problem source: NAD (Navigational Access Diagram)

Occurrences: 1 Service Link without a return Target Link

Recommendations: Add a new Target Link between the Service Link and the Create Contact navigational class in order to support the cancellation.

Table A4. Usability report for usability problem UP002

ID: UP002

Description:
- The tabs do not provide feedback about which section has been previously selected.
- The title “List of contacts” does not provide feedback about what criteria were used to filter the contacts.

Heuristic applied: Visibility of the System Status

Severity level: Medium

Evaluated artifacts: FUI (Final User Interface)

Source problem: APD (Abstract Presentation Diagram) and code generation rules from NAD and APD to FUI

Occurrences: 2 elements that do not provide proper feedback about the Web application status.

Recommendations:
- Replace the “List of contacts” title with another that is more specific (in the related APD).
- Replace the code generation rule that provides the tabs in the FUI source code with another which can show the current selection.