International Journal of Industrial Ergonomics 40 (2010) 455-469


UseLearn: A novel checklist and usability evaluation method for eLearning systems by criticality metric analysis

Asil Oztekin a,*, Zhenyu James Kong a, Ozgur Uysal b

a School of Industrial Engineering and Management, Oklahoma State University, 322 Engineering North, Stillwater, OK 74078, USA
b Department of Industrial Engineering, Fatih University, 34500 Istanbul, Turkey

Article history: Received 28 June 2009; received in revised form 22 March 2010; accepted 2 April 2010; available online 11 May 2010

Keywords: eLearning (web-based learning/distance learning); usability evaluation checklist; structural equation modeling; criticality metric analysis

Abstract

This paper proposes a new usability evaluation checklist, UseLearn, and a related method for eLearning systems. UseLearn is a comprehensive checklist which incorporates both quality and usability evaluation perspectives in eLearning systems. Structural equation modeling is deployed to validate the UseLearn checklist quantitatively. The experimental results show that the UseLearn method supports the identification of usability problems through criticality metric analysis and the definition of relevant improvement strategies. The main advantage of the UseLearn method is the adaptive selection of the most influential usability problems, which significantly reduces the time and effort required for usability evaluation. At the sketching and/or design stage of eLearning systems, it provides effective guidance to usability analysts as to which problems should be addressed in order to improve the usability perception of the end-users.

Relevance to industry: During the sketching or design stage of eLearning platforms, usability problems should be revealed and eradicated to create more usable, higher-quality eLearning systems that satisfy the end-users. The UseLearn checklist, along with the quantitative methodology proposed in this study, helps usability experts achieve this goal.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

eLearning is a compound word formed from the abbreviation of "electronic" and the word "learning". It is a modeled system for teaching and learning designed to be delivered at a distance by means of electronic communication such as the internet, and it has recently become popular all over the world. Applications of eLearning can be as effective as traditional face-to-face teaching and learning if the methods suit the teaching tasks, i.e., there is a classroom-like student-teacher interaction and the teachers provide students with timely feedback when required. The term eLearning can be regarded as a synonym for Open and Distance Learning (ODL) and for web-based learning.

Both providers and consumers of eLearning want education, training, and learning products and services that are effective and efficient. These concepts are encompassed by the term quality. Consumers of eLearning include students, school boards, education and training departments of governments, and corporations. Providers of eLearning may be publicly-funded schools, universities, and colleges, or they may be private enterprises producing portions of eLearning content, design and production, delivery and management of learning, and/or student management (Hope and Guiton, 2006).

* Corresponding author. Tel.: +1 405 744 46 64; fax: +1 405 744 46 54. E-mail address: [email protected] (A. Oztekin).
doi:10.1016/j.ergon.2010.04.001

Techniques to measure the quality of computer systems have been discussed for several decades, first under the heading of ergonomics and ease-of-use, and later under the heading of usability (Hornbaek, 2006). Anyone who has spent hours figuring out how to set a VCR clock already has a good idea of what usability is not. Usability can simply be defined as ease of use: the facility with which one can get something to do what it is intended to do. It can apply to practically any object that is used for some purpose (McNamara, 2003). Usability has also been defined as the extent to which an application is learnable and allows users to accomplish specified goals efficiently, effectively, and with a high degree of satisfaction. An additional component that should be added to this definition is usefulness; that is, a highly usable application will not be embraced by users if it fails to contain content that is relevant and meaningful to them (Miller, 2005). Formally defined, usability stands for "the capability to be used by humans easily and effectively"; "quality in use" (Bevan, 1999); "the effectiveness, efficiency, and satisfaction with which specified users can achieve goals in particular environments" (Hornbaek, 2006); how easy it is to find, understand, and use the information displayed on a web-based system (Keevil, 1998); and "the ultimate quality factor" for the software architecture (Seffah et al., 2008).


The International Organization for Standardization (ISO) defines usability as the extent to which a product can be used by specified users to achieve specified goals with efficiency, effectiveness, and satisfaction in a specified context of use (ISO 9241-11, 1998). Usability is moving up the list of strategic factors to be dealt with, especially in software development (Juristo et al., 2007), and usability evaluation has therefore become an increasingly hot topic in the human-computer interaction area. There has been a perception that usability is somehow related to quality (ISO 9126, 1991; Nielsen and Mack, 1994; IEEE, 1998; ISO 14598-1, 1999; ISO 13407, 1999; Oztekin et al., 2009). Which one is seen as a subset of the other is a debate that can change from one study area to another. The fundamental approaches have handled the concept of usability through design heuristics (Nielsen, 1993) and usability rules (Schneidermen, 1998), but they have not provided a quantitative basis for usability evaluation. The bottom line, however, is that the end-users of eLearning systems ask for more usable and higher-quality systems, as they do with other web-based systems. This requires a sophisticated methodology to improve the usability perception of eLearning end-users.

There have been several attempts to evaluate usability (Chin et al., 1988; Babiker et al., 1991; Macleod and Rengger, 1993; Rengger et al., 1993; Brooke, 1996; Kirakowski, 1996; Matera et al., 2002; McGee, 2004). Although these studies have provided widely accepted and applied usability evaluation methods, they need to be improved in many respects. For example, they either do not propose a validated quantitative method along with their checklist or do not attempt to calculate a single usability index score. Their primary focus is on addressing usability problems in a qualitative manner, ignoring how to tackle the many usability-related problems, improve them, and retest the software system. Although a recent study addresses human-computer interaction problems by means of neuro-fuzzy methods (Nikov, 2007), it does not specifically involve an eLearning usability evaluation method. From a general user-centered design perspective, Barcellini et al. (2009) investigate user participation in the design process of an Open Source Software (OSS) project, the Python programming language. They conduct a detailed analysis characterizing the participation of various stakeholders in a user-centered design process, investigating forms of participation and effective roles in OSS design that would improve the interaction between users and system developers (Barcellini et al., 2009).

Specifically focused on the usability analysis of eLearning systems, Squires and Preece (1999) review the existing usability checklists and criticize them on the grounds that most usability heuristics-based checklists lack consideration of learning. They propose a usability heuristics-based predictive evaluation methodology which is essentially an extension of Molich and Nielsen's (1990). This study presents a broader perspective on eLearning software usability, yet it is not quantitatively validated by any case study. Similarly, Parlangeli et al. (1999) provide a three-step evaluation of the effect of usability on students' learning assessment. The first step involves heuristic evaluation by two experts in the human-computer interaction area. The second step was an end-user evaluation conducted by ten college students, and the third step was performed by thirty-six high school students. The study supports the idea that hypertexts can make the user feel lost, and that the problem is more severe if the user is not familiar with the topic. The study itself admits that more quantitative results should be provided to validate its underlying hypothesis. Likewise, Storey et al. (2002) thoroughly conceptualize students' and instructors' perspectives on and expectations of a web-based learning tool. They administer a series of questionnaires to fifty-four college students and make inferences based on the responses. This study provides a clear framework for both stakeholders, but it lacks an analytical methodology explaining why the usability is good or bad, which criteria affect it, and to what degree.

On the other hand, Downey et al. (2005) investigate the effect of national culture on the usability of eLearning. Drawing on fairly representative pools of national cultures, arguably their most interesting finding is that cultures with high power distance indicators tend to have collectivist rather than individualistic tendencies. These cultures include China, India, Indonesia, Malaysia, and Singapore. Participants from these cultures reported strong satisfaction with the eLearning system they used. They also found that being less averse to change and taking risks correlated strongly with higher errant click rates. Chiu et al. (2005) model users' intention to continue using an eLearning system by means of expectancy disconfirmation theory (EDT). They analytically model how eLearning usability, quality, and value affect satisfaction and hence the intention for eLearning continuance. This appears to be the foremost attempt to handle eLearning usability in a cause-and-effect manner. However, it does not explain how to establish criteria for unsatisfactory usability or quality, a question that remains critical for improving the system to satisfy the end-users. In contrast, Hsu et al. (2009) develop design criteria and an evaluation scale for eLearning systems. They propose four dimensions (instructional strategy, teaching material, learning tool, and learning interface) for an evaluation scale intended to enhance meaningful learning, integration of cognitive skills, effective web learning management, and sharing of real-world experience (Hsu et al., 2009). Using this evaluation scale, they compare three eLearning websites. The study presents a comprehensive scale, but it does not provide an analytical foundation to explain why one eLearning website is superior to another, which is essential information for further improving an eLearning system. Payne et al. (2009) investigate the usage of an eLearning system in a workplace which requires chromosome analysis in boars. In this study, tasks using the provided eLearning system are carried out by novice users of chromosomal analysis. The post-test questionnaires reveal that a significant proportion of these novice eLearning system users (~73%) agree that their success in chromosome identification can be attributed to the easy-to-navigate and enjoyable simulations used in the eLearning system. Therefore, this research concludes that eLearning systems could also be used outside the conventional learning environment to train unskilled employees so that they can be assigned complex practical tasks while avoiding expensive instruction (Payne et al., 2009).

In summary, the review of the related research reveals a gap in the literature: the main usability problems are not uncovered in a cause-and-effect manner. If the usability of an eLearning system is not satisfactory, the underlying usability metrics should be improved. This requires an analytical approach to handle the metrics efficiently and effectively, since not all of them can be improved, due to constraints such as time and money. This fact has been noted in the literature; accordingly, Delice and Gungor (2009) proposed a heuristic evaluation (HE)-based analytic hierarchy process (AHP) for a university online library website to identify the usability problems by HE and then rate their severity by AHP.
In this way, they defined the solution priority for the identified usability problems. At the end of their study, they also admitted that the usability problems may have a "dependent hierarchical structure" which could be handled by the analytic network process (ANP). Alternatively, our study proposes a methodology that approaches the problem in a step-by-step fashion. The details of the proposed methodology are explained in Section 2.

2. Description of UseLearn method

It is anticipated that usability and quality affect each other (Bevan, 1995, 1999; Folmer and Bosch, 2004; Seffah et al., 2008; Oztekin et al., 2009).


Most of the usability and quality assessment approaches have many overlapping items in their checklists. Therefore, these seemingly separate approaches can be combined, and a new modified method can be created. This study mainly focuses on the integration of quality and usability approaches for eLearning systems. In the following sections, the UseLearn assessment checklist, assessment model, and data analysis procedure are presented.

Table 1
eLearning checklist approaches evaluating usability only.


Table 2
Comparison of the eLearning usability and quality evaluation checklists.

Pure usability dimensions: Visibility; Memorability; Flexibility; Reducing Redundancy; Error Prevention.

Common dimensions (usability - quality): Aesthetics - Aesthetic Design; Course Management - Course Information, Online Support, Content Support; Interactivity - Opportunities for Interaction; Consistency and Functionality - Consistent and Functional; Feedback and Help - Opportunities for Students to Receive Feedback; Efficiency - Completeness; Accessibility - Accessibility.

Pure quality dimensions: Clear Syllabus; Alignment of Course Objectives; Clearly Defined Learning Outcomes; Variety of Learning Tasks; Critical Thinking; Opportunities for Self-Assessment; Alignment between Objectives, Activities and Assessments; Comprehensive Assessment Strategy.

2.1. UseLearn checklist

In the development of the UseLearn checklist, the checklist items were determined first, and then the corresponding checklist question was worded for each item. The goal of this study is to present a novel checklist which combines quality dimensions with usability dimensions. Therefore, the most appropriate quality and usability assessment checklists for eLearning systems were identified from both areas of evaluation. Some common checklists existing in the literature for specifically evaluating eLearning usability are summarized in Table 1. All of these checklists were developed to evaluate the usability of eLearning systems. However, the names of the dimensions they use differ even though they refer to the same or related concepts. In this study, they were clustered to explicitly reveal their overlapping and differentiating dimensions. Dringus and Cohen's (2005) checklist approach was then chosen because it is the most comprehensive one, and its items (as summarized in Table 1) are more concise than the others. It has 13 dimensions to compare with the quality-related checklist approaches as a further step. Similarly, checklist approaches evaluating eLearning systems in terms of quality only were also surveyed. Among those, we decided to use the "Quality Assessment Rubric for eLearning Design" (Beebe, 2004) as a checklist tool. The overlapping and differentiating dimensions among the aforementioned usability checklists (as summarized in Table 1) and the Quality Assessment Rubric for eLearning Design were then identified. This comparison supported the fact that usability and quality are closely related to each other. A summary of this comparison is presented in Table 2. Even considering merely the names of the checklist dimensions in Table 2 verifies the statement that quality and usability are strongly related to each other (Bevan, 1995, 1999; Folmer and Bosch, 2004; Seffah et al., 2008).

This comparison and combination of usability and quality checklists yielded more than 200 checklist questions in total, covering both the usability- and quality-related items. However, many of them repeated each other, that is, addressed the same kind of problem of the eLearning system. It was clearly impossible to give the test participants a checklist containing so many questions and ask them to answer it. Therefore, the number of questions was reduced by selecting those that measure the related dimension most effectively, namely those which are easiest to understand. While creating the UseLearn checklist, we merged the overlapping dimensions, balancing them with an equal number of questions adapted from both approaches where applicable. By doing so, the checklist was intended to measure the usability and the quality of the eLearning system fairly at the same time. The overlapping dimensions taken from both approaches had to be given new names, or one of the existing names had to be chosen. For example, as seen in Table 2, to decide between aesthetics from the usability approach and aesthetic design from the quality approach, we chose "aesthetics".


We named one merged dimension "completeness", constituted by efficiency from the usability approach and completeness from the quality approach. These checklist dimensions were named the high-level dimensions of the UseLearn checklist and are measured by 36 questions on a 5-point Likert scale (Likert, 1932). The UseLearn checklist, along with its dimensions and items, is presented in Table 3.

In terms of usability evaluation, the most reasonable solution seems to be to increase the meaningfulness and strategic influence of usability data by representing the entire construct of usability as a single dependent variable (usability index) without sacrificing precision (Sauro and Kindlund, 2005). The usability index is a measure, expressed as a percentage, of how closely the features of a web site match generally accepted usability guidelines (Keevil, 1998). It is widely accepted that usability evaluation depends on efficiency, effectiveness, and satisfaction. Frokjaer et al. (2000) define effectiveness as the accuracy and completeness with which users achieve certain goals. Indicators of effectiveness include quality of solution and error rates. Efficiency is the relation between (a) the accuracy and completeness with which users achieve certain goals and (b) the resources expended in achieving them. Indicators of efficiency include task completion time and learning time. Satisfaction is the users' comfort and positive attitudes towards the use of the system. Users' satisfaction can be measured by attitude rating scales (Frokjaer et al., 2000). We named efficiency, effectiveness, and satisfaction the low-level dimensions of the UseLearn checklist because these measures cannot be changed consciously and directly by the user interface designer/usability analyst. Instead, they arise as a result of the high-level dimensions of the checklist. For example, if the button to change the password in a system is not visible enough (i.e., not located at a visible place on the page), it will take a long time for a user to find it, select it, and change his/her password. This decreases the usability of the system in terms of speed; hence, in turn, the efficiency of the eLearning system is also decreased. Efficiency, effectiveness, and satisfaction are measured by 5 additional questions on the checklist. These questions measure learning time and task completion time (both of which refer to efficiency), and task completion percentage and error rates (both of which refer to effectiveness). At the very end of the checklist, taking the previous questions into account, the participants are asked whether they are satisfied with the eLearning course they received, which measures satisfaction.

2.2. UseLearn assessment model

As explained in Section 2.1, the high-level dimensions of the UseLearn checklist can be linked to its low-level dimensions through the usability index in a causal (directional) manner. Namely, a change in the high-level dimensions would affect the usability perception of the end-users, and hence their effectiveness, efficiency, and satisfaction while using the eLearning system.


Structural equation modeling (SEM) is a very appropriate technique to analyze such a relationship. It is a statistical technique that is able to examine causal relationships among variables (Jöreskog, 1970). SEM approaches the data differently from classical statistical methods such as multiple regression or ANOVA. The parameters in an SEM model are connection strengths or path coefficients between different variables, which reflect the effective connectivity. Parameters are estimated by minimizing the difference between the observed covariances and those implied by a structural or path model (McIntosh and Gonzalez-Lima, 1994). SEM can be viewed as a combination of factor analysis and regression. It provides a general framework for statistical analysis that includes many traditional multivariate procedures as special cases (e.g., factor analysis, regression analysis, and canonical correlation) (McIntosh and Gonzalez-Lima, 1994).


The main goal of SEM is to express simultaneously the pattern of a series of inter-related dependent relationships between a set of latent (unobserved) constructs, each measured by one or more manifest (observed) variables. The measured (manifest) variables are collected from test participants through data collection methods; they are represented by the numeric responses to rating scale items on a questionnaire. In contrast, latent (unobserved) variables are not directly observed. Examples of latent constructs are usability, customer satisfaction, or quality. In fact, latent variables are theoretical/conceptual constructs which can only be calculated as a combination of the observed variables. In SEM, the latent constructs are grouped into exogenous constructs and endogenous constructs. In structural equation models, one or more linear regression equations explain the dependence between the endogenous and exogenous constructs.

Table 3
UseLearn checklist questions along with their abbreviated symbols.

Error Prevention
  Error Prevention 1: Can multiple but similar tasks be done easily?
  Error Prevention 2: Can the user easily undo selections, actions, errors in arrangement or management of items?
  Error Prevention 3: Do error or warning messages prevent possible errors from occurring?

Visibility
  Visibility 1: Are options (buttons/selections) logically grouped and labeled?
  Visibility 2: Is the intended functionality clear for each option or selection?
  Visibility 3: Is course content meaningfully arranged with links from the homepage?

Flexibility
  Flexibility 1: Is the speed of loading course page high enough?
  Flexibility 2: Can users personalize their online learning environment by adding resources, content, learning objects to their own course page?

Course Management
  Course Management 1: Does the course contain important information for the online students and link to support areas?
  Course Management 2: Does the course provide specific resources to support online student learning?
  Course Management 3: Are files easy to upload?
  Course Management 4: Are files easy to download and view?

Interactivity, Feedback and Help
  Interactivity, Feedback and Help 1: Does the course offer multiple opportunities for interaction and communication among students, to instructor, and to content?
  Interactivity, Feedback and Help 2: Is regular feedback about student performance provided in a timely manner?
  Interactivity, Feedback and Help 3: Is the user provided with sufficient information to know where in the system he/she is?

Accessibility
  Accessibility 1: Are screen features adaptable to individual user preferences?
  Accessibility 2: Are accessibility issues addressed throughout the course?
  Accessibility 3: Are alternative pathways to course content and activities available?

Consistency and Functionality
  Consistency & Functionality 1: Is consistent form and style used for various titles and headers?
  Consistency & Functionality 2: Do the activity, icon, button, label, and links provide clear purpose/intent that matches the tasks?
  Consistency & Functionality 3: Does the interface provide adequate "back" button functionality to return to a previous screen?

Assessment Strategy
  Assessment Strategy 1: Does the e-learning system require students to self-assess their readiness for online instruction prior to class?
  Assessment Strategy 2: Are there multiple assessment strategies to measure content knowledge, skills, and performance standards?
  Assessment Strategy 3: Are learning objectives, instructional and assessment strategies closely aligned?

Memorability
  Memorability 1: Is the user offered sufficient FAQ and human support to obtain necessary help?
  Memorability 2: Is cognitive load reduced by providing familiarity of items and action sequences?
  Memorability 3: Is information presented in organized chunks to support learnability and memorability?
  Memorability 4: Is there sufficient visibility so the user does not have to look for things and try to remember them?

Completeness
  Completeness 1: Are meaningful labels and descriptive links used to support recognition?
  Completeness 2: Is the course well organized, easy to navigate, and logical?
  Completeness 3: Can you clearly understand all components and structure?

Aesthetics
  Aesthetics 1: Is there proper use of color or graphics that enhance navigation?
  Aesthetics 2: Are the screens pleasing to look at?

Reducing Redundancy
  Reducing Redundancy 1: Does modifying an action or activity require excessive "redoing" to make a single change?
  Reducing Redundancy 2: Are items visible in multiple places and from multiple paths?
  Reducing Redundancy 3: Are learning objects easily created and reused?
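For readers who wish to work with the checklist programmatically, the following minimal Python sketch encodes the dimension-to-item structure of Table 3. Only the EP and VIS abbreviations appear in the paper itself; the remaining symbols and the dictionary layout are illustrative assumptions.

# Sketch: one possible encoding of the UseLearn dimensions and their item
# symbols from Table 3 (structure only; question wording is omitted here).
# Abbreviations other than EP* and VIS* are illustrative, not from the paper.
USELEARN_DIMENSIONS = {
    "Error Prevention":               ["EP1", "EP2", "EP3"],
    "Visibility":                     ["VIS1", "VIS2", "VIS3"],
    "Flexibility":                    ["FLEX1", "FLEX2"],
    "Course Management":              ["CM1", "CM2", "CM3", "CM4"],
    "Interactivity, Feedback & Help": ["IFH1", "IFH2", "IFH3"],
    "Accessibility":                  ["ACC1", "ACC2", "ACC3"],
    "Consistency & Functionality":    ["CF1", "CF2", "CF3"],
    "Assessment Strategy":            ["AS1", "AS2", "AS3"],
    "Memorability":                   ["MEM1", "MEM2", "MEM3", "MEM4"],
    "Completeness":                   ["COMP1", "COMP2", "COMP3"],
    "Aesthetics":                     ["AES1", "AES2"],
    "Reducing Redundancy":            ["RR1", "RR2", "RR3"],
}

# The high-level items sum to 36, matching the checklist size stated in Section 2.1.
assert sum(len(items) for items in USELEARN_DIMENSIONS.values()) == 36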


Fig. 1. UseLearn assessment model.

The coefficients in these equations are called path coefficients or regression weights. There is a critical difference between factor analysis and SEM. In factor analysis, the manifest variables can load on any factor (construct) and only the number of factors is constrained, whereas in SEM the manifest variables can load only onto particular constructs. Although the main target of SEM is the analysis of unobserved constructs, and specifically the analysis of causal relations between unobserved constructs, SEM can also perform other analyses, such as estimating variances and covariances, testing hypotheses, conventional linear regression, and factor analysis. Therefore, SEM is preferable to conventional statistical methods in the analysis of complex situations. For example, it is useful when several dependent variables must be analyzed simultaneously from the same group of independent variables, and it is especially preferable if one dependent variable also affects another dependent variable. SEM is, moreover, a strong modeling technique for dealing effectively with multi-collinearity (when many variables are highly correlated), a benefit that makes it stronger than factor analysis and multiple regression. Considering all these features, SEM can be utilized to determine the weights of each item of the UseLearn checklist dimensions. For example, the items (indicators) of the visibility dimension are the three UseLearn checklist questions denoted VIS1-VIS3 in Fig. 1. Note that in Fig. 1, rectangular shapes refer to manifest variables, and oval shapes refer to latent ones.

In the UseLearn method, these aggregated high-level dimensions are regarded as the causes of the latent (unobserved) construct usability index. This latent construct in turn affects the low-level dimensions of usability, namely efficiency, effectiveness, and satisfaction. Therefore, with SEM, the usability of eLearning systems can be assessed by considering both the low- and high-level dimensions of usability in a causal fashion. To improve usability, the high-level dimensions would need to be changed, acting through the indirect effects in the SEM. Moreover, one might be interested in improving a specific low-level dimension of usability (e.g., improving usability by decreasing the error rates of the end-users). This granularity is also provided by the UseLearn method through the indirect effects of the UseLearn checklist questions on the low-level dimensions of usability. Hence, the most and the least critical dimensions of usability can easily be determined for further improvement.

2.3. UseLearn data analysis steps

To find the most influential usability problems in an eLearning system and remove them in order to improve the overall system usability, the relationships among the checklist items and the usability index must be revealed. The importance levels of these influential items can then be ranked. To achieve this, the UseLearn data analysis is conducted as depicted in Fig. 2.
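As a concrete illustration of the assessment model in Fig. 1, the sketch below writes the measurement and structural parts in lavaan-style syntax, assuming the Python package semopy and a pandas DataFrame of checklist responses. The file name, column names, and the truncated list of dimensions are illustrative, and the paper itself used Lisrel rather than this tooling.

# Sketch only: a lavaan-style specification of the Fig. 1 model for semopy.
# Column names (EP1 ... Satisfaction) and the data file are hypothetical.
import pandas as pd
import semopy

model_desc = """
# Measurement part: high-level dimensions measured by their checklist items
ErrorPrevention =~ EP1 + EP2 + EP3
Visibility      =~ VIS1 + VIS2 + VIS3
Flexibility     =~ FLEX1 + FLEX2
# ... the remaining nine dimensions are specified the same way ...

# Usability index reflected in the observed low-level dimensions
UsabilityIndex =~ Efficiency + Effectiveness + Satisfaction

# Structural part: high-level dimensions drive the usability index (cf. Eq. (2))
UsabilityIndex ~ ErrorPrevention + Visibility + Flexibility
"""

responses = pd.read_csv("uselearn_responses.csv")   # hypothetical data file
model = semopy.Model(model_desc)
model.fit(responses)          # fit with the default (ML-based) objective
print(model.inspect())        # loadings, path coefficients, and error terms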


Fig. 2. UseLearn data analysis steps.

Fig. 3. Planning the experiment in Moodle.


Table 4
Descriptive statistics for UseLearn items.

Item                                   Mean   Std. Dev.
Course Management 3                    2.97   2.30
Course Management 4                    3.41   2.20
Course Management 1                    3.47   2.32
Course Management 2                    3.52   2.21
Error Prevention 3                     3.52   2.22
Interactivity, Feedback and Help 3     3.58   2.06
Consistency and Functionality 1        3.58   2.10
Aesthetics 1                           3.60   2.22
Flexibility 2                          3.64   2.19
Flexibility 1                          3.70   2.14
Interactivity, Feedback and Help 2     3.70   2.12
Accessibility 2                        3.72   2.19
Interactivity, Feedback and Help 1     3.73   2.10
Accessibility 1                        3.73   2.04
Memorability 2                         3.75   2.03
Aesthetics 2                           3.77   1.97
Reducing Redundancy 3                  3.80   2.03
Error Prevention 2                     3.83   2.15
Memorability 4                         3.85   1.99
Completeness 3                         3.92   2.04
Memorability 3                         3.92   1.87
Completeness 2                         3.94   1.96
Accessibility 3                        3.94   1.95
Error Prevention 1                     3.94   2.05
Consistency and Functionality 3        3.99   1.84
Reducing Redundancy 1                  3.99   2.02
Consistency and Functionality 2        4.02   1.97
Reducing Redundancy 2                  4.11   1.90
Assessment Strategy 2                  4.15   1.80
Completeness 1                         4.16   1.83
Assessment Strategy 1                  4.18   1.82
Visibility 3                           4.19   1.80
Visibility 1                           4.32   1.70
Assessment Strategy 3                  4.35   1.84
Memorability 1                         4.38   1.79
Visibility 2                           4.40   1.72
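A minimal sketch of the mean-score step that produces Table 4: assuming the responses sit in a pandas DataFrame with one column per checklist item on the 1-5 Likert scale, the two table columns follow directly. The file and column names are illustrative.

# Sketch: computing the Table 4 descriptive statistics from raw checklist
# responses; the CSV file and the low-level column names are hypothetical.
import pandas as pd

responses = pd.read_csv("uselearn_responses.csv")
low_level = ("Efficiency", "Effectiveness", "Satisfaction")
item_cols = [c for c in responses.columns if c not in low_level]

descriptives = pd.DataFrame({
    "Mean": responses[item_cols].mean(),
    "Std. Dev.": responses[item_cols].std(),
}).sort_values("Mean")     # Table 4 lists items from the lowest mean upwards

print(descriptives.round(2))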

First, data should be collected from the end-users through the UseLearn checklist by asking them to use the eLearning system under analysis. Second, the mean score for each checklist item should be calculated from the Likert-type evaluations of the end-users. Exploratory and confirmatory factor analyses (EFA and CFA) are then employed to validate the relations amongst the checklist items and dimensions. The causal relations, and hence the indirect effects of the checklist items on the usability index and on the low-level dimensions of usability, are determined by SEM. Combining the inverse of the mean scores with the indirect effects, the criticality metric for each checklist item is computed as a next step to determine the importance level of each usability problem. As a last step, usability problems are ranked according to their criticality metric, so that further improvement strategies for the eLearning system can be developed. The following steps describe EFA, CFA, path modeling, and the criticality metric calculation in detail.

Step 1: Exploratory Factor Analysis (EFA)

At step 1, the high-level dimensions of the usability index are determined/extracted by exploratory factor analysis (EFA) with principal axis factoring. Because of potential conceptual and statistical overlap, the aim is to produce a parsimonious set of distinct, non-overlapping variables from the full set of items underlying each construct (Nunnally, 1978).

Step 2: Confirmatory Factor Analysis (CFA)

At step 2, the measurement models extracted at step 1 for each construct are tested using confirmatory factor analysis (CFA) in order to determine the overall fit to the data.

The CFA technique is based on comparing the variance-covariance matrix obtained from the sample to the one obtained from the model. The goodness-of-fit of an SEM model is usually assessed by a chi-square test and goodness-of-fit indices such as the standardized root mean square residual (SRMR) and the comparative fit index (CFI) (Bentler, 1995). The following fit index cut-off values for good models are used as a rule of thumb: CFI ≥ 0.90 in combination with SRMR ≤ 0.08, which can retain acceptable proportions of simple and complex true-population models and reject reasonable proportions of various types of misspecified models (Hu and Bentler, 1999). Considering Fig. 1, a CFA relationship for checklist item EP1 can be constructed as in Eq. (1),

EP1 = λx11 · (Error Prevention) + δ1    (1)

where λx refers to the relationship effect between the latent variable (error prevention) and its indicator (EP1), and δ refers to the measurement error.

Step 3: Path Modeling

At step 3, the usability index construct is tested using the method of maximum likelihood estimation. Lisrel 8.8® software (Joreskog and Sorbom, 2008) is used to test the causal relationships in the model depicted in Fig. 1. The usability index is a latent variable that represents the criterion for improving usability performance. It is affected by the high-level dimensions of eLearning usability, and can be analyzed as in Eq. (2):

η1 = γ11·ξ1 + γ12·ξ2 + … + γ1(12)·ξ12 + ζ1    (2)

where η1 refers to the usability index, the ξ values refer to the 12 high-level dimensions of usability shown in Fig. 1 (i.e., error prevention, visibility, flexibility, etc.), the γ values are their path coefficients, and ζ1 is the error term. The usability index in turn affects its low-level dimensions, and these relations can be formulated as in Eqs. (3)-(5):

y1 = λy11·η1 + ε1    (3)

y2 = λy21·η1 + ε2    (4)

y3 = λy31·η1 + ε3    (5)

where y1, y2, and y3 refer to the low-level dimensions of the usability index, namely efficiency, effectiveness, and satisfaction, the λy's are the corresponding path coefficients, and the ε's are the random errors.

Step 4: Criticality Metric Analysis

Usability analysts usually do not have enough time to confront all of the identified usability problems in order to improve the overall eLearning system usability. Therefore, they would like to know which item(s) is/are the most critical for improving the usability index. However, the general tendency, as summarized in Section 1, shows that most usability evaluation methods adopt a straightforward approach: they consider only the most problematic items, i.e., those with the smallest mean values in the end-users' Likert-type checklist evaluations. This approach misses one important question: is it really worth tackling that/those item(s)? In other words, although an item has the smallest mean value in the survey-based dataset, would changing it actually make a significant effect on the usability index?


Table 5
Exploratory factor analysis of UseLearn checklist dimensions (loading of each item on its dimension's factor).

Error Prevention: Error Prevention 1 = 0.70; Error Prevention 2 = 0.69; Error Prevention 3 = 0.68
Visibility: Visibility 1 = 0.80; Visibility 2 = 0.80; Visibility 3 = 0.61
Flexibility: Flexibility 1 = 0.91; Flexibility 2 = 0.71
Course Management: Course Management 1 = 0.87; Course Management 2 = 0.78; Course Management 3 = 0.70; Course Management 4 = 0.62
Interactivity, Feedback & Help: Interactivity, Feedback & Help 1 = 0.76; Interactivity, Feedback & Help 2 = 0.68; Interactivity, Feedback & Help 3 = 0.52
Accessibility: Accessibility 1 = 0.90; Accessibility 2 = 0.67; Accessibility 3 = 0.66
Consistency & Functionality: Consistency & Functionality 1 = 0.72; Consistency & Functionality 2 = 0.60; Consistency & Functionality 3 = 0.60
Assessment Strategy: Assessment Strategy 1 = 0.59; Assessment Strategy 2 = 0.56; Assessment Strategy 3 = 0.46
Memorability: Memorability 1 = 0.76; Memorability 2 = 0.75; Memorability 3 = 0.69; Memorability 4 = 0.51
Completeness: Completeness 1 = 0.62; Completeness 2 = 0.59; Completeness 3 = 0.45
Aesthetics: Aesthetics 1 = 0.73; Aesthetics 2 = 0.61
Reducing Redundancy: Reducing Redundancy 1 = 0.55; Reducing Redundancy 2 = 0.51; Reducing Redundancy 3 = 0.51
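The following sketch shows how Step 1 could be reproduced in Python. The paper extracts twelve factors with principal axis factoring; scikit-learn's FactorAnalysis is used here only as a stand-in (its varimax rotation requires scikit-learn 0.24 or later), so the loadings will not match Table 5 exactly. File and column names are illustrative.

# Sketch: exploratory factor analysis over the 36 checklist items.
# FactorAnalysis is a stand-in for the principal axis factoring used in the paper.
import pandas as pd
from sklearn.decomposition import FactorAnalysis

responses = pd.read_csv("uselearn_responses.csv")      # hypothetical data file
low_level = ("Efficiency", "Effectiveness", "Satisfaction")
items = responses[[c for c in responses.columns if c not in low_level]]

fa = FactorAnalysis(n_components=12, rotation="varimax", random_state=0)
fa.fit(items)

# components_ has shape (n_factors, n_items); transpose to get item loadings
loadings = pd.DataFrame(fa.components_.T, index=items.columns,
                        columns=[f"Factor {i + 1}" for i in range(12)])
print(loadings.round(2))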

Therefore, the UseLearn method proposes to take into account both the mean value of each item and the indirect effect of that item on the usability index as calculated by the SEM. This simultaneous consideration helps determine which items are worth focusing on for further improvement. Basically, checklist items with the smallest mean values should be chosen, provided that a unit change in them would make a significant change in the usability index. The focus is thus a combined metric which considers the smallest checklist score together with the biggest impact. To make these two contradictory measures comparable, we take the inverse of the mean value of each checklist item based on the end-user checklist evaluations. Multiplying this inverse mean value by the item's indirect effect on the usability index, the target is to select the biggest product, which we name the criticality metric, as given in Eq. (6):

Criticality Metric = Indirect Effect × (1 / Average of Checklist Evaluations)    (6)

To exemplify, the indirect effect of EP1 on the usability index can be calculated as λx11·γ11 using Eqs. (1) and (2). The criticality metric for each checklist item is then calculated by multiplying the indirect effect term (namely λx·γ) by the inverse of the average checklist-based evaluation score. The higher the indirect effect of a particular checklist item on the usability index, the higher the criticality metric. Also, the lower the checklist-based evaluation scores for the same item, again the higher the criticality metric. Hence, both sources of information are captured in the criticality metric. Note that, if desired, more granularity can be achieved by analyzing the effect of the checklist items on the low-level dimensions of the usability index. For example, using Eq. (6) one might explore the effect of EP1 on a low-level dimension (e.g., efficiency). This can easily be achieved by extending the indirect effect of EP1 on the usability index and then on efficiency by means of Eq. (3). The indirect effect can then be rewritten as λx11·γ11·λy11.
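A minimal sketch of the Step 4 computation: given the item mean scores (Table 4) and each item's indirect effect on the usability index estimated by the SEM, Eq. (6) and the resulting ranking can be obtained as below. The loadings and means are taken from Tables 6 and 4 for the error prevention items, while the path coefficient γ is a placeholder, since it is not reported in this excerpt.

# Sketch: criticality metric of Eq. (6) and ranking of usability problems.
# Indirect effect of an item on the usability index = lambda_x * gamma; extend
# with lambda_y to target a specific low-level dimension, as described above.
item_means = {"EP1": 3.94, "EP2": 3.83, "EP3": 3.52}   # means from Table 4
lambda_x   = {"EP1": 0.79, "EP2": 0.81, "EP3": 0.76}   # loadings from Table 6
gamma_error_prevention = 0.50                          # placeholder path coefficient

criticality = {
    item: lambda_x[item] * gamma_error_prevention * (1.0 / item_means[item])
    for item in item_means
}

# Higher criticality = low end-user score combined with a large impact
for item, value in sorted(criticality.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item}: criticality = {value:.3f}")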


Table 6
Confirmatory factor analysis of UseLearn checklist dimensions (standardized regression weight, t-value, and SMC per item; CR and AVE per dimension).

Error Prevention (CR = 0.83, AVE = 0.62)
  Error Prevention 1: 0.79 (t = 9.30*), SMC = 0.63
  Error Prevention 2: 0.81 (t = 9.61*), SMC = 0.65
  Error Prevention 3: 0.76 (t = 8.81*), SMC = 0.58

Visibility (CR = 0.81, AVE = 0.59)
  Visibility 1: 0.86 (t = 9.88*), SMC = 0.74
  Visibility 2: 0.78 (t = 8.72*), SMC = 0.61
  Visibility 3: 0.64 (t = 6.81*), SMC = 0.41

Flexibility (CR = 0.73, AVE = 0.59)
  Flexibility 1: 0.55 (t = 5.20*), SMC = 0.30
  Flexibility 2: 0.94 (t = 7.84*), SMC = 0.88

Course Management (CR = 0.89, AVE = 0.66)
  Course Management 1: 0.82 (t = 9.91*), SMC = 0.67
  Course Management 2: 0.82 (t = 9.92*), SMC = 0.67
  Course Management 3: 0.83 (t = 10.06*), SMC = 0.68
  Course Management 4: 0.79 (t = 9.46*), SMC = 0.63

Interactivity, Feedback & Help (CR = 0.74, AVE = 0.49)
  Interactivity, Feedback & Help 1: 0.79 (t = 8.81*), SMC = 0.62
  Interactivity, Feedback & Help 2: 0.68 (t = 7.35*), SMC = 0.46
  Interactivity, Feedback & Help 3: 0.63 (t = 6.71*), SMC = 0.40

Accessibility (CR = 0.79, AVE = 0.56)
  Accessibility 1: 0.71 (t = 7.71*), SMC = 0.50
  Accessibility 2: 0.74 (t = 8.15*), SMC = 0.55
  Accessibility 3: 0.80 (t = 9.02*), SMC = 0.64

Consistency & Functionality (CR = 0.69, AVE = 0.43)
  Consistency & Functionality 1: 0.61 (t = 6.41*), SMC = 0.37
  Consistency & Functionality 2: 0.67 (t = 7.12*), SMC = 0.44
  Consistency & Functionality 3: 0.69 (t = 7.42*), SMC = 0.48

Assessment Strategy (CR = 0.61, AVE = 0.35)
  Assessment Strategy 1: 0.38 (t = 3.55*), SMC = 0.14
  Assessment Strategy 2: 0.70 (t = 7.05*), SMC = 0.49
  Assessment Strategy 3: 0.65 (t = 6.52*), SMC = 0.42

Memorability (CR = 0.80, AVE = 0.51)
  Memorability 1: 0.73 (t = 8.19*), SMC = 0.53
  Memorability 2: 0.75 (t = 8.50*), SMC = 0.56
  Memorability 3: 0.78 (t = 8.98*), SMC = 0.61
  Memorability 4: 0.57 (t = 5.95*), SMC = 0.32

Completeness (CR = 0.74, AVE = 0.49)
  Completeness 1: 0.72 (t = 7.98*), SMC = 0.51
  Completeness 2: 0.63 (t = 6.78*), SMC = 0.40
  Completeness 3: 0.74 (t = 8.36*), SMC = 0.55

Aesthetics (CR = 0.69, AVE = 0.52)
  Aesthetics 1: 0.73 (t = 8.03*), SMC = 0.53
  Aesthetics 2: 0.71 (t = 7.81*), SMC = 0.51

Reducing Redundancy (CR = 0.66, AVE = 0.38)
  Reducing Redundancy 1: 0.73 (t = 7.73*), SMC = 0.53
  Reducing Redundancy 2: 0.59 (t = 6.12*), SMC = 0.35
  Reducing Redundancy 3: 0.52 (t = 5.31*), SMC = 0.27

*All values are significant at the 0.05 level.
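The CR, AVE, and SMC columns of Table 6 follow from the standardized loadings via the usual composite-reliability and average-variance-extracted formulas. The sketch below recomputes the error prevention row as a check; small differences from the table arise only from rounding of the reported loadings.

# Sketch: composite reliability (CR), average variance extracted (AVE), and
# squared multiple correlations (SMC) from standardized CFA loadings.
def cr_ave_smc(loadings):
    smc = [l ** 2 for l in loadings]                    # SMC = lambda squared
    ave = sum(smc) / len(smc)                           # mean of squared loadings
    cr = sum(loadings) ** 2 / (sum(loadings) ** 2 + sum(1 - s for s in smc))
    return cr, ave, smc

# Error prevention loadings from Table 6: 0.79, 0.81, 0.76
cr, ave, smc = cr_ave_smc([0.79, 0.81, 0.76])
print(round(cr, 2), round(ave, 2), [round(s, 2) for s in smc])
# prints 0.83 0.62 [0.62, 0.66, 0.58]; Table 6 reports 0.83, 0.62 and 0.63, 0.65, 0.58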

3. Application of UseLearn method through an eLearning system in cell biology

3.1. Planning and preparing the experiment

In this study, an eLearning biology course was examined to illustrate the UseLearn method. A cell biology course was selected as the representative eLearning course because a qualitative course is easier to understand with the help of an eLearning system than a quantitative one. We also wanted to make broad use of graphics and figures in the eLearning system. The course best fitting these requirements was considered to be a biology course.

Table 7
Structural equation model overall fit indices.

Overall model fit indices                               Reasonable fit criteria   Model results
Degrees of freedom (df)                                 -
Chi-square statistic                                    -
Chi-square statistic/df                                 2-5
Comparative fit index (CFI)                             >0.90
Non-normed fit index or Tucker-Lewis index (NNFI-TLI)   >0.90
Incremental fit index (IFI)                             >0.90
Parsimony goodness of fit index (PGFI)                  >0.50
Root mean square error of approximation (RMSEA)
Standardized root mean residual (SRMR)
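To make the cut-offs operational, a small helper such as the one below (called here with made-up index values, not results from the paper) can flag which of the Table 7 criteria and the Hu and Bentler (1999) rule from Section 2.3 are met.

# Sketch: checking SEM fit indices against the cut-offs listed in Table 7 and
# the CFI/SRMR rule of thumb cited in Section 2.3 (Hu and Bentler, 1999).
def check_fit(chi2, df, cfi, nnfi, ifi, pgfi, srmr):
    return {
        "chi-square/df between 2 and 5": 2 <= chi2 / df <= 5,
        "CFI > 0.90": cfi > 0.90,
        "NNFI-TLI > 0.90": nnfi > 0.90,
        "IFI > 0.90": ifi > 0.90,
        "PGFI > 0.50": pgfi > 0.50,
        "CFI >= 0.90 with SRMR <= 0.08": cfi >= 0.90 and srmr <= 0.08,
    }

# Example with made-up values, for illustration only
for criterion, ok in check_fit(chi2=1500.0, df=600, cfi=0.92, nnfi=0.91,
                               ifi=0.92, pgfi=0.55, srmr=0.07).items():
    print(f"{criterion}: {'met' if ok else 'not met'}")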