ERGONOMICS, 2003, VOL. 46, NO. 1–3, 220–241
The development of a theoretical framework and design tool for process usability assessment

RICHARD J. KOUBEK*, DAREL BENYSH, MICHELLE BUCK, CRAIG M. HARVEY and MIKE REYNOLDS

Pennsylvania State University, 310 Leonhard Building, University Park, PA 16802, USA; IBM Corporation, Highway 52 and NW 37 Street, Rochester, MN 55901, USA; Wright State University, 3640 Colonel Glenn Hwy, 207 Russ Engineering Center, Dayton, OH 45435, USA; Louisiana State University, 3128 CEBA, Baton Rouge, LA 70803, USA

Keywords: Usability, predictive performance models, human–computer interaction design.

The purpose of usability engineering is to facilitate the deployment of new products by decreasing development costs and improving the quality of systems. This paper discusses the development and delivery of a unique, theoretically based software tool that provides engineers and designers with easy access to the most recent advances in human–machine interface design. This research combines several theoretical views of the interaction process into a hybrid model. Based on this model, a software tool was produced that allows engineers to model the human interaction process within their design. The system then provides feedback on the interaction process through items such as: the amount of mental effort required by a user, the degree to which the system conforms to human capabilities, the expected time to complete the interaction, where potential human error may occur, and potential misunderstandings or points of confusion for users. The designer may then use this information to improve the design of the system. Validation of this technique indicates that the hybrid model produces accurate predictions of usability attributes and that the technique transfers from the laboratory to the real world.
*Author for correspondence; e-mail: [email protected]

Ergonomics ISSN 0014-0139 print / ISSN 1366-5847 online © 2003 Taylor & Francis Ltd
http://www.tandf.co.uk/journals
DOI: 10.1080/0014013021000035244

1. Introduction

A review of the scientific literature reveals that usability research has primarily focused on the static elements of the human–machine interface, such as keyboard design, workstation layout, and graphical displays. However, very little is available to help engineers design the interaction process. Designing this process (the sequence of decisions and actions users go through to complete a task with a machine) is often complex, requires highly skilled personnel, and is time-consuming. The National Science Foundation (NSF) report entitled New Directions in HCI Education, Research, and Practice examined the implementation of technology in the workplace and consumer environment and noted that `...one of the limiting factors
on the use of new technology is no longer the cost of production and distribution, but the cost of learning and using the technology' (Strong 1994). Further, the solution to this situation focuses on the need to improve the design process, as NSF points out that `...designers need to develop and provide proper tools and processes.' NSF calls for a two-fold approach that addresses the development of (1) theoretical models for understanding interaction and (2) practical tools for engineers and designers. Likewise, the recent President's Information Technology Advisory Committee (PITAC) report expanded on these concerns by advocating that `...major improvements must be made to methods for software development, verification and validation, maintenance, user interfaces to computing systems and electronically represented information...' (PITAC 1999). To address these issues, one must look at the nature of the usability problem. Keenan et al. (1999) suggest that the usability problem in dealing with human–computer interfaces can be broken into two components: artifact and task. The artifact component deals with the static interface of a program, whereas the task component deals with the dynamic aspects of a program, that is, the interaction process with the user. This research focuses on the task component of the usability problem. This paper discusses the development of a hybrid model for representing the human interaction process and the subsequent design of a theoretically based software tool that gives engineers easy access to the most recent advances in human–machine interface design (Benysh 1997). This framework and tool can ultimately improve industrial competitiveness by decreasing development costs and improving human interaction with the complex systems found in today's manufacturing, consumer, and service environments (Dillon 2001, Hedge 2001).
In addition, it specifically addresses the concerns of both NSF and PITAC in pursuing the goal of improving the efficiency and effectiveness of system design.

2. Human–environment interaction model

In the process of design and evaluation of usability, four key components of a general human–environment interaction (HEI) model must be addressed: the environment, the human, the tool, and the task. The HEI model shown in figure 1 illustrates the interactions between the model components. The model highlights a set of important aspects to consider in the development of a framework and
Figure 1. A general human–environment interaction model.
methodology for process usability evaluation. The environment refers to the setting in which the human uses the tool to complete a task, including elements such as lighting and temperature. Important aspects of the human component of the HEI model are processing methods, capabilities, knowledge of the task, and knowledge of how to use the tool. Methods depicting human processing follow closely the various versions of human information processing models, containing cognitive, perceptual, motor, and memory resources. The tool component contains elements such as uses, functions, functioning (including how to use the tool as well as how it performs the work), capabilities, usage risks, error handling, and control and display arrangements. Finally, important aspects of the task portion of the HEI are the ultimate goal and particular task attributes. The task goal specifies the purpose of the human's interaction with the environment in the first place.

2.1. Modelling the HEI

Current theories and their derived models can be placed into two classes: applied cognitive models and procedural knowledge structure models. Experts in the modelling, usability, or design evaluation fields typically develop applied cognitive models or interface evaluation models (e.g. Vicente 1999). These techniques include a number of modelling languages and their associated methodologies for model development. Furthermore, these techniques can be classified as models of external tasks, user knowledge, user performance, or task knowledge (De Haan et al. 1993). Table 1 summarizes these different techniques with respect to usage and provides examples. In particular, user performance models such as GOMS (Goals, Operators, Methods, and Selection rules), NGOMSL (Natural GOMS Language), and CCT (Cognitive Complexity Theory) provide useful insight into some of the goals proposed
Table 1. Examples of applied cognitive models.

Modelling type: External tasks
  Uses: Specification language for how to translate a task into commands in a given environment.
  Examples: External Internal Task Mapping (ETIT)
  References: Moran (1983)

Modelling type: User knowledge
  Uses: Represent and analyse the knowledge required to translate goals into actions in a given environment.
  Examples: Action Language; Task-Action Grammar (TAG)
  References: Reisner (1983); Payne and Green (1986)

Modelling type: User performance
  Uses: Describe, analyse, and predict user behaviour and performance. Similar to user knowledge models, except that they provide quantitative performance measures.
  Examples: Goals, Operators, Methods, and Selection rules (GOMS); Natural GOMS Language (NGOMSL); Cognitive Complexity Theory (CCT)
  References: Card et al. (1983); Kieras (1988); Bovair et al. (1990)

Modelling type: Task knowledge
  Uses: Provide a specification for the full representation of the system interface and task at all levels of abstraction.
  Examples: Command Language Grammar (CLG); Extended Task-Action Grammar (ETAG)
  References: Moran (1981); Tauber (1990)
by NSF. These cognitive models provide quantitative measures or predictions of user performance, including items such as execution time, learning time, and working memory load. Furthermore, these GOMS-type models are implemented in a fairly generic language, have been successfully automated (Byrne et al. 1994, Kieras 1997), and have been applied to areas other than HCI (Gray et al. 1993, Koubek et al. 1994). While these GOMS-type models represent alternative procedures to accomplish the task and provide a complete task goal hierarchy, they only address one level of abstraction at a time. Benefits of GOMS-type models include the emphasis on user performance prediction and a formalized grammar for describing user tasks. Criticisms of the GOMS-type modelling techniques include their difficulty in coping with errors, their restricted application to tasks involving little or no problem solving, and their reliance on quantitative aspects of representing knowledge at the expense of qualitative aspects (De Haan et al. 1993).

The second class of models includes the traditional declarative knowledge structure models that allow designers to organize task knowledge, together with task-knowledge applied cognitive models such as Command Language Grammar (CLG) and Extended Task-Action Grammar (ETAG). One such model, the Procedural Knowledge Structure Model (PKSM), contains a significant amount of structure, as is found in human cognitive representations. Unlike CLG and ETAG, it also represents the procedural knowledge used in task execution found in the other applied cognitive models. Finally, it demonstrates procedures, automation, and modularity capabilities. The primary contribution is the model's focus on the structural aspects of knowledge while simultaneously incorporating important procedural aspects of the cognitive models (Benysh and Koubek 1993). PKSM represents a task as a three-dimensional pyramid similar to that shown in figure 2.
The model is sectioned into levels, each containing a flowchart representation of the task steps. Objects in the flowchart may be task goals (which can be further decomposed), basic task actions (which can be performed), or decision nodes (which control the flow through the chart). In figure 2, task goals are represented by rectangles. These goals are subdivided into smaller goals, task elements, or decision nodes at the next lower level, much like the GOMS-type models. In the same figure, basic task actions are represented by ellipses and decision nodes by diamonds. Task actions and decision nodes, once encountered, perpetuate through to lower levels, since they cannot be further decomposed (Benysh and Koubek 1993). Where traditional declarative knowledge structure models are usually descriptive of actual elicited human knowledge, the PKSM is more readily capable of representing `prescriptive' knowledge. This capability is similar to the hierarchical task decomposition of CLG and ETAG, except that it provides a more flexible representation scheme with more functionality. It represents this prescriptive knowledge by modelling the steps, sequences, decisions, and levels of detail required in a process or task. Furthermore, as a structural model, it can be subjected to traditional (declarative) knowledge structure analyses, such as level of abstraction and multiple relations (Goldsmith et al. 1991, Benysh and Koubek 1993). Such analysis, in turn, can reveal aspects of user behaviour in complex task domains (Goldsmith and Kraiger 1997). Additionally, parameters corresponding to psychological principles associated with skill and performance can be defined. Thus, PKSM is capable of distinguishing between the task performance of experts and novices as well as assessing the usability of the task in general.
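The level-by-level decomposition described above can be sketched as a small class hierarchy: goal nodes decompose into subgoals, actions, and decisions at the next lower level, while action and decision nodes perpetuate unchanged because they cannot be decomposed. All node names and the task fragment below are hypothetical illustrations, not the tool's actual internal structures.

```python
from dataclasses import dataclass, field
from typing import List

# Minimal sketch of a PKSM-style task pyramid, assuming three node kinds:
# goals (decomposable) and actions/decisions (atomic, perpetuated downward).

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

    def decomposable(self) -> bool:
        return False

@dataclass
class Goal(Node):
    def decomposable(self) -> bool:
        return True

@dataclass
class Action(Node):
    pass

@dataclass
class Decision(Node):
    pass

def level(root: Node, depth: int) -> List[Node]:
    """Flowchart at one pyramid level: expand goals down to `depth`,
    while actions and decisions perpetuate unchanged."""
    if depth == 0 or not root.decomposable():
        return [root]
    out: List[Node] = []
    for child in root.children:
        out.extend(level(child, depth - 1))
    return out

# Hypothetical fragment of a stamp-purchase task:
task = Goal("Buy stamps", [
    Goal("Select product", [Action("Scan menu"), Decision("Stamps listed?"),
                            Action("Press 'stamps'")]),
    Goal("Pay", [Action("Insert card"), Action("Confirm amount")]),
])

print([n.label for n in level(task, 1)])   # top-level flowchart: subgoals only
print([n.label for n in level(task, 2)])   # fully decomposed level
```

Each slice of the pyramid is thus a flowchart, and the same atomic actions reappear in every slice below the level where they first occur, mirroring the perpetuation rule in the text.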
Figure 2. Pyramid-type structure of PKSM.
3. Current design tools

Many fields of engineering have design tools that promote cooperative modelling between the human and the computer. A brief review reveals that while these tools have similar goals of improving user interaction, none performs the modelling needed to represent the user interaction process. The five tools this paper examines are User Interface Management Systems, Guideline Management Systems, Simulation Languages, Standardized Knowledge Representations, and Predictive Performance Models.

User Interface Management Systems (UIMS) are available in each of the interaction paradigms (manual, critic-based, constraint-based, automatic, cooperative) and can provide a programming environment that facilitates the design of the software interface (Lewis and Rieman 1994). Unfortunately, current progress focuses primarily on the static appearance of the HCI interface and does not address the dynamic interaction or procedural aspects. Examples of UIMS include MIKE (the Menu Interaction Kontrol Environment) (Olsen 1986), UIDE (User Interface Design Environment) (Sukaviriya et al. 1993), HUMANOID (Szekely et al. 1993), and ITS (Wiecha et al. 1990).

Guideline Management Systems facilitate the designer's access to large volumes of work containing HCI design standards, rules, and guidelines. These guidelines specify in exact detail how every feature of the software interface should appear. For example, the Guideline for Usability through Interface Development Experience (GUIDE) provides a comprehensive set of guidelines for various interface problems (Henninger 2000). This facilitates easy access to information necessary for producing proper interface designs. However, they typically provide
little assistance in the actual design of the interface and focus almost exclusively on the static appearance of the human–computer interface.

Simulation packages provide an environment, language, and structure for modelling user interaction. They are useful in the design process and for off-line experimentation with existing systems, as well as for hypothesis testing. Simulation tools require training to use correctly and, more importantly, to produce valid results. The most noteworthy are SAINT (System Analysis of Integrated Networks of Tasks) and its smaller microcomputer version, Micro-SAINT. Human operator simulation tools are beginning to incorporate known principles of human cognition and the hierarchical nature of procedures.

Knowledge representation, modification, and transmission languages typify the standardized knowledge representation approach. Most projects in this area are directed towards the development of standardized knowledge encoding methods and transfer protocols to enable the sharing and reuse of knowledge bases (Genesereth and Fikes 1992). The primary focus of these projects is to develop a standard to enable the sharing of very diverse knowledge bases (Finin et al. 1992).

Some work has begun in the area of Predictive Performance Models, similar to the focus of this paper. Hudson et al. (1999) describe the development of the CRITIQUE usability evaluation tool, which builds a GOMS model by capturing the user's actions as they interact with the user interface. CRITIQUE then automatically generates a GOMS Keystroke-Level Model (KLM), eliminating the need for KLMs to be constructed by hand.

4. User-Environment Modelling Language (UEML)

Applied cognitive models perform a number of functions. First, the task knowledge models, such as the Extended Task-Action Grammar (ETAG), provide formalisms for complete task specification from the semantic to the elemental motion level.
The GOMS-type models, on the other hand, demonstrate superior modelling of control structures and provide empirical, quantitative performance predictions of usability. Finally, the knowledge structure models provide a simpler, more intuitive, structural representation that can be subjected to quantitative and qualitative structural analysis. While each type of model captures a single element of the user interaction process, none completely represents the user's interaction. Thus, one should consider what a usability framework and subsequent tool must provide designers. Two elements, modelling dynamic interaction and modelling user knowledge, are proposed for the UEML framework.

First, the resulting framework must allow one to evaluate the dynamic interaction between the human and the machine interface. The machine may be a computer, vending machine, cellular phone, manufacturing equipment, car stereo, etc., so a framework adaptable to nearly any domain is useful. In general, the task domain is limited to discrete control tasks. Continuous control tasks that require constant monitoring of states, and supervisory control tasks, are not considered in the present case. Instead, a typical discrete control situation involves performing part of the task directed towards a goal, assessing the resulting state, and deciding on the next portion of the task to be attempted. Furthermore, in terms of Rasmussen's (1986) taxonomy, the domain encompasses a wide assortment of skill-based and rule-based tasks, but excludes knowledge-based tasks. As a result, the framework should have the capability to model tasks, including required actions, cognitive steps, decisions, perceptual inputs, and motor outputs.
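The act/assess/decide cycle of a discrete control task described above can be sketched as a short loop: perform part of the task, observe the machine's resulting state, then decide the next portion. The two-step vending transition table and state names below are hypothetical, chosen only to make the loop concrete.

```python
# Sketch of a discrete control task cycle: act, assess the resulting
# state, decide the next portion of the task. Machine model is hypothetical.

def run_discrete_task(goal_state, initial_state, choose_action, apply_action):
    """Act/assess/decide loop: repeat until the goal state is reached."""
    state, steps = initial_state, []
    while state != goal_state:
        action = choose_action(state)        # decide next portion of the task
        state = apply_action(state, action)  # perform it; machine responds
        steps.append(action)                 # record the traversed path
    return steps

# Hypothetical machine: state transitions for a simple stamp purchase.
TRANSITIONS = {
    ("start", "press stamps"): "menu",
    ("menu", "insert card"): "paid",
    ("paid", "take stamps"): "done",
}
# Hypothetical user rule-set: one action per recognized state.
POLICY = {"start": "press stamps", "menu": "insert card", "paid": "take stamps"}

steps = run_discrete_task("done", "start",
                          POLICY.get,
                          lambda s, a: TRANSITIONS[(s, a)])
print(steps)
```

Continuous monitoring or supervisory control would not fit this loop, which is exactly the domain restriction stated above.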
Second, the framework should represent the knowledge needed to work with the system being designed, including domain knowledge, knowledge of the activities required to achieve goals, knowledge of how the tool's interface works, and knowledge of how the system works in terms of internal actions invisible to the user. To meet both of these objectives, a framework called the User-Environment Modelling Language (UEML) was developed. This framework is centred on the object-oriented class hierarchy displayed in figure 3. This organization defines the class-based inheritance structure, in which objects (nodes or items in the process model) have certain predefined characteristics while other features are inherited from parent objects. These characteristics include the objects' attributes, their functioning, and how they manage linkages to other objects.

In order for the unified framework (UEML) to model the user's knowledge structure and interactions, a hybrid model that combines Knowledge Structure (KS) and Cognitive Modelling methodologies is needed. Since the focus was on modelling tasks and procedures, the Procedural Knowledge Structure Model (PKSM) was selected as the initial model to represent the structural aspects of task knowledge as well as the procedures inherent in the interface. Since the core of PKSM is the structure of procedural knowledge, not the actual procedural elements, its syntax is fairly generic and can be applied to a wide variety of task domains. Consequently, the PKSM structural model can be used as an alternative representation of other models of cognition and behaviour. Although PKSM can serve as an alternative representation for either theoretic or applied cognitive models, it lacks formal definitions of how to specify task goals and levels. Applied cognitive models are better able to express the activities that occur, perceptual stimuli, cognitive steps, or motor responses, as well as provide a few quantitative measures of usability.
Therefore, to operationalize the UEML framework, NGOMSL was used. The use of NGOMSL allows the procedural elements to be modelled by representing goals, subgoals, decisions, steps, sequences, inputs, outputs, etc. The resulting model therefore permits both NGOMSL- and PKSM-type analyses. Table 2 shows how NGOMSL control structures relate to the PKSM structure.

Figure 3. Taxonomy of objects.

Table 2. Structure and control in NGOMSL and PKSM.

Control structure: general hierarchical decomposition
  NGOMSL: Method to accomplish goal of `X': 1) Accomplish goal of `A'; 2) Accomplish goal of `B'; 3) etc.
  PKSM: goal node decomposed into sub-nodes at the next lower level.
Control structure: simple linear sequence
  NGOMSL: 1) Accomplish goal of `A'; 2) Accomplish goal of `B'; 3) etc.
  PKSM: linear chain of nodes in the level flowchart.
Control structure: simple logical decision
  NGOMSL: ... IF <condition> THEN accomplish goal of `C' ELSE accomplish goal of `D' ...
  PKSM: decision (diamond) node with two outgoing branches.
Control structure: simple looping control structure
  NGOMSL: 1) Accomplish goal of `C'; 2) IF <condition> continue process; 3) ELSE GOTO (1)
  PKSM: decision node with a branch returning to an earlier node.
Control structure: selection-rule structure
  NGOMSL: Method to select goal of `F': IF <cond_1> accomplish goal of `C'; IF <cond_2> accomplish goal of `D'; IF <cond_3> accomplish goal of `E'
  PKSM: decision nodes selecting among alternative goal nodes.

4.1. Implementing the UEML framework

To implement the UEML framework, the Procedure Usability Analysis (PUA) tool was created. This Windows-based, user-driven interface, shown in figure 4, provides all the features common to graphical software, including the ability to insert, delete, modify, drag and drop, and copy and paste objects. In addition, it contains a search tool and the ability to view an object's properties.

Figure 4. The PUA interface main screen.

PUA was designed to implement the procedural–declarative interaction specified in the UEML framework. For example, the PUA tool will trace the procedure or task execution and, when a decision node is reached, the tool requires declarative knowledge in order to proceed. This declarative knowledge may reside in working memory, or it may need to be extracted from long-term memory. The final version of UEML limited this extraction process to identifying the declarative knowledge items referenced in the completion of a process; it is left to the designer to ascertain the availability of these items. Further, upon `entering' a node, its respective slow, medium, and fast times can be added to the node, along with the running totals for the task.

As is common in most modelling approaches, the more generic the language, the wider the range of applicability, but the greater the effort required in model development. Simplifying the modelling process primarily involves reducing the number of, and detail in, the types of modelling objects necessary in a model. This can be accomplished by providing a set of predefined, highly constrained, specific nodes representing whole sequences of more elemental modelling objects. PUA allows for the development of `composite' nodes that make model creation easier by reducing the number of elemental `internal cognitive process' nodes required in the model. Figure 5 illustrates an elemental node model, while figure 6 displays a reduced, or composite, PUA model. This reduction simplifies the modelling process; however, it may also decrease the tool's range of applicability. Further, oversimplification may fail to capture the specific details that distinguish between similar tasks in the same HEI, potentially reducing the model's accuracy. Nonetheless, the effect of using composite models (based on observable actions) and ignoring internal cognitive details will be examined.

The potential capabilities of this UEML and the associated PUA are:

- Descriptive: model a particular individual's rule-set/knowledge.
- Prescriptive: model the ideal rule-set/knowledge.
- Predictive: given a descriptive or prescriptive model along with a specific task definition (parameters, states), predict performance (time, errors, etc.).
- Requirements specification: given any two of knowledge (rule-set), tool, or environment, specify the requirements for the third.
- Function allocation between human and machine.
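The predictive capability can be sketched in miniature: trace one path through a model and accumulate per-node time estimates, in the spirit of the slow, medium, and fast times PUA attaches to each node. The node names and time values below are hypothetical, not the tool's actual parameters.

```python
# Sketch of PUA-style time prediction: each node carries slow/medium/fast
# time estimates (seconds), and a task's prediction is the running total
# over the nodes traversed. All names and values here are hypothetical.

NODE_TIMES = {
    "read prompt":    {"slow": 2.0, "medium": 1.2, "fast": 0.8},
    "press key":      {"slow": 0.8, "medium": 0.5, "fast": 0.3},
    "insert card":    {"slow": 4.0, "medium": 2.5, "fast": 1.8},
    "confirm amount": {"slow": 1.5, "medium": 1.0, "fast": 0.6},
}

def predict_time(path, pace="medium"):
    """Accumulate per-node estimates along one traced task path."""
    return sum(NODE_TIMES[node][pace] for node in path)

stamp_purchase = ["read prompt", "press key", "insert card", "confirm amount"]
print(predict_time(stamp_purchase))          # medium-pace estimate: 5.2
print(predict_time(stamp_purchase, "slow"))  # slow-pace estimate: 8.3
```

A composite node would simply replace a run of such elemental nodes with one entry carrying the aggregated times, which is exactly the accuracy/effort trade-off discussed above.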
5. Hypotheses

To assess the potential of the UEML framework as implemented within the PUA, three hypotheses were evaluated.
Figure 5. Example excerpt using elemental nodes.
Hypothesis One: Usability measures produced by the PUA will be better predictors of actual task completion time, workload, and usability than the measures of PKSM and NGOMSL combined.
Figure 6. Example excerpt using composite nodes.
Hypothesis Two: Modelling usability at the elemental (operator) level of user activity is more accurate than modelling at the composite level.

Hypothesis Three: Real-world transfer: PUA will produce similar predictive results in the real world as in the laboratory.

6. Method

A two-stage approach (laboratory and fieldwork) was used to validate the unified framework (UEML) and the resultant PUA system. This work was performed in conjunction with the United States Postal Service (USPS). As such, the software tool was used to develop interaction models for a recently designed vending system, the Postage and Mailing Center (PMC). The laboratory and fieldwork portions of the project were designed to validate the PUA model. A preliminary usability questionnaire was screened in the laboratory stage to identify a set of questions whose responses represent a set of factors contributing to the perceived usability of the PMC system. The final version of the revised questionnaire was used in the field stage of the experiment.

6.1. Usability questionnaire development

A usability questionnaire was created for the purpose of this experiment in two distinct stages. The initial questionnaire included five different questions from four
different usability categories found in the literature. Booth (1989) stated that usability has the following aspects: learnability, usefulness, effectiveness, and attitude (satisfaction). Similarly, Schneier and Mehal (1984) state that usability has the following measurable aspects:

(1) Easy to learn: the effort to learn or relearn a task should be minimized.
(2) Useful: the effort to perform a task should be minimized.
(3) Tolerant of errors: it should be difficult to make errors; when they occur, it should be easy to recover.
(4) Pleasant to use: users should have a positive attitude.

These categories were summarized as learnability, usefulness, effectiveness, and satisfaction. Initial questionnaire development followed this categorization to provide a degree of content validity. Within each category, five different questions were created. The resulting 20-question initial questionnaire was evaluated during the laboratory experiment.

6.2. Laboratory experiment

In preparation for laboratory testing, three PUA models were developed representing three possible tasks when using the USPS PMC: (1) purchasing stamps, (2) weighing and purchasing postage for a package, and (3) changing mailing addresses. For each task, a set of scenarios (e.g. retail transactions a customer may accomplish) covering a range of `paths' through the modelled processes was developed to provide a representative sample of PUA object types. Scenarios varied in difficulty, duration, and content for each task type.

6.2.1. Subjects: For the laboratory portion of the research, 60 participants (37 males and 23 females) performed 15 scenarios that represented the three tasks on the USPS PMC. The mean age of the participants was 30, and they had, on average, a college-sophomore education level. The participants were recruited from a temporary employment-contracting agency, which paid them at a rate standardized by their skill level.

6.2.2. Laboratory experimental trials: Experimental trials were broken down into three parts: preparation, instructions, and observation. The preparation portion contained a brief overview of the experiment and consent procedures. During the instructional portion, subjects became acquainted with NASA-TLX (Hart and Staveland 1988), the mental workload assessment tool; the usability questionnaire; and the USPS machine itself. Participants were also told about the tasks they would be asked to perform and were familiarized with each task described on a specific Task Instruction Card. The researcher demonstrated how the Task Instruction Card would be used and demonstrated the completion of the three types of tasks using the USPS PMC machine. During the observation portion, each subject completed two sets of 15 trials (scenarios). A Task Instruction Card was developed for each trial (scenario) used during the experiment. The two sets were identical except that they were presented in a different randomized order, and the second set included instructions to complete a usability questionnaire for three pre-identified tasks. For the first set of trials, the subject selected a Task Instruction Card, read the card for understanding,
completed the task, and then completed the NASA-TLX computerized questionnaire. The first set of trials provided each subject the same level of familiarization with the USPS PMC machine (note: none of the subjects had ever used the USPS PMC machine before). During the second set of trials, each subject completed the same set of trials, except that instructions to complete a usability questionnaire were placed in the stack following three scenarios. These three scenarios had been identified as high, medium, and low in mental workload. The second set of trials was videotaped for observational analysis. The videotapes of 30 of the 60 participants, selected at random, were analysed to obtain accurate elapsed times for each trial and to estimate the parameters required by the different modelling techniques. The parameters derived from this group were then used to analyse the data from both the laboratory and field participants. Data from the remaining 30 subjects were then used for analysis with respect to the predictions made by these techniques; this group provided the data for the initial analysis portion of the experiment.

6.2.3. Usability questionnaire analysis: Using the questionnaire data obtained in the first task of the laboratory experiment, the second task was the analysis and refinement of the Usability Questionnaire. Cronbach's alpha was used to assess internal consistency for each of the four question categories, with five items per category. Typically, scores of α > 0.9 are considered quite good; however, scores of α > 0.7 are considered acceptable in many situations, such as statistical research on sizable samples (Cronbach 1990). The results of this assessment indicated acceptable to good internal consistency within the four question groups (learnability, α = 0.87; usefulness, α = 0.72; effectiveness, α = 0.88; satisfaction, α = 0.83).
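Cronbach's alpha has a compact form: α = k/(k−1) · (1 − Σ item variances / variance of the summed scale scores), for k items. A minimal sketch follows; the 5-respondent score matrix is illustrative, not the study's data.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of summed scale scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustrative 5-point ratings from five hypothetical respondents on one
# four-item question group (e.g. learnability):
scores = np.array([
    [4, 5, 4, 4],
    [3, 4, 3, 3],
    [5, 5, 4, 5],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
], dtype=float)

print(round(cronbach_alpha(scores), 2))  # 0.95: high internal consistency
```

Note that `ddof=1` selects the sample (n−1) variance; alpha rises toward 1 as the items covary more strongly, which is why highly consistent question groups such as those above score in the 0.7–0.9 range.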
The only questionable finding arose when evaluating the second question group, `usefulness', with respect to the three different task types. More variability was found in the responses to the questions regarding the usefulness of purchasing stamps and of changing a mailing address. In particular, the third and fourth questions of this category produced large variances. Both of these questions related to the participants' opinion of their own success or performance in accomplishing the goals of the task, whereas the other questions related much more directly to the machine. It is most likely that the observed variances result from individual biases in self-evaluation. Regardless, the internal consistency for the question group over all three task types was considered satisfactory. Following the completion of the internal consistency analysis, principal components analysis was used, with the intent of identifying several questions that provide a clear measure of an aspect of usability. The results, however, indicated that both components were nearly equally balanced across all of the questions. Following these inconclusive results, a factor analysis was performed. The factor analysis produced somewhat more variety in the contributions of the various questions to the factors. Several questions were identified as being able to distinguish between the two usability factors; these were identified by the size of the absolute difference between their factor weights. Then, using these questions, a stepwise regression was performed to determine which questions could account for variability in responses with respect to the three task types. The results indicated three of the candidate questions. The fourth question was selected by the size of its absolute
Process usability assessment
233
dierence alone. The product of these analyses was a reduced usability questionnaire (table 3) consisting of four of the original questions, one from each usability question category. The result of this task in the validation methodology was the reduction of the preliminary Usability Questionnaire to a more manageable set of questions for use in the ®eld-testing phase of the validation. 6.3. Field data collection Field data was collected by observing and interviewing customers at two facilities in the USA: Ann Arbor, Michigan and Miami, Florida. Both locations had implemented the PMC within their postal facilities. Customer task execution time and task steps describing the aspects of the task in order to determine the task path for later replication in the three modelling environments were collected from a total of 100 customer observations. 7. Data extraction and encoding Thirty of the 60 subjects' data from the laboratory trials were randomly selected to extract the parameters required for the three predictive models. This task involved acquiring elapsed node times from video tape and the analysis of those times. More speci®cally, these times were used in: (1) The revised PUA model required more accurate time estimates for PMCspeci®c Input and Output elemental nodes. (2) The simulation model required the identi®cation of the distribution's parameters for each observed node. (3) The Composite PUA model required mean times for observed nodes. The Composite PUA models were created as discussed earlier along with changing the time estimates for the nodes to match the mean estimates collected from the ®eld observations. As a result, these models were limited to only producing the mean time estimate. The output of this model was used to test hypotheses one and two. Table 3.
Revised usability questionnaire.
Learnability question number 5: Was the amount of time it took learn to do this task on this machine acceptable? Very acceptable Acceptable Borderline Unacceptable Very unacceptable Eectiveness question number 6: How eective was this machine in completing this task? Very eective Eective Borderline Usefulness question number 15: How useful was this machine in completing this task? Very useful Useful Borderline
Ineective
Very ineective
Useless
Very useless
Satisfaction question number 16: Are you satis®ed with this machine in terms of the time to complete task? Very satisfactory Satisfactory Borderline Unsatisfactory
Very unsatisfactory
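The question-selection step described in § 6.2.3 (ranking questions by the absolute difference between their two factor weights) can be sketched as follows. The loadings and question labels below are hypothetical illustrations, not the study's actual factor solution:

```python
# Hypothetical two-factor loadings for six questionnaire items
# (illustrative values only; the study's factor solution is not reproduced).
loadings = {
    "Q5":  (0.81, 0.12),
    "Q6":  (0.15, 0.78),
    "Q9":  (0.44, 0.41),   # nearly balanced item: poor discriminator
    "Q15": (0.22, 0.74),
    "Q16": (0.70, 0.25),
    "Q18": (0.50, 0.40),
}

# Rank items by |weight on factor 1 - weight on factor 2|: a large gap means
# the question loads cleanly on one usability factor and can distinguish
# between the two factors.
ranked = sorted(loadings,
                key=lambda q: abs(loadings[q][0] - loadings[q][1]),
                reverse=True)
print(ranked)
```

Candidate questions taken from the top of such a ranking would then be screened by stepwise regression, as described above, to keep only those accounting for response variability across task types.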
The video tape was used to capture all the user-machine interactions, and data from the tape were used to acquire time information. For the first set of 30 subjects, the video tape was used exclusively to assess elapsed times for observed nodes. For the second set of subjects, the video tape was used to collect the total completion time for each task trial and to observe the number of deviations in task performance. To facilitate this process, a piece of software was created for observational data encoding. This software runs on a system comprising a computer workstation with a time code reader card and an edit-quality VCR. The system allows precise control of the VCR for locating the exact position on the video tape where an observational item begins and ends; it then transfers those time codes (at 1/30 s resolution) into a Microsoft® Excel spreadsheet. This system was used to extract and record elapsed times for task elements in each of the three basic PMC task types. The observed task elements fell into two categories: composite nodes and elementals. The composite nodes were defined as larger, meaningful sub-tasks within a task type, such as `Weigh Package', `Pay With Cash', or `Enter Address'. These nodes encompassed all the activities required to accomplish all the tasks in the scenarios discussed earlier. In addition, a set of elemental nodes was compiled which contained all the minute observable nodes in the three basic PMC tasks, such as `insert a coin', `push a button', `print a stamp', and `retrieve change'. These elemental nodes were needed to provide times for unknown PMC-specific human and machine input and output activities.
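Time codes at 1/30 s resolution convert to elapsed seconds in the usual way. A minimal sketch, assuming non-drop-frame `HH:MM:SS:FF` codes at 30 frames per second (the helper names are ours, not the encoding software's):

```python
FPS = 30  # time code reader resolution: 1/30 s per frame

def timecode_to_seconds(tc: str) -> float:
    """Convert a non-drop-frame 'HH:MM:SS:FF' time code to seconds."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return hh * 3600 + mm * 60 + ss + ff / FPS

def node_duration(start: str, end: str) -> float:
    """Elapsed time of one observed node between two time codes."""
    return timecode_to_seconds(end) - timecode_to_seconds(start)

# Example: a hypothetical 'push a button' elemental node lasting 1.5 s.
print(node_duration("00:12:03:10", "00:12:04:25"))
```

Mean elapsed times per node type, computed over many such intervals, are what feed the revised and Composite PUA models described below.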
To promote consistency in identification of these nodes, the following definitions were used: (a) from the moment machine output is completed until the beginning of the first human action; (b) from one human action to the next action or the end of the last action (for a sequence of actions, each is an observed node); (c) from the last human action to the beginning of the first machine output; (d) from one machine output to the next output or the end of the last output (for a sequence of outputs, each is an observed node). Each rule's letter (a, b, c, or d) represents a portion of the human-machine interaction process, as represented in figure 7. This decomposition of the tasks into a collection of node types provided a valid set of observable, repeatable, identifiable nodes for the purpose of data collection.

Figure 7. Observable nodes for task decomposition.

7.1. Development of PUA predictive models
Using the collected elapsed times for the various node types, a mean elapsed time for each observable node was computed. The mean estimates for the composite nodes (figure 6) were used in the composite PUA models, and the mean time estimates for the elemental nodes (figure 5) were used to update the original PUA models. The original elemental PUA models were revised using the results from the elemental analysis to update the node time data for each elemental node. The revised PUA models more accurately reflect the true time elements that were unknown when the models (and the PUA program) were created. Other than the modified elemental input/output node times, these models were unchanged. More specifically, for each instance of these elemental nodes in the models, the slow, medium, and fast times were all replaced by the same mean value for the respective node. Therefore, the models are unchanged except that the slow and fast predictions are no longer as meaningful as they once were. As a result, the majority of the final analysis focuses on only the medium predictions.

8. Results and discussion
8.1. Hypothesis one
Hypothesis one proposed that the PUA measures would be better predictors of actual task completion time, workload, and usability than the measures of PKSM and NGOMSL combined. To test hypothesis one, a stepwise regression of the usability predictions from the PUA, NGOMSL, and PKSM was performed. The PUA software provided prediction outputs for each of these techniques. To evaluate this hypothesis, the dependent variables (time to complete the task, mental workload assessment, usability questionnaire average response, and the number of deviations from the task goals) were evaluated against each predictive model. Pearson's r was computed for each dependent variable for both the PUA and NGOMSL+PKSM models. A t-test was then conducted to determine whether the correlations (r) were significantly different from zero. Correlations for each of the dependent variables were found to be significantly different from zero (p < 0.0001) for each model. To determine whether the PUA and NGOMSL+PKSM differed in predicting the dependent variables, a subsequent Fisher's z-test was performed. Although the completion times predicted by the PUA (r = 0.97) and NGOMSL+PKSM (r = 0.96) both correlated strongly with the actual time taken to complete the tasks, Fisher's z-test revealed that the PUA correlation was significantly different from that of NGOMSL+PKSM combined (z = 2.11, p = 0.0178). This indicates that the PUA model is a better predictor of actual task performance. Similarly, for task goal deviations, both the PUA model (r = 0.67) and NGOMSL+PKSM (r = 0.63) showed significant correlations.
Once again, though, Fisher's z-test found that the PUA model was marginally better at predicting task goal deviations (z = 1.64, p = 0.051). Perceived workload and usability questionnaire
responses showed no differences between the PUA and NGOMSL+PKSM techniques. Overall, PUA was an equal or better predictor than NGOMSL and PKSM combined.

8.2. Hypothesis two
Hypothesis two proposed that modelling usability at the elemental (operator) level of user activity is more accurate than modelling at the composite level. This experimental question examined the accuracy of time predictions using approaches that varied the level of element decomposition. Using the same experimental data as hypothesis one, the testing for this hypothesis compared the accuracy of time predictions produced by three different modelling techniques: PUA (medium times only), revised PUA (as discussed in § 7.1, using medium times), and Composite PUA. The actual elapsed task performance times were extracted from a portion of the subjects' data. Accuracy was calculated by correlating each technique's (PUA, revised PUA, and Composite PUA) predicted time against the actual user performance time for the task (refer to table 4). The correlations of the task durations were subjected to Fisher's z-test to determine whether there were significant differences between them. Looking at table 5, we can see that all three models are significantly different from each other (p < 0.05). This analysis shows that the revised PUA predictions have a significantly higher correlation with actual task performance time. The Composite model predictions are not as accurate as the revised PUA models. Further, as expected, the revised PUA is more accurate than the original PUA models, which did not include PMC-specific time parameters for input and output nodes specific to the machine (p = 0.017). However, this difference does not compare to the difference between the Composite model and both types of PUA models (p < 0.0001). The marginal increase in accuracy of the revised PUA required the observational data collection for the machine-specific node time parameters.
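Accuracy here is simply the Pearson correlation between a model's predicted task times and the observed times. A minimal sketch with hypothetical timing data (the values are illustrative, not the study's measurements):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between predicted and actual task times."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical observed task times (s) and one model's predicted times.
actual    = [41.2, 55.0, 38.7, 62.3, 47.9, 70.4]
predicted = [43.0, 52.8, 40.1, 65.0, 46.2, 68.9]
print(f"r = {pearson_r(predicted, actual):.2f}")
```

The closer r is to 1, the more faithfully the model's predicted durations track the observed durations across trials.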
In retrospect, using the PUA or Composite model would appear to be sufficient. While it could be argued that the Composite model takes less time to create, in fact it took longer, since time was required to identify and obtain time estimates for the composite nodes.

Table 4. Summary task duration correlations.

                             Composite PUA        PUA medium           Revised PUA medium
Pearson's r                  0.89                 0.92                 0.93
r²                           0.79                 0.85                 0.87
t statistic (H0: r = 0)      58.54 (p < 0.0001)   72.03 (p < 0.0001)   77.71 (p < 0.0001)
Fisher's z transformation    1.42                 1.61                 1.68

Table 5. PUA, revised PUA, and composite PUA comparison.

z-statistic matrix       Composite PUA        PUA medium           Revised PUA medium
Composite PUA            -
PUA medium               5.65 (p < 0.0001)    -
Revised PUA medium       7.76 (p < 0.0001)    2.11 (p = 0.017)     -

8.3. Hypothesis three
Hypothesis three was designed to evaluate the transferability of the PUA model to the real world. Thus the final task was to validate the PUA predictions against those obtained in the field portion of this experiment. Assessing this hypothesis required two types of analysis. The first was a stepwise regression of the PUA predictions for each customer's task against the actual usability questionnaire responses from that customer. For the second, a correlation of the observed task completion time and the PUA predicted task durations (slow, medium, and fast) was computed. To perform the first analysis, Pearson's r and Fisher's transformation were applied to the data. The z-statistics for comparing the transformed correlations were calculated for each pair of correlations, as shown in table 6. The results indicate that there is not sufficient evidence to reject the null hypothesis when comparing against the accuracy of the laboratory questionnaire and laboratory workload predictions. A significant difference exists between the field questionnaire predictions and the accuracy of the laboratory deviation predictions. The second portion of the testing of this hypothesis compared the accuracy of the PUA predictions for task completion time in the field with the accuracy obtained in the laboratory portion of the study. The results in table 7 show that there is sufficient evidence to conclude that the accuracy of the laboratory predictions is significantly greater than that of the predictions for the field data (p < 0.05). Differences in the environments in which the data were collected account for these findings.
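The z-statistic matrices in tables 5-7 rest on Fisher's r-to-z transformation: each correlation is transformed with z' = arctanh(r), and a pair of transformed correlations is compared on a z scale. The sketch below uses the independent-samples form of the test with hypothetical r and n values; the paper's comparisons involve correlations estimated on overlapping samples, whose exact sample sizes are not restated here, so this is an illustration of the transformation rather than a reproduction of the tabled values:

```python
from math import atanh, sqrt

def fisher_z_stat(r1: float, n1: int, r2: float, n2: int) -> float:
    """z-statistic comparing two independent Pearson correlations
    via Fisher's r-to-z transformation z' = arctanh(r)."""
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of the difference
    return (atanh(r1) - atanh(r2)) / se

# Hypothetical example: model A (r = 0.90) vs model B (r = 0.80),
# each correlation estimated from 100 observations.
z = fisher_z_stat(0.90, 100, 0.80, 100)
print(f"z = {z:.2f}")
```

Values of |z| above 1.96 correspond to p < 0.05 (two-tailed), which is the criterion applied throughout tables 5-7.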
The laboratory setting exists primarily to control the amount of outside variability entering into the observations, in order to more accurately assess the effects of interest. In the field setting, no such controls exist. The task performance predictions are significant in spite of this unknown variability. As the hypothesis is stated, it seems reasonable to conclude that the time predictions based on the hybrid model provide similar success when transferred from the laboratory to the real world. Therefore, hypothesis three is supported.

Table 6. Field vs laboratory measure comparisons.

z-statistic matrix             (1)                (2)                (3)                 (4)
(1) Field questionnaire        -
(2) Laboratory questionnaire   0.41 (p = 0.69)    -
(3) Laboratory workload        0.23 (p = 0.82)    0.64 (p = 0.52)    -
(4) Laboratory deviations      9.93 (p < 0.0001)  9.52 (p < 0.0001)  10.16 (p < 0.0001)  -

Table 7. Field vs laboratory PUA model comparison.

z-statistic matrix                   (1)      (2)      (3)      (4)      (5)
(1) Field PUA fast                   -
(2) Field PUA medium                 0.26     -
(3) Field PUA slow                   0.53     0.27     -
(4) Laboratory original PUA medium   7.03*    6.43*    5.82*    -
(5) Laboratory revised PUA medium    8.60*    8.00*    7.40*    2.11*    -

* p < 0.05.

9. Conclusion
The purpose of this research was to develop a hybrid model of the human-environment interaction that is capable of modelling knowledge of a task process, the equipment involved, and the environment in which the task occurs. This goal was achieved by combining several theoretical views of the interaction process into a single model, the User Environment Modelling Language (UEML). Further, based upon this hybrid model, the second objective of this research was to incorporate the model into a practical design tool. The resultant Process Usability Assessment (PUA) software tool provides an environment in which designers can create processes or procedures and then receive feedback on the usability of their design. Upon completion of these objectives, the modelling technique was validated in both the laboratory and the field. The results for the first hypothesis showed that the UEML measures produce correlations between predicted and actual performance at least as accurate as those generated by the two techniques (NGOMSL and PKSM) combined. The second hypothesis showed that the detailed UEML models predicted usability better than a technique using composite modelling nodes. Finally, the field experiment revealed that the UEML technique transferred fairly well to a real-world setting. Two potential limitations of the experimental instruments were found. The first is that the laboratory participants' responses to the workload assessment decreased over time. This indicates a practice effect that is not likely to occur in the real world. However, results of the field data analysis indicate that the approach is still fairly accurate. The second potential limitation concerns the laboratory usability questionnaire. From the analysis it appears that the length and repeated application of the questionnaire reduced the variation in laboratory responses and thus the questionnaire's sensitivity. The PUA tool derived from the UEML can be used in many real-world applications.
The use of such a tool, incorporated into the design process, can result in practical benefits, such as decreasing development time and costs, as well as increasing product quality. Additionally, the hybrid model developed in this research provides the designer with immediate feedback. In addition to direct time savings, the use of this modelling technique offers three potential areas of cost savings. First, decreasing development time also reduces the total costs associated with the project. Second, the feedback provided by the tool has the potential to reduce the need for an expert consultant and the associated costs. Finally, the tool provides an indirect saving in that more usable products result in higher potential utilization and customer satisfaction.
A cognitive modelling expert could most likely produce the resultant Process Usability Assessment (PUA) models in about the same amount of time it takes to produce NGOMSL or PKSM models, but not both. Therefore, PUA allows a modeller to build one model that produces a full range of measures. Furthermore, the analysis process is certainly more efficient using the UEML PUA when one considers the design revisions, corrections, and system improvements that will require further product remodelling. The approach followed in this study, namely the development of a theoretical understanding of interaction and the development of a practical design tool, fills the needs identified by the NSF and PITAC publications concerning the implementation of new technologies in the workplace and consumer environment (Strong 1994, PITAC 1999). These objectives were indicated as essential to reducing the factors limiting the use of new technology. By providing a means to alleviate these limitations, this research facilitates the deployment of new technologies by decreasing development costs and improving the quality of systems. It is anticipated that further research can proceed along two avenues: UEML expansion and evaluation within other domains. In general, expansion of the UEML would be directed towards making the representation of the machine and environment more complete, and towards making the analysis of the internal cognitive processes more robust. Exploration of the utility of the technique in other domains could include using the tool in the design process, comparing alternative designs, and exploring the differences between the user's knowledge and the tool's representation.

Acknowledgements
This work was funded in part by the United States Postal Service.

References
BENYSH, D. V. 1997, Development of a theoretical framework and design tool for process usability assessment. PhD dissertation, Purdue University.
BENYSH, D. V. and KOUBEK, R. J. 1993, The implementation of knowledge structures in cognitive simulation environments, Proceedings of HCI International '93 (New York: Elsevier), 309-314.
BOOTH, P. 1989, An Introduction to Human-Computer Interaction (London: Lawrence Erlbaum Associates).
BOVAIR, S., KIERAS, D. E. and POLSON, P. G. 1990, The acquisition and performance of text-editing skill: a cognitive complexity analysis, Human-Computer Interaction, 5, 1-48.
BYRNE, M. D., WOOD, S. D., SUKAVIRIYA, P. N., FOLEY, J. D. and KIERAS, D. E. 1994, Automating interface evaluation, Proceedings of CHI '94 Conference on Human Factors in Computing Systems (New York: ACM), 232-237.
CARD, S. K., MORAN, T. P. and NEWELL, A. 1983, The Psychology of Human-Computer Interaction (Hillsdale, NJ: Lawrence Erlbaum Associates).
CRONBACH, L. 1990, Essentials of Psychological Testing (New York, NY: Harper & Row).
DE HAAN, G., VAN DER VEER, G. C. and VAN VLIET, J. C. 1993, Formal modeling techniques in human-computer interaction, in G. C. van der Veer, S. Bagnara and G. A. M. Kempen (eds), Cognitive Ergonomics: Contributions from Experimental Psychology (Amsterdam: Elsevier), 27-68.
DILLON, A. 2001, HCI hypermedia: usability issues, in W. Karwowski (ed.), International Encyclopedia of Ergonomics and Human Factors (London: Taylor & Francis), 672-698.
FININ, T., MCKAY, D. and FRITZSON, R. (eds) 1992, An Overview of KQML: A Knowledge Query and Manipulation Language, The KQML Advisory Group (University of Maryland).
GENESERETH, M. R. and FIKES, R. E. (eds) 1992, Knowledge Interchange Format, Version 3.0 Reference Manual, Technical Report Logic-92-1, Computer Science Department, Stanford University.
GOLDSMITH, T. E. and KRAIGER, K. 1997, Structural knowledge assessment and training evaluation, in J. K. Ford (ed.), Improving Training Effectiveness in Work Organizations (Hillsdale, NJ: Erlbaum), 73-96.
GOLDSMITH, T. E., JOHNSON, P. J. and ACTON, W. H. 1991, Assessing structural knowledge, Journal of Educational Psychology, 83, 88-96.
GRAY, W. D., JOHN, B. E. and ATWOOD, M. E. 1993, Project Ernestine: validating a GOMS analysis for predicting and explaining real-world task performance, Human-Computer Interaction, 8, 237-309.
HART, S. G. and STAVELAND, L. 1988, Development of the NASA task load index (TLX): results of empirical and theoretical research, in P. A. Hancock and N. Meshkati (eds), Human Mental Workload (Amsterdam: Elsevier), 139-183.
HEDGE, A. 2001, Consumer product design, in W. Karwowski (ed.), International Encyclopedia of Ergonomics and Human Factors (London: Taylor & Francis), 916-919.
HENNINGER, S. 2000, A methodology and tools for applying contextual usability guidelines to interface design, Interacting with Computers, 12, 225-243.
HUDSON, S. E., JOHN, B. E., KNUDSEN, K. and BYRNE, M. D. 1999, A tool for creating predictive performance models from user interface demonstrations, Proceedings of UIST '99 (Asheville, NC: ACM), 93-102.
KEENAN, S. L., HARTSON, H. R., KAFURA, D. G. and SCHULMAN, R. S. 1999, The usability problem taxonomy: a framework for classification and analysis, Empirical Software Engineering, 4, 71-104.
KIERAS, D. E. 1988, Towards a practical GOMS model methodology for user interface design, in M. Helander (ed.), Handbook of Human-Computer Interaction (Amsterdam: Elsevier), 135-157.
KIERAS, D. E. 1997, A guide to GOMS model usability evaluation using NGOMSL, in Handbook of Human-Computer Interaction, 2nd edn (Amsterdam: Elsevier), 733-766.
KOUBEK, R. J., SALVENDY, G. and NOLAND, S. 1994, Use of hybrid task analysis for personnel selection: combining consensus-based job analysis with protocol analysis for a computer-based task, Ergonomics, 37, 1787-1800.
LEWIS, C. and RIEMAN, J. 1994, Task-Centered User Interface Design: A Practical Introduction (unpublished shareware book; URL: ftp://ftp.cs.colorado.edu/pub/cs/distribs/clewis/HCI-Design-Book).
MORAN, T. P. 1981, The Command Language Grammar: a representation for the user interface of interactive systems, International Journal of Man-Machine Studies, 15, 3-50.
MORAN, T. P. 1983, Getting into the System: external-internal task mapping analysis, Proceedings of CHI '83 (New York: ACM), 45-49.
OLSEN, D. R. 1986, MIKE: The Menu Interaction Kontrol Environment, ACM Transactions on Graphics, 5, 318-344.
PAYNE, S. J. and GREEN, T. R. G. 1986, Task-Action Grammars: a model of the mental representation of task languages, Human-Computer Interaction, 2, 93-133.
PITAC 1999, Information Technology Research: Investing in Our Future, President's Information Technology Advisory Committee report.
RASMUSSEN, J. 1986, Information Processing and Human-Machine Interaction (New York: North-Holland).
REISNER, P. 1983, Analytic tools for human factors in software, in A. Blaser and M. Zoeppritz (eds), Proceedings of Enduser Systems and their Human Factors (Berlin: Springer-Verlag), 94-121.
SCHNEIER, C. A. and MEHAL, M. E. 1984, Evaluating usability of application interfaces, in Human-Computer Interaction: Proceedings of the First USA-Japan Conference, Honolulu, HI, USA (Amsterdam: Elsevier), Advances in Human Factors/Ergonomics, vol. 1, 129-133.
STRONG, G. W. 1994, A Report: New Directions in Human-Computer Interaction Education, Research, and Practice (Washington, DC: NSF Planning Workshop).
SUKAVIRIYA, P., FOLEY, J. D. and GRIFFITH, T. 1993, A second generation user interface design environment: the model and the runtime architecture, Human Factors in Computing Systems, Proceedings of INTERCHI '93, Amsterdam, The Netherlands (New York: Association for Computing Machinery), 375-382.
SZEKELY, P., LUO, P. and NECHES, R. 1993, Beyond interface builders: model-based interface tools, Human Factors in Computing Systems, Proceedings of INTERCHI '93, Amsterdam, The Netherlands (New York: Association for Computing Machinery), 383-390.
TAUBER, M. J. 1990, ETAG: Extended Task-Action Grammar, a language for the description of the user's task language, Proceedings of INTERACT '90 (Amsterdam: Elsevier), 163-168.
VICENTE, K. J. 1999, Cognitive Work Analysis: Toward Safe, Productive, and Healthy Computer-Based Work (London: Lawrence Erlbaum).
WIECHA, C., BENNETT, W., BOIES, S., GOULD, J. and GREENE, S. 1990, ITS: a tool for rapidly developing interactive applications, ACM Transactions on Information Systems, 8, 204-236.