
Serious Games Usability Testing: How to Ensure Proper Usability, Playability, and Effectiveness

Tanner Olsen, Katelyn Procci, Clint Bowers
Department of Psychology, University of Central Florida, Orlando, FL 32816, USA
{t_olsen, kprocci}@knights.ucf.edu, [email protected]

Abstract. Usability testing is an important, yet often overlooked, aspect of serious game development. Usability issues can drastically impact user experience and thus the learning outcomes associated with serious games. The goal of this paper is to provide serious game developers with an approach to apply usability testing efficiently and effectively within their development process. We propose a three-tiered approach to the assessment of game usability that adds assessments of playability and learning to traditional usability. Learning or training is the main objective of a serious game, and enjoyment is often required to elicit the usage necessary to achieve this goal. Step-by-step procedures and associated measures are provided to assess usability, playability, and learning outcomes concurrently with game development, while taking into account the unique goals and limitations of time, personnel, and budget that small development companies often encounter.

Keywords: usability, user experience, serious games

1 Introduction

Usability is one of the central elements in the game development process. It is deeply rooted in the overall experience of the player and can affect their interaction with the game. For example, if a player cannot read the text on screen, or the controls are difficult to master or unresponsive, the failure to ensure good usability detracts from the overall experience.

Serious games in particular present unique challenges with regard to effective usability testing. Briefly, serious games have been introduced into a growing number of domains, including education, therapy, and personnel training, with the goal of supplementing traditional means of learning. Accomplishing this goal requires converting the tenets of effective training and learning into game features while utilizing a usable game design. If overall usability fails and all of the player's effort is put toward mastering controls, little attention and cognitive reserve remains to focus on the actual game content.

In small serious games development laboratories, resources are often extremely limited. Constrained by budget, time, and small development teams, usability in serious games often comes as an afterthought. Serious game development is also unique in that it often incorporates researchers who are interested in the science of learning and games. This places further constraints on development, as the usability process must address two very distinct audiences with two very different needs while creating a singular product. The aim of usability testing, then, is to give meaningful and immediate feedback to the developers while providing useful data for researchers.

We have created a procedure for usability testing that addresses the needs of both the developer and the researcher. Our procedure differs from other usability approaches in that it addresses the needs of the developer while providing quantitative data for the usability expert.
It also addresses the needs unique to serious games, namely usability, playability, and learnability.

1.1 Usability, Playability, and Learnability

There is a dynamic interaction of closely related, yet ultimately unique, components that affect the success of the serious game as a training tool. Not only must you assess the game for basic usability; playability and educational merit are also critical. For example, "user interface must not merely be functional or easy to use - it must also be fun!" states game designer Chris Crawford; "if the game interface is clumsy or confusing, the player simply abandons [the game]" [1].

A number of scales and measurement techniques have been developed to assess the usability of many distinct systems. Computer systems and programs have been the focus of many usability scales due to their importance in modern society, so there is no shortage of scales available for assessment of various

NOTE: The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-21708-1_70

features. Often, technology-based companies design their own usability assessments in order to create scales relevant to their particular areas of development. Standardized and validated scales can also be used. These scales generally consist of Likert items that focus on different aspects of usability, such as display characteristics, including the location of information on screen and legibility; language usage; the ease of interaction with the program and the difficulty of carrying out desired tasks; how easily the system is learned; general consistency; and other subjective measures associated with how well the system operates. These scales range in length from only a few questions to hundreds, of varying specificity.

Usability is a micro-level approach, focusing on the independent functionalities within individual components of a system. Playability, on the other hand, focuses on a broader sense of overall functionality associated with the integration of several usable tools, allowing for successful and enjoyable interaction with a game. Resnick and Sherer defined play as "entertainment without fear of present or future consequences; it is fun" [2]. For example, if usability testing found that the control mechanics for moving a player character and for interacting with in-game objects were excellent, yet playtesters were unable to integrate the two functions at the same time, the game is rendered frustrating and unplayable rather than engaging and enjoyable. Playability is therefore a more elusive component to capture. As a holistic experience, playability has been desired in serious games, but there are no widely used measures for it. There are, however, associated measures that share similar components with those of interest in serious game usability testing: these include scales of immersion and presence [3], flow [4], and engagement [5].
Such scales have largely been developed to study the effects of simulations and entertainment-based video games on individuals, but they also provide an excellent starting point for assessing playability within the context of serious games.

Finally, it is essential that learning outcomes be measured during the testing phases of the game to ensure that the game achieves its primary objective before too much time, effort, and budget have been committed to the project. Focusing too much attention on adopting the characteristics of a game's entertainment-based counterparts can result in the sacrifice of learning effectiveness. Poor usability can also impair learning by taxing cognitive resources and decreasing motivation to use the game. Therefore, assessing learning outcomes at various stages during development can help determine the causes of increases and decreases in learning, and help the development team maintain focus on the most important feature of the serious game.

1.2 Functional Balance of Usability Components

It is important to maintain a functional balance of each component to promote an optimized level of learning and desire of use. Unfortunately, as mentioned previously, comprehensive usability in serious games is often applied as an afterthought late in the process due to financial constraints and restrictive deadlines. As shortcomings in any of these components can undermine the others, the goals of the serious game will likely be compromised as well. As it currently stands, this design approach is ineffective and should be addressed.

Though previously developed and validated scales can be very useful when conducting your own usability testing, pre-made tests can be limited in their scope. Serious games pose unique challenges when trying to adopt current scales because they are unlike the typical programs subjected to usability testing.
It is often necessary to adopt and adapt multiple scales to measure each important aspect of the game, including usability, playability, and learning outcomes.
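As a rough illustration of what adapting multiple scales might look like in practice, the sketch below assembles a short questionnaire from an item pool tagged by the aspect each item measures. The item wordings and tags are invented for illustration; they are not the actual SUS, QUIS, or engagement items.

```python
# Hypothetical sketch: composing a questionnaire from items adapted from
# several scales, each tagged with the aspect it measures. Item texts
# below are illustrative placeholders, not published scale items.

item_pool = [
    ("usability",   "I found the game's menus easy to navigate."),
    ("usability",   "The on-screen text was easy to read."),
    ("playability", "I lost track of time while playing."),
    ("playability", "The game was fun to play."),
    ("learning",    "The game helped me understand the material."),
]

def build_questionnaire(pool, aspects):
    """Select the items covering the requested aspects and number them."""
    selected = [(a, text) for a, text in pool if a in aspects]
    return [f"{i}. [{a}] {text}" for i, (a, text) in enumerate(selected, 1)]

for line in build_questionnaire(item_pool, {"usability", "playability"}):
    print(line)
```

The aspect tags make it easy to score each subscale separately later, which matters when usability, playability, and learning results go to different members of the team.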

2 A Comprehensive Strategy for Usability in Serious Games

As a serious games research organization, we strive to uncover the best practices for both design and development. This involves providing practical guidelines for programmers and game designers while cultivating quantitative tools for analysis by usability professionals and researchers. Thus, it is our goal to present developers with a technique to assess usability, playability, and effectiveness that can be applied directly to their own game development cycles. We propose a three-tiered approach, applied throughout the development process, in which classic usability, playability, and educational merit are analyzed at several intervals to guarantee that these important pieces are not sacrificed at any stage of development. This approach has been designed with small developers in mind and has been optimized to maximize return on investment (ROI). We provide extensive resources and guidelines that can be used to improve your own internal development models and ensure functional, enjoyable, and effective end products.


2.1 Pre-Development

Prior to serious game development, it is important to properly identify the target audience and the user characteristics. Subject matter experts (SMEs) can be useful when compiling a user profile, but it is important to also contact and interview members of the target population to ensure that potentially important details are not overlooked, as SMEs will not know everything. Age, gender, and basic demographic information should be collected, as well as any further background information that may influence interaction with the software. This additional information includes previous knowledge of and experience with the material to be covered, gaming experience, reading level, and other user capabilities and limitations, such as disabilities, that may impact interaction with the game. If the desired background information is highly sensitive (religious views, sexual history, etc.), causing participants to refuse to respond to items, it may be useful to place this survey at the end of the document; this may increase the response rate due to the commitment the respondent has already made to the study [6, 7].

A cognitive task analysis should be conducted to determine the demands of the task, and designs should be implemented that reduce the effort exerted on secondary tasks such as controls and menu navigation. Baseline knowledge in the subject area should also be assessed prior to game development to determine what material is necessary to cover and how much time should be dedicated to each section of the material. Measuring the perceived relevance of the proposed serious game before production is also greatly beneficial. This allows the developer to determine whether the target audience has a need or desire to use a game for learning the material, or whether the game will likely meet a high level of resistance.
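As an illustration, the user characteristics described above could be captured in a simple profile record. The field names and example values below are assumptions made for this sketch, not a prescribed schema.

```python
# Hypothetical sketch of a user-profile record for the background
# variables discussed above: demographics plus factors that may
# influence interaction with the game. Field names are illustrative.

from dataclasses import dataclass, field

@dataclass
class UserProfile:
    age: int
    gender: str
    prior_knowledge: int        # pretest-style score on the target material
    gaming_experience: int      # e.g., self-rated 1 (none) to 5 (extensive)
    reading_level: str
    accessibility_needs: list = field(default_factory=list)

# One illustrative respondent from a pre-development interview.
profile = UserProfile(age=20, gender="F", prior_knowledge=3,
                      gaming_experience=4, reading_level="college",
                      accessibility_needs=["color-blind safe palette"])
print(profile)
```

Keeping the profile structured this way makes it straightforward to aggregate across interviewees, for example to check the spread of gaming experience or reading level in the target population.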
This user group information can be useful in determining various aspects of the game design, from the themes of the art and the style of gameplay to the complexity of language. Small, manageable items such as difficulty of language or controls can have a large impact on user behavior. Use of an inappropriate level of language can quickly alienate the user and negatively affect both the user's desire to use the program and their effectiveness at understanding and using the tool. Controls, too, can enhance or diminish the experience. Poor implementation of either can negatively impact the desired learning outcome. Overwhelming the user with difficult language, a game structure that is confusing for their skill level, or excessive information can greatly tax the user's resources, and will likely result in more effort being exerted on figuring out definitions and controls and fewer resources available for the learning objectives.

2.2 Story Boarding and Paper Prototyping

During this phase of development, designers should begin developing concepts for the game in storyboard format, including game design, style, and art, while taking general target population and SME considerations into account. Basic principles of human factors and user-centered design should be adhered to while constructing the proposed game model. Generating a chart that lists objectives, game features, implementation, and outcomes can be helpful in this process.

During this phase, researchers should create a paper prototype of the game as the developers design it. A paper prototype is essentially a paper-based version of the game that is capable of demonstrating the structure and elements of the game. This prototype should follow the storyboards and should progress just as the game would from screen to screen.
Paper prototyping can be a very useful and cost-effective tool for assessing user reaction to wording, layout, and the sequencing or flow of game progression [8], and can be used to experiment with different ideas. Paper prototypes require little technical experience to utilize and are quick and easy to create and modify, unlike computer prototypes, which can take a great deal of time and technical skill to modify [9]. Creating the prototype also allows for the first exposure of the game concept to the target users. You should conduct a small focus group of five or six that includes SMEs and members of the target population who play the paper prototype. The information gathered from the prototype trials can provide useful insight into the modifications that should be made prior to software development, and can allow for changes that make the game more useful and accessible to the user. It is important to collect general information and opinions about the game as it stands, its structure, characters, and overall presentation, as well as basic usability feedback that can be helpful in correcting simple errors before further development.

During paper prototyping, questions should focus on game features, general usability, and ease of understanding. Game features include the style of art and gameplay as well as the narrative and plot. Usability issues at this time are largely conceptual and less specific: whether the control scheme seems to fit the game and makes sense, whether the screen order and progression seem logical, and whether the objectives seem clear. Many of these questions can be culled from sources we have found useful, including items from the System Usability Scale (SUS) [10], the Questionnaire for User Interface Satisfaction (QUIS) [11], and the Technology Acceptance Model [12]. The items relate to perceived usefulness, behavioral intention, ease of use,


application-specific self-efficacy, enjoyment, opinion of game elements, general usability and playability, and player preferences. At this stage, questions can either be asked directly with varying amounts of detail, or a Likert-scale rating system can be employed. We ask the tester to rate each item on a 5-point Likert scale, where 1 indicates "strongly disagree" and 5 indicates "strongly agree." Scores on Likert scales can be averaged per item or per subscale. Focused follow-up questions should be asked after the questionnaire to further understand the testers' responses and to collect more information about their problems or suggestions. It is important to take feedback into consideration when there are specific problems that the testers encounter. It is also important, however, to avoid trying to satisfy each individual's design suggestions, as personal preferences may not represent the general population and doing so will only add unnecessary work to the development process.

2.3 Build Alpha 1

Once you have updated the storyboards with the feedback from the focus groups, the programmers create a rough first alpha build of the game. This version lacks any finished art or sound design and serves as an extension of the paper prototype. It should be complete in function only.

2.4 Usability for Alpha 1

Upon completion of a working computer draft, an in-house "game breaking" session should take place. This should be a one-day session involving the researchers and other individuals responsible for incorporating usability design principles in the game. Include SMEs in this process if the target audience may have problems operating the game or comprehending the material, or if they pose any additional special needs that may be difficult for the researchers to identify on their own. The "game breaking" process simply involves testing the functionality and limitations of the game.
By using members of the organization, testing is relatively fast and affordable compared to recruiting groups of target users and individually directing them through formal testing. In-house sessions do not necessarily call for any administered paper measure, but the evaluators should keep questions from the measures in mind, as well as basic human factors and design principles, as they assess the usability of the game. This procedure allows the development team to detect a number of potential problems that may be missed during playtesting due to limited time and the sometimes inhibited exploration that can come with closely observed behavior. It can also provide additional insight into the problems that testers may encounter, allowing for a deeper understanding of tester responses during user trials. Errors and bugs found should be logged in a bug tracker, and issues determined to be violations of human factors and usability principles, such as difficult controls and fuzzy text, should also be recorded for the programmers to review.

After the in-house testing has been completed, a small usability study with up to five participants pulled from an easily accessible population, such as college students, should be conducted. These participants should all be in good health and sound physical condition. Since this study is not conducted in the target population, the goal of this round of usability testing is to catch the glaring usability issues so that future testing can focus on the subtler, population-specific issues [13]. This phase of testing should take approximately three days to allow time to run participants and collect data. Any required paperwork for participation, such as informed consent, should be completed by testers before interacting with the game. A survey to collect background information should also be administered.
Upon completion of this survey, a brief assessment of topic-specific knowledge should be conducted to provide pretest scores. Each participant should then be exposed to the game independently in order to observe opinions that are uninfluenced and unbiased by other participants.

There are several methods of administering usability testing, each with its own goals. Some provide goals to be completed within the game, to determine whether users can complete a given task when instructed; others involve free exploration of the software, to examine how intuitive and easy to use the program is without any guided instruction. While the participant interacts with the game, an observer records their behavior on paper, particularly when participants are having difficulties, asking questions, or demonstrating strong emotion.

Using a think-aloud protocol can also be informative for understanding exactly how the user approaches the task. It can help identify where their focus is drawn and what tasks they find difficult to carry out. Developed by Ericsson and Simon [14], it involves the tester speaking their thoughts and actions aloud as they interact with the game. This procedure can provide useful internal information about how the user feels or thinks in real time that may not be captured by post-test paper measures. Video recording test sessions can also be very informative and thorough, allowing for excellent future reference, but it can prove very time-consuming


to analyze, especially when the sessions are long [8, 15]. We do not recommend this practice, as a session as brief as a half hour can result in over six hours' worth of transcription and analysis [16]. While there are advantages to think-aloud studies, there are also several disadvantages, including interference with play due to increased cognitive load, potentially hard-to-interpret and unquantifiable data, and the threat of altering the flow of the experience. This detrimental phenomenon can be especially apparent when trying to assess engagement and enjoyment, as the talking task disrupts the game-playing experience [16]. Given the positive and negative aspects of each testing method, it may be beneficial to administer a combination of goal-focused testing and the think-aloud protocol. A modified think-aloud study provides task-specific goals that testers must achieve, while instructing participants to speak their thoughts only when they are having difficulty, need to ask for instruction, or encounter something worth commenting upon. This can minimize interference from the think-aloud process while still allowing insight into the thought process of the user.

Learning objectives, like playability, may be difficult to assess this early in the development process. If there are numerous bugs and usability issues, these will greatly detract from the learning outcome by increasing the effort required to operate the game, thereby decreasing the effort and attention available for learning the material. Again, this should be kept in mind when conducting the research and when determining whether to assess learning goals at this stage. If you do decide to assess learning, the learning assessment should be conducted after playing the game. If teaching declarative knowledge is the goal, multiple choice and matching questions are adequate for assessing learning outcomes.
If, however, the learning goals are intended to be dynamic or metacognitive in nature, the assessment format should be free response questions or completion of related transfer tasks. Collecting this data allows for pretest/post-test observations of changes in performance and can help determine whether the learning goals are being achieved during each stage of testing. If there is strong reason to believe that learning goals will be seriously affected by conducting simultaneous usability measures, such as when a large amount of content or highly complex content is covered, additional participants can be recruited who only play the game and complete the learning assessment.

Once the learning assessment is completed, surveys should be distributed to the participants. The length and order of the questionnaires can also prove important when conducting usability research. Very long surveys can diminish participants' patience, resulting in an unwillingness to complete them, especially after long, aggravating sessions of usability trials [10, 11]. When participants have been performing tasks for long periods of time, their responses to questionnaires can be influenced by fatigue or a desire to be done with testing, which may result in inaccurate data and skipped items. To avoid some of these problems, it is important to make the questionnaires as short as possible while still receiving the necessary feedback. When conducting usability testing, we use sections from both the SUS and the QUIS that we have updated and modified for the gaming domain, as well as additional items from various other scales. The SUS was created to provide a quick assessment of usability without taxing the tester for long periods of time [10].
Though it provides a set of validated measures concerning usability, it was not designed with serious games in mind, and thus lacks the greater depth desired when conducting gaming research. The QUIS, on the other hand, is a very extensive usability scale that covers many specific areas of usability with numerous questions. It has been scaled back in length on several occasions, but it too focuses largely on systems usability rather than gaming usability. For that reason, additional enjoyment and engagement questionnaires are combined with these scales in order to collect data on playability.

Depending on the number of bugs and usability issues expected, it may be difficult to collect meaningful data about playability at this stage. Engagement and flow, two subscales associated with playability, can be greatly hampered by usability issues and the presence of bugs. Furthermore, the unfinished quality of the game (the art and sound, for example) may also elicit negative responses. These scales are therefore not of great importance this early on, though they may still provide insight and give a starting score to compare with later builds of the game.

Once all the necessary data has been collected, a one-day process of data entry and analysis should take place. Bugs found by testers should once again be recorded in the bug tracker. Qualitative and quantitative data should be analyzed and included in the report for the programmers, which should outline findings with interpretations, usability issues, and specific suggestions for improvements.

2.5 Build Alpha 2

The programmers need to be provided with the usability report. The researchers' suggestions may not always be practical, and working with developers directly can help the team decide upon simpler yet effective solutions. Programmers are then allotted time to address and fix all of the issues found during testing of the Alpha 1 build.
This version still features place-holding artwork, yet is functionally complete, usable, and free of most major bugs.
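Since the SUS [10] is part of the questionnaire battery used in these testing rounds, its standard scoring is worth sketching: each of the ten items is rated 1-5, odd (positively worded) items contribute their rating minus one, even (negatively worded) items contribute five minus their rating, and the sum is scaled by 2.5 to yield a score from 0 to 100.

```python
# Standard SUS scoring per Brooke [10], shown as a sketch.

def sus_score(ratings):
    """Compute the 0-100 SUS score from ten 1-5 item ratings."""
    if len(ratings) != 10 or not all(1 <= r <= 5 for r in ratings):
        raise ValueError("SUS needs ten ratings on a 1-5 scale")
    # Index 0, 2, 4, ... are items 1, 3, 5, ... (the positively worded ones).
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(ratings))
    return total * 2.5

# A tester who fully agrees with every positive item and fully disagrees
# with every negative item scores the maximum:
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0
```

Note that this scoring assumes the standard SUS item ordering; items rewritten for the gaming domain, as described above, should preserve the alternating positive/negative wording or the reverse-scoring indices must be adjusted accordingly.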


2.6 Usability for Alpha 2

A second round of usability testing is conducted, once again in small groups of about five individuals; however, this time the individuals are pulled from the target audience. This ensures that any population-specific usability issues are addressed. At this stage, learning objectives are also reassessed to ensure that the game is not only usable, but effective. The usability testing proceeds in the same manner as for Alpha 1, with only slight differences. These differences are mostly the additional learning assessment and playability scales, implemented now if they were not incorporated earlier. In terms of interpretation, these scales are far more important during this stage of development. For these reasons, testing with the representative population is desirable, as these scales have more meaning when applied to the desired testing group. Additional focus questions may also be included in the interview portion where more user feedback is desired. Again, a report is prepared and presented to the programmers.

2.7 Build Beta

After all major issues from the report have been corrected, development enters the Beta stage. During this stage, the majority of intense development occurs. For example, all of the art is completed and placed in the game, all menus are finished, and all elements of the whole game experience are included and complete.

2.8 Usability for Beta

This round of usability testing once again draws upon approximately five individuals from the target population to ensure that all issues have been addressed. The process is identical to that of the Alpha 2 usability testing. A final report is prepared with any lingering issues and is provided to the programmers.

2.9 Final Build

Any issues still present from the beta usability study are addressed and a final draft of the game is prepared.
One final round of usability testing is then conducted, taking place solely in-house using researchers and on-staff usability experts. This is to ensure that the final draft is free of any and all bugs, that all issues have been fixed, and that all of the goals of the game have been met. Any last-minute adjustments should be implemented; at this stage, however, the game is ultimately complete.
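The pretest/post-test comparisons collected at each round of testing can be summarized with a gain metric. The normalized (Hake-style) gain below is one reasonable choice for this, an assumption on our part; the procedure itself only calls for pre/post comparison, not a specific statistic.

```python
# Hedged sketch: summarizing pretest vs. post-test learning scores per
# tester using a normalized gain (the fraction of the possible
# improvement actually achieved). Scores below are illustrative.

def normalized_gain(pre, post, max_score=100.0):
    """Return (post - pre) / (max_score - pre); 0.0 if no room to improve."""
    if pre >= max_score:
        return 0.0
    return (post - pre) / (max_score - pre)

pre_scores = [40.0, 55.0, 62.0]
post_scores = [70.0, 73.0, 81.0]

gains = [normalized_gain(p, q) for p, q in zip(pre_scores, post_scores)]
print([round(g, 2) for g in gains])  # [0.5, 0.4, 0.5]
```

Tracking mean gain at each build (Alpha 1, Alpha 2, Beta) makes it easier to see whether usability fixes are translating into improved learning outcomes, rather than relying on raw post-test scores that are confounded by testers' differing baselines.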

3 Conclusion

Usability testing is a critical process for developing effective serious games. With the addition of measures for playability and learning outcomes, it is possible to improve the design process while also ensuring the development of a successful training and learning tool. Our procedure provides structure and measures that are both effective and efficient for the development of well-rounded serious games. The procedure also accounts for the restrictions of small serious game development companies specifically, providing a process that is time-efficient and cost-effective while requiring minimal personnel. We hope to provide the serious games industry with a useful design process to improve the quality and success of its products. By incorporating effective development principles and usability testing, serious game developers can ensure games that are both functional and successful in producing the desired learning outcomes.

References

1. Crawford, C.: Lessons from Computer Game Design. In: Laurel, B. (ed.) The Art of Human-Computer Interface Design, pp. 103-111. Addison-Wesley, Reading (1999)
2. Resnick, H., Sherer, M.: Computerized Games in the Human Services - An Introduction. In: Resnick, H. (ed.) Electronic Tools for Social Work Practice and Education, pp. 5-16. The Haworth Press, Binghamton (1994)
3. Witmer, B.G., Singer, M.J.: Measuring Presence in Virtual Environments: A Presence Questionnaire. Presence: Teleoperators and Virtual Environments 7, 225-240. MIT Press, Cambridge (1998)
4. Jackson, S.A., Eklund, R.C.: The Flow Scales Manual. Fitness Information Technology, Morgantown (2004)
5. Brockmyer, J.H., Fox, C.M., Curtiss, K.A., McBroom, E., Burkhart, K.M., Pidruzny, J.N.: The Development of the Game Engagement Questionnaire: A Measure of Engagement in Video Game-Playing. J. Exp. Soc. Psychol. 45, 624-634 (2009)
6. Dillman, D.A.: Mail and Telephone Surveys: The Total Design Method. Wiley, New York (1978)
7. Sudman, S., Bradburn, N.B.: Asking Questions: A Practical Guide to Questionnaire Design. Jossey-Bass, San Francisco (1982)
8. Shneiderman, B., Plaisant, C.: Evaluating Interface Designs. In: Shneiderman, B., Plaisant, C.: Designing the User Interface, pp. 140-171. Pearson Education, Boston (2005)
9. Medero, S.: Paper Prototyping. A List Apart (January 23, 2007)
10. Brooke, J.: SUS: A Quick and Dirty Usability Scale. In: Jordan, P.W., Thomas, B., Weerdmeester, B.A., McClelland, I.L. (eds.) Usability Evaluation in Industry, pp. 189-194. Taylor & Francis, London (1996)
11. Chin, J.P., Diehl, V.A., Norman, K.L.: Development of an Instrument Measuring User Satisfaction of the Human-Computer Interface. In: SIGCHI '88, pp. 213-218. ACM, New York (1988)
12. Yi, M.Y., Hwang, Y.: Predicting the Use of Web-Based Information Systems: Self-Efficacy, Enjoyment, Learning Goal Orientation, and Technology Acceptance Model. Int. J. Hum.-Comput. Stud. 59, 431-449 (2003)
13. Krug, S.: Don't Make Me Think: A Common-Sense Approach to Web Usability. New Riders, Indianapolis (2006)
14. Ericsson, K.A., Simon, H.A.: Protocol Analysis: Verbal Reports as Data. Bradford Books/MIT Press, Cambridge (1984)
15. Preece, J., Rogers, Y., Sharp, H.: Interaction Design: Beyond Human-Computer Interaction. John Wiley & Sons, New York (2002)
16. Hoonhout, H.C.M.: Let the Game Tester Do the Talking: Think Aloud and Interviewing to Learn About the Game Experience. In: Isbister, K., Schaffer, N. (eds.) Game Usability: Advice from the Experts for Advancing the Player Experience, pp. 65-77. Morgan Kaufmann, New York (2009)

Full Citation: Olsen, T., Procci, K., & Bowers, C. (2011). Serious games usability testing: How to ensure proper usability, playability, and effectiveness. In A. Marcus (Ed.), Lecture Notes in Computer Science: Vol. 6770. Design, User Experience, and Usability. Theory, Methods, Tools, and Practice, Part II (pp. 625-634). Heidelberg, Germany: Springer. doi: 10.1007/978-3-642-21708-1_70

