PERFORMANCE IN SITU: PRACTICAL APPROACHES TO EVALUATING LEARNING WITHIN GAMES. P.G. Schrader, Ph.D.* Michael McCreery, Ph.D. & David Vallett, Ph.D. Department of Teaching and Learning, UNLV, Las Vegas, Nevada
ABSTRACT The literature is replete with studies that examine learning and performance as a consequence of game play. This venerable line of research has identified numerous outcomes, both positive and negative, and linked them to a myriad of video games and genres. However, the field has yet to actualize the claims of researchers who lauded the educational potential of serious games, as well as games in general. Although there may be a variety of reasons for this shortfall, we offer suggestions for improving and focusing studies with games by incorporating perspectives informed by the human-computer interaction literature. Specifically, we provide a foundation for three steps to improve research within games. First, researchers typically fail to define the context appropriate to the methods, questions, and research overall. This results in a conflation of game genres. Said succinctly: not all games are the same. At a minimum, researchers must account for the designed and emergent affordances while evaluating performance. Second, research is often conducted using more traditional methods and designs, neither of which effectively captures the complexity of the systems in which learners interact. Our approach involves performance data that can be linked to change over time during play. Third, researchers typically fail to account for the totality of learning that takes place within video games. For social and immersive games, evaluation of performance must include variables beyond those that are strictly cognitive, like visuo-spatial navigation, behavioral performance, and social interactions. We present practical approaches to evaluating these, and other, variables. Collectively, this chapter will provide a framework for evaluating learning and performance within games.
Keywords: Games, performance, assessment, evaluation
* Corresponding Author: 4505 S. Maryland Pkwy., Las Vegas, NV 89154; [email protected]
INTRODUCTION Although the fields of education and psychology continue to add to the depth and richness of what we know, one point is clear: learning is a complex endeavor. This has been articulated repeatedly, from a variety of traditions, perspectives, and epistemologies (e.g., Alexander, Bandura, 1997; Bloom, Engelhart, Furst, Hill, & Krathwohl, 1956; Brown, Collins, & Duguid, 1989; Gibson, 1986; Kant, 2008). It follows that a principal objective of evaluating learning is to determine which variables and contexts predict desired outcomes. However, the evaluation of learning is made more difficult by the inferential nature of assessment, measurement, and observation. Further, the increased use of dynamic and interactive contexts, like videogames, has exacerbated issues with predicting outcomes. Fortunately, it has also become evident that these same environments have beneficial roles in the understanding of both the nature of learning and performance, as well as their assessment. This chapter outlines a framework to adapt or align assessment techniques with the constraints and affordances of complex systems, and describes three broad and useful steps when conducting research with or within games. Further, we expand upon traditional practices by integrating a human-computer interaction (HCI) perspective. According to HCI, performance is based on a complex set of constructs that include individual states, traits, and mental representations. In addition to user characteristics, HCI outlines the importance of the physical and virtual environments, including hardware, software, content, and context. As such, this chapter incorporates the importance of user characteristics in terms of learning with and within games (e.g., cognitive, state and trait influenced learning), as well as the notion of technology acceptance and its impact on usage. Further, this chapter will address games’ potential as both a research context and as a means to capture performance data, over time, pursuant to determining outcomes.
HUMAN-COMPUTER INTERACTION Carroll and Campbell (1989) define human-computer interaction (HCI) as a “content-domain taxonomy of psychology” that explores “how human motivation, action, and experience place constraints on the usability of computer(s)” (p. 247). In other words, HCI is a field in which users are considered in tandem with the technology rather than as separate and isolated. A principal goal of HCI is to generate evidence to inform design of hardware, software, and these environments. This is accomplished through rigorous studies of outcomes and performance, studies in which users’ experiences are defined as bidirectional relationships between individual characteristics (i.e., mental models, states, and traits) and elements of design (i.e., hardware, software, content, and context). It is evident from the literature that desired outcomes and design goals are most obviously achieved when the technology becomes invisible. Here, invisible refers to a state in which technology functions as a means to exert users’ intent while their perceptual focus is elsewhere. More importantly, all outcomes are the result of individual differences working in concert with a mental model of the system that includes a clear understanding of constraints and affordances (Staggers & Norcio, 1993). By extension, systems become “easy to learn, facilitate skills transfer, and encourage the development of expertise in only short exposure times” when users interact with and within them (Jih & Reeves, 1992, p. 43). Further, an
optimal system is one in which the users’ interactions are a direct expression of their intended goals; the system provides no barriers or obstacles to their intentions. In the field of video games research, examples of games that approach optimal levels of HCI include the Halo series and Super Mario Run, which seamlessly combine game controls with the user experience. Player intentions are easily expressed. By contrast, examples of poor designs include the unmodified World of Warcraft (WoW) UI and the original Pokémon Go interface. In WoW, a high-level character has access to more than 100 skills, spells, abilities, consumable items, and other resources, while the original interface allowed users to configure only 12 keys. In Pokémon Go, users were forced to employ uninformed guesswork because the map and UI failed to show any meaningful information, contrary to expectations.
States & Traits: Examining the Human in Human Performance The first component of HCI relates to individual characteristics (i.e., states and traits). Chaplin, John, and Goldberg (1988) describe states (e.g., situational self-efficacy, state anxiety) and traits (e.g., cognitive ability, personality) as prototypical exemplars. That is to say, they represent categories of personal characteristics that are predictive of human performance. However, states and traits differ substantially in that states are unstable, often short-lived, and change as a function of external experiences. By contrast, traits are characteristic of the individual, stable, internal, and provide cross-situational consistency (Chaplin et al., 1988; Steyer, Schmitt, & Eid, 1999). Jointly, states and traits provide researchers with the tools to examine in situ factors alongside more enduring qualities, both of which provide important insights into the prediction of outcomes within complex systems. Reviews of the education and psychology literature reveal an abundance of constructs and variables that have been linked to learning and performance outcomes. The vast majority of these constructs are malleable and based on the mindset, or state, of the learner. One pervasive example is self-efficacy. Briefly, self-efficacy is defined as a user’s belief in their ability to successfully complete a task (Bandura, 1994; 1997). This construct has been examined in a myriad of tasks and contexts, including those based in technology (Yi & Hwang, 2003). Overall, the literature has demonstrated that high levels of self-efficacy correspond with engagement in and persistence with tasks. By extension, an improvement in one’s self-efficacy corresponds to improved learning gains. Conversely, diminished self-efficacy corresponds with diminished gains. Numerous researchers in educational psychology have dedicated their careers to the study of self-efficacy and the potential to manipulate it as a state. In addition to self-efficacy, researchers have also identified other important state-based constructs in relation to performance, like self-regulation, goal adoption and orientation, and metacognition (Azevedo, 2009; Pintrich, 2000; Yi & Hwang, 2003). In addition to these somewhat pervasive and traditional cognitive constructs, other important variables have emerged over the last few years. For example, emotions have been shown to play a primary and pivotal role in cognitive performance. Like other states, emotions may be a hindrance or boon to memory, judgment, motivation, and interpersonal actions (Forgas, 2013). Perhaps counter-intuitively, Forgas demonstrated that negative affect (e.g., sadness) can yield benefits for cognition. Similarly, state anxiety has been found to correspond with heightened attentional processes (Pacheco-Unguetti et al., 2010) and higher
levels of performance during demanding tasks (Yeo et al., 2014). Alternatively, affective states have been found to lead people to commit judgmental mistakes when cognitive strategies are not readily available (Hunsinger & Ray, 2016), while threat-related negative emotional states have been found to drive people toward short-term thinking rather than weighing long-term consequences (Gray, 1999). Concomitant with studies that examine the relationship among states and performance, researchers have examined the roles of individual differences and intractable traits in learning. Some traits (e.g., height, eye color, gender, race/ethnicity) are best examined in relation to how learners are perceived by others. As a result, models of learning that include these traits are indirect and mediated by the social variables and constructs of the encapsulating culture. Alternatively, the literature describes a number of traits that relate to learning outcomes and are not directly perceived by others. For example, spatial ability has been linked to learning in a variety of technological environments, ranging from Internet navigation (Chen & Rada, 1996; Lawless & Schrader, 2008) to science learning (Schrader & Rapp, 2016; Yang, Andre, Greenbowe, & Tibell, 2003). Specifically, a higher level of spatial ability corresponds to a higher capacity for learning in certain types of tasks and environments. Similarly, researchers across disciplines and areas have argued that individual traits are predictive of performance. For example, there is abundant literature on prior knowledge as it relates to performance (McNamara, 2001; McNamara & Kintsch, 1996; Lawless & Schrader, 2008). In terms of personality, McCrae (1996) suggested that while intellectual ability is foundational to children’s learning, intelligence is better defined via curiosity, a facet of the personality trait Openness to Experience, as we age into adulthood. This is due to the fact that much of adulthood is spent applying knowledge and skills acquired throughout childhood in new and different ways. This argument appears to bear out, as openness has been found to correlate positively with critical thinking (Bidjerano & Dai, 2007). Likewise, performance research has shown that students who exhibit higher levels of conscientiousness in conjunction with introversion are more likely to excel academically (Furnham, Chamorro-Premuzic, & McDougall, 2003). A review of the literature will reveal many more variables that fall into either the state or trait category; the examples above are far from exhaustive. However, overarching trends in state-trait research suggest that when the two are examined in conjunction, they provide both a “right now” and “in general” representation of individual differences and their impact on performance (Edmondson et al., 2013, p. 2). Further, examining states and traits collectively allows researchers to acknowledge that ‘‘measurement does not take place in a situational vacuum’’ (Steyer, Schmitt, & Eid, 1999, p. 389) and begins to establish a holistic characterization of process and performance within games.
Dispositions and Beliefs in Human Performance In addition to states and traits, dispositions and beliefs have been linked to performance. Both are more enduring than states, while less permanent than traits. In this way, dispositions and beliefs function as an important component of HCI research. As we have seen, research has demonstrated that both situational and cross-situational factors impact performance. However, any technology-mediated performance requires the user to interact with hardware (e.g., mouse, tablet), software (e.g., graphical user interface [GUI]), and the content of the software. The recent resurgence of Virtual Reality (VR) provides several good
examples of each working in concert. Users must don goggles, hold some form of controller, execute commands in a very novel environment, and then interact within that environment. The capability of users to interact with VR relies on the hardware and GUI, as well as the software that merges the two. Although there are a multitude of belief and disposition constructs in the literature, the success of complex interactions, like video games, hinges on two specific user beliefs: perceived usefulness and perceived ease of use (Davis, 1989; King & He, 2006). Specifically, perceived usefulness appears to be the strongest predictor of usage behavior, while perceived ease of use is a mediating variable that influences behavioral outcomes through its interaction with perceived usefulness (King & He, 2006). In other words, people perceive technology hardware, software, and content as more useful when they are easy to use (Bourgonjon et al., 2010). This belief system holds true for video games as well.
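To illustrate how these belief constructs might be operationalized in an analysis, the sketch below fits a moderated regression of self-reported usage on perceived usefulness and perceived ease of use using simulated survey composites. The variable names and data are hypothetical, and the model is only one plausible way to express the interaction reported by King and He (2006).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical survey data: Likert composites for perceived usefulness (pu),
# perceived ease of use (peou), and self-reported weekly hours of play (usage).
rng = np.random.default_rng(42)
n = 200
pu = rng.normal(5.0, 1.0, n)
peou = rng.normal(4.5, 1.2, n)
usage = 1.5 + 0.8 * pu + 0.3 * peou + 0.2 * pu * peou + rng.normal(0, 1.0, n)
df = pd.DataFrame({"pu": pu, "peou": peou, "usage": usage})

# Usage regressed on both beliefs and their interaction; a reliable pu:peou
# term is consistent with ease of use amplifying the effect of usefulness.
model = smf.ols("usage ~ pu * peou", data=df).fit()
print(model.summary())
```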
STEP 1: DEFINING THE CONTEXT AND FRAME Games and Context McGonigal (2009) provided one of the most encompassing, yet parsimonious, definitions of games to date. This definition is distinct from others because it is descriptive without becoming encumbered by unnecessary qualifiers. McGonigal distills the essence of games into four essential traits: 1) a goal, 2) rules, 3) a feedback system, and 4) voluntary participation. From a design perspective, developers manipulate the way players experience each trait, giving rise to the vast array of games that exist today. From a cognitive perspective, these essential traits are analogous to affordances and constraints that result in dramatically different experiences for users. Regardless of the perspective, the results vary widely, from single-player games with no clear victory conditions (i.e., play until you fail, like Centipede or Donkey Kong) to games designed to immerse players in social aspects of learning (e.g., Quest Atlantis or Whyville; see Gentile et al., 2009; Narvaez, Mattan, MacMichael, & Squillace, 2008). Unfortunately, there is a prevailing tendency among researchers to conflate games and/or types of games when describing trends in research while simultaneously overspecifying details and qualifiers in individual studies. The former issue does little to advance collective understanding, while the latter causes those findings to become too compartmentalized and similarly inadequate in developing knowledge in a broader sense. Precious few researchers address the nuanced differences among games, the affordances that distinguish one game from another, or the salient properties across genres in a way that benefits research and meaningful assessment of learning. Instead, researchers often use the overly simple term “videogames” to refer to the broad category of immersive experiences like storyboard side-scrollers (e.g., Super Mario Brothers), interactive comic books (e.g., King’s Quest 2015), or first-person shooters (e.g., Halo). In this way, the term “videogames” lacks the meaning and specificity that researchers would find useful. Expanding on McGonigal’s description of essential traits, we assert that games are complex systems defined by their affordances and constraints. Further, these systems exist as a context and within a broader context. For example, an immersive space like WoW is a game and an environment that exists persistently, independent of the lived-in world. However,
this game exists in a rich, encapsulating culture, filled with machinima, themed conventions, and real-world relationships. Similarly, we maintain that this perspective applies to all games, even single-player games (e.g., games from the Mario franchise). Although the human-computer-human interactions are limited in these examples, they are also complex systems and contexts. While these ideas may appear trite and obvious, they are typically overlooked when it comes to the evaluation of learning and performance. However, they bear important implications for the assessment of performance, particularly in terms of defining the theoretical and contextual frames as they relate to data (Schrader & McCreery, 2012).
Theoretical and Contextual Frames In the previous section, we defined games as a system and context. By extension, these systems and contexts appear in research under a variety of theoretical frames. From the literature on video games, it is evident that researchers adopt (implicitly and explicitly) one of three theoretical/contextual frames. Specifically, researchers examine learning from, with, and within technologies like video games (Jonassen, Campbell, & Davidson, 1994; Schrader, 2007; Salomon, Perkins, & Globerson, 1991). More importantly, each perspective corresponds with particular types of questions, methods, and data. Further, the perspectives vary in terms of how each accounts for the very nature of human-computer interactions. The learning-from-technologies perspective aligns with an outcomes orientation toward research. Here, games provide a means of stimulus, intervention, or information. Research questions focus on outcomes and change, and there are numerous examples of research from this paradigm. Unfortunately, this perspective works with games that are extraordinarily constrained and simplified (e.g., a PPT quiz that has a score; see Adams, Mayer, McNamara, Koenig, & Wainess, 2012). It bears stating that this chapter is not directed toward expanding the already numerous ways to evaluate outcomes under the auspice of learning from games. In fact, given the complexity and importance of natural contexts, some may even question the value of findings related to these contrived situations. The games that are suitable for this approach are nuanced and fit very small niches. We do not consider these types of games broadly applicable or very interesting. Alternatively, learning with and within games are perspectives that typically account for the affordances associated with games from a more complex, systems perspective (Barab, Gresalfi, & Ingram-Goble, 2010; Squire, 2006). These theoretical and contextual frames correspond with cognitive and behavioral partnerships with the technology (Salomon et al., 1991). The immersive frame includes human-computer-human interactions, as well (McCreery et al., 2015; Schrader, 2007). From either frame, researchers adopt questions associated with process and interactions. Research from this perspective rarely focuses on outcomes alone. Rather, research may include outcomes, but also attempts to account for the numerous intervening variables in learning from a holistic, dynamic, and encompassing point of view. From these perspectives, humans interact with games on a deep, immersive, and dynamic level. In multiplayer situations, players interact with the systems and with other players in ways that are as meaningful and legitimate as the lived-in world. It is our argument that any research associated with games should endeavor to quantify the myriad interactions with and within these contexts. Games are complex systems and reflect a unique combination of affordances and constraints, not easily distilled into a single experience.
Fortunately, the very nature of games and these views corresponds with methods that include performance data, available from the contexts themselves.
STEP 2: CONSIDERING GAMES AS PERFORMANCE DATA One of the consequences of a complex systems view is that game play serves as a rich source of data related to performance and outcomes. More importantly, there are a variety of variables that can only be characterized through performance data. For example, numerous researchers have described games as an opportunity for trial-and-error behavior (Gee, 2003; Squire, 2006). Through performance data, we are able to identify valuable events in games like soft failure (i.e., a temporary interruption of success) that make trial-and-error possible (Laughlin & Marchuk, 2005; Vallett, 2015). In this way, soft failure provides feedback and an opportunity for the agent to adjust their actions, aligning them with their goals; in this sense, it functions like a tuning mechanism. Players attempt an action, fail, adjust their behavior, and re-attempt the action. Although traditional methods of assessment may document a binary outcome (success or failure), they do not capture the attunement or maturation process. Similarly, outcome measures are ill equipped to inform design elements of games (i.e., the constraints and affordances). Alternatively, the performance metrics and methods outlined below are intended to capture these data, in situ, and as they occur.
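As a minimal sketch of how soft failure might be detected in performance data, the code below scans a hypothetical event log for repeated attempts at the same task and reports how many tries preceded each success. The event format (a task identifier paired with a success flag) is assumed for illustration and is not drawn from any particular game.

```python
from collections import defaultdict

# Hypothetical in-game event log: (timestamp_seconds, task_id, succeeded).
events = [
    (12, "jump_gap", False), (19, "jump_gap", False), (27, "jump_gap", True),
    (40, "unlock_door", False), (55, "unlock_door", True),
    (70, "boss_fight", False),
]

# Count consecutive failed attempts (soft failures) preceding each success.
attempts = defaultdict(int)
for _, task, succeeded in events:
    attempts[task] += 1
    if succeeded:
        print(f"{task}: succeeded on attempt {attempts[task]} "
              f"after {attempts[task] - 1} soft failure(s)")
        attempts[task] = 0

# Tasks never completed in this session remain as open failure counts.
for task, count in attempts.items():
    if count:
        print(f"{task}: {count} attempt(s), no success recorded")
```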
User Trace Data Access to user trace data has served researchers for decades. Early research leveraged server log files (a.k.a. dribble files), which are generated as users interact with systems. For example, webservers capture the number of times a page is accessed, the time it was accessed, and the Internet Protocol (IP) address used to access it. Using these data, researchers have drawn inferences from the number of files accessed, the duration of access, and the patterns of access (Lawless & Kulikowich, 1996; Lawless, Schrader, & Mayall, 2007; Schrader & Lawless, 2007). This has the advantage of serving as an objective record of action in a system, from which performance may be inferred. However, Schrader and Lawless (2007) describe numerous challenges associated with deriving meaning from these data. In open systems like the WWW, there is no guarantee that a long page access corresponds with reading that page. Often, users have multiple windows open. Eliminating this possibility by controlling the research environment corresponds with an unfortunate decrease in task authenticity. Further, what do the patterns or the number of hyperlink clicks actually mean? A more serious challenge comes from the fact that, as game systems increase in complexity, the number of recorded interactions rises. By consequence, dribble files become massive and overwhelming. As a result, more advanced data mining techniques are required to use these files as sources of performance data. Ultimately, the ease of acquiring these datasets is somewhat offset by their inferential nature and increasing complexity. Another example of user trace data involves recording user input (e.g., keylogging or mouse tracking). One obvious advantage is that researchers do not need access to server data, making authentic research more viable. For example, very few individuals outside Verant or Sony Online Entertainment (the companies responsible for the EverQuest series) may have
access to the server files of EverQuest I and II. By contrast, all researchers are one download away from authentic user input data (i.e., keylogging), making research within this virtual world much more practical. Unlike dribble files, which are almost always linked to server content in some way, user input data are often decontextualized. This presents the challenge of aligning what is recorded as input with what happened in the virtual world. There is little value in knowing which strings of characters were typed if the researcher does not know what happened in the environment as a result. As a result, researchers must take special care to create or record a link between input and outcome within the virtual space. While powerful in their ability to demonstrate the human portion of the HCI, keylogging and other input data are only appropriate in conjunction with the actualization of users’ intent within the system. Consequently, a record of the actualized behaviors is appropriate and advised.
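The following sketch illustrates the kind of inference drawn from trace data: it parses a hypothetical dribble-file format (timestamp, user, page) and treats the gap between successive requests as an estimate of viewing time. The log format is an assumption, and, as noted above, a long gap does not guarantee that the user was attending to the page.

```python
from datetime import datetime

# Hypothetical dribble-file lines: ISO timestamp, user id, page accessed.
log_lines = [
    "2016-11-01T10:00:05 user42 /quests/intro",
    "2016-11-01T10:01:40 user42 /quests/intro/step2",
    "2016-11-01T10:04:10 user42 /map",
]

# Parse each line, then infer dwell time as the gap to the next request.
records = []
for line in log_lines:
    stamp, user, page = line.split()
    records.append((datetime.fromisoformat(stamp), user, page))

for (t0, user, page), (t1, _, _) in zip(records, records[1:]):
    dwell = (t1 - t0).total_seconds()
    print(f"{user} viewed {page} for ~{dwell:.0f} s (inferred, not observed)")
```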
Direct Observation Originally described in literature pertaining to classroom observations, the direct observation method has been used more recently in dynamic, evolving contexts like WoW (McCreery, Vallett, & Clark, 2015). In the broadest sense, direct observation is a process that logs behavior occurring during a pre-set duration or experience. The greatest hurdle is defining the behaviors and drawing meaning from those behaviors. In the example of McCreery and colleagues (McCreery, Krach, Schrader, & Boone, 2012; McCreery, Schrader, & Krach, 2011; McCreery et al., 2015), a matrix was developed in advance of coding the data. Specifically, the literature on personality and behavior was examined in light of events that a viewer might witness in WoW. As a result, directly observable behaviors, like communicating with a friend or navigating the environment after using a map, served as evidence of actualized personality within the virtual space. These potential behaviors were compiled into the Behavioral Assessment Matrix (BAM), which served as the coding rubric for game play. McCreery, Vallett, and Clark (2015) were careful to describe the key challenges to using this approach. Pragmatically, research of this ilk requires devices that are simultaneously capable of running the game and recording it. Most systems available in research labs on college campuses are not capable of managing the compounded load without issues of lag. However, relatively speaking, technology is only a minor obstacle. The principal issue is labor. Even when using time sequences to segment the observations (e.g., code an observation every 20 seconds for the first ten minutes, the middle ten minutes, and the final ten minutes), there is a significant investment in time and energy to code these sessions. Further, one recording of game play yields hundreds or thousands of entries, each of which must be observed by a human and coded appropriately. Additionally, this process necessarily involves some redundancy for inter-rater validation. When a researcher accounts for the number of sessions that must be coded to obtain statistical power, the issue of labor is compounded. However, for performance data that are observable and suitable for coding in this way, the BAM approach yields rich and valuable data. Although labor is the main issue with direct observation, the benefits include a contextualized account of performance that captures situational issues not explainable using other methods.
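The sketch below illustrates time-sampled behavioral coding in the spirit of the BAM: two hypothetical raters assign one code per 20-second interval, codes are tallied, and Cohen's kappa summarizes chance-corrected agreement. The category labels are invented for illustration and are not the published BAM categories.

```python
from collections import Counter

# Hypothetical codes assigned every 20 seconds by two independent raters.
# Categories are illustrative only (not the actual BAM rubric).
rater_a = ["navigate", "communicate", "navigate", "combat", "idle", "combat"]
rater_b = ["navigate", "communicate", "combat",   "combat", "idle", "combat"]

# Behavior frequencies from rater A (the kind of tally used in analysis).
print(Counter(rater_a))

# Cohen's kappa: observed agreement corrected for chance agreement.
n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
pa, pb = Counter(rater_a), Counter(rater_b)
expected = sum(pa[c] / n * pb[c] / n for c in set(rater_a) | set(rater_b))
kappa = (observed - expected) / (1 - expected)
print(f"Observed agreement: {observed:.2f}, Cohen's kappa: {kappa:.2f}")
```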
Biometrics As agents interact within games, it is vital to remember that a human is interacting with physical devices. By extension, humans exhibit physiological responses to stimuli and events, responses that have been observed using biometric devices (Mirza-Babaei, Long, Foley, & McAllister, 2011). Several devices exist that are intended to capture the physiological response to stimuli. A quick search will reveal no shortage of options, including mice that capture pulse rate, sensors that capture galvanic skin response, bands that measure breathing rate, cameras that record pupil dilation, eye and gaze tracking, and caps that record craniocerebral activity. The principal advantage of using biometric data comes in terms of objectivity. Although data of this sort have been criticized for lacking context, they are more direct and require less inference when compared to other approaches (e.g., survey methods, qualitative inquiry). Using biometrics, researchers have the ability to make more compelling connections between physiological responses and cognition (Gualeni, Janssen, & Calvi, 2012). Most importantly, biometric data are collected in time, as performance occurs. When linked to events within a game, these data provide an excellent resource to evaluate performance. One prime example is the use of eye or gaze tracking in data collection. Eye tracking has been used in a variety of contexts, including literacy, web navigation, and user interface design (Lucignano, Cuendet, Schwendimann, Shrivani Boroujeni, & Dillenbourg, 2014; Molina, Redondo, Lacave, & Ortega, 2014; Zawoyski, Ardoin, & Binder, 2015). Fundamentally, eye tracking measures the gaze of a user as they view material on a screen. In this way, researchers can determine the duration and frequency of gaze as content is presented. One outcome is a heat map that reveals the tendency and duration of gaze through the use of color, where red represents high-frequency gazes and blue represents lower-frequency gazes. From these data and data visualizations, researchers infer attention and concentration associated with screen-based information. Like eye tracking, other biometric tools have excellent potential. However, these research protocols can be complicated. In general, biometric devices range from noninvasive (e.g., heart rate monitoring via a mouse) to highly invasive and potentially uncomfortable for the user (e.g., an electroencephalographic, or EEG, headset). In the latter case, fatigue, headaches, and discomfort must be taken into account during studies. In certain circumstances, biometric devices are not compatible with all users. For EEG headsets, skull size and hair thickness can negatively impact results. For some eye tracking hardware, eye color and race/ethnicity can invalidate results. Although biometric data are objective, they still require considerable inference to determine what the data mean. For example, although it is clear that eye tracking data reveal where the user was gazing and for how long, questions associated with intent, learning, and understanding remain a matter of inference. Beyond design-oriented challenges, there are numerous pragmatic issues to address. Not surprisingly, some devices are incredibly expensive, requiring grants, partnerships, or other support. A few tools require sponsorship from or partnerships with a hospital or medical research center (e.g., functional magnetic resonance imaging [fMRI] or functional near-infrared spectroscopy [fNIRS]).
In many cases, institutional review boards (IRB) will categorize biometric research as medical research. This is a departure from the typical sociobehavioral research conducted by psychologists. At a minimum, this requires different forms and research assurances, resulting in additional planning time well beyond what may be
considered normal for researchers. Regardless of their limitations, biometric data have the potential to offer insight into a variety of questions associated with video games, including those linked to learning, engagement, immersion, activity, and distraction, particularly when biometric data are contextualized within virtual spaces and triangulated with other data (Mirza-Babaei, Long, Foley, & McAllister, 2011).
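As an illustration of how gaze data become a heat map, the sketch below bins hypothetical fixation records (screen coordinates and durations) into a coarse grid using NumPy. Commercial eye-tracking systems supply their own export formats and analysis suites; this only demonstrates the underlying aggregation.

```python
import numpy as np

# Hypothetical fixations: (x_pixel, y_pixel, duration_ms) on a 1920x1080 screen.
fixations = np.array([
    [960, 540, 300], [980, 520, 250], [300, 900, 120],
    [950, 560, 400], [1600, 100, 80],
])

# Accumulate fixation duration into a coarse grid (160 x 120 pixel cells).
heat, x_edges, y_edges = np.histogram2d(
    fixations[:, 0], fixations[:, 1],
    bins=[12, 9], range=[[0, 1920], [0, 1080]],
    weights=fixations[:, 2],
)

# The cell with the longest accumulated gaze approximates the attentional hotspot.
ix, iy = np.unravel_index(np.argmax(heat), heat.shape)
print(f"Hotspot near x={x_edges[ix]:.0f}-{x_edges[ix + 1]:.0f} px, "
      f"y={y_edges[iy]:.0f}-{y_edges[iy + 1]:.0f} px "
      f"({heat[ix, iy]:.0f} ms of fixation)")
```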
Think Aloud Protocols Another means to capture data as users interact with technology comes from literature in cognitive psychology (Ericsson & Simon, 1980; 1993). Specifically, think aloud protocols involve users narrating their actions associated with authentic tasks. This narration may be collected in real time (i.e., concurrently) or after the task is completed (i.e., retrospectively). In terms of technology, think aloud protocols have the advantage of yielding data linked to cognitive processes, which are otherwise difficult to capture. Think aloud protocols have been criticized for a variety of reasons. Considerable training is required to execute the protocols in a way that minimizes threats to validity. Regardless of how well they are designed and executed, concurrent think aloud protocols run the risk of drawing the user’s attention toward the research questions and away from the task and environment. While this is avoided during the retrospective think aloud protocol, considerable fidelity is lost and the rationale to conduct a think aloud becomes much less convincing. Further, think aloud protocols are also limited to the questions that guide the narration. Lastly, formal protocols require that narrations be transcribed and then analyzed. As such, this process can be both laborious and subjective.
Ethnographic Data and Text Interactions Much of this chapter has focused on an implicit goal of direct and objective data that require the least inference possible. However, there is no perfect solution, and it is often useful to consider other methods of performance data collection. Some researchers have applied ethnographic approaches within video games (see Steinkuehler et al., 2010, regarding Lineage I and II). Few protocols yield data as rich as a complete ethnography. However, this type of research is time consuming and laborious. Data collection takes a significant amount of time, and the analysis of those data is also cumbersome. Most importantly, cognitive ethnographies are best suited to contexts about which we have little information. Given all of the work that has been done with video games, this may not be the most judicious approach to performance data collection. Alternatively, researchers have also used the text-based chat and communication inherent to many massively multiplayer online games (MMOs) to capture social data and interactions. For example, researchers have examined games like WoW as vehicles for language acquisition (Zheng, Bischoff, & Gilliland, 2015; Zheng & Newgarden, 2012). In this case, the data function as a record of vocabulary usage and language acquisition within a rich social context. As with most research in authentic environments, a prevailing issue is whether the researcher has access to the text files and records of the interactions within the system.
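As a minimal sketch of treating chat logs as a record of language use, the code below tokenizes hypothetical chat lines and computes word frequencies and a simple type-token ratio. Both the log format and the indices are illustrative; the studies cited above employ far richer analytic frames.

```python
import re
from collections import Counter

# Hypothetical MMO chat log: (speaker, utterance).
chat_log = [
    ("Aria", "need a healer for the dungeon run"),
    ("Brom", "i can heal, invite me to the group"),
    ("Aria", "group invite sent, meet at the dungeon entrance"),
]

# Tokenize utterances and tally vocabulary overall and per speaker.
vocab = Counter()
per_speaker = {}
for speaker, utterance in chat_log:
    tokens = re.findall(r"[a-z']+", utterance.lower())
    vocab.update(tokens)
    per_speaker.setdefault(speaker, Counter()).update(tokens)

# Type-token ratio: distinct words over total words (a crude diversity index).
total = sum(vocab.values())
print(f"Types: {len(vocab)}, tokens: {total}, TTR: {len(vocab) / total:.2f}")
print("Most frequent:", vocab.most_common(3))
for speaker, counts in per_speaker.items():
    print(f"{speaker}: {len(counts)} distinct words")
```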
Back-End Data Finally, a complex but highly effective means of measuring performance in games, with potential for predicting out-of-game performance, exists in the form of computational modeling from back-end user data. Unlike dribble files or log files, which have limited entries based on researcher-defined variables, back-end data refer to all of the variables that were created for the system to function. In this way, back-end user data are highly complex and interrelated. The majority of these files require unique approaches to grappling with the data complexity. This is evident from a proof-of-concept case, in which Lamb, Annetta, Vallett, and Sadler (2014) used data from a Serious Educational Game to model HCI and predict specific task success. In brief, players of Mission BioTech attempt a number of drag-and-drop tasks within the game, which can then be coded as successful (1) or unsuccessful (0). Tasks were treated as if they were items on an instrument, factor analyzed, psychometrically validated, converted to a Q-matrix, and then presented to an artificial neural network for propagation. The resulting attributes allowed measurement and Bayesian prediction of player engagement (stated as ‘flow’), control operations, and science process skills.
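The sketch below gestures at this approach with simulated data: binary task-success records stand in for back-end data, and a small feed-forward network predicts a hypothetical external criterion. It is not the pipeline used by Lamb and colleagues, which involved factor analysis, a Q-matrix, and psychometric validation prior to the neural network stage.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Hypothetical back-end data: each row is a player, each column a logged
# in-game task coded 1 (successful) or 0 (unsuccessful).
rng = np.random.default_rng(0)
task_matrix = rng.integers(0, 2, size=(300, 12))

# Hypothetical external criterion (e.g., passing a later assessment),
# simulated here as depending on a subset of the in-game tasks.
criterion = (task_matrix[:, [1, 4, 7]].sum(axis=1) >= 2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    task_matrix, criterion, test_size=0.25, random_state=0)

# A small feed-forward network stands in for the more elaborate
# diagnostic modeling described above.
net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
net.fit(X_train, y_train)
print(f"Held-out accuracy: {net.score(X_test, y_test):.2f}")
```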
STEP 3: DRAWING INFERENCES FROM COMPLEX DATA SOURCES There are numerous examples that outline methods to draw inferences from data in traditional outcome-based research contexts. Briefly, researchers leverage t-tests, analyses of variance, multivariate time series, or regression analyses to draw inferences about their research questions. Although the previous list is far from complete, there are new challenges when one adopts a view that learning is complex, holistic, and dynamic within immersive environments. From the HCI perspective, procedural data are suitable for numerous analytic techniques beyond those traditionally used in educational research; several are outlined below.
Multiple Regression Analysis There are numerous texts and training materials available that address the nuances of multiple regression. Multiple regression is suitable when researchers endeavor to create a model that predicts an outcome from two or more other variables. This approach is often suitable for research on performance within games, especially when performance data can be coded or aggregated in a relatively simple way. However, video games are interactive and dynamic, often involving multiple users. The number of variables involved, particularly from the HCI perspective, can become unwieldy. As a result, models become highly complicated and alternative approaches may be appropriate.
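A minimal sketch appears below: an aggregated performance score is regressed on one hypothetical trait predictor (spatial ability) and one hypothetical state predictor (situational self-efficacy) using simulated data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical dataset: one trait predictor (spatial ability), one state
# predictor (situational self-efficacy), and an aggregated performance score.
rng = np.random.default_rng(7)
n = 150
spatial = rng.normal(50, 10, n)
efficacy = rng.normal(4, 1, n)
performance = 10 + 0.6 * spatial + 3.0 * efficacy + rng.normal(0, 8, n)
df = pd.DataFrame({"spatial": spatial, "efficacy": efficacy,
                   "performance": performance})

# Multiple regression: does each predictor explain unique variance?
fit = smf.ols("performance ~ spatial + efficacy", data=df).fit()
print(fit.params)        # estimated coefficients
print(fit.rsquared)      # proportion of variance explained
```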
Structural Equation Modeling (SEM) Ideally, there are demonstrable relationships among the variables identified a priori to the gaming experience and the nature of play, especially the decisions made during play. SEM refers to a set of algorithms and approaches that are used to examine patterns in data. For example, confirmatory factor analysis (CFA) is used when researchers theorize that there
are latent structures within the data. CFA is useful when there is reason to believe that a) there exist a limited number of detectable constructs in the data and b) these constructs are meaningful in some way (e.g., predicting performance overall). Alternatively, path analysis extends beyond the structure (i.e., CFA) and models the relationships among variables. In video game research contexts where there are numerous variables, path analysis is useful to establish the direct and indirect effects of exogenous (independent) variables on endogenous (dependent) ones. Path analysis is uniquely equipped to demonstrate which variables, if any, mediate the relationships and in what way. Ultimately, path analysis characterizes the directional dependencies in a system and is suitable for analysis of performance data and associated variables. In terms of game performance, path analysis is useful to determine which variables impact performance and how they do so.
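Because a fully recursive path model can be estimated as a series of regressions, the sketch below fits a deliberately simple trait-state-performance model with hypothetical, simulated variables and decomposes the effect of the trait into direct and indirect components. Dedicated SEM software would ordinarily be used for larger models.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical recursive path model: a trait (openness) influences a state
# (engagement), which in turn influences in-game performance.
rng = np.random.default_rng(3)
n = 200
openness = rng.normal(0, 1, n)
engagement = 0.5 * openness + rng.normal(0, 1, n)
performance = 0.4 * engagement + 0.1 * openness + rng.normal(0, 1, n)
df = pd.DataFrame({"openness": openness, "engagement": engagement,
                   "performance": performance})

# Each endogenous variable gets its own equation in a recursive path model.
m1 = smf.ols("engagement ~ openness", data=df).fit()                # a path
m2 = smf.ols("performance ~ engagement + openness", data=df).fit()  # b, c' paths

a = m1.params["openness"]
b = m2.params["engagement"]
c_prime = m2.params["openness"]
print(f"Indirect effect (a*b): {a * b:.2f}, direct effect (c'): {c_prime:.2f}")
```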
Bayesian Modeling Similar to the path analysis subset of SEM, Bayesian modeling offers an indication of the trajectory and/or connections among events in a system. However, the two statistical methods differ in terms of the nature of relationships among exogenous and endogenous variables. In path analysis, relationships between nodes are identified and designated by directional paths estimated from correlations. By contrast, Bayesian models are defined by probabilistic links. With respect to game performance, when a player starts in a certain condition (a node), they have options to engage in a subsequent behavior (i.e., the next node). Bayesian modeling assigns probabilistic functions to the likelihood of selecting any one of those second nodes as a result of beginning with the first. Said another way, a Bayesian model would quantitatively describe the chance that someone who selected “priest” as their character class will attack the next player they see, a very unlikely scenario. In this way, Bayesian modeling also describes a path, but this path is more akin to a Markov chain (of which the “drunkard’s walk” is a familiar example) than to any kind of correlation.
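The sketch below estimates the probabilistic links described here from hypothetical sequences of in-game states, producing a first-order transition table, i.e., P(next state | current state). The states and sessions are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sequences of in-game states observed across play sessions.
sessions = [
    ["spawn", "explore", "combat", "loot", "explore"],
    ["spawn", "explore", "loot", "combat", "combat"],
    ["spawn", "combat", "explore", "loot"],
]

# Estimate P(next state | current state) from observed transitions,
# i.e., a first-order Markov approximation of play behavior.
transitions = defaultdict(Counter)
for session in sessions:
    for current, nxt in zip(session, session[1:]):
        transitions[current][nxt] += 1

for current, counts in transitions.items():
    total = sum(counts.values())
    probs = {nxt: round(c / total, 2) for nxt, c in counts.items()}
    print(f"P(next | {current}) = {probs}")
```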
Data Mining With the added complexity of data and ability to capture performance data, datasets become massive and unwieldy. If the previous methods become unmanageable, data mining techniques may be necessary. In general, data mining refers to the discovery of patterns and relationships within data or databases through the implementation of algorithms involving statistics and nested, recursive predictions (Fayyad, Piatetsky-Shapiro, & Smyth, 1996). Quite often, data mining examines the frequencies, patterns, correlations, or associations among elements within the data. Alternatively, algorithms may leverage Bayesian statistical techniques as a means to predict relationships in terms of probabilistic likelihood. While data mining is gaining favor in a world of data abundance, there are many challenges to consider. Like other methods, there are issues with labor. Although much of data mining is automated, humans are responsible for identifying likely relationships, establishing algorithms, and pursuing the patterns. These algorithms are often quite complex, involved, and nuanced. Further, there are layers of challenges when bridging data types (e.g., biometric vs. trace data). Additionally, the patterns detected may not yield any practical significance; the output requires secondary analysis. In terms of the automated discovery, there are also some challenges associated with artifacts within the data, treatment of outliers, and noise in the data. Ultimately, researchers
deal with monumental sets of data, and these issues need to be addressed during the execution of knowledge discovery, in concert with subsequent analysis and interpretation. Challenges aside, data mining has the potential to identify, confirm, or discover characterization, discrimination, association, classification, clustering, outliers, and/or trends within the performance data generated within games. However, the approach should be pursued with purpose and care. Numerous guides have been published elsewhere (for example, see Cios, Pedrycz, Swiniarski, & Kurgan, 2007).
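As one small example of the pattern-discovery side of data mining, the sketch below clusters simulated session-level features with k-means to surface prototypical session types. The features, and the claim that two clusters suffice, are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical session-level features mined from back-end or trace data:
# minutes played, tasks attempted, chat messages sent.
rng = np.random.default_rng(1)
sessions = np.vstack([
    rng.normal([30, 10, 2], [5, 3, 1], size=(40, 3)),     # short, quiet sessions
    rng.normal([120, 35, 25], [20, 8, 6], size=(40, 3)),  # long, social sessions
])

# Standardize features, then look for groups of similar sessions.
scaler = StandardScaler().fit(sessions)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=1).fit(
    scaler.transform(sessions))

# Cluster centers (back in original units) describe prototypical session types.
centers = scaler.inverse_transform(kmeans.cluster_centers_)
print(np.round(centers, 1))
```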
SUMMARY AND DISCUSSION Throughout this chapter, we have also asserted the importance of establishing a germane contextual frame for research with and within games. From our theoretical lens, the frame is generated through the detection and exploitation of constraints and affordances that are latent within each system. The HCI literature asserts a partnership among several factors, including states, traits, dispositions and beliefs, and the technology. Although there are several types of questions and associated frames, an HCI perspective compels researchers to consider the technology in terms of a context and not just a tool or intervention. Researchers are also reminded that these data are created through the enactment or actualization of user intent; they are performance and process data. As such, they are unlike traditional outcome measures and should be treated differently. Both process and outcome data can be used to describe learning and performance within games. As some have observed in the past, the very technology that is the focus of study also yields unique opportunities for producing or capturing data. The video game is simultaneously a research context and data collection tool. The methods we describe and the list we provide are not intended to be exhaustive. Rather, we highlight methods that move toward objectivity in data, which is accomplished through direct observation, back-end data, and similar sources mentioned here. Further, we focus on data sources and methods that have not received equal attention in the literature, for whatever reason. In this way, we hope to add to the researchers’ repertoire of data collection strategies and analytic techniques and empower them to engage in more predictive, rigorous studies of video game contexts. Ultimately, this chapter has been devoted to elucidating three steps as part of a framework for evaluating learning and performance within games: 1) define the context and frame, 2) consider games as performance data, 3) draw inferences from complex data. Overall, these steps shape the research questions, the data collected, and the manner in which those data are used to address the questions. Implicit in this discussion is the understanding that there is no single, perfect approach to evaluating performance and outcomes from games. Rather, we have suggested that treating games as complex, dynamic environments demands complex models and approaches to research. Further, triangulating findings while mixing these methods may provide greater insights.
REFERENCES
Adams, D. M., Mayer, R. E., McNamara, A., Koenig, A., & Wainess, R. (2012). Narrative games for learning: Testing the discovery and narrative hypothesis. Journal of Educational Psychology, 104, 235-249.
Azevedo, R. (2009). Theoretical, conceptual, methodological, and instructional issues in research on metacognition and self-regulated learning: A discussion. Metacognition and Learning, 4, 87. doi:10.1007/s11409-009-9035-7
Bandura, A. (1994). Self-efficacy. In V. S. Ramachaudran (Ed.), Encyclopedia of human behavior (Vol. 4, pp. 71-81). New York: Academic Press.
Bandura, A. (1997). Self-efficacy: The exercise of control. New York: W. H. Freeman.
Barab, S. A., Gresalfi, M., & Ingram-Goble, A. (2010). Transformational play: Using games to position person, content, and context. Educational Researcher, 39(7), 525-536.
Bidjerano, T., & Dai, D. (2007). The relationship between the big-five model of personality and self-regulated learning strategies. Learning and Individual Differences, 17, 69-81.
Bloom, B. S., Engelhart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956). Taxonomy of educational objectives. Handbook I: Cognitive domain. New York: David McKay.
Brown, J. S., Collins, A., & Duguid, P. (1989). Situated cognition and the culture of learning. Educational Researcher, 18(1), 32-42.
Carroll, J. M., & Campbell, R. L. (1989). Artifacts as psychological theories: The case of human-computer interaction. Behaviour & Information Technology, 8(4), 247-256.
Chaplin, W. F., John, O. P., & Goldberg, L. R. (1988). Conceptions of states and traits: Dimensional attributes with ideals as prototypes. Journal of Personality and Social Psychology, 54(4), 541-557.
Chen, C., & Rada, R. (1996). Interacting with hypertext: A meta-analysis of experimental studies. Human-Computer Interaction, 11(2), 125-156.
Cios, K. J., Pedrycz, W., Swiniarski, R. W., & Kurgan, L. (2007). Data mining: A knowledge discovery approach. New York, NY: Springer.
Davis, F. D. (1989). Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly, 13(3), 319-340.
Ericsson, K. A., & Simon, H. A. (1980). Verbal reports as data. Psychological Review, 87(3), 215-251.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data (2nd ed.). Cambridge, MA: MIT Press.
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17(3), 37-54.
Forgas, J. P. (2013). Don’t worry, be sad! On the cognitive, motivational, and interpersonal benefits of negative mood. Current Directions in Psychological Science, 22, 225-232. http://dx.doi.org/10.1177/0963721412474458
Furnham, A., Chamorro-Premuzic, T., & McDougall, F. (2003). Personality, cognitive ability, and beliefs about intelligence as predictors of academic performance. Learning and Individual Differences, 14, 49-66.
Gentile, D., Anderson, C., Yukawa, S., Ihori, N., Saleem, M., Ming, L. K., et al. (2009). The effects of prosocial video games on prosocial behaviors: International evidence from correlational, longitudinal, and experimental studies. Personality and Social Psychology Bulletin, 35(6), 752-763.
Gibson, J. J. (1986). The ecological approach to visual perception. Hillsdale, NJ: Erlbaum.
Gray, J. R. (1999). A bias toward short-term thinking in threat-related negative emotional states. Personality and Social Psychology Bulletin, 25(1), 65-75.
Gualeni, S., Janssen, D., & Calvi, L. (2012, May). How psychophysiology can aid the design process of casual games: A tale of stress, facial muscles, and paper beasts. In Proceedings of the International Conference on the Foundations of Digital Games (pp. 149-155). ACM.
Jih, H. J., & Reeves, T. C. (1992). Mental models: A research focus for interactive learning systems. Educational Technology Research and Development, 40(3), 39-53.
Jonassen, D. H., Campbell, J. P., & Davidson, M. E. (1994). Learning with media: Restructuring the debate. Educational Technology Research and Development, 42(2), 31-39.
Kant, I. (2008). Critique of pure reason (M. Weigelt, Trans.). New York, NY: Penguin Classics.
Lamb, R., Annetta, L., Vallett, D., & Sadler, T. (2014). Cognitive diagnostic approaches using neural network analysis of serious educational videogames. Computers & Education, 70, 92-104. doi:10.1016/j.compedu.2013.08.008
Lawless, K. A., Brown, S. W., & Cartter, M. (1997). Applying educational psychology and instructional technology to health care issues: Combating Lyme disease. International Journal of Instructional Media, 24(2), 287-297.
Lawless, K. A., & Schrader, P. G. (2008). Where do we go now? Understanding research on navigation in complex digital environments. In J. Coiro, M. Knobel, C. Lankshear, & D. J. Leu (Eds.), Handbook of new literacies (pp. 267-296). Hillsdale, NJ: Lawrence Erlbaum Associates.
Lawless, K. A., Schrader, P. G., & Mayall, H. J. (2007). Acquisition of information online: Knowledge, navigational strategy and learning outcomes. Journal of Literacy Research, 30(3), 289-306.
Lucignano, L., Cuendet, S., Schwendimann, B., Shrivani Boroujeni, M., & Dillenbourg, P. (2014). My hands or my mouse: Comparing a tangible and graphical user interface using eye-tracking data. In Proceedings of the FabLearn Conference 2014 (No. EPFL-CONF-209011).
McCrae, R. R. (1996). Social consequences of experiential openness. Psychological Bulletin, 120, 323-337.
McCreery, M. P., Krach, S. K., Schrader, P. G., & Boone, R. (2012). Defining the virtual self: Personality, behavior and the psychology of embodiment. Computers in Human Behavior, 28(3), 976-983.
McCreery, M. P., Schrader, P. G., & Krach, S. K. (2011). Navigating massively multiplayer online games: Evaluating 21st century skills for learning within virtual environments. Journal of Educational Computing Research, 44(4), 473-493.
McCreery, M. P., Vallett, D., & Clark, C. (2015). Social interaction in a virtual environment: Examining socio-spatial interactivity and social presence using behavioral analytics. Computers in Human Behavior, 51, 203-206.
McNamara, D. S. (2001). Reading both high-coherence and low-coherence texts: Effects of text sequence and prior knowledge. Canadian Journal of Experimental Psychology, 55(1), 51-62. http://dx.doi.org/10.1037/h0087352
McNamara, D. S., & Kintsch, W. (1996). Learning from texts: Effects of prior knowledge and text coherence. Discourse Processes, 22(3), 247-288.
Mirza-Babaei, P., Long, S., Foley, E., & McAllister, G. (2011, September). Understanding the contribution of biometrics to games user research. In Proceedings of DiGRA 2011 (pp. 329-347).
Molina, A. I., Redondo, M. A., Lacave, C., & Ortega, M. (2014). Assessing the effectiveness of new devices for accessing learning materials: An empirical analysis based on eye tracking and learner subjective perception. Computers in Human Behavior, 31, 475-490.
Narvaez, D., Mattan, B., MacMichael, C., & Squillace, M. (2008). Kill bandits, collect gold or save the dying: The effects of playing a prosocial video game. Media Psychology Review, 1(1). Retrieved December 3, 2016, from http://mprcenter.org/review/narvaez-prosocial-video-game/
Pintrich, P. R. (2000). Multiple goals, multiple pathways: The role of goal orientation in learning and achievement. Journal of Educational Psychology, 92(3), 544-555. http://dx.doi.org/10.1037/0022-0663.92.3.544
Salomon, G., Perkins, D. N., & Globerson, T. (1991). Partners in cognition: Extending human intelligence with intelligent technologies. Educational Researcher, 20, 2-9.
Schrader, P. G., & Lawless, K. A. (2007). Dribble files: Methodologies to evaluate learning and performance in complex environments. Performance Improvement, 46(1), 40-48.
Schrader, P. G., & McCreery, M. P. (2012). Are all games the same? Examining three frameworks for assessing learning from, with, and in games. In D. Ifenthaler, D. Eseryel, & X. Ge (Eds.), Assessment in game-based learning: Foundations, innovations, and perspectives. New York, NY: Springer.
Schrader, P. G., & Rapp, E. E. (2016). Does multimedia theory apply to all students? The impact of multimedia presentations on science learning. Journal of Learning and Teaching in Digital Age, 1(1), 63-73.
Squire, K. (2006). From content to context: Videogames as designed experience. Educational Researcher, 35(8), 19-29.
Staggers, N., & Norcio, A. F. (1993). Mental models: Concepts for human-computer interaction research. International Journal of Man-Machine Studies, 38(4), 587-605.
Steinkuehler, C., King, E., Alagoz, E., Oh, Y., Chu, S., Zhang, B., et al. (2010). Using a designed online games based affinity space as a quasi-natural ethnographic context and experiment lab. In K. Gomez, L. Lyons, & J. Radinsky (Eds.), Learning in the disciplines: Proceedings of the 9th International Conference of the Learning Sciences (ICLS 2010), Volume 2, Short papers, symposia, and selected abstracts (pp. 330-331). Chicago, IL: International Society of the Learning Sciences.
Steyer, R., Schmitt, M., & Eid, M. (1999). Latent state-trait theory and research in personality and individual differences. European Journal of Personality, 13, 389-408.
Yang, E., Andre, T., Greenbowe, T. J., & Tibell, L. (2003). Spatial ability and the impact of visualization/animation on learning electrochemistry. International Journal of Science Education, 25(3), 329-349.
Yi, M. Y., & Hwang, Y. (2003). Predicting the use of web-based information systems: Self-efficacy, enjoyment, learning goal orientation, and the technology acceptance model. International Journal of Human-Computer Studies, 59, 431-449.
Zawoyski, A. M., Ardoin, S. P., & Binder, K. S. (2015). Using eye tracking to observe differential effects of repeated readings for second-grade students as a function of achievement level. Reading Research Quarterly, 50(2), 171-184.
Zheng, D., Bischoff, M., & Gilliland, B. (2015). Vocabulary learning in massively multiplayer online games: Context and action before words. Educational Technology Research & Development, 63(5), 771-790.
Zheng, D., & Newgarden, K. (2012). Rethinking language learning: Virtual World as a catalyst for change. The International Journal of Learning and Media, 3(2), 13-36.