HUMAN–COMPUTER INTERACTION, 2013, Volume 28, pp. 161–191
Copyright © 2013 Taylor & Francis Group, LLC
ISSN: 0737-0024 print / 1532-7051 online
DOI: 10.1080/07370024.2012.697007

Tests of Concepts About Different Kinds of Minds: Predictions About the Behavior of Computers, Robots, and People

Daniel T. Levin, Stephen S. Killingsworth, Megan M. Saylor, Stephen M. Gordon, and Kazuhiko Kawamura
Vanderbilt University

This research investigates adults’ understanding of differences in the basic nature of intelligence exhibited by humans and by machines such as computers and robots. We tested these intuitions by asking participants to make predictions about the behaviors of different entities in situations where actions could be based on either goal-directed intentional thought or more mechanical nonintentional thought. Across several studies, adults made more intentional predictions about the behavior of humans than about the behavior of robots or computers. Although initial experiments demonstrated that participants made very similar predictions for computers and anthropomorphic robots, when asked to track robots’ attention to objects, participants began to predict more intentional behaviors for the robot. A multiple regression demonstrated that differential behavioral predictions about mechanical and human entities were associated with ratings of goal understanding but not overall intelligence of current computers/robots. These findings suggest that people differentiate humans and computers along the lines of intentionality but initially equate robots and computers. However, the tendency to equate computers and robots can be at least partially overridden when attention is focused on robots engaging in intentional behavior.

Daniel Levin is a psychologist with an interest in the relationship between knowledge and vision; he is a Professor in the Department of Psychology and Human Development at Vanderbilt University. Stephen Killingsworth is currently a post-doctoral fellow in the Department of Teaching and Learning with an interest in visual cognition and human–computer interaction. Megan Saylor is a developmental psychologist with an interest in language and conceptual development; she is an Associate Professor in the Department of Psychology and Human Development at Vanderbilt University. Stephen Gordon is a graduate student in Electrical and Computer Engineering with an interest in cognitive modeling and AI. Kazuhiko Kawamura is an engineer with interests in cognitive modeling and robotics; he is a Professor in the Department of Electrical and Computer Engineering at Vanderbilt University.

CONTENTS

1. INTRODUCTION
2. EXPERIMENTS 1 AND 2: CONTRASTING BEHAVIORAL PREDICTIONS FOR HUMANS, ROBOTS, AND COMPUTERS
   2.1. Method
   2.2. Results
   2.3. Discussion
   2.4. Experiment 2 Background
   2.5. Method
   2.6. Results
   2.7. Analysis of Experiments 1 and 2
   2.8. Discussion of Experiments 1 and 2
3. EXPERIMENT 3: THE IMPACT OF OBSERVING APPARENTLY INTENTIONAL BEHAVIOR ON ATTRIBUTIONS OF AGENCY TO A ROBOT
   3.1. Method
   3.2. Results
   3.3. Discussion
4. EXPERIMENT 4: REPLICATION AND EXTENSION OF THE IMPACT OF OBSERVING INTENTIONAL BEHAVIOR ON ATTRIBUTIONS ABOUT A ROBOT
   4.1. Method
   4.2. Results
   4.3. Discussion
5. EXPERIMENT 5: REGRESSION ANALYSIS OF SURVEY RESPONSES
   5.1. Method
   5.2. Discussion
6. GENERAL DISCUSSION

1. INTRODUCTION

As people interact with an increasing variety of intelligent technologies, they are faced with an array of machines that are in some ways very similar to people and in other ways very different. This makes it important to understand how people construe the internal processes of artificial minds. Previous research has explored how participants attribute specific knowledge to intelligent artifacts such as robots (Lee, Kiesler, Lau, & Chiu, 2005) and how people are more likely to interact with robots that produce social cues (Bruce, Nourbakhsh, & Simmons, 2002). However, little research has explored people's more general attributions about the style of thinking that characterizes computer versus human intelligence. Thus, this article represents the folk-psychological equivalent of the long-standing philosophical debate between those who argue that artificial thinking is fundamentally different from human intelligence (e.g., Searle, 1984) and those who argue that nothing fundamental differentiates the two (e.g., Minsky, 1982). However, instead of asking about the ultimate nature of different kinds of representation and intelligence, we ask how normal adults construe
these different kinds of intelligence, based on their experience interacting with people and intelligent artifacts. In exploring adults' concepts about agents, we have several specific goals. First, we aim to develop an effective means of assessing concepts about agents that does not rely on ratings using ill-defined terms, or on automatic behavior that might not be related to explicit beliefs. Second, we use this measure to test the degree to which people believe that different agents (humans, robots, and computers) think in fundamentally different ways. In particular, we test the degree to which participants are less likely to attribute intentional (e.g., goal-directed) thought to mechanical agents such as computers and robots than to people. Third, we test the degree to which experience with agents can alter people's concepts about how agents might think. In two experiments, we test whether observing a robot engage in characteristically intentional behavior leads participants to attribute more intentional thinking to it. Although everyone clearly knows that living and artificial agents such as people and computers are different in many ways, some research suggests that participants perceive marked commonality between computers and people, inasmuch as participants tend to treat computers as social agents. For example, in a set of well-known studies, Reeves and Nass (1996) found that people obey interpersonal social norms such as reciprocity when interacting with computers, and are even hesitant to criticize a computer ‘‘to its face.’’ Other recent research has demonstrated that people readily anthropomorphize even simple mechanical devices. For example, Epley, Akalis, Waytz, and Cacioppo (2008) observed that participants are more likely to rate each of a range of devices as having ‘‘a mind of its own,’’ ‘‘intentions,’’ ‘‘free will,’’ ‘‘consciousness,’’ and emotions when they are experiencing loneliness. Findings such as these led Epley to develop a broad three-factor theory of situational and motivational factors modulating anthropomorphism (Epley, Waytz, & Cacioppo, 2007). Under this theory, people are more likely to anthropomorphize agents when anthropocentric knowledge is accessible, when they are motivated to understand the behavior of the agent, and when they desire social contact. Similarly, Barrett and Keil (1996) demonstrated that participants anthropomorphized two nonhuman entities, God and a fictitious ‘‘massively parallel’’ computer, by implicitly assuming they are spatiotemporally limited, can focus their attention on only one thing at a time, and can perform only one task at a time. Findings such as these suggest a relatively broad drive to anthropomorphize a range of agents. However, a contrasting tradition of research suggests that people strongly distinguish between human, intentional thinking and more mechanical thinking. Research in cognitive development suggests that by the end of the 1st year of life, infants begin to distinguish goal-directed intentional action from actions apparently produced by a nonhuman, or from actions that do not reflect coherent goals (Gergely, Nadasdy, Csibra, & Biro, 1994; Meltzoff, 1995; Woodward, 1998). Infants also have different expectations about the movement of people and the movement of inanimate objects (Kuhlmeier, Bloom, & Wynn, 2005; Spelke, Phillips, & Woodward, 1995) and can rely upon both static features (e.g., a face) and contingent interaction to follow the gaze of an apparently intentional agent (Johnson, Slaughter, &
Carey, 1998). Many researchers have argued that early contrasts such as these become elaborated into a theory that allows older children to understand the representations underlying others' behavior (Gopnik, Slaughter, & Meltzoff, 1994; Woodward, 2005). As children develop this representational ‘‘theory of mind’’ (TOM), they learn that a person's representations are not direct or veridical copies of the world but rather are meaningful interpretations of the world that are intimately related to that person's beliefs, desires, goals, and experiences (for a review, see Wellman, Cross, & Watson, 2001; Wimmer & Perner, 1983). One might interpret this developmental research to suggest that an early-developing TOM is elaborated into a more complete and accurate adult knowledge base that effectively details different causal mechanisms of thought that characterize different agents. However, the anthropomorphism findings just reviewed imply that adults' generalization of these principles may be quite broad, and other recent research has questioned the degree to which adults deploy an effective understanding of the representations involved in everyday tasks. For example, the well-known false belief task reveals how children as young as 4 years understand that changes to the world that one does not witness will make one's representation of the world out-of-date, thereby leading to a false belief (e.g., see Wimmer & Perner, 1983). However, even adults do not fully appreciate the range of circumstances under which their representations will become ‘‘out-of-date.’’ For example, large proportions of adults incorrectly predict that they will see visual changes to scenes, even when those changes occur out of their view and across a substantial delay (Levin, Drivdahl, Momen, & Beck, 2002). This suggests that adults' best representational understanding can be overwhelmed by situation-specific demands and constraints (see also Barr & Keysar, 2005; Birch & Bloom, 2007; Epley et al., 2007). In the case of visual change detection, the immediacy of vision and the ease with which visual features are accessed in meaningful environments may lead people not only to overestimate the proportion of visual features they have represented (Levin & Beck, 2004; Rensink, 2000) but also to misconstrue more basic aspects of vision, such as the fact that vision results exclusively from light entering the eye (for a review, see Winer & Cottrell, 1996). Although infants clearly distinguish intentional and nonintentional agents, the findings just presented demonstrate that adults' intuitions about different types of agents are not necessarily a straightforward and error-free elaboration of this basic contrast. It is also possible that some of these early distinctions remain effective, but only for the purposes of simple classification and perceptual identification (see, e.g., Gao, Newman, & Scholl, 2009). Beyond this, adults may anthropomorphize machines, but they do this only when they fail to directly consider their capacities, and therefore respond ‘‘as if’’ a mechanical agent is human while holding no particular commitment as to the specific way the machine might be ‘‘thinking.’’ One aspect of previous research that supports this possible lack of commitment is that many of the previous findings rely on judgments that machines have goals, are intentional, and have ‘‘minds of their own.’’ Thus, the previous research may have failed to reveal beliefs that adults hold about fundamental differences in the kinds of thinking inherent
to different kinds of intelligent system. To explore these beliefs, in this article we ask several specific questions. The most basic is whether adults hold any clear beliefs that mechanical agents engage in the same kind of intentional thinking as do people. Furthermore, if there are clear differences in adults' beliefs about humans versus mechanical agents, it is not clear how specific surface and behavioral properties of agents will affect adults' intentional attributions. We therefore examine how ascribed intentionality can be influenced by such properties. Research exploring infants' attributions of agency has generally demonstrated that interactivity is a key element in leading children to treat an agent as intentional (for review, see Arita, Hiraki, Kanda, & Ishiguro, 2005), but only a few studies have asked what factors affect adults' attributions about the agency of different entities. So, adults may or may not be similar to infants in requiring interaction with an agent before attributing intentionality to it, and previous research has focused more on situational and motivational factors than on systematically manipulating specific features of the entities themselves. Some exceptions to this general rule include research asking whether the country of origin of a robot affects users' expectations about the robot's culture-specific knowledge (Lee et al., 2005), and a few studies have asked broader questions about the nature of thinking in different entities. For example, Gray, Gray, and Wegner (2007) asked participants to rate the similarity of a range of entities on 18 different mental attributes and argued that participants reliably differentiate entities on the basic dimensions of experience and agency (see also Morewedge, Preston, & Wegner, 2007). Although the agency dimension explained far less variance than the experience dimension, it is important to note that these dimensions were interpretations of latent variables resulting from a principal components analysis of similarity ratings on a wide range of rated dimensions. Accordingly, the dimensions observed by Gray et al. may, or may not, directly reflect the concept of attributed intentionality that we are attempting to isolate. In related work, Haslam (2006) argued that people define humanness (and consequently engage in dehumanization) in two complementary ways. First, they contrast a fully human entity with animals, and second, they contrast humanness with machines. Clearly, the availability of both an agency dimension in the Gray work and a human–machine contrast in the Haslam work suggests that people will differentiate the agency of people, robots, and computers. However, none of these studies have explored the degree to which these concepts can be altered by new information about an entity. Therefore, in the present work, we sought to manipulate the apparent behavior of an entity (a robot) to determine the degree to which this experience could increase adults' explicit judgments of intentionality. In addition, we sought to place judgments about intentionality and the goal-directedness of behavior on firmer ground by linking them directly with predictions about the behavior of different entities in specific situations.
As just reviewed, previous research exploring explicit beliefs about the intentionality of different agents has relied upon participants to rate intentions, free will, and consciousness directly (Epley, Akalis, Waytz, & Cacioppo, 2008; Gray et al., 2007; Morewedge et al., 2007), or has reported participant comments about entities such as robots (Kanda, Hirano, Eaton,
& Ishiguro, 2003), without assessing whether these complex ideas are associated with expectations about specific observable behaviors that any given entity might exhibit. We therefore investigated adults' understanding of different representational systems by asking participants to make predictions about the behavior of these systems in situations that pit intentional cognitions against more mechanical, nonintentional cognitions. To do this, we created a series of illustrated scenarios in which participants imagined observing a computer, a person, and a robot either engage in an ambiguous behavior or encounter a situation that could be approached in different ways. Participants were asked to predict what each entity would do. We included the person and the computer to test for a basic contrast between intentional human thinking and mechanical thinking and included the robot to test the degree to which an anthropomorphic form and/or behavior would induce adults to make more intentional responses. Three of the scenarios were based on Woodward's (1998) hypothesis that object-directed behavior is important for intentional reasoning. In the scenario most similar to those used by Woodward, participants were told that the entity had directed an action toward one of two objects, each in its own salient location. The objects were then switched, and participants were asked which object the entity would act upon now. If participants believe that the action prior to the location switch was goal directed, they should predict that the entity would act on the same object in its new location, whereas if they believe the action was location directed, participants would predict that the agent would act on the new object in the old location. In addition to the object/location scenarios, we included a scenario testing another possible basic contrast between intentional and mechanical action. In this scenario, we asked whether participants would presume that a human or mechanical system would categorize objects based on functional/taxonomic categories or based on the most salient perceptual features of the object. This contrast is fundamental in conceptual development, and many researchers have explored the process by which children transition from using perceptual features for grouping to a deeper knowledge of the utility of taxonomic categories (e.g., see Deak & Bauer, 1996). An important part of TOM is that an individual can make a clear link between an internal representation and a meaningful real-world object category. This is central to the link between TOM and word learning, especially for artifact categories that are organized around human goals (Bloom, 1997). We predicted that participants would choose functional/taxonomic categories for the person and perceptual feature-based categories for the machines. In the rest of this article, we refer to responses in which entities select objects/goals over locations and responses choosing taxonomic/functional categories over perceptual feature-based categories as ‘‘intentional responding.’’ We note that these responses are not necessarily proof that participants are making intentional attributions online but rather are characteristic outcomes of intentional mental processes. To further support the hypothesis that the contrast between human and machine predictions is related to intentionality, participants completed a questionnaire at the end of the session asking them to rate how effective computers are at understanding
human goals and to rate how intelligent computers are overall. If human–machine differences in the prediction scenarios are related to intentionality, then they should be predicted by individual differences in ratings of machine goal understanding. To allow sufficient power, we present an analysis of these data at the end of this report based on all four experiments.

2. EXPERIMENTS 1 AND 2: CONTRASTING BEHAVIORAL PREDICTIONS FOR HUMANS, ROBOTS, AND COMPUTERS

2.1. Method

Participants. Fifteen participants (10 female; M age = 19.5) completed Experiment 1. Participants were undergraduate students at Vanderbilt University who received course credit in exchange for participating.

Materials. Participants read general directions about three entities and then responded to a series of five scenarios asking them to offer a prediction about the behavior of each entity, or to judge which of two instructions would be best for a specific task. The first page of directions informed participants that we were interested in their ‘‘intuitions about three different kinds of things: a person named John, a robot called OSCAR, and a computer system called Yd3’’ and emphasized that ‘‘there are no right or wrong answers—just respond based on your judgment about what each thing will do.’’ This was followed by a note explaining that ‘‘OSCAR can physically grab objects at different locations using his arm and Yd3 has been loaded into a system that can physically lift objects at different locations using a mechanical vacuum device.’’ Following the general directions, the three entities were pictured on separate pages. ‘‘John’’ was illustrated with a picture of a White male with a neutral expression; OSCAR was illustrated with a picture of an anthropomorphic robot with arms, a head, and a body; and Yd3 was illustrated with an LCD computer monitor attached to a keyboard and mouse (see Figure 1). Underneath the illustration of each entity was an agent-appropriate instruction of the following form: ‘‘When making your responses remember that [John, Yd3, OSCAR] is a [Human, Computer System, Robot]; consider what kind of processes characterize a [Human, Computer System, Robot] as opposed to another kind of thing.’’ The scenarios were then presented to the participants. The first scenario (the ‘‘Object vs. Location’’ scenario; see Figure 2) described two trials of a ‘‘reaching exercise’’ in which an entity reached for one of two objects on a grid. Then, the objects' locations were swapped, and the subject was asked whether the entity would reach to the old location (and therefore the new object) or to the new location (and the old object). The intentional response was the reach for the old object at the new location, demonstrating a belief that the entity was engaged in a goal-directed reach to the object and would reach for it again.

FIGURE 1. Initial descriptions of the human, computer, and robot. (Color figure available online.)

When making your responses remember that John is a person. Consider what kind of processes characterize a person as opposed to another kind of thing.

When making your responses remember that Yd3 is a computer system. Consider what kind of processes characterize a computer system as opposed to another kind of thing.

When making your responses remember that OSCAR is a robot. Consider what kind of processes characterize a robot as opposed to another kind of thing.

FIGURE 2. The Switch scenario. (Color figure available online.)

Imagine Yd3, John, and OSCAR are completing a series of three exercises. In both of the first two exercises, you observe each pick up the duck at location A1 as illustrated below.

Before the beginning of the third exercise, the duck and truck are swapped, so that the duck is at location C3, and the truck is at location A1. What will happen?

Question 1a: Will Yd3 select (A) the duck at C3, or (B) the truck at A1?
Question 1b: Will John select (A) the duck at C3, or (B) the truck at A1?
Question 1c: Will OSCAR select (A) the duck at C3, or (B) the truck at A1?

In the ‘‘Feature vs. Category’’ scenario, participants were asked which of the two organizational schemes the entity would use. An array of six objects was pictured first in a disorganized state, then depicted as organized in two different ways. In one organization, the objects were grouped by perceptual similarity (the darker, square objects were grouped), and in another they were organized by semantic category (candy and office supplies). The categorical organization would be putatively characteristic of that used by an intentional system.

In the ‘‘Position vs. Category’’ scenario, participants were told that the entity ‘‘reached’’ for the first, third, and fifth item in a row of seven items, including writing utensils and other similarly shaped objects. The three reached-for items were writing utensils. The sixth item was a marker and the seventh was a screwdriver. The question is whether the system would continue a spatial pattern of reaching and go for the seventh item (the nonintentional response) or continue reaching for writing utensils and reach for the marker at the sixth position. In the ‘‘Name vs. Location’’ scenario, participants were shown a picture of a floppy disk and a red pen and asked whether it would be better to direct the entity to ‘‘lift the red pen’’ (intentional) or to ‘‘lift the object on the left’’ (mechanical response). Finally, in the ‘‘Side vs. Shape’’ scenario, participants were shown two pictures, each representing a trial in a sorting task in which cards with a circle or square on the left- or right-hand side were placed into boxes labeled with a matching shape in the matching location. Then, on the critical trial, participants were asked what the entity would do with a card that matches (a) the shape, but not the location, of the illustration on one box and (b) the location, but not the shape, of the illustration on the other box. The intentional response would be to put the card in the box that matched in shape, not location. Each subject made predictions for each entity in each scenario before continuing to the next scenario. Participants were run individually or in small groups by an experimenter who advanced PowerPoint slides containing the text and illustrations for the scenarios. The order of scenarios was the same for all participants (it followed the order previously described), whereas the order of entities was rotated across participants such that each entity was the first response for one group of participants (the order of entities presented at the beginning of the questionnaire paralleled the order for each scenario). Participants responded on a paper form for each scenario. After they responded to all five scenarios, participants were instructed to go back and briefly justify their responses on unlabelled lines beneath each prediction response. After they completed the individual response justifications, they completed a more general description of how they chose responses for the different systems. Finally, participants completed a brief questionnaire asking them to provide their judgment of the general capabilities of computers ‘‘given the current state of technology.’’ These survey items were completed for all four experiments reported here, and they are described more fully and analyzed for all four experiments in the Results section of Experiment 4. Response justifications were analyzed by first coding them using 10 different categories. These categories were coded independently by two raters, who then resolved disagreements to produce a final set of response classifications. Of the 10 initially developed categories, five were relatively frequent and were analyzed further: Stimulus Association (e.g., statements such as ‘‘A and B go together’’; κ = .59), Location Reference (‘‘X would reach to that spot’’; κ = .72), Perceptual Salience (‘‘Color stands out more’’; κ = .57), Knowledge (‘‘X would/would not know what it is’’; κ = .52), and Programming (‘‘X is programmed to look for Y’’; κ = .74). The other five occurred very rarely (in less than 6% of responses) and were usually associated with low interrater reliabilities, so they are not discussed further (these included Cue Focus: ‘‘X would pay attention to …’’; Importance to Entity: ‘‘X would need it’’; Idiosyncratic Focus: ‘‘A guy would want a screwdriver’’; Utility-Based Choice: ‘‘X knows that a pen can be used to write’’; Physical/Sense Limitation: ‘‘X would not be able to use/see that’’).
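
For readers who want to see how the reliability values reported above could be computed, the sketch below gives Cohen's kappa for a single coding category from two raters' binary codes. The rater codes are hypothetical placeholders, not the study's data, and kappa is assumed here to be the agreement statistic behind the reported values.

```python
# Minimal sketch: Cohen's kappa for one justification category, computed from two
# raters' binary codes (1 = category applied to a response, 0 = not applied).
def cohens_kappa(r1, r2):
    n = len(r1)
    p_observed = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement from each rater's marginal rate of applying the category.
    p1, p2 = sum(r1) / n, sum(r2) / n
    p_chance = p1 * p2 + (1 - p1) * (1 - p2)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical codes for ten responses.
rater1 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
rater2 = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
print(round(cohens_kappa(rater1, rater2), 2))
```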

2.2. Results

The proportions of intentional behavioral predictions across the five scenarios were entered into a three-level, one-factor repeated measures analysis of variance (ANOVA), with entity (human/computer/robot) as the one factor. This produced a significant effect for entity, F(2, 28) = 43.193, MSE = .045, p < .001. Contrasts demonstrate that participants made far more intentional predictions for human behavior (91%) than for robot behavior (28%), F(1, 14) = 69.341, MSE = .042, p < .001, or for computer behavior (28%), F(1, 14) = 57.698, MSE = .051, p < .001 (both ps reflect Bonferroni corrections).

Response Justifications. Differences in response justifications among entities were tested by contrasting the proportion of references to each justification category between the human and the computer, between the human and the robot, and between the robot and the computer. All p values were corrected using the Bonferroni method. Participants gave Stimulus Association responses for 32% of human scenarios, 21% of robot scenarios, and 28% of computer scenarios (ps > .28). They gave significantly fewer Location Reference responses for the human (5%) than for the robot (32%), t(14) = 6.325, p < .001, or the computer (31%), t(14) = 5.551, p = .001. Differences among entities for the Perceptual Salience response were nonsignificant (human = 23%, computer = 33%, robot = 36%), as were differences for the Knowledge response (human = 31%, computer = 30%, robot = 27%). Participants were less likely to mention programming for the human (1%) than for the robot (19%), t(14) = 4.026, p = .018, or the computer (16%), although the human–computer difference was nonsignificant, t(14) = 2.750, p = .234.
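
An analysis of this form can be approximated with standard statistical software. The sketch below uses simulated placeholder data rather than the study's data, runs a one-factor repeated-measures ANOVA with statsmodels, and follows it with Bonferroni-corrected paired contrasts (paired t tests standing in for the F contrasts reported above).

```python
# Sketch: repeated-measures ANOVA over entity, plus Bonferroni-corrected contrasts.
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "subject": list(range(15)) * 3,
    "entity": ["human"] * 15 + ["robot"] * 15 + ["computer"] * 15,
    # Proportion of intentional predictions over the five scenarios (simulated).
    "intentional": np.concatenate([rng.binomial(5, p, 15) / 5 for p in (0.9, 0.3, 0.3)]),
})

# Three-level, one-factor repeated-measures ANOVA (entity as the within factor).
print(AnovaRM(df, depvar="intentional", subject="subject", within=["entity"]).fit())

# Pairwise contrasts, Bonferroni-corrected over the three comparisons.
pairs = [("human", "robot"), ("human", "computer"), ("robot", "computer")]
for a, b in pairs:
    t, p = stats.ttest_rel(df.loc[df.entity == a, "intentional"].values,
                           df.loc[df.entity == b, "intentional"].values)
    print(f"{a} vs. {b}: t = {t:.3f}, Bonferroni p = {min(p * len(pairs), 1.0):.3f}")
```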

2.3. Discussion

Experiment 1 clearly demonstrates that participants make different predictions about the behavior of a human and a mechanical entity. Across the five scenarios, participants predicted more intentional behaviors for people and more location/feature-oriented behaviors for computers. Also, response justifications suggest that participants ascribe more location-based behavior to robots and computers. Although the human–computer difference was quite large, there was no robot–computer difference at all, even though participants saw an anthropomorphic robot with a name. Thus, it appears that participants assume that robots and computers will produce fundamentally the same kind of behavior and that simple anthropomorphism is not sufficient to elicit different predictions about computers versus robots.

2.4. Experiment 2 Background

In Experiment 2, we tested whether participants would produce more intentional responses for robots than computers if the robot was further anthropomorphized. Participants completed questionnaires similar to those in Experiment 1 but also viewed a brief video demonstrating how the robot could walk, run, and pause to allow a person to pass in front of it. Our goal was to expose participants to an anthropomorphic robot without explicitly telling participants that the robot had the representational structure necessary to perform any specific task in a goal-directed manner. If certain ‘‘human-like’’ behaviors automatically suggest the robot's behaviors are goal directed, participants should produce more intentional responses for the robot than for the computer in this experiment.

2.5. Method

Participants. Twenty-four participants (19 female; M age = 42) completed Experiment 2. Participants were Vanderbilt Medical Center employees recruited from the hospital cafeteria in exchange for candy.

Materials. The same set of questionnaires used in Experiment 1 was used here, and the method of presentation was identical. However, in Experiment 2 the questionnaire was preceded by a short (27 s) video showing ASIMO, a highly anthropomorphic robot, walking around obstacles and walking in a hall, then stopping to let some people pass. In addition, the response justifications were dropped from the experiment to facilitate running participants in the short time available to this population of participants.

2.6. Results

The proportions of intentional behavioral predictions across the five scenarios were entered into a three-level, one-factor repeated measures ANOVA, with entity (human/computer/robot) as the one factor. This produced a significant effect for entity, F(2, 46) = 12.124, MSE = .041, p < .001. As in Experiment 1, unplanned contrasts indicate that participants predicted more intentional behaviors for humans (73%) than for robots (48%), F(1, 23) = 10.897, MSE = .064, p = .009, or for computers (47%), F(1, 23) = 27.943, MSE = .029, p < .001 (both ps reflect Bonferroni corrections), and did not differentiate between computers and robots.

Comparison Between Experiments 1 and 2. Participants predicted significantly more intentional responses for the human in Experiment 1 than in Experiment 2 (91% vs. 73%), t(37) = 3.334, p = .002, and they predicted significantly fewer intentional responses for the robot and computer in Experiment 1 than in Experiment 2: computers = 28% vs. 48%, t(37) = 2.437, p = .02; robots = 28% vs. 47%, t(37) = 2.417, p = .02.

2.7. Analysis of Experiments 1 and 2

To further explore the consistency of responses on individual scenarios, data for all 39 participants in Experiments 1 and 2 were combined. This analysis demonstrated that four of the five scenarios produced a strong contrast between humans and robots, and between humans and computers, whereas none of them produced a significant difference between robots and computers (see Figure 3). With the exception of the Side vs. Shape scenario, participants produced significantly more intentional responses for the human than for either of the other entities. In addition, we tested the degree to which responses for different entities were the same for a given participant. The global similarity in overall levels of predicted intentionality for computers and robots, combined with our within-participants design, may mean that participants usually decided to give the same response for computers and robots (perhaps based on some kind of demand characteristic) without thinking through the specific link between each entity and the scenario. Given base rates of responding on the five scenarios, one would expect that if participants were independently considering each entity's behavior, they would give the same response to the computer and the robot on 58% of trials. In fact, they gave the same response on 68% of trials. This rate is significantly greater than that predicted by independence, t(38) = 2.460, p = .019, suggesting that participants did rely on some common concepts when considering these entities. However, it is far from the high level of concordance one would expect if participants were simply giving the same response twice to mechanical entities that they failed to distinguish at all.
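
The 58% independence baseline is not derived explicitly in the text; one plausible reading is that it averages, over scenarios, the probability that independent robot and computer choices coincide, given each entity's base rate of intentional responding. The short sketch below works through that arithmetic using the base rates from Figure 3.

```python
# Expected robot-computer response agreement if the two choices were independent,
# using per-scenario base rates of intentional responding (from Figure 3).
robot    = [0.46, 0.23, 0.48, 0.21, 0.64]
computer = [0.31, 0.21, 0.53, 0.23, 0.69]

# Same response occurs when both are intentional or both are nonintentional.
expected_same = [r * c + (1 - r) * (1 - c) for r, c in zip(robot, computer)]
print("expected agreement under independence:",
      round(sum(expected_same) / len(expected_same), 2))  # roughly .58
# The observed agreement of .68 exceeds this baseline, consistent with partly
# shared concepts for the two mechanical entities.
```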

2.8. Discussion of Experiments 1 and 2

The results of Experiments 1 and 2 demonstrate that participants make different predictions about the goal-driven behavior of humans than about the behavior of machines.

FIGURE 3. Percentage of intentional responses for each scenario across entities in Experiments 1 and 2.

Scenario                  Human    Robot     Computer
Object vs. Location       82%      46%**     31%***
Feature vs. Category      62%      23%**     21%**
Position vs. Category     82%      48%**     53%*
Name vs. Location         95%      21%***    23%***
Side vs. Shape            76%      64%       69%

Note. n = 39. Significance levels in the robot and computer columns are the results of chi-square tests comparing proportions between humans and robots (robot column) and between humans and computers (computer column). All robot–computer comparisons are nonsignificant (ps > .23). * p < .05 for the human–robot or human–computer difference. ** p < .01 for the human–robot or human–computer difference. *** p < .001 for the human–robot or human–computer difference.
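
The per-scenario significance tests in the note are chi-square tests on 2 × 2 frequency tables. A sketch of one such comparison is given below; the counts are reconstructed approximately from the percentages and n = 39, so they are illustrative rather than exact.

```python
# Sketch: chi-square test comparing human and robot intentional-response rates
# for the Object vs. Location scenario (approximate counts, n = 39 per entity).
from scipy.stats import chi2_contingency

human_intentional, human_other = 32, 7    # about 82% intentional
robot_intentional, robot_other = 18, 21   # about 46% intentional

table = [[human_intentional, human_other],
         [robot_intentional, robot_other]]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}")
```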

These two experiments converge on this conclusion using different populations and further suggest that simple anthropomorphic labeling and behaviors are not sufficient to lead adults to give more intentional responses for robots than computers. Again, there was no sign of a difference in predictions between the computer and the robot, even when the questionnaire was prefaced with a video showing highly anthropomorphic behavior. Thus, Experiments 1 and 2 are consistent in demonstrating that simple labeling and anthropomorphic movements are not sufficient to change participants' attributions about the robots from a relatively nonintentional framework. Before moving on, it is important to discuss our decision to rely on a within-subjects experiment in which participants made predictions for each of three entities for all of the scenarios. Although it is possible that this has induced comparative processes that influenced participants' ratings, it is not clear that such processes would be uncharacteristic of everyday judgments. In addition, we have previously used a between-participants design in which different participants made predictions for different agents (Levin et al., 2006), and this design produced very similar results, especially when comparing scenarios from the previous study that were the same as those used here. In that study, we observed 66% intentional predictions for the human, 33% for the robot, and 37% for the computer. So the pattern of results is very similar in that humans are rated much more intentionally than machines, which are in turn rated similarly overall. More generally, we relied upon a within-participants approach to make our measures of agency more practical. Asking participants about a range of agents makes between-group manipulations of factors affecting attributions about agents, such as those in Experiments 3 and 4, much more tractable. Asking separate subjects about each kind of agent would mean that experiments with two conditions moderating these concepts would rely on a 2 × 3 crossed design with six separate groups of participants. We therefore feel that our within-participants approach is not only valid but much more useful for a wide variety of experimental purposes. Finally, we note that within-participants comparisons of different agents have been successfully used in both the developmental and adult literatures exploring concepts about agency.

3. EXPERIMENT 3: THE IMPACT OF OBSERVING APPARENTLY INTENTIONAL BEHAVIOR ON ATTRIBUTIONS OF AGENCY TO A ROBOT

Experiments 1 and 2 were consistent in failing to produce a difference in intentional responding between computers and robots. However, it is possible that stronger manipulations are necessary to invoke a more intentional interpretation of a robot. To provide such a manipulation, we asked all participants to watch a robot repeatedly engaging in a simple, apparently intentional behavior, that is, selectively looking at one of a pair of objects. According to many authors, this behavior is a key signifier
of intentionality that even young infants use to guide their own attention and action (see, e.g., Butterworth & Jarrett, 1991; Hood, Willen, & Driver, 1998; Moll & Tomasello, 2004), and ultimately to bootstrap a full intentional TOM (Flavell, Green, & Flavell, 1990; Gopnik et al., 1994). Moreover, in adults, following the target of another person’s gaze appears to be automatic, and probably continues to serve as an entry point into a wide range of interpersonal interactions and intentional attributions. Therefore, in Experiment 3, we employed an attentional-focus manipulation in which we asked participants to track the objects that a robot was looking at. So, before completing the behavioral predictions, participants viewed a video in which a robot engaged in a selective looking task similar to those used in research exploring children’s gaze tracking (e.g., see Brooks & Meltzoff, 2005). In the video, an experimenter showed a robot a series of pairs of items. For each pair, the robot was shown looking at one, then the other member of the pair. In addition to the attentional-focus manipulation that was experienced by all participants, we introduced a secondary manipulation designed to assess the degree to which revealing the robot’s internal representations would affect any induced differences between the robot and the computer. We therefore used two slightly different attentional-focus videos, which were viewed by different groups of participants (this manipulation was done between-participants because asking participants to view two videos, one with the meter and one without, might lead to substantial carryover effects, or to additional distracting cognitions about the appearance or disappearance of the meter from the robot). For participants in the ‘‘meter’’ condition, the robot’s looking was accompanied by deflections on an ‘‘excitement meter’’ that were high for the items that participants were told the robot was ‘‘interested in.’’ The excitement meter consisted of a graphical display on an LCD monitor in the body of the robot. In addition to showing interest on the excitement meter, the robot in the meter condition was referred to using anthropomorphic language. The other group of participants, in the no-meter condition, saw the same videos but with the excitement meter blanked and with no anthropomorphic language used to describe the robot. The meter was intended to focus participants’ attention on the connection between the robot’s internal representations and specific objects in the environment. To ensure close attention to the robot’s looking, all participants were told that they would need to remember what the robot looked at, and they were tested for their memory of the robot’s ‘‘preferences.’’ Both videos showed the robot looking for a longer time at one member of each pair than the other. Our basic hypothesis is that viewing the attentional-focus videos will lead participants in this experiment to make more intentional behavioral predictions for the robot than for the computer. If the attentional focus manipulation is successful in leading participants to differentiate the robot and the computer, it implies that participants start with a default assumption that all mechanical intelligent systems are not goal driven but that they begin to incorporate more intentional attributions about these systems once they are asked to focus on the link between objects and representations internal to these systems.

3.1. Method

Participants. Eighteen Vanderbilt University students completed Experiment 3 in exchange for course credit. Of these, eight were male (M age = 19.6). Ten participants were in the no-meter condition, and eight were in the meter condition.

Procedure. The procedure for Experiment 3 was similar to that in Experiments 1 and 2, with the exception that participants completed an attentional-focus exercise before responding to the scenarios. In this exercise, participants viewed an edited video of a robot being shown 10 pairs of objects. For each pair, an initial shot showed a male experimenter picking up two objects and placing them on a table in front of the robot (see Figure 4). Then, a second shot showed the robot looking first to the object on the left, and then to the object on the right (always in that sequence). Then the experimenter was seen removing the pair of objects from the table and replacing them with a new pair. This sequence was repeated for each of the 12 object pairs. For each pair, the robot looked to one of the objects for 2 s and to the other for 1 s. In the meter condition, this looking behavior was accompanied by a display in the torso of the robot that depicted a moving line graph that deflected to a ‘‘high’’ level of excitement for the object that was the target of the longer look and to a lower level of excitement for the object that was the target of the shorter look. In the no-meter condition, the identical video was shown with the meter blocked from view by a mask generated during the editing process. Before viewing the video, participants were told to pay close attention to what the robot looked at, and they were informed that they would be asked questions about the robot's preferences after the video. To reinforce the meter/no-meter distinction, when introducing the robot, the no-meter robot was described in mechanical terms and the meter robot was described in anthropomorphic terms. This was done by naming the meter robot ISAC and referring to it as ‘‘he,’’ whereas the no-meter robot was called I-SAC and referred to as ‘‘it.’’ In addition, ISAC was described as ‘‘looking at’’ things, whereas I-SAC was described as ‘‘processing’’ them.

FIGURE 4. Still frames from the attentional-focus video, no-meter condition. A: A model shows the robot a pair of objects. B: The robot is shown looking at one of the objects using a standard ‘‘reverse angle’’ edit. (Color figure available online.)
The no-meter I-SAC introduction read as follows:

In the following video you will see several sets of objects presented to a robot called I-SAC. I-SAC will process each of the objects, first processing the object on the left, then processing the object on the right. Your task is to remember what objects are presented to I-SAC and its responses to these objects.

The anthropomorphic ISAC instruction read:

In the following video you will see several sets of objects presented to a robot called ISAC. ISAC will look at each of the objects, first looking at the object on his left, then looking at the object on his right. The meter in ISAC's stomach area will indicate ISAC's interest in the objects presented. Your task is to remember what objects are presented to ISAC and his responses to these objects.

After viewing the video, participants completed a two-alternative forced-choice recognition test in which each pair of objects was presented, and participants were asked to recall which object the robot ‘‘processed’’ the most. After completing the memory test, participants completed a questionnaire with four of the five scenarios used in Experiments 1 and 2. The Side vs. Shape scenario was eliminated because it did not consistently differentiate between humans and machines. Then participants completed another questionnaire asking them the same computer-capability questions as in Experiments 1 and 2, but instead of generic computers and robots, the questions referred directly to the specific robot used in these experiments. In addition, participants were asked several questions about their familiarity with technology, their exposure to technology, and more specifically whether they know how to program a computer; about the amount of science fiction they consume; and about how often they play video games. As previously mentioned, results from these questionnaires are analyzed after the results of Experiment 4. Finally, participants were asked about the degree to which the robot's looking preferences were determined by the shape, color, size, and kind of objects to which it was exposed.

3.2. Results

The scenario data were entered into a 3 (agent: human, robot, computer) × 2 (video type: meter, no-meter) mixed-factors ANOVA with video type as the between-participants factor. The effect of agent was significant, F(2, 32) = 41.525, MSE = .053, p < .001, whereas the effect of video type (F < 1) and the Agent × Video Type interaction, F(2, 32) = 1.288, MSE = .053, p = .290, were not significant. Overall, participants made 90% intentional predictions for the human, 40% intentional predictions for the robot, and 21% intentional predictions for the computer. The key prediction in this experiment was that viewing the videos would lead participants to attribute more intentionality to the robot than to the computer, so the difference between robot and computer intentionality was tested with a planned contrast, which
was significant, F(1, 16) = 5.778, MSE = .052, p = .029 (see Figure 5). As in Experiments 1 and 2, participants attributed more intentional responses to the human (90%) than to the robot (40%), F(1, 16) = 45.593, MSE = .050, p < .001, Bonferroni corrected, or the computer (21%), F(1, 16) = 72.583, MSE = .057, p < .001, Bonferroni corrected. The tendency to differentiate robots and computers was stronger for participants who viewed the no-meter video than for those who viewed the meter video. The robot–computer contrast was 9% for those who saw the meter video, F(1, 7) = .568, MSE = .062, p = .476, as compared with 28% for those who saw the no-meter video, F(1, 9) = 8.442, MSE = .045, p = .017.

FIGURE 5. Total proportion of intentional responses for a human, a robot, and a computer in meter and no-meter conditions.

Recognition Analysis. In this analysis, we tested whether participants were able to remember the degree to which the robot processed objects by subtracting the mean processing rating for objects that were the targets of a short look (and low meter deflection) from the ratings of objects that were the targets of long looks (and high deflection). Thus, this score reflects the mean predominance of preferred objects in rating scale units for each subject, and would be zero if participants did not remember which objects the robot preferred. Preference recognition was significantly greater for the meter video (+1.81) than for the no-meter video (+.13), t(16) = 3.326, p = .004. The no-meter recognition score was not significantly different from zero, t(9) = .346, ns, which suggests that participants did not remember which specific objects the no-meter robot preferred. Although no-meter participants did not rate the long-look objects higher than the short-look objects, all participants successfully differentiated objects that had been in the initial looking exercise from foils that had not (meter = 97% correct; no-meter = 96% correct). Combined, the results of these tests of memory for the objects and memory for the looking preferences make it clear that participants were attending
to the objects in both conditions closely enough to differentiate objects that had been in the videos from those that were never presented, but that they had difficulty remembering which specific object the robot preferred to look at, especially in the no-meter condition.
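
The preference-recognition score described above can be computed as each participant's mean rating for long-look (high-deflection) objects minus the mean rating for short-look objects, and tested against zero with a one-sample t test. The sketch below uses hypothetical placeholder scores rather than the study's data.

```python
# Sketch: per-participant preference-advantage scores (long-look minus short-look
# mean rating), tested against a population mean of zero.
import numpy as np
from scipy import stats

preference_advantage = np.array([0.4, -0.2, 0.1, 0.0, 0.3, -0.1, 0.2, 0.1, -0.3, 0.0])

t, p = stats.ttest_1samp(preference_advantage, popmean=0.0)
print(f"mean advantage = {preference_advantage.mean():.2f}, "
      f"t({preference_advantage.size - 1}) = {t:.2f}, p = {p:.3f}")
```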

3.3. Discussion

The results of Experiment 3 indicate that attending to the looking behavior of a robot was effective in increasing the degree to which participants would make intentional predictions for a robot, although it did not eliminate the robot–human difference. This suggests that participants are willing to imbue robots with partial intentionality but still do not treat them just like humans. The results also suggest that our attempt to lead participants to consider the representations inherent to the robot using the ‘‘excitement meter,’’ and to reinforce this tendency with an anthropomorphic name, did not work as we expected it would. The no-meter robot was more strongly differentiated from the computer. This suggests not only that naming has little effect on intentional responding but also that the unnatural representational indicator was less effective than the more subtle but more natural eye movement behavior that participants were provided with in the no-meter condition. Alternatively, one might argue that the meter was simply a distraction and generally drew attention away from the looking task. However, the preference recognition data argue against this because they demonstrate that participants were better able to recognize objects the robot ‘‘preferred’’ in the meter condition. Combined, these results suggest that the attentional-focus manipulation was successful independent of either anthropomorphic naming or people's ability to remember the specific preferences of the robot.

4. EXPERIMENT 4: REPLICATION AND EXTENSION OF THE IMPACT OF OBSERVING INTENTIONAL BEHAVIOR ON ATTRIBUTIONS ABOUT A ROBOT

Experiment 3 demonstrates that participants will, in some cases, differentiate robots and computers, but the experiment suffers from several shortcomings. Most important, it is not clear whether the increase in intentional ascription caused by the attentional-focus task occurred because participants specifically attributed intentionality to the robot or because they were generally sensitized to any sign of intentionality by focusing on the attentive processes of some entity. To address this question, robot attentional-focus conditions (both with the meter and no-meter videos) were run as in Experiment 3 (again with separate groups of subjects; we used a between-participants manipulation to avoid the possibility that a given participant might carry over a more or less intentional interpretation of the human or robot model from one model to another). In addition, a third group of participants viewed a video in which the robot was replaced by a person viewing the pairs of objects. If the
robot–computer differentiation observed in Experiment 3 was due to a general sensitization to intentionality, then it should occur in all conditions. On the other hand, if participants were actually changing their understanding of the robot, then they should not differentiate the computer and robot when they view a person looking at objects. Accordingly, we predict that the robot–computer difference in intentionality will be greater for participants who view the robot versions of the attentional tracking video (especially in the no-meter version of the video) than for participants who view the human version of the video.

4.1. Method

Forty-five Vanderbilt University students completed Experiment 4 (19 male; M age = 19.6) in exchange for course credit. Of these, 14 were in the no-meter condition, 16 were in the meter condition, and 15 were in the human condition. The procedure in Experiment 4 was identical to that in Experiment 3. Again, both meter and no-meter attentional-focus conditions were included, but in addition, a condition was added in which the robot was replaced with a male experimenter who looked at each object after the objects were placed on the table. Similar to the robot, the human model looked at one of the objects for 1 s and at the other for 2 s. The same recognition test followed the video, with the wording changed to be appropriate for each condition.

4.2. Results

The data were entered into a 3 × 3 Video Type (no meter, meter, human) × Entity (human, robot, computer) mixed ANOVA with video type as the between-participants factor and entity as the within-participants factor. The main effect of entity was significant, F(2, 84) = 249.9, MSE = .031, p < .001, whereas the main effect of video type was nonsignificant, F(2, 42) = 1.398, MSE = .037, p = .26. The Entity × Video Type interaction was significant, F(4, 84) = 2.819, MSE = .031, p = .030. Overall, participants predicted 91% intentional responses for the human, in contrast to 26% for the robot and 14% for the computer (see Figure 6). Both the human–robot and human–computer contrasts were significant—human–robot contrast, F(1, 44) = 210.825, p < .001; human–computer contrast, F(1, 42) = 551.832, MSE = .048, p < .001, Bonferroni tests—as was the robot–computer contrast, F(1, 42) = 10.038, MSE = .056, MSE = .083, p = .003, planned contrast. A direct test of the hypothesis that the robot–computer contrast would be stronger in the no-meter condition than in the human condition approached significance, F(1, 27) = 2.456, MSE = .036, p = .065, one-tailed. Participants differentiated the computer and robot most strongly in the no-meter condition (16% robot–computer contrast), F(1, 13) = 8.163, MSE = .022, p = .013; less so in the meter condition (13% robot–computer contrast), F(1, 15) = 2.727, MSE = .046, p = .119; and almost not at all in the human condition (5% robot–computer contrast), F(1, 14) = 1.313, MSE = .014, p = .271.

FIGURE 6. Total proportion of intentional responses in Experiment 4 for a human, a robot, and a computer in meter, no-meter, and human conditions.
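
A mixed between-within analysis of this kind can be run as sketched below; pingouin is one library that implements the design, and the simulated data and column names are placeholders that merely mirror the group sizes reported above.

```python
# Sketch: 3 (entity, within) x 3 (video type, between) mixed ANOVA on simulated data.
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(1)
groups = ["no_meter"] * 14 + ["meter"] * 16 + ["human_video"] * 15
rows = []
for subject, group in enumerate(groups):
    for entity, base in [("human", 0.9), ("robot", 0.25), ("computer", 0.15)]:
        rows.append({"subject": subject, "video_type": group, "entity": entity,
                     # Proportion of intentional predictions over the four scenarios.
                     "intentional": rng.binomial(4, base) / 4})
df = pd.DataFrame(rows)

print(pg.mixed_anova(data=df, dv="intentional", within="entity",
                     subject="subject", between="video_type"))
```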

The critical comparison between the computer and robot conditions was reanalyzed using nonparametric statistics to ensure that this contrast was robust against any nonnormalities inherent to these data. To ensure sufficient power, data from the robot and computer conditions in both Experiments 3 and 4 were combined, and the computer–robot difference was tested with a sign test, which was significant (p < .002).

Recognition Results. Participants successfully recognized high-preference objects relative to low-preference objects in the meter condition and the human condition, but not in the no-meter condition. An ANOVA comparing the difference in preference ratings for the long-look versus short-look objects across the three groups (meter, no meter, and human) produced a significant effect of group, F(2, 41) = 16.482, MSE = .675, p < .001. The mean preference advantage for long-look objects was .06 in the no-meter condition, 1.77 in the meter condition, and .66 in the human condition. This preference advantage was significantly greater than zero in the meter and human conditions (ps < .001) and not significantly different from zero in the no-meter condition, t(12) = .440, ns. All pairwise comparisons between conditions were significant (ps < .005).

Individual Scenario Analysis. To test the breadth of changes caused by the attentional-focus manipulation, we looked at data from individual scenarios in the meter and no-meter conditions using the combined data from Experiments 3 and 4. Figure 7 presents the difference in intentional attribution for robots and computers such that positive values indicate more intentional attributions for the robot than the computer. The figure makes two things clear. First, the attentional-focus manipulation produces the largest difference in robot versus computer predictions for the Object vs. Location scenario (p < .001; McNemar test). This difference was significant for both the meter and no-meter conditions tested individually (both ps < .001;
McNemar test). Second, there was also a robot-computer difference in the Position vs. Category scenario, but only for the no-meter condition (p D .016; McNemar test).
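For readers who want to reproduce this style of nonparametric check, the sketch below illustrates a sign test and a McNemar test in Python using scipy and statsmodels. The data values and table counts are hypothetical placeholders rather than the study's results.

```python
# Illustrative sketch with hypothetical data (not the study's actual counts):
# a sign test on per-participant robot-computer differences and a McNemar
# test on paired intentional/mechanical responses for one scenario.
import numpy as np
from scipy.stats import binomtest
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical per-participant differences (robot minus computer proportion
# of intentional responses); zeros are conventionally dropped for a sign test.
diff = np.array([0.25, 0.10, 0.0, 0.15, -0.05, 0.20, 0.30, 0.05])
nonzero = diff[diff != 0]
n_positive = int((nonzero > 0).sum())
sign_result = binomtest(n_positive, n=len(nonzero), p=0.5)
print("Sign test p =", sign_result.pvalue)

# Hypothetical 2 x 2 table of paired responses for one scenario:
# rows = robot (intentional, mechanical); columns = computer (intentional, mechanical).
table = np.array([[12, 18],
                  [3, 25]])
print(mcnemar(table, exact=True))  # exact binomial version for small cell counts
```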

4.3. Discussion

Experiment 4 converges with Experiment 3 in demonstrating that the attentional-focus manipulation leads participants to differentiate robots from computers. Experiment 4 also replicated the finding that the participants who anthropomorphized the most (the no-meter robot participants) did not successfully track attention to any specific object. One minor inconsistency with Experiment 3 was that in Experiment 4 the difference between the meter and no-meter conditions was quite small. Accordingly, we cannot be certain that there is any difference between these conditions, but Experiment 4 corroborates the basic findings of Experiment 3. Finally, Experiment 4 demonstrated that the attentional-focus effect was specific to the robot and not a more general consequence of tracking the attention of any agent: Tracking the attention of a human did not lead participants to differentiate the robot and the computer.

The individual scenario analysis demonstrated that the impact of the attentional-focus manipulation was relatively narrow, confined to the two scenarios that most strongly contrasted object-directed goals with spatial responding. Comparing these scenarios with the demands of the attentional-focus task makes it clear that the two are, in some ways, similar. The attentional-focus task directly implies that the robot is looking at specific objects, and participants are asked to remember the robot's responses to those objects. However, it is important to point out that the tasks are in other ways different and that we are not simply forcing participants to an interpretation of the robot's action that is directly reflected in a later, essentially identical scenario. Instructions for the task were written to avoid directly stating that the robot's eye movements were driven by object-defined goals. Participants were told only that the robot would "process/look at each of the objects, first processing/looking at the object on the left, and then the object on the right" and that their "task is to remember what objects are presented to I-SAC/ISAC and its/his responses to these objects." Thus, the task instructions and the task itself emphasized a set of instances in which the robot selected and differentially evaluated each of a pair of objects. This generalized to a goal-directed interpretation of two or three ambiguous actions. Accordingly, the results of Experiments 3 and 4 suggest that different entailments of intentionality can, to a degree, be manipulated independently.

One interesting and unanticipated finding was that the relatively smaller effect size in Experiment 2 appears to have been due to the older participants, who differentiated less between the computers and people. This finding suggests that there may be cohort effects whereby younger people are more familiar with computers and are therefore more likely to differentiate them from humans. However, this finding should be interpreted cautiously for now because we do not have extensive data on these participants. It is therefore possible, for example, that education level is confounded with age.

5. EXPERIMENT 5: REGRESSION ANALYSIS OF SURVEY RESPONSES

In Experiment 5, we analyzed the results of the postexperiment questions that were included in Experiments 1 to 4. Our primary interest was to test whether participants attributed lower levels of intentionality to computers because they believed computers to simply be less intelligent than people. Therefore, all participants were asked to rate the degree to which they believed computers to be "intelligent" relative to humans. In addition, participants rated the degree to which computers were capable of understanding the goals inherent to human behavior. If the basic difference between predictions for humans and machines is based on differential levels of attributed intentionality rather than a broader difference in overall attributed capacity, we would expect that a regression predicting this difference would include the goal question as a significant predictor, even when controlling for attributed intelligence. We also controlled for sex and age in this regression, as the older participants in Experiment 2 did appear to attribute less of a difference between the humans and machines.

The postscenario questionnaire responses from Experiments 1 to 4 were combined and analyzed to test several important hypotheses. First, it is possible that people give fewer intentional responses for computers and robots because they believe them to be generally incapable machines that therefore perform all tasks in a rote fashion. If this were true, then ratings of machine intelligence should predict differential responding to the human versus the computer and robot. In contrast, our initial hypothesis is that the most important difference between machines and humans is that humans are capable of intentional reasoning. If this is true, then intention-relevant ratings should predict the human–computer difference. To test these hypotheses, a series of multiple regressions was run with participant age and sex as predictors, along with the subset of questionnaire items that were asked of all participants in all four experiments reported here. Questionnaire items included questions asking participants about the degree to which computers can "recognize objects within a scene," can "learn through prior experience and observation," can "engage in self-directed action," and can "infer the goals of human action." Finally, participants were asked how intelligent computers are. These items were included in questionnaires following all four experiments, and relationships between questionnaire responses and behavior predictions were analyzed using the combined results of these experiments to afford a reasonable degree of power. In Experiment 2, these items were asked for computers and robots separately, and the two sets of responses were averaged for this analysis. The computer and robot responses were generally highly correlated, with correlation coefficients ranging from .735 to .842, with the exception of the object-recognition item, which was only modestly correlated between computers and robots (r = .558). In Experiments 3 and 4, these questions always referred to the target entity, and because this was a human in the human condition of Experiment 4, responses from participants in that condition were not included.
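As a concrete illustration of the item-averaging and correlation step just described, the following pandas sketch assumes a hypothetical file and hypothetical column names (e.g., comp_recognize and robot_recognize); it is not the authors' analysis script.

```python
# Illustrative sketch (hypothetical file and column names, not the authors' script):
# average the computer and robot versions of each questionnaire item from
# Experiment 2 and check how strongly the two versions correlate.
import pandas as pd

df = pd.read_csv("experiment2_questionnaire.csv")  # hypothetical file
items = ["recognize", "learn", "self_directed", "goal_inference", "intelligence"]

for item in items:
    comp_col, robot_col = f"comp_{item}", f"robot_{item}"
    # Pearson correlation between computer and robot ratings of the same item.
    r = df[comp_col].corr(df[robot_col])
    # Averaged rating used as the predictor in the combined regression analysis.
    df[f"avg_{item}"] = df[[comp_col, robot_col]].mean(axis=1)
    print(f"{item}: r = {r:.3f}")
```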

5.1. Method

To test these hypotheses, stepwise multiple regressions (probability for entry was .05, and probability for removal was .10) predicting the difference between the proportion of intentional responses for humans and the average of the proportions of intentional responses for computers and robots were run on data from the 87 participants in Experiments 1 to 4 (not including the human-condition participants in Experiment 4). Initial predictors included all those previously listed, and the final model included only the goal-inference item (β = .259, p = .005) and age (β = .506, p < .001) as predictors (adjusted R² = .403), F(2, 83) = 29.75, p < .001. The general intelligence item did not approach significance (β = .011, p = .92). None of the other predictors approached significance in the final model (ps > .41) except sex (β = .156, p = .071; female = 1, male = 0). Thus, strong differentiation between machines and humans in intentional responding was associated with low ratings of machine goal inference.

Because the questions in Experiments 1 and 2 referred to computers/robots in general, whereas the questions in Experiments 3 and 4 referred to ISAC in particular, the regressions were rerun separately for each pair of experiments. These regressions included only the critical goal-detection and intelligence questions to minimize power issues. Although marginally nonsignificant in both cases, the beta values for the goal-understanding item were of similar magnitudes (Experiments 1 and 2: β = .327, p = .083; Experiments 3 and 4: β = .262, p = .092). In contrast, the intelligence rating was nonsignificant (Experiments 1 and 2: β = .080, p = .667; Experiments 3 and 4: β = .128, p = .407).

The age effect was unexpected, and it indicates that older participants differentiated the machines and people less than younger participants did. However, because the older participants were almost exclusively the Medical Center employees from Experiment 2, this could simply be a population effect. To test whether age was correlated with differential responding within the Medical Center population, the previous stepwise regression was rerun with only the responses of the 24 participants from Experiment 2 included. This stepwise regression resulted in a model that included only age as a predictor (β = .586, p = .003). None of the other predictors approached significance (all ps > .20). Because of the age effect, the overall stepwise regression was rerun with only the 62 younger participants from Experiments 1, 3, and 4 to verify that the goal-inference question predicted intentional responding in the absence of potential age/cohort effects. In this regression, the only significant predictors were goal inference (β = .243, p = .049) and sex (β = .248, p = .045); the other variables did not approach significance (ps > .48).
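Stepwise selection with explicit p-to-enter and p-to-remove thresholds is not built into the common Python statistics libraries, so the sketch below shows one way such a procedure could be implemented with statsmodels, using the .05 entry and .10 removal criteria described above. The function and column names are hypothetical, and the code illustrates the general technique rather than the authors' original analysis.

```python
import pandas as pd
import statsmodels.api as sm

def stepwise_ols(df, dv, predictors, p_enter=0.05, p_remove=0.10, max_steps=100):
    """Forward-entry / backward-removal stepwise OLS, a rough analogue of the
    stepwise procedure described in the text (entry p < .05, removal p > .10)."""
    selected = []
    for _ in range(max_steps):
        changed = False
        # Forward step: try each remaining predictor and add the best one if p < p_enter.
        remaining = [p for p in predictors if p not in selected]
        entry_p = {}
        for cand in remaining:
            X = sm.add_constant(df[selected + [cand]])
            entry_p[cand] = sm.OLS(df[dv], X).fit().pvalues[cand]
        if entry_p:
            best = min(entry_p, key=entry_p.get)
            if entry_p[best] < p_enter:
                selected.append(best)
                changed = True
        # Backward step: drop the weakest selected predictor if its p > p_remove.
        if selected:
            X = sm.add_constant(df[selected])
            pvals = sm.OLS(df[dv], X).fit().pvalues.drop("const")
            worst = pvals.idxmax()
            if pvals[worst] > p_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return sm.OLS(df[dv], sm.add_constant(df[selected])).fit()

# Hypothetical usage (column names are placeholders, not the study's variables):
# df = pd.read_csv("combined_questionnaire.csv")
# model = stepwise_ols(df, "human_machine_diff",
#                      ["age", "sex", "recognize", "learn",
#                       "self_directed", "goal_inference", "intelligence"])
# print(model.summary())
```

One design note on the sketch: stepwise procedures of this kind can cycle in pathological cases, so the loop caps the number of add/remove passes rather than running indefinitely.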

5.2. Discussion

The questionnaire analysis was particularly helpful in demonstrating that the basic human–machine difference in intentional responding is not simply due to the assumption that machines are generally less intelligent than people; rather, it appears to turn on whether people attribute to mechanical agents the capacity to understand the goals behind behavior. It is important to note that although the goal-inference question is perhaps the most characteristic of intentional thinking, it was not the only question that focused on intentionality. The more basic questions about object identification and the ability to engage in self-directed action are also plausibly related to intentionality, and these did not predict differential responding in the scenarios. Perhaps this is because object identification is seen as a basic perceptual skill, and the ability to engage in self-directed action is characteristic of a wide range of programmed devices that can respond to simple environmental cues.

6. GENERAL DISCUSSION

In these studies we repeatedly observed that participants differentiate people from computers and robots by making substantially more intentional behavioral predictions for people. Experiments 1 and 2 converged to show that participants predict that people will behave in a more goal-oriented manner than computers and robots and will organize objects using categorical knowledge, whereas computers and robots will be more likely to organize objects using perceptual features. These experiments were also consistent in showing that participants did not predict more intentional behavior for the robots than for the computer, despite the fact that the robots were given names and shown engaging in simple anthropomorphic behaviors. However, Experiments 3 and 4 demonstrated that participants did differentiate computers and robots when asked to track the robot's focus of attention. Additional analyses support the conclusion that a key contrast between predictions for humans and computers/robots is related to intentionality: We found that individual differences in beliefs about the degree to which computers can infer the goals of human behavior predict the magnitude of the difference in intentional responding. Thus, these experiments have demonstrated and validated a means of assessing people's understanding of broad differences in the kinds of intelligence that characterize different living and artificial entities.

A key finding from these experiments is that simple anthropomorphization of the robot repeatedly failed to lead participants to predict more intentional behavior for the robot than for the computer, whereas the stronger attentional-focus manipulation did lead to differentiation. Clearly, a range of qualitative and quantitative differences between these weak and strong manipulations could have caused the latter to be more effective. In Experiments 1 and 2 the anthropomorphism of the robot involved only a few mentions of the robot's name and/or a brief video showing anthropomorphic behavior, whereas the attentional-focus manipulation involved a series of 12 focus trials and a recognition test for the robot's preferences. In addition, the latter experiments asked participants to directly consider the robot's looking behavior and to link this behavior to the robot's representations and preferences, whereas the simpler manipulations in Experiments 1 and 2 required none of this. Although it is not possible to be certain which of these factors is most crucial, the finding nonetheless provides important clarification of how adults deploy their concepts about agents.

In particular, these findings suggest a model of the link between adults' and children's concepts about agents. The developmental work has focused on basic cues to agency such as those employed in Experiments 1 and 2. For example, basic anthropomorphic features and actions are known to influence not only infants' classifications of agents as living but also their attributions of goal-directedness, whereas adults were not influenced by these simple cues in the present experiments. So, one basic hypothesis that might integrate our findings in adults with the Nass work and with the developmental work is to assume that simple concepts about agency develop early (e.g., Meltzoff, 1995; Woodward, 1998) and continue to be deployed later in development when adults find themselves in social situations that limit their ability to consider agency deeply. These simple concepts reflect relatively promiscuous attributions of agency to a wide range of apparently animate things. However, with time, adults develop the capacity to resist these attributions, and they begin to differentiate agents, so long as they have the opportunity to consider agents' actions deeply, or at least explicitly, as was the case in our behavioral prediction scenarios.
This model would be very similar to Epley et al.'s (2007) model, whereby people initially anthropomorphize a wide range of things but can sometimes correct this initial tendency by lessening these attributions if they have sufficient cognitive resources or motivation.

However, based on Experiments 3 and 4, and on other data we have collected, we would argue that a broad contrast between initially equating agents and later differentiating them might be too simple. In Experiments 3 and 4, participants making explicit judgments started to anthropomorphize robots once they had accumulated evidence that the robots appeared to engage in typical intentional looking behavior. In addition, we have recently observed that participants who take longer to make their prediction decisions (either because they choose to or are led to) tend to distinguish less between agents than participants who decide more quickly. Thus, we see less differentiation between humans and machine agents with more thought. Findings such as these suggest that a range of task-appropriate reasoning processes can be invoked when making initial judgments about an agent and that each of these processes changes in characteristic ways with increasing involvement, experience, and learning. So, when engaged in automatic social processing, such as that investigated by Reeves and Nass (1996), people may effectively equate agents at first and then allow for more differentiation with an increased need to consider agents more deeply. However, situations requiring more concrete behavioral predictions, such as those we have explored here, may at first lead people to a more differentiated view of agents, which then gives way to a reconceptualization of agents as being more intentional than initially thought. This highlights the need to include motivational and situational constraints on reasoning about agents, such as those hypothesized by Epley et al. (2007), but it also highlights the complexity of the changes to agent concepts that occur as people apply more cognitive resources to the problem. So, instead of hypothesizing a simple shift from initial anthropomorphizing cognitions to later cognitions that differentiate agents, it may be necessary to account for two broad collections of cognitive strategies and heuristics: one that is applied on one's initial encounter with an agent and one that is used to reconceptualize the agent as additional evidence is considered. The initial processes could include both relatively implicit, simple agency detection and more explicit predictions, whereas the secondary processes are sometimes invoked when initial processes fail, as, for example, when an agent repeatedly produces hypothesis-inconsistent behavior.

Before concluding, we would like to address two potential objections to these results. Although previous research has explored adults' implicit beliefs about the similarity between computers and people (as revealed by their behavior during social interactions), ours is one of the few publications to empirically explore adults' explicit beliefs about computer versus human thinking. One might object that the reason for this lack of previous research is that these explicit beliefs are perfectly available to intuition (because we all have them and know about them) and that our finding of differentiation is an unsurprising verification that people believe that clearly different things think differently.
A related objection would discount these findings
because the predictions we assessed reflect the behavior of a single hypothetical entity (e.g., the specific computer and robots described in the scenarios) that participants know nothing about except for what they have read in our descriptions. Therefore, the predictions do not reflect a general concept but rather an ad hoc guess tailored to a specific setting.

We would like to argue against both of these ideas. First, the argument that our experiments are unnecessary confirmations of the obvious reflects a form of hindsight that can be applied to almost any new finding that makes some contact with real-world knowledge. Not only is this form of hindsight notoriously unreliable, but in this case we have clear examples not only of researchers who argue strongly for the differentiation of human and artificial intelligence and of others who argue for the fundamental similarity of the two (as reviewed in the introduction) but also of experimental results suggesting that, on one hand, adults treat computers and people similarly but, on the other hand, children clearly differentiate the two. If highly informed researchers take both views, and if data from naive children and adults suggest both possibilities, then the prediction that adults will make fundamental distinctions between human and artificial intelligence in our scenarios is hardly certain. More specifically, though, recent theory explaining TOM reasoning has argued that people's interactions with other minds require two important processes: one that is automatic (primarily for detecting basic agency and tracking true knowledge states) and one that is more controlled and explicit (used for tracking belief–environment mismatches and other higher order functions; Baron-Cohen, 1995; Leslie, Friedman, & German, 2004). Clearly, then, understanding both people's reflexive treatment of different agents and their more explicit beliefs about agents' nature and behavior will be necessary for a useful explanation of the adult TOM.

Regarding the argument about ad hoc guessing, our choice to have participants focus on specific entities for the predictions has the advantage of allowing participants to read relatively concrete stories. This approach is common in both the adult and developmental concept literatures (e.g., see Barrett & Keil, 1996). The fact that participants focus on specific entities has the advantage of lessening uncertainty about the targets of their inferences. Although this can be seen as narrowing the generalizations that might follow from our data, specific findings in the present experiments reinforce a link to more general concepts. The basic computer–human difference was predicted with equal strength by judgments about computers' understanding of the goals of human action in general and by judgments about the specific computer in the experiment. This correlation connects the scenario responses with more general beliefs about computers and with the more general concept of the goals inherent to human action. To verify the generality of these inferences, we reran a version of Experiments 1 and 2 without the descriptions of the specific entities. Instead, we simply asked a new set of 16 participants (from Nashville State Community College, M age = 28) to imagine "a robot," "a person," or "a computer" and to make predictions about these entities. Results were very similar to those in Experiment 2: 42% intentional responses for the computer, 48% for the robot, and 78% for the human.
Again, the differences between the human and the robot, and between the human and the computer, were
significant, t(15) = 4.069, p = .001, and t(15) = 4.552, p < .001, respectively. The difference between the robot and computer was nonsignificant, t(15) = .808, p = .432.

The data presented here suggest that adults have a relatively strong understanding of variants in intentionality across entities. However, this understanding serves only as a starting point for reasoning about the entities, and it may be overridden by specific experiences and inferences that must be made after learning about an entity. When considering adults' understanding of different entities, it is also important to point out that although our basic behavioral prediction effects are very strong and consistent, a substantial minority of adults in each experiment gave nominally intention-based responses for computers and, in Experiment 2, gave mechanical responses for people. Therefore, it is clear that adults have some basic notions about different kinds of minds, but these notions are not immediately obvious to everyone.

It is also important to point out that although we might consider anthropomorphic assumptions to be "wrong" in the context of current technology, the future may change this. So, a more detailed understanding of how people construe the thinking inherent to people, computers, and intermediate entities such as robots will not only help us understand how to better design a variety of artificial agents but also give us an important window into both the development of TOM and adults' deployment of TOM when facing the myriad challenges posed by a technologically advanced social landscape.

NOTES

Acknowledgments. We thank Carl Frankel for reading and commenting on this article.

Support. This material is based on work supported by the National Science Foundation under Grant No. 0433653 to DTL, MMS, and KK.

Authors' Present Addresses. Daniel T. Levin, Department of Psychology and Human Development, Vanderbilt University, Peabody College #512, 230 Appleton Place, Nashville, TN 37203-5701. E-mail: [email protected]. Stephen S. Killingsworth, Department of Psychology and Human Development, Vanderbilt University, Peabody College #512, 230 Appleton Place, Nashville, TN 37203-5701. E-mail: [email protected]. Megan M. Saylor, Department of Psychology and Human Development, Vanderbilt University, Peabody College #512, 230 Appleton Place, Nashville, TN 37203-5701. E-mail: [email protected]. Stephen Gordon, Electrical Engineering and Computer Science Department, Featheringill Hall, Room 324, Box 351674 Station B, Vanderbilt University, Nashville, TN 37235-1674. E-mail: [email protected]. Kazuhiko Kawamura, Electrical Engineering and Computer Science Department, Featheringill Hall, Room 324, Box 351674 Station B, Vanderbilt University, Nashville, TN 37235-1674. E-mail: [email protected].

HCI Editorial Record. First manuscript received March 24, 2009. Revisions received January 26, 2010, August 15, 2011, and November 8, 2011. Final manuscript received January 6, 2012. Accepted by Steve Whittaker. — Editor


REFERENCES

Arita, A., Hiraki, K., Kanda, T., & Ishiguro, H. (2005). Can we talk to robots? Ten-month-old infants expected interactive humanoid robots to be talked to by persons. Cognition, B49–B57.
Baron-Cohen, S. (1995). Mindblindness: An essay on autism and theory of mind. Cambridge, MA: Bradford/MIT Press.
Barr, D., & Keysar, B. (2005). Mindreading in an exotic case: The normal adult human. In B. F. Malle & S. D. Hodges (Eds.), Other minds: How humans bridge the divide between self and others (pp. 271–283). New York, NY: Guilford.
Barrett, J. L., & Keil, F. C. (1996). Conceptualizing a non-natural entity: Anthropomorphism in God concepts. Cognitive Psychology, 31, 219–247.
Birch, S. A. J., & Bloom, P. (2007). The curse of knowledge in reasoning about false beliefs. Psychological Science, 18, 382–386.
Bloom, P. (1997). Intentionality and word learning. Trends in Cognitive Sciences, 1, 9–12.
Brooks, R., & Meltzoff, A. N. (2005). The development of gaze following and its relation to language. Developmental Science, 10, 126–134.
Bruce, A., Nourbakhsh, I., & Simmons, R. (2002). The role of expressiveness and attention in human–robot interaction. Proceedings of the IEEE International Conference on Robotics and Automation, May, 4138–4142.
Butterworth, G., & Jarrett, N. (1991). What minds have in common is space: Spatial mechanisms serving joint visual attention in infancy. British Journal of Developmental Psychology, 9, 55–72.
Deak, G. O., & Bauer, P. J. (1996). The dynamics of preschoolers' word choice. Child Development, 67, 740–767.
Epley, N., Akalis, S., Waytz, A., & Cacioppo, J. T. (2008). Creating social connection through inferential reproduction. Psychological Science, 19, 114–120.
Epley, N., Waytz, A., & Cacioppo, J. T. (2007). On seeing human: A three-factor theory of anthropomorphism. Psychological Review, 114, 864–886.
Flavell, J. H., Green, F. L., & Flavell, E. R. (1990). Developmental changes in young children's knowledge about the mind. Cognitive Development, 5, 1–27.
Gao, T., Newman, G. E., & Scholl, B. (2009). The psychophysics of chasing: A case study in the perception of animacy. Cognitive Psychology, 59, 154–179.
Gergely, G., Nadasdy, Z., Csibra, G., & Biro, S. (1995). Taking the intentional stance at 12 months of age. Cognition, 56, 165–193.
Gopnik, A., Slaughter, V., & Meltzoff, A. (1994). Changing your views: How understanding visual perception can lead to a new theory of the mind. In C. Lewis & P. Mitchell (Eds.), Origins of an understanding of mind (pp. 157–181). Hillsdale, NJ: Erlbaum.
Gray, H. M., Gray, K., & Wegner, D. M. (2007). Dimensions of mind perception. Science, 315, 619.
Haslam, N. (2006). Dehumanization: An integrative review. Personality and Social Psychology Review, 10, 252–264.
Hood, B. M., Willen, J. D., & Driver, J. (1998). Adult eyes trigger shifts of visual attention in human infants. Psychological Science, 9, 131–134.
Johnson, S., Slaughter, V., & Carey, S. (1998). Whose gaze will infants follow? The elicitation of gaze following in 12-month-olds. Developmental Science, 1, 233–238.
Kanda, T., Hirano, T., Eaton, D., & Ishiguro, H. (2003). A practical experiment with interactive humanoid robots in a human society. Proceedings from Humanoids 2003: Third IEEE International Conference on Humanoid Robots. Piscataway, NJ: IEEE.


Kuhlmeier, V., Bloom, P., & Wynn, K. (2005). Do 5-month-old infants see humans as material objects? Cognition, 94, 95–103.
Lee, S. L., Kiesler, S., Lau, I. Y., & Chiu, C. Y. (2005). Human mental models of humanoid robots. Proceedings from ICRA'05: IEEE International Conference on Robotics and Automation. Piscataway, NJ: IEEE.
Leslie, A. M., Friedman, O., & German, T. P. (2004). Core mechanisms in "theory of mind." Trends in Cognitive Sciences, 8, 528–533.
Levin, D. T., & Beck, M. R. (2004). Thinking about seeing: Spanning the difference between metacognitive failure and success. In D. T. Levin (Ed.), Thinking and seeing: Visual metacognition in adults and children (pp. 121–143). Cambridge, MA: MIT Press.
Levin, D. T., Drivdahl, S. B., Momen, N., & Beck, M. R. (2002). False predictions about the detectability of unexpected visual changes: The role of beliefs about attention, memory, and the continuity of attended objects in causing change blindness blindness. Consciousness and Cognition, 11, 507–527.
Levin, D. T., Saylor, M. M., Varakin, D. A., Gordon, S. M., Kawamura, K., & Wilkes, D. M. (2006). Thinking about thinking in computers, robots, and people. Proceedings of the 5th Annual International Conference on Development and Learning, 5, 49.
Meltzoff, A. N. (1995). Understanding the intentions of others: Reenactment of intended acts by 18-month-old children. Developmental Psychology, 31, 838–850.
Minsky, M. (1982). Why people think computers can't. The AI Magazine, 4, 3–15.
Moll, H., & Tomasello, M. (2004). 12- and 18-month-old infants follow gaze to spaces behind barriers. Developmental Science, 7, F1–F9.
Morewedge, C. K., Preston, J., & Wegner, D. M. (2007). Timescale bias in the attribution of mind. Journal of Personality and Social Psychology, 93, 1–11.
Reeves, B., & Nass, C. (1996). The media equation. New York, NY: Cambridge University Press.
Rensink, R. A. (2000). The dynamic representation of scenes. Visual Cognition, 7, 17–42.
Searle, J. (1984). Minds, brains, and science. Cambridge, MA: Harvard University Press.
Spelke, E. S., Phillips, A. T., & Woodward, A. L. (1995). Infants' knowledge of object motion and human action. In D. Sperber, D. Premack, & A. Premack (Eds.), Causal cognition: A multidisciplinary debate (pp. 44–78). New York, NY: Oxford University Press.
Wellman, H. M., Cross, D., & Watson, J. (2001). Meta-analysis of theory-of-mind development: The truth about false belief. Child Development, 72, 655–684.
Wimmer, H., & Perner, J. (1983). Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children's understanding of deception. Cognition, 13, 103–128.
Winer, G. A., & Cottrell, J. E. (1996). Does anything leave the eye when we see? Extramission beliefs of children and adults. Current Directions in Psychological Science, 5, 137–142.
Woodward, A. L. (1998). Infants selectively encode the goal object of an actor's reach. Cognition, 69, 1–34.
Woodward, A. L. (2005). Infants' understanding of the actions involved in joint attention. In R. V. Kail (Ed.), Advances in child development and behavior (pp. 229–262). Oxford, UK: Elsevier.
