Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations

Brent Harrison, Upol Ehsan, Mark O. Riedl
Georgia Institute of Technology, Atlanta, Georgia, USA. Correspondence to: Brent Harrison.

arXiv:1702.07826v1 [cs.AI] 25 Feb 2017

Abstract

We introduce AI rationalization, an approach for generating explanations of autonomous system behavior as if a human had done the behavior. We describe a rationalization technique that uses neural machine translation to translate internal state-action representations of the autonomous agent into natural language. We evaluate our technique in the Frogger game environment; the natural language is collected from human players thinking out loud as they play the game. We motivate the use of rationalization as an approach to explanation generation, show the results of experiments on the accuracy of our rationalization technique, and describe a future research agenda.

1. Introduction

Autonomous systems must make complex sequential decisions in the face of uncertainty. Explainable AI refers to artificial intelligence and machine learning techniques that enable human users to understand, appropriately trust, and effectively operate semi-autonomous systems. Explainability becomes important in human-agent (or robot) teams wherein a human operator works alongside an autonomous or semi-autonomous agent. In the event of failure, or if the agent performs unexpected behaviors, it is natural for the human operator to want to know why. Explanations help the human operator understand why an agent failed to achieve a goal or the circumstances whereby the behavior of the agent deviated from the expectations of the human operator. The operator may then take appropriate remedial action: trying again, providing more training to the machine learning algorithms controlling the agent, reporting bugs to the manufacturer, repairing the sensors or effectors, and so on.

Explanation differs from interpretability, which is a feature of an algorithm or representation that affords inspection for the purposes of understanding behavior or results. For example, decision trees are considered to be relatively interpretable, whereas neural networks are generally considered to be uninterpretable without additional processes that visualize patterns of neuron activation (Zeiler & Fergus, 2014; Yosinski et al., 2015). Interpretability is a feature desirable for AI/ML system developers, who can use it to debug and improve their systems. Explanation, on the other hand, is grounded in natural language communication and is theorized to be more useful for non-AI-experts who need to operate autonomous or semi-autonomous systems.

In this paper we introduce a new approach to explainable AI: AI rationalization. AI rationalization is a process of producing an explanation for agent behavior as if a human had done the behavior. AI rationalization is based on the observation that humans do not generally understand how their own brains work and consequently do not give explanations that literally reveal how a decision was made. Rather, it is more likely that humans invent plausible explanations on the spot. Nonetheless, we accept human-generated rationalizations as providing some lay insight into the mind of the other.

AI rationalization has a number of theoretical benefits: (1) by communicating like humans, rationalizations are naturally accessible and intuitive to humans without AI or computer science training; (2) the human-likeness of communication between an autonomous system and a human operator may afford higher degrees of trust, rapport, and willingness to use autonomous systems; (3) rationalization is fast, sacrificing absolute accuracy for real-time response, which is appropriate for real-time human-agent collaboration. Note that AI rationalizations, like human-produced rationalizations, are not required to be accurate, but only to give some insight into the agent's decision-making process. Should deeper, more accurate explanations or interpretations be necessary, rationalizations may need to be supplemented by other explanation, interpretation, or visualization techniques.

We propose a technique for AI rationalization that treats the generation of explanations as a problem of translation between state-action-state sequences that are produced during execution of an autonomous system and natural language.


Language translation technology using encoder-decoder neural networks (e.g., Luong et al., 2015) has reached a level of maturity such that it is used in commercial language-to-language translation. To our knowledge, this is the first time it has been used to translate the internal data structures of a reinforcement learning (RL) agent into natural language. To translate state-action representations into natural language, we collect a corpus of natural language utterances from people performing the task that we will later ask the autonomous system to perform. We treat the string serializations of state-action pairs and natural language utterances as equivalent and train an encoder-decoder neural network that must find a neural embedding that produces a rational natural language output when presented with state-action input.

Specifically, we apply AI rationalization to an agent that plays the game Frogger (Figure 1). We do this for two reasons. First, we aim to show that the technique works on an autonomous system without adaptation: we use an arbitrary state representation designed for the game without consideration for rationalization. This environment is also notable because conventional learning algorithms do not learn to play Frogger the way human players do, and our target audience would not be expected to understand the specific information about how an agent learns to play this game or why it makes the decisions that it does. Most people will also probably not have knowledge of the specific state representation used in these environments. Second, we aim to establish a baseline that we can compare to later investigations of AI rationalization applied to state abstractions learned by deep learning techniques such as convolutional neural networks.

The contributions of our paper are as follows:

• We introduce the concept of AI rationalization as an approach to explainable AI.
• We describe a technique for generating rationalizations that treats explanation generation as a language translation problem.
• We report on an experiment using synthetic data to assess the accuracy of the translation technique.
• We identify a number of subsequent hypotheses about AI rationalization.

The remainder of the paper is organized as follows. In Section 2 we review related work on explainable AI. In Section 3 we describe our AI rationalization technique. Section 4 describes specific details of how we used rationalization in experiments designed to assess the accuracy of the technique. In Section 5 we propose a number of next steps in the investigation of rationalization as an approach to explainable AI.

Figure 1. The Frogger game environment.

2. Related Work

Interpretability has received an increasing amount of attention recently. For a model to be interpretable it must be possible to explain why it generates certain outputs or behaves in a certain way. Inherently, some machine learning techniques produce models that are more interpretable than others. Models such as decision trees (Letham et al., 2015), generalized additive models (Caruana et al., 2015), and attention-based approaches (Xu et al., 2015) have the benefit of being naturally interpretable. This is because one need only examine the output to understand the model's decision-making process (in the case of rule-based approaches such as decision trees) or examine internal features which reveal which areas of the model's feature space have the greatest influence on prediction (in the case of attention- and regression-based approaches). While these techniques are effective, relying solely on them for interpretable models greatly inhibits machine learning researchers by limiting the types of models they can use for learning.

An alternate approach to creating interpretable machine learning models involves creating separate models of explainability that are often built on top of black-box techniques such as neural networks. These approaches, sometimes called model-agnostic approaches (Ribeiro et al., 2016), allow greater flexibility in model selection since they enable black-box models to become interpretable; however, the added layer of complexity incurred by creating a model specifically for explainability can be costly in terms of training time and explanation quality when compared to naturally interpretable models.

[Figure 2 diagram: task states and actions, annotated by humans, feed corpus generation; the parallel corpus trains an LSTM encoder-decoder model that outputs natural language explanations (rationalizations).]

Figure 2. The workflow for our proposed neural machine translation approach to rationalization generation.

A common approach to creating model-agnostic explanations is to visualize the internal state of the model, as has been done with convolutional neural networks (Zeiler & Fergus, 2014; Yosinski et al., 2015). Other approaches seek to learn a naturally interpretable model which describes the predictions that were made (Krause et al., 2016), or intelligently modify model inputs so that the resulting models can describe how outputs are affected.

Explainable AI has been explored in the context of ad-hoc techniques for transforming simulation logs to explanations (van Lent et al., 2004), intelligent tutoring systems (Core et al., 2006), and transforming AI plans into natural language (van Lent et al., 2005). Our work differs from this previous work in that we apply explainable AI techniques to sequential decision-making in stochastic domains and specifically explore rationalization rather than explanation. By focusing on rationalization we relax the requirement that explanations be faithful to the original machine learning algorithm's decision-making process. This allows us more flexibility in creating explanations; however, situations may arise where rationalizations would be detrimental to a human seeking to understand an algorithm's motives.

3. AI Rationalization

Rationalization is a form of explanation that attempts to justify or explain an action or behavior. Whereas explanation implies an accurate account of a decision-making process, AI rationalization suggests that the explanation is the one that a human would most likely give were he or she in full control of an agent or robot. There is no clear guidance on what makes a good explanation; for an agent using Q-learning (Watkins & Dayan, 1992), explanations of decisions could range from "the action had the highest Q value given this state" to "I have explored numerous possible future state-action trajectories from this point and deemed this action to be the most likely to achieve the highest expected reward according to iterative application of the Bellman update equation." The closer the explanation is to the actual implementation of the decision-making or machine learning algorithm, the more background knowledge the human operator will require to make sense of the explanation.

Rationalization is geared toward human operators who may not have significant background knowledge in computer science or artificial intelligence (including machine learning). Rationalizations generated by an autonomous or semi-autonomous system do not need to accurately reflect the true decision-making process underlying the agent system, but must still give the operator some insight into what the agent is doing and trying to achieve. Human rationalizations are often produced quickly, and thus we propose that AI rationalizations should also be generated quickly; the ideal use case is in real-time critical scenarios in which it may be inconvenient or impossible to quickly produce an explanation that can be understood and acted upon by a human operator. If more accurate explanations are necessary, rationalization can be coupled with other explanation, interpretation, or visualization techniques that require greater time and greater cognitive effort to understand.

This paper, which is the first to explore rationalization as an approach to explanation, examines the accuracy of rationalizations generated by neural machine translation techniques. Our approach for translating state-action-state sequences to natural language consists of two general steps. First, we must create a training corpus of natural language and state-action pairs. Second, we use this corpus to train an encoder-decoder network to translate the state and action information to natural language. Our system's workflow is shown in Figure 2.
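As a concrete illustration of the first step, the sketch below shows one way an (s1, a, s2) triple might be serialized into a flat string and paired with a think-aloud utterance. The token format, field names, and the example utterance are our own illustrative assumptions; the approach only requires that states and actions be converted to strings in a consistent fashion.

```python
# Minimal sketch of corpus construction (step one). The serialization format,
# field names, and example utterance below are illustrative assumptions.

def serialize_state(frog_xy, grid_rows):
    """Flatten a Frogger state into a whitespace-delimited token string."""
    x, y = frog_xy
    row_tokens = [" ".join(row) for row in grid_rows]   # cell tokens, e.g. 'car', 'log', '_'
    return f"x_{x} y_{y} " + " | ".join(row_tokens)

def serialize_example(s1, action, s2):
    """Serialize an (s1, a, s2) triple into the 'machine language' side of the corpus."""
    return f"{serialize_state(*s1)} ACT_{action} {serialize_state(*s2)}"

# One parallel-corpus entry: a machine-language string paired with a think-aloud utterance.
grid = [["_", "car", "_"], ["log", "_", "_"], ["_", "_", "_"]]
source = serialize_example(((1, 0), grid), "UP", ((1, 1), grid))
target = "I moved up because the lane above me was clear."
print(source)
print(target)
```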


3.1. Training Corpus

In order to learn to translate from state-action pairs to language, we must first create a corpus of natural language descriptions of behavior and their corresponding state and action information. To do this, we ask people to complete the agent's task in a virtual environment and prompt them to "think aloud" as they perform the task. We record the visited states and performed actions along with the natural language utterances about critical states and actions. This method of corpus creation ensures that the annotations gathered are associated with specific states and actions. In essence, we create parallel corpora, one of which contains state representations and actions, the other containing natural language utterances. The state representations and actions constitute a "machine language". The general approach is agnostic to the details of how the parallel corpora are obtained; we describe the specific methodology we used to create the parallel corpora for the purpose of experimentation in Section 4.1.

The precise representation of states and actions in the autonomous system does not matter as long as they can be converted to strings in a consistent fashion. An autonomous agent may have a state representation tailored to a specific environment or may use a neural network to learn a representation of the environment. Our approach emphasizes that it should not matter how the state representation is structured, and the human operator should not need to know how to interpret it.

3.2. Translation from Internal Representation to Natural Language

In this work, we use encoder-decoder networks to translate between complex state and action information and natural language rationalizations. Encoder-decoder networks, which have primarily been used in machine translation and dialogue systems, are a generative architecture comprising two component networks that learn how to translate an input sequence X = (x1, ..., xT) into an output sequence Y = (y1, ..., yT'). The first component network, the encoder, is a recurrent neural network (RNN) that learns to encode the input sequence X into a fixed-length context vector v. This vector is then used as input to the second component network, the decoder, an RNN that learns how to iteratively decode this vector into the target output Y.

Since the input data that we work with in this paper can be quite long, the context vector may lose information about the input sequence. To address this, we specifically use an encoder-decoder network with an added attention mechanism (Luong et al., 2015).

Figure 3. Testing and training maps made with 25% obstacles (left), 50% obstacles (center), and 75% obstacles (right).

The attention mechanism allows the decoder network to selectively peek at the input data and use this information, along with the context vector, to iteratively predict outputs.
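The sketch below shows a minimal PyTorch version of this kind of architecture. The two-layer LSTMs, embedding layer, and hidden size of 300 follow the configuration reported in Section 4.1.3; the dot-product (Luong-style) attention, the vocabulary sizes, and the start-of-sentence handling are simplifying assumptions rather than the authors' exact implementation.

```python
# Minimal PyTorch sketch of an attentional encoder-decoder for rationalization.
# Layer counts and hidden size follow the paper; attention variant and vocab
# sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden=300, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, num_layers=layers, batch_first=True)

    def forward(self, src):                       # src: (batch, src_len) token ids
        outputs, state = self.lstm(self.embed(src))
        return outputs, state                     # outputs: (batch, src_len, hidden)

class AttnDecoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden=300, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden * 2, vocab_size)

    def forward(self, prev_tok, state, enc_outputs):
        # prev_tok: (batch, 1) previous target token
        dec_out, state = self.lstm(self.embed(prev_tok), state)       # (batch, 1, hidden)
        # Dot-product attention: score decoder output against every encoder position.
        scores = torch.bmm(dec_out, enc_outputs.transpose(1, 2))      # (batch, 1, src_len)
        weights = F.softmax(scores, dim=-1)
        context = torch.bmm(weights, enc_outputs)                     # (batch, 1, hidden)
        logits = self.out(torch.cat([dec_out, context], dim=-1))      # (batch, 1, vocab)
        return logits, state

# Forward pass on dummy data: a 40-token serialized state-action string,
# decoded one target token at a time (teacher forcing would drive training).
enc, dec = Encoder(vocab_size=500), AttnDecoder(vocab_size=800)
src = torch.randint(0, 500, (4, 40))
enc_outputs, state = enc(src)
prev = torch.zeros(4, 1, dtype=torch.long)   # assumed start-of-sentence id 0
logits, state = dec(prev, state, enc_outputs)
print(logits.shape)                          # torch.Size([4, 1, 800])
```

During training, the decoder would typically be stepped through the reference rationalization with teacher forcing and a cross-entropy loss over its output tokens.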

4. Experiments

The objective of the experiments described below is to assess the accuracy of the neural machine translation approach to rationalization. Evaluating natural language generation is challenging; utterances can be "correct" even if they do not exactly match known utterances from a testing dataset. To facilitate the assessment of rationalizations generated by our technique, we devised an evaluation in which semi-synthetic natural language was paired against state-action representations internal to an autonomous system. The semi-synthetic language was produced by observing humans "thinking out loud" while performing a task and then creating a grammar that reproduced and generalized the utterances (described below). This evaluation methodology enables us to identify the rules that fire during autonomous system execution and compare rationalizations generated by the neural machine translation network against a ground truth. Later, we can replace the grammar with a corpus trained entirely on humans performing the task in the wild.

We conducted the experiments by generating rationalizations for states and actions in the game Frogger. In this environment, the agent is a frog that must navigate from the bottom of the screen to the top. Scattered throughout the environment are obstacles that the agent must avoid as well as platforms that the agent must jump on in order to successfully reach its goal. The actions available to the agent are movement in the four cardinal directions and an action for standing still. We chose Frogger as an experimental domain because computer games have been demonstrated to be good stepping stones toward real-world stochastic environments (Laird & van Lent, 2001; Mnih et al., 2015) and because Frogger is fast-paced, stochastic, has a reasonably rich state space, and yet can be learned optimally without too much trouble.


Furthermore, unlike other games, the maps can be easily modified to produce playable variations, and Frogger's starting point can be changed.

We evaluate our rationalization technique against two baselines. The first baseline randomly selects any sentence that can be generated by the testing grammar. The second baseline always selects sentences associated with the rule that is most commonly used to generate rationalizations on a given map.

4.1. Methodology

4.1.1. Grammar Creation

In order to translate between state information and natural language, we first need ground-truth language containing rationalizations that can be associated explicitly with state and action information. To generate this information, we used crowdsourcing to gather a set of gameplay videos of human participants playing Frogger while engaging in a think-aloud protocol, following prior work (Dorst & Cross, 2001; Fonteyn et al., 1993). In total, we gathered recordings of 12 participants from 3 continents (4 from Asia, 3 from Europe, and 5 from North America) who were recruited online. In these recordings, we specifically asked participants to verbalize their internal thoughts while playing Frogger as if they were speaking to another person. Each person was allowed to play one level of the game, and each person played the same game level as every other participant.

After players completed the game, they uploaded their gameplay video to an online speech transcription service, which generated a transcript of each player's speech from their video recording. The transcript was then presented to each player, who was asked to map important utterances to specific actions available in the game. Players were also asked to assign utterances to pre-assigned stages of the game (e.g., being in the starting row, being in the region of the game with cars, etc.). This process produced 225 action-rationalization trace pairs of gameplay aligned with speech.

Typically, think-aloud protocol studies lack self-validation that gives the participant the voice to select important aspects of their transcript. Our method of running a think-aloud study, however, gives players some autonomy in specifying important utterances that might not be discovered otherwise. This layer of self-validation improves the robustness of the data.

We then use these action-rationalization annotations to generate a grammar for generating sentences. The role of this grammar is to act as a representation of the reasoning that humans go through when they generate rationalizations or explanations for their actions.
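To give a flavor of what such a grammar looks like, here is a toy context-free rule set and expansion routine, written in Python in the spirit of the Tracery grammars described below. The rule names and sentence templates are hypothetical stand-ins, not the authors' actual grammar.

```python
# Toy Tracery-style context-free grammar for rationalizations. Rule names and
# sentence templates are hypothetical illustrations.
import random

grammar = {
    "move_up_clear": ["I moved #direction# because the #lane# ahead was clear."],
    "wait_for_car":  ["I #waited# because a car was about to pass in front of me."],
    "direction":     ["up", "forward"],
    "lane":          ["lane", "row"],
    "waited":        ["waited", "stayed still"],
}

def expand(symbol_or_text, rules):
    """Recursively expand #symbol# references, Tracery-style."""
    if symbol_or_text in rules:                 # a nonterminal: pick one expansion
        symbol_or_text = random.choice(rules[symbol_or_text])
    while "#" in symbol_or_text:
        pre, sym, post = symbol_or_text.split("#", 2)
        symbol_or_text = pre + expand(sym, rules) + post
    return symbol_or_text

# A grammar rule fires for an (s1, a, s2) triple and yields a ground-truth rationalization.
print(expand("move_up_clear", grammar))
```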

Thus, a grammar allows us to associate rules about the environment with the rationalizations that are generated. One benefit of using a grammar is that it allows us to explicitly evaluate how accurate our system is at producing the right kind of rationalizations. Since the grammar contains the rules that govern when certain rationalizations are generated, it allows us to compare automatically generated rationalizations against a ground truth that one would not normally have if the entire corpus were crowdsourced in the wild.

To generate the grammar, we used qualitative coding schemes and clustered the user annotation data that we gathered previously. Our technique was inspired by Grounded Theory (Strauss & Corbin, 1994), which provides an analytical tool and facilitates discovery of emerging patterns in data (Walsh et al., 2015). Once the data was clustered, we used Tracery (Compton et al., 2014), an authoring tool based on a standard context-free grammar (CFG), to build the grammar that we used to generate ground-truth rationalizations.

4.1.2. Training and Test Set Generation

Since we are using a grammar to produce ground-truth rationalizations, one can interpret the role of the encoder-decoder network as learning to reproduce this grammar. In order to train the network to do this, we use the grammar to generate rationalizations for each state in the environment. The rules that the grammar uses to generate rationalizations are based on a combination of the state of the world and the action taken. Specifically, the grammar uses the following triple to determine which rationalizations to generate: (s1, a, s2). Here, s1 is the starting state of the world, a is the action that was performed in s1, and s2 is the resulting state of the world after action a is executed. States s1 and s2 consist of the (x, y) coordinates of the agent and the current state of the grid environment. Our grammar generates a rationalization for each possible (s1, a, s2) triple in the environment. These triples are then clustered based on the rules that were used to generate their rationalizations.

For evaluation, we take 20% of the triples in each of these clusters and set them aside as a testing set. This ensures that the testing set is a representative sample of the parent population while still containing example triples associated with each rule in the grammar. To aid in training, we duplicate training examples until the training set contains 1000 examples per grammar rule. We then inject noise into these training samples in order to help avoid overfitting and to help the network generalize to other map types. Recall that the input to the encoder-decoder network is a triple of the form (s1, a, s2), where s1 and s2 are states, and that each state includes a full representation of the Frogger map at the current time. To inject noise, we randomly select 30% of the rows in this map representation for both s1 and s2 and redact them by replacing them with a dummy value. This set of examples becomes our training set.
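The following sketch shows how the training set described above might be assembled: examples are duplicated to 1000 per grammar rule, and 30% of the map rows in each state are redacted. The dummy token name and data layout are illustrative assumptions rather than the authors' exact pipeline.

```python
# Sketch of training-set construction: duplicate to 1000 examples per rule,
# then redact 30% of map rows with a dummy value. Layout is an assumption.
import random

DUMMY_ROW_TOKEN = "<redacted>"

def redact_rows(serialized_rows, fraction=0.3):
    """Replace a random `fraction` of the map rows with a dummy value."""
    rows = list(serialized_rows)
    k = int(len(rows) * fraction)
    for i in random.sample(range(len(rows)), k):
        rows[i] = DUMMY_ROW_TOKEN
    return rows

def build_training_set(examples_by_rule, per_rule=1000):
    """examples_by_rule: rule id -> list of ((s1_rows, a, s2_rows), rationalization)."""
    training = []
    for rule, examples in examples_by_rule.items():
        # Duplicate (sampling with replacement) until each rule has `per_rule` examples.
        duplicated = [random.choice(examples) for _ in range(per_rule)]
        for (s1_rows, action, s2_rows), sentence in duplicated:
            noisy_triple = (redact_rows(s1_rows), action, redact_rows(s2_rows))
            training.append((noisy_triple, sentence))
    return training
```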


To evaluate the extent to which rationalization generalizes to environments other than the one the encoder-decoder network was trained on, we developed three different maps. The first map was randomly generated by filling 25% of the bottom region with car obstacles and 25% of the top region with log platforms. The second map was 50% cars/logs and the third map was 75% cars/logs (see Figure 3). For the remainder of the paper, we refer to these maps as the 25% map, the 50% map, and the 75% map respectively. We also ensured that it was possible to complete each of these maps, as a loose control on map quality.

4.1.3. Training and Testing the Network

The parallel corpus of state-action representations and natural language is used to train an encoder-decoder neural translation model based on (Luong et al., 2015). Specifically, we use a 2-layered encoder-decoder network with an embedding layer and an attention mechanism. The encoder and decoder networks are long short-term memory (LSTM) recurrent neural networks in which each LSTM node has a hidden size of 300. We train the network for 50 epochs, after which the trained model is used to generate rationalizations for each triple in the testing set. To further test the generalization of this technique, we also have each model produce rationalizations for the triples in the testing sets of the other maps used during evaluation. Consider the 25% map as an example. After training, the model trained on the 25% map will generate rationalizations for states sampled from the 25% map. To test generalization, this model will also be asked to generate rationalizations for states sampled from the 50% map as well as the 75% map.

The goal of this evaluation is to determine how appropriate a rationalization is by determining how well it matches what the grammar would produce for the same triple. To evaluate accuracy, we need a way to associate the sentence predicted by our model with a rule that exists in our grammar. Because encoder-decoder networks are generative, we cannot explicitly match the generated sentences to sentences that are generated by the grammar. To determine the set of rules that are associated with a predicted sentence, we use the BLEU score (Papineni et al., 2002) to calculate sentence similarity. The BLEU score is a common metric used in machine translation to determine the quality of translated sentences with respect to a reference sentence. We calculate the BLEU score between the sentence generated by our predictive model and each sentence that can be generated by the grammar, and record the sentence that achieves the highest score. We can then identify which rule in the grammar would have been used to generate this sentence and use that to calculate accuracy. If this rule matches the rule that was used to produce the test sentence, we say that it is a match. Total accuracy is defined as the percentage of predictions that matched their associated test example. If the predicted sentence does not achieve a BLEU score greater than 0.7 when compared against any reference sentence, we say that the predicted sentence does not match any sentence in the grammar, and it is automatically counted as a mismatch. This threshold ensures that rationalizations of low grammatical or syntactic quality are not erroneously matched to rules in the grammar. It is possible for a generated sentence to be associated with more than one rule in the grammar; if the rule that generated the test sentence matches at least one of the rules associated with the generated sentence, we also count this as a match.
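A sketch of this matching procedure is shown below, using NLTK's sentence-level BLEU as a stand-in for whatever BLEU implementation the authors used. The tokenization and smoothing choices are assumptions; the 0.7 threshold and the multi-rule tie handling follow the description above.

```python
# Sketch of the BLEU-based rule-matching evaluation. NLTK's sentence_bleu is a
# stand-in implementation; tokenization and smoothing are assumptions.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def match_rules(predicted, grammar_sentences, threshold=0.7):
    """grammar_sentences: list of (rule_id, reference_sentence) the grammar can produce.
    Returns the set of rule ids tied for the best BLEU score, or an empty set
    if the best score does not clear the threshold (counted as a mismatch)."""
    smooth = SmoothingFunction().method1
    hyp = predicted.lower().split()
    best_score, best_rules = 0.0, set()
    for rule_id, reference in grammar_sentences:
        score = sentence_bleu([reference.lower().split()], hyp, smoothing_function=smooth)
        if score > best_score:
            best_score, best_rules = score, {rule_id}
        elif score == best_score:
            best_rules.add(rule_id)
    return best_rules if best_score > threshold else set()

def accuracy(test_examples, grammar_sentences):
    """test_examples: list of (true_rule_id, predicted_sentence)."""
    hits = sum(1 for rule, pred in test_examples
               if rule in match_rules(pred, grammar_sentences))
    return hits / len(test_examples)
```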

4.2. Results

A screenshot of an agent trained to play Frogger and "thinking out loud" after every action using our system is shown in Figure 4. The results of our experiments can be found in Table 1. As the table shows, the encoder-decoder network consistently outperformed both the random baseline and the majority-vote baseline. Comparing the maps to each other, the encoder-decoder network produced the highest accuracy when generating rationalizations for the 75% map, followed by the 25% map and the 50% map respectively. To evaluate the significance of the observed differences between these models, we ran a chi-squared test between the models produced by the encoder-decoder network and the random predictor, as well as between the encoder-decoder network models and the majority classifier. Each difference was statistically significant (p < 0.05) across all three maps.

To test how well this technique generalizes to other types of maps, we performed another set of experiments in which models built on specific maps were used to predict test sentences generated by the grammar for different maps. The results of these experiments can be found in Table 2. These results show that the model built on the 25% map had difficulty generating the correct rationalizations for states drawn from the 50% map as well as the 75% map, achieving accuracy values of 0.124 and 0.159 respectively. The models trained on the 50% map and the 75% map had similar success generating rationalizations for states drawn from the 25% map, achieving accuracy values of 0.453 and 0.442 respectively. The model trained on the 50% map did reasonably well generating rationalizations for states drawn from the 75% map (accuracy = 0.674).


This value is comparable to how the same model performed generating rationalizations for states sampled from the 50% map (accuracy = 0.687). The model trained on the 75% map also performed comparably on states sampled from the 50% map (0.732) and on those sampled from the 75% map (0.80).

Figure 4. The Frogger agent rationalizing its actions.

Table 1. Accuracy values for Frogger environments with different obstacle densities. Accuracy values for sentences produced by the encoder-decoder network (full) significantly outperform those generated by a random model and a majority classifier as determined by a chi-square test.

Map              Full    Random   Majority vote
25% obstacles    0.777   0.00     0.168
50% obstacles    0.687   0.00     0.204
75% obstacles    0.80    0.00     0.178

4.3. Discussion

The first thing to note about our results is that the models produced by the encoder-decoder network significantly outperformed the baseline models in terms of accuracy. This means that the network was able to learn when it was appropriate to generate certain rationalizations better than the random and majority baseline models. Given the nature of our test set, this also gives evidence to the claim that these models can generalize to unseen states. It is important to note, however, that claims about generalization based on the results in Table 1 only extend to states drawn from maps with similar obstacle densities. While it is not surprising that encoder-decoder networks were able to outperform these baselines, the margin of difference between these models is worth noting. The performance of both the random and majority classifiers is a testament to the complexity of this problem.

Table 2. Accuracy values for Frogger environments with different obstacle densities. Models were tested on the test sets created from different Frogger maps.

                 Training map
Testing Map      25% obst.   50% obst.   75% obst.
25% obstacles    –           0.453       0.442
50% obstacles    0.124       –           0.732
75% obstacles    0.159       0.674       –

The results of our second set of experiments warrant further discussion as well. The goal of these experiments was to see how well models would generalize to states sampled from maps with different obstacle densities. What we found is that, in general, the model trained on the 25% map did not generalize well to the other maps. This is likely because this map contained significantly fewer obstacles than the other two maps. During learning, the encoder-decoder network must learn how to associate certain obstacle configurations with natural language rationalizations. Since there were very few obstacles in the 25% map, the network did not have enough information to learn how to translate between states with high obstacle densities and natural language. In other words, this model did not generalize well because there were no samples suitable for learning about states with high obstacle density.

The models trained on the 50% and 75% maps fared much better at generalizing to different maps. Both models achieved similar performance on the test set sampled from the 25% map. These accuracy values, while still well below the highest accuracy values achieved, support the idea that the presence of more obstacles in the training environment makes it more likely that the model will be able to generalize. This is likely due to the nature of the grammar rules used to generate the training sentences in these environments. These rules often used the presence of different obstacles in Frogger to determine when to generate certain sentences. Since these environments contained a high density of obstacles, it was more likely that the network could learn state abstractions useful in predicting when certain natural language should be generated.

It is also interesting to note that the model trained on the 75% map achieved similar accuracy values on the 75% map test set and the 50% map test set, and we observe similar behavior with the model trained on the 50% map. This most likely occurs because both of these maps ended up with similar obstacle densities due to variance in the map generation algorithm. This result further supports the claim that these models will generalize to different maps of similar densities.

5. Future Work

Having determined that neural machine translation can accurately produce rationalizations for state-action pairs, the next step is to investigate hypotheses about how rationalizations impact human trust, rapport, and willingness to use autonomous and semi-autonomous systems. To address these questions, it will be necessary to conduct experiments in which non-expert human operators work with autonomous systems to achieve a particular task. Questionnaires can be devised to assess human operators' subjective feelings of trust and rapport. Treating trust and rapport as dependent variables, between-subject studies can be devised so that different populations work with autonomous systems with and without rationalization abilities. One can look at the effects of "thinking out loud" during execution versus rationalizing only when something unexpected occurs. Further, one can devise interventions to ensure that a failure happens or that the agent is trained to optimize a reward function that does not match operator expectations.

Since rationalizations do not truly reflect the autonomous system's underlying decision-making process, the question arises of how inaccurate rationalizations can be before subjective feelings of trust and rapport are significantly affected. The experimental methodology we describe above can be adapted to inject increasingly more error into the rationalizations.

6. Conclusions

AI rationalization provides a new lens through which we can explore the question of what makes a good explanation of the decisions of an autonomous or semi-autonomous agent. In the future, we envision an increasing need for human operators of autonomous or semi-autonomous systems who do not have computer science or AI backgrounds. These operators will naturally want to know why an agent fails to achieve an objective or why its behavior deviates from expectations. Further, operators may need to quickly and effortlessly model the reasoning behind agent decisions in real-time scenarios. Since AI rationalization is the process of generating explanations as if a human had done the behavior, we hypothesize that rationalization will increase human operators' feelings of trust, rapport, and comfort in using autonomous and semi-autonomous systems.

We have shown that our approach to AI rationalization, using neural machine translation from the autonomous agent's internal state and action representations to natural language, produces rationalizations with accuracies above baselines.

Testing hypotheses about trust and rapport is the next step in determining the applicability of rationalization as an approach to explainable AI. Rationalization allows autonomous systems to appear more natural and human-like in their decision-making even when their decision-making processes can be very complicated and different from human reasoning. We believe that AI rationalization can be an important step toward the deployment of real-world commercial robotic systems, including in healthcare, disability support, personal services, and military teamwork.

References

Caruana, Rich, Lou, Yin, Gehrke, Johannes, Koch, Paul, Sturm, Marc, and Elhadad, Noemie. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1721–1730. ACM, 2015.

Compton, Kate, Filstrup, Benjamin, et al. Tracery: Approachable story grammar authoring for casual users. In Seventh Intelligent Narrative Technologies Workshop, 2014.

Core, Mark, Lane, H. Chad, van Lent, Michael, Gomboc, Dave, Solomon, Steve, and Rosenberg, Milton. Building explainable artificial intelligence systems. In Proceedings of the 18th Innovative Applications of Artificial Intelligence Conference, Boston, MA, July 2006.

Dorst, Kees and Cross, Nigel. Creativity in the design process: Co-evolution of problem–solution. Design Studies, 22(5):425–437, 2001.

Fonteyn, Marsha E., Kuipers, Benjamin, and Grobe, Susan J. A description of think aloud method and protocol analysis. Qualitative Health Research, 3(4):430–441, 1993.

Krause, Josua, Perer, Adam, and Ng, Kenney. Interacting with predictions: Visual inspection of black-box machine learning models. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pp. 5686–5697. ACM, 2016.

Laird, John and van Lent, Michael. Human-level AI's killer application: Interactive computer games. AI Magazine, 22(2):15–25, 2001.

Letham, Benjamin, Rudin, Cynthia, McCormick, Tyler H., Madigan, David, et al. Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. The Annals of Applied Statistics, 9(3):1350–1371, 2015.


Luong, Minh-Thang, Pham, Hieu, and Manning, Christopher D. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1412–1421, Lisbon, Portugal, September 2015. Association for Computational Linguistics.

Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David, Rusu, Andrei, Veness, Joel, Bellemare, Marc, Graves, Alex, Riedmiller, Martin, Fidjeland, Andreas, Ostrovski, Georg, Petersen, Stig, Beattie, Charles, Sadik, Amir, Antonoglou, Ioannis, King, Helen, Kumaran, Dharshan, Wierstra, Daan, Legg, Shane, and Hassabis, Demis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.

Papineni, Kishore, Roukos, Salim, Ward, Todd, and Zhu, Wei-Jing. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318. Association for Computational Linguistics, 2002.

Ribeiro, Marco Tulio, Singh, Sameer, and Guestrin, Carlos. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144. ACM, 2016.

Strauss, Anselm and Corbin, Juliet. Grounded theory methodology. Handbook of Qualitative Research, 17:273–285, 1994.

van Lent, Michael, Fisher, William, and Mancuso, Michael. An explainable artificial intelligence system for small-unit tactical behavior. In Proceedings of the 16th Conference on Innovative Applications of Artificial Intelligence, 2004.

van Lent, Michael, Carpenter, Paul, McAlinden, Ryan, and Brobst, Paul. Increasing replayability with deliberative and reactive planning. In 1st Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), pp. 135–140, Marina del Rey, California, June 2005.

Walsh, Isabelle, Holton, Judith A., Bailyn, Lotte, Fernandez, Walter, Levina, Natalia, and Glaser, Barney. What grounded theory is: A critically reflective conversation among scholars. Organizational Research Methods, 18(4):581–599, 2015.

Watkins, Chris and Dayan, Peter. Q-learning. Machine Learning, 8(3-4):279–292, 1992.

Xu, Kelvin, Ba, Jimmy, Kiros, Ryan, Cho, Kyunghyun, Courville, Aaron, Salakhutdinov, Ruslan, Zemel, Richard, and Bengio, Yoshua. Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of the 32nd International Conference on Machine Learning, pp. 2048–2057, 2015.

Yosinski, Jason, Clune, Jeff, Fuchs, Thomas, and Lipson, Hod. Understanding neural networks through deep visualization. In ICML Workshop on Deep Learning, 2015.

Zeiler, Matthew D. and Fergus, Rob. Visualizing and understanding convolutional networks. In European Conference on Computer Vision, pp. 818–833. Springer, 2014.
