A Hierarchical Reinforcement Learning Based Artificial Intelligence for Non-Player Characters in Video Games
Hiram Ponce and Ricardo Padilla
Universidad Panamericana; Tecnológico de Monterrey, CCM
13th MICAI 2014, Tuxtla Gutiérrez, Mexico
November, 2014
Contents
Introduction
Related Work: AI-Based Video Game Techniques, Reinforcement Learning Mechanisms, MaxQ-Q Algorithm
Description of the Proposal: Architecture of an NPC, Implementation, Experiment Design and Metrics
Results and Discussion: Predictability, Natural Humanness Assessment
Conclusions and Future Work
Introduction
Video games form a huge industry that is constantly developing new technologies.
Improvements to the user experience in video games:
- Aim at creating more entertaining and realistic games for user immersion
- Mostly achieved by manipulating graphics and game performance
- Non-player characters (NPCs) are one aspect of game performance
- AI-based NPCs interact according to the current situation
Drawbacks of current NPCs:
- Tend to be predictable
- Tend to stop challenging users
- Are perceived as straightforward reactive agents
AI-based non-player characters (NPCs):
- Give them the look-and-feel of intelligent agents
- But require a natural humanness assessment
Natural humanness assessment is related to the ability of an agent to be as unpredictable as possible, and to learn and adapt to changes in the environment.
Introduction
Decision-making problem in NPCs:
- Decision responses have to be computed in real time, i.e. game performance must not be affected
- Decisions have to be more natural, so that NPCs behave as more human-like agents
NPC architectures:
- Reactive approaches: NPCs are coded in advance, lacking intelligent performance
- Deliberative approaches: time and memory consumption increase substantially
- Reinforcement learning as a trade-off between reactive and deliberative agents
Reinforcement learning approach in NPCs:
- A training step allows NPCs to learn from the environment (e.g. increasing the natural humanness assessment)
- An implementation step allows reactive actions (e.g. reducing time consumption)
- The curse of dimensionality is a weakness, but it is mitigated by hierarchical reinforcement learning
This work proposes an alternative solution based on the hierarchical reinforcement learning approach in order to:
- reduce the predictability of NPCs
- make them adapt to changes in the environment
- act as human-like agents as possible
- provide a more enjoyable game experience for players
Related Work
AI-based video game techniques:
- Finite state machines: straightforward behavior, easy to program and design; less scalability and usability; lacking in task sharing or decision-making processes
- Behavior trees: use of constraints and heuristic cost functions to achieve good behaviors; good for scalability and task sharing, but completely designed and scripted, so they cannot adapt beyond the initially established behavior
- STRIPS-like methodology: requires precondition models; a set of actions guides the planning toward the goal
- A-star (A*): planning optimization to obtain the set of actions for NPCs
- Supervised learning: imitation of players' movements, evolution through the game, time consuming
Related Work
Reinforcement learning mechanisms:
- Reinforcement learning
  - Gives NPCs the ability to adapt to the environment, creating their own plans of actions
  - NPCs become more believable characters in games, in contrast with imitation (predictable)
  - Experience as a behavioral motivation, less dependency on scripted NPCs
  - Curse of dimensionality: high-dimensional states, large sets of actions, time and memory consuming
- Hierarchical reinforcement learning
  - Divides the overall state of the environment, modeled as Markov decision processes (MDPs) or semi-MDPs
  - State abstraction for state reduction in certain subtasks
- Common HRL techniques: Options, hierarchical abstract machines (HAMs), MaxQ
MaxQ-Q Algorithm
MaxQ method:
- HRL method that provides a hierarchical decomposition of a problem into subtasks
- The value function (state reward) is also divided into a set of functions, one per subtask (see the decomposition below)
- Each sub-problem has its own policy and is independent
- Subtasks use a subset of features (state abstraction) to learn
- A subtask is a triple of terminal states, possible actions, and a pseudo-reward function
- The hierarchical policy is the set of policies over the whole graph
MaxQ-graph:
- Set of nodes representing the whole problem as subtasks
- Max nodes: subtasks
- Q nodes: actions of subtasks
The MaxQ-Q algorithm is an implementation of the MaxQ method.
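As a quick reminder (a sketch following Dietterich [3], written in our own notation rather than reproduced from the slides), the decomposition splits the value of a parent task into the value of the invoked child plus a completion term:

```latex
% MaxQ value function decomposition (after Dietterich [3]).
% Q(i, s, a): value of doing child a in state s inside parent task i.
% C(i, s, a): expected reward for completing task i after child a terminates.
Q(i, s, a) = V(a, s) + C(i, s, a),
\qquad
V(i, s) =
\begin{cases}
\max_{a} Q(i, s, a) & \text{if } i \text{ is a composite subtask},\\
\mathbb{E}\,[\, r \mid s, i \,] & \text{if } i \text{ is a primitive action}.
\end{cases}
```

Because each completion term C(i, s, a) only needs the features relevant to subtask i, this is where the state abstraction mentioned above pays off.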
MaxQ-Q Algorithm
[Figure: Dietterich's original MaxQ-Q algorithm (pseudocode)]
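The pseudocode itself does not survive the slide export, so the following is a minimal, undiscounted Java sketch of the recursive MaxQ-Q update (no pseudo-rewards or state abstraction; class and method names are ours for illustration, not the authors' Pogamut code):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Minimal sketch of the recursive MaxQ-Q update (undiscounted, no pseudo-rewards). */
public class MaxQQ {

    public interface Task {
        String name();
        boolean isPrimitive();
        boolean isTerminal(int state);
        List<Task> children();        // empty for primitive tasks
        double execute(int state);    // primitive tasks only: act and return the reward
    }

    public interface Environment {
        int currentState();
    }

    private final Environment env;
    private final Map<String, Double> v = new HashMap<>(); // V(primitive task, state)
    private final Map<String, Double> c = new HashMap<>(); // C(parent, state, child)
    private final double alpha = 0.1;                      // learning rate
    private final double epsilon = 0.1;                    // exploration rate
    private final Random rng = new Random();

    public MaxQQ(Environment env) { this.env = env; }

    /** Executes subtask i from state s, learning V and C; returns the states it visited. */
    public List<Integer> learn(Task i, int s) {
        List<Integer> seq = new ArrayList<>();
        if (i.isPrimitive()) {
            double r = i.execute(s);
            v.merge(i.name() + "|" + s, alpha * r, (old, x) -> (1 - alpha) * old + x);
            seq.add(s);
            return seq;
        }
        while (!i.isTerminal(s)) {
            Task a = chooseChild(i, s);              // epsilon-greedy over Q(i, s, .)
            List<Integer> childSeq = learn(a, s);    // recurse into the child subtask
            int sNext = env.currentState();
            Task aStar = greedyChild(i, sNext);
            double target = q(i, sNext, aStar);      // C(i, s', a*) + V(a*, s')
            for (int visited : childSeq) {           // update every state the child visited
                c.merge(cKey(i, visited, a), alpha * target, (old, x) -> (1 - alpha) * old + x);
            }
            seq.addAll(childSeq);
            s = sNext;
        }
        return seq;
    }

    /** Q(i, s, a) = V(a, s) + C(i, s, a): the MaxQ value decomposition. */
    private double q(Task i, int s, Task a) {
        return value(a, s) + c.getOrDefault(cKey(i, s, a), 0.0);
    }

    /** V(i, s): learned value for primitives, greedy recursion for composite tasks. */
    private double value(Task i, int s) {
        if (i.isPrimitive()) return v.getOrDefault(i.name() + "|" + s, 0.0);
        double best = Double.NEGATIVE_INFINITY;
        for (Task a : i.children()) best = Math.max(best, q(i, s, a));
        return best;
    }

    private Task greedyChild(Task i, int s) {
        Task best = i.children().get(0);
        for (Task a : i.children()) if (q(i, s, a) > q(i, s, best)) best = a;
        return best;
    }

    private Task chooseChild(Task i, int s) {
        if (rng.nextDouble() < epsilon) return i.children().get(rng.nextInt(i.children().size()));
        return greedyChild(i, s);
    }

    private String cKey(Task i, int s, Task a) {
        return i.name() + "|" + s + "|" + a.name();
    }
}
```

Training amounts to calling learn on the root task once per episode; at play time the learned hierarchy is simply followed greedily.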
MaxQ-Q Algorithm
[Figure: an example of a MaxQ-graph (Dietterich's taxi task). Max nodes: MaxRoot, MaxGet, MaxPut, MaxNavigate(t). Q nodes: QGet, QPut, QPickup, QPutdown, QNavigateForGet (t/source), QNavigateForPut (t/destination), QNorth(t), QEast(t), QSouth(t), QWest(t). Primitive actions: Pickup, Putdown, North, East, South, West.]
Description of the Proposal
Architecture of an NPC:
- Agents: NPCs using the MaxQ-Q algorithm as their decision-making process, exploiting temporal abstraction, state abstraction, and task sharing
- Environment: anything other than the agent, e.g. other NPCs, players, obstacles, etc.
[Figure: proposed NPC architecture]
A minimal sketch of how the two pieces fit together is shown below.
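As an illustration only (reusing the MaxQQ sketch from the previous section; none of these class names come from the authors' Pogamut implementation), the agent side reduces to a task hierarchy plus a learner, while everything the NPC senses and acts upon sits behind the environment interface:

```java
/**
 * Hypothetical wiring of the proposed architecture: the NPC (agent) owns a MaxQ-Q
 * learner and the root of its task hierarchy; the rest of the game world (map,
 * players, other bots) is hidden behind the Environment interface.
 */
public class MaxQNpc {
    private final MaxQQ learner;
    private final MaxQQ.Task root;            // root Max node of the task graph
    private final MaxQQ.Environment world;

    public MaxQNpc(MaxQQ.Environment world, MaxQQ.Task root) {
        this.world = world;
        this.root = root;
        this.learner = new MaxQQ(world);
    }

    /** Training step: one episode of learning by interacting with the environment. */
    public void trainEpisode() {
        learner.learn(root, world.currentState());
    }
    // At play time the same hierarchy is followed greedily, which keeps the
    // per-decision cost low (the "reactive" implementation step from the slides).
}
```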
Description of the Proposal
Implementation:
- A simple case study was designed to test and validate the proposed MaxQ-based NPC
- Capture the Flag (CTF): the player is a first-person shooter (FPS); the enemy is the MaxQ-based NPC
- Objective of the game: reach the flag and return to the base before the other player finds it first
- Set of actions (five): move left, move right, move to the front, move back, and stay stopped
- The MaxQ-based NPC is programmed in the Pogamut middleware, with the low-level AI code in Java; the game environment comes from Unreal Tournament 2004
One plausible task decomposition for this NPC is sketched below.
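The slides only list the five primitive actions, so everything above the primitives in the following graph is our guess; a hypothetical MaxQ-graph for the CTF NPC (illustrating how the navigation subtask can be shared) could look like this:

```java
import java.util.List;
import java.util.Map;

/** Hypothetical MaxQ-graph for the CTF NPC: parent task -> child subtasks/actions. */
public class CtfHierarchy {
    static final List<String> PRIMITIVES =
            List.of("MoveLeft", "MoveRight", "MoveFront", "MoveBack", "StayStopped");

    static final Map<String, List<String>> GRAPH = Map.of(
            "Root",           List.of("GetFlag", "ReturnToBase"),
            "GetFlag",        List.of("NavigateToFlag"),
            "ReturnToBase",   List.of("NavigateToBase"),
            "NavigateToFlag", PRIMITIVES,   // both navigation subtasks share the
            "NavigateToBase", PRIMITIVES);  // same five primitive actions
}
```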
Description of the Proposal
[Figure: example of the FPS view of the CTF game for the case study]
Description of the Proposal
Experiment Design and Metrics:
- Predictability (quantitative): the ratio of the expected loss of a short-run forecast to the expected loss of a long-run forecast, ranging from predictable (0.0) to unpredictable (1.0); the expectations E are computed over the value functions (see the formula after this list)
- Compared between the MaxQ-based NPC and an FSM-based NPC (the most preferred and most widely implemented AI technique in video games)
- Human-like behavior (qualitative): methodology based on the work of Kluwer et al. (2012) [7]; a post-survey applied to players once they finished 7 trials of the game; measuring two dimensions: entertainment (enjoyable, challenging, non-boring) and naturalness (subjective predictability, randomized actions, non-adaptation to the user's gameplay, humanness assessment, hardness of beating)
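One way to write that metric down (our reconstruction from the verbal definition, following Diebold and Kilian [2]; the loss L, errors e, and horizons j < k are our notation):

```latex
% Unpredictability as a loss ratio over forecasts of the NPC's value functions.
% e_{t+h|t}: h-step-ahead forecast error, L: loss function, j < k: short/long horizons.
U(j, k) \;=\;
\frac{\mathbb{E}\left[ L\!\left( e_{t+j \mid t} \right) \right]}
     {\mathbb{E}\left[ L\!\left( e_{t+k \mid t} \right) \right]},
\qquad 0 \le U \le 1 .
```

U close to 0 means short-run forecasts are much better than long-run ones (a predictable NPC); U close to 1 means they are no better (an unpredictable NPC).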
Results and Discussion
Predictability (quantitative):
- MaxQ-based NPCs are 52.5% more unpredictable than most NPCs developed for video games
- 75% of the unpredictability values lie below 38.6%, strong evidence that MaxQ-based NPCs challenge users with unexpected actions
- Empirically, predictability should be around 50.0%: a balance between the ability to solve the game and being challenging
13th MICAI 2014, Tuxtla Gu;érrez, Mexico
14
November, 2014
Results and Discussion
Natural humanness assessment (qualitative):
- Likert scale (no neutral option, for polarization purposes): totally disagree (-2), disagree (-1), agree (1), totally agree (2)
13th MICAI 2014, Tuxtla Gu;érrez, Mexico
15
November, 2014
Conclusions and Future Work
- An NPC based on the MaxQ-Q HRL algorithm for improving the natural humanness assessment
- A simple CTF case study was presented, measuring:
  - predictability (the MaxQ-based NPC is 52.5% more unpredictable than the FSM-based NPC)
  - qualitative measures that confirm the previous result
  - non-randomization of actions and non-adaptation of gameplay were perceived by users
  - improved hardness of beating and humanness assessment
- This work formally introduces a measurement of predictability for NPCs in video games
Future work:
- Enlarge the sample size of the case study, covering a wider range of users
- Quantitative measurements: online processing time, memory usage, Turing test
- Automated state abstraction in MaxQ-based NPCs
13th MICAI 2014, Tuxtla Gu;érrez, Mexico
16
November, 2014
References
[1] A. Botea, R. Herbrich, and T. Graepel. Video Games and Artificial Intelligence. Microsoft Research Cambridge, Sydney, Australia, 2008.
[2] F. X. Diebold and L. Kilian. Measuring predictability: Theory and macroeconomic applications. Journal of Applied Econometrics, 16:657–669, 2001.
[3] T. Dietterich. Hierarchical reinforcement learning with MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13:227–303, 2000.
[4] J. K. Gemrot. Agents for Games and Simulations, chapter Pogamut 3 Can Assist Developers in Building AI (Not Only) For Their Videogame Agents, pages 1–15. Springer, 2010.
[5] R. Herbrich, M. Hatton, and M. Tipping. Mixture model for motion lines in a virtual reality environment. Technical Report US Patent 7358973 B2, Microsoft Corporation, 2013.
[6] D. Isla. Building a better battle. In Game Developers Conference, San Francisco, 2008.
[7] T. Kluwer, F. Xu, P. Adolphs, and H. Uszkoreit. Evaluation of the KomParse conversational non-player characters in a commercial virtual world. In International Conference on Language Resources and Evaluation, pages 3535–3542, Istanbul, 2012.
[8] J. Llargues, J. Peralta, R. Arrabales, M. Gonzalez, P. Cortez, and A. Lopez. Artificial intelligence approaches for the generation and assessment of believable human-like behaviour in virtual characters. Expert Systems With Applications, 41(15):7281–7290, 2014.
13th MICAI 2014, Tuxtla Gu;érrez, Mexico
17
November, 2014
References
[9] R. Miikkulainen. Creating intelligent agents in games. The Bridge, pages 5–13, 2006.
[10] T. Mitchell. Machine Learning. McGraw Hill, 1997.
[11] R. Parr and S. Russell. Reinforcement learning with hierarchies of machines. In Proceedings of the 1997 Conference on Advances in Neural Information Processing Systems, pages 1043–1049. MIT Press, Cambridge, 1997.
[12] R. Sutton, D. Precup, and S. Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181–211, 1999.
[13] A. Taylor. HQ-DoG: Hierarchical Q-learning in domination games. Master's thesis, The University of Georgia, August 2012.
[14] M. Wooldridge. An Introduction to Multi-Agent Systems. John Wiley & Sons, 2009.
13th MICAI 2014, Tuxtla Gu;érrez, Mexico
18
November, 2014
Questions?
Hiram Ponce and Ricardo Padilla
[email protected]