A Hierarchical Reinforcement Learning Based Artificial Intelligence for Non-Player Characters in Video Games
Hiram Ponce and Ricardo Padilla
Universidad Panamericana
Tecnológico de Monterrey, CCM

13th MICAI 2014, Tuxtla Gutiérrez, Mexico
November 2014

Contents
- Introduction
- Related Work
  - AI-Based Video Game Techniques
  - Reinforcement Learning Mechanisms
- MaxQ-Q Algorithm
- Description of the Proposal
  - Architecture of an NPC
  - Implementation
  - Experiment Design and Metrics
- Results and Discussion
  - Predictability
  - Natural Humanness Assessment
- Conclusions and Future Work


Introduction
Video games constitute a huge industry that is constantly developing new technologies.
Improvements to the user experience in video games:
- Aim to create more entertaining and realistic games for user immersion
- Mostly achieved by tuning graphics and game performance
- Non-player characters (NPCs) are one aspect of game performance
- NPCs that react appropriately to the current situation: AI-based NPCs

Drawbacks of NPCs:
- Tend to be predictable
- Tend to stop challenging users
- Are perceived as straightforward reactive agents

AI-based non-player characters (NPCs):
- Give them the look-and-feel of intelligent agents
- But require natural humanness assessment

Natural humanness assessment is related to the ability of an agent to be as unpredictable as possible, and to learn from and adapt to changes in the environment.


Introduction
Decision-making problem in NPCs:
- Decision responses have to come in real time, i.e., the performance of the game must not be affected
- Decisions have to be more natural, making NPCs more human-like agents

NPC architectures:
- Reactive approaches: NPCs are coded in advance, lacking intelligent performance
- Deliberative approaches: time and memory consumption increase substantially
- A reinforcement learning approach is a trade-off between reactive and deliberative agents

Reinforcement learning approach in NPCs:
- A training step allows NPCs to learn from the environment (e.g., increasing natural humanness assessment)
- An implementation step allows reactive actions (e.g., reducing time consumption)
- The curse of dimensionality is a weakness, but it is mitigated by hierarchical reinforcement learning

This work proposes an alternative solution based on a hierarchical reinforcement learning approach in order to:
- reduce the predictability of NPCs
- make NPCs adapt to changes in the environment
- act in as human-like a way as possible
- provide a more enjoyable game experience for players


Related Work
AI-based video game techniques:
- Finite state machines
  Straightforward behavior, easy to program and design; limited scalability and usability, lacking shared tasks or decision-making processes
- Behavior trees
  Use constraints and heuristic cost functions to achieve good behaviors; good for scalability and task sharing, but the behavior is completely designed and scripted and cannot adapt beyond what was initially established
- STRIPS-like methodology
  Requires precondition models; a set of actions guides the planning toward the goal
- A-star (A*)
  Planning optimization to obtain the set of actions for NPCs
- Supervised learning
  Imitation of players' movements, evolution through the game, time consuming


Related Work
Reinforcement learning mechanisms:
- Reinforcement learning
  - Gives NPCs the ability to adapt to the environment, creating their own plans of action
  - NPCs become more believable characters in games, in contrast with imitation (which is predictable)
  - Experience serves as behavioral motivation, with less dependency on scripting
  - Curse of dimensionality: high-dimensional states, large action sets, time and memory consumption (the flat update rule is sketched after this list)

- Hierarchical reinforcement learning
  - Divides the overall state space of the environment, modeled as a Markov decision process (MDP) or semi-MDP
  - State abstraction reduces the state space within certain subtasks

Common HRL techniques:
- Options
- Hierarchical abstract machines (HAMs)
- MaxQ
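For reference, the flat one-step Q-learning update (textbook material, not shown on the slide); the tabular $Q(s,a)$ it maintains is exactly what grows intractably under the curse of dimensionality, motivating the hierarchical techniques above:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

where $(s, a, r, s')$ is one observed transition, $\alpha$ is the learning rate, and $\gamma$ is the discount factor.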

13th  MICAI  2014,  Tuxtla  Gu;érrez,  Mexico

6

November,  2014

MaxQ-Q Algorithm
The MaxQ method:
- An HRL method that provides a hierarchical decomposition of a problem into subtasks
- The value function (state reward) is likewise divided into a set of functions, one per subtask (formalized below)
- Each subproblem has its own policy and is independent
- Subtasks use a subset of the features (state abstraction) to learn
- A subtask is a triple of terminal states, possible actions, and a pseudo-reward function
- The hierarchical policy is the set of policies of the whole graph
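The decomposition behind the method, in the notation of Dietterich [3] (standard material, not reproduced on the slide): the value of choosing child subtask $a$ inside parent task $i$ splits into the child's own value plus a completion term,

$$Q^{\pi}(i, s, a) = V^{\pi}(a, s) + C^{\pi}(i, s, a), \qquad V^{\pi}(i, s) = \begin{cases} \max_{a} Q^{\pi}(i, s, a) & i \text{ composite} \\ \mathbb{E}\{r \mid s, i\} & i \text{ primitive} \end{cases}$$

where $C^{\pi}(i, s, a)$ is the expected discounted reward for completing parent task $i$ after child $a$ terminates. Each subtask thus learns only its own piece of the global value function.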

The MaxQ-graph:
- A set of nodes representing the whole problem as subtasks
- Max nodes: subtasks
- Q nodes: actions of subtasks

The MaxQ-Q algorithm is an implementation of the MaxQ method.

7

November,  2014

MaxQ-­‐Q  Algorithm

Original Dietterich's MaxQ-Q algorithm (shown in the slides as a pseudocode figure)
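Since the original figure does not survive extraction, here is a compact Java sketch of the core recursion, assuming tabular storage. Task, State, and Environment are illustrative interfaces (not Pogamut's or any real library's API), and the full algorithm's pseudo-reward completion table is omitted for brevity:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the MaxQ-Q recursion (after Dietterich [3]).
interface State {}

interface Task {
    boolean isPrimitive();
    boolean isTerminal(State s);     // T_i(s): has subtask i finished in s?
    Task chooseChild(State s);       // exploration policy over child tasks
    List<Task> children();
}

interface Environment {
    double execute(Task primitive);  // run one primitive action, return its reward
    State currentState();
}

class MaxQQ {
    private final double alpha = 0.25;  // learning rate
    private final double gamma = 0.95;  // discount factor
    private final Map<String, Double> v = new HashMap<>(); // V(i,s) for primitives
    private final Map<String, Double> c = new HashMap<>(); // C(i,s,a) completion values
    private final Environment env;

    MaxQQ(Environment env) { this.env = env; }

    /** Executes subtask i from state s; returns the number of primitive steps used. */
    int run(Task i, State s) {
        if (i.isPrimitive()) {
            double r = env.execute(i);
            String k = key(i, s);
            v.put(k, (1 - alpha) * v.getOrDefault(k, 0.0) + alpha * r);
            return 1;
        }
        int steps = 0;
        while (!i.isTerminal(s)) {
            Task a = i.chooseChild(s);
            int n = run(a, s);                 // run the child subtask to completion
            State s2 = env.currentState();
            // C(i,s,a) <- (1-a) C(i,s,a) + a * gamma^n * max_a' [C(i,s2,a') + V(a',s2)]
            String k = key(i, s, a);
            double target = Math.pow(gamma, n) * value(i, s2);
            c.put(k, (1 - alpha) * c.getOrDefault(k, 0.0) + alpha * target);
            steps += n;
            s = s2;
        }
        return steps;
    }

    /** V(i,s): stored value for primitives, max over children for composite tasks. */
    private double value(Task i, State s) {
        if (i.isPrimitive()) return v.getOrDefault(key(i, s), 0.0);
        double best = Double.NEGATIVE_INFINITY;
        for (Task a : i.children())
            best = Math.max(best, c.getOrDefault(key(i, s, a), 0.0) + value(a, s));
        return best;
    }

    private static String key(Object... parts) { return java.util.Arrays.toString(parts); }
}
```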

8

November,  2014

MaxQ-Q Algorithm
An example of a MaxQ-graph (Dietterich's Taxi task), with Max nodes for subtasks and Q nodes for their actions:
- MaxRoot
  - QGet → MaxGet
    - QPickup → Pickup (primitive)
    - QNavigateForGet → MaxNavigate(t), with t bound to the source
  - QPut → MaxPut
    - QPutdown → Putdown (primitive)
    - QNavigateForPut → MaxNavigate(t), with t bound to the destination
- MaxNavigate(t)
  - QNorth(t) → North, QEast(t) → East, QSouth(t) → South, QWest(t) → West (primitives)
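To connect the graph to the subtask definition two slides back (terminal states, possible actions, pseudo-reward), one way a node such as MaxNavigate(t) could be written down is as a triple; this is a hypothetical encoding whose names mirror the Taxi example, not code from the paper:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

// A subtask as the triple from the MaxQ definition: termination predicate,
// child actions (Q nodes), and a pseudo-reward on termination. Illustrative only.
record Subtask<S>(Predicate<S> isTerminal,
                  List<String> childActions,
                  ToDoubleFunction<S> pseudoReward) {}

class TaxiGraph {
    // MaxNavigate(t): terminates on reaching target t, chooses among the four
    // primitive moves, and here uses a zero pseudo-reward (an assumption).
    static Subtask<int[]> navigate(int[] target) {
        return new Subtask<>(
                pos -> pos[0] == target[0] && pos[1] == target[1],
                List.of("North", "East", "South", "West"),
                pos -> 0.0);
    }
}
```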


Description of the Proposal
Architecture of an NPC:

- Agents
  NPCs using the MaxQ-Q algorithm as their decision-making process, exploiting:
  - Temporal abstraction
  - State abstraction
  - Task sharing

- Environment
  Everything other than the agent, e.g., other NPCs, players, obstacles, etc.

Proposed NPC architecture (figure)


Description of the Proposal
Implementation:
- A simple case study was designed to test and validate the proposed MaxQ-based NPC
- Capture the Flag (CTF):
  - Player: first-person shooter (FPS)
  - Enemy: MaxQ-based NPC
  - Objective of the game: reach the flag and return to the base before the other player finds it first
  - Set of actions (five): move left, move right, move to the front, move back, and stay stopped
- The MaxQ-based NPC was programmed in the Pogamut middleware
  - Low-level AI code written in Java
  - The game environment comes from Unreal Tournament 2004
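A minimal sketch of the five-action primitive layer described above; GameBridge is a hypothetical wrapper, not Pogamut's actual API, and the real movement commands would go through Pogamut's bot controller for UT2004:

```java
// The five primitive actions of the CTF case study, plus an assumed wrapper
// around the game engine. Each leaf Max node of the NPC's MaxQ-graph would
// wrap one CtfAction, so Environment.execute(...) reduces to GameBridge.step(...).
enum CtfAction { MOVE_LEFT, MOVE_RIGHT, MOVE_FRONT, MOVE_BACK, STAY }

interface GameBridge {
    void step(CtfAction a);    // issue one primitive move to the UT2004 bot
    double reward();           // e.g. reward for reaching the flag or the base
}
```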


Description of the Proposal
Example of the FPS view of the CTF strategic game for the case study (figure)


Description of the Proposal
Experiment design and metrics:
- Predictability (quantitative):
  - The ratio of the expected loss of a short-run forecast to the expected loss of a long-run forecast (see the formula after this list)
  - Ranges from predictable (0.0) to unpredictable (1.0); the expectations E are taken over value functions
  - Compared between the MaxQ-based NPC and an FSM-based NPC (the most preferred and most widely implemented AI technique in video games)
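In the notation of Diebold and Kilian [2], the ratio described above can be written as (a transcription of the cited measure, not copied from the slide):

$$U = \frac{\mathbb{E}\left[ L(e_{t+j}) \right]}{\mathbb{E}\left[ L(e_{t+k}) \right]}, \qquad j < k,$$

where $e_{t+h}$ is the $h$-step-ahead forecast error and $L$ is a loss function; $U \to 0$ for a fully predictable series and $U \to 1$ for an unpredictable one. Here the expectations are computed over the NPCs' learned value functions.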

- Human-like behavior (qualitative):
  - Methodology based on the work of Klüwer et al. (2012) [7]
  - A post-survey applied to players once they finished 7 trials of the game
  - Measures two dimensions:
    - entertainment: enjoyable, challenging, non-boring
    - naturalness: subjective predictability, randomized actions, non-adaptation to the user's gameplay, humanness assessment, hardness of beating


Results and Discussion
Predictability (quantitative):
- MaxQ-based NPCs are 52.5% more unpredictable than most NPCs developed for video games
- 75% of the unpredictability values are enclosed below 38.6%, representing strong evidence that MaxQ-based NPCs challenge users with unexpected actions
- Empirically, predictability should sit around 50.0%: a balance between the ability to solve the game and remaining challenging


Results and Discussion
Natural humanness assessment (qualitative):
- Likert scale (no neutral option, for polarization purposes):
  - totally disagree (-2)
  - disagree (-1)
  - agree (+1)
  - totally agree (+2)
  (a scoring sketch follows)
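To make the scoring concrete, a minimal sketch of aggregating the forced-choice responses into a per-dimension mean; this scoring scheme is an assumption, not taken from the paper:

```java
import java.util.List;

// Maps the four-point forced-choice scale (-2, -1, +1, +2; no neutral option)
// to a mean score per survey dimension. Illustrative scoring, not from the paper.
class LikertScore {
    /** Mean over one dimension's items; range [-2, +2], sign gives the polarity. */
    static double dimensionMean(List<Integer> responses) {
        return responses.stream().mapToInt(Integer::intValue).average().orElse(0.0);
    }
}
```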


Conclusions and Future Work
- An NPC based on the MaxQ-Q HRL algorithm was proposed to improve natural humanness assessment
- A simple CTF case study was presented, measuring:
  - predictability (the MaxQ-based NPC is 52.5% more unpredictable than the FSM-based NPC)
  - qualitative measures confirm the previous result
  - non-randomization of actions and non-adaptation to gameplay were perceived by users
  - improved hardness of beating and humanness assessment

This work formally introduces a measurement of predictability for NPCs in video games.
Future work:
- Enlarge the sample size of the case study
- Cover a wider range of users
- Quantitative measurements: online time processing, memory usage, Turing test
- Automated state abstraction in MaxQ-based NPCs


References
[1] A. Botea, R. Herbrich, and T. Graepel. Video Games and Artificial Intelligence. Microsoft Research Cambridge, Sydney, Australia, 2008.
[2] F. X. Diebold and L. Kilian. Measuring predictability: Theory and macroeconomic applications. Journal of Applied Econometrics, 16:657–669, 2001.
[3] T. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13:227–303, 2000.
[4] J. K. Gemrot. Agents for Games and Simulations, chapter Pogamut 3 Can Assist Developers in Building AI (Not Only) for Their Videogame Agents, pages 1–15. Springer, 2010.
[5] R. Herbrich, M. Hatton, and M. Tipping. Mixture model for motion lines in a virtual reality environment. US Patent 7358973 B2, Microsoft Corporation, 2013.
[6] D. Isla. Building a better battle. In Game Developers Conference, San Francisco, 2008.
[7] T. Klüwer, F. Xu, P. Adolphs, and H. Uszkoreit. Evaluation of the KomParse conversational non-player characters in a commercial virtual world. In International Conference on Language Resources and Evaluation, pages 3535–3542, Istanbul, 2012.
[8] J. Llargues, J. Peralta, R. Arrabales, M. Gonzalez, P. Cortez, and A. Lopez. Artificial intelligence approaches for the generation and assessment of believable human-like behaviour in virtual characters. Expert Systems with Applications, 41(15):7281–7290, 2014.


References
[9] R. Miikkulainen. Creating intelligent agents in games. The Bridge, pages 5–13, 2006.
[10] T. Mitchell. Machine Learning. McGraw-Hill, 1997.
[11] R. Parr and S. Russell. Reinforcement learning with hierarchies of machines. In Proceedings of the 1997 Conference on Advances in Neural Information Processing Systems, pages 1043–1049. MIT Press, Cambridge, 1997.
[12] R. Sutton, D. Precup, and S. Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181–211, 1999.
[13] A. Taylor. HQ-DoG: Hierarchical Q-learning in domination games. Master's thesis, The University of Georgia, August 2012.
[14] M. Wooldridge. An Introduction to Multi-Agent Systems. John Wiley & Sons, 2009.


Questions?

Hiram Ponce and Ricardo Padilla
[email protected]
