A Hierarchical Reinforcement Learning Based Artificial Intelligence for Non-Player Characters in Video Games
Hiram Ponce and Ricardo Padilla
Universidad Panamericana
Tecnológico de Monterrey, CCM

13th MICAI 2014, Tuxtla Gutiérrez, Mexico
November 2014

Contents
- Introduction
- Related Work
  - AI-Based Video Game Techniques
  - Reinforcement Learning Mechanisms
- MaxQ-Q Algorithm
- Description of the Proposal
  - Architecture of an NPC
  - Implementation
  - Experiment Design and Metrics
- Results and Discussion
  - Predictability
  - Natural Humanness Assessment
- Conclusions and Future Work


Introduction
Video games constitute a huge industry that is constantly developing new technologies.
Improvements to the user experience in video games:
- Aim to create more entertaining and realistic games for user immersion
- Mostly achieved by tuning graphics and game performance
- Non-player characters (NPCs) are one aspect of game performance
- NPCs that react appropriately to the current situation: AI-based NPCs

Drawbacks of NPCs:
- Tend to be predictable
- Tend to stop challenging users
- Are perceived as straightforward reactive agents

AI-based non-player characters (NPCs):
- Give them the look-and-feel of intelligent agents
- But require natural humanness assessment

Natural humanness assessment is related to the ability of an agent to be as unpredictable as possible, and to learn from and adapt to changes in the environment.


Introduction
Decision-making problem in NPCs:
- Decision responses have to come in real time, i.e., the performance of the game must not be affected
- Decisions have to be more natural, making NPCs more human-like agents

NPC architectures:
- Reactive approaches: NPCs are coded in advance, lacking intelligent performance
- Deliberative approaches: time and memory consumption increase substantially
- A reinforcement learning approach is a trade-off between reactive and deliberative agents

Reinforcement learning approach in NPCs:
- A training step allows NPCs to learn from the environment (e.g., increasing natural humanness assessment)
- An implementation step allows reactive actions (e.g., reducing time consumption)
- The curse of dimensionality is a weakness, but it is mitigated by hierarchical reinforcement learning

This work proposes an alternative solution based on a hierarchical reinforcement learning approach in order to:
- reduce the predictability of NPCs
- make NPCs adapt to changes in the environment
- act in as human-like a way as possible
- provide a more enjoyable game experience for players


Related Work
AI-based video game techniques:
- Finite state machines
  Straightforward behavior, easy to program and design; limited scalability and usability, lacking shared tasks or decision-making processes
- Behavior trees
  Use constraints and heuristic cost functions to achieve good behaviors; good for scalability and task sharing, but the behavior is completely designed and scripted and cannot adapt beyond what was initially established
- STRIPS-like methodology
  Requires precondition models; a set of actions guides the planning toward the goal
- A-star (A*)
  Planning optimization to obtain the set of actions for NPCs
- Supervised learning
  Imitation of players' movements, evolution through the game, time consuming


Related Work
Reinforcement learning mechanisms:
- Reinforcement learning
  - Gives NPCs the ability to adapt to the environment, creating their own plans of action
  - NPCs become more believable characters in games, in contrast with imitation (which is predictable)
  - Experience serves as behavioral motivation, with less dependency on scripting
  - Curse of dimensionality: high-dimensional states, large action sets, time and memory consumption (the flat update rule is sketched after this list)

- Hierarchical reinforcement learning
  - Divides the overall state space of the environment, modeled as a Markov decision process (MDP) or semi-MDP
  - State abstraction reduces the state space within certain subtasks

Common HRL techniques:
- Options
- Hierarchical abstract machines (HAMs)
- MaxQ
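For reference, the flat one-step Q-learning update (textbook material, not shown on the slide); the tabular $Q(s,a)$ it maintains is exactly what grows intractably under the curse of dimensionality, motivating the hierarchical techniques above:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

where $(s, a, r, s')$ is one observed transition, $\alpha$ is the learning rate, and $\gamma$ is the discount factor.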

13th  MICAI  2014,  Tuxtla  Gu;érrez,  Mexico

6

November,  2014

MaxQ-Q Algorithm
The MaxQ method:
- An HRL method that provides a hierarchical decomposition of a problem into subtasks
- The value function (state reward) is likewise divided into a set of functions, one per subtask (formalized below)
- Each subproblem has its own policy and is independent
- Subtasks use a subset of the features (state abstraction) to learn
- A subtask is a triple of terminal states, possible actions, and a pseudo-reward function
- The hierarchical policy is the set of policies of the whole graph
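The decomposition behind the method, in the notation of Dietterich [3] (standard material, not reproduced on the slide): the value of choosing child subtask $a$ inside parent task $i$ splits into the child's own value plus a completion term,

$$Q^{\pi}(i, s, a) = V^{\pi}(a, s) + C^{\pi}(i, s, a), \qquad V^{\pi}(i, s) = \begin{cases} \max_{a} Q^{\pi}(i, s, a) & i \text{ composite} \\ \mathbb{E}\{r \mid s, i\} & i \text{ primitive} \end{cases}$$

where $C^{\pi}(i, s, a)$ is the expected discounted reward for completing parent task $i$ after child $a$ terminates. Each subtask thus learns only its own piece of the global value function.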

The MaxQ-graph:
- A set of nodes representing the whole problem as subtasks
- Max nodes: subtasks
- Q nodes: actions of subtasks

The MaxQ-Q algorithm is an implementation of the MaxQ method.

7

November,  2014

MaxQ-­‐Q  Algorithm

Original Dietterich's MaxQ-Q algorithm (shown in the slides as a pseudocode figure)
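Since the original figure does not survive extraction, here is a compact Java sketch of the core recursion, assuming tabular storage. Task, State, and Environment are illustrative interfaces (not Pogamut's or any real library's API), and the full algorithm's pseudo-reward completion table is omitted for brevity:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the MaxQ-Q recursion (after Dietterich [3]).
interface State {}

interface Task {
    boolean isPrimitive();
    boolean isTerminal(State s);     // T_i(s): has subtask i finished in s?
    Task chooseChild(State s);       // exploration policy over child tasks
    List<Task> children();
}

interface Environment {
    double execute(Task primitive);  // run one primitive action, return its reward
    State currentState();
}

class MaxQQ {
    private final double alpha = 0.25;  // learning rate
    private final double gamma = 0.95;  // discount factor
    private final Map<String, Double> v = new HashMap<>(); // V(i,s) for primitives
    private final Map<String, Double> c = new HashMap<>(); // C(i,s,a) completion values
    private final Environment env;

    MaxQQ(Environment env) { this.env = env; }

    /** Executes subtask i from state s; returns the number of primitive steps used. */
    int run(Task i, State s) {
        if (i.isPrimitive()) {
            double r = env.execute(i);
            String k = key(i, s);
            v.put(k, (1 - alpha) * v.getOrDefault(k, 0.0) + alpha * r);
            return 1;
        }
        int steps = 0;
        while (!i.isTerminal(s)) {
            Task a = i.chooseChild(s);
            int n = run(a, s);                 // run the child subtask to completion
            State s2 = env.currentState();
            // C(i,s,a) <- (1-a) C(i,s,a) + a * gamma^n * max_a' [C(i,s2,a') + V(a',s2)]
            String k = key(i, s, a);
            double target = Math.pow(gamma, n) * value(i, s2);
            c.put(k, (1 - alpha) * c.getOrDefault(k, 0.0) + alpha * target);
            steps += n;
            s = s2;
        }
        return steps;
    }

    /** V(i,s): stored value for primitives, max over children for composite tasks. */
    private double value(Task i, State s) {
        if (i.isPrimitive()) return v.getOrDefault(key(i, s), 0.0);
        double best = Double.NEGATIVE_INFINITY;
        for (Task a : i.children())
            best = Math.max(best, c.getOrDefault(key(i, s, a), 0.0) + value(a, s));
        return best;
    }

    private static String key(Object... parts) { return java.util.Arrays.toString(parts); }
}
```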

8

November,  2014

MaxQ-Q Algorithm
An example of a MaxQ-graph (Dietterich's Taxi task), with Max nodes for subtasks and Q nodes for their actions:
- MaxRoot
  - QGet → MaxGet
    - QPickup → Pickup (primitive)
    - QNavigateForGet → MaxNavigate(t), with t bound to the source
  - QPut → MaxPut
    - QPutdown → Putdown (primitive)
    - QNavigateForPut → MaxNavigate(t), with t bound to the destination
- MaxNavigate(t)
  - QNorth(t) → North, QEast(t) → East, QSouth(t) → South, QWest(t) → West (primitives)
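To connect the graph to the subtask definition two slides back (terminal states, possible actions, pseudo-reward), one way a node such as MaxNavigate(t) could be written down is as a triple; this is a hypothetical encoding whose names mirror the Taxi example, not code from the paper:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

// A subtask as the triple from the MaxQ definition: termination predicate,
// child actions (Q nodes), and a pseudo-reward on termination. Illustrative only.
record Subtask<S>(Predicate<S> isTerminal,
                  List<String> childActions,
                  ToDoubleFunction<S> pseudoReward) {}

class TaxiGraph {
    // MaxNavigate(t): terminates on reaching target t, chooses among the four
    // primitive moves, and here uses a zero pseudo-reward (an assumption).
    static Subtask<int[]> navigate(int[] target) {
        return new Subtask<>(
                pos -> pos[0] == target[0] && pos[1] == target[1],
                List.of("North", "East", "South", "West"),
                pos -> 0.0);
    }
}
```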


Description of the Proposal
Architecture of an NPC:

- Agents
  NPCs using the MaxQ-Q algorithm as their decision-making process, exploiting:
  - Temporal abstraction
  - State abstraction
  - Task sharing

- Environment
  Everything other than the agent, e.g., other NPCs, players, obstacles, etc.

Proposed NPC architecture (figure)


Description of the Proposal
Implementation:
- A simple case study was designed to test and validate the proposed MaxQ-based NPC
- Capture the Flag (CTF):
  - Player: first-person shooter (FPS)
  - Enemy: MaxQ-based NPC
  - Objective of the game: reach the flag and return to the base before the other player finds it first
  - Set of actions (five): move left, move right, move to the front, move back, and stay stopped
- The MaxQ-based NPC was programmed in the Pogamut middleware
  - Low-level AI code written in Java
  - The game environment comes from Unreal Tournament 2004
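A minimal sketch of the five-action primitive layer described above; GameBridge is a hypothetical wrapper, not Pogamut's actual API, and the real movement commands would go through Pogamut's bot controller for UT2004:

```java
// The five primitive actions of the CTF case study, plus an assumed wrapper
// around the game engine. Each leaf Max node of the NPC's MaxQ-graph would
// wrap one CtfAction, so Environment.execute(...) reduces to GameBridge.step(...).
enum CtfAction { MOVE_LEFT, MOVE_RIGHT, MOVE_FRONT, MOVE_BACK, STAY }

interface GameBridge {
    void step(CtfAction a);    // issue one primitive move to the UT2004 bot
    double reward();           // e.g. reward for reaching the flag or the base
}
```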


Description of the Proposal
Example of the FPS view of the CTF strategic game for the case study (figure)


Description of the Proposal
Experiment design and metrics:
- Predictability (quantitative):
  - The ratio of the expected loss of a short-run forecast to the expected loss of a long-run forecast (see the formula after this list)
  - Ranges from predictable (0.0) to unpredictable (1.0); the expectations E are taken over value functions
  - Compared between the MaxQ-based NPC and an FSM-based NPC (the most preferred and most widely implemented AI technique in video games)
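In the notation of Diebold and Kilian [2], the ratio described above can be written as (a transcription of the cited measure, not copied from the slide):

$$U = \frac{\mathbb{E}\left[ L(e_{t+j}) \right]}{\mathbb{E}\left[ L(e_{t+k}) \right]}, \qquad j < k,$$

where $e_{t+h}$ is the $h$-step-ahead forecast error and $L$ is a loss function; $U \to 0$ for a fully predictable series and $U \to 1$ for an unpredictable one. Here the expectations are computed over the NPCs' learned value functions.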

- Human-like behavior (qualitative):
  - Methodology based on the work of Klüwer et al. (2012) [7]
  - A post-survey applied to players once they finished 7 trials of the game
  - Measures two dimensions:
    - entertainment: enjoyable, challenging, non-boring
    - naturalness: subjective predictability, randomized actions, non-adaptation to the user's gameplay, humanness assessment, hardness of beating


Results and Discussion
Predictability (quantitative):
- MaxQ-based NPCs are 52.5% more unpredictable than most NPCs developed for video games
- 75% of the unpredictability values are enclosed below 38.6%, representing strong evidence that MaxQ-based NPCs challenge users with unexpected actions
- Empirically, predictability should sit around 50.0%: a balance between the ability to solve the game and remaining challenging


Results and Discussion
Natural humanness assessment (qualitative):
- Likert scale (no neutral option, for polarization purposes):
  - totally disagree (-2)
  - disagree (-1)
  - agree (+1)
  - totally agree (+2)
  (a scoring sketch follows)
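To make the scoring concrete, a minimal sketch of aggregating the forced-choice responses into a per-dimension mean; this scoring scheme is an assumption, not taken from the paper:

```java
import java.util.List;

// Maps the four-point forced-choice scale (-2, -1, +1, +2; no neutral option)
// to a mean score per survey dimension. Illustrative scoring, not from the paper.
class LikertScore {
    /** Mean over one dimension's items; range [-2, +2], sign gives the polarity. */
    static double dimensionMean(List<Integer> responses) {
        return responses.stream().mapToInt(Integer::intValue).average().orElse(0.0);
    }
}
```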


Conclusions and Future Work
- An NPC based on the MaxQ-Q HRL algorithm was proposed to improve natural humanness assessment
- A simple CTF case study was presented, measuring:
  - predictability (the MaxQ-based NPC is 52.5% more unpredictable than the FSM-based NPC)
  - qualitative measures confirm the previous result
  - non-randomization of actions and non-adaptation to gameplay were perceived by users
  - improved hardness of beating and humanness assessment

This work formally introduces a measurement of predictability for NPCs in video games.
Future work:
- Enlarge the sample size of the case study
- Cover a wider range of users
- Quantitative measurements: online time processing, memory usage, Turing test
- Automated state abstraction in MaxQ-based NPCs


References
[1] A. Botea, R. Herbrich, and T. Graepel. Video Games and Artificial Intelligence. Microsoft Research Cambridge, Sydney, Australia, 2008.
[2] F. X. Diebold and L. Kilian. Measuring predictability: Theory and macroeconomic applications. Journal of Applied Econometrics, 16:657–669, 2001.
[3] T. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13:227–303, 2000.
[4] J. K. Gemrot. Agents for Games and Simulations, chapter Pogamut 3 Can Assist Developers in Building AI (Not Only) for Their Videogame Agents, pages 1–15. Springer, 2010.
[5] R. Herbrich, M. Hatton, and M. Tipping. Mixture model for motion lines in a virtual reality environment. US Patent 7358973 B2, Microsoft Corporation, 2013.
[6] D. Isla. Building a better battle. In Game Developers Conference, San Francisco, 2008.
[7] T. Klüwer, F. Xu, P. Adolphs, and H. Uszkoreit. Evaluation of the KomParse conversational non-player characters in a commercial virtual world. In International Conference on Language Resources and Evaluation, pages 3535–3542, Istanbul, 2012.
[8] J. Llargues, J. Peralta, R. Arrabales, M. Gonzalez, P. Cortez, and A. Lopez. Artificial intelligence approaches for the generation and assessment of believable human-like behaviour in virtual characters. Expert Systems with Applications, 41(15):7281–7290, 2014.


References
[9] R. Miikkulainen. Creating intelligent agents in games. The Bridge, pages 5–13, 2006.
[10] T. Mitchell. Machine Learning. McGraw-Hill, 1997.
[11] R. Parr and S. Russell. Reinforcement learning with hierarchies of machines. In Proceedings of the 1997 Conference on Advances in Neural Information Processing Systems, pages 1043–1049. MIT Press, Cambridge, 1997.
[12] R. Sutton, D. Precup, and S. Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181–211, 1999.
[13] A. Taylor. HQ-DoG: Hierarchical Q-learning in domination games. Master's thesis, The University of Georgia, August 2012.
[14] M. Wooldridge. An Introduction to Multi-Agent Systems. John Wiley & Sons, 2009.


Questions?

Hiram Ponce and Ricardo Padilla
[email protected]
