2014 11th International Conference on Information Technology: New Generations

Adaptive Game AI Architecture with Player Modeling

Emanuel Mineda Carneiro, Adilson Marques da Cunha, Luiz Alberto Vieira Dias
Computer Science Division
Brazilian Aeronautics Institute of Technology (ITA)
Sao Jose dos Campos - SP - Brazil
[email protected], [email protected], [email protected]

Abstract—Computer-controlled characters in games are expected to present realistic behaviors and to adapt them to the current situation. This paper proposes an approach for a reliable real-time adaptive game AI architecture based upon player modeling.

Keywords—Game AI; decision making; adaptive behavior; player modeling.

I. INTRODUCTION

Even though there have been enormous advances in computer graphics, animation, and audio for games, most of them contain very basic Artificial Intelligence (AI) techniques, if any [1]. It is common practice to artificially increase challenge by creating an imbalance between Non-Player Characters (NPCs) and Player Characters (PCs) rather than using a better game AI. The problem with this approach, based upon the use of non-adaptive game AI, is that once a weakness is discovered, nothing stops the human player from exploiting the discovery [2].

Adaptive game AI, capable of changing an NPC's reaction based upon player behavior, could solve this problem. Unfortunately, this kind of AI, which usually makes use of machine learning techniques, often needs a huge amount of historical data and/or several training sessions to present a satisfactory performance. In addition, some adaptive game AI can generate undesirable behaviors during the learning process. These behaviors could be considered efficient when evaluated using some set of metrics, but will look unnatural to a human spectator.

These issues are aggravated in linear single player games. In this case, there is scant historical data available, and the probability of playing against an important NPC again after it has been defeated is, in most cases, zero. These factors make it impossible to effectively apply classical machine learning training techniques.

This paper's objective is to present an approach for creating a game AI architecture capable of adapting its behavior online, in a fast and reliable way, according to the player's inputs, improving the challenge level and reducing the predictability of NPCs.

978-1-4799-3187-3/14 $31.00 © 2014 IEEE DOI 10.1109/ITNG.2014.40

II. RELATED WORK

Computer-controlled characters in games are expected to present realistic behaviors and to adapt when faced with disadvantageous situations. A lot of research has been conducted in recent years to fulfill this expectation.

Some works focused on rule base adaptation and generation. Spronck et al. [3] proposed a technique named Dynamic Scripting, based upon reinforcement learning, which could dynamically adapt a rule base between game sessions. Moreover, rule base generation using evolutionary algorithms was presented by Crocomo and Simoes [4]. These works do not account for changes in players' behaviors during game sessions, and are thus unable to adapt in real time when underperforming.

Reliable real-time adaptive game AI has already been proposed in other works. For instance, Bakkes and van den Herik [2] have proposed a case-based approach to decide NPCs' behavior. This approach bases its decision upon the best possible outcome, calculated by matching the game state with a database of gameplay samples collected from several game sessions. As this approach's performance relies on a substantial amount of gameplay samples, it is expected to present good results in games that have access to a common database server to store and recover these samples.

An approach based upon player modeling [5] and Markov Decision Processes (MDPs) was proposed by Tan and Cheng [6]. This approach combines an online-updated player model with an offline-updated world model to decide on the NPC's next action.

There are other adaptive approaches to the problem, each one suited to a specific range of scenarios. This paper's architecture, like the one proposed by Tan and Cheng [6], uses player modeling to identify different player styles. Instead of using this information to decide on the next action, it tries to find, in real time, the best behavior against that specific type of player. Each behavior is defined by a deterministic Decision Making System (DMS) and is as reliable as desired by the designer scripting it.
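For context, the core idea behind Dynamic Scripting [3] is to keep a weight per rule, select rules for a script in proportion to their weights, and adjust the weights after each encounter. The sketch below is a generic illustration of that idea, not Spronck et al.'s exact algorithm; the rule names, reward value, and weight bounds are hypothetical.

```python
import random

def select_script(rulebase: dict, script_size: int, rng: random.Random) -> list:
    """Weighted selection (without replacement) of rules for one script."""
    pool = dict(rulebase)
    script = []
    for _ in range(min(script_size, len(pool))):
        names = list(pool)
        weights = [pool[n] for n in names]
        chosen = rng.choices(names, weights=weights, k=1)[0]
        script.append(chosen)
        del pool[chosen]
    return script

def update_weights(rulebase: dict, script: list, reward: float,
                   min_w: float = 1.0, max_w: float = 100.0) -> None:
    """Reinforce the rules used in the script; clamp weights to a valid range."""
    for name in script:
        rulebase[name] = min(max_w, max(min_w, rulebase[name] + reward))

# Hypothetical combat rule base with uniform initial weights.
rules = {"charge": 10.0, "fireball": 10.0, "heal": 10.0, "flee": 10.0}
rng = random.Random(0)
script = select_script(rules, script_size=2, rng=rng)
update_weights(rules, script, reward=5.0)  # e.g., after a won encounter
```

Over many encounters, rules that contribute to victories accumulate weight and are selected more often, which is how the rule base adapts between sessions.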

III. PROPOSED ARCHITECTURE

This section discusses the proposed architecture, presenting details on all of its components. This work focuses on decision making. An implementation of the proposed architecture is a DMS, which fits the game AI model proposed by Millington and Funge [7], as presented in Fig. 1.

Figure 1: Game AI model [7]

A. Player model

The adopted player model is based upon the TAP model [8], in which a player is described by its actions or by information generated from them. The TAP model was extended by adding weighted connections between it and all available DMSs. Each weight should reflect the effectiveness of its DMS when used against a player that matches the TAP model. Fig. 2 presents an example of a player model.

Figure 2: A player model based upon the TAP model [8]

Research, as in [9], has demonstrated that this type of model is efficient for representing PCs in games.

B. System Architecture

The system architecture's conceptual view, as presented in Fig. 3, is composed of the following elements:

• A world interface that provides access to a collection of data representing the state of each of the world's elements;

• An execution management system responsible for allocating execution time to each of the game's components;

• A monitor responsible for recording player actions, for verifying the NPC's performance against an Adaptation Condition (AC), and for activating a DMS. It sends an alert to the Player Models Manager (PMM) on two occasions: at the end of a game session or when the AC is met;

• A PMM that finds the player model best matching the actions record of the active player and defines the active DMS accordingly;

• An active DMS responsible for defining the next action to be executed by the NPC, according to the world's condition. At initialization, a random DMS can be defined as active; and

• A movement system responsible for the NPC's movements and animations.

Figure 3: The conceptual view

The AC is usually related to the NPC's performance. A bad performance, for instance, could trigger the AC, resulting in an alert being sent to the PMM. The AC definition is implementation dependent and affects the number of adaptation attempts in a game session.

C. PMM

The PMM is the architecture's core and is responsible for player model management. When activated, it performs an algorithm that can be summarized by the steps described as follows.

1) Step 1 – Active player model definition

In this first step, the PMM searches for the player model best matching the actions record of the active player.

Initially, a TAP model is built based upon the actions record of the active player. The resulting TAP model, called the player TAP model from this point onward, is then compared with each available player model using cosine similarity (1).

cos(θ) = (A · B) / (||A|| ||B||)    (1)
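As an illustration of this comparison step, the cosine similarity of (1) can be computed over TAP models represented as sparse vectors of action counts. The paper does not specify the exact vector encoding, so the representation and action names below are hypothetical.

```python
import math

def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse action-count vectors, as in (1)."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty actions record matches nothing
    return dot / (norm_a * norm_b)

# Hypothetical TAP models: how often the player performed each action type.
player_tap = {"melee": 12, "ranged": 2, "move": 30}
stored_tap = {"melee": 10, "ranged": 1, "move": 28}

print(round(cosine_similarity(player_tap, stored_tap), 3))  # prints 0.999
```

A value near 1.0 means the two action profiles are nearly proportional, which is why a Similarity Threshold close to 1.0 treats them as the same player style.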



Depending on the active player’s TAP model and the player models already identified, the PMM can face three different situations: no player model was previously identified; no player


model with a high enough similarity value exists; or there are player models similar to the player TAP model.

When no player model was previously identified, a new player model is created taking the player TAP model as its basis. In this case, all weighted connections are created with default weights and the newly created player model is defined as active.

A constant named Similarity Threshold (ST) defines the minimum similarity value needed to consider two TAP models identical.

When no player model presents a similarity value higher than ST, a new player model is created taking the player TAP model as its basis. In this case, weighted connections are created with weights based upon the player model with the highest similarity value, using the formula:

wNk = wAk · similarity,

where wNk is the weight of the connection between the new player model and the k-th DMS, and wAk is the weight of the connection between the player model with the highest similarity value and the k-th DMS. The newly created player model is then defined as active.

When there are player models that present similarity values higher than ST, the one with the highest value is defined as active. It must be noted that a low ST can result in generic player models that are unable to identify a player.

2) Step 2 – Weight adjustment

In this step, reinforcement is applied to the connection between the active DMS and the active player model. The reinforcement value depends on the performance of the active DMS, measured since its last activation. A positive reinforcement value should be applied for a strong performance and a negative one for a weak performance. The performance measure is implementation dependent.

3) Step 3 – Active DMS definition

After the weight adjustment, the DMS with the highest connection weight is defined as active.

4) Step 4 – Reinitialization

Finally, it becomes necessary to reinitialize the player's actions record. This guarantees that a change in the player's strategy, in response to the adaptation, will result in a new player model, independent of the previously identified one.

D. Limitations

The proposed architecture is limited by the way it handles player models. It supports only one active player model at a given time, so it can only be used in single player games.

It also depends on the ability to track the player's actions and to measure the DMS's performance at any given time. At least approximations of those features must be available.

The quality of the DMS resulting from an implementation of the proposed architecture will depend on the individual quality of the deterministic DMSs encapsulated by it. With proper values of AC and ST, if there is no single strategy capable of defeating all DMSs, it is guaranteed that the player will need to change strategy at least once to succeed.

IV. VALIDATION

To validate the proposed architecture, a turn-based game was implemented. Fig. 4 presents a screenshot of the game.

Figure 4: Turn-based game built to validate the proposed architecture

Two agents, a PC and an NPC, with only one main attribute, named Health, and a minimal set of secondary attributes, compose the game. Health has an initial value of 1000 and the game's objective is to reduce the adversary agent's Health to 0. To accomplish this goal, a player has five types of actions: movement, melee attacks, ranged attacks, resource building, and special attacks.

Experiments using the developed game were held to validate the proposed architecture. During these experiments, the NPC was controlled by an implementation of the architecture, featuring three different deterministic DMSs. The PC was also controlled by three different DMSs.

The NPC's DMSs were named NDMS1, NDMS2, and NDMS3. All of them were Rule-Based Systems (RBSs). NDMS1 had a strategy focused on ranged attacks and would try to keep the NPC at a safe distance from the PC. NDMS2 had a strategy focused on melee attacks and would try to keep the NPC at melee range of the PC. NDMS3, on the other hand, had a strategy focused on defense, resource building, and counterattack.

The PC's DMSs were named PDMS1, PDMS2, and PDMS3. PDMS1 was an RBS with a strategy similar to NDMS2. PDMS2 was an RBS with a strategy similar to NDMS1. PDMS3 used a simple implementation of Dynamic Scripting, composed of the rules of all the other DMSs.
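To make the notion of a deterministic, testable DMS concrete, the sketch below shows how a melee-focused rule-based DMS in the spirit of NDMS2 might be scripted. The paper does not list the actual rules, so the state fields, thresholds, and action names here are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class WorldState:
    # Minimal hypothetical view of the game state seen through the world interface.
    distance_to_pc: int
    own_health: int

def melee_dms(state: WorldState) -> str:
    """A deterministic rule-based DMS: the first matching rule wins."""
    if state.own_health < 200:       # hypothetical retreat threshold
        return "retreat"
    if state.distance_to_pc > 1:     # close the gap before attacking
        return "move_toward_pc"
    return "melee_attack"            # adjacent to the PC: attack

print(melee_dms(WorldState(distance_to_pc=3, own_health=800)))  # prints "move_toward_pc"
```

Because the mapping from state to action is a fixed rule cascade, such a DMS can be unit-tested exhaustively, which is the source of the reliability the architecture inherits from its encapsulated DMSs.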

Table 1 summarizes the relative individual performance of PC’s DMSs compared to NPC’s DMSs. Performance strength


is related to the probability of victory: a strong performance indicates a high probability of victory, while a weak performance indicates a low probability of victory.

Table 1: Relative individual performance

            NDMS1    NDMS2    NDMS3
PDMS1       Strong   Strong   Weak
PDMS2       Weak     Weak     Strong
PDMS3 (a)   Strong   Strong   Strong

a. After algorithm convergence

Performance was measured according to the following formula:

pN = ΔHP − ΔHN,

where pN is the NPC's performance, ΔHP is the variation of the PC's Health attribute, and ΔHN is the variation of the NPC's Health attribute.

The AC was defined as pN < −100.

Reinforcement was defined as

wMD = pN / ΔT,

where wMD is the weight of the connection between the active player model and the active DMS, and ΔT is the number of turns since the DMS was last activated. The player model was defined as shown in Fig. 5.

Figure 5: The player model

A. Experiments

Two experiments, described in the following sections, were conducted.

1) First experiment

The first experiment involved only deterministic DMSs and evaluated the proposed architecture's ability to identify player models and adapt accordingly.

In this experiment, the PC was alternately controlled by PDMS1 and PDMS2. Each DMS would control the PC for an entire game session at a time. The NPC won 889 out of the 1000 game sessions held. It was able to win nine out of the 10 initial game sessions. Fig. 6 presents the performance variation during the 10 initial game sessions.

Figure 6: The performance variation over the 10 initial game sessions

As expected, two player models were identified, matching PDMS1 and PDMS2. Fig. 7 presents the player model corresponding to PDMS1 and Fig. 8 presents the player model corresponding to PDMS2.

Figure 7: The first player model identified in experiment 1

Figure 8: The second player model identified in experiment 1

It should be noted that NDMS3 presents the highest weighted connection in the PDMS1 model and NDMS1 presents the highest weighted connection in the PDMS2 model. Both results were expected, as they match what was previously presented in Table 1.
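The concrete quantities driving adaptation in this validation, the performance measure pN, the AC, and the reinforcement, can be sketched in code. One assumption is flagged in the comments: the Health "variation" is read here as the amount of Health lost, so that a positive pN favors the NPC; the variable names are mine.

```python
AC_THRESHOLD = -100  # adaptation condition used in the validation: adapt when pN < -100

def npc_performance(pc_health_lost: int, npc_health_lost: int) -> int:
    """pN = ΔHP - ΔHN, interpreting each variation as the Health lost
    since the active DMS was last activated (my reading of the paper)."""
    return pc_health_lost - npc_health_lost

def adaptation_condition_met(p_n: int) -> bool:
    """The monitor alerts the PMM when the AC is met."""
    return p_n < AC_THRESHOLD

def reinforcement(p_n: int, turns_since_activation: int) -> float:
    """wMD = pN / ΔT: the reinforcement applied to the connection between
    the active player model and the active DMS."""
    return p_n / turns_since_activation

# Example: the NPC lost 250 Health while the PC lost only 100, over 5 turns.
p_n = npc_performance(pc_health_lost=100, npc_health_lost=250)  # -150
print(adaptation_condition_met(p_n))  # prints True: the monitor alerts the PMM
print(reinforcement(p_n, 5))          # prints -30.0: weakens the current pairing
```

Dividing by ΔT normalizes the reinforcement, so a DMS that was active for many turns is not penalized or rewarded disproportionately for accumulated Health differences.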


2) Second experiment

In the second experiment, a Dynamic Scripting implementation was used to control the PC. The experiment comprised 10000 game sessions. Fig. 9 presents the performance variation during the 10 final game sessions. The initial game sessions are not worth noting, as the Dynamic Scripting algorithm was still in its initial phases of learning.

Figure 9: The performance variation over the 10 final game sessions

Taking into account the individual performances presented in Table 1, the PC would be expected to outperform the NPC after convergence of the Dynamic Scripting algorithm. Nevertheless, both agents presented similar performances. The NPC won only 536 out of the 1000 final game sessions held.

At the end of the second experiment, seven player models had been identified.

V. RESULTS ANALYSIS

The initial ten game sessions of the first experiment presented events that help to illustrate the main advantages of the proposed architecture. Fig. 10 highlights these events.

Figure 10: Details from the adaptation process

The first player model was identified when the NPC's performance dropped below the AC and the PMM was activated for the first time. The second player model was identified, as expected, in the second game session, when the NPC's performance dropped below the AC for the first time.

It is important to notice that, from the third game session onwards, when the NPC's performance dropped below the AC for the first time, the correct player model was found and the DMS with the highest weighted connection was automatically activated. This implies that, for this game, the proposed architecture was able to identify a player model and the DMS best suited to that player model in the same game session in which it was first exposed to a PC.

The ability to adapt, even using only deterministic DMSs in its composition, represents an important feature for developers. Deterministic DMSs are testable, so they can have their quality assured. Regarding response time, the proposed architecture, when adapting, presents an algorithmic complexity that is on par with a deterministic DMS. Its performance depends on the number of recorded actions, the number of identified player models, and the combined number of DMSs. As these parameters are directly or indirectly configurable, it is possible to fine-tune them as needed.

VI. CONCLUSIONS AND FUTURE WORKS

This paper presented an approach for an adaptive game AI architecture based on player modeling. The experiments' results have shown that, in a given scenario, it could adapt in real time, without the need for long training periods or huge amounts of historical data.

Even with its limitations, the proposed architecture could be applied to several single player games. Its ability to adapt in the same game session in which it was exposed to a PC can represent a huge advantage, mainly because, in this type of game, it is common for a game to be played just once by a given player.

The reliability of the approach lies in the use of deterministic DMSs, as they can be tested and certified. This should minimize undesired behaviors, relegating their existence to bugs instead of byproducts of the adaptation process. Future works could focus on automatic DMS creation, for situations where none of the current DMSs can present an adequate challenge when faced with a certain player style.

ACKNOWLEDGMENT

The authors would like to thank all people that helped in making this work possible, directly or indirectly contributing to it. In addition, they would like to thank the Brazilian Aeronautics Institute of Technology for providing the research infrastructure needed by this work.

REFERENCES

[1] A. Ram, S. Ontanon, and M. Mehta, "Artificial intelligence for adaptive computer games," in Proceedings of the International FLAIRS Conference on Artificial Intelligence, AAAI Press, 2007, pp. 22-29.
[2] S. Bakkes and J. van den Herik, "Rapid and reliable adaptation of video game AI," IEEE Transactions on Computational Intelligence and AI in Games, vol. 1, IEEE Press, 2009, pp. 93-104.
[3] P. Spronck, M. Ponsen, I. Sprinkhuizen-Kuyper, and E. Postma, "Adaptive game AI with dynamic scripting," Machine Learning, vol. 63, Kluwer Academic Publishers, 2006, pp. 217-248.
[4] M. K. Crocomo and E. V. Simoes, "Um algoritmo evolutivo para aprendizado online em jogos eletronicos," in Proceedings of SBGames, 2008, pp. 159-168.
[5] R. Houlette, "Player modeling for adaptive games," in AI Game Programming Wisdom, 2nd ed., Charles River Media, 2003, pp. 557-566.
[6] C. T. Tan and H. Cheng, "An automated model-based adaptive architecture in modern games," in Proceedings of the 6th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AAAI Press, 2010, pp. 186-191.
[7] I. Millington and J. Funge, Artificial Intelligence for Games, 2nd ed., Morgan Kaufmann, 2009.
[8] C. T. Tan and H. Cheng, "Personality-based adaptation for teamwork in game agents," in Proceedings of the 3rd AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AAAI Press, 2007, pp. 37-42.
[9] M. C. Machado, G. L. Pappa, and L. Chaimowicz, "Characterizing and modeling agents in digital games," in Proceedings of SBGames, 2012, pp. 26-33.