MPRL: Multiple-Periodic Reinforcement Learning for Difficulty Adjustment in Rehabilitation Games
Yoones A. Sekhavat
Faculty of Multimedia, Tabriz Islamic Art University, Tabriz, Iran
Email:
[email protected]
Abstract—Generally, the difficulty level of a therapeutic game is regulated manually by a therapist. However, home-based rehabilitation games require a technique for automatic difficulty adjustment. This paper proposes a personalized difficulty adjustment technique for a rehabilitation game that automatically regulates difficulty settings based on a patient's skills in real time. To this end, ideas from reinforcement learning are used to dynamically adjust the difficulty of a game. We show that difficulty adjustment is a multiple-objective problem in which some objectives might be evaluated at different periods. To address this problem, we propose and use Multiple-Periodic Reinforcement Learning (MPRL), which makes it possible to evaluate different objectives of difficulty adjustment in separate periods. The results of experiments show that MPRL outperforms traditional Multiple-Objective Reinforcement Learning (MORL) in terms of user satisfaction parameters as well as improving the movement skills of patients.
I. INTRODUCTION
A rehabilitation program is an important part of recovery from motor impairments such as stroke and can assist in the cognitive assessment and rehabilitation of patients [1]. Effective rehabilitation techniques can help patients regain lost skills and work toward independence. The focus of rehabilitation is on enhancing range of motion, coordination, and motion control. This can be achieved by performing tasks that include reaching and grasping. In addition to the quality and amount of physical activity in rehabilitation, active participation and engagement of patients during therapeutic sessions are crucial [2], [3], [4]. However, rehabilitation activities are generally designed as repetitive tasks that make patients frustrated and tired. Research has shown that patients who enjoy rehabilitation tasks spend more time developing their skills in a given activity [5]. As a result, designing a rehabilitation task as a therapeutic game can result in a motivating rehabilitation environment. By enriching the exercises with physical and cognitive meaning, such serious games can increase the volume of rehabilitation [18], [10], [7], [8], [9]. Understanding the characteristics of a rehabilitation game and the effect of these characteristics on the difficulty of the game is crucial. This makes it possible for a therapist to manipulate intensity, duration, challenges, and frequency in order to motivate the patient. Automatically maintaining a patient's flow state using Dynamic Difficulty Adjustment (DDA) makes it possible to adapt the challenges of a task based on a user's skills [23]. As argued in [6], players
feel more immersed in games when the difficulty is adjusted according to their skills.
The difficulty level of a rehabilitation game is generally configured by a therapist in therapy sessions [11]. Consequently, when the game difficulty is beyond a patient's capabilities, or when it is not challenging enough, it is the therapist who manipulates the settings to regulate the game. In [12], five therapeutic game exercises were proposed, in which each exercise involves a set of difficulty parameters that are regulated manually. This problem is partially addressed by [14] and [15], in which the difficulty of a game is determined automatically according to a player's static profile at the beginning. However, players' skills change over time, which requires the manual manipulation of parameters.
One alternative approach to implementing Dynamic Difficulty Adjustment (DDA) in computer games is the use of behavior rules. Such rules are defined based on domain-specific knowledge in the game development phase. In Hunicke and Chapman's approach [16], the game settings are controlled by a set of rules. For example, when the game is too hard, the player receives more weapons or faces fewer opponents. Not only is the manual definition of such static rules time-consuming and error-prone, but identifying the conditions under which to pick a rule is not trivial due to the dynamic nature of games and players. To address this problem, a dynamic scripting technique is proposed in [17], in which a probability is assigned to picking a rule based on success or failure rates. However, building and maintaining such rules and probabilities becomes impractical as the game complexity increases.
This paper proposes a personalized difficulty adjustment module, in which game difficulty is adjusted according to the progress of a patient. The system exploits cognitive processes in order to mediate between perception and action. The module takes into account individual variability in the deficits and behavior of patients in order to optimize the impact of rehabilitation. The overall process of the personalized difficulty adjustment system proposed in this paper is shown in Fig. 1. To implement dynamic difficulty adjustment, ideas from Reinforcement Learning (RL) are used to adapt game difficulty according to the skills of a patient. This paper proposes Multiple-Periodic Reinforcement Learning (MPRL), in which different objectives can be evaluated at separate periods. We argue this technique can create a game experience in which task difficulties are appropriate to the capabilities of the patients.
Fig. 1. The overall process of the proposed rehabilitation game enhanced with the MPRL dynamic difficulty adjustment module.
Such an experience supports players' motivation by meeting user satisfaction indicators. In particular, this paper proposes an adaptive therapeutic game to improve functional arm capabilities. The system employs Multiple-Periodic Reinforcement Learning (MPRL), which makes it possible to handle different objectives of difficulty adjustment in separate periods. A mid-term study is conducted to evaluate the effectiveness of the proposed system in improving patients' skills.
II. DYNAMIC DIFFICULTY ADJUSTMENT (DDA)
Difficulty adjustment is defined as the process of finding an adequate policy for choosing game properties that provide good balance in the game. According to the theory of flow [19], [20], the perceived challenge of a game depends on the skills of the player. Based on the theory of the relationship between motivation and learning [21], the most effective human performance is achieved at intermediate levels of arousal. Consequently, the difficulty of a game must be regulated and balanced (neither too hard nor too easy). According to this theory, a personalized difficulty level that challenges each user at her own level of competence is crucial.
Since the difficulty of a game can be specified using various properties, manual difficulty adjustment of a game is not a trivial task (even for a therapist). In addition, in home-based therapy, where therapy sessions are held in a patient's home, automatic setting of difficulty is important. This automated system must be able to assess the performance level of a patient in order to regulate the difficulty level. The system must also identify the effect of game parameters on performance in order to support a personalized difficulty level. The adjustment must be performed as quickly as possible to keep the behavior of the game believable [22].
Little research has considered real-time difficulty adjustment for rehabilitation games. In [12], a rehabilitation game is proposed to train impaired hand movements based on reaching and grasping targets. This game adapts the difficulty based on the feedback of a patient. In [24], a framework is proposed for media adaptation in task-oriented neuromotor rehabilitation that adapts the difficulty level based on a player's biofeedback. However, in both of these works ([12], [24]), the difficulty
level is regulated based on static data. Achieving difficulty adjustment goals requires planning, learning, and analyzing the interactions between the players and the environment. The problem is that these goals may vary significantly and may sometimes contradict or reinforce each other. Consequently, goal fusion techniques must be applied to generalize this multiple-goal problem. We are motivated to generalize multiple-goal difficulty adjustment to scenarios in which the skills of a player change dynamically during the game.
A. DDA Using Reinforcement Learning
Reinforcement learning is a computational approach to automating goal-directed learning [26], [25]. Essentially, Reinforcement Learning (RL) is the process of mapping the states of an agent (also called situations) to a set of actions. In each step, an RL agent determines and executes the action on the environment that maximizes long-term rewards. In the difficulty adjustment problem, the score of a player is the main parameter indicating the state of the player. The actions that an RL agent can take to adjust the difficulty are changes in the game properties associated with difficulty parameters (e.g., changing the speed of the moving character in the gaming environment). By observing the consequences of changing these parameters, the RL agent learns to make decisions that maximize the rewards.
In the RL technique, given the current state s, an action a is chosen that leads to a new state s' and receives an immediate reward r(s, a). The technique aims to maximize the expected value of future rewards. This is performed by learning an optimal policy that maximizes the overall rewards. A policy is a function π(s) → a that maps a state to an action. The expected return starting from state s and performing action a following policy π is known as the action-value function, represented by Q^π(s, a). One of the most common algorithms for reinforcement learning is Q-Learning, which iteratively computes the action-value function based on the following rule, in which V(s') = max_a Q(s', a), α is the learning rate, and γ is the discount factor representing the importance of future rewards relative to immediate rewards:

Q(s, a) ← (1 − α) Q(s, a) + α (r + γ V(s'))    (1)
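To make the update rule in Eq. (1) concrete, the following is a minimal tabular Q-learning sketch in Python; the state and action representations, the exploration rate, and the parameter values are illustrative assumptions rather than the implementation used in the game.

```python
# Minimal tabular Q-learning update corresponding to Eq. (1).
# States, actions, and rewards are placeholders supplied by the caller.
from collections import defaultdict
import random

ALPHA = 0.1   # learning rate (alpha)
GAMMA = 0.9   # discount factor (gamma)

Q = defaultdict(float)  # maps (state, action) pairs to action values

def update(state, action, reward, next_state, actions):
    """Apply Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma*V(s'))."""
    v_next = max(Q[(next_state, a)] for a in actions)   # V(s') = max_a Q(s', a)
    Q[(state, action)] = (1 - ALPHA) * Q[(state, action)] + ALPHA * (reward + GAMMA * v_next)

def choose_action(state, actions, epsilon=0.2):
    """Epsilon-greedy action selection over the current Q estimates."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])
```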
RL techniques and theories have been widely studied and extended in recent years, resulting in advances in some challenging real-world problems [13]. Multiple-Objective Reinforcement Learning (MORL) is an important variation of RL that takes multiple goals into account for decision-making. In this setting, r_i is the feedback associated with the i-th objective. In the case of directly related objectives, it is possible to extract a single objective function by considering the goals jointly. On the other hand, in the case of completely unrelated objectives, reward functions and decision-making policies can be considered separately. The problem arises when the objectives conflict, or when the problem requires a trade-off among conflicting goals. We argue that dynamic difficulty adjustment is a multiple-goal problem that can be better solved by solutions that explicitly consider the goals simultaneously.
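As a point of reference, one common way to realize MORL is to keep one action-value table per objective and scalarize them with fixed weights when selecting an action. The sketch below illustrates this kind of baseline under assumed names and parameters; it is not necessarily the exact MORL variant used in the experiments of Section V.

```python
# Illustrative scalarized MORL agent: one Q-table per objective, combined with
# fixed weights when selecting an action. All names and parameters are assumptions.
from collections import defaultdict

class ScalarizedMORL:
    def __init__(self, num_objectives, weights, alpha=0.1, gamma=0.9):
        self.q = [defaultdict(float) for _ in range(num_objectives)]
        self.weights = weights            # relative importance of each objective
        self.alpha, self.gamma = alpha, gamma

    def update(self, s, a, rewards, s_next, actions):
        # rewards[i] is the feedback r_i associated with the i-th objective
        for i, r_i in enumerate(rewards):
            v_next = max(self.q[i][(s_next, b)] for b in actions)
            self.q[i][(s, a)] = (1 - self.alpha) * self.q[i][(s, a)] + self.alpha * (r_i + self.gamma * v_next)

    def best_action(self, s, actions):
        # Scalarize: weighted sum of the per-objective action values.
        return max(actions, key=lambda a: sum(w * self.q[i][(s, a)] for i, w in enumerate(self.weights)))
```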
III. REHABILITATION GAME WITH DDA
The adaptive rehabilitation game proposed in this paper is a Kinect-based game that employs this motion sensing device to track the movements of the arms. The game consists of a green landscape populated with a number of trees and buildings. The physical movements of the arms are mapped onto the movements of virtual arms, where the player controls the arms of a character viewed from a first-person perspective. This view can provide an effective drive onto multimodal populations of neurons, inducing stronger activation of the primary and secondary motor areas associated with sensorimotor control.
The game we propose and use to test difficulty adjustment is a game of hitting balls placed on an arch of balls. The virtual character, controlled by the motions of the player, moves along a predefined path in the game. Periodically, arches of balls appear on the path, and the player needs to raise her hand in order to hit the bright ball among the other balls on the arch. Hitting the right ball is a win that accumulates toward the final score. In this scenario, task-oriented action execution is combined with the observation of virtual hands to create conditions that result in the functional reorganization of a patient.
The difficulty level of the game is defined according to a set of parameters called game properties. In the game proposed in this paper, the game properties determining the difficulty of tasks are the speed of moving the character along the road (speed), the size of the balls (size), and the distance between the arches of balls (distance). Consequently, the performance of the user is a function of these parameters. These are the parameters that must be modified in order to regulate the difficulty of this game.
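As an illustration, the three game properties and the 10% increase/decrease actions described in Section IV can be represented by a small data structure; the default values below are assumptions.

```python
# Sketch of the three difficulty-related game properties and the 10% adjustment
# actions applied by the difficulty adjustment module; default values are assumed.
from dataclasses import dataclass

@dataclass
class GameProperties:
    speed: float = 1.0      # speed of the character moving along the path
    size: float = 1.0       # size of the balls on the arch
    distance: float = 10.0  # distance between consecutive arches of balls

    def adjust(self, name: str, increase: bool, rate: float = 0.10):
        """Increase or decrease one property by a fixed rate (10% by default)."""
        factor = 1.0 + rate if increase else 1.0 - rate
        setattr(self, name, getattr(self, name) * factor)

# Example: a direct action that makes the game slightly harder.
props = GameProperties()
props.adjust("speed", increase=True)   # +10% speed
props.adjust("size", increase=False)   # -10% ball size
```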
A. Dynamic Difficulty Adjustment Module
Although for advanced players a game experience is more satisfying when it is difficult to defeat [27], beginners enjoy a game when it is challenging yet beatable [28]. The general belief is that the game should be neither too easy nor too difficult [29]. Even though game properties determine the difficulty of the game, difficulty is a relative parameter that depends on the abilities of the player. In the context of the rehabilitation game proposed in this paper, the possible outcomes of passing an arch of balls are either hitting the right ball, which is a win, or hitting a wrong ball, which is a loss. Not being able to hit any ball is also considered a loss. In [30], the authors argue that the margin between wins and losses in each game must be small. Therefore, given W and L as the total numbers of wins and losses, respectively, a player is considered most satisfied when |W − L| = 0.
In particular, we identified three indicators (goals) to measure the satisfaction a player derives from the game. In order to keep the number of wins and losses equal, |W − L| must be minimized (g1). Given r_i and r_{i+1} as the average individual scores of a player in two consecutive rounds i and i+1, |r_i − r_{i+1}| must be minimized in order to avoid large progress or regress after each round. Small differences imply that a player is engaged in the game and tries to get a better score by playing more games (g2). The player must also perceive progress in the game and achieve a slightly better score in comparison to the previous round. To this end, the number of progressions (i.e., Σ_{i=0}^{n} (r_{i+1} − r_i > 0)) in n consecutive rounds must be maximized (g3).
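A possible way to compute these three indicators from the per-round scores is sketched below; the bookkeeping (a list of per-round scores and per-round win/loss counts) is an assumption of the sketch.

```python
def g1_value(wins, losses):
    """Indicator g1: |W - L| for the last round; smaller is better (ideally 0)."""
    return abs(wins - losses)

def g2_value(scores):
    """Indicator g2: |r_i - r_{i+1}| between the last two rounds; smaller is better."""
    return abs(scores[-1] - scores[-2]) if len(scores) >= 2 else 0

def g3_value(scores, n=5):
    """Indicator g3: number of progressions (r_{i+1} > r_i) over the last n
    consecutive round pairs; larger is better."""
    recent = scores[-(n + 1):]
    return sum(1 for a, b in zip(recent, recent[1:]) if b > a)
```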
B. Need for Multiple-Periodic Reinforcement Learning
The indicators we suggested to measure the satisfaction a player derives from a game can be considered the objectives (goals) of difficulty adjustment. Addressing this problem using existing MORL techniques has some limitations. First, existing algorithms for reinforcement learning are episodic, where all objectives are evaluated in the same predefined episodes. However, in dynamic difficulty adjustment, the objectives cannot all be evaluated over the same periods. Some objectives are evaluated based on the information available at the end of each episode, while evaluating other objectives also requires the history of previous evaluations. For example, although the score of a player at the end of each round is enough to decide to what extent goal g1 is satisfied, the scores of two consecutive rounds (the score of the current round and the score of the previous round) are required to decide whether goal g2 is satisfied. On the other hand, a history of five or more rounds is required to check to what extent g3 is satisfied. This opens the need for multiple-periodic reinforcement learning, which makes it possible to evaluate the value functions of different goals in separate periods. Second, this problem requires a learning method that makes it possible to associate different sets of states with different goals, whether the goals reinforce or contradict each other. Consequently, a learning agent can simultaneously be in different states with respect to different goals. This paper proposes Multiple-Periodic Reinforcement Learning (MPRL), which addresses these two problems. This technique has declarative semantics that define the temporal setting for RL, as well as procedural semantics that show how the evaluation functions can be computed.
Fig. 2. The architecture of MPRL, where different goals are evaluated in separate periods.
IV. DIFFICULTY ADJUSTMENT USING MPRL
The Multiple-Periodic Reinforcement Learning technique proposed in this paper for dynamic difficulty adjustment has three characteristics that distinguish it from existing multiple-objective reinforcement learning techniques. First, MPRL makes it possible to define multiple periods at which different goals are evaluated. We show how this setting is compatible with the difficulty adjustment goals. Second, MPRL allows each goal to be associated with a different set of states, where the learning agent can simultaneously be in different states. Third, MPRL has a probabilistic two-level action system that makes it possible to change the game properties indirectly. These characteristics are elaborated in the following.
A. Multiple Periods
In the multiple-periodic architecture we define, some goals are evaluated more frequently than others. Goals that are evaluated frequently require a shorter history of outcomes, while the evaluation of infrequent goals requires a longer history of outcomes. The overall architecture of multiple-periodic RL is shown in Fig. 2. In this figure, t_i on the X axis shows the end of the i-th round of the game, where each round consists of passing 10 arches of balls. The distance between two rounds is called a period. Three different types of periods, with lengths of 1, 2, and 6 rounds, are shown in Fig. 2. The vertical dashed lines show which goals must be evaluated at the end of each period. Goal g1 is evaluated at t1, t2, ..., goal g2 is evaluated at t2, t4, ..., and goal g3 is evaluated at t6, t12, .... When more than one goal falls at the end of a period, the goal fusion component takes the different goals into account based on their severity parameters. Given k goals g1, ..., gk and their corresponding severity parameters λ1, ..., λk, the aggregated reward is computed as shown in Algorithm 1.
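A minimal sketch of this evaluation schedule and severity-based goal fusion is given below, assuming the period lengths of Fig. 2 and equal severity parameters; the dictionary-based bookkeeping is illustrative.

```python
# Multiple-periodic evaluation schedule and goal fusion: goal g_i is evaluated
# whenever the round counter t satisfies t mod P_i == 0, and concurrent goals
# are fused through their severity parameters lambda_i.
PERIODS = {"g1": 1, "g2": 2, "g3": 6}          # period lengths in rounds (Fig. 2)
SEVERITY = {"g1": 1.0, "g2": 1.0, "g3": 1.0}   # lambda_i; equal weights assumed here

def goals_due(t):
    """Return the goals whose evaluation period ends at round t (t = 1, 2, ...)."""
    return [g for g, period in PERIODS.items() if t % period == 0]

def fused_reward(t, rewards):
    """Aggregate the rewards of the goals evaluated at round t, weighted by severity."""
    return sum(SEVERITY[g] * rewards[g] for g in goals_due(t) if g in rewards)
```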
B. Multiple States
In MPRL, each objective is associated with its own set of states. In this setting, regardless of whether the objectives are conflicting, consistent, or unrelated, they are associated with different states. Consequently, an MPRL agent can simultaneously be in different states with respect to different objectives.
According to the user satisfaction indicators defined in the previous section, the states are defined as follows. For g1, the value of |W − L| at the end of each round (each round consists of passing 10 consecutive arches of balls) specifies the state of the MPRL agent corresponding to g1. In particular, since W + L = 10, the possible outcome of each round regarding g1 is a value in S1 = {−10, −8, ..., 0, ..., 8, 10}. In the case of g2, where the difference between the scores of two consecutive rounds must be minimized, the value of |r_{i+1} − r_i| indicates the state of the MPRL agent regarding goal g2. Since the outcome of a single round r_i can be a number between 0 and 10, the possible outcomes for the difference between two consecutive rounds r_{i+1} − r_i lie in S2 = {−10, −9, ..., 9, 10}. To indicate the state of the MPRL agent based on goal g3, the history of 6 consecutive rounds determines the new state. Identifying a correct evaluation cycle is important because short cycles can result in random behavior, while long cycles prevent fast evolution. Given Σ_{i=1}^{5} (r_{i+1} − r_i > 0) as the outcome function, the set of possible states is S3 = {5, 4, 3, 2, 1, 0}. Considering goals g1, g2, and g3 in the rehabilitation game, the state of the agent is represented by a triple (s1, s2, s3), where s_i ∈ S_i.
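Assuming the score of a round equals its number of wins (so that W + L = 10), the state triple could be derived from the round history as sketched below; the signed differences follow the state sets S1, S2, and S3 defined above.

```python
def state_triple(scores):
    """Derive (s1, s2, s3) from the list of per-round scores (each 0..10)."""
    wins = scores[-1]                    # assumption: the round score equals its wins
    losses = 10 - wins                   # each round consists of 10 arches
    s1 = wins - losses                   # value in S1 = {-10, -8, ..., 8, 10}

    s2 = scores[-1] - scores[-2] if len(scores) >= 2 else 0   # value in S2 = {-10, ..., 10}

    recent = scores[-6:]                 # history of 6 consecutive rounds
    s3 = sum(1 for a, b in zip(recent, recent[1:]) if b > a)  # value in S3 = {0, ..., 5}
    return s1, s2, s3
```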
C. Probabilistic Two-level Actions
Unlike existing techniques in which actions are direct and explicit, we propose a technique in which actions are executed probabilistically. In this setting, performing an action depends on the probability of that action. We define two types of actions: indirect actions and direct actions. Indirect actions are in charge of increasing or decreasing the probability of running direct actions. Direct actions, on the other hand, are the real actions on the environment that may change the state of the agent. In our game, direct actions are defined in terms of changes (increase or decrease) in the game properties.
An adaptive Uni-Chromosome Controller is used to determine how the game parameters (speed, size, and distance) can be modified to meet the objectives. A major advantage of this controller is that the training and adaptation process occurs in real time during the game. The Adaptive Uni-Chromosome Controller stores six real numbers corresponding to the game properties. As shown in Fig. 3 (a), the chromosome is an array of six real numbers, where each position in the chromosome corresponds to a behavior in the controller, and each real number represents the probability of activating that behavior in the game. The first two positions correspond to the speed (p1 for increasing the speed and p2 for decreasing the speed), the next two positions correspond to the size (p3 for increasing the size and p4 for decreasing the size), and the last two positions correspond to the distance (p5 for increasing the distance and p6 for decreasing the distance). In each round, any of the decrease or increase controllers can be activated based on the activation probabilities in the chromosome (1 means activation and −1 means deactivation; Fig. 3 (b)).
Fig. 3. The Uni-Chromosome Controller used to indicate the probabilities of activating/deactivating changes in the game properties (a), an example of the probabilities chromosome as an array of six real numbers, e.g., p1 = 0.24, p2 = 0.85, p3 = 0.54, p4 = 0.08, p5 = 0.47, p6 = 0.13 (b), and an example of the activation/deactivation chromosome based on these probabilities (c).
An example of an activation or deactivation policy given these probabilities is shown in Fig. 3 (c). The controller chromosome models the difficulty of the game by encoding a behavior set that is expected to be hard enough to get a score. At the beginning of the game, the chromosome is initialized randomly. At the end of each round, the chromosome is updated based on the rules in Algorithm 1, which can be considered the policies of reinforcement learning. Given t as the counter of rounds in the game, the evaluation of goal g_i is performed in periods P_i, where t mod P_i = 0. After evaluating a goal (or the set of goals due in that period), the controller chromosome C is updated. C_i is the i-th place in the chromosome, representing the probability of activating/deactivating a decrease/increase of a game property (speed, size, distance). In this algorithm, sign(p) denotes the state of the chromosome position before an update: if it was already active, then sign(p) = 1; otherwise, sign(p) = −1. The parameter α is the learning rate for reinforcement learning. The activation of an increase in a parameter means a 10% increase in that parameter. Accordingly, the activation of a decrease indicates a 10% decrease in that parameter.
In MPRL, instead of updating Q-values and selecting actions based on these values, a probabilistic approach is used in which all actions have a chance to be performed at different periods. However, according to these probabilities, they may or may not be performed. As shown in Algorithm 1, the learning rate (α) indicates to what extent the algorithm must explore new conditions or exploit previous learning. In this algorithm, each goal has its own impact on the aggregated reward (rw). The parameter λ_i associated with goal g_i determines the severity and priority of the goal, which is applied directly to the reward function. At the end of the algorithm (when chromosome C has been updated), the direct actions of increasing or decreasing the game properties are performed based on the probabilities in C.
Data: The game properties and the scores of the player at the end of each round.
Result: Updates the controller chromosome C.
Let C_p be the p-th place in the controller chromosome.
Let G = {g1, ..., gk} be the set of k goals, where g_i is associated with a set of states S_i.
Let P_i be the period at which goal g_i is evaluated, and let t be the round counter.
Initialize chromosome C arbitrarily.
repeat
    Let G' = ∅ (the goals evaluated at the current round); rw = 0 (the aggregated reward)
    for i = 1 to k do
        if t mod P_i == 0 then G' = G' ∪ {g_i}
    end
    foreach g_i in G' do
        if g_i == g1 then rw1 = s1 + (|t.score| − 5); rw += λ1 · rw1
        if g_i == g2 then rw2 = s2 + (|t.score| − |(t−1).score|); rw += λ2 · rw2
        if g_i == g3 then rw3 = s3 + Σ_{j=t−5}^{t−1} (r_{j+1} − r_j > 0); rw += λ3 · rw3
    end
    rnd = Random(0, 1)
    if rnd ≤ 0.3 then
        for p = 1 to 6 do
            if p is even then C_p = (1 − α) · C_p + α · (sign(p) + rw)
            if p is odd then C_p = (1 − α) · C_p − α · (sign(p) + rw)
        end
    end
    for p = 1 to 6 do
        with probability C_p, run the corresponding direct action
    end
    update the states (s1, s2, s3)
until the end of the game
Algorithm 1: The MPRL algorithm.
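The chromosome update and the two-level action execution of Algorithm 1 could be rendered in Python roughly as follows; the clamping of the probabilities, the ordering of positions p1–p6, and the dictionary representation of the game properties are assumptions of this sketch.

```python
import random

ALPHA = 0.1  # learning rate (alpha in Algorithm 1)

def update_chromosome(C, active, rw, alpha=ALPHA):
    """Indirect action: nudge the activation probabilities C[1..6] using the
    aggregated reward rw; sign(p) is +1 if position p was active last round, else -1."""
    if random.random() <= 0.3:                 # update condition (rnd <= 0.3) in Algorithm 1
        for p in range(1, 7):
            sign_p = 1 if active.get(p) else -1
            if p % 2 == 0:
                C[p] = (1 - alpha) * C[p] + alpha * (sign_p + rw)
            else:
                C[p] = (1 - alpha) * C[p] - alpha * (sign_p + rw)
            C[p] = min(max(C[p], 0.0), 1.0)    # keep C[p] usable as a probability (assumption)

def run_direct_actions(C, props):
    """Direct actions: each position fires with probability C[p]; odd positions
    increase and even positions decrease one of speed, size, distance by 10%."""
    names = {1: "speed", 2: "speed", 3: "size", 4: "size", 5: "distance", 6: "distance"}
    active = {}
    for p in range(1, 7):
        active[p] = random.random() < C[p]
        if active[p]:
            props[names[p]] *= 1.10 if p % 2 == 1 else 0.90
    return active   # remembered so that sign(p) can be computed at the next update
```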
V. EVALUATION
A set of experiments was performed on a working prototype of the rehabilitation game developed in the Unity game engine1. The experiments were conducted on eight patients to measure the effect of the proposed technique on arm movement skills. In particular, eight hemiplegic patients (6 males and 2 females, mean age 54.2±9.1 years, 109.1±85.2 days after stroke) participated in the experiments. In the recruitment process, patients with emotional or cognitive deficits and those who had problems understanding the tasks were excluded.
A. Study Design
1 The source code of the game and the MPRL module is available for download at www.carlab/MPRL.rar
Two independent variables were considered in this study: difficulty adjustment with traditional MORL and difficulty adjustment with MPRL.
Fig. 4. The histogram of the score differences |W − L| for the players in MORL and MPRL.

Fig. 5. The average percentage of changes in the difficulty parameters (speed, size, distance) at the end of each therapy session for MPRL.
The main objective of the experiment was to compare the effects of these systems on improving the skills of patients and on meeting the user satisfaction indicators. We designed a mid-term study in which each patient carried out eight rehabilitation sessions, one per day at approximately the same hour, over two weeks. In each session, patients were asked to play the game for 20 minutes. The experiments were conducted in the Cognitive Augmented Reality lab in Iran (www.cariab.ir). Four patients used the game with MPRL, and the remaining four patients used the game with MORL. Before the experiments, we tuned the learning rate with five healthy participants.
1) Meeting the Player Satisfaction Indicators: In the first set of experiments, we were interested in finding to what extent the player satisfaction conditions (regarding g1, g2, and g3) are met using MPRL and MORL. According to g1, minimizing |W − L| results in player satisfaction and the entertainment of a player. The histogram of the differences between W and L shows that in a considerable number of cases |W − L| is low for MPRL (Fig. 4). This histogram is almost symmetric, and the mean values are very close to zero. This implies that MPRL has been able to match the skills of players in most cases. More specifically, in more than 70% of the cases, the score difference is four or less, implying that the player is entertained in at least 70% of the cases. On the other hand, in the histogram of score differences for MORL (also shown in Fig. 4), the mean values are not close to zero. The results indicate that MPRL has been more successful than MORL in meeting the player satisfaction indicator in terms of minimizing |W − L|.
Fig. 6. The average percentage of changes in the difficulty parameters (speed, size, distance) at the end of each therapy session for MORL.
B. Improvement in Arm Movement Skills
A difficulty adjustment module modifies the difficulty parameters based on the capabilities of a user. As a result, an increase in a game difficulty parameter can be considered a measure of improvement in arm movement skill. However, since patients' skills are not the same at the beginning, the rate of change in the difficulty parameters, rather than their actual values, was considered as the measure of improvement. To this end, the changes in the average speed, size, and distance in each therapy session were taken into account. As shown in Fig. 5 and Fig. 6, the rates of decrease in size and distance for MPRL are greater than the corresponding rates for MORL. This represents an increase in the amplitude of player movements, which is an important factor in the rehabilitation process. In the case of speed, although speed increased over the eight sessions for both MPRL and MORL, the improvement in speed was not considerable for either system. Explaining this observation requires exploring the correlations between difficulty parameters, which will be addressed in future work.
VI. CONCLUSION
In this paper, we presented a rehabilitation game and a learning method to dynamically adjust the difficulty of this game according to the skills of a player. The system infers a model of a user's skill level while subtly changing
the difficulty level. After identifying the game properties affecting the difficulty of the game, we determined three objectives that must be accomplished in order to satisfy players. We showed that these objectives cannot be evaluated over the same periods, as some require a history of previous evaluations. To address this problem, we proposed and used Multiple-Periodic Reinforcement Learning (MPRL), which is an extension of reinforcement learning that handles multiple-periodic objectives. We proposed new semantics and procedures for MPRL with two-level probabilistic actions. In this architecture, the MPRL agent can perform indirect actions corresponding to different goals in order to increase or decrease the probabilities of direct actions; direct actions, in turn, are performed based on these probabilities. The results of a set of experiments on patients show that MPRL outperforms traditional MORL in terms of player satisfaction indicators as well as in improving the motor control system in the long term. For future work, we aim to study the effect of patients' motivation and engagement on adjusting the difficulty level.

REFERENCES
[1] Hocine, N., Gouaich, A., & Cerri, S. A. (2014). Dynamic difficulty adaptation in serious games for motor rehabilitation. In International Conference on Serious Games (pp. 115-128). Springer International Publishing.
[2] Andrade, K. D. O., Pasqual, T. B., Caurin, G. A., & Crocomo, M. K. (2016). Dynamic difficulty adjustment with evolutionary algorithm in games for rehabilitation robotics. In Serious Games and Applications for Health (SeGAH), 2016 IEEE International Conference on (pp. 1-8). IEEE.
[3] Cirstea, M. C., & Levin, M. F. (2007). Improvement of arm movement patterns and endpoint control depends on type of feedback during practice in stroke survivors. Neurorehabilitation and Neural Repair.
[4] Dobkin, B. H. (2005). Rehabilitation after stroke. New England Journal of Medicine, 352(16), 1677-1684.
[5] Hocine, N., & Gouaich, A. (2011). Therapeutic games' difficulty adaptation: An approach based on player's ability and motivation. In Computer Games (CGAMES), 2011 16th International Conference on (pp. 257-261). IEEE.
[6] Afergan, D., Peck, E. M., Solovey, E. T., Jenkins, A., Hincks, S. W., Brown, E. T., & Jacob, R. J. (2014). Dynamic difficulty using brain metrics of workload. In Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems (pp. 3797-3806). ACM.
[7] Sekhavat, Y. A. (2016). Nowcasting mobile games ranking using web search query data. International Journal of Computer Games Technology.
[8] Sekhavat, Y. A. (2016). KioskAR: An augmented reality game as a new business model to present artworks. International Journal of Computer Games Technology.
[9] Sekhavat, Y. A. (2016). Privacy preserving cloth try-on using mobile augmented reality. IEEE Transactions on Multimedia, doi: 10.1109/TMM.2016.2639380 (to appear).
[10] Lucca, L. F. (2009). Virtual reality and motor rehabilitation of the upper limb after stroke: a generation of progress? Journal of Rehabilitation Medicine, 41(12), 1003-1006.
[11] Gouaich, A., Hocine, N., Van Dokkum, L., & Mottet, D. (2012). Digital-pheromone based difficulty adaptation in post-stroke therapeutic games. In Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium (pp. 5-12). ACM.
[12] Heuser, A., Kourtev, H., Winter, S., Fensterheim, D., Burdea, G., Hentz, V., & Forducey, P. (2007). Telerehabilitation using the Rutgers Master II glove following carpal tunnel release surgery: proof-of-concept. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 15(1), 43-49.
[13] Liu, C., Xu, X., & Hu, D. (2015). Multiobjective reinforcement learning: A comprehensive overview. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 45(3), 385-398.
[14] Natkin, S., Yan, C., Jumpertz, S., & Market, B. (2007). Creating multiplayer ubiquitous games using an adaptive narration model based on a user's model. In Digital Games Research Association International Conference (DiGRA 2007).
[15] Yannakakis, G. N., & Hallam, J. (2008). Real-time adaptation of augmented-reality games for optimizing player satisfaction. In 2008 IEEE Symposium on Computational Intelligence and Games (pp. 103-110). IEEE.
[16] Hunicke, R., & Chapman, V. (2004). AI for dynamic difficulty adjustment in games. In Proceedings of the Challenges in Game AI Workshop, Nineteenth National Conference on Artificial Intelligence.
[17] Spronck, P., Sprinkhuizen-Kuyper, I., & Postma, E. (2004). Difficulty scaling of game AI. In Proceedings of the 5th International Conference on Intelligent Games and Simulation (GAME-ON 2004) (pp. 33-37).
[18] Prabhakaran, S., Zarahn, E., Riley, C., Speizer, A., Chong, J. Y., Lazar, R. M., ... & Krakauer, J. W. (2007). Inter-individual variability in the capacity for motor recovery after ischemic stroke. Neurorehabilitation and Neural Repair.
[19] Csikszentmihalyi, M. (1990). Flow: The Psychology of Optimal Experience. New York, NY: Harper Collins.
[20] Tan, C. H., Tan, K. C., & Tay, A. (2011). Dynamic game difficulty scaling using adaptive behavior-based AI. IEEE Transactions on Computational Intelligence and AI in Games, 3(4), 289-301.
[21] Cameirao, M. S., i Badia, S. B., Oller, E. D., & Verschure, P. F. (2010). Neurorehabilitation using the virtual reality based Rehabilitation Gaming System: methodology, design, psychometrics, usability and validation. Journal of NeuroEngineering and Rehabilitation, 7(1), 1.
[22] Andrade, G., Ramalho, G., Santana, H., & Corruble, V. (2005). Extending reinforcement learning to provide dynamic game balancing. In Proceedings of the Workshop on Reasoning, Representation, and Learning in Computer Games, 19th International Joint Conference on Artificial Intelligence (IJCAI) (pp. 7-12).
[23] Gouaich, A., Hocine, N., Van Dokkum, L., & Mottet, D. (2012). Digital-pheromone based difficulty adaptation in post-stroke therapeutic games. In Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium (pp. 5-12). ACM.
[24] Chen, Y., Xu, W., Sundaram, H., Rikakis, T., & Liu, S. M. (2008). A dynamic decision network framework for online media adaptation in stroke rehabilitation. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 5(1), 4.
[25] Wang, H., Gao, Y., & Chen, X. (2010). RL-DOT: A reinforcement learning NPC team for playing domination games. IEEE Transactions on Computational Intelligence and AI in Games, 2(1), 17-26.
[26] Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning: An Introduction. Cambridge: MIT Press.
[27] Buro, M., & Furtak, T. (2003). RTS games as test-bed for real-time AI research. In Proceedings of the 7th Joint Conference on Information Science (JCIS 2003) (pp. 481-484).
[28] Scott, B. (2002). The illusion of intelligence. AI Game Programming Wisdom, 1, 16-20.
[29] Malone, T. W. (1980). What makes things fun to learn? Heuristics for designing instructional computer games. In Proceedings of the 3rd ACM SIGSMALL Symposium and the First SIGPC Symposium on Small Systems (pp. 162-169). ACM.
[30] Hagelback, J., & Johansson, S. J. (2009). Measuring player experience on runtime dynamic difficulty scaling in an RTS game. In 2009 IEEE Symposium on Computational Intelligence and Games (pp. 46-52). IEEE.