Fast Concurrent Reinforcement Learners

Bikramjit Banerjee & Sandip Sen
Math & Computer Sciences Department
University of Tulsa
Tulsa, OK 74104
{bikram, [email protected]

Jing Peng
Dept. of Computer Science
Oklahoma State University
Stillwater, OK 74078
[email protected]

Abstract

When several agents learn concurrently, the payoff received by an agent depends on the behavior of the other agents. As the other agents learn, the reward of any one agent becomes non-stationary, which makes learning in multiagent systems more difficult than single-agent learning. A few methods, however, are known to guarantee convergence to equilibrium in the limit in such systems. In this paper we experimentally study one such technique, minimax-Q, in a competitive domain and prove its equivalence with another well-known method for competitive domains. We study the rate of convergence of minimax-Q and investigate possible ways of increasing it. We also present a variant of the algorithm, minimax-SARSA, and prove its convergence to minimax-Q values under appropriate conditions. Finally, we show that this new algorithm performs better than simple minimax-Q in a general-sum domain as well.
1 Introduction

The reinforcement learning (RL) paradigm provides techniques with which an individual agent can optimize its environmental payoff. However, the non-stationary environment that is central to multiagent learning violates the assumptions underlying the convergence proofs of single-agent RL techniques. As a result, standard reinforcement learning techniques such as Q-learning are not guaranteed to converge in a multiagent environment. For multiple, concurrent learners, convergence is defined with respect to an equilibrium strategy profile rather than optimal strategies for the individual agents. The stochastic-game (or Markov game) framework, a generalization of Markov Decision Processes to multiple controllers, has been used to model learning by agents in purely competitive domains [Littman, 1994]. In that work, Littman presented a minimax-Q learning rule and evaluated its performance experimentally. The Q-learning rule, being a model-free and online learning technique, is particularly suited to multi-stage games, and as such is an attractive candidate for multiagent learning in uncertain environments. Hu and Wellman (1998) extended Littman's framework to enable the agents to converge to a mixed-strategy Nash equilibrium. There has also been other research on using the stochastic-game framework in multiagent learning, such as an empirical study of multiagent Q-learning in semi-competitive domains [Sandholm and Crites, 1996].

The first two techniques mentioned above are known to converge in the limit. In this paper we show that these two techniques are essentially identical in purely competitive domains. To increase the convergence rate of the minimax-Q rule, we extend it to incorporate SARSA [Rummery, 1994; Sutton and Barto, 1998] and Q(λ) [Peng and Williams, 1996] techniques, and study the convergence rates of these different methods in a competitive soccer domain [Littman, 1994].
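For concreteness, a sketch of the minimax-Q update in the zero-sum setting, following Littman (1994), is given below; the notation here ($\alpha$ the learning rate, $\gamma$ the discount factor, $a$ and $o$ the learner's and opponent's actions, $s'$ the successor state, and $PD(A)$ the set of mixed strategies over the learner's action set $A$) is introduced only for illustration:
\begin{align*}
Q(s, a, o) &\leftarrow (1 - \alpha)\, Q(s, a, o) + \alpha \bigl( r + \gamma\, V(s') \bigr), \\
V(s) &= \max_{\pi \in PD(A)} \min_{o' \in O} \sum_{a' \in A} \pi(a')\, Q(s, a', o').
\end{align*}
Computing $V(s)$ amounts to solving a small linear program over the learner's mixed strategies at each update, which accounts for much of minimax-Q's per-step computational cost.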
2 Multiagent Q-learning

Definition 1 A Markov Decision Process (MDP) is a quadruple $\{S, A, T, R\}$, where $S$ is the set of states, $A$ is the set of actions, $T$ is the transition function, $T : S \times A \rightarrow PD(S)$, $PD$ being a probability distribution, and $R$ is the reward function, $R : S \times A \rightarrow$