Open Access
Poster presentation
Spike-based reinforcement learning of navigation

Eleni Vasilaki*1, Robert Urbanczik2, Walter Senn2 and Wulfram Gerstner1

Address: 1Laboratory of Computational Neuroscience, School of Computer and Communications Sciences and Brain Mind Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, CH-1015, Switzerland and 2Institute of Physiology, University of Bern, Buehlplatz 5, 3012 Bern, Switzerland

Email: Eleni Vasilaki* - [email protected]
* Corresponding author
from Seventeenth Annual Computational Neuroscience Meeting: CNS*2008, Portland, OR, USA, 19–24 July 2008

Published: 11 July 2008

BMC Neuroscience 2008, 9(Suppl 1):P72 doi:10.1186/1471-2202-9-S1-P72

This abstract is available from: http://www.biomedcentral.com/1471-2202/9/S1/P72

© 2008 Vasilaki et al; licensee BioMed Central Ltd.
Introduction

We have studied a spiking reinforcement learning model derived from reward maximization [1,2], in which causal relations between pre- and postsynaptic activity set a synaptic eligibility trace [2,3]. Neurons are modeled as integrate-and-fire units with escape noise. Synapses are binary and are modulated via their release probability, which is updated when a global reward signal (such as dopamine) is received. We have applied the learning algorithm to a model of the Morris Water Maze task. The simulated rat explores the environment by random search. After only a few trials the rat has learned to approach the goal from arbitrary start conditions (see Figure 1). The model features automatic generalization in state and action space due to coding by overlapping profiles of place cells and action cells [4].
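To make the learning rule concrete, the following Python snippet gives a minimal sketch of the scheme described above; it is not the authors' published implementation. It assumes a single integrate-and-fire neuron with escape noise, binary synapses whose release probability is the plastic variable, an eligibility trace set by causal pre-post coincidences, and a global reward signal that converts the trace into a change of release probability. All parameter values, time constants and function names (step, apply_reward) are illustrative assumptions.

import numpy as np

# Minimal sketch of reward-modulated plasticity with an eligibility trace,
# for one leaky integrate-and-fire neuron with escape noise and binary
# synapses. Parameters and structure are assumptions, not the authors' code.

rng = np.random.default_rng(0)

n_syn = 100     # number of presynaptic inputs (assumed)
dt = 1e-3       # time step [s]
tau_m = 20e-3   # membrane time constant [s]
tau_e = 0.5     # eligibility-trace time constant [s] (assumed)
v_th = 1.0      # escape-noise threshold parameter (assumed)
beta = 5.0      # escape-noise steepness (assumed)
eta = 0.05      # learning rate (assumed)
w = 0.1         # postsynaptic effect of a single release (assumed)


def step(pre_spikes, v, elig, p_rel):
    """Advance the neuron by one time step and update eligibility traces."""
    # Binary synapses: an active input transmits with probability p_rel.
    released = pre_spikes & (rng.random(n_syn) < p_rel)
    # Leaky integrate-and-fire membrane dynamics.
    v += dt / tau_m * (-v) + w * released.sum()
    # Escape noise: spiking is stochastic, with probability growing with v.
    p_spike = 1.0 - np.exp(-dt * np.exp(beta * (v - v_th)))
    post_spike = rng.random() < p_spike
    if post_spike:
        v = 0.0
    # Causal pre-post coincidences set the eligibility trace,
    # which otherwise decays with time constant tau_e.
    elig = elig - dt / tau_e * elig + released * float(post_spike)
    return v, elig, post_spike


def apply_reward(reward, elig, p_rel):
    """A global reward signal converts eligibility into a p_rel update."""
    return np.clip(p_rel + eta * reward * elig, 0.01, 0.99)


# Usage: run `step` during a trial, then call `apply_reward` when the
# simulated rat reaches (or fails to reach) the platform.
p_rel = np.full(n_syn, 0.5)
v, elig = 0.0, np.zeros(n_syn)
for _ in range(1000):
    pre = rng.random(n_syn) < 0.02          # Poisson-like input spikes
    v, elig, _ = step(pre, v, elig, p_rel)
p_rel = apply_reward(1.0, elig, p_rel)      # reward delivered at trial end

In the full navigation model, many such synapses connect overlapping place-cell inputs to a population of action cells, which is what gives the generalization in state and action space mentioned above.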
References
1. Pfister JP, Toyoizumi T, Barber D, Gerstner W: Optimal Spike-Timing Dependent Plasticity for Precise Action Potential Firing in Supervised Learning. Neural Computation 2006, 18(6):1309-1339.
2. Florian RV: Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity. Neural Computation 2007, 19(6):1468-1502.
3. Izhikevich EM: Solving the Distal Reward Problem through Linkage of STDP and Dopamine Signaling. Cerebral Cortex 2007, 17:2443-2452.
4. Strösslin T, Sheynikhovich D, Chavarriaga R, Gerstner W: Robust self-localisation and navigation based on hippocampal place cells. Neural Networks 2005, 18(9):1125-1140.
Figure 1: Escape latency versus number of trials. Escape latency measures the time it takes the simulated rat to reach a hidden platform starting from arbitrary initial conditions. Learning is achieved in less than 20 trials. Error bars indicate the 25th and 75th percentiles.