Adaptive multi-objective reinforcement learning with hybrid exploration for traffic signal control based on cooperative multi-agent framework
Mohamed A. Khamis, Walid Gomaa
Department of Computer Science and Engineering
Egypt-Japan University of Science and Technology (E-JUST)
Khamis, M.A., Gomaa, W., 2014. Adaptive multi-objective reinforcement learning with hybrid exploration for traffic signal control based on cooperative multi-agent framework. Eng. Appl. Artif. Intell., http://dx.doi.org/10.1016/j.engappai.2014.01.007
RL-Based Traffic Signal Controllers
o A class of AI traffic signal controllers based on a trial-and-error process
o The agent learns a policy that optimizes the cumulative reward gained over time
o RL is based on a sequential online decision-making process
o Junction-based state-space representation (Thorpe et al., 1996)
  - An estimator for every junction state
  - This representation quickly leads to a very large state space
o Vehicle-based state-space representation (Wiering et al., early 2000s)
  - An estimator for every vehicle state (position): Q(s, red), Q(s, green)
  - The number of states grows linearly in the number of lanes and vehicle positions
  - Scales well to large networks (a minimal storage sketch follows below)
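The vehicle-based representation can be pictured as a lookup table with one entry per (vehicle position, signal color) pair. Below is a minimal Python sketch assuming a simple dictionary-backed table; the class name `QTable` and its `gain` helper are illustrative, not the GLD implementation.

```python
# Minimal sketch of a vehicle-based Q-value store (illustrative, not the GLD code).
# A state s is a vehicle position (lane, cell); each state holds Q(s, red) and Q(s, green).
from collections import defaultdict

class QTable:
    def __init__(self):
        # default Q-value of 0.0 for unseen (state, signal) pairs
        self.q = defaultdict(float)

    def get(self, state, signal):
        return self.q[(state, signal)]

    def set(self, state, signal, value):
        self.q[(state, signal)] = value

    def gain(self, state):
        # gain of giving this vehicle a green light: Q(s, red) - Q(s, green)
        return self.get(state, "red") - self.get(state, "green")

# Example: a vehicle waiting in lane 3, cell 0 (closest to the junction); values are illustrative
q = QTable()
q.set(("lane3", 0), "red", 4.2)    # e.g., expected cumulative waiting time if the light stays red
q.set(("lane3", 0), "green", 1.1)  # e.g., expected cumulative waiting time if the light turns green
print(q.gain(("lane3", 0)))        # ~3.1: benefit of turning this lane green
```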
Multi-Objective Traffic Signal Controllers
o Existing approaches use neuro-fuzzy or multi-objective genetic algorithms
  - Neural networks and genetic algorithms require many computations
  - Their parameters are difficult to determine
  - E.g., Lertworawanich et al., 2011
o We are the first to apply multi-objective RL for traffic control based on consolidated immediate reward functions
o Houli et al., 2010, also use multi-objective RL for traffic control
  - However, they activate only one objective at a time, according to the traffic demand
MAS Traffic Signal Control
o Traffic signal control: finding the optimal traffic signal configuration
  - Red/green consistent configurations
o Multi-Agent System (MAS) modeling (Wiering et al., early 2000s)
  - Vehicle: passive agent (communicates with the junction)
  - Junction: active agent (the traffic signal controller)
o The controller at each junction sums up the gains Q(s, red) − Q(s, green) of all vehicles waiting at the junction and chooses the traffic signal configuration (a consistent, non-conflicting set of green lights at the junction) with the maximum cumulative gain, as sketched below.
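A minimal sketch of this decision rule, reusing the hypothetical `QTable.gain` helper from the earlier sketch; the configuration and vehicle data structures are illustrative assumptions.

```python
# Sketch of the junction controller decision: sum the gains Q(s, red) - Q(s, green)
# of all waiting vehicles per consistent configuration, then pick the maximum.
def choose_configuration(configurations, waiting_vehicles, q_table):
    """configurations: dict mapping config id -> set of lanes that get green.
    waiting_vehicles: list of (lane, state) pairs for vehicles waiting at the junction."""
    best_config, best_gain = None, float("-inf")
    for config_id, green_lanes in configurations.items():
        # cumulative gain of vehicles that would receive a green light under this configuration
        total_gain = sum(q_table.gain(state)
                         for lane, state in waiting_vehicles
                         if lane in green_lanes)
        if total_gain > best_gain:
            best_config, best_gain = config_id, total_gain
    return best_config
```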
Multi-Objective RL Traffic Signal Control
o Two possible options for implementing the multi-objective Q-function:
  - Separate Q-values for each objective
  - Consolidated immediate reward functions (more suitable for vehicle-based modeling); a reward sketch follows below
o Some objectives dominate according to the road conditions, e.g., congestion in specific lanes
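A minimal sketch of a consolidated immediate reward, assuming it is formed as a weighted sum of per-objective terms; the specific objective terms and weights below are illustrative placeholders, not the paper's exact reward design.

```python
# Sketch of a consolidated immediate reward: one scalar combining several objectives.
# Objective terms and weights are illustrative placeholders.
def consolidated_reward(vehicle, weights):
    terms = {
        "waiting":    -1.0 if vehicle["stationary"] else 0.0,      # trip / waiting time objective
        "stops":      -1.0 if vehicle["just_stopped"] else 0.0,    # fuel consumption / number of stops
        "flickering": -1.0 if vehicle["signal_flipped"] else 0.0,  # penalize rapid signal changes
    }
    return sum(weights[k] * terms[k] for k in terms)

# Example usage with illustrative weights
r = consolidated_reward(
    {"stationary": True, "just_stopped": False, "signal_flipped": False},
    {"waiting": 0.5, "stops": 0.3, "flickering": 0.2},
)
```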
GLD Traffic Signal Simulation Model
o Widely used open-source simulator, developed by Wiering et al. in the early 2000s
o Used to develop and experiment with various traffic signal controllers
o Provides various performance indices
o Supports editing/creating traffic networks and scheduling traffic demands
o Drawbacks:
  - Discrete-time, discrete-space model of traffic dynamics
  - Oversimplifications in modeling the driving behavior
  - Some simplifications in computing the statistics
Contributions to the GLD Simulator
o Varying distributions of traffic demand
  - Allow for variability and non-stationarity
o Applying the Intelligent Driver Model (Treiber et al., 2000); a sketch follows below
  - Acceleration/deceleration model
  - Continuous-time/continuous-space model
o Sign oscillation problem (a Zeno phenomenon)
  - Results from the infinitesimally slow acceleration of the rear vehicles when the traffic signal has just turned green
  - Solved by giving those stationary vehicles a penalty smaller than the one applied when the traffic signal is red
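For reference, a minimal sketch of the IDM acceleration of Treiber et al. (2000); the parameter values are typical textbook defaults, not necessarily those used in the modified simulator.

```python
import math

def idm_acceleration(v, gap, delta_v,
                     v0=15.0,    # desired speed (m/s)
                     T=1.5,      # safe time headway (s)
                     a_max=1.0,  # maximum acceleration (m/s^2)
                     b=2.0,      # comfortable deceleration (m/s^2)
                     s0=2.0,     # minimum gap to the leader (m)
                     delta=4.0): # acceleration exponent
    """IDM: a = a_max * [1 - (v/v0)^delta - (s*/gap)^2],
    with desired gap s* = s0 + v*T + v*delta_v / (2*sqrt(a_max*b))."""
    s_star = s0 + v * T + (v * delta_v) / (2.0 * math.sqrt(a_max * b))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)

# Example: vehicle at 10 m/s, 20 m behind its leader, closing at 2 m/s
print(idm_acceleration(v=10.0, gap=20.0, delta_v=2.0))
```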
Handling Non-Stationarity
o A Bayesian approach is used to estimate the underlying MDP parameters
  - The current estimate becomes the prior for the next time step
o Let P be a random variable representing an estimator of some unknown parameter, either:
  1. Pr(a | s): the posterior probability of taking action a given state s, or
  2. Pr(s' | s, a): the transition probability of reaching next state s' given (s, a)
o E.g., fix some state s; then Pr(a | s) has one parameter P for Pr{a = RED}
  - The corresponding observations form a sequence of Bernoulli random variables defined at the time indices where state s is occupied by a vehicle (see the sketch below)
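A minimal sketch of this sequential Bayesian update, assuming a conjugate Beta prior on the Bernoulli parameter P (the Beta family is an illustrative assumption); each time step's posterior becomes the prior for the next step.

```python
# Sketch of sequential Bayesian estimation of a Bernoulli parameter
# (e.g., P = Pr{a = RED | s}), assuming a conjugate Beta prior.
class BetaBernoulliEstimator:
    def __init__(self, alpha=1.0, beta=1.0):
        # Beta(alpha, beta) prior; Beta(1, 1) is the uniform prior
        self.alpha, self.beta = alpha, beta

    def update(self, observation):
        """observation: True if the event occurred (e.g., the signal was RED)."""
        # the current posterior becomes the prior for the next time step
        if observation:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def mean(self):
        # posterior mean estimate of the Bernoulli parameter
        return self.alpha / (self.alpha + self.beta)

est = BetaBernoulliEstimator()
for obs in [True, True, False, True]:  # observed only at times when state s is occupied
    est.update(obs)
print(est.mean())  # posterior mean estimate of Pr{a = RED | s}
```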
Handling Transient Periods
o Hybrid exploration technique based on both ε-exploration and softmax exploration
o Softmax exploration better responds to transient periods (e.g., due to congestion at rush hours)
  - The traffic signal decision is chosen proportionally to the gain values: Pr(configuration i) = exp(g_i) / Σ_j exp(g_j)
  - g_i is the cumulative gain of the vehicles in the lanes of traffic signal configuration i
o Cooperation is used to check whether some junction is in a transient state that is most likely to propagate soon to a neighboring junction
  - During such periods, it is more appropriate to use softmax exploration (a sketch follows below)
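A minimal sketch of the hybrid exploration idea: softmax over the configuration gains during detected transient periods, ε-greedy otherwise. The `in_transient` flag and the temperature `tau` are illustrative assumptions.

```python
import math
import random

def choose_config(gains, in_transient, epsilon=0.1, tau=1.0):
    """gains: dict config_id -> cumulative gain g_i of the vehicles in that configuration's lanes."""
    ids = list(gains)
    if in_transient:
        # softmax exploration: Pr(i) = exp(g_i / tau) / sum_j exp(g_j / tau)
        m = max(gains[i] for i in ids)  # subtract the max for numerical stability
        weights = [math.exp((gains[i] - m) / tau) for i in ids]
        return random.choices(ids, weights=weights, k=1)[0]
    # epsilon-greedy exploration during stationary periods
    if random.random() < epsilon:
        return random.choice(ids)
    return max(ids, key=lambda i: gains[i])
```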
Experimentations
o Congestion: examined by the average trip time
  - The mean trip time is ≈ 8 times lower than with TC-1
o TC-1 is the single-objective controller (Wiering, 2000) based on frequentist probability estimation using ε-exploration
Experimentations
o Fuel consumption: examined by the average number of trip stops
o Green wave: examined by the average number of trip absolute stops
o The number of vehicle stops is ≈ 22 times lower when using the multi-objective controller
Conclusions
o Proposed framework: multi-agent reinforcement learning for traffic signal control
o Contributions to the GLD traffic simulator
  - E.g., applying a continuous time-space acceleration model
o Handling traffic network non-stationarity
  - Bayesian approach to estimate the unknown parameters
o Adaptive cooperative hybrid exploration
o Multi-objective RL for traffic signal control
  - Significantly outperforms the underlying single-objective controller under congested periods and adverse weather conditions
Questions and updated GLD source
o E-mail: [email protected]; [email protected]