Adaptive multi-objective reinforcement learning with hybrid exploration for traffic signal control based on cooperative multi-agent framework

Mohamed A. Khamis, Walid Gomaa Department of Computer Science and Engineering Egypt-Japan University of Science and Technology (E-JUST)

Khamis, M.A., Gomaa, W., 2014. Adaptive multi-objective reinforcement learning with hybrid exploration for traffic signal control based on cooperative multi-agent framework. Eng. Appl. Artif. Intell. http://dx.doi.org/10.1016/j.engappai.2014.01.007


RL-Based Traffic Signal Controllers

- A class of AI traffic signal controllers
- A trial-and-error process: the agent learns a policy that optimizes the cumulative reward gained over time
- RL is based on a sequential online decision-making process
- Junction-based state-space representation (Thorpe et al., 1996)
  - One estimator for every junction state
  - This representation quickly leads to a very large state space
- Vehicle-based state-space representation (Wiering et al., 2000s)
  - One estimator for every vehicle state (position): Q(s, red), Q(s, green)
  - The number of states grows linearly with the number of lanes and vehicle positions
  - Scales well to large networks (contrast sketched below)
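To make the scaling contrast concrete, here is a minimal back-of-the-envelope sketch in Python. The binary-occupancy encoding is an illustrative assumption, not necessarily the exact state encoding used in the paper:

```python
# Why the junction-based representation explodes: with L lanes of
# C discretized cells each, a junction state is the joint occupancy
# pattern of all cells, i.e. ~2**(L*C) states, while the vehicle-based
# representation keeps one estimator per cell per action, i.e. L*C states.
# (Illustrative binary-occupancy encoding; assumed for this sketch.)
lanes, cells = 8, 10
junction_based = 2 ** (lanes * cells)  # joint patterns: astronomically many
vehicle_based = lanes * cells          # per-position estimators: 80

print(f"junction-based ~ {junction_based:.2e} states")
print(f"vehicle-based  = {vehicle_based} states")
```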


Multi-Objective Traffic Signal Controllers

- Using neuro-fuzzy or multi-objective genetic algorithms (e.g., Lertworawanich et al., 2011)
  - Neural networks and genetic algorithms require many computations
  - Their parameters are difficult to determine
- We are the first to apply multi-objective RL for traffic signal control
  - Based on consolidated immediate reward functions
- Houli et al., 2010: multi-objective RL for traffic control
  - However, it activates only one objective at a time, according to the traffic demand!


MAS Traffic Signal Control

- Traffic signal control:
  - Finding the optimal traffic signal configuration
  - Red/green consistent configurations
- Multi-Agent System (MAS) modeling (Wiering et al., 2000s)
  - Vehicle: passive agent (communicates with the junction)
  - Junction: active agent (the traffic signal controller)
- The controller at each junction sums up the gains Q(s, red) − Q(s, green) of all vehicles waiting at the current junction and chooses the traffic signal configuration (consistent green lights on all directions of the junction) with the maximum cumulative gain (sketched below)
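A minimal sketch of this gain-maximizing decision rule, assuming a plain dictionary Q-table; the data-structure names here are illustrative, not the paper's code:

```python
def choose_configuration(configurations, waiting_vehicles, Q):
    """Pick the consistent green-light configuration with the maximum
    cumulative gain, in the style of Wiering's vehicle-based control.

    configurations   : list of sets of lanes that may be green together
    waiting_vehicles : list of (lane, state) pairs for vehicles at the junction
    Q                : dict mapping (state, action) -> estimated value
    """
    best_config, best_gain = None, float("-inf")
    for config in configurations:
        # Sum the per-vehicle gains Q(s, red) - Q(s, green) over all
        # vehicles that this configuration would turn green.
        gain = sum(Q.get((s, "red"), 0.0) - Q.get((s, "green"), 0.0)
                   for lane, s in waiting_vehicles if lane in config)
        if gain > best_gain:
            best_config, best_gain = config, gain
    return best_config
```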


Multi-Objective RL Traffic Signal Control

Two possible options for implementing the multi-objective Q-function:
- Separate Q-values for each objective
- Consolidated immediate reward functions: more suitable for vehicle-based modeling (a sketch follows this list)

- Some objectives dominate according to road conditions, e.g., congestion in specific lanes
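A hedged sketch of what a consolidated immediate reward for one vehicle might look like. The objective terms and weights below are illustrative assumptions; the slide does not list the paper's exact reward components:

```python
def consolidated_reward(delay, stopped, waiting_time, w=(0.5, 0.3, 0.2)):
    """Combine several per-vehicle objectives into one scalar immediate
    reward, so that a single Q-function is learned.

    delay, stopped, waiting_time and the weights w are illustrative
    assumptions, not the paper's exact objective terms.
    """
    w_delay, w_stops, w_wait = w
    return -(w_delay * delay             # trip-time objective
             + w_stops * float(stopped)  # stops (fuel) objective
             + w_wait * waiting_time)    # waiting-time objective
```

Because the combination happens inside the immediate reward, each vehicle still carries a single pair Q(s, red), Q(s, green), which is why this option fits the vehicle-based modeling.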


GLD Traffic Signal Simulation Model

- Widely used open-source simulator, developed by Wiering et al. in the early 2000s
- Develop and experiment with various traffic signal controllers
- Various performance indices
- Edit/create traffic networks
- Schedule traffic demands

Drawbacks:
- Discrete-time, discrete-space model of traffic dynamics
- Oversimplifications in modeling the driving behavior
- Some simplifications in computing the statistics


Contributions to the GLD Simulator

- Varying distributions of traffic demand
  - Allows for variability and non-stationarity
- Applying the Intelligent Driver Model (M. Treiber et al., 2000)
  - Acceleration/deceleration model
  - Continuous-time, continuous-space model (sketched below)
- Sign oscillation problem (a Zeno phenomenon)
  - Results from the infinitesimally slow acceleration of the rear vehicles when the traffic signal has just turned green
  - Solved by giving those stationary vehicles a penalty smaller than the one applied when the traffic signal is red
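The IDM acceleration law from Treiber et al. (2000) can be sketched as follows; the parameter values are common textbook defaults, not necessarily the ones used in the modified GLD:

```python
import math

def idm_acceleration(v, gap, dv,
                     v0=15.0,    # desired speed [m/s]
                     T=1.0,      # desired time headway [s]
                     s0=2.0,     # minimum standstill gap [m]
                     a_max=1.0,  # maximum acceleration [m/s^2]
                     b=1.5,      # comfortable deceleration [m/s^2]
                     delta=4.0): # acceleration exponent
    """Intelligent Driver Model (Treiber et al., 2000).

    v   : speed of the following vehicle
    gap : bumper-to-bumper distance to the leader (or to the stop line)
    dv  : approach rate v - v_leader (equals v toward a red light,
          treating the stop line as a standing leader)
    """
    # Desired dynamic gap: grows with speed and with the closing rate.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    # Free-road acceleration term minus the interaction (braking) term.
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)
```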


Handling Non-Stationarity

- Use a Bayesian approach to estimate the underlying MDP parameters
- The current estimate becomes the prior for the next time step
- Let P be a random variable representing an estimator of some unknown parameter, for:
  1. Pr(a | s): posterior probability of taking action a given state s, or
  2. Pr(s' | s, a): transition probability of being in next state s' given (s, a)
- E.g., fix some state s; then Pr(a | s) has one parameter P for Pr{a = RED}
- The observations of this parameter form a sequence of Bernoulli random variables, defined at the time indices where state s is occupied by a vehicle (a sketch follows)
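A minimal sketch of such a sequential estimate, assuming a Beta prior (the conjugate prior for a Bernoulli parameter); the slide does not state the paper's exact prior or hyperparameters:

```python
class BernoulliBetaEstimator:
    """Sequential Bayesian estimate of a Bernoulli parameter P,
    e.g. Pr{a = RED | s}. The posterior after each observation
    becomes the prior for the next time step.
    """

    def __init__(self, alpha=1.0, beta=1.0):
        # Beta(alpha, beta) prior; alpha = beta = 1 is the uniform prior.
        self.alpha = alpha
        self.beta = beta

    def update(self, observed_red: bool):
        # Conjugate update: the posterior is again a Beta distribution.
        if observed_red:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self):
        # Posterior mean estimate of Pr{a = RED}.
        return self.alpha / (self.alpha + self.beta)
```

To track a non-stationary parameter, one common variant decays alpha and beta by a forgetting factor before each update; whether the paper uses such a factor is not stated on the slide.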


Handling Transient Periods

- Hybrid exploration technique based on both ε-exploration and softmax exploration
- Softmax exploration is used to better respond to transient periods (e.g., due to congestion at rush hours)
- The traffic signal decision is chosen with probability proportional to the gain values: exp(g_i) / Σ_j exp(g_j)
  - g_i is the cumulative gain of the vehicles in the lanes of traffic signal configuration i
- Cooperation is used to check whether a junction is in a transient state, i.e., one most likely to be transferred soon to a neighboring junction
- During such periods, it is more appropriate to use softmax exploration (a sketch follows)
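A sketch of the hybrid rule under these assumptions. The temperature tau and the exact switching condition are illustrative; the slide only says that softmax is preferred while a junction is in a transient state:

```python
import math
import random

def hybrid_select(gains, epsilon=0.1, transient=False, tau=1.0):
    """Pick a traffic signal configuration index from its gain values.

    gains     : list of g_i, cumulative gain of each consistent configuration
    transient : flag set by the cooperation check with neighboring junctions
    """
    if transient:
        # Softmax: probability proportional to exp(g_i / tau).
        m = max(gains)  # subtract the max for numerical stability
        weights = [math.exp((g - m) / tau) for g in gains]
        return random.choices(range(len(gains)), weights=weights)[0]
    # Otherwise epsilon-greedy: mostly exploit the best configuration.
    if random.random() < epsilon:
        return random.randrange(len(gains))
    return max(range(len(gains)), key=lambda i: gains[i])
```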


Experimentations

- Congestion: examined by the average trip time
- [Figure: average trip time; the mean value is ≈ 8 times lower with the proposed controller]
- TC-1 is the single-objective controller (M. Wiering, 2000) that is based on frequentist probability estimation using ε-exploration


Experimentations

- Fuel consumption: examined by the average number of trip stops
- Green wave: examined by the average number of trip absolute stops
- [Figure: the number of vehicle stops is ≈ 22 times lower when using the multi-objective controller]


Conclusions

- Multi-agent reinforcement learning for traffic signal control: proposed framework
- Contributions to the GLD traffic simulator
  - E.g., applying a continuous time-space acceleration model
- Handling traffic network non-stationarity
  - Bayesian approach to estimate the unknown parameters
  - Adaptive cooperative hybrid exploration
- Multi-objective RL for traffic signal control
  - Significantly outperforms the underlying single-objective controller, especially under congested periods and adverse weather conditions


- Questions and updated GLD source: e-mail [email protected]; [email protected]

