Combining Machine Learning and Multi-Agent Approach for Controlling Traffic at Intersections

Mateusz Krzysztoń¹ and Bartłomiej Śnieżyński²

¹ Institute of Control and Computation Engineering, Warsaw University of Technology, Warsaw, Poland
[email protected]
² AGH University of Science and Technology, Krakow, Poland
[email protected]
Abstract. The increasing volume of traffic in urban areas causes great costs and has a negative effect on citizens' life and health. Intersections are the main cause of decreasing traffic fluency. Many methods for increasing the capacity of junctions exist, but they are still insufficient. At the same time, intelligent, autonomous cars are being created, which opens up new possibilities for controlling traffic at intersections. In this article a new approach for crossing an isolated junction is proposed: cars are given total autonomy and, to avoid collisions, they have to change speed. Several methods for adjusting speed based on machine learning (ML) are described, including new methods combining different ML algorithms (hybrid methods). The approach and the methods were tested using a specially designed platform, MABICS. The conducted experiments revealed some deficiencies of the methods; ideas for addressing them are proposed. The results of the experiments made it possible to verify the proposed idea as promising.

Keywords: intersection; self-driven car; reinforcement learning; hybrid methods
1 Introduction
With improvements in technology, multi-robot systems are becoming more present in our lives, in the vast majority of cases increasing our standard of living. To make this impact even more efficient, multi-robot systems very often exploit artificial intelligence (AI). Robots, often identified with agents [1], can use multiple AI algorithms. A very popular choice for implementing the decision-making mechanism of robots is to use machine learning (ML) methods, which allow robots to learn from their experience [2]. One of the natural environments for multi-robot systems is a city, especially its transport infrastructure. According to a European Commission report [3], a large number of European towns and cities suffer from increasing traffic congestion, which not only causes great costs (estimated at €80 billion annually), but also has a significant influence on the environment and hence on citizens' life and health (around 23% of all CO2 emissions from transport in the EU come from urban areas).
Unfortunately, the possibility of extending the existing infrastructure is limited. Hence, methods for improving the fluency of traffic using the existing infrastructure need to be found. The main cause of limited traffic fluency in cities is intersections [4], because cars often have to stop before crossing them. Some existing solutions for increasing fluency at junctions (both already implemented in real cities and those at the research stage) are discussed below. The main aim of this article is to present a new approach for managing traffic at an intersection, based on giving full autonomy to self-driven, intelligent vehicles, which are slowly but successfully being introduced on real-world roads [5]. Then, several ML-based decision-making methods for a vehicle at an intersection are proposed. They are evaluated using a specially developed platform and the results are discussed.
2 Related Work
Improving traffic at intersections in terms of safety and fluency has been studied since the 19th century, when the first traffic lights were introduced [6]. This is still the most popular solution for controlling traffic at junctions. Despite many advantages, like simplifying drivers' decisions and increasing safety, traffic lights have a negative influence on traffic fluency [6]. Hence, many systems were developed that adjust traffic light cycles on neighboring intersections to increase the overall capacity of the infrastructure [7,8,9]. Multi-agent systems are commonly used to optimize traffic. For example, in [10] particle swarm optimization techniques are applied to adapt traffic lights. A more flexible and innovative approach is described in [11], where every car is an autonomous agent and many cars can cross a junction in different directions at the same time. To cross an intersection, an agent has to send a reservation request to the intersection control system. The control system simulates the path of the car through the junction and checks whether it contains fields already reserved for the given time. The agent sends requests until one of them does not conflict with already accepted requests. The main problem with deploying this method in the real world is the restriction that every intersection needs to be equipped with a special control system. Ensuring safety for other road users (cyclists, pedestrians) also has to be considered. In multi-agent systems the two main techniques applied for learning are reinforcement learning and evolutionary computation. However, other techniques, such as supervised learning, are also applied. A good survey of learning in multi-agent systems working in various domains can be found in [12] and [13]. Learning can be applied in various environments. Predator-prey is one of them, where several learning techniques were applied. [14] is an example of a reinforcement learning application: predator agents use reinforcement learning to learn a strategy minimizing the time to catch a prey. Another domain where several learning techniques were applied is target observation. In [15] rules are evolved to control large-area surveillance from the air. In [16] Parker presents cooperative observation tasks to test the autonomous generation of cooperative behaviors in robot teams. Lazy learning based on
reinforcement learning is used to generate a strategy better than a random one, but worse than a manually developed one. Results of applying reinforcement learning mixed with a state-space generalization method can be found in [17], where the Evolutionary Nearest Neighbor Classifier - Q-learning (ENNC-QL) is proposed. It is a hybrid model, a combination of supervised function approximation and state-space discretization with Q-learning. This method has goals similar to the hybrid algorithm presented in this paper: reduction of the state space for reinforcement learning with minimal possible information loss, so that the Markov property can still be satisfied after applying the reduction. For ENNC-QL this works best in deterministic domains. Technically, the ENNC-QL algorithm works as a very sophisticated function approximator with built-in discretization support. The main application domain of the ENNC-QL approach consists of problems with possibly large, continuous state spaces. [17] gives no information about experiments with purely discrete state spaces, so the range of applications is somewhat different than for the hybrid model proposed here. Additionally, the ENNC-QL algorithm requires several predefined phases in order to compute the discretization and the state-space representation, including an explicit exploration phase and two learning phases, so it might be hard to apply in non-stationary, changing environments. On the other hand, it is more generic than the hybrid model described here, because it can easily be applied to any continuous state-space problem without making any assumptions about the problem's domain. There are also several other works on learning in multi-agent systems that use supervised learning. Rule induction is used in a multi-agent solution for a vehicle routing problem [18]. However, in this work learning is done off-line. In [19], agents learn coordination rules, which are used in coordination planning. If there is not enough information during learning, agents can communicate additional data. Singh et al. add learning capabilities to the BDI model [20]. Decision tree learning is used to support plan applicability testing.
3 New Approach For Intersection Crossing
It is assumed that, with an increasing number of intelligent, self-driven cars, it will be possible to abandon some current rules of crossing intersections in favor of a higher degree of car autonomy. To increase the fluency of traffic in the existing infrastructure, a new approach for controlling traffic at a single junction, SInC (Simultaneous Intersection Crossing), is proposed.
3.1 General idea
The SInC approach is inspired by the zipper method [21], which is used when several lanes merge into a single one. It increases fluency during lane changes by eliminating unnecessary stops made to avoid collisions. Analogously, cars at a junction do not have to stop to give way to other cars; they can cross it simultaneously, reducing or increasing their speed to avoid collisions. Decisions about speed changes are made by a fully autonomous agent that is responsible for steering the car while
crossing the intersection. The main issue in SInC is making decisions that minimize crossing time and prevent accidents. These two goals are often contradictory. The agents make decisions about velocity changes based on the current situation at the intersection, in real time.
3.2 Model
To simplify the examination of SInC, the following assumptions were made. Cars cross the intersection straight ahead only and they do not change lanes. The environment is (according to the features described in [13]):

– discrete - the intersection is modeled as a grid, time is divided into steps, and the speed and acceleration of the car are natural and integer numbers, respectively;
– fully observable - the agent knows the whole state of the intersection in every time step;
– deterministic (partially) - the agent's decision to change speed from s to s' always causes the car speed in the next time step to be s';
– non-episodic - every decision made by the agent is independent of previous ones.

The state of a car on the intersection is defined by its location, direction and speed (s). The state of the intersection is composed of all cars' states. In every step of the simulation the agent chooses an action a_i ∈ A, which represents a decision to change the speed by i. The set of actions A is defined, according to the maximal speed change a_max, as:

A := {a_{-a_max}, a_{-a_max+1}, ..., a_{a_max-1}, a_{a_max}}.    (1)
If the speed after the change is beyond the permitted range of speed values, the resulting speed is clamped to the nearest end of the range.
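To make the action set from Eq. (1) and the clamping rule concrete, a minimal Python sketch is given below. The constants (a_max = 2, permitted speed range 0-5) are illustrative assumptions and are not taken from the experiments in this paper.

```python
# Sketch of the action set from Eq. (1) and the speed-clamping rule.
# a_max and the permitted speed range are illustrative assumptions.
A_MAX = 2          # maximal speed change per step (assumption)
MIN_SPEED = 0      # permitted speed range (assumption)
MAX_SPEED = 5

# A = {a_{-a_max}, ..., a_{a_max}}: action a_i changes the speed by i.
ACTIONS = list(range(-A_MAX, A_MAX + 1))

def apply_action(speed: int, i: int) -> int:
    """Return the speed in the next time step after choosing action a_i.

    If the new speed falls outside the permitted range, it is clamped
    to the nearest end of the range, as described in the model.
    """
    new_speed = speed + i
    return max(MIN_SPEED, min(MAX_SPEED, new_speed))

# Example: a car at maximal speed choosing a_{+2} keeps the maximal speed.
assert apply_action(5, 2) == 5
assert apply_action(1, -2) == 0
```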
3.3 Methods
The decision-making methods proposed in this article are based on ML; of course, any other algorithm may be used to implement the approach. The agent steering the car knows the state of the intersection and uses it to describe the state of the car with the following attributes:

– distance to the target, i.e. the end of the intersection (d_t);
– speed of the car (s_c);
– distance of the car to the nearest collision point with a car coming from the crossing road (the collision car) (d_cp);
– distance from the collision car to the collision point (d_ccp);
– speed of the collision car (s_cc).

Below, the term "state" refers to these attributes. In Fig. 1 (left) an example realization of a state is shown, where d_t = 18, s_c = 2, d_cp = 10, d_ccp = 8, s_cc = 1. To avoid collisions and minimize the time needed to cross the intersection, the agent uses ML to map every possible processed state of the car to the best action a_i ∈ A in that state. The methods used by agents for making decisions are proposed below.
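Purely as an illustration, the five attributes can be grouped into a simple record; the field names mirror the symbols above and the example values reproduce the state shown in Fig. 1 (left).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CarState:
    """State attributes used by the agent (names mirror the text)."""
    dt: int    # distance to target (end of intersection)
    sc: int    # speed of the car
    dcp: int   # distance of the car to the nearest collision point
    dccp: int  # distance from the collision car to the collision point
    scc: int   # speed of the collision car

# The example state shown in Fig. 1 (left).
example = CarState(dt=18, sc=2, dcp=10, dccp=8, scc=1)
```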
Fig. 1. On the left, an example state at an intersection is shown. The place where a collision between the blue car and the car nearest to it (middle green) can occur is marked with a red dot. The end of the intersection is marked with black lines. On the right, the initial situation at the intersection for the experiment with consistent goals is presented. The intelligent agent is marked in green, the agents simulating real traffic in red.
3.3.1 Simple method based on reinforcement learning (RL). In reinforcement learning, an agent experiments with the environment by taking actions that are not always optimal in the current state [2]. To teach the agent to make the best decisions, the agent gets a reward r for every action, which is the sum of:

– r_c - a negative reward for a collision;
– r_t - a positive reward for reaching the target;
– r_s - a negative reward for making a move.

The reinforcement learning algorithm is used by the agent to learn how to obtain the largest total reward while crossing the intersection. The rewards r_t and r_s promote crossing it as quickly as possible. The reward r_c teaches the agent to avoid collisions (a sketch of this reward structure is given after Section 3.3.2).

3.3.2 Hybrid method based on reinforcement learning and state reduction (RLSR). A drawback of the method above is the fast-growing number of states as the number of possible values of every part of the state increases (e.g. the possible values of car speed). To limit the number of possible states, the part of the state that describes the possibility of a collision (s_c, d_cp, d_ccp, s_cc) can be reduced to a bivalent attribute informing whether in the given state a collision is possible [22,23]. To reduce the state, a classifier is used [24]. To teach the classifier which states are dangerous, all states visited by the car are classified as "collision possible" if a collision happened during the crossing and as "collision impossible" if the car safely reached the end of the intersection. Again, reinforcement learning is used to learn the best action in a given reduced state. The scheme of the method is presented in Fig. 2. An obvious deficiency of this method is the loss of a considerable part of the training information for reinforcement learning, which may affect the quality of the agent's decisions. Some possible improvements are discussed in Section 5.
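For concreteness, the following is a minimal tabular Q-learning sketch using the reward decomposition of Section 3.3.1; it is not the MABICS implementation, and the reward values, learning rate, discount factor and exploration rate are illustrative assumptions.

```python
import random
from collections import defaultdict

# Illustrative reward values (assumptions, not the values used in MABICS).
R_COLLISION = -100   # r_c: negative reward for a collision
R_TARGET = 50        # r_t: positive reward for reaching the target
R_STEP = -1          # r_s: negative reward for making a move

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning parameters (assumptions)
ACTIONS = [-2, -1, 0, 1, 2]             # speed changes for a_max = 2

Q = defaultdict(float)   # Q[(state, action)] -> estimated return

def reward(collided: bool, reached_target: bool) -> int:
    """Total reward for one step: the sum of r_c, r_t and r_s."""
    r = R_STEP
    if collided:
        r += R_COLLISION
    if reached_target:
        r += R_TARGET
    return r

def choose_action(state) -> int:
    """Epsilon-greedy action selection over the action set A."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, r, next_state) -> None:
    """Standard one-step Q-learning update."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])
```

The same update can be applied to the reduced states of Section 3.3.2; only the state passed to `choose_action` and `update` changes.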
Fig. 2. General scheme of the hybrid method based on reinforcement learning and state reduction (RLSR).
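The state reduction from Fig. 2 can be sketched as below. A plain k-nearest-neighbour vote stands in for the classifier; the paper does not specify which classifier was used on the platform, so the classifier choice and the value of k are assumptions. Episode labelling follows the rule described above: every state of a crossing is labelled by the outcome of that crossing.

```python
# Training data for the state-reduction classifier (RLSR).
training_states: list[tuple] = []   # (s_c, d_cp, d_ccp, s_cc) vectors
training_labels: list[str] = []     # "collision possible" / "collision impossible"

def label_episode(visited_states: list[tuple], collided: bool) -> None:
    """Label all states of one crossing according to its outcome."""
    label = "collision possible" if collided else "collision impossible"
    for s in visited_states:
        training_states.append(s)
        training_labels.append(label)

def collision_possible(state: tuple, k: int = 5) -> bool:
    """k-NN stand-in for the classifier: is a collision possible in this state?"""
    if not training_states:
        return False
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(range(len(training_states)),
                     key=lambda i: dist(training_states[i], state))[:k]
    votes = sum(training_labels[i] == "collision possible" for i in nearest)
    return votes > len(nearest) // 2

def reduce_state(dt: int, sc: int, dcp: int, dccp: int, scc: int) -> tuple:
    """Reduced state: the remaining attribute plus a bivalent collision flag."""
    return (dt, collision_possible((sc, dcp, dccp, scc)))
```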
3.3.3 Hybrid method based on reduced state meaning (RSM). This method is similar to the previous one. Again a classifier is used to reduce the state. Then, depending on the value of the reduced state, various methods can be applied. For example, if no collision is possible, the search algorithm A* [25] could be used to reach the target via the shortest path. Since in this model it is assumed that the car does not change lanes, simple methods are implemented: if a collision is possible, the agent reduces the speed of the car, otherwise it accelerates; in both cases the speed is changed by one.

3.3.4 Hybrid method based on reinforcement learning with reduced state and supervisor (RLS). Reinforcement learning is used as in the second method, with state reduction. Then the decision (speed change) obtained from the reinforcement learning algorithm, together with the (non-reduced) state of the car, is passed to a supervisor, who decides whether the decision is correct. If not, the supervisor can change the decision, and that decision is executed by the agent. The supervisor is implemented using a classifier. To teach the classifier how to rate decisions, examples of the following form are used:

⟨s_v, a_sv, class⟩.    (2)

The classifier stores every visited state s_v (a vector of attributes) combined with the action a_sv (one attribute) performed by the agent in that state as an unlabeled example. When a collision occurs or the car safely reaches the end of the intersection, the stored examples are labeled with the class "bad" or "good", respectively. Then these examples are added to the training set.
Fig. 3. General scheme of the hybrid method based on reinforcement learning with reduced state and a supervisor (RLS).
The classifier should either be able to learn online (increasing the quality of its decisions with every new example in a scalable fashion) or be retrained periodically with the updated training set. The decision on whether the action chosen by the reinforcement learning algorithm is correct is based on classifying that action together with the current state. The supervisor rejects an action only if it is certain to a configurable extent. The scheme of this method is shown in Fig. 3.
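A minimal sketch of such a supervisor is given below. It follows the labelling scheme of Eq. (2); the nearest-neighbour vote standing in for the classifier, the 0.8 confidence threshold, and the correction rule (reducing the proposed speed change by one) are all assumptions, since the paper does not specify them.

```python
# Supervisor for the RLS method: stores (state, action) examples labelled
# "good"/"bad" by the crossing outcome and vetoes actions it is confident are bad.
examples: list[tuple[tuple, int, str]] = []   # (s_v, a_sv, class), cf. Eq. (2)

def record_crossing(visited: list[tuple[tuple, int]], collided: bool) -> None:
    """Label every (state, action) pair of a crossing as 'bad' or 'good'."""
    label = "bad" if collided else "good"
    for state, action in visited:
        examples.append((state, action, label))

def supervise(state: tuple, proposed_action: int,
              k: int = 7, threshold: float = 0.8) -> int:
    """Return the action to execute: the RL proposal, or a corrected one.

    The proposal is rejected only if the supervisor is confident enough:
    at least `threshold` of the k nearest stored examples are labelled 'bad'.
    """
    if len(examples) < k:
        return proposed_action
    def dist(ex):
        s, a, _ = ex
        return sum((x - y) ** 2 for x, y in zip(s, state)) + (a - proposed_action) ** 2
    nearest = sorted(examples, key=dist)[:k]
    bad_fraction = sum(label == "bad" for _, _, label in nearest) / k
    if bad_fraction >= threshold:
        return max(proposed_action - 1, -2)   # correction rule is an assumption
    return proposed_action
```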
4 Realization And Evaluation of SInC

4.1 Platform MABICS
To verify the SInC approach and the methods presented above, a special platform, MABICS (Multi-Agent Based Intersection Controlling System), was created. The platform simulates real traffic at an isolated, fully configurable intersection. An external application is used to generate the possible moves of vehicles during intersection crossing [26]. MABICS allows different methods for controlling a single vehicle to be implemented. The methods discussed previously were implemented using the RLPark [27] and WEKA [28] libraries for reinforcement learning and classification, respectively. The platform also supports conducting experiments. During a single experiment, vehicles cross the intersection many times to gain knowledge of how to make the best decisions. The outcome of an experiment is two graphs, presenting how the number of collisions and the intersection crossing times change as the agent gains experience.
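The smoothing applied to the experiment output (see Fig. 4) can be sketched as follows; this is plain Python for illustration, not the actual MABICS code.

```python
def moving_average(values: list[float], window: int = 50) -> list[float]:
    """Moving average over successive crossings (window = 50 in Fig. 4)."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        out.append(sum(values[lo:i + 1]) / (i + 1 - lo))
    return out

# Example: smooth a per-crossing collision indicator (1 = collision, 0 = safe).
collisions_per_crossing = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
print(moving_average(collisions_per_crossing, window=3))
```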
Fig. 4. Change of the number of collisions (left) and the time needed for crossing the intersection (right) as the agent gains experience; moving average with a window size of fifty intersection crossings.
4.2 Experiment
The agent controlling the vehicle has two different goals to achieve. The first is to cross the intersection as fast as possible; the second is not to collide with other vehicles. These two goals can be consistent (to avoid a collision the agent has to speed up) or contradictory (avoiding a collision requires slowing down, which delays reaching the end of the junction). In the experiment, the consistent configuration was used. The initial situation for the experiment with consistent goals is shown in Fig. 1 (right). On the left side of the intersection the vehicle controlled by the intelligent agent is placed. At the bottom there are 21 vehicles simulating real traffic, moving with different, random, constant speeds. It is easy to observe that the optimal strategy for the intelligent agent to achieve both goals is to accelerate to the maximal speed.
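The consistent-goals scenario can be expressed schematically as below; the speed range assigned to the simulated vehicles is an assumption (the paper states only that their speeds are random and constant), and the snippet is independent of the MABICS configuration format.

```python
import random

# Consistent-goals scenario from Fig. 1 (right): one intelligent agent entering
# from the left, 21 vehicles of simulated traffic entering from the bottom.
N_TRAFFIC_VEHICLES = 21

def make_traffic(min_speed: int = 1, max_speed: int = 3) -> list[int]:
    """Assign each simulated vehicle a random, constant speed (range assumed)."""
    return [random.randint(min_speed, max_speed) for _ in range(N_TRAFFIC_VEHICLES)]

traffic_speeds = make_traffic()
```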
4.3 Results
The experiment was repeated three times. The averaged results of crossing times and collision numbers for the different methods are presented in Fig. 4. All agents except the one using the RSM method were improving their decisions, but could not find the optimal strategy, which turned out to be too far from the initial, random strategy. The agents learn to increase speed, which is the expected behavior in a problem with consistent goals, but they accelerate only in some steps. Hence, they do not cross the intersection fast enough to pass ahead of the vehicles coming from their right side. In such a situation, when they "meet" other vehicles in the middle of the crossroad, they try to avoid collisions by increasing and reducing speed. This means that they look for a suboptimal strategy in a contradictory-goals situation, and they partially succeed (except RLSR): the number of collisions is reduced as learning continues. Different behavior is shown by the agent that uses the RSM method. This agent achieves the best results. Analysis of each series shows that in two of them this agent finds the best strategy (no collisions in the last periods of the series). In one series the agent behaves similarly to the other agents and looks
for a suboptimal strategy in the contradictory situation, but still achieves better results (0.3 collisions per crossing).
5 Conclusions And Future Work
The presented work is only an introduction to research on the possibility of using ML to control autonomous vehicles at an intersection. The conducted experiments made it possible to preliminarily verify the new idea as promising: some of the presented methods increased the quality of decisions with successive crossings, which may indicate that in a more realistic environment collisions can be completely eliminated. Future work should address the problem of poor exploration of the search space, for example by introducing variable parameters for reinforcement learning. Additionally, the proposed methods should be improved with ideas that emerged during this work: adding a negative reward for stopping; marking not all visited states as "bad" when a collision occurs, but only the last few; and adding new values of the reduced state in the RLSR method for different types of collisions. It is also necessary to integrate the MABICS system with a more realistic and more efficient application for generating the possible moves of vehicles. Then experiments with different intersection configurations and longer learning periods should be conducted.
References
[1] Kaminka, G. A.: Robots are Agents, Too! 6th International Joint Conference on Autonomous Agents and Multiagent Systems, Honolulu, Hawaii, USA (2007)
[2] Sutton, R. S., Barto, A. G.: Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA (1998)
[3] European Commission: Urban mobility package - frequently asked questions [Online]. Brussels. Available at: http://europa.eu/rapid/press-release MEMO-131160 en.doc [Accessed: 15 October 2014] (2013)
[4] Xia, X., Xu, L.: Coordination of Urban Intersection Agents Based on Multi-interaction History Learning Method, in ICSI'10 Proceedings of the First International Conference on Advances in Swarm Intelligence - Volume Part II, Berlin, Heidelberg, Springer-Verlag, pp. 383-390 (2010)
[5] Markoff, J.: Google Cars Drive Themselves, in Traffic [Online]. New York: The New York Times, 10 October 2010, p. A1 (2010)
[6] Datka, S., Suchorzewski, W., Tracz, M.: Traffic Engineering, Warsaw: Wydawnictwo Komunikacji i Łączności, pp. 282, 324, 328 (1999) (in Polish)
[7] Robertson, D. I.: TRANSYT: a Traffic Network Study Tool, Transport and Road Research Laboratory Report (1969)
[8] Taale, H., Fransen, W. C. M., Dibbits, J.: The second assessment of the SCOOT system in Nijmegen, IEE Road Transport Information and Control, Conference Publication No. 454 (1998)
[9] Sims, A. G., Dobinson, K. W.: The Sydney coordinated adaptive traffic (SCAT) system philosophy and benefits, IEEE Transactions on Vehicular Technology, vol. 29, no. 2, pp. 130-137 (1980)
[10] Cajias, R. H., Gonzalez-Pardo, A., Camacho, D.: A Multi-agent Traffic Simulation Framework for Evaluating the Impact of Traffic Lights, Proceedings of the 3rd International Conference on Agents and Artificial Intelligence, vol. 2 (2011)
[11] Dresner, K., Stone, P.: Multiagent Traffic Management: Opportunities for Multiagent Learning, in Lecture Notes in Computer Science, vol. 3898, pp. 129-138 (2006)
[12] Panait, L., Luke, S.: Cooperative multi-agent learning: The state of the art, Autonomous Agents and Multi-Agent Systems, vol. 11, pp. 387-434 (2005)
[13] Weiss, G.: Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence, London: The MIT Press (1999)
[14] Tan, M.: Multi-agent reinforcement learning: Independent vs. cooperative agents, Proc. of the 10th Int'l. Conference on Machine Learning (ICML 93), Morgan Kaufmann, pp. 330-337 (1993)
[15] Wu, A. S., Schultz, A. C., Agah, A.: Evolving control for distributed micro air vehicles, Proc. of the IEEE Int'l. Symp. on Computational Intelligence in Robotics and Automation (CIRA 99), IEEE, pp. 174-179 (1999)
[16] Parker, L. E., Touzet, C.: Multi-robot learning in a cooperative observation task, in Distributed Autonomous Robotic Systems 4, L. E. Parker, G. Bekey, and J. Barhen, Eds., Berlin: Springer-Verlag, pp. 391-401 (2000)
[17] Fernandez, F., Borrajo, D., Parker, L. E.: A reinforcement learning algorithm in cooperative multi-robot domains, Journal of Intelligent and Robotic Systems, vol. 43, pp. 161-174 (2005)
[18] Gehrke, J. D., Wojtusiak, J.: Traffic prediction for agent route planning, Proc. of the Int'l. Conf. on Computational Science, vol. 3, Springer-Verlag, pp. 692-701 (2008)
[19] Sugawara, T., Lesser, V.: On-line learning of coordination plans, Proc. of the 12th Int'l. Workshop on Distributed Artificial Intelligence (1993)
[20] Singh, D., Sardina, S., Padgham, L., Airiau, S.: Learning context conditions for BDI plan selection, in Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems - vol. 1, pp. 325-332, Richland, SC (2010)
[21] Minnesota Department of Transportation: "Zipper Merge" [Online]. 2014. Available at: http://www.dot.state.mn.us/zippermerge/ [Accessed: 10 November 2014]
[22] Wiatrak, L.: Hybrid Learning in Agent Systems, Master's thesis, Cracow (2012) (in Polish)
[23] Śnieżyński, B., Wójcik, W., Gehrke, J. D., Wojtusiak, J.: Combining rule induction and reinforcement learning: An agent-based vehicle routing, in Proc. of the ICMLA 2010, Washington D.C., pp. 851-856 (2010)
[24] Sammut, C., Webb, G. I.: Encyclopedia of Machine Learning, 1st ed., Springer Publishing Company, Incorporated (2011)
[25] Barr, A., Feigenbaum, E.: The Handbook of Artificial Intelligence, Vol. 1, Stanford, Calif.: HeurisTech Press; Los Altos, Calif., pp. 64-67 (1981)
[26] Mozgawa, J., Kaziród, M.: Steering vehicles in discrete space [Online]. Available at: https://github.com/myzael/Sterowanie-pojazdami-w-przestrzeni-dyskretnej/wiki [Accessed: 10 November 2014] (2013) (in Polish)
[27] RLPark: Introduction to RLPark [Online]. Available at: http://rlpark.github.io/ [Accessed: 10 November 2014] (2013)
[28] Witten, I. H., Frank, E., Hall, M. A.: Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, Elsevier (2011)