Design of a Traffic Junction Controller Using Classifier System and Fuzzy Logic
Y. J. Cao, N. Ireson, L. Bull and R. Miles
Intelligent Computer Systems Centre, Faculty of Computer Studies and Mathematics, University of the West of England, Bristol, BS16 1QY, UK
Abstract. Traffic control in large cities is a difficult and non-trivial optimization problem. Most automated urban traffic control systems are based on deterministic algorithms and have a multi-level architecture; to achieve global optimality, hierarchical control algorithms are generally employed. However, these algorithms are often slow to react to varying conditions, and it has been recognized that incorporating computational intelligence into the lower levels can remove some of the burden of calculation and decision making from the higher levels. An alternative approach is to use a fully distributed architecture in which there is effectively only one (low) level of control. Such systems are aimed at improving the response time of the controller and, again, they often incorporate computational intelligence techniques. This paper presents preliminary work on designing an intelligent local controller, primarily for distributed traffic control systems. The idea is to use a classifier system with a fuzzy rule representation to determine useful junction control rules within the dynamic environment.
1 Introduction
Traffic control in large cities is a difficult and non-trivial problem. The rapidly increasing number of vehicles and passengers often causes delays, congestion, accidents and other unwanted events that negatively affect many social activities. These difficulties could be eased by expanding road networks, constructing subways, changing drivers' habits, and so on. However, a well-operated traffic signal system could lower expenses within a shorter time. To this end, automated urban traffic control systems (AUTCS), such as TRANSYT[1], SCATS[2], LVA[3] and SCOOT[4], have been widely applied to optimize traffic signal timing plans to facilitate vehicle movement. These systems are usually based on sophisticated equipment and possess broad functional capabilities, for instance automated data gathering, analysis and prediction of transport situations, automated decision making and control implementation, etc. [5] ~ [9].
The policies used in such systems divide broadly into two categories: fixed-time systems, where traffic plans are generated off-line and applied on-line, and on-line systems, where traffic plans are generated on-line and applied directly to the traffic. Both methods have their advantages and disadvantages [12]. However, the feature common to all methods is that the objective function of the algorithm is based on minimizing the total delay within a network, although some use a combination of delay and stops. There are a number of implementations which apply these methods; however, many problems remain open, caused by the following circumstances:
• most AUTCS have a centralized structure, i.e. information gathering and processing, as well as control computations, are carried out in a centralized manner; in this case efficiency is decreased due to the large volume and the heterogeneous character of the information [5].
• distributed AUTCS also possess considerable drawbacks, caused by the inefficient accounting of interactions between subsystems and the complex communication structure [6].
• most of the existing (centralized and distributed) AUTCS operate by means of entirely quantitative algorithms that do not reflect the qualitative aspects of the transport process [7].
The above considerations suggest that it is desirable to use new techniques for solving transport problems in large cities. These techniques are usually based on artificial intelligence principles and sophisticated computing devices, operating with large memory, at a high speed and in parallel. They are usually associated with so-called 'distributed intelligence systems', or 'distributed intelligence control systems' (DICS) in the case of control problems [10, 11]. The general approach is based on the decomposition of the system into subsystems, which is often presented as a hierarchical multilevel control structure. To achieve global optimality, hierarchical control algorithms are generally employed. However, these algorithms react slowly, and it has been recognized that incorporating some computational intelligence into the lower levels can remove some of the burden of calculation and decision making from the higher levels. Alternatively, a fully distributed architecture can be used in which each subsystem is solely responsible for one aspect of the system and where a coherent global control plan emerges from the interactions of the subsystems; no hierarchical structure is included. Such an approach is aimed at increasing the speed of response of the local controller to changes in the environment. In both cases there is a need to apply intelligent algorithms to the design of efficient local controllers. This paper is devoted to designing an intelligent local traffic controller, primarily for a distributed architecture. The idea is to use a classifier system [13], with both evolutionary and reinforcement learning, to determine appropriate control rules within the dynamic traffic environment.
2 Urban Traffic Control and Previous Work
The overall goal of a traffic control system is to design the hardware configuration and implement a software control algorithm for the system, with the flexibility to select a suitable criterion according to traffic conditions. The four criteria are minimizing delays, stops, fuel consumption and exhaust emission rate. Within an urban traffic network, speed is restricted, and the fuel consumption and exhaust emission rate depend greatly on traffic conditions and on the timing of the signals in the network. Computer-controlled hardware supported by a software algorithm is proposed to achieve the overall objective function in its many forms.
There is a growing body of work concerned with the use of adaptive computing techniques for the control and modelling of traffic junctions, including fuzzy logic [14] ~ [17], neural networks [18], and evolutionary algorithms [19] ~ [22]. Pappis and Mamdani's fuzzy logic controller (FLC) showed good performance in terms of the average delay at an intersection of two one-way roads with dynamic traffic flow rates, compared with a fixed-cycle traffic controller [14]. They used 25 control rules included in one fuzzy rule-set. In general, the fuzzy traffic controller (FTC) determines the extension time of the green phase using two fuzzy input variables: the number of vehicles in the green approaches (= arrival) and the number of vehicles during the red signal period (= queue). These methods consider only the arrival and queue values of an intersection when the controllers are operating, but do not consider the total number of vehicles entering an intersection from other intersections per second (= volume). Therefore, these controllers are not suitable for real intersections with variable flow rates. To solve this problem, Kim proposed an FLC using different control rules and different maximum extension times according to traffic volume, so that vehicles could flow smoothly at an intersection [15]. A similar approach was applied by Ho, who used nine fuzzy rule-sets according to arrival and queue flow rates [16]. Conventional FLCs use membership functions and control laws generated by human operators. However, this approach does not guarantee the optimal solution in fuzzy system design. Kim et al. used a genetic algorithm (GA) [23] to tune the membership functions for the terms of each fuzzy variable [17].
Montana and Czerwinski [19] proposed a mechanism to control the whole network of junctions using genetic programming [25]. They evolved mobile "creatures" represented as rooted trees which return true or false, based on whether or not the creature wishes to alter the traffic signal it has just examined. Mikami and Kakazu used a combination of local learning by a stochastic reinforcement learning method with a global search via a genetic algorithm. The reinforcement learning was intended to optimize the traffic flow around each crossroad, while the genetic algorithm was intended to introduce a global optimization criterion to each of the local learning processes [20, 21]. Escazut and Fogarty proposed an approach to generate a rule for each junction using classifier systems in biologically inspired configurations [22].
3 Classifier Systems and Fuzzy Logic
There are two alternative approaches to classifier systems, known as the Michigan [13] and Pittsburgh [24] approaches. Some research work has combined both kinds of classifier system with fuzzy logic. Valenzuela-Rendon proposed a fuzzy Michigan-style classifier system, which allows inputs, outputs, and internal variables to take continuous values over given ranges. The fuzzy classifier system learns by creating fuzzy rules which relate the values of the input variables to internal or output variables. It has credit assignment and conflict resolution mechanisms which resemble those of common classifier systems, but with a fuzzy nature. The fuzzy classifier system employs a genetic algorithm to evolve adequate fuzzy rules [26]. Carse and Fogarty proposed a fuzzy classifier system using the Pittsburgh model, in which genetic operations and fitness assignment apply to complete rule sets, rather than to individual rules, thus overcoming the problem of conflicting individual and collective interests of the classifiers. Their fuzzy classifier system dynamically adjusts both membership functions and fuzzy relations [27]. The work presented in this paper is different since it does not use fuzzy rules but generates the rules using a fuzzy coding strategy. In order to make our exposition self-contained, we briefly introduce Michigan-style classifier systems and fuzzy logic in this section.
3.1 Classifier systems
A classifier system is a learning system in which a set (population) of condition-action rules called classifiers compete to control the system and gain credit based on the system's receipt of reinforcement from the environment. A classifier's cumulative credit, termed strength, determines its influence in the control competition and in an evolutionary process using a genetic algorithm in which new, plausibly better, classifiers are generated from strong existing ones, and weak classifiers are discarded. A classifier c is a condition-action pair c = < condition >:< action >, interpreted as the following decision rule: if the currently observed state matches the condition, then execute the action. The condition is a string of characters from the ternary alphabet { 0, 1, # }, where # acts as a wildcard allowing generalization. The action is represented by a binary string, and both conditions and actions are initialized randomly. The real-valued strength of a classifier is estimated in terms of rewards obtained according to a payoff function. Action selection is implemented by a competition mechanism, where a strength-proportionate selection method is usually used. To modify classifier strengths, a credit assignment algorithm is used, e.g. the bucket brigade [13]. To create new classifiers a standard GA is applied, with three basic genetic operators: selection, crossover and mutation. The GA is invoked periodically, and each time it replaces low-strength classifiers with the offspring of the selected fitter ones (the reader is referred to [13] for full details).
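The matching and selection steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Classifier` dataclass, the function names and the three example rules are hypothetical, and credit assignment and the GA are left out.

```python
import random
from dataclasses import dataclass


@dataclass
class Classifier:
    condition: str   # string over the ternary alphabet {0, 1, #}
    action: str      # binary string
    strength: float  # cumulative credit (assumed positive here)


def matches(condition: str, state: str) -> bool:
    """True if every non-wildcard position of the condition equals the state bit."""
    return len(condition) == len(state) and all(
        c == "#" or c == s for c, s in zip(condition, state)
    )


def select_action(population, state, rng):
    """Pick an action from the match set with probability proportional to strength."""
    match_set = [cl for cl in population if matches(cl.condition, state)]
    if not match_set:
        return None  # no rule matches; handling this case is outside the sketch
    total = sum(cl.strength for cl in match_set)
    pick = rng.uniform(0.0, total)
    running = 0.0
    for cl in match_set:
        running += cl.strength
        if pick <= running:
            return cl.action
    return match_set[-1].action  # guard against floating-point round-off


# Hypothetical population: only the first two rules match the state "110".
population = [
    Classifier("1#0", "01", 20.0),
    Classifier("##0", "10", 5.0),
    Classifier("011", "11", 50.0),
]
chosen = select_action(population, "110", random.Random(0))
```

With these strengths the first rule would be expected to win roughly four times out of five, since selection is proportional to 20 versus 5.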
3.2 Fuzzy logic
Fuzzy sets allow the possibility of degrees of membership. That is, an element might be assigned a set membership value between 0 and 1 (inclusive). For example, given the fuzzy set "long queue", we may speak of a particular queue being a member of this set to degree 0.8. This would be a long queue, but not the longest queue imaginable. The function which assigns this value is called the membership function associated with the fuzzy set. Fuzzy membership functions are the mechanism through which the fuzzy system interfaces with the outside world. A typical choice for a fuzzy membership function is a piecewise linear trapezoidal function. A fuzzy system comprises fuzzy sets defined for the input and output variables, together with a set of fuzzy rules. A fuzzy rule is an "if condition then action" expression in which the conditions and the actions are fuzzy sets over given variables. Fuzzy rules are also called linguistic rules because they represent the way in which people usually formulate their knowledge about a given process. There are several approaches for obtaining "defuzzified" output from a fuzzy system [28]. Let us assume the output variable has four fuzzy sets associated with it: "ZE" (zero), "PS" (positive small), "PM" (positive medium), and "PL" (positive large). We will assume that ZE, PS, PM, and PL represent specific numerical values. A fairly simple method is to calculate the nonzero activated weights, say $w_1$, $w_2$, $w_3$ and $w_4$, and then compute the system output as follows:
$$\text{Output} = \frac{w_1 \cdot \text{ZE} + w_2 \cdot \text{PS} + w_3 \cdot \text{PM} + w_4 \cdot \text{PL}}{\sum_{i=1}^{4} w_i} \qquad (1)$$
With more general output fuzzy sets, determination of the defuzzified output involves computation of centroid values of regions defined by overlapping membership functions [28].
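As an illustration of the two ingredients just described, the sketch below evaluates a trapezoidal membership function and the weighted-average output of equation (1). The breakpoints and the numerical values attached to ZE, PS, PM and PL are assumptions chosen for the example; the paper does not state them.

```python
def trapezoid(x, a, b, c, d):
    """Piecewise linear trapezoidal membership, assuming a < b <= c < d.

    The degree is 0 outside (a, d), 1 on [b, c], and linear in between.
    """
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def defuzzify(weights, values):
    """Weighted average of the output-set values, as in equation (1)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("no output set is activated")
    return sum(w * v for w, v in zip(weights, values)) / total


# Assumed numerical values for ZE, PS, PM, PL (illustrative only).
set_values = [0.0, 2.0, 4.0, 6.0]
# Assumed activation weights w1..w4 for one input.
weights = [0.0, 0.25, 0.75, 0.0]
output = defuzzify(weights, set_values)  # (0.25*2 + 0.75*4) / 1.0 = 3.5
```

Here the output lies between PS and PM and closer to PM, which carries three times the weight.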
4 Traffic Signal Control Using Classifier Systems
The goal of intelligent traffic controllers is to diminish congestion caused by the stop signs of coordinated traffic lights when traffic volume is light, and to pass maximum traffic flows through intersections in the case of heavy traffic volume. The classifier system based controller (CSC) designed in this paper employs both evolutionary and reinforcement learning to determine useful control rules to be applied at a given traffic intersection where the traffic volume changes dynamically, so that vehicles can flow through the intersection smoothly. The block diagram of the proposed CSC is shown in Figure 1. The classifier system employed is a version of Wilson's "zeroth-level" system (ZCS) [29]. ZCS is a Michigan-style classifier system without internal memory. A number of slight modifications have been made to the system presented in [29], including the use of a niche-based GA. The reader is referred to [30] for full details.
Figure 1: The block diagram of the proposed CSC (blocks: Traffic Condition, Fuzzy Coding, Classifier System, Traffic Light)
Figure 2: Structure of the classifier system rules (Condition: traffic condition from east, south, west and north; Action: traffic light state, state duration)
In order to avoid the genetic algorithm manipulating unnecessarily long rules, we extend the binary string representation in ZCS to a more general representation, which uses the values 0 to L (L < 10) for each variable (bit) position instead of the binary code. This reduces the string length significantly and appears to benefit multiple-variable problems. For these hybrid strings, mutation in the GA is performed by changing an allele to a randomly determined number between 0 and L other than itself [32]. The design details of the CSC used in this paper, including the use of a fuzzy representation, are described below.
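The mutation operator for these multi-valued strings might look like the following sketch. The function name, the list-of-integers encoding and the per-allele `rate` argument are assumptions made for illustration, and a single upper bound L is used for every position.

```python
import random


def mutate(rule, max_level, rate, rng):
    """Replace each allele, with probability `rate`, by a different value in 0..max_level.

    Assumes max_level >= 1 so that an alternative value always exists.
    """
    mutated = []
    for allele in rule:
        if rng.random() < rate:
            # Draw uniformly from every admissible value except the current one.
            choices = [v for v in range(max_level + 1) if v != allele]
            allele = rng.choice(choices)
        mutated.append(allele)
    return mutated


rng = random.Random(42)
parent = [1, 3, 0, 2]
child = mutate(parent, max_level=3, rate=1.0, rng=rng)  # every allele changes
```

Excluding the current value guarantees that a triggered mutation always alters the allele, which a plain uniform draw over 0..L would not.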
4.1 Individuals
The classifiers have the representation shown in Figure 2. The condition part of each classifier consists of four positions, each reflecting the scalar level of the queue length from one direction. In this application, the number of scalar levels is set to 4, ranging from 0 to 3 and corresponding to the four linguistic values { zero, small, medium, large }. The action part indicates the required state of the signal and the duration of that state. For instance, the rule 1302:14 says that if the queues from directions east and west are small (1) and zero (0), but the queues from directions south and north are large (3) and medium (2), then the traffic light stays green vertically (1) for 4 seconds (4). The membership functions of these variables are given in Figure 3. The fuzzy output is calculated according to equation (1).
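To make the encoding concrete, the following sketch decodes the example rule 1302:14. The position order (east, south, west, north) is inferred from that example, and reading state 0 as "horizontal green" is an assumption; the helper name `decode_rule` is hypothetical.

```python
LEVELS = ("zero", "small", "medium", "large")
# Position order inferred from the worked example in the text.
DIRECTIONS = ("east", "south", "west", "north")


def decode_rule(rule):
    """Split '<condition>:<action>' into queue levels, signal state and duration."""
    condition, action = rule.split(":")
    queues = {d: LEVELS[int(ch)] for d, ch in zip(DIRECTIONS, condition)}
    # State 1 is "green vertically" in the text; state 0 is assumed to be the opposite.
    state = "vertical green" if action[0] == "1" else "horizontal green"
    duration_seconds = int(action[1])
    return queues, state, duration_seconds


queues, state, duration = decode_rule("1302:14")
# queues  -> east small, south large, west zero, north medium
# state   -> "vertical green", duration -> 4
```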
Figure 3: The membership function of fuzzy variables (fuzzy sets: zero, small, medium, large; horizontal axis: number of vehicles, 2 to 12; vertical axis: membership degree, 0.5 and 1.0 marked)
4.2 Evaluation of actions
We assume that the junction controller can observe the performance around it; let the evaluated performance be P. Traffic volume sensors are set at each of the intersections. They count the cars that arrive from each direction, pass through the intersection and stop at the intersection. In this study, the evaluation function used to reward the individuals is the average queue at the specific junction. Let $q_i$ denote the queue length from direction $i$ at the intersection ($i = 1, 2, 3, 4$); then the evaluation function is $f = \frac{1}{4}\sum_{i=1}^{4} q_i$, which we attempt to minimize. Let us identify the $k$-th cycle by a subscript $k$; then $f_k$ is calculated by observing the sensors from the beginning of the $k$-th cycle to its end. Thus, the evaluated performance of the action performed at the $k$-th cycle is computed as $P_k = f_{k-1} - f_k$. Specifically, if $P_k > 0$, the matched classifiers containing the performed action should be rewarded; otherwise they are penalized.
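A small numerical sketch of this evaluation, with made-up queue lengths for two consecutive cycles (the function names and the data are illustrative assumptions):

```python
def average_queue(queues):
    """Evaluation function f: mean queue length over the four approaches."""
    return sum(queues) / len(queues)


def performance(prev_queues, curr_queues):
    """P_k = f_{k-1} - f_k; positive when the average queue has shrunk."""
    return average_queue(prev_queues) - average_queue(curr_queues)


# Hypothetical queue lengths for directions 1..4 in cycles k-1 and k.
f_prev = average_queue([4, 6, 2, 8])      # 5.0
f_curr = average_queue([3, 5, 2, 6])      # 4.0
p_k = performance([4, 6, 2, 8], [3, 5, 2, 6])  # 1.0, so the action is rewarded
```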
4.3 Reinforcement learning
After the CSC has produced an action, the environment judges the output and, accordingly, gives payoff in the following manner:
• Rewards: The objective of the local signal controller is to minimize the average queue length, $f_k$. We have found that performance-based rewards are helpful in the environments used in our experiments. The reward scheme we used was $r_k = \gamma P_k$, where $\gamma$ is the reward constant, chosen to be 150 in the simulation.
• Punishments: We use punishments (i.e., negative rewards). We found that the use of appropriate punishments results in improved performance (in a fixed number of cycles), at least in the environments used in our experiments. We also found that large punishments could lead to instability of the classifiers and slow convergence of the rules. Appropriate punishments were found to be 30 percent of normal rewards.
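One possible reading of this payoff scheme is sketched below. The reward constant 150 and the 30 percent ratio come from the text; interpreting the punishment as the same linear payoff scaled by 0.3 is an assumption, as is the function name `payoff`.

```python
GAMMA = 150.0           # reward constant used in the simulation
PUNISHMENT_RATIO = 0.3  # punishments are 30 percent of normal rewards


def payoff(p_k):
    """Return gamma * P_k for improvements and a scaled-down negative payoff otherwise.

    Assumption: a punishment is the normal payoff multiplied by 0.3, so P_k = 0
    yields a payoff of zero even though the text counts it as a penalized case.
    """
    if p_k > 0:
        return GAMMA * p_k
    return PUNISHMENT_RATIO * GAMMA * p_k


reward = payoff(1.0)       # average queue fell by one vehicle
punishment = payoff(-1.0)  # average queue grew by one vehicle
```

Under this reading a unit improvement earns 150 while a unit deterioration costs 45, which keeps penalties small relative to rewards as the text recommends.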
Figure 4: The simulated traffic environment
5 Simulation Results
Developing an accurate traffic simulation is a difficult problem, as is building an appropriate traffic controller. Since the main objective of this simulation is to test whether the proposed classifier system controller works in a traffic environment, we developed a simplified traffic simulator, similar to the one used in [21]. The simulator is composed of four four-armed junctions connected by roads in a square, as shown in Figure 4. Each end of a road is assumed to be connected to external traffic, and cars are assumed to arrive at those roads according to a Poisson distribution. Each intersection has two "complementary" signals: when the horizontal signal is red, the vertical signal is green and vice versa. Each of the cars attempts to attain the same maximum speed. When a car passes an intersection, it changes its direction according to the probability associated with that intersection. Specifically, let $d_i$, $i = 1, 2, 3$, be the possible next directions for a car, that is, $\{d_i\}$ = { right, forward, left }. At each of the intersections, the probabilities $\{p_{d_i}\}$ are given in advance, where $p_{d_i}$ is the probability that a car passing through the intersection selects direction $d_i$. Roads are not endless, thus only a limited number of cars is allowed to be on a road at a given time. If a car reaches the end of the road, it is simply removed from the simulation, and another car is generated, entering on a randomly selected road. Two types of junction controller were used to control junction I in Figure 4 for the comparison: a random controller and our classifier system based controller. The random controller decides whether or not to change its phase with probability 0.5 and chooses the duration of the state randomly between 1 and 6 seconds.
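Two pieces of this simulator, the random baseline controller and the turning choice at an intersection, can be sketched as follows. The function names, the 0/1 phase encoding, the integer durations and the example turning probabilities are all assumptions; the paper gives no such details.

```python
import random

DIRECTIONS = ("right", "forward", "left")


def random_controller_step(phase, rng):
    """Flip the phase with probability 0.5 and draw a duration of 1 to 6 seconds.

    Assumption: phases are encoded as 0 and 1, and durations are whole seconds.
    """
    if rng.random() < 0.5:
        phase = 1 - phase
    duration = rng.randint(1, 6)  # inclusive on both ends
    return phase, duration


def next_direction(turn_probs, rng):
    """Sample a car's next direction from the intersection's turning probabilities."""
    return rng.choices(DIRECTIONS, weights=turn_probs, k=1)[0]


rng = random.Random(7)
phase, schedule = 0, []
for _ in range(5):
    phase, duration = random_controller_step(phase, rng)
    schedule.append((phase, duration))

# Hypothetical turning probabilities for (right, forward, left).
turn = next_direction([0.2, 0.6, 0.2], rng)
```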
As the major task is to test whether the proposed CSC can learn some good rules at the traffic junction, the other three junctions (II, III and IV in Figure 4) were controlled by random controllers here. The parameters used for the CSC were the same as those in [30], except that fewer rules (100) were used to reduce the time taken to manipulate the rule-base, and a slightly higher mutation rate (0.05 per variable) was used due to the change in the number of variable values. Experiments were carried out for three different types of traffic conditions. In these simulations, the mean arrival rates for the cars are the same but the number of cars in the area is limited to 30, 50, and 70, corresponding to sparse, medium, and crowded traffic conditions. In all cases, the CSC is found to learn how to reduce the average queue length at the specific junction. For example, Figure 5 and Figure 6 show the average performance of the random controller and the CSC respectively over 10 runs in the crowded case. It can be seen that the CSC consistently reduces the average queue length over 10,000 iterations whilst the random controller's junction queue length continues to oscillate.
Figure 5: Performance of a random traffic junction controller (horizontal axis: time steps)
6 Conclusion and Future Work
In this paper we have described the application of a learning classifier system scheme to the control of a traffic junction. This preliminary work on the classifier system based controller needs, of course, a number of extensions. Nevertheless, the results are encouraging, since we have shown that the classifier system, with both evolutionary and reinforcement learning, can determine useful control rules within a dynamic environment and thus improve the traffic conditions. We are currently extending this work in a number of directions, particularly examining ways of improving the use of classifier systems in distributed problem solving domains, i.e. multi-agent systems (e.g. after [31]).
Figure 6: Performance of the classifier system based junction controller (horizontal axis: time steps, 0 to 10000)
7 Acknowledgment
This work was carried out as part of the ESPRIT Framework V Vintage project (ESPRIT 25.569).
References
[1] Robertson, D. I.: TRANSYT - A traffic network study tool. Transport and Research Laboratory, Crowthorne, England (1969)
[2] Luk, J. Y., Sims, A. G. and Lowrie, P. R.: SCATS application and field comparison with TRANSYT optimized fixed time system. In Proc. IEE Int. Conf. Road Traffic Signalling, London (1982)
[3] Lowrie, P. R.: The Sydney coordinated adaptive traffic system. In Proc. IEE Int. Conf. Road Traffic Signalling, London (1982)
[4] Hunt, P. B., Robertson, D. I., Bretherton, R. D. and Winston, R. L.: SCOOT - A traffic responsive method of co-ordinating traffic signals. Transport and Research Laboratory, Crowthorne, England (1982)
[5] Scemama, G.: Traffic control practices in urban areas. Ann. Rev. Report of the Natl Res. Inst. on Transport and Safety. Paris, France (1990)
[6] Barriere, J., Farges, J. and Henry, J.: Decentralization vs hierarchy in optimal traffic control. IFAC/IFIP/IFORS Conf. on Control in Transportation Systems, Vienna, Austria (1986)
[7] Wu, J. and Heydecker, B.: A knowledge based system for road accident remedial work. In Proc. Int. Conf. on Artificial Intelligence (CIVIL-COMP91), Oxford, UK (1991)
[8] Al-Khalili, A. J.: Urban traffic control - a general approach. IEEE Trans. on Syst. Man and Cyber. 15, (1985) 260-271
[9] Strobel, H.: Computer controlled urban transportation: a survey of concepts, methods and experiences. IFAC/IFIP/IFORS Conf. on Control in Transportation Systems, Vienna, Austria (1986)
[10] Yang, D. and Huhns, M.: An architecture for control and communications in distributed artificial intelligence systems. IEEE Trans. on Syst. Man and Cyber. 15, (1985) 316-326
[11] Decker, K.: Distributed problem solving techniques: a survey. IEEE Trans. on Syst. Man and Cyber. 17, (1987) 729-740
[12] Robertson, D. I.: Traffic models and optimum strategies of control: a survey. In Proc. of Int. Symp. Traffic Control Systems. University of California, (1979) 262-288
[13] Holland, J. H., Holyoak, K. J., Nisbett, R. E. and Thagard, P. R.: Induction: Processes of Inference, Learning and Discovery. MIT Press, Cambridge, MA (1986)
[14] Pappis, C. P. and Mamdani, E. H.: A fuzzy logic controller for a traffic junction. IEEE Trans. on Syst. Man and Cyber. 7, (1977) 707-717
[15] Kim, J.: A fuzzy logic control simulator for adaptive traffic management. In Proc. IEEE Int. Conf. on Fuzzy Systems, Barcelona, (1997) 1519-1524
[16] Ho, T. K.: Fuzzy logic traffic control at a road junction with time-varying flow rates. IEE Electronics Letters, 32, (1996) 1625-1626
[17] Kim, J. W., Kim, B. M. and Kim, J. Y.: Genetic algorithm simulation approach to determine membership functions of fuzzy traffic controller. IEE Electronics Letters, 34, (1998) 1982-1983
[18] Ledoux, C.: Urban Traffic flow model integrating neural networks. Transportation Research, Part B: Emerging Technologies, 5, (1997) 287-300
[19] Montana, D. J. and Czerwinski, S.: Evolving control laws for a network of traffic signals. Proc. of 1st Annual Conf. on Genetic Programming, (1996) 333-338
[20] Mikami, S. and Kakazu, K.: Self-organized control of traffic signals genetic reinforcement learning. Proceedings of the IEEE Intelligent Vehicles Symposium, (1993) 113-118
[21] Mikami, S. and Kakazu, K.: Genetic reinforcement learning for cooperative traffic signal control. Proceedings of the IEEE World Congress on Computational Intelligence, (1994) 223-229
[22] Escazut, C. and Fogarty, T. C.: Coevolving classifier systems to control traffic signals. In Koza, J. R. (ed): Late breaking papers at the Genetic Programming 1997 Conference, Stanford University, (1997) 51-56
[23] Holland, J. H.: Adaptation in Natural and Artificial Systems. MIT Press, Cambridge, MA (1992)
[24] Smith, S.: A Learning System Based on Genetic Algorithms, PhD Thesis, University of Pittsburgh (1980)
[25] Koza, J. R.: Genetic Programming. MIT Press, Cambridge, MA (1992)
[26] Valenzuela-Rendon, M.: The fuzzy classifier system: motivation and first results. In Schwefel, H. P. and Manner, R. (eds): Parallel Problem Solving from Nature I, Springer-Verlag, (1990) 338-342
[27] Carse, B. and Fogarty, T. C.: A fuzzy classifier system using the Pittsburgh approach. In Davidor, Y., Schwefel, H. P. and Manner, R. (eds): Parallel Problem Solving from Nature III, Springer-Verlag, (1994) 260-269
[28] Kosko, B.: Neural Networks and Fuzzy Systems. Prentice-Hall International Editions (1992)
[29] Wilson, S. W.: ZCS: A zeroth level classifier system. Evolutionary Computation, 2, (1994) 1-18
[30] Bull, L.: On ZCS in Multi-Agent Environments. Parallel Problem Solving From Nature - PPSN V, Springer-Verlag (1998) 471-480
[31] Bull, L., Fogarty, T. C., and Snaith, M.: Evolution in Multi-Agent Systems: Evolving Communicating Classifier Systems for Gait in a Quadrupedal Robot. In Eshelman, L. J. (ed): Proceedings of the Sixth International Conference on Genetic Algorithms, Morgan Kaufmann, (1995) 382-388
[32] Cao, Y. J. and Wu, Q. H.: Mechanical design optimization by mixed-variable evolutionary programming. In Proc. of IEEE International Conf. on Evolutionary Computation, (1997) 443-446