

Effective Racing on Partially Observable Tracks: Indirectly Coupling Anticipatory Egocentric Sensors with Motor Commands

Martin V. Butz, Matthias J. Linhardt, and Thies D. Lönneker

Abstract: The TORCS-based Simulated Car Racing Championship (SCRC) poses a demanding challenge for designing an effective racing car controller. Controllers do not receive any global track information, but only perceive simulated, car-centered sensory information about the current, local track properties and about surrounding opponents. Our racing controller, termed COBOSTAR, uses the sensors that give the most anticipatory information to learn a sensory-to-motor policy, which was optimized by means of the covariance matrix adaptation evolution strategy (CMA-ES). The basic policy was extended by additional modules to prevent detrimental skidding, to land safely after jumps, to implement effective opponent avoidance, and to recover and learn on-line from accidents. Owing to this approach, COBOSTAR won two out of the three competition legs of the 2009 SCRC and, with hardly any further modifications, was also amongst the first in the 2010 SCRC despite the addition of noise to the utilized sensors. This paper describes the COBOSTAR controller as it was submitted to the last of the three competition legs in 2009. Evaluations of distinct controller modules are provided where possible. A future outlook summarizes the lessons learned during the design of the racer and proposes the utilization of the framework in broader research and application contexts as well.

The authors are with the Department of Psychology, Cognitive Psychology III, COBOSLAB, and the Department of Computer Science, Chair of Artificial Intelligence and Applied Informatics, at the University of Würzburg, Germany (phone: +49 931 318 2808; fax: +49 931 318 2815; email: [email protected], [email protected], [email protected]).

I. INTRODUCTION

This paper introduces the controller architecture of our COBOSTAR racer (a loose acronym for “COgnitive BOdySpaces: TORCS-based Adaptive Racing”). Inspired by behavior-based robotics, subsumption architectures, and Braitenberg vehicles [1], [2], [3], the COBOSTAR racer generates an indirect sensory-to-motor mapping that relies mainly on simulated distance sensors. COBOSTAR won two out of the three legs of the 2009 Simulated Car Racing Championship [4] and was also amongst the first racers in the 2010 championship, despite the addition of sensor noise in the latter competitions.

The championships were conducted by means of the Open Racing Car Simulator (TORCS) [5], which is an open-source car racing environment with a rather realistic track and car simulation engine. TORCS was derived and further developed from the robot auto racing simulator (RARS) [6]. The championship setup enforces the usage of only local information. It was introduced in the 2008 competitions at the IEEE World Congress on Computational Intelligence (WCCI-2008) and the IEEE Congress on Evolutionary Computation (CEC-2008) [7]. Further information about the current Simulated Car Racing Championship can be found on the web [8]. A screenshot of the game is shown in Figure 1 to illustrate its full graphics support, besides the advanced underlying physics simulator that controls the behavior of the cars.

Fig. 1. The TORCS car racing environment provides a rather realistic physics simulator as well as advanced graphical support.

In regular TORCS, cars can be controlled either by a human player or by common car controllers with global track information. The Simulated Car Racing Championship setup [8], however, provides a client-server architecture, where the server realizes the controller in TORCS and the client is a standalone C++ or Java program, which receives simulated sensory information from the server and sends motor commands to the server. A detailed description of the 2009 championship can be found in [4].

In contrast to the common car controllers of regular TORCS, controllers in the championship setup receive only a limited amount of egocentric, local sensory information. Although this restriction poses a great challenge for the development of controllers, it also lays the groundwork for more elaborate findings. These controllers are essentially in the same difficult position as autonomous systems in the real world: they have to make the best possible use of the scarce sensory data available. Thus, successful strategies may not only be of interest for the development of artificial intelligence in games, but may also be applicable in autonomous systems that have to navigate in unknown environments. Developing racing controllers for a video game can therefore be more than pure child’s play and may lead to serious contributions to the field of robotics and to autonomous systems research in general.


Since the track properties are only partially accessible and provide only limited local information about the track, neither optimal speeds nor racing lines can be calculated precisely. Thus, our COBOSTAR racer relied on a general, hand-designed policy function that converts current distance sensor information into target speed and angle. This policy was optimized by means of the covariance matrix adaptation evolution strategy (CMA-ES) [9]. In particular, the policy uses the information of the furthest-reaching distance sensor in track direction as input, converting its distance and relative angle to the target values. Given the current speed of the racer, these target values are then converted into acceleration or braking commands as well as appropriate current steering angles.

This indirect sensory-to-motor mapping was extended by additional modules for particular situations and circumstances. These modules include rather common operations, such as an anti-lock braking system (ABS), anti-slip regulation (ASR), and a straightforward gear-shifting approach. More elaborate modules ensure effective off-track propulsion depending on wheel slip, reliable recovery when stuck against a wall, safe landings after jumps, and tracking and avoidance of opponent cars.

Before delving into the details of the COBOSTAR controller and the mentioned features, we give an overview of related work and the TORCS championship setup. Various evaluation results of each COBOSTAR control feature provide further insights into their effectiveness. A discussion on the future of the competition and control designs as well as on the potential impact of the championship setup concludes the paper.

II. RELATED WORK

Several previous approaches for solving the simulated car racing challenge were rule-based and only slightly optimized in several respects. On the other hand, a very effective controller designed by Matt Simmerson, which won the competition leg at the World Congress on Computational Intelligence (WCCI) in 2008 [7], used a very general evolutionary neural network approach [10] to evolve an effective indirect sensory-to-motor mapping [11], [12]. In this case, a selected subset of sensory information was used and mapped onto the behavior outputs (throttle, brake, and gear shifting) with a constrained NEAT approach. Later, an enhanced NEAT controller with pre-programmed gear-shifting and recovery policy as well as an overtaking sub-routine [13] was programmed, which outperformed Simmerson’s controller. These successes point out that the NEAT algorithm can be very useful to evolve robust controllers. However, we challenged this approach with a more constrained controller that maps distance and other sensory information indirectly onto speed and steering behavior. This approach is more in line with the indirect mapping used by [14], in which case imitative driving behavior was developed.

Previous approaches to car racing were already developed for the forerunner of TORCS, the robot auto racing simulator (RARS) [6]. For example, Stanley et al. [15] developed a car racing strategy that depended on range-finders and developed a sensory-to-motor mapping with the NEAT algorithm. In this case, the racing car perceived the track with eight range sensors and mapped this information onto speed and steering controls. Interesting behaviors emerged, such as approaching a curve from the outside and then cutting through on the inside. This evolved control strategy was then further evaluated to develop a crash warning system [16], [15]. While the reported results point in an interesting direction, it remains an open question whether NEAT is the most appropriate algorithm for such car racing competitions.

Another evolutionary computation-based approach was implemented using genetic programming techniques. In this case, a large variety of strategies could be evolved, unfortunately yielding a rather inflexible driving behavior [17]. Nonetheless, the results suggest that evolutionary approaches are applicable in the TORCS competition for the development or optimization of sensory-to-motor mappings.

Togelius, Lucas, and De Nardi [18] provide a general overview of the various computational intelligence challenges that car racing games can pose. They contrast the challenge of optimizing a pre-structured strategy with the challenge of learning a completely new strategy from scratch. Certainly, the latter approach is much harder to realize, since the search space of full-blown car strategies is usually much larger and more diverse than the search space in an optimization problem, in which the basic behavioral policy mapping is provided. Our COBOSTAR racer focuses on the optimization challenge, starting with a simple, parametrized sensory-to-motor mapping. Summaries of the best strategies that were submitted to the 2009 competition can be found in [4].
Out of these submissions, four had their strategy mainly learned or evolved from scratch; three had the basic mapping from sensors to motors designed and then optimized or enhanced this mapping with learning or evolutionary techniques (our COBOSTAR racer belongs to this category); six other contributions were mainly hand-coded. Thus, a diverse collection of competitors participated: three out of the five best racers fell into the second category (including COBOSTAR), one was mainly evolved, and one was mainly hand-coded.

III. SIMULATED CAR RACING COMPETITION 2009

As mentioned above, the 2009 Simulated Car Racing Competition [12] provided a client-server architecture, in which the server realizes the car controller in the TORCS game [5] while the client decides on the actual control commands. In particular, the server converts the status of the racer on track into simulated sensor information and sends this information to the client. The client, in turn, has to use the sensor information in some way to generate motor commands, which are sent back to the server. These motor commands are then applied to the car within the TORCS simulator by the server.

The main challenge of the championship is that the track is not fully observable: the sensors provide only partial information about the track line, other road properties, the current car state, and the behavior of opponents. This seemingly artificially created problem is grounded in the idea of using only information that could be retrieved by sensors available to an actual car or other autonomous robotic system. Thus, the controllers developed in the game scenario may very well be useful for the control of actual autonomous robotic vehicles.

Table I lists and describes all available sensors. The most informative sensors about the track line ahead are the distanceSensors, since only those sensors provide information about the free space ahead. Since predictive information is essential to realize our envisioned anticipatory approach [19], the basic strategy of the COBOSTAR controller relies mainly on the distanceSensors’ information. Table II lists the motor commands available to the client.

TABLE II. EFFECTORS THAT CAN BE USED TO CONTROL THE CAR IN THE SIMULATED CAR RACING COMPETITION.

| Name | Range | Description |
|---|---|---|
| accel | [0,1] | Gas pedal command (0 means no gas, 1 full throttle). |
| brake | [0,1] | Brake pedal command (0 means no brake, 1 full brake). |
| switchGear | {-1,0,...,6} | The gear to switch to or to stay in if the current gear is specified. |
| steering | [-1,1] | Steering scaled between full left (-1) and full right (1), which corresponds to an absolute angle of 0.785398 rad. |
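As a minimal sketch of how a client might represent and sanitize these effector values before sending them to the server, the following Python fragment clamps each command to the ranges of Table II. The names `MotorCommand`, `clamp`, and `steering_from_angle` are illustrative assumptions and are not part of the competition software; only the value ranges and the steering lock of 0.785398 rad are taken from the table.

```python
from dataclasses import dataclass

MAX_STEER_RAD = 0.785398  # absolute steering angle at full lock (Table II)


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


@dataclass
class MotorCommand:
    """Hypothetical container for the four effectors of Table II."""
    accel: float = 0.0
    brake: float = 0.0
    gear: int = 1
    steering: float = 0.0

    def sanitized(self):
        """Return a copy with every field forced into its valid range."""
        return MotorCommand(
            accel=clamp(self.accel, 0.0, 1.0),
            brake=clamp(self.brake, 0.0, 1.0),
            gear=int(clamp(self.gear, -1, 6)),
            steering=clamp(self.steering, -1.0, 1.0),
        )


def steering_from_angle(angle_rad):
    """Convert a wheel angle in radians into the normalized steering value."""
    return clamp(angle_rad / MAX_STEER_RAD, -1.0, 1.0)
```

For example, `MotorCommand(accel=1.7, steering=-3.0).sanitized()` yields a command with `accel` equal to 1.0 and `steering` equal to -1.0, so that out-of-range controller outputs never reach the server in this sketch.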

TABLE I. SENSORS AVAILABLE TO THE CLIENT-BASED CONTROLLER IN THE SIMULATED CAR RACING COMPETITION.

| Name | Range | Description |
|---|---|---|
| angleToTrack | [-π,π) | Angle between the car orientation and the direction of the track axis. |
| curLapTime | [0,∞) | Time elapsed during the current lap. |
| curDamage | [0,∞) | Current damage of the car (0 corresponds to no damage). |
| distFromStartLine | [0,∞) | Distance of the car from the start line along the track line. |
| distRaced | [0,∞) | Distance covered by the car from the beginning of the race. |
| fuelLevel | [0,1] | Current fuel level. |
| curGear | {-1,0,...,6} | The current gear, where -1 is reverse, 0 is neutral, and the gear number otherwise. |
| lastLapTime | [0,∞) | Time to complete the last lap. |
| opponentSensors | [0,100]^36 | Vector of 36 sensors that specify distances to opponent cars in meters. The sensors cover the full surrounding of the car, partitioned into 36 equally spaced sectors. Each opponent car is assigned to exactly one sector, if it is in range. |
| racePos | {1,2,...,n} | Position in the race with respect to the (n-1) other cars. |
| rpm | [0,∞) | Revolutions per minute of the car engine. |
| speedX | (−∞,∞) | Speed of the car along the longitudinal axis of the car. |
| speedY | (−∞,∞) | Speed of the car along the transverse axis of the car. |
| distanceSensors | [0,100]^19 | Vector of 19 distance-to-track-edge sensors in meters, which cover the half circle facing towards the car’s orientation in equally spaced sectors. The maximum range of each sensor is 100 meters. When the car is outside of the main track, these sensors are undefined. |
| relTrackPos | (−∞,∞) | Distance between the car and the track axis. The value is normalized w.r.t. the track width: it is 0 when the car is on the axis, -1 or 1 when on the right or left edge of the track, respectively, and smaller than -1 or greater than 1 when outside of the actual track. |
| wheelSpinVel | (−∞,∞)^4 | Vector of four sensors representing the rotation speed of the wheels. |

IV. COBOSTAR OVERVIEW

COBOSTAR’s control architecture consists of three major parts: A sensory-to-motor mapping provides the primary driving skills on the road and while off-road (termed on-track and off-track, respectively). An opponent monitor governs this mapping through tracking and avoidance of opponent cars, considering opponent cars as moving obstacles. Additional modules extend the basic sensory-to-motor mapping with more elaborate strategies in particular situations. Figure 2 illustrates COBOSTAR’s behavior-based subsumption architecture, including the flow of sensory information through the different modules as well as the respectively controlled motor output. While some sensory information is directly mapped onto desired target speeds and steering angles, other information is used to selectively activate and deactivate more specialized behavior sub-routines and modules. The following sections detail the functionality of the individual behavioral modules as well as the module interactions.

V. SENSORY-TO-MOTOR MAPPING

The primary part of COBOSTAR’s driving behavior is based on optimized indirect mappings of sensory input to motor output. While on-track (that is, on the actual racing track), COBOSTAR can mainly rely on the information provided by distanceSensors. While off-track (that is, outside of the actual racing track), however, the distanceSensors information is not available, requiring a different sensory-to-motor mapping. In the following, we detail both the on- and off-track sensory-to-motor mappings, the optimization procedure that was used for these two, and an on-line adaptation of the mapping while driving.

A. On-Track

The on-track strategy of COBOSTAR mainly relies on the 19 distanceSensors, which provide information about the track edges ahead that are surrounding the driver. The maximum distanceSensors’ value essentially indicates the direction in which it is possible to drive straight on for the longest time. Figure 3 illustrates the information provided by the distanceSensors.
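The selection of the furthest-reaching sensor can be illustrated with a short Python sketch. It assumes that the 19 readings are equally spaced by 10 degrees over the frontal half circle, so that index 9 points straight ahead; the function names and the tie-breaking rule (lowest index wins) are assumptions made for illustration and are not taken from the COBOSTAR code.

```python
def furthest_sensor(distance_sensors):
    """Return (index, distance) of the largest of the 19 distance readings.

    Ties are resolved in favor of the lowest index, which is an arbitrary
    choice made for this sketch.
    """
    if len(distance_sensors) != 19:
        raise ValueError("expected 19 distance readings")
    index = max(range(19), key=lambda i: distance_sensors[i])
    return index, distance_sensors[index]


def sensor_angle_deg(index):
    """Angle of a sensor relative to the car's heading.

    Assumes 10-degree spacing over the half circle, so that index 9
    points straight ahead (an assumption of this sketch).
    """
    return (index - 9) * 10.0
```

With all readings at 10 m except a reading of 80 m at index 12, `furthest_sensor` returns `(12, 80.0)`, which under the stated spacing assumption lies 30 degrees away from the current heading.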

[Figure 2: block diagram. The sensory inputs (distanceSensors, opponentSensors, distFromStartLine, rpm, angleToTrack, relTrackPos, speedY, speedX, wheelSpinVel) feed the sensory-to-motor mapping (on-track and off-track, producing target speed and angle), opponent tracking, gear shifting, and the additional modules (crash adaptation, recovery behavior, acceleration reduction, jump detection), which together produce the motor commands accel, brake, switchGear, and steering.]

Fig. 2. Overview of wiring and interaction of different parts and modules in COBOSTAR’s control architecture. Arrows indicate the flow of information from sensory input through the control architecture to motor output. For better distinction of overlapping arrows, arrows are color and style-coded.

Fig. 3. The distanceSensors are used to determine heading direction and current target speed. The maximum distanceSensors’ value (thick line) indicates the direction in which the car can drive straight on for the longest time.

The on-track sensory-to-motor mapping thus uses the angle $\alpha'$ and the distance $d$ of the longest distanceSensors’ value. Angle $\alpha'$ is further adjusted by taking into account the two neighboring distance measurements $d_l$ and $d_r$ to drive curves even more smoothly:

$$\alpha = \alpha' - 0.5 + \frac{d - d_l}{2d - d_l - d_r} \tag{1}$$

In this way, the target angle slightly points towards the direction of the larger neighboring distance. Target angle $\alpha$ and distance $d$ were then converted into a target velocity $v_t$ and a target steering angle $\tau$ by means of two functions. The method that converts the distance $d$ and angle $\alpha$ into the target velocity $v_t$ first determines a target velocity $v_t'$:

$$v_t' = p_1 + p_2 d + p_3 \left(\max\left\{0, \frac{d - \theta_1}{\theta_2 - \theta_1}\right\}\right)^{p_4} - p_5 \left(\frac{|\alpha - 9|}{9}\right)^{p_6} \tag{2}$$

as long as $d < \theta_2$. Otherwise, $v_t'$ is set to 1000. The equation can be tuned by six parameter values $p_i$ and two threshold values $\theta_i$. Essentially, the equation determines the target speed out of constant, linear, and polynomial summands, which depend on the distance measure, and an additional subtraction component, which takes the measured angle into account to decrease the speed in sharp curves. The target velocity $v_t$ is set to the minimum speed $p_7$ if Equation 2 yields a velocity below $p_7$, and to $v_t'$ otherwise. The constant 9 comes from the fact that the angle $\alpha$ is a value in the range of [0, 18], where value 9 points directly towards the car’s current heading direction. Figure 4 shows the mapping from the distanceSensors to the velocity for several angles for the parameter setting used in the optimized strategy of COBOSTAR.¹

¹ The exact optimization values were $p_1 = 43.23$, $p_2 = 1.99$, $p_3 = 104.8$, $p_4 = 9.38$, $p_5 = 907.7$, $p_6 = 1.92$, and $p_7 = 11.89$. The thresholds were optimized to $\theta_1 = 36.50$ and $\theta_2 = 97.33$.

The target velocity $v_t$ is then compared with the current velocity $v_c$. If the current car velocity is smaller than the current target velocity ($v_c < v_t$), then the car accelerates maximally. Otherwise, the strategy still scales the braking level. Neither brake nor acceleration is applied if the current velocity $v_c$ is smaller than $p_8 v_t$ (that is, $v_t \le v_c < p_8 v_t$, where $p_8$ was optimized to $p_8 = 1.13$). If the current velocity

[Figure 4: plot titled “Mapping from the maximum distanceSensors’ value and its angle to target velocity”. Horizontal axis: max value of distanceSensors (0 to 100). Vertical axis: target velocity (0 to 400). Curves: minimum, alpha=9, alpha=11, alpha=13, max threshold.]

Fig. 4. Offset, linear, and polynomial summands shape the mapping from the value of the maximal distanceSensor to the target speed. A minimum target speed value prevents overly small or even negative target speeds. A maximum distance value ensures that the reception of a large distance reading is always converted into a full throttle command.
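The interpolation of the target angle (Equation 1) and the mapping from distance and angle to target velocity (Equation 2, followed by the minimum-speed rule) can be sketched in Python as follows. The parameter values are the optimized ones reported in the footnote to Equation 2. The function names, the handling of three equal readings, and the use of the absolute angular deviation from the heading index 9 are assumptions of this sketch rather than details confirmed by the original implementation.

```python
# Optimized on-track parameters as reported in the footnote to Equation 2.
P1, P2, P3, P4, P5, P6, P7 = 43.23, 1.99, 104.8, 9.38, 907.7, 1.92, 11.89
THETA1, THETA2 = 36.50, 97.33


def target_angle(alpha_idx, d, d_left, d_right):
    """Equation 1: shift the longest sensor's index toward the larger neighbor."""
    denom = 2.0 * d - d_left - d_right
    if denom <= 0.0:
        # All three readings are equal: apply no shift (assumption of this sketch).
        return float(alpha_idx)
    return alpha_idx - 0.5 + (d - d_left) / denom


def target_velocity(alpha, d):
    """Equation 2, followed by the minimum-speed rule with p7."""
    if d >= THETA2:
        return 1000.0  # far-reaching reading: target far above any real speed
    distance_term = P3 * max(0.0, (d - THETA1) / (THETA2 - THETA1)) ** P4
    angle_term = P5 * (abs(alpha - 9.0) / 9.0) ** P6
    v = P1 + P2 * d + distance_term - angle_term
    return max(P7, v)
```

With symmetric neighbors, for example `target_angle(9, 50.0, 40.0, 40.0)`, the angle stays at 9.0, whereas a larger left neighbor pulls it below 9. For a reading of 60 m, `target_velocity` decreases as `alpha` moves from 9 to 11 to 13 and bottoms out at the minimum speed `P7`, which mirrors the qualitative shape described for Figure 4.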

is even larger, then the brake is applied increasingly strongly using the following scaling equation:

$$b = \min\{1,\; p_9 (v_c - p_8 v_t)\}, \tag{3}$$

where slope parameter $p_9$ evolved to $p_9 = 0.70$. In this way, gradual braking is applied. This was expected to yield particularly effective car control in curves that become increasingly tight, since in this case immediate full braking may lead to uncontrollable car behavior. The steering is accomplished with a rather straightforward mapping that converts the target angle into the steering direction:

$$\tau = p_{10} (9 - \alpha), \tag{4}$$

where parameter $p_{10}$ was optimized to $p_{10} = 0.39$.

1) Gear-Shifting: A simple, not yet mentioned strategy modification is gear-shifting. Gear shifting depended only on the currently measured engine speed (rpm). This was also the standard procedure provided with the basic controller given by the organizers. Tests on a few tracks suggested, however, that the car’s engine worked best when using the full spectrum of its gears. We shifted up each time the engine reached a speed of 9500 rpm and shifted back down when the rpm sensor dropped below 3300, 6200, 7000, 7300, and 7700 for gears 2, 3, 4, 5, and 6, respectively. No further evaluations or optimizations were done for gear-shifting.

2) ABS and ASR: The final two parameters that were optimized on-track were the two parameters for the anti-lock braking system (ABS), which came with the basic competition code. Slip and range for ABS were optimized on-track to 11.7 and 10.18, respectively. Anti-slip regulation (ASR) was not applied while driving on-track since no benefits were found.

3) Track-Dependent Adjustment: As mentioned above, the angle estimate of the furthest distance is interpolated between the maximal distanceSensors’ value and its neighbors (see Equation 1). Experiments on very wide speedway tracks showed that this modification resulted in slightly irregular driving behavior, that is, the car would drive on a slightly wavy trajectory rather than straight. To avoid this, we had the car measure the track width with its distanceSensors upon the start of the race, assuming that the track width at the start reflects the track properties throughout the race. If the measured distance exceeded the hand-set threshold 28, the steering factor parameter $p_{10}$ was set to 0.1 and the angular adjustment, which was usually set to 0.5, was set to 0.1 (see Equation 1). In this way, excessive steering was prevented and irregularities in the neighboring sensors were not taken into account as strongly. The result was a much straighter driving behavior, reaching maximum speeds that were impossible without these adjustments.

B. Off-Track

In the event that the car is no longer situated on the main track due to a crash, excessive speed in a curve, or simply a steering mistake, the situation changes drastically. The distanceSensors are no longer available. Moreover, steering becomes much more error-prone and wheel slippage is much stronger. Thus, a very different driving strategy needs to be applied.

The off-track sensory-to-motor mapping relies on the angleToTrack sensor and the relTrackPos sensor (cf. Table I). The target speed off-track was designed to rely on a constant offset plus a linear scaling depending on the difference between a target angle $\beta$ and the currently sensed angleToTrack value $\gamma$. The target angle was computed from the current relative position to the track middle axis, relTrackPos $t$, as follows:

$$\beta = \operatorname{sgn}(t)\,(|t| - q_1)\, q_2 \tag{5}$$

The target speed was then determined by combining this target angle with the current angleToTrack and a constant offset, yielding:

$$v_t = q_3 + q_4 \max\{0,\; 1 - q_5 |\gamma - \beta|\}, \tag{6}$$

where the finally used optimized parameters were $q_1 = 0.392$, $q_2 = 0.150$, $q_3 = 117.5$, $q_4 = 123.6$, and $q_5 = 34.56$. Thus, the target speed was determined by offset $q_3$ and further scaled by the angle of heading: the more the car is heading towards the track, the faster it drives. The target velocity $v_t$ is then handled as in the on-track strategy to determine whether the car should accelerate or brake. In sum, the off-road strategy tries to head towards the track again with a speed that mostly prevents excessive slippage or even spinning.

On-road and off-road behaviors were optimized by evolving the mentioned parameters $p_i$ and $\theta_i$ on-road as well as the parameters $q_i$ off-road. The following section details how this was done.

C. Strategy Optimization

Parameter optimization was accomplished by the covariance matrix adaptation evolution strategy (CMA-ES) [9], using the Java code available on-line [20]. Note that the optimization methodology is similar to natural gradient-based optimization in policy gradient reinforcement learning [21]. The optimization underwent first an on-track, then an off-track, and finally another very extensive on-track optimization turn. Further optimization turns could certainly yield even better controllers, but time (time to set up further runs rather than computational time) prevented further optimization efforts. Moreover, all parameters were optimized on various available tracks in TORCS and then compared on the other tracks for their generality. The most general parameter set was the one finally chosen for the competitions.

Since we expected the optimization surface of the generated mappings to yield many local optima, and since the mapping also strongly depends on the complexity of the track, we ran many independent optimization runs on various tracks available in the TORCS distribution. The results confirmed that the CMA-ES-based optimization mechanism does not necessarily converge to a global optimum. Indeed, in successive optimization runs on the same track, the evolved parameter combinations usually did not match each other. Moreover, it turned out that parameter settings that were optimized on a different track sometimes yielded even better performance than the parameter settings that were optimized on that track.

The biggest challenge for the off-road optimization part was to generate a useful fitness signal. If the car is controlled by a reasonably good strategy, then the car will hardly ever come into an “off-track” situation. Thus, the fitness signal with respect to the offset optimization parameters is very weak and hardly depends on the off-road strategy. To generate a more useful and meaningful fitness signal, we thus used the evolved strategy from the settings described above and added a “crash strategy”. This detrimental driving strategy was applied every 300 meters and caused the car to drive, in alternate trials, completely to the left or right. The distance raced with this strategy thus mainly depended on the quality of the crash recovery and particularly on how quickly and effectively the car re-entered the racing track. We consequently left the fitness value the same as in on-road optimization, but the strategy had the enforced crash behavior activated every 300 meters. We did not analyze the optimization of the off-road behavior in as much detail, but the results generally showed similar parameter variations as the optimized parameters for the on-road strategy. The off-road strategy optimization was done between the two on-road optimization turns. During on-road optimization, the off-road strategy stayed fixed. Further information on the exact optimization setup, parameter choices, and detailed results can be found elsewhere [22].
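To make the role of the optimized parameters concrete, the off-track target speed (Equations 5 and 6) and the conversion of a target velocity into pedal commands (Equation 3, shared by the on-track and off-track mappings) can be sketched in Python as follows. The numerical values are the optimized ones reported in Sections V-A and V-B; the function names and the return convention `(accel, brake)` are assumptions made for illustration.

```python
P8, P9 = 1.13, 0.70  # pedal parameters reported for the on-track mapping
Q1, Q2, Q3, Q4, Q5 = 0.392, 0.150, 117.5, 123.6, 34.56  # off-track parameters


def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def off_track_target_speed(rel_track_pos, angle_to_track):
    """Equations 5 and 6: target speed from relTrackPos and angleToTrack."""
    beta = sign(rel_track_pos) * (abs(rel_track_pos) - Q1) * Q2
    return Q3 + Q4 * max(0.0, 1.0 - Q5 * abs(angle_to_track - beta))


def pedal_commands(v_current, v_target):
    """Return (accel, brake) for the current and target velocity."""
    if v_current < v_target:
        return 1.0, 0.0  # below target: full throttle
    if v_current < P8 * v_target:
        return 0.0, 0.0  # tolerance band: neither pedal is applied
    # Equation 3: braking grows linearly with the excess speed, capped at 1.
    return 0.0, min(1.0, P9 * (v_current - P8 * v_target))
```

In this sketch, the target speed reaches its maximum of `Q3 + Q4` only when the sensed angle equals the target angle `beta`, and falls back to the offset `Q3` once the angular error exceeds `1 / Q5`, which is roughly 0.03 rad.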

D. Adaptation in Successive Laps

An additional behavioral on-line adjustment was made when the car entered the second and possibly successive laps. We created several criteria to adjust the parameter settings in successive laps given particular behavioral values in the first lap. Since the best performance on the Aalborg track was achieved by the parameter set optimized on that track, and since this performance furthermore exceeded the others by far, we decided to analyze the general properties in the first lap and adjusted our strategy parameters according to these measurements in the second lap. Consequently, we adjusted to the Aalborg strategy if the first lap’s average speed was below 29, which indicates a really curvy track, where the distance sensor values and thus the average speed of the COBOSTAR racer stay rather small. We adjusted to the Street 1 strategy if the average speed of the first lap was below 43. Moreover, the parameter values optimized for the Wheel 1 track were chosen if the minimum speed (after initial speed-up) was never below 61. In this way, we switched to a more aggressive strategy if the track appeared easy and to a more conservative strategy if the track appeared to be more complex. This strategy enhancement resulted in a distance gain of 1.67% averaged over all tested tracks.

VI. OPPONENT TRACKING AND AVOIDANCE

After the competition leg at GECCO-2009, in which we won the qualifying but only came in third in the actual competition against the seven other best competitors of the qualifying, we realized that opponent avoidance was the essential behavior that COBOSTAR was still missing at this point. However, it seemed not very attractive to impose a rule-based opponent strategy module or strategy alternation module given the presence or behavior of some opponents. Moreover, we realized that the sensory information about the opponents was not very informative because it does not provide any information about the opponent speeds. Thus, to realize opponent avoidance, we first built an opponent monitor mechanism that tracks the surrounding opponents over time assigning them relative speeds with respect to our own current speed. Given thus the locations and relative speeds of the opponents, we projected the opponentSensors’ perceptions onto the distanceSensors’ perceptions, handling opponents essentially like moving obstacles. Due to this projection mechanism, nearly no further case-based opponent handling appeared necessary and rather nice opponent avoidance and even overtaking behavior emerged. The opponent monitor assigns opponent properties to each opponentSensors value below 100 meters (since 100 meters was the maximum range). By integrating these values over time, active opponent properties are assigned to the respective sensor readings by matching the closest locations over time with each other. Once a match is applied, the opponent’s speed can be derived by measuring the change in relative distance to the COBOSTAR racer. In this way, the speed of opponent cars relative to the controlled racer is derivable. Given the relative speed of an opponent car, the time until this opponent may be reached can be calculated, assuming no changes in the current behavior of the opponent until that

7

2 This is certainly an invalid assumption, but it is a sufficiently good approximation for the avoidance purpose. 3 Due to noiseless sensors and opponent behaviors, the results are deterministic and thus exemplary results (no averages) are reported.

Avoiding Opponents off 30.000

Time Steps

time.2 If this time is shorter than the time calculated until the two track edges closest to the respective opponent direction would be reached given the current speed of the COBOSTAR racer, then the distance of the respective distanceSensors reading is replaced by the relative speed-dependent projected distance of the opponent. That is, given a speed v of the COBOSTAR racer and a time t until opponent impact, then the distanceSensors in the respective direction are set to vt if this value is smaller than the current distanceSensors’ reading. The sensory-to-motor mapping function is then applied as before using the modified distanceSensors’ readings. In this way, if an opponent blocks the current best path because it is slower than the COBOSTAR racer and it is currently situated in the desired driving direction, this direction is blocked off and the sensory-to-motor mapping automatically chooses a direction around the opponent. Moreover, the COBOSTAR racer may decelerate if the alternative passage yields a more complex trajectory, because in this case alternative values of the distanceSensors would be smaller. Finally, another opponent mechanism was employed if an opponent was very close in front of COBOSTAR. This was especially done because the damage model in the TORCS racing engine incurs much more damage upon impacts on the front of the car in comparison to the back of the car. If there is an opponent right in front of the COBOSTAR racer, that is, if its distance is smaller than 15 meters, then the target speed determined by the sensory-to-motor mapping was further decreased by the factor max{0, (d − 5)/10}, avoiding damaging impact where possible. A rigorous evaluation of these mechanisms is once again very difficult, since success depends at least on the behavior of the competing cars, on the circumstances of the interaction, and on the damage model used in the simulator. 
Nonetheless, we conducted an evaluation on three tracks with four opponents that used the same car but game-provided controllers (berniw3, lliaw3, olethros3, tita3) and started ahead of COBOSTAR. Fig. 5 shows that COBOSTAR finishes these tracks faster with the opponent avoidance mechanisms turned on than with them turned off.³ However, other results did not necessarily confirm this observation. It appears that a more aggressive driving behavior, which applies if opponent avoidance is switched off, can also yield emergent overtaking behavior by, for example, successfully pushing the opponent car to the side. Thus, although the employed opponent monitoring and avoidance mechanism does avoid opponents more effectively, it does not necessarily improve the performance of COBOSTAR in terms of achieved racing times or effective overtaking. It is left for future research to further improve the described mechanism.
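The projection of an opponent onto the distance readings and the front-proximity speed scaling can be illustrated with a minimal sketch. The function names, the argument layout, and the use of m/s for the speed are assumptions made for illustration only; they are not taken from the COBOSTAR implementation.

```python
from typing import List, Sequence


def project_opponent(distances: Sequence[float],
                     sensor_indices: Sequence[int],
                     speed_ms: float,
                     time_to_impact: float,
                     time_to_edge: float) -> List[float]:
    """Return a copy of the distance readings with one opponent projected in.

    speed_ms is the racer's own speed v in m/s (assumed unit), time_to_impact
    the estimated time t until the opponent is reached, and time_to_edge the
    estimated time until the track edge in that direction is reached.
    """
    out = list(distances)
    if time_to_impact < time_to_edge:
        projected = speed_ms * time_to_impact  # v * t
        for i in sensor_indices:
            # Only shorten a reading, never lengthen it.
            out[i] = min(out[i], projected)
    return out


def front_opponent_speed_factor(d: float) -> float:
    """Target-speed factor max{0, (d - 5) / 10} for an opponent d meters ahead.

    Beyond 15 m the mechanism is inactive, so the factor is 1.
    """
    if d >= 15.0:
        return 1.0
    return max(0.0, (d - 5.0) / 10.0)
```

With this form, the target speed is untouched at 15 m, halved at 10 m, and reduced to zero at 5 m or closer.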

[Figure 5: bar chart of time steps (y-axis, 0 to 30,000) per track (x-axis: Ruudskogen, Alpine2, Olethros), with opponent avoidance on versus off.]

Fig. 5. Comparison of opponent avoidance switched on or off on three different tracks, five rounds each.

VII. ADDITIONAL MODULES

The basic sensory-to-motor mapping working in concert with the opponent monitor provides an overall reliable and successful driving performance. However, this rather general account cannot cover behavior that is necessary to cope with certain less common but quite problematic circumstances. Thus, four separate subsumption modules, each of which offers an elaborate solution to a rather particular problem, were added to COBOSTAR’s control architecture. These modules are introduced in detail in the following.

A. Off-Track Acceleration Reduction

Anti-slip regulation (ASR) was applied when off-track by measuring the slip of the accelerating wheels. To do so, the average wheel speed of the two rear wheels was computed in km/h from the wheelSpinVel vector. The slip of the wheels was then defined as the difference between the sensed current velocity of the car and the average wheel speed. If the resulting slip was more negative than parameter q6 = −10.01, then the acceleration of the car was decreased by the subtracter (slip − q6)/q7, where parameter q7 evolved to q7 = 155.

Despite this application of ASR, we later still observed rear wheel burn-outs and a consequent loss of control over the car. Thus, despite ASR, the gas pedal was not used carefully enough. To prevent this problem, off-track speed reduction was introduced. This method reduces the to-be-applied acceleration stemming from the off-track sensory-to-motor mapping explained above by a certain percentage sR that depends on the rear wheels’ slip. Every time the car leaves the track, the sR value is reset to 90%. As long as the car remains off-track, the value is adjusted to the inferred surface properties in subsequent time steps. The adaptive adjustment has the goal of providing the best possible forward momentum on the current off-track surface without losing control, by keeping rear slip above values of −15.
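The anti-slip rule and the adaptive reduction percentage sR can be sketched together as follows. The update constants (10%, 50%, slip limit −15, reset to 90%) are those given in the description of the adaptation step; everything else is an assumption made for illustration. In particular, the sketch subtracts the magnitude of (slip − q6)/q7, since the term itself is negative whenever the rule fires, it takes wheel speeds already converted to km/h, and it treats a slip of exactly −15 like a slip below −15.

```python
Q6 = -10.01        # slip threshold reported in the text
Q7 = 155.0         # evolved divisor reported in the text
SLIP_LIMIT = -15.0  # rear slip limit used by the adaptive reduction


def rear_slip(speed_kmh, rear_wheel_speeds_kmh):
    """Slip = car speed minus average rear wheel speed (both in km/h)."""
    avg = sum(rear_wheel_speeds_kmh) / len(rear_wheel_speeds_kmh)
    return speed_kmh - avg


def asr_acceleration(accel, slip):
    """Lower the acceleration command when slip is more negative than Q6.

    Assumption: the magnitude of (slip - q6) / q7 is subtracted so that the
    command actually decreases; the result is clipped at 0.
    """
    if slip < Q6:
        accel -= abs(slip - Q6) / Q7
    return max(0.0, accel)


class OffTrackSpeedReduction:
    """Adaptive percentage s_R by which off-track acceleration is reduced."""

    def __init__(self):
        self.s_r = 0.9

    def reset(self):
        """Call whenever the car leaves the track."""
        self.s_r = 0.9

    def update(self, slip):
        if slip > SLIP_LIMIT:
            self.s_r *= 0.9                       # little slip: reduce less
        else:
            self.s_r = min(1.0, self.s_r * 1.5)   # too much slip: reduce more
        return self.s_r

    def apply(self, accel):
        return accel * (1.0 - self.s_r)
```

Because both updates are multiplicative, sR stays within 0% and 100% as long as the upper bound is clipped, which matches the stated range.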


The adaptation is realized by decreasing the sR value by 10% of its current value in each time step in which the current rear slip is above −15. The value is increased by 50% of its current value once rear slip drops below −15. The sR value is restricted to the range between 0% and 100%. This off-track speed-reduction mechanism is deactivated when the car has been back on-track for more than 30 time steps. The adaptive process is designed to ensure the best trade-off between keeping control over the car and providing sufficient forward momentum. Although subjective observations of off-track and getting-back-on-track behavior credit the off-track speed reduction with good performance, an accurate objective evaluation is difficult. Further research has to show whether the mechanism used is the best one, or whether evolutionary engineering could be used to further optimize the hand-picked values for off-track speed reduction. Moreover, a similar procedure could be developed for on-track behavior, to optimize the forward momentum with respect to the available track surface.

B. Recovery Behavior

Having an accident or driving off-track cannot always be avoided. Particularly if other cars are on the track as well, accidents may occur or the car may even be pushed off the track by another car. Due to the much more slippery surface off-track, the car sometimes ends up in a state in which forward acceleration is no longer effective. This is particularly the case when the car’s front is facing a wall. Reversing the car is then usually the only way to recover from such a stuck situation. To be able to trigger such a recovery behavior, a stuck detection module was implemented. Its main function is to detect situations in which the car is stuck against a wall and cannot recover by forward movements. Stuck detection is realized by counting the number of time steps in which the car moves slower than q9 = 2.03 km/h, that is, below a pre-defined stuck speed.
If the stuck counter reaches a predefined maximum of q8 = 54 and the car’s engine is below 1000 rpm, the current angleToTrack is saved, the stuck counter is reset to 0, and the curGear is switched to reverse until the current angleToTrack reading is less than half of the saved angleToTrack value or until the stuck counter again reaches its predefined maximum. This method yielded good performance when completely stuck, that is, when hardly moving at all. However, further observations revealed a second type of stuck situation, which was not covered by the original stuck detection module. In this case, the car scrapes along a wall (usually at an angle of 45° towards the wall) with its outer front edge and cannot be turned back towards the track by further acceleration. The stuck detection mechanism above does not trigger here because of the higher velocity with which the car is scraping along the wall. Therefore, a second stuck detection mechanism was introduced. It eventually triggers the backing up behavior

when the car is off-track, turned more than 22.5° away from the track axis, and has a speed between 5 and 50 km/h. These criteria have to be fulfilled for a certain period of time that is longer for higher speeds (since a faster car has a better chance of freeing itself after a while) and shorter for slower speeds (which have higher priority since more time is lost). To achieve this, a second stuck counter was used that is increased in each time step in which the criteria above are satisfied. The increment is five minus 10% of the car’s absolute speed. Thus, for speed values between 5 and 50 km/h, the stuck-2 counter is increased by values between 4.5 and 0 in each time step. The stuck-2 counter’s maximum value, after which a stuck-2 detection is triggered, is set to 600. If the car leaves this situation before this value is reached, the counter is set back to zero and the car behavior continues normally. Moreover, to prevent the strategy from alternating between different stuck behaviors, the stuck-2 detection is disabled for the next 500 time steps after each stuck or stuck-2 situation. As in the other stuck case, if a stuck-2 situation is detected, the angleToTrack value is stored and the car switches to reverse and backs up until the angleToTrack value is halved or until the car is fully back on-track (indicated by absolute values of relTrackPos less than 0.5). Both recovery routines ensure that the car frees itself in situations in which further acceleration or steering does not pose a solution. However, despite a multitude of subjective observations of very successful stuck behaviors, an objective evaluation of the resulting performance is again very difficult, since stuck situations occur very rarely and can hardly be predicted or generated automatically.
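The two stuck counters described above can be summarized in a minimal sketch. The thresholds are the reported ones; the class name, the argument names, the use of radians for angleToTrack, and the reset of the first counter as soon as the car moves faster than the stuck speed are assumptions made for illustration. The reversing manoeuvre itself is not modeled.

```python
import math

Q8_MAX_COUNT = 54        # stuck counter limit reported in the text
Q9_STUCK_SPEED = 2.03    # km/h, stuck speed reported in the text
STUCK2_LIMIT = 600.0     # stuck-2 counter limit
STUCK2_COOLDOWN = 500    # time steps during which stuck-2 is disabled
STUCK2_MIN_ANGLE = math.radians(22.5)  # assumes angleToTrack in radians


class StuckDetector:
    def __init__(self):
        self.count1 = 0
        self.count2 = 0.0
        self.cooldown = 0

    def step(self, speed_kmh, rpm, angle_to_track, off_track):
        """Return True when a reversing manoeuvre should start in this step."""
        triggered = False

        # Type 1: the car hardly moves at all.
        if abs(speed_kmh) < Q9_STUCK_SPEED:
            self.count1 += 1
        else:
            self.count1 = 0  # assumption: counter restarts once the car moves
        if self.count1 >= Q8_MAX_COUNT and rpm < 1000:
            triggered = True

        # Type 2: the car scrapes along a wall at moderate speed.
        if self.cooldown > 0:
            self.cooldown -= 1
            self.count2 = 0.0
        elif (off_track and abs(angle_to_track) > STUCK2_MIN_ANGLE
              and 5.0 <= abs(speed_kmh) <= 50.0):
            self.count2 += 5.0 - 0.1 * abs(speed_kmh)
            if self.count2 >= STUCK2_LIMIT:
                triggered = True
        else:
            self.count2 = 0.0

        if triggered:
            self.count1 = 0
            self.count2 = 0.0
            self.cooldown = STUCK2_COOLDOWN
        return triggered
```

At 10 km/h the stuck-2 counter grows by 4 per step, so the limit of 600 is reached after 150 consecutive steps; at 50 km/h the increment is zero and the counter never triggers.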
Nevertheless, both stuck detection mechanisms are seen as crucial recovery mechanisms, because even if stuck situations are rare events, their consequences can be fatal: in the best case, a large amount of time will be lost; in the worst case, the track will never be finished.

C. Jump Detection

An additional control challenge comes from the fact that the sensors provided information neither about the track surface nor about the track’s slope. However, extreme track slopes result in significantly different car behavior. Downhill, the car can accelerate much faster but also loses control faster in comparison to an uphill slope. Unfortunately, such track properties are hard to detect without a direct sensor reading. However, it is possible to detect when a wavy slope causes the car to jump. On one particular track (CG track 3) our car would jump over one particular crest and get totally smashed upon landing. Apparently due to an upcoming curve, the controller would tend to steer towards the left while still in the air, so that the landing led to an accident. To prevent this accident from occurring, we built a jump detection mechanism with appropriate handling of the anticipated landing. Jump detection can be split into two separate elements: detecting the actual jump (when the car has lost contact

[Figure 6: time series of forward speed, rear wheel slip, front wheel slip, and lateral speed (y-axis: speeds / wheel slip, −100 to 200) over time in seconds (x-axis, 54.6 to 55.8).]

Fig. 6. Development of four sensory values over time before, during, and after a jump on track CG track 3.

with the ground) and detecting the subsequent landing (when contact with the ground has been reestablished). Essentially, when the car is jumping and accelerating while in the air, the accelerating wheels tend to rotate faster than the actual speed of the car would warrant. Thus, we used the speed of the car (speedX) and the average speed of the rear wheels to detect jumps, while the speed of the front wheels was used to detect landings. When the car is in mid-air, any usage of the gas pedal results in a highly negative rear slip (the rear wheels turn faster than the actual speed of the car), as the wheels experience no friction from the road’s surface. We used values of rear slip below −50 as the indicator for a jump, as long as the absolute value of lateral speed (speedY) was below 1 and forward speed was higher than 50 km/h. This ensures that jumps are only indicated if the main force on the car is a forward one, which in turn reduces the risk of a false alarm, for example, when the car has totally lost control in a sharp turn and is spinning around its own axis (which could produce similarly low values of rear slip) or when the car is accelerating from the start line with high rear slip. Fig. 6 shows the behavior of the relevant sensors during the crucial jump on CG track 3. Once a jump is detected, the jump detection module overrides any steering action to ensure the most favorable front wheel posture for the moment when the car lands. Without overriding steering actions, the car may have the front wheels turned, as the jump may be followed by a track curve, as is the case on CG track 3. When the car hits the ground, the turned front wheels cause the car to veer off towards the side immediately, which may be the worst kind of accident, as the car’s front is usually turned away from the track’s driving direction afterwards, requiring a lot of time for recovery. To prevent this accident, the front wheels

are aligned with the car’s flight angle by combining speedX and speedY sensory information. Since the car may not have driven perfectly straight over the crest that resulted in the jump, and since the car may additionally lean towards one side in mid-air, the car’s longitudinal axis is hardly ever exactly in line with its axis of flight. The difference between these two axes is the desired front wheel angle, which can be used to determine the currently desired steering action. By this means, the jump detection module forces the front wheels to be turned in the direction towards which the car is flying until a landing is detected. Landings are detected by front slip values above 15. High slip values in the front wheels are well suited for indicating a landing, since the sudden weight shift to the front of the car and the consequent onset of friction during impact cause the front wheels to lock up momentarily. Once a landing is detected, the jump detection module stops overriding steering actions and is reset to be prepared to detect a possible successive jump. Although our evaluations showed that the implemented jump detection module works reliably, there remains a risk of malfunctioning. Since the module restricts the control of the basic sensory-to-motor mapping over the steering wheel, such malfunctions can have severe negative consequences. To reduce possible negative consequences of a false alarm when detecting jumps or of a non-detected landing, the jump detection module’s grip on the steering wheel was canceled not only after a detected landing, but also one second after the start of the detected jump (the jump with the longest observed flight time lasted for 0.8 seconds). It was also canceled if the car’s forward velocity increased by more than 5 km/h after the detected jump start, because no forward acceleration is possible in the air. In sum, the jump detection module ensures safe landings utilizing the available sensory information.
This in turn prevents time-costly accidents without the need to slow down before crests. The effectiveness of the jump detection module can be seen in Fig. 7, which shows the performance over five rounds on CG track 3. The basic sensory-to-motor mapping always caused the car to veer off the track after landing. The jump detection module reliably prevented this accident, resulting in overall faster lap times. Although crash detection (see below) also improves lap times after a crash, jump detection results in even better performance, since it prevents the accident even in the first round and does not reduce speed around the area of the long jump in subsequent rounds. Completing five rounds takes 18399 time steps without jump detection, but only 16172 time steps with jump detection. Moreover, the jump detection module recognizes not only large jumps like the one on CG track 3 but also small jumps induced by little bumps on the road. This is evident, for example, on the tracks Olethros (number of necessary steps for five rounds with crash detection but without jump detection: 33099, with jump detection: 31351) and Mixed1 (15448 time steps without jump detection, compared to only 13070 time steps with jump detection).
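The jump handling logic of this subsection can be condensed into a minimal sketch. The thresholds (−50, 1, 50 km/h, 15, one second, 5 km/h) are the reported ones. The class and argument names, the assumption of 50 simulation steps per second, the sign convention of speedY, and the choice to return a raw wheel angle in radians instead of a normalized steering command are assumptions made for illustration.

```python
import math

STEPS_PER_SECOND = 50  # assumption: one simulation step lasts 20 ms


class JumpDetector:
    def __init__(self):
        self.airborne = False
        self.steps_in_air = 0
        self.speed_at_takeoff = 0.0

    def step(self, speed_x, speed_y, rear_slip, front_slip):
        """Return the desired front wheel angle (radians) while airborne.

        Returns None when the module does not override the steering.
        speed_x is the forward speed in km/h, speed_y the lateral speed.
        """
        if not self.airborne:
            # Jump start: wheels spin freely while the car moves forward.
            if rear_slip < -50.0 and abs(speed_y) < 1.0 and speed_x > 50.0:
                self.airborne = True
                self.steps_in_air = 0
                self.speed_at_takeoff = speed_x
        else:
            self.steps_in_air += 1
            landed = front_slip > 15.0
            timed_out = self.steps_in_air >= STEPS_PER_SECOND
            sped_up = speed_x - self.speed_at_takeoff > 5.0
            if landed or timed_out or sped_up:
                self.airborne = False

        if self.airborne:
            # Angle between the longitudinal axis and the flight direction.
            return math.atan2(speed_y, speed_x)
        return None
```

The two safety exits (timeout and implausible speed gain) release the steering even if no landing is ever registered, which bounds the damage a false alarm can cause.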


[Figure 7: bar chart of time steps (y-axis, 0 to 4,000) per round (x-axis: rounds 1 to 5 and average) for the original controller, crash detection, and jump detection.]

Fig. 7. Comparison of five rounds on CG track 3 with the original behavior, additional crash detection, and additional jump detection. Crash detection improves lap times after the first round; jump detection improves performance even more, as no danger zones slow the car down. However, jump detection can, of course, only improve lap times on tracks that elicit jumping behavior.
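To put the five-round step counts reported for the jump detection module into perspective, the relative savings can be computed directly from the stated numbers. The helper name is hypothetical; the step counts are the ones given in the text.

```python
def relative_saving(steps_without, steps_with):
    """Percentage of time steps saved by enabling jump detection."""
    return 100.0 * (steps_without - steps_with) / steps_without


# (steps without jump detection, steps with jump detection), five rounds each
reported = {
    "CG track 3": (18399, 16172),
    "Olethros": (33099, 31351),
    "Mixed1": (15448, 13070),
}

for track, (without, with_jump) in reported.items():
    saving = relative_saving(without, with_jump)
    print(f"{track}: {saving:.1f}% fewer time steps")
```

The savings come out at roughly 12% on CG track 3, 5% on Olethros, and 15% on Mixed1.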

D. Crash Adaptation

Despite COBOSTAR’s reliable overall performance, no sensory-to-motor mapping can totally guarantee flawless driving, especially not in a dynamic environment with other cars and with such restricted sensory input concerning the layout of the track ahead. Thus, there always remains the risk of producing a crash (for example, by underestimating the difficulty of a turn and approaching it too fast). However, it is possible to learn from such a mistake by remembering the place where the accident happened and approaching this crash site with more caution in subsequent rounds. Therefore, we implemented an additional module that detects crashes and adapts the behavioral strategy in successive rounds by remembering particular crash points on the road using the distFromStartLine sensor. We additionally defined a severity measure for a crash occurrence in order to adapt only to crashes that actually lead to a significant loss in speed or a significant increase in car damage. The adaptation mechanism had to be carefully tuned in order to be effective, since an overly careful adaptation in successive laps could lead to a severe overall performance loss, while an insufficient adaptation could lead to even worse crashes in successive laps.

Crash detection was simply defined by the lateral speed of the car (speedY). If this speed was above 30, a crash location monitoring mechanism started for the next 500 iterations, monitoring the worst angleToTrack measurement and the widest distance from the track axis. If the angleToTrack pointed towards the back of the track or the absolute distance from the track axis was larger than 1.2 in relTrackPos units, or if the curDamage increase that occurred during these 500 iterations was larger than 800 points, then this track

location was recorded. The severity of the crash s ∈ [0, 1] was determined by the scaled largest deviation from these threshold values. Moreover, if an opponent was close shortly before the occurrence of the large lateral speed, then the severity was further downscaled, since an interaction with the opponent might have caused the crash. The severity measure was then used to assess how strongly the target speed on and before the crash point should be decreased. To do so, the point in history was first determined at which the speed v before the crash did not further increase and at which the lateral speed of the car was still sufficiently low. This point defined the end of the crash adaptation area. At this point, the target speed of the car was scaled to v(1 − 0.5s), where v is the speed at that point. Given this change in speed and respecting the possible change in speed over successive iterations, the new target speed was progressively adjusted backwards in time given the actually measured speeds and the changes in these speeds during the time before the crash occurrence point. This was done until the reduced target speed met the actual speed monitored at the respective iteration before the crash point. In this way it was possible to induce an earlier braking point before the supposed curve in which the crash actually occurred. This point was stored using the distFromStartLine measure, denoting the beginning of the crash adaptation area. In successive laps, once the car enters a stored crash adaptation area, the target speeds generated by the sensory-to-motor mapping are overruled by the speed values determined by the speed reduction mechanism explained above. If this target speed in a crash adaptation area lies more than 10 km/h below the current speed of the car, a full braking action is imposed. If it lies closer to the current speed, then the amount of braking is linearly decreased.
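The speed scaling at the end of a crash adaptation area and the braking rule inside such an area can be sketched as follows. The function names and the normalized brake command in [0, 1] are assumptions made for illustration; the sketch also includes the case of a target speed at or above the current speed, in which no braking is imposed.

```python
def crash_area_target_speed(v_kmh, severity):
    """Target speed v * (1 - 0.5 * s) at the end of a crash adaptation area.

    severity is the crash severity s, clipped to [0, 1].
    """
    s = min(1.0, max(0.0, severity))
    return v_kmh * (1.0 - 0.5 * s)


def crash_area_brake(current_kmh, target_kmh, full_brake_margin=10.0):
    """Brake command in [0, 1] inside a crash adaptation area.

    Full braking once the target lies at least full_brake_margin km/h below
    the current speed, linearly less braking for smaller differences, and
    no braking if the target is at or above the current speed.
    """
    excess = current_kmh - target_kmh
    if excess <= 0.0:
        return 0.0
    return min(1.0, excess / full_brake_margin)
```

For example, a crash of maximal severity at 200 km/h yields a target of 100 km/h, and a car that is 5 km/h too fast brakes at half strength.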
If the target speed lies above the current speed of the car, no change in motor commands is imposed. As the reader may have noticed, many scaling parameters are involved in this procedure, and these parameters were only hand-tuned in the submitted COBOSTAR strategy. Thus, there is certainly still room for optimization. However, it is again very hard to optimize these parameters automatically, because crashes occur rather rarely and the optimal parameters certainly vary from track to track and from crash situation to crash situation. Nonetheless, as the results in Fig. 7 show by way of example, the crash detection mechanism very often had a positive effect on the performance of the COBOSTAR racer.

VIII. DISCUSSION

The TORCS racing car competition [12] is still rather young and still a rather big challenge with much room for improvement. Since only local track information is available, a generally applicable behavioral strategy has to be employed that does not make too many assumptions about the track. In this discussion we address a few alternative approaches not considered in COBOSTAR and end with an outlook on future research challenges.


The resulting COBOSTAR strategy has been shown to ensure sufficient overall behavioral flexibility. With the addition of the recovery behavior mechanisms, the off-track traction control, and the jump detection module, COBOSTAR was able to recover from highly unlikely situations and handle jumps in a controlled way. The opponent monitor allowed the projection of opponents onto the distanceSensors, yielding emergent opponent avoidance. The driving style that originates from COBOSTAR’s control architecture is, from our subjective point of view, rather human-like and very elegant and smooth, with one exception: as the current implementation of steering allows only steering movements of a certain size, minor angle corrections while driving on straight track parts sometimes result in minuscule left-right swaying behavior. Videos showing COBOSTAR’s driving behavior can be found at [24]. Concerning the achieved competitiveness, it is hard to judge whether COBOSTAR can pose as a challenging opponent for human players, due to their differences in skill and expertise. From our personal experience, COBOSTAR does pose a hard challenge to human players, but there may be others who drive much better than we do. For example, one of the authors tried to beat COBOSTAR and needed several attempts to achieve a faster lap time on the track Forza (with the same car car1-trb1, keyboard as input device, and difficulty level pro). However, a much more objective approach is to compare COBOSTAR’s racing times with those of opponent drivers that are already available in the regular TORCS distribution. Fig. 8 shows the racing times of COBOSTAR as well as of bt3 and Olethros, two very fast and aggressive drivers that come with the TORCS distribution. As can be seen, COBOSTAR finishes the five-lap races with comparable racing times. Similar observations were made in a recent study, in which COBOSTAR was mostly beaten by a human player but usually yielded comparable racing times [25].

Again, it has to be emphasized that both human players and controllers in the regular TORCS distribution have access to global track information, such as the radius of the
Again, it has to be emphasized that both human players as well as controllers in the regular TORCS distribution have access to global track information—such as the radius of the


An alternative to our anticipatory behavior-based approach would be to take a more global perspective on the track. Particularly, it would be possible to scan the track in the first round applying a slow but safe controller and thus gain global track information. This information could then be used in successive laps to determine an approximately optimal track line and control the car accordingly [23]. However, although a basic racing line could be generated by these means, the track observability prevents the detection of the current track surface and also variations in the slope of the track are not directly detectable given the provided sensors. Thus, the track line and in particular also the according target speed cannot be fully optimized. We refrained from taking such an approach because the race only lasts five laps and a passive controller in the first lap may have resulted in significant behavioral drawbacks. Moreover, we were interested in how good the behavior of the racer can get without any allocentric track outline information.

[Figure 8: bar chart of seconds (y-axis, 0 to 500) per track (x-axis: Aalborg, Dirt6, Ruudskogen) for the drivers COBOSTAR, bt3, and Olethros.]

Fig. 8. Comparison of COBOSTAR and two regular drivers that have access to global track information. Depicted is the time needed for finishing five laps on three different tracks.

next curve, and thus have a highly significant advantage. All in all, the COBOSTAR strategy has shown that even without global track information (be it provided or inferred) highly competitive racing behavior can be generated, which even partially outraces controllers that have access to global track information.

Besides the lesson learned that a behavior-based subsumption architecture, which relies on anticipatory, egocentric track information, can yield a highly competitive racing controller, we would like to stress that the racing car setup with local information may also be used for other research efforts. The goal of safely controlling cars autonomously on actual roads is no longer too far away. The Tartan Racing team [26], winner of the 2007 DARPA Urban Challenge, has shown that autonomous car control is possible even in an urban setting. Although the Tartan car also strongly relied on allocentric GPS sensors, various egocentric sensors were used to adjust the local driving strategy. Inevitably, all autonomous systems have to rely on egocentric sensors to some extent, since allocentric sensors, like GPS, cannot provide exact up-to-date environmental information about the real world. Especially when the controlled autonomous system is supposed to function in any real-world circumstances, only egocentric sensors can offer first-hand data that reflects the current state of the environment directly. Our solutions to determine the current desired speed and steering angle may very well be included in more sophisticated autonomous vehicles, either for providing potential steering commands or for feeding a forward model with likely steering commands. The projection of opponent drivers onto the distanceSensors measurements may be incorporated in more sophisticated avoidance or overtaking routines. The jump


detection mechanism as well as the off-road acceleration reduction routine may be applicable in other scenarios to infer loss of grip on the road as well as to activate appropriate accident prevention routines. Finally, the crash adaptation mechanism may also be applied in other contexts to learn from inappropriate driving behavior and to deduce appropriate, situation-dependent crash prevention. Although the TORCS-based racing environment is certainly only realistic to a certain degree, it is cheap to work with and flexibly accessible. Thus, future research on optimized car control, opponent handling, and interactions is realizable in TORCS and is of significant research interest also beyond the current simulated car racing competition. Due to these considerations, the TORCS Simulated Car Racing Championship was enhanced with additional focus sensors. Also, noise was added to the provided distance sensor measurements [27]. Moreover, a new “Demolition Derby” competition was established to further study car interactions [28]. These modifications may move research a large step forward towards a biological model of human drivers as well as towards the design of effective, cognitively-inspired autonomously controlled robotic devices in general. The COBOSTAR racer may be considered another step in this direction.

ACKNOWLEDGMENTS

The authors acknowledge funding from the Emmy Noether program of the German Research Foundation (grant BU1335/3-1) and would like to thank the COBOSLAB team as well as the team of Prof. Frank Puppe at the chair of artificial intelligence and applied informatics of the computer science department at their university.

REFERENCES

[1] V. Braitenberg, Vehicles: Experiments in Synthetic Psychology. Cambridge, MA: MIT Press, 1984.
[2] R. A. Brooks, “Elephants don’t play chess,” Robotics and Autonomous Systems, vol. 6, pp. 3–15, 1990.
[3] R. A. Brooks, “Intelligence without representation,” Artificial Intelligence, vol. 47, pp. 139–159, 1991.
[4] D. Loiacono, P. L. Lanzi, J. Togelius, E. Onieva, D. A. Pelta, M. V. Butz, T. D. Lönneker, L. Cardamone, D. Perez, Y. Saez, M. Preuss, and J. Quadflieg, “The 2009 simulated car racing championship,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 2, pp. 131–147, 2010.
[5] (2007) TORCS, the open racing car simulator. [Online]. Available: http://torcs.sourceforge.net/
[6] (2006) Robot auto racing simulator. [Online]. Available: http://rars.sourceforge.net
[7] D. Loiacono, J. Togelius, P. L. Lanzi, L. Kinnaird-Heether, S. M. Lucas, M. Simmerson, D. Perez, R. G. Reynolds, and Y. Saez, “The WCCI 2008 simulated car racing competition,” Proceedings of the IEEE Symposium on Computational Intelligence and Games, pp. 119–126, 2008.
[8] (2009) Simulated car racing championship software manual. [Online]. Available: http://sourceforge.net/projects/cig/files/Championship2009Manual/
[9] N. Hansen and A. Ostermeier, “Completely derandomized self-adaptation in evolution strategies,” Evolutionary Computation, vol. 9, pp. 159–195, 2001.
[10] K. O. Stanley and R. Miikkulainen, “Evolving neural networks through augmenting topologies,” Evolutionary Computation, vol. 10(2), pp. 99–127, 2002.
[11] (2008) WCCI simulated car racing results. [Online]. Available: http://cig.dei.polimi.it/?cat=4
[12] D. Loiacono, P. L. Lanzi, and J. Togelius, “2009 Simulated Car Racing Championship,” 2009. [Online]. Available: http://cig.dei.polimi.it/
[13] L. Cardamone, D. Loiacono, and P. L. Lanzi, “Evolving competitive car controllers for racing games with neuroevolution,” Genetic and Evolutionary Computation Conference, GECCO 2009, pp. 1179–1186, 2009.
[14] L. Cardamone, D. Loiacono, and P. L. Lanzi, “Learning drivers for TORCS through imitation using supervised methods,” Computational Intelligence in Games, vol. IEEE CIG 2009, pp. 148–155, 2009.
[15] K. Stanley, N. Kohl, R. Sherony, and R. Miikkulainen, “Neuroevolution of an automobile crash warning system,” Genetic and Evolutionary Computation Conference, GECCO 2006, pp. 1977–1984, 2006.
[16] N. Kohl, K. Stanley, R. Miikkulainen, M. Samples, and R. Sherony, “Evolving a real-world vehicle warning system,” Genetic and Evolutionary Computation Conference, GECCO 2006, pp. 1681–1688, 2006.
[17] M. Ebner and T. Tiede, “Evolving driving controllers using genetic programming,” Computational Intelligence in Games, vol. IEEE CIG 2009, pp. 279–286, 2009.
[18] J. Togelius, S. M. Lucas, and R. D. Nardi, “Computational intelligence in racing games,” in Advanced Intelligent Paradigms in Computer Games, N. Baba, L. C. Jain, and H. Handa, Eds. Berlin Heidelberg: Springer-Verlag, 2007, pp. 39–70.
[19] M. V. Butz, O. Sigaud, and P. Gérard, Eds., Anticipatory Behavior in Adaptive Learning Systems: Foundations, Theories, and Systems (LNAI 2684). Berlin Heidelberg: Springer-Verlag, 2003.
[20] N. Hansen. (2008) CMA evolution strategy source code. [Online]. Available: http://www.lri.fr/~hansen/cmaes_inmatlab.html
[21] V. Heidrich-Meisner and C. Igel, “Similarities and differences between policy gradient methods and evolution strategies,” in 16th European Symposium on Artificial Neural Networks (ESANN 2008), M. Verleysen, Ed., vol. ESANN 2008. Belgium: D-side publications, 2008, pp. 149–154.
[22] M. V. Butz and T. Lönneker, “Optimized sensory-motor couplings plus strategy extensions for the TORCS car racing challenge,” IEEE Symposium on Computational Intelligence and Games, vol. CIG 2009, pp. 317–324, 2009.
[23] B. Beckman, The Physics of Racing. online, 1991, retrieved 02/2010. [Online]. Available: http://www.sae.org.html
[24] (2009) COBOSLAB YouTube channel. [Online]. Available: http://www.youtube.com/user/COBOSLAB
[25] J. Quadflieg, M. Preuss, O. Kramer, and G. Rudolph, “Learning the track and planning ahead in a car racing controller,” IEEE Conference on Computational Intelligence and Games, vol. IEEE CIG 2010, pp. 395–402, 2010.
[26] C. Urmson, J. Anhalt, D. Bagnell, C. Baker, R. Bittner, J. Dolan, D. Duggins, D. Ferguson, T. Galatali, C. Geyer, M. Gittleman, S. Harbaugh, M. Hebert, T. Howard, A. Kelly, D. Kohanbash, M. Likhachev, N. Miller, K. Peterson, R. Rajkumar, P. Rybski, B. Salesky, S. Scherer, Y. Woo-Seo, R. Simmons, S. Singh, J. Snider, A. Stentz, W. R. Whittaker, and J. Ziglar, “Tartan racing: A multi-modal approach to the DARPA Urban Challenge,” Carnegie Mellon University, Tech. Rep., 2007.
[27] D. Loiacono, L. Cardamone, M. V. Butz, and P. L. Lanzi, “2010 Simulated Car Racing Championship,” 2010. [Online]. Available: http://cig.dei.polimi.it/
[28] M. V. Butz, M. J. Linhardt, D. Loiacono, L. Cardamone, and P. L. Lanzi, “Demolition Derby 2011,” 2010, http://cig.dei.polimi.it/. [Online]. Available: http://www.coboslab.psychologie.uni-wuerzburg.de/competitions/


Martin V. Butz received the Ph.D. degree in computer science from the University of Illinois at Urbana-Champaign, Urbana, IL, in 2004. Since then, he has been working at the Department of Cognitive Psychology III, University of Würzburg, Germany. In October 2007, he founded his own cognitive systems laboratory: Cognitive Bodyspaces: Learning and Behavior (COBOSLAB), funded by the German Research Foundation under the Emmy Noether framework. Dr. Butz has organized the workshop series on Anticipatory Behavior in Adaptive Learning Systems (ABiALS) since 2002. His participation in the racing car competition shows that anticipatory behavior principles are also applicable to other areas of artificial intelligence, including computational intelligence in games.

Matthias J. Linhardt studied psychology at the University of Würzburg, Germany, from 2003 until 2009. After his diploma thesis on modeling reaching behavior with neuro-evolutionary algorithms, he joined the COBOSLAB team because of his personal interest in the interdisciplinary field of psychology and computer science. At COBOSLAB, he participated in the development of the COBOSTAR entry to the 2009 Simulated Car Racing Championship and co-organized the 2010 Demolition Derby Competition. Since fall 2010 he has simultaneously been studying Computing in the Humanities at the University of Bamberg, Germany.

Thies D. Lönneker started studying computer science and engineering at the Hamburg University of Technology, Hamburg, Germany, in 2001 and switched to the University of Würzburg, Germany, in 2005 to study computer science and linguistics. He worked as a tutor at the Department of German Studies in 2007 and 2008. Due to his knowledge in the field of cognitive systems and as part of the preparation for his diploma with a focus on artificial intelligence, he co-developed the COBOSTAR controller for the 2009 Simulated Car Racing Championship at the COBOSLAB.