Using DHP Adaptive Critic Methods to Tune a Fuzzy Automobile Steering Controller

Larry J. Schultz (Electrical Engineering Dept.), Thaddeus T. Shannon (Systems Science Ph.D. Program), George G. Lendaris (Systems Science Ph.D. Program)
Northwest Computational Intelligence Laboratory, P.O. Box 751, Portland State University, 97207
[email protected], [email protected], [email protected]

ABSTRACT

The approximate dynamic programming method known as Dual Heuristic Programming (DHP) is applied to the design of a fuzzy controller for a 4-wheel, 2-axle vehicle. This controller is designed to guide an autonomous automobile on a curved road while maintaining lateral acceleration at comfortable levels.

1. Introduction

The purpose of this paper is to demonstrate the use of Dual Heuristic Programming (DHP) to “tune” (complete the design of) a fuzzy controller to be approximately optimal, according to selected criteria. The task is to control the steering and velocity of an automobile on a curving roadway. We previously addressed neuro-control of automobile steering and velocity on a straight road in [1][2]. This paper reports extensions of that work to curving roadways, and to the use of fuzzy (instead of neural) controllers.

2. Dual Heuristic Programming

DHP is a member of a family of approximate dynamic programming techniques that may be applied to non-linear control problems [3]. When DHP is applied to develop an approximately optimal control law, the process may be contrasted with linear optimal control. The well-known linear quadratic regulator (LQR) problem [4] is that of finding an optimal control over a specified interval for a linear plant, such that a quadratic cost function is minimized. The linear system is governed by a system equation of the form:

$$\dot{x} = Ax + Bu$$

with x the state vector and u the control vector. The quadratic cost function is of the form:

$$J = \frac{1}{2}\int_0^\infty \left[ x^T Q x + u^T R u \right] dt$$

where Q and R are matrices weighting the relative importance of minimizing the values of the various states and controls. The problem is to specify a control law that defines u such that J is minimized. The closed form LQR solution is:

$$u = -R^{-1} B^T P x$$

where P is the positive definite solution of the steady-state Algebraic Riccati Equation. For many real-world problems, of course, linear optimal control methods cannot be applied, most often because the system dynamics cannot be adequately represented as a linear system. Difficulties also arise when design requirements cannot be expressed through a quadratic cost function. Further, there are cases where the knowledge available for the given problem may more easily be expressed in the Fuzzy Logic formalism than in the linear algebraic formalism. Consider a generic discrete-time controller-plant system. We express our control objective through a primary utility (or cost) function of general form:

$$U[t] = f(R[t], u[t], t)$$

where the state vector is now denoted R[t] to adhere to the conventions of the approximate dynamic programming literature. We may then form a secondary utility (or cost) function:

$$J[t] = \sum_{k=0}^{\infty} \gamma^k U[t+k]$$

where γ is a discount factor. Once again, the problem is to select a control law that minimizes J. The expression for J[t] may also be written in a form known as the Bellman Recursion:

$$J[t] = U[t] + \gamma J[t+1]$$
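As a quick sanity check (not part of the paper's development), the equivalence between the discounted-sum definition of J[t] and the recursion can be verified numerically on an arbitrary finite utility sequence:

```python
# Verify J[t] = U[t] + gamma*J[t+1] against the discounted-sum definition
# for an arbitrary finite utility sequence (utility is zero afterwards).
gamma = 0.9
U = [3.0, 1.0, 4.0, 1.0, 5.0]

def J_sum(t):
    """Secondary utility J[t] = sum_k gamma^k U[t+k]."""
    return sum(gamma ** k * U[t + k] for k in range(len(U) - t))

for t in range(len(U) - 1):
    lhs = J_sum(t)
    rhs = U[t] + gamma * J_sum(t + 1)
    assert abs(lhs - rhs) < 1e-12
print("Bellman recursion holds")
```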

(This work was supported by the National Science Foundation under grant ECS-9904378.)

We implement the controller as a learning structure (be it a neural network, a neuro-fuzzy structure, or some other trainable function approximation device) with output u[t] (the control vector) and input R[t] (the state vector). To train the control structure using, say, a gradient-based method would require knowledge of the gradient of the cost function with respect to the controller output, ∂J[t]/∂u[t]. In the DHP method, the controller training information is obtained indirectly through an entity known as the critic. The critic is implemented as another learning structure, with input R[t] and output ∂J[t]/∂R[t], the gradient of the cost function with respect to the state vector. A full derivation and discussion of the DHP training equations may be found in [5]. Here we summarize the controller training equation (in tensor notation, and assuming a discount factor of 1):

$$\frac{\partial J[t]}{\partial u_k[t]} = \frac{\partial U[t]}{\partial u_k[t]} + \frac{\partial J[t+1]}{\partial R_i[t+1]}\frac{\partial R_i[t+1]}{\partial u_k[t]}$$
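The training equations consume plant derivatives such as ∂R_i[t+1]/∂u_k[t]. When the model is not differentiated analytically, these can be estimated by perturbing the plant model, which is the approach taken for the experiments reported later in this paper. The sketch below uses a made-up two-state plant purely for illustration:

```python
import math

# Estimate dR[t+1]/dR[t] and dR[t+1]/du[t] by central finite differences.
# The plant here is a hypothetical 2-state discrete map, not the vehicle model.

def plant(R, u):
    """One-step plant map R[t+1] = f(R[t], u[t])."""
    x, v = R
    return [x + 0.1 * v, v + 0.1 * (u - math.sin(x))]

def jacobians(R, u, eps=1e-6):
    """Return (dR'/dR as an n x n list of lists, dR'/du as a length-n list)."""
    n = len(R)
    dR = [[0.0] * n for _ in range(n)]
    for j in range(n):
        hi = list(R)
        lo = list(R)
        hi[j] += eps
        lo[j] -= eps
        f_hi, f_lo = plant(hi, u), plant(lo, u)
        for i in range(n):
            dR[i][j] = (f_hi[i] - f_lo[i]) / (2 * eps)
    f_hi, f_lo = plant(R, u + eps), plant(R, u - eps)
    du = [(f_hi[i] - f_lo[i]) / (2 * eps) for i in range(n)]
    return dR, du

dR, du = jacobians([0.2, 1.0], 0.5)
# Analytically: dR'/dR = [[1, 0.1], [-0.1*cos(x), 1]] and dR'/du = [0, 0.1].
```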

The critic is trained to adhere to the Bellman Recursion over time:

$$\frac{\partial J[t]}{\partial R_i[t]} = \frac{\partial U(t)}{\partial R_i(t)} + \frac{\partial U(t)}{\partial u_j(t)}\frac{\partial u_j(t)}{\partial R_i(t)} + \frac{\partial J[t+1]}{\partial R_k[t+1]}\left(\frac{\partial R_k(t+1)}{\partial R_i(t)} + \frac{\partial R_k(t+1)}{\partial u_m(t)}\frac{\partial u_m(t)}{\partial R_i(t)}\right)$$

Evaluating terms of the form ∂R[t+1]/∂R[t] and ∂R[t+1]/∂u[t] requires that a differentiable model of the plant be available (hence, DHP is a model-based method). A variety of function approximation methods based on learning to effect system identification may be employed (for example, a fuzzy TSK model) to obtain the required derivatives [6]. An interesting interpretation of the critic function in DHP arises when considering linear optimal control. Recall that P is the solution to the algebraic Riccati equation. For a linear plant, we may write an expression for J in terms of P [4]:

$$J[t] = \frac{1}{2} R[t]^T P R[t]$$

and

$$\frac{\partial J[t]}{\partial R[t]} = P R[t]$$

So in the LQR context the gradient of J with respect to the state vector is a linear function of the states, and P has a closed-form solution. For a nonlinear system that relationship is nonlinear and (most likely) no analytical solution exists. The critic training process may be interpreted as an approximate numerical solution for the equivalent of P for the non-linear system. DHP therefore represents a method of developing an approximately optimal solution for problems where linear optimal control methods are not applicable or desirable. We are concerned herein with the approximately optimal tuning (refining the design) of fuzzy controllers.
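A minimal sketch can illustrate this interpretation (the scalar plant, fixed policy, and gains below are illustrative assumptions, not the paper's setup): for a discrete-time scalar plant under a fixed stabilizing linear policy, a linear critic λ(R) = wR trained with the Bellman-recursion target converges to the closed-form cost-to-go coefficient, i.e., the scalar analogue of P.

```python
# Illustrative sketch: scalar plant x' = a*x + b*u with fixed policy
# u = -k*x and per-step cost U = 0.5*(q*x**2 + r*u**2). Train a linear
# critic lambda(x) = w*x via the Bellman-recursion target and check that
# w approaches the closed-form cost-to-go coefficient p.
a, b, q, r, k = 1.0, 1.0, 1.0, 1.0, 0.5
c = a - b * k                        # closed-loop pole, |c| < 1 (stable)
p_true = (q + r * k * k) / (1.0 - c * c)   # closed-form sum of the series

w, lr = 0.0, 0.1
for _ in range(2000):
    x = 1.0                          # representative state sample
    # Critic target: dU/dx + (dU/du)(du/dx) + lambda(x') * dx'/dx|closed-loop
    target = q * x + r * (k * k) * x + (w * c * x) * c
    w -= lr * (w * x - target) * x   # gradient step on squared critic error
print(w, p_true)                     # w converges toward p_true
```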

IFSA/NAFIPS Conference, Vancouver, B.C., July, 2001 (manuscript date: 3/30/01)

3. The Vehicle Model

A “single-track” or “bicycle” mathematical model of the four-wheel automobile captures the critical features of steering and velocity dynamics [7]. In the single-track model, the two front wheels are lumped together and modeled as a single wheel on the centerline of a single track. The same is done for the rear wheels, as shown in Figure 1.

Figure 1 – Single-track vehicle model diagram. Labeled quantities include v, β, ψ, the steering angles δ_f and δ_b, the axle distances l_f and l_b (with l = l_f + l_b and m = m_f + m_b), the tire forces F_Li and F_Di, and the slip angles α_i, where subscript i is f for the front wheels and b for the back wheels.

The motion of the vehicle is described by the state variables v, β, ρ, and ψ. The magnitude and direction of the vehicle’s velocity at its center of gravity are specified by v and β, respectively. The angular rate of rotation is specified by ρ (= ψ̇), and the orientation of the vehicle relative to inertial coordinates is termed ψ. Lateral motion is induced by the steering angle of the front wheels, δ_f. Back wheel steering was NOT used for our study; the back wheels were fixed at a zero angle. Lateral force (F_Li) acts in a direction perpendicular to the wheel axis. Steering force magnitude is a function of the angle α_i, the difference between the wheel angle and the angle of the wheel axle velocity vector. Note that the back wheels produce a side force during a turn even though they are not steered. Drive force (F_Di) acts in a direction in line with the front wheels with a magnitude dependent on wheel slip κ (indirectly on engine throttle/brake) as described below. The vehicle dynamics may be described by the following equations, upgraded from the constant-velocity model of [7]:

$$\begin{bmatrix} mv(\dot{\beta}+\rho) \\ m\dot{v} \\ m\,l_f l_b\,\dot{\rho} \end{bmatrix} = \begin{bmatrix} -\sin\beta & \cos\beta & 0 \\ \cos\beta & \sin\beta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} f_x \\ f_y \\ m_z \end{bmatrix}$$

with forces and moments described by:

$$\begin{bmatrix} f_x \\ f_y \\ m_z \end{bmatrix} = \begin{bmatrix} -\sin\delta_f & 0 & \cos\delta_f & -\cos\beta \\ \cos\delta_f & 1 & \sin\delta_f & -\sin\beta \\ l_f\cos\delta_f & -l_b & l_f\sin\delta_f & 0 \end{bmatrix} \begin{bmatrix} F_{Lf}(\alpha_f) \\ F_{Lb}(\alpha_b) \\ F_D(\kappa) \\ F_{drag} \end{bmatrix}$$
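A direct way to read the two matrix equations is as a simulation step: compute slip-dependent tire forces, assemble the body-frame forces and yaw moment, then invert the left-hand side for the state derivatives. The sketch below does this with Euler integration; the masses, lengths, tire coefficients, slip-angle approximations, and "magic formula" constants are placeholder assumptions, not the paper's identified values:

```python
import math

# Illustrative single-track simulation step for the state equations above.
# Numeric constants (mass, lengths, tire and drag coefficients) are
# assumptions for the sketch, not the paper's values.
M, LF, LB = 650.0, 2.0, 2.0           # mass (kg), cg-to-axle distances (m)

def tire_force(alpha, B=10.0, C=1.5, D=6000.0):
    """Pacejka-style saturating force curve: linear for small slip."""
    return D * math.sin(C * math.atan(B * alpha))

def step(state, delta_f, F_drive, dt=0.01):
    """Euler-integrate (v, beta, rho, psi) one step of the bicycle model."""
    v, beta, rho, psi = state
    # Common small-angle slip approximations for the two axles (assumption).
    alpha_f = delta_f - beta - LF * rho / v
    alpha_b = -beta + LB * rho / v
    F_lf, F_lb = tire_force(alpha_f), tire_force(alpha_b)
    F_dragv = 0.4 * v * v              # drag ~ v^2 (assumed coefficient)
    # Body-frame forces and yaw moment (second matrix equation).
    fx = -math.sin(delta_f) * F_lf + math.cos(delta_f) * F_drive \
         - math.cos(beta) * F_dragv
    fy = math.cos(delta_f) * F_lf + F_lb + math.sin(delta_f) * F_drive \
         - math.sin(beta) * F_dragv
    mz = LF * math.cos(delta_f) * F_lf - LB * F_lb \
         + LF * math.sin(delta_f) * F_drive
    # Invert the left-hand side of the first matrix equation.
    v_dot = (math.cos(beta) * fx + math.sin(beta) * fy) / M
    beta_dot = (-math.sin(beta) * fx + math.cos(beta) * fy) / (M * v) - rho
    rho_dot = mz / (M * LF * LB)
    return (v + v_dot * dt, beta + beta_dot * dt,
            rho + rho_dot * dt, psi + rho * dt)

state = step((30.0, 0.0, 0.0, 0.0), delta_f=0.02, F_drive=360.0)  # small left steer
```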

The lateral (F_L) and drive (F_D) forces were modeled as non-linear relationships after the commonly used empirical model of Pacejka [8]. Pacejka’s model is linear for small wheel angles (or wheel slips), but available force reaches a saturation level for higher wheel angles, reflecting the onset of sliding. Figure 2 shows a sample lateral force curve; the longitudinal force curves are quite similar. Drag force was estimated as proportional to the square of vehicle velocity (as in [9], for example).

Figure 2 – Tire angle – lateral force relationship (sample curve): approximately linear under good traction, transitioning to sliding and saturating as slip angle approaches 0.2 rad.

The position of the vehicle in inertial coordinates may be tracked by augmenting the state model:

$$\dot{x}_0 = v\cos(\beta + \psi), \qquad \dot{y}_0 = v\sin(\beta + \psi)$$

In order to implement a state feedback controller, we would like to ensure that the entire state vector is observable. If we outfit the vehicle with lateral accelerometers and load cells at the front and rear axles, and a longitudinal accelerometer along the vehicle’s axis (all feasible), then it can be shown that all of the elements of the state vector may be reproduced from measurements.

For our study we used a vehicle of 650 kg mass, with a length of 4 meters and a center of gravity at the geometric center of the vehicle.

4. The Guidance Problem and Controller Design

We selected the task of developing a vehicle guidance system for operation on curved roads at highway speeds. General performance requirements for the system are:
• The vehicle must stay within its lane. Lateral position error (from the center of the lane) should be maintained at 1 meter or less.
• Sustained acceleration of over 4.5 m/s² (one half of a “g”) should be avoided (for comfort).
• The vehicle should be maintained at near nominal speed when possible, but we choose to reduce vehicle speed as needed in order to satisfy the acceleration requirement during sharp curves.

We assume that our vehicle possesses a camera-based vision system, as referenced in [10], from which the following information is available to the controller:

ρ̂ : road curvature or target yaw rate
ψ̂ : road heading or target yaw angle
y_e : lateral position error

In considering a control design for this guidance problem, we begin by characterizing the vehicle model of the previous section. If wheel and sideslip angles are assumed small, the trigonometric terms in the model have minor effect, and there are two major non-linear effects in the vehicle model:
1. The lateral and longitudinal tire force relationships have saturation levels, with the effect that vehicle responsiveness decays for higher acceleration maneuvers.
2. The dynamic model is nonlinear in velocity, with the effect that at higher velocity the vehicle is more responsive to steering than at lower velocity. Additionally, aerodynamic drag is nonlinear in velocity, increasing as the square of velocity.
These two non-linear effects, coupled with our discontinuous acceleration requirement, motivate our use of a fuzzy controller. We use a TSK structure [9] and develop fuzzy rules with antecedents designed to deal with these major non-linear effects. For steering control, we need a baseline front wheel angle that is dependent (in a nonlinear fashion) on the curvature of the road. We also need to trim front wheel


angle to adjust for position or yaw error, recalling that lateral dynamics change in a nonlinear fashion with vehicle velocity. We therefore structure the TSK controller with Gaussian membership functions based on ρ̂ and v as:

$$m_i(\hat{\rho}, v) = \exp\left(-\frac{(\hat{\rho} - \hat{\rho}_i)^2}{2\sigma_\rho^2} - \frac{(v - v_i)^2}{2\sigma_v^2}\right)$$

where ρ̂_i and v_i specify the center of rule i. The rule consequents are of the form:

$$\delta_{f_i} = K_{0_i} + K_{y_i}\, y_e + K_{\psi_i}(\psi - \hat{\psi})$$

The first (bias) term may be interpreted as the nominal wheel angle required to hold a curve. The linear part of the TSK consequent is in the form of a PD control law for lateral position (since the lateral error rate of change is roughly proportional to yaw error). Turning our attention to velocity and acceleration control, we will need to design for the nonlinear drag force (which varies as v²). We also note that the nominal lateral acceleration during a curve may be expressed as ρ̂·v. Therefore the rules that were established for steering control are also appropriate for velocity control. The velocity control consequents are of the form:

$$\kappa_i = K_{\kappa_i} + K_{v_i}(v - \hat{v}) + K_{\dot{v}_i}\,\dot{v}$$

Here the linear portion of the consequent again represents a PD control law for velocity. The bias term serves two functions. For gentle curves, the bias term represents the nominal wheel slip (thrust) required to hold nominal velocity. For sharp curves, we would like the bias term to produce braking to lower velocity and reduce lateral acceleration. Given the antecedent and consequent structures of the fuzzy rules as described, the controller’s output may be written as:

$$u = \begin{bmatrix} \delta_f \\ \kappa \end{bmatrix} = \frac{\displaystyle\sum_i m_i(\hat{\rho}, v)\begin{bmatrix} \delta_{f_i} \\ \kappa_i \end{bmatrix}}{\displaystyle\sum_j m_j(\hat{\rho}, v)}$$

Twenty-five rules were established, with rule centers at ρ̂: [-.2 -.1 0 .1 .2] rad/s and v: [20 25 30 35 40] m/s. Note that at the extreme points nominal lateral acceleration is near 1 g, the physical limit for our terrestrial vehicle. The bias terms in the rule consequents were prestructured by experimentally (via simulation) determining the wheel angle and wheel slip required to make a turn of .1 rad/s at 30 m/s and scaling these values linearly based on the rule centers (ρ̂_i, v_i). The linear parameters in the rule consequents were set to zero. The controller structure described was hypothesized to have the power to meet our design objectives. The challenge was to tune the controller, for which task we employed the DHP methodology.
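A minimal sketch of evaluating this 25-rule controller (steering channel only) follows; the rule centers match the text, while the membership widths and consequent parameters are placeholder assumptions:

```python
import math

# Sketch of the 25-rule TSK controller evaluation described above.
# Rule centers match the text; sigmas and consequent parameters are
# placeholders (the paper pre-structures biases via simulation).
RHO_C = [-0.2, -0.1, 0.0, 0.1, 0.2]    # rad/s
V_C = [20.0, 25.0, 30.0, 35.0, 40.0]   # m/s
SIG_RHO, SIG_V = 0.05, 2.5             # assumed membership widths

def tsk_steering(rho_hat, v, y_e, psi_err, params):
    """Weighted average of per-rule PD consequents, per the output formula.

    params[i] = (K0, Ky, Kpsi) for rule i, row-major over RHO_C x V_C."""
    num, den = 0.0, 0.0
    i = 0
    for rc in RHO_C:
        for vc in V_C:
            # Gaussian rule antecedent m_i(rho_hat, v).
            m = math.exp(-(rho_hat - rc) ** 2 / (2 * SIG_RHO ** 2)
                         - (v - vc) ** 2 / (2 * SIG_V ** 2))
            K0, Ky, Kpsi = params[i]
            num += m * (K0 + Ky * y_e + Kpsi * psi_err)
            den += m
            i += 1
    return num / den

# With all rule consequents equal, the weighted average returns that value.
flat = [(0.01, 0.0, 0.0)] * 25
delta_f = tsk_steering(0.1, 30.0, 0.0, 0.0, flat)   # ~0.01 rad
```

Because the memberships are strictly positive, the output is always a convex combination of the rule consequents.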

5. DHP Setup and Training Process

Our first task was to develop a primary cost function that embodied our design objectives. We developed two different cost functions and will discuss results obtained using each of them. Velocity and position tracking are dealt with in the following (first) version of the cost function:

$$U_1[k] = y_e[k]^2 + 13\,\psi_e[k]^2 + \frac{1}{20}v_e[k]^2 + \frac{1}{20}\dot{v}[k]^2$$

where the subscript e denotes error from target. The various weighting factors were selected primarily for the purpose of normalizing unit differences. In the second cost function, we added a term for the purpose of acceleration control:

$$U_2[k] = \begin{cases} U_1[k], & \hat{\rho}\cdot v < 4.5 \\ U_1[k] + 5\,(\hat{\rho}\cdot v - 4.5)^2, & \hat{\rho}\cdot v \geq 4.5 \end{cases}$$
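The two cost functions transcribe directly to code. This sketch follows the text in treating ρ̂·v as signed; for right-hand turns one would presumably penalize its magnitude (an assumption on our part):

```python
# Direct transcription sketch of the two utility (cost) functions.
def U1(y_e, psi_e, v_e, v_dot):
    """Velocity and position tracking cost."""
    return y_e ** 2 + 13.0 * psi_e ** 2 + v_e ** 2 / 20.0 + v_dot ** 2 / 20.0

def U2(y_e, psi_e, v_e, v_dot, rho_hat, v):
    """U1 plus a quadratic penalty once nominal lateral accel. exceeds 4.5."""
    base = U1(y_e, psi_e, v_e, v_dot)
    lat_acc = rho_hat * v              # nominal lateral acceleration (m/s^2)
    if lat_acc >= 4.5:                 # comfort constraint exceeded
        return base + 5.0 * (lat_acc - 4.5) ** 2
    return base
```

Below the 4.5 m/s² threshold the two costs agree exactly; above it, the weight of 5 makes the acceleration penalty dominate the velocity-error term.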

The weighting factor of 5 was chosen to cause acceleration cost to override velocity error cost. The DHP critic was structured as a TSK model using exactly the same 25 rule antecedents that were used for the controller. The critic had 7 outputs corresponding to the gradient of J with respect to each of the elements of the augmented state variable vector R = [β, v, ρ, y_e, ψ_e, v_e, v̇]^T. The critic consequents were of the form (using λ_v as an example):

$$\lambda_{v_i} = K_{v_i} R^T$$

where K_{v_i} is a vector of consequent parameters for the velocity output of rule i. All critic consequent parameters were initialized to zero. A parameterized “s-curve” roadway was established for training purposes. The s-curves consisted of a short straightaway entering a left (or right) turn of specified turn radius, followed by a section of equal radius but opposite turning direction, finishing with another short straightaway (see Figure 3).
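Such an s-curve can be parameterized, for instance, by distance traveled, with target yaw rate v/r in the arcs; the segment lengths below are illustrative placeholders, not the paper's values:

```python
# Hypothetical s-curve target yaw-rate profile as a function of distance
# traveled s (m): straightaway, left arc, right arc, straightaway.
# Segment lengths and radius are illustrative placeholders.
def target_yaw_rate(s, v, r, straight=50.0, arc=200.0):
    """Return target rho = v/r (rad/s) for a vehicle at distance s."""
    if s < straight:
        return 0.0                     # entry straightaway
    if s < straight + arc:
        return v / r                   # left turn of radius r
    if s < straight + 2 * arc:
        return -v / r                  # right turn, equal radius
    return 0.0                         # exit straightaway

# At 30 m/s on a 300 m radius, the target rho in the first arc is 0.1 rad/s.
print(target_yaw_rate(100.0, 30.0, 300.0))   # -> 0.1
```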


Figure 3 – Sample “s-curve” roadway used for training: y0 position (m) versus x0 position (m); both arcs have turn radius r, with target ρ = v/r.

A set of 10 different s-curves was developed whose turning radii were designed such that ρ̂ ranged from -.2 to .2 rad/s for velocities ranging from 20 to 40 m/s. A training pass consisted of simulating vehicle motion through each of the 10 s-curves in turn while training the fuzzy controller and critic structures. If the vehicle “ran off the road” (y_e > 1.5 m), the vehicle was reset to the beginning of the s-curve. Simple gradient descent was used to train the critic and controller consequent parameters only (see [6][11] for details on the DHP algorithm and training equations for fuzzy structures). Perturbation through the plant model was used to obtain plant derivatives.

6. Results

For the first experiment we used cost function U1[k] with no lateral acceleration penalty. A plot of position error over time is shown in Figure 4. Note that in the early stages of training, the vehicle repeatedly “crashed.” The DHP process effected learning from these errors, and we note that in the latter half of training, the position error was maintained within the desired envelope. The largest values for position error occurred during sharp curves at high velocity, near the limits of the vehicle’s ability to hold the road.

Figure 4 – Lateral position error (m) versus simulated training time (s) obtained using utility function U1[k]; the vehicle “crashes” at +/- 1.5 m, and the desired envelope is +/- 1 m.

Recall that our analysis of the vehicle lateral dynamics suggested that the vehicle’s steering response was “higher gain” at higher velocities. We would then expect that more damping would be required for position control at higher velocities. In a PD control law, for a given gain on the “P” term, the more negative the gain on the “D” term the more damping is provided. Figure 5 illustrates the consequent coefficients Kψ plotted against v. Indeed the controller has learned to provide more steering damping for higher vehicle velocity.

Figure 5 – Damping (ψ error) coefficient in the fuzzy rules versus velocity rule center (20–40 m/s).

In the second experiment we used U2[k] to promote lateral acceleration control via speed reduction. In Figure 6 we illustrate lateral acceleration results from a portion of the training runs. For a moderate curve requiring modest lateral acceleration, the two plotted data sets are quite similar. For a sharp curve, however, there is a distinct difference. The lower trace demonstrates the effect of speed reduction as lateral acceleration is reduced to meet the 4.5 m/s² constraint. Finally, we note that lateral acceleration control has a beneficial effect on position error, as shown in Figure 7. We note that the position error is maintained well within the +/- 1 m maximum position error envelope; in fact, it is contained within an envelope of +/- 0.5 meters of position error in the latter stages of training.

Figure 6 – Average lateral acceleration (m/s²) obtained using U1[k] and U2[k], shown for a moderate curve and a sharp curve over a portion of the training run (450–480 s).

Figure 7 – Lateral position error (m) versus simulated training time (s) obtained using utility function U2[k] (lateral acceleration reduced); the vehicle “crashes” at +/- 1.5 m, and the desired envelope is +/- 1 m.

7. Conclusions

We have demonstrated the applicability of DHP for the automated tuning of a fuzzy controller for a practical engineering problem. The fuzzy control framework is a natural choice for such guidance control, due to the ease of handling the nonlinear factors with fuzzy rules while, in addition, maintaining an interpretable form for the control law. However, hand tuning of the controller parameters in such a representation is not, in general, a trivial task. The demonstration showed that DHP was able to automatically develop a reasonable controller for the given problem specification, based on specification of just a few parameters in a simple utility function.

8. References

[1] G.G. Lendaris and L.J. Schultz, “Controller Design (from scratch) Using Adaptive Critic Neuro Control,” Proc. of IEEE International Symposium on Intelligent Control (IEEE-ISIC ’2000), Patras, Greece, July 2000.
[2] G.G. Lendaris, L.J. Schultz, and T.T. Shannon, “Adaptive Critic Design for Intelligent Steering and Speed Control of a 2-Axle Vehicle,” Proc. of IJCNN’2000, Como, Italy, July 2000.
[3] P.J. Werbos, “Approximate Dynamic Programming for Real-Time Control and Neural Modeling,” in Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, D.A. White and D.A. Sofge, eds., Van Nostrand Reinhold, New York, NY, USA, 1992, pp. 493-525.
[4] R.F. Stengel, Optimal Control and Estimation, Dover Publications Inc., New York, NY, USA, 1986.
[5] G. Lendaris and C. Paintz, “Training Strategies for Critic and Action Neural Networks in Dual Heuristic Programming Method,” Proc. of IJCNN’97, Dallas, TX, IEEE Press, July 1997.
[6] T.T. Shannon and G.G. Lendaris, “Adaptive Critic Based Approximate Dynamic Programming for Tuning Fuzzy Controllers,” Proc. of IEEE-FUZZ 2000, San Antonio, TX, USA, May 2000.
[7] J. Ackermann, “Robust Decoupling, Ideal Steering Dynamics and Yaw Stabilization of 4WS Cars,” Automatica, 30:11, 1994, pp. 1761-1768.
[8] E. Bakker, L. Nyborg and H. Pacejka, “Tyre Modelling for Use in Vehicle Dynamics Studies,” S.A.E. Paper 870421, 1987.
[9] J. Yen and R. Langari, Fuzzy Logic, Prentice Hall, Upper Saddle River, NJ, USA, 1999.
[10] T. Hrycej, Neurocontrol: Towards an Industrial Control Methodology, John Wiley and Sons, Inc., New York, NY, USA, 1997.
[11] G.G. Lendaris, T.T. Shannon, L.J. Schultz, S. Hutsell and A. Rogers, “Dual Heuristic Programming for Fuzzy Control,” Proc. of IFSA/NAFIPS 2001, Vancouver, B.C., Canada, July 2001.
