Bounded Anytime Deflation - Google Sites

On the Time-Optimal Planning and Execution Problem
Thomas Allen
The University of Sydney

Overview

• Motivation
• Time-Optimal Planning and Execution (TOPE)
• TOPE Process Implementations
• Monte-Carlo Comparative Analysis
• Conclusions and Discussion

RSS2011 - Integrated Planning and Control Workshop

Motivation

• Planning scenarios where time-to-goal is the primary metric
  • e.g. Unmanned Ground Vehicles
• For a single plan and execute cycle: $t = t_P + t_E$
• Huge amount of work optimising a single cycle
  • Mostly unsuited to dynamic spaces
• Lots of work optimising $t_P$ or $t_E$ separately, but not together

Abstract (from the accompanying paper): This paper introduces the Time-Optimal Planning and Execution (TOPE) problem, in which the aim is to minimise the total planning and execution time required to achieve a goal. The TOPE process is derived and shown to be capable of solving this problem in dynamic state spaces, by continuously calculating the optimum value of any system parameters that can affect this total time. Procedures are presented to apply this process to an existing replanning system, and to determine its required accuracy and timeliness. It is shown that the TOPE process can yield lower total times than other planning systems if these requirements are met.

Planning is the process of computing a sequence of actions that, when executed, achieve a goal [1]. For a single plan and execute cycle, the total time to achieve a goal is given by

$t = t_P + t_E$  (1)

where $t_P$ is the time taken to compute the plan to a goal, and $t_E$ is the time taken to execute this plan. The TOPE process invests some initial time, $t_C$, to calculate the choice of parameters that minimise this $t$. The total time is thus given by

$t = t_C + t_P' + t_E'$  (2)

where $t_P'$ and $t_E'$ are the new planning and execution times.

Overview of the TOPE Process

• In each cycle, invest some initial time, $t_C$, to determine the ideal ‘system parameter’ values, used to adjust the system’s behaviour
• Expected time remaining becomes: $\hat{t} = \hat{t}_C + \hat{t}_P' + \hat{t}_E'$
• If $t_C < (\hat{t}_P - \hat{t}_P') + (\hat{t}_E - \hat{t}_E')$, investment may reduce the time-to-goal
• In dynamic state spaces, this investment is repeated in every cycle
• What are these system parameters?
  • Any online-adjustable parameter that affects $\hat{t}_C$, $\hat{t}_P$, or $\hat{t}_E$
  • e.g. Heuristic weighting factor, grid cell size, path diversity, sampling density
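
The investment condition above can be checked numerically. The following is a minimal sketch, assuming all five times are already available as estimates in seconds; the function name and the numbers are illustrative and not part of the original slides.

```python
def investment_pays_off(t_c, t_p, t_e, t_p_new, t_e_new):
    """Return True when investing t_c is expected to reduce time-to-goal.

    t_p, t_e:         expected planning/execution time without the investment
    t_p_new, t_e_new: expected planning/execution time with tuned parameters
    """
    expected_saving = (t_p - t_p_new) + (t_e - t_e_new)
    return t_c < expected_saving


# Illustrative values only: planning gets much faster, execution slightly slower.
print(investment_pays_off(t_c=1.0, t_p=23.3, t_e=44.2, t_p_new=4.7, t_e_new=46.7))
```

Note that the saving can be negative in one term (here execution gets slower) and the investment can still pay off, because only the sum matters.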



TOPE Process as a Feedback Control System

[Figure: block diagram of the original feedback control system (Objective, Controller, Plant, Sensing, Output, Disturbance, Feedback) together with the TOPE process, labelled with Measurement, Feedback and System Parameters.]

• The TOPE Process adjusts the system parameters of the original FCS
• Parameters found by TOPE Equation: $y^* = \arg\min_{y \in Y} \left( \hat{t}_C + \hat{t}_P + \hat{t}_E \right)$

where the time components are the expected times required to compute the solution to Equation 4.9, compute the plan given the argument set $y^*$, and execute the computed plan, respectively. This is the general form of the TOPE equation, and the j-th iteration labels are removed because it is applicable at all iterations.

What does the TOPE Equation mean?

• States that “the best parameters to apply to the replanning system are those expected to minimise the total time-to-goal”
• Not the parameters minimising $\hat{t}_P + \hat{t}_E$, but those minimising $\hat{t}_C + \hat{t}_P + \hat{t}_E$
  • That is, the parameters account for the time required to find them
• Analogous to the dynamic programming principle (Bellman 1957a):
  • When the future is uncertain, do the optimal thing at all times
• Similar to a policy in a Markov decision process (Bellman 1957b) or the Bellman equation (Bellman 1957a, chap. III)

“Nothing is more difficult, and therefore more precious, than to be able to decide.” - Napoleon Bonaparte

“In any moment of decision the best thing you can do is the right thing. The worst thing you can do is nothing.” - Theodore Roosevelt
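
As an illustration of the TOPE equation, the sketch below picks the parameter set with the smallest expected $\hat{t}_C + \hat{t}_P + \hat{t}_E$ from a finite candidate set. It assumes the three estimates are supplied by some estimator; `tope_select`, the estimator interface and the example numbers are hypothetical and are not the implementation used in the talk.

```python
from typing import Callable, Iterable, Tuple, TypeVar

Y = TypeVar("Y")
# (expected t_C, expected t_P, expected t_E), all in seconds
Estimate = Tuple[float, float, float]


def tope_select(candidates: Iterable[Y], estimate: Callable[[Y], Estimate]) -> Y:
    """Return the candidate y minimising expected t_C + t_P + t_E."""
    return min(candidates, key=lambda y: sum(estimate(y)))


# Hypothetical estimates keyed by grid cell size in metres.
example_estimates = {
    0.5: (0.2, 10.0, 40.0),
    2.0: (0.2, 1.0, 45.0),
    8.0: (0.2, 0.1, 60.0),
}
best = tope_select(example_estimates, example_estimates.__getitem__)
print(best)
```

Because $\hat{t}_C$ is part of the sum, a candidate that is expensive to evaluate is penalised for that cost, which is the point made on the slide.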

Experimental Scenarios

• Experimental Terrain Data:
  • ‘Cost maps’ generated from real world data collected in Marulan, Australia, or synthetic randomised fractal terrain data
  • Costs reflect traversability as velocity limits
  • Data loaded incrementally by ray-tracing
• Parameter space, Y, is 2D:
  • Grid cell size, c
  • Heuristic weighting factor, ε
• High-fidelity vehicle model
  • Control and data acquisition very close to reality

[Figure: example scenario with Start and Goal marked. Distance: 124m]

Parameter spaces (from the accompanying paper): The experiments are performed in two parameter spaces. The first is a one-dimensional space consisting of a state space discretisation parameter. This parameter, c, is drawn from the following set of side lengths of the square grid cells used to store the cost data: {0.25, 0.375, 0.5, 0.75, 1, 1.5, 2, 3, 4, 6, 8} metres. The second space is two-dimensional, and combines the cell size parameter with a heuristic inflation factor for the weighted A* algorithm. This heuristic inflation factor is a scalar value that determines the upper bound on the cost of a plan with respect to the optimum. It is well known that increasing this value typically decreases the amount of computational time required to find a plan, but the trade-off is usually an increase in the cost of this plan [3, 8, 14]. The values of ε are drawn from a set ranging from 1 to 1000, generated by starting from 1000, raising the previous value to the power of 0.95, and including the uninflated ε = 1.

TOPE Demonstration

• Replanning using A*
  • i.e. non-incremental
• 4X Speed
• TOPE process adjusts the cell size and heuristic inflation factor prior to each plan
• TOPE process is only applied to the ‘global’ planner in a hierarchy
  • ‘local’ level smooths the plan to be executed

TOPE Implementations

• 12 TOPE implementations were developed
  • Four ‘heuristic’ models that make domain-specific assumptions (H1-4)
  • The same four but starting with knowledge of the best fixed parameter set - effectively a source of ground truth information (H1-4b)
  • Four ‘statistical’ models that use online reinforcement learning to build a map of appropriate parameter sets under various situations (S1-4)
• Compared to 8 ‘baseline’ techniques from the literature (B1-4, B1-4b)
• The best performing implementation was S4 in almost all situations
  • Assesses the expected total time using the previous and best known parameter sets by planning with each
  • Returns the better choice, but learns from both
  • (S4 was the technique shown in the previous demonstration slide)

Typical Comparison Results

[Figure: Comparison of All Implementations - Example Scenario. Total time t (s) for each estimation technique: baselines B1-B4 and B1b-B4b, heuristic models H1-H4 and H1b-H4b, and statistical models S1-S4.]

Monte-Carlo Analysis - Real World Data

[Figure: Baseline Techniques (B1-B4, B1b-B4b) against TOPE Techniques (H1-H4, H1b-H4b, S1-S4). Legend: TOPE outperformed Baseline; Both Techniques Equal; Baseline outperformed TOPE.]

Conclusions from Monte-Carlo Analysis

• Three main conclusions:
  • The best TOPE technique is more likely to outperform than be outperformed by any comparison technique that uses only fixed parameter sets.
  • The average performance of the best TOPE technique is likely to be superior to the average performance of any comparison technique, since the times it is beneficial typically outweigh the times it is detrimental.
  • TOPE techniques are more robust than any fixed parameter technique, since the ability to change parameter values allows them to succeed in a greater variety of scenarios.

Conclusions and Discussion

• The TOPE process is able to meet or improve upon the performance of any method that uses only fixed system parameters, given the same sources of state information.
• Process is not limited to the example parameter spaces or scenarios
  • Any online-adjustable parameter that affects $\hat{t}_C$, $\hat{t}_P$, or $\hat{t}_E$
  • TOPE structure wraps existing systems, thus supports newer techniques
  • Accounts for CPU usage (even dynamic) by measuring its own runtime
• Monte-Carlo analysis over randomised fractal and real world terrain demonstrated these performance gains for several implementations

Thank you for your attention - Questions?

Sensitivity Analysis

• First experiment validated the TOPE process given true $y^*$
• What if, as is likely in practice, it is impossible to attain perfection?
  • How low must $t_C$ be for inaccurate parameter sets to be sufficient?
  • What is the operating region ($t_C$ vs $\delta_y$) for any given TOPE system?
• Procedure:
  • Perform brute-force parameter search to obtain ground truth parameter set
  • (The parameter space analysed is 1D - c only)
  • Perturb this set by drawing from a Gaussian centred on $y^*$ with variance $\delta_y$
  • Results show mean TOPE total time plus two sigma versus $t_C$ and $\delta_y$
  • Horizontal surface shows best performing comparison minus two sigma
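
The perturbation step in the procedure above can be sketched as follows. The cell-size set is the 1D parameter space quoted from the paper; snapping the Gaussian sample to the nearest allowed cell size is an assumption made here for illustration, since the slides do not say how a perturbed value is mapped back onto the discrete set. The function name `perturb` is hypothetical.

```python
import random

# 1D parameter space: grid cell side lengths in metres.
CELL_SIZES = [0.25, 0.375, 0.5, 0.75, 1, 1.5, 2, 3, 4, 6, 8]


def perturb(y_star, variance, rng):
    """Draw from a Gaussian centred on y_star and snap to an allowed cell size.

    The snapping rule (nearest allowed value) is an assumption of this sketch.
    """
    sample = rng.gauss(y_star, variance ** 0.5)  # gauss() takes a std deviation
    return min(CELL_SIZES, key=lambda c: abs(c - sample))


rng = random.Random(0)
print([perturb(1.5, 1.0, rng) for _ in range(5)])
```

With zero variance the function returns the ground truth value unchanged, which corresponds to the first (perfect accuracy) experiment.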

Perturbation of y

[Figure]

Perturbation of y: Operating Region

[Figure]

Conclusions of Sensitivity Analysis Experiments

• For this particular scenario, the TOPE process is viable:
  • There exists an (albeit small) operating region
  • Region is compared to best possible fixed parameter set from static analysis
  • For any other comparison, the operating region will be equal or greater
• Validates TOPE process with:
  • ‘Sufficient’ accuracy (but not perfect)
  • ‘Sufficient’ timeliness (but not instantaneous)
• But does it work in practice?
  • Is that operating region achievable?
  • Can the system parameters actually be changed fast enough?

Simplified Replanning Process

• Replanning involves planning while executing, then updating the active plan when possible

[Figure: four-panel time-series depiction of a typical replanning process, showing plans P1 and P2 towards the Goal. Panel labels: (1) $t_{Elapsed} = 0$, $\hat{t}_E$ undefined; (2) $t_{Elapsed} = t_{P_1}$, $\hat{t}_E = \hat{t}_{E_1}$; (3) $t_{Elapsed} = t_{P_1} + t_{P_2}$, $\hat{t}_E = \hat{t}_{E_2}$; (4) $\hat{t}_E = 0$.]

• Total time to achieve a goal for a replanning system is given by:

  $t = \sum_{i=1}^{n} t_{P_i} + t_{E_n}$  (4.1)

  where $t_{P_i}$ is the time taken to compute the plan to a goal in any iteration, $i$, and $t_{E_n}$ is the time taken to execute the final plan (where $t_{E_n} < t_{P_{n+1}}$, or else $n$ is not the final iteration).
• At the start of any iteration, $i$, expected remaining time is: $\hat{t}_{P_i} + \hat{t}_{E_i}$
• These values are unknown until the end of the iteration
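
The total-time expression for a replanning system can be written out directly. This is a minimal sketch, assuming the planning times of all n iterations and the execution time of the final plan are known after the fact; the function name and the example values are illustrative only.

```python
def total_replanning_time(planning_times, final_execution_time):
    """Total time-to-goal: sum of all planning times plus final execution time.

    planning_times:       [t_P1, ..., t_Pn] in seconds, one entry per iteration
    final_execution_time: t_En in seconds, time spent executing the final plan
    """
    if not planning_times:
        raise ValueError("at least one planning iteration is required")
    return sum(planning_times) + final_execution_time


# Illustrative values: three replanning iterations, then 40 s on the final plan.
print(total_replanning_time([2.0, 1.5, 0.5], 40.0))  # 44.0
```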

Time-Focussed Metrics

• Multi-objective optimisations require a metric and weightings
  • Metric function returns ‘cost’ between two states: $m : X \times X \to \mathbb{R}$
• It is useful if the value returned by this function is a time, but better if it is ‘real’ time
  • i.e. a duration of time not multiplied by any scalar value
  • e.g. cost and distance → velocity limit and expected execution time
• For planning systems, allows comparison of planning time versus execution time of these plans
• Total time-to-goal is (in general) just $t = t_P + t_E$
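
One way to read the "velocity limit and expected execution time" example is that each path segment contributes its length divided by the local velocity limit. The sketch below assumes exactly that (constant speed at the limit on each segment, no acceleration or turning costs), which is a simplification introduced here and not the vehicle model used in the experiments; the function names are hypothetical.

```python
def segment_time(distance_m, velocity_limit_mps):
    """Expected time in seconds to traverse one segment at its velocity limit."""
    if velocity_limit_mps <= 0:
        return float("inf")  # treat a non-positive limit as untraversable
    return distance_m / velocity_limit_mps


def expected_execution_time(segments):
    """Sum segment times over (distance in m, velocity limit in m/s) pairs."""
    return sum(segment_time(d, v) for d, v in segments)


# Illustrative path: 10 m of fast terrain (5 m/s), then 5 m of slow terrain (0.5 m/s).
print(expected_execution_time([(10.0, 5.0), (5.0, 0.5)]))  # 12.0
```

Because the result is in seconds, it can be added directly to a measured planning time without any weighting factor.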

TFMs: Heuristic Graph Search Example

• Weighted-A* algorithm as weighting factor, ε, is increased:
  • The expected execution time, $\hat{t}_E$, increases, but the planning time, $t_P$, decreases

ε = 1.0: tE = 44.2s, tP = 23.3s, Total: 67.5s
ε = 2.0: tE = 46.7s, tP = 4.7s, Total: 51.4s
ε = 4.0: tE = 61.9s, tP = 0.2s, Total: 62.1s
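
The totals in this example can be reproduced from the planning and execution times shown on the slide. The short sketch below does the arithmetic and picks the weighting factor with the lowest total; the variable names are illustrative.

```python
# Weighting factor -> (planning time t_P, execution time t_E), in seconds,
# taken from the three cases shown on the slide.
measurements = {
    1.0: (23.3, 44.2),
    2.0: (4.7, 46.7),
    4.0: (0.2, 61.9),
}

# Total time-to-goal for each weighting factor, rounded to 0.1 s.
totals = {eps: round(t_p + t_e, 1) for eps, (t_p, t_e) in measurements.items()}
best_eps = min(totals, key=totals.get)

print(totals)    # {1.0: 67.5, 2.0: 51.4, 4.0: 62.1}
print(best_eps)  # 2.0
```

Neither the shortest execution time (ε = 1.0) nor the shortest planning time (ε = 4.0) gives the shortest total, which is why the two are optimised together.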

TFMs: Total Time Versus Heuristic Weighting Factor

[Figure]

Experimental Validation

• Aim to show TOPE process can reduce the total time-to-goal
• UGV example and cost data, with tuning parameters:
  • Grid cell size, c (requires storing multiple cost maps simultaneously)
  • Heuristic inflation factor, ε (applied directly to planner’s cost function)
• All other interfaces as per physical CORD vehicles:
  • Same controllers, state machine, replanning algorithm, etc.
• Ground truth obtained by brute-force parameter search
  • Computes trajectory for all possible combinations of parameters
  • Simulation process can pause execution while searching then specify $t_C$

Baseline Comparison Techniques

• Four baseline comparison techniques:
  • B1: A* algorithm, with fixed c and ε = 1
  • B2: Weighted-A* algorithm, with fixed c and fixed ε > 1
  • B3: Anytime-A* algorithm with fixed c
  • B4: Anytime-D* algorithm with fixed c
• Two variations of fixed parameter values:
  • B# - Values used in previous field trials
  • B#b - The best possible fixed parameters (from a brute force static analysis)
• TOPE technique uses ground truth parameter values at each iteration, and the dependent variable is $t_C$

Parameter space (from the accompanying paper): The output of the TOPE equation is a set of parameters, the $y^*$ of (7), from a problem-specific parameter space. This space can include any continuous or discrete parameter which effects a change in the trade-off between $t_C$, $t_P$, and $t_E$. As examples: the level of state space discretisation, heuristic weighting values, sampling density, the choice of planning algorithm, the level of discretisation of continuous parameters, or the choice to ignore parameters with minimal effects.

The last two items in this list bear further discussion. Since an estimate of $t_C$, the time required to calculate the answer to (7), is an input to the equation itself, parameters which affect this variable are themselves part of the parameter space. Furthermore, $t_C$ is likely to be proportional to the size of $Y$, or even the number of sets of combinations of parameters $y \in Y$. Thus, the choice of which parameters in $Y$ have the least effect on $t_P$ and $t_E$ is a means of varying $t_C$, and could be considered by the TOPE process.

Results: 1D Parameter Space (c only)

[Figure: Total Time to Achieve a Goal Versus Simulated Estimation Time - Shorter Scenario. Total time t (s) against estimation time $t_C$ (s), with baseline techniques B1-B4 and B1b-B4b shown for comparison.]

Results: 2D Parameter Space (c and ε)

[Figure: Total Time to Achieve a Goal Versus Simulated Estimation Time - Shorter Scenario. Total time t (s) against estimation time $t_C$ (s), with baseline techniques B1-B4 and B1b-B4b shown for comparison.]

Conclusions of Validation Experiments

• If calculation of $y^*$ can be performed:
  • Perfectly accurately
  • ‘Sufficiently’ quickly
• TOPE process can improve performance compared to existing methods using only fixed parameter values
• Additional dimensions in the parameter space allow more options, and may further improve performance

Perturbation of $t_C$

[Figure]

Perturbation of $t_C$: Operating Region

[Figure]