SCAN: a Heuristic for Near-Optimal Software Pipelining

Florent Blachot, Benoît Dupont de Dinechin and Guillaume Huard
ID-IMAG, Grenoble, France
STMicroelectronics, Grenoble, France

01 June 2006

Problem

ILP for software pipelining

SCAN

Conclusion

Software pipelining

Software pipelining: a classic compiler optimization
improves the performance of inner loops on instruction-level parallel processors
cyclic scheduling: 1-periodic with integral period
objective is to minimize the period of the cyclic schedule
NP-hard problem
performed by heuristics such as iterative modulo scheduling

Software pipelining on a small instance

3 typed instructions to schedule on 2 typed processors.
A kernel occurs and, in loop context, this kernel is repeated a great number of times to obtain a periodic schedule.
Idle time remains in this schedule.
Idea: move instructions from other iterations into the idle time to obtain a schedule with a lower period.

[Figure: schedule of the instance on the 2 processors, time axis from 0 to 10]

Definitions on a small instance

Definition of Period: number of time units before a new kernel starts (P = 2 in the example).

Definition of Time Horizon: number of time units taken by one kernel from start to end (H = 4 in the example).

[Figure: schedule of the instance with P = 2 and H = 4 marked, time axis from 0 to 10]
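To make the two definitions concrete, here is a minimal Python sketch. The instruction offsets and durations are hypothetical (they are not read off the figure); they were only chosen so that the result matches the P = 2 and H = 4 of the slide. The function names are illustrative as well.

```python
def time_horizon(offsets, durations):
    """Number of time units spanned by one iteration (one kernel)."""
    return max(o + d for o, d in zip(offsets, durations))


def start_time(offset, iteration, period):
    """In a 1-periodic schedule, iteration k runs each instruction k * P later."""
    return offset + iteration * period


# Hypothetical instance: 3 unit-time instructions inside one iteration.
offsets = [0, 1, 3]
durations = [1, 1, 1]
P = 2

H = time_horizon(offsets, durations)
starts = [[start_time(o, k, P) for o in offsets] for k in range(3)]
print(H)       # 4
print(starts)  # [[0, 1, 3], [2, 3, 5], [4, 5, 7]]
```

Since H is larger than P here, consecutive iterations overlap in time, which is exactly what software pipelining exploits.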

Motivation to use ILP

Context of st200 processors: a family of VLIW embedded processors
4 UET instructions per time unit
production compiler with a software pipeliner: Iterative Modulo Scheduler (IMS)
on some loops, IMS fails to minimize the period
embedded computing ⇒ a large amount of time can be invested ⇒ ILP for these difficult loops

First disappointment with ILP

2 different ILP models [Eichenberger 1997, Dupont de Dinechin 2003]
the most recent one implemented in the production compiler
some improvements, but most of the time the ILP times out without a better solution
⇒ expensive optimization without significant improvements

ILP formulation

time-indexed formulation
for all instructions i and for all time units t, a binary variable x_i^t is created
x_i^t is true ⇔ instruction i is scheduled at time t

\[ \sum_{t=0}^{\bar{H}-1} x_i^t = 1 \qquad \forall i \in [1, n] \]

\[ x_i^t \in \{0, 1\} \qquad \forall i \in [1, n],\ \forall t \in [0, \bar{H}-1] \]

The ILP strongly depends on 2 fixed values:
fixed value P: the value to minimize
fixed value \bar{H}: an upper bound on the Time Horizon

\[ \sum_{s=t}^{\bar{H}-1} x_i^s + \sum_{s=0}^{t+\theta_i^j-P\omega_i^j-1} x_j^s \le 1 \qquad \forall t \in [0, \bar{H}-1],\ \forall (i, j) \in E_{dep} \]

\[ \sum_{i=1}^{n} \sum_{k=0}^{\lfloor (\bar{H}-1)/P \rfloor} x_i^{t+kP}\, \vec{b_i} \le \vec{B} \qquad \forall t \in [0, \bar{H}-1] \]
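The constraints above can be read more easily on a concrete schedule. The following Python sketch checks them literally for given dates; it is not a solver and not the authors' implementation. It assumes that θ is the latency of a dependence and ω its iteration distance (the usual convention, not stated on the slides), treats dates beyond the horizon as having no variable, and uses a hypothetical 3-instruction instance.

```python
def to_binary(sigma, H):
    """x[i][t] = 1 iff instruction i is scheduled at time t, for t in [0, H-1]."""
    return [[1 if s == t else 0 for t in range(H)] for s in sigma]


def satisfies(x, P, H, deps, b, B):
    """Check the assignment, dependence and resource constraints literally."""
    # Assignment: every instruction has exactly one date in [0, H-1].
    if any(sum(row) != 1 for row in x):
        return False
    # Dependences: sum_{s>=t} x[i][s] + sum_{s<=t+theta-P*omega-1} x[j][s] <= 1.
    for i, j, theta, omega in deps:
        for t in range(H):
            upper = max(0, min(t + theta - P * omega, H))  # exclusive bound on s
            if sum(x[i][t:]) + sum(x[j][:upper]) > 1:
                return False
    # Resources: dates t, t+P, t+2P, ... share the same slot of the kernel.
    for t in range(H):
        for r in range(len(B)):
            used = sum(x[i][t + k * P] * b[i][r]
                       for i in range(len(x))
                       for k in range((H - 1) // P + 1)
                       if t + k * P < H)  # dates past the horizon have no variable
            if used > B[r]:
                return False
    return True


# Hypothetical instance: 3 unit-latency instructions on 2 typed units.
sigma = [0, 1, 3]                      # dates within one iteration
deps = [(0, 1, 1, 0), (1, 2, 1, 0),    # (i, j, theta, omega)
        (2, 0, 1, 2)]                  # loop-carried, distance 2
b = [[1, 0], [0, 1], [1, 0]]           # unit type used by each instruction
B = [1, 1]                             # one unit of each type
ok_p2 = satisfies(to_binary(sigma, 4), 2, 4, deps, b, B)
ok_p1 = satisfies(to_binary(sigma, 4), 1, 4, deps, b, B)
print(ok_p2, ok_p1)  # True False
```

With P = 1 the same dates are rejected: instructions 0 and 2 would compete for the single unit of the first type, and the loop-carried dependence is violated.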

How does \bar{H} influence the ILP?

\[ x_i^t \in \{0, 1\} \qquad \forall i \in [1, n],\ \forall t \in [0, \bar{H}-1] \]

\bar{H} directly determines the number of variables in the ILP formulation; fewer variables ⇒ easier to solve
but decreasing \bar{H} ⇒ the problem may become infeasible
and increasing \bar{H} ⇒ often causes a timeout
upper and lower bounds exist for the Time Horizon, but these bounds are far off the practical values
a strictly mathematical approach cannot help us ⇒ experimental approach

Experimental approach

consists in testing our ILP formulation for all P and for all \bar{H} on a large benchmark
lower bound for P: a theoretical one, MinP
upper bound for P: a practical one, PIMS, from IMS
lower value for \bar{H}: a practical one, HIMS, from IMS
upper bound for \bar{H}: a practical one, given by a compiler constraint
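The exhaustive exploration described above amounts to one solver call per (P, \bar{H}) pair. A minimal Python sketch follows, assuming a `solve(P, H)` callback that returns "feasible", "infeasible" or "timeout". The `toy_solve` oracle and all numeric bounds are invented for illustration; they only mimic the three kinds of outcome and do not come from the benchmark.

```python
from collections import Counter


def explore(solve, min_p, p_ims, h_ims, h_max):
    """Call solve(P, H) on every point of the grid and record the outcome."""
    return {(P, H): solve(P, H)
            for P in range(min_p, p_ims + 1)
            for H in range(h_ims, h_max + 1)}


def toy_solve(P, H):
    # Hypothetical stand-in for an ILP solver call with a timeout.
    need = 8 + 2 * (6 - P)        # invented: smaller periods need a larger horizon
    if H < need:
        return "infeasible"
    if H > need + 2:
        return "timeout"          # invented: too many variables to finish in time
    return "feasible"


grid = explore(toy_solve, min_p=3, p_ims=6, h_ims=8, h_max=14)
counts = Counter(grid.values())
best_p = min(P for (P, H), outcome in grid.items() if outcome == "feasible")
print(len(grid), counts["feasible"], counts["infeasible"], counts["timeout"])  # 28 10 12 6
print(best_p)  # 3
```

Every grid point costs one solver run, possibly up to the timeout, which is why this exploration is only practical as an offline experiment.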

Experimental conditions

169 inner loops of classical tests
Pentium 1.8 GHz
CPLEX 9.0
timeout of 3000 s
⇒ total computing time approx. 2 weeks

Space characterization

[Figure: space for a loop from Transfo.c with a timeout of 3000 s. The period P ranges from MinP to PIMS and the time horizon starts at HIMS, with offsets +5, +10, +15 and +64 marked on the axes. Regions: Infeasible, Feasible, Timeout.]



Space characterization

[Figure: general space for the benchmark with a timeout of 3000 s. The period P ranges from MinP to PIMS and the time horizon starts at HIMS. Regions: Infeasible, Feasible, Timeout. The IMS point is marked.]



Idea of SCAN

Observation: there is a large area that is feasible and quickly solved.
Idea: follow the frontier of this large area.

[Figure: general space between MinP and PIMS, starting from HIMS, with regions Infeasible, Feasible and Time Out, the IMS point, and the SCAN path along the frontier of the feasible area]

[Figure: SCAN for a loop from Transfo.c, with offsets +5, +10, +15 and +64 marked on the axes and regions Infeasible, Feasible and Timeout]
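The slides only state the idea of following the frontier; they do not give the exact walk. The Python sketch below is one possible reading, labeled as an assumption and not the authors' algorithm: start from the IMS point, try to lower P at the current horizon, widen the horizon by one when the ILP is infeasible, and stop on a timeout. The `solve` callback and the `toy_solve` oracle with its numeric bounds are hypothetical.

```python
def scan(solve, min_p, p_ims, h_ims, h_max):
    """Walk along the infeasible/feasible frontier starting from the IMS point.

    Returns (best period found, horizon reached, number of solver calls).
    """
    best_p, H = p_ims, h_ims      # the schedule found by IMS is known to be feasible
    calls = 0
    while best_p > min_p and H <= h_max:
        outcome = solve(best_p - 1, H)
        calls += 1
        if outcome == "feasible":
            best_p -= 1           # a lower period was found, keep it
        elif outcome == "infeasible":
            H += 1                # stay on the frontier: widen the horizon
        else:
            break                 # timeout: keep the best period found so far
    return best_p, H, calls


def toy_solve(P, H):
    # Hypothetical stand-in for an ILP solver call with a short timeout.
    need = 8 + 2 * (6 - P)        # invented: smaller periods need a larger horizon
    if H < need:
        return "infeasible"
    if H > need + 2:
        return "timeout"
    return "feasible"


result = scan(toy_solve, min_p=3, p_ims=6, h_ims=8, h_max=14)
print(result)  # (3, 14, 9)
```

On this toy space the walk needs 9 solver calls, against 28 for the full grid over the same bounds, and it never enters the timeout region because it only visits points next to the frontier.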



Results of SCAN

timeout   PSCAN vs. PHMS   PMinP vs. PSCAN   average time for scan
5s        4.10%            1.39%             3.36s
10s       4.21%            1.28%             5.95s
25s       4.28%            1.20%             9.19s
75s       4.29%            1.19%             22.10s
500s      4.29%            1.19%             98.34s

Conclusion

SCAN made the ILP formulation of software pipelining usable and efficient
great improvement: gain on the period with the addition of 1 hour on the whole benchmark
up to 33.3% gain on hard loops, like t32x32
close to the lower bound (1.19%) ⇒ hard to gain more than SCAN