Optimal Task Placement to Improve Cache Performance

EMSOFT 2007 – Salzburg – Austria

Sebastian Altmeyer and Gernot Gebhard
Compiler Design Lab, Saarland University
Wednesday, 3 October 2007

Introduction


Scope
● Single-processor embedded system
● Cache using any replacement policy
● Straight memory-to-cache mapping
  – No virtual memory
  – Memory maps consecutively to cache sets
● Preemptive task scheduling
● Schedule statically known


Motivation
● Cache performance optimization is reasonable
  – Increasing demands on embedded systems
  – Steadily growing cache size
● Preemptive task scheduling
  – Some task sets are only schedulable in this fashion
  – Renders previous timing guarantees invalid
  – Induces dynamic context-switch costs


Observation
● Memory placement influences cache performance
● Example:
  – Tasks T1 and T2 execute mutually exclusively
  – Task T3 executes frequently, interrupting T1 and T2
[Figure: main memory holds T1, T2, T3 in this order (boundaries at 0, n, 2n, 3n); the direct-mapped cache spans 0 to 2n and is shown with the resulting positions of T1, T2 and T3]
⇒ Bad performance

Observation cont.
● Memory placement influences cache performance
● Example:
  – Tasks T1 and T2 execute mutually exclusively
  – Task T3 executes frequently, interrupting T1 and T2
[Figure: main memory holds T1, T3, T2 in this order (boundaries at 0, n, 2n, 3n); the direct-mapped cache spans 0 to 2n and is shown with the resulting positions of T1, T3 and T2]
⇒ Good performance
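The two placements can be compared with a small sketch. It assumes, as the figure ticks suggest, that every task has size n and that the direct-mapped cache has size 2n; the helper names `cache_sets` and `conflicts` are illustrative and not taken from the talk.

```python
def cache_sets(start, size, cache_size):
    """Cache positions (abstract units) a block occupies in a direct-mapped cache."""
    return {addr % cache_size for addr in range(start, start + size)}


def conflicts(layout, sizes, cache_size):
    """Pairs of tasks whose cache footprints overlap when placed consecutively."""
    footprint, start = {}, 0
    for task in layout:
        footprint[task] = cache_sets(start, sizes[task], cache_size)
        start += sizes[task]
    tasks = sorted(footprint)
    return {(a, b) for i, a in enumerate(tasks) for b in tasks[i + 1:]
            if footprint[a] & footprint[b]}


n = 4  # assumed task size in abstract units
sizes = {"T1": n, "T2": n, "T3": n}
bad = conflicts(["T1", "T2", "T3"], sizes, 2 * n)   # T3 shares sets with T1
good = conflicts(["T1", "T3", "T2"], sizes, 2 * n)  # only T1 and T2 share sets
print(bad, good)
```

In the second layout the only overlap is between T1 and T2, which never run interleaved, so the frequently running T3 no longer evicts anything.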

Goals
● Improve performance of embedded systems by
  – Trying to keep cached data persistent
  – Reducing context-switch costs
● Make static timing analyses feasible by
  – Identifying persistent cache sets
  – Being able to derive tight WCET bounds


Optimization Method


Method
● Determine performance factors
  – Cache configuration
  – Tasks
  – Schedule
● Compute optimal task placement
  – Building cost function
  – Determining start address of each task
● Arrange code according to placement


Performance Factor: Cache
● Cache size
  – Determines room for possible optimization
  – Some tasks should fit inside the cache
● Associativity and replacement policy
  – Strongly influence cache performance
  – Affect predictability of cache behavior (minimum life span [1]):

Minimum life span by replacement policy:

Associativity   PLRU          LRU
1               1             1
2               2             2
4               3             4
n               log2(n) + 1   n

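The minimum life span values for LRU and PLRU can be written as a small helper. This is only a sketch of the two closed forms in the table (n for LRU, log2(n) + 1 for PLRU); the function name and the string encoding of the policy are assumptions made for illustration.

```python
def minimum_life_span(policy, associativity):
    """Minimum life span for an LRU or PLRU cache of the given associativity."""
    if associativity < 1:
        raise ValueError("associativity must be positive")
    if policy == "LRU":
        return associativity
    if policy == "PLRU":
        # PLRU is defined for power-of-two associativities only.
        if associativity & (associativity - 1):
            raise ValueError("PLRU needs a power-of-two associativity")
        # For a power of two, bit_length() equals log2(n) + 1.
        return associativity.bit_length()
    raise ValueError("unknown policy: " + policy)


for ways in (1, 2, 4, 8):
    print(ways, minimum_life_span("PLRU", ways), minimum_life_span("LRU", ways))
```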

Performance Factor: Task
● Notion "task" used ambiguously
  – Describes an operation processing data
  – Denotes a set of interdependent procedures
● Performance-affecting factors
  – Period: Indicates severity of data being evicted
  – Start address & size: Determine occupied cache sets
● Example: Task (procedures P1 to P3)
[Figure: task in memory with P1 starting at 0, P2 at n, P3 at 2n, ending at 4n; cache spanning 0 to 2n showing the positions of P1, P2 and P3]
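How start address and size determine the occupied cache sets can be sketched as follows, assuming the straight, consecutive memory-to-cache mapping from the Scope slide. The line size of 16 bytes and the 8 cache sets are hypothetical values chosen so that n = 64 bytes and the cache holds 2n bytes; they do not come from the talk.

```python
def occupied_sets(start, size, line_size, num_sets):
    """Cache sets touched by the byte range [start, start + size)."""
    if size <= 0:
        raise ValueError("size must be positive")
    first_line = start // line_size
    last_line = (start + size - 1) // line_size
    n_lines = last_line - first_line + 1
    if n_lines >= num_sets:          # block wraps around the whole cache
        return set(range(num_sets))
    return {(first_line + i) % num_sets for i in range(n_lines)}


LINE, SETS, N = 16, 8, 64            # assumed: cache size = 8 * 16 = 2n bytes
p1 = occupied_sets(0 * N, N, LINE, SETS)        # P1 at 0, size n
p2 = occupied_sets(1 * N, N, LINE, SETS)        # P2 at n, size n
p3 = occupied_sets(2 * N, 2 * N, LINE, SETS)    # P3 at 2n, size 2n
print(sorted(p1), sorted(p2), sorted(p3))
```

With these numbers P3 covers every cache set, so it shares sets with both P1 and P2.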

Performance Factor: Schedule
● Schedule induces task interdependency relation
  – Defines which tasks a task might interrupt
  – Conflicting tasks should avoid each other
● Example:
[Figure: task interdependency graph over T1, T2 and T3; optimal task placement of T1, T2 and T3 in a cache spanning 0 to 2n]
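One possible way to derive such a relation from a statically known schedule is sketched below. The encoding is an assumption: each job is a hypothetical tuple (task, start, finish), where finish is the completion time including any time spent preempted, and a task counts as interrupting another if one of its jobs starts strictly inside a job of the other. The talk does not say how the relation is actually computed.

```python
def interdependency(jobs):
    """Map each task to the set of tasks that may interrupt it."""
    relation = {task: set() for task, _, _ in jobs}
    for a, a_start, a_finish in jobs:
        for b, b_start, _ in jobs:
            if a != b and a_start < b_start < a_finish:
                relation[a].add(b)
    return relation


# Hypothetical schedule: T1 and T2 run one after the other, T3 runs often.
jobs = [("T1", 0, 10), ("T3", 2, 3), ("T3", 6, 7),
        ("T2", 10, 20), ("T3", 12, 13)]
relation = interdependency(jobs)
print(relation)
```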

Optimization Problem
● Find task placement with minimal costs
  – Count conflicts with preempting tasks for cache sets
  – Ignore if number of conflicts < minimum life span
  – Weight conflicts proportional to task period
● Compute optimal solution via ILP
  – Complexity linear in number of tasks and cache sets

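The three cost rules above might be combined as in the following sketch. It is not the authors' ILP formulation: sizes are given in cache lines, the `preempted_by` mapping and the per-task `weight` (standing in for the period-based weighting) are assumed inputs, and conflicts are counted as the number of lines of preempting tasks that map to the same set.

```python
from collections import Counter


def footprint(start_line, n_lines, num_sets):
    """Count how many of a task's memory lines map to each cache set."""
    return Counter((start_line + i) % num_sets for i in range(n_lines))


def placement_cost(starts, sizes, preempted_by, weight, num_sets, mls):
    """Weighted conflicts over all cache sets that reach the minimum life span.

    preempted_by: task -> tasks that may interrupt it (assumed encoding).
    weight:       task -> weight derived from its period (assumed given).
    """
    fp = {t: footprint(starts[t], sizes[t], num_sets) for t in starts}
    cost = 0
    for victim, preempters in preempted_by.items():
        for cache_set in fp[victim]:
            hits = sum(fp[p][cache_set] for p in preempters)
            if hits >= mls:              # fewer conflicts are ignored
                cost += weight[victim] * hits
    return cost


sizes = {"T1": 4, "T2": 4, "T3": 4}
preempted_by = {"T1": ["T3"], "T2": ["T3"], "T3": []}
weight = {"T1": 1, "T2": 1, "T3": 1}
bad_layout = {"T1": 0, "T2": 4, "T3": 8}
good_layout = {"T1": 0, "T3": 4, "T2": 8}
cost_bad = placement_cost(bad_layout, sizes, preempted_by, weight, 8, mls=1)
cost_good = placement_cost(good_layout, sizes, preempted_by, weight, 8, mls=1)
print(cost_bad, cost_good)
```

Raising `mls` to 2, as for a two-way LRU cache with the same number of sets, makes the single conflicting line per set fall below the threshold, so the first layout also costs nothing in this toy setting.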

Example
[Figure: task set T1 to T4 with its task interdependency graph; main memory from 0 to 4n shown with the orderings T1 T2 T3 T4 and T1 T4 T2 T3; resulting direct-mapped cache contents from 0 to 2n]

Simplified Optimization Problem
● Problems of ILP-based approach
  – Introduces memory gaps
  – NP-completeness (k-colorability [2])
  – Long solving time for large task sets
● Idea: Reduce search space
  – Always arrange tasks consecutively in memory
  – Determine optimal permutation of tasks
● Simulated annealing
  – Approximate optimal solution

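A generic simulated-annealing search over task permutations could look like the sketch below. The cooling schedule, the swap move, the parameter defaults and the stand-in cost function `conflict_cost` (shared cache sets between a task and its possible preempters, direct-mapped, sizes in cache lines) are all assumptions; the talk does not give these details.

```python
import math
import random


def conflict_cost(order, sizes, preempted_by, num_sets):
    """Stand-in cost: cache sets shared by a task and a task that may preempt it."""
    sets, start = {}, 0
    for t in order:                      # tasks are placed consecutively
        sets[t] = {(start + i) % num_sets for i in range(sizes[t])}
        start += sizes[t]
    return sum(len(sets[v] & sets[p])
               for v, ps in preempted_by.items() for p in ps)


def anneal(tasks, cost, steps=2000, t_start=5.0, t_end=0.01, seed=0):
    """Search for a low-cost permutation; needs at least two tasks."""
    rng = random.Random(seed)
    current = list(tasks)
    current_cost = cost(current)
    best, best_cost = current[:], current_cost
    for k in range(steps):
        temp = t_start * (t_end / t_start) ** (k / steps)  # geometric cooling
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]    # propose a swap
        new_cost = cost(current)
        if (new_cost <= current_cost or
                rng.random() < math.exp((current_cost - new_cost) / temp)):
            current_cost = new_cost
            if new_cost < best_cost:
                best, best_cost = current[:], new_cost
        else:
            current[i], current[j] = current[j], current[i]  # reject: undo swap
    return best, best_cost


sizes = {"T1": 4, "T2": 4, "T3": 4}
preempted_by = {"T1": ["T3"], "T2": ["T3"], "T3": []}
best, best_cost = anneal(["T1", "T2", "T3"],
                         lambda order: conflict_cost(order, sizes, preempted_by, 8))
print(best, best_cost)
```

Because only the order of tasks is searched, the result never contains memory gaps, at the price of possibly missing the placement an exact solver would find.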

Example
[Figure: task set T1 to T4 with its task interdependency graph; main memory from 0 to 4n shown with the consecutive orderings T1 T2 T3 T4 and T1 T4 T2 T3; resulting direct-mapped cache contents from 0 to 2n]

Practice


Task Placement Framework
[Figure: tool chain. Inputs: source code, schedule, cache configuration. The compiler produces object files, the optimizer produces the placement, and the linker produces the memory image.]

Experimental Results
● Optimized three task sets
  – Selected ten tasks from WCET benchmark [3]
  – Executed under RTEMS operating system [4]
  – Simulated with ARM7 emulation MPARM [5]
● Performed optimization for three caches (LRU)
  – 16 KB direct-mapped
  – 32 KB two-way set-associative
  – 32 KB four-way set-associative
● Task sets do not fit in any cache


Experimental Results cont.
● Cost function compared to cache misses
  – Cheaper placement leads to better avg. performance
  – Model not (yet) accurate enough
[Figure: bar chart of cost ratio and miss ratio in percent (axis from 45 to 100) for Task Sets A, B and C, each on the direct-mapped, 2-way and 4-way cache]

Experimental Results cont.
● Persistent cache sets for specific task (compress)
[Figure: bar chart of the percentage of persistent cache sets (axis from 0 to 100) under best and worst placement for Task Sets A, B and C, each on the direct-mapped, 2-way and 4-way cache]

Conclusion


Achievements
● New method to optimize cache performance
  – Program code need not be modified
  – Arrange instructions and data differently in memory
● Computation of optimal task placement
  – Globally minimizes threat of eviction
  – Leads to better average performance
● Classification of cache sets (non-/persistent)
  – Allows tight timing guarantees for preemptive scheduling


Future Work
● Improve cost function
  – Weight loops differently than straight-line code
● Place procedures instead of whole tasks
  – Allows higher variability
  – Achieves better results
● Restrict to preemption points
● Evaluation using real-world task sets


References
[1] J. Reineke, D. Grund, C. Berg, and R. Wilhelm. Predictability of Cache Replacement Policies. Reports of SFB/TR 14 AVACS 9, SFB/TR 14 AVACS, September 2006.
[2] C. Guillon, F. Rastello, T. Bidault, and F. Bouchez. Procedure placement using temporal-ordering information: Dealing with code size expansion. Journal of Embedded Computing, 1(4):437–459, 2005.
[3] Benchmarks: http://www.mrtc.mdh.se/projects/wcet/benchmarks.html
[4] RTEMS Operating System: http://www.rtems.com/
[5] MPARM: http://www-micrel.deis.unibo.it/sitonew/research/mparm.html
