Optimal Task Placement to Improve Cache Performance
Recommend Documents
Therefore, voltage profile had been always a concern in grids which have large wind farms. Nowadays, Flexible AC Transmission System (FACTS) devices.
increasing cache flexibility; software techniques [13-23] modify program structure ... possible mappings: therefore, a brute force attack to the optimum problem is ...
they implement different optimization techniques such as caching, that may be ... Keywords: big data, search engines, intersection caching, classification.
[2] Sanjeev Arora and Carsten Lund. Hardness of approximations. In Dorit Hochbaum, editor,. Approximation Algorithms for NP-hard Problems. PWS Publishing ...
Modifying Boosted Trees to Improve Performance on Task 1 of the 2006 KDD Challenge Cup. Robert M. Bell. AT&T Labs-Research. 180 Park Ave. Florham Park ...
pects of the motion, using vector quantities such as Car- tesian velocities and forces. ...... Conference on Computation
accrued by prosecution and defense competing against each other. The use of ..... an ensemble of SVM is functionally similar to AdaBoost. [20]. The strangeness ...
intelligent control algorithm analysis will auto-motion each motors to move in order to show that view points at the center of monitor. Figure 1. MorDinDaeng-LMR ...
Wireless Communication, since the beginning of this century has observed enormous ... phone users, thus cellular telepho
IJRIT International Journal of Research in Information Technology, Volume 2, Issue 4, April .... the control functions a
Nov 9, 2016 - delivery in stochastic wireless caching helper networks in the noise-limited .... file fi, Ïλu is the density of user requests in a given time slot.
Nov 6, 2015 - ... a factor of 12. In [2], [3], [4] and [5], outer bounds tighter than the cut- ... proposed in [10, Theorem 1] and loosened in [10, Corollary. 1]. ... The binomial coefficient is indicated as B(n, k) := (n k). The ... K , time-sharing
level cache (LLC) can improve data sharing and avoid three- ..... capacity is no smaller than twice the cumulative L1 cache capacity, as Hsu et al. ..... shared memory programming,â citeseer.ist.psu.edu/476450.html, Oc- tober 1997.
Jun 16, 2016 - Content distribution networks have been extremely successful in ..... Note that in a network composed of domains where providers care about.
Sep 17, 2018 - ing 1000-way softmax classification is replaced by binary classifiers ...... by: difference in cosines of angles (blue line) or feature similarity gain ...
lem, in which the conditional statement takes this form: If a card has a vowel on ... the selection task under deadline and in free time conditions. Additionally, the ...
game elements compared to a regular working memory training. .... level of âChallengesâ and âSkillsâ, since the working memory training task and its mechanics.
Sep 6, 2017 - with and without concern about falling (CoF) to a non-training control group on walking .... kind of DT training due to the additional cognitive ef-.
Nov 14, 2016 - In a caching system, there is a server with N files and K users each with a ... Delivery phase, which takes place during peak times: Requested ...
An extension of the filtering concept involves adding a second level (L2) trace-cache that stores less frequent traces that are replaced in the FTC or the MTC.
evaluated using a traffic monitoring application fed with real traces. ... One of the functions that are widely used in networking is flow identification that is.
We describe a practical one-bit cache-line tag implementation of ..... two blocks conflict, we want to replace the block whose reuse level is .... dered list from smallest to largest by the » ordering of the
improve the query throughput of Cuckoo hashing. .... case memory access time of d memory accesses, the conditional query has a higher ..... London, 2010. ... and architectures: A tutorial and survey,â IEEE Journal of Solid-State Circuits, vol.
to people strolling on sidewalks. Although some may dismiss this scenario as too far off in the future, infotainment devices are being deployed on new cars, and ...
Optimal Task Placement to Improve Cache Performance
Sebastian Altmeyer and Gernot Gebhard – Compiler Design Lab – Saarland University
0
n
2n 11 of 25
Performance Factor: Schedule ●
●
Schedule induces Task Interdependency Relation –
Defines which tasks a task might interrupt
–
Conflicting tasks should avoid each other
Example: Task Interdependency: T1
Optimal Task Placement:
T2
T1
T3
T2 T3
Sebastian Altmeyer and Gernot Gebhard – Compiler Design Lab – Saarland University
0
n
2n
12 of 25
Optimization Problem ●
●
Find task placement with minimal costs –
Count conflicts with preempting tasks for cache sets
–
Ignore if number of conflicts < minimum life span
–
Weight conflicts proportional to task period
Compute optimal solution via ILP –
Complexity linear in number of tasks and cache sets
Sebastian Altmeyer and Gernot Gebhard – Compiler Design Lab – Saarland University
13 of 25
Example Taskset:
Main Memory:
T1 T2 T3 T4
0 2n 4n
Task Interdependency: T1
T2
T1 T4 T2
0
T3
4n
2n
Direct-Mapped Cache: T1 T2 T3
T3
0
T4
n
2n
T4 Sebastian Altmeyer and Gernot Gebhard – Compiler Design Lab – Saarland University
14 of 25
Simplified Optimization Problem ●
●
●
Problems of ILP-based approach –
Introduces memory gaps
–
NP-completeness (k-colorability [2])
–
Long solving time for large tasksets
Idea: Reduce search space –
Always arrange tasks consecutively in memory
–
Determine optimal permutation of tasks
Simulated Annealing –
Approximate optimal solution
Sebastian Altmeyer and Gernot Gebhard – Compiler Design Lab – Saarland University
15 of 25
Example Taskset:
Main Memory:
T1 T2 T3 T4
0
T1 T4 T2 T3
2n 4n
T2
2n 4n
Task Interdependency: T1
0
Direct-Mapped Cache: T1 T2
T3 0
T4 T3 n
2n
T4 Sebastian Altmeyer and Gernot Gebhard – Compiler Design Lab – Saarland University
16 of 25
Practice
Sebastian Altmeyer and Gernot Gebhard – Compiler Design Lab – Saarland University
17 of 25
Task Placement Framework Source Code
Schedule
Cache Config.
Compiler Object Files
Optimizer
Linker
Placement
Memory Image
Sebastian Altmeyer and Gernot Gebhard – Compiler Design Lab – Saarland University
18 of 25
Experimental Results ●
●
●
Optimized three task sets –
Selected ten tasks from WCET Benchmark [3]
–
Executed under RTEMS operating system [4]
–
Simulated with ARM7 emulation MPARM [5]
Performed optimization for three caches (LRU) –
16kb direct-mapped
–
32kb two-way set-associative
–
32kb four-way set-associative
Task sets do not fit in any cache
Sebastian Altmeyer and Gernot Gebhard – Compiler Design Lab – Saarland University
19 of 25
Experimental Results cont. Cost-function compared to cache misses –
Cheaper placement leads to better avg. performance
–
Model not (yet) accurate enough 100 97.5 95 92.5 90
Task Set A
87.5 85 82.5
Task Set B
80
Percentage
●
77.5 75
Task Set C
72.5 70 67.5 65
Cost Ratio
62.5 60 57.5
Miss Ratio
55 52.5 50 47.5 45 direct
2-way
4-way
direct
2-way
4-way
direct
Sebastian Altmeyer and Gernot Gebhard – Compiler Design Lab – Saarland University
2-way
4-way
20 of 25
Experimental Results cont. ●
Persistent cache sets for specific task (compress) 100 95 90 85 80 75
Task Set A
70 65
Task Set B
Percentage
60 55
Task Set C
50 45 40
Best Placement
35 30
Worst Placement
25 20 15 10 5 0 direct
2-way
4-way
direct
2-way
4-way
direct
Sebastian Altmeyer and Gernot Gebhard – Compiler Design Lab – Saarland University
2-way
4-way
21 of 25
Conclusion
Sebastian Altmeyer and Gernot Gebhard – Compiler Design Lab – Saarland University
22 of 25
Achievements ●
●
●
New method to optimize cache-performance –
Program code needs not be modified
–
Arrange instructions and data differently in memory
Computation of optimal task placement –
Globally minimizes threat of eviction
–
Leads to better average performance
Classification of cache sets (non-/persistent) –
Allow tight timing guarantees for preemptive scheduling
Sebastian Altmeyer and Gernot Gebhard – Compiler Design Lab – Saarland University
23 of 25
Future Work ●
Improve cost function –
●
Weight loops differently than straight-lined code
Place procedures instead of whole tasks –
Allows higher variability
–
Achieve better results
●
Restrict to preemption points
●
Evaluation using real-world task sets
Sebastian Altmeyer and Gernot Gebhard – Compiler Design Lab – Saarland University
24 of 25
References [1] J. Reineke, D. Grund, C. Berg, and R. Wilhelm. Predictability of Cache Replacement Policies. Reports of SFB/TR 14 AVACS 9, SFB/TR 14 AVACS, September 2006. [2] C. Guillon, F. Rastello, T. Bidault, and F. Bouchez. Procedure placement using temporal-ordering information: Dealing with code size expansion. Journal of Embedded Computing, 1(4):437–459, 2005. [3] Benchmarks: http://www.mrtc.mdh.se/projects/wcet/benchmarks.html [4] RTEMS Operating System: http://www.rtems.com/ [5] MPARM: http://www-micrel.deis.unibo.it/sitonew/research/mparm.html
Sebastian Altmeyer and Gernot Gebhard – Compiler Design Lab – Saarland University