Reducing OLTP Instruction Misses with Thread Migration
Islam Atta, Pınar Tözün, Anastasia Ailamaki, Andreas Moshovos
University of Toronto, École Polytechnique Fédérale de Lausanne

Transactions Running in Parallel
[Figure: Threads T1, T2, T3 running transactions in parallel, each with instruction parts that can fit into the L1-I, and a combined transaction T123.]
OLTP on an Intel Xeon 5660 (Shore-MT, hyper-threading disabled)
[Figure: Breakdown of core stalls (resource stalls, including data, vs. instruction stalls) and instructions per cycle (IPC, higher is better) for TPC-C and TPC-E.]
TMi: when a thread's L1-I miss count on its current core reaches a threshold, the thread migrates (see the trigger sketch below).
[Figure: TMi timeline for transaction A's threads T1 and T2 as they move between cores 0 and 1 over time.]
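A minimal sketch of how this trigger could work, assuming a per-thread counter of L1-I misses and the 256-miss threshold from the setup slide; the Thread struct, onInstructionFetch hook, and pickDestination callback are illustrative assumptions, not the paper's actual interfaces:

```cpp
#include <cstddef>

constexpr std::size_t kMissThreshold = 256;   // miss threshold from the setup slide

struct Thread {
    int core = 0;               // core the thread currently runs on
    std::size_t misses = 0;     // L1-I misses taken since the last migration
};

// Called on every instruction fetch; `hit` says whether the fetch hit in the
// current core's L1-I. `pickDestination` stands in for the "where to migrate?"
// policy and may return the current core (meaning: stay put).
template <typename PickDestination>
void onInstructionFetch(Thread& t, bool hit, PickDestination pickDestination) {
    if (hit) return;
    if (++t.misses < kMissThreshold) return;
    int dst = pickDestination(t.core);
    if (dst != t.core) t.core = dst;  // migrate the thread's context
    t.misses = 0;                     // start a fresh miss window
}
```

The slide does not say whether the counter resets when the thread decides to stay put; the sketch resets it in both cases.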
Where to migrate?
• Check the thread's last N recorded misses against the other cores' L1-I contents (a sketch of this policy follows the list):
  1) No matching cache => move to an idle core, if one exists
  2) Matching cache => move to that core
  3) Neither => do not move
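A sketch of this selection policy under stated assumptions: each core's L1-I contents are modeled as a set of block addresses, "matching" is taken to mean the cache holds all of the last N missed blocks (the slide does not define how close a match must be), and N = 6 as in the setup slide:

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

struct Core {
    bool idle = false;
    std::unordered_set<std::uint64_t> l1iBlocks;   // block addresses currently cached

    bool holds(std::uint64_t block) const { return l1iBlocks.count(block) != 0; }
};

// lastMisses: block addresses of the thread's last N L1-I misses (N = 6 here).
// Returns the index of the core to migrate to, or `current` to stay put.
int pickDestination(const std::vector<Core>& cores,
                    const std::vector<std::uint64_t>& lastMisses,
                    int current) {
    // Case 2 on the slide: a cache that already holds the missed blocks wins.
    for (int c = 0; c < static_cast<int>(cores.size()); ++c) {
        if (c == current) continue;
        bool matches = true;
        for (std::uint64_t block : lastMisses)
            if (!cores[c].holds(block)) { matches = false; break; }
        if (matches) return c;
    }
    // Case 1: no matching cache -> move to an idle core, if one exists.
    for (int c = 0; c < static_cast<int>(cores.size()); ++c)
        if (c != current && cores[c].idle) return c;
    // Case 3: neither -> do not move.
    return current;
}
```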
Experimental Setup
• Trace simulation
  – Pin used to extract instruction & data accesses per transaction (see the sketch below)
  – 16-core system
  – 32KB, 8-way set-associative L1 caches
  – Miss threshold: 256
  – Last 6 misses kept
• Shore-MT as the storage manager
  – Workloads: TPC-C, TPC-E
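A minimal Pin tool sketch of the kind of per-thread instruction and data access tracing described above; it uses the standard Pin instrumentation API (INS_AddInstrumentFunction, INS_InsertCall) but is only an illustrative approximation, not the authors' actual tool, and the trace format is an assumption:

```cpp
#include "pin.H"
#include <fstream>

static std::ofstream trace("accesses.out");
static PIN_LOCK lock;

// One line per access: "<thread id> <I|R|W> <address>".
static VOID RecordFetch(THREADID tid, VOID* ip) {
    PIN_GetLock(&lock, tid + 1);
    trace << tid << " I " << ip << "\n";
    PIN_ReleaseLock(&lock);
}
static VOID RecordMem(THREADID tid, VOID* ea, BOOL isWrite) {
    PIN_GetLock(&lock, tid + 1);
    trace << tid << (isWrite ? " W " : " R ") << ea << "\n";
    PIN_ReleaseLock(&lock);
}

static VOID Instruction(INS ins, VOID*) {
    // Every instruction fetch.
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)RecordFetch,
                   IARG_THREAD_ID, IARG_INST_PTR, IARG_END);
    // Data reads and writes, if any.
    if (INS_IsMemoryRead(ins))
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)RecordMem,
                       IARG_THREAD_ID, IARG_MEMORYREAD_EA, IARG_BOOL, FALSE, IARG_END);
    if (INS_IsMemoryWrite(ins))
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)RecordMem,
                       IARG_THREAD_ID, IARG_MEMORYWRITE_EA, IARG_BOOL, TRUE, IARG_END);
}

int main(int argc, char* argv[]) {
    PIN_InitLock(&lock);
    if (PIN_Init(argc, argv)) return 1;
    INS_AddInstrumentFunction(Instruction, nullptr);
    PIN_StartProgram();   // never returns
    return 0;
}
```

A trace of this shape can then be replayed against the modeled 32KB, 8-way L1 caches to count misses with and without migration.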
Impact on L1-I Misses
[Figure: L1-I misses per kilo-instruction (lower is better) for No Migration, TMi, and TMi Blind on TPC-C and TPC-E.]
Instruction misses reduced by half
Impact on L1-D Misses
[Figure: Misses per kilo-instruction, split into instruction, read-data, and write-data misses (lower is better), for No Migration, TMi, and TMi Blind on TPC-C and TPC-E.]
Cannot ignore the increased data misses
TMi's Challenges
• Dealing with the data left behind
  – Prefetching
• OS support needed
  – Disabling OS control over thread scheduling
Conclusion
• OLTP stalls on instructions ~50% of the time
• Spread computation through thread migration
• TMi
  – Halves L1-I misses
  – ~30% expected time-wise improvement
  – Data misses still need to be handled