Reducing OLTP Instruction Misses with Thread Migration
Islam Atta, Pınar Tözün, Anastasia Ailamaki, Andreas Moshovos
University of Toronto, École Polytechnique Fédérale de Lausanne

Transactions Running in Parallel
[Figure: Threads T1, T2, T3 running transactions in parallel, each with instruction parts that can fit into the L1-I, and a combined transaction T123.]
OLTP on an Intel Xeon 5660 (Shore-MT, hyper-threading disabled)
[Figure: Breakdown of core stalls (resource stalls, including data, vs. instruction stalls) and instructions per cycle (IPC, higher is better) for TPC-C and TPC-E.]
TMi: when a thread's L1-I miss count on its current core reaches a threshold, the thread migrates (see the trigger sketch below).
[Figure: TMi timeline for transaction A's threads T1 and T2 as they move between cores 0 and 1 over time.]
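A minimal sketch of how this trigger could work, assuming a per-thread counter of L1-I misses and the 256-miss threshold from the setup slide; the Thread struct, onInstructionFetch hook, and pickDestination callback are illustrative assumptions, not the paper's actual interfaces:

```cpp
#include <cstddef>

constexpr std::size_t kMissThreshold = 256;   // miss threshold from the setup slide

struct Thread {
    int core = 0;               // core the thread currently runs on
    std::size_t misses = 0;     // L1-I misses taken since the last migration
};

// Called on every instruction fetch; `hit` says whether the fetch hit in the
// current core's L1-I. `pickDestination` stands in for the "where to migrate?"
// policy and may return the current core (meaning: stay put).
template <typename PickDestination>
void onInstructionFetch(Thread& t, bool hit, PickDestination pickDestination) {
    if (hit) return;
    if (++t.misses < kMissThreshold) return;
    int dst = pickDestination(t.core);
    if (dst != t.core) t.core = dst;  // migrate the thread's context
    t.misses = 0;                     // start a fresh miss window
}
```

The slide does not say whether the counter resets when the thread decides to stay put; the sketch resets it in both cases.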
Where to migrate?
• Check the thread's last N recorded misses against the other cores' L1-I contents (a sketch of this policy follows the list):
  1) No matching cache => move to an idle core, if one exists
  2) Matching cache => move to that core
  3) Neither => do not move
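A sketch of this selection policy under stated assumptions: each core's L1-I contents are modeled as a set of block addresses, "matching" is taken to mean the cache holds all of the last N missed blocks (the slide does not define how close a match must be), and N = 6 as in the setup slide:

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

struct Core {
    bool idle = false;
    std::unordered_set<std::uint64_t> l1iBlocks;   // block addresses currently cached

    bool holds(std::uint64_t block) const { return l1iBlocks.count(block) != 0; }
};

// lastMisses: block addresses of the thread's last N L1-I misses (N = 6 here).
// Returns the index of the core to migrate to, or `current` to stay put.
int pickDestination(const std::vector<Core>& cores,
                    const std::vector<std::uint64_t>& lastMisses,
                    int current) {
    // Case 2 on the slide: a cache that already holds the missed blocks wins.
    for (int c = 0; c < static_cast<int>(cores.size()); ++c) {
        if (c == current) continue;
        bool matches = true;
        for (std::uint64_t block : lastMisses)
            if (!cores[c].holds(block)) { matches = false; break; }
        if (matches) return c;
    }
    // Case 1: no matching cache -> move to an idle core, if one exists.
    for (int c = 0; c < static_cast<int>(cores.size()); ++c)
        if (c != current && cores[c].idle) return c;
    // Case 3: neither -> do not move.
    return current;
}
```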
Experimental Setup
• Trace simulation
  – Pin used to extract instruction & data accesses per transaction (see the sketch below)
  – 16-core system
  – 32KB, 8-way set-associative L1 caches
  – Miss threshold: 256
  – Last 6 misses kept
• Shore-MT as the storage manager
  – Workloads: TPC-C, TPC-E
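A minimal Pin tool sketch of the kind of per-thread instruction and data access tracing described above; it uses the standard Pin instrumentation API (INS_AddInstrumentFunction, INS_InsertCall) but is only an illustrative approximation, not the authors' actual tool, and the trace format is an assumption:

```cpp
#include "pin.H"
#include <fstream>

static std::ofstream trace("accesses.out");
static PIN_LOCK lock;

// One line per access: "<thread id> <I|R|W> <address>".
static VOID RecordFetch(THREADID tid, VOID* ip) {
    PIN_GetLock(&lock, tid + 1);
    trace << tid << " I " << ip << "\n";
    PIN_ReleaseLock(&lock);
}
static VOID RecordMem(THREADID tid, VOID* ea, BOOL isWrite) {
    PIN_GetLock(&lock, tid + 1);
    trace << tid << (isWrite ? " W " : " R ") << ea << "\n";
    PIN_ReleaseLock(&lock);
}

static VOID Instruction(INS ins, VOID*) {
    // Every instruction fetch.
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)RecordFetch,
                   IARG_THREAD_ID, IARG_INST_PTR, IARG_END);
    // Data reads and writes, if any.
    if (INS_IsMemoryRead(ins))
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)RecordMem,
                       IARG_THREAD_ID, IARG_MEMORYREAD_EA, IARG_BOOL, FALSE, IARG_END);
    if (INS_IsMemoryWrite(ins))
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)RecordMem,
                       IARG_THREAD_ID, IARG_MEMORYWRITE_EA, IARG_BOOL, TRUE, IARG_END);
}

int main(int argc, char* argv[]) {
    PIN_InitLock(&lock);
    if (PIN_Init(argc, argv)) return 1;
    INS_AddInstrumentFunction(Instruction, nullptr);
    PIN_StartProgram();   // never returns
    return 0;
}
```

A trace of this shape can then be replayed against the modeled 32KB, 8-way L1 caches to count misses with and without migration.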
Impact on L1-I Misses
[Figure: L1-I misses per kilo-instruction (lower is better) for No Migration, TMi, and TMi Blind on TPC-C and TPC-E.]
Instruction misses reduced by half
Impact on L1-D Misses
[Figure: Misses per kilo-instruction, split into instruction, read-data, and write-data misses (lower is better), for No Migration, TMi, and TMi Blind on TPC-C and TPC-E.]
Cannot ignore the increased data misses
TMi's Challenges
• Dealing with the data left behind
  – Prefetching
• OS support needed
  – Disabling OS control over thread scheduling
Conclusion
• OLTP stalls on instructions ~50% of the time
• Spread computation through thread migration
• TMi
  – Halves L1-I misses
  – ~30% expected time-wise improvement
  – Data misses still need to be handled