Advances in Simultaneous Multithreading Testcase Generation Methods

John Ludden, Bryan Hickerson
E-mail: ludden,[email protected]
IBM Systems & Technology Group

Michal Rimon, Allon Adir
E-mail: michalr,[email protected]
Haifa Research Lab

© 2010 IBM Corporation

IBM System p

Overview
1. Background and Motivation
2. The Thread Irritation Technique
3. Additional SMT verification techniques
4. Summary

© 2006 IBM Corporation

DRAFT: IBM Confidential


Simultaneous Multithreading (SMT)
– Increases execution unit utilization
– Increases total throughput

Example: 2-way SMT


SMT in IBM POWER Processors

[Figure: POWER processor roadmap. Die sizes shown: 567mm2, 341mm2, 389mm2, 415mm2.]

– Dual Core: Chip Multi Processing, Distributed Switch, Shared L2, Dynamic LPARs (32)
– Dual Core, SMT2: Enhanced Scaling, Distributed Switch +, Core Parallelism +, FP Performance +, Memory bandwidth +, Virtualization
– Dual Core, SMT2 +: High Frequencies, Virtualization +, Memory Subsystem +, VMX (Altivec), Instruction Retry, Dyn Energy Mgmt, Protection Keys
– Multi Core, SMT4: On-Chip eDRAM, Power Optimized Cores, Mem Subsystem ++, Reliability +, VSX & VMX (AltiVec), Protection Keys+

Other roadmap labels: Concept Phase, On Schedule, Core running in, Continued Leadership, 2010


MP-based SMT verification
User specifies a separate scenario for each thread
Collisions between shared resources
– Memory
– Caches (L1, L2)
– Core-level registers
Used as the only SMT verification technique for POWE
– Good coverage results


MP-Based SMT verification in POWER6
Some bug scenarios which can be exercised with single-threaded testcases in POWER5 require multi-threaded testcases in POWER6
– POWER5 out-of-order design
– POWER6 in-order design
Coverage holes: more SMT scenarios needed

Example:
SEQUENCE
add G3,G1,G2
ldx G10,G8,G9
subf G5,G6,G7

[Figure: the sequence plotted over processor cycles, POWER5 out-of-order design vs. POWER6 in-order design]


Drawbacks of MP-Based SMT verification
Threads are not typically synchronized
– De facto simulation running mostly single-threaded tests
– Synchronization routines will not solve the problem

No way to control cross-thread interaction
– Each thread should be doing an ‘interesting scenario’ at the same time

Generation time is exponential with number of threads


The Thread Irritation technique
Description
– Primary thread doing a single-threaded scenario
– One or more “irritator” threads executing in an infinite loop
– Primary thread kills irritator thread(s) by terminating the infinite loop

Irritator thread restrictions
– Cannot modify memory read by primary thread
– Cannot modify registers shared with other threads
– Cannot cause unexpected exceptions
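The primary/irritator relationship described above can be sketched in software terms. The following is only an analogy using Python threads, not the testcase generator itself: the function name `run_irritation`, the stop flag, and the sum-of-squares workload are all hypothetical. Each irritator touches only its own counter, mirroring the restriction that irritators must not modify data used by the primary thread.

```python
import threading

def run_irritation(primary_steps, n_irritators=2):
    """Run a finite primary scenario alongside looping irritators (hypothetical model)."""
    stop = threading.Event()          # set by the primary to end the irritator loops
    loop_counts = [0] * n_irritators  # each irritator writes only its own slot

    def irritator(idx):
        # "Infinite" loop: runs until the primary terminates it.
        while not stop.is_set():
            loop_counts[idx] += 1

    threads = [threading.Thread(target=irritator, args=(i,))
               for i in range(n_irritators)]
    for t in threads:
        t.start()

    # Primary scenario: a finite, single-threaded computation on private data.
    result = sum(i * i for i in range(primary_steps))

    stop.set()  # the primary "kills" the irritators
    for t in threads:
        t.join(timeout=5)
    return result, loop_counts, [t.is_alive() for t in threads]
```

Because the irritators never write anything the primary reads, the primary's result is the same as in a single-threaded run, which is what lets a single-threaded scenario be reused unchanged.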


Thread Irritation
Main Properties:
– Balanced test length among all threads
– Ensures thread interaction
– Efficient testcase generation

Effectively exposes:
– Livelock, hangs, thread starvation


Thread Irritation: POWER6 in-order design
SEQUENCE
add G3,G1,G2
ldx G10,G8,G9
subf G5,G6,G7

[Figure: the sequence plotted over processor cycles, POWER6 single-threaded scenario vs. POWER6 thread-irritation scenario]


Thread Irritation: POWER5 out-of-order design
SEQUENCE
add G3,G1,G2
ldx G10,G8,G9
subf G5,G6,G7

[Figure: the sequence plotted over processor cycles, POWER5 single-threaded scenario vs. POWER5 thread-irritation scenario]


Thread Irritation: POWER7 usage
– SMT4
– Out-of-order

[Figure: one core with Core Level Registers common to all threads, per-thread registers (T0 to T3), and Real Memory. T0 runs a finite random instruction stream; T1, T2, and T3 run infinite loop irritators A, B, and C.]

Example:

Long Random Thread
SEQUENCE
REPEAT 100
  Load ?addr
stw nop, A
stw nop, B
stw nop, C
Generated Instr: 103
Simulated Instr: 103

Irritator Thread A
LB0: SEQUENCE
Store ?addr
A: b to LB0
Generated Instr: 2
Simulated Instr: Infinite

Irritator Thread B
LB1: SEQUENCE
Store ?addr
B: b to LB1
Generated Instr: 2
Simulated Instr: Infinite

Irritator Thread C
SEQUENCE
C: b2self
Generated Instr: 1
Simulated Instr: Infinite
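The kill mechanism in this example, where the primary thread stores a nop over each irritator's branch, can be illustrated with a toy round-robin interpreter. This is a rough sketch under stated assumptions: the tuple instruction encoding, the `simulate` function, the one-instruction-per-thread-per-cycle scheduling, and the memory layout are all hypothetical, and only irritators A and C are modeled for brevity.

```python
def simulate(kill=True, max_cycles=10_000):
    """Round-robin model of one primary thread and two irritators (hypothetical)."""
    # Shared instruction memory: irritator A at 0-1, irritator C at 2.
    mem = [
        ("store",),   # 0  LB0: Store ?addr
        ("b", 0),     # 1  A:   b to LB0
        ("b", 2),     # 2  C:   b2self
    ]
    start = len(mem)
    mem += [("load",)] * 100                      # REPEAT 100: Load ?addr
    if kill:
        mem += [("stw_nop", 1), ("stw_nop", 2)]   # stw nop, A / stw nop, C
    # Each thread: [program counter, end of its code region, instructions executed]
    threads = {"A": [0, 2, 0], "C": [2, 3, 0], "primary": [start, len(mem), 0]}

    for _ in range(max_cycles):
        running = [t for t in threads.values() if t[0] < t[1]]
        if not running:
            break
        for t in running:
            op = mem[t[0]]
            t[2] += 1
            if op[0] == "b":
                t[0] = op[1]               # taken branch keeps the loop alive
            else:
                if op[0] == "stw_nop":
                    mem[op[1]] = ("nop",)  # overwrite an irritator's branch
                t[0] += 1
    finished = all(t[0] >= t[1] for t in threads.values())
    return finished, {name: t[2] for name, t in threads.items()}
```

With `kill=True` every thread falls out of its code region shortly after the primary's final stores, so the irritators simulate far more instructions than were generated for them. With `kill=False` the irritators are still looping when `max_cycles` runs out.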


Thread Irritation in Post-Silicon Verification
Post-silicon platforms are much faster than simulation, enabling many more verification cycles

But they have limited observability and controllability
– Harder to monitor, check, debug, measure coverage

Communication with the environment can become a bottleneck
– Need a self-contained solution, e.g. an exerciser

[Figure: Threadmill. Test Templates, Topology, and Architectural Model feed a Builder, which produces a Multi-Threaded Exerciser Image. The image runs Generation, Execution, and Checking on top of OS services.]


Thread Irritation on Post-Silicon
Assigns system threads to primary and irritator scenario roles
– Utilizes all of the system’s threads (up to 1024 in a POWER7 system)

The last primary thread of a combination kills the irritators with an interrupt
Executes thousands of primary/irritation combinations in a single image

An example exerciser setup for 16 system threads (T0-T15), 10 primary scenarios (P0-P9), 10 irritation scenarios (I0-I9):
Threads T0, T1, T2, T3: 4-threaded primary scenarios P0,..P9
Threads T4, T5: single-thread irritation scenarios I0,..I4
Threads T6, T7,..T15: single-thread irritation scenarios I5,..I9

Number of combinations: 10 x 5 x 5 = 250
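The combination count for this setup can be checked by enumeration. The sketch below assumes that each combination picks one primary scenario for T0-T3, one irritation scenario from I0-I4 for the T4-T5 group, and one from I5-I9 for the T6-T15 group. The variable names are illustrative only.

```python
from itertools import product

primaries = [f"P{i}" for i in range(10)]        # run on T0-T3
irritators_a = [f"I{i}" for i in range(5)]      # run on T4, T5
irritators_b = [f"I{i}" for i in range(5, 10)]  # run on T6..T15

# One primary scenario combined with one irritation scenario per thread group.
combinations = list(product(primaries, irritators_a, irritators_b))
print(len(combinations))
```

Adding scenarios to any pool multiplies the count, which is how a single image can reach thousands of primary/irritation combinations.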


POWER7 Thread Irritation Bug Summary
Exposed 23 “high quality” bugs in pre-silicon simulation
Bug areas:
– Flush Related: 6
– Hang: 5
– Thread Starvation: 5
– Live-lock: 4
– Cache Write Transition: 2
– Branch to wrong target EA: 1

One “high quality” bug exposed by Thread Irritation in post-silicon


Additional SMT Verification Techniques
Motivation
– POWER7: four threads per core
– MP-based technique ineffective for > 2 threads

[Chart: Generation CPU Time Per Testcase - Unit Comparison (500 Instructions Per Thread). X axis: Number of Threads (1, 2, 4). Y axis: CPU Time (minutes), 0 to 70. Series: Fixed Point Unit, Instruction Fetch Unit, Load Store Unit, Floating Point Unit.]


Additional SMT Verification Techniques – Thread merge

Thread Merge
– Generate buckets of single-threaded testcases
  • Architectural resource separation
– Merged into SMT testcases

Advantages
– Achieve a desired generation/simulation time ratio
  • Bucket size is a testcase generation parameter
– Exercises ST testcases under different micro-architecture conditions

[Figure: bucket T1 (T11, T12, T13, …) and bucket T2 (T21, T22, T23, …) are merged into SMT testcases T1,T2: T11,T21; T11,T22; T11,T23; …]
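The merge step can be sketched as taking one single-threaded testcase from each bucket. This sketch assumes that every cross-bucket combination is used, which matches the T11,T21; T11,T22; T11,T23 pattern in the figure but is not confirmed as the exact merge policy. `merge_buckets` is a hypothetical helper name.

```python
from itertools import product

def merge_buckets(buckets):
    """Pair one single-threaded testcase from each bucket into an SMT testcase."""
    return list(product(*buckets))

bucket_t1 = ["T11", "T12", "T13"]  # single-threaded testcases for thread 1
bucket_t2 = ["T21", "T22", "T23"]  # single-threaded testcases for thread 2

merged = merge_buckets([bucket_t1, bucket_t2])
generated = len(bucket_t1) + len(bucket_t2)
print(generated, "generated ->", len(merged), "merged SMT testcases")
```

Under this assumption, buckets of size b for n threads cost n * b generated testcases and yield b ** n merged ones, so the bucket size directly tunes the generation/simulation ratio.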


Additional SMT Verification Techniques – Thread replication

Thread Replication
– Generate a single-threaded testcase
  • Each load/store instruction accesses a new memory location
– Replicate for multiple threads

Advantages
– Achieve an optimal generation/simulation time ratio
– Creates interesting SMT scenarios:
  • Execute the same instruction stream
    – Exercise Instruction-Cache logic
  • Stress the same resources
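Replication can be sketched as copying one instruction stream to every thread. The sketch below additionally assumes that each copy is given its own data region so that the copies do not collide in memory. That per-thread base address, the `replicate` helper, and the (mnemonic, offset) testcase encoding are hypothetical illustrations, not details taken from the slides.

```python
def replicate(testcase, n_threads, region_size=0x1000):
    """Copy one single-threaded testcase to n threads with disjoint data regions."""
    threads = []
    for t in range(n_threads):
        base = t * region_size  # assumed per-thread data region
        threads.append([(op, None if off is None else base + off)
                        for op, off in testcase])
    return threads

# (mnemonic, data offset); None marks an instruction with no memory access.
testcase = [("ldx", 0x00), ("add", None), ("stw", 0x08)]
smt = replicate(testcase, 4)
```

One generated testcase yields four simulated streams, and every thread executes the same mnemonics in the same order, which is what stresses the shared instruction-cache logic.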


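Comparison of SMT testcase generation methods

[Table: the methods MP-Based, Irritation, Replication, and Merge are each rated + or - on Livelock, Starvation, True Sharing, Balanced Instr Stream, Shared Instr. Stream, Thread Interaction, and Efficient Generation. The cell-to-label mapping is not recoverable from the flattened layout. Marks in extraction order: + - - - - - - + - + - - - + + + - + + + + - - - - - - +]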


Summary
We explained the challenges in SMT verification
– In-order design
– Out-of-order design

We described and analyzed three new SMT verification methods, scalable to any number of threads:
– Thread Irritation
– Thread Merge
– Thread Replication

We showed the effectiveness of the Thread Irritation technique
– In simulation and post-silicon verification

New methods do not exclude the MP-based verification technique
– Architectural thread-synchronization constructs
– True-sharing bug scenarios
