Advances in Simultaneous Multithreading Testcase Generation Methods
John Ludden, Bryan Hickerson (IBM Systems & Technology Group)
E-mail: ludden, [email protected]
Michal Rimon, Allon Adir (Haifa Research Lab)
E-mail: michalr, [email protected]
© 2010 IBM Corporation
IBM System p
Overview
1. Background and Motivation
2. The Thread Irritation Technique
3. Additional SMT Verification Techniques
4. Summary
© 2006 IBM Corporation
DRAFT: IBM Confidential
Simultaneous Multithreading (SMT)
– Increases execution unit utilization
– Increases total throughput
(Figure: example of 2-way SMT)
SMT in IBM POWER Processors
(Figure: POWER processor roadmap through 2010)
– POWER4 (415 mm2): dual core, chip multiprocessing, distributed switch, shared L2, dynamic LPARs (32)
– POWER5 (389 mm2): dual core, enhanced scaling, SMT2, distributed switch+, core parallelism+, FP performance+, memory bandwidth+, virtualization
– POWER6 (341 mm2): dual core, high frequencies, virtualization+, memory subsystem+, VMX (AltiVec), instruction retry, dynamic energy management, SMT2+, protection keys
– POWER7 (567 mm2): multi core, on-chip eDRAM, power-optimized cores, memory subsystem++, SMT4, reliability+, VSX & VMX (AltiVec), protection keys+
MP-based SMT verification
• User specifies a separate scenario for each thread
• Collisions between shared resources:
  – Memory
  – Caches (L1, L2)
  – Core-level registers
• Used as the only SMT verification technique for POWE
  – Good coverage results
MP-Based SMT verification in POWER6
• Some bug scenarios that can be exercised with single-threaded testcases in POWER5 require multi-threaded testcases in POWER6
  – POWER5: out-of-order design
  – POWER6: in-order design
• Coverage holes: more SMT scenarios are needed
Example:
SEQUENCE
  add  G3,G1,G2
  ldx  G10,G8,G9
  subf G5,G6,G7
(Figure: execution of this sequence over processor cycles on the POWER5 out-of-order design and on the POWER6 in-order design)
Drawbacks of MP-Based SMT verification
• Threads are not typically synchronized
  – De facto, the simulation mostly runs single-threaded tests
  – Synchronization routines will not solve the problem
• No way to control cross-thread interaction
  – Each thread should be doing an "interesting scenario" at the same time
• Generation time is exponential in the number of threads
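To make the first drawback concrete, the sketch below computes the fraction of a run during which every thread is active, given each thread's start and end cycle. The function name and the cycle numbers are hypothetical illustrations, not measurements from these slides.

```python
def all_threads_active_fraction(intervals):
    """Return the fraction of the run during which every thread is executing.

    intervals: one (start_cycle, end_cycle) pair per thread, end exclusive.
    """
    run_start = min(start for start, _ in intervals)
    run_end = max(end for _, end in intervals)
    # All threads overlap from the latest start until the earliest finish.
    overlap = max(0, min(end for _, end in intervals) - max(start for start, _ in intervals))
    return overlap / (run_end - run_start)


# Unsynchronized: thread 1 begins its scenario 400 cycles after thread 0.
print(all_threads_active_fraction([(0, 500), (400, 900)]))  # about 0.11
# Balanced and aligned threads: the whole run is multi-threaded.
print(all_threads_active_fraction([(0, 500), (0, 500)]))     # 1.0
```

In the unsynchronized case only 100 of 900 cycles see both threads running, which is the sense in which an MP-based test is "de facto" single-threaded.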
The Thread Irritation technique
• Description
  – A primary thread performs a single-threaded scenario
  – One or more "irritator" threads execute in an infinite loop
  – The primary thread kills the irritator thread(s) by terminating the infinite loop
• Irritator thread restrictions
  – Cannot modify memory read by the primary thread
  – Cannot modify registers shared with other threads
  – Cannot cause unexpected exceptions
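The control structure described above can be sketched with ordinary threads. This is a toy model under stated assumptions: `run_thread_irritation` is a hypothetical helper, a `threading.Event` stands in for whatever mechanism terminates the irritators' loop, and Python threads say nothing about SMT pipeline timing. It only shows the roles (finite primary, looping irritators, kill by the primary) and the rule that irritators never touch memory the primary reads.

```python
import threading


def run_thread_irritation(primary_scenario, irritator_names):
    """Toy model of the Thread Irritation control structure (hypothetical helper).

    primary_scenario: list of (address, value) stores performed by the primary.
    irritator_names: one name per irritator thread.
    """
    stop = threading.Event()          # stands in for "terminate the infinite loop"
    primary_memory = {}               # only the primary thread reads or writes this
    # Each irritator touches only its own counter, so it never modifies
    # memory read by the primary thread.
    iterations = {name: 0 for name in irritator_names}

    def irritator(name):
        while not stop.is_set():      # loops until the primary kills it
            iterations[name] += 1

    workers = [threading.Thread(target=irritator, args=(name,)) for name in irritator_names]
    for worker in workers:
        worker.start()

    for address, value in primary_scenario:   # finite single-threaded scenario
        primary_memory[address] = value

    stop.set()                        # primary kills the irritators
    for worker in workers:
        worker.join()
    return primary_memory, iterations


memory, iterations = run_thread_irritation([(0x100, 1), (0x104, 2)], ["A", "B", "C"])
print(memory)  # {256: 1, 260: 2}
```

Because the irritators are restricted to private state, the primary's result is the same no matter how long the irritators ran, so it can be checked exactly as a single-threaded testcase would be.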
Thread Irritation
• Main properties:
  – Balanced test length among all threads
  – Ensures thread interaction
  – Efficient testcase generation
• Effectively exposes:
  – Livelock, hangs, thread starvation
Thread Irritation: POWER6 in-order design
SEQUENCE
  add  G3,G1,G2
  ldx  G10,G8,G9
  subf G5,G6,G7
(Figure: POWER6 single-threaded scenario vs. POWER6 thread-irritation scenario, over processor cycles)
Thread Irritation: POWER5 out-of-order design
SEQUENCE
  add  G3,G1,G2
  ldx  G10,G8,G9
  subf G5,G6,G7
(Figure: POWER5 single-threaded scenario vs. POWER5 thread-irritation scenario, over processor cycles)
Thread Irritation: POWER7 usage
POWER7 core: SMT4, out-of-order, with core-level registers common to all threads
– T0 (own registers): finite random instruction stream (primary)
– T1 (own registers): infinite-loop irritator A
– T2 (own registers): infinite-loop irritator B
– T3 (own registers): infinite-loop irritator C
– All threads share real memory

Example:
Long random thread (T0)
  SEQUENCE
    REPEAT 100
      Load ?addr
    stw nop, A
    stw nop, B
    stw nop, C
  Generated instr: 103, simulated instr: 103

Irritator thread A
  LB0: SEQUENCE
         Store ?addr
  A:     b to LB0
  Generated instr: 2, simulated instr: infinite

Irritator thread B
  LB1: SEQUENCE
         Store ?addr
  B:     b to LB1
  Generated instr: 2, simulated instr: infinite

Irritator thread C
  SEQUENCE
  C:     b2self
  Generated instr: 1, simulated instr: infinite
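The kill mechanism in this example can be traced with a small round-robin model. Our reading (an interpretation, not stated explicitly on the slide) is that each `stw nop, X` overwrites the branch labeled X with a nop, so the irritator falls through its loop and stops. The addresses, the `("end",)` marker, and the one-instruction-per-thread-per-round scheduler are hypothetical simplifications; real SMT timing is far more complex.

```python
def simulate(programs, max_rounds=100_000):
    """Round-robin toy simulator for the POWER7 irritation example (hypothetical model).

    programs: {thread: (start_address, {address: instruction})}. All code lives in one
    shared instruction memory, so a store by one thread can rewrite another's code.
    Returns the number of instructions each thread executed.
    """
    imem, pcs = {}, {}
    for thread, (start, code) in programs.items():
        imem.update(code)
        pcs[thread] = start
    executed = {thread: 0 for thread in programs}
    for _ in range(max_rounds):
        live = [t for t in pcs if imem[pcs[t]][0] != "end"]
        if not live:
            return executed
        for t in live:                         # each live thread issues one instruction
            op, *args = imem[pcs[t]]
            executed[t] += 1
            if op == "b":                      # branch: "b to LB0" or "b2self"
                pcs[t] = args[0]
                continue
            if op == "stw_nop":                # overwrite the target instruction with a nop
                imem[args[0]] = ("nop",)
            pcs[t] += 1                        # load, store, nop, stw_nop fall through
    raise RuntimeError("irritators were never killed")


A, B, C = 201, 301, 400                        # hypothetical branch addresses
primary = {i: ("load",) for i in range(100)}   # REPEAT 100 Load ?addr
primary.update({100: ("stw_nop", A), 101: ("stw_nop", B), 102: ("stw_nop", C), 103: ("end",)})
programs = {
    "T0": (0, primary),
    "T1": (200, {200: ("store",), A: ("b", 200), 202: ("end",)}),  # irritator A
    "T2": (300, {300: ("store",), B: ("b", 300), 302: ("end",)}),  # irritator B
    "T3": (C, {C: ("b", C), 401: ("end",)}),                       # irritator C: b2self
}
print(simulate(programs))  # {'T0': 103, 'T1': 102, 'T2': 102, 'T3': 103}
```

The primary executes exactly 103 instructions, matching the example's count, while each irritator, although generated as only one or two instructions, runs exactly as long as the primary needs it to.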
Thread Irritation in Post-Silicon Verification
• Post-silicon platforms are much faster than simulation, enabling many more verification cycles
• But they have limited observability and controllability
  – Harder to monitor, check, debug, and measure coverage
• Communication with the environment can become a bottleneck
  – Need a self-contained solution, e.g., an exerciser
(Figure: Threadmill. Test templates, topology, and the architectural model feed the Builder, which produces a multi-threaded exerciser image; the image performs generation, execution, and checking on top of OS services.)
Thread Irritation on Post-Silicon
• Assigns system threads to primary and irritator scenario roles
  – Utilizes all of the system's threads (up to 1024 in a POWER7 system)
• The last primary thread of a combination kills the irritators with an interrupt
• Executes thousands of primary/irritation combinations in a single image
Example exerciser setup: 16 system threads (T0-T15), 10 primary scenarios (P0-P9), 10 irritation scenarios (I0-I9)
  – Threads T0, T1, T2, T3: 4-threaded primary scenarios P0,..P9
  – Threads T4, T5: single-thread irritation scenarios I0,..I4
  – Threads T6, T7,..T15: single-thread irritation scenarios I5,..I9
  – Number of combinations: 10 × 5 × 5 = 250
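The combination count is a Cartesian product over the three thread groups. The sketch below assumes, as the 10 × 5 × 5 count implies, that each group runs one scenario from its own list at a time; `irritation_combinations` is a hypothetical helper, not part of the exerciser.

```python
from itertools import product


def irritation_combinations(primaries, irritation_groups):
    """Enumerate primary/irritation combinations (hypothetical helper).

    Assumes each thread group runs one scenario from its own list at a time,
    which is what makes the count a simple product.
    """
    return list(product(primaries, *irritation_groups))


primaries = [f"P{i}" for i in range(10)]             # 4-threaded, on T0-T3
irritators_t4_t5 = [f"I{i}" for i in range(5)]       # I0..I4 on T4, T5
irritators_t6_t15 = [f"I{i}" for i in range(5, 10)]  # I5..I9 on T6..T15
combos = irritation_combinations(primaries, [irritators_t4_t5, irritators_t6_t15])
print(len(combos))  # 250
print(combos[0])    # ('P0', 'I0', 'I5')
```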
POWER7 Thread Irritation Bug Summary
• Exposed 23 "high quality" bugs in pre-silicon simulation
• Bug areas:
  – Flush related: 6
  – Hang: 5
  – Thread starvation: 5
  – Livelock: 4
  – Cache write transition: 2
  – Branch to wrong target EA: 1
• One "high quality" bug exposed by Thread Irritation in post-silicon
Additional SMT Verification Techniques
• Motivation
  – POWER7: four threads per core
  – MP-based technique ineffective for more than 2 threads
(Figure: Generation CPU Time Per Testcase, Unit Comparison (500 instructions per thread). CPU time in minutes (0-70) vs. number of threads (1, 2, 4) for the Fixed Point Unit, Instruction Fetch Unit, Load Store Unit, and Floating Point Unit.)
Additional SMT Verification Techniques – Thread Merge
• Thread Merge
  – Generate buckets of single-threaded testcases
    • Architectural resource separation
  – Merged into SMT testcases
• Advantages
  – Achieve a desired generation/simulation time ratio
    • Bucket size is a testcase generation parameter
  – Exercises single-threaded (ST) testcases under different micro-architecture conditions
(Figure: bucket T1 = {T11, T12, T13, …} and bucket T2 = {T21, T22, T23, …} are merged into two-thread testcases (T11,T21), (T11,T22), (T11,T23), …)
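The pairing shown in the figure reads like a cross product of the buckets; the sketch below assumes that reading. It also shows why bucket size tunes the generation/simulation ratio: generating n testcases per thread for k threads costs n·k generations but can yield up to n^k merged SMT testcases. Both helpers are hypothetical, and the resource-separation property is assumed rather than checked.

```python
from itertools import product


def merge_buckets(buckets):
    """Form SMT testcases by taking one single-threaded testcase from each bucket.

    Hypothetical sketch: assumes the buckets were generated with architectural
    resource separation (e.g. disjoint registers and memory per thread), so any
    combination can be merged without creating cross-thread conflicts.
    """
    return list(product(*buckets))


def generation_vs_simulation(bucket_size, threads):
    """ST testcases to generate vs. SMT testcases obtainable by the full cross product."""
    return bucket_size * threads, bucket_size ** threads


merged = merge_buckets([["T11", "T12", "T13"], ["T21", "T22", "T23"]])
print(merged[:3])                       # [('T11', 'T21'), ('T11', 'T22'), ('T11', 'T23')]
print(generation_vs_simulation(3, 2))   # (6, 9)
print(generation_vs_simulation(10, 4))  # (40, 10000)
```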
Additional SMT Verification Techniques – Thread Replication
• Thread Replication
  – Generate a single-threaded testcase
    • Each load/store instruction accesses a new memory location
  – Replicate for multiple threads
• Advantages
  – Achieve an optimal generation/simulation time ratio
  – Creates interesting SMT scenarios:
    • Threads execute the same instruction stream
      – Exercises instruction-cache logic
    • Threads stress the same resources
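One plausible reason for the fresh-location rule (our interpretation; the slide does not state it): if no address is accessed twice, no load can observe another replica's store, and every replica stores identical values, so the final state does not depend on how the replicas interleave. The toy model below checks this under hypothetical assumptions: private registers per replica, shared memory, and a random scheduler.

```python
import random


def uses_fresh_locations(testcase):
    """True if every load/store in the single-threaded testcase hits a distinct address."""
    addresses = [address for _, address, _ in testcase]
    return len(addresses) == len(set(addresses))


def run_replicated(testcase, n_threads, initial_memory, seed):
    """Run n_threads copies of one testcase under a random interleaving (toy model).

    Each replica has private registers; memory is shared.
    Instructions are (op, address, register) with op "load" or "store".
    """
    rng = random.Random(seed)
    memory = dict(initial_memory)
    registers = [{} for _ in range(n_threads)]
    pcs = [0] * n_threads
    while True:
        live = [t for t in range(n_threads) if pcs[t] < len(testcase)]
        if not live:
            return memory, registers
        t = rng.choice(live)                    # pick which replica issues next
        op, address, reg = testcase[pcs[t]]
        if op == "load":
            registers[t][reg] = memory.get(address, 0)
        else:                                   # "store"
            memory[address] = registers[t].get(reg, 0)
        pcs[t] += 1


testcase = [("load", 0x100, "r1"), ("store", 0x200, "r1"),
            ("load", 0x104, "r2"), ("store", 0x204, "r2")]
initial = {0x100: 7, 0x104: 9}
reference = run_replicated(testcase, 4, initial, seed=0)
print(uses_fresh_locations(testcase))  # True
print(all(run_replicated(testcase, 4, initial, seed=s) == reference for s in range(1, 20)))  # True
```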
Comparison of SMT testcase generation methods
(Table: the MP-Based, Irritation, Replication, and Merge methods rated + or - on Livelock, Starvation, True Sharing, Balanced Instr. Stream, Shared Instr. Stream, Thread Interaction, and Efficient Generation.)
Summary
• We explained the challenges in SMT verification
  – In-order design
  – Out-of-order design
• We described and analyzed three new SMT verification methods, scalable to any number of threads:
  – Thread Irritation
  – Thread Merge
  – Thread Replication
• We showed the effectiveness of the Thread Irritation technique
  – In simulation and post-silicon verification
• The new methods do not replace the MP-based verification technique, which is still needed for:
  – Architectural thread-synchronization constructs
  – True-sharing bug scenarios