Black-box Approach to Understanding Concurrency in DaCapo

Tomas Kalibera, Matthew Mole, Richard Jones, Jan Vitek

Great tool for systems research. But…

DACAPO BENCHMARKS

• Open-source suite of Java application benchmarks
• Widely used for experimental evaluation in academic research (including GC/MM)
• Releases
  – 2006: mostly single-threaded
  – 2009: new multi-threaded and scalable benchmarks
  – New release expected in 2012

DaCapo 2009

avrora

simulates a number of programs run on a grid of AVR microcontrollers

batik

produces a number of Scalable Vector Graphics (SVG) images based on the unit tests in Apache Batik

eclipse

executes some of the (non-gui) jdt performance tests for the Eclipse IDE

fop

takes an XSL-FO file, parses it and formats it, generating a PDF file.

h2

executes a JDBCbench-like in-memory benchmark, executing a number of transactions against a model of a banking application, replacing the hsqldb benchmark

jython

interprets the pybench Python benchmark

luindex

Uses lucene to index a set of documents: the works of Shakespeare and the King James Bible

lusearch

Uses lucene to do a text search of keywords over a corpus of data comprising the works of Shakespeare and the King James Bible

pmd

analyzes a set of Java classes for a range of source code problems

sunflow

renders a set of images using ray tracing

tomcat

runs a set of queries against a Tomcat server retrieving and verifying the resulting webpages

tradebeans

runs the daytrader benchmark via Java Beans to a GERONIMO backend with an in-memory h2 as the underlying database

tradesoap

runs the daytrader benchmark via SOAP to a GERONIMO backend with an in-memory h2 as the underlying database

xalan

transforms XML documents into HTML

http://dacapobench.org/benchmarks.html

Threading in DaCapo 2009

avrora

Driven by a single external thread, but it is internally multithreaded with each simulated element using a thread (i.e. each node in a grid of simulated nodes is threaded). Avrora demonstrates a high volume of fine granularity interactions between simulator threads.

eclipse

Driven by a single external thread, but internally multithreaded. However, some worker thread activity seems to be serialised, while other threads seem to engage in some fine granularity interactions. As such, eclipse exhibits periods of little concurrency and brief periods of moderate granularity concurrency (further investigation is required).

h2

Multithreaded: it is driven by one client thread per hardware thread and internally has a server thread for each client thread, as well as other support threads. The number of client threads for the default benchmark size is set to one per hardware thread.



http://dacapobench.org/threads.html

How “concurrent” are the benchmarks REALLY?
• Threading
  – How many threads do a non-trivial portion of the work?
  – How many do so concurrently (vs. short-lived threads)?
• Communication
  – To what extent do threads share memory, acquire locks, access volatiles, or use wait/notify?
  – How do they share memory?

Byte-code instrumentation that scales to DaCapo.

HOW WE MEASURE JAVA APPS

Byte-code instrumentation
• Instrument byte-code of applications
  – Allocations, memory access, monitor operations
• Dynamic instrumentation
  – Load Java agent before the “main” method and instrument classes already loaded
  – Instrument any classes loaded later
• Platform independence
  – Native code of the VM (GC) and the OS is excluded
  – Java libraries are included
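The dynamic-instrumentation steps above can be sketched with the standard `java.lang.instrument` API. This is only an illustration: the class name `CountingAgent` is hypothetical, the transformer passes every class through unchanged, and the slides do not say which byte-code library or agent structure the authors actually used.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical agent: it sees every class but rewrites none.
// Package-private to keep the sketch file-name independent;
// a deployed agent class is normally public.
class CountingAgent implements ClassFileTransformer {
    final AtomicInteger seen = new AtomicInteger();

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain domain, byte[] classfileBuffer) {
        seen.incrementAndGet();
        // A real tool would return rewritten byte code with hooks for
        // allocations, memory accesses and monitor operations.
        return null; // null means "leave this class unchanged"
    }

    // Invoked by the JVM before main when started with -javaagent:<jar>.
    public static void premain(String args, Instrumentation inst) throws Exception {
        CountingAgent agent = new CountingAgent();
        // Classes loaded from now on pass through the transformer.
        inst.addTransformer(agent, true);
        // Classes already loaded must be retransformed explicitly.
        if (inst.isRetransformClassesSupported()) {
            for (Class<?> c : inst.getAllLoadedClasses()) {
                if (inst.isModifiableClass(c)) {
                    inst.retransformClasses(c);
                }
            }
        }
    }
}
```

Retransformation only works if the agent jar's manifest sets `Premain-Class` and `Can-Retransform-Classes: true`.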

Instrumentation meta-data
• Per-thread counters
  – Count operations done by the thread
  – Fully thread-local
• Per-object state
  – Last thread that wrote it, whether it is a shared object, …
  – Stored in a hand-implemented hash-map
    • Indexed by object reference
    • Semi-thread-local, self-organising for performance
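A minimal sketch of the two kinds of meta-data, under stated assumptions: the names `Metadata`, `Counters` and `ObjectState` are hypothetical, and a synchronized `IdentityHashMap` stands in for the authors' hand-implemented hash-map, whose design the slides do not describe. Unlike a production tool, this map holds strong references and would keep dead objects alive.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class Metadata {
    // Per-thread counters: each thread only touches its own instance.
    static final class Counters { long allocations, reads, writes, entries; }
    static final ThreadLocal<Counters> COUNTERS =
            ThreadLocal.withInitial(Counters::new);

    // Per-object state, looked up by object identity (not equals()).
    static final class ObjectState {
        long firstAccessorId = -1; // -1: not accessed yet
        long lastWriterId = -1;    // -1: never written
        boolean shared;            // accessed by more than one thread
    }

    // Assumption: a plain synchronized map instead of a custom one.
    private static final Map<Object, ObjectState> STATE = new IdentityHashMap<>();

    static ObjectState stateOf(Object o) {
        synchronized (STATE) {
            return STATE.computeIfAbsent(o, k -> new ObjectState());
        }
    }

    // Hook that instrumented code would call before a field write.
    static void onWrite(Object target) {
        COUNTERS.get().writes++;
        long tid = Thread.currentThread().getId();
        ObjectState s = stateOf(target);
        synchronized (s) {
            if (s.firstAccessorId == -1) s.firstAccessorId = tid;
            else if (s.firstAccessorId != tid) s.shared = true;
            s.lastWriterId = tid;
        }
    }
}
```

Hooks for reads, allocations and monitor entries would follow the same shape as `onWrite`.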

Black-box metrics of concurrency.

WHAT WE MEASURE

As if we had a full execution trace…
• Record operations
  – allocation, memory read, memory write, monitor entry
• For each operation, keep
  – timestamp
  – target object
  – current thread
Our implementation, however, works on-the-fly.

Shared memory accesses
• Shared object
  – Object ever accessed by more than 1 thread
  – Shared read, write, entry
    • Read from a field/element of a shared object
• Spot-shared object
  – Object accessed by more than 1 thread recently
  – Spot-shared read, write, entry
    • Read from a field/element of a spot-shared object
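The two definitions above can be sketched as an on-the-fly classifier. Several things here are assumptions, not the authors' method: the slides do not define "recently", so the sketch uses a fixed time window; an object counts as shared from the second thread's first access onward; and `SharingTracker`, the integer object ids and the numeric thread ids are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class SharingTracker {
    enum Kind { LOCAL, SHARED, SPOT_SHARED }

    private static final class State {
        long lastThread;         // thread of the most recent access
        long lastTime;           // time of the most recent access
        long prevOtherTime = -1; // latest access by a thread other than lastThread
        boolean shared;          // ever accessed by more than one thread
    }

    private final long window;   // assumed meaning of "recently"
    private final Map<Integer, State> states = new HashMap<>();

    SharingTracker(long window) { this.window = window; }

    // Classifies one access (read, write or entry) to an object.
    Kind access(int objectId, long thread, long time) {
        State s = states.get(objectId);
        if (s == null) {
            s = new State();
            s.lastThread = thread;
            s.lastTime = time;
            states.put(objectId, s);
            return Kind.LOCAL;
        }
        long otherTime; // latest access by a thread other than this one
        if (thread != s.lastThread) {
            s.shared = true;
            otherTime = s.lastTime;
            s.prevOtherTime = s.lastTime;
            s.lastThread = thread;
        } else {
            otherTime = s.prevOtherTime;
        }
        s.lastTime = time;
        if (!s.shared) return Kind.LOCAL;
        boolean spot = otherTime >= 0 && time - otherTime <= window;
        return spot ? Kind.SPOT_SHARED : Kind.SHARED;
    }
}
```

Every spot-shared access is also a shared access; the enum reports the more specific kind.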

Alternating modifications
• “change of write-ownership of an object, assuming monitor entry is also a kind of write”
• Alternating write
  – Write to an object that was last written/entered by another thread
• Alternating entry
  – Entry to …
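A minimal sketch of alternating-write detection, following the definition above: a write is alternating if the object was last written or entered by a different thread. Only writes are classified, because the definition of alternating entry is elided in the slides; entries merely update the owner. The class name `OwnershipTracker` and the integer object ids are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class OwnershipTracker {
    // Last thread that wrote to or entered the monitor of each object.
    private final Map<Integer, Long> lastOwner = new HashMap<>();
    long alternatingWrites;

    // A monitor entry is treated as a kind of write: it takes ownership.
    void onEntry(int objectId, long thread) {
        lastOwner.put(objectId, thread);
    }

    // Returns true if this write changes the write-ownership of the object.
    boolean onWrite(int objectId, long thread) {
        Long prev = lastOwner.put(objectId, thread);
        boolean alternating = prev != null && prev != thread;
        if (alternating) alternatingWrites++;
        return alternating;
    }
}
```

The first write to an object is not counted as alternating, since there is no previous owner.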

Thread density
• How many threads significantly contribute to work done by the program?
• Work
  – Operations: allocation, reads, writes, entries
  – Shared, spot-shared, alternating operations
• Significantly
  – hottest threads to cover 95% of work
• Periodic density
  – the median density over short execution time intervals
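The density definition above can be sketched as follows: sort threads by work, then count how many of the hottest ones are needed to reach the coverage threshold. The class and method names are hypothetical, and averaging the two middle values for an even number of intervals is an assumption; the slides only say "median".

```java
import java.util.Arrays;

class ThreadDensity {
    // Smallest number of hottest threads whose combined operation
    // count reaches the given fraction (e.g. 0.95) of all work.
    static int density(long[] opsPerThread, double coverage) {
        long[] sorted = opsPerThread.clone();
        Arrays.sort(sorted); // ascending; hottest thread is last
        long total = 0;
        for (long v : sorted) total += v;
        if (total == 0) return 0;
        long covered = 0;
        int count = 0;
        for (int i = sorted.length - 1; i >= 0; i--) {
            covered += sorted[i];
            count++;
            if (covered >= coverage * total) break;
        }
        return count;
    }

    // Median of the per-interval densities; each row holds the
    // per-thread operation counts of one short time interval.
    static double periodicDensity(long[][] opsPerInterval, double coverage) {
        int n = opsPerInterval.length;
        if (n == 0) return 0;
        int[] d = new int[n];
        for (int i = 0; i < n; i++) d[i] = density(opsPerInterval[i], coverage);
        Arrays.sort(d);
        return n % 2 == 1 ? d[n / 2] : (d[n / 2 - 1] + d[n / 2]) / 2.0;
    }
}
```

For example, per-thread counts of 970, 10, 10 and 10 give a density of 1, because the hottest thread alone covers 97% of the work, while four threads with 250 operations each give a density of 4.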

Memory sharing patterns
• Read-only/write-only shared
  – Object only written (or only read)
• Stationary object (read, write)
  – “kind of immutable object”
  – Object not written after read
• Single-writer object
  – Object only written by 1 thread
• Same-owner read (write)
  – Access to object that was last accessed by current thread
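The patterns above can be tracked with a small per-object summary that is updated on every access. This is a sketch under assumptions: the class name `ObjectPattern` is hypothetical, thread ids are assumed to be non-negative, and the slides do not show how the authors store or combine these flags.

```java
class ObjectPattern {
    private boolean read, written, writtenAfterRead, multipleWriters;
    private long firstWriter = -1;  // -1: never written
    private long lastAccessor = -1; // -1: never accessed
    long sameOwnerAccesses;

    // Records one read or write of this object by the given thread.
    void access(long thread, boolean isWrite) {
        // Same-owner: the previous access came from the same thread.
        if (thread == lastAccessor) sameOwnerAccesses++;
        lastAccessor = thread;
        if (isWrite) {
            if (read) writtenAfterRead = true;
            if (!written) firstWriter = thread;
            else if (firstWriter != thread) multipleWriters = true;
            written = true;
        } else {
            read = true;
        }
    }

    boolean isReadOnly()     { return read && !written; }
    boolean isWriteOnly()    { return written && !read; }
    boolean isStationary()   { return !writtenAfterRead; }
    boolean isSingleWriter() { return written && !multipleWriters; }
}
```

An object initialised by one thread and then only read by others is both stationary and single-writer; a later write by a second thread clears both properties.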

The real concurrency in DaCapo.

RESULTS

Normalized Rates (Operations Per Second)

Thread Density

Thread Density (only benchmarks with concurrent activity)

Thread Density (only 2009 benchmarks with concurrent activity, measured on a 4-core machine)

Percentage of Shared/Spot-Shared Accesses (concurrent 2009 benchmarks)

Sharing Patterns of Reads and Writes (concurrent 2009 benchmarks)

Rates of Synchronization Operations (Operations Per Second)

A few observations.

SUMMARY

Summary, observations
• Nearly all sharing is stationary, single-writer, or without ownership change.
• More reads than writes, particularly for statics. Reads dominate shared accesses.
• Static accesses are massively shared.
• Reads are to older objects than writes.
• Young objects do not dominate memory accesses. The chance that an access is shared increases with age.
• Volatile accesses are nearly as frequent as locks. Nearly half of static accesses are to volatiles.
