A Black-box Approach to Understanding Concurrency in DaCapo Tomas Kalibera Matthew Mole Richard Jones Jan Vitek
Great tool for systems research. But…
DACAPO BENCHMARKS
• Open-source suite of Java application benchmarks
• Widely used for experimental evaluation in academic research (including GC/MM)
• Releases
  – 2006 – mostly single threaded
  – 2009 – new multi-threaded and scalable benchmarks
  – New release expected in 2012
DaCapo 2009
avrora
simulates a number of programs run on a grid of AVR microcontrollers
batik
produces a number of Scalable Vector Graphics (SVG) images based on the unit tests in Apache Batik
eclipse
executes some of the (non-gui) jdt performance tests for the Eclipse IDE
fop
takes an XSL-FO file, parses it and formats it, generating a PDF file.
h2
executes a JDBCbench-like in-memory benchmark, executing a number of transactions against a model of a banking application, replacing the hsqldb benchmark
jython
interprets the pybench Python benchmark
luindex
Uses lucene to index a set of documents: the works of Shakespeare and the King James Bible
lusearch
Uses lucene to do a text search of keywords over a corpus of data comprising the works of Shakespeare and the King James Bible
pmd
analyzes a set of Java classes for a range of source code problems
sunflow
renders a set of images using ray tracing
tomcat
runs a set of queries against a Tomcat server retrieving and verifying the resulting webpages
tradebeans
runs the daytrader benchmark via Java Beans to a GERONIMO backend with an in-memory h2 as the underlying database
tradesoap
runs the daytrader benchmark via SOAP to a GERONIMO backend with an in-memory h2 as the underlying database
xalan
transforms XML documents into HTML
http://dacapobench.org/benchmarks.html
Threading in DaCapo 2009
avrora
Driven by a single external thread, but it is internally multithreaded with each simulated element using a thread (i.e. each node in a grid of simulated nodes is threaded). Avrora demonstrates a high volume of fine granularity interactions between simulator threads.
eclipse
Driven by a single external thread, it is internally multithreaded. However, some worker thread activity seems to be serialised, while other threads seem to engage in some fine granularity interactions. As such, eclipse exhibits periods of little concurrency and brief periods of moderate granularity concurrency (further investigation is required).
h2
Multithreaded, it is driven by one client thread per hardware thread and internally has a server thread for each client thread as well as other support threads. The number of client threads for the default benchmark size is set at one per hardware thread.
…
http://dacapobench.org/threads.html
How ‘concurrent’ are the benchmarks REALLY?
• Threading
  – How many threads do a non-trivial portion of the work?
  – How many do so concurrently (vs. short-lived threads)?
• Communication
  – To what extent do threads share memory, acquire locks, access volatiles, or use wait/notify?
  – How do they share memory?
Byte-code instrumentation that scales to DaCapo.
HOW WE MEASURE JAVA APPS
Byte-code instrumentation
• Instrument byte-code of applications
  – Allocations, memory access, monitor operations
• Dynamic instrumentation
  – Load a Java agent before the “main” method and instrument classes already loaded
  – Instrument any classes loaded later
• Platform independence
  – Native code of the VM (GC) and the OS is excluded
  – Java libraries are included
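The dynamic instrumentation step can be sketched with the standard `java.lang.instrument` API. This is a minimal sketch, not the tool presented here: the class name `SketchAgent` and the counting transformer are illustrative assumptions, and the transformer leaves every class unchanged where a real tool would insert probes for allocations, memory accesses and monitor operations.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical agent class; names are illustrative assumptions.
class SketchAgent {

    // Counts the classes offered to it. A real tool would rewrite
    // classfileBuffer here to add its probes.
    static class CountingTransformer implements ClassFileTransformer {
        final AtomicLong seen = new AtomicLong();

        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            seen.incrementAndGet();
            return null; // null tells the JVM to keep the class bytes unchanged
        }
    }

    static final CountingTransformer TRANSFORMER = new CountingTransformer();

    // Invoked by the JVM before the application's main method when the
    // agent is loaded with the -javaagent option.
    public static void premain(String agentArgs, Instrumentation inst) throws Exception {
        boolean canRetransform = inst.isRetransformClassesSupported();
        // Once registered, the transformer sees every class loaded later.
        inst.addTransformer(TRANSFORMER, canRetransform);
        if (canRetransform) {
            // Classes loaded before the agent started are pushed through
            // the transformer again.
            for (Class<?> c : inst.getAllLoadedClasses()) {
                if (inst.isModifiableClass(c)) {
                    inst.retransformClasses(c);
                }
            }
        }
    }
}
```

To run such an agent, its jar manifest needs a `Premain-Class` attribute (and `Can-Retransform-Classes: true` for the retransformation branch), and the JVM is started with `-javaagent:` pointing at that jar.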
Instrumentation meta-data
• Per-thread counters
  – Count operations done by the thread
  – Fully thread-local
• Per-object state
  – Last thread that wrote it, whether it is a shared object, …
  – Stored in a hand-implemented hash-map
    • Indexed by object reference
    • Semi-thread-local, self-organising for performance
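The two kinds of meta-data can be illustrated as follows. This is a sketch under stated assumptions: the class `MetaDataSketch`, its fields, and the use of a globally locked `IdentityHashMap` are stand-ins chosen for brevity. The tool described above uses a hand-implemented, semi-thread-local hash-map instead, whose design is not shown on the slide.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical names throughout; a simplified stand-in, not the tool's code.
class MetaDataSketch {
    static final int ALLOC = 0, READ = 1, WRITE = 2, ENTRY = 3;

    // Per-thread counters: each thread only ever touches its own array.
    static final ThreadLocal<long[]> COUNTERS = new ThreadLocal<long[]>() {
        @Override protected long[] initialValue() { return new long[4]; }
    };

    // Per-object state (illustrative subset of what a tool might keep).
    static final class ObjectState {
        long firstAccessor = -1; // id of the first thread that touched the object
        long lastWriter = -1;    // id of the last thread that wrote it, -1 if none
        boolean shared;          // true once a second thread has touched it
    }

    // Keyed by reference identity, not equals(). Note that this simple map
    // keeps strong references and serialises all threads on one lock.
    private static final Map<Object, ObjectState> STATE =
            new IdentityHashMap<Object, ObjectState>();

    static void onRead(Object target)  { record(target, READ); }
    static void onWrite(Object target) { record(target, WRITE); }

    private static void record(Object target, int kind) {
        COUNTERS.get()[kind]++;
        long me = Thread.currentThread().getId();
        synchronized (STATE) {
            ObjectState s = STATE.get(target);
            if (s == null) {
                s = new ObjectState();
                s.firstAccessor = me;
                STATE.put(target, s);
            } else if (s.firstAccessor != me) {
                s.shared = true;
            }
            if (kind == WRITE) {
                s.lastWriter = me;
            }
        }
    }

    static boolean isShared(Object target) {
        synchronized (STATE) {
            ObjectState s = STATE.get(target);
            return s != null && s.shared;
        }
    }
}
```

The single lock makes the sketch easy to reason about but would distort the very concurrency being measured, which is presumably why a semi-thread-local structure is used in practice.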
Black-box metrics of concurrency.
WHAT WE MEASURE
As if we had a full execution trace…
• Record operations
  – allocation, memory read, memory write, monitor entry
• For each operation, keep
  – timestamp
  – target object
  – current thread
Our implementation, however, works on-the-fly.
Shared memory accesses
• Shared object
  – Object ever accessed by more than 1 thread
  – Shared read, write, entry
    • Read from a field/element of a shared object
• Spot-shared object
  – Object accessed by more than 1 thread recently
  – Spot-shared read, write, entry
    • Read from a field/element of a spot-shared object
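The difference between shared and spot-shared accesses can be made concrete for a single object. The sketch below rests on two assumptions that the slide does not fix: “recently” is modelled as a window in logical time (the `window` parameter is hypothetical), and, being on-the-fly, an access is counted as shared only from the moment a second thread has been seen.

```java
import java.util.HashMap;
import java.util.Map;

// Tracks accesses to ONE object; class and field names are illustrative.
class SharingSketch {
    private final long window; // hypothetical definition of "recently"
    private final Map<Long, Long> lastAccess = new HashMap<Long, Long>(); // thread id -> time

    long totalAccesses, sharedAccesses, spotSharedAccesses;

    SharingSketch(long window) { this.window = window; }

    void access(long threadId, long now) {
        totalAccesses++;
        boolean shared = false;
        boolean spotShared = false;
        for (Map.Entry<Long, Long> e : lastAccess.entrySet()) {
            if (e.getKey().longValue() != threadId) {
                shared = true; // some other thread has touched this object before
                if (now - e.getValue().longValue() <= window) {
                    spotShared = true; // ... and did so recently
                }
            }
        }
        if (shared) sharedAccesses++;
        if (spotShared) spotSharedAccesses++;
        lastAccess.put(threadId, now);
    }
}
```

With a window of 10, the accesses (thread 1, t=0), (thread 2, t=5), (thread 1, t=100) give two shared accesses but only one spot-shared access, because by t=100 thread 2's access is no longer recent.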
Alternating modifications
• “change of write-ownership of object, assuming monitor entry is also a kind of write”
• Alternating write
  – Write to an object that was last written/entered by another thread
• Alternating entry
  – Entry to …
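Counting alternating writes needs only one piece of per-object state: the thread that last wrote or entered the object. The following is a minimal sketch with hypothetical names. A monitor entry updates the write-owner, as stated above, but only alternating writes are counted, since the definition of alternating entry is truncated on the slide.

```java
// Tracks write-ownership of ONE object; names are illustrative assumptions.
class AlternationSketch {
    private long lastModifier = -1; // thread id of last writer/enterer, -1 if none yet
    long alternatingWrites;

    void write(long threadId) {
        // A write alternates if the previous write/entry came from another thread.
        if (lastModifier != -1 && lastModifier != threadId) {
            alternatingWrites++;
        }
        lastModifier = threadId;
    }

    void enter(long threadId) {
        // Monitor entry is treated as a kind of write: it takes over ownership.
        lastModifier = threadId;
    }
}
```

For the sequence write by thread 1, write by thread 1, entry by thread 2, write by thread 1, only the last write is alternating.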
Thread density
• How many threads significantly contribute to the work done by the program?
• Work
  – Operations: allocation, reads, writes, entries
  – Shared, spot-shared, alternating operations
• Significantly
  – the hottest threads that together cover 95% of the work
• Periodic density
  – the median density over short execution time intervals
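The density definition translates directly into a small computation over per-thread operation counts. This is a sketch under assumptions: the class and method names are hypothetical, the coverage threshold is passed as an integer percentage, and the length of the intervals used for periodic density is left to the caller because the slide does not specify it.

```java
import java.util.Arrays;

// Hypothetical helper; not the authors' implementation.
class ThreadDensity {

    // Smallest number of hottest threads whose work adds up to at least
    // coveragePercent of the total work.
    static int density(long[] workPerThread, int coveragePercent) {
        long total = 0;
        for (long w : workPerThread) total += w;
        if (total == 0) return 0;

        long[] sorted = workPerThread.clone();
        Arrays.sort(sorted); // ascending, so walk from the end

        long covered = 0;
        int threads = 0;
        for (int i = sorted.length - 1; i >= 0; i--) {
            covered += sorted[i];
            threads++;
            if (covered * 100 >= (long) coveragePercent * total) break;
        }
        return threads;
    }

    // Median of the per-interval densities; workPerInterval[i][t] is the
    // work done by thread t during interval i.
    static double periodicDensity(long[][] workPerInterval, int coveragePercent) {
        int n = workPerInterval.length;
        if (n == 0) return 0.0;
        int[] d = new int[n];
        for (int i = 0; i < n; i++) {
            d[i] = density(workPerInterval[i], coveragePercent);
        }
        Arrays.sort(d);
        return (n % 2 == 1) ? d[n / 2] : (d[n / 2 - 1] + d[n / 2]) / 2.0;
    }
}
```

For example, with per-thread work of 900, 60, 30 and 10 operations, the two hottest threads cover 960 of 1000 operations, so the density at 95% is 2. Four threads with equal work give a density of 4.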
Memory sharing patterns
• Read-only/write-only shared
  – Object only read (or only written)
• Stationary object (read, write)
  – “kind of immutable object”
  – Object not written after read
• Single-writer object
  – Object only written by 1 thread
• Same-owner read (write)
  – Access to object that was last accessed by current thread
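These patterns can be checked per object from the order of its reads and writes. The sketch below uses hypothetical names and makes one assumption explicit: an object that was never written is not counted as single-writer, which is one possible reading of “only written by 1 thread”.

```java
import java.util.HashSet;
import java.util.Set;

// Tracks the access pattern of ONE object; names are illustrative assumptions.
class PatternSketch {
    private boolean everRead, everWritten, writtenAfterRead;
    private final Set<Long> writers = new HashSet<Long>();
    private long lastAccessor = -1; // thread id of the previous access, -1 if none
    long sameOwnerAccesses;

    void read(long threadId) {
        everRead = true;
        note(threadId);
    }

    void write(long threadId) {
        if (everRead) writtenAfterRead = true; // breaks the stationary property
        everWritten = true;
        writers.add(threadId);
        note(threadId);
    }

    private void note(long threadId) {
        // Same-owner access: the previous access came from the same thread.
        if (lastAccessor == threadId) sameOwnerAccesses++;
        lastAccessor = threadId;
    }

    boolean readOnly()     { return everRead && !everWritten; }
    boolean writeOnly()    { return everWritten && !everRead; }
    boolean stationary()   { return !writtenAfterRead; }
    boolean singleWriter() { return writers.size() == 1; }
}
```

An object initialised by one thread and then only read by others is both stationary and single-writer. A later write by a second thread removes both properties.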
The real concurrency in DaCapo.
RESULTS
Normalized Rates (Operations Per Second)
Thread Density
Thread Density Only benchmarks with concurrent activity
Thread Density Only 2009 benchmarks with concurrent activity (measured on 4-core machine)
Percentage of Shared/Spot-Shared Accesses (concurrent 2009 benchmarks)
Sharing Patterns of Reads and Writes (concurrent 2009 benchmarks)
Rates of Synchronization Operations (Operations Per Second)
A few observations.
SUMMARY
Summary, observations
• Nearly all sharing is stationary, single-writer, or without ownership change.
• More reads than writes, particularly for statics. Reads dominate shared accesses.
• Static accesses are massively shared.
• Reads are to older objects than writes.
• Young objects do not dominate memory accesses. The chance that an access is shared increases with age.
• Volatile accesses are nearly as frequent as locks. Nearly half of static accesses are to volatiles.