Performance Studies of Shared-Nothing Parallel Transaction Processing Systems

Jie Li
Institute of Information Sciences & Electronics
University of Tsukuba, Tsukuba Science City, Japan
Contents
• Introduction
• Analytic Model
• Computer Simulation Study
• Scheduling Algorithms
• The WDPP (Wait-Depth Priority Protocol) CC Algorithm
• Concluding remarks and future work
Background
• Demands of parallel transaction processing (TP)
  • Traditional On-Line Transaction Processing (OLTP) applications: more and more users
    • Banking, airline reservation, ...
  • Internet-based electronic commerce: accessible to anyone in the world
    • Home shopping, home banking, ...
• High-performance parallel TP systems: no matter how much the transaction volume grows, the system should scale to meet demand
  • Support thousands of processors and millions of device connections.
  • Scale throughput to accommodate high transaction processing rates.
Two Performance Factors for TP systems
• The degree of contention for hardware resources
  • Hardware resource contention: competition for hardware resources
  • Hardware resource requirements can be met cost-effectively with a shared-nothing parallel TP architecture
• The degree of contention for data resources
  • Data contention: competition for data resources
  • Concurrency control (CC): Two-Phase Locking (2PL)
  • Scheduling algorithms: FCFS (First-Come-First-Served), SCST (Synchronizing Completion of Sub-Transactions)
Performance Factors for Parallel TP
• The Degree of Declustering (DD)
  • Declustering a relation in a database means deciding the number of nodes over which the relation is spread.
  • An essential factor that reflects the degree of parallelism.
• Two other factors
  • Additional overheads caused by parallel processing.
  • Concurrency control (CC) algorithms, e.g., Two-Phase Locking (2PL).
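The role of the degree of declustering can be illustrated with a small sketch. The slides do not say which partitioning function is used, so the round-robin mapping below (granule id modulo DD) is an assumption, and the names `decluster` and `split_transaction` are hypothetical.

```python
def decluster(num_granules, dd):
    """Spread granule ids 0..num_granules-1 over dd nodes.

    Round-robin placement is an assumption; the slides do not
    specify the partitioning function.
    """
    if dd < 1:
        raise ValueError("degree of declustering must be at least 1")
    partitions = [[] for _ in range(dd)]
    for granule in range(num_granules):
        partitions[granule % dd].append(granule)
    return partitions


def split_transaction(granules, dd):
    """Group the granules a transaction touches by owning node.

    Each group corresponds to one sub-transaction that runs on the
    node holding those granules.
    """
    subs = {}
    for granule in granules:
        subs.setdefault(granule % dd, []).append(granule)
    return subs


if __name__ == "__main__":
    print(decluster(10, 3))
    print(split_transaction([0, 1, 4, 9], 3))
```

A larger DD spreads the same transaction over more nodes, which raises the available parallelism but also the number of sub-transactions that must be coordinated.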
Shared-nothing parallel transaction processing (TP) systems
[Figure: frontend nodes connected through an interconnection network to processing elements (PEs); each PE has its own processor (P), memory (M), and disk.]
PE: Processing Element; P: Processor; M: Memory
Examples: NonStop SQL of Tandem, Bubba of MCC, Gamma of the University of Wisconsin
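Parallel transaction processing model
[Figure: users send requests to, and receive responses from, frontend nodes (labeled node z+1, with a transaction manager, TM). The frontends connect through an interconnection network to z nodes, each with its own processor and memory. The D data granules are divided into partitions 1 to z (P1, P2, ..., Pz), one per node.]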
Transactions arriving at the system are accepted and started by transaction managers that reside at the frontend nodes. The results are also routed by these frontend nodes back to the users. Each frontend node holds an identical copy of a global directory for each relation in the database, which records which PE holds which data of that relation.

A transaction consists of several sub-transactions. Each sub-transaction works on the PE where the data it uses reside, and all sub-transactions are executed in parallel. Sub-transactions are assumed not to need data from each other. A sub-transaction is modeled as a sequence of data-processing steps, and the number of steps is called the size of the sub-transaction. Each step involves a lock request for the granule to be accessed, followed by a disk access for the granule and a period of CPU usage for processing it.

The 2PL CC method is used to resolve data contention and to maintain data consistency and integrity. To ensure transaction atomicity, two-phase commit (2PC) is applied.
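The step model described above can be written down as a short calculation. This is only a sketch under simplifying assumptions that are not in the slides: every step costs the same, there is no queueing or lock contention, and the cost of 2PC is ignored. The names `StepCost`, `sub_transaction_time`, and `transaction_time` are hypothetical, and the millisecond values in the usage lines are illustrative.

```python
from dataclasses import dataclass


@dataclass
class StepCost:
    """Cost of one data-processing step, in milliseconds."""
    lock_ms: float   # time to request the lock on the granule
    disk_ms: float   # time to read the granule from disk
    cpu_ms: float    # CPU time to process the granule

    @property
    def total_ms(self):
        return self.lock_ms + self.disk_ms + self.cpu_ms


def sub_transaction_time(size, step):
    """A sub-transaction of the given size runs `size` steps in sequence."""
    return size * step.total_ms


def transaction_time(sizes, step):
    """Sub-transactions run in parallel, one per PE, so the transaction
    is ready to commit when its largest sub-transaction finishes."""
    return max(sub_transaction_time(k, step) for k in sizes)


if __name__ == "__main__":
    step = StepCost(lock_ms=0.0, disk_ms=13.0, cpu_ms=1.0)  # assumed values
    print(transaction_time([3, 2, 1], step))
```

In this contention-free view the largest sub-transaction determines the response time, which is why the smaller siblings spend time waiting before the commit.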
Flow diagram used to characterize parallel TP systems under dynamic 2PL with a no-waiting policy
[Figure: a transaction Tx is split into sub-transactions Tx1, Tx2, ..., Txz of sizes k1, k2, ..., kz. At node i the sub-transaction repeats the following k_i times: request a lock, which is either granted (OK) or encounters a conflict; if granted, access the disk for the granule, then process the granule with the CPU.]

Let k_1 ≥ k_2 ≥ ... ≥ k_f.

[Figure: flow diagram for characterizing a parallel TP system. Transactions pass from the initiating stage (T_TM) through the states N_{f0}, ..., N_{fi}, ..., N_{f,k_f+1} at each node f to the committing stage (T_cmt).]
a: abort rate
t: throughput
N_{fi}: the number of sub-transactions holding i locks at node f
N_{f,k_f+1}: the number of sub-transactions holding k_f locks and waiting for their siblings for 2PC
T_TM: mean residence time of a transaction at the initiating stage
T_cmt: mean residence time of a transaction at the committing stage
M: MPL
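The no-waiting policy named in the first diagram can be sketched as a lock table in which a conflicting request never queues. This is a minimal illustration, not the model used in the study: it assumes exclusive locks only and that a conflicting sub-transaction gives up all its locks at once. The class `NoWaitLockTable` and its methods are hypothetical.

```python
class NoWaitLockTable:
    """Exclusive locks on granules with a no-waiting conflict policy."""

    def __init__(self):
        self._owner = {}  # granule id -> id of the transaction holding it
        self._held = {}   # transaction id -> set of granule ids it holds

    def request(self, txn, granule):
        """Try to lock `granule` for `txn`.

        Returns True if the lock is granted. On a conflict the requester
        does not wait: it releases everything it holds and the call
        returns False, so the caller can abort and restart.
        """
        owner = self._owner.get(granule)
        if owner is not None and owner != txn:
            self.release_all(txn)
            return False
        self._owner[granule] = txn
        self._held.setdefault(txn, set()).add(granule)
        return True

    def release_all(self, txn):
        """Release every lock held by `txn` (at commit or abort)."""
        for granule in self._held.pop(txn, set()):
            del self._owner[granule]

    def locks_held(self, txn):
        return len(self._held.get(txn, ()))


if __name__ == "__main__":
    table = NoWaitLockTable()
    table.request("T1", 5)
    print(table.request("T2", 5))  # conflict with T1, so T2 must abort
```

Locks are acquired one step at a time and released together, which matches the dynamic 2PL discipline; the count returned by `locks_held` plays the role of the index i in N_{fi}.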
An assumption
• If a sub-transaction encounters a lock conflict, its siblings can be informed of the fact with little delay compared to the duration of a transaction step.
• Example: processor speed 100 MIPS, network bandwidth 100 Mbps, disk access 13 ms
  • The duration of a transaction step (Cs) is at least 14 ms.
  • A lock conflict message is broadcast to all related nodes from the node where the conflict occurs. The message is assigned a high scheduling priority.
  • The communication time delay (Cc) is at most 0.1 ms.
  • Cc/Cs ≤ 0.1/14 ≈ 0.0071
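The ratio above can be checked with a few lines of arithmetic. The slides give only the bounds Cs ≥ 14 ms and Cc ≤ 0.1 ms; the 1000-byte message size used below to show that such a delay is plausible on a 100 Mbps network is an assumption, as are the function names.

```python
def transmission_ms(message_bytes, bandwidth_bps):
    """Time to put a message on the wire, in milliseconds.

    Ignores propagation and software overhead, so it is a lower
    bound on the real communication delay.
    """
    return message_bytes * 8 / bandwidth_bps * 1000


def delay_ratio(cc_ms, cs_ms):
    """Communication delay relative to the duration of one step."""
    return cc_ms / cs_ms


if __name__ == "__main__":
    # Assumed message size of 1000 bytes on the 100 Mbps network.
    wire = transmission_ms(1000, 100e6)
    print(f"wire time for an assumed 1000-byte message: {wire:.2f} ms")

    # Bounds stated on the slide: Cc <= 0.1 ms, Cs >= 14 ms.
    print(f"Cc/Cs = {delay_ratio(0.1, 14.0):.4f}")
```

Because the ratio stays below one percent, treating the notification of siblings as instantaneous changes the step timing very little.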