Feb 25, 2012

Speculative Parallelization on GPGPUs

Min Feng

Rajiv Gupta

Laxmi N. Bhuyan

University of California, Riverside
{mfeng, gupta, bhuyan}@cs.ucr.edu

Abstract

This paper overviews the first speculative parallelization technique for GPUs that can exploit parallelism in loops even in the presence of dynamic irregularities that may give rise to cross-iteration dependences. The execution of a speculatively parallelized loop consists of five phases: scheduling, computation, misspeculation check, result committing, and misspeculation recovery. We perform the misspeculation check on the GPU to minimize its cost. We optimize the procedures of result committing and misspeculation recovery to reduce the result copying and recovery overhead. Finally, the scheduling policies are designed according to the types of cross-iteration dependences to reduce the misspeculation rate. Our preliminary evaluation was conducted on an NVIDIA Tesla C1060 hosted in an Intel(R) Xeon(R) E5540 machine. We use three benchmarks, of which two contain irregular memory accesses and one contains irregular control flows that can give rise to cross-iteration dependences. Our implementation achieves 3.6x-13.8x speedups for loops in these benchmarks.

Categories and Subject Descriptors D.3.4 [Programming Languages]: Processors – compilers

General Terms Performance
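The phases named in the abstract can be illustrated with a minimal sequential sketch in C. It is not the authors' GPU implementation. It assumes a loop whose reads and writes go through runtime index arrays P and Q, a hypothetical per-iteration function work, a fixed illustrative bound MAX_N, and recovery by plain sequential re-execution. The scheduling phase is omitted: all iterations are treated as one chunk, and on a GPU each iteration of the computation step would run as a separate thread.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_N 1024 /* illustrative bound on the chunk size (assumption) */

/* Hypothetical per-iteration work; the source does not specify it. */
static int work(int x) { return x + 1; }

/* Sequential reference: A[Q[i]] = work(A[P[i]]) for i in [0, n). */
void run_sequential(int *A, const int *P, const int *Q, size_t n) {
    for (size_t i = 0; i < n; i++)
        A[Q[i]] = work(A[P[i]]);
}

/*
 * Runs the same loop speculatively, as if all iterations were independent.
 * Returns the index of the first misspeculated iteration, or n if none.
 */
size_t run_speculative(int *A, const int *P, const int *Q, size_t n) {
    int buf[MAX_N];
    assert(n <= MAX_N);

    /* Computation: every iteration reads the pre-loop state of A and
     * buffers its result instead of writing it to A. */
    for (size_t i = 0; i < n; i++)
        buf[i] = work(A[P[i]]);

    /* Misspeculation check: iteration i read a stale value if some
     * earlier iteration j < i writes the element that i reads. */
    size_t first_bad = n;
    for (size_t i = 0; i < n && first_bad == n; i++) {
        for (size_t j = 0; j < i; j++) {
            if (Q[j] == P[i]) {
                first_bad = i;
                break;
            }
        }
    }

    /* Result committing: copy the buffered results of the correctly
     * speculated prefix into A, in iteration order. */
    for (size_t i = 0; i < first_bad; i++)
        A[Q[i]] = buf[i];

    /* Misspeculation recovery: re-execute the remaining iterations
     * sequentially on the committed state. */
    for (size_t i = first_bad; i < n; i++)
        A[Q[i]] = work(A[P[i]]);

    return first_bad;
}
```

For example, with P = {0, 1, 2} and Q = {1, 3, 0}, iteration 1 reads A[1], which iteration 0 writes, so run_speculative reports iteration 1 as the first misspeculation and still leaves A in the same state as run_sequential. Committing only the prefix before the first violation keeps the sketch simple; a real system would likely re-speculate on the remaining iterations rather than fall back to sequential execution.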

1. Introduction

Dynamic irregularities have been widely studied for high performance computing on General-Purpose Graphics Processing Units (GPGPUs, or GPUs for short) [3–5]. Existing work has focused on optimizing performance in the presence of such irregularities. In this work, however, we consider a new class of dynamic irregularities in loops that may cause cross-iteration dependences at runtime. The presence of such dynamic irregularities prevents existing techniques from parallelizing the loops for GPUs. In particular, we have identified two types of dynamic irregularities that may cause cross-iteration dependences to arise at runtime. We illustrate them with examples.

Dynamic irregular memory accesses are memory accesses whose patterns are unknown at compile time. They may result in infrequent cross-iteration dependences at runtime. Figure 1(a) shows an example, where each iteration of the loop reads A[P[i]] and writes to A[Q[i]]. The access patterns of A[P[i]] and A[Q[i]] are determined by the runtime values of the elements in arrays P and Q. It is possible that an element in

for (i=0; i