An Optimizing Compiler for GPGPU Programs with Input-Data Sharing

Yi Yang, Dept. of ECE, North Carolina State University
Ping Xiang, School of EECS, University of Central Florida
Jingfei Kong, School of EECS, University of Central Florida
Huiyang Zhou, Dept. of ECE, North Carolina State University
[email protected]
[email protected]
[email protected]
[email protected]
Abstract

Developing high-performance GPGPU programs is challenging for application developers, since performance depends on how well the code leverages the hardware features of a specific graphics processor. To solve this problem and relieve application developers of low-level, hardware-specific optimizations, we introduce a novel compiler for optimizing GPGPU programs. Our compiler takes a naive GPU kernel function, which is functionally correct but written without any consideration for performance. The compiler then analyzes the code, identifies memory access patterns, and generates optimized code. The proposed compiler optimizations target one category of scientific and media processing algorithms, characterized by input-data sharing when computing neighboring output pixels/elements. Many commonly used algorithms, such as matrix multiplication and convolution, share this characteristic. For these algorithms, novel approaches are proposed to enforce memory coalescing and achieve effective data reuse. Data prefetching and hardware-specific tuning are also performed automatically within our compiler framework. Experimental results on a set of applications show that our compiler achieves very high performance, either superior or very close to that of the highly fine-tuned library, NVIDIA CUBLAS 2.1.
The input to our compiler is a naive GPU kernel function, which simply identifies the fine-grain work item that can be executed in parallel. A typical candidate is the computation of one data element in the output domain. The code segment shown in Figure 1 is such a naive kernel for matrix multiplication, which calculates one element of the product matrix. This kernel is functionally correct but does not include any optimization for performance. Our compiler analyzes the naive kernel, checks the off-chip memory access patterns, and converts non-coalesced memory accesses into coalesced ones. The compiler then identifies possible data sharing across neighboring threads and thread blocks. Based on the data sharing pattern, the compiler intelligently merges threads and/or thread blocks to improve memory reuse. Additionally, the compiler schedules the code to enable data prefetching, so as to overlap computation with memory access latency.

float sum = 0;
for (int i = 0; i < n; i++)
    sum += a[idy * n + i] * b[i * n + idx];
c[idy * n + idx] = sum;