Achieving Scalable Locality With Time Skewing - Computer Science ...

Recommend Documents

Increasing Temporal Locality with Skewing and Recursive ... - Rice CS

Dept. of Computer Science. Rice University. Houston, TX [email protected]. John Mellor-Crummey. Dept. of Computer Science. Rice University. Houston, TX.

Achieving Efficient Distributed Scheduling with ... - Computer Science

Achieving Efficient Distributed Scheduling with Message Queues in the Cloud ... Ramamurty, Ke Wang, Ioan Raicu .... higher number of jobs in order to improve the application ... scheduling/management system that satisfies four major.

Locality Preserving Feature Learning - Computer Science

Locality Preserving Feature Learning. Quanquan Gu. Marina Danilevsky. Zhenhui Li. Jiawei Han. Department of Computer Science. University of Illinois at ...

achieving scalable hardware verification with symbolic simulation

with John Davis. John has created a very ... that lasted long after my undergraduate studies and spurred many of the publications that led to this research work.

achieving scalable hardware verification with ... - Semantic Scholar

David Dill. I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of ...

Illumination Normalization with Time-dependent ... - Computer Science

Abstract. Cast shadows produce troublesome effects for video surveillance systems, typically for object tracking from a fixed viewpoint, since it yields appearance ...

Efficiently Achieving List Decoding Capacity - Computer Science ...

Chapter 14. Efficiently Achieving List Decoding Capacity. In the previous chapters, we have seen these results related to list decoding: • Reed-Solomon codes of ...

A Scalable Services Architecture - Cornell Computer Science

A Scalable Services Architecture. â. Tudor Marian. Ken Birman. Department of Computer Science. Cornell University, Ithaca, NY 14853. {tudorm,ken ...

Scalable Byzantine Agreement - UNM Computer Science

... examples where such malicious peers acting alone or in concert have wreaked havoc on a p2p ..... Information Processing Letters, 14(4):183â186, June 1982.

Scalable Leader Election - UNM Computer Science

corrupt), and (1âb)n of which are good, for some fixed b. Our goal is to design an algorithm that ensures a good. âDepartment of Computer Science, University ...

Scalable Elastic Systems Architecture - Computer Science - Boston ...

[email protected] ... software and wanted to use Amazon's EC2 HPC offering for an 8 ... massive cloud resources elastically, and system software and li-.

Exploiting Constructive Interference for Scalable ... - Computer Science

AbstractâExploiting constructive interference in wireless net- works is an emerging trend for it allows multiple senders transmit an identical packet ...

ACHIEVING SCALABLE PARALLEL MOLECULAR ... - CiteSeerX

a better decomposition increases in cost with the quality of prediction, while ..... MODELING PARALLEL MD DOMAIN DECOMPOSITIONS USING A BSP MODEL.

Scalable attribute-aware network embedding with locality - arXiv

Adding attributes for nodes to network embedding helps to improve the ability of .... world, and network analysis can benefit a lot from the co-analysis of network ...

Time Skewing Made Simple - Semantic Scholar

Frigo and Strumpen [3] introduced a cache oblivious time skewing scheme that can be cast into few lines of code and adapts auto- matically to the memory ...

SLAW: a Scalable Locality-aware Adaptive Work

Notice that T2 spawns T3 at S3, so ..... 1) Niagara 2: This system includes a 8-core 64-thread. 1.2GHz ... 2) Xeon SMP: This system includes four Quad-Core Intel ...... openmp tasks suite: A set of benchmarks targeting the exploitation of.

Scalable Locality-Conscious Multithreaded Memory Allocation - People

between 8 KB and 4MB, while the IBM Power4/Power5, Intel. Xeon and UltraSPARC processors provide two page sizes, a small page size of 4 or 8 KB and a ...

Characterizing Reference Locality in the WWW - Computer Science

This is demonstrated in Figure 1 for all 2135 documents accessed in the BU ..... software environments and found that the distributions of intermiss distances are ...

Characterizing Reference Locality in the WWW - Computer Science

We also show that spatial locality in a reference stream can be characterized ... Reference streams exhibiting temporal locality can benefit from caching and ...

Data Locality Optimization - LSU Computer Science - Louisiana State ...

[email protected]. Abstract. The goal of our project is the development of a program synthesis system to facilitate the development of high-performance ...

COMPUTER SCIENCE WITH C++

www.cppforschool.com. COMPUTER SCIENCE. WITH C++. FOR CBSE CLASS 12. Revision Notes. Part -I. [Structure, Class and Object, Constructor and.

Real-Time Android with RTDroid - Computer Science and Engineering

amine Android's application framework, which provides program- ming constructs ..... backs that the Android framework calls whenever there is any re- quested ...

Achieving Long-Term Surveillance in VigilNet - Computer Science

strategies in VigilNet, a major recent effort to support long- term surveillance using power-constrained sensor devices. We integrate a novel tripwire service with ...

Achieving distributed user access control in ... - Computer Science

design a scheme to counteract the access control system. Besides, we also addresses node duplication attacks and. DoS at

Achieving Scalable Locality With Time Skewing - Computer Science ...

Download PDF

6 downloads 0 Views 224KB Size Report

Comment

David Wonnacott. Haverford ...... [KMP+95] Wayne Kelly, Vadim Maslov, William Pugh, Evan Rosser, Tatiana Shpeisman, and David Wonnacott. The Omega.

Rutgers University Department of Computer Science Technical Report 378

Submitted 12 Feb 1999

Achieving Scalable Locality With Time Skewing David Wonnacott Haverford College Haverford, PA 19041

[email protected] Revised to October 9, 1998

Abstract The widening gap between processor speed and main memory speed has generated interest in compiletime optimizations to improve memory locality. The possibility that processors will continue to outpace memory systems raises the question of whether these techniques can be used to produce ever higher degrees of locality. In this article, we discuss techniques that can be used to produce \scalable locality": locality that can be adjusted for any ratio of processor and memory speeds. We identify a class of calculations for which existing techniques do not generally produce scalable locality, and give an algorithm for obtaining scalable locality for a subset of this class. We include empirical evidence that this transformation can provide dramatic speedups when processor speed is far higher than memory speed.

1

Introduction

The widening gap between processor speed and main memory speed has generated interest in compile-time optimizations to improve memory locality (the degree to which values are reused while still in cache [WL91]). A number of techniques have been developed to improve the locality of \scienti c programs" (programs that use loops to traverse large arrays of data) [GJ88, WL91, Wol92, MCT96, Ros98]. These techniques have generally been successful in achieving good performance on modern architectures. However, the possibility that processors will continue to outpace memory systems raises the question of whether these techniques can be \scaled" to produce ever higher degrees of locality. We say a calculation exhibits scalable locality if its locality can be made to grow at least linearly with the problem size while using cache memory that grows less than linearly with problem size. In this article, we show that some calculations cannot exhibit scalable locality, while others can (typically these require tiling). We then discuss the use of compile-time optimizations to produce scalable locality. We identify a class of calculations for which existing techniques do not generally produce scalable locality, and give an algorithm for obtaining scalable locality for a subset of this class. Our techniques make use of value-based dependence relations [PW93, Won95, PW98], which provide information about the ow of values in individual array elements among the iterations of a calculation. We initially ignore issues of cache interference and of spatial locality, and return to address these issues later. We de ne the balance of a calculation (or compute balance) as the ratio of operations performed to the total number of values involved in the calculation that are live at the start or end. This ratio measures the This

is supported by NSF grant CCR-9808694

1

for (int t = 0; t