Software Assistance for Data Caches - Inria


Software Assistance for Data Caches

O. Temam
PRiSM Laboratory, Versailles University, 78 Versailles, France
Email: [email protected]

N. Drach
LRI, Orsay University, 91 Orsay, France
Email: [email protected]

Abstract

Hardware and software cache optimizations are active fields of research that have yielded powerful but occasionally complex designs and algorithms. The purpose of this paper is to investigate the performance of combined, though simple, software and hardware optimizations. Because current caches provide little flexibility for exploiting temporal and spatial locality, two hardware modifications are proposed to support these two kinds of locality. Spatial locality is exploited by using large virtual cache lines, which do not exhibit the performance flaws of large physical cache lines. Temporal locality is exploited by minimizing cache pollution with a bypass mechanism that still allows spatial locality to be exploited. Subsequently, it is shown that simple software information on the spatial/temporal locality of array references, as provided by current data locality optimization algorithms, can be used to increase cache performance significantly. The performance and design tradeoffs of the proposed mechanisms are discussed. Software-assisted caches are also shown to provide very convenient support for further enhancement of data locality optimizations.

Keywords: software-assisted caches, data locality, numerical codes.

1 Introduction

This paper derives from several observations on application codes, cache designs and state-of-the-art compiler-optimizers. Let us first discuss the spatial and temporal locality properties of numerical codes. With respect to temporal reuse, figure 1a shows the reuse distance distribution of the traced memory references for the numerical benchmarks used in this paper (0 corresponds to data referenced only once). First, it appears that a sizable amount of data is used only once or very few times, so that techniques for hiding compulsory misses are required. It also appears that reuse distances are often larger than 1000 references, while for these same traces the average lifetime of a cache line in an 8-kbyte cache with a 32-byte cache line is approximately equal to 2500 references. So, for these codes the temporal reuse is likely to be disrupted by cache pollution. With respect to spatial reuse, figure 1b shows the average vector length of requests issued by load/store instructions (footnote 1). This vector length often proves to be larger than the cache line size currently used in small on-chip caches (32 bytes). In other words, there is

This work was supported by the Esprit Agency DG XIII under Grant No. APPARC 6634 BRA III.

Footnote 1: A vector sequence terminates when the instruction has not been used for more than 500 references (i.e., a value much smaller than the average lifetime of a cache line), or when the stride is greater than 32 bytes (i.e., the corresponding spatial locality would not be exploited with a cache line size of 32 bytes).
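The two trace measurements described above, reuse distance and the vector-sequence rule from the footnote, can be sketched as follows. This is only an illustrative sketch, not the authors' tracing tool. The trace format (a list of `(pc, address)` pairs), the function names, the 8-byte access size, and the choice to measure vector length as the number of bytes spanned are all assumptions. The 500-reference and 32-byte thresholds are the ones stated in the footnote.

```python
from collections import defaultdict

MAX_GAP = 500     # references an instruction may stay idle (from the footnote)
MAX_STRIDE = 32   # bytes; a larger stride ends the sequence (from the footnote)


def reuse_distances(addresses):
    """Return one distance per reuse: references elapsed since the
    previous access to the same address (assumed definition)."""
    last_seen = {}
    distances = []
    for index, addr in enumerate(addresses):
        if addr in last_seen:
            distances.append(index - last_seen[addr])
        last_seen[addr] = index
    return distances


def vector_lengths(trace, access_size=8):
    """trace: iterable of (pc, address) pairs, one per load/store reference.

    Returns a dict mapping each pc to the list of vector lengths, in bytes,
    of the sequences that instruction issued. access_size is hypothetical.
    """
    open_seq = {}                 # pc -> [last_index, last_addr, length_bytes]
    lengths = defaultdict(list)
    for index, (pc, addr) in enumerate(trace):
        seq = open_seq.get(pc)
        if seq is not None:
            last_index, last_addr, length = seq
            stride = abs(addr - last_addr)
            if index - last_index > MAX_GAP or stride > MAX_STRIDE:
                # Sequence terminates: record it and start a new one below.
                lengths[pc].append(length)
                seq = None
            else:
                open_seq[pc] = [index, addr, length + stride]
        if seq is None:
            open_seq[pc] = [index, addr, access_size]
    # Flush sequences still open at the end of the trace.
    for pc, (_, _, length) in open_seq.items():
        lengths[pc].append(length)
    return dict(lengths)
```

For example, an instruction that walks three consecutive 8-byte elements and then jumps far away produces one 24-byte sequence followed by a new sequence; averaging such lengths per instruction gives a quantity of the kind plotted in figure 1b, under the assumptions listed above.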

