Implementing a High Performance Tensor Library - Walter Landry

Recommend Documents

High-Performance Tensor Contractions for GPUs

person 0.1 ..... CPU and GPU tensors can be expressed through the function copy (List- .... Haidar, Karlin, Kolev, Masliah, and Tomov. 0. 20. 40. 60. 80. 100 .... ANL/MCS-P5361-0615, Mathematics and Computer Science, Arbonne ...

AutoMT, a library for tensor operations and its performance evaluation ...

Jul 1, 2015 - Department of Mechanical Engineering, Graduate School of Science and ... for âAuto (Automation)â-âM (Matrix)â-âT (Tensor)â and is a kind of DSL (Domain ... Therefore, AutoMT can accelerate computer program developments.

Implementing Molecular Dynamics on Hybrid High Performance ...

posing the work by atom or using a force-decomposition that evenly divides force .... gle desktop. ... decomposition, atom decomposition, force decomposition or.

Implementing a High Performance Work System ... - Semantic Scholar

Aug 28, 2006 - Engineering from Oklahoma State University and a BS in Physics from ... skills to perform their jobs in a more effective manner as well as the opportunity to assume ... work, people, technology and information in a manner that ...

Experiences Implementing a High-Performance TCP in ... - CiteSeerX

otherwise, to republish, to post servers or to redistribute to lists, requires prior specific permission and/or a fee. ... Experiences implementing a high performance TCP in user-space ...... networking then mechanisms have to be in place to check.

Implementing High Performance Multicast in a ... - Cornell University

puter Science, Cornell University, Ithaca, NY 14850, USA; Email: [email protected], .... while transmitting to R, the node can batch these messages ... Region. Senders. Send to B. Figure 3: To multicast to a group, QSM sends a copy to each ...

IMPLEMENTING TENSOR ANALYSIS IN Mathematica WITH ...

ABSTRACT (Maximum 200 words). Expressions that play roles in tensor analysis , such as Christofell symbols and curvature tensors, are coded as Mathematica.

Octave Landry - Semantic Scholar

Sep 17, 2013 - patients were an indication of his talents. From 1852,. Fig. 1. Octave Landry by Gustave Courbet (1819â1877) around. 1863. Downloaded by:.

Implementing Ultra High Performance Concrete (UHPC) - Iowa State ...

Implementing Ultra High Performance Concrete (UHPC) with Dowel Bars in Longitudinal ... truck load testing results of the shear keys and the dowel bars.

High Performance Field Emitters - Wiley Online Library

Feb 18, 2016 - Eon and Ethr, defining them as 10% and 30%, respectively, of a normalized .... proportion of the data points lie at the top, with high Î² , and to the left of the ..... [72] T. H. Seo , K. J. Lee , T. S. Oh , Y. S. Lee , H. Jeong , A.

DESIGNING AND IMPLEMENTING A PERFORMANCE ... - RUN

Still today, some small companies use random measures in their business ..... management accounting demands measuring wo

DESIGNING AND IMPLEMENTING A PERFORMANCE ... - RUN

preventing them to have information technology systems as developed as those of larger companies (Levy et al., .... Acco

Implementing performance measurement systems: a literature ...

This paper reviews the different performance measurement system design processes ... systems: a literature review', Int. J. Business Performance Management,. Vol. .... recourse to higher-level techniques, such as cost accounting. However .... Figu

Implementing a Comprehensive Performance Management Approach ...

Performance Management Approach in Community Policing Organizations: An ... Moving from a traditional performance system

RaftLib: A C++ Template Library for High Performance Stream Parallel ...

Stream processing or data-flow programming is a compute ..... Custom split reduce objects can be created by the ...... http://www.tiobe.com/index.php/content/.

TerraFly: A High-Performance Web-based Digital Library System for ...

This system utilizes a database engine, web technology, dynamic mosaicking of digital images, and a multithread server design developed by our group at the ...

A High-Performance Active Digital Library - LIGO - Caltech

providing high-performance data-retrieval and data-mining services, with ... allowing the computer to choose when to run a job rather than the human user doing ...

PyFAI: a Python library for high performance ...

This can mean running in a dedicated server (Tango device server,. LImA (Homs et al., 2011) where things must be fast and stable over weeks of operation and ...

High Performance OTFTs Fabricated Using a ... - Wiley Online Library

Jul 13, 2016 - Liquid Crystalline Material of 2-(4-Dodecyl phenyl)[1] ... when the C12-Ph-BTBT thin films are annealed at the SmX phase before the electrode ...

PyFAI: a Python library for high performance ... - Semantic Scholar

Dec 19, 2014 - PyFAI: a Python library for high performance azimuthal ... contribution describes how to convert an image into a radial profile using the.

A High-Performance Graph Processing Library on the ... - Google Sites

Gunrock of- fers: -the best performance on GPU graph analytics; ... build/lib/libgunrock.so'). ### read in .... â¢Gunro

PyFAI: a Python library for high performance azimuthal

Dec 19, 2014 - core systems like Graphical Processing Units (GPU) or on accelerators via ... it is an averaging of the signal along the azimuthal angle.

PM : A High-Performance Communication Library for Multi ... - CiteSeerX

larger number of sender nodes causes each window bu er size smaller and .... transmit" method or \Return To Sender [3]" which sends back a messages which ...

Limbo: A Flexible High-performance Library for ... - Open Journals

Jun 26, 2018 - The implementation of Limbo follows a policy-based design (Alexandrescu 2001) ... Modern C++ Design: Generic Programming and Design.

Implementing a High Performance Tensor Library - Walter Landry

Download PDF

48 downloads 10595 Views 289KB Size Report

Comment

We also want to contract tensors together that result in a tensor expression, such .... the compiler in the case where the programmer does know at compile-time ...

Implementing a High Performance Tensor Library Walter Landry University of Utah University of California San Diego 9500 Gilman Dr Mail Code 0424 La Jolla, CA 92093-0424 Tel:858-822-5053 [email protected] Abstract Template methods have opened up a new way of building C++ libraries. These methods allow the libraries to combine the seemingly contradictory qualities of ease of use and uncompromising eciency. However, libraries that use these methods are notoriously dicult to develop. This article examines the benets reaped and the diculties encountered in using these methods to create a friendly, high performance, tensor library. We nd that template methods mostly deliver on this promise, though requiring moderate compromises in usability and eciency.

1

2

1 Introduction Tensors are used in a number of scientic elds, such as geology, mechanical engineering, and astronomy. They can be thought of as generalizations of vectors and matrices. Consider the rather prosaic task of multiplying a vector P by a matrix T , yielding a vector Q 









Qx Txx Txy Txz Px       Qy  =  Tyx Tyy Tyz   Py  . Qz Tzx Tzy Tzz Pz If we write out the equations explicitly then

Qx = Txx Px + Txy Py + Txz Pz , Qy = Tyx Px + Tyy Py + Tyz Pz , Qz = Tzx Px + Tzy Py + Tzz Pz . Alternatively, we can write it as

Qx =

X

Txj Pj

j=x,y,z

Qy =

X

Tyj Pj

j=x,y,z

Qz =

X

Tzj Pj

j=x,y,z

or even more simply as

Qi =

X

Tij Pj ,

j=x,y,z

where the index i is understood to stand for x, y , and z in turn. In this example, Pj and Qi are vectors, but could also be called rank 1 tensors (because they have one index). Tij is a matrix, or a rank 2 tensor. The more indices, the higher the rank. So the Riemann tensor in General Relativity, Rijkl , is a rank 4 tensor, but can also be envisioned as a matrix of matrices. There are more subtleties involved in what denes a tensor, but it is sucient for our discussion to think of them as generalizations of vectors and matrices. Einstein introduced the convention that if an index appears in two tensors that multiply each other, then that index is implicitly summed. This mostly removes the need to write P the summation symbol j=x,y,z . Using this Einstein summation notation, the matrix-vector multiplication becomes simply Qi = Tij Pj . Of course, now that the notation has become so nice and compact, it becomes easy to write much more complicated formulas such as the denition of the Riemann tensor m i i i = dGijkl − dGilkj + Gm Rjkl jk Gml − Glk Gmj .

There are some subtle dierences between tensors with indices that are upstairs (like T i ), and tensors with indices that are downstairs (like Ti ), but for our purposes we can treat them

3 the same. Now consider evaluating this equation on an array with N points, where N is much larger than the cache size of the processor. We could use multidimensional arrays and start writing lots of loops

for(int n=0;n