IMPLEMENTING BLAS LEVEL 3 ON THE CAP-II

PETER E. STRAZDINS AND RICHARD P. BRENT
Abstract. The Basic Linear Algebra Subprogram (BLAS) library is widely used in many supercomputing applications, and is used to implement more extensive linear algebra subroutine libraries such as LINPACK and LAPACK. The use of the BLAS aids the clarity, portability and maintenance of mathematical software. BLAS level 1 routines involve vector-vector operations, level 2 routines involve matrix-vector operations, and level 3 routines involve matrix-matrix operations.

To take advantage of the high degree of parallelism of architectures such as the CAP-II, BLAS level 3 routines are desirable. These routines are not I/O bound: for n × n matrices, the number of arithmetic operations is O(n³), whereas the number of I/O operations is only O(n²).

We are concerned with implementing BLAS level 3 for real matrices on the CAP-II, with emphasis on obtaining the highest possible performance without sacrificing numerical stability. While the CAP-II has many features that make it very well-suited for this purpose, there are also many new challenges in implementing BLAS level 3 on a distributed memory parallel computer (these issues are also currently being considered by the authors of BLAS-3, who designed it primarily for cache and shared memory architectures). One such challenge is the external interface: BLAS-3 subroutines can be called by the host program, with the CAP array used like a (very powerful) floating point unit. Alternatively, (CAP cell equivalents of) BLAS-3 subroutines may be called by the cell programs; this approach is more efficient but deviates from the BLAS standards. These issues are discussed.

We also discuss the high-level design of the basic parallel matrix multiplication, transposition and triangular matrix inversion algorithms to be used by the BLAS-3 subroutines. With the efficient row/column broadcast available on the CAP-II using wormhole routing, “semi-systolic” algorithms, with a low startup time, appear to be superior to other algorithms. It is hoped that input from the workshop can help to finalize details of the high-level design. On the lower levels of design, optimization of the cell program code (e.g. optimizing inner product or gaxpy loops, and use of the SPARC cache) must be considered. Also relevant is the degree of optimization possible with the available CAP-II cell program compilers.

1991 Mathematics Subject Classification. Primary 65Y05; Secondary 65F05, 65Y10.
Key words and phrases. BLAS, basic linear algebra subroutines, parallel computation, distributed memory, local memory, MIMD, Fujitsu AP1000, Gaussian elimination, partial pivoting, matrix multiplication.
© 1990, ANU and Fujitsu Laboratories Ltd. Copyright © 1993, R. P. Brent. rpb121a typeset using AMS-LaTeX.

Comments

Only the Abstract is given here. The full paper appeared as [4]. For related, later work, see [3, 5].
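To make the abstract's arithmetic-versus-I/O argument concrete (the constants below are ours, not the paper's): a straightforward n × n matrix multiplication C ← AB performs about 2n³ floating point operations on only 3n² matrix elements, so

    (2n³ flops) / (3n² words) = (2/3) n flops per word moved,

a ratio that grows with n, whereas level 1 and level 2 routines move O(1) words per flop. This is why level 3 routines can hide the cost of distributing data across the cells.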
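The paper's “semi-systolic” algorithms themselves are not reproduced on this page. As an illustration only, the C sketch below shows the general shape of a row/column-broadcast parallel matrix multiplication on a q × q process grid; modern MPI primitives stand in for the CAP-II's hardware row/column broadcast, and the block layout, function names and parameters are our assumptions, not the paper's code.

    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    /* Local block update C += A*B for nb x nb row-major blocks.
     * The inner loop is gaxpy-style: a scalar of A times a row of B. */
    static void local_gemm(int nb, const double *A, const double *B, double *C)
    {
        for (int i = 0; i < nb; i++)
            for (int k = 0; k < nb; k++) {
                double aik = A[(size_t)i * nb + k];
                for (int j = 0; j < nb; j++)
                    C[(size_t)i * nb + j] += aik * B[(size_t)k * nb + j];
            }
    }

    /* Row/column-broadcast matrix multiplication on a q x q process
     * grid.  Each process owns one nb x nb block of A, B and C.  At
     * step k, the processes in grid column k broadcast their A block
     * along their grid row, and the processes in grid row k broadcast
     * their B block down their grid column; every process then
     * accumulates the product of the two received blocks.  rowcomm and
     * colcomm are communicators over the process's grid row and grid
     * column, with ranks equal to the grid column and grid row indices
     * respectively (an assumed layout). */
    void bcast_matmul(int nb, int q, int myrow, int mycol,
                      const double *A, const double *B, double *C,
                      MPI_Comm rowcomm, MPI_Comm colcomm)
    {
        size_t bytes = (size_t)nb * nb * sizeof(double);
        double *Abuf = malloc(bytes);
        double *Bbuf = malloc(bytes);
        for (int k = 0; k < q; k++) {
            if (mycol == k) memcpy(Abuf, A, bytes);
            if (myrow == k) memcpy(Bbuf, B, bytes);
            MPI_Bcast(Abuf, nb * nb, MPI_DOUBLE, k, rowcomm); /* row broadcast    */
            MPI_Bcast(Bbuf, nb * nb, MPI_DOUBLE, k, colcomm); /* column broadcast */
            local_gemm(nb, Abuf, Bbuf, C);
        }
        free(Abuf);
        free(Bbuf);
    }

On the CAP-II, each of the two broadcasts above would instead be a single hardware row or column broadcast over the wormhole-routed network, which is why the startup cost of such algorithms is low.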
References

[1] R. P. Brent (editor), Proceedings of the Second Fujitsu-ANU CAP Workshop, Department of Computer Science, Australian National University, November 1991, 254 pp. rpb129.
[2] R. P. Brent and M. Ishii (editors), Proceedings of the First Fujitsu-ANU CAP Workshop, Fujitsu Research Laboratories, Kawasaki, Japan, November 1990, 205 pp. rpb123.
[3] R. P. Brent and P. E. Strazdins, “Implementation of the BLAS level 3 and Linpack benchmark on the AP 1000”, Fujitsu Scientific and Technical Journal 29, 1 (March 1993), 61–70. rpb136.
[4] P. E. Strazdins and R. P. Brent, “Implementing BLAS level 3 on the CAP-II”, in [2, paper 12, pp. 1–9]. rpb121.
[5] P. E. Strazdins and R. P. Brent, “The implementation of BLAS level 3 on the AP 1000: Preliminary report”, in [1, H1–H17]. rpb131.

Department of Computer Science and Computer Sciences Laboratory, Australian National University, Canberra, ACT 0200
E-mail address: {peter,rpb}@cs.anu.edu.au