On Efficient Storage of Sparse Matrices

Shahadat Hossain
Department of Mathematics and Computer Science
University of Lethbridge, Canada

Computing by the Numbers: Algorithms, Precision, and Complexity
Matheon Workshop 2006 in honor of the 60th birthday of Richard Brent
Weierstrass Institute for Applied Analysis and Stochastics, Berlin
Outline
- Introduction and background
- Sparse matrix storage schemes: Compressed Row, Jagged Diagonal
- Performance issues
- Concluding remarks and future directions
Challenges in Large-scale Numerical Computing

High-performance computing challenges:
- Climate models, e.g., the Community Climate System Model
- High-temperature superconductor model calculations

Supercomputer systems:
- Superscalar, e.g., SGI Altix, IBM Power and Power4
- Parallel vector supercomputers, e.g., Earth Simulator, Cray X1

A widening gap between sustained and peak performance!
Krylov Subspace Methods

Solve Ax = b for x ∈ R^n, where A ∈ R^(n×n) is large and sparse and b ∈ R^n.

Main computational tasks in a Krylov subspace method (e.g., Conjugate Gradient):
1. sparse matrix times vector (MVP): w = Av
2. dot product: c = x^T y
3. vector update (saxpy): y = y + αx, α ∈ R
4. preconditioning operation
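To show where these kernels sit in an actual solver, the following is a minimal sketch (not from the talk) of an unpreconditioned Conjugate Gradient iteration in C; the names cg, spmv_fn, dot, and axpy are illustrative, and spmv stands for whichever sparse MVP routine the chosen storage scheme provides, e.g., the CRS, JDS, or TJDS kernels shown later.

    #include <math.h>

    typedef void (*spmv_fn)(const void *A, const double *v, double *w, int n);

    static double dot(const double *x, const double *y, int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++) s += x[i] * y[i];
        return s;
    }

    static void axpy(double a, const double *x, double *y, int n) {
        for (int i = 0; i < n; i++) y[i] += a * x[i];   /* y = y + a*x */
    }

    /* r, p, w are caller-supplied workspace of length n; spmv overwrites w. */
    void cg(spmv_fn spmv, const void *A, const double *b, double *x,
            double *r, double *p, double *w, int n, double tol, int maxit) {
        spmv(A, x, w, n);                               /* w = A x0 */
        for (int i = 0; i < n; i++) { r[i] = b[i] - w[i]; p[i] = r[i]; }
        double rs = dot(r, r, n);
        for (int it = 0; it < maxit && sqrt(rs) > tol; it++) {
            spmv(A, p, w, n);                           /* kernel 1: sparse MVP */
            double alpha = rs / dot(p, w, n);           /* kernel 2: dot product */
            axpy( alpha, p, x, n);                      /* kernel 3: saxpy updates */
            axpy(-alpha, w, r, n);
            double rs_new = dot(r, r, n);
            for (int i = 0; i < n; i++)                 /* p = r + beta*p */
                p[i] = r[i] + (rs_new / rs) * p[i];
            rs = rs_new;
        }
    }

Each iteration costs one sparse MVP, two dot products, and three vector updates, which is why the MVP kernel dominates and its storage scheme matters.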
Exploiting Sparsity

Target architecture: vector supercomputers, e.g., the Earth Simulator, Cray X1.

1. Regular structure, e.g., banded: store the diagonals; this can be implemented efficiently with excellent performance (flops). (A minimal diagonal-storage MVP is sketched below.)
2. General structure: either adapt the diagonal storage scheme or reorder the matrix to obtain a diagonal structure.

Observation: on vector machines, long vectors can improve the performance of the MVP calculation significantly.
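As an illustration of item 1 above, here is a minimal sketch of the MVP in a diagonal storage layout, under the assumption that each stored diagonal is kept at full length n together with its offset; dia_mvp, diags, and offset are hypothetical names, not from the talk.

    /* Hypothetical diagonal layout: ndiag stored diagonals of an n x n matrix.
       diags[d*n + i] holds A(i, i + offset[d]); positions falling outside the
       matrix are stored as zeros. */
    void dia_mvp(int n, int ndiag, const double *diags, const int *offset,
                 const double *v, double *b) {
        for (int i = 0; i < n; i++) b[i] = 0.0;
        for (int d = 0; d < ndiag; d++) {
            int off = offset[d];
            int lo = (off < 0) ? -off : 0;        /* first valid row index */
            int hi = (off > 0) ? n - off : n;     /* one past the last valid row */
            /* long inner loop over a whole diagonal: vectorizes well */
            for (int i = lo; i < hi; i++)
                b[i] += diags[d * n + i] * v[i + off];
        }
    }

The inner loop runs over an entire diagonal, giving the long vectors the observation above calls for.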
Compressed Row Storage (CRS)

Example: the 5 × 5 "arrow-head" matrix

    a00  a01  a02  a03  a04
    a10  a11   0    0    0
    a20   0   a22   0    0
    a30   0    0   a33   0
    a40   0    0    0   a44

After CRS compression the nonzero entries are stored row by row:

    value  = [ a00 a01 a02 a03 a04  a10 a11  a20 a22  a30 a33  a40 a44 ]
    colind = [  0   1   2   3   4    0   1    0   2    0   3    0   4  ]
    rowptr = [  0   5   7   9  11  13 ]

Figure 1: (a) Sparsity pattern of the 5 × 5 "arrow-head" matrix. (b) The "arrow-head" matrix after CRS compression. (c) Compressed Row Storage (CRS) data structure for the "arrow-head" matrix.
MVP in CRS

    for (int i = 0; i < m; i++) {
        upper = rowptr[i+1];                    // fetch the upper index
        // loop over row i
        for (int j = rowptr[i]; j < upper; j++) {
            b[i] = b[i] + value[j] * v[colind[j]];
        }
    }

Figure 2: Matrix-vector multiplication in the CRS scheme.

Most rows contain only a few nonzero entries (ρi) compared with the matrix dimension; ρi ≪ n implies a short vector in the inner for-loop.
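As a concrete check, the following self-contained driver (an illustrative sketch, not part of the slides) runs the CRS kernel above on the arrow-head example, with every aij set to 1.0:

    #include <stdio.h>

    /* CRS data for the 5 x 5 arrow-head matrix; all entries a_ij are set
       to 1.0 purely for illustration. */
    int    rowptr[] = {0, 5, 7, 9, 11, 13};
    int    colind[] = {0, 1, 2, 3, 4,  0, 1,  0, 2,  0, 3,  0, 4};
    double value[]  = {1, 1, 1, 1, 1,  1, 1,  1, 1,  1, 1,  1, 1};

    int main(void) {
        int m = 5;
        double v[5] = {1, 2, 3, 4, 5};
        double b[5] = {0, 0, 0, 0, 0};
        for (int i = 0; i < m; i++) {
            int upper = rowptr[i+1];                 // fetch the upper index
            for (int j = rowptr[i]; j < upper; j++)  // loop over row i
                b[i] += value[j] * v[colind[j]];
        }
        for (int i = 0; i < m; i++) printf("b[%d] = %g\n", i, b[i]);
        return 0;  /* expected output: b = [15, 3, 4, 5, 6] */
    }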
Jagged Diagonal Storage (JDS)

In JDS the rows are first permuted so that the number of nonzero entries per row is non-increasing; each row's nonzeros are then shifted to the left, and the resulting columns (the "jagged diagonals") are stored one after another: value holds the entries, colind the original column indices, jdptr the start of each jagged diagonal, and perm the original row order. For the arrow-head matrix this gives two full-length jagged diagonals followed by three of length one, so jdptr = [ 0 5 10 11 12 13 ].

Figure 3: (a) Sparsity pattern of the 5 × 5 row-permuted "arrow-head" matrix. (b) The row-permuted "arrow-head" matrix after JDS compression. (c) Jagged diagonal storage (JDS) data structure for the "arrow-head" matrix.
MVP in JDS

    for (int j = 0; j < rho_max; j++) {
        upper = jdptr[j+1];   // fetch the upper index
        ind = 0;              // position within the diagonal = permuted row index
        // loop over jagged diagonal j
        for (int i = jdptr[j]; i < upper; i++) {
            col = colind[i];
            b[ind] = b[ind] + value[i] * v[col];
            ind++;
        }
    }

Figure 4: Matrix-vector multiplication in the JDS scheme.

Needs an m-vector to store the original row order; extra work is required to restore the correct order in the result.
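That restoration step might look like the following sketch, assuming perm[i] holds the original index of permuted row i (the slides do not spell out the convention):

    /* Undo the JDS row permutation: b holds the product in permuted row
       order; scatter it back into the result vector in original order. */
    for (int i = 0; i < m; i++)
        b_orig[perm[i]] = b[i];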
Transposed Jagged Diagonal Storage (TJDS)

TJDS applies the same idea column-wise: the nonzeros of each column are shifted upward, and the resulting rows (the "transposed jagged diagonals") are stored one after another, with rowind recording the row index of each entry. For the arrow-head matrix:

    value  = [ a00 a01 a02 a03 a04  a10 a11 a22 a33 a44  a20  a30  a40 ]
    rowind = [  0   0   0   0   0    1   1   2   3   4    2    3    4  ]
    tjdptr = [  0   5  10  11  12  13 ]

Figure 5: (a) The "arrow-head" matrix after the columns have been compressed. (b) Transposed jagged diagonal storage (TJDS) data structure for the "arrow-head" matrix.
MVP in TJDS

    for (int i = 0; i < col_max; i++) {
        upper = tjdptr[i+1];  // fetch the upper index
        ind = 0;              // index to iterate over v's elements
        // loop over transposed jagged diagonal i
        for (int j = tjdptr[i]; j < upper; j++) {
            row = rowind[j];
            b[row] = b[row] + value[j] * v[ind];
            ind++;
        }
    }

Figure 6: Matrix-vector multiplication in the TJDS scheme.

No permutation vector is needed in the TJDS scheme.
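The slides do not show how TJDS is assembled; one plausible construction, sketched below under the assumption of CSC input (colptr, rowind_csc, val_csc) whose columns already appear in non-increasing order of nonzero count (true for the arrow-head example), peels the k-th nonzero off every column in turn:

    /* Illustrative sketch, not from the slides: build TJDS arrays from CSC.
       In general the columns (and v) must first be permuted by decreasing
       nonzero count; that step is omitted here. */
    void csc_to_tjds(int n, const int *colptr, const int *rowind_csc,
                     const double *val_csc,
                     double *value, int *rowind, int *tjdptr) {
        int pos = 0, level = 0, placed = 1;
        tjdptr[0] = 0;
        while (placed) {          /* level k = the k-th nonzero of each column */
            placed = 0;
            for (int c = 0; c < n; c++) {
                if (colptr[c] + level < colptr[c+1]) { /* column c has a k-th entry */
                    value[pos]  = val_csc[colptr[c] + level];
                    rowind[pos] = rowind_csc[colptr[c] + level];
                    pos++;
                    placed = 1;
                }
            }
            level++;
            if (placed) tjdptr[level] = pos;  /* end of transposed diagonal k */
        }
    }

Running this on the CSC form of the arrow-head matrix reproduces the value, rowind, and tjdptr arrays shown above.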
Bi Jagged Diagonal Storage (Bi-JDS) – A New Sparse Matrix Storage Scheme

1. JDS and TJDS are suitable for MVP calculation on vector machines.
2. The number of nonzero entries per row may vary considerably in a general sparse matrix; e.g., under JDS or TJDS the arrow-head matrix is stored as many short jagged diagonals.

The Bi-JDS scheme partitions a sparse matrix into two segments and compresses the matrix in both the column and row directions.
Bi-JDS Data Structure

For the arrow-head matrix, row compression yields two full-length jagged diagonals, and the three entries left over in row 0 are stored by column compression:

    value      = [ a00 a10 a20 a30 a40  a01 a11 a22 a33 a44  a02 a03 a04 ]
    row_colind = [  0   0   0   0   0    1   1   2   3   4    0   0   0  ]
    row_min = 2,  rcol_max = 1,  rcol_min = 1

(row_colind holds column indices for the full jagged diagonals and row indices for the transposed part; tjdptr marks the start of each transposed jagged diagonal.)

Figure 7: (a) Bi-JDS partition of the 5 × 5 "arrow-head" matrix. (b) The "arrow-head" matrix after column and row compression. (c) Bi-jagged diagonal (Bi-JDS) data structure for the "arrow-head" matrix.
MVP with Bi-JDS

    // The first rho_min jagged diagonals are full-length, so no jdptr array is needed.
    for (int j = 0; j < rho_min; j++) {
        ind = 0;
        for (i = m*j; i < m*(j+1); i++) {
            col = row_colind[i];
            b[ind] = b[ind] + value[i] * v[col];
            ind++;
        }
    }
    // Now update the product with the contribution from the transposed jagged diagonals.
    k = m * rho_min;  // row indices of the transposed segment follow the m*rho_min column indices
    for (int i = 0; i < rcol_max; i++) {
        ind = minrow + 1;  // index into v for the transposed segment
        for (int indtjd = tjdptr[i]; indtjd < tjdptr[i+1]; indtjd++) {
            row = row_colind[k];
            b[row] = b[row] + value[indtjd] * v[ind];
            ind = ind + 1;
            k = k + 1;
        }
    }
Storage Requirements

    memory_JDS   = 2 nnz(A) + ρmax + m + 1
    memory_TJDS  = 2 nnz(A) + κmax + 1
    memory_BiJDS = 2 nnz(A) + ntjd + 1

where
    ntjd = number of transposed jagged diagonals in the column-compressed segment,
    ρmax = maximum number of nonzero entries in any row of A,
    κmax = maximum number of nonzero entries in any column of A,
    κmin = minimum number of nonzero entries in any column of A.
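As a worked check on the arrow-head example, assuming ntjd counts the transposed jagged diagonals of the column-compressed segment (here ntjd = 1): with nnz(A) = 13, m = 5, and ρmax = κmax = 5,

    memory_JDS   = 2·13 + 5 + 5 + 1 = 37
    memory_TJDS  = 2·13 + 5 + 1     = 32
    memory_BiJDS = 2·13 + 1 + 1     = 28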
Computational Experiments

Table 1: The Harwell-Boeing Test Problems

    Nr.  Name      m     n     nnz    ρmax  ρmin  κmax  κmin
    a.   can144    144   144   720    15    6     15    6
    b.   can838    838   838   5424   32    6     32    6
    c.   can1054   1054  1054  6625   35    6     35    6
    d.   can1072   1072  1072  6758   35    6     35    6
    e.   add20     2395  2395  17319  124   2     124   2
    f.   bfw398a   398   398   3678   23    3     21    3
    g.   bfw398b   398   398   2910   13    2     13    2
    h.   bfw782a   782   782   5982   13    2     13    2
Computational Experiments

Figure 8: Properties of the jagged diagonals in sparse storage schemes, comparing JDS, TJDS, and Bi-JDS on problems (a)-(d). [Bar charts: "Full jagged diagonals" (y-axis: percent full) and "Non-full jagged diagonals" (y-axis: number of nonzeros), for (i) the Cannes problems and (ii) the misc. problems.]
Computational Experiments

Figure 9: Estimate of storage requirements in sparse storage schemes. [Bar charts of storage space for JDS, TJDS, and BiJDS on problems (a)-(d): left, the Cannes problems; right, the misc. problems.]
Concluding Remarks
1. The Bi-JDS scheme produces more full-length jagged diagonals, as well as longer non-full jagged diagonals, than the JDS or TJDS schemes.
2. The Bi-JDS scheme is also space efficient and does not need to store the permutation order of the rows of the matrix.
3. It can be combined with a blocking strategy to reduce indirect memory accesses.
4. Future work: extensive numerical tests on practical problems.