A Modern Framework for Portable High Performance Numerical Linear Algebra
A Thesis
Submitted to the Graduate School of the University of Notre Dame in Partial Fulfillment of the Requirements for the Degree of
Master of Science in Computer Science and Engineering
by
Jeremy G. Siek, B.S.
Andrew Lumsdaine, Director
Department of Computer Science and Engineering
Notre Dame, Indiana
April 1999
A Modern Framework for Portable High Performance Numerical Linear Algebra
Abstract

by Jeremy G. Siek

This thesis describes a generic programming methodology for expressing data structures, algorithms, and optimizations for numerical linear algebra. A high-performance implementation of this approach, the Matrix Template Library (MTL), is also described. The goal of the MTL is to facilitate the development of higher-level libraries and applications for scientific computing. In addition, the programming techniques developed in this thesis are widely applicable and can be used to reduce development costs, improve readability, and improve the performance of many kinds of software. Portable high performance is a particular focus of the MTL: flexible kernels were constructed that provide an automated tool for cross-architecture performance portability.
This is for all the code warriors in scientific computing.
Contents

Tables
Figures
Acknowledgements

Chapter 1: Introduction

Chapter 2: Generic Programming
  2.1 Generic Programming and the Standard Template Library
  2.2 Generic Programming for Linear Algebra

Chapter 3: Related Work
  3.1 Traditional Basic Linear Algebra Libraries
  3.2 Automatically Tuned Dense Linear Algebra
    3.2.1 The Optimizing Compiler Approach
    3.2.2 The Library Approach
  3.3 C++ Libraries for Linear Algebra
  3.4 Generic Programming and Software Engineering

Chapter 4: MTL Algorithms
  4.1 Pointwise LU Factorization Example
  4.2 Blocked LU Factorization Example

Chapter 5: MTL Components
  5.1 Domain Analysis of Matrix Storage Formats
    5.1.1 Matrix Element Type
    5.1.2 Matrix Shape
    5.1.3 Matrix Storage
    5.1.4 OneD Storage
  5.2 Component Selection and Generation
    5.2.1 Template Metaprogramming
  5.3 MTL Concepts
    5.3.1 Matrix
    5.3.2 Vector
    5.3.3 IndexedIterator
    5.3.4 Indexer
    5.3.5 OneDIndexer
    5.3.6 Offset
  5.4 MTL Object Memory Model
  5.5 The MTL Component Architecture
  5.6 TwoD Storage Classes
    5.6.1 dense2D
    5.6.2 compressed2D
    5.6.3 array2D
    5.6.4 envelope2D
  5.7 OneD Containers/Vectors
    5.7.1 dense1D
    5.7.2 compressed1D
  5.8 Adaptors
    5.8.1 sparse1D
    5.8.2 Scaling Adaptors
    5.8.3 Striding Adaptors

Chapter 6: High Performance
  6.1 Mayfly Components
  6.2 High Performance Iterators
  6.3 High Performance & Template Metaprogramming
  6.4 Fixed Algorithm Size Template (FAST) Library
  6.5 Basic Linear Algebra Instruction Set (BLAIS)
  6.6 BLAIS in a General Matrix-Matrix Product

Chapter 7: Iterative Template Library (ITL)
  7.1 Generic Interface
  7.2 Ease of Implementation
  7.3 ITL Performance

Chapter 8: Performance Experiments
  8.1 Dense Matrix-Matrix Multiplication
  8.2 Dense and Sparse Matrix-Vector Multiplication
  8.3 Performance Analysis of Matrix-Matrix Multiplication

Chapter 9: Testing

Chapter 10: Future Work and Conclusion
  10.1 Future Work
    10.1.1 MTL User Interface
    10.1.2 MTL Functionality
    10.1.3 Higher Level Libraries
    10.1.4 New Language
    10.1.5 MTL for Advanced Parallel Architectures
  10.2 Conclusion

Bibliography

Appendix A: Containers
  A.1 Concepts
    A.1.1 Matrix
    A.1.2 RowMatrix
    A.1.3 ColumnMatrix
    A.1.4 DiagonalMatrix
    A.1.5 Vector
    A.1.6 TwoDStorage
  A.2 Container type generators
    A.2.1 matrix< T, Shape = rectangle, Storage = dense, Orientation = row_major >
    A.2.2 band_view
    A.2.3 block_view
    A.2.4 symmetric_view
    A.2.5 triangle_view
  A.3 Container type selectors
    A.3.1 rectangle
    A.3.2 symmetric
    A.3.3 hermitian
    A.3.4 banded
    A.3.5 triangle
    A.3.6 diagonal
    A.3.7 array
    A.3.8 dense
    A.3.9 compressed
    A.3.10 packed
    A.3.11 banded_view
    A.3.12 envelope
    A.3.13 linked_list
    A.3.14 sparse_pair
    A.3.15 tree
  A.4 Container classes
    A.4.1 dense1D
    A.4.2 compressed1D
    A.4.3 external_vec
    A.4.4 generic_dense2D
    A.4.5 dense2D
    A.4.6 external2D
    A.4.7 generic_comp2D
    A.4.8 compressed2D
    A.4.9 ext_comp2D
    A.4.10 array2D
  A.5 Container adaptors
    A.5.1 linalg_vec
    A.5.2 scaled1D
    A.5.3 sparse1D
    A.5.4 strided1D
    A.5.5 scaled2D
    A.5.6 block2D
  A.6 Container functions
    A.6.1 scaled
    A.6.2 strided
    A.6.3 rows
    A.6.4 columns
    A.6.5 trans
    A.6.6 blocked
    A.6.7 blocked
  A.7 Container tags
    A.7.1 banded_tag
    A.7.2 column_matrix_traits
    A.7.3 column_tag
    A.7.4 dense_tag
    A.7.5 diagonal_matrix_traits
    A.7.6 diagonal_tag
    A.7.7 external_tag
    A.7.8 hermitian_tag
    A.7.9 internal_tag
    A.7.10 linalg_traits
    A.7.11 matrix_traits
    A.7.12 not_strideable
    A.7.13 oned_tag
    A.7.14 rectangle_tag
    A.7.15 row_matrix_traits
    A.7.16 row_tag
    A.7.17 sparse_tag
    A.7.18 strideable
    A.7.19 symmetric_tag
    A.7.20 triangle_tag
    A.7.21 twod_tag

Appendix B: Iterators
  B.1 Concepts
    B.1.1 IndexedIterator
  B.2 Iterator functions
    B.2.1 trans_iter
  B.3 Iterator adaptors
    B.3.1 dense_iterator

Appendix C: Algorithms
  C.0.6 sum
  C.0.7 set
  C.0.8 scale
  C.0.9 set_diagonal
  C.0.10 two_norm
  C.0.11 one_norm
  C.0.12 infinity_norm
  C.0.13 max_index
  C.0.14 max
  C.0.15 min
  C.0.16 transpose
  C.0.17 transpose
  C.0.18 mult
  C.0.19 mult
  C.0.20 mult
  C.0.21 tri_solve
  C.0.22 tri_solve
  C.0.23 rank_one_update
  C.0.24 rank_two_update
  C.0.25 copy
  C.0.26 add
  C.0.27 add
  C.0.28 add
  C.0.29 ele_mult
  C.0.30 ele_mult
  C.0.31 ele_div
  C.0.32 swap
  C.0.33 dot
  C.0.34 dot
  C.0.35 dot_conj
  C.0.36 dot_conj
  C.0.37 lu_factorize

Appendix D: Function Objects
  D.0.38 givens_rotation
  D.0.39 givens_rotation
  D.0.40 modified_givens

Appendix E: Iterative Template Library
  E.1 Concepts
    E.1.1 Iteration
    E.1.2 Preconditioner
  E.2 Algorithms
    E.2.1 cg
    E.2.2 cgs
    E.2.3 bicg
    E.2.4 gmres
    E.2.5 bicgstab
    E.2.6 qmr
    E.2.7 tfqmr
    E.2.8 gcr
    E.2.9 cheby
    E.2.10 richardson
  E.3 Preconditioners
    E.3.1 ILU
    E.3.2 ILUT
    E.3.3 SSOR
    E.3.4 cholesky< Matrix >

Appendix F: Fixed Algorithm Size Template (FAST) Library
  F.0.5 copy
  F.0.6 transform
  F.0.7 transform
  F.0.8 fill
  F.0.9 swap_ranges
  F.0.10 accumulate
  F.0.11 accumulate
  F.0.12 inner_product
  F.0.13 inner_product
  F.0.14 count

Appendix G: Basic Linear Algebra Instruction Set (BLAIS) Library
  G.0.15 add
  G.0.16 copy
  G.0.17 copy
  G.0.18 dot
  G.0.19 mult
  G.0.20 mult
  G.0.21 rank_one
  G.0.22 set
  G.0.23 set

Appendix H: MTL to LAPACK Interface
  H.0.24 lapack_matrix
  H.1 Functions
    H.1.1 gecon
    H.1.2 geev
    H.1.3 geqpf
    H.1.4 geqrf
    H.1.5 gesv
    H.1.6 getrf
    H.1.7 getrs
    H.1.8 geequ
    H.1.9 gelqf
    H.1.10 orglq
    H.1.11 orgqr

Appendix I: Utilities
  I.1 Concepts
    I.1.1 Indexer
    I.1.2 Offset
  I.2 Functions
    I.2.1 read_dense_matlab
    I.2.2 write_dense_matlab
    I.2.3 read_sparse_matlab
    I.2.4 write_sparse_matlab
  I.3 Classes
    I.3.1 dimension
    I.3.2 harwell_boeing_stream
    I.3.3 matrix_market_stream
Tables

1.1 Breakdown of personal accomplishments vs. others' related work and work used in this thesis.
2.1 Excerpt from the STL random-access iterator requirements.
3.1 Summary of C++ Libraries for Linear Algebra.
4.1 MTL generic linear algebra algorithms.
4.2 MTL adaptor classes and helper functions for creating algorithm permutations.
4.3 Permutations of the add() operation made possible with the use of the scaled() adaptor helper function.
5.1 Matrix associated types, in addition to those of Container.
5.2 Matrix method requirements, in addition to those of Container.
5.3 Vector requirements, in addition to those of Container.
5.4 IndexedIterator requirements.
5.5 Indexer requirements.
5.6 OneDIndexer requirements.
5.7 Offset requirements.
6.1 The effect of iterator and comparison operator choice on performance (in Mflops) for dot product on Sun C, IBM XLC, and SGI C compilers.
9.1 MTL test suite results summary.
Figures

2.1 Separation of containers and algorithms using iterators.
2.2 The TwoD iterator concept.
2.3 Simplified example of a generic matrix-vector product, with a comparison to the traditional approach to writing dense and sparse matrix-vector products.
4.1 LU factorization pseudo-code.
4.2 Diagram for LU factorization.
4.3 Complete MTL version of pointwise LU factorization.
4.4 Pointwise step in block LU factorization.
4.5 Update steps in block LU factorization.
4.6 MTL version of block LU factorization.
5.1 Feature diagram for common matrix formats.
5.2 The MTL Matrix configuration grammar.
5.3 Example of a banded matrix with bandwidth (1,2).
5.4 Example of a symmetric matrix with bandwidth (2,2).
5.5 Example of the dense matrix storage format.
5.6 Example of the banded matrix storage format.
5.7 Example of the packed matrix storage format.
5.8 Example of the compressed column matrix storage format.
5.9 Example of the array matrix storage format with dense and with sparse_pair OneD storage types.
5.10 Example of the envelope matrix storage format.
5.11 Non-intrusive reference counting pointer implementation.
5.12 The MTL implementation layer components.
6.1 A recursive matrix-matrix product algorithm.
7.1 An ITL method interface example.
7.2 Example use of the ITL QMR iterative method.
7.3 Comparison of an algorithm for the preconditioned conjugate gradient method and the corresponding ITL code.
7.4 Comparison of ITL and IML++ performance over six matrices.
8.1 Performance comparison of generic dense matrix-matrix product with other libraries on Sun UltraSPARC (upper) and IBM RS6000 (lower).
8.2 Performance of generic matrix-vector product applied to column-oriented dense (upper) and row-oriented sparse (lower) data structures compared with other libraries on Sun UltraSPARC.
Acknowledgements
Thanks go to Andrew Lumsdaine, my advisor and originator of the MTL vision. I thank my father, Richard Siek, for instilling in me a love of good software engineering practices and object-oriented programming. I thank my mother, Elisabeth Siek, for always encouraging creativity and imagination. I thank Alexander Stepanov and David Musser for their inspirational Standard Template Library, and also Bjarne Stroustrup for designing a language with enough flexibility and expressiveness to make MTL not only possible but also efficient. I would like to thank Brian McCandless for his contributions to the first version of MTL. Thanks also go to all of my wonderful colleagues in the Laboratory for Scientific Computing who helped with this thesis in innumerable ways. In addition, I would like to thank the Philosophy and Humanities professors I studied under at Notre Dame, who gave me an appreciation of how concepts can be made to fit together to form coherent structures.
Chapter 1

Introduction

Software construction for scientific computing is a difficult task. Scientific codes are often large and complex, requiring vast amounts of domain knowledge for their construction. They also process large data sets, so there is an additional requirement for efficiency and high performance. Considerable knowledge of modern computer architectures and compilers is required to make the necessary optimizations, which is a time-intensive task and further complicates the code.

The last decade has seen significant advances in the area of software engineering. New techniques have been created for managing software complexity and building abstractions. Underneath the layers of new terminology (object-oriented, generic [51], aspect-oriented [40], generative [17], metaprogramming [55]) there is a core of solid work that points the way for constructing better software for scientific computing: software that is portable, maintainable, and achieves high performance at a lower development cost.

One important key to better software is better abstractions. With the right abstractions each aspect of the software (domain specific, performance optimization, parallel communication, data structures, etc.) can be cleanly separated, then handled on an individual basis. The proper abstractions reduce code complexity and help to achieve high-quality and high-performance software.

The first generation of abstractions for scientific computing came in the form of subroutine libraries such as the Basic Linear Algebra Subroutines (BLAS) [22, 23, 36], LINPACK [21], EISPACK [50], and LAPACK [2]. This was a good first step, but the first-generation libraries were inflexible and difficult to use, which reduced their applicability. Moreover, the construction of such libraries was a complex and expensive task. Many software engineering techniques (then in their infancy) could not be applied to scientific computing because of their interference with performance.

In the last few years significant improvements have been made in the tools used for expressing abstractions, primarily in the maturation of the C++ language and its compilers. The old enmity between abstraction and performance can now be put aside. In fact, abstractions can be used to aid performance portability by making the necessary optimizations easier to apply. With the intelligent use of modern software engineering techniques it is now possible to create extremely flexible scientific libraries that are portable, easy to use, highly efficient, and which can be constructed in far fewer lines of code than has previously been possible. This thesis describes such a library, the Matrix Template Library (MTL), a package for high-performance numerical linear algebra.

There are four main contributions in this thesis. The first is a breakthrough in software construction that enables the heavy use of abstraction without inhibiting high performance. The second contribution is the development of software designs that allow additive programming effort to produce multiplicative amounts of functionality. This produced an order of magnitude reduction in the code length for MTL compared to the Netlib BLAS implementation, a software library of comparable functionality. The third contribution is the construction of flexible kernels that simplify the automatic generation of portable optimized linear algebra routines.
The fourth contribution is the analysis and classification of the numerical linear algebra problem domain, which is formalized in the concepts that define the interfaces of the MTL components and algorithms.

The work in this thesis builds on work by many other people, and parts of others' work are described in this thesis. Table 1.1 is provided in order to clarify what work was done by others, and what work I did as part of this thesis. The related work listed there is only the work that was very closely related to MTL, or that was used heavily in MTL. Chapter 3 describes the work related to MTL in more detail.

Personal Accomplishments | Others' Related Work
Implementation of all the MTL software | BLAS [22, 23, 36] and LAPACK [2]
Idea to use adaptors to solve the “fat” interface problem | Generic Programming [43]
Use of aspect objects to handle indexing for matrices | Aspect Oriented Programming [40], idea of a separation of orientation and 2D containers [37, 38], idea to use iterators for linear algebra [37, 38]
Idea to use template metaprogramming to perform register blocking in linear algebra kernels | Complete unrolling for operations on small arrays [55], matrix constructor interface [16, 18], compile-time prime number calculations [54]
Tuned MTL algorithms for high performance | Tiling and blocking techniques [10, 11, 12, 14, 32, 34, 35, 39, 60, 61], automatically tuned libraries [7, 59]
Proved that iterators can be used in high performance arenas | Optimizing compilers [33, 41], lightweight object optimization, inlining
Created the Mayfly pattern | Andrew Lumsdaine thought of the name
Designed the ITL interface | ITL implementation by Andrew Lumsdaine and Rich Lee

Table 1.1. Breakdown of personal accomplishments vs. others' related work and work used in this thesis.

The following is a road map for the rest of this thesis. Chapter 2 gives an introduction to generic programming and describes how to extend generic programming to linear algebra. Chapter 3 gives an overview of prior work by others that is related to MTL.
Chapters 4 and 5 address the design and implementation of the MTL algorithms and components. Chapter 6 discusses performance issues such as the ability of modern C++ compilers to optimize abstractions and how template metaprogramming techniques can be used to express loop optimizations. Chapter 7 describes an iterative methods library, the Iterative Template Library (ITL), that is constructed using MTL. The ultimate purpose of the work in this thesis is to aid the construction of higher-level scientific libraries and applications in several respects: to reduce development costs, to improve software quality from a software engineering standpoint, and to make high performance easier to achieve. The Iterative Template Library is an example of how higher-level libraries can be constructed using MTL. Chapter 8 gives the real proof that our generic programming approach is viable for scientific computing: the performance results. The performance of MTL is compared to vendor BLAS libraries for several dense and sparse matrix computations on several different architectures. Chapter 9 summarizes the verification and testing of the MTL software. Chapter 10 discusses some future directions of MTL and concludes the thesis.
Chapter 2 Generic Programming 2.1 Generic Programming and the Standard Template Library This chapter gives a short description of generic programming with a few examples from the Standard Template Library (STL). For a more complete description of STL refer to [52]. For an introduction to STL and generic programming refer to [3, 42, 53]. In the chapters following this one it will be assumed the reader has a basic knowledge of STL and generic programming. Generic programming has recently entered the spotlight with the introduction of the Standard Template Library (STL) into the C++ standard [27]. The principal idea behind generic programming is that many algorithms can be abstracted away from the particular data structures on which they operate. Algorithms typically need the functionality of traversing through a data structure and accessing its elements. If data structures provide a standard interface for these operations, generic algorithms can be freely mixed and matched with data structures (called containers in STL). The main facilitator in the separation of algorithms and containers in STL is the iterator (sometimes called a “generalized pointer”). Iterators provide a mechanism for traversing containers and accessing their elements. The interface between an algorithm
and a container is specified by the types of iterators exported by the container. Generic algorithms are written solely in terms of iterators and never rely upon specifics of a particular container. Iterators are classified into broad categories, some of which are: InputIterator, ForwardIterator, and RandomAccessIterator. Figure 2.1 depicts the relationship between containers, algorithms, and iterators.

Figure 2.1. Separation of containers and algorithms using iterators.

The STL defines a set of requirements for each class of iterators. The requirements are in the form of which operations (functions) are defined for each iterator, and what the meaning of the operation is. As an example of how these requirements are defined, an excerpt from the requirements for the STL RandomAccessIterator is listed in Table 2.1. In the table, X is the iterator type, T is the element type pointed to by the iterator, a, b, r, and s are iterator objects, and n is an integral type.

expression   return type        note
a == b       bool               *a == *b
a < b        bool               b - a > 0
*a           T&                 dereference a
a->m         U&                 (*a).m
++r          X&                 r == s implies ++r == ++s
--r          X&                 r == s implies --r == --s
r += n       X&                 same as n applications of ++r
a + n        X                  { X tmp = a; return tmp += n; }
b - a        Distance           number of increments to get from a to b
a[n]         convertible to T   *(a + n)

Table 2.1. Excerpt from the STL RandomAccessIterator requirements.

Containers export the types of iterators that can be used to traverse them as nested type definitions. For example, a list class defines its iterator type roughly as follows (simplified sketch):

template <class T>
class list {
public:
  class iterator {
  public:
    iterator& operator++() { node = node->next; return *this; }
    ...
  };
  ...
};
When dealing with some container class, one can access the correct type of iterator using the double-colon scope operator, as is demonstrated below in the function foo().

template <class ContainerX, class ContainerY>
void foo(ContainerX& x, ContainerY& y) {
  typename ContainerX::iterator xi;
  typename ContainerY::iterator yi;
  ...
}
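STL's accumulate() is the canonical example of a generic numerical algorithm written in this style: it uses only InputIterator operations (inequality, increment, dereference), so the same function template sums a vector, a list, or any other container. Below is a simplified version, illustrative rather than the standard library's actual implementation:

```cpp
#include <cassert>
#include <list>
#include <vector>

// Simplified illustration of std::accumulate: written only in terms of
// InputIterator operations, so it works with any STL-style container.
template <class InputIterator, class T>
T my_accumulate(InputIterator first, InputIterator last, T init) {
  for (; first != last; ++first)
    init = init + *first;
  return init;
}
```

The same template instantiates against vector iterators and list iterators alike; the container-specific traversal logic lives entirely in the iterator.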
2.2 Generic Programming for Linear Algebra

The advantages of generic programming coincide with the library construction problems of numerical linear algebra. The traditional approach for developing numerical linear algebra libraries is combinatorial in the required development effort. Individual subroutines must be written to support every desired combination of algorithm, basic numerical type, and matrix storage format. For a library to provide a rich set of functions and data types, one would need to code hundreds of versions of the same routine. As an example, to provide basic functionality for selected sparse matrix types, the NIST implementation of the Sparse BLAS contains over 10,000 routines and a custom code generation system [48].

The combinatorial explosion in implementation effort arises because, with most programming languages, algorithms and data structures are more tightly coupled than is conceptually necessary. That is, one cannot express an algorithm as a subroutine independently from the type of data that is being operated on. As a result, providing a comprehensive linear algebra library — much less one that also offers high performance — would seem to be an overwhelming task. Fortunately, certain modern programming languages, such as Ada and C++, support generic programming by providing mechanisms for expressing algorithms independent of the specific data structure to which they are applied. A single function can then work with many different data structures, drastically reducing the size of the code. In rough terms, given M algorithms and N data structures, the amount of code goes from O(M × N) to just O(M + N). As a result, development, maintenance, testing, and optimization become much easier.

If generic algorithms are to be created for numerical linear algebra, there must be a common interface: a common way to access and traverse the vectors and matrices of different types. The STL has already provided a model for traversing through vectors
and other one-dimensional containers by using iterators. In addition, the STL defines several numerical algorithms such as the accumulate() algorithm presented in the last section. Thus creating generic algorithms to encompass the rest of the Level-1 BLAS functionality [36] is relatively straightforward.

Matrix operations are slightly more complex, since the elements are arranged in a two-dimensional format. The MTL algorithms process matrices as if they are containers of containers (the matrices are not necessarily implemented this way). The matrix algorithms are coded in terms of iterators and two-dimensional iterators, as depicted in Figure 2.2. An algorithm can choose which row or column of a matrix to process using the two-dimensional iterator. The iterator can then be dereferenced to produce the row or column vector, which is a first-class STL-style container. The one-dimensional iterators of the row vector can then be used to traverse along the row and access individual elements.

Figure 2.2. The TwoD iterator concept.

The code for an iterator-based generic matrix-vector product is listed in Figure 2.3. The matvec_mult function is templated on the matrix type and the iterator types (which give the starting points of two vectors). The first two lines of the algorithm declare variables for the TwoD and OneD iterators that will be used to traverse the matrix.
The Matrix::const_iterator expression extracts the iterator type from the Matrix type to declare the variable i. Similarly, the Matrix::OneD::const_iterator expression extracts the iterator type from the OneD container defined inside the matrix to declare the variable j. The iterator i is set to the beginning of the matrix with i = A.begin(). The outer loop repeats until i reaches A.end(). For the inner loop, the iterator j is set to the beginning of the row pointed to by i with j = i->begin(). The inner loop repeats until j reaches the end of the row, i->end(). The computation for the matrix-vector product consists of multiplying an element from the matrix, *j, by the appropriate element in vector x. The index into x is the current column, which is the position of iterator j, given by j.column(). The result is accumulated into the element of vector y selected by the current row index, j.row(). The generic matrix-vector algorithm in Figure 2.3 is extremely flexible, and can be used with a wide variety of dense, sparse, and banded matrix types. For purposes of comparison, the traditional approach for coding matrix-vector products for sparse and dense matrices is also listed. Note how the indexing in the MTL routine has been abstracted away. The traversal across a row goes from begin() to end(), instead of using explicit indices. Also, the indices used to access the x and y vectors are abstracted through the use of the row() and column() methods of the iterator. The row() and column() methods provide a uniform way to access index information regardless of whether the matrix is dense, sparse, or banded.
// generic matrix-vector multiply
template <class Matrix, class IterX, class IterY>
void matvec_mult(Matrix A, IterX x, IterY y) {
  typename Matrix::const_iterator i;
  typename Matrix::OneD::const_iterator j;
  for (i = A.begin(); i != A.end(); ++i)
    for (j = i->begin(); j != i->end(); ++j)
      y[j.row()] += *j * x[j.column()];
}

// BLAS-style dense matrix-vector multiply
for (int i = 0; i < m; ++i)
  for (int j = 0; j < n; ++j)
    y[i] += a[i*lda+j] * x[j];

// SPARSPAK-style sparse matrix-vector multiply
for (int i = 0; i < n; ++i)
  for (int k = ia[i]; k < ia[i+1]; ++k)
    y[i] += a[k] * x[ja[k]];
Figure 2.3. Simplified example of a generic matrix-vector product, with a comparison to the traditional approach to writing dense and sparse matrix-vector products.
Chapter 3 Related Work

The Matrix Template Library draws on previous research in the following fields:

- Linear algebra libraries
- Optimizing compilers
- Generic programming and software engineering
The Matrix Template Library is unique in the way it combines the advances made in these three fields to produce a linear algebra library. The MTL contains new insights and advances which were born as a result of merging ideas from these fields.
3.1 Traditional Basic Linear Algebra Libraries

The Basic Linear Algebra Subprograms (BLAS) [22, 23, 36] currently define an informal standard in C and Fortran for dense linear algebra. There are many different implementations of the BLAS. The Netlib BLAS [1] (Fortran) provides the reference implementation, and each hardware vendor supplies tuned versions of the BLAS. For sparse matrices, there is the NIST Sparse BLAS [48] (C) and SPARSKIT [49] (Fortran). There is also an effort underway to define a new BLAS standard, which will add operations on sparse matrices, mixed-precision numbers, and intervals. A reference implementation of the new BLAS will be
constructed on top of MTL. The matrix formats and algorithms of the BLAS were used as a basis for the construction of the MTL.
3.2 Automatically Tuned Dense Linear Algebra

There are currently two approaches used for generating portable dense linear algebra routines. The first approach is that of optimizing compilers, which transform the loops found in application programs. The second approach is for the automated code generation and optimization to be built into a library, which is then called from an application program.
3.2.1 The Optimizing Compiler Approach

A large amount of research has been done in the compiler community with regard to loop optimizations, especially for dense linear algebra operations [34, 10, 61, 35, 60, 12, 11, 39, 14, 32]. The results of this research have proved extremely valuable in selecting the performance optimizations used in MTL. Many of the performance optimizations used in MTL would not be necessary if commercial compilers implemented more transformations in a consistent and reliable fashion. The difficulty in applying optimizations within a general-purpose compiler is that it is hard for the compiler to determine where to apply the transformations, and exactly which transformations to apply. An alternative to relying on a general-purpose compiler is to build the optimizations into a library, such as MTL. This has the advantage that the library designer knows where the optimizations are needed, and which transformations to apply.
3.2.2 The Library Approach There are currently two libraries (both written in C) that provide portable high performance, PHiPAC [7] and ATLAS [59]. There are two basic parts to each of these libraries.
The first is a script which determines optimal blocking sizes and other parameters for high performance loops for a given architecture. Both PHiPAC and ATLAS use a brute force search to find these parameters. Some work in the compiler community points to a more efficient technique for determining the optimization parameters [10, 61]. The second part of these libraries is a code generation script which generates the optimized code based on these parameters. One of the contributions of the Matrix Template Library is in providing a more elegant and concise method of code generation based on the C++ template system. This is described in section 6.3. We currently use brute force searches to find the appropriate optimization parameters.
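To give the flavor of template-based code generation (the MTL machinery described in Section 6.3 is more elaborate; the class below is a hypothetical illustration, not MTL code), a fixed-size dot product can be completely unrolled by the compiler through recursive template instantiation:

```cpp
#include <cassert>

// Illustrative sketch: the recursion is expanded at compile time, so
// unrolled_dot<4>::apply() compiles down to four multiply-adds, no loop.
template <int N>
struct unrolled_dot {
  static double apply(const double* x, const double* y) {
    return x[0] * y[0] + unrolled_dot<N - 1>::apply(x + 1, y + 1);
  }
};

// Base case terminates the recursion.
template <>
struct unrolled_dot<0> {
  static double apply(const double*, const double*) { return 0.0; }
};
```

Because the unrolling factor is a template parameter, the same source can generate different amounts of code for different architectures once the optimal parameters are known.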
3.3 C++ Libraries for Linear Algebra There are many existing linear algebra libraries written in C++. Most of these can be categorized as object-oriented but not generic. They provide matrix and vector objects, but the algorithms are not formulated to allow one algorithm to work with many matrix formats. In addition, the other C++ libraries do not address highly optimized performance and do not claim “vendor-tuned” performance as MTL does. Table 3.1 categorizes the C++ libraries as object-oriented (OO), using expression templates [56] and/or operator overloading (ET/Op), and whether they use generative methods to create component combinations (Generative). Expression templates are a mechanism in C++ for improving the cache behavior when operator overloading is used. Again, none of these libraries use generic algorithms in the sense of the STL. Though not a linear algebra library, Blitz++ [58] is related to MTL in that it uses template metaprogramming [55] techniques to achieve high performance. In addition, the use of expression templates was pioneered in Blitz++.
Library                                              OO   ET/Op   Generative
Template Numeric Toolkit (TNT) [47]                  X    X
C++ Scientific Library (SL++) [20]                   X    X
Generative Matrix Computation Library (GMCL) [16]    X    X       X
LAPACK++ [25]                                        X
Sparslib++ [26]                                      X
GNU Scientific Software Library (GNUSSL) [46]        X
Newmat [19]                                          X    X

Table 3.1. Summary of C++ libraries for linear algebra.
3.4 Generic Programming and Software Engineering

As already mentioned, the design of the MTL draws heavily from the Standard Template Library [52, 3], which popularized the notion of generic programming [43] in the C++ community. The mix-and-match component design used in MTL is related to several other works, including the Generative Matrix Computation Library (GMCL) [18, 16], GenVoca [6], and the aspect-oriented programming methodology [40] from Xerox PARC. Barton and Nackman [5] introduce several important design techniques for using C++ in scientific computing. The MTL uses several of their techniques.
Chapter 4 MTL Algorithms

The Matrix Template Library provides a rich set of basic linear algebra operations, roughly equivalent to the Level-1, Level-2 and Level-3 BLAS, though the MTL operates over a much wider set of datatypes. The Matrix Template Library is unique among linear algebra libraries because each algorithm (for the most part) is implemented with just one template function. From a software maintenance standpoint, the reuse of code gives MTL a significant advantage over the BLAS [22, 23, 36] or even other object-oriented libraries like TNT [47] (which still has different subroutines for different matrix formats). Because of the code reuse provided by generic programming, MTL has an order of magnitude fewer lines of code than the Netlib Fortran BLAS [1], while providing much greater functionality and achieving significantly better performance. The MTL implementation is 8,284 words (according to the Unix wc utility) for algorithms and 6,900 words for dense containers. The Netlib BLAS total 154,495 words, and high-performance versions of the BLAS (with which MTL is competitive) are even more verbose. In addition, the MTL has been designed to be easier to use than the BLAS. Data encapsulation has been applied to the matrix and vector information, which makes the MTL interface simpler because input and output are in terms of matrix and vector objects, instead of integers, floating point numbers, and pointers. It also provides the right level of abstraction to the user — operations are in terms of linear algebra objects instead of
low-level programming constructs such as pointers.

Table 4.1 lists the principal operations implemented in the MTL. One would expect to see many more variations on the operations to take into account transpose and scaling permutations of the argument matrices and vectors — or at least one would expect a "fat" interface that contains extra parameters to specify such combinations. The MTL introduces a new approach to creating such permutations. Instead of using extra parameters, the MTL provides matrix and vector adaptor classes. An adaptor object wraps up the argument and modifies the behavior of the object in the algorithm. Table 4.2 gives a list of the MTL adaptor classes and their helper functions. A helper function provides a convenient way to create adapted objects. For instance, the scaled() helper function wraps a vector in a scaled1D adaptor. The adaptor causes the elements of the vector to be multiplied by a scalar inside of the MTL algorithm. There are two other helper functions in MTL, strided() and trans(). The strided() function adapts a vector so that its iterators move a constant stride with each call to operator++. The trans() function switches the orientation of a matrix (this happens at compile time) so that the algorithm "sees" the transpose of the matrix. Table 4.3 shows how one can create all the permutations of scaling for a daxpy()-like operation. Example code for a basic implementation of the BLAS daxpy() is listed below.

void daxpy(int n, double a, double* x, int incx, double* y, int incy) {
  int i, ix, iy;
  for (i = 0, ix = 0, iy = 0; i < n; ++i, ix += incx, iy += incy)
    y[iy] += a * x[ix];
}
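To make the adaptor technique concrete, here is a stripped-down sketch of how a scaled1D-style adaptor can defer scaling to element-access time. The class and the generic add() below are simplifications invented for illustration (MTL's real adaptors wrap iterators and are generic over value types; Section 5.8 covers the actual implementation):

```cpp
#include <cassert>
#include <cstddef>

// Sketch of a scaled vector adaptor: it wraps existing data and applies
// the scale factor lazily, on each element access, so no scaled temporary
// vector is ever materialized.
class scaled1D {
public:
  scaled1D(const double* data, std::size_t n, double alpha)
    : data_(data), n_(n), alpha_(alpha) { }
  std::size_t size() const { return n_; }
  double operator[](std::size_t i) const { return alpha_ * data_[i]; }
private:
  const double* data_;
  std::size_t n_;
  double alpha_;
};

// Helper function in the style of MTL's scaled().
inline scaled1D scaled(const double* x, std::size_t n, double alpha) {
  return scaled1D(x, n, alpha);
}

// A single generic add() then serves both plain and scaled arguments;
// no separate "daxpy" variant with an alpha parameter is needed.
template <class VecX>
void add(const VecX& x, double* y) {
  for (std::size_t i = 0; i < x.size(); ++i)
    y[i] += x[i];
}
```

Calling add(scaled(x, n, alpha), y) computes y <- alpha x + y with the scaling fused into the single traversal, which is the point of the adaptor approach.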
The example below shows how the matrix-vector multiply algorithm (generically written to compute y <- A x) can also compute y <- alpha A^T x with the use of adaptors to transpose A and to scale x by alpha. Note that the adaptors cause the appropriate changes to occur within the algorithm; they are not evaluated before the call to mtl::mult() (which would hurt performance). The adaptor technique drastically reduces the amount of code that must be written for each algorithm. Section 5.8 discusses the details of how these adaptor classes are implemented.

// y <- alpha * A^T x
mult(trans(A), scaled(x, alpha), y);

Function Name                        Operation
Vector Algorithms
set(x,alpha)                         x_i <- alpha, for all i
scale(x,alpha)                       x <- alpha x
s = sum(x)                           s <- sum_i x_i
s = one_norm(x)                      s <- sum_i |x_i|
s = two_norm(x)                      s <- (sum_i x_i^2)^(1/2)
s = infinity_norm(x)                 s <- max_i |x_i|
i = max_index(x)                     i <- index of max_i |x_i|
s = max(x)                           s <- max_i x_i
s = min(x)                           s <- min_i x_i
Vector-Vector Algorithms
copy(x,y)                            y <- x
swap(x,y)                            y <-> x
ele_mult(x,y,z)                      z <- y * x   (element-wise)
ele_div(x,y,z)                       z <- y / x   (element-wise)
add(x,y)                             y <- x + y
s = dot(x,y)                         s <- x^T y
s = dot_conj(x,y)                    s <- x^T conj(y)
Matrix Algorithms
set(A,alpha)                         A <- alpha
scale(A,alpha)                       A <- alpha A
set_diagonal(A,alpha)                A_ii <- alpha
s = one_norm(A)                      s <- max_j (sum_i |a_ij|)
s = infinity_norm(A)                 s <- max_i (sum_j |a_ij|)
transpose(A)                         A <- A^T
Matrix-Vector Algorithms
mult(A,x,y)                          y <- A x
mult(A,x,y,z)                        z <- A x + y
tri_solve(T,x)                       x <- T^(-1) x
rank_one_update(A,x,y)               A <- A + x y^T
rank_two_update(A,x,y)               A <- A + x y^T + y x^T
Matrix-Matrix Algorithms
copy(A,B)                            B <- A
swap(A,B)                            B <-> A
add(A,C)                             C <- A + C
ele_mult(A,B,C)                      C <- B * A   (element-wise)
mult(A,B,C)                          C <- A B
mult(A,B,C,E)                        E <- A B + C
tri_solve(T,B)                       B <- T^(-1) B

Table 4.1. MTL generic linear algebra algorithms.

Adaptor Class                        Helper Function
scaled1D                             scaled(x)
scaled2D                             scaled(A)
strided1D                            strided(x)
row/column orien                     trans(A)
row orien and strided offset         rows(A)
column orien and strided offset      columns(A)

Table 4.2. MTL adaptor classes and helper functions for creating algorithm permutations.

Function Invocation                            Operation
add(x,y)                                       y <- x + y
add(scaled(x,alpha),y)                         y <- alpha x + y
add(x,scaled(y,beta))                          y <- x + beta y
add(scaled(x,alpha),scaled(y,beta))            y <- alpha x + beta y

Table 4.3. Permutations of the add() operation made possible with the use of the scaled() adaptor helper function.

CHAPTER 5. MTL COMPONENTS

// usage:
typedef matrix<double, rectangle, dense, row_major>::type myMatrix;

// definition:
template < class T, class Shape = rectangle,
           class Storage = dense, class Orien = row_major >
struct matrix {
  typedef typename IF< EQUAL< Shape::id, RECT >::RET,
            typename gen_rect<T, Storage, Orien>::RET,
            typename IF< EQUAL< Shape::id, DIAG >::RET,
              typename gen_diag<T, Storage, Orien>::RET,
              generator_error
            >::RET
          >::RET type;
};
The generator class consists of logic that is evaluated at compile time. The IF construct is itself a simple generator class that selects between two types depending on a condition (which must be a value known at compile time). The following code shows how the IF class can be implemented.

template <bool Condition, class A, class B>
struct IF { typedef error_type RET; };

template <class A, class B>
struct IF<true, A, B> { typedef A RET; };

template <class A, class B>
struct IF<false, A, B> { typedef B RET; };
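The selection performed by IF can be verified with typeid; the snippet below restates a self-contained version of IF (with error_type as a placeholder struct) so that it compiles on its own:

```cpp
#include <cassert>
#include <typeinfo>

struct error_type {};

// Compile-time type selection via partial specialization: the bool
// condition picks which specialization, and hence which RET, is used.
template <bool Condition, class A, class B>
struct IF { typedef error_type RET; };

template <class A, class B>
struct IF<true, A, B> { typedef A RET; };

template <class A, class B>
struct IF<false, A, B> { typedef B RET; };
```

Because the condition is a compile-time constant, the unselected branch contributes no code to the generated program.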
5.3 MTL Concepts

In the context of generic programming, the term concept is used to describe the collection of requirements that a template argument must meet for the template function or templated class to compile and operate properly. In many respects a concept is similar to an interface description, in that a concept specifies which methods a class must implement. In addition, a concept can require that a class make certain internal type definitions, and a concept can place constraints on the behavior of the methods, such as complexity guarantees. If a class fulfills the requirements of a concept, the class is said to model the concept. A concept can extend another concept, which is called refinement. This terminology was adopted from the SGI STL documentation [3], and most of the concepts from the STL are used in MTL. The bold sans serif font is used for all concepts.
5.3.1 Matrix

The central concept in the Matrix Template Library is of course the Matrix. An MTL Matrix can be thought of as a Container of Containers (referred to as a 2D Container). As an STL Container, an MTL Matrix has begin() and end() methods which return iterators for traversing over the 2D Container. These iterators dereference to give a 1D Container, which also has begin() and end() functions. The MTL implements a large variety of matrix formats, all with very different data representations and implementation details. However, the same MTL Matrix interface is provided for all of the matrices. Table 5.1 and Table 5.2 list the requirements of the Matrix concept. In addition, Matrix is a refinement of Container, so the requirements listed here are in addition to those of the Container concept. In the tables, X refers to the type that models Matrix and A refers to an object of type X. The symbols m, n, i, j, row_start, and column_start are all of unsigned integral type. split_rows and split_columns are Containers containing integral values.

type definition       description
X::shape              Tag to describe the shape of the matrix
X::orientation        Either row_tag or column_tag
X::sparsity           Either dense_tag or sparse_tag
X::OneD               The type of the inner containers (rows, columns, or diagonals)
X::OneDRef            The reference type for OneD
X::submatrix_type     The type for a submatrix of X
X::partition_type     The type of a partitioned X
X::value_type         The element type stored in X

Table 5.1. Matrix associated types, in addition to those of Container.
5.3.2 Vector

The MTL Vector concept is a Container in which every element has a corresponding index. The elements do not have to be sorted by their index, and the indices do not necessarily have to start at 0. Also, the indices do not have to form a contiguous range. The iterator type must be a model of IndexedIterator, which provides access methods to the indices. Vector is not a refinement of RandomAccessContainer (even though Vector defines operator[]) because Vector does not guarantee amortized constant time for that operation (to allow for sparse vectors). Note also that the invariant a[n] == *advance(a.begin(), n) that applies to RandomAccessContainer does not apply to Vector, since a[n] is defined for Vector to return the element with index n. So a[n] == *i if and only if i.index() == n. Table 5.3 lists the associated types and required methods of Vector.
5.3.3 IndexedIterator

IndexedIterator is the iterator concept for iterators of Vectors and Matrices. An IndexedIterator provides access to the indices, as well as the elements, of a Vector or Matrix. For instance, given an iterator i of a Vector, i.index() gives the index corresponding to the element stored at *i. For IndexedIterators inside of Matrices, the row and column indices corresponding to the current position of the iterator can be accessed through the i.row() and i.column() methods. In this way the IndexedIterator concept hides the differences between matrix types such as banded and sparse, allowing the MTL algorithms to be more generic. Table 5.4 gives the requirements for IndexedIterator.
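A minimal model of IndexedIterator for a sparse vector might look as follows. This is an illustrative sketch with hypothetical names, not MTL code; MTL's actual iterators carry more machinery:

```cpp
#include <cassert>
#include <cstddef>

// Sketch: an iterator over a sparse vector stored as parallel
// (values, indices) arrays. Dereferencing yields the stored value;
// index() reports the logical position of that value in the vector.
class sparse_iterator {
public:
  sparse_iterator(const double* v, const std::size_t* idx, std::size_t pos)
    : v_(v), idx_(idx), pos_(pos) { }
  double operator*() const { return v_[pos_]; }
  std::size_t index() const { return idx_[pos_]; }
  sparse_iterator& operator++() { ++pos_; return *this; }
  bool operator!=(const sparse_iterator& o) const { return pos_ != o.pos_; }
private:
  const double* v_;
  const std::size_t* idx_;
  std::size_t pos_;
};
```

An algorithm written against this interface never needs to know whether the indices it sees came from a dense, sparse, or banded layout.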
5.3.4 Indexer

This concept is special in that it defines a matrix aspect. An aspect is a cross-cutting attribute that affects the implementation of a component. By packaging the functionality into a separate class, one can swap in different aspects without changing the main code of the component. The Indexer concept is in charge of mapping indices from normal Matrix coordinates into the TwoD coordinate system. Here is an example of such a mapping for a banded matrix.

         [ 1 2 3     ]
Matrix = [   4 5 6   ]
         [     7 8   ]

The element whose value is 4 is at (1,1) in Matrix coordinates. In TwoD coordinates the 4 is at (1,0). The TwoD mapping of this matrix would look as follows.

       [ 1 2 3 ]
TwoD = [ 4 5 6 ]
       [ 7 8   ]

The TwoD mapping for a diagonal matrix would look as follows.

       [ 3 6   ]
TwoD = [ 2 5 8 ]
       [ 1 4 7 ]
There are three models of the Indexer concept, and each one provides a different mapping. There is the rect_indexer for rectangular matrices, the banded_indexer for banded matrices, and the diagonal_indexer for diagonal matrices. Table 5.5 lists the requirements for Indexer.
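The banded mapping above can be stated as a small index function. The sketch below is an illustration, not MTL's actual indexer code, and it assumes the band is described by its number of sub-diagonals (as bw.first() does in Section 5.3.6):

```cpp
#include <algorithm>
#include <cassert>

// Sketch: map banded Matrix coordinates (i, j) to the TwoD column.
// With 'sub' sub-diagonals, the first stored column of row i is
// max(0, i - sub), so the TwoD column is j shifted left by that amount.
inline int twod_column(int i, int j, int sub) {
  return j - std::max(0, i - sub);
}
```

For the banded example above (no sub-diagonals, two super-diagonals), the element 4 at Matrix coordinates (1,1) lands at TwoD coordinates (1,0), matching the mapping shown.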
5.3.5 OneDIndexer

Models of this concept are used to implement the row() and column() methods of Matrix iterators. Table 5.6 gives the methods required by OneDIndexer.
5.3.6 Offset

Offset is also an aspect concept. This concept is used in the dense2D class to allow for variability in the way matrix elements are mapped to linear memory. There are several models of the Offset concept that allow for different storage types, such as the packed and banded matrix formats used in the BLAS. The elt(i,j) method gives the offset of the (i, j) element. The following code gives the implementations of elt(i,j) for rect_offset, strided_offset, and banded_offset. The implementation for banded_offset is much different since it must take into account the bandwidth of the matrix. ld is the leading dimension of the matrix (the distance from the start of one row to the start of the next), bw.first() gives the number of sub-diagonals, and ndiag is the number of diagonals (bandwidth) of the matrix.

// dense rectangular
size_type elt(size_type i, size_type j) const {
  return i * ld + j;
}

// strided
size_type elt(size_type i, size_type j) const {
  return j * ld + i;
}

// banded
size_type elt(size_type i, size_type j) const {
  return i * ndiag + max(0, bw.first() - i) + j;
}
The implementation for packed_offset uses the formula for an arithmetic series to calculate the offset. Table 5.7 lists the requirements for the Offset concept.
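As an illustration of the arithmetic-series formula, a packed elt(i,j) for a lower-triangular, row-major packed layout might be written as follows. This is a sketch under those assumptions, not MTL's actual packed_offset:

```cpp
#include <cassert>
#include <cstddef>

// Sketch: packed lower-triangular, row-major storage. Rows 0..i-1
// contribute 1 + 2 + ... + i = i*(i+1)/2 stored elements (the
// arithmetic series), then j steps into row i. Valid for j <= i.
inline std::size_t packed_elt(std::size_t i, std::size_t j) {
  return i * (i + 1) / 2 + j;
}
```

The same series-based idea extends to upper-triangular and column-major variants by changing which triangle is summed.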
5.4 MTL Object Memory Model The MTL object memory model is handle based. This differs from the Standard Template Library. Object copies and assignment are shallow. This means that when one vector
object is assigned to another vector object, it becomes a second handle to the same vector. The same applies to matrices. The example below demonstrates this. The vector y reflects the change made to vector x, since they are both handles to the same vector. Vector z, however, does not reflect the change in x, since it is a different vector.

mtl::dense1D<double> x(5, 1.0);
print_vector(x);
> 1 1 1 1 1
mtl::dense1D<double> y = x;
mtl::dense1D<double> z(N);
mtl::copy(x, z);
x[2] = 3;
print_vector(y);
> 1 1 3 1 1
print_vector(z);
> 1 1 1 1 1
The main reason that the handle-based object model was used for MTL was that MTL makes heavy use of adaptor helper functions to modify the arguments to MTL algorithms. The adaptor functions return temporary objects that are typically passed as arguments directly to MTL algorithms. If the MTL algorithms used pass-by-reference for all their arguments, then the C++ compiler would emit a warning, since a temporary should not be bound to a non-const reference. Using const pass-by-reference solves the problem for in parameters but not for out parameters. MTL solves this problem by instead passing the matrix and vector arguments by value, and by making the matrix and vector objects handles to the underlying data structures.

The MTL containers make considerable use of STL containers (though we use our own higher-performance version of the vector container). The underlying STL objects within the MTL OneD containers are reference counted to make the memory management easier for the user. This is especially helpful when the MTL matrix and vector objects are used in the construction of larger object-oriented software systems.

In C++ the reference counting can be non-intrusive to the classes that must be reference counted. This is especially important if one wishes to use containers supplied by other libraries (such as STL). The implementation of reference counting smart pointers in Figure 5.11 derives from the Handle class in [53].
5.5 The MTL Component Architecture

The MTL has a layered architecture that maximizes internal reuse, allowing an additive number of classes to be combined to generate a multiplicative number of concrete components. The implementation components coordinate to implement the particular model of Matrix requested by the user. Figure 5.12 gives an overview of all of the components that go into the MTL matrices and vectors. The notation used is a variation on UML [31, 45]. The solid boxes are MTL classes, and the boxes with dotted lines are template arguments and the corresponding concepts. The models relationship is depicted by placing the class box within the concept box. To construct an MTL Matrix type, components are chosen from the models of 2D Storage, Orienter, Indexer, and Offset, and plugged in to the matrix implementation class, which provides the glue to implement the Matrix interface. The following sections discuss the role of each of the component concepts, and discuss some of the concrete components that are in the MTL.
5.6 TwoD Storage Classes

The matrix formats of the MTL are expressed as 2D Containers. The 2D Containers are neither row-major nor column-major. Instead they are orientation neutral, and the Orienter aspect classes (row_orien and column_orien) are responsible for mapping the matrix coordinates to 2D coordinates. In this way each matrix format can be implemented just once but still provide row-major and column-major versions of the matrix format for the
CHAPTER 5. MTL COMPONENTS
template class refcnt_ptr { typedef refcnt_ptr self; public: refcnt_ptr() : object(0) { } refcnt_ptr(Object* c) : object(c), count(new int(1)) { } refcnt_ptr(const self& x) : object(x.object), count(x.count) { this->inc(); } refcnt_ptr() { this->dec(); } self& operator=(Object* c) { if (object) this->dec(); object = c; count = new int(1); return *this; } self& operator=(const self& x) { if (object) this->dec(); object = x.object; count = x.count; this->inc(); return *this; } Object& operator*() { return *object; } const Object& operator*() const { return *object; } Object* operator->() { return object; } const Object* operator->() const { return object; } void inc() { (*count)++; } void dec() { (*count)--; if (*count ::type Matrix; const Matrix::size_type N = 3; Matrix::size_type large; double dA[] = { 1, 3, 2, 1.5, 2.5, 3.5, 4.5, 9.5, 5.5 }; Matrix A(dA, N, N); // Find the largest element in column 1. large = mtl::max_index(A[0]); // Swap the first row with the row containing the largest // element in column 1. mtl::swap( rows(A)[0] , rows(A)[large]);
5.6.2 compressed2D

This storage class implements the compressed row or column matrix format described in Section 5.1.3. The following example shows two ways in which one can construct compressed matrices. The user can provide external data pointers to the matrix constructor, as for matrix A, or the user can create the matrix from scratch as for B, in which case the matrix manages its own memory.

const int m = 3, n = 3, nnz = 5;
double values[] = { 1, 2, 3, 4, 5 };
int indices[] = { 1, 3, 2, 2, 3 };  /* the stored indices are in     */
int row_ptr[] = { 1, 3, 4, 6 };     /* Fortran style for compatibility */

// Create from pre-existing arrays
typedef matrix< double, rectangle,
                compressed<int, external, index_from_one>,
                row_major >::type MatA;
MatA A(m, n, nnz, values, row_ptr, indices);

// Create from scratch
typedef matrix< double, rectangle, compressed, row_major >::type MatB;
MatB B(m, n);
B(0,0) = 1; B(0,2) = 2;
B(1,1) = 3;
B(2,1) = 4; B(2,2) = 5;
5.6.3 array2D

This class implements the array matrix storage format. array2D is actually implemented with a Container of Containers (whereas the other 2D storage types merely act like Containers of Containers). In this way the array2D storage type is the most flexible, since many different types of 1-D Containers can be used in conjunction with this class. The 1-D Containers include linked list (implemented with sparse1D), tree (sparse1D), dense (std::vector), sparse pair (sparse1D), and compressed (mtl::compressed1D). One special feature of the array2D is that one can swap and assign the 1D Containers inside the array in constant time. The following example shows how one can create matrices of array storage with several different types of OneD Containers. In addition, the example demonstrates how the OneD Containers in an array matrix can be swapped in constant time.

typedef matrix< double, rectangle,
                array< dense >, row_major >::type MatA;
typedef matrix< double, rectangle,
                array< compressed >, row_major >::type MatB;
typedef matrix< double, rectangle,
                array< sparse_pair >, row_major >::type MatC;

MatA A(M, N);
MatB B(M, N);
MatC C(M, N);
// Fill A ...
mtl::copy(A, B);
MatB::Row tmp = B[2];
B[2] = B[3];
B[3] = tmp;
mtl::copy(B, C);
5.6.4 envelope2D

This is the 2D storage type that implements the envelope matrix storage format. Two arrays are used to represent the matrix: the VAL array holds the values in the sparse matrix, and the PTR array points to the diagonal elements in VAL. The 1D segments of VAL are actually dense in the sense that zeros are stored. Each 1D segment starts at the first non-zero element in the row. The index mapping for envelope storage is A(i, j) = VAL(PTR(i) - i + j).
5.7 OneD Containers/Vectors

5.7.1 dense1D

This is the primary MTL class for representing Vectors. This class uses std::vector for its implementation. A dense1D object serves as a handle to the underlying std::vector. The iterators of the std::vector are adapted with dense_iterator so that they model the IndexedIterator concept.
5.7.2 compressed1D

The compressed1D Vector is a sparse vector implemented with a pair of arrays. One array is for the element values, and the other array is for the indices of the elements. The elements are ordered by their index as they are inserted into the compressed1D. compressed1Ds can be used to build matrices with the array storage format, and they can also be used stand-alone as a Vector.

A Sparse Vector:  [ (1.2, 3), (4.6, 5), (1.0, 10), (3.7, 32) ]
Value Array:      [ 1.2, 4.6, 1.0, 3.7 ]
Index Array:      [ 3, 5, 10, 32 ]
The compressed1D::iterator dereferences (*i) to return the element value. One can access the index of that element with the i.index() function, and the array of indices through the nz_struct() method. One particularly useful fact is that one can perform scatters and gathers of sparse elements by using the mtl::copy(x, y) function with a sparse and a dense vector.
5.8 Adaptors

An adaptor class is one that modifies the interface and/or behavior of some base class. Adaptors can be used for a wide range of purposes. They can be used to modify the interface of a class to fit the interface expected by some client code [28], or to restrict the interface of a class. The STL std::stack adaptor is a good example of the latter: it wraps up a container and restricts the allowed operations to push(), pop(), and top(). A good example of an adaptor that modifies the behavior of a class without changing its interface is the reverse iterator of the STL.

An adaptor class can be implemented with inheritance or aggregation (containment). The use of inheritance is attractive since it can reduce the amount of code that must be written for the adaptor, though there are circumstances in which it is not appropriate. One such situation is where the type to be adapted could be a built-in type (a pointer, for example). This is one of the reasons why the reverse iterator of the STL is not implemented with inheritance.
5.8.1 sparse1D

The sparse1D class adapts an STL-style container into a sparse vector. The adapted container must have elements that are index-value pairs. The std::vector, std::list, and std::set STL containers all work well with the sparse1D adaptor. The time and space complexity of the various operations on a sparse1D container depend on the adapted container. For instance, random insertion into a set is O(log n) while it is O(n) for a vector. The main purpose of the sparse1D container in the MTL is to allow for flexible construction of many sparse matrix types through composing different types of sparse vectors.

The code below is a short example of creating a sparse1D with a set, inserting a few elements, and then accessing the values and indices of the resulting vector. The normal iterators of the sparse1D, returned by begin(), give access to the values of the elements (through dereference) and also the indices (through the index() method on the iterator). If one wishes to view the indices only (the non-zero structure of the vector), one can use the nz_struct() method to obtain a container consisting of the indices of the elements in the sparse vector.

typedef mtl::sparse1D< std::set< mtl::entry1<double> > > SparseVec;
SparseVec x;
for (int i = 0; i < 5; ++i)
  x[i*2] = i;
std::ostream_iterator<double> couti(std::cout);
std::copy(x.begin(), x.end(), couti);
SparseVec::IndexArray ix = x.nz_struct();
std::copy(ix.begin(), ix.end(), couti);

> 01234
> 02468
The value type of the underlying containers must be mtl::entry1, which is just an index-value pair. The elements are ordered by their index as they are inserted.
5.8.2 Scaling Adaptors

For performance reasons basic linear algebra routines often need to incorporate secondary operations into the main algorithm, such as scaling a vector while adding to another vector. The daxpy() BLAS routine (y <- alpha * x + y) is a typical example, with its alpha parameter.

void daxpy(int n, double alpha, double* dx, int incx,
           double* dy, int incy);

The problem with the BLAS approach is that it becomes necessary to provide many different versions of the algorithm within the daxpy() routine, to handle special cases with respect to the value of alpha. If alpha is 1 then it is not necessary to perform the multiplication. If alpha is 0 then the daxpy() can return immediately. When there are two or more arguments in the interface that can be scaled, the permutations of cases result in a large amount of code.

The MTL uses a family of iterator, vector, and TwoD container adaptors to solve this problem. In the MTL, only one version of each algorithm is written, and it is written without regard to scaling. There are no scalar arguments in MTL routines. Instead, the vectors and matrices can optionally be modified with adaptors so that they are transparently (from the point of view of the algorithm) scaled as their elements are accessed. If the scalar value is set at compile time, one can rely on the compiler to optimize and create the appropriate specialized code. The scaling adaptors should also be coded to handle the case where the specialization needs to happen at run time, though this is more complicated and is a current area of research. Note that the BLAS algorithms do all specialization at run time, which causes some unnecessary dispatch overhead in situations where the specialization could have been performed at compile time.
scale_iterator

The first member of the scaling adaptor family is the scale_iterator. This is an iterator adaptor that multiplies each element by a scalar as the iterator dereferences. Below is an example of how the scale_iterator could be used in conjunction with the STL transform algorithm. The transform algorithm is similar to the daxpy() routine; the vectors x and y are added and the result goes into vector z. The elements from x are scaled by alpha. The operation is z <- alpha * x + y. Note that this example is included to demonstrate a concept; it does not show how one would typically scale a vector using MTL. The MTL algorithms take vector objects as arguments, instead of iterators, and there is a scaled1D adaptor that will cause a vector to be scaled.

double alpha;
std::vector<double> x, y, z;
// set the lengths of x, y, and z
// fill x, y and set alpha
typedef std::vector<double>::iterator iter;
scale_iterator<iter> start(x.begin(), alpha);
scale_iterator<iter> finish(x.end());
// z = alpha * x + y
std::transform(start, finish, y.begin(), z.begin(),
               std::plus<double>());
The following example is an excerpt from the scale_iterator implementation. The base iterator is stored as a data member (instead of using inheritance) since the base iterator could possibly be just a basic pointer type, and not a class type. The scalar value is passed into the iterator's constructor and also stored as a data member. The scalar is then used in the dereference operator*() to multiply the element from the vector. The increment operator++() merely calls the underlying iterator's method.

template <class Iterator>
class scale_iterator {
public:
  typedef typename std::iterator_traits<Iterator>::value_type value_type;
  scale_iterator(const Iterator& x, const value_type& a)
    : current(x), alpha(a) { }
  value_type operator*() const { return alpha * *current; }
  scale_iterator& operator++() { ++current; return *this; }
  // ...
protected:
  Iterator current;
  value_type alpha;
};
At first glance one may think that the scale_iterator introduces overhead that would have significant performance implications for inner loops. In fact, modern compilers will inline operator*(), and propagate the scalar's value to where it is used if it is a constant (known at compile time). This results in code with no extra overhead.
scaled1D

The scaled1D class wraps up any OneD class, and it uses the scale_iterator to wrap up the OneD iterators in its begin() and end() methods. The main job of this class is merely to pass along the scalar value (alpha) to the scale_iterator. An excerpt from the scaled1D class is given below.

template <class Vector>
class scaled1D {
public:
  typedef typename Vector::value_type value_type;
  typedef scale_iterator<typename Vector::const_iterator> const_iterator;
  scaled1D(const Vector& r, value_type a) : rep(r), alpha(a) { }
  const_iterator begin() const { return const_iterator(rep.begin(), alpha); }
  const_iterator end() const { return const_iterator(rep.end(), alpha); }
  value_type operator[](int n) const { return *(begin() + n); }
protected:
  Vector rep;
  value_type alpha;
};
The helper template function scaled() is provided to make the creation of scaled vectors easier.

template <class Vector, class T>
scaled1D<Vector> scaled(const Vector& v, const T& a) {
  return scaled1D<Vector>(v, a);
}
The example below shows how one could perform the scaling step in a Gaussian elimination. In this example the second row of matrix A is scaled by 2, using a combination of the MTL vecvec::copy algorithm and the scaled1D adaptor.

// A = [[5.0, 5.5, 6.0],
//      [2.5, 3.0, 3.5],
//      [1.0, 1.5, 2.0]]
double scalar = A(0,0) / A(1,0);
Matrix::RowVectorRef row = A.row_vector(1);
mtl::copy(scaled(row, scalar), row);
// A = [[5.0, 5.5, 6.0],
//      [5.0, 6.0, 7.0],
//      [1.0, 1.5, 2.0]]
As one might expect, there is also a scaling adaptor for matrices in the MTL, the scaled2D class. It builds on top of the scaled1D in a similar fashion to the way the scaled1D is built on the scale_iterator.
5.8.3 Striding Adaptors

A problem similar to that of scaling is striding. A user often wishes to operate on a vector that is not contiguous in memory but laid out at a constant stride, such as a row in a column-oriented matrix. Again, a library writer would otherwise need to add an argument to each linear algebra routine to specify the striding factor, and the algorithms would need to handle a stride of 1 differently from other striding factors for performance reasons. Again this causes the amount of code to balloon, as is the case for the BLAS. This problem can also be handled with adaptors. The MTL has a strided iterator and a strided vector adaptor. The implementation of the strided adaptors is very similar to that of the scaled adaptors.
expression                           return type       note
A(m, n)                              X                 Normal constructor
A(m, n, sub, super)                  X                 Constructor for banded matrices
A(B)                                 X                 Copy constructor
A(data, m, n)                        X                 Constructor for external matrices
A(data, m, n, ld)                    X                 Different leading dimension (ld)
A(data, m, n, ld, sub, super)        X                 External banded matrices
A(data)                              X                 Static sized dimension constructor
A(m, n, nnz, val, ptrs, inds)        X                 Sparse external matrix constructor
A(B, sub, super)                     X                 Banded view constructor
A(stream)                            X                 Matrix Market stream constructor
A(stream)                            X                 Harwell-Boeing stream constructor
A(stream, sub, super)                X                 Matrix Market stream (banded)
A(stream, sub, super)                X                 Harwell-Boeing stream (banded)
A(i, j)                              reference         Element access
A(i, j)                              const_reference   Const element access
A[i]                                 OneDRef           OneD access
A[i]                                 ConstOneDRef      Const OneD access
A.sub_matrix(row_start, row_finish,
             col_start, col_finish)  submatrix_type    Submatrix access
A.partition(split_rows, split_cols)  partition_type    Partitioned matrix creation
A.subdivide(row_split, column_split) partition_type    Subdivide (partition into 4)
A.nrows()                            size_type         Number of rows
A.ncols()                            size_type         Number of columns
A.nnz()                              size_type         Number of non-zero elements
A.sub()                              difference_type   Sub part of bandwidth
A.super()                            difference_type   Super part of bandwidth
A.is_unit()                          bool              Main diagonal all ones?
A.is_upper()                         bool              Matrix shape is upper triangular?
A.is_lower()                         bool              Matrix shape is lower triangular?

Table 5.2. Matrix method requirements, in addition to those of Container
type definition        description
X::iterator            Models IndexedIterator
X::sparsity            Either dense_tag or sparse_tag
X::IndexArray          Container containing the indices of the elements
X::subrange            A Vector of a subrange
X::partition_type      A Vector of Vectors

expression             return type       description
a[n]                   reference         Element access
a[n]                   const_reference   Const element access
a.partition(splits)    partition_type    Partition the vector
a.subdivide(split)     partition_type    Divide in 2
a(s, f)                subrange          Subrange access
a.nnz()                size_type         Number of non-zero elements

Table 5.3. Vector requirements, in addition to those of Container
expression    note
i.row()       Row index access
i.column()    Column index access
i.index()     Vector index access

Table 5.4. IndexedIterator requirements
expression          return type       note
X::dim_type                           Pair type for matrix dimension
x.deref()           OneDIndexer
x.at(p)             dim_type          Map point p from Matrix coordinates to TwoD coordinates
X::twod_dim(dim)    size_type         Calculate the dimensions for the TwoD container
x.nrows()           size_type         Number of rows
x.ncols()           size_type         Number of columns
x.sub()             difference_type   Number of sub-diagonals
x.super()           difference_type   Number of super-diagonals

Table 5.5. Indexer requirements
expression    return type    note
x.row(i)      size_type      Calculate the row index of the given iterator
x.column(i)   size_type      Calculate the column index of the given iterator
x.at(i)       size_type      Calculate the offset to the element with index i

Table 5.6. OneDIndexer requirements
expression                  return type    note
x.elt(i, j)                 size_type      Maps the (i,j) point from TwoD coords to a linear offset
x.oned_offset(i)            size_type      The offset to the start of the ith OneD part of the TwoD
x.oned_length(i)            size_type      The length of the ith OneD part
x.twod_length()             size_type      The size of the outer Container of the TwoD
x.stride()                  size_type      The distance from one element to the next, usually 1
X::size(m, n, sub, super)   size_type      The total size of the memory allocated to the TwoD
x.major()                   size_type      The major dimension size
x.minor()                   size_type      The minor dimension size

Table 5.7. Offset requirements
Chapter 6

High Performance

We have presented many levels of abstraction, and a comprehensive set of algorithms for a variety of matrices, but this elegance matters little if high performance cannot be achieved. In this chapter we will discuss recent advances in languages and compilers that allow abstractions to be used with little or no performance penalty. Furthermore, we will present a set of abstractions, the Basic Linear Algebra Instruction Set (BLAIS) and Fixed Algorithm Size Template (FAST) sub-libraries, that have the specific purpose of generating optimized code. The generation of optimized code is made possible with template meta-programming [55].

There is a common perception that the use of abstraction hurts performance. This is due to a particular set of language features that are used to create abstractions, and to how those language features are implemented. The language features used to create abstractions are listed below.
Procedures   The basic building block of abstraction is the procedure call. For each abstraction level, one needs a set of functions: an interface for the abstraction level. Traditionally a procedure call has incurred a significant overhead (copying parameters to the stack, etc.). Many compilers are now able to inline procedure calls to remove this performance penalty.
Classes   The main tool for data encapsulation is the class language construct. It allows arbitrarily complex sets of data to be grouped together and hidden within an abstraction. Classes (and structures in general) interfere with the register allocation algorithms of many compilers. Optimizing compilers map the local variables of a function to machine registers. This can drastically reduce the number of loads and stores necessary, since the registers are being used to cache data from memory. Many compilers do not recognize that this optimization can also be applied to objects on the stack. They emit a load or store for each access to a data item within the object, which kills performance for codes like STL and MTL that use iterators: objects that have to be accessed over and over again within the inner loops of the code.

A typical example of the problem with small objects comes up in the use of complex numbers in C++. We illustrate the problem with the code below, which calculates the sum of a complex vector. We have written out the pseudo-assembly code for the loop. A load or store results from each access to a complex number, so each loop iteration includes 4 loads and 2 stores. We also show the pseudo-assembly code for the loop after register allocation has been applied, which maps the complex number a to registers (in this case R3 and R5). This version does only 2 loads and 0 stores inside the loop. In modern processors memory access is more expensive than ALU operations, so the reduction in the number of loads and stores has a large impact on overall performance.

complex<double> a, b[N];
// take the sum of vector b
for (int i = 0; i < N; ++i)
  a += b[i];
// pseudo-assembly code (unoptimized)
looptop:
  CMP i N
  BRZ loopend
  LOAD a.real R3
  LOAD b[i].real R4
  ADD R3 R4 R5
  STORE R5 a.real
  LOAD a.imag R3
  LOAD b[i].imag R4
  ADD R3 R4 R5
  STORE R5 a.imag
  JMP looptop
loopend:

// pseudo-assembly code (optimized)
  LOAD a.real R3
  LOAD a.imag R5
looptop:
  CMP i N
  BRZ loopend
  LOAD b[i].real R4
  ADD R3 R4 R3
  LOAD b[i].imag R6
  ADD R5 R6 R5
  JMP looptop
loopend:
  STORE R3 a.real
  STORE R5 a.imag
Polymorphism   The language feature that enables generic programming is polymorphism. It enables an algorithm to work with many data types and data structures instead of just one in particular. The first language feature in C++ that enabled polymorphism was the virtual function call, which allows a function call to dispatch to a specialized version based on the object type. With virtual functions, the object type is not known until run time, so the dispatch happens during program execution. This is called dynamic polymorphism. A disadvantage of the run-time dispatch is that it adds some overhead to the cost of a normal function call. Even worse, virtual function calls interfere with inlining. Virtual function calls cannot be inlined because the object type is not known at compile time, when the inlining optimization is applied, so the compiler cannot decide which specialized version to dispatch to.

Two important advances have been introduced into C++ compilers that remove the performance penalties associated with abstraction. The first is a language feature and the second is a compiler optimization.
Static Polymorphism   The addition of templates to the C++ language creates a way for functions to be selected at compile time based on object type. With static polymorphism the object type is known at compile time, which enables the compiler to hard-code the dispatch decision. As a result, template functions can be inlined in the same way as regular functions. In addition, many C++ compilers have improved their ability to inline functions in general, to the point where one can know with a relatively high degree of confidence that a function labeled inline is really being inlined. Even extremely complicated layers of functions can be completely flattened out. We demonstrate this with the performance achieved by the BLAIS and FAST libraries.
Lightweight Object Optimization   The performance penalty associated with the use of classes and structures can be removed with a relatively straightforward optimization (though the implementation is difficult). Each object is removed from the code and replaced with its individual parts. This happens in a recursive fashion until only basic data types (integers, floats, etc.) remain. Then each reference through an object to one of its parts is replaced with a direct reference to the part. Note that this is only applied to objects on the stack (local variables). The Kuck and Associates C++ compiler [33] performs this optimization, which is also known as scalar replacement of aggregates [41]. The end result is that the data items within the objects can then be mapped appropriately to machine registers by the normal register allocation algorithms.

With the use of template functions, and with lightweight object optimization, it is now possible to introduce abstractions with no performance penalty. This allowed us to design MTL in a generic fashion, composing containers and allowing a mix-and-match model with the algorithms. At this time, the compilers that we know perform both of these optimizations are the Kuck and Associates, Inc. C++ compiler and the SGI C++ compiler.1
The egcs compiler (though it has great language support) is missing the lightweight object optimization as well as several other optimizations. We will be conducting a complete survey of current compilers' optimization levels in the near future.

Even after the "abstraction" barrier has been removed, there are yet more optimizations that must be applied to achieve high performance. As alluded to in the introduction, achieving high performance on modern microprocessors is a difficult task, requiring many complex optimizations. Today's compilers can aid somewhat in this area, though to achieve "ideal" performance one must still hand-optimize the code.
Compiler Optimizations: Unrolling and Instruction Scheduling   Modern compilers can do a great job of unrolling simple loops and scheduling instructions, but typically only for specific (recognizable) cases. There are many ways, especially in C and C++, to interfere with the optimization process. The MTL containers and algorithms are designed to result in code that is easy for the compiler to optimize. Furthermore, the iterator abstraction makes inter-compiler portability possible, since it encapsulates how looping is performed. This is discussed below in Sections 6.1 and 6.2.
1 We use the C++ compiler from Kuck and Associates, Inc. (KAI) for development. We have found that the easiest way to check that a function is inlined is to inspect the intermediate C code that KAI's compiler generates. With the proper use of compiler flags we have found that the KAI compiler does a reliable job. Hopefully, in the future, compilers will give more direct feedback on the optimizations that they are performing.
Algorithmic Blocking   To obtain high performance on a modern microprocessor, an algorithm must properly exploit the associated memory hierarchy and pipeline architecture. Today's compilers are not able to apply all of these transformations, so the programmer must apply some optimizations by hand. To make matters worse, the transformations are somewhat machine dependent: the number of registers, the size of the cache, and other machine characteristics affect the blocking sizes. This makes it difficult to express high performance algorithms in a portable fashion. Our solution to this problem is discussed in Section 6.3.
6.1 Mayfly Components

A specific coding style must be used in order for the compiler inlining and lightweight object optimization to occur. As mentioned above, using template functions is a step in the right direction. In order for the lightweight object optimization to occur, one must code the interface objects (iterators, vectors, matrices) in a particular style, which we have named the Mayfly pattern (cataloging design patterns was popularized by the "Gang of Four" [28]), after the insect well known for its short life span. Mayfly components are objects that live on the stack (and often only in registers), and whose sole purpose is to provide a generic interface to some data structure. The iterators of the Standard Template Library are good examples of mayflies.

Mayflies also play a prominent role in the Matrix Template Library. The MTL supports a wide variety of matrix formats including compressed row/column, diagonal, banded, packed, and envelope (these formats appear in popular linear algebra libraries such as LAPACK, SPARSKIT, etc.). It is the role of the mayflies in MTL to make all of these matrix formats appear to have the same interface, while introducing zero overhead. The MTL matrix interface corresponds to the container of containers abstraction,
i.e., from the user's perspective all MTL matrices behave like an STL container such as vector< vector<double> >. Of course the commonly used matrix formats do not have this "container of containers" structure, at least not in a concrete sense. For example, the most common matrix format is a single contiguous array, where sections of the array are considered to represent rows or columns of the matrix. When one wants to operate on a particular row, some pointer arithmetic is performed to move a pointer to the beginning of the row, and the pointer can then be incremented to traverse the row (or one can index into the array with an offset).
Mayflies in MTL   In order to provide a "container of containers" interface to this dense matrix format, the MTL creates row objects on the fly. That is, when the user requests a particular row, the MTL matrix creates a row object. Suppose one wishes to add the elements of row 3 to row 5 of an MTL matrix. One would write the following expression, where A is some MTL matrix.

add(A[3], A[5]);

The operator[](n) creates a vector object. The object is not allocated on the heap (with new); instead it is just returned by value and copied into the add() routine. This keeps the object on the stack, from which many compilers can further optimize so that the object lives solely in registers. In addition, most functions with which the mayfly is involved become inlined, and therefore any overhead of passing by value is removed. These are the characteristics of mayflies: they are passed by value, live on the stack or in registers, are small enough to induce little overhead, and provide an abstract interface to some lower-level data structure. To tie this into the example above, the row vector object did not exist before the call to operator[](n), and it is gone once the function call to add() is over, hence the name mayfly.
MTL matrices export iterators in the same way that an STL vector< vector<double> > does. There are 2-D iterators that traverse the outer Container, and there are 1-D iterators that traverse down a particular row (or column). The following example revisits the generic algorithm for matrix-vector multiplication with a focus on the mayfly components. There are several mayfly objects involved in this algorithm. Of course the iterators i and j are mayflies; they are local to this function. In addition, the vector objects that result from dereferencing i, the two expressions (*i), are mayflies.

template <class Matrix, class VecX, class VecY>
void mult(const Matrix& A, const VecX& x, VecY y)
{
  typename Matrix::const_iterator i;
  typename Matrix::OneD::const_iterator j;
  for (i = A.begin(); i != A.end(); ++i)
    for (j = (*i).begin(); j != (*i).end(); ++j)
      y[j.row()] += *j * x[j.column()];
}
Mayflies are high performance   In many cases the overhead associated with mayflies can be reduced to zero, resulting in an optimally lightweight interface. Several things help make this happen. First, by design, the functions in which the mayflies are involved will typically be inlined. Second, the data elements within a mayfly can be mapped to registers by a good optimizing compiler (lightweight object optimization [33, 41]). In this way, any overhead that might have been incurred in passing the object in a function call, or in the extra loads and stores that would have occurred due to the presence of a structure, is removed by the compiler. The resulting assembly code ends up being the same as it would have been if one had written the code in a non-generic fashion, resulting in the same high performance.
6.2 High Performance Iterators

Iterators control how looping is performed, and therefore their design can make a large difference to the performance of a particular code. The biggest concern here is identifying what kind of loops the underlying backend compiler will optimize (perform unrolling, instruction scheduling, etc.). The design space includes whether to increment a pointer or an integer offset, and whether to use the less-than or not-equal operator for the loop termination condition. One would think modern compilers should produce equally good code for all of these cases, but this is not so. There can be a factor of 2 or more difference in performance depending on the type of loop used. The four variations on the traversal method can be seen in the following example loop, which computes a dot product of two vectors.

int i;
double *x, *y, *xp, *yp;

// integer, != operator
for (i = 0; i != N; ++i)
  tmp += x[i] * y[i];

// integer, < operator
for (i = 0; i < N; ++i)
  tmp += x[i] * y[i];

// pointer, != operator
yp = y;
for (xp = x; xp != x + N; ++xp, ++yp)
  tmp += *xp * *yp;

// pointer, < operator
yp = y;
for (xp = x; xp < x + N; ++xp, ++yp)
  tmp += *xp * *yp;
Table 6.1 shows the variations in performance on a loop (dot product) for three different computer architectures/compilers. The dot product computation was chosen because there are no aliasing issues and it includes the typical add/multiply floating point operation. The native C compiler was used for each machine with maximum optimization flags turned on.
                                    comparison type
  machine          iterator type     <          !=
  ----------------------------------------------------
  UltraSPARC 30    integer         180.085     44.6187
                   pointer         180.319     44.4693
  RS6000 590       integer          47.7829    47.78
                   pointer          47.7595    20.8623
  R10000           integer         102.512    106.101
                   pointer          81.2678    72.648

Table 6.1. The effect of iterator and comparison operator choice on performance (in Mflops) for dot product with the Sun C, IBM XLC, and SGI C compilers.
From Table 6.1 we can surmise that by choosing to increment an integer offset, and by using the less-than comparison operator, we can achieve top performance with all of the compilers tested. This result differs from the findings of PHiPAC [7], which suggest that one should always use the not-equal operator because it is more efficiently implemented on some architectures. Our experience has shown that the architecture implementation is not as important as whether the compiler knows how to optimize a loop that uses a not-equal comparison operator. Of course, this test did not include all C compilers, but it does warn us that we ought to write our code in such a way as to make it easy to change the operator used. In C++ this is easy to do with an extra layer of abstraction. Instead of using a particular operator in each loop, we call the not_at() template function, which then invokes the proper comparison operator. Now there is only one line that needs to change if the compiler or architecture has a particular operator preference. Another reason for using the not_at() approach is that if one wants to make algorithms generic, the less-than operator should not be used, since most iterator types do not
support less-than. This is why the not-equal operator is used for most loops in the STL. The not_at() function solves this problem by allowing the less-than operator to be used for random access iterators and the not-equal operator for all others. The code below shows the implementation of the not_at() family of functions. The last not_at() function dispatches to either the first or second version depending on the iterator's category type. The dispatch happens at compile time, and all the functions are easily inlined by a modern C++ compiler. This technique has been referred to as external polymorphism.

  template <class RandomIter1, class RandomIter2>
  inline bool not_at(const RandomIter1& a, const RandomIter2& b,
                     random_access_iterator_tag) {
    return a < b;
  }

  template <class Iter1, class Iter2, class AnyTag>
  inline bool not_at(const Iter1& a, const Iter2& b, AnyTag) {
    return a != b;
  }

  template <class Iter1, class Iter2>
  inline bool not_at(const Iter1& a, const Iter2& b) {
    typedef typename iterator_traits<Iter1>::iterator_category Cat;
    return not_at(a, b, Cat());
  }
Now that we have a way of picking the proper operator, and know from the experiment that an integer offset should be incremented instead of a pointer, we are ready to implement the MTL iterators. An excerpt from the dense iterator class is shown below. This is the iterator used in the dense1D container. We use the integer pos to keep track of the iterator position, and then use it as an offset in the operator* method. This implementation results in an iterator that produces loops that are easier to optimize for the compilers we have tested. The typical implementation of an iterator for a contiguous-memory container such as a vector is just a pointer, but as the discussion above points out, this is not always the best choice. One nice thing about the iterator
abstraction is that we are not forced into a particular implementation; we can change the implementation at any time without affecting the code that uses the iterator.

  template <class Iter>
  class dense_iterator {
  public:
    dense_iterator(Iter s, size_type i) : start(s), pos(i) { }
    size_type index() const { return pos; }
    reference operator*() const { return *(start + pos); }
    self& operator++() { ++pos; return *this; }
    bool operator<(const dense_iterator& x) const { return pos < x.pos; }
  protected:
    Iter start;
    size_type pos;
  };
6.3 High Performance & Template Metaprogramming

The bane of portable high performance numerical linear algebra is the need to tailor key routines to specific execution environments. To obtain high performance on a modern microprocessor, an algorithm must make efficient use of the cache, registers, and the pipeline architecture (typically through careful loop blocking and structuring). Ideally, one would like to be able to express high performance algorithms in a portable fashion, but there is not enough expressiveness in languages such as C or Fortran to do so. This is because the blocking done at the lowest level, for registers and the pipeline, affects the number and type of instructions that must be in the inner loop, as is shown in the example below. The variation of the number of operations in the loop cannot be expressed directly in C or Fortran. Recent efforts (PHiPAC [7], ATLAS [59]) have resorted to going outside the language, i.e., to custom code generation systems, to gain the kind of flexibility needed to generate the inner loop in a portable fashion. The BLAIS and FAST libraries use the template metaprogramming features of C++ to express flexible unrolling and blocking factors for inner loops.
  // need to unroll by two for machine X
  for (int i = 0; i < N; i += 2) {
    y[i]   += a * x[i];
    y[i+1] += a * x[i+1];
  }

  // need to unroll by three for machine Y
  for (int i = 0; i < N; i += 3) {
    y[i]   += a * x[i];
    y[i+1] += a * x[i+1];
    y[i+2] += a * x[i+2];
  }
In this section we describe a collection of high performance kernels for basic linear algebra, called the Basic Linear Algebra Instruction Set (BLAIS), together with the Fixed Algorithm Size Template (FAST) Library. The kernels encapsulate small fixed-size computations to provide building blocks for numerical libraries in C++. The sizes are template parameters of the kernels, so they can easily be configured to a specific architecture for portability. In this way the BLAIS delivers the power of code generation systems such as PHiPAC [7] and ATLAS [59], but with a simple and elegant interface, so that one can write flexible-sized block algorithms without the complications of a code generation system. The BLAIS specification contains fixed-size algorithms with functionality equivalent to that of the Level-1, Level-2, and Level-3 BLAS [36, 23, 22]. The BLAIS routines are themselves implemented using the FAST library, which contains general purpose fixed-size algorithms equivalent in functionality to the generic numerical algorithms of the Standard Template Library (STL) [52]. In the following sections, we describe the implementation of the FAST algorithms and then show how the BLAIS are constructed from them. Next, we demonstrate how the BLAIS can be used as high-level instructions (kernels) to construct a dense matrix-matrix product. Finally, experimental results show that the performance obtained by our approach can equal and even exceed that of vendor-tuned libraries.
6.4 Fixed Algorithm Size Template (FAST) Library

The FAST Library includes generic algorithms such as transform(), for_each(), inner_product(), and accumulate(), mirroring those found in the STL. The interface closely follows that of the STL: all input is in the form of iterators (generalized pointers). The only difference is that the loop-end iterator is replaced by a count template object. The example below demonstrates the use of both the STL and FAST versions of transform() to realize an AXPY-like operation (y <- x + y).
The first1 and last1 parameters are iterators for the first input container (indicating the beginning and end of the container, respectively). The first2 parameter is an iterator indicating the beginning of the second input container. The result parameter is an iterator indicating the start of the output container. The binary_op parameter is a function object that combines the elements from the first and second input containers into the result container.

  int x[4] = {1,1,1,1}, y[4] = {2,2,2,2};

  // STL
  template <class InIter1, class InIter2, class OutIter, class BinaryOp>
  OutIter transform(InIter1 first1, InIter1 last1, InIter2 first2,
                    OutIter result, BinaryOp binary_op);

  std::transform(x, x + 4, y, y, plus<int>());

  // FAST
  template <int N, class InIter1, class InIter2, class OutIter, class BinOp>
  OutIter transform(InIter1 first1, cnt<N>, InIter2 first2,
                    OutIter result, BinOp binary_op);

  fast::transform(x, cnt<4>(), y, y, plus<int>());
The difference between the STL and FAST algorithms is that STL accommodates containers of arbitrary size, with the size specified at run time. FAST also works with containers of arbitrary size, but the size is fixed at compile time. In the example below we show how the FAST transform() routine is implemented. We use a tail-recursive algorithm to achieve complete unrolling; there is no actual loop in the FAST transform(). The template-recursive calls are inlined, resulting in a sequence of N copies of the inner loop statement. This technique, called template metaprogramming, has been used to great effect in the Blitz++ Library and is explained in [16, 55].

  // The general case
  template <int N, class InIter1, class InIter2, class OutIter, class BinOp>
  inline OutIter
  fast::transform(InIter1 first1, cnt<N>, InIter2 first2,
                  OutIter result, BinOp binary_op) {
    *result = binary_op(*first1, *first2);
    return transform(++first1, cnt<N-1>(), ++first2, ++result, binary_op);
  }

  // The N = 0 case, to stop the template recursion
  template <class InIter1, class InIter2, class OutIter, class BinOp>
  inline OutIter
  fast::transform(InIter1 first1, cnt<0>, InIter2 first2,
                  OutIter result, BinOp binary_op) {
    return result;
  }
6.5 Basic Linear Algebra Instruction Set (BLAIS)

The BLAIS library is implemented directly on top of the FAST Library, as a thin layer that maps generic FAST algorithms into fixed-size mathematical operations. There is no added overhead in the layering because all the function calls are inlined. Using the FAST library allows the BLAIS routines to be expressed in a very simple and elegant fashion.
The following discussion looks at one example from each level of the BLAIS library: vector-vector, matrix-vector, and matrix-matrix.
Vector-Vector Operations

The add() routine is typical of the BLAIS vector-vector operations. It is implemented in terms of a FAST algorithm, in this case transform(). The implementation of the BLAIS add() is listed below. The add() function is implemented with a class and its constructor to provide a simple syntax for its use.

  template <int N>
  struct add {
    template <class Iter1, class Iter2>
    inline add(Iter1 x, Iter2 y) {
      typedef typename iterator_traits<Iter2>::value_type T;
      fast::transform(x, cnt<N>(), y, y, plus<T>());
    }
  };
The example below shows how the add() routine can be used. The comment shows the resulting code after the call to add() is inlined. Note that only one add() routine is required to provide any combination of scaling or striding. This is made possible through the use of the scaling and striding adaptors, as discussed in Section 5.8. Any resulting overhead is removed by inlining and lightweight object optimizations [33]. The scl(x, a) call below automatically creates the proper scale iterator out of x and a.

  double x[4], y[4];
  fill(x, x+4, 1);  fill(y, y+4, 5);
  double a = 2;

  vecvec::add<4>(scl(x, a), y);
  // the compiler expands add() to:
  //   y[0] += a * x[0];
  //   y[1] += a * x[1];
  //   y[2] += a * x[2];
  //   y[3] += a * x[3];
Matrix-Vector Operations

To illustrate the BLAIS matrix-vector operations we look at the BLAIS matrix-vector multiply. The algorithm simply carries out a vector add operation for each column of the matrix. Using the same technique as in the FAST library, we write a fixed-depth recursive algorithm which becomes inlined by the compiler. The implementation is listed below.

  // General case
  template <int M, int N>
  struct mult {
    template <class AColIter, class IterX, class IterY>
    inline mult(AColIter A_2Diter, IterX x, IterY y) {
      vecvec::add<M>(scl((*A_2Diter).begin(), *x), y);
      mult<M, N-1>(++A_2Diter, ++x, y);
    }
  };

  // N = 0 case
  template <int M>
  struct mult<M, 0> {
    template <class AColIter, class IterX, class IterY>
    inline mult(AColIter A_2Diter, IterX x, IterY y) {
      // do nothing
    }
  };
Matrix-Matrix Operations

The most important of the BLAIS matrix-matrix operations is the matrix-matrix multiply. This algorithm builds on the BLAIS matrix-vector multiply. The code looks very similar to that of the matrix-vector multiply, except that there are three integer template arguments (M, N, and K), and the inner "loop" contains a call to matvec::mult() instead of vecvec::add(). Remember that the BLAIS matrix-matrix operation is intended to be used in the inner loop (literally) of algorithms, and that the BLAIS operation is completely inlined and expanded. It is therefore perfectly fine to build the matrix-matrix multiply out of the matrix-vector operation, since at this low level cache blocking issues are not a factor.
6.6 BLAIS in a General Matrix-Matrix Product

A typical use of the BLAIS kernels is to construct linear algebra subroutines for arbitrarily sized objects. The fixed-size nature of the BLAIS routines makes them well-suited to performing register-level blocking within a hierarchically blocked matrix-matrix multiplication. Blocking (or tiling) is a well-known optimization that increases the reuse of data while it is in cache and in registers, thereby reducing the memory bandwidth requirements and increasing performance. The code below shows the innermost set of blocking loops for a matrix-matrix multiply. The constants BFM, BFN, and BFK are blocking factors chosen so that c can fit into the registers. The blocking factors describe the size and shape of the submatrices, or blocks, that the matrix is divided into.²

  for (jj = 0; jj < N; jj += BFN)
    for (ii = 0; ii < M; ii += BFM) {
      copy_block c(C + ii*N + jj, BFM, BFN, N);
      for (kk = 0; kk < K; kk += BFK) {
        light2D a(A + ii*K + kk, BFM, BFK);
        light2D b(B + kk*N + jj, BFK, BFN);
        matmat::mult(a, b, c);
      }
    }
A Configurable Recursive Matrix-Matrix Multiply

To obtain the highest performance in a matrix-matrix multiply code, algorithmic blocking must be done at each level of the memory hierarchy. A natural way to formulate this is to write the algorithm in a recursive fashion, with a level of recursion for each level of blocking and memory hierarchy. We take this approach in the MTL algorithm. The sizes and shapes of the blocks at each level are determined by blocking adaptors. Each adaptor contains the information for
² Excessive code bloat is not a problem in MTL because the complete unrolling is only done for very small block sizes.
the next level of blocking. In this way the recursive algorithm is determined by a recursive template data structure (which is set up at compile time). The setup code for the matrix-matrix multiply is shown below. This example blocks for just one level of cache, with 64 x 64 sized blocks; the small 4 x 2 blocks fit into registers. Note that these numbers would normally be constants set in a header file and derived from the results of an automated parameter search facility.

  template <class MatA, class MatB, class MatC>
  void matmat::mult(MatA& A, MatB& B, MatC& C) {
    typename MatA::RegisterBlock A_L0;  typename MatA::Block A_L1;
    typename MatB::RegisterBlock B_L0;  typename MatB::Block B_L1;
    typename MatC::CopyBlock     C_L0;  typename MatC::Block C_L1;

    matmat::__mult(block(block(A, A_L0), A_L1),
                   block(block(B, B_L0), B_L1),
                   block(block(C, C_L0), C_L1));
  }
The recursive algorithm is listed in Figure 6.1. There is a TwoD iterator for each matrix (A_k, B_k, and C_i), as well as a OneD iterator for each (A_ki, B_kj, and C_ij). The matrices have been wrapped with blocked matrix adaptors, so that dereferencing a OneD iterator yields a submatrix. The recursive call is then made on the submatrices A_block, *B_kj, and *C_ij. The bottom-most level of recursion is implemented with a separate function that makes the calls to the BLAIS matrix-matrix multiply and "cleans up" the leftover edge pieces. Since the recursion depth is fixed at compile time, the whole algorithm can be inlined by the compiler.
Optimizing Cache Conflict Misses

Besides blocking, there is another important optimization that can be applied to matrix-matrix multiply code. Typically, utilization of the level-1 cache is much lower than one might expect, due to cache conflict misses. This is especially apparent for large matrices in direct-mapped and low-associativity caches.
  template <class MatA, class MatB, class MatC>
  void mult(MatA& A, MatB& B, MatC& C) {
    A_k = A.begin();  B_k = B.begin();
    while (A_k != A.end()) {
      C_i = C.begin();
      A_ki = (*A_k).begin();
      while (C_i != C.end()) {
        B_kj = (*B_k).begin();
        C_ij = (*C_i).begin();
        typename MatA::Block A_block = *A_ki;
        while (B_kj != (*B_k).end()) {
          mult(A_block, *B_kj, *C_ij);
          ++B_kj; ++C_ij;
        }
        ++C_i; ++A_ki;
      }
      ++A_k; ++B_k;
    }
  }
Figure 6.1. A recursive matrix-matrix product algorithm.

The way to minimize this problem is to copy the block of matrix A being accessed into a contiguous section of memory [34]. This allows the code to use blocking sizes closer to the size of the level-1 cache without inducing as many cache conflict misses. It turns out that this optimization is straightforward to implement in our recursive matrix-matrix multiply. We already have block objects (the submatrices A_block, *B_kj, and *C_ij). We modify the constructors of these objects to copy the block to a contiguous part of memory, and the destructors to copy the block back to the original matrix. This is especially nice since the optimization does not clutter the algorithm code; instead the change is encapsulated in the copy_block matrix class.
Chapter 7

Iterative Template Library (ITL)

The Iterative Template Library (ITL) is a collection of sophisticated iterative methods written in C++ (similar to the IML++ library [24]). It contains methods for solving both symmetric and non-symmetric linear systems of equations, many of which are described in [4]. The ITL methods are constructed in a generic style, allowing for maximum flexibility and separation of concerns among matrix data structures, performance optimizations, and algorithms. Presently, ITL contains routines for conjugate gradient (CG), conjugate gradient squared (CGS), biconjugate gradient (BiCG), biconjugate gradient stabilized (BiCGStab), generalized minimal residual (GMRES), quasi-minimal residual (QMR) without look-ahead, transpose-free QMR, and the Chebyshev and Richardson iterations. In addition, ITL provides the following preconditioners: SSOR, incomplete Cholesky, incomplete LU(n), and incomplete LU with thresholding.
7.1 Generic Interface

The generic construction of the ITL revolves around four major interface components.

Vector: An array class with an STL-like iterator interface.

Matrix: Either an MTL matrix, a multiplier (for matrix-free methods), or a custom matrix
with a specialized mtl::mult(A,x,y) function defined.

Preconditioner: An object with solve(x,z) and trans_solve(x,z) methods defined.

Iteration: An object that defines the test for convergence and the maximum number of iterations.

Figure 7.1 shows the generic interface for the QMR iterative method. The function is templated on each interface component, which makes it possible to mix and match concrete components. This particular method uses a split preconditioner, hence the M1 and M2 arguments.

  template <class Matrix, class Vector, class VectorB,
            class Precond1, class Precond2, class Iteration>
  int qmr(const Matrix& A, Vector& x, const VectorB& b,
          const Precond1& M1, const Precond2& M2, Iteration& iter);
Figure 7.1. An ITL method interface example.

Figure 7.2 gives an example of how one might call the qmr() routine. The interface presented to the user of ITL has been made as simple as possible. This example uses the compressed2D matrix type from MTL and the SSOR preconditioner. As listed above, there are certain requirements for each interface component, but there is a significant amount of flexibility in the concrete implementation of a particular component. For instance, any of the ITL-supplied preconditioners (Cholesky, ILU, ILUT, SSOR) can be used with any method (though some are for symmetric matrices only), and custom preconditioners can be added with little extra effort. Likewise, control over the test for convergence is encapsulated in the Iteration interface, so that variations in this regard can be made independently of the main algorithms.
  typedef matrix<double, rectangle<>, compressed<>, row_major>::type Matrix;
  int max_iter = 50;
  Matrix A(5, 5);
  dense1D<double> x(A.nrows(), 0.0);
  dense1D<double> b(A.ncols(), 1.0);
  // fill A ...
  SSOR<Matrix> precond(A);
  basic_iteration<double> iter(b, max_iter, 1e-6);
  qmr(A, x, b, precond.left(), precond.right(), iter);
Figure 7.2. Example use of the ITL QMR iterative method.

Similarly, the Matrix and Vector interfaces allow for flexibility in the matrix storage implementation, and even in how the matrix-vector multiplication is carried out. Several levels of flexibility are available. The ITL uses the MTL interface for its basic linear algebra operations. Since the MTL linear algebra operations are generic algorithms, a wide range of matrix types can be used, including all of the dense, sparse, and banded types provided in the MTL. Additionally, the MTL generic algorithms can work with custom matrix types, provided the matrix exports the required interface. A second level of flexibility is that the user may specialize the mtl::mult(A,x,y) function for a custom matrix type and bypass the MTL generic algorithms entirely. One use of this is to perform matrix-free computations [9, 8]. Another is to use a distributed matrix with parallel versions of the linear algebra operations.
7.2 Ease of Implementation

The most significant benefit of layering the ITL on top of the MTL interface is the ease of implementation. The ITL algorithms can be expressed in a concise fashion, very close to the pseudo-code found in a textbook. Figure 7.3 compares the preconditioned conjugate
gradient algorithm from [4] with the code from the ITL.

  Initial r(0) = b - A*x(0)
  for i = 1, 2, ...
      solve M*z(i-1) = r(i-1)
      rho(i-1) = r(i-1)' * z(i-1)
      if i = 1
          p(1) = z(0)
      else
          beta(i-1) = rho(i-1) / rho(i-2)
          p(i) = z(i-1) + beta(i-1) * p(i-1)
      endif
      q(i) = A * p(i)
      alpha(i) = rho(i-1) / (p(i)' * q(i))
      x(i) = x(i-1) + alpha(i) * p(i)
      r(i) = r(i-1) - alpha(i) * q(i)
      check convergence
  end
  while (! iter.finished(r)) {
    M.solve(r, z);
    rho = mtl::dot_conj(r, z);
    if (iter.first())
      mtl::copy(z, p);
    else {
      beta = rho / rho_1;
      mtl::add(z, scaled(p, beta), p);
    }
    mtl::mult(A, p, q);
    alpha = rho / vecvec::dot_conj(p, q);
    mtl::add(x, scaled(p, alpha), x);
    mtl::add(r, scaled(q, -alpha), r);
    rho_1 = rho;
    ++iter;
  }
Figure 7.3. Comparison of an algorithm for the preconditioned conjugate gradient method and the corresponding ITL code.
The generic component construction of the ITL also aids in testing and verification of the software, since it enables an incremental approach. The basic linear algebra operations are tested thoroughly in the MTL test suite, and the preconditioners are tested individually before they are used in conjunction with the iterative methods. In addition, there is an identity preconditioner, so that the iterative methods can be tested in isolation (without a real preconditioner). The abstraction level over the linear algebra also makes performance optimization much easier. Since the ITL iterative methods do not enforce the use of a particular matrix type, or of a particular matrix-vector multiply, optimizations at these levels can happen with no change to the iterative method code. This was a significant factor in our ability to implement and optimize such a large group of iterative methods in a short time (four man-months).
7.3 ITL Performance

This section compares the performance of the ITL with IML++ [24], one of the few other comprehensive iterative method packages (which uses SparseLib++ [26] for the sparse basic linear algebra). Six matrices from the Harwell-Boeing collection are used. Computation time (in seconds) per iteration is plotted in Figure 7.4 for each of the following methods: CGS, BiCG, BiCGStab, QMR, and TFQMR (which exists only in ITL). The ILU (without fill-in) preconditioner was used in all of the experiments. All timings were run on a Sun UltraSPARC 30. The ITL methods were roughly twice as fast as the corresponding IML++ methods.

[Figure 7.4 plots time per iteration for the ITL and IML++ versions of each method over the matrices MCCA, FS7603, PORES2, ZENIOS, SHERMAN5, and SAYLR4.]
Figure 7.4. Comparison of ITL and IML++ performance over six matrices.
Chapter 8

Performance Experiments

This chapter presents a set of experiments comparing the performance of MTL with other available libraries (both public domain and vendor supplied). The algorithms timed were the dense matrix-matrix multiplication, the dense matrix-vector multiplication, and the sparse matrix-vector multiplication.
8.1 Dense Matrix-Matrix Multiplication

Figure 8.1 shows the dense matrix-matrix product performance for MTL, the Fortran BLAS, the Sun Performance Library, TNT [47], and ATLAS [59], all obtained on a Sun UltraSPARC 170E. The experiment shows that the MTL can compete with vendor-tuned libraries (on an algorithm that tends to get extra attention due to benchmarking). The MTL and TNT executables were compiled using Kuck and Associates C++ (KCC) [33], in conjunction with the Solaris C compiler. ATLAS was compiled with the Solaris C compiler, and the Fortran BLAS (obtained from Netlib) were compiled with the Solaris Fortran 77 compiler. All possible compiler optimization flags were used in all cases. The cache was cleared between each trial of the experiment. To demonstrate portability across different architectures and compilers, Figure 8.1 also compares the performance of MTL with ESSL [30] on an IBM RS/6000 590. In this case, the MTL executable was compiled with the KCC and IBM xlc compilers.
8.2 Dense and Sparse Matrix-Vector Multiplication

To demonstrate genericity across different data structures and data types, Figure 8.2 shows performance results obtained using the same generic matrix-vector multiplication algorithm for dense and for sparse matrices, and compares the performance to that obtained with non-generic libraries. The dense matrix-vector performance of MTL is compared to the Netlib BLAS (Fortran), the Sun Performance Library, and TNT [47]. The sparse matrix-vector performance of MTL is compared to SPARSKIT [49] (Fortran), NIST [48] (C), and TNT (C++). The sparse matrices used were from the MatrixMarket [44] collection. The cache was not cleared between matrix-vector timing trials. The focus of this experiment is on the pipeline behavior of the algorithm. If the cache is
cleared, the bottleneck becomes memory bandwidth, and differences in pipeline behavior cannot be seen. Blocking for cache is not as important for matrix-vector multiplication because there is no reuse of the matrix data.
8.3 Performance Analysis of Matrix-Matrix Multiplication

The presence (and absence) of different optimization techniques in the various implementations of the matrix-matrix multiplication can readily be seen in Figure 8.1 (the UltraSPARC comparison) and manifests itself quite strongly as a function of matrix size. In the region from N = 10 to N = 256, performance is dominated by register usage and pipeline performance. "Unroll and jam" techniques [10, 61] are used to attain high levels of performance in this region. In the region from 256 to approximately 1024, performance is dominated by data locality. Loop blocking for cache is used to attain high levels of performance here. Finally, for matrix sizes larger than approximately N = 1024, performance can be affected by conflict misses in the cache; the results for ATLAS and the Fortran BLAS fall precipitously at this point. To attain good performance in the face of conflict misses (in low-associativity caches), block-copy techniques as described in [34] are used. Note that performance effects are cumulative. For instance, the Netlib BLAS do not use any of the techniques listed above for performance enhancement. As a result, their performance is poor initially and continues to degrade as different effects come into play.
[Figure 8.1 plots Mflops versus matrix size for MTL, the Sun Performance Library, ATLAS, the Fortran BLAS, and TNT on the Sun UltraSPARC (upper), and for MTL, ESSL, ATLAS, and the Netlib BLAS on the IBM RS6000 (lower).]
Figure 8.1. Performance comparison of generic dense matrix-matrix product with other libraries on Sun UltraSPARC (upper) and IBM RS6000 (lower).
[Figure 8.2 plots Mflops versus N for MTL, the Fortran BLAS, the Sun Performance Library, and TNT in the dense case (upper), and Mflops versus average non-zeroes per row for MTL, SPARSKIT, NIST, and TNT in the sparse case (lower).]
Figure 8.2. Performance of generic matrix-vector product applied to column-oriented dense (upper) and row-oriented sparse (lower) data structures compared with other libraries on Sun UltraSPARC.
Chapter 9

Testing

The Matrix Template Library posed a special challenge for testing due to the combinatorial nature of its components and algorithms. The amount of functionality to be tested was extremely large, so the same generic programming techniques used in the MTL to manage code size were applied to the test suite. The test suite consists of three parts: the matrix tests, the algorithm tests, and a script to drive the test compilation and execution. The matrix tests exercise all of the functionality described in the Matrix concept (Section 5.3.1). They are written in a generic style, similar to the generic algorithms of the MTL. Most of the matrix tests can be applied to any MTL matrix; however, a few tests must be specialized. For instance, the test that exercises the A(i,j) operator must be specialized for banded matrices. The total number of specializations is not very large, since even the specializations are written in a generic style and cover a large number of matrix types. In addition to testing each matrix format, the matrix tests check each matrix adaptor, such as the scaling and transpose adaptors, with every matrix, to verify that the adapted matrix can pass all of the tests. The code below shows one of the matrix tests. This test focuses on the const iterators of the matrix. The test reads the values stored in the matrix and checks that those values are correct. In this case the matrix has been filled with the series
of numbers 0, 1, 2, ....

template <class Matrix>
bool const_iterator_test(const Matrix& A, string test_name)
{
  typedef typename mtl::matrix_traits<Matrix>::value_type T;
  typename Matrix::const_iterator i;
  typename Matrix::OneD::const_iterator j;
  T c = T(0);
  for (i = A.begin(); i != A.end(); ++i)
    for (j = (*i).begin(); j != (*i).end(); ++j) {
      c = c + T(1);
      if (*j != c) {
        cerr << test_name << " failed" << endl;
        return false;
      }
    }
  return true;
}

// MTL-style generic matrix-vector multiply
template <class Matrix, class VecX, class VecY>
void matvec_mult(const Matrix& A, VecX x, VecY y)
{
  typename Matrix::const_iterator i;
  typename Matrix::OneD::const_iterator j;
  for (i = A.begin(); i != A.end(); ++i)
    for (j = (*i).begin(); j != (*i).end(); ++j)
APPENDIX A. CONTAINERS
y[j.row()] += *j * x[j.column()]; }
There are two iterators used in the above algorithm, i and j. We refer to i as a 2D iterator since it iterates through the 2D container, and we refer to j as the 1D iterator. *i gives a 1D container, and the (*i).begin() and (*i).end() expressions define the iteration through the 1D part of the Matrix, which could be a Row, Column, or Diagonal depending on the Matrix type. MTL matrices can also be sparse or dense, so the traversal behaviour of the 1D iterator j varies accordingly: j iterates over only the non-zero elements in the sparse case. In addition, the row() and column() functions on the 1D iterator return the row and column index corresponding to the position of the iterator. This hides the differences in indexing between a large class of sparse and dense matrices.

Compare the MTL-style code to that of the BLAS-style and the SPARSKIT matrix-vector products. The BLAS and SPARSKIT algorithms include many details that are specific to the matrix format being used. For instance, the sparse matrix algorithm must explicitly access the index and pointer arrays ia and ja.

// SPARSKIT-style sparse matrix-vector multiply
void matvec_mult(double* a, int n, int* ia, int* ja,
                 double* y, double* x)
{
  for (int i = 0; i < n; ++i)
    for (int k = ia[i]; k < ia[i+1]; ++k)
      y[i] += a[k] * x[ja[k]];
}
The dense matrix algorithm must use the leading dimension lda of the matrix to map to the appropriate place in memory.

// BLAS-style dense matrix-vector multiply
void matvec_mult(double* a, int m, int n, int lda,
                 double* y, double* x)
{
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j)
      y[i] += a[i*lda+j] * x[j];
}
The MTL-style algorithm has none of these format-specific expressions because the implementation details are hidden beneath the MTL Matrix interface. If one uses a dense MTL matrix with the generic algorithm, the resulting assembly code looks similar to the BLAS-style algorithm; if one uses a sparse MTL matrix, the resulting assembly code looks similar to the SPARSKIT-style algorithm.
operator()(i,j)  One may ask, what about the A(i,j) operator? MTL matrices do have an A(i,j) operator, but using it is typically not the most efficient or "generic" way to traverse and access the matrix, especially if the matrix is sparse. Furthermore, if the matrix is banded, one would have to consider which indices of the matrix are valid to access. With the MTL matrix iterators, one does not have to worry about such considerations: traversing with the iterators gives access to all matrix elements that are stored in any given storage format.
operator[](i)  This operator gives access to the OneD containers within a matrix. For instance, if one has a column-major matrix A and wishes to find the index of the maximum element in the first column, one would do the following:

Matrix::Column first_column = A[0];
max_elt_index = max_index(first_column);
Accessing Rows vs. Columns  Most matrix types provide access to either rows or columns, but not both. However, if one has a dense rectangular matrix, then the matrix can
be easily converted back and forth from row-major to column-major using the rows and columns helper functions. The following finds the maximum element in the first row of the matrix.

max_elt_index = max_index(rows(A)[0]);
Submatrices  The sub_matrix() function returns a new matrix object that is a view into a particular portion of the matrix. The indexing within the new matrix is reset so that the first element is at (0,0). Here is an example of creating submatrices:

A = [  1  2  3  4 ]
    [  5  6  7  8 ]
    [  9 10 11 12 ]
    [ 13 14 15 16 ]

A_00 = [ 1 2 ]      A_01 = [ 3 4 ]
       [ 5 6 ]             [ 7 8 ]

A_10 = [  9 10 ]    A_11 = [ 11 12 ]
       [ 13 14 ]           [ 15 16 ]

A_00 = A.sub_matrix(0,2,0,2);
A_01 = A.sub_matrix(0,2,2,4);
A_10 = A.sub_matrix(2,4,0,2);
A_11 = A.sub_matrix(2,4,2,4);
If one wants to create submatrices covering the whole matrix, as above, MTL provides a shortcut in the partition function. The function returns a matrix of submatrices. The input is the row and column numbers at which to split the matrix. The following code gives an example.

int splitrows[] = { 2 };
int splitcols[] = { 2 };
Matrix::partitioned Ap = A.partition(array_to_vec(splitrows),
                                     array_to_vec(splitcols));
Now Ap(0,0) is equivalent to A_00, Ap(0,1) is equivalent to A_01, and so on.
A.1.1 Matrix

Matrix Type Selection  To create an MTL matrix, one uses the matrix type generator to choose the particular storage format, element type, and so on.

A few notes on the operations below: copy construction performs a shallow copy, since MTL objects are really just reference-counted handles to the actual data. For partition, the IndexList is some Container holding the list of row and column numbers at which to split the matrix, and the result is a matrix of submatrices; subdivide partitions the matrix into four submatrices.

Refinement of  Container

Associated types

Concept  Type               Description
Tag      X::shape           See matrix_traits
Tag      X::orientation     Either row_tag or column_tag
Tag      X::sparsity        Either dense_tag or sparse_tag
Tag      X::dimension       Either oned_tag or twod_tag
Matrix   X::transpose_type  Used by the trans helper function
Matrix   X::strided_type    Used by the rows and columns helper functions
Matrix   X::scaled_type     Used by the scaled helper function
Vector   X::OneD            The type for a OneD slice of the Matrix
Vector   X::OneDRef         The type for a reference to a OneD slice of the Matrix
Matrix   X::submatrix_type  The type for a submatrix of this Matrix
Matrix   X::partition_type  The type for a partitioned version of this Matrix
Concept                      Type                       Description
TrivialConcept               X::value_type              The element type
TrivialConcept&              X::reference               Reference to the element type
const TrivialConcept&        X::const_reference         Const reference to the element type
TrivialConcept*              X::pointer                 Pointer to the element type
BidirectionalIterator        X::iterator                Iterator type; dereference gives a OneD
const BidirectionalIterator  X::const_iterator          Const iterator type
BidirectionalIterator        X::reverse_iterator        Reverse iterator type
const BidirectionalIterator  X::const_reverse_iterator  Const reverse iterator type
NonNegativeIntegral          X::size_type               Size type
Integral                     X::difference_type         Difference type
size_type                    X::M, X::N                 Dimensions, for static-sized matrices
Notations
X       The type of a model of the Matrix concept
A, B    Objects of type X
stream  A matrix stream for file I/O

Expression semantics

Expression                      Description
X A(m, n);                      Normal constructor
X A(m, n, sub, super);          Constructor for banded matrices
X A(B);                         Copy constructor
Expression                            Description
X A(data, m, n);                      Constructor for external matrices
X A(data, m, n, ld);                  Constructor for external matrices, with a different leading dimension (ld)
X A(data, m, n, ld, sub, super);      Constructor for external banded matrices
X A(data);                            Static-sized dimension constructor
X A(m, n, nnz, val, ptrs, inds);      Sparse external matrix constructor
X A(B, sub, super);                   Banded view constructor
X A(stream);                          Matrix market / Harwell-Boeing stream constructors
X A(stream, sub, super);              Matrix market / Harwell-Boeing stream banded constructors
A.begin(), A.end()                    Iterate over OneD slices (const and non-const)
A.rbegin(), A.rend()                  Reverse iteration over OneD slices (const and non-const)
A(i,j)                                Element access (const and non-const)
A[i]                                  OneD access (const and non-const)
A.sub_matrix(row_start, row_finish, col_start, col_finish)   Submatrix access
A.partition(split_rows, split_cols)   Partitioned matrix creation
A.subdivide(row_split, column_split)  Subdivide (partition into 4)
A.nrows()                             Number of rows
A.ncols()                             Number of columns
A.nnz()                               Number of non-zero elements (really the number of stored elements)
A.sub()                               Sub part of the bandwidth
A.super()                             Super part of the bandwidth
A.is_unit()                           Whether the main diagonal is all ones, and hence not stored
A.is_upper()                          Whether the matrix has elements stored only in the upper triangle
A.is_lower()                          Whether the matrix has elements stored only in the lower triangle

Function specification

Prototype                                                             Description                                Complexity
X(size_type m, size_type n)                                           Normal constructor
X(size_type m, size_type n, difference_type sub,
  difference_type super)                                              Constructor for banded matrices
X(const X& x)                                                         Copy constructor
X(pointer data, size_type m, size_type n)                             Constructor for external matrices
X(pointer data, size_type m, size_type n, size_type ld)               Constructor for external matrices, with a different leading dimension (ld)
X(pointer data, size_type m, size_type n,
  difference_type sub, difference_type super)                         Constructor for external banded matrices
X(pointer data)                                                       Static-sized dimension constructor
X(size_type m, size_type n, size_type nnz, pointer val,
  size_type* ptrs, size_type* inds)                                   Sparse external matrix constructor
template <class Matrix> X(const Matrix& x,
  difference_type sub, difference_type super)                         Banded view constructor
X(matrix_market_stream& m_in)                                         Matrix market stream constructor
X(harwell_boeing_stream& m_in)                                        Harwell-Boeing stream constructor
X(matrix_market_stream& m_in, difference_type sub,
  difference_type super)                                              Matrix market stream banded constructor
X(harwell_boeing_stream& m_in, difference_type sub,
  difference_type super)                                              Harwell-Boeing stream banded constructor
iterator begin()                                                      Iterate over OneD slices                   constant time
const_iterator begin()                                                Const iterator begin                       constant time
iterator end()                                                        Iterator end                               constant time
const_iterator end()                                                  Const iterator end                         constant time
reverse_iterator rbegin()                                             Reverse iterator begin                     constant time
const_reverse_iterator rbegin()                                       Const reverse iterator begin               constant time
reverse_iterator rend()                                               Reverse iterator end                       constant time
const_reverse_iterator rend()                                         Const reverse iterator end                 constant time
reference operator()(size_type i, size_type j)                        Element access                             constant for dense, linear for sparse
const_reference operator()(size_type i, size_type j)                  Const element access                       constant for dense, linear for sparse
OneDRef operator[](size_type i)                                       OneD access                                constant
const OneDRef operator[](size_type i)                                 Const OneD access                          constant
submatrix_type sub_matrix(size_type row_start, size_type row_finish,
  size_type col_start, size_type col_finish)                          Submatrix access
partitioned partition(IndexList rows, IndexList columns)              Partitioned matrix creation
partitioned subdivide(size_type row_split, size_type column_split)    Subdivide (partition into 4)
size_type nrows()                                                     Number of rows                             constant time
size_type ncols()                                                     Number of columns                          constant time
size_type nnz()                                                       Number of non-zero elements (really the number of stored elements)   constant time
difference_type sub()                                                 Sub part of the bandwidth
difference_type super()                                               Super part of the bandwidth
bool is_unit()                                                        Whether the main diagonal is all ones, and hence not stored
bool is_upper()                                                       Whether the matrix has elements stored only in the upper triangle
bool is_lower()                                                       Whether the matrix has elements stored only in the lower triangle
Models
row_matrix
column_matrix
diagonal_matrix
triangle_matrix
symmetric_matrix
A.1.2 RowMatrix

Description  A row-oriented, or row-major, Matrix. The iterators for this matrix type traverse along the rows of the matrix, and operator[](n) returns a row of the matrix. Additionally, there is a Row typedef which refers to the OneD type of the matrix.

Associated types

Concept  Type    Description
Vector   X::Row  Row type, same as Matrix::OneD
Models
row matrix
A.1.3 ColumnMatrix

Description  A column-oriented, or column-major, Matrix. The iterators for this matrix type traverse along the columns of the matrix, and operator[](n) returns a column of the matrix. Additionally, there is a Column typedef which refers to the OneD type of the matrix.

Associated types

Concept  Type       Description
Vector   X::Column  Column type, same as Matrix::OneD
Models
column matrix
A.1.4 DiagonalMatrix

Description  A diagonal matrix is quite different from the normal MTL Matrix. Instead of the OneD parts of the matrix being rows or columns, the OneD parts are diagonals of the matrix. The example below shows a piece of code that uses the iterators of a diagonal matrix to fill the matrix with incrementing numbers; the matrix depicted after the code is the result. The iterators traverse along the diagonals of the matrix instead of the rows or columns.

int c = 0;
typename Matrix::iterator i;
typename Matrix::OneD::iterator j;
for (i = A.begin(); i != A.end(); ++i) {
  for (j = (*i).begin(); j != (*i).end(); ++j) {
    c = c + 1;
    *j = c;
  }
}

[ 4  8       ]
[ 1  5  9    ]
[    2  6 10 ]
[       3  7 ]

Associated types

Concept  Type         Description
Vector   X::Diagonal  Diagonal type, same as Matrix::OneD
Models
diagonal matrix
A.1.5 Vector

Description  Not to be confused with the std::vector class. The MTL Vector concept is a Container in which every element has a corresponding index. The elements do not have to be sorted by their index, the indices do not necessarily have to start at 0, and the indices do not have to form a contiguous range. The iterator type must be a model of IndexedIterator. Vector is not a refinement of RandomAccessContainer (even though Vector defines operator[]) because Vector does not guarantee amortized constant time for that operation (to allow for sparse vectors). Note also that the invariant a[n] == *advance(a.begin(), n) that applies to RandomAccessContainer does not apply to Vector, since a[n] is defined for Vector to return the element with index n. So a[n] == *i if and only if i.index() == n.

Associated types

Concept                Type               Description
IndexedIterator        X::iterator
const IndexedIterator  X::const_iterator
Tag                    X::dimension       Marks this as 1-D
Tag                    X::sparsity        dense_tag or sparse_tag
const Vector           X::scaled_type     The vector scaled by a constant
size_type              X::N               The static size, 0 if dynamic
Concept          Type               Description
const Container  X::IndexArray      An array containing the indices of the elements in the vector
Vector           X::subrange        The sub-vector type
const Vector     X::const_subrange  The const sub-vector type
Notations
X       The type of a model of Vector
a       An object of type X
splits  A container of integral objects

Expression semantics

Expression           Description
a[n]                 Element access (const and non-const)
a.partition(splits)  Partition vector
a.subdivide(split)   Subdivide vector (partition into 2)
a(s,f)               Subrange access (not yet implemented)
a.nnz()              Number of non-zero elements (actually the number of stored elements)

Function specification

Prototype                                Description           Complexity
reference operator[](size_type n)        Element access        linear time
const_reference operator[](size_type n)  Const element access  linear time
Prototype                                               Description                            Complexity
partitioned partition(const Container& splits)          Partition vector                       linear in number of splits
partitioned subdivide(size_type split)                  Subdivide vector (partition into 2)    constant
subrange operator()(size_type start, size_type finish)  Subrange access (not yet implemented)  linear
size_type nnz()                                         Number of non-zero elements (actually the number of stored elements)   constant time
IndexArray nz_struct()                                  The non-zero structure of the Vector (an array of indices corresponding to the stored elements)
Invariants
a.nnz()

A.2 Matrix type generators

A.2.1 matrix

Description  Matrices that occur in real engineering and scientific applications often have special structure, especially in terms of how many zeros are in the matrix and where the non-zeros are located. This means that space and time savings can be achieved by using various types of compressed storage. There are a multitude of matrix storage formats in use today, and the MTL tries to support many of the more common formats. The following discussion describes how the user of MTL can select the type of matrix he or she wishes to use.

To create an MTL matrix, one first needs to construct the appropriate matrix type. This is done using the matrix type generation class, which is easier to think of as a function. It takes as input the characteristics of the matrix type that you want and then returns the
appropriate MTL matrix. The matrix type generator "function" has defaults defined, so in order to create a normal rectangular matrix type, one merely does the following:

typedef matrix< double >::type MyMatrix;
MyMatrix A(M, N);
The matrix type generator can take up to four arguments: the element type, the matrix shape, the storage format, and the orientation. The following is the "prototype" for the matrix type generator.

matrix< EltType, Shape, Storage, Orientation >::type
This type of "generative" interface technique was developed by Krzysztof Czarnecki and Ulrich Eisenecker in their work on the Generative Matrix Computation Library. Storage can be made external by specifying such in the storage parameter, e.g. dense<external> or packed<external>.

Definition  matrix.h

Template Parameters

Parameter  Default  Description
EltType             Valid choices for this argument include double, complex<double>, and bool. In essence, any builtin or user-defined type can be used for the EltType; however, if one uses the matrix with a particular algorithm, the EltType must support the operations required by the algorithm. For MTL algorithms these typically include the usual numerical operators such as addition and multiplication. The std::complex class is a good example of what is required in a numerical type. The documentation for each algorithm includes the requirements on the element type.
Parameter    Default  Description
Shape                 This argument specifies the general positioning of the non-zero elements in the matrix, but does not specify the actual storage format. In addition it specifies certain properties such as symmetry. The choices for this argument include rectangle, banded, diagonal, triangle, and symmetric. Hermitian is not yet implemented.
Storage               This argument specifies the storage scheme used to lay out the matrix elements (and sometimes the element indices) in memory. The storage formats include dense, banded, packed, banded_view, compressed, envelope, and array.
Orientation           The storage order for an MTL matrix can be either row_major or column_major.
Members

Declaration  Description
type         The generated type
A.2.2 band_view

Description  A helper class for creating a banded view into an existing dense matrix.

Example  In banded_view_test.cc:

template <class Matrix>
void print_banded_views(Matrix& A)
{
  using namespace mtl;
  typename band_view<Matrix>::type B(A, 2, 1);
}

int main(int argc, char* argv[])
{
  using namespace mtl;
  const int M = atoi(argv[1]), N = atoi(argv[2]);
  typedef matrix<double>::type Matrix;
  Matrix A(M, N);
  print_banded_views(A);
  return 0;
}
Definition  matrix.h

Template Parameters

Parameter  Default  Description
Matrix              The type of the Matrix to be viewed, must be dense

Members

Declaration  Description
type         The generated type
A.2.3 block_view

Description

block_view<Matrix>::type bA = blocked(A, 16, 16);

or

block_view<Matrix, 16, 16>::type bA = blocked(A);

Note: currently not supported for egcs (internal compiler error).

Example  In blocked_matrix.cc:

const int M = 4;
const int N = 4;
typedef matrix<double>::type Matrix;
Matrix A(M, N);
for (int i = 0; i < M; ++i)
  for (int j = 0; j < N; ++j)
    A(i, j) = i * N + j;
print_all_matrix(A);
block_view<Matrix>::type bA = blocked(A, blk());
print_partitioned_matrix(bA);
block_view<Matrix>::type cA = blocked(A, 2, 2);
print_partitioned_by_column(cA);
Definition  matrix.h

Template Parameters

Parameter  Default             Description
Matrix                         The type of the Matrix to be blocked, must be dense
BM         0 for dynamic size  The blocking factor for the rows (M dimension)
BN         0 for dynamic size  The blocking factor for the columns (N dimension)

Members

Declaration  Description
type         The generated type
A.2.4 symmetric_view

Description  A helper class for creating a symmetric view into an existing dense or sparse matrix. For sparse matrices, the matrix must already have elements in the appropriate lower/upper portion of the matrix. The view just provides the proper symmetric matrix interface.

Definition  matrix.h

Template Parameters

Parameter  Default  Description
Matrix              The type of the Matrix to be viewed
Uplo                Whether to view the upper or lower triangle of the matrix

Members

Declaration  Description
type         The generated type
A.2.5 triangle_view

Description  A helper class for creating a triangle view into an existing dense or sparse matrix. For sparse matrices, the matrix must already have elements in the appropriate triangular portion of the matrix. The view just provides the proper triangular matrix interface.

Example  In banded_view_test.cc:

template <class Matrix>
void print_banded_views(Matrix& A)
{
  using namespace mtl;
  typename band_view<Matrix>::type B(A, 2, 1);
}

int main(int argc, char* argv[])
{
  using namespace mtl;
  const int M = atoi(argv[1]), N = atoi(argv[2]);
  typedef matrix<double>::type Matrix;
  Matrix A(M, N);
  print_banded_views(A);
  return 0;
}

Definition  matrix.h

Template Parameters

Parameter  Default  Description
Matrix              The type of the Matrix to be viewed
Uplo                Whether to view the upper or lower triangle of the matrix

Members

Declaration  Description
type         The generated type
A.3 Container type selectors

A.3.1 rectangle

Description  An MTL rectangular matrix is one in which elements can appear in any position in the matrix, i.e., there can be any element A(i,j) where 0 <= i < M and 0 <= j < N. For example:

typedef matrix< double, rectangle,
                array< sparse_pair >,
                row_major >::type SparseArrayMat;

Definition  matrix.h

Template Parameters

Parameter  Default     Description
MM         not static  The number of rows of the matrix, if the matrix has static size (known at compile time)
NN         not static  The number of columns of the matrix, if the matrix has static size (known at compile time)

Members

Declaration
enum { M = MM, N = NN, id = RECT, uplo }
A.3.2 symmetric

Description  Symmetric matrices are similar to banded matrices in that there is only access to a particular band of the matrix. The difference is that in an MTL symmetric matrix, A(i,j) and A(j,i) refer to the same element. The following is an example of a symmetric matrix:

the full symmetric matrix
[ 1  2  3  4  5 ]
[ 2  6  7  8  9 ]
[ 3  7 10 11 12 ]
[ 4  8 11 13 14 ]
[ 5  9 12 14 15 ]

the symmetric matrix in packed storage
[ 1             ]
[ 2  6          ]
[ 3  7 10       ]
[ 4  8 11 13    ]
[ 5  9 12 14 15 ]

Similar to the triangle shape, the user must provide an Uplo argument which specifies which part of the matrix is actually stored. The valid choices are upper and lower for symmetric matrices.

typedef matrix < double, symmetric<lower>,
                 packed, row_major >::type SymmMatrix;
Example  In symm_packed_vec_prod.cc:

double da[16];
typedef matrix< double,
                symmetric<lower>,
                packed,
                column_major >::type Matrix;
const Matrix::size_type matrix_size = 5;
Matrix A(da, matrix_size, matrix_size);
typedef dense1D<double> Vec;
Vec y(matrix_size, 1), x(matrix_size), Ax(matrix_size);
double alpha = 1, beta = 0;
//     1  2  3  4  5        1        1000
//     2  6  7  8  9        2        2000
// A = 3  7 10 11 12    x = 3    y = 3000
//     4  8 11 13 14        4        4000
//     5  9 12 14 15        5        5000
// make A
for (int i = 0; i < 15; ++i)
  da[i] = i + 1;
// make x, y
for (int i = 0; i < matrix_size; ++i) {
  x[i] = i + 1;
  y[i] = (i + 1) * 1000;
}

The following examples create BLAS-style packed and banded matrix types:

typedef matrix < double, banded,
                 packed, column_major >::type BLAS_Packed;
typedef matrix < double, banded,
                 banded, column_major >::type BLAS_Banded;

Storage Type Selectors  banded is also the type selector for the banded storage format. This storage format is equivalent to the banded storage used in the BLAS and LAPACK. Similar to the dense storage format, a single contiguous chunk of memory is allocated. The banded storage format maps the bands of the matrix to a twod-array of dimension (sub + super + 1) by min(M, N + sub). In MTL the 2D array can be row- or column-major (for the BLAS it is always column-major). The twod-array is then in turn mapped to the linear memory space of the single chunk of memory. The following is an example banded matrix with the mapping to the row-major and column-major 2D arrays. The x's represent memory locations that are not used.

[ 1  2  3  0  0  0 ]
[ 4  5  6  7  0  0 ]
[ 0  8  9 10 11  0 ]
[ 0  0 12 13 14 15 ]
[ 0  0  0 16 17 18 ]
[ 0  0  0  0 19 20 ]

row-major
[  1  2  3  x ]
[  4  5  6  7 ]
[  8  9 10 11 ]
[ 12 13 14 15 ]
[  x 16 17 18 ]
[  x  x 19 20 ]

column-major
[ x  x  3  7 11 15 ]
[ x  2  6 10 14 18 ]
[ 1  5  9 13 17 20 ]
[ 4  8 12 16 19  x ]
Definition  matrix.h

Template Parameters

Parameter  Default   Description
External   internal  Specify whether the memory used is "owned" by the matrix or was provided to the matrix from some external source (with a pointer to some data)

Members

Declaration
size_type
enum { id = BAND, oned_id, uplo, ext = External, M = 0, N = 0, issparse = 0, index }
A.3.5 triangle

Description  The triangular shape is a special case of the banded shape. There are four kinds of triangular matrices in MTL, based on the Uplo argument:

Uplo type   Sub    Super
upper       0      N - 1
unit_upper  -1     N - 1
lower       M - 1  0
unit_lower  M - 1  -1

The following is an example of a triangle-shaped matrix:

[ 1  2  3  4  5 ]
[ 0  6  7  8  9 ]
[ 0  0 10 11 12 ]
[ 0  0  0 13 14 ]
[ 0  0  0  0 15 ]
The next example is of a unit lower triangle matrix. The main diagonal is not stored, since it consists of all ones. The MTL algorithms recognize when a matrix is "unit" and perform a slightly different operation to take this into account. The ones will not show up in an iteration over the matrix, and access to the A(i,i) element of a unit lower/upper matrix is an error.

[ 1  0  0  0  0 ]
[ 1  1  0  0  0 ]
[ 2  3  1  0  0 ]
[ 4  5  6  1  0 ]
[ 7  8  9 10  1 ]
Here are a couple of examples of creating triangular matrix types:

typedef matrix < double, triangle<upper>,
                 banded, column_major >::type UpperTriangle;
typedef matrix < double, triangle<unit_lower>,
                 packed, row_major >::type UnitLowerTriangle;

Definition  matrix.h

Template Parameters

Parameter  Default  Description
Uplo                The type of triangular matrix: either upper, lower, unit_upper, or unit_lower

Members

Declaration
enum { id = TRI, uplo = Uplo, M = 0, N = 0 }
A.3.6 diagonal

Description  The diagonal matrix shape is similar to the banded matrix in that there is a bandwidth that describes the area of the matrix in which non-zero elements can reside. The difference from the banded matrix shape lies in how the MTL iterators traverse the matrix, which is explained in DiagonalMatrix. The MTL storage types that can be used are banded, packed, banded_view, and array. To get the traditional tridiagonal matrix format, one just has to specify the bandwidth to be (1,1) and use the array< dense > storage format.

Definition  matrix.h

Template Parameters

Parameter  Default   Description
External   internal  Specify whether the memory used is "owned" by the matrix or was provided to the matrix from some external source (with a pointer to some data)

Members

Declaration
enum { uplo, id = DIAG, ext = External, M = 0, N = 0 }
A.3.7 array

Description  This storage type gives an "array of pointers" style implementation of a matrix. Each row or column of the matrix is allocated separately. The type of vector used for the rows or columns is very flexible, and one can choose from any of the OneD storage types, which include dense, compressed, sparse_pair, tree, and linked_list.

matrix < double, rectangle, array< dense >, row_major >::type

[ ] -> [  1  0  0  4  0 ]
[ ] -> [  0  7  8  0  0 ]
[ ] -> [ 11  0 13 14  0 ]
[ ] -> [ 16  0 18  0 20 ]
[ ] -> [  0 22  0 24  0 ]

matrix < double, rectangle, array< sparse_pair >, row_major >::type

[ ] -> [ (1,0) (4,3) ]
[ ] -> [ (7,1) (8,2) ]
[ ] -> [ (11,0) (13,2) (14,3) ]
[ ] -> [ (16,0) (18,2) (20,4) ]
[ ] -> [ (22,1) (24,3) ]

One advantage of this type of storage is that rows can be swapped in constant time. For instance, one could swap rows 3 and 4 of a matrix in the following way.
Matrix::OneD tmp = A[3];
A[3] = A[4];
A[4] = tmp;
The rows are individually reference counted so that the user does not have to worry about deallocating them.

Definition  matrix.h

Template Parameters

Parameter  Default   Description
OneD       dense     The storage type used for each row/column of the matrix
External   internal  Specify whether the memory used is "owned" by the matrix or was provided to the matrix from some external source (with a pointer to some data)

Model of  TwoDStorage

Members

Declaration
size_type
enum { id = ARRAY, oned_id = OneD::id, ext = External, issparse = OneD::issparse, index = index_from_zero }
A.3.8 dense

Description

TwoD Storage Type Selector  This is the most common way of storing matrices: one contiguous piece of memory that is divided up into rows or columns of equal length. The following example shows how a matrix can be mapped to linear memory in either a row-major or column-major fashion.

[ 1 2 3 ]
[ 4 5 6 ]
[ 7 8 9 ]

row major:    [ 1 2 3 4 5 6 7 8 9 ]
column major: [ 1 4 7 2 5 8 3 6 9 ]

OneD Storage Type Selector  This specifies a normal dense vector to be used as the OneD part of a matrix with array storage.

Example  In swap_rows.cc:

typedef matrix< double, rectangle,
                dense, column_major >::type Matrix;
const Matrix::size_type N = 3;
Matrix::size_type large;
double dA[] = { 1, 3, 2, 1.5, 2.5, 3.5, 4.5, 9.5, 5.5 };
Matrix A(dA, N, N);
// Find the largest element in column 1.
large = max_index(A[0]);
// Swap the first row with the row containing the largest
// element in column 1.
swap( rows(A)[0], rows(A)[large] );
More examples can be found in general_matvec_mult.cc.
Definition  matrix.h

Template Parameters

Parameter  Default   Description
External   internal  Specify whether the memory used is "owned" by the matrix or was provided to the matrix from some external source (with a pointer to some data)

Members

Declaration
size_type
enum { id = DENSE, oned_id, ext = External, issparse = 0, index }
A.3.9 compressed

Description

TwoD Storage Type Selector  This storage type is the traditional compressed row or compressed column format. The storage consists of three arrays: one array for all of the elements, one array consisting of the row or column indices (row for column-major and column for row-major matrices), and one array consisting of pointers to the start of each row/column. The following is an example sparse matrix in compressed row format, with the stored indices specified as index_from_one. Note that the MTL interface is still indexed from zero whether or not the underlying stored indices are from one.

[ 1  0  2  3  0 ]
[ 0  4  0  0  5 ]
[ 6  0  7  8  0 ]
[ 9  0  0 10  0 ]
[ 0  0 11  0 12 ]

row pointer array
[ 1 4 6 9 11 13 ]

element value array
[ 1 2 3 4 5 6 7 8 9 10 11 12 ]

element column index array
[ 1 3 4 2 5 1 3 4 1 4 3 5 ]

Of course, the user of the MTL sparse matrix does not need to concern him or herself with the implementation details of this storage format. The interface to an MTL compressed row matrix is the same as that of any MTL matrix, as described in Matrix.

OneD Storage Type Selector  This is a OneD type used to construct array matrices. The compressed OneD format uses two arrays: one to hold the elements of the vector, and the other to hold the index corresponding to each element (either its row or column number).

Example  In sparse_matrix.cc:

// [1,0,2]
// [0,3,0]
// [0,4,5]
const int m = 3, n = 3, nnz = 5;
double values[] = { 1, 2, 3, 4, 5 };
int indices[] = { 1, 3, 2, 2, 3 };
int row_ptr[] = { 1, 3, 4, 6 };

// Create from pre-existing arrays
typedef matrix< double, rectangle,
                compressed<int, external, index_from_one>,
                row_major >::type MatA;
MatA A(m, n, nnz, values, row_ptr, indices);

// Create from scratch
typedef matrix< double, rectangle,
                compressed<>, row_major >::type MatB;
MatB B(m, n);
B(0,0) = 1; B(0,2) = 2;
B(1,1) = 3;
B(2,1) = 4; B(2,2) = 5;
Definition
  matrix.h

Template Parameters
  Parameter    Default           Description
  SizeType     int               The type used in the index and pointer arrays
  External     internal          Specify whether the memory used is "owned" by the
                                 matrix or provided from some external source (via
                                 a pointer to existing data)
  IndexStyle   index_from_zero   Specify whether the underlying index array stores
                                 indices starting from one (Fortran style) or from
                                 zero (C style)

Members
  size_type
  enum { id = COMPRESSED, oned_id, ext = External, issparse = 1, index = IndexStyle }
A.3.10 packed

Description
This storage type is equivalent to the BLAS/LAPACK packed storage format. The
packed storage format is similar to the banded format, except that the storage
for each row/column of the band is variable, so there is no wasted space. This
makes it well suited to storing triangular matrices efficiently.

  [ 1  2  3  4  5 ]      [  1  2  3  4  5 ]
  [ 0  6  7  8  9 ]      [  6  7  8  9 ]
  [ 0  0 10 11 12 ]  ->  [ 10 11 12 ]
  [ 0  0  0 13 14 ]      [ 13 14 ]
  [ 0  0  0  0 15 ]      [ 15 ]

mapped to linear memory with row-major order:

  [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ]

Example
In tri_pack_vect.cc:

  // (template arguments reconstructed; the original text lost them)
  typedef matrix< double, triangle<lower>, packed<>, column_major >::type Matrix;
  typedef dense1D<double> Vector;
  //     [ 1     ]       [ 3 ]
  // A = [ 2 4   ]   x = [ 2 ]
  //     [ 3 5 6 ]       [ 1 ]
  const Matrix::size_type N = 3;
  double dA[] = { 1, 2, 3, 4, 5, 6 };
  Matrix A(dA, N, N);
  Vector x(N), Ax(N);
  for (Matrix::size_type i = 0; i < N; ++i)
    x[i] = 3 - i;
  mult(A, x, Ax);
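The row-major packed mapping above follows a simple closed form: rows 0 through i-1 of an n-by-n upper-triangular matrix contribute i*n - i*(i-1)/2 stored elements, so element (i, j) of the triangle lands at offset i*n - i*(i-1)/2 + (j - i). A small sketch (a hypothetical helper, not part of the MTL interface):

```cpp
#include <cassert>
#include <cstddef>

// Offset of element (i, j), with j >= i, in the row-major packed storage
// of an n-by-n upper-triangular matrix: rows above row i hold n, n-1, ...
// elements, summing to i*n - i*(i-1)/2.
std::size_t packed_offset(std::size_t i, std::size_t j, std::size_t n) {
    return i * n - i * (i - 1) / 2 + (j - i);
}
```

For the 5-by-5 example above, `packed_offset(1, 1, 5)` is 5, which indeed holds the diagonal value 6 in the linear array.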
Template Parameters
  Parameter   Default    Description
  External    internal   Specify whether the memory used is "owned" by the matrix
                         or provided from some external source (via a pointer to
                         existing data)

Members
  size_type
  enum { id = PACKED, oned_id, ext = External, issparse = 0, index }
A.3.11 banded_view

Description
This storage type is used for creating matrices that are "views" into existing
full dense matrices. For instance, one could create a triangular view of a
full matrix.

Definition
  matrix.h

Template Parameters
  Parameter   Default    Description
  External    internal   Specify whether the memory used is "owned" by the matrix
                         or provided from some external source (via a pointer to
                         existing data)

Members
  size_type
  enum { id = BAND_VIEW, oned_id, ext = External, issparse = 0, index }
A.3.12 envelope

Description
This storage scheme is for sparse symmetric matrices where most of the
non-zero elements fall near the main diagonal. The format is useful in certain
factorizations, since the fill-in falls in areas that are already allocated.
This scheme differs from most sparse formats in that the row containers are
actually dense, similar to a banded matrix: each row is stored densely from
its first non-zero up to the diagonal.

  [  1  2  4       ]
  [  2  3        8 ]
  [  4  0  5  6    ]
  [        6  7  9 ]
  [     8  0  9 10 ]

  Diagonals pointer array  [ 0 2 5 7 11 ]
  Element values array     [ 1 2 3 4 0 5 6 7 8 0 9 10 ]

Each entry of the diagonals pointer array gives the position of that row's
diagonal element in the values array.

Definition
  matrix.h

Template Parameters
  Parameter   Default    Description
  External    internal   Specify whether the memory used is "owned" by the matrix
                         or provided from some external source (via a pointer to
                         existing data)

Members
  size_type
  enum { id = ENVELOPE, oned_id, ext = External, issparse = 0, index }
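Given the diagonals pointer array, an element lookup is direct arithmetic: row i occupies the values between the previous row's diagonal and its own, and its first stored column follows from that length. The sketch below (a hypothetical helper, not MTL's implementation) illustrates the mapping with the example arrays above:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Look up element (i, j) of a symmetric envelope-stored matrix.
// dptr[i] is the position of row i's diagonal in the values array;
// row i is stored densely from its first non-zero to the diagonal.
double envelope_at(int i, int j,
                   const std::vector<double>& values,
                   const std::vector<int>& dptr) {
    if (j > i) std::swap(i, j);             // symmetric: use the lower triangle
    int row_start = (i > 0 ? dptr[i - 1] + 1 : 0);
    int first_col = i - (dptr[i] - row_start);
    if (j < first_col) return 0.0;          // outside the envelope
    return values[dptr[i] - (i - j)];       // dense offset back from the diagonal
}
```

With the example arrays, `envelope_at(2, 0, ...)` returns 4, and `envelope_at(3, 0, ...)` returns 0 because column 0 lies outside row 3's envelope.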
A.3.13 linked_list

Description
This is a OneD type for constructing array matrices. The implementation is a
std::list of index-value pairs.

Definition
  matrix.h

Members
  size_type
  enum { id = LINKED_LIST, issparse = 1, index = index_from_zero }
A.3.14 sparse_pair

Description
This is a OneD type for constructing array matrices. The implementation is a
std::vector of index-value pairs.

Definition
  matrix.h

Members
  size_type
  enum { id = SPARSE_PAIR, issparse = 1, index = index_from_zero }
A.3.15 tree

Description
This is a OneD type for constructing array matrices. The implementation is a
std::set of index-value pairs.

Definition
  matrix.h

Members
  size_type
  enum { id = TREE, issparse = 1, index = index_from_zero }
A.4 Container classes

A.4.1 dense1D

Description
This is the primary class to use as a Vector. This class uses the STL vector
for its implementation; the dense1D class serves as a handle to that vector.
The MTL algorithms assume that the vector and matrix arguments are handles,
and will not work with the STL-style containers directly. For
interoperability, one can create a dense1D from pre-existing memory. In this
case, the MTL reference counting does not delete the memory.

Definition
  dense1D.h

Template Parameters
  Parameter   Default   Description
  RepType               the underlying representation

Model of
  Vector

Members (the concept where each is defined, where recoverable, in parentheses)
  enum { N = NN }
  dimension
  sparsity                 The sparsity tag
  scaled_type              The scaled type of this container (Scalable)
  value_type               The value type (Container)
  reference                The reference type (Container)
  const_reference          The const reference type (Container)
  pointer                  The pointer type (Container)
  size_type                The type for dimensions and indices (Container)
  difference_type          The type for differences between iterators (Container)
  iterator                 The iterator type (Container)
  const_iterator           The const iterator type (Container)
  reverse_iterator         The reverse iterator type (Reversible Container)
  const_reverse_iterator   The const reverse iterator type (Reversible Container)
  subrange                 The subrange vector type
  IndexArray               The type for the index array
  IndexArrayRef            The reference type for the index array
  dense1D()                                Default Constructor (Container)
  dense1D(int n)                           Non-initializing constructor, not very standard :( (Sequence)
  dense1D(int n, const value_type& init)   Initializing constructor (Sequence)
  dense1D(const self& x)                   Copy constructor (shallow copy) (ContainerRef)
  ~dense1D()                               The destructor (Container)
  self& operator=(const self& x)           Assignment operator (shallow copy) (AssignableRef)
  iterator begin()                      Return an iterator pointing to the beginning of the vector (Container)
  iterator end()                        Return an iterator pointing past the end of the vector (Container)
  const_iterator begin() const          Return a const iterator pointing to the beginning of the vector (Container)
  const_iterator end() const            Return a const iterator pointing past the end of the vector (Container)
  reverse_iterator rbegin()             Return a reverse iterator pointing to the last element of the vector (Reversible Container)
  reverse_iterator rend()               Return a reverse iterator pointing past the end of the vector (Reversible Container)
  const_reverse_iterator rbegin() const Return a const reverse iterator pointing to the last element of the vector (Reversible Container)
  const_reverse_iterator rend() const   Return a const reverse iterator pointing past the end of the vector (Reversible Container)
  reference operator[](int i)               Return a reference to the element with index i (Random Access Container)
  const_reference operator[](int i) const   Return a const reference to the element with index i (Random Access Container)
  subrange operator()(size_type s, size_type f)
  size_type size() const                Return the size of the vector (Container)
  size_type nnz() const                 Return the number of non-zeroes
  void resize(size_type n)
  void resize(size_type n, const T& x)
  int capacity() const
  void reserve(int n)
  const value_type* data() const        Raw memory access
  pointer data()                        Raw memory access
  iterator insert(iterator position, const value_type& x = value_type())
  void insert(iterator position, size_type n, const value_type& x = value_type())
  IndexArrayRef nz_struct() const
A.4.2 compressed1D

Description
The compressed1D Vector is a sparse vector implemented with a pair of parallel
arrays. One array holds the element values, and the other holds the indices of
those elements. The elements are kept ordered by their index as they are
inserted into the compressed1D. compressed1D vectors can be used to build
matrices with the array storage format, and they can also be used on their
own.

  [ (1.2, 3), (4.6, 5), (1.0, 10), (3.7, 32) ]   A Sparse Vector

  [ 1.2, 4.6, 1.0, 3.7 ]   Value Array
  [ 3, 5, 10, 32 ]         Index Array

The compressed1D::iterator dereferences (*i) to the element value; the index
of that element is available through the i.index() function. One can also
access the array of indices through the nz_struct() method. One particularly
useful fact is that one can perform scatters and gathers of sparse elements by
using the mtl::copy(x,y) function with a sparse and a dense vector.

Example
In gather_scatter.cc:

  void do_gather_scatter(DenseVec& d, SparseVec& c) {
    using namespace mtl;
    c[2] = 0; c[5] = 0; c[7] = 0;
    copy(d, c);      // gather elements 2, 5, and 7 of d into c
    scale(c, 2.0);
    copy(c, d);      // scatter them back into d
  }

  typedef dense1D<double> denseVec;
  typedef compressed1D<double> compVec;
  denseVec d(9, 2);
  compVec c;
  do_gather_scatter(d, c);

More examples can be found in array2D.cc and sparse_copy.cc.

Definition
  compressed1D.h

Template Parameters
  Parameter    Default           Description
  T                              the element type
  SizeType     int               the type for the stored indices
  IND_OFFSET   index_from_zero   to handle indexing from 0 or 1
Model of
  Vector

Members
  enum { N = 0 }
  sparsity                 This is a sparse vector
  dimension                This is a 1D container
  scaled_type              Scaled type of this vector
  value_type               Element type
  pointer                  A pointer to the element type
  size_type                Unsigned integral type for dimensions and indices
  difference_type          Integral type for differences in iterators
  reference                Reference to the value type
  const_reference          Const reference to the value type
  iterator                 Iterator type
  const_iterator           Const iterator type
  reverse_iterator         Reverse iterator type
  const_reverse_iterator   The const reverse iterator type
  IndexArrayRef            Reference to the index array
  IndexArray               The type for the index array
  subrange                 The type for the subrange vector
  class insert_iterator
  insert_iterator inserter()
  compressed1D()                     Default Constructor
  compressed1D(size_type n)          Length N Constructor
  compressed1D(const self& x)        Copy Constructor
  template <...> compressed1D(const IndexArray& x)   Index Array Constructor
  compressed1D(const light1D& x)
  self& operator=(const self& x)     Assignment Operator
  iterator begin()                      Return an iterator pointing to the beginning of the vector (Container)
  iterator end()                        Return an iterator pointing past the end of the vector (Container)
  const_iterator begin() const          Return a const iterator pointing to the beginning of the vector (Container)
  const_iterator end() const            Return a const iterator pointing past the end of the vector (Container)
  reverse_iterator rbegin()             Return a reverse iterator pointing to the last element of the vector (Reversible Container)
  reverse_iterator rend()               Return a reverse iterator pointing past the end of the vector (Reversible Container)
  const_reverse_iterator rbegin() const Return a const reverse iterator pointing to the last element of the vector (Reversible Container)
  const_reverse_iterator rend() const   Return a const reverse iterator pointing past the end of the vector (Reversible Container)
  reference operator[](size_type i)               Access the element with index i
  const_reference operator[](size_type i) const   Access the element with index i
  iterator insert(size_type i, const T& val)      Insert val into the vector at index i
  void clear()                       Erase the vector
  size_type size() const             The size of the vector
  size_type nnz() const              The number of non-zero elements (the number stored)
  void resize(size_type n)           Resize the vector to size n
  void reserve(size_type n)          Reserve storage for n elements
  IndexArrayRef nz_struct() const    Returns the array of indices
  IndexArrayRef nz_struct()          Returns the array of indices
A.4.3 external_vec

Description
This is similar to dense1D, except that the memory is provided by the user.
This allows for interoperability with other array packages and even with
Fortran.

Example
In dot_prod.cc:

  const int N = 3;
  double dx[] = { 1, 2, 3 };
  double dy[] = { 3, 0, -1 };
  typedef external_vec<double> Vec;
  Vec x(dx, N), y(dy, N);
  print_vector(x);
  print_vector(y);
  double dotprd = dot(x, y);
  if (dotprd == 0)
    cout << ...;

array2D

Example

  typedef matrix< ..., row_major >::type MatC;
  MatA A(M, N);
  MatB B(M, N);
  MatC C(M, N);
  // Fill A ...
  mtl::copy(A, B);
  MatB::Row tmp = B[2];
  B[2] = B[3];
  B[3] = tmp;
  mtl::copy(B, C);
Definition
  array2D.h

Template Parameters
  Parameter   Default   Description
  OneD                  the one-dimensional container the array is composed of

Model of
  TwoDStorage

Members
  template <...> struct partitioned
  transpose_type
  submatrix_type
  banded_view_type
  enum { M = 0, N = 0 }
  OneD                     The 1D container type
  OneDRef
  ConstOneDRef
  storage_loc
  sparsity
  strideability
  value_type
  reference                A reference to the value type
  const_reference          A const reference to the value type
  size_type                The integral type for dimensions and indices
  iterator                 The iterator type
  const_iterator           The const iterator type
  reverse_iterator         The reverse iterator type
  const_reverse_iterator   The const reverse iterator type
  dim_type                 A pair type for the dimension
  band_type                A pair type for the bandwidth
  array2D()                                                        Default Constructor
  array2D(dim_type d, size_type start_index = 0)                   Normal Constructor
  array2D(dim_type d, band_type band, size_type start_index = 0)   Banded Constructor
  template <...> array2D(const TwoD& x, band_type)                 Sparse banded view constructor
  template <...> array2D(MatrixStream& s, Orien)                   Matrix Stream Constructor
  template <...> array2D(MatrixStream& s, Orien, band_type bw)     Banded Matrix Stream Constructor
  array2D(const self& x)                Copy Constructor (shallow)
  iterator begin()                      Return an iterator pointing to the first 1D container
  iterator end()                        Return an iterator pointing past the end of the 2D container
  const_iterator begin() const          Return a const iterator pointing to the first 1D container
  const_iterator end() const            Return a const iterator pointing past the end of the 2D container
  reverse_iterator rbegin()             Return a reverse iterator pointing to the last 1D container
  reverse_iterator rend()               Return a reverse iterator pointing past the start of the 2D container
  const_reverse_iterator rbegin() const Return a const reverse iterator pointing to the last 1D container
  const_reverse_iterator rend() const   Return a const reverse iterator pointing past the start of the 2D container
  OneD reference operator()(size_type i, size_type j)               Return a reference to the (i,j) element, where (i,j) is in the 2D coordinate system
  OneD const_reference operator()(size_type i, size_type j) const   Return a const reference to the (i,j) element, where (i,j) is in the 2D coordinate system
  OneDRef operator[](size_type i)              Return a reference to the ith 1D container
  ConstOneDRef operator[](size_type i) const   Return a const reference to the ith 1D container
  size_type major() const               The dimension of the 2D container
  size_type minor() const               The dimension of the 1D containers
  size_type nnz() const                 The number of non-zeros
  void print() const
  size_type first_index() const
  template <...> void fast_copy(const Matrix& x)   A faster specialization for copying
A.5 Container adaptors

A.5.1 linalg_vec

Description
This captures the main functionality of a dense MTL vector. The dense1D and
external1D classes derive from this class, and specialize it to use either
internal or external storage.

Definition
  linalg_vector.h

Template Parameters
  Parameter   Default   Description
  RepType               the underlying representation
Model of
  Linalg Vector

Members
  self
  rep_type
  rep_ptr
  enum { N = NN }
  dimension
  sparsity                 The sparsity tag
  scaled_type              The scaled type of this container
  value_type               The value type
  reference                The reference type
  const_reference          The const reference type
  pointer                  The pointer (to the value type) type
  size_type                The size type (non-negative)
  difference_type          The difference type (an integral type)
  iterator                 The iterator type
  const_iterator           The const iterator type
  reverse_iterator         The reverse iterator type
  const_reverse_iterator   The const reverse iterator type
  Vec
  IndexArray
  IndexArrayRef            The type for an array of the indices of the elements in the vector
  subrange                 The type for a subrange vector-view of the original vector
  linalg_vec()                                   Default Constructor (allocates the container) (Container)
  linalg_vec(rep_ptr x, size_type start_index)   Normal Constructor
  linalg_vec(const self& x)                      Copy Constructor (shallow copy) (ContainerRef)
  ~linalg_vec()                                  The destructor (Container)
  self& operator=(const self& x)                 Assignment Operator (shallow copy) (AssignableRef)
  iterator begin()                      Return an iterator pointing to the beginning of the vector (Container)
  iterator end()                        Return an iterator pointing past the end of the vector (Container)
  const_iterator begin() const          Return a const iterator pointing to the beginning of the vector (Container)
  const_iterator end() const            Return a const iterator pointing past the end of the vector (Container)
  reverse_iterator rbegin()             Return a reverse iterator pointing to the last element of the vector (Reversible Container)
  reverse_iterator rend()               Return a reverse iterator pointing past the end of the vector (Reversible Container)
  const_reverse_iterator rbegin() const Return a const reverse iterator pointing to the last element of the vector (Reversible Container)
  const_reverse_iterator rend() const   Return a const reverse iterator pointing past the end of the vector (Reversible Container)
  reference operator[](size_type i)               Return a reference to the element with the ith index (Vector)
  const_reference operator[](size_type i) const   Return a const reference to the element with the ith index (Vector)
  size_type size() const                The size of the vector (Container)
  size_type nnz() const                 The number of non-zeroes in the vector
  void resize(size_type n)              Resize the vector to n (Container)
  void resize(size_type n, const value_type& x)   Resize the vector to n, and assign x to the new positions
  size_type capacity() const            Return the total capacity of the vector
  void reserve(size_type n)             Reserve more space in the vector
  const value_type* data() const        Raw memory access
  value_type* data()                    Raw memory access
  iterator insert(iterator position, const value_type& x = value_type())   Insert x at the indicated position in the vector (Container)
  IndexArrayRef nz_struct() const
A.5.2 scaled1D

Description
This class is not meant to be used directly. Instead, it is created
automatically when the scaled(x,alpha) function is invoked. See the
documentation for "Shortcut for Creating a Scaled Vector". This vector type is
READ ONLY, so only const versions of the members are provided; i.e., there is
no iterator typedef, just const_iterator.

Definition
  scaled1D.h

Template Parameters
  Parameter                  Default   Description
  RandomAccessContainerRef             The type of the underlying container

Model of
  RandomAccessContainerRef
Members
  enum { N = RandomAccessContainerRef::N }   Static size, 0 if dynamic size
  value_type        The value type
  size_type         The unsigned integral type for dimensions and indices
  dimension         The dimension, should be 1D
  iterator          The iterator type (do not use this)
  const_iterator    The const iterator type
  const_reverse_iterator   The const reverse iterator type
  pointer           The pointer to the value type
  reference         The reference type
  const_reference   The const reference type
  difference_type   The difference type
  scaled_type       The scaled type
  sparsity          The sparsity tag (dense_tag or sparse_tag)
  subrange
  IndexArray        The type for the index array
  IndexArrayRef     The reference type to the index array
  scaled1D()                                                  Default constructor
  scaled1D(const Vector& r, value_type scale_)                Normal constructor
  scaled1D(const Vector& r, value_type scale_, do_scaled s)
  scaled1D(const self& x)          Copy constructor
  self& operator=(const self& x)   Assignment operator
  ~scaled1D()                      Destructor
  operator Vector&()               Access the base container
  begin() const, end() const, rbegin() const, rend() const   Const iterator access to the elements, as in Container and Reversible Container
  const_reference operator[](int i) const   Return a const reference to the element with index i (Container)
  size_type size() const    Return the size of the vector (Container)
  size_type nnz() const     Return the number of non-zeroes (Vector)
  void adjust_index(size_type i)
A.5.3 sparse1D

Description
This is a sparse vector implementation that can use several different
underlying containers, including std::vector, std::list, and std::set. This
adaptor is used in the implementation of the linked_list, tree, and
sparse_pair OneD storage types (used with the array matrix storage type). It
can also be used as a stand-alone Vector. The value type of the underlying
container must be entry1, which is just an index-value pair. The elements are
kept ordered by their index as they are inserted.

Example
In gather_scatter.cc:

  void do_gather_scatter(DenseVec& d, SparseVec& c) {
    using namespace mtl;
    c[2] = 0; c[5] = 0; c[7] = 0;
    copy(d, c);
    scale(c, 2.0);
    copy(c, d);
  }

  typedef dense1D<double> denseVec;
  typedef compressed1D<double> compVec;
  denseVec d(9, 2);
  compVec c;
  do_gather_scatter(d, c);

Definition
  sparse1D.h

Template Parameters
  Parameter   Default   Description
  RepType               The container type used to store the index-value pairs

Model of
  ContainerRef?

Type requirements
  The value type of RepType must be of type entry1
Members
  enum { N = 0 }
  sparsity          This is a sparse vector
  entry_type        The index-value pair type
  PR                The value type
  dimension         This is a 1D container
  scaled_type       The scaled type
  value_type        The value type
  pointer           The type for pointers to the value type
  size_type         The unsigned integral type for dimensions and indices
  difference_type   The type for differences between iterators
  reference         The type for references to the value type
  const_reference   The type for const references to the value type
  iterator          The iterator type
  const_iterator    The const iterator type
  reverse_iterator  The reverse iterator type
  const_reverse_iterator   The const reverse iterator type
  IndexArray        The type for the index array
  IndexArrayRef     The reference type for the index array
  subrange          The type for subrange vectors
  sparse1D()                 Default Constructor
  sparse1D(int n)            Length N Constructor
  sparse1D(const self& x)    Copy Constructor
  template <...> sparse1D(const IndexArray& x)   Construct from index array
  self& operator=(const self& x)   Assignment Operator
  begin(), end(), rbegin(), rend() (const and non-const)   Iterator access to the elements, as in Container and Reversible Container
  const_reference operator[](int i) const   Element access: return the element with index i
  reference operator[](int i)               Element access: return the element with index i
  iterator insert(int i, const PR& value)   Insert the value at index i of the vector
  int size() const           Returns the size of the vector (Container)
  int nnz() const            Number of non-zero (stored) elements
  template <...> void resize_imp(int n, R*)
  void resize(int n)         Resizes the vector to size n
  rep_type& get_rep()
  void print() const
  IndexArrayRef nz_struct() const   Return an array of indices describing the non-zero structure
A.5.4 strided1D

Description
This class is not meant to be used directly. Instead, it is created
automatically when the strided(x,inc) function is invoked. See the
documentation for "Shortcut for Creating a Strided Vector".

Definition
  strided1D.h

Template Parameters
  Parameter                  Default   Description
  RandomAccessContainerRef             base container type

Model of
  RandomAccessContainerRef
Members
  enum { N = RandomAccessContainerRef::N }
  value_type        The value type (Container)
  reference         The type for references to the value type (Container)
  const_reference   The type for const references to the value type (Container)
  iterator          The iterator type (Container)
  const_iterator    The const iterator type (Container)
  reverse_iterator  The reverse iterator type (Reversible Container)
  const_reverse_iterator   The const reverse iterator type (Reversible Container)
  scaled_type       The scaled vector type (Scalable)
  sparsity          Whether the vector is sparse or dense
  IndexArrayRef     The type for references to the index array
  IndexArray        The type for the index array
  dimension         This is a 1D container
  size_type         The unsigned integral type for dimensions and indices (Container)
  difference_type   The integral type for differences between iterators (Container)
  pointer           The type for pointers to the value type
  subrange          The subrange vector type
  strided1D(const Vector& r, int stride_)   Normal Constructor
  strided1D(const self& x)                  Copy Constructor
  operator Vector&()
  begin(), end(), rbegin(), rend() (const and non-const)   Iterator access to the elements, as in Container and Reversible Container
  reference operator[](int i)               Return a reference to the element with index i (Random Access Container)
  const_reference operator[](int i) const   Return a const reference to the element with index i (Random Access Container)
  int size() const          Return the size of the vector (Container)
  size_type nnz() const     Return the number of non-zeroes (Vector)
  void reindex(int i)
  subrange operator()(size_type s, size_type f)   Return a subrange vector containing the elements from index s to f
  void adjust_index(size_type i)
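The adaptor's mapping is simple: element i of the strided view is element i*stride of the base storage, which is how a column of a row-major matrix can be treated as a vector. A minimal sketch of such a view (hypothetical, not the MTL class):

```cpp
#include <cassert>

// A bare-bones strided view over existing storage: element i of the
// view maps to base[i * stride]. No ownership, just a window.
struct StridedView {
    double* base;   // first element of the view
    int stride;     // distance between consecutive logical elements
    int n;          // number of logical elements
    double& operator[](int i) { return base[i * stride]; }
    int size() const { return n; }
};
```

For a 2-by-3 row-major matrix stored in a flat array, a view with stride 3 over the first column walks entries 0 and 3 of the array.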
A.5.5 scaled2D

Description
This class is not meant to be used directly. Instead, use the scaled()
function to create a scaled matrix to pass into an MTL algorithm.

Members
  template <...> struct partitioned
  enum { M = TwoD::M, N = TwoD::N }
  size_type                The unsigned integral type for dimensions and indices
  value_type               The 1D container type
  reference                The type for references to value_type
  const_reference          The type for const references to value_type
  iterator                 The iterator type (not used)
  const_iterator           The const iterator type
  reverse_iterator         The reverse iterator type (not used)
  const_reverse_iterator   The const reverse iterator type
  sparsity                 Either sparse_tag or dense_tag
  strideability            Whether the underlying 2D container is strideable
  storage_loc              Either internal or external storage
  transpose_type           The transpose type
  scaled2D()                            Default Constructor
  scaled2D(const TwoD& x, const T& a)   Normal Constructor
  const_iterator begin() const          Return a const iterator pointing to the first 1D container
  const_iterator end() const            Return a const iterator pointing past the end of the 2D container
  const_reverse_iterator rbegin() const Return a const reverse iterator pointing to the last 1D container
  const_reverse_iterator rend() const   Return a const reverse iterator pointing past the start of the 2D container
  reference operator[](int i) const     Return a const reference to the ith 1D container
  T operator()(int i, int j) const      Return the (i,j) element, where (i,j) is in the 2D coordinate system
  int major() const         The dimension of the 2D container
  int minor() const         The dimension of the 1D containers
  size_type nnz() const     The number of non-zeros
A.5.6 block2D

Description
For use in blocked algorithms with rectangular dense matrices. The blocks all
have the same size (versus the variable sizes in a partitioned matrix). The
matrix objects for each block are not stored; they are generated on the fly as
they are requested, and they are lightweight objects on the stack, so no
overhead is incurred.

The blocking size must divide evenly into the original matrix size. One good
way to ensure this is to partition the original matrix into a main region that
divides evenly into the blocks, plus three edge regions that do not get
blocked. Use the block_view type constructor and the blocked function to
create matrices of this type.

Example
In blocked_matrix.cc:

  const int M = 4;
  const int N = 4;
  // (template arguments reconstructed; the original text lost them)
  typedef matrix< double, rectangle<>, dense<>, row_major >::type Matrix;
  Matrix A(M, N);
  for (int i = 0; i < M; ++i)
    for (int j = 0; j < N; ++j)
      A(i, j) = i * N + j;
  print_all_matrix(A);
  block_view<Matrix, 2, 2>::type bA = blocked(A, blk<2,2>());
  print_partitioned_matrix(bA);
  block_view<Matrix>::type cA = blocked(A, 2, 2);
  print_partitioned_by_column(cA);
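Because the block size divides the matrix evenly, locating an element within the blocked view is pure index arithmetic, which is why the blocks can be generated on the fly. A sketch of that mapping (a hypothetical helper, not the MTL interface):

```cpp
#include <cassert>
#include <utility>

// For a matrix blocked into BM x BN tiles that divide it evenly,
// element (i, j) lies in block (i / BM, j / BN) at local position
// (i % BM, j % BN). Returns {block coords, local coords}.
std::pair<std::pair<int, int>, std::pair<int, int>>
block_coords(int i, int j, int BM, int BN) {
    return {{i / BM, j / BN}, {i % BM, j % BN}};
}
```

In the 4-by-4 example with 2-by-2 blocks, element (3, 1) falls in block (1, 0) at local position (1, 1).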
Definition block2D.h Template Parameters Parameter Block
OffsetGen
Default
Description The submatrix block, a dense external matrix. The Offset generator.
APPENDIX A. CONTAINERS
182
Members
    Declaration                                      Description
    enum { M = 0, N = 0 }
    size_type
    difference_type                                  The type for differences between iterators
    sparsity                                         This is a dense 2D container
    storage_loc                                      This has external storage
    strideability                                    This is strideable
    template struct partitioned
    class block_vector                               The 1D container type
    value_type
    reference                                        A reference to the value type
    pointer                                          The type for pointers to the value type
    class iterator                                   The iterator type
    class const_iterator                             The const iterator type
    dyn_dim bdt
    block2D(TwoD& x, dyn_dim b)                      Constructor from underlying 2D container
    block2D(const block2D& x)                        Copy Constructor
    const block2D& operator=(const block2D& x)       Assignment operator
    block2D()                                        Default Constructor
    ~block2D()                                       Destructor
    iterator begin()                                 Return an iterator pointing to the first 1D container
    iterator end()                                   Return an iterator pointing past the end of the 2D container
    const_iterator begin() const                     Return a const iterator pointing to the first 1D container
    const_iterator end() const                       Return a const iterator pointing past the end of the 2D container
    block_vector operator[](size_type i)             Return a reference to the ith 1D container
    Declaration                                              Description
    Block operator()(size_type i, size_type j)               Return a reference to the (i,j) element, where (i,j) is in the 2D coordinate system
    const Block operator()(size_type i, size_type j) const   Return a const reference to the (i,j) element, where (i,j) is in the 2D coordinate system
    size_type ld() const                                     The leading dimension
A.6 Container functions

A.6.1 scaled
Prototype

    template <class Scalable, class T>
    Scalable::scaled_type scaled(const Scalable& A, const T& alpha);
Description This function can be used to scale arguments in MTL functions. For example, to perform the vector operation A[1] <- SCALE * A[0] + A[1]:

    typedef matrix::type Matrix;
    Matrix A(3,3);
    double SCALE = - A(2,1) / A(1,1);
    add(scaled(A[0], SCALE), A[1], A[1]);
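The idea behind scaled can be sketched in plain C++ as a lightweight view that multiplies elements by alpha only as they are read, so no temporary vector is ever built. The names below (scaled_view, add) are illustrative, not MTL's actual types:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A minimal "scaled view": wraps a container and multiplies each
// element by alpha on access. Nothing is copied or modified.
template <class Cont, class T>
class scaled_view {
public:
    scaled_view(const Cont& c, T alpha) : c_(c), alpha_(alpha) {}
    T operator[](std::size_t i) const { return alpha_ * c_[i]; }
    std::size_t size() const { return c_.size(); }
private:
    const Cont& c_;
    T alpha_;
};

template <class Cont, class T>
scaled_view<Cont, T> scaled(const Cont& c, T alpha) {
    return scaled_view<Cont, T>(c, alpha);
}

// y <- x + y, where x may be a plain container or a scaled view.
template <class X, class Y>
void add(const X& x, Y& y) {
    for (std::size_t i = 0; i < y.size(); ++i) y[i] += x[i];
}
```

Because the scaling happens inside the same loop that performs the addition, the compiler can fuse the two operations, which is the point of the adaptor.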
A.6.2 strided
Prototype

    template <class RandomAccessContainerRef, class Distance>
    strided1D strided(RandomAccessContainerRef& v, Distance stride);
Description The helper function for creating a strided vector adaptor. Definition strided1D.h Requirements on types
Distance must be compatible with RandomAccessContainerRef's Distance
Complexity compile time
A.6.3 rows
Prototype

    template <class Matrix>
    rows_type::type rows(const Matrix& A);
Description Returns a row-oriented view of matrix A: for the returned matrix R = rows(A), R[i] gives you the ith row and R.begin() gives you an iterator over the rows.
Definition matrix_implementation.h
Example In swap_rows.cc:

    typedef matrix< double, rectangle, dense, column_major>::type Matrix;
    const Matrix::size_type N = 3;
    Matrix::size_type large;
    double dA[] = { 1, 3, 2, 1.5, 2.5, 3.5, 4.5, 9.5, 5.5 };
    Matrix A(dA, N, N);
    // Find the largest element in column 1.
    large = max_index(A[0]);
    // Swap the first row with the row containing the largest
    // element in column 1.
    swap( rows(A)[0], rows(A)[large] );
A.6.4 columns
Prototype

    template <class Matrix>
    columns_type::type columns(const Matrix& A);

Description Returns a column-oriented view of matrix A: for the returned matrix R = columns(A), R[i] gives you the ith column and R.begin() gives you an iterator over the columns. See rows for an example.
Definition matrix_implementation.h
A.6.5 trans
Prototype

    template <class Matrix>
    Matrix::transpose_type trans(const Matrix& A);

Description Swap the orientation of a matrix (i.e., from row-major to column-major). In essence this transposes the matrix. This operation occurs at compile time.
Definition matrix_implementation.h
Example In trans_mult.cc:

    typedef matrix< double, rectangle, dense, row_major >::type EMatrix;
    typedef dense1D Vector;
    const EMatrix::size_type n = 5;
    Vector y(n,1), Ay(n);
    double da[n*n];
    EMatrix A(da,n,n);
    mult(trans(A),y,Ay);
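The essence of a compile-time transpose can be sketched as an index-swapping view, a simplification of MTL's orientation swap. The names (TransposeView, trans_view, Dense2x2) are illustrative, not MTL's:

```cpp
#include <cassert>
#include <cstddef>

// Toy row-major matrix type used only for this sketch.
struct Dense2x2 {
    double d[4];
    double& operator()(std::size_t i, std::size_t j) { return d[i * 2 + j]; }
};

// A transpose "view": the index swap happens in inlined calls,
// so there is no run-time data movement.
template <class Matrix>
struct TransposeView {
    Matrix& a;
    double& operator()(std::size_t i, std::size_t j) const { return a(j, i); }
};

template <class Matrix>
TransposeView<Matrix> trans_view(Matrix& a) { return TransposeView<Matrix>{a}; }
```

Algorithms written against operator() work unchanged on the view, and the compiler sees only swapped index arithmetic.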
A.6.6 blocked
Prototype

    template <class Matrix>
    block_view::type blocked(Matrix& A, int bm, int bn);

Description Create a blocked view of matrix A, with blocks of size bm by bn:

    block_view bA = blocked(A, 16, 16);
Note: currently not supported for egcs (internal compiler error).
Definition matrix.h
Example In blocked_matrix.cc:

    const int M = 4;
    const int N = 4;
    typedef matrix::type Matrix;
    Matrix A(M,N);
    for (int i = 0; i < M; ++i)
      for (int j = 0; j < N; ++j)
        A(i, j) = i * N + j;
    print_all_matrix(A);
    block_view::type bA = blocked(A, blk());
    print_partitioned_matrix(bA);
    block_view::type cA = blocked(A, 2, 2);
    print_partitioned_by_column(cA);
A.6.7 blocked
Prototype

    template <class Matrix>
    block_view::type blocked(Matrix& A, blk);

Description This version of the blocked matrix generator is for statically sized blocks.

    block_view bA = blocked(A);

Note: currently not supported for egcs (internal compiler error).
Definition matrix.h
Example In blocked_matrix.cc:

    const int M = 4;
    const int N = 4;
    typedef matrix::type Matrix;
    Matrix A(M,N);
    for (int i = 0; i < M; ++i)
      for (int j = 0; j < N; ++j)
        A(i, j) = i * N + j;
    print_all_matrix(A);
    block_view::type
    bA = blocked(A, blk());
    print_partitioned_matrix(bA);
    block_view::type cA = blocked(A, 2, 2);
    print_partitioned_by_column(cA);
A.7 Container tags

A.7.1 banded tag

A.7.2 column matrix traits
Members
    Column

A.7.3 column tag

A.7.4 dense tag

A.7.5 diagonal matrix traits
Members
    Diagonal
A.7.6 diagonal tag

A.7.7 external tag

A.7.8 hermitian tag

A.7.9 internal tag

A.7.10 linalg traits
Members
    Declaration       Description
    dimension         Whether the object is a 1D or 2D container
    value_type        The element type within the container
    sparsity          Either sparse or dense
    magnitude_type    The return type for abs(value_type)
A.7.11 matrix traits
Members
    Declaration       Description
    shape             The shape of the matrix, either rectangle_tag, banded_tag, diagonal_tag, triangle_tag, or symmetric_tag
    orientation       The orientation, either row_tag or column_tag
    sparsity          The sparsity, either dense_tag or sparse_tag
    transpose_type    Used by the trans helper function
    strided_type      Used by the rows and columns helper functions
    strideability     Whether the rows and columns functions can be used with this Matrix
    scaled_type       The Matrix type resulting from wrapping a scaled adaptor around this Matrix
    storage_loc       Whether the Matrix owns its data, either external_tag or internal_tag
    OneD              A OneD part of a Matrix; this could be a Row, a Column, or a Diagonal depending on the type of Matrix
    value_type        The element type of the matrix
    reference
    const_reference
    pointer
    size_type         A NonNegativeIntegral type
    difference_type
A.7.12 not strideable

A.7.13 oned tag

A.7.14 rectangle tag

A.7.15 row matrix traits
Members
    Row

A.7.16 row tag

A.7.17 sparse tag

A.7.18 strideable

A.7.19 symmetric tag

A.7.20 triangle tag

A.7.21 twod tag
Appendix B Iterators

B.1 Concepts

B.1.1 IndexedIterator
Description The iterator concept for iterators of Vectors. An IndexedIterator provides access to the indices, as well as the elements, of a Vector. For instance, i.row() gives the row index corresponding to the element *i.
Refinement of BidirectionalIterator
Notations
    X    A type that is a model of IndexedIterator
    i    Object of type X
    V    A type that is a model of Vector
    a    An object of type V
    n    An object of integral type
    r    A row in some Matrix
Expression semantics
    Expression    Description
    i.row()       Row index access
    i.column()    Column index access
    i.index()     Index access

Function specification
    Prototype              Description           Complexity
    size_type row()        Row index access      constant time
    size_type column()     Column index access   constant time
    size_type index()      Index access          constant time
Models
    dense_iterator
    sparse_iterator
    scale_iterator
    stride_iterator
    compressed_iter
B.2 Iterator functions

B.2.1 trans iter
Prototype

    template <class Iterator, class UnaryFunction>
    transform_iterator<Iterator, UnaryFunction> trans_iter(Iterator i, UnaryFunction op);
Description The helper function for creating a transform_iterator from an iterator and a unary function.
Definition transform_iterator.h
B.3 Iterator adaptors

B.3.1 dense iterator
Description An iterator for dense contiguous containers that keeps track of the index.
Definition dense_iterator.h
Template Parameters
    Parameter               Default    Description
    RandomAccessIterator               The base iterator
Model of RandomAccessIterator
Members
    Declaration          Description
    value_type           The value type
    iterator_category    This is a random access iterator
    difference_type      The type for differences between iterators
    pointer              The type for pointers to the value type
    reference                                        The type for references to the value type
    Distance
    RandomAccessIterator start
    int pos
    int start_index
    int index() const                                Return the index of the current element (defined in IndexedIterator)
    dense_iterator()                                 Default Constructor
    dense_iterator(RandomAccessIterator s, int i, int first_index = 0)    Constructor from underlying iterator
    dense_iterator(const self& x)                    Copy Constructor
    template <class SELF> dense_iterator(const SELF& x)    Copy Constructor from a compatible iterator
    self& operator=(const self& x)                   Assignment operator
    ~dense_iterator()                                Destructor
    RandomAccessIterator base() const                Access the underlying iterator
    operator RandomAccessIterator() const            Convert to the underlying iterator
    reference operator*() const                      Dereference operator
    pointer operator->() const                       Member access operator
    self& operator++()                               Pre-increment operator
    self operator++(int)                             Post-increment operator
    self& operator--()                               Pre-decrement operator
    self operator--(int)                             Post-decrement operator
    self operator+(Distance n) const                 Add iterator and distance n
    self& operator+=(Distance n)                     Add distance n to this iterator
    self operator-(Distance n) const                 Subtract iterator and distance n
    difference_type operator-(const self& x) const   Return the difference between two iterators
    self& operator-=(Distance n)                     Subtract distance n from this iterator
    Declaration                             Description
    bool operator!=(const self& x) const    Return whether this iterator is not equal to iterator x
    bool operator<(const self& x) const     Return whether this iterator is less than iterator x
    bool operator>(const self& x) const     Return whether this iterator is greater than iterator x
    bool operator==(const self& x) const    Return whether this iterator is equal to iterator x
    bool operator<=(const self& x) const    Return whether this iterator is less than or equal to iterator x
    bool operator>=(const self& x) const    Return whether this iterator is greater than or equal to iterator x
    reference operator[](Distance n) const  Equivalent to *(i + n)
B.3.2 scale iterator
Description The scale_iterator is an adaptor which multiplies the values of the underlying elements by some scalar as they are accessed (through the dereference operator). Scale iterators are somewhat different from most in that they are always considered to be constant iterators, whether or not the underlying elements are mutable. Typically users will not need to use scale_iterator directly; it is really just an implementation detail of the scaled1D container.
Definition scale_iterator.h
Template Parameters
    Parameter               Default    Description
    RandomAccessIterator               The underlying iterator
    T                                  The type of the scalar to multiply by
Model of RandomAccessIterator
Type requirements
T must be convertible to RandomAccessIterator's value type
RandomAccessIterator's value type must be a model of Ring
Members
    Declaration                                      Description                       Where Defined
    difference_type                                  The difference type
    value_type                                       The value type
    iterator_category                                The iterator category
    pointer                                          The pointer type
    Distance
    iterator_type
    reference                                        The reference type
    const_reference
    scale_iterator()                                 The default constructor           Trivial Iterator
    scale_iterator(const RandomAccessIterator& x)    Normal constructor                scale iterator
    scale_iterator(const RandomAccessIterator& x, const value_type& a)    Normal constructor    scale iterator
    scale_iterator(const self& x)                    Copy constructor                  Trivial Iterator
    int index() const                                MTL index method                  Indexible Iterator
    operator RandomAccessIterator()                  Convert to base iterator          scale iterator
    RandomAccessIterator base() const                Access base iterator              scale iterator
    value_type operator*() const                     Dereference (and scale)           Trivial Iterator
    self& operator++()                               Pre-increment                     Forward Iterator
    self operator++(int)                             Post-increment                    Forward Iterator
    self& operator--()                               Pre-decrement                     Bidirectional Iterator
    self operator--(int)                             Post-decrement                    Bidirectional Iterator
    self operator+(Distance n) const                 Iterator addition                 Random Access Iterator
    self& operator+=(Distance n)                     Advance a distance                Random Access Iterator
    self operator-(Distance n) const                 Subtract a distance               Random Access Iterator
    self& operator-=(Distance n)                     Retreat a distance                Random Access Iterator
    difference_type operator-(const self& x) const   Difference between two iterators
    value_type operator[](Distance n) const          Access at an offset
    bool operator==(const self& x) const             Equality                          Trivial Iterator
    bool operator!=(const self& x) const             Inequality                        Trivial Iterator
    bool operator<(const self& x) const              Less than                         Random Access Iterator
B.3.3 sparse iterator
Description This iterator is used to implement the sparse1D adaptor. The base iterator returns an entry1 (an index-value pair); this iterator makes it look like we are dealing only with the value on dereference, while the index() method returns the index.
Template Parameters
    Parameter    Default    Description
    Iterator                the underlying iterator type
    T                       the value type
Members
    Declaration                                      Description
    PR
    iterator_category                                The iterator category
    value_type                                       The value type
    difference_type                                  The type for differences between iterators
    reference                                        The type for references to value type
    pointer                                          The type for pointers to value type
    sparse_iterator()                                Default Constructor
    sparse_iterator(const sparse_iterator& x)        Copy Constructor
    sparse_iterator(const Iterator& iter, int p = 0)         Constructor from underlying iterator
    sparse_iterator(const Iterator& start, const Iterator& finish)    Constructor from a range of underlying iterators
    self& operator=(const self& x)                   Assignment Operator
    operator Iterator()                              Convert to the underlying iterator
    int index() const                                Return the index of the element pointed to by this iterator (defined in IndexedIterator)
    bool operator!=(const self& x) const             Return whether this iterator is not equal to iterator x
    bool operator<(const self& x) const              Return whether this iterator is less than iterator x
    reference operator*() const                      Dereference, return the element pointed to by this iterator
    self& operator++()                               Pre-increment operator
    self operator++(int)                             Post-increment operator
    self& operator--()                               Pre-decrement operator
    self& operator+=(int n)                          Add distance n to this iterator
    self& operator-=(int n)                          Subtract distance n from this iterator
    self operator+(int n) const                      Add this iterator and distance n
    self operator-(int n) const                      Subtract this iterator and distance n
    int operator-(const self& x) const               Return the difference between this iterator and iterator x
    Iterator base() const                            Access the underlying iterator
    Iterator iter
    int pos
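The index-value-pair trick can be sketched in plain C++: the base sequence stores (index, value) pairs, and the adaptor dereferences to the value alone while index() exposes the index. The names below are illustrative, not MTL's:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Sketch of the sparse1D idea: dereference yields the stored value,
// index() yields the position that value occupies in the full vector.
template <class BaseIter>
class sparse_iter {
public:
    explicit sparse_iter(BaseIter i) : i_(i) {}
    double operator*() const { return i_->second; }
    int index() const { return i_->first; }
    sparse_iter& operator++() { ++i_; return *this; }
    bool operator!=(const sparse_iter& x) const { return i_ != x.i_; }
private:
    BaseIter i_;
};
```

Algorithms written against IndexedIterator then work identically on dense and sparse vectors; only index() behaves differently.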
B.3.4 strided iterator
Description This iterator moves a constant stride for each increment or decrement operation. The strided iterator is used to implement row views of column-oriented matrices, or column views of row-oriented matrices.
Model of RandomAccessIterator
Members
    Declaration                                      Description
    difference_type                                  The type for the difference between two iterators
    value_type                                       The value type pointed to by this iterator
    reference                                        The type for references to the value type
    iterator_category                                The iterator category for this iterator
    pointer                                          The type for pointers to the value type
    Distance
    iterator_type                                    The underlying iterator type
    strided_iterator()                               Default Constructor
    strided_iterator(const RandomAccessIterator& x, int s)    Construct from the underlying iterator
    strided_iterator(const self& x)                  Copy Constructor
    self& operator=(const self& x)                   Assignment Operator
    int index() const                                Return the index of the element this iterator points to (defined in IndexedIterator)
    operator RandomAccessIterator() const            Convert to the underlying iterator
    RandomAccessIterator base() const                Access the underlying iterator
    Declaration                                      Description
    reference operator*() const                      Dereference, return the element currently pointed to
    self& operator++()                               Pre-increment operator
    self operator++(int)                             Post-increment operator
    self& operator--()                               Pre-decrement operator
    self operator--(int)                             Post-decrement operator
    self operator+(Distance n) const                 Add this iterator and n
    self& operator+=(Distance n)                     Add distance n to this iterator
    self operator-(Distance n) const                 Subtract this iterator and distance n
    self& operator-=(Distance n)                     Subtract distance n from this iterator
    self operator+(const self& x) const              Add this iterator and iterator x
    Distance operator-(const self& x) const          Return the distance between this iterator and iterator x
    reference operator[](Distance n) const           Return *(i + n)
    bool operator==(const self& x) const             Return whether this iterator is equal to iterator x
    bool operator!=(const self& x) const             Return whether this iterator is not equal to iterator x
    bool operator<(const self& x) const              Return whether this iterator is less than iterator x
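The mechanics are simple enough to sketch in a few lines of standard C++; the class name is illustrative, not MTL's. Each ++ advances the base iterator by a fixed stride, which turns a column-major array into a row view:

```cpp
#include <cassert>
#include <vector>

// Sketch of a strided iterator: stepping by `stride` through the
// underlying flat storage walks one row of a column-major matrix.
template <class Iter>
class strided_iter {
public:
    strided_iter(Iter i, int stride) : i_(i), s_(stride) {}
    double operator*() const { return *i_; }
    strided_iter& operator++() { i_ += s_; return *this; }
    bool operator!=(const strided_iter& x) const { return i_ != x.i_; }
private:
    Iter i_;
    int s_;
};
```

For a 3x3 column-major matrix the stride for a row view is 3 (the leading dimension), so row 0 is elements 0, 3, 6 of the flat storage.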
B.3.5 transform iterator
Description This iterator adaptor applies some function during the dereference.
Template Parameters
    Parameter        Default    Description
    Iterator                    The underlying iterator type
    UnaryFunction               A function that takes one argument of value_type
Members
    Declaration                                      Description
    value_type                                       The value type
    difference_type                                  The difference type
    iterator_category                                The iterator category
    pointer                                          The pointer type
    reference                                        The reference type
    transform_iterator(Iterator i, UnaryFunction op)          Normal Constructor
    transform_iterator(const transform_iterator& x)           Copy Constructor
    transform_iterator& operator=(const transform_iterator& x)    Assignment Operator
    value_type operator*()                           Dereference Operator (applies the function here)
Appendix C Algorithms

C.0.6 sum
Prototype

    template <class Vector>
    linalg_traits::value_type sum(const Vector& x);
Description The sum of all of the elements in the container. Definition mtl.h Requirements on types
The addition operator must be defined for Vector::value type.
Complexity linear
Example In vec_sum.cc:

    mtl::dense1D< double > x(10, 2.0);
    double s = vec::sum(x);

The related scale function multiplies each element of the container in place:

    mtl::dense1D< double > x(10, 2.0);
    vec::scale(x, 2.0);
    mtl::print_vector(x);
C.0.9 set diagonal
Prototype

    template <class Matrix, class T>
    void set_diagonal(Matrix& A, const T& alpha);
Description Set the value of the elements on the main diagonal of A to alpha. Definition mtl.h Requirements on types
T must be convertible to Matrix::value type.
Complexity O(min(m,n)) for dense matrices, O(nnz) for sparse matrices (except envelope, which is O(m))
Example In tri_pack_sol.cc:

    const int N = 4;
    Matrix A(N, N);
    set_diagonal(A, 1);
    //
    // A = 1.0
    //     2.0  1.0
    //     3.0  5.0  1.0
    //     4.0  6.0  7.0  1.0
    //
    // b = 8.0  25.0  79.0  167.0
    //
    A(1,0) = 2; A(2,1) = 5; A(3,2) = 7;
    A(2,0) = 3; A(3,1) = 6;
    A(3,0) = 4;
C.0.10 two norm
Prototype

    template <class Vector>
    linalg_traits::magnitude_type two_norm(const Vector& x);
Description The square root of the sum of the squares of the elements of the container. Definition mtl.h Requirements on types
Vector must have an associated magnitude type that is the type of the absolute value of Vector::value type.
There must be std::abs() defined for Vector::value type.
The addition must be defined for magnitude type.
sqrt() must be defined for magnitude type.
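The requirements above amount to saying that the norm reduces to an inner product of x with itself followed by a square root. A standard-C++ equivalent (not MTL's implementation) makes this concrete:

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// two_norm(x) == sqrt(sum of squares of the elements of x).
double two_norm_sketch(const std::vector<double>& x) {
    double ss = std::inner_product(x.begin(), x.end(), x.begin(), 0.0);
    return std::sqrt(ss);
}
```

For ten elements all equal to 2.0 this yields sqrt(40), matching the vec_two_norm.cc example below.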
Complexity O(n)
Example In vec_two_norm.cc:

    dense1D< double > x(10, 2.0);
    double s = two_norm(x);
    cout << s << endl;

F.0.7 transform
Prototype

    template <int N, class InIter1, class InIter2, class OutIter, class BinOp>
    OutIter transform(InIter1 first1, count<N>, InIter2 first2, OutIter result, BinOp binary_op);
Description Takes input from two iterators, applies a binary operator, and outputs the result into a third iterator. Definition fast.h
F.0.8 fill
Prototype

    template <int N, class OutputIterator, class T>
    OutputIterator fill(OutputIterator first, count<N>, const T& value);

Description Assign the value to the N elements starting at first. Definition fast.h
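The FAST technique behind these prototypes can be sketched with template recursion: the element count is a template parameter, so the "loop" is fully unrolled at compile time. The names below (static_fill) are illustrative, not the library's:

```cpp
#include <cassert>

// Compile-time unrolled fill: each recursion level assigns one element,
// and the specialization for 0 terminates the recursion.
template <int N>
struct static_fill {
    template <class Iter, class T>
    static void apply(Iter first, const T& value) {
        *first = value;
        static_fill<N - 1>::apply(first + 1, value);
    }
};

template <>
struct static_fill<0> {
    template <class Iter, class T>
    static void apply(Iter, const T&) {}  // base case: nothing left to fill
};
```

After inlining, static_fill<4>::apply(a, v) compiles to four straight-line assignments with no loop counter or branch, which is the point of fixing the size at compile time.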
APPENDIX F. FIXED ALGORITHM SIZE TEMPLATE (FAST) LIBRARY
F.0.9 swap ranges
Prototype

    template <int N, class ForwardIterator1, class ForwardIterator2>
    ForwardIterator2 swap_ranges(ForwardIterator1 first1, count<N>, ForwardIterator2 first2);

Description Swap N elements between first1 and first2. Definition fast.h
F.0.10 accumulate
Prototype

    template <int N, class InputIterator, class T>
    T accumulate(InputIterator first, count<N>, T init);

Description Sum the N elements starting at first, beginning with init. Definition fast.h
F.0.11 accumulate
Prototype

    template <int N, class InputIterator, class T, class BinaryOperation>
    T accumulate(InputIterator first, count<N>, T init, BinaryOperation binary_op);
Description Accumulate the result of the binary operator applied to the N elements of first and init. Definition fast.h
F.0.12 inner product
Prototype

    template <int N, class InIter1, class InIter2, class T, class BinOp1, class BinOp2>
    T inner_product(InIter1 first1, count<N>, InIter2 first2, T init, BinOp1 binary_op1, BinOp2 binary_op2);

Description A fixed size inner product. Definition fast.h
F.0.13 inner product
Prototype

    template <int N, class InIter1, class InIter2, class T>
    T inner_product(InIter1 first1, count<N>, InIter2 first2, T init);

Description A fixed size inner product using addition and multiplication operators. Definition fast.h
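The fixed-size inner product is the canonical example of the unrolling technique: recursion on the template parameter replaces the run-time loop. A sketch under illustrative names (inner_prod_static), not the library's own:

```cpp
#include <cassert>

// Compile-time unrolled inner product: each level contributes one
// x[i]*y[i] term; the <0> specialization returns the accumulated sum.
template <int N>
struct inner_prod_static {
    template <class It1, class It2, class T>
    static T apply(It1 x, It2 y, T init) {
        return inner_prod_static<N - 1>::apply(x + 1, y + 1, init + *x * *y);
    }
};

template <>
struct inner_prod_static<0> {
    template <class It1, class It2, class T>
    static T apply(It1, It2, T init) { return init; }
};
```

With N known at compile time the compiler emits N multiply-adds in a row, which is what lets these kernels compete with hand-unrolled Fortran.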
F.0.14 count
Description A class for representing numbers at compile time.
Members
    enum { N = NN }
Appendix G Basic Linear Algebra Instruction Set (BLAIS) Library

G.0.15 add
Description This adds vector x into vector y.
Example In blais_add.cc:

    template <class VecX, class VecY>
    inline void do_add(VecX& x, VecY& y)
    {
      for (int i = 0; i < N; ++i) {
        x[i] = i;
        y[i] = i + 1;
      }
      blais_vv::add(x.begin(), y.begin());
    }

    int ix[N], iy[N];
    external_vec x1(ix, N);
    external_vec y1(iy, N);
    do_add(x1, y1);

    dense1D x2(N);
    dense1D y2(N);
    do_add(x2, y2, N);
Template Parameters
    Parameter    Default    Description
    N                       the length of the vectors
Members
    template <class Vec1, class Vec2> add(Vec1 x, Vec2 y)
G.0.16 copy
Description Copies vector x into vector y.
Template Parameters
    Parameter    Default    Description
    N                       the length of the vectors
Members
    template <class Vec1, class Vec2> copy(Vec1 x, Vec2 y)
G.0.17 copy
Description Copies matrix A into matrix B.
Template Parameters
    Parameter    Default    Description
    M                       Number of rows in A
    N                       Number of columns in A
Members
    template <class MatrixA, class MatrixB> copy(MatrixA& A, MatrixB& B)
G.0.18 dot
Template Parameters
    Parameter    Default    Description
    N                       the length of the vectors
Members
    template <class Vec1, class Vec2, class T> dot(Vec1 x, Vec2 y, T& prod)
G.0.19 mult
Template Parameters
    Parameter    Default    Description
    M                       Number of rows in A
    N                       Number of columns in A
Members
    template <class Matrix, class VecX, class VecY> mult(const Matrix& A, VecX x, VecY y)
G.0.20 mult
Template Parameters
    Parameter    Default    Description
    M                       Number of rows in A and C
    N                       Number of columns in B and C
    K                       Number of columns in A and rows in B
Members
    template <class MatrixA, class MatrixB, class MatrixC> mult(MatrixA& A, MatrixB& B, MatrixC& C)
G.0.21 rank one
Description Perform a rank-one update (outer product) on the M x N statically sized matrix.
Template Parameters
    Parameter    Default    Description
    M                       Number of rows in A
    N                       Number of columns in A
Members
    template <class Matrix, class VecX, class VecY> rank_one(Matrix& A, VecX x, VecY y)
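The operation itself is A(i,j) += x[i] * y[j] over all i, j; with M and N as template parameters both loops can be unrolled. A sketch under illustrative names (rank_one_sketch, Mat2), not BLAIS's own code:

```cpp
#include <cassert>

// Toy 2x2 matrix type used only to demonstrate the update.
struct Mat2 {
    double d[4];
    double& operator()(int i, int j) { return d[i * 2 + j]; }
};

// Rank-one update A += x * y^T with static sizes M and N; fixing the
// bounds at compile time lets the compiler unroll both loops.
template <int M, int N, class Matrix, class VecX, class VecY>
void rank_one_sketch(Matrix& A, const VecX& x, const VecY& y) {
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            A(i, j) += x[i] * y[j];
}
```

For x = (1, 2) and y = (3, 4) starting from A = 0, the result is the outer product [[3, 4], [6, 8]].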
G.0.22 set
Description Set the elements of the statically sized vector x (length N) to alpha.
Template Parameters
    Parameter    Default    Description
    N                       static length of x
Members
    template <class Vector, class T> set(Vector x, const T& alpha)
G.0.23 set
Description Set the elements of the statically sized M x N matrix to alpha.
Members
    template <class Matrix, class T> set(Matrix A, const T& alpha)
Appendix H MTL to LAPACK Interface

H.0.24 lapack matrix
Description Use this matrix type constructor to create the type of matrix to use in conjunction with the mtl2lapack functions. The vector type you use with mtl2lapack functions must be contiguous in memory, have a function data() defined which returns a pointer to that memory, and a function size() which gives the length.
Example In getrf.cc:

    double da[] = { 1, 2, 2, 2, 1, 2, 2, 2, 1 };
    lapack_matrix::type A(da, M, N);
    lapack_matrix::type B(M*NRHS, NRHS);
    mtl::set(B, 15.0);
    dense1D pivot(N, 0);
    int info = getrf(A, pivot);
    if (info == 0) {
      info = getrs('N', A, pivot, B);
      if (info == 0) {
        cout ...

row_cond (OUT - Real number) If INFO = 0 or INFO > M, ROWCND contains the ratio of the smallest R(i) to the largest R(i). If ROWCND >= 0.1 and AMAX is neither too large nor too small, it is not worth scaling by R.
col cond (OUT - Real number) If INFO = 0, COLCND contains the ratio of the smallest C(i) to the largest C(i). If COLCND >= 0.1, it is not worth scaling by C.
amax (OUT - Real number) Absolute value of largest matrix element. If AMAX is very close to overflow or very close to underflow, the matrix should be scaled.
Definition mtl2lapack.h
H.1.9 gelqf
Prototype

    template <class LapackMatA, class VectorT>
    int gelqf(LapackMatA& a, VectorT& tau);
Description Compute an LQ factorization of an M-by-N matrix A.

a (IN/OUT - matrix(M,N)) On entry, the M-by-N matrix A. On exit, the elements on and below the diagonal of the array contain the m-by-min(m,n) lower trapezoidal matrix L (L is lower triangular if m <= n).
I.2.1 read dense matlab
Description The matrix type for this function is the following:

    typedef matrix::type matlab_dense;

Definition matlabio.h
I.2.2 write dense matlab
Prototype void write_dense_matlab(matlab_dense& A, char* matrix_name, const char* file);
Description The matrix type for this function is the following:

    typedef matrix::type matlab_dense;
Definition matlabio.h
I.2.3 read sparse matlab
Prototype void read_sparse_matlab(matlab_sparse& A, char* matrix_name, const char* file);
APPENDIX I. UTILITIES
Description The matrix type for this function is the following:

    typedef matrix< double, rectangle, array< compressed >, column_major >::type matlab_sparse;
Definition matlabio.h
I.2.4 write sparse matlab
Prototype void write_sparse_matlab(matlab_sparse& A, char* matrix_name, const char* file);
Description The matrix type for this function is the following typedef matrix< double, rectangle, array< compressed >, column_major >::type matlab_sparse;
Definition matlabio.h
I.3 Classes I.3.1 dimension Description This is similar to the std::pair class except that it can have static parameters, and only deals with size types. The purpose of this class is to transparently hide whether the dimensions
of a matrix are specified statically or dynamically.
Members
    transpose_type
    size_type
    enum { M = MM, N = NN }
    dimension()
    dimension(const dimension& x)
    dimension(const std::pair& x) : m(x.first)
    dimension(const Dim& x)
    dimension(size_type m, size_type n)
    dimension& operator=(const dimension& x)
    size_type first() const
    size_type second() const
    bool is_static() const
    transpose_type transpose() const
    size_type m, n
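The static-or-dynamic idea can be sketched in a few lines: when the template arguments are nonzero the sizes are compile-time constants, otherwise the run-time members are used, and callers see the same first()/second() interface either way. The names below (dim) are illustrative, not MTL's:

```cpp
#include <cassert>
#include <cstddef>

// Sketch of a pair-like dimension class: MM and NN default to 0,
// meaning "decide at run time"; nonzero values fix the sizes statically.
template <std::size_t MM = 0, std::size_t NN = 0>
struct dim {
    dim(std::size_t m = MM, std::size_t n = NN) : m_(m), n_(n) {}
    std::size_t first() const { return MM ? MM : m_; }
    std::size_t second() const { return NN ? NN : n_; }
    bool is_static() const { return MM != 0 && NN != 0; }
    std::size_t m_, n_;  // only consulted in the dynamic case
};
```

Code templated on the dimension type works unchanged in both cases, while in the static case the compiler can fold first() and second() to constants.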
I.3.2 harwell boeing stream
Description This class simplifies the job of creating matrices from files stored in the Harwell-Boeing format. All matrix types have a constructor that takes a harwell_boeing_stream object. One can also access the elements from a matrix stream using operator>>(). The stream handles both real and complex numbers. Usage:

    harwell_boeing_stream mms( filename );
    Matrix A(mms);
Template Parameters
    Parameter    Default    Description
    T                       the matrix element type (double or complex)
Members
    Declaration                              Description
    harwell_boeing_stream(char* filename)    Construct from file name
    ~harwell_boeing_stream()                 Destructor
    int nrows() const                        Number of rows in matrix
    int ncols() const                        Number of columns in matrix
    int nnz() const                          Number of non-zeroes in matrix
    bool eof()                               At the end of the file?
    bool is_complex()                        Whether the matrix is complex
I.3.3 matrix market stream
Description This class simplifies the job of creating matrices from files stored in the Matrix Market format. All matrix types have a constructor that takes a matrix_market_stream object. One can also access the elements (of type entry2) from a matrix stream using the stream operator. The stream handles both real and complex numbers. Usage:

    matrix_market_stream mms( filename );
    Matrix A(mms);
Template Parameters
    Parameter    Default    Description
    T                       the matrix element type (double or complex)
Members
    Declaration                             Description
    matrix_market_stream(char* filename)    Construct from filename
    ~matrix_market_stream()                 Destructor, closes the file
    bool eof() const                        At the end of the file yet?
    int nrows() const                       Number of rows in matrix
    int ncols() const                       Number of columns in matrix
    int nnz() const                         Number of non-zeroes in matrix
    bool is_symmetric() const               Whether the matrix is symmetric
    bool is_complex() const                 Whether the matrix is complex
    bool is_hermitian() const               Whether the matrix is hermitian